glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	tests/geo-rep: Add geo-rep glusterd test cases	Kotresh HR	2019-06-04	3	-0/+223
\| \| \| \| \| \| \| \| \| \|	1. Add geo-rep fanout test case 2. Add glusterd geo-rep negative test cases 3. Add glusterd geo-rep config test cases Change-Id: I856c087eb3216d8f0ffd1f266deac88e9a4effec Signed-off-by: Kotresh HR <khiremat@redhat.com> updates: bz#1693692
*	tests/geo-rep: Remove a rename test case on EC volume	Kotresh HR	2019-06-04	2	-5/+5
\| \| \| \| \| \| \| \| \| \|	The rename-with-existing-name test case occasionally fails on EC volumes, so it is commented out until the failure is analysed. Change-Id: Icb2ad189b9e4d12101e8f5abcb8a033181360386 Signed-off-by: Kotresh HR <khiremat@redhat.com> updates: bz#1193929
*	glusterd: coverity fix	Mohit Agrawal	2019-06-04	1	-5/+11
\| \| \| \| \| \| \| \| \| \|	1401716: Resource leak 1401714: Dereference before null check updates: bz#789278 Change-Id: I8fb0b143a1d4b37ee6be7d880d9b5b84ba00bf36 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	glusterd/tier: gluster upgrade broken because of tier	hari gowtham	2019-06-03	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: While tier code was removed, the is_tier_enabled related to tier wasn't handled for upgrade. As this option was missing in the info file, the checksum mismatch issue happens during upgrade. This results in the peer rejections happening. Fix: use the op_version check and note down the is_tier_enabled always. This way it will be dummy key, but the future upgrades will work fine. NOTE: Just having the key from 3.10 to 7 will cause issues when upgraded from 5 to 8 or any such upgrade which skips the version where we handle it. Change-Id: I9951e2b74f16e58e884e746c34dcf53e559c7143 fixes: bz#1714973 Signed-off-by: hari gowtham <hgowtham@redhat.com>
*	lcov: improve line coverage	Amar Tumballi	2019-06-03	3	-117/+55
\| \| \| \| \| \| \| \| \| \| \| \|	upcall: remove extra variable assignment and use just one initialization. open-behind: reduce the overall number of lines, in functions not frequently called selinux: reduce some lines in init failure cases updates: bz#1693692 Change-Id: I7c1de94f2ec76a5bfe1f48a9632879b18e5fbb95 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	across: coverity fixes	Amar Tumballi	2019-06-03	5	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* locks/posix.c: key was not freed in one of the cases. * locks/common.c: lock was being free'd out of context. * nfs/exports: handle case of missing free. * protocol/client: handle case of entry not freed. * storage/posix: handle possible case of double free CID: 1398628, 1400731, 1400732, 1400756, 1124796, 1325526 updates: bz#789278 Change-Id: Ieeaca890288bc4686355f6565f853dc8911344e8 Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	posix: add storage.reserve-size option	Sheetal Pamecha	2019-06-03	6	-13/+138
\| \| \| \| \| \| \| \| \| \| \|	storage.reserve-size option will take size as input instead of percentage. If set, priority will be given to storage.reserve-size over storage.reserve. Default value of this option is 0. fixes: bz#1651445 Change-Id: I7a7342c68e436e8bf65bd39c567512ee04abbcea Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
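The precedence rule described above can be sketched as follows. This is only an illustration: the function name and parameters are hypothetical, and "set" is assumed to mean a non-zero storage.reserve-size, since the default is 0.

```python
def reserved_bytes(total_bytes, reserve_percent, reserve_size_bytes=0):
    """Return the space to keep free on a brick (hypothetical helper).

    A non-zero reserve_size_bytes (storage.reserve-size) takes priority;
    otherwise the percentage (storage.reserve) applies.
    """
    if reserve_size_bytes > 0:
        return reserve_size_bytes
    return total_bytes * reserve_percent // 100
```

For example, reserved_bytes(1000, 1) falls back to the percentage, while reserved_bytes(1000, 1, 50) uses the absolute size.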
*	glusterd: remove trivial conditions	Sanju Rakonde	2019-06-01	1	-4/+2
\| \| \| \| \| \| \|	updates: bz#1193929 Change-Id: Ieb5e35d454498bc389972f9f15fe46b640f1b97d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd: Optimize code to copy dictionary in handshake code path	Mohit Agrawal	2019-05-31	7	-41/+187
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When a large number of volumes (around 2000) is configured, glusterd hits a bottleneck during handshake while copying the dictionary. Solution: To avoid the bottleneck, serialize the dictionary instead of copying key-value pairs one by one. Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a fixes: bz#1711297 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests/geo-rep: Add tests to cover glusterd geo-rep	Kotresh HR	2019-05-31	1	-0/+3
\| \| \| \| \| \|	Change-Id: Ide59a3fde11b23f654b1ec03d72b4ec53b36a03b Signed-off-by: Kotresh HR <khiremat@redhat.com> updates: bz#1693692
*	glusterd/shd: Optimize the glustershd manager to send reconfigure	Mohammed Rafi KC	2019-05-31	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Traditionally, every svc manager stops the process and then starts it again each time it is called. That is not required for shd, because the attach request implemented in shd multiplexing checks whether a detach is required before attaching the graph. So there is no need to send an explicit detach request if we are sure that the next call is an attach request. Change-Id: I9157c8dcaffdac038f73286bcf5646a3f1d3d8ec fixes: bz#1710054 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	glusterfsd/cleanup: Protect graph object under a lock	Mohammed Rafi KC	2019-05-31	3	-28/+50
\| \| \| \| \| \| \| \| \| \| \|	While processing cleanup_and_exit, we access a graph object without holding a lock. A parallel cleanup of the graph is quite possible, which might lead to an invalid memory access. Change-Id: Id05ca70d5b57e172b0401d07b6a1f5386c044e79 fixes: bz#1708926 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests/geo-rep: Add EC volume test case	Shwetha K Acharya	2019-05-31	2	-0/+447
\| \| \| \| \| \| \| \|	Added geo-rep regression tests with EC volume. fixes: bz#1650095 Change-Id: Ifb6e68e0a6103a98fced7f84d3088b8edf33d52f Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	lcov: more coverage to shard, old-protocol, sdfs	Amar Tumballi	2019-05-31	6	-6/+58
\| \| \| \| \| \|	updates: bz#1693692 Change-Id: If4c30572d4501d169bb4b0871c677d974515867c Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd/svc: Stop stale process using the glusterd_proc_stop	Mohammed Rafi KC	2019-05-31	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	While restarting a glusterd process, when we have a stale pid we were doing a simple kill. Instead we can use glusterd_proc_stop, because it has more logging and falls back to a force kill if there is any problem with kill signal handling. Change-Id: I4a2dadc210a7a65762dd714e809899510622b7ec updates: bz#1710054 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	glusterd/svc: glusterd_svcs_stop should call individual wrapper function	Mohammed Rafi KC	2019-05-31	2	-7/+15
\| \| \| \| \| \| \| \| \| \| \| \|	glusterd_svcs_stop should call individual wrapper function to stop a daemon rather than calling glusterd_svc_stop. For example for shd, it should call glusterd_shdsvc_stop instead of calling basic API function to stop. Because the individual functions for each daemon could be doing some specific operation in their wrapper function. Change-Id: Ie6d40590251ad470ef3901d1141ab7b22c3498f5 fixes: bz#1712741 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	glusterd: add an op-version check	Sanju Rakonde	2019-05-31	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: "gluster v status" hangs in a heterogeneous cluster when issued from a non-upgraded node. Cause: commit 34e010d64 fixes the txn-opinfo mem leak in op-sm framework by not setting the txn-opinfo if some conditions are true. When vol status is issued from a non-upgraded node, the command hangs in its upgraded peer because the upgraded node sets the txn-opinfo based on the new conditions, whereas non-upgraded nodes follow different conditions. Fix: Add an op-version check, so that all the nodes follow the same set of conditions to set txn-opinfo. fixes: bz#1710159 Change-Id: Ie1f353212c5931ddd1b728d2e6949dfe6225c4ab Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	scripts: Find hung frames given a directory with statedumps	Pranith Kumar K	2019-05-30	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Given a directory with statedumps captured at different times if there are any stacks that appear in multiple statedumps, it prints them. Sample output: glusterdump.25425.dump repeats=5 stack=0x7f53642cb968 pid=0 unique=0 lk-owner= glusterdump.25427.dump repeats=5 stack=0x7f85002cb968 pid=0 unique=0 lk-owner= glusterdump.25428.dump repeats=5 stack=0x7f962c2cb968 pid=0 unique=0 lk-owner= glusterdump.25428.dump repeats=2 stack=0x7f962c329f18 pid=60830 unique=0 lk-owner=88f50620967f0000 glusterdump.25429.dump repeats=5 stack=0x7f20782cb968 pid=0 unique=0 lk-owner= glusterdump.25472.dump repeats=5 stack=0x7f27ac2cb968 pid=0 unique=0 lk-owner= glusterdump.25473.dump repeats=5 stack=0x7f4fbc2cb9d8 pid=0 unique=0 lk-owner= NOTE: stacks with lk-owner=""/lk-owner=0000000000000000/unique=0 may not be hung frames and need further inspection fixes bz#1714415 Change-Id: Ib64a3fca63f49df2fafedcd4baa57e9b25411b08 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
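The idea of reporting stacks that appear in more than one statedump can be sketched as below. This is not the script added by the commit. It assumes a simplified statedump layout in which each call stack starts with a `[global.callpool.stack.N]` section header followed by `key=value` lines, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed section header for a call stack in a statedump.
STACK_HEADER = re.compile(r"^\[global\.callpool\.stack\.\d+\]$")
KEY_FIELDS = ("stack", "pid", "unique", "lk-owner")


def stacks_in_dump(text):
    """Return the set of (stack, pid, unique, lk-owner) tuples in one dump."""
    found = set()
    fields = None  # key=value pairs of the stack section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Any new section header closes the stack section before it.
            if fields is not None:
                found.add(tuple(fields.get(k, "") for k in KEY_FIELDS))
            fields = {} if STACK_HEADER.match(line) else None
        elif fields is not None and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    if fields is not None:
        found.add(tuple(fields.get(k, "") for k in KEY_FIELDS))
    return found


def repeated_stacks(dump_texts):
    """Map each stack seen in two or more dumps to its repeat count."""
    counts = Counter()
    for text in dump_texts:
        counts.update(stacks_in_dump(text))
    return {key: n for key, n in counts.items() if n > 1}
```

As the commit's NOTE says, entries with an empty lk-owner or unique=0 may not be hung frames, so a caller would still need to inspect them.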
*	stack: Make sure to have unique call-stacks in all cases	Pranith Kumar K	2019-05-30	5	-14/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the moment new stack doesn't populate frame->root->unique in all cases. This makes it difficult to debug hung frames by examining successive state dumps. Fuse and server xlators populate it whenever they can, but other xlators won't be able to assign 'unique' when they need to create a new frame/stack because they don't know what 'unique' fuse/server xlators already used. What we need is for unique to be correct. If a stack with same unique is present in successive statedumps, that means the same operation is still in progress. This makes 'finding hung frames' part of debugging hung frames easier. fixes bz#1714098 Change-Id: I3e9a8f6b4111e260106c48a2ac3a41ef29361b9e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
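One way to guarantee that every new stack gets a distinct 'unique' value is to hand the values out from a single shared counter. The sketch below illustrates that idea only; the class and method names are hypothetical and it does not claim to mirror the C change in this commit.

```python
import itertools
import threading


class CallPool:
    """Hypothetical pool that assigns a process-wide unique id to each stack."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = itertools.count(1)

    def new_stack(self, op):
        # Take the id under a lock so concurrent callers never share one.
        with self._lock:
            unique = next(self._ids)
        return {"unique": unique, "op": op}
```

Because every xlator would draw from the same counter, a stack whose unique value shows up in successive statedumps is the same operation still in progress.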
*	glusterd: coverity fix	Sanju Rakonde	2019-05-30	1	-2/+0
\| \| \| \| \| \| \| \| \|	1401590: Deadcode updates: bz#789278 Change-Id: I3aa1d3aa9769e6990f74b6a53e288e788173c5e0 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	marker: remove some unused functions	Amar Tumballi	2019-05-30	7	-148/+8
\| \| \| \| \| \| \| \| \|	After basic analysis, found that these methods were not being used at all. updates: bz#1693692 Change-Id: If9cfa1ab189e6e7b56230c4e1d8e11f9694a9a65 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	tests: add tests for different signal handling	Amar Tumballi	2019-05-30	7	-15/+80
\| \| \| \| \| \| \| \| \| \| \|	Also some cleanup: * old-protocol.t was actually added to make sure we have line-coverage * first-test.t should have been removed as per the comment. It doesn't do anything. * add statvfs to rpc-coverage so we can cover statvfs in few xlators. updates: bz#1693692 Change-Id: Ie8651ce007de484c4abced16b4de765aa5e517be Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	If bind-address is IPv6 return it successfully	Amgad Saleh	2019-05-28	1	-6/+11
\| \| \| \| \| \|	Change-Id: Ibd37b6ea82b781a1a266b95f7596874134f30079 fixes: bz#1713730 Signed-off-by: Amgad Saleh <amgad.saleh@nokia.com>
*	glusterd: bulkvoldict thread is not handling all volumes	Mohit Agrawal	2019-05-27	2	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In commit ac70f66c5805e10b3a1072bd467918730c0aeeb4 I missed one condition to populate the volume dictionary in multiple threads while brick_multiplex is enabled. Due to that, glusterd is not sending the volume dictionary for all volumes to the peer. Solution: Update the condition in the code and update the test case as well to avoid the issue Change-Id: I06522dbdfee4f7e995d9cc7b7098fdf35340dc52 fixes: bz#1711250 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests: Add changelog api tests	Kotresh HR	2019-05-27	2	-0/+135
\| \| \| \| \| \|	updates: bz#1193929 Change-Id: Iee9aab8140882069165621189741f189fb2cc884 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	glusterd/tier: remove tier related code from glusterd	Hari Gowtham	2019-05-27	30	-5484/+126
\| \| \| \| \| \| \| \| \| \| \| \| \|	The handler functions are pointed to dummy functions. The switch case handling for tier also have been moved to point default case to avoid issues, if reintroduced. The tier changes in DHT still remain as such. updates: bz#1693692 Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
*	tests: Add history api tests	Kotresh HR	2019-05-27	5	-0/+171
\| \| \| \| \| \|	updates: bz#1193929 Change-Id: Ic26ab5277f720c734f083150c1c541763dfa64aa Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	gfapi: add missing api to increase code coverage	Sheetal Pamecha	2019-05-26	1	-18/+340
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	add test for async Read/Write combinations glfs_read_async/write_async glfs_pread_async/pwrite_async glfs_readv_async/writev_async glfs_preadv_async/pwritev_async ftruncate/ftruncate_async fsync/fsync_async fdatasync/fdatasync_async Updates: #655 Change-Id: I12beb97029fd60bce79650a376d8fcd8d383ef16 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	api/glfsxmp.c: minor fixes	Sheetal Pamecha	2019-05-26	2	-63/+266
\| \| \| \| \| \| \| \| \| \| \|	* add more fops: f{get,set,list,remove}xattr(), access(), fstat(), fsetattr(), getxattr(), lgetxattr(), llistxattr(), lsetxattr(), fgetxattr() * handle some error cases (like volume not found) Updates: #655 Change-Id: I3334bdf3090eafd83a54e1be12036ea01b181089 Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	Fix some "Null pointer dereference" coverity issues	Xavi Hernandez	2019-05-26	12	-17/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following CID's: * 1124829 * 1274075 * 1274083 * 1274128 * 1274135 * 1274141 * 1274143 * 1274197 * 1274205 * 1274210 * 1274211 * 1288801 * 1398629 Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558 Updates: bz#789278 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	cluster/ec: honor contention notifications for partially acquired locks	Xavi Hernandez	2019-05-25	2	-1/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others not yet) we can receive notifications from acquired bricks, which should be honored, since we may not receive more notifications after that. Since EC was ignoring them, once the lock was acquired, it was not released until the eager-lock timeout, causing unnecessary delays on other clients. This fix takes into consideration the notifications received before having completed the full lock acquisition. After that, the lock will be released as soon as possible. Fixes: bz#1708156 Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	tests: Fix spurious failures in ta-write-on-bad-brick.t	Pranith Kumar K	2019-05-24	5	-17/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: afr_child_up_status_meta works only when LOOKUP on $M0 is successful. There are cases where quorum is not met and LOOKUP fails on $M0 which leads to failures similar to: grep: /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/private: Transport endpoint is not connected This was happening once in a while based on attribute-timeout and md-cache not serving the lookup. Fix: Find child-up status based on statedump instead. Also changed mount options to include --entry-timeout=0 and --attribute-timeout=0 updates bz#1193929 Change-Id: Ic0de72c3006d7399a5feb3e4d10d4748949b2ab3 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: Test openfd heal doesn't truncate files	Pranith Kumar K	2019-05-24	2	-0/+218
\| \| \| \| \| \|	fixes bz#1706603 Change-Id: I0bfd30f787f157b7a54f71088f767ccfd7621208 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	glusterd: coverity fix	Sanju Rakonde	2019-05-23	1	-1/+1
\| \| \| \| \| \| \| \| \|	CID: 1401345 - Unused value updates: bz#789278 Change-Id: I6b8f2611151ce0174042384b7632019c312ebae3 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	geo-rep: Geo-rep help text issue	Shwetha K Acharya	2019-05-23	1	-2/+2
\| \| \| \| \| \| \| \| \|	Modified Geo-rep help text for better sanity. fixes: bz#1652887 Change-Id: I40ef7ef709eaecf0125ab4b4a7517e2c5d1ef4a0 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	glusterd-utils.c: skip checksum when possible.	Yaniv Kaul	2019-05-23	1	-22/+18
\| \| \| \| \| \| \| \| \| \| \| \|	We only need to calculate and write the checksum in case of !is_quota_conf . Align the code in accordance. Also, use a smaller buffer (to write few chars). Change-Id: I40c83ce10447df77ff9975d314d768ec2c0087c2 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	cli: Fixed typos	N Balachandran	2019-05-23	1	-2/+2
\| \| \| \| \| \|	Change-Id: I14957c5161f31d5dfc6cf56f8d7ccf4d39372f39 fixes: bz#1711820 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	inode: fix wrong loop count in __inode_ctx_free	Xie Changlong	2019-05-23	1	-5/+6
\| \| \| \| \| \| \| \|	Avoid serious memory leak fixes: bz#1711240 Change-Id: Ic61a8fdd0e941e136c98376a87b5a77fa8c22316 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
*	cluster/dht: Lookup all files when processing directory	N Balachandran	2019-05-23	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A rebalance process currently only looks up files that it is supposed to migrate. This could cause issues when lookup-optimize is enabled as the dir layout can be updated with the commit hash before all files are looked up. This is especially problematic if one of the rebalance processes fails to complete, as clients will try to access files whose linkto files might not have been created. Each process will now lookup every file in the directory it is processing. Pros: Less likely that files will be inaccessible. Cons: More lookup requests sent to the bricks and a potential performance hit. Note: this does not handle races such as when a layout is updated on disk just as the create fop is sent by the client. Change-Id: I22b55846effc08d3b827c3af9335229335f67fb8 fixes: bz#1711764 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	lock: check null value of dict to avoid log flooding	Susant Palai	2019-05-23	1	-1/+1
\| \| \| \| \| \|	updates: bz#1712322 Change-Id: I120a1d23506f9ebcf88c7ea2f2eff4978a61cf4a Signed-off-by: Susant Palai <spalai@redhat.com>
*	ec/fini: Fix race with ec_fini and ec_notify	Mohammed Rafi KC	2019-05-21	6	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During a graph cleanup, we first send a PARENT_DOWN and wait for a child down to ultimately free the xlator and the graph. In the ec xlator, we clean up the threads when we get a PARENT_DOWN event. But a racing event like CHILD_UP or even an xl_op may trigger healing threads after the threads are cleaned up. So there is a chance that the threads might access a freed private variable. Change-Id: I252d10181bb67b95900c903d479de707a8489532 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests/quick-read-with-upcall.t: increase the timeout	Amar Tumballi	2019-05-21	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Running with 2 second sleep at this place caused failures like: `not ok 14 [ 2014/ 7] < 41> 'test-message1 cat /mnt/glusterfs/1/test.txt' -> 'Got "test-message0" instead of "test-message1"'` in few runs in 100 iterations. But when increased to higher than sleep 3, have not seen any failures in 100 runs. While I don't know the exact reasons for the behavior yet, looks like this increase in wait helps to pass the regression without failures. updates: bz#1693692 Change-Id: I0610b79bea53e36de3eea6c11234b7fc9dfd6232 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	afr/frame: Destroy frame after afr_selfheal_entry_granular	Mohammed Rafi KC	2019-05-21	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \|	In function "afr_selfheal_entry_granular", after completing the heal we are not destroying the frame. This leads to a crash when we execute a statedump operation, which tries to access the xlator object. If this xlator object was freed as part of the graph destroy, this leads to an invalid memory access. Change-Id: I0a5e78e704ef257c3ac0087eab2c310e78fbe36d fixes: bz#1708926 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	Revert "rpc: implement reconnect back-off strategy"	Amar Tumballi	2019-05-21	2	-18/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea. This revert is done as a 'possible' fix for frequent regression failures, which are also random in nature (i.e., different tests fail in different runs). Why exactly this patch? Because it seemed the most probable candidate among the patches merged in the last 15 days, after which regressions have been failing more often. Updates: bz#1711827 Change-Id: I35333162fcd4064f9609525ca93c666053c6d959
*	tests: change usleep() to sleep()	Sanju Rakonde	2019-05-16	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	While running a test case the following warning messages are seen on the display. To avoid such warnings, change usleep() to sleep(). warning: usleep is deprecated, and will be removed in near future! warning: use "sleep 0.25" instead... updates: bz#1193929 Signed-off-by: Sanju Rakonde <srakonde@redhat.com> Change-Id: I48b79ede1c70b101f654635dd4cc83e50ea55b73
*	features/shard: Fix crash during background shard deletion in a specific case	Krutika Dhananjay	2019-05-16	4	-4/+164
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider the following case - 1. A file gets FALLOCATE'd such that > "shard-lru-limit" number of shards are created. 2. And then it is deleted after that. The unique thing about FALLOCATE is that unlike WRITE, all of the participant shards are resolved and created and fallocated in a single batch. This means, in this case, after the first "shard-lru-limit" number of shards are resolved and added to lru list, as part of resolution of the remaining shards, some of the existing shards in lru list will need to be evicted. So these evicted shards will be inode_unlink()d as part of eviction. Now once the fop gets to the actual FALLOCATE stage, the lru'd-out shards get added to fsync list. 2 things to note at this point: i. the lru'd out shards are only part of fsync list, so each holds 1 ref on base shard ii. and the more recently used shards are part of both fsync and lru list. So each of these shards holds 2 refs on base inode - one for being part of fsync list, and the other for being part of lru list. FALLOCATE completes successfully and then this very file is deleted, and background shard deletion launched. Here's where the ref counts get mismatched. First as part of inode_resolve()s during the deletion, the lru'd-out inodes return NULL, because they are inode_unlink()'d by now. So these inodes need to be freshly looked up. But as part of linking them in lookup_cbk (precisely in shard_link_block_inode()), inode_link() returns the lru'd-out inode object. And its inode ctx is still valid and ctx->base_inode valid from the last time it was added to list. But shard_common_lookup_shards_cbk() passes NULL in the place of base_pointer to __shard_update_shards_inode_list(). This means, as part of adding the lru'd out inode back to lru list, base inode is not ref'd since it's NULL. Whereas post unlinking this shard, during shard_unlink_block_inode(), ctx->base_inode is accessible and is unref'd because the shard was found to be part of LRU list, although the matching ref didn't occur. This at some point leads to base_inode refcount becoming 0 and it getting destroyed and released back while some of its associated shards are continuing to be unlinked in parallel and the client crashes whenever it is accessed next. Fix is to pass base shard correctly, if available, in shard_link_block_inode(). Also, the patch fixes the ret value check in tests/bugs/shard/shard-fallocate.c Change-Id: Ibd0bc4c6952367608e10701473cbad3947d7559f Updates: bz#1696136 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
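The ref/unref mismatch described above can be reproduced in a toy model. The sketch below is a deliberate simplification: it tracks only the LRU list (the fsync list is left out), and every class and function name is hypothetical rather than taken from the shard translator.

```python
class BaseInode:
    def __init__(self):
        self.refs = 1  # one reference held by the open file itself

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1


class Shard:
    def __init__(self):
        self.base = None    # stands in for ctx->base_inode
        self.in_lru = False


def add_to_lru(shard, base):
    # The base inode is only ref'd when the caller actually passes it.
    if base is not None:
        base.ref()
        shard.base = base
    shard.in_lru = True


def remove_from_lru(shard):
    # The unref relies on the remembered base pointer, not on a prior ref.
    if shard.in_lru and shard.base is not None:
        shard.base.unref()
    shard.in_lru = False


def simulate(pass_base_on_relink):
    base, shard = BaseInode(), Shard()
    add_to_lru(shard, base)   # first resolution: refs go 1 -> 2
    remove_from_lru(shard)    # eviction: refs go 2 -> 1, shard.base stays set
    add_to_lru(shard, base if pass_base_on_relink else None)  # fresh lookup
    remove_from_lru(shard)    # unlink during background deletion
    return base.refs
```

In this model simulate(False) drops the base inode to 0 references while the file is still held, which corresponds to the premature destroy, whereas simulate(True) leaves the count at 1.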
*	geo-rep: Convert gfid conflict resolution logs into debug	Kotresh HR	2019-05-14	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \|	The gfid conflict resolution code path is not supposed to be hit in the generic code path. But a few heavy rename workloads (BUG: 1694820) make it a generic case. So logging the entries to be fixed as INFO floods the log in these particular workloads. Hence convert them to DEBUG. fixes: bz#1709653 Change-Id: I4d5e102b87be5fe5b54f78f329e588882d72b9d9 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix sync hang with tarssh	Kotresh HR	2019-05-13	3	-4/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Geo-rep sync hangs when tarssh is used as the sync engine under heavy workload. Analysis and Root cause: The tar process was found to be hung. Further debugging showed that the stderr buffer of the tar process on master was full, i.e., 64k. When the buffer was copied to a file from /proc/pid/fd/2, the hang was resolved. This can happen when files picked by the tar process to sync don't exist on master anymore. If this count increases to around 1k, the stderr buffer fills up. Fix: The tar process is executed using Popen with stderr as PIPE. The final execution is something like below. tar \| ssh <args> root@slave tar --overwrite -xf - -C <path> It was waiting on the ssh process first using communicate() and then on tar. Note that communicate() reads stdout and stderr. So when stderr of the tar process fills up, there is no one to read it until untar via ssh is completed. That can't happen, which leads to a deadlock. Hence we should wait on both processes in parallel, so that stderr is read on both processes. Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf fixes: bz#1707728 Signed-off-by: Kotresh HR <khiremat@redhat.com>
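Draining stderr of both ends of a pipeline at the same time can be sketched as below. This is a minimal illustration of the approach, not the geo-rep patch itself: the function name and the use of one reader thread per process are assumptions, and the two commands stand in for the tar and ssh invocations.

```python
import subprocess
import threading


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer`, reading both stderr streams concurrently.

    Returns {"producer": (returncode, stderr), "consumer": (returncode, stderr)}.
    """
    producer = subprocess.Popen(
        producer_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout,
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    # Close our copy so the consumer sees EOF once the producer exits.
    producer.stdout.close()

    results = {}

    def drain(name, proc):
        # Read until EOF so the stderr pipe can never fill up and block.
        err = proc.stderr.read()
        proc.stderr.close()
        results[name] = (proc.wait(), err)

    threads = [
        threading.Thread(target=drain, args=("producer", producer)),
        threading.Thread(target=drain, args=("consumer", consumer)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With sequential waits, a producer that writes more than the pipe buffer to stderr would block before the consumer ever sees end of input; reading both streams in parallel removes that dependency.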
*	ec/shd: Cleanup self heal daemon resources during ec fini	Mohammed Rafi KC	2019-05-13	6	-13/+124
\| \| \| \| \| \| \| \| \| \|	We were not properly cleaning self-heal daemon resources during ec fini. With shd multiplexing, it is absolutely necessary to cleanup all the resources during ec fini. Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	rpc: implement reconnect back-off strategy	Xavier Hernandez	2019-05-11	2	-16/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a connection failure happens, gluster tries to reconnect every 3 seconds. In some cases the failure is spurious, so a delay of 3 seconds could be unnecessarily long. This patch implements a back-off strategy that tries a reconnect as soon as 1 tenth of a second. If this fails, the time is doubled until it's around 3 seconds. After that, the reconnect is attempted every 3 seconds as before. Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149 Updates: bz#1193929 Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>
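The delay schedule described above can be sketched as a generator. The starting value of 0.1 s and the 3 s ceiling come from the commit message; clamping the doubled value to exactly 3.0 s is an assumption (the message only says "around 3 seconds"), and the function name is hypothetical.

```python
def reconnect_delays(initial=0.1, cap=3.0):
    """Yield the wait before each reconnect attempt.

    Starts at `initial` seconds, doubles after every failure, and stays
    at `cap` seconds once the doubled value reaches it.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        if delay < cap:
            delay *= 2
```

Taking the first few values gives 0.1, 0.2, 0.4, 0.8 and 1.6 seconds, after which every attempt waits 3 seconds as before the patch.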