glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	glusterd: bypass add-brick validation with force	Atin Mukherjee	2017-01-18	2	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit c916a2f added a validation to restrict add-brick operation if a replica configuration is changed and any of the bricks belonging to the volume is down. However we should bypass this validation with a force option if users really want to have add-brick to go through at the sake of the corner cases of data loss issue. The original problem of add-brick getting failed when layout is not set will still be a problem with a force option as the issue has to be taken care in the DHT layer. Change-Id: I0ed3df91ea712f77674eb8afc6fdfa577f25a7bb BUG: 1406411 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/16358 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	tier : Tier as a service	hari gowtham	2017-01-16	8	-14/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tierd is implemented by separating from rebalance process. The commands affected: 1) Attach tier will trigger this process instead of old one 2) tier start and tier start force will also trigger this process. 3) volume status [tier] will show tier daemon as a process instead of task and normal tier status and tier detach status works. 4) tier stop implemented. 5) detach tier implemented separately along with new detach tier status 6) volume tier volname status will work using the changes. 7) volume set works This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separate to the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command. The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework. The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of : ) spawning the daemon, killing it and other such processes. ) volume set options , which are written on the volfile. ) restart and reconfigure functions. Restart is to restart the daemon at two points 1)after gluster goes down and comes up. 2) to stop detach tier. ) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. it has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick). With this patch the log, pid, and volfile are separated and put into respective directories. Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399 BUG: 1313838 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/13365 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	tests/distribute: remove bug-1063230.t	N Balachandran	2017-01-13	1	-29/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	bug-1063230.t does not add value and has diverged from the original BZ it was supposed to test. Change-Id: I13ae1c68c276233dd53548d1333e3eb4b936785d BUG: 1412467 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/16379 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	readdir-ahead : Perform STACK_UNWIND outside of mutex locks	Poornima G	2017-01-09	1	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently STACK_UNWIND is performnd within ctx->lock. If readdir-ahead is loaded as a child of dht, then there can be scenarios where the function calling STACK_UNWIND becomes re-entrant. Its a good practice to not call STACK_WIND/UNWIND within local mutex's Change-Id: If4e869849d99ce233014a8aad7c4d5eef8dc2e98 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16068 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	tests: Remove rpm test	Nigel Babu	2017-01-06	1	-145/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is already tested by our smoke test. Testing it again is redundant. Change-Id: Icae4e8650002cd847dcb7ea76fd0447df7e72816 BUG: 1408755 Signed-off-by: Nigel Babu <nigelb@redhat.com> Reviewed-on: http://review.gluster.org/16287 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anoop C S <anoopcs@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
*	posix: make sure atime and mtime are set when calling lutimes()	Niels de Vos	2017-01-06	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When overwriting an existing file with O_TRUNC, the 'atime' was set to 0, meaning the Epoch (01-Jan-1970 UTC). However, the 'mtime' gets updated correcty. In case 'atime' or 'mtime' is not passed in the 'struct iatt', the time values passed to the systemcall are taken from the current values are returned by lstat(). Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3 BUG: 1401777 Reported-by: Eivind Sarto <eivindsarto@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/16034 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
*	glusterd: Fail add-brick on replica count change, if brick is down	karthik-us	2017-01-06	2	-3/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: 1. Have a replica 2 volume with bricks b1 and b2 2. Before setting the layout, b1 goes down 3. Set the layout write some data, which gets populated on b2 4. b2 goes down then b1 comes up 5. Add another brick b3, and heal will take place from b1 to b3, which basically have no data 6. Write some data. Both b1 and b3 will mark b2 for pending writes 7. b1 goes down, and b2 comes up 8. b2 gets heald from b1. During heal it removes the data which is already in b2, considering that as stale data. This leads to data loss. Solution: 1. In glusterd stage-op, while adding bricks, check whether the replica count is being increased 2. If yes, then check whether any of the bricks are down at that time 3. If yes, then fail the add-brick to avoid such data loss 4. Else continue the normal operation. This check will work enen when we convert plain distribute volume to replicate Test: 1. Create a replica 2 volume 2. Kill one brick from the volume 3. Try adding a brick to the volume 4. It should fail with all bricks are not up error 5. Cretae a distribute volume and kill one of the brick 6. Try to convert it to replicate volume, by adding bricks. 7. This should also fail. Change-Id: I9c8d2ab104263e4206814c94c19212ab914ed07c BUG: 1406411 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: http://review.gluster.org/16330 Tested-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	tests: Remove tests/distaf	Nigel Babu	2017-01-05	38	-8580/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We will not be using it in the future BUG: 1408131 Change-Id: Idebaaadb06786eebc08969ce0cc15a9df4bd9f42 Signed-off-by: Nigel Babu <nigelb@redhat.com> Reviewed-on: http://review.gluster.org/16270 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	tests: Fix split-brain-favorite-child-policy.t failures	Ravishankar N	2017-01-02	2	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In CentOS-7, the file was receving an extra removexattr(security.ima) FOP which changed its ctime, breaking the assumption that a particular brick had the latest ctime based on the writevs done in the .t Fix: 1. Compare the ctime of both files in the backend and pick the one with the latest ctime for the fav-child policy. Also unmount the volume before comparing, to avoid any further FOPS on the file that can possibly modify the timestamps. 2. Added floating point handling in stat function. Thanks to Pranith for the helping debugging the regex. Change-Id: I06041a0f39a29d2593b867af8685d65c7cd99150 BUG: 1408757 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16288 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	tests: fix tests/bugs/glusterd/bug-913555.t spurious failures	Atin Mukherjee	2017-01-01	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mainly replaced EXPECT instances with EXPECT_WITHIN Change-Id: If48f444f6b2ba6713fdc5e31ff3a642092e62ada BUG: 1408758 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/16289 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	glusterd, cli: Get global options through volume get functionality	Samikshan Bairagya	2016-12-30	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently it is not possible to retrieve values of global options by using the 'gluster volume get' functionality if there are no volumes present. In order to get the global options one has to use 'gluster volume get' with a specific volume name. This usage makes the illusion as though the option is set only on one volume, which is incorrect. When setting the global options, 'gluster volume set' provides a way to set them using the volume name as 'all'. Similarly, retrieving the global options should be made possible by using the volume name 'all' with the 'gluster volume get' functionality. This patch adds that functionality to 'volume get' Usage: # gluster volume get all <OPTION/all> Change-Id: Ic2fdb9eda69d4806d432dae26d117d9660fe6d4e BUG: 1378842 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/15563 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	Repair EC tests for FB environment (/mnt/gvfs is teh sux00r)	Kevin Vigor	2016-12-29	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Don't blindly 'df', target the volume in question. (this is because FB build environment has a /mnt/gvfs which fails df). Test Plan: runtests.sh Reviewers: Subscribers: Tasks: Blame Revision: Change-Id: Ic2c5883dd102835db64be9594657257e20711ba0 BUG: 1406878 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on-3.8-fb: http://review.gluster.org/16182 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Tested-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: http://review.gluster.org/16226 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
*	cluster/afr: Fix missing name indices due to EEXIST error	Krutika Dhananjay	2016-12-27	2	-0/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PROBLEM: Consider a volume with granular-entry-heal and sharding enabled. When a replica is down and a shard is created as part of a write, the name index is correctly created under indices/entry-changes/<dot-shard-gfid>. Now when a read on the same region triggers another MKNOD, the fop fails on the online bricks with EEXIST. By virtue of this being a symmetric error, the failed_subvols[] array is reset to all zeroes. Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be set, causing the name index, which was created in the previous MKNOD operation, to be wrongly deleted in THIS MKNOD operation. FIX: The ideal fix would have been for a transaction to delete the name index ONLY if it knows it is the one that created the index in the first place. This would involve gathering information as to whether THIS xattrop created the index from individual bricks, aggregating their responses and based on the various posisble combinations of responses, decide whether to delete the index or not. This is rather complex. Simpler fix would be for post-op to examine local->op_ret in the event of no failed_subvols to figure out whether to delete the name index or not. This can occasionally lead to creation of stale name indices but they won't be affecting the IO path or mess with pending changelogs in any way and self-heal in its crawl of "entry-changes" directory would take care to delete such indices. Change-Id: Ic1b5257f4dc9c20cb740a866b9598cf785a1affa BUG: 1408712 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16286 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	tests: Fix spurious failure in tests/bugs/replicate/bug-1402730.t	Krutika Dhananjay	2016-12-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace the EXPECT '00000001' with EXPECT_NOT '00000000'. This is because occasionally a name-heal is performing new-entry marking on 'c' causing the pending entry changelog on it to become '00000002'. Change-Id: I30916e6266534d18899cfa5771c892db8c51ad9a BUG: 1405902 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16193 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	tests: Fix spurious failure in bug-1402841.t-mt-dir-scan-race.t	Krutika Dhananjay	2016-12-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check that shd is up before executing 'volume heal' command Change-Id: Ib510a5de06d732fd3a738e90fa16376698479897 BUG: 1405554 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16169 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
*	tests: Move tests/basic/gfapi/bug1291259.t to bad tests list	Krutika Dhananjay	2016-12-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See thread http://www.gluster.org/pipermail/gluster-devel/2016-December/051714.html for more information. Change-Id: I9abe4b0e40499e53c1276a10a6bc192fd0f2cef7 BUG: 1405301 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16159 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	tests: Fix spurious test failure in bug-1316437.t	Rajesh Joseph	2016-12-16	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After sending SIGTERM to gluster process we immediately check if process exited. We should wait for some time before checking process state. BUG: 1404573 Change-Id: Iaba0067f6e880a7fe38e11b9fa0fe9bd103b19e2 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/16162 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Avra Sengupta <asengupt@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	glfsheal: Explicitly enable self-heal xlator options	Ravishankar N	2016-12-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable data, metadata and entry self-heal as xlator-options so that glfs-heal.c can heal split-brain files even if they are disabled on the volume via volume set commands. Change-Id: Ic191a1017131db1ded94d97c932079d7bfd79457 BUG: 1234054 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11333 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	access_control : address O_TRUNC and O_APPEND flag properly in posix_acl_open	Jiffin Tony Thottan	2016-12-14	2	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In posix_acl_open, in switch value passed is (flag & O_ACCMODE). The value for O_ACCMODE is 0003, so the result will always be less than or equal to 3. But value for O_TRUNC is 01000 and O_APPEND is 02000, so it is not right to check it in switch case Change-Id: Ia17db80a6a5f681c35e08e062d384f33ef7e0354 BUG: 1387241 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15688 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
*	cluster/afr: Fix per-txn optimistic changelog initialisation	Krutika Dhananjay	2016-12-12	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Incorrect initialisation of local->optimistic_change_log was leading to skipped pre-op and post-op even when a brick didn't participate in the txn because it was down. The result - missing granular name index resulting in some entries never getting healed. FIX: Initialise local->optimistic_change_log just before pre-op. Also fixed granular entry heal to create the granular name index in pre-op as opposed to post-op. This is to prevent loss of granular information when during an entry txn, the good (src) brick goes offline before the post-op is done. This would cause self-heal to do conservative merge (since dirty xattr is the only information available), which when granular-entry-heal is enabled, expects granular indices, the lack of which can lead to loss of data in the worst case. Change-Id: Ia3ad716d6fb1821555f02180e86e8711a79f958d BUG: 1402730 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16075 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	syncop: fix conditional wait bug in parallel dir scan	Ravishankar N	2016-12-09	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The issue as seen by the user is detailed in the BZ but what is happening is if the no. of items in the wait queue == max-qlen, syncop_mt_dir_scan() does a pthread_cond_wait until the launched synctask workers dequeue the queue. But if for some reason the worker fails, the queue is never emptied due to which further invocations of syncop_mt_dir_scan() are blocked forever. Fix: Made some changes to _dir_scan_job_fn - If a worker encounters error while processing an entry, notify the readdir loop in syncop_mt_dir_scan() of the error but continue to process other entries in the queue, decrementing the qlen as and when we dequeue elements, and ending only when the queue is empty. - If the readdir loop in syncop_mt_dir_scan() gets an error form the worker, stop the readdir+queueing of further entries. Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88 BUG: 1402841 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16073 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	uss: snapd should enable SSL if SSL is enabled on volume	Rajesh Joseph	2016-12-01	1	-0/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During snapd graph generation we should check if SSL is enabled on main volume or not. This is because clients will communicate with snapd as if it is communicating to a brick. Change-Id: I0d7fe86c567b297a8528a48faf06161d4c3cb415 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> BUG: 1400013 Reviewed-on: http://review.gluster.org/15979 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaushal M <kaushal@redhat.com>
*	libglusterfs: Fix a read hang	Poornima G	2016-11-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) \| readdir-ahead \| read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. / ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; / for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected / ... THIS = obj; / THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. / ... fn (frame, obj, params); / fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement / ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Modify STACK_WIND_TAIL() as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj / resolve obj as the variables passed in obj macro can be overwritten in the further instrucions / next_xl_fn = fn / resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } As a part of http://review.gluster.org/15901/ the caller io-cache was fixed. BUG: 1388292 Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15923 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/afr: CLI for granular entry heal enablement/disablement	Krutika Dhananjay	2016-11-28	6	-7/+149
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are already existing non-granular indices created that are yet to be healed, if granular-entry-heal option is toggled from 'off' to 'on', AFR self-heal whenever it kicks in, will try to look for granular indices in 'entry-changes'. Because of the absence of name indices, granular entry healing logic will fail to heal these directories, and worse yet unset pending extended attributes with the assumption that are no entries that need heal. To get around this, a new CLI is introduced which will invoke glfsheal program to figure whether at the time an attempt is made to enable granular entry heal, there are pending heals on the volume OR there are one or more bricks that are down. If either of them is true, the command will be failed with the appropriate error. New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable} Change-Id: I1f4fe8162813b9068e198965d94169fee4adc099 BUG: 1370410 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15747 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	afr: allow I/O when favorite-child-policy is enabled	Ravishankar N	2016-11-27	1	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently, I/O on a split-brained file fails even when the favorite-child-policy is set until the self-heal is complete. Fix: If a valid 'source' is found using the set favorite-child-policy, inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks), refresh the inode and then proceed with the read or write transaction. The resetting itself happens in the self-heal code and hence can also happen in the client side background-heal or by the shd's index-heal in addition to the txn code path explained above. When it happens in via heal, we also add checks in undo-pending to not reset the sink xattrs again. Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626 BUG: 1386188 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com> Reviewed-on: http://review.gluster.org/15673 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/afr: Fix bugs in [f]inodelk/[f]entrylk	Pranith Kumar K	2016-11-26	1	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problems: 1) Inodelk is not taking quorum into account 2) finodelk, [f]entrylk are not implemented correctly 3) By default afr doesn't go for non-blocking parallel locks. Fix: Implemented a common framework which can be used by [f]inodelk/[f]entrylk. Used quorum for the same. Change-Id: I239f13875a065298630d266941df10cfa3addc85 BUG: 1369077 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15802 Tested-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
*	features/index: Delete granular entry indices of already healed directories ↵	Krutika Dhananjay	2016-11-24	1	-0/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	during crawl If granular name indices are already in existence for a volume, and before they are healed, granular entry heal be disabled, a crawl on indices/xattrop will clear the changelogs on these directories. When their corresponding entry-changes indices are crawled subsequently, if it is found that the directories don't need heal anymore, the granular indices are not cleaned up. This patch fixes that problem by ensuring that the zero-xattrop also deletes the stale indices at the level of index translator. Change-Id: Ifbaa6bec2a14e3041addfee4054131babbf4d35e BUG: 1370410 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15880 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	io-cache: Fix a read hang	Poornima G	2016-11-23	2	-0/+157
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) \| readdir-ahead \| read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. / ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; / for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected / ... THIS = obj; / THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. / ... fn (frame, obj, params); / fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement / ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Ideally, STACK_WIND_TAIL() should be modified as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj / resolve obj as the variables passed in obj macro can be overwritten in the further instrucions / next_xl_fn = fn / resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } But for this solution, knowing the type of variable 'next_xl_fn' is a challenge and is not easy. Hence just modifying all the existing callers to pass "FIRST_CHILD (this)" as obj, instead of "FIRST_CHILD (frame->this)". Change-Id: I179ffe3d1f154bc5a1935fd2ee44e912eb0fbb61 BUG: 1388292 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15901 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	marker: Fix inode value in loc, in setxattr fop	Poornima G	2016-11-17	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On recieving a rename fop, marker_rename() stores the, oldloc and newloc in its 'local' struct, once the rename is done, the xtime marker(last updated time) is set on the file, but sending a setxattr fop. When upcall receives the setxattr fop, the loc->inode is NULL and it crashes. The loc->inode can be NULL only in one valid case, i.e. in rename case where the inode of new loc can be NULL. Hence, marker should have filled the inode of the new_loc before issuing a setxattr. Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977 BUG: 1394131 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15826 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
*	cli/rebalance: remove brick status is incorrect	N Balachandran	2016-11-17	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a remove brick operation is preceded by a fix-layout, running remove-brick status on a node which does not contain any of the bricks that were removed displays fix-layout status. The defrag_cmd variable was not updated in glusterd for the nodes not hosting removed bricks causing the status parsing to go wrong. This is now updated. Also made minor modifications to the spacing in the fix-layout status output. Change-Id: Ib735ce26be7434cd71b76e4c33d9b0648d0530db BUG: 1389697 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15749 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	jbr: Sending rollback from failed fop to fdl	Avra Sengupta	2016-11-08	5	-23/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of a failed fop, the failure is detected by the leader in the jbr-server in two places. First during a quorum check of +ve responses when it receives responses from all the followers. At this point if the fop hasn't been successfully journaled at a quorum of followers (as in there is no merit in trying the fop in the leader as the quorum will never be met), then we fail the fop. Also if this quorum is met, then the fop is tried on the leader, and after the leader completes the fop a quorum check similar to the previous one is done again, this time including the leaders outcome. If quorum is not met, then we fail the fop. In both these cases, when the fop fails we send a -ve ack to the client. With this patch, now we will also send a rollback through a GF_FOP_IPC to all the followers(and also to the leader in the second case of failure). This rollback will contain the index and term number of the fop which failed. This will be recorded in the respective journals of the bricks and will be used to rollback the fop on that brick later. A subsequent write, and it's respective rollback would look something like the following in the journal. The trusted.jbr.term and trusted.jbr.index present in the dict of both the logs, relate them, and the presence of "rollback-fop" in the dict of IPC indicates that it is a rollback fop, and the value 13(stands for GF_FOP_WRITE) indicates what kind of rollback operation it is. === GF_FOP_WRITE fd = <gfid 77f12ea2-ca56-40e3-a46e-ba2308baa035> vector = <158 bytes> offset = 0 (0x0) flags = 32769 (0x8001) xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> } === GF_FOP_IPC xdata = dict { trusted.jbr.term = 0 <2 bytes> trusted.jbr.index = 4 <2 bytes> rollback-fop = 13 <3 bytes> } Change-Id: I70b6a143d20697153d58e2f719e34ecd1ed160a5 BUG: 1349385 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/14783 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
*	gfapi: async fops should unref in callbacks	Raghavendra Talur	2016-11-05	2	-0/+203
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If fd is unref'd at the end of async call then the unref in cbks would lead to double unref and possible crash. Removing duplicate unrefs. Added unref only in failure cases. A simple test case has been added to test async write case. Need to extend the same for other async APIs too. Details: All glfd based calls in libgfapi, except for glfs_open and glfs_close, behave in the same way. At the start of the operation, they take a ref on glfd and fd. At the end of the operation, they unref it. Async calls are a little different as they unref in the cbk function. A successfull open call does not unref either the glfd or fd, thereby functioning as a reference for a OPEN file object. glfs_close makes a syncop_flush call sandwiched between a fd ref and unref(this can be removed, more on this below), followed by a call to glfs_mark_glfd_for_deletion which unrefs glfd and also calls glfs_fd_destroy as a release function thereby doing a unref on fd too. Functionally, there is no problem with how everything works when as described above. However, it is a little non-intuitive that we need to perform a fd_unref as a consequence of a implicit fd_ref that happens within glfs_resolve_fd. As we perform a GF_REF_GET(glfd) at the start of every operation, it would be worthwhile to remove the fd_ref that glfs_resovle_fd takes and do away with explicit fd_unref()s at the end of every operation. This is the same reason why we don't need the fd_ref in glfs_close. This is however not in the scope of this patch. Change-Id: I86b1d3b2ad846b16ea527d541dc82b5e90b0ba85 BUG: 1391086 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/15768 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Prasanna Kumar Kalever <pkalever@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
*	snapshot: Fix the failure to recreate clones with same name	Avra Sengupta	2016-11-01	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The brick path of snapshot clones contained the clonename, thereby failing to create newer clones with the same name after the original clone had been deleted. This fix creates the brick path with the clone's vol id instead of the clones name. Hence future clones with the same name will not have the namespace clash. Change-Id: I262712adc576122f051b5d1ce171d020efaefd1a BUG: 1387160 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/15683 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
*	tests: fix cleanup in bug-1110262.t	Jeff Darcy	2016-10-28	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The test wasn't cleaning up the user/group it had created if it terminated abnormally. We have a mechanism (push_trapfunc) to add cleanup actions in a way that ensures they'll be run when necessary, so I changed the test to use it. While I was there, I fixed it to use kill_brick instead of "ps\|grep\|kill" because that will be necessary for it to pass with brick multiplexing anyway. Change-Id: Ia515bd2420050f922970d28c5856c55df9b5247b Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/15744 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	afr,ec: Heal device files with correct major, minor numbers	Pranith Kumar K	2016-10-26	2	-11/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thanks a lot to xiaoping.wu@nokia.com from Nokia for the bug and the fix. BUG: 1384297 Change-Id: Ie443237e85d34633b5dd30f85eaa2ac34e45754c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15728 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	compound fops: Fix file corruption issue	Krutika Dhananjay	2016-10-24	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Address of a local variable @args is copied into state->req in server3_3_compound (). But even after the function has gone out of scope, in server_compound_resume () this pointer is accessed and dereferenced. This patch fixes that. 2. Compound fops, by virtue of NOT having a vector sizer (like the one writev has), ends up having both the header and the data (in case one of its member fops is WRITEV) in the same hdr_iobuf. This buffer was not being preserved through the lifetime of the compound fop, causing it to be overwritten by a parallel write fop, even when the writev associated with the currently executing compound fop is yet to hit the desk, thereby corrupting the file's data. This is fixed by associating the hdr_iobuf with the iobref so its memory remains valid through the lifetime of the fop. 3. Also fixed a use-after-free bug in protocol/client in compound fops cbk, missed by Linux but caught by NetBSD. Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their help in debugging this file corruption issue. Change-Id: I6d5c04f400ecb687c9403a17a12683a96c2bf122 BUG: 1378778 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15654 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	tests: gfapi/bug1291259.c should only call glfs_free() on success	Niels de Vos	2016-10-20	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case glfs_h_poll_upcall() does not return success, the 'struct glfs_upcall' would not have been allocated. A retry will be done and glfs_free() is called on the unallocated structure. In case the pointer does not point to NULL, glfs_free() will try to free up some random area. Change-Id: I38788d3bf22bbac3924f25edf45cd4a2637fa777 BUG: 1371540 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/15603 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
*	md-cache, afr: Reduce the window of stale read	Poornima G	2016-10-20	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Consider a replica setup, where one mount writes data to a file and the other mount reads the file. In afr, read operations are not transaction based, a brick(read subvolume) is chosen as a part of lookup or other operations, read is always wound only to the read subvolume, even if there was write from a different client that failed on this brick. This stale read continues until there is a lookup or any write operation from the mount point. Currently, this is not a major issue, as a lookup is issued before every read and it will switch the read subvolume to a correct one. But with the plan of increasing md-cache timeout to 600s, the stale read problem will be more pronounced, i.e. stale read can continue for 600s(or more if cascaded with readdirp), as there will be no lookups. Solution: Afr doesn't have any built-in solution for stale read(without affecting the performance). The solution that came up, was to use upcall. When a file on any brick is marked bad for the first time, upcall sends a notification to all the clients that had recently accessed the file. The solution has 2 parts: - Identifying when a file is marked bad, on any of the bricks, for the first time - Client side actions on recieving the notifications Identifying when a file is marked bad on any of the bricks for the first time: ----------------------------------------------------------------------------- The idea is to track xattrop in upcall. xattrop currently comes with 2 afr xattrs - afr dirty bit and afr pending xattrs. Dirty xattr is set to 1 before every write, and is unset if write succeeds. In certain scenarios, dirty xattr can be 0 and still the file could be bad copy. Hence do not track dirty xattr. Pending xattr is set on the good copy, indicating the other bricks that have bad copy. It is still not as simple as, notifying when any of the pending xattrs change. It could lead to flood of notifcations, in case the other brick is completely down or consistantly failing. Hence it is important to notify only once, the first time a good copy is marked bad. Client side actions on recieving pending xattr change, notification: -------------------------------------------------------------------- md-cache will invalidate the cache of that file, so that further lookup is passed down to afr and hence update the read subvolume. Invalidating only in md-cache is not enough, consider the folling oder of opertaions: - pending xattr invalidation - invalidate md-cache - readdirp on the bad read subvolume - fill md-cache - lookup (served from md-cache) - read - wound to the old read subvol. Hence, along with invalidating md-cache, it is very important to reset the read subvolume for that file, in afr. Design Credit: Anuradha Talur, Ravishankar N 1. xattrop doesn't carry info saying post op/pre op. 2. Pre xattrop will have 0 value for all pending xattrs, the cbk of pre xattrop carries the on-disk xattr value. Non zero indicated healing is required. 3. Post xattrop will have non zero value for any of the pending xattrs, if the fop failed on any of the bricks. Change-Id: I469cbc111714c433984fe1c922be2ef113c25804 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15398 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	cluster/ec: Implement heal info with lock	Ashish Pandey	2016-10-11	3	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently heal info command prints all the files/directories if the index for the file/directory is present in .glusterfs/indices folder. After implementing patch http://review.gluster.org/#/c/13733/ indices of the file which is going through update fop will also be present in .glusterfs/indices even if the fop is successful on all the brick. At this time if heal info command is being used, it will also display this file which is actually healthy and does not require any heal. Solution: Take lock on a file corresponding to the indices and inspect xattrs to decide if the file needs heal or not. Change-Id: I6361e2813ece369be12d02e74816df4eddb81cfa BUG: 1366815 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15543 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
*	gfapi: redesign the public interface for upcall consumers	Niels de Vos	2016-09-28	5	-80/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The glfs_callback_arg and glfs_callback_inode_arg were allocated by gfapi, and expected to be free()'d by the application. However it is not reasonable to expect that applications use the same memory allocator to as the compiled libgfapi.so. For instance, it is possible that gfapi uses glibc malloc/free, and an application like NFS-Ganesha the versions from jemalloc. Mismatching of the malloc() and free() functions causes segmentation faults at best. In order to prevent problems like this in the future, the API for applications that consume upcalls has been remodeled. Any of the structures that gfapi allocates, should be free'd with glfs_free(). The members of the structures can not be accessed directly anymore, each has its own function to access now. Correcting the naming of the functions, structures and constants is a continuation of commit 2775dc64101ed37c8d9809bf9852dbf0746ee2b6. These new improvements not only have correct prefixes for the functions and structures, the naming also reflects more to the upcall framework and does not use "callback" anymore. Change-Id: I2b8bd5a0a82036d2abea1a217f5e5975a1d4fe93 BUG: 1344714 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14701 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
*	tests: Fix races in open-behind.t	Pranith Kumar K	2016-09-27	3	-3/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problems: 1) flush-behind is on by default, so just because write completes doesn't mean it will be on the disk, it could still be in write-behind's cache. This leads to failure where if you write from one mount and expect it to be there on the other mount, sometimes it won't be there. 2) Sometimes the graph switch is not completing by the time we issue read which is leading to opens not being sent on brick leading to failures. Fixes: 1) Disable flush-behind 2) Add new functions to check the new graph is there and connected to bricks before 'cat' is executed. BUG: 1379511 Change-Id: I0faed684e0dc70cfd2258ce6fdaed655ee915ae6 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15575 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	glusterd: Revert 368e26f454fe35477e46dc698fa6b8c3c608ea8d	N Balachandran	2016-09-26	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The earlier fix prevented rebalance from starting even if only a single brick of a replica set was down. Change-Id: I8c9a7d4c1fe33f7749568357462f29796e1b832a BUG: 1376671 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15517 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	glusterd: "gluster v heal test statistics heal-count replica" output is not ↵	Mohit Agrawal	2016-09-26	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	correct Problem : "gluster v heal test statistcs heal-count replica" does not show correct output. Solution: After update condition (match brick name) in _select_hxlator_with_matching_brick, it shows correct output. BUG: 1325792 Change-Id: I60cc7c68ea70bce267a747570f91dcddbc1d9016 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15494 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	socket: pollerr event shouldn't trigger socket_connnect_finish	Atin Mukherjee	2016-09-19	2	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If connect fails with any other error than EINPROGRESS we cannot get the error status using getsockopt (... SO_ERROR ... ). Hence we need to remember the state of connect and take appropriate action in the event_handler for the same. As an added note, a event can come where poll_err is HUP and we have poll_in as well (i.e some status was written to the socket), so for such cases we need to finish the connect, process the data and then the poll_err as is the case in the current code. Special thanks to Kaushal M & Raghavendra G for figuring out the issue. Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528 BUG: 1372356 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/15440 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/ec: set/unset dirty flag for data/metadata update	Ashish Pandey	2016-09-15	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, for all the update operations, metadata or data, we set the dirty flag at the end of the operation only if a brick is down. This leads to delay in healing and in some cases not at all. In this patch we set (+1) the dirty flag at the start of the metadata or data update operations and after successfull completion of the fop, we unset (-1) it again. Change-Id: Ide5668bdec7b937a61c5c840cdc79a967598e1e9 BUG: 1316873 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13733 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	tests: Kill rpc.statd on tests in Linux as well	Nigel Babu	2016-09-14	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lack of this causes the /var/messages file on Linux test nodes to be filled up and cause space issues. Change-Id: I4c741c34de7f584859d1c62bdfda44a3d79c7ecc BUG: 1375526 Signed-off-by: Nigel Babu <nigelb@redhat.com> Reviewed-on: http://review.gluster.org/15485 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	dht: "replica.split-brain-status" attribute value is not correct	Mohit Agrawal	2016-09-12	1	-0/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a distributed-replicate volume attribute "replica.split-brain-status" value does not display split-brain condition though directory is in split-brain. If directory is in split brain on mutiple replica-pairs it does not show full list of replica pairs. Solution: Update the dht_aggregate code to aggregate the xattr value in this specific condition. Fix: 1) function getChoices returns the choices from split-brain status string. 2) function add_opt adding the choices to local buffer to store in dictionary 3) For the key "replica.split-brain-status" function dht_aggregate call dht_aggregate_split_brain_xattr to prepare the list. Test: To verify the patch followed below steps 1) Create a distributed replica volume and create mount point 2) Stop heal daemon 3) Touch file and directories on mount point mkdir test{1..5};touch tmp{1..5} 4) Down brick process on one of the replica set pkill -9 glusterfsd 5) Change permission of dir on mount point chmod 755 test{1..5} 6) Restart brick process on node with force option 7) kill brick process on other node in same replica set 8) Change permission of dir again on mount point chmod 766 test{1..5} 9) Reexecute same step from 4-9 on other replica set also 10) After check heal status on server it will show dir's are in split brain on all replica sets 11) After check the replica.split-brain-status attr on mount point it will show wrong status of split brain. 12) After apply the patch the attribute shows correct value. BUG: 1368312 Change-Id: Icdfd72005a4aa82337c342762775a3d1761bbe4a Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15201 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/dht: heal root permission post add-brick	Susant Palai	2016-09-11	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Post add-brick event the new brick will have permission of 755 by default. If the root directory permission was other than 755, that does not get healed to the new brick leading to permission errors/inconsistencies. For choosing source of attr heal we can trust the subvols which have layouts with latest ctime(as part of missing directory heal, we heal the proper attr). In case none of the subvols have layout, return ESTALE to retrigger a fresh lookup. Note: This patch heals the permission of the root directories only. Since, permission healing of directory is not straight forward and required intrusive fix, those are not addressed here. Change-Id: If894e3895d070d46b62d2452e52c1eaafcf56c29 BUG: 1368012 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/15195 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	tests: fix bug-963541.t spurious failure	Atin Mukherjee	2016-09-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	wait for remove brick to complete before attempt for a commit. Change-Id: I66ea6c48b6a69fe33d79f9d9080b6f2c1462578e BUG: 1374993 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15457 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/dht: Skip layout overlap maximization on weighted rebalance	N Balachandran	2016-09-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dht_selfheal_layout_maximize_overlap () does not consider chunk sizes while calculating overlaps. Temporarily enabling this operation if only if weighted rebalance is disabled or all bricks are the same size. Change-Id: I5ed16cdff2551b826a1759ca8338921640bfc7b3 BUG: 1366494 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15403 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>