path: root/tests/bugs
Commit message | Author | Age | Files | Lines
* self-heal: fix automatic split-brain resolution options | Jeff Darcy | 2017-09-27 | 4 | -25/+8

  Differential Revision: https://phabricator.intern.facebook.com/D5927193
  Change-Id: Ife04c8738b9ee721e7be9bc843b2f6d54bbb468e
* cluster/afr: Add additional test coverage for unsplit flows | Richard Wareing | 2017-09-11 | 4 | -0/+614

  Summary:
  - Adds test coverage for unsplitting via SHD

  Test Plan:
  - Run prove -v tests/bugs/fb2506544* (https://phabricator.fb.com/P56056659)

  Reviewers: moox, dld, dph, sshreyas
  Reviewed By: sshreyas
  Differential Revision: https://phabricator.fb.com/D2770524

  Porting note: also added fb*.t tests to test_env.

  Change-Id: Iac28b595194925a45e62b6438611c9bade58b30b Signed-off-by: Jeff Darcy <jdarcy@fb.com> Reviewed-on: https://review.gluster.org/18261 Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* clusters/afr: Move root entry heal flow to SHD | Richard Wareing | 2017-09-11 | 1 | -4/+8

  Summary:
  - Improves upon D2387001 by moving the "forced" root gfid heal to the SHDs
  - Removed code which forced NFSd/FUSE clients through the entry heal for the root GFID; this will make them spin up just as fast as prior to D2387001 (i.e. instantly)

  Porting note: mostly inapplicable in 3.8, only one non-test change survived

  Test Plan:
  - Must pass tests/bugs/fb8149516.t

  Reviewers: dph, moox, sshreyas
  Reviewed By: sshreyas
  Differential Revision: https://phabricator.fb.com/D2722239

  Change-Id: I35f5827df6ead1bb0ff886ca0adabb2add2e7163 Signed-off-by: Jeff Darcy <jdarcy@fb.com> Reviewed-on: https://review.gluster.org/18259 Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: AFR2 Discovery entry heal flow should only happen on root gfid | Richard Wareing | 2017-09-08 | 1 | -4/+13

  Summary:
  - Prevents entry self-heal flow from happening on non-root GFIDs

  Test Plan:
  - Run prove -v tests/bugs/fb8149516.t

  Reviewers: dph, moox, sshreyas
  Reviewed By: sshreyas
  Differential Revision: https://phabricator.fb.com/D2470622

  Change-Id: Id8559f2cfeb6e1e5c26dc1571854c0fbc0b59e08 Signed-off-by: Jeff Darcy <jdarcy@fb.com> Reviewed-on: https://review.gluster.org/18250 Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* Merge remote-tracking branch 'origin/release-3.8' into release-3.8-fb | Jeff Darcy | 2017-08-31 | 15 | -11/+242

  Change-Id: Ie35cd1c8c7808949ddf79b3189f1f8bf0ff70ed8
* afr: mark non sources as sinks in metadata heal | Ravishankar N | 2017-07-28 | 1 | -0/+64

  Backport of https://review.gluster.org/#/c/17717/

  Problem: In a 3-way replica, when the source brick does not have pending xattrs for the sinks, but the 2 sinks blame each other, metadata heal was not happening because we were not setting all non-sources as sinks.

  Fix: Mark all non-sources as sinks, like it is done in data and entry heal.

  Change-Id: I534978940f5087302e307fcc810a48ffe898ce08 BUG: 1471613 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17784 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup | Atin Mukherjee | 2017-07-11 | 1 | -0/+14

  Commit 086436a introduced a generation number (cleanup_gen) to ensure that the rpc layer doesn't end up cleaning up the connection object if the application layer has already destroyed it. Bumping up cleanup_gen was done only in rpc_clnt_connection_cleanup (). However the same is needed in rpc_clnt_reconnect_cleanup () too, because without it, if the object gets destroyed through the reconnect event in the application layer, the rpc layer will still end up trying to delete the object, resulting in a double free and crash. Peer probing an invalid host/IP was the basic test to catch this issue.

  Cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716: > Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46 > BUG: 1433578 > Signed-off-by: Atin Mukherjee <amukherj@redhat.com> > Reviewed-on: https://review.gluster.org/16914 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Milind Changire <mchangir@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

  Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46 BUG: 1462447 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17743 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs : Fix crash in glusterd while peer probing | Gaurav Yadav | 2017-06-19 | 1 | -0/+25

  glusterd crashes when a port is explicitly set to a range that lies beyond the short data type range, e.g. sysctl net.ipv4.ip_local_reserved_ports="49152-49156". In that case glusterd crashes while parsing the port. With this fix glusterd is able to handle port ranges between INT_MIN and INT_MAX.

  > Reviewed-on: https://review.gluster.org/17359 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Reviewed-by: Niels de Vos <ndevos@redhat.com> > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>

  Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1 BUG: 1447523 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/17505 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Fixes quota aux mount failure | Sanoj Unnikrishnan | 2017-06-19 | 8 | -8/+0

  The aux mount is created on the first limit/remove_limit/list command and remains until the volume is stopped / deleted / quota is disabled, at which point we do a lazy unmount. If the process is uncleanly terminated, the mount entry remains and we get a (Transport disconnected) error on subsequent attempts to run quota list/limit-usage/remove commands.

  Second issue: there is also a risk of an inadvertent rm -rf on /var/run/gluster causing data loss for the user. Ideally, /var/run is a temp path for application use and should not cause any data loss to persistent storage.

  Solution:
  1) Unmount the aux mount after each use.
  2) Clean any stale mount before mounting.

  One caveat with doing mount/unmount on each command is that we cannot use the same mount point for both list and limit commands. The reason is that the list command needs the mount to be accessible in the cli after the response from glusterd, so it could be unmounted by a limit command executed in parallel (had we used the same mount point). Hence we use separate mount points for list and limit commands.

  > Reviewed-on: https://review.gluster.org/16938 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > (cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

  Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0 BUG: 1449782 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com> Reviewed-on: https://review.gluster.org/17242 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* geo-rep: filter out xtime attribute during getxattr | Saravanakumar Arumugam | 2017-05-03 | 2 | -3/+4

  geo-rep gsyncd's xtime needs to be filtered irrespective of which process accesses it. This way, we can avoid (unnecessarily) syncing the xtime attribute to the slave, which may raise permission denied errors. The test case is modified to check for the xtime xattr only in the backend.

  Back port of: >Change-Id: I2390b703048d5cc747d91fa2ae884dc55de58669 >BUG: 1353952 >Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> >Reviewed-on: https://review.gluster.org/14880 >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Kotresh HR <khiremat@redhat.com> >Tested-by: Kotresh HR <khiremat@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>

  Change-Id: Ibdee6f3093648a7e0fb1e2b6be8172e604ab657f BUG: 1441574 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17045 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr: don't do a post-op on a brick if op failed | Ravishankar N | 2017-04-29 | 1 | -0/+46

  Problem: In afr-v2, self-blaming xattrs are not there by design. But if the FOP failed on a brick due to an error other than ENOTCONN (or even due to ENOTCONN, but we regained connection before postop was wound), we wind the post-op also on the failed brick, leading to setting self-blaming xattrs on that brick. This can lead to undesired results like healing of files in split-brain etc.

  Fix: If a fop failed on a brick on which pre-op was successful, do not perform post-op on it. This also produces the desired effect of not resetting the dirty xattr on the brick, which is how it should be because if the fop failed on a brick, there is no reason to clear the dirty bit which actually serves as an indication of the failure.

  > Reviewed-on: https://review.gluster.org/16976 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>

  Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4 BUG: 1443319 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17082 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Undo pending xattrs only on the up bricks | karthik-us | 2017-04-07 | 1 | -0/+89

  Problem: While doing conservative merge, even if a brick is down, it will reset the pending xattr on it. When that brick comes up, as part of the heal it will consider this brick as the source and remove the entries on the other bricks, which leads to data loss.

  Fix: Undo pending only for the bricks which are up.

  > Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303 > BUG: 1433571 > Signed-off-by: karthik-us <ksubrahm@redhat.com> > Reviewed-on: https://review.gluster.org/16913 > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit f91596e6566c605e70a31a60523d11f78a097c3c)

  Change-Id: Id20c9ce53ee59f005d977494903247e2a8024ed1 BUG: 1436231 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/16956 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Hack out rmtab based tests in FB branch | Kevin Vigor | 2017-03-09 | 2 | -0/+8

  Summary: rmtab support is brutally hacked out in FB branch, these tests are doomed to be sad.

  Test Plan: Reviewers: sshreyas Subscribers: Tasks: Blame Revision:

  Change-Id: I7bc3c061108682a12f051edcc4379721d5a216df Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: https://review.gluster.org/16884 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: AFR2 discovery should always do entry heal flow | Richard Wareing | 2017-03-06 | 1 | -0/+40

  Summary:
  - Fixes a case where, when a brick is completely wiped, the AFR2 discovery mechanism would potentially (1/R chance, where R is your replication factor) pin a NFSd or client to the wiped brick. This would in turn prevent the client from seeing the contents of the (degraded) subvolume.
  - The fix proposed in this patch is to force the entry-self-heal code path when the discovery process happens, and furthermore to force a conservative merge in the case where no brick is found to be degraded.
  - This also restores the property of our 3.4.x builds whereby bricks automagically rebuild via the SHDs without having to run any sort of "full heal". SHDs are given enough signal via this patch to figure out what they need to heal.

  Test Plan: Run "prove -v tests/bugs/fb8149516.t" Output: https://phabricator.fb.com/P19989638 Prove test showing failed run on v3.6.3-fb_10 without the patch -> https://phabricator.fb.com/P19989643

  Reviewers: dph, moox, sshreyas Reviewed By: sshreyas FB-commit-id: 3d6f171

  Change-Id: I7e0dec82c160a2981837d3f07e3aa6f6a701703f Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: https://review.gluster.org/16862 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* Hack out failing tests in FB branch | Kevin Vigor | 2017-03-06 | 2 | -0/+5

  Summary: Try to get a passing smoke test by crudely hacking out failing tests.

  Test Plan: Commit & hope for happy smoke. Reviewers: sshreyas Subscribers: Tasks: Blame Revision:

  Change-Id: I564ac50557276a839d8de3a89a5c154c751b7503 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: https://review.gluster.org/16856 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* Prevent frame-timeouts from hanging syncops | Richard Wareing | 2017-03-05 | 1 | -0/+62

  Summary: It was observed while testing the SHD threading code that, under high loads, SHD/AFR related SyncOps & SyncTasks can actually hang/deadlock as the transport-disconnected event (for frame timeouts) never gets bubbled up correctly. Various tests indicated the ping timeouts worked fine, while "frame timeouts" did not. The only difference? Ping timeouts actually disconnect the transport while frame timeouts did not. So from a high level we know this prevents deadlock, as subsequent tests showed the deadlocks no longer occurred (after this change). That said, there may be some more elegant solution. For now though, forcing a reconnect is preferable to hanging clients or deadlocking the SHD.

  Test Plan: It's fairly difficult to write a good prove test for this since it requires human eyes to observe if the SHD is deadlocked (I'm open to ideas). Here's the repro though:
  1. Create a 3x replicated cluster on a host.
  2. Set the frame-timeout low (say 2 sec)
  3. Down a brick, and write a pile of files (maybe 2000)
  4. Bring up the downed brick and let the SHD begin healing files
  5. During the heal process, kill -STOP <pid of brick> (hang) one of the bricks
  Without this patch the SHD will be deadlocked, even though the frame timed out after 2 seconds. With the patch, the plug is pulled on the transport, a disconnect is bubbled up to the syncop and the SHD resumes.

  Reviewers: dph, meyering, cjh Reviewed By: cjh Subscribers: ethanr Conflicts: rpc/rpc-lib/src/rpc-clnt.c FB-commit-id: c99357c

  Change-Id: I344079161492b195267c2d64b6eab0b441f12ded Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: https://review.gluster.org/16846 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
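  A rough shell transcription of the repro steps above, for reference only: the volume, brick and mount names are made up, and network.frame-timeout is assumed to be the "frame-timeout" option referred to in the test plan.

      # step 1: 3x replicated volume on one host (hypothetical names)
      gluster volume create repro replica 3 host1:/bricks/b1 host1:/bricks/b2 host1:/bricks/b3 force
      gluster volume start repro
      gluster volume set repro network.frame-timeout 2          # step 2: very low frame timeout
      kill -9 $(pgrep -f 'glusterfsd.*bricks/b1')                # step 3: take one brick down
      mount -t glusterfs host1:/repro /mnt/repro
      for i in $(seq 1 2000); do echo data > /mnt/repro/file.$i; done
      gluster volume start repro force                           # step 4: brick returns, SHD starts healing
      kill -STOP $(pgrep -f 'glusterfsd.*bricks/b2')             # step 5: freeze a brick mid-heal
      # Without the fix the SHD deadlocks; with it, the frame timeout forces a
      # disconnect and the SHD resumes.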
* Merge remote-tracking branch 'origin/release-3.8' into merge-3.8 | Kevin Vigor | 2017-03-05 | 1 | -0/+40
* glusterd: ignore return code of glusterd_restart_bricks | Atin Mukherjee | 2017-02-20 | 1 | -0/+40

  When GlusterD is restarted on a multi-node cluster, while syncing the global options from another GlusterD it checks for quorum and, based on that, decides whether to stop/start a brick. However, we handle the return code of this function: if we don't want to start any bricks, the ret will be non-zero and we end up failing the import, which is incorrect. The fix is simply to ignore the return code of glusterd_restart_bricks ().

  >Reviewed-on: https://review.gluster.org/16574 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> >(cherry picked from commit 55625293093d485623f3f3d98687cd1e2c594460)

  Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553 BUG: 1420993 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16594 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* Merge remote-tracking branch 'origin/release-3.8' into merge-3.8 | Kevin Vigor | 2017-02-16 | 1 | -0/+66
* afr: all children of AFR must be up to resolve s-brain | Ravishankar N | 2017-02-15 | 1 | -0/+66

  Problem: The various split-brain resolution policies (favorite-child-policy based, CLI based and mount (get/setfattr) based) attempt to resolve split-brain even when not all bricks of the replica are up. This can be a problem when, say in a replica 3, the only good copy is down and the other 2 bricks are up and blame each other (i.e. split-brain). We end up healing the file in such a case and allow I/O on it.

  Fix: A decision on whether the file is in split-brain or not must be taken only if we are able to examine the afr xattrs of *all* bricks of a given replica.

  > Reviewed-on: https://review.gluster.org/16476 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)

  Change-Id: Icddb1268b380005799990f5379ef957d84639ef9 BUG: 1420984 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16589 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Merge remote-tracking branch 'origin/release-3.8' into 3.8-merge | Kevin Vigor | 2017-01-09 | 2 | -0/+113
* posix: make sure atime and mtime are set when calling lutimes() | Niels de Vos | 2017-01-08 | 1 | -0/+31

  When overwriting an existing file with O_TRUNC, the 'atime' was set to 0, meaning the Epoch (01-Jan-1970 UTC). However, the 'mtime' gets updated correctly. In case 'atime' or 'mtime' is not passed in the 'struct iatt', the time values passed to the systemcall are taken from the current values returned by lstat().

  Cherry picked from commit 9bed81ada6f91f998e9abd915b18e3f06557cdcb: > Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3 > BUG: 1401777 > Reported-by: Eivind Sarto <eivindsarto@gmail.com> > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/16034 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>

  Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3 BUG: 1411011 Reported-by: Eivind Sarto <eivindsarto@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/16356 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
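  A quick shell check of the symptom described above (the mount path is hypothetical); the shell's '>' redirection opens the existing file with O_TRUNC, which is the case that used to zero the atime:

      echo first > /mnt/glustervol/afile
      stat -c 'atime=%x mtime=%y' /mnt/glustervol/afile
      echo second > /mnt/glustervol/afile       # overwrite via O_TRUNC
      stat -c 'atime=%x mtime=%y' /mnt/glustervol/afile
      # Before the fix the second stat could report atime as 01-Jan-1970; with the
      # fix, missing time values are taken from lstat() so both stay sane.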
* afr: allow I/O when favorite-child-policy is enabled | Ravishankar N | 2017-01-08 | 1 | -0/+82

  Problem: Currently, I/O on a split-brained file fails even when the favorite-child-policy is set, until the self-heal is complete.

  Fix: If a valid 'source' is found using the set favorite-child-policy, inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks), refresh the inode and then proceed with the read or write transaction. The resetting itself happens in the self-heal code and hence can also happen in the client-side background heal or by the shd's index-heal, in addition to the txn code path explained above. When it happens via heal, we also add checks in undo-pending to not reset the sink xattrs again.

  > Reviewed-on: http://review.gluster.org/15673 > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

  Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626 BUG: 1378547 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com> Reviewed-on: http://review.gluster.org/16091 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
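  A hedged CLI sketch of how the policy this patch builds on is typically enabled (the volume name is made up; the option and value are the standard cluster.favorite-child-policy settings):

      gluster volume set myvol cluster.favorite-child-policy mtime
      # With a policy set, a read/write on a split-brained file can pick the copy
      # chosen by the policy, reset the pending xattrs on the sinks and proceed
      # instead of failing until a full self-heal completes.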
* Fix prove test bug-1292020.t | Kevin Vigor | 2017-01-06 | 1 | -2/+5

  Summary: dd is doing a statfs and failing with ENOSPC instead of writing and getting EDQUOTA. Make either error a success in this test.

  Test Plan: prove bug-1292020.t Reviewers: Subscribers: Tasks: Blame Revision:

  Change-Id: I9f580d9e4a4dd293df55a1d954f86a9862fcae7b Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16352 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
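  An illustrative shell sketch of the "either error is fine" idea (not the exact test change; $M0 stands for the test mount point as in the .t framework):

      out=$(dd if=/dev/zero of=$M0/testfile bs=1M count=300 2>&1)
      # Accept the quota error or the space error as the expected failure.
      echo "$out" | grep -qE 'Disk quota exceeded|No space left on device' && echo PASS || echo FAIL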
* Merge remote-tracking branch 'origin/release-3.8' into merge-3.8-again | Kevin Vigor | 2017-01-05 | 9 | -49/+453

  Change-Id: I844adf2aef161a44d446f8cd9b7ebcb224ee618a Signed-off-by: Kevin Vigor <kvigor@fb.com>
* cluster/afr: Fix missing name indices due to EEXIST error | Krutika Dhananjay | 2016-12-29 | 1 | -0/+87

  Backport of: http://review.gluster.org/16286

  PROBLEM: Consider a volume with granular-entry-heal and sharding enabled. When a replica is down and a shard is created as part of a write, the name index is correctly created under indices/entry-changes/<dot-shard-gfid>. Now when a read on the same region triggers another MKNOD, the fop fails on the online bricks with EEXIST. By virtue of this being a symmetric error, the failed_subvols[] array is reset to all zeroes. Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be set, causing the name index, which was created in the previous MKNOD operation, to be wrongly deleted in THIS MKNOD operation.

  FIX: The ideal fix would have been for a transaction to delete the name index ONLY if it knows it is the one that created the index in the first place. This would involve gathering information as to whether THIS xattrop created the index from individual bricks, aggregating their responses and, based on the various possible combinations of responses, deciding whether to delete the index or not. This is rather complex. The simpler fix is for post-op to examine local->op_ret in the event of no failed_subvols to figure out whether to delete the name index or not. This can occasionally lead to creation of stale name indices, but they won't affect the IO path or mess with pending changelogs in any way, and self-heal in its crawl of the "entry-changes" directory would take care to delete such indices.

  Change-Id: Icc642a987d1b6a5097562315aecf1263ed35ceb6 BUG: 1408786 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16293 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: Fix spurious failure in tests/bugs/replicate/bug-1402730.t | Krutika Dhananjay | 2016-12-23 | 1 | -2/+2

  Backport of: http://review.gluster.org/16193

  Replace the EXPECT '00000001' with EXPECT_NOT '00000000'. This is because occasionally a name-heal is performing new-entry marking on 'c', causing the pending entry changelog on it to become '00000002'.

  Change-Id: Ib7b0d64c8de2498c2ffb3b8e06228694f2c55755 BUG: 1406740 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16224 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: Fix spurious failure in bug-1402841.t-mt-dir-scan-race.t | Krutika Dhananjay | 2016-12-21 | 1 | -0/+10

  Backport of: http://review.gluster.org/16169

  Check that shd is up before executing the 'volume heal' command.

  Change-Id: I43634a979791fcb92bfebf93ec48eff42af2bb97 BUG: 1405890 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16190 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
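  A sketch of the kind of guard this adds in a .t test; the helper and timeout names ($PROCESS_UP_TIMEOUT, glustershd_up_status) are assumed from the test framework's include.rc/volume.rc and may not match the actual patch line for line:

      EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" glustershd_up_status   # wait for SHD
      TEST $CLI volume heal $V0                                     # only then trigger heal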
* tests: Fix spurious test failure in bug-1316437.t | Rajesh Joseph | 2016-12-19 | 1 | -2/+1

  After sending SIGTERM to a gluster process we immediately check if the process exited. We should wait for some time before checking the process state.

  > Reviewed-on: http://review.gluster.org/16162 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Avra Sengupta <asengupt@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit e9d8525a0d34130ba2a582109937b8e79eecf6ab)

  BUG: 1405450 Change-Id: Iaba0067f6e880a7fe38e11b9fa0fe9bd103b19e2 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/16164 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Avra Sengupta <asengupt@redhat.com>
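  The generic pattern for avoiding the race described above (the pid variable is illustrative, not from the test):

      kill -TERM "$proc_pid"
      for i in $(seq 1 10); do
          kill -0 "$proc_pid" 2>/dev/null || break   # process has exited
          sleep 1                                    # give it time instead of checking immediately
      done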
* cluster/afr: Fix per-txn optimistic changelog initialisation | Krutika Dhananjay | 2016-12-13 | 1 | -0/+42

  Backport of: http://review.gluster.org/16075

  Incorrect initialisation of local->optimistic_change_log was leading to skipped pre-op and post-op even when a brick didn't participate in the txn because it was down. The result - missing granular name index, resulting in some entries never getting healed.

  FIX: Initialise local->optimistic_change_log just before pre-op. Also fixed granular entry heal to create the granular name index in pre-op as opposed to post-op. This is to prevent loss of granular information when, during an entry txn, the good (src) brick goes offline before the post-op is done. This would cause self-heal to do conservative merge (since the dirty xattr is the only information available), which, when granular-entry-heal is enabled, expects granular indices, the lack of which can lead to loss of data in the worst case.

  Change-Id: Ibc0fbfb3fa21c578e28868d9e30b274e33c12064 BUG: 1403646 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16105 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* syncop: fix conditional wait bug in parallel dir scan | Ravishankar N | 2016-12-11 | 1 | -0/+31

  Problem: The issue as seen by the user is detailed in the BZ, but what is happening is: if the no. of items in the wait queue == max-qlen, syncop_mt_dir_scan() does a pthread_cond_wait until the launched synctask workers dequeue the queue. But if for some reason a worker fails, the queue is never emptied, due to which further invocations of syncop_mt_dir_scan() are blocked forever.

  Fix: Made some changes to _dir_scan_job_fn:
  - If a worker encounters an error while processing an entry, notify the readdir loop in syncop_mt_dir_scan() of the error but continue to process other entries in the queue, decrementing the qlen as and when we dequeue elements, and ending only when the queue is empty.
  - If the readdir loop in syncop_mt_dir_scan() gets an error from the worker, stop the readdir+queueing of further entries.

  > Reviewed-on: http://review.gluster.org/16073 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit 2d012c4558046afd6adb3992ff88f937c5f835e4)

  Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88 BUG: 1403192 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16096 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* uss: snapd should enable SSL if SSL is enabled on volume | Rajesh Joseph | 2016-12-11 | 1 | -0/+98

  During snapd graph generation we should check whether SSL is enabled on the main volume or not. This is because clients will communicate with snapd as if they were communicating with a brick.

  > Reviewed-on: http://review.gluster.org/15979 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Kaushal M <kaushal@redhat.com> (cherry picked from commit 182f0d12040dab5081ca645a3f370f65cd68b528)

  Change-Id: I0d7fe86c567b297a8528a48faf06161d4c3cb415 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> BUG: 1400459 Reviewed-on: http://review.gluster.org/15986 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* libglusterfs: Fix a read hang | Poornima G | 2016-11-28 | 2 | -0/+155

  Issue: In certain cases, there was no unwind of read from the read-ahead xlator, thus resulting in a hang.

  RCA: In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache).

  Consider the following graph:

      ...
      io-cache (parent)
          |
      readdir-ahead
          |
      read-ahead
      ...

  Below is the code snippet of ioc_readv calling STACK_WIND_TAIL:

      ioc_readv()
      {
          ...
          if (!inode_ctx)
              STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
                               FIRST_CHILD (frame->this)->fops->readv,
                               fd, size, offset, flags, xdata);
              /* Ideally, this stack_wind should wind to readdir-ahead:readv()
                 but it winds to read-ahead:readv(). See below for explanation. */
          ...
      }

      STACK_WIND_TAIL (frame, obj, fn, ...)
      {
          frame->this = obj;
          /* for the above mentioned graph, frame->this will be readdir-ahead
             frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which
             is as expected */
          ...
          THIS = obj;
          /* THIS will be read-ahead instead of readdir-ahead!, as obj expands
             to "FIRST_CHILD (frame->this)" and frame->this was pointing
             to readdir-ahead in the previous statement. */
          ...
          fn (frame, obj, params);
          /* fn will call read-ahead:readv() instead of readdir-ahead:readv()!
             as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
             frame->this was pointing to readdir-ahead in the first statement */
          ...
      }

  Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this particular case, when 'frame->this' and 'this' passed to ra_readv() don't match, it causes ra_readv() to call ra_readv() again! Thus the logic of read-ahead readv() falls apart and leads to a hang.

  Solution: Modify STACK_WIND_TAIL() as:

      STACK_WIND_TAIL (frame, obj, fn, ...)
      {
          next_xl = obj       /* resolve obj, as the variables passed in the obj macro
                                 can be overwritten in the further instructions */
          next_xl_fn = fn     /* resolve fn and store it in a tmp variable, before
                                 modifying any variables */
          frame->this = next_xl;
          ...
          THIS = next_xl;
          ...
          next_xl_fn (frame, next_xl, params);
          ...
      }

  BUG: 1399018 Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15934 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: clean up old port and allocate new one on every restart | Atin Mukherjee | 2016-11-23 | 1 | -47/+0

  Backport of http://review.gluster.org/#/c/15005/9.

  GlusterD as of now was blindly assuming that the brick port which was already allocated would be available to be reused, and that assumption is absolutely wrong.

  Solution: On first attempt, we thought GlusterD should check if the already allocated brick ports are free, and if not allocate a new port and pass it to the daemon. But with that approach there is a possibility that if PMAP_SIGNOUT is missed out, the stale port will be given back to the clients where connection will keep on failing. Now given the port allocation always starts from base_port, even if every time a new port has to be allocated for the daemons, the port range will still be under control. So this fix tries to clean up the old port using pmap_registry_remove () if any, and then goes for pmap_registry_alloc ().

  This patch is being ported to the 3.8 branch because the brick process blindly re-using the old port, without registering with the pmap server, causes the snapd daemon to not start properly, even though snapd registers with the pmap server. With this patch, all the brick processes and snapd will register with the pmap server to either get the same port or a new port, and avoid port collision.

  > Reviewed-on: http://review.gluster.org/15005 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Avra Sengupta <asengupt@redhat.com> (cherry picked from commit c3dee6d35326c6495591eb5bbf7f52f64031e2c4)

  Change-Id: If54a055d01ab0cbc06589dc1191d8fc52eb2c84f BUG: 1369766 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15308 Tested-by: Avra Sengupta <asengupt@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* marker: Fix inode value in loc, in setxattr fop | Poornima G | 2016-11-21 | 1 | -0/+29

  Backport of http://review.gluster.org/15826

  On receiving a rename fop, marker_rename() stores the oldloc and newloc in its 'local' struct. Once the rename is done, the xtime marker (last updated time) is set on the file by sending a setxattr fop. When upcall receives the setxattr fop, the loc->inode is NULL and it crashes. The loc->inode can be NULL only in one valid case, i.e. the rename case, where the inode of the new loc can be NULL. Hence, marker should have filled the inode of the new_loc before issuing a setxattr.

  > Reviewed-on: http://review.gluster.org/15826 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Kotresh HR <khiremat@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> (cherry picked from commit 46e5466850311ee69e6ae9a11c2bba2aabadd5de)

  Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977 BUG: 1396418 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15878 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Bug fixes to cluster.min-free-disk | Richard Wareing | 2016-12-20 | 2 | -0/+3

  Summary:
  - Enforces FUSE/gNFSd/SHD/rebalance rejection of writes when all subvolumes are beyond the value set in "cluster.min-free-disk"
  - Fixes existing code paths to be more intuitive & straightforward
  - Write path now honors min-free-disk
  - Adds test to ensure feature doesn't break in future
  - This is a port of D2981282 to 3.8

  Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I76923bf76178fe589aa1a26bd1970cf8d009642a Reviewed-on: http://review.gluster.org/16153 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Tested-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* storage/posix: Add free space limits to bricks | Kevin Vigor | 2016-12-19 | 2 | -15/+15

  Summary:
  - Add a configurable minimum free space for bricks, using the new options storage.min-free-disk (analogous to cluster.min-free-disk, and using the same units: either a percentage or an absolute number of bytes) and storage.freespace-check-interval (how frequently to check free space, in seconds).
  - This is a cherry-pick of D2920210 to 3.8

  Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I4b87e421aad023e49b5972c6e61539670a818411 Reviewed-on: http://review.gluster.org/16176 Tested-by: Shreyas Siravara <sshreyas@fb.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kevin Vigor <kvigor@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* tests: Fix tests/bugs/distribute/bug-1161311.t | Shreyas Siravara | 2016-12-14 | 1 | -2/+8

  Summary:
  - http://review.gluster.org/#/c/16078/ made rebalance faster and broke the test.
  - We made the file bigger so rebalance takes longer.

  Change-Id: I86f08d3d53bbff8373e954b8ae57a3a9a5942b74 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: http://review.gluster.org/16133 Reviewed-by: Kevin Vigor <kvigor@fb.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* socket: pollerr event shouldn't trigger socket_connnect_finish | Atin Mukherjee | 2016-09-30 | 2 | -7/+5

  If connect fails with any error other than EINPROGRESS, we cannot get the error status using getsockopt (... SO_ERROR ... ). Hence we need to remember the state of connect and take appropriate action in the event_handler for the same. As an added note, an event can come where poll_err is HUP and we have poll_in as well (i.e. some status was written to the socket), so for such cases we need to finish the connect, process the data and then the poll_err, as is the case in the current code. Special thanks to Kaushal M & Raghavendra G for figuring out the issue.

  >Signed-off-by: Shyam <srangana@redhat.com> >Reviewed-on: http://review.gluster.org/15440 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

  Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528 BUG: 1373723 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15532 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht: "replica.split-brain-status" attribute value is not correctMohit Agrawal2016-09-261-0/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In a distributed-replicate volume attribute "replica.split-brain-status" value does not display split-brain condition though directory is in split-brain. If directory is in split brain on mutiple replica-pairs it does not show full list of replica pairs. Solution: Update the dht_aggregate code to aggregate the xattr value in this specific condition. Fix: 1) function getChoices returns the choices from split-brain status string. 2) function add_opt adding the choices to local buffer to store in dictionary 3) For the key "replica.split-brain-status" function dht_aggregate call dht_aggregate_split_brain_xattr to prepare the list. Test: To verify the patch followed below steps 1) Create a distributed replica volume and create mount point 2) Stop heal daemon 3) Touch file and directories on mount point mkdir test{1..5};touch tmp{1..5} 4) Down brick process on one of the replica set pkill -9 glusterfsd 5) Change permission of dir on mount point chmod 755 test{1..5} 6) Restart brick process on node with force option 7) kill brick process on other node in same replica set 8) Change permission of dir again on mount point chmod 766 test{1..5} 9) Reexecute same step from 4-9 on other replica set also 10) After check heal status on server it will show dir's are in split brain on all replica sets 11) After check the replica.split-brain-status attr on mount point it will show wrong status of split brain. 12) After apply the patch the attribute shows correct value. > Change-Id: Icdfd72005a4aa82337c342762775a3d1761bbe4a > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/15201 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > (cherry picked from commit c4e9ec653c946002ab6d4c71ee8e6df056438a04) Change-Id: I85a5ae60189066d9e80799f00f1352c2f33ef4f8 Backport of commit c4e9ec653c946002ab6d4c71ee8e6df056438a04 BUG: 1375098 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15467 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
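  For reference, the virtual xattr checked in steps 11/12 is typically read from the mount point like this (directory name taken from the steps above; exact invocation in the test may differ):

      getfattr -n replica.split-brain-status /mnt/test1
      # After the patch, the aggregated value lists every replica pair that is
      # actually in split-brain instead of a partial or incorrect status.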
* tests: Enable all gfapi test cases | Rajesh Joseph | 2016-09-21 | 15 | -41/+45

  > Change-Id: I32bfec4af91348d96dc3e81a9d5c9cad599f821b > Bug: 1358594 > Signed-off-by: Poornima G <pgurusid@redhat.com> > Reviewed-on: http://review.gluster.org/14748 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra Talur <rtalur@redhat.com>

  Bug: 1375990 Change-Id: I87f6c7d20959e2d4bbe8c064767a9fed004e8c4a Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/15499 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* tests: fix bug-963541.t spurious failure | Atin Mukherjee | 2016-09-15 | 1 | -1/+2

  Wait for remove-brick to complete before attempting a commit.

  >Reviewed-on: http://review.gluster.org/15457 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Vijay Bellur <vbellur@redhat.com>

  Change-Id: I66ea6c48b6a69fe33d79f9d9080b6f2c1462578e BUG: 1375043 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15459 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
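  An illustrative wait loop for that pattern (not the exact test change; $V0/$H0/$B0 stand for the test framework's volume, host and brick-path variables):

      # Poll remove-brick status until the migration reports completed, then commit.
      until gluster volume remove-brick $V0 $H0:$B0/brick1 status | grep -q completed; do
          sleep 1
      done
      gluster volume remove-brick $V0 $H0:$B0/brick1 commit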
* cluster/dht: heal root permission post add-brick | Susant Palai | 2016-09-13 | 1 | -0/+50

  Post add-brick event the new brick will have permission 755 by default. If the root directory permission was other than 755, that does not get healed to the new brick, leading to permission errors/inconsistencies.

  For choosing the source of the attr heal we can trust the subvols which have layouts with the latest ctime (as part of missing directory heal, we heal the proper attr). In case none of the subvols have a layout, return ESTALE to retrigger a fresh lookup.

  Note: This patch heals the permission of the root directories only. Since permission healing of directories is not straightforward and requires an intrusive fix, those are not addressed here.

  > Reviewed-on: http://review.gluster.org/15195 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 801cd07a4c6ec65ff930b2ae6bb5e405ccd03334)

  Change-Id: If894e3895d070d46b62d2452e52c1eaafcf56c29 BUG: 1374573 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/15465 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Skip layout overlap maximization on weighted rebalance | N Balachandran | 2016-09-13 | 1 | -0/+1

  During a fix-layout, dht_selfheal_layout_maximize_overlap () does not consider chunk sizes while calculating layout overlaps, causing smaller bricks to sometimes get larger ranges than larger bricks. Temporarily enabling this operation only if weighted rebalance is disabled or all bricks are the same size.

  > Change-Id: I5ed16cdff2551b826a1759ca8338921640bfc7b3 > BUG: 1366494 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: http://review.gluster.org/15403 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> (cherry picked from commit b93692cce603006d9cb6750e08183bca742792ac)

  Change-Id: Icf0dd83f36912e721982bcf818a06c4b339dc974 BUG: 1374135 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15422 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* tests/cli: Generate SSL certificates | Ashish Pandey | 2016-09-07 | 1 | -0/+42

  Generate SSL certificates before enabling management encryption to avoid test failure.

  This patch is a backport of the following two master patches:
  http://review.gluster.org/#/c/13959/ - bug-1320388.t was first introduced in this patch
  http://review.gluster.org/#/c/15202/ - Modified bug-1320388.t to create an SSL certificate

  >Change-Id: Iab23b36703f4653f1d5bb9d14695e4d3fa63ad61 >Signed-off-by: Ashish Pandey <aspandey@redhat.com> >BUG: 1368349 >Signed-off-by: Ashish Pandey <aspandey@redhat.com> >Reviewed-on: http://review.gluster.org/15202 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >Signed-off-by: Ashish Pandey <aspandey@redhat.com>

  Change-Id: Iab23b36703f4653f1d5bb9d14695e4d3fa63ad61 BUG: 1368918 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15228 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
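  A sketch of the usual way such a self-signed certificate pair is generated for GlusterFS management encryption; the file locations follow the conventional GlusterFS SSL layout and the exact commands in bug-1320388.t may differ:

      openssl genrsa -out /etc/ssl/glusterfs.key 2048
      openssl req -new -x509 -key /etc/ssl/glusterfs.key \
              -subj "/CN=$(hostname)" -out /etc/ssl/glusterfs.pem
      cp /etc/ssl/glusterfs.pem /etc/ssl/glusterfs.ca   # single-node CA = own cert
      touch /var/lib/glusterd/secure-access             # turn on management encryption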
* cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order | Krutika Dhananjay | 2016-08-22 | 1 | -0/+112

  Backport of: http://review.gluster.org/15080

  When the bricks are brought offline and then online in cyclic order while writes are in progress on a file, thanks to inode refresh in write txns, AFR will mostly fail the write attempt when the only good copy is offline. However, there is still a remote possibility that the file will run into split-brain if the brick that has the lone good copy goes offline *after* the inode refresh but *before* the write txn completes (I call it in-flight split-brain in the patch for ease of reference), requiring intervention from admin to resolve the split-brain before the IO can resume normally on the file. To get around this, the patch does the following things:
  i) retains the dirty xattrs on the file
  ii) avoids marking the last of the good copies as bad (or accused) in case it is the one to go down during the course of a write.
  iii) fails that particular write with the appropriate errno.
  This way, we still have one good copy left despite the split-brain situation, which, when it is back online, will be chosen as source to do the heal.

  > Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a > BUG: 1363721 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > Reviewed-on: http://review.gluster.org/15080 > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit fcb5b70b1099d0379b40c81f35750df8bb9545a5)

  Change-Id: I157f1025aebd6624fa3d412abc69a4ae6f2fe9e0 BUG: 1367272 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-on: http://review.gluster.org/15221 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Convert volume to replica after adding brick self heal is not triggered | Mohit Agrawal | 2016-08-18 | 1 | -0/+54

  Problem: After adding a brick to a distribute volume to convert it to a replica, self-heal is not triggered.

  Solution: Modify the condition in brick_graph_add_index to set the trusted.afr.dirty attribute in the xlator.

  Test: To verify the patch, follow the steps below:
  1) Create a single node volume: gluster volume create <DIS> <IP:/dist1/brick1>
  2) Start the volume and create a mount point: mount -t glusterfs <IP>:/DIS /mnt
  3) Touch some file and write some data on the file
  4) Add another brick along with replica 2: gluster volume add-brick DIS replica 2 <IP>:/dist2/brick2
  5) Before applying the patch, the file size is 0 bytes on the mount point.

  Backport of commit 87bb8d0400d4ed18dd3954b1d9e5ca6ee0fb9742 BUG: 1366440 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

  > Change-Id: Ief0ccbf98ea21b53d0e27edef177db6cabb3397f > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > Reviewed-on: http://review.gluster.org/15118 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Reviewed-by: Anuradha Talur <atalur@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > (cherry picked from commit 87bb8d0400d4ed18dd3954b1d9e5ca6ee0fb9742)

  Change-Id: Icd104cf5a2152a9c606dac209746e2953c4d293e Reviewed-on: http://review.gluster.org/15151 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Fix volume restart issue upon glusterd restart | Samikshan Bairagya | 2016-08-18 | 2 | -1/+41

  http://review.gluster.org/#/c/14758/ introduces a check in glusterd_restart_bricks that makes sure that if server quorum is enabled and if the glusterd instance has been restarted, the bricks do not get started. This prevents bricks which have been brought down purposely, say for maintenance, from getting started upon a glusterd restart. However this change introduced a regression for a situation that involves multiple volumes. The bricks from the first volume get started, but then for the subsequent volumes the bricks do not get started. This patch fixes that by setting the value of conf->restart_done to _gf_true only after bricks are started correctly for all volumes.

  > Reviewed-on: http://review.gluster.org/15183 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit dd8d93f24a320805f1f67760b2d3266555acf674)

  Change-Id: I2c685b43207df2a583ca890ec54dcccf109d22c3 BUG: 1366813 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/15186 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* posix: Do not move and recreate .glusterfs/unlink directory | Ashish Pandey | 2016-08-10 | 1 | -0/+36

  Problem: At the time of starting a volume, it is checked whether .glusterfs/unlink exists or not. If it does, it is moved to landfill and the unlink directory is recreated. If a volume is mounted and we write data on it till we face ENOSPC, a restart of that volume fails as it will not be able to create the unlink dir: mkdir will fail with ENOSPC. This will not allow the volume to restart.

  Solution: If the .glusterfs/unlink directory exists, don't move it to landfill. Delete all the entries inside it.

  master - http://review.gluster.org/#/c/15030/

  Change-Id: Icde3fb36012f2f01aeb119a2da042f761203c11f BUG: 1364365 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15093 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd : skip non directories inside /var/lib/glusterd/vols | Jiffin Tony Thottan | 2016-08-09 | 1 | -0/+31

  Right now glusterd won't come up if the vols directory contains an invalid entry. Instead of doing that, with this change a message will be logged and the entry skipped.

  Backport details: >Change-Id: I665b5c35291b059cf054622da0eec4db44ec5f68 >BUG: 1318591 >Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> >Reviewed-on: http://review.gluster.org/13764 >Reviewed-by: Prashanth Pai <ppai@redhat.com> >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> (cherry picked from commit 720b63c24b07ee64e1338db28de602b9abbef0a1)

  Change-Id: I665b5c35291b059cf054622da0eec4db44ec5f68 BUG: 1365265 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15113 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>