summaryrefslogtreecommitdiffstats
path: root/tests/basic/afr
Commit message (Collapse)AuthorAgeFilesLines
* tests: Fix split-brain-favorite-child-policy.t failuresRavishankar N2017-01-051-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In CentOS-7, the file was receving an extra removexattr(security.ima) FOP which changed its ctime, breaking the assumption that a particular brick had the latest ctime based on the writevs done in the .t Fix: 1. Compare the ctime of both files in the backend and pick the one with the latest ctime for the fav-child policy. Also unmount the volume before comparing, to avoid any further FOPS on the file that can possibly modify the timestamps. 2. Added floating point handling in stat function. Thanks to Pranith for the helping debugging the regex. > Reviewed-on: http://review.gluster.org/16288 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit 76fff8cb2a164b596ca67e65c99623f5b68361fd) Change-Id: I06041a0f39a29d2593b867af8685d65c7cd99150 BUG: 1410073 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16324 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glfsheal: Explicitly enable self-heal xlator optionsRavishankar N2016-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | Enable data, metadata and entry self-heal as xlator-options so that glfs-heal.c can heal split-brain files even if they are disabled on the volume via volume set commands. > Reviewed-on: http://review.gluster.org/11333 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 209c2d447be874047cb98d86492b03fa807d1832) Change-Id: Ic191a1017131db1ded94d97c932079d7bfd79457 BUG: 1405130 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16144 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: CLI for granular entry heal enablement/disablementKrutika Dhananjay2016-11-296-7/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/15747 When there are already existing non-granular indices created that are yet to be healed, if granular-entry-heal option is toggled from 'off' to 'on', AFR self-heal whenever it kicks in, will try to look for granular indices in 'entry-changes'. Because of the absence of name indices, granular entry healing logic will fail to heal these directories, and worse yet unset pending extended attributes with the assumption that are no entries that need heal. To get around this, a new CLI is introduced which will invoke glfsheal program to figure whether at the time an attempt is made to enable granular entry heal, there are pending heals on the volume OR there are one or more bricks that are down. If either of them is true, the command will be failed with the appropriate error. New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable} Change-Id: I342e0390f847fcb015a50ef58aedfcbcb58f4ed3 BUG: 1398501 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15942 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/index: Delete granular entry indices of already healed directories ↵Krutika Dhananjay2016-11-261-0/+76
| | | | | | | | | | | | | | | | | | | | | | | | during crawl Backport of: http://review.gluster.org/15880 If granular name indices are already in existence for a volume, and before they are healed, granular entry heal be disabled, a crawl on indices/xattrop will clear the changelogs on these directories. When their corresponding entry-changes indices are crawled subsequently, if it is found that the directories don't need heal anymore, the granular indices are not cleaned up. This patch fixes that problem by ensuring that the zero-xattrop also deletes the stale indices at the level of index translator. Change-Id: Iae0a560c1c9d37b083cad89f16d3dcf83c4f7dc7 BUG: 1398501 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15927 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr,ec: Heal device files with correct major, minor numbersPranith Kumar K2016-10-261-10/+18
| | | | | | | | | | | | | | | | | | | | | | | Thanks a lot to xiaoping.wu@nokia.com from Nokia for the bug and the fix. >BUG: 1384297 >Change-Id: Ie443237e85d34633b5dd30f85eaa2ac34e45754c >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15728 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Change-Id: I7646adc3771ff76cdf9c979b575bbcd0b3bc1b9a BUG: 1388948 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15735 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* tests: Fix spurious failures with split-brain-favorite-child-policy.tPranith Kumar K2016-07-281-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: It is not guranteed that the self-heal daemon would apply the new option as soon as volume set is executed because all the command gurantees is that the process is notified of the change in volfile. Shd still needs to fetch volfile and reconfigure. If the next volume heal command comes even before the reconfigure happens, then the heal won't happen. Fix: Restart shd to make sure it has the option loaded with new value. >BUG: 1358976 >Change-Id: I3ed30ebbec17bd06caa632e79e9412564f431b19 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14978 >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Tested-by: Jeff Darcy <jdarcy@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> BUG: 1360573 Change-Id: I09e097dbdc2cae659ad1617d336945eb804b09a5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15022 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* afr: some coverity fixesRavishankar N2016-07-282-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: This is a backport of http://review.gluster.org/14895. It contains: i) fixes that prevent deadlocks (afr-common.c). ii) fixes over-writing op-errno=ENOMEM with possible other values (afr-inode-read.c). iii) prevents doing further operations with a NULL dictionary if allocation fails (afr-self-heal-data.c). iv) prevents falsely marking a sink as healed if metadata heal fails midway(afr-self-heal-metadata.c). v) other minor fixes. Considering the above are not trivial fixes, the patch is a good candidate for merging in 3.8 branch. Thanks to Krutika for a cleaner way to track inode refs in afr_set_split_brain_choice(). Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df BUG: 1360556 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15018 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr, index: Clean up stale directory and file indices in granular entry shKrutika Dhananjay2016-07-151-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/14832 Specifically when a directory tree is removed (rm -rf) while a brick is down, both the directory index and the name indices of the files and subdirs under it will remain. Self-heal will need to pick up these and remove them. Towards this, afr sh will now also crawl indices/entry-changes and call an rmdir on the dir if the directory index is stale. On the brick side, rmdir fop has been implemented for index xl, which would delete the directory index and its contents if present in a synctask. Change-Id: I08f45201adca56737ec2be1aab5433aebaefefd0 BUG: 1355609 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14920 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* features/index: Delete parent dir indices when heal on it is completeKrutika Dhananjay2016-07-152-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/14781 In this patch, the state information about whether a directory gfid index is present or not is stored in the inode ctx with values IN and NOTIN. This saves index xl the need to perform stat() everytime an index_entry_create() is called. When a brick is restarted these in-memory inode ctx records will be gone. So when granular entry heal happens after a brick is restarted, and a post-op is done on the parent, if the state gotten from inode ctx is UNKNOWN, then index xl does a stat to initialize the state as IN or NOTIN. Note that this is a one-time operation for the lifetime of the brick. Such a change also helps avoid calling index_del() in xattrop_index_action() periodically even when granular self-heal is disabled or when the volume type is disperse. Change-Id: I037d0a8936381fbe3105e2e78489bfa571e5bdb0 BUG: 1355609 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14896 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cli: fix crash in arbiter keyword parsingRavishankar N2016-06-201-0/+25
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/14738/ A negative case like `gluster volume create volname arbiter 3 /bricks{1..3}` must not crash. 'arbiter' keyword is valid only for (3 way) replica volumes. The .t that is added will crash and create a core *without* the fix when run but will still pass all TESTs. Since the regression framework fails the .t if it creates a core, we can consider it a valid test 'that fails without the fix'. Change-Id: Ie2d7ced66025ea3617d30f6f823b22401e6d2fde BUG: 1348055 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit b5c492dfea2d2e2075aa88d7153fba57b06e739d) Reviewed-on: http://review.gluster.org/14764 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Add a test for conservative merge with granular eshKrutika Dhananjay2016-06-101-0/+139
| | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/14601 Change-Id: Ic289b72443cbb4dcf1c289ac28622b49cb1153c3 BUG: 1340991 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14602 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* tests: Add more tests for granular entry self-heal featureKrutika Dhananjay2016-06-011-0/+169
| | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/14542/ Change-Id: I62d383afdb2ee1dfde1e2646bdfd951ed9d5a75d BUG: 1340991 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14560 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* afr: Automagic unsplit-brain by [ctime|mtime|size|majority]Ravishankar N2016-05-271-0/+175
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/14026/ Introduce cluster.favorite-child-policy which when enabled with [ctime|mtime|size|majority], automatically heals files that are in split-brian. The majority policy will not pick a source if there is no majority. The other three policies pick the first brick with a valid reply and non-zero ctime/mtime/size as source. Change-Id: I93623a914dce2839957fce87b514050e9d274d4c BUG: 1339639 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/14535 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cli/glusterd: add/remove brick fixes for arbiter volumesRavishankar N2016-05-242-0/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/14126 1.Provide a command to convert replica 2 volumes to arbiter volumes. Existing self-heal logic will automatically heal the file hierarchy into the arbiter brick, the progress of which can be monitored using the heal info command. Syntax: gluster volume add-brick <VOLNAME> replica 3 arbiter 1 <HOST:arbiter-brick-path> 2. Add checks when removing bricks from arbiter volumes: - When converting from arbiter to replica 2 volume, allow only arbiter brick to be removed. - When converting from arbiter to plain distribute volume, allow only if arbiter is one of the bricks that is removed. 3. Some clean-up: - Use GD_MSG_DICT_GET_SUCCESS instead of GD_MSG_DICT_GET_FAILED to log messages that are not failures. - Remove unused variable `brick_list` - Move 'brickinfo->group' related functions to glusted-utils. Change-Id: Ifa75d137c67ffddde7dcb8e0df0873163e713119 BUG: 1337387 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/14502 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Add afr/tarissue.t to bad testsRavishankar N2016-05-231-0/+3
| | | | | | | | | | | | | | | | Backport of:http://review.gluster.org/#/c/14446/ Likey a tar binary bug and nothing to do with gluster but adding to bad tests for now. Change-Id: I5cc419f555fef98de555aabb16033f8fe7dc87d0 BUG: 1337795 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/14447 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr: Don't let NFS cache stat after writesPranith Kumar K2016-05-141-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Afr does post-ops after write but the stat buffer it unwinds is at the time of write, so if nfs client caches this, it will see different ctime when it does stat on it after post-op is done. From NFS client's perspective it thinks the file is changed. Tar which depends on this to be correct keeps giving 'file changed as we read it' warning. If Afr instead has to choose to unwind after post-op, eager-lock, delayed-post-op will have to be disabled which will lead to bad performance for all write usecases. Fix: Don't let client cache stat after write. >Change-Id: Ic6062acc6e5cdd97a9c83c56bd529ec83cee8a23 >BUG: 1302948 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Signed-off-by: Anuradha Talur <atalur@redhat.com> >Reviewed-on: http://review.gluster.org/13785 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Niels de Vos <ndevos@redhat.com> BUG: 1335285 Change-Id: Ibef4fc80496d12acd15db57713af2e3a1c9109a7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14300 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr: Do heals with shd pidPranith Kumar K2016-05-141-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | Multi-threaded healing doesn't create synctask with shd pid, this leads to healing problems when quota exceeds. >BUG: 1332994 >Change-Id: I80f57c1923756f3298730b8820498127024e1209 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14211 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> BUG: 1335283 Change-Id: If59d8f88d8f4a3ca6a3b6e1c9dfd594dd93f542b Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14298 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* tests: Add test cases for add/replace brick with granular entry shKrutika Dhananjay2016-05-012-0/+155
| | | | | | | | | | | | | | Most of the tests borrowed from Anuradha's original replace-brick and add-brick tests under tests/basic/afr/. Change-Id: I874c04a6af3223e07aa6099b818ff502b6ba2a15 BUG: 1269461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14130 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not fsync when durability is offPranith Kumar K2016-04-271-0/+44
| | | | | | | | | | | | BUG: 1329501 Change-Id: Id402c20f2fa19b22bc402295e03e7a0ea96b0c40 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14048 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: default value of nfs.disable, change from false to trueKaleb S KEITHLEY2016-04-271-0/+1
| | | | | | | | | | | | | | | | | | Next step in eventual deprecation of glusterfs nfs server in favor of ganesha.nfsd. Also replace several open-coded strings with constant. Change-Id: If52f5e880191a14fd38e69b70a32b0300dd93a50 BUG: 1092414 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13738 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr: Fix inode-leak in data self-healPranith Kumar K2016-04-241-5/+5
| | | | | | | | | | | | | | | Thanks to Olia-Kremmyda for finding the bug on github review, https://github.com/gluster/glusterfs/commit/b8106d1127f034ffa88b5dd322c23a10e023b9b6 Change-Id: Ib8640ed0c331a635971d5d12052f0959c24f76a2 BUG: 1329773 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14052 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Fix spurious entries in heal infoPranith Kumar K2016-04-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Locking schemes in afr-v1 were locking the directory/file completely during self-heal. Newer schemes of locking don't require Full directory, file locking. But afr-v2 still has compatibility code to work-well with older clients, where in entry-self-heal it takes a lock on a special 256 character name which can't be created on the fs. Similarly for data self-heal there used to be a lock on (LLONG_MAX-2, 1). Old locking scheme requires heal info to take sh-domain locks before examining heal-state. If it doesn't take sh-domain locks, then there is a possibility of heal-info hanging till self-heal completes because of compatibility locks. But the problem with heal-info taking sh-domain locks is that if two heal-info or shd, heal-info try to inspect heal state in parallel using trylocks on sh-domain, there is a possibility that both of them assuming a heal is in progress. This was leading to spurious entries being shown in heal-info. Fix: As long as there is afr-v1 way of locking, we can't fix this problem with simple solutions. If we know that the cluster is running newer versions of locking schemes, in those cases we can give accurate information in heal-info. So introduce a new option called 'locking-scheme' which if it is 'granular' will give correct information in heal-info. Not only that, Extra network hops for taking compatibility locks, sh-domain locks in heal info will not be necessary anymore. Thus it improves performance. BUG: 1322850 Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13873 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Fix witness counting code in src/sink detectionPranith Kumar K2016-04-111-2/+45
| | | | | | | | | | | | | | | | | | | | | | | Problem: In afr-v1 pre-op, xattrop increments self xattr first then it increments the value on rest. In post-op, xattr value is decreased first on rest and at last it gets decremented on self. So for a possible operation to be witnessed i.e. a fop is seen by the brick it is important to have at least 1 pending op because without completing pre-op fop won't come. The other possibility is when fop completes but at the time of post-op after decrementing pending counts on others just before decrementing its own pending count, the brick dies. Fix: Fix witness detection code in afr_self_heal_find_direction() BUG: 1322253 Change-Id: Ia7e76482c0a46e775e269bb96ec1b9490a3ac18f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13811 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* tests: Fix typo in split-brain-healing.tRavishankar N2016-04-071-4/+4
| | | | | | | | | | | | Change-Id: Ie4554a13fd60d2b14518cc54e8c464f898970030 BUG: 1321322 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13875 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* afr: add mtime based split-brain resolution to CLIRavishankar N2016-03-291-0/+43
| | | | | | | | | | | | | | | | | | | Extended the CLI to include support for split-brain resolution based on mtime. The command syntax is: $:gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> where <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file. Change-Id: I7a16f72ff1a4495aa69f43f22758a9404e958b4f BUG: 1321322 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13828 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* afr : Enable auto heal when replica count increasesAnuradha Talur2016-03-211-0/+67
| | | | | | | | | | | | | | | | | | | | | | | This patch is part two change to prevent data loss in a replicate volume on doing a add-brick operation. Problem: After doing add-brick, there is a chance that self heal might happen from the newly added brick rather than the source brick, leading to data loss. Solution: Mark pending changelogs on afr children for the new afr-child so that heal is performed in the correct direction. Change-Id: I11871e55eef3593aec874f92214a2d97da229b17 BUG: 1276203 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12454 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Enhance the test to be more robustPranith Kumar K2016-03-171-5/+18
| | | | | | | | | | | | | | | | | | In some cases of dht, there is code path (dht_lookup_directory) where it sets gfid-req before lookup. This leads to successful setting of gfid when there are only two subvolumes in distribute. So increased number of replica subvolumes. Also increased number of directories. Change-Id: I17092ce6dc69c7fed6e6b380eb0fc0040f19c06a BUG: 1312816 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13754 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Ravishankar N <ravishankar@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* tests: Self-heald.t spurious failure fixPranith Kumar K2016-03-162-2/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: There is no guarantee that the indices are created by the time write is complete because write-behind may not flush the buffers. Fix: Disable flush-behind so that by the time 'echo abc > file' completes, indices are created. Also removed split-brain-healing.t from spurious failures as we are not able to recreate it. BUG: 1306897 Change-Id: I5c9c735430f1736747c8d7396d2cbf487533f4b5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13434 Reviewed-by: Anuradha Talur <atalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* afr: Add more checks to check bricks being upRavishankar N2016-03-141-0/+8
| | | | | | | | | | | Change-Id: I27f1a1c2f28d129ef7fafc676a8d3d6b82bcf2e4 BUG: 1316462 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13667 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* tests: Use force_umount instead of umountPranith Kumar K2016-03-111-3/+3
| | | | | | | | | | | | | | | umount leads to spurious failures with "mount is busy" kind of errors at the time of umount. Use 'force_umount' instead. BUG: 1310171 Change-Id: I5a5579288f002de14effc00b793143fef86eb828 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13611 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* tests: Add mechanism for disabled testsRaghavendra Talur2016-03-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requirements: Should be able to skip tests from run-tests.sh run. Should be granular enough to disable on subset of OSes. Solution: Tests can have special comment lines with some comma separated values within them. Key names used to determine test status are G_TESTDEF_TEST_STATUS_CENTOS6 G_TESTDEF_TEST_STATUS_NETBSD7 Some examples: G_TESTDEF_TEST_STATUS_CENTOS6=BAD_TEST,BUG=123456 G_TESTDEF_TEST_STATUS_NETBSD7=KNOWN_ISSUE,BUG=4444444 G_TESTDEF_TEST_STATUS_CENTOS6=BAD_TEST,BUG=123456;555555 You can change status of test to enabled or delete the line only if all the bugs are closed or modified or if the patch fixes it. Change-Id: Idee21fecaa5837fd4bd06e613f5c07a024f7b0c2 BUG: 1295704 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/13393 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* afr: do not set arbiter as a readable subvol in inode contextRavishankar N2016-03-042-4/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If afr_lookup_done() or afr_read_subvol_select_by_policy() chooses the arbiter brick to serve the stat() data, file size will be reported as zero from the mount, despite other data bricks being available. This can break programs like tar which use the stat info to decide how much to read. Fix: In the inode-context, mark arbiter as a non-readable subvol for both data and metadata. It it to be noted that by making this fix, we are *not* going to serve metadata FOPS anymore from the arbiter brick despite the brick storing the metadata. It makes sense to do this because the ever increasing over-loaded FOPs (getxattr returning stat data etc.) and compound FOPS in gluster will otherwise make it difficult to add checks in code to handle corner cases. Change-Id: Ic60b25d77fd05e0897481b7fcb3716d4f2101001 BUG: 1310171 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Mat Clayton <mat@mixcloud.com> Reviewed-on: http://review.gluster.org/13539 Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Don't delete gfid-req from lookup requestPranith Kumar K2016-03-021-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Afr does dict_ref of the xattr_req that comes to it and deletes "gfid-req" key. Dht uses same dict to send lookup to other subvolumes. So in case of directories and more than 1 dht subvolumes, second subvolume till the last subvolume won't get a lookup request with "gfid-req". So gfid reset never happens on the directories in distributed replicate subvolume for 2nd till last subvolumes. Fix: Make a copy of lookup xattr request. Also fixed replies_wipe possibly resetting gfid to NULL gfid BUG: 1312816 Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13545 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* afr: Add throttled background client-side healsRavishankar N2016-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70 BUG: 1297172 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13207 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cli/ afr: op_ret for index heal launchRavishankar N2016-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If index heal is launched when some of the bricks are down, glustershd of that node sends a -1 op_ret to glusterd which eventually propagates it to the CLI. Also, glusterd sometimes sends an err_str and sometimes not (depending on the failure happening in the brick-op phase or commit-op phase). So the message that gets displayed varies in each case: "Launching heal operation to perform index self heal on volume testvol has been unsuccessful" (OR) "Commit failed on <host>. Please check log file for details." Fix: 1. Modify afr_xl_op() to return -1 even if index healing of atleast one brick fails. 2. Ignore glusterd's error string in gf_cli_heal_volume_cbk and print a more meaningful message. The patch also fixes a bug in glusterfs_handle_translator_op() where if we encounter an error in notify of one xlator, we break out of the loop instead of sending the notify to other xlators. Change-Id: I957f6c4b4d0a45453ffd5488e425cab5a3e0acca BUG: 1302291 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13303 Reviewed-by: Anuradha Talur <atalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: Fix sparse-file-self-heal.tRavishankar N2016-01-201-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Psuedo Problem: https://build.gluster.org/job/rackspace-regression-2GB-triggered/16682/consoleFull The 'zeroedfile' disk usage comparision which is failing in this .t file fails so only on XFS. The test passes when the backend is on a s̶a̶n̶e̶r̶ different filesystem like EXT4 or BTRFS. This is due to the speculative preallocation in XFS which can reserve different disk space on different XFS mounts for the same version and same file operation. See BZ 1277992 for an example of XFS behaviour. Fix: Don't compare the disk usage of the file on the bricks of the replica: instead, check that the disk space consumed is atleast equal to the size of the file. Also remove sparse-file-self-heal.t from is_bad_test() Change-Id: If43f59549136ebf91f17ff9d958954b3587afe56 BUG: 1298111 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13233 Tested-by: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com>
* tests: Fix arbiter-statfs.tRavishankar N2015-12-101-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ..and remove it from bad tests list. Problem: https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12516/consoleFull ++ SETUP_LOOP /d/backends/brick1 ++ '[' 1 '!=' 1 ']' ++ backend=/d/backends/brick1 ++ case ${OSTYPE} in +++ awk -F: '/not in use/{print $1; exit}' +++ vnconfig -l vnconfig: VNDIOCGET: Bad file descriptor ++ vnd= ++ '[' x = x ']' ++ echo 'no more vnd' no more vnd ++ return 1 Fix: TEST the return value of SETUP_LOOP. Also added EXIT_EARLY to the test case because there is no point in continuing the test when setting the bricks fail. Change-Id: I933611c41f93ac646f1170b62db656314c801ef1 BUG: 1290125 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12936 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: write zeros to sink for non-sparse filesRavishankar N2015-10-281-0/+21
| | | | | | | | | | | | | | | | | Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when a brick is down, the self-heal does not write the zeroes to the sink after it comes up. Consequenty, there is a mismatch in disk-usage amongst the bricks of the replica. Fix: If we definitely know that the file is not sparse, then write the zeroes to the sink even if the checksums match. Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864 BUG: 1272460 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12371 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* xlators: add JSON FOP statistics dumps every N secondsRichard Wareing2015-10-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | Summary: - Adds a thread to the io-stats translator which dumps out statistics every N seconds where N is configurable by an option called "diagnostics.stats-dump-interval" - Thread cleanly starts/stops when translator is unloaded - Updates macros to use "Atomic Builtins" (e.g. intel CPU extentions) to use memory barries to update counters vs using locks. This should reduce overhead and prevent any deadlock bugs due to lock contention. Test Plan: - Test on development machine - Run prove -v tests/basic/stats-dump.t Change-Id: If071239d8fdc185e4e8fd527363cc042447a245d BUG: 1266476 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/12209 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com>
* tests: check free space only on gluster mountRavishankar N2015-09-031-1/+1
| | | | | | | | | | Change-Id: I06de4f555e66fac2594676572c8f8a4ee08f8131 BUG: 1251346 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12096 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tests: add basic/afr/arbiter-statfs.tRavishankar N2015-08-111-0/+37
| | | | | | | | | | | | This is a test-case for BZ 1251346 Change-Id: Icbde8d17c823a3f2c98056c14a75f0ef5227b848 BUG: 1251346 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11864 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Make background healing optional behaviorPranith Kumar K2015-07-068-32/+32
| | | | | | | | | | | Provide options to control number of active background heal count and qlen. Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11473 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : set pending xattrs for replaced brickAnuradha2015-06-251-0/+64
| | | | | | | | | | | | | | | | | | | | | | This patch is part two change to prevent data loss in a replicate volume on doing a replace-brick commit force operation. Problem: After doing replace-brick commit force, there is a chance that self heal might happen from the replaced (sink) brick rather than the source brick leading to data loss. Solution: Mark pending changelogs on afr children for the replaced afr-child so that heal is performed in the correct direction. Change-Id: Icb9807e49b4c1c4f1dcab115318d9a58ccf95675 BUG: 1207829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10448 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* Tests portability: umount(8)Emmanuel Dreyfus2015-06-092-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Avoid hangs on unmounting NFS on NetBSD NetBSD umount(8) on a NFS mount whose server is gone will wait forever because umount(8) calls realpath(3) and tries to access the mount before it calls unmount(2). The non-portable, NetBSD-specific umount -R flag prevent that behavior. We therefore introduce UMOUNT_F, defined as "umount -f" on Linux and "umount -f -R" on NetBSD to take care of forced unmounts, especially in the NFS case. 2) Enforce usage of force_umount wrapper with timeout Whenever umount is used it should be wrapped in force_umount with tiemout handling. That saves us timing issues, and it handles the NetBSD NFS case. 3) Cleanup kernel cache flush. We used (cd $M0 && umount $M0 ) as a portable kernel cache flush trick, but it does not flush everything we need on Linux. Introduce a drop_cache() shell function that reverts to previously used echo 3 > /proc/sys/vm/drop_caches on Linux, and keeps (cd $M0 && umount $M0 ) on other systems. BUG: 1129939 Change-Id: Iab1f5a023405f1f7270c42b595573702ca1eb6f3 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/11114 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: honour selfheal enable/disable volume set optionsRavishankar N2015-06-031-0/+86
| | | | | | | | | | | | | | | | | | | | afr-v1 had the following volume set options that are used to enable/ disable self-heals from happening in AFR xlator when loaded in the client graph: cluster.metadata-self-heal cluster.data-self-heal cluster.entry-self-heal In afr-v2, these 3 heals can happen from the client if there is an inode refresh. This patch allows such heals to proceed only if the corresponding volume set options are set to true. Change-Id: I8d97d6020611152e73a269f3fdb607652c66cc86 BUG: 1226507 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11012 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: Fix entry-self-heal.tKrutika Dhananjay2015-05-261-3/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | Because both bricks in the replica pair of patchy are in the same node, both full healer threads within the same shd try and fail to acquire non-blocking locks when each one gets lock on one of the bricks, causing heal to fail occasionally. Now heals are triggered from the mount as part of inode refresh. And because the AFR on the mount graph a. does not treat presence of dirty xattrs as something that needs a heal (this is true for dirs fool_heal and fool_me) and b. does not recursively heal the entire hierarchy of subdirs and their entries in one shot (this is true with source_creations_heal/dir1), index heal is used to heal fool_heal, fool_me and source_creations_heal/dir1 wherein only one brick (which is the brick that contains the good copy of source_creations_heal/dir_1: brick-1) has all the gfids to be healed copied into its indices/xattrop directory. Change-Id: I46df4188f16d1623f20cc0d7266b3afaeca6c31f BUG: 1163543 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10916 Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: arbiter.t fixRavishankar N2015-05-241-1/+7
| | | | | | | | | | | | | | | | | | | Wait for AFR's children to be up in glustershd process before attempting heal. Also, grep (version 2.21) is detecting statedump files as binary, causing tests to succeed incorrectly. Hence adding the -a switch to force it to treat it as a text file. Thanks to Vijay Bellur for identifying the issue (http://lists.gnu.org/archive/html/bug-grep/2015-05/msg00000.html) and the workaround. Change-Id: Ie3d9591ffaf44baa0cd8c2baa327aed24378e3df BUG: 1163543 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10833 Tested-by: NetBSD Build System Tested-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: data-self-heal.t fixRavishankar N2015-05-221-2/+26
| | | | | | | | | | | | | Use Index heal instead of full heal to heal files because if both bricks are on the same node, the 2 full heal threads might compete and fail to acquire the non blocking locks and the file might not get healed during the full heal crawl. Change-Id: I3b9e2de7b0366b4bc40b54314807ef165baad68f BUG: 1163543 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10875 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* test: Fix sparse file self heal testzhoushicheng2015-05-191-0/+10
| | | | | | | | | | | | | | | | | | | | | | | This patch solves problems caused by XFS with speculative preallocation feature on : Test EXPECT "1" has_holes $B0/${V0}0/big2bigger would fall when XFS has not freed the preallocated blocks. It is caused by XFS speculative preallocation feature. The test would pass if this feature is disabled. Speculative preallocation can speed up under linux 3.8(and later). Otherwise, the test would pass by dropping cache manually to speed up speculative preallocation. As in http://review.gluster.org/#/c/10411/, using "( cd $M0 ; umount $M0 )" to drop caches, which is better than "echo 3 > /proc/sys/vm/drop_caches". drop caches operation was added in test: tests/basic/afr/sparse-file-self-heal.t BUG: 1206461 Change-Id: Ie2c9d1b92fa8307c44498752fdd100eb86f9689c Signed-off-by: zhoushicheng <madaozhou@gmail.com> Reviewed-on: http://review.gluster.org/10253 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests: data-self-heal.t-create files from the mount point.Ravishankar N2015-05-181-32/+14
| | | | | | | | | | | | | | | | | | Currently data-self-heal.t creates files directly on the brick and sets the trusted.gfid xattr. Later it depends on stat from mount to create the .glusterfs/<gfid hard link>. The link creation doesn't seem to be happening always. Hence changing the test to create files from the mount point before modifying afr xattrs in the backend and triggering heal. Also disabled all performance translators. With these changes, http://review.gluster.org/#/c/10667/ is not failing in data-self-heal.t Change-Id: I7d054e52b97aeb0bdc2fdf9d70a8cf33318d4310 BUG: 1218304 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10530 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>