author | Richard Wareing <rwareing@fb.com> | 2015-08-21 21:44:44 -0700
---|---|---
committer | Kevin Vigor <kvigor@fb.com> | 2017-03-06 19:53:31 -0500
commit | f6cc23fb1d8f157ec598e0bbb63081c881388380 (patch)
tree | bdb0a579a0a548e3e2113d5641ffa951bb3fbaa9 /tests
parent | 259d65ffb7296415cb9110ba1877d0378265bf52 (diff)
cluster/afr: AFR2 discovery should always do entry heal flow
Summary:
- Fixes a case where, when a brick is completely wiped, the AFR2 discovery
mechanism could (with 1/R probability, where R is the replication
factor) pin an NFSd or client to the wiped brick. This would in turn
prevent the client from seeing the contents of the (degraded)
subvolume.
- The fix proposed in this patch is to force the entry self-heal
code path when the discovery process happens, and furthermore to
force a conservative merge in the case where no brick is found to be
degraded.
- This also restores the property of our 3.4.x builds whereby bricks
automagically rebuild via the SHDs without having to run any sort of
"full heal". This patch gives the SHDs enough signal to figure
out what they need to heal.
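The 1/R pinning odds described above can be illustrated with a toy sketch. This is plain Python, not GlusterFS code: the modulo selection is a hypothetical stand-in for AFR's deterministic per-file read-child choice, used only to show why a client lands on the wiped brick roughly once per R files.

```python
def pick_read_child(gfid_hash, replica_count):
    # Toy stand-in for AFR read-child selection: deterministic per file,
    # uniformly spread across the replica set.
    return gfid_hash % replica_count

R = 3            # replication factor
wiped = 2        # index of the brick whose contents were wiped
trials = 100000  # uniform spread of hypothetical gfid hashes

pinned = sum(1 for h in range(trials) if pick_read_child(h, R) == wiped)
print(f"{pinned / trials:.3f}")  # prints 0.333, i.e. ~1/R of reads hit the wiped brick
```

With reads pinned to a deterministic child, roughly one in R lookups resolves against the empty brick, which is why forcing the entry-heal flow during discovery (rather than trusting any single child) is needed.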
Test Plan:
Run "prove -v tests/bugs/fb8149516.t"
Output: https://phabricator.fb.com/P19989638
Prove test showing failed run on v3.6.3-fb_10 without the patch -> https://phabricator.fb.com/P19989643
Reviewers: dph, moox, sshreyas
Reviewed By: sshreyas
FB-commit-id: 3d6f171
Change-Id: I7e0dec82c160a2981837d3f07e3aa6f6a701703f
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16862
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Diffstat (limited to 'tests')
-rw-r--r-- | tests/bugs/fb8149516.t | 40
1 file changed, 40 insertions, 0 deletions
diff --git a/tests/bugs/fb8149516.t b/tests/bugs/fb8149516.t
new file mode 100644
index 00000000000..54372794c6f
--- /dev/null
+++ b/tests/bugs/fb8149516.t
@@ -0,0 +1,40 @@
+#!/bin/bash
+
+. $(dirname $0)/../include.rc
+. $(dirname $0)/../volume.rc
+
+cleanup;
+
+TEST glusterd
+TEST pidof glusterd
+TEST $CLI volume create $V0 replica 3 $H0:$B0/${V0}{0,1,2}
+TEST $CLI volume set $V0 cluster.read-subvolume-index 2
+TEST $CLI volume set $V0 cluster.background-self-heal-count 0
+TEST $CLI volume set $V0 cluster.heal-timeout 30
+TEST $CLI volume set $V0 cluster.choose-local off
+TEST $CLI volume set $V0 cluster.entry-self-heal off
+TEST $CLI volume set $V0 cluster.data-self-heal off
+TEST $CLI volume set $V0 cluster.metadata-self-heal off
+TEST $CLI volume set $V0 nfs.disable off
+TEST $CLI volume start $V0
+TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --attribute-timeout=0 --entry-timeout=0
+cd $M0
+for i in {1..10}
+do
+    dd if=/dev/urandom of=testfile$i bs=1M count=1 2>/dev/null
+done
+cd ~
+TEST kill_brick $V0 $H0 $B0/${V0}2
+TEST rm -rf $B0/${V0}2/testfile*
+TEST rm -rf $B0/${V0}2/.glusterfs
+
+TEST $CLI volume start $V0 force
+EXPECT_WITHIN 20 "1" afr_child_up_status_in_shd $V0 2
+
+# Verify we see all ten files when ls'ing, without the patch this should
+# return no files and fail.
+FILE_LIST=($(\ls $M0))
+TEST "((${#FILE_LIST[@]} == 10))"
+EXPECT_WITHIN 30 "0" get_pending_heal_count $V0
+
+cleanup