authorRichard Wareing <rwareing@fb.com>2014-05-28 15:46:48 -0700
committerKevin Vigor <kvigor@fb.com>2017-03-05 13:46:32 -0500
commit961dc5e6e7af437a77cd9736a0a925c5621b12cc (patch)
treed77dbfa104f24649b5ba354c0b879408c2be343d /tests
parent523a544a58737522866a9c6b8fc3c041a9b0621f (diff)
Prevent frame-timeouts from hanging syncops
Summary: It was observed while testing the SHD threading code that under high load, SHD/AFR-related SyncOps and SyncTasks can hang or deadlock because the transport-disconnected event (for frame timeouts) never gets bubbled up correctly. Various tests indicated the ping timeouts worked fine, while frame timeouts did not. The only difference? Ping timeouts actually disconnect the transport, while frame timeouts do not. So from a high level we know this change prevents the deadlock, as subsequent tests showed the deadlocks no longer occurred once it was applied. That said, there may be a more elegant solution. For now, though, forcing a reconnect is preferable to hanging clients or deadlocking the SHD.

Test Plan: It's fairly difficult to write a good prove test for this, since it requires human eyes to observe whether the SHD is deadlocked (I'm open to ideas). Here's the repro, though:

1. Create a 3x replicated cluster on a host.
2. Set the frame-timeout low (say, 2 seconds).
3. Down a brick and write a pile of files (maybe 2000).
4. Bring up the downed brick and let the SHD begin healing files.
5. During the heal process, kill -STOP <pid of brick> (hang) one of the bricks.

Without this patch the SHD will be deadlocked, even though the frame timed out after 2 seconds. With the patch, the plug is pulled on the transport, a disconnect is bubbled up to the syncop, and the SHD resumes.

Reviewers: dph, meyering, cjh

Reviewed By: cjh

Subscribers: ethanr

Conflicts:
	rpc/rpc-lib/src/rpc-clnt.c

FB-commit-id: c99357c
Change-Id: I344079161492b195267c2d64b6eab0b441f12ded
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16846
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
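Step 5 of the repro above hangs a brick with SIGSTOP rather than killing it, which is what makes the scenario interesting: the process is frozen but its TCP connection stays open, so peers see a hung transport rather than a disconnect. A minimal standalone sketch of that mechanism, using a `sleep` process as a stand-in for a brick daemon (the process and timings are illustrative):

```shell
#!/bin/bash
# Start a long-running process to stand in for a brick daemon.
sleep 60 &
PID=$!

# SIGSTOP freezes the process without terminating it; any sockets it
# holds remain open, so remote peers observe a hang, not a disconnect.
kill -STOP "$PID"
sleep 0.2
ps -o state= -p "$PID"   # state 'T': stopped

# SIGCONT resumes it, as one would do to "unhang" the brick.
kill -CONT "$PID"
sleep 0.2
ps -o state= -p "$PID"   # no longer stopped

kill "$PID" 2>/dev/null
```

This is the same pair of signals the test's `hang_brick` helper relies on (`kill -STOP`), with `kill -CONT` as the manual recovery path.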
Diffstat (limited to 'tests')
-rwxr-xr-x  tests/bugs/fb4482137.t  62
-rw-r--r--  tests/volume.rc          7
2 files changed, 69 insertions, 0 deletions
diff --git a/tests/bugs/fb4482137.t b/tests/bugs/fb4482137.t
new file mode 100755
index 00000000000..3616ab6022d
--- /dev/null
+++ b/tests/bugs/fb4482137.t
@@ -0,0 +1,62 @@
+#!/bin/bash
+
+#
+# Test the scenario where an SHD daemon suffers a frame timeout during a
+# crawl. The expected behavior is that the present crawl will continue
+# after the timeout rather than deadlock.
+#
+
+. $(dirname $0)/../include.rc
+. $(dirname $0)/../volume.rc
+
+cleanup;
+
+function wait_for_shd_no_sink() {
+ local TIMEOUT=$1
+ # If we see the "no active sinks" log message we know
+ # the heal is alive. It cannot proceed as the "sink"
+ # is hung, but it's at least alive and trying.
+ timeout $TIMEOUT grep -q 'replicate-0: no active sinks for' \
+ <(tail -fn0 /var/log/glusterfs/glustershd.log)
+ return $?
+}
+
+TEST glusterd
+TEST pidof glusterd
+TEST $CLI volume info 2> /dev/null;
+
+# Set up a cluster with 3 replicas and a short frame-timeout
+TEST $CLI volume create $V0 replica 3 $H0:$B0/${V0}{1..3};
+TEST $CLI volume set $V0 network.frame-timeout 2
+TEST $CLI volume set $V0 cluster.choose-local off
+TEST $CLI volume set $V0 cluster.entry-self-heal off
+TEST $CLI volume set $V0 cluster.metadata-self-heal off
+TEST $CLI volume set $V0 cluster.data-self-heal off
+TEST $CLI volume set $V0 cluster.self-heal-daemon on
+TEST $CLI volume set $V0 cluster.heal-timeout 10
+TEST $CLI volume start $V0
+sleep 5
+
+# Mount the volume
+TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 \
+ --attribute-timeout=0 --entry-timeout=0
+
+# Kill brick 1
+TEST kill_brick $V0 $H0 $B0/${V0}1
+sleep 1
+
+# Write some data into the mount which will require healing
+cd $M0
+for i in {1..1000}; do
+ dd if=/dev/urandom of=testdata_$i bs=64k count=1 2>/dev/null
+done
+
+# Re-start the brick
+TEST $CLI volume start $V0 force
+EXPECT_WITHIN 20 "1" afr_child_up_status $V0 0
+
+sleep 1
+TEST hang_brick $V0 $H0 $B0/${V0}1
+sleep 4
+TEST wait_for_shd_no_sink 20
+cleanup
diff --git a/tests/volume.rc b/tests/volume.rc
index 5ea75a51d22..f75d8969e94 100644
--- a/tests/volume.rc
+++ b/tests/volume.rc
@@ -237,6 +237,13 @@ function kill_brick {
kill -9 $(get_brick_pid $vol $host $brick)
}

+function hang_brick {
+ local vol=$1
+ local host=$2
+ local brick=$3
+ kill -STOP $(get_brick_pid $vol $host $brick)
+}
+
function check_option_help_presence {
local option=$1
$CLI volume set help | grep "^Option:" | grep -w $option
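The test's `wait_for_shd_no_sink` helper bounds a log watch by combining `timeout`, `tail -f`, and process substitution: `grep -q` exits as soon as the awaited line appears, and `timeout` kills the watch if it never does. A standalone sketch of that pattern against a temporary file (the path, message, and timings are illustrative):

```shell
#!/bin/bash
LOG=$(mktemp)

# Writer: appends the awaited message shortly after the watch starts,
# standing in for the SHD writing to glustershd.log.
( sleep 1; echo "replicate-0: no active sinks for some file" >> "$LOG" ) &

# Watcher: tail -n0 -f streams only lines written after it starts;
# grep -q exits 0 on the first match; timeout bounds the whole wait.
if timeout 10 grep -q 'no active sinks' <(tail -n0 -f "$LOG"); then
    echo "message seen"
else
    echo "timed out"
fi

# One extra write makes the leftover tail hit a broken pipe and exit.
echo "nudge" >> "$LOG"
rm -f "$LOG"
```

Note that `tail -n0` is what makes the helper detect only *new* log lines, so a stale "no active sinks" message from an earlier heal cannot produce a false positive.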