author Ravishankar N <ravishankar@redhat.com> 2013-12-23 09:32:22 +0000
committer Vijay Bellur <vbellur@redhat.com> 2013-12-24 02:24:20 -0800
commit f9698036fcc1ceedea19110139400d0cf4a54c9a
tree 30d86c2a7f8af8aa48d00009bd56529a969eeec5
parent 879be836145f1d0b4bc381e7416ca8bd0811b718
cluster/afr: avoid race due to afr_is_transaction_running()
Problem:
------------------------------------------
afr_lookup_perform_self_heal()
{
        if(afr_is_transaction_running())
                goto out
        else
                afr_launch_self_heal();
}
------------------------------------------
When 2 clients simultaneously access a file in split-brain, one of them
acquires the inode lock and proceeds with afr_launch_self_heal (which
eventually fails and sets "sh-failed" in the callback).

The second client meanwhile bails out of afr_lookup_perform_self_heal()
because afr_is_transaction_running() returns true due to the lock obtained
by client-1. Consequently, in client-2, "sh-failed" does not get set in the
dict, causing the quick-read translator to *not* invalidate the inode,
thereby serving data randomly from one of the bricks.

Fix:
If a possible split-brain is detected on lookup, forcefully traverse the
afr_launch_self_heal() code path in afr_lookup_perform_self_heal().

Change-Id: I316f9f282543533fd3c958e4b63ecada42c2a14f
BUG: 870565
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/6578
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
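
As an illustration only, the change to the skip condition described in the commit message can be modeled with a small standalone predicate. This is a sketch, not the AFR code: `struct lookup_state` and the two `skip_self_heal_*` helpers are hypothetical names introduced here, and the three booleans merely stand in for `afr_is_transaction_running (local)`, `local->cont.lookup.possible_spb`, and `local->attempt_self_heal`.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state consulted in
 * afr_lookup_perform_self_heal(); this is NOT the real afr_local_t. */
struct lookup_state {
        bool transaction_running; /* models afr_is_transaction_running (local) */
        bool possible_spb;        /* models local->cont.lookup.possible_spb */
        bool attempt_self_heal;   /* models local->attempt_self_heal */
};

/* Condition before the patch: returns true when the lookup would
 * "goto out" and skip afr_launch_self_heal(). */
static bool
skip_self_heal_before (const struct lookup_state *s)
{
        return s->transaction_running && !s->attempt_self_heal;
}

/* Condition after the patch: a possible split-brain is never skipped,
 * so the self-heal path is traversed even while a transaction runs. */
static bool
skip_self_heal_after (const struct lookup_state *s)
{
        return s->transaction_running &&
               !s->possible_spb &&
               !s->attempt_self_heal;
}
```

Under this model, the second client in the scenario above (transaction running, possible split-brain flagged) is skipped by the old condition but not by the new one, while a lookup with no split-brain suspicion behaves the same in both.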
-rw-r--r-- xlators/cluster/afr/src/afr-common.c 5
1 file changed, 5 insertions, 0 deletions
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index a4f97e950ae..250b0944e90 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1833,6 +1833,11 @@ afr_lookup_perform_self_heal (call_frame_t *frame, xlator_t *this,
         afr_lookup_set_self_heal_params (local, this);
         if (afr_can_self_heal_proceed (&local->self_heal, priv)) {
                 if (afr_is_transaction_running (local) &&
+                    /*Forcefully call afr_launch_self_heal (which will go on to
+                      fail) for SB files.This prevents stale data being served
+                      due to race in afr_is_transaction_running() when
+                      multiple clients access the same SB file*/
+                    !local->cont.lookup.possible_spb &&
                     (!local->attempt_self_heal))
                         goto out;