author | Ravishankar N <ravishankar@redhat.com> | 2013-12-23 09:32:22 +0000 |
---|---|---|
committer | Vijay Bellur <vbellur@redhat.com> | 2013-12-24 02:24:20 -0800 |
commit | f9698036fcc1ceedea19110139400d0cf4a54c9a (patch) | |
tree | 30d86c2a7f8af8aa48d00009bd56529a969eeec5 | |
parent | 879be836145f1d0b4bc381e7416ca8bd0811b718 (diff) |
cluster/afr: avoid race due to afr_is_transaction_running()
Problem:
------------------------------------------
afr_lookup_perform_self_heal() {
        if (afr_is_transaction_running())
                goto out;
        else
                afr_launch_self_heal();
}
------------------------------------------
When 2 clients simultaneously access a file in split-brain, one of them
acquires the inode lock and proceeds with afr_launch_self_heal (which
eventually fails and sets "sh-failed" in the callback).
The second client meanwhile bails out of afr_lookup_perform_self_heal()
because afr_is_transaction_running() returns true due to the lock obtained by
client-1. Consequently in client-2, "sh-failed" does not get set in the dict,
causing quick-read translator to *not* invalidate the inode, thereby serving
data randomly from one of the bricks.
Fix:
If a possible split-brain is detected on lookup, forcefully traverse the
afr_launch_self_heal() code path in afr_lookup_perform_self_heal().
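
The effect of the fix on the bail-out condition can be sketched in isolation. This is a minimal model, not the AFR code: `struct lookup_state` and the two `skips_self_heal_*` helpers are hypothetical names introduced only for illustration, and the three booleans stand in for the result of afr_is_transaction_running(), local->cont.lookup.possible_spb and local->attempt_self_heal.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the check reads; not afr_local_t. */
struct lookup_state {
        bool transaction_running; /* models afr_is_transaction_running (local) */
        bool possible_spb;        /* models local->cont.lookup.possible_spb */
        bool attempt_self_heal;   /* models local->attempt_self_heal */
};

/* Condition before the patch: returns true when lookup takes "goto out"
 * and never reaches afr_launch_self_heal(). */
bool
skips_self_heal_before (const struct lookup_state *s)
{
        return s->transaction_running && !s->attempt_self_heal;
}

/* Condition after the patch: a possible split-brain disables the
 * early exit, so the self-heal path is always traversed for such files. */
bool
skips_self_heal_after (const struct lookup_state *s)
{
        return s->transaction_running &&
               !s->possible_spb &&
               !s->attempt_self_heal;
}
```

For client-2 in the scenario above (transaction running, possible split-brain, no explicit self-heal request), the first helper returns true and the second returns false, which is the path on which "sh-failed" can be set. For files that are not in split-brain the two helpers agree, so the existing early exit is unchanged.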
Change-Id: I316f9f282543533fd3c958e4b63ecada42c2a14f
BUG: 870565
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/6578
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
-rw-r--r-- | xlators/cluster/afr/src/afr-common.c | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index a4f97e950ae..250b0944e90 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1833,6 +1833,11 @@ afr_lookup_perform_self_heal (call_frame_t *frame, xlator_t *this,
         afr_lookup_set_self_heal_params (local, this);
         if (afr_can_self_heal_proceed (&local->self_heal, priv)) {
                 if (afr_is_transaction_running (local) &&
+                    /*Forcefully call afr_launch_self_heal (which will go on to
+                      fail) for SB files.This prevents stale data being served
+                      due to race in afr_is_transaction_running() when
+                      multiple clients access the same SB file*/
+                    !local->cont.lookup.possible_spb &&
                     (!local->attempt_self_heal))
                         goto out;