author | Brian Foster <bfoster@redhat.com> | 2012-12-03 10:45:04 -0500
---|---|---
committer | Anand Avati <avati@redhat.com> | 2012-12-04 14:45:23 -0800
commit | 741766c708f2a246854584c064d63d3fba67be90 |
tree | 659ec297c9a4f0bdb9118bb16d791aea101b29fb |
parent | e19bf891d5373e1660e666fecf6740062a375617 |
afr: use data trylock mode in read/write self-heal trigger paths
Self-heal data lock contention between clients and glustershd
instances can lead to long waits and poor user response times if a
client's lock request ends up queued behind a glustershd self-heal of
a large file. We have reports of guest VM instances becoming completely
unresponsive during self-heal of virtual disk images.
Optimize the read/write self-heal trigger codepath
(i.e., afr_open_fd_fix()) to use a trylock for self-heal and to skip
the self-heal if the lock cannot be acquired immediately. This
minimizes the likelihood of a running guest competing with glustershd
when a brick comes back online. Note that lock contention is still
possible from the client (e.g., via lookup).
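The trylock-or-skip behaviour described above can be illustrated with a small sketch. It is an assumption-laden analogy, not GlusterFS code: a plain POSIX mutex (`data_lock`) stands in for the per-file data lock, and `do_data_heal()` and `try_heal()` are hypothetical names. With `block` false, a busy lock makes the caller skip the heal instead of waiting behind whoever holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-file data lock that glustershd and
 * clients contend on. This is a plain pthread mutex, NOT the AFR inodelk. */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical heal routine; a real data self-heal copies file contents
 * between bricks. Here it only reports that it ran. */
void do_data_heal(const char *path)
{
    printf("healing %s\n", path);
}

/*
 * Attempt a heal. When 'block' is true, wait for the lock (blocking
 * semantics). When false, try once and skip the heal if the lock is busy,
 * so a latency-sensitive caller is never stuck behind a long heal held by
 * someone else.
 * Returns true if the heal ran, false if it was skipped.
 */
bool try_heal(const char *path, bool block)
{
    if (block) {
        int ret = pthread_mutex_lock(&data_lock);
        assert(ret == 0);
    } else if (pthread_mutex_trylock(&data_lock) != 0) {
        /* Lock is held elsewhere (EBUSY): skip rather than wait. */
        return false;
    }
    do_data_heal(path);
    int ret = pthread_mutex_unlock(&data_lock);
    assert(ret == 0);
    return true;
}
```

Skipping is acceptable here because, as the commit message notes, another party (glustershd or a later lookup) can still perform the heal; the read/write path only gives up its opportunistic attempt.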
BUG: 874045
Change-Id: I406443c061ff6acd2a851179626b78352caa5c03
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
-rw-r--r-- | xlators/cluster/afr/src/afr-self-heal-data.c | 9 |
1 file changed, 8 insertions, 1 deletion
```diff
diff --git a/xlators/cluster/afr/src/afr-self-heal-data.c b/xlators/cluster/afr/src/afr-self-heal-data.c
index bf20d865..c7a97c99 100644
--- a/xlators/cluster/afr/src/afr-self-heal-data.c
+++ b/xlators/cluster/afr/src/afr-self-heal-data.c
@@ -1235,6 +1235,7 @@ afr_sh_data_open_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         afr_private_t *priv = NULL;
         int call_count = 0;
         int child_index = 0;
+        gf_boolean_t block = _gf_true;
 
         local = frame->local;
         sh = &local->self_heal;
@@ -1276,7 +1277,13 @@ afr_sh_data_open_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 "fd for %s opened, commencing sync",
                 local->loc.path);
 
-        afr_sh_data_lock (frame, this, 0, 0, _gf_true,
+        /*
+         * The read and write self-heal trigger codepaths do not provide
+         * an unwind callback. We run a trylock in these codepaths
+         * because we are sensitive to locking latency.
+         */
+        block = sh->unwind ? _gf_true : _gf_false;
+        afr_sh_data_lock (frame, this, 0, 0, block,
                           afr_sh_data_big_lock_success,
                           afr_sh_data_fail);
 }
```
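The patch decides between a blocking lock and a trylock purely from whether the caller supplied an unwind callback. A minimal sketch of that decision follows, using hypothetical, heavily simplified types (`struct self_heal`, `unwind_fn`, `choose_blocking_lock()`, `example_unwind()`); the real `afr_self_heal_t` and its callback signature differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type; the real AFR unwind signature differs. */
typedef int (*unwind_fn)(void *frame);

/* Hypothetical, simplified stand-in for the self-heal state: only the
 * optional completion callback matters for this decision. */
struct self_heal {
    unwind_fn unwind; /* NULL in the read/write trigger paths (afr_open_fd_fix()) */
};

/* Mirrors `block = sh->unwind ? _gf_true : _gf_false;` from the patch:
 * callers that wait for the heal result (and so pass a callback) keep the
 * blocking lock; callers without one get a trylock. */
bool choose_blocking_lock(const struct self_heal *sh)
{
    return sh->unwind != NULL;
}

/* Example callback for a caller that waits on the heal result. */
int example_unwind(void *frame)
{
    (void)frame;
    return 0;
}
```

Keying on the callback presumably works because only the fire-and-forget read/write triggers omit it, so paths that do wait for completion (such as the lookup-triggered heal the commit message mentions) keep blocking semantics unchanged.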