author | Jeff Darcy <jdarcy@redhat.com> | 2012-03-12 09:32:40 -0400 |
---|---|---|
committer | Anand Avati <avati@redhat.com> | 2012-05-31 17:29:01 -0700 |
commit | ddc044bfa2840981de4003c3b9efcac84387dc2b (patch) | |
tree | a83d476702cac7ecc7ae59057c368f622a51af4c /xlators/cluster/afr/src/afr-dir-write.c | |
parent | e066a5fea7bdaa5da78e49c9a5bf344af2f33d3c (diff) |
replicate: add hashed read-child method.
Both the first-to-respond method and the round-robin method are susceptible
to clients repeatedly choosing the same servers across a series of opens,
creating hot spots. Also, the code to handle a replica being down will
ignore both methods and just choose the first remaining (which is not an
issue for two-way but can be otherwise). The hashed method more reliably
avoids such hot spots. There are three values/modes.
0: use the old (broken) methods.
1: select a read-child based on a hash of the file's GFID, so all clients
   will choose the same subvolume for a file (ensuring maximum consistency)
   but will distribute load for a set of files.
2: select a read-child based on a hash of the file's GFID plus the client's
   PID, so different clients can pick different children, distributing load
   even for one file.
Mode 2 will probably be optimal for most cases. Using response time when we
open the file is problematic, both because a single sample might not have
been representative even then and because load might have shifted in the
hours or days since (for long-lived files). Trying to use more current load
information can lead to "herd following" behavior which is just as bad.
Pseudo-random distribution is likely to be the best we can reasonably do,
just as it is for DHT.
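
The three modes described above can be sketched roughly as follows. This is
an illustration only, not the code from this patch: the helper name
pick_read_child, the GFID_LEN constant, and the FNV-1a hash are assumptions
made for the example, and the real change may use a different hash and a
different way of mixing in the PID.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define GFID_LEN 16   /* a GFID is a 16-byte UUID */

/* FNV-1a: a stand-in hash chosen for brevity (assumption, not from the patch). */
static uint32_t
fnv1a (const unsigned char *data, size_t len)
{
        uint32_t h = 2166136261u;
        size_t   i;

        for (i = 0; i < len; i++) {
                h ^= data[i];
                h *= 16777619u;
        }
        return h;
}

/*
 * Hypothetical helper. Returns a child index in [0, child_count), or -1
 * when mode is 0 (caller falls back to the old methods) or when
 * child_count is not positive.
 *
 *   mode 1: hash of the GFID only, so every client picks the same child.
 *   mode 2: hash of the GFID plus the caller's PID, so different clients
 *           can land on different children for the same file.
 */
static int
pick_read_child (const unsigned char gfid[GFID_LEN], int child_count,
                 unsigned int mode, pid_t pid)
{
        unsigned char key[GFID_LEN + sizeof (pid_t)];
        size_t        len = GFID_LEN;

        if (mode == 0 || child_count <= 0)
                return -1;

        memcpy (key, gfid, GFID_LEN);
        if (mode > 1) {
                /* Append the PID so the key differs per client process. */
                memcpy (key + GFID_LEN, &pid, sizeof (pid));
                len += sizeof (pid);
        }

        return (int) (fnv1a (key, len) % (uint32_t) child_count);
}
```

A caller in mode 2 would pass getpid() as the last argument. Because the
result depends only on the key and the child count, a given client keeps
choosing the same child for a file for as long as the child count is
unchanged.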
Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Diffstat (limited to 'xlators/cluster/afr/src/afr-dir-write.c')
-rw-r--r-- | xlators/cluster/afr/src/afr-dir-write.c | 15 |
1 files changed, 10 insertions, 5 deletions
diff --git a/xlators/cluster/afr/src/afr-dir-write.c b/xlators/cluster/afr/src/afr-dir-write.c
index 9f2b975df6f..0b804bef580 100644
--- a/xlators/cluster/afr/src/afr-dir-write.c
+++ b/xlators/cluster/afr/src/afr-dir-write.c
@@ -196,7 +196,8 @@ unlock:
                 afr_set_read_ctx_from_policy (this, inode,
                                               local->fresh_children,
                                               local->read_child_index,
-                                              priv->read_child);
+                                              priv->read_child,
+                                              local->cont.create.buf.ia_gfid);
                 local->transaction.unwind (frame, this);
 
                 local->transaction.resume (frame, this);
@@ -429,7 +430,8 @@ afr_mknod_wind_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 afr_set_read_ctx_from_policy (this, inode,
                                               local->fresh_children,
                                               local->read_child_index,
-                                              priv->read_child);
+                                              priv->read_child,
+                                              local->cont.mknod.buf.ia_gfid);
                 local->transaction.unwind (frame, this);
 
                 local->transaction.resume (frame, this);
@@ -657,7 +659,8 @@ afr_mkdir_wind_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 afr_set_read_ctx_from_policy (this, inode,
                                               local->fresh_children,
                                               local->read_child_index,
-                                              priv->read_child);
+                                              priv->read_child,
+                                              local->cont.mkdir.buf.ia_gfid);
                 local->transaction.unwind (frame, this);
 
                 local->transaction.resume (frame, this);
@@ -887,7 +890,8 @@ afr_link_wind_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 afr_set_read_ctx_from_policy (this, inode,
                                               local->fresh_children,
                                               local->read_child_index,
-                                              priv->read_child);
+                                              priv->read_child,
+                                              local->cont.link.buf.ia_gfid);
                 local->transaction.unwind (frame, this);
 
                 local->transaction.resume (frame, this);
@@ -1110,7 +1114,8 @@ afr_symlink_wind_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 afr_set_read_ctx_from_policy (this, inode,
                                               local->fresh_children,
                                               local->read_child_index,
-                                              priv->read_child);
+                                              priv->read_child,
+                                              local->cont.symlink.buf.ia_gfid);
                 local->transaction.unwind (frame, this);
 
                 local->transaction.resume (frame, this);