From ddc044bfa2840981de4003c3b9efcac84387dc2b Mon Sep 17 00:00:00 2001 From: Jeff Darcy Date: Mon, 12 Mar 2012 09:32:40 -0400 Subject: replicate: add hashed read-child method. Both the first-to-respond method and the round-robin method are susceptible to clients repeatedly choosing the same servers across a series of opens, creating hot spots. Also, the code to handle a replica being down will ignore both methods and just choose the first remaining (which is not an issue for two-way but can be otherwise). The hashed method more reliably avoids such hot spots. There are three values/modes. 0: use the old (broken) methods. 1: select a read-child based on a hash of the file's GFID, so all clients will choose the same subvolume for a file (ensuring maximum consistency) but will distribute load for a set of files. 2: select a read-child based on a hash of the file's GFID plus the client's PID, so different children will distribute load even for one file. Mode 2 will probably be optimal for most cases. Using response time when we open the file is problematic, both because a single sample might not have been representative even then and because load might have shifted in the hours or days since (for long-lived files). Trying to use more current load information can lead to "herd following" behavior which is just as bad. Pseudo-random distribution is likely to be the best we can reasonably do, just as it is for DHT. Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461 BUG: 802513 Signed-off-by: Jeff Darcy Reviewed-on: http://review.gluster.com/2926 Tested-by: Gluster Build System Reviewed-by: Anand Avati --- xlators/cluster/afr/src/afr.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'xlators/cluster/afr/src/afr.h') diff --git a/xlators/cluster/afr/src/afr.h b/xlators/cluster/afr/src/afr.h index 27cb83a57..a1a30562b 100644 --- a/xlators/cluster/afr/src/afr.h +++ b/xlators/cluster/afr/src/afr.h @@ -130,6 +130,7 @@ typedef struct _afr_private { gf_boolean_t entry_change_log; /* on/off */ int read_child; /* read-subvolume */ + unsigned int hash_mode; /* for when read_child is not set */ int favorite_child; /* subvolume to be preferred in resolving split-brain cases */ @@ -936,12 +937,13 @@ afr_first_up_child (unsigned char *child_up, size_t child_count); int afr_select_read_child_from_policy (int32_t *fresh_children, int32_t child_count, int32_t prev_read_child, - int32_t config_read_child, int32_t *sources); + int32_t config_read_child, int32_t *sources, + unsigned int hmode, uuid_t gfid); void afr_set_read_ctx_from_policy (xlator_t *this, inode_t *inode, int32_t *fresh_children, int32_t prev_read_child, - int32_t config_read_child); + int32_t config_read_child, uuid_t gfid); int32_t afr_get_call_child (xlator_t *this, unsigned char *child_up, int32_t read_child, -- cgit