diff options
author | Jeff Darcy <jdarcy@redhat.com> | 2014-06-17 13:42:45 +0000 |
---|---|---|
committer | Vijay Bellur <vbellur@redhat.com> | 2014-07-12 09:20:52 -0700 |
commit | 99685f18f190a73f2a46478cac0b09f4c59834b1 (patch) | |
tree | f5e787ace3038b97876425c5397e91c0a37df04d /xlators/mgmt/glusterd/src/glusterd-volume-set.c | |
parent | d5ec66032ff96d7d417b5838a6bd1a047d52204c (diff) |
dht: support heterogeneous brick sizes
Calculation of layouts now considers the size of each brick, so that
smaller bricks don't get an "unfair" share of allocations and start
returning ENOSPC while the larger bricks still have plenty of space.
The observation has been made that some clients might get ENOTCONN when
trying to fetch disk-size information, and end up calculating layouts
differently. The following meta-observations can be made.
(1) This scenario is extremely unlikely in configurations with AFR.
(2) The most likely consequence of this scenario is that some files will
be placed sub-optimally by the client with the obsolete (non-weighted)
layout. They'll still be found anyway, so this isn't a show stopper.
(3) Without this patch it's *guaranteed* that some files will be placed
sub-optimally, because any layout that fails to account for brick sizes
is sub-optimal.
(4) We shouldn't be doing fix-layout from two nodes simultaneously
anyway. That's inefficient at best. Any instances of such behavior are
separate bugs, which should be fixed separately.
(5) In the most extreme edge case, two nodes doing weighted and
non-weighted layout fixes could race and end up creating an internally
inconsistent layout. This condition is still transient; it will be
detected and repaired automatically the next time anyone fetches the
layout. (If it's not that's also a preexisting bug that can show up in
other contexts.)
In conclusion, it's not the purpose of this patch to fix bugs elsewhere
in DHT. Its purpose is to make life incrementally better for users who
add new hardware with larger disks etc. than the older equipment. It's
only one part of an ongoing process to improve layout management and
repair, all the way up to support for multiple hash rings or tiering.
Change-Id: I05eb6f9eface9cdaf8622e0260c8c7f29020447f
BUG: 1114680
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/8093
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Diffstat (limited to 'xlators/mgmt/glusterd/src/glusterd-volume-set.c')
-rw-r--r-- | xlators/mgmt/glusterd/src/glusterd-volume-set.c | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/xlators/mgmt/glusterd/src/glusterd-volume-set.c b/xlators/mgmt/glusterd/src/glusterd-volume-set.c index 92ab3d1a3a3..5358d52a43a 100644 --- a/xlators/mgmt/glusterd/src/glusterd-volume-set.c +++ b/xlators/mgmt/glusterd/src/glusterd-volume-set.c @@ -420,6 +420,10 @@ struct volopt_map_entry glusterd_volopt_map[] = { .op_version = 3, .flags = OPT_FLAG_CLIENT_OPT }, + { .key = "cluster.weighted-rebalance", + .voltype = "cluster/distribute", + .op_version = GD_OP_VERSION_3_6_0, + }, /* Switch xlator options (Distribute special case) */ { .key = "cluster.switch", |