author | Anand Avati <avati@redhat.com> | 2012-08-30 13:15:39 -0700 |
---|---|---|
committer | Anand Avati <avati@redhat.com> | 2012-09-12 14:29:51 -0700 |
commit | 4f87fd0ae2ce629576ca5f647a99888d31a46815 (patch) | |
tree | b548adb73477f5e23905adc18fefd90fa9c9e3e8 /glusterfsd | |
parent | c3d7286a67ce0ac4db9cb8fa079a48f423245000 (diff) |
dht: improve dht_fix_layout_of_directory for better re-assignment
Jeff Darcy wrote:
> AFAICT, the fix-layout code doesn't do the same rotation that the
> new-directory code does. Therefore, the new bricks always claim
> completely predictable hash ranges for every directory, leading to
> either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a
> file whose hash falls into the second quarter of the range will always
> be assigned to brick 2, and a file whose hash falls into the fourth
> quarter will always be assigned to brick 3. The rest will be split
> according to the original pattern. Put still another way, instead of
> same-named files in different directories being spread across N bricks,
> they might be spread across only two bricks (bad) or totally
> concentrated on one brick (worse) regardless of N.
The current dht_fix_layout_of_directory() code, in an attempt to
maximize overlap of the new layout with the existing layout (to minimize
movement of data), fails to randomize the new assignment even where it
could. In an example where we expand from 2 nodes to 4 nodes, the
possible outcomes are limited in the following way -
(theoretical hash range: 00 - 99)
OLD 1
-----
server1: 00 - 49
server2: 50 - 99
NEW 1
-----
server1: 00 - 24
server2: 50 - 74
server3: 25 - 49
server4: 75 - 99
OLD 2
-----
server1: 50 - 99
server2: 00 - 49
NEW 2
-----
server1: 50 - 74
server2: 00 - 24
server3: 25 - 49
server4: 75 - 99
The above shows that when add-brick expands a volume from 2 bricks to
4 bricks, server3 and server4 always get the _same_ hash range no matter
what the original hash range assignment was.
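The two tables above can be reproduced with a small toy model. This is only a sketch, not the GlusterFS code: the function names (`chunks`, `naive_fix_layout`) and the rule "keep the first new chunk inside the old range, give the leftovers to the new servers in order" are assumptions chosen to match the example, using the same theoretical hash range 00 - 99.

```python
# Toy model only: hypothetical names, not the GlusterFS implementation.

def chunks(n, lo=0, hi=99):
    """Split the inclusive range lo..hi into n equal inclusive chunks."""
    size = (hi - lo + 1) // n
    return [(lo + i * size, lo + (i + 1) * size - 1) for i in range(n)]


def naive_fix_layout(old, servers):
    """Overlap-only reassignment with no rotation.

    Each existing server keeps the first new chunk that lies inside its
    old range; the leftover chunks go to the new servers in sorted order.
    """
    free = chunks(len(servers))
    new = {}
    for name in servers:
        if name in old:
            lo, hi = old[name]
            keep = next(c for c in free if lo <= c[0] and c[1] <= hi)
            free.remove(keep)
            new[name] = keep
    for name in servers:
        if name not in old:
            new[name] = free.pop(0)
    return new


servers = ["server1", "server2", "server3", "server4"]
old1 = {"server1": (0, 49), "server2": (50, 99)}
old2 = {"server1": (50, 99), "server2": (0, 49)}

new1 = naive_fix_layout(old1, servers)
new2 = naive_fix_layout(old2, servers)
print(new1)
print(new2)
```

In this model server3 and server4 receive identical ranges for both old layouts, which is the predictability described above.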
The fix in this patch is to first do the standard new-directory
assignment for a directory (with rotation etc.) and then do the
reassignment to maximize overlap. This way newly added servers still get
random ranges, and each existing server has a chance of getting either of
the quarters that were previously part of its half. The same principle
holds for any add-brick from M to M+N.
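The two-step idea (rotate first, then reassign for overlap) can be sketched as follows. This is a toy model under stated assumptions, not the patch itself: `overlap`, `rotated_layout` and `fix_layout` are hypothetical names, `start` stands in for whatever rotation the new-directory code applies, and the pairwise swap is one simple way to increase overlap that may differ from what the patch does.

```python
# Toy model only: hypothetical names, not the GlusterFS implementation.

def overlap(a, b):
    """Count hash values shared by two inclusive ranges (None = no range)."""
    if a is None or b is None:
        return 0
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def rotated_layout(servers, start):
    """New-directory style assignment: equal chunks, rotated by `start`."""
    n = len(servers)
    size = 100 // n
    layout = {}
    for i, name in enumerate(servers):
        slot = (i + start) % n
        layout[name] = (slot * size, (slot + 1) * size - 1)
    return layout


def fix_layout(old, servers, start):
    """Rotate first, then swap pairs whenever that raises total overlap."""
    new = rotated_layout(servers, start)
    for i, a in enumerate(servers):
        for b in servers[i + 1:]:
            current = overlap(old.get(a), new[a]) + overlap(old.get(b), new[b])
            swapped = overlap(old.get(a), new[b]) + overlap(old.get(b), new[a])
            if swapped > current:
                new[a], new[b] = new[b], new[a]
    return new


servers = ["server1", "server2", "server3", "server4"]
old1 = {"server1": (0, 49), "server2": (50, 99)}

for start in range(len(servers)):
    print(start, fix_layout(old1, servers, start))
```

In this model each existing server still ends up on a quarter inside its old half for every value of `start`, while the ranges handed to server3 and server4 now vary with the rotation.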
Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f
BUG: 853258
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/3883
Tested-by: Gluster Build System <jenkins@build.gluster.com>