diff options
author | Xavier Hernandez <xhernandez@datalab.es> | 2014-10-15 09:44:55 +0200 |
---|---|---|
committer | Vijay Bellur <vbellur@redhat.com> | 2014-10-27 07:18:51 -0700 |
commit | 408d4454870e7374bc9c4060c6c23d224d5174a2 (patch) | |
tree | 01a0e65e9e825dadb23e8a14c4838b2e93eb7519 /tests | |
parent | 8ea1a4844975940013d8704f87ee137dcb27bfb5 (diff) |
ec: Fix rebalance issues
Some issues in ec xlator made that rebalance didn't complete
successfully and generated some warnings and errors in the
log. The most critical error was a race condition that caused
false corruption detection when two specific operations were
executed sequentially and they shared the same lock.
This explains the problem:
1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: Upper xlator is notified that the operation completed.
5. setxattr: A background task is initiated to update the version
of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it's reused.
8. stat: A lookup is issued to determine version and size
information of the file.
At this point, operations 5 and 8 can interfere. This can make that
lookup sees different information on each brick, determining that
some bricks are corrupted and incorrectly excluding them from the
operation and initiating a self-heal. In some cases this false
detection combined with self-heal could lead to invalid updates of
the trusted.ec.size xattr, leaving the file smaller than it should
be.
This only happens if the first operation does not perform a lookup,
because chained operations reuse the information returned by the
previous one, avoiding this kind of problems.
To solve this, now the background update is executed atomically with
the posterior unlock. This avoids some reuses of the lock while
updating. However this reduces performance because the window in
which new requests can reuse the lock is much smaller now. This has
been alleviated by using the same technique implemented in AFR (i.e.
waiting some time before releasing the lock).
Some minor changes also introduced in this patch:
* Bug in management of 'trusted.glusterfs.pathinfo' that was writing
beyond the allocated space.
* Uninitialized variable.
* trusted.ec.config was not created for regular files created with
mknod.
* An invalid state was used in access fop.
This is a backport of http://review.gluster.org/8947/
Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152903
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8948
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Diffstat (limited to 'tests')
-rw-r--r-- | tests/basic/ec/ec-common | 5 | ||||
-rw-r--r-- | tests/basic/ec/ec.t | 4 | ||||
-rw-r--r-- | tests/basic/ec/self-heal.t | 4 |
3 files changed, 13 insertions, 0 deletions
diff --git a/tests/basic/ec/ec-common b/tests/basic/ec/ec-common index 65422d72095..19818355f04 100644 --- a/tests/basic/ec/ec-common +++ b/tests/basic/ec/ec-common @@ -81,6 +81,11 @@ for dir in . dir1; do EXPECT "$cs_big" echo $(sha1sum $dir/big | awk '{ print $1 }') +# Give enough time for current operations to complete. Otherwise the +# following kill_brick can cause data corruption and self-heal will be +# needed, but this script is not prepared to handle self-healing. + sleep 2 + for idx in `seq 0 $LAST_BRICK`; do TEST kill_brick $V0 $H0 $B0/$V0$idx diff --git a/tests/basic/ec/ec.t b/tests/basic/ec/ec.t index 91bad52499a..77c6460f11a 100644 --- a/tests/basic/ec/ec.t +++ b/tests/basic/ec/ec.t @@ -190,6 +190,8 @@ TEST touch $M0/setxattr TEST touch $M0/removexattr TEST setfattr -n user.bar -v "ash_nazg_gimbatul" $M0/removexattr +sleep 2 + # Kill a couple of bricks and allow some time for things to settle. TEST kill_brick $V0 $H0 $B0/${V0}3 TEST kill_brick $V0 $H0 $B0/${V0}8 @@ -216,6 +218,8 @@ TEST setfattr -x user.bar $M0/removexattr # Test uid/gid behavior TEST setup_perm_file $M0 +sleep 2 + # Unmount/remount so that create/write and truncate don't see cached data. TEST umount $M0 TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0 diff --git a/tests/basic/ec/self-heal.t b/tests/basic/ec/self-heal.t index 3df19184169..2eb2663f8ea 100644 --- a/tests/basic/ec/self-heal.t +++ b/tests/basic/ec/self-heal.t @@ -178,6 +178,8 @@ for idx1 in {0..4}; do done done +sleep 2 + TEST kill_brick $V0 $H0 $B0/${V0}0 TEST kill_brick $V0 $H0 $B0/${V0}1 TEST cp $tmp/test test2 @@ -196,6 +198,8 @@ TEST [ -f test4 ] EXPECT "2" stat -c "%h" test2 EXPECT "2" stat -c "%h" test4 +sleep 2 + TEST $CLI volume start $V0 force # Wait until the killed bricks have been started and recognized by the ec # xlator |