diff options
author | Pranith Kumar K <pkarampu@redhat.com> | 2015-04-16 09:25:31 +0530 |
---|---|---|
committer | Pranith Kumar Karampuri <pkarampu@redhat.com> | 2015-05-08 15:03:59 -0700 |
commit | fae1e70ff3309d2b64febaafc70abcaa2771ecf0 (patch) | |
tree | 53498b10f860b0d8d0a9500e5afae535965a7f22 /xlators/cluster/ec/src/ec-common.h | |
parent | e0401209cf58a638a32e7d867ab4c6199aa0e92f (diff) |
cluster/ec: metadata/name/entry heal implementation for ec
Metadata self-heal:
1) Take inode lock in domain 'this->name' on 0-0 range (full file)
2) perform lookup and get the xattrs on all the bricks
3) Choose the brick with highest version as source
4) Setattr uid/gid/permissions
5) removexattr stale xattrs
6) Setxattr existing/new xattrs
7) xattrop with -ve values of 'dirty' and difference of highest and its own
version values for version xattr
8) unlock lock acquired in 1)
Entry self-heal:
1) take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent
more than one self-heal
2) we take directory lock in domain 'this->name' on 'NULL'
3) Perform lookup on version, dirty and remember the values
4) unlock lock acquired in 2)
5) readdir on all the bricks and trigger name heals
6) xattrop with -ve values of 'dirty' and difference of highest and its own
version values for version xattr
7) unlock lock acquired in 1)
Name heal:
1) Take 'name' lock in 'this->name' on 'NULL'
2) Perform lookup on 'name' and get stat and xattr structures
3) Build gfid_db where for each gfid we know what subvolumes/bricks have
a file with 'name'
4) Delete all the stale files i.e. the file does not exist on more than
ec->redundancy number of bricks
5) On all the subvolumes/bricks with missing entry create 'name' with same
type,gfid,permissions etc.
6) Unlock lock acquired in 1)
Known limitation: At the moment with present design, it conservatively
preserves the 'name' in case it can not decide whether to delete it. this can
happen in the following scenario:
1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
2) rename d1/f1 -> d2/f2 is performed but the rename is successful only on one
of the bricks (Lets say B)
3) Now name self-heal on d1 and d2 would re-create the file on both d1 and d2
resulting in d1/f1 and d2/f2.
Because we wanted to prevent data loss in the case above, the following
scenario is not healable, i.e. it needs manual intervention:
1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
2) We have two hard links: d1/a, d2/b and another file d3/c even before the
brick went down
3) rename d3/c -> d2/b is performed
4) Now name self-heal on d2/b doesn't heal because d2/b with older gfid will
not be deleted. One could think why not delete the link if there is
more than 1 hardlink, but that leads to similar data loss issue I described
earlier:
Scenario:
1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
2) We have two hard links: d1/a, d2/b
3) rename d1/a -> d3/c, d2/b -> d4/d is performed and both the operations are
successful only on one of the bricks (Lets say B)
4) Now name self-heal on the 'names' above which can happen in parallel can
decide to delete the file thinking it has 2 links but after all the
self-heals do unlinks we are left with data loss.
Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be
BUG: 1216303
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10298
Reviewed-on: http://review.gluster.org/10691
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Diffstat (limited to 'xlators/cluster/ec/src/ec-common.h')
-rw-r--r-- | xlators/cluster/ec/src/ec-common.h | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/xlators/cluster/ec/src/ec-common.h b/xlators/cluster/ec/src/ec-common.h index aaae16e71c3..ba009040b71 100644 --- a/xlators/cluster/ec/src/ec-common.h +++ b/xlators/cluster/ec/src/ec-common.h @@ -15,6 +15,11 @@ #include "ec-data.h" +typedef enum { + EC_DATA_TXN, + EC_METADATA_TXN +} ec_txn_t; + #define EC_CONFIG_VERSION 0 #define EC_CONFIG_ALGORITHM 0 |