author | Xavi Hernandez <jahernan@redhat.com> | 2018-02-21 17:47:37 +0100
---|---|---
committer | Xavi Hernandez <xhernandez@redhat.com> | 2018-03-15 08:20:10 +0100
commit | 8fb21afdd033c5d466854400c6a7604fcf5241c3 |
tree | 0e1902bf50584f3f87c5865f043804c9c5979d8e /xlators/cluster/ec/src/ec-heald.c |
parent | 2628a91eaaf6a8584492b2d622c27b9d9b8b2e20 |
cluster/ec: avoid delays in self-heal
Self-heal creates one thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so their first sweep does nothing and they just
wait for the next one, which happens once per minute.
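The per-brick sweeper described above can be pictured as a thread that sleeps on a condition variable between sweeps. The following is a minimal pthread sketch, not GlusterFS code: `struct sketch_healer`, `sketch_healer_init()` and `sketch_wait_for_sweep()` are hypothetical names, and only the `rerun` flag mirrors a field (`healer->rerun`) that appears in this commit's diff. With a 60-second interval, a thread whose first sweep found nothing to do sits in this wait for a full minute unless something sets `rerun`.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-brick healer state; only `rerun` mirrors a field
 * that appears in the diff (healer->rerun). */
struct sketch_healer {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            rerun;   /* true = sweep again without waiting */
};

void sketch_healer_init(struct sketch_healer *h)
{
    pthread_mutex_init(&h->mutex, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->rerun = false;
}

/* Wait up to `interval_sec` seconds for the next sweep. Returns true if
 * woken early because `rerun` was set, false if the interval elapsed.
 * The caller must not already hold h->mutex. */
bool sketch_wait_for_sweep(struct sketch_healer *h, time_t interval_sec)
{
    struct timespec deadline;
    bool woken;

    assert(h != NULL);
    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline
     * unless the condvar was created with a different clock attribute. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += interval_sec;

    pthread_mutex_lock(&h->mutex);
    while (!h->rerun) {
        /* The loop absorbs spurious wakeups; any non-zero result
         * (normally ETIMEDOUT) ends the wait. */
        if (pthread_cond_timedwait(&h->cond, &h->mutex, &deadline) != 0)
            break;
    }
    woken = h->rerun;
    h->rerun = false;   /* consume the request */
    pthread_mutex_unlock(&h->mutex);
    return woken;
}
```

A sweeper loop would call `sketch_wait_for_sweep(h, 60)` before each sweep; checking `rerun` before blocking means a request made while the thread was busy is not lost.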
When a replace-brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes the root directory to be healed (because the new brick doesn't
have good data) and marks its contents as pending to be healed. Those
contents are only healed by the index sweeper thread on its next round,
one minute later.
This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.
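Waking the sweepers amounts to setting each thread's rerun request and signalling it. The sketch below is hedged in the same way: the healer array, `sketch_wake_all()` and the 64-bit `up_mask` are hypothetical stand-ins, while the bit test mirrors `((ec->xl_up >> i) & 1)` in `ec_shd_index_healer_wake()` from this commit's diff. It assumes the waiting side checks `rerun` before blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-brick healer state (not GlusterFS's struct). */
struct sketch_healer {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            rerun;   /* pending request for an immediate sweep */
};

void sketch_healer_init(struct sketch_healer *h)
{
    pthread_mutex_init(&h->mutex, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->rerun = false;
}

/* Set the flag under the lock, then signal: a thread already blocked in
 * pthread_cond_timedwait() wakes now, and one that has not started
 * waiting yet sees `rerun` and (by assumption) skips the wait. */
static void sketch_healer_kick(struct sketch_healer *h)
{
    pthread_mutex_lock(&h->mutex);
    h->rerun = true;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->mutex);
}

/* Wake the healer of every brick whose bit is set in `up_mask`, using the
 * same bit test as the diff. Returns the number of healers woken. */
int sketch_wake_all(struct sketch_healer *healers, int nodes, uint64_t up_mask)
{
    int woken = 0;

    assert(nodes >= 0 && nodes <= 64);   /* one bit per brick */
    for (int i = 0; i < nodes; i++) {
        if (((up_mask >> i) & 1) != 0) {
            sketch_healer_kick(&healers[i]);
            woken++;
        }
    }
    return woken;
}
```

For example, with four bricks and `up_mask = 0xB` (binary 1011), bricks 0, 1 and 3 are woken and brick 2, which is down, is left alone.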
Additionally, the index sweeper thread scans the index directory
sequentially, but healing a directory entry may create new index
entries that the current directory scan skips. These remaining entries
are then processed on the next round, one minute later. The same can
happen again in that round, so the heal runs in bursts and takes a long
time to finish, especially on volumes with many directory levels.
This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
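The restart-on-directory-heal behaviour can be illustrated with a toy model. All of it is hypothetical: the index is an array of pending entries, each sweep only visits entries that existed when it started (standing in for the sequential scan that skips newly created entries), and healing a directory at depth d adds exactly one pending entry at depth d + 1.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ENTRIES 64

/* Hypothetical index: an entry with depth < max_depth is a directory,
 * and healing it exposes one new pending entry one level deeper. */
struct sketch_index {
    int depth[SKETCH_MAX_ENTRIES];
    int count;      /* entries appended so far */
    int next;       /* first entry not yet healed */
    int max_depth;
};

/* One sequential sweep over the entries that existed when it began.
 * Returns true if it healed a directory (the `rerun` condition). */
static bool sketch_sweep(struct sketch_index *idx)
{
    int end = idx->count;   /* snapshot: entries added later are skipped */
    bool rerun = false;

    for (; idx->next < end; idx->next++) {
        int d = idx->depth[idx->next];
        if (d < idx->max_depth) {   /* healed a directory */
            assert(idx->count < SKETCH_MAX_ENTRIES);
            idx->depth[idx->count++] = d + 1;   /* its contents become pending */
            rerun = true;
        }
    }
    return rerun;
}

/* Sweep again immediately while the previous sweep healed a directory,
 * instead of waiting for the next periodic round. Returns the sweep count. */
int sketch_heal_until_clean(struct sketch_index *idx)
{
    int sweeps = 0;
    bool rerun;

    do {
        rerun = sketch_sweep(idx);
        sweeps++;
    } while (rerun);
    return sweeps;
}
```

For a chain of three directories above one file (`max_depth = 3`, starting from a single depth-0 entry), this runs four back-to-back sweeps. With a one-minute pause between rounds instead, the last of those sweeps would start about three minutes after the first.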
Backport of:
> BUG: 1547662
Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
Diffstat (limited to 'xlators/cluster/ec/src/ec-heald.c')
-rw-r--r-- | xlators/cluster/ec/src/ec-heald.c | 27 |
1 file changed, 21 insertions, 6 deletions
```diff
diff --git a/xlators/cluster/ec/src/ec-heald.c b/xlators/cluster/ec/src/ec-heald.c
index b4fa6f87189..a703379a59b 100644
--- a/xlators/cluster/ec/src/ec-heald.c
+++ b/xlators/cluster/ec/src/ec-heald.c
@@ -184,8 +184,19 @@ ec_shd_index_purge (xlator_t *subvol, inode_t *inode, char *name)
 int
 ec_shd_selfheal (struct subvol_healer *healer, int child, loc_t *loc)
 {
-        return syncop_getxattr (healer->this, loc, NULL, EC_XATTR_HEAL, NULL,
-                                NULL);
+        int32_t ret;
+
+        ret = syncop_getxattr (healer->this, loc, NULL, EC_XATTR_HEAL, NULL,
+                               NULL);
+        if ((ret >= 0) && (loc->inode->ia_type == IA_IFDIR)) {
+                /* If we have just healed a directory, it's possible that
+                 * other index entries have appeared to be healed. We put a
+                 * mark so that we can check it later and restart a scan
+                 * without delay. */
+                healer->rerun = _gf_true;
+        }
+
+        return ret;
 }
 
 
@@ -472,11 +483,15 @@ ec_shd_index_healer_spawn (xlator_t *this, int subvol)
 }
 
 void
-ec_selfheal_childup (ec_t *ec, int child)
+ec_shd_index_healer_wake(ec_t *ec)
 {
-        if (!ec->shd.iamshd)
-                return;
-        ec_shd_index_healer_spawn (ec->xl, child);
+        int32_t i;
+
+        for (i = 0; i < ec->nodes; i++) {
+                if (((ec->xl_up >> i) & 1) != 0) {
+                        ec_shd_index_healer_spawn(ec->xl, i);
+                }
+        }
 }
 
 int
```