diff options
author | Aravinda VK <avishwan@redhat.com> | 2015-04-12 17:46:45 +0530 |
---|---|---|
committer | Vijay Bellur <vbellur@redhat.com> | 2015-05-05 06:45:01 -0700 |
commit | 08107796c89f5f201b24d689ab6757237c743c0d (patch) | |
tree | 4fe4af7979e6734e9b2246fd4b2582570e8592f0 /geo-replication/syncdaemon/syncdutils.py | |
parent | cfa6c85334fd62175aa114d779873b6790d6db8a (diff) |
geo-rep: Minimize rm -rf race in Geo-rep
While doing RMDIR worker gets ENOTEMPTY because same directory will
have files from other bricks which are not deleted since that worker
is slow processing. So geo-rep does recursive_delete.
Recursive delete was done using shutil.rmtree. once started, it will
not check disk_gfid in between. So it ends up deleting the new files
created by other workers. Also if other worker creates files after one
worker gets list of files to be deleted, then first worker will again
get ENOTEMPTY again.
To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA.
And disk_gfid check added for original path for which recursive_delete is
called. This disk gfid check executed before every Unlink/Rmdir. If disk
gfid is not matching with GFID from Changelog, that means other worker
deleted the directory. Even if the subdir/file present, it belongs to
different parent. Exit without performing further deletes.
Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR
failed with ENOENT will not succeed unless parent directory is created
again.
Rsync errors handling was handling unlinked_gfids_list only for one
Changelog, but when processed in batch it fails to detect unlinked_gfids
and retries again. Finally skips the entire Changelogs in that batch.
Fixed this issue by moving self.unlinked_gfids reset logic before batch
start and after batch end.
Most of the Geo-rep races with rm -rf is eliminated with this patch,
but in some cases stale directories left in some bricks and in mount
point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log)
BUG: 1211037
Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10204
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Diffstat (limited to 'geo-replication/syncdaemon/syncdutils.py')
-rw-r--r-- | geo-replication/syncdaemon/syncdutils.py | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/geo-replication/syncdaemon/syncdutils.py b/geo-replication/syncdaemon/syncdutils.py index 2614c828104..b565ec66cb5 100644 --- a/geo-replication/syncdaemon/syncdutils.py +++ b/geo-replication/syncdaemon/syncdutils.py @@ -466,9 +466,8 @@ def selfkill(sig=SIGTERM): os.kill(os.getpid(), sig) -def errno_wrap(call, arg=[], errnos=[], retry_errnos=[ESTALE]): +def errno_wrap(call, arg=[], errnos=[], retry_errnos=[]): """ wrapper around calls resilient to errnos. - retry in case of ESTALE by default. """ nr_tries = 0 while True: @@ -483,7 +482,8 @@ def errno_wrap(call, arg=[], errnos=[], retry_errnos=[ESTALE]): nr_tries += 1 if nr_tries == GF_OP_RETRIES: # probably a screwed state, cannot do much... - logging.warn('reached maximum retries (%s)...' % repr(arg)) + logging.warn('reached maximum retries (%s)...%s' % + (repr(arg), ex)) return ex.errno time.sleep(0.250) # retry the call |