<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/geo-replication/syncdaemon/resource.py, branch v8.1</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>[geo-rep] Merging multiple import into single import</title>
<updated>2020-04-08T11:22:56+00:00</updated>
<author>
<name>kshithijiyer</name>
<email>kshithij.ki@gmail.com</email>
</author>
<published>2020-03-27T05:43:29+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=4398f68dada29418d2ec68ec538ec1378108221a'/>
<id>4398f68dada29418d2ec68ec538ec1378108221a</id>
<content type='text'>
Geo-replication has a large number of repeated imports as shown below:
```
from syncdutils import set_term_handler, finalize, lf
from syncdutils import log_raise_exception, FreeObject, escape
```

These imports can be clubbed together as shown below:
```
from syncdutils import (set_term_handler, finalize, lf,
                        log_raise_exception, FreeObject, escape)
```

Fixes: #1105
Change-Id: I59a48dd57a70fc851d93150b85e736ce41e8b793
Signed-off-by: kshithijiyer &lt;kshithij.ki@gmail.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix for "Transport End Point not connected" issue</title>
<updated>2020-01-31T10:13:14+00:00</updated>
<author>
<name>Harpreet Kaur</name>
<email>hlalwani@redhat.com</email>
</author>
<published>2019-01-07T11:08:25+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=06eb3d0b8488d12a92ea21b92727497c585bae4b'/>
<id>06eb3d0b8488d12a92ea21b92727497c585bae4b</id>
<content type='text'>
problem: The geo-rep gsyncd process mounts the master and slave
         volumes on the master and slave nodes respectively and
         starts the sync. But it doesn't wait for the mount to be
         ready to accept I/O. A gluster mount is considered ready
         only when all the distribute subvolumes are up. If they are
         not all up, a lookup can fail with ENOTCONN when the file
         is on a subvolume that is down.

solution: Added a virtual xattr "dht.subvol.status" which returns
          "1" if all subvols are up and "0" otherwise. After a fresh
          mount, geo-rep uses this virtual xattr to check whether
          all subvols are up before starting I/O.
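
A minimal sketch of how a client could poll this virtual xattr after
a fresh mount; the retry policy and function name are illustrative,
not the exact gsyncd logic:
```
import errno
import os
import time

def wait_for_subvols_up(mount_point, retries=10, interval=1.0):
    """Poll the virtual xattr until all DHT subvolumes report up."""
    for _ in range(retries):
        try:
            # "dht.subvol.status" is the virtual xattr added by this
            # patch; it returns "1" once every subvolume is up.
            if os.getxattr(mount_point, "dht.subvol.status").strip() == b"1":
                return True
        except OSError as e:
            # The mount may briefly return ENOTCONN while subvols connect.
            if e.errno != errno.ENOTCONN:
                raise
        time.sleep(interval)
    return False
```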

fixes: bz#1664335
Change-Id: If3ad01d728b1372da7c08ccbe75a45bdc1ab2a91
Signed-off-by: Harpreet Kaur &lt;hlalwani@redhat.com&gt;
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>georep: Merge Worker and Agent as a single process</title>
<updated>2019-11-07T06:24:39+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2019-10-23T04:40:12+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=0fc68040b72fc94dec3874345547e294b9ec1f45'/>
<id>0fc68040b72fc94dec3874345547e294b9ec1f45</id>
<content type='text'>
- libgfchangelog is simplified by removing the unnecessary API class
- Merged the Agent logic into the Worker instead of running Worker and
  Agent as two separate processes and maintaining RPC between them
  (see the sketch below).
- The Geo-rep Pause and Resume commands continue to work unchanged,
  but the Agent functionality now gets paused along with the Worker.
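
An illustrative sketch of the new process model; the class and method
names here are hypothetical, not the actual gsyncd API:
```
class Worker:
    """After this change the changelog-consumer (Agent) logic runs
    in-process instead of in a separate Agent reached over RPC."""

    def __init__(self, changelog_api):
        # libgfchangelog is now invoked directly, with no Agent RPC
        # layer in between (hypothetical wrapper object).
        self.changelog = changelog_api

    def crawl(self):
        # Previously something like: self.agent_rpc.call("scan")
        for change in self.changelog.scan():  # direct in-process call
            self.process(change)

    def process(self, change):
        pass  # sync logic elided
```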

Updates: #755
Change-Id: Ie2c00fa7dddf21f180f0649e0aaf084d29023c98
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix spelling errors</title>
<updated>2019-09-24T11:42:11+00:00</updated>
<author>
<name>Shwetha K Acharya</name>
<email>sacharya@redhat.com</email>
</author>
<published>2019-08-16T06:07:40+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=e15ca15d7cd39d913d9276f93d7ffd2dd7b85611'/>
<id>e15ca15d7cd39d913d9276f93d7ffd2dd7b85611</id>
<content type='text'>
Fixes: bz#1741779
Change-Id: I708b6b7e6c520dee10445528e6f99ba38e141c25
Signed-off-by: Shwetha K Acharya &lt;sacharya@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: performance improvement while syncing renames with existing gfid</title>
<updated>2019-09-23T12:05:56+00:00</updated>
<author>
<name>Sunny Kumar</name>
<email>sunkumar@redhat.com</email>
</author>
<published>2019-09-20T04:09:12+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=30d3608c43be119f75da7798d88b94601dedcb29'/>
<id>30d3608c43be119f75da7798d88b94601dedcb29</id>
<content type='text'>
Problem:
The bug [1] addressed a data-inconsistency issue when handling RENAME
with an existing destination. That fix needs some performance tuning,
since the issue occurs in heavy rename workloads.

Solution:
If the distribution count of the master volume is one, do not verify
the ops on the master and go ahead with the rename.

The performance improvement from this patch can only be observed when
the master volume has a distribution count of one.

[1]. https://bugzilla.redhat.com/show_bug.cgi?id=1694820
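
A hedged sketch of the short-circuit; the helpers and entry structure
are illustrative, not the actual resource.py code:
```
import os

def verify_on_master(entry):
    """Placeholder for the master-side gfid verification (illustrative)."""
    return True

def sync_rename(entry, dist_count):
    if dist_count == 1:
        # A single distribute subvolume cannot hold a stale duplicate
        # on another subvol, so skip the verification round-trip to
        # the master and rename directly on the slave.
        os.rename(entry["src"], entry["dst"])
    elif verify_on_master(entry):
        # Multiple subvols: verify the op on the master first (see [1]).
        os.rename(entry["src"], entry["dst"])
```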

fixes: bz#1753857
Change-Id: I8e9bcd575e7e35f40f9f78b7961c92dee642f47b
Signed-off-by: Sunny Kumar &lt;sunkumar@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix sync hang with tarssh</title>
<updated>2019-05-13T11:06:41+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2019-05-08T05:56:06+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=f0d3690e8916cfb5e10a0df2e9721a0fd079bfce'/>
<id>f0d3690e8916cfb5e10a0df2e9721a0fd079bfce</id>
<content type='text'>
Problem:
Geo-rep sync hangs when tarssh is used as the sync
engine under heavy workload.

Analysis and Root cause:
It was found that the tar process had hung.
Debugging further showed that the stderr buffer of the
tar process on the master was full, i.e. at 64k.
Once the buffer was drained to a file from /proc/pid/fd/2,
the hang resolved.

This can happen when files picked up by the tar process
for syncing no longer exist on the master. Once around 1k
such files accumulate, the stderr buffer fills up.

Fix:
The tar process is executed using Popen with stderr as PIPE.
The final pipeline looks something like this:

tar | ssh &lt;args&gt; root@slave tar --overwrite -xf - -C &lt;path&gt;

The code waited on the ssh process first using communicate() and only
then on tar. Note that communicate() reads stdout and stderr. So once
the stderr of the tar process filled up, nothing read it until the
untar via ssh completed; but ssh could not complete while tar was
blocked, hence the deadlock. The fix is to wait on both processes in
parallel, so that stderr is read from both.
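
A minimal sketch of the fix, assuming illustrative paths and arguments
rather than the exact gsyncd command line: drain the stderr of both
processes concurrently so a full 64k pipe buffer can no longer stall
the pipeline.
```
import subprocess
import threading

def drain(pipe, chunks):
    # Read until EOF so this process's 64k stderr buffer never fills.
    for line in iter(pipe.readline, b""):
        chunks.append(line)

tar = subprocess.Popen(["tar", "-cf", "-", "dir/"],
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
ssh = subprocess.Popen(["ssh", "root@slave",
                        "tar", "--overwrite", "-xf", "-", "-C", "/path"],
                       stdin=tar.stdout, stderr=subprocess.PIPE)
tar.stdout.close()  # let tar receive SIGPIPE if ssh exits early

tar_err, ssh_err = [], []
threads = [threading.Thread(target=drain, args=(tar.stderr, tar_err)),
           threading.Thread(target=drain, args=(ssh.stderr, ssh_err))]
for t in threads:
    t.start()
for t in threads:
    t.join()
tar.wait()
ssh.wait()
```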

Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
fixes: bz#1707728
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix sync-method config</title>
<updated>2019-05-09T05:18:05+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2019-05-08T05:26:31+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=84b7cc57df065e2d8c0ac88b179aab3614ec814a'/>
<id>84b7cc57df065e2d8c0ac88b179aab3614ec814a</id>
<content type='text'>
Problem:
When 'use_tarssh' is set to true, the command exits with a success
message, but the default 'rsync' is still used as the sync engine.
The new config 'sync-method' cannot be set from the CLI.

Analysis and Fix:
The 'use_tarssh' config was deprecated with the new
config framework, and 'sync-method' is the new config
for choosing the sync method, i.e. tarssh or rsync.
This patch fixes the 'sync-method' config. The allowed
values are tarssh and rsync.
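
A trivial sketch of the allowed-values check; the function name is
illustrative, while the config name and its two values come from this
patch:
```
ALLOWED_SYNC_METHODS = ("rsync", "tarssh")

def validate_sync_method(value):
    # 'sync-method' replaces the deprecated 'use_tarssh' boolean.
    if value not in ALLOWED_SYNC_METHODS:
        raise ValueError("sync-method must be one of %r, got %r"
                         % (ALLOWED_SYNC_METHODS, value))
    return value
```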

Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
fixes: bz#1707686
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix rename with existing destination with same gfid</title>
<updated>2019-04-26T07:15:00+00:00</updated>
<author>
<name>Sunny Kumar</name>
<email>sunkumar@redhat.com</email>
</author>
<published>2019-04-02T07:08:09+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=e7e89c9ec8b56ad5a442ad105c0b05e674a591cd'/>
<id>e7e89c9ec8b56ad5a442ad105c0b05e674a591cd</id>
<content type='text'>
Problem:
   Geo-rep fails to sync a rename properly if the destination exists.
The source file remains on the slave, leaving the slave with more
files than the master. Heavy rename workloads such as logrotate also
caused a lot of ESTALE errors.

Cause:
   Geo-rep fails to sync the rename when the destination exists and
the creation of the source file falls into the same batch of
changelogs being processed. After fixing the problematic gfids by
verifying against the master, the original entries were re-processed,
so the CREATE was replayed as well, leaving extra files on the slave
and causing the rename to fail.

Solution:
   Remove entries from the retrial list after fixing the problematic
gfids on the slave, so that the file is not re-created on the slave.
   Also treat ESTALE like EEXIST so that the error is handled properly
by verifying the op on the master volume.
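
A hedged sketch of the two changes; the entry structure and helper
below are illustrative, not the actual resource.py code:
```
import errno

def fix_problematic_gfid(entry):
    """Placeholder for fixing a problematic gfid via the master
    (illustrative)."""

def on_entry_failure(entry, err, retrial_list):
    # Treat ESTALE like EEXIST so the failure is handled by verifying
    # the op against the master instead of surfacing as an error.
    if err in (errno.EEXIST, errno.ESTALE):
        fix_problematic_gfid(entry)
        # Remove the entry from the retrial set once fixed, so the
        # original CREATE is not replayed and re-created on the slave.
        retrial_list.discard(entry)
```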

Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
fixes: bz#1694820
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Signed-off-by: Sunny Kumar &lt;sunkumar@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix syncing multiple rename of symlink</title>
<updated>2019-03-29T07:23:41+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2019-03-28T11:17:16+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=877af725b3e35b548d6d7aeec5adb21721d8bf8b'/>
<id>877af725b3e35b548d6d7aeec5adb21721d8bf8b</id>
<content type='text'>
Problem:
Geo-rep fails to sync the rename of a symlink if it is
renamed multiple times and the creation and renames
happened in quick succession.

Worker crash at slave:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py",  in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", in entry_ops
    [ESTALE, EINVAL, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory

Geo-rep Behaviour:
1. SYMLINK doesn't record target path in changelog.
   So while syncing SYMLINK, readlink is done on
   master to get target path.

2. Geo-rep will create destination if source is not
   present while syncing RENAME. Hence while syncing
   RENAME of SYMLINK, target path is collected from
   destination.

Cause:
If a symlink is created and renamed multiple times, the creation of
the symlink is ignored, as it is no longer present on the master at
that path. When the symlink has been renamed multiple times on the
master, at the time the first RENAME is synced neither source nor
destination is present, hence the target path is not known. In this
case, while creating the destination directly on the slave, regular
file attributes were encoded into the blob instead of symlink
attributes, causing a failure in the gfid-access translator while
decoding the blob.

Solution:
While syncing a RENAME of a SYMLINK, when the target is not known
and neither source nor destination is present on the master,
don't create the destination; ignore the rename. It is safe to
ignore: if the symlink was unlinked, nothing is lost, and if it was
renamed to something else, it will be synced then.
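
A hedged sketch of the decision; the helper names are illustrative,
not the actual entry_ops code:
```
def create_symlink_on_slave(path, target):
    """Placeholder for slave-side symlink creation via gfid-access
    (illustrative)."""

def sync_symlink_rename(src, dst, readlink_on_master):
    # After multiple renames, neither path may exist on the master.
    target = readlink_on_master(dst) or readlink_on_master(src)
    if target is None:
        # Target unknowable: skip the rename. If the symlink was
        # unlinked, nothing is lost; if it was renamed again, a later
        # changelog entry carries the new path and syncs it then.
        return
    create_symlink_on_slave(dst, target)
```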

Change-Id: Ibdfa495513b7c05b5370ab0b89c69a6802338d87
fixes: bz#1693648
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>geo-rep: Fix rename sync on hybrid crawl</title>
<updated>2019-01-17T10:13:52+00:00</updated>
<author>
<name>Sunny Kumar</name>
<email>sunkumar@redhat.com</email>
</author>
<published>2019-01-14T06:18:55+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=11cf73bc4173c13a9de54ea8d816eb72d8b01f48'/>
<id>11cf73bc4173c13a9de54ea8d816eb72d8b01f48</id>
<content type='text'>
Problem: When geo-rep is configured in hybrid crawl mode,
         directory renames are not synced to the slave.

Solution: Directory rename sync was failing due to an incorrect
          destination path calculation: during the existence check
          on the slave, the realpath was miscalculated as
          &lt;host:brickpath/dir&gt;.
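
A small sketch of the kind of path calculation the fix implies; the
names are illustrative, not the actual resource.py change:
```
import os

def slave_entry_path(brick_root, entry_path):
    # The existence check on the slave must use the path relative to
    # the brick root, not the raw "host:brickpath/dir" form, otherwise
    # realpath resolves to the wrong location on the slave mount.
    return os.path.relpath(entry_path, brick_root)
```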

Change-Id: I23f1ea60e86a917598fe869d5d24f8da654d8a0a
fixes: bz#1665826
Signed-off-by: Sunny Kumar &lt;sunkumar@redhat.com&gt;
</content>
</entry>
</feed>
