summaryrefslogtreecommitdiffstats
path: root/geo-replication/syncdaemon/syncdutils.py
Commit message (Collapse)AuthorAgeFilesLines
* eventsapi: Fix Python3 compatibility issuesAravinda VK2019-02-261-5/+5
| | | | | | | | | | | | - Fixed Relative import and non-package import related issues. - socketserver import issues fix - Renamed installed directory name to `gfevents` from `events`(To avoid any issues with other global libs) Fixes: bz#1649054 Change-Id: I3dc38bc92b23387a6dfbcc0ab8283178235bf756 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit cd68f7b88b9a2c9a4e4ff9fca61517384e54130a)
* georep: python2 to python3 compat - schedulerKotresh HR2018-11-081-1/+1
| | | | | | | | | | | | | | | | 1. scheduler - Popen 2. syncdutils - corner case on failure Backport of: > Patch: https://review.gluster.org/21505 > BUG: 1643932 > Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 33e96100e17e9a293db6d63d9d5449d6c2d69376) fixes: bz#1644514 Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 to python3 compat - syscallsKotresh HR2018-10-101-12/+0
| | | | | | | | | | | | | 1. ctypes/syscalls A) arguments is expected to be encoded B) Raw conversion of return value from bytearray into string 2. struct pack/unpack - Raw converstion of string to bytearray 3. basestring -> str Updates: #411 Change-Id: I80f939adcdec0ed0022c87c0b76d057ad5559e5a Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit fb6e8d0d0ca21b16d331fa69da9b9dadf6c5c35d)
* georep: python3 compatibility (Popen)Kotresh HR2018-10-051-4/+7
| | | | | | | | | | | | | | | The file objects for python3 by default is opened in binary mode where as in python2 it's opened as text by default. The geo-rep code parses the output of Popen assuming it as text, hence used the 'universal_newlines' flag which provides backward compatibility for the same. Change-Id: I371a03b6348af9666164cb2e8b93d47475431ad9 Updates: #411 Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 65aed1070cc2e44959cf3a0fbfde635de7e03103)
* georep: Fix python3 compatibility (os.pipe)Kotresh HR2018-10-051-0/+12
| | | | | | | | | | | | | | | 'os.pipe' returns pair of file descriptors which are non-inheritable by child processes. But geo-rep uses te inheritable nature of pipe fds to communicate between parent and child processes. Hence wrote a compatiable pipe routine which works well both with python2 and python3 with inheritable nature. Updates: #411 Change-Id: I869d7a52eeecdecf3851d44ed400e69b32a612d9 Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 173e89a6506bc8c727ce6d8e5ac84b59ad2e21de)
* georep: python2 and python3 compat - bytes-strKotresh HR2018-10-051-3/+5
| | | | | | | | | 1. Fix fdopen used for pid file 2. Fix sha256 checksum calculation Updates: #411 Change-Id: Ic173d104a73822c29aca260ba6de872cd8d23f86 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* geo-rep: Fix issues with gfid conflict handlingKotresh HR2018-07-201-0/+35
| | | | | | | | | | | | | | | | | | | | | | | 1. MKDIR/RMDIR is recorded on all bricks. So if one brick succeeds creating it, other bricks should ignore it. But this was not happening. The fix rename of directories in hybrid crawl, was trying to rename the directory to itself and in the process crashing with ENOENT if the directory is removed. 2. If file is created, deleted and a directory is created with same name, it was failing to sync. Again the issue is around the fix for rename of directories in hybrid crawl. Fixed the same. If the same case was done with hardlink present for the file, it was failing. This patch fixes that too. fixes: bz#1598884 Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix geo-rep for older versions of unshareKotresh HR2018-06-221-0/+17
| | | | | | | | | | | | | | Geo-rep mounts are private to worker. It uses mount namespace using unshare command to achieve the same. Well, the unshare command has to support '--propagation' option. So geo-rep breaks on the systems with older unshare version. The patch makes it fall back to lazy umount behaviour if the unshare does not support propagation option. fixes: bz#1589782 Change-Id: Ia614f068aede288d63ac62fea4461b1865066054 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3Kaleb S. KEITHLEY2018-06-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, and https://review.gluster.org/#/c/19952/ This patch adds version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: Fedora packaging guidelines require explicit shebangs, so popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, idioms, numliterals, set_literal, types, urllib, and zip have already been applied. Note: these 2to3 fixes report no changes are necessary: exec, execfile, exitfunc, filter, getcwdu, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: I8d393064a1837874d8b4bc87c8ce05c679664642 updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Remove lazy umount and use mount namespacesKotresh HR2018-02-221-6/+8
| | | | | | | | | | | | | Lazy umounting the master volume by worker causes issues with rsync's usage of getcwd. Henc removing the lazy umount and using private mount namespace for the same. On the slave, the lazy umount is retained as we can't use private namespace in non root geo-rep setup. Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a BUG: 1546129 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Support for using Volinfo from Conf fileAravinda VK2018-01-231-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | Once Geo-replication is started, it runs Gluster commands to get Volume info from Master and Slave. With this patch, Georep can get Volume info from Conf file if `--use-gconf-volinfo` argument is specified to monitor Create a config(Or add to the config if exists) with following fields [vars] master-bricks=NODEID:HOSTNAME:PATH,.. slave-bricks=NODEID:HOSTNAME,.. master-volume-id= slave-volume-id= master-replica-count= master-disperse_count= Note: Exising Geo-replication is not affected since this is activated only when `--use-gconf-volinfo` is passed while spawning `gsyncd monitor` Tiering support is not yet added since Tiering + Glusterd2 is still under discussion. Fixes: #396 Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Log message improvementsAravinda VK2017-12-281-1/+1
| | | | | | BUG: 1529480 Change-Id: If4775ed9886990c0e1bcf4e44c7dfef95cc4f0c3 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* fips/geo-rep: Replace MD5 with SHA256Kotresh HR2017-12-221-6/+14
| | | | | | | | | | | | | | | | | | MD5 is not fips compliant. Hence replacing with SHA256. NOTE: The hash is used to form the ctl_path for the ssh connection. The length of ctl_path for ssh connection should not be > 108. ssh fails with ctl_path too long if it is so. But when rsync is piped to ssh, it is not taking > 90. rsync is failing with error number 12. Hence using first 32 bytes of hash. Hash collision doesn't matter as only one sock file is created per directory. Change-Id: I58aeb32a80b5422f6ac0188cf33fbecccbf08ae7 Updates: #230 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Refactoring Config and Arguments parsingAravinda VK2017-11-151-71/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fixed Python pep8 issues - Removed dead code - Rewritten configuration management - Rewritten Arguments/subcommands handling - Added Args upgrade to accommodate all these changes without changing glusterd code - use of md5 removed, which was used to hash the brick path for workdir Both Master and Slave nodes will have subdir for session in the format "<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication/<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication-slaves/<mastervol>_<primary_slave_host>_<slavevol> Log file paths renamed since session info is available with directory name itself. $LOG_DIR_MASTER/ - gsyncd.log - Gsyncd, Worker monitor logs - mnt-<brick-path>.log - Aux mount logs, mounted by each worker - changes-<brick-path>.log - Changelog related logs(One per brick) $LOG_DIR_SLAVE/ - gsyncd.log - Slave Gsyncd logs - mnt-<master-node>-<master-brick-path>.log - Aux mount logs, mounted for each connection from master-node:master-brick - mnt-mbr-<master-node>-<master-brick-path>.log - Same as above, but mountbroker setup Fixes: #73 Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix rename of directory in hybrid crawlKotresh HR2017-11-101-1/+237
| | | | | | | | | | | | | In hybrid crawl, renames and unlink can't be synced but directory renames can be detected. While syncing the directory on slave, if the gfid already exists, it should be rename. Hence if directory gfid already exists, rename it. Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6 BUG: 1499566 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix syncing of hardlink of symlinkKotresh HR2017-08-241-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If there is a hardlink to a symlink on master and if the symlink file is deleted on master, geo-rep fails to sync the hardlink. Typical Usecase: It's easily hit with rsnapshot use case where it uses hardlinks. Example Reproducer: Setup geo-replication between master and slave volume and in master mount point, do the following. 1. mkdir /tmp/symlinkbug 2. ln -f -s /does/not/exist /tmp/symlinkbug/a_symlink 3. rsync -a /tmp/symlinkbug ./ 4. cp -al symlinkbug symlinkbug.0 5. ln -f -s /does/not/exist2 /tmp/symlinkbug/a_symlink 6. rsync -a /tmp/symlinkbug ./ 7. cp -al symlinkbug symlinkbug.1 Cause: If the source was not present while syncing hardlink, it was always packing the blob as regular file. Fix: If the source was not present while syncing hardlink, pack the blob based on the mode. Change-Id: Iaa12d6f99de47b18e0650e7c4eb455f23f8390f2 BUG: 1432046 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reported-by: Christian Lohmaier <lohmaier+rhbz@gmail.com> Reviewed-on: https://review.gluster.org/18011 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix changelog encoding to encode only space and newlineAravinda VK2017-07-211-0/+15
| | | | | | | | | | | | | | | | | | libgfchangelog was encoding path using spec rfc3986, but encoding only required for SPACE and NEWLINE chars since the NEWLINE char is used as record separator and SPACE as field separator in the parsed changelogs output. Changed the encoding function to encode only SPACE and NEWLINE. BUG: 1451724 Change-Id: I1936efad31788a9e636f912c832ed7d7efea4fe2 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17787 Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Structured log supportAravinda VK2017-06-201-6/+21
| | | | | | | | | | | | | Changed all log messages to structured log format Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b Updates: #240 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17551 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Rsync tunables for performance improvementsAravinda VK2017-05-231-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag: --ignore-missing-args This Rsync flag reduces sync failures if the source file is unlinked but present in --files-from list. This reduces Rsync retries in Geo-rep and improves the performance Flag: --existing Rsync in Geo-rep never creates target files. Using RPC Geo-rep creates entry in Slave and rsync --inplace used to prevent creating temporary file and rename.(To avoid different GFID in Slave). If the entry is missing in Slave then Geo-rep Rsync gets Permission denied errors when it tries to create file with name as GFID inside .gfid dir.(Geo-rep rsync syncs data using GFIDS with aux-gfid-mount) To disable these flags, gluster volume geo-replication <session> config \ rsync-opt-ignore-missing-args false gluster volume geo-replication <session> config \ rsync-opt-existing false Thanks Kotresh for finding these awesome tunables. BUG: 1400924 Change-Id: I6a84fb86a589bf6edc8dfd1086456a84b05a64fc Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/16010 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix mount cleanupKotresh HR2017-04-271-1/+6
| | | | | | | | | | | | | | On corner cases, mount cleanup might cause worker crash. Fixing the same. Change-Id: I38c0af51d10673765cdb37bc5b17bb37efd043b8 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17015 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix EBUSY tracebackKotresh HR2017-04-101-1/+1
| | | | | | | | | | | | | | EBUSY was added to retry list of errno_wrap without importing. Fixing the same. Change-Id: Ide81a9ccc9b948a96265b6890da078b722b45d51 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17011 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Retry on EBUSYKotresh HR2017-04-071-2/+2
| | | | | | | | | | | | | | | Do not crash on EBUSY error. Add EBUSY retry errno list. Crash only if the error persists even after max retries. Change-Id: Ia067ccc6547731f28f2a315d400705e616cbf662 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16924 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Optionally allow access to mountsKotresh HR2017-03-191-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to improve debuggability, it is important to have access to geo-rep master and slave mounts. With the default behaviour, geo-rep lazy unmounts the mounts after changing the current working directory into the mount point. It also cleans up the mount points. So only geo-rep worker has the access and it becomes impossible to take the client profile info and do any other client statck analysis. Hence the following new config is being introduced to allow access to mounts. gluster vol geo-rep <mastervol> <slavehost>::<slavevol> \ config access_mount true The default value of 'access_mount' is false. Change-Id: I53dce4ea86a6ffc979c82f9330e8954327180ca3 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16912 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Use Host UUID to find local Gluster nodeAravinda VK2016-12-131-42/+20
| | | | | | | | | | | | | | | | | To spawn workers for each local brick, Geo-rep was collecting all the machine IPs based on hostname and finds based on the connectivity. With this patch, Geo-rep finds local brick if host UUID matches with UUID of the brick from Volume info. BUG: 1401801 Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/16035 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep/eventsapi: Additional EventsAravinda VK2016-10-181-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added following events EVENT_GEOREP_ACTIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_PASSIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_PASSIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_CHECKPOINT_COMPLETED { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH, "checkpoint_time": CHECKPOINT_TIME, "checkpoint_completion_time": CHECKPOINT_COMPLETION_TIME } } BUG: 1379330 Change-Id: I90716175868c59dd65c8d202e73e0ede90347b6a Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15630 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Kotresh HR <khiremat@redhat.com>
* eventsapi/geo-rep: Geo-rep will not work without eventsapi rpmsAravinda VK2016-09-281-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | If glusterfs-events rpm is not installed, Geo-replication will fail since it imports eventtypes. Any call to gsyncd will fail with Import error. Glusterd start fails since it runs `gsyncd.py --version` Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 29, in <module> from syncdutils import FreeObject, norm, grabpidfile, finalize File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 28, in <module> from events import eventtypes ImportError: No module named events BUG: 1378057 Change-Id: I1a9bc086c3d52449ec7296cb2f9ceb16cd41a8a4 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15539 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Use configured log_level for libgfchangelog logsAravinda VK2016-09-091-0/+17
| | | | | | | | | | | | | | | | libgfchangelog was not respecting the log_level configured in Geo-replication. With this patch Libgfchangelog log level can be configured using `config changelog_log_level TRACE`. Default Changelog log level is INFO BUG: 1363965 Change-Id: Ida714931129f6a1331b9d0815da77efcb2b898e3 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15078 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: add geo-rep events for server side changesSaravanakumar Arumugam2016-08-311-0/+9
| | | | | | | | | | | | | | | | | Event Type defined in #15351 to avoid merge conflicts Add geo-rep events applicable to changes in geo-rep session in the server side. Change-Id: Ia66574d2abccad7fce6a96667efbc7c6c8903fc6 BUG: 1370445 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/15328 Tested-by: Aravinda VK <avishwan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Post process Data and Meta ChangelogsAravinda VK2016-08-261-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch, Data and Meta GFIDs are post processed. If Changelog has UNLINK entry then remove from Data and Meta GFIDs list(If stat on GFID is ENOENT in Master). While processing Changelogs, - Collect all the data and meta operations in a temporary database - Delete all Data and Meta GFIDs which are already unlinked as per Changelogs (unlink only if stat on GFID is ENOENT) - Process all Entry operations as usual - Process data and meta operations in batch(Fetch from Db in batch) - Data sync is again batched based on number of changelogs(Default 1day changelogs). Once the sync is complete, Update last Changelog's time as last_synced time as usual. Additionally maintain entry_stime on Brick root, ignore Entry ops if changelog suffix time is less than entry_stime. If data stime is more than entry_stime, this can happen only when passive worker updates stime by itself by getting mount point stime. Use entry_stime = data_stime in this case. New configurations: max-rsync-retries - Default Value is 10 max-data-changelogs-in-batch - Max number of changelogs to be considered in a batch for syncing. Default value is 5760(4 changelogs per min * 60 min * 24 hours) max-history-changelogs-in-batch - Max number of history changelogs to be processed at once. Default value 86400(4 changelogs per min * 60 min * 24 hours * 15 days) BUG: 1364420 Change-Id: I7b665895bf4806035c2a8573d361257cbadbea17 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15110 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/libgfchangelog: Log failure in gf_histroy_changelogKotresh HR2016-08-091-0/+2
| | | | | | | | | | | | | | | | | Add error logs if gf_history_changelog fails. If requested changelog range is not available, log the error and exit instead of continuing the loop and exiting in readdir without logging. Also fixed the duplicate MSGID number in 'changelog-lib-messages.h' Change-Id: Icd71b89ae23b48a71380657ba5649029c32fabfd BUG: 1362151 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/15064 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Error message cleanupAravinda VK2016-06-151-10/+13
| | | | | | | | | | | | | | | | If ssh returns 127 that means the remote gsyncd path is wrong or push-pem failed during create. Existing error message was pointing old documentation. Change-Id: Ifbbb4a604fc0ae0fd5cb2746df6363bf28cde1e9 BUG: 1343943 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/14673 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Do not crash worker on ESTALEKotresh HR2015-08-031-9/+1
| | | | | | | | | | | | | | Handle ESTALE returned by lstat gracefully by retrying it. Do not crash the worker. Change-Id: I2527cd8bd1f7d2428cb4fa3f20782bebaf2df12a BUG: 1247529 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Minimize rm -rf race in Geo-repAravinda VK2015-05-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing RMDIR worker gets ENOTEMPTY because same directory will have files from other bricks which are not deleted since that worker is slow processing. So geo-rep does recursive_delete. Recursive delete was done using shutil.rmtree. once started, it will not check disk_gfid in between. So it ends up deleting the new files created by other workers. Also if other worker creates files after one worker gets list of files to be deleted, then first worker will again get ENOTEMPTY again. To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA. And disk_gfid check added for original path for which recursive_delete is called. This disk gfid check executed before every Unlink/Rmdir. If disk gfid is not matching with GFID from Changelog, that means other worker deleted the directory. Even if the subdir/file present, it belongs to different parent. Exit without performing further deletes. Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR failed with ENOENT will not succeed unless parent directory is created again. Rsync errors handling was handling unlinked_gfids_list only for one Changelog, but when processed in batch it fails to detect unlinked_gfids and retries again. Finally skips the entire Changelogs in that batch. Fixed this issue by moving self.unlinked_gfids reset logic before batch start and after batch end. Most of the Geo-rep races with rm -rf is eliminated with this patch, but in some cases stale directories left in some bricks and in mount point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log) BUG: 1211037 Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10204 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: log ENTRY failures from slave on masterMilind Changire2015-04-271-1/+1
| | | | | | | | | | | | | | | ENTRY operations failures on slave left no trace for debugging purposes. This patch captures such failures on slave cluster and forwards them to the master and logs them. Failures of specific interest are the ones which return code EEXIST on the failing operations. Change-Id: Iecab876f16593c746d53f4b7ec2e0783367856bb BUG: 1207115 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/10048 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Adhering to the common storage for geo-repKotresh HR2015-04-271-3/+0
| | | | | | | | | | | | | | | | | | | Making geo-rep use the common storage shared by nfs, snapshot and geo-rep. The meta volume should be named as gluster_shared_storage, and it should be mounted at "/var/run/gluster/shared_storage/". geo-rep will have create a directory called 'geo-rep' in the meta-volume and all the lock files are created inside it. Change-Id: I82d0bff9be191f75f643606a9a21d53559047ac4 BUG: 1210344 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10196 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* feature/geo-rep: Active Passive Switching logic flockKotresh HR2015-03-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CURRENT DESIGN AND ITS LIMITATIONS: ----------------------------------- Geo-replication syncs changes across geography using changelogs captured by changelog translator. Changelog translator sits on server side just above posix translator. Hence, in distributed replicated setup, both replica pairs collect changelogs w.r.t their bricks. Geo-replication syncs the changes using only one brick among the replica pair at a time, calling it as "ACTIVE" and other non syncing brick as "PASSIVE". Let's consider below example of distributed replicated setup where NODE-1 as b1 and its replicated brick b1r is in NODE-2 NODE-1 NODE-2 b1 b1r At the beginning, geo-replication chooses to sync changes from NODE-1:b1 and NODE-2:b1r will be "PASSIVE". The logic depends on virtual getxattr 'trusted.glusterfs.node-uuid' which always returns first up subvolume i.e., NODE-1. When NODE-1 goes down, the above xattr returns NODE-2 and that is made 'ACTIVE'. But when NODE-1 comes back again, the above xattr returns NODE-1 and it is made 'ACTIVE' again. So for a brief interval of time, if NODE-2 had not finished processing the changelog, both NODE-2 and NODE-1 will be ACTIVE causing rename race as mentioned in the bug. SOLUTION: --------- 1. Have a shared replicated storage, a glusterfs management volume specific to geo-replication. 2. Geo-rep creates a file per replica set on management volume. 3. fcntl lock on the above said file is used for synchronization between geo-rep workers belonging to same replica set. 4. If management volume is not configured, geo-replication will back to previous logic of using first up sub volume. Each worker tries to lock the file on shared storage, who ever wins will be ACTIVE. With this, we are able to solve the problem but there is an issue when the shared replicated storage goes down (when all replicas goes down). In that case, the lock state is lost. So AFR needs to rebuild the lock state after brick comes up. NOTE: ----- This patch brings in the, pre-requisite step of setting up management volume for geo-replication during creation. 1. Create mgmt-vol for geo-replicatoin and start it. Management volume should be part of master cluster and recommended to be three way replicated volume having each brick in different nodes for availability. 2. Create geo-rep session. 3. Configure mgmt-vol created with geo-replication session as follows. gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \ <meta-vol-name> 4. Start geo-rep session. Backward Compatiability: ----------------------- If management volume is not configured, it falls back to previous logic of using node-uuid virtual xattr. But it is not recommended. Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a BUG: 1196632 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9759 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Handle ENOENT during cleanupAravinda VK2015-03-051-1/+6
| | | | | | | | | | | | | | | | shutil.rmtree was failing to remove file if file was not exists. Added error handling function to ignore ENOENT if a file/dir not present. BUG: 1198101 Change-Id: I1796db2642f81d9e2b5e52c6be34b4ad6f1c9786 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9792 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
* geo-rep: Fixing the typo errorsarao2015-02-031-1/+1
| | | | | | | | | Change-Id: Iacc67e4ba9ac45e0858f3befe84ffb8fccf7e1c3 BUG: 1075417 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/9502 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* geo-rep: Pause and Resume feature for geo-replicationAravinda VK2014-05-091-0/+3
| | | | | | | | | | | | | | | | Changelog consumption/processing now happens in seperate process group than monitor. When monitor process group gets SIGSTOP all worker process, ssh, rsync will be paused except the changelog processing. When it gets SIGCONT it resumes its operation. Changelog agent runs as RepceServer, geo-rep worker communicates with changelog agent using RepceClient. Change-Id: I35c333e4d8b13d03a7808aed601960eef23cfa04 BUG: 1093602 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7322
* geo-rep: Loading libgfchangelog.so only while running geo-repAravinda VK2014-05-091-0/+4
| | | | | | | | | | | | | | | | | In source install, libgfchangelog is installed in /usr/local/lib When glusterd runs /usr/local/libexec/glusterfs/python/gsyncd --version it fails to find library without LD_LIBRARY_PATH. This patch avoids loading library when it is run from glusterd during start. BUG: 1096026 Change-Id: I59912227ac27ff4877d947a7c8f1fe2e8c5be06e Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7713 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Consume Changelog History APIAravinda VK2014-04-301-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | Every time when geo-rep restarts it first does FS crawl using XCrawl and then switches to Changelog Mode. This is because changelog only had live API, that is we can get changes only after registering. Now this(http://review.gluster.org/#/c/6930/) patch introduces History API for changelogs. If history is available then geo-rep will use it instead of FS Crawl. History API returns TS till what time history is available for given start and end time. If TS < endtime then switch to FS Crawl. (History => FS Crawl => Live Changelog) If TS >= endtime, then switch directly to Changelog mode (History => Live Changelog) Change-Id: I4922f62b9f899c40643bd35720e0c81c36b2f255 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/6938 Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: code pep8/flake8 fixesAravinda VK2014-04-071-31/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | pep8 is a style guide for python. http://legacy.python.org/dev/peps/pep-0008/ pep8 can be installed using, `pip install pep8` Usage: `pep8 <python file>`, For example, `pep8 master.py` will display all the coding standard errors. flake8 is used to identify unused imports and other issues in code. pip install flake8 cd $GLUSTER_REPO/geo-replication/ flake8 syncdaemon Updated license headers to each source file. Change-Id: I01c7d0a6091d21bfa48720e9fb5624b77fa3db4a Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7311 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd / geo-rep: geo-replication fixesAjeet Jha2013-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -> "threaded" hybrid crawl. -> Enabling metatadata synchronization. -> Handling EINVAL/ESTALE gracefully while syncing metadata. -> Improvments to changelog crawl code. -> Initial crawl changelog generation format. -> No gsyncd restart when checkpoint updated. -> Fix symlink handling in hybrid crawl. -> Slave's xtime key is 'stime'. -> tar+ssh as data synchronization. -> Instead of 'raise', just log in warning level for xtime missing cases. -> Fix for JSON object load failure -> Get new config value after config value reset. -> Skip already processed changelogs. -> Saving status of each individual worker thread. -> GFID fetch on slave for purges. -> Add tar ssh keys and config options. -> Fix nlink count when using backend. -> Include "data" operation for hardlink. -> Use changelog time prefix as slave's time. -> Process changelogs in parallel. Change-Id: I09fcbb2e2e418149a6d8435abd2ac6b2f015bb06 BUG: 1036539 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6404 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* geo-rep: retry in case of ENOENT errors in entry creationsAmar Tumballi2013-09-171-4/+11
| | | | | | | | | | Change-Id: I8961633a7371c941a3feee44c949d5c934eca998 Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 847839 Reviewed-on: http://review.gluster.org/5933 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* gsyncd / geo-rep: Introduce basic crawl instrumentationVenky Shankar2013-09-041-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends the persistent instrumentation work done by Aravinda (@avishwa), by introducing a handfull of instrumentation variables for crawl. These variables are "pulled up" by glusterd in the event of a geo-replication status cli command and looks something like below: "Uptime=00:21:10;FilesSyned=2982;FilesPending=0;BytesPending=0;DeletesPending=0;" "FilesPending", "BytesPending" and "DeletesPending" are short-lived variables that are non-zero when a changelog is being processes (ie. when an active sync in ongoing). After a successfull changelog process "FilesPending" is summed up into "FilesSynced". The three short-lived variabled are then reset to zero and the data is persisted Additionally this patch also reverts some of the changes made for BZ #986929 (those were not needed). Change-Id: I948f1a0884ca71bc5e5bcfdc017d16c8c54fc30b BUG: 990420 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5441 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* geo-replication: Use a md5 based unique control pathHarshavardhana2013-09-041-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A hostname fqdn can be of length 255 according to RFC1123 -------------------------> /usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX 255 <------------------------- On linux this length is 64 -------------------------> /usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64 <------------------------- When a given hostname is > 45 (characters) - SSH fails with --------------------------> "ControlPath too long for Unix domain socket". <-------------------------- Indicating that the total length of ControlPath which is on linux should be 108 -------------------------> /usr/include/linux/un.h:#define UNIX_PATH_MAX 108 <------------------------- This leads to "faulty" geo-replication status. This patch brings in a new file called manifest which carries given a geo-rep session some unique information - with which a unique `md5` is generated in a 32length digest, this ensures that we don't exceed UNIX_PATH_MAX limitations instead we use a conservative approach and still be able to provide a unique socket path. Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1 BUG: 990330 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5681 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Csaba Henk <csaba@redhat.com>
* gsyncd: distribute the crawling loadAvra Sengupta2013-07-261-1/+101
| | | | | | | | | | | | | | | | | | | * also consume changelog for change detection. * Status fixes * Use new libgfchangelog done API * process (and sync) one changelog at a time Change-Id: I24891615bb762e0741b1819ddfdef8802326cb16 BUG: 847839 Original Author: Csaba Henk <csaba@redhat.com> Original Author: Aravinda VK <avishwan@redhat.com> Original Author: Venky Shankar <vshankar@redhat.com> Original Author: Amar Tumballi <amarts@redhat.com> Original Author: Avra Sengupta <asengupt@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5131 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* move 'xlators/marker/utils/' to 'geo-replication/' directoryAvra Sengupta2013-07-221-0/+288
Change-Id: Ibd0faefecc15b6713eda28bc96794ae58aff45aa BUG: 847839 Original Author: Amar Tumballi <amarts@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5133 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>