summaryrefslogtreecommitdiffstats
path: root/geo-replication
Commit message (Collapse)AuthorAgeFilesLines
* geo-rep: Handling Rsync/Tar errors efficientlyAravinda VK2016-02-262-87/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | Geo-rep processes Changelogs in Batch, if one file in batch fails with rsync error that Changelog file is reprocessed multiple times. After MAX_RETRY, it logs all the GFIDs from that batch as Skipped. This patch addresses following issues, 1. When Rsync/Tar fails do not parse Changelog again for retry 2. When Rsync/Tar fails do not replay Entry operations, only retry rsync/tar for those GFIDs 3. Log Error in Rsync/Tar only in the last Retry 4. Do not log Skipped GFIDs since Rsync/Tar errors are logged for only failed files. 5. Changed Entry failures as Error instead of Warning BUG: 1287723 Change-Id: Ie134ce2572693056ab9b9008cd8aa5b5d87f7975 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/12856 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Handle ERANGE error during listxattrAravinda VK2016-02-261-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | llistxattr in Geo-rep is two syscall instead of one SIZE = llistxattr(PATH, &BUF, 0); BUF = create_buf(SIZE); _ = llistxattr(PATH, &BUF, SIZE); So if any new xattrs added just after first call by any other worker, second syscall will fail with ERANGE error. Now Geo-rep sends BUF with large size(256*100) and gets value with only one syscall. Raises OSError if fails with ERANGE error even after sending large BUF. Change-Id: I8ade4bbe9a0a8ea908ed9dedcd3f2ff4c6fe6f51 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1294588 Reviewed-on: http://review.gluster.org/13106 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Handle hardlink in Tiering based volumeSaravanakumar Arumugam2016-02-261-10/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Hardlinks are synced as Sticky bit files to Slave in a Tiering based volume. In a Tiering based volume, cold tier is hashed subvolume and geo-rep captures all namespace operations in cold tier. While syncing a file and its corresponding hardlink, it is recorded as MKNOD in cold tier(for both) and We end up creating two different files in Slave. Solution: If MKNOD with Sticky bit set is present, record it as LINK. This way it will create a HARDLINK if source file exists (on slave), else it will create a new file. This way, Slave can create Hardlink file itself (instead of creating a new file) in case of hardlink. Change-Id: Ic50dc6e64df9ed01799c30539a33daace0abe6d4 BUG: 1301032 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/13281 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* geo-rep: Fix gsyncd prefix in gsec_createAravinda VK2016-02-041-1/+2
| | | | | | | | | | | | | | | | | | | | gsec_create script is generated after running ./configure libexec dir was formed using $prefix/libexec, but in Debian based distributions libexec dir is not present, instead they use lib directory to store these scripts. With this patch, full libexec path is fetched during ./configure. BUG: 1162905 Change-Id: I9f47a38e6ab0027c7df6716136fbe0635e95a593 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/13298 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix getting subvol countKotresh HR2015-12-221-1/+4
| | | | | | | | | | | | | | | | Tiering doesn't support disperse volume as hot tier, hence xml output doesn't give 'hotdisperseCount'. Remove the usage of 'hotdisperseCount' in geo-rep and return 0 instead. Change-Id: I736e29257de085a25e38eb02959caad3465ebcda BUG: 1292084 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/13062 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vivek Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix getting subvol numberKotresh HR2015-12-212-17/+47
| | | | | | | | | | | | | | | | | | Fix getting subvol number if the volume type is tier. If the volume type was tier, the subvol number was calculated incorrectly and hence few of workers didn't become ACTIVE resulting in files not being replicated from corresponding brick. This patch addresses the same. Change-Id: Ic10ad7f09a0fa91b4bf2aa361dea3bd48be74853 BUG: 1292084 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12994 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Symlink Rename issueAravinda VK2015-12-172-3/+15
| | | | | | | | | | | | | | | | | | | | | If ENTRY creation failed for symlink in Slave and symlink renamed in Master. If Source not exists to Rename in Slave Geo-rep interprets as Create of Target file. Geo-rep sends blob of regular file to create symlink instead of sending blob of symlink. With this patch, Geo-rep identifies symlink and sends respective blob. BUG: 1289859 Change-Id: If9351974d1945141a1d3abb838b7d0de7591e48e Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/12917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Tested-by: Milind Changire <mchangir@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* Add support for sparse files to tarssh methodAlex Markelov2015-12-111-1/+1
| | | | | | | | | | | | | | | | Without '--sparse' option tar will not properly archive sparse file and geo-replication will result in non-sparse file on the remote end. Here is more on how I arrived at this http://markelov.org/wiki/index.php/GlusterFS_3.6.1_on_CentOS_6.5:_geo-replication_and_sparse_files_problem Change-Id: I8d671964a1b48bbb916e4a064571221bf3631494 BUG: 1276839 Signed-off-by: Alex Markelov <alex@markelov.org> Reviewed-on: http://review.gluster.org/12476 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: use cold tier bricks for namespace operationsSaravanakumar Arumugam2015-12-034-17/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: symlinks are not getting synced to slave in a Tiering based volume. Solution: Now, symlinks are created directly in cold tier bricks( in the backend). Earlier, cold tier was avoided for namespace operations and only hot tier was used while processing changelogs. Now, cold tier is HASH subvolume in a Tiering volume. So, carry out namespace operation only in cold tier subvolume and avoid hot tier subvolume to avoid any races. Earlier, XSYNC was used(and changeloghistory avoided) during initial sync in order to avoid race while processing historychangelog in Hot tier. This is no longer required as there is no race from Hot tier. Also, avoid both live and history changelog ENTRY operations from Hot tier to avoid any race with cold tier. Change-Id: Ia8fbb7ae037f5b6cb683f36c0df5c3fc2894636e BUG: 1287519 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/12844 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: geo-rep to handle CAPS based HostnameSaravanakumar Arumugam2015-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Problem: geo-replication session creation fails when Hostname is having CAPS in it. Issue is with the regex pattern which handles only small lettered Hostname. Fix: Fix the regex pattern to handle CAPS based hostname as well. Change-Id: I5c99c102e9706acc2b1fab1e6bf158e68beed373 BUG: 1265522 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/12216 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: fd close and fcntl issueAravinda VK2015-12-011-15/+19
| | | | | | | | | | | | | | | | | | | | When any of the open fd of a file is closed on which fcntl lock is taken even though another fd of the same file is open on which lock is taken, all fcntl locks will be released. This causes both replica workers to be ACTIVE sometimes. This patche fixes that issue. Change-Id: I1e203ab0e29442275338276deb56d09e5679329c BUG: 1285488 Original-Author: Aravinda VK <avishwan@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12752 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix syntax errors in GsyncdErrorAravinda VK2015-11-231-3/+2
| | | | | | | | | | | | %s was not replaced by actual values in GsyncdError BUG: 1279921 Change-Id: I3c0a10f07383ca72844a46f930b4aa3d3c29f568 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/12566 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix Geo-rep logging to log datetime in GMT/UTCAravinda VK2015-11-231-0/+1
| | | | | | | | | | | | | Geo-rep is logging in Local time, all other Gluster logs are in GMT/UTC. It is very difficult to co-relate Geo-rep logs with other Gluster logs. BUG: 1282331 Change-Id: Ieae8bda7e4788e587cf4595e21e0e772c210cfbb Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/12583 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix FD leak from Active Geo-rep workerAravinda VK2015-11-182-0/+16
| | | | | | | | | | | | | | | | | | Active worker tries to acquire lock in each iteration. On every successfull lock acqusition it was not closing previously opened lock fd. To see the leak, get the PID of worker, ps -ax | grep feedback-fd watch ls /proc/$pid/fd BUG: 1225566 Change-Id: Ic476c24c306e7ab372c5560fbb80ef39f4fb31af Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/12332 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* glusterd/geo-rep: Adding ssh-port option for geo-rep createKotresh HR2015-11-181-5/+5
| | | | | | | | | | | | | | | | | | | Geo-replication uses default ssh port 22 for setup. i.e., to distribute ssh keys to slaves. In container environments, custom port number might be used. Hence to support custom port number for ssh, option is provided in geo-rep create command to take the same. Change-Id: I0fb61959b1c085342b8e4c21ac4e076fba5462f1 BUG: 1276028 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12504 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: New Config option for ssh_portAravinda VK2015-11-172-3/+8
| | | | | | | | | | | | | | | | | | | | | | | If different port used for SSH instead of 22, Geo-replication was failing to establish SSH connection. ssh_port option can be added using config:ssh_command and config:ssh_command_tar, but user has to remember complete ssh command used with parameter to add/modify ssh port. This patch adds new config option for ssh_port, gluster volume geo-replication <MASTERVOL> <SLAVEHOST::<SLAVEVOL> \ config ssh_port 52022 Change-Id: I7753a09485f0b1f49d2b2a80b962c720817c96f4 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1276028 Reviewed-on: http://review.gluster.org/12444 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Make restrictive ssh keys optionalKotresh HR2015-11-091-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | In containerized environment where networking configuration is "net=host", both host and containers use the same IP. The validations gsyncd shell and rsync to be the siblings fails. Hence, for now, creating restrictive ssh keys is made optional as follows. If the argument 'container' is passed, it will create non restrictive ssh keys else restrictive ssh keys. e.g., gluster system:: execute gsec_create container Creates non restrictive ssh keys. gluster system:: execute gsec_create Creates restrictive ssh keys. Change-Id: Ibed362f64b9b4c9931207f863a2da944c6bd1d66 BUG: 1276028 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12459 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Kill Geo-rep Worker when Agent process diesAravinda VK2015-11-093-14/+44
| | | | | | | | | | | | | | | | | | When Changelog agent process dies, Geo-replication fails to detect and worker will run without respective Changelog agent. Status shows Active/Passive without any progress. With this patch, Worker process gets killed whenever Changelog agent dies. Change-Id: I30b4cc77f924f7e8174b8bfe415ac17f0b3851b4 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1277076 Reviewed-on: http://review.gluster.org/12485 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix syncing chown in xsync crawlKotresh HR2015-11-081-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GEO-REP INTEROP WITH SHARD FEATURE Problem: The sequence of entry creation and chown in master is recorded as creation of entry with resulted user:group in xsync changelog. During sync, entry creation is always split into two ops, MKNOD and SETATTR. Hence the issue is not being hit otherwise it would have failed with EPERM if parent is owned by different user. But with shard translator being enabled on slave, doing entry creation with MKNOD and SETATTR is not allowed, SETATTR fails as it accesses inode structure which is not linked. Solution: The sequence of entry creation and chown in master should be recorded as MKNOD and SETATTR separately always and do entry creation with single op in gfid-access xlator. The gfid-access patch will be sent separately. Change-Id: I93e554bf9342397a7660503f5128e9709f8a0cd8 BUG: 1265148 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12205 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Update last_synced_time in XSyncAravinda VK2015-11-041-14/+9
| | | | | | | | | | | | | During XSync crawl, last_synced time in status file was not updated. This patch fixes the issue by updating status file when stime xattr is updated after Xsync or Changelog Crawl. Change-Id: I4dc3a2d4c3d8378a939da0868caf1aef4f789599 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1247536 Reviewed-on: http://review.gluster.org/11771 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* geo-rep: Handle FXATTROP and XATTROPKotresh HR2015-10-291-1/+2
| | | | | | | | | | | | | | | GEO-REP INTEROP WITH SHARD FEATURE If it is FXATTROP or XATTROP in changelog, add the gfid to rsync queue. Change-Id: If68d38d7ed00b70a4618cfcc8e75df3fbadbf724 BUG: 1265148 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12226 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* core: use syscall wrappers instead of direct syscalls - miscellaneousKaleb S. KEITHLEY2015-10-282-9/+12
| | | | | | | | | | | | | | | various xlators and other components are invoking system calls directly instead of using the libglusterfs/syscall.[ch] wrappers. If not using the system call wrappers there should be a comment in the source explaining why the wrapper isn't used. Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c BUG: 1267967 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12276 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* geo-rep: Avoid cold tier bricks during ENTRY operationSaravanakumar Arumugam2015-10-264-5/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a series of patch which aims to fix geo-replication in a Tiering Volume. Problem: Consider, a file is placed in volume initially and then hot tier is attached. During any operation on the file, due to lookup a linkto file is created in hot tier. Now, any namespace operation carried out on the file is recorded in both cold and hot tier. There is a room for races when both changelogs are replayed. Solution: So, We are going to replay (namespace related)operations only in the hot tier. Why? a. If the file is directly placed in Hot tier , all fops will be recorded in HOT tier. b. If the file is already present in Cold tier, and if any fop is carried out, it creates linkto file in Hot tier. Now, operations like UNLINK, RENAME are captured in Hot tier(by means of linkto file). This way, we can get both tier's operation in HOT tier itself. Now, once the file is demoted to COLD tier, any namespace operation carried out on the cold tier can be avoided as we directly RECORD the same in HOT tier. How? 1. Check whether the brick is cold tier and skip ENTRY operation. 2. Also, if it is cold tier brick, use Xsync(which is used during initial run). This will help in getting all cold tier bricks changes using File System crawl and helps in avoiding races with hot tier brick(which can happen if historychangelog used in cold tier brick). Dependent patches: 1. http://review.gluster.org/12239 2. http://review.gluster.org/12326 Change-Id: I7692b1dbb8813a7e253451bca02f8f09a5782dde BUG: 1266875 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/12355 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Add data operation if mknod with tier attributeSaravanakumar Arumugam2015-10-261-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a series of patches which aims to fix geo-replication in a Tiering Volume. Problem: Consider, a file is placed in volume initially and then hot tier is attached. During any operation on the file, due to lookup a linkto file is created in hot tier. Now, any namespace operation carried out on the file is recorded in both cold and hot tier. There is a room for races when both changelogs are replayed. Solution: So, We are going to replay (namespace related)operations only in the hot tier. Why? a. If the file is directly placed in Hot tier, all fops will be recorded in HOT tier. b. If the file is already present in Cold tier, and if any fop is carried out, it creates linkto file in Hot tier. Now, operations like UNLINK, RENAME are captured in Hot tier(by means of linkto file). This way, we can get both tier's operation in HOT tier itself. But, We may miss initial Data sync immediately after creating the file as it is only recording MKNOD. So, if MKNOD encountered with sticky bit set, queue DATA operation for the corresponding gfid. (This is addressed here in this patch) So, If tier-gfid linkto is set, we need to record the corresponding MKNOD. Earlier this was avoided as it was set as INTERNAL fop. (This changelog related changes are addressed in the patch: - http://review.gluster.org/12417) Change-Id: I2fa84cfa2b0f86506c3d15d484138ab9651e4f83 BUG: 1266875 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/12326 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix portability issues with NetBSDKotresh HR2015-09-072-38/+76
| | | | | | | | | | | | | | Fixes portability issues in gverify.sh and libcxattr.py with NetBSD. Change-Id: Idfaa6cf3815136e6a2343aab98d979b6ab451bbd BUG: 1257847 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/12088 Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Update geo-rep status, if monitor process is killedSaravanakumar Arumugam2015-09-012-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When the monitor process itself is getting killed, geo-rep session still shows as active. Status command will just pick up the content from the status file to show the output. Monitor process is the one which updates the Status file. When the monitor process itself gets killed, there is no way to update the status file. So, geo-rep session status command ends up showing last updated Status present in the status file. Solution: While getting the status output, check whether monitor process is running. If it is NOT running, update the status as STOPPED. Change-Id: I86a7ac1746dd8f27eef93658e992ef16f6068d9d BUG: 1251980 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11873 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* geo-rep: Fix gsyncd failing to start on slaveKotresh HR2015-08-051-2/+13
| | | | | | | | | | | | | | | | This is a regression introduced by 2ca6441. The memory is freed while the caller expects it be not. The patch fixes the same. Change-Id: I76d95eef15b3f7af365b392f2426c8b0388254fc BUG: 1222898 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11751 Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Do not crash worker on ESTALEKotresh HR2015-08-031-9/+1
| | | | | | | | | | | | | | Handle ESTALE returned by lstat gracefully by retrying it. Do not crash the worker. Change-Id: I2527cd8bd1f7d2428cb4fa3f20782bebaf2df12a BUG: 1247529 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix history failureKotresh HR2015-07-282-5/+19
| | | | | | | | | | | | | | | | | | | | | | | Both ACTIVE and PASSIVE workers register to changelog at almost same time. When PASSIVE worker becomes ACTIVE, the start and end time would be current stime and register_time repectively for history API. Hence register_time would be less then stime for which history obviously fails. But it will be successful for the next restart as new register_time > stime. Fix is to pass current time as the end time to history call instead of the register_time. Also improvised the logging for ACTIVE/PASSIVE switching. Change-Id: Idc08b4b55c7a4c575ba44918a98389164ccbee8f BUG: 1239044 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11524 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-replication: fix memory leak in gsyncdPrasanna Kumar Kalever2015-07-141-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. leak in str2argv() function: char *strdup(const char *s); The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3). so when using strdup, user has to take care of freeing the memory allocated by strdup library function. 2. leak in main() function: str2argv() function calls calloc and allocates memory pointed to argv, after return from str2argv() memory pointed to argv has to be cleaned in main(). This patch is to fix 2 memory leaks mentioned above. Change-Id: I6bf26101e0460a7324ac7bdb69905839688d4987 BUG: 1222898 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/10831 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix fd referenced before assignmentKotresh HR2015-07-051-2/+6
| | | | | | | | | | | | | Fix fd reference before assignment in mgmt_lock function. Change-Id: Ie939d4262a59cae0817ae388658a000576ab69b8 BUG: 1233411 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11318 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: ignore ESTALE as ENOENTAravinda VK2015-06-262-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | When DHT can't resolve a File it raises ESTALE, ignore ESTALE errors same as ENOENT after retry. Affected places: Xattr.lgetxattr os.listdir os.link Xattr.lsetxattr os.chmod os.chown os.utime os.readlink BUG: 1232912 Change-Id: I02015f508d901e4a74dd48e1c52423e78eaf1dcd Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/11296 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Fix add user in mountbroker user managementKotresh HR2015-06-261-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | The CLI 'gluster system:: execute mountbroker user <USERNAME> <VOLUMES>' to set volumes associated with a user replaces existing user and associated volumes upon setting with existing user. This patch fixes it by appending the volumes if the user already exists. It also introduces following CLI to remove volume for a corresponding user. 'gluster system:: execute mountbroker volumedel <USERNAME> <VOLUME>' <USERNAME>: username <VOLUME>: comman separated list of volumes to delete If it is the last volume to be deleted associated with the user, it will delete the user as well as it doesn't make sense to keep only user without volumes associated. Change-Id: I49f4b9279954d9f5d34aca2dd8a69c6f4b87fd19 BUG: 1226223 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11385 Reviewed-by: darshan n <dnarayan@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix glusterd working directoryKotresh HR2015-06-241-0/+1
| | | | | | | | | | | | | | | | Mountbroker setup in geo-replication requires the script 'set_geo_rep_pem_keys.sh to be run manually out of gluster context. Hence the ${GLUSTERD_WORKDIR} is never set. So getting glusterd working dir using 'gluster system:: getwd'. Change-Id: Ic2fe85e685016b383e38af0ea43c3f0628c16a32 BUG: 1235292 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11381 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix toggling of use_meta_volume configKotresh HR2015-06-241-1/+1
| | | | | | | | | | | | | | | If meta-volume is deleted and use_meta_volume is set to false, geo-rep still fails complaining meta volume is not mounted. The patch fixes that issue. Change-Id: Iecf732197926bf9ce69112287fccbb1c34e58e6d BUG: 1234694 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11358 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix geo-rep fanout setup with meta volumeKotresh HR2015-06-241-2/+2
| | | | | | | | | | | | | | | | Lock filename was formed with 'master volume id' and 'subvol number'. Hence multiple slaves try acquiring lock on same file and become PASSIVE ending up not syncing data. Using 'slave volume id' in lock filename will fix the issue making lock file unique across different slaves. BUG: 1234882 Change-Id: Ie3590b36ed03e80d74c0cfc1290dd72122a3b4b1 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11367 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* gverify: Adding StrictHostKeyChecking=no for ssh verificationM S Vishwanath Bhat2015-06-121-1/+1
| | | | | | | | | | | | | | | | | | | Before actually checking the compatibility between master and slave, gverify checks if there is a passwordless ssh connection between master to slave. So if the entry of the slave was not present in 'known_hosts' file in master gverify used to complain that passwordless ssh has not been setup. This used to happen even if there is a passwordless ssh between master to slave. This patch fixes the above problem by using StrictHostKeyChecking=no while doing ssh to slave. Change-Id: I953c278e411ad6bc1dd1966ea6d895b05f890492 BUG: 1228696 Signed-off-by: M S Vishwanath Bhat <vbhat@redhat.com> Reviewed-on: http://review.gluster.org/11106 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* build: fix compiling on older distributionsNiels de Vos2015-06-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | data-tiering is disabled on RHEL-5 because it depends on a too new SQLite version. This change also prevents installing some of files that are used by geo-replication, which is also not available on RHEL-5. geo-replication depends on a too recent version of Python. Due to an older version of OpenSSL, some of the newer functions can not be used. A fallback to previous functions is done. Unfortunately RHEL-5 does not seem to have TLSv1.2 support, so only older versions can be used. Change-Id: I672264a673f5432358d2e83b17e2a34efd9fd913 BUG: 1222317 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10803 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Ignore ESTALE during unlink/rmdirAravinda VK2015-05-311-4/+6
| | | | | | | | | | | | | | | | | during unlink/rmdir of Parent_GFID/Basename, if parent directory does not exists. Parent GFID will not get resolved and DHT raises ESTALE instead of ENOENT. Now ESTALE errors ignored during unlink/rmdir BUG: 1223280 Change-Id: If275c89fb9fc7d16004550805a4cd65be818540d Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10837 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix Data counter issue in statusAravinda VK2015-05-301-2/+12
| | | | | | | | | | | | | | | | ENTRY and META operations executed sequentially, DATA operations are handled async, increment happens when a changelog parsed. Decrement happens after the sync of all files. 'files_in_batch' was reset multiple times in batch instead of once. BUG: 1224098 Change-Id: I87617f2fd5f4d3221a1c9f9d5a8efb0686c42bbe Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10911 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-292-10/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* geo-rep: Disable xattrs and acls support with tar_sshKotresh HR2015-05-282-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Geo-rep can't sync xattrs and acls with tar over ssh for following reasons. Issue 1: xattrs doesn't sync with tar over ssh. Reason: untar doesn't respect '--overwrite' option when used along with '--xattrs'. So it sends unlink if the file exists on destination and re-creates afresh. But all entry operations are banned in aux-gfid-mount as it may lead to gfid-mismatch. Hence fails with EPERM. This happens only when some xattr is set on a file in master volume. Issue2: acls on directories does not sync with tar over ssh. Reason: tar tries to opendir ".gfid/<gfid1>" and is not supported by gfid-access-translator as readirp can't be handled on virtual inodes and hence fails with ENOTSUP where as it syncs for files. Since the issue is with tar commmand it self and nothing could be done from gluster side, disabling xattr and acls support with tar over ssh option. Geo-rep can sync xattrs and acls with 'rsync' as the sync engine. Change-Id: I6821d327e7fe15545adef644869aa2389f79c701 BUG: 1223642 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10873 Tested-by: NetBSD Build System Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix corrupt gsyncd outputAravinda VK2015-05-081-3/+3
| | | | | | | | | | | | When gsyncd fails with Python traceback, glusterd fails parsing gsyncd output and shows error. BUG: 1219937 Change-Id: Ic32fd897c49a5325294a6588351b539c6e124338 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10694 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix Default values in Xml outputAravinda VK2015-05-071-0/+9
| | | | | | | | | | | | | | Default Values for last_synced, checkpoint_time and checkpoint_completion_time was zero instead of 'N/A' BUG: 1212410 Change-Id: Ie775508f8dcb9ba6f311946a2039739e4336d9a6 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10580 Reviewed-by: darshan n <dnarayan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix Rsync hang issueAravinda VK2015-05-071-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | When rsync is executed using Python subprocess, by default stdout of subprocess will be None. With the log rsync performance patch stdout is assigned to PIPE. Rsync writes to that PIPE whenever it syncs files. If log_rsync_performance is disabled then nobody will consume stdout and that gets full. Rsync hangs if PIPE is full. log_rsync_performance option is introduced with patch 10070 With this patch stdout=PIPE only if log_rsync_performance is enabled. Also removed -v option from Rsync. Thanks Venky and Kotresh for RCA. BUG: 1218552 Change-Id: I4ebcfb6999358c8e2c147f7964255bd836ed7499 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10556 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Minimize rm -rf race in Geo-repAravinda VK2015-05-053-26/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing RMDIR worker gets ENOTEMPTY because same directory will have files from other bricks which are not deleted since that worker is slow processing. So geo-rep does recursive_delete. Recursive delete was done using shutil.rmtree. once started, it will not check disk_gfid in between. So it ends up deleting the new files created by other workers. Also if other worker creates files after one worker gets list of files to be deleted, then first worker will again get ENOTEMPTY again. To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA. And disk_gfid check added for original path for which recursive_delete is called. This disk gfid check executed before every Unlink/Rmdir. If disk gfid is not matching with GFID from Changelog, that means other worker deleted the directory. Even if the subdir/file present, it belongs to different parent. Exit without performing further deletes. Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR failed with ENOENT will not succeed unless parent directory is created again. Rsync errors handling was handling unlinked_gfids_list only for one Changelog, but when processed in batch it fails to detect unlinked_gfids and retries again. Finally skips the entire Changelogs in that batch. Fixed this issue by moving self.unlinked_gfids reset logic before batch start and after batch end. Most of the Geo-rep races with rm -rf is eliminated with this patch, but in some cases stale directories left in some bricks and in mount point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log) BUG: 1211037 Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10204 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Status EnhancementsAravinda VK2015-05-057-374/+608
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Discussion in gluster-devel http://www.gluster.org/pipermail/gluster-devel/2015-April/044301.html MASTER NODE - Master Volume Node MASTER VOL - Master Volume name MASTER BRICK - Master Volume Brick SLAVE USER - Slave User to which Geo-rep session is established SLAVE - <SLAVE_NODE>::<SLAVE_VOL> used in Geo-rep Create command SLAVE NODE - Slave Node to which Master worker is connected STATUS - Worker Status(Created, Initializing, Active, Passive, Faulty, Paused, Stopped) CRAWL STATUS - Crawl type(Hybrid Crawl, History Crawl, Changelog Crawl) LAST_SYNCED - Last Synced Time(Local Time in CLI output and UTC in XML output) ENTRY - Number of entry Operations pending.(Resets on worker restart) DATA - Number of Data operations pending(Resets on worker restart) META - Number of Meta operations pending(Resets on worker restart) FAILURES - Number of Failures CHECKPOINT TIME - Checkpoint set Time(Local Time in CLI output and UTC in XML output) CHECKPOINT COMPLETED - Yes/No or N/A CHECKPOINT COMPLETION TIME - Checkpoint Completed Time(Local Time in CLI output and UTC in XML output) XML output: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> cliOutput> geoRep> volume> name> sessions> session> session_slave> pair> master_node> master_brick> slave_user> slave/> slave_node> status> crawl_status> entry> data> meta> failures> checkpoint_completed> master_node_uuid> last_synced> checkpoint_time> checkpoint_completion_time> BUG: 1212410 Change-Id: I944a6c3c67f1e6d6baf9670b474233bec8f61ea3 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10121 Tested-by: NetBSD Build System Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Fix minor bugs in meta-volume setupKotresh HR2015-05-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | 1. Access unreferenced access of fd: In meta volume configuration for geo-rep, if geo-rep directory is not created yet, open fails with no fd, but it is accessed in close(fd). So after creating 'geo-rep' directory in meta-volume, open the lock file to get fd. 2. Fix volume_id in forming lock file name. For the very first time, gconf.volume_id would be null, as config is not reloaded yet. Hence, use 'uuid' function to get the volume id. Change-Id: I8381ab7a44bc800df25d596218466641c10937a4 BUG: 1210344 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10458 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System
* geo-rep: Limit number of changelogs to process in batchAravinda VK2015-04-281-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Changelog processing is done in batch, for example if 10 changelogs available for processing then process all at once. Collect Entry, Meta and Data operations separately, All the entry operations like CREATE, MKDIR, MKNOD, LINK, UNLINK will be executed first then rsync will be triggered for whole batch. Stime will get updated once the complete batch is complete. In case of large number of Changelogs in a batch, If geo-rep fails after Entry operations, but before rsync then on restart, it again starts from the beginning since stime is not updated. It has to process all the changelogs again. While processing same changelogs again, all CREATE will get EEXIST since all the files created in previous run. Big hit for performance. With this patch, Geo-rep limits number of changelogs per batch based on Changelog file size. So that when geo-rep fails it has to retry only last batch changelogs since stime gets updated after each batch. BUG: 1210965 Change-Id: I844448c4cdcce38a3a2e2cca7c9a50db8f5a9062 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10202 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: log ENTRY failures from slave on masterMilind Changire2015-04-273-10/+48
| | | | | | | | | | | | | | | ENTRY operations failures on slave left no trace for debugging purposes. This patch captures such failures on slave cluster and forwards them to the master and logs them. Failures of specific interest are the ones which return code EEXIST on the failing operations. Change-Id: Iecab876f16593c746d53f4b7ec2e0783367856bb BUG: 1207115 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/10048 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>