summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht
Commit message (Collapse)AuthorAgeFilesLines
* cluster/dht: Add migration checks to dht_(f)xattropN Balachandran2018-01-305-53/+334
| | | | | | | | | | | | | | | | The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order. > Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c > BUG: 1471031 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c BUG: 1498081 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: EBADF handling for fremovexattr and fsetxattrN Balachandran2017-10-033-3/+44
| | | | | | | | | | | | | | | | | Add EBADF handling for dht_fremovexattr and dht_fsetxattr. > BUG: 1476665 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17999 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 747a08d34e2a1e94d7fce68a3577370288bb1955) Change-Id: Ide0d5812dae79655d2565157e5baabcd753b4309 BUG: 1467010 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: add FOP check to dht_file_setattr_cbkRavishankar N2017-09-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: bug-797171.7 loaded error-gen xlator on the brick which sent EBADF for a non fd-based fop, namely setattr. This caused dht_check_and_open_fd_on_subvol_task() to crash as local->fd was NULL. Fix: Call dht_check_and_open_fd_on_subvol_task() from dht_file_setattr_cbk only for dht_fsetattr and not dht_setattr or dht_setattr2 > Reviewed-on: https://review.gluster.org/18208 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Susant Palai <spalai@redhat.com> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 47188e9eac59de416a5c86c7ec7540ed6aaa1c98) Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: Iab4999e213bf2065804f3f8237e470ad454e3c99 BUG: 1497122
* cluster/dht: Check for open fd only on EBADFN Balachandran2017-09-174-160/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DHT fd based fops used to check if the fd was open on the cached subvol before winding the call. However, this introduced a performance regression of about 30% for reads. This check was introduced to handle cases where files were migrated while IOs were happening. As this is not the common case, dht will now check if the fd is open on the cached subvol only if the call fails with EBADF. This will prevent a performance hit where a rebalance is not running. > BUG: 1476665 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17976 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Susant Palai <spalai@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808 BUG: 1467010 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/18057 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/rebalance: Fix hardlink migration failuresSusant Palai2017-08-112-15/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A brief about how hardlink migration works: - Different hardlinks (to the same file) may hash to different bricks, but their cached subvol will be same. Rebalance picks up the first hardlink, calculates it's hash(call it TARGET) and set the hashed subvolume as an xattr on the data file. - Now all the hardlinks those come after this will fetch that xattr and will create linkto files on TARGET (all linkto files for the hardlinks will be hardlink to each other on TARGET). - When number of hardlinks on source is equal to the number of hardlinks on TARGET, the data migration will happen. RACE:1 Since rebalance is multi-threaded, the first lookup (which decides where the TARGET subvol should be), can be called by two hardlink migration parallely and they may end up creating linkto files on two different TARGET subvols. Hence, hardlinks won't be migrated. Fix: Rely on the xattr response of lookup inside gf_defrag_handle_hardlink since it is executed under synclock. RACE:2 The linkto files on TARGET can be created by other clients also if they are doing lookup on the hardlinks. Consider a scenario where you have 100 hardlinks. When rebalance is migrating 99th hardlink, as a result of continuous lookups from other client, linkcount on TARGET is equal to source linkcount. Rebalance will migrate data on the 99th hardlink itself. On 100th hardlink migration, hardlink will have TARGET as cached subvolume. If it's hash is also the same, then a migration will be triggered from TARGET to TARGET leading to data loss. Fix: Make sure before the final data migration, source is not same as destination. RACE:3 Since a hardlink can be migrating to a non-hashed subvolume, a lookup from other client or even the rebalance it self, might delete the linkto file on TARGET leading to hardlinks never getting migrated. This will be addressed in a different patch in future. > Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98 > BUG: 1469964 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17755 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98 BUG: 1473141 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17838 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: fix on demand migration files from clientSusant Palai2017-08-114-20/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On demand migration of files i.e. migration done by clients triggered by a setfattr was broken. Dependency on defrag led to crash when migration was triggered from client. Note: This functionality is not available for tiered volumes. Migration from tier served client will fail with ENOTSUP. usage (But refer to the steps mentioned below to avoid any issues) : setfattr -n "trusted.distribute.migrate-data" -v "1" <filename> The purpose of fixing the on-demand client migration was to give a workaround where the user has lots of empty directories compared to files and want to do a remove-brick process. Here are the steps to trigger file migration for remove-brick process from client. (This is highly recommended to follow below steps as is) Let's say it is a replica volume and user want to remove a replica pair named brick1 and brick2. (Make sure healing is completed before you run these steps) Step-1: Start remove-brick process - gluster v remove-brick <volname> brick1 brick2 start Step-2: Kill the rebalance daemon - ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill Step-3: Do a fresh mount as mentioned here - glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point Step-4: Go to one of the bricks (among brick1 and brick2) - cd <brick1 path> Step-5: Run the following command. - find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \; This command will ignore the linkto files and empty directories. Do a fix-layout of the parent directory. And trigger a migration operation on the files. Step-6: Once this process is completed do "remove-brick force" - gluster v remove-brick <volname> brick1 brick2 force Note: Use the above script only when there are large number of empty directories. Since the script does a crawl on the brick side directly and avoids directories those are empty, the time spent on fixing layout on those directories are eliminated(even if the script does not do fix-layout on empty directories, post remove-brick a fresh layout will be built for the directory, hence not affecting application continuity). Detailing the expectation for hardlink migartion with this patch: Hardlink is migrated only for remove-brick process. It is highly essential to have a new mount(step-3) for the hardlink migration to happen. Why?: setfattr operation is an inode based operation. Since, we are doing setfattr from fuse mount here, inode_path will try to build path from the linked dentries to the inode. For a file without hardlinks the path construction will be correct. But for hardlinks, the inode will have multiple dentries linked. Without fresh mount, inode_path will always get the most recently linked dentry. e.g. if there are three hardlinks named dir1/link1, dir2/link2, dir3/link3, on a client where these hardlinks are looked up, inode_path will always return the path dir3/link3 if dir3/link3 was looked up most recently. Hence, we won't be able to create linkto files for all other hardlinks on destination (read gf_defrag_handle_hardlink for more details on hardlink migration). With a fresh mount, the lookup and setfattr become serialized. e.g. link2 won't be looked up until link1 is looked up and migrated. Hence, inode_path will always have the correct path, in this case link1 dentry is picked up(as this is the most recently looked up inode) and the path is built right. Note: If you run the above script on an existing mount(all entries looked up), hard links may not be migrated, but there should not be any other issue. Please raise a bug, if you find any issue. Tests: Manual > Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a > BUG: 1450975 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17115 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a BUG: 1473140 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17837 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: initialize throttle option "normal" to same in init and reconfigureSusant Palai2017-08-111-77/+52
| | | | | | | | | | | | | | | | | | | | | | | | | Normal value were different in dht_init and dht_reconfigure. Initialization/reconfigure of throttle option are carved out to a separate function (dht_configure_throttle) now. Normal value will be "2". > Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1 > BUG: 1451162 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17303 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1 BUG: 1473137 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17836 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Make rebalance throttle option tuned by numberSusant Palai2017-08-113-26/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current rebalance throttle options: lazy/normal/aggressive may not always be sufficient for the purpose of throttling. In our recent test, we observed for certain setups, normal and aggressive modes behaved similarly consuming full disk bandwidth. So in cases like this admin should be able to tune it down(or vice versa) depending on the need. Along with old throttle configurations, thread counts are tuned based on number. e.g. gluster v set vol-name cluster-rebal.throttle 5. Admin can tune up/down between 0 and the number of cores available. Note: For heterogenous servers, validation will fail on the old server if "number" is given for throttle configuration. The message looks something like this: "volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}" Test: Manual test by logging active thread number after reconfiguring throttle option. testcase: tests/basic/distribute/throttle-rebal.t > Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3 > BUG: 1438370 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/16980 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3 BUG: 1473136 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17835 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: rebalance perf enhancementSusant Palai2017-08-112-108/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Throttle settings "normal" and "aggressive" for rebalance did not have performance difference. normal mode spawns $(no. of cores - 4)/2 threads and aggressive spawns $(no. of cores - 4) threads. Though aggressive mode has twice the number of threads compared to that of normal mode, there was no performance gain when switched to aggressive mode from normal mode. RCA: During the course of debugging the above problem, we tried assigning migration job to migration threads spawned by rebalance, rather than synctasks(as there is more overhead associated to manage the task queue and threads). This gave us a significant improvement over rebalance under synctasks. This patch does not really gurantee that there will be a clear performance difference between normal and aggressive mode, but this patch certainly maximized the disk utilization for 1GBfiles run. Results: Test enviroment: Gluster Config: Number of Bricks: 2 (one brick per disk(RAID-6 12 disk)) Bricks: Brick1: server1:/brick/test1/1 Brick2: server2:/brick/test1/1 Options Reconfigured: performance.readdir-ahead: on server.event-threads: 4 client.event-threads: 4 1000 files with 1GB each were created/renamed such that all files will have server1 as cached and server2 as hashed, so that all files will be migrated. Test machines had 24 cores each. Results with/without synctask based migration: ----------------------------------------------- mode normal(10threads) aggressive(20threads) timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s) withsynctask timetaken with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s) threads From above table it can be seen that, there is a clear 2x perf gain between rebalance with synctask vs rebalance with migrator threads. Additionally this patch modifies the code so that caller will have the exact error number returned by dht_migrate_file(earlier the errno meaning was overloaded). This will help avoiding scenarios where migration failure due to ENOENT, can result in rebalance abort/failure. > Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e > BUG: 1420166 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/16427 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e BUG: 1473134 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17834 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: correct space check for rebalanceSusant Palai2017-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | With rebalance doing fallocate on destination, we don't need to add file size to the "destination available space" to decide whether to migrate the file or not. Notes: Fallocate would have already occupied the file size space on destination > Change-Id: If7f6a6654e6257726680cf20d618482a6e9095a6 > BUG: 1441508 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17104 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: If7f6a6654e6257726680cf20d618482a6e9095a6 BUG: 1473133 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17833 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Skip file migration if the subvol that meets min-free-diskSusant Palai2017-08-113-25/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... criteria happens to be the same subvol containing data-file Rebalance need to figure out a new subvol in case the hashed subvol does not have enough space. In the process of figuring out the new subvol, we need to ignore the source subvol, otherwise it will lead to data loss. Test: Manual Ran the following sizeof /tmp/1: 1.5GB sizeof /brick/1: 16GB sizeof /tmp/2: 1.5GB <start> glusterd; gluster v create test1 vm1:/brick/1 vm1:/tmp/1; gluster v start test1; mount -t glusterfs vm1:test1 /mnt; for i in {1..2000} do dd if=/dev/zero of=/mnt/file$i bs=1KB count=1 &> /dev/null; done gluster v add-brick test1 vm1:/tmp/2 gluster v set test1 min-free-disk 12GB gluster v remove-brick test1 vm1:/tmp/1 star <end> file count and data were intact. > Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1 > BUG: 1441508 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17064 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1 BUG: 1473133 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17832 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Make rebalance honor min-free-diskSusant Palai2017-08-113-16/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test: Manual created files of size 1K on 2 brick(of size 1GB) setup . added a brick of size 16GB. set min-free-disk to 12GB(so that first two bricks won't receive any files). removed one of the 1st brick of size 1GB. Logs from test: [2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space] 0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1. Looking for new subvol. [2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space] 0-test1-dht: new target found - test1-client-2 for file - /tile32 - Post migration we have two files. The new destination (/brick/1) has the data file [root@vm1 ~]# ll /brick/1/tile32 -rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32 - On the old target the linkto file is there with linkto xattr pointing to /brick/1 [root@vm1 ~]# ll /tmp/2/tile32 ---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32 [root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32 getfattr: Removing leading '/' from absolute path names security.selinux="unconfined_u:object_r:user_tmp_t:s0" trusted.gfid="����:Aс�#�/'b2" trusted.glusterfs.dht.linkto="test1-client-2" Marking ./tests/features/worm_sh.t as bad test. Reason being, this patch failed on master branch as well and it has nothing to do with rebalance/remove-brick. > BUG: 1441508 > Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/17034 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Signed-off-by: Susant Palai <spalai@redhat.com> Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb BUG: 1473132 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17831 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht/rebalance: Crawler performance improvementSusant Palai2017-08-111-127/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The job of the crawler in rebalance is to fetch files from each local subvolume and push them to migration queue if it is eligible for migration. And we do a lookup on the entries received to figure out the eligibilty. Since, the lookup done is on a local subvolume we receive linkto files and regular files as well. This requires us to do two lookups. first: do a lookup on the file to figure out whether it is a linkto file second: do a lookup on the file to figure out if it should be migrated Note: The migrator thread also does one lookup for the file before migration. Optimization: Remove the lookup done by the crawler. Offload these task to the migrator threads. For linkto file verification get the stat and xattr information from readdirp. So in total we have one lookup instead of three for each entry. Performance numbers: Create two node, two brick setup. Created 100000 files. And started rebalance. Since, there is no add-brick, no files will be migrated and we will get the crawler performance. Without patch: [root@gprfs039 ~]# grs Node Rebalanced-files size scanned failures skipped status run time in h:m:s --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 50070 0 0 completed 0:0:48 server2 0 0Bytes 49930 0 0 completed 0:0:44 volume rebalance: test1: success Total: 48 seconds WiththecurrentPatch: [root@gprfs039 mnt]# gluster v rebalance test1 status Node Rebalanced-files size scanned failures skipped status run time in h:m:s --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 50070 0 0 completed 0:0:12 server2 0 0Bytes 49930 0 0 completed 0:0:12 volume rebalance: test1: success Total: 12 seconds That's 4X speed gain. :) > Updates glusterfs#155 > Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc > BUG: 1439571 > Signed-off-by: Susant Palai <spalai@redhat.com> > Reviewed-on: https://review.gluster.org/15781 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Updates glusterfs#155 Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc BUG: 1473129 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17830 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* refcount: typecast function for calling on freeNiels de Vos2017-08-111-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the functions called to free the refcounted structure are doing a typecast from (void*) to their own type taht is being free'd. This really is not needed and the refcount interface is made a little simpler without the requirement of typecasting. With this small improvement in the API, all callers are updated too. Cherry picked from commit f2ca301bd741e3e3f076cd3f72fcd377bcef2a1a: > Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1 > BUG: 1416889 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: https://review.gluster.org/16471 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Backport note: This patch makes it easier to backport changes that use gf_refcount_t. There is no functional change. Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1 BUG: 1471870 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17913 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Fix fd check raceN Balachandran2017-07-202-1/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a another race between the cached subvol being updated in the inode_ctx and the fd being opened on the target. 1. fop1 -> fd1 -> subvol0 2. file migrated from subvol0 to subvol1 and cached_subvol changed to subvol1 in inode_ctx 3. fop2 -> fd1 -> subvol1 [takes new cached subvol] 4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1 5. fop1 -> checks fd ctx (fd not open on subvol0) -> tries to open fd1 on subvol0 -> fails with "No such file on directory". Fix: If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to old subvol and let the phase1/phase2 checks handle it. Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225 BUG: 1467010 > BUG: 1465075 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17731 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: Amar Tumballi <amarts@redhat.com> (cherry picked from commit f7a450c17fee7e43c544473366220887f0534ed7) Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17829 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cluster/dht: Check if fd is opened on dst subvolN Balachandran2017-07-196-30/+543
| | | | | | | | | | | | | | | | | | | | | | | | | | | If an fd is opened on a file, the file is migrated and the cached subvol is updated in the inode_ctx before an fd based fop is sent, the fop is sent to the dst subvol on which the fd is not opened. This causes the FOP to fail with EBADF. Now, every fd based fop will check to see that the fd has been opened on the dst subvol before winding it down. > BUG: 1465075 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17630 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: Susant Palai <spalai@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 91db0d47ca267aecfc6124a3f337a4e2f2c9f1e2) Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e BUG: 1467010 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17752 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Additional checks for rebalance estimatesN Balachandran2017-07-041-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rebalance estimates calculation was not handling calculations correctly when no files had been processed, i.e., when rate_lookedup was 0. Now, the estimated time is set to 0 in such scenarios as there is no way for rebalance to figure out how long the process will take to complete without knowing the rate at which the files are being processed. > BUG: 1457985 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17564 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I7b6378e297e1ba139852bcb2239adf2477336b5b BUG: 1460914 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17599 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster:dht Fix crash in dht_rename_lock_cbkN Balachandran2017-07-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Use a local variable to store the call count in the STACK_WIND for loop. Using frame->local is dangerous as it could be freed while the loop is still being processed > BUG: 1466110 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17645 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Tested-by: Nigel Babu <nigelb@redhat.com> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> (cherry picked from commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0) Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e BUG: 1466863 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17665 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Include dirs in rebalance estimatesN Balachandran2017-06-203-31/+83
| | | | | | | | | | | | | | | | | | | | | | | | | Empty directories were not being considered while calculating rebalance estimates leading to negative time-left values being displayed as part of the rebalance status. > BUG: 1457985 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17448 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98 BUG: 1460914 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17530 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cluster/dht: Fix crash in dht_selfheal_dir_setattrN Balachandran2017-05-291-2/+6
| | | | | | | | | | | | | | | | | | | | | | | Use a local variable to store the call cnt used in the for loop for the STACK_WIND so as not to access local which may be freed by STACK_UNWIND after all fops return. > BUG: 1452102 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17343 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 17784aaa311494e4538c616f02bf95477ae781bc) Change-Id: I24f49b6dbd29a2b706e388e2f6d5196c0f80afc5 BUG: 1453056 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17349 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Rebalance on all nodes should migrate filesN Balachandran2017-05-186-20/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Rebalance compares the node-uuid of a file against its own to and migrates a file only if they match. However, the current behaviour in both AFR and EC is to return the node-uuid of the first brick in a replica set for all files. This means a single node ends up migrating all the files if the first brick of every replica set is on the same node. Fix: AFR and EC will return all node-uuids for the replica set. The rebalance process will divide the files to be migrated among all the nodes by hashing the gfid of the file and using that value to select a node to perform the migration. This patch makes the required DHT and tiering changes. Some tests in rebal-all-nodes-migrate.t will need to be uncommented once the AFR and EC changes are merged. > BUG: 1366817 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17239 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> (cherry picked from commit b23bd3dbc2c153171d0bb1205e6804afe022a55f) Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c BUG: 1451561 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17311 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Fix crash in dht rmdirN Balachandran2017-05-181-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Using local->call_cnt to check STACK_WINDs can cause dht_rmdir_do to be called erroneously if dht_rmdir_readdirp_cbk unwinds before we check if local->call_cnt is zero in dht_rmdir_opendir_cbk. This can cause frame corruptions and crashes. Thanks to Shyam (srangana@redhat.com) for the analysis. > BUG: 1451083 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17305 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> (cherry picked from commit 6f7d55c9d58797beaf8d5393c03a5a545bed8bec) Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11 BUG: 1451371 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17309 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht: Add missing braces in dht_opendirPoornima G2017-05-141-1/+2
| | | | | | | | | | | | | | | | | | | | | >Change-Id: I6adce98f52e17953f501bc590ff7189cceac3c31 >BUG: 1431908 >Signed-off-by: Poornima G <pgurusid@redhat.com> >Reviewed-on: https://review.gluster.org/17057 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit af218797fa98f2f75594fc9ae595f184682f1a0d) Change-Id: I6adce98f52e17953f501bc590ff7189cceac3c31 BUG: 1435942 Reviewed-on: https://review.gluster.org/17285 Tested-by: Raghavendra Talur <rtalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cluster/dht: Fix ret checkN Balachandran2017-05-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fixed an incorrect return code check in the rebalance code. > BUG: 1448640 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17197 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 67598f538efb24a9e5ac561b294a05e707e15761) Change-Id: I60804ff121cec7a2f0419e2ee70dd22ea7533c0c BUG: 1448864 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17204 Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Pass the correct xdata in fremovexattr fopKrutika Dhananjay2017-05-011-8/+4
| | | | | | | | | | | | | | | | | | Backport of: > Change-Id: Id84bc87e48f435573eba3b24d3fb3c411fd2445d > BUG: 1440051 > Reviewed-on: https://review.gluster.org/17126 > (cherry-picked from ab88f655e6423f51e2f2fac9265ff4d4f5c3e579) Change-Id: Id84bc87e48f435573eba3b24d3fb3c411fd2445d BUG: 1426508 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17134 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht Remove redundant logs in dht rmdirN Balachandran2017-04-271-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | Removing redundant logs were introduced in https://review.gluster.org/#/c/17065/ > BUG: 1445590 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17118 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Susant Palai <spalai@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 25f0a7b153b30b2c0e8278b0ce11d1199c3fb006) Change-Id: I0d6055488b51a13c91d2121e87f653cdb94888b0 BUG: 1446227 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17130 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Pass the req dict instead of NULL in dht_attr2()Krutika Dhananjay2017-04-273-57/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: > Change-Id: Id7823fd932b4e5a9b8779ebb2b612a399c0ef5f0 > BUG: 1440051 > Reviewed on: https://review.gluster.org/17085 > (cherry-picked from commit d60ca8e96bbc16b13f8f3456f30ebeb16d0d1e47) This bug was causing VMs to pause during rebalance. When qemu winds down a STAT, shard fills the trusted.glusterfs.shard.file-size attribute in the req dict which DHT doesn't wind its STAT fop with upon detecting the file has undergone migration. As a result shard doesn't find the value to this key in the unwind path, causing it to fail the STAT with EINVAL. Also, the same bug exists in other fops too, which is also fixed in this patch. Change-Id: Id7823fd932b4e5a9b8779ebb2b612a399c0ef5f0 BUG: 1426508 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17119 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: rm -rf fails if dir has stale linkto filesN Balachandran2017-04-271-45/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | rm -rf <dir> fails with ENOENT if dir contains a lot of stale linkto files. This is because a single readdirp is sent as part of the rmdir which would return and delete only as many linkto files on the bricks as would fit in one readdirp buffer. Running rm -rf <dir> multiple times will eventually delete all the files. The fix sends readdirp on each subvol until no more entries are returned. > BUG: 1442724 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/17065 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit e5f9ba138571bd18226462c49ff6a55f5c3ed3a4) Change-Id: I447f2d193de4bd8ac16e4541c6b919d22250e39e BUG: 1444540 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17102 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: The xattrs sent in readdirp should be sent in opendir aswellPoornima G2017-04-271-28/+44
| | | | | | | | | | | | | | | | | | | | | | As readdir-ahead can be loaded as a child of dht, dht has to specify the xattrs it is intrested in, as part of opendir call itself. >Reviewed-on: https://review.gluster.org/16902 >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >(cherry picked from commit 0f71338e1d7c0b70f4fe3b19c68612fe730d9de2) Change-Id: I012ef96cc143b0cef942df78aa7150d85ec38606 BUG: 1435942 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16947 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Modify local->loc.gfid in thread safe mannerPranith Kumar K2017-04-131-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/16986 Problem: local->loc.gfid in dht_lookup_directory() will be null-gfid for a fresh lookup. dht_lookup_dir_cbk() updates local->loc.gfid while in other thread dht_lookup_directory() is still winding lookup calls to subvolumes so there is a chance of partial gfid being seen by EC. We saw in 12x(4+2) volume, ec is receiving an loc where the gfid has last 10 bytes matching with the gfid of the directory and the first 4 bytes are all-zeros. This is leading to EC erroring out the lookup with EINVAL which leads to NFS failing lookup with EIO. snip from gdb: $37 = (dht_local_t *) 0x7fde5de5b3cc (gdb) p /x $37->loc.gfid $39 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56} (gdb) fr 7 state=<optimized out>) at ec-generic.c:837 837 ec_lookup_rebuild(fop->xl->private, fop, cbk); (gdb) p /x fop->loc[0].gfid $40 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56} snip from log: [2017-01-29 03:22:30.132328] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 0-butcher-disperse-4: Mismatching GFID's in loc [2017-01-29 03:22:30.132709] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /linux-4.9.5/Documentation => (XID: b27b9474, MKDIR: NFS: 5(I/O error), POSIX: 5(Input/output error)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000 [Invalid argument] Fix: update local->loc.gfid in last-call to make sure there are no races. >BUG: 1438411 >Change-Id: Ifcb7e911568c1f1f83123da6ff0cf742b91800a0 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> BUG: 1438423 Change-Id: I804822a1d50215301881ac18318282c1a6951cfb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16987 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fix crash in "nuke-dir" featureKrutika Dhananjay2017-03-041-1/+10
| | | | | | | | | | | | | | | | | | | | | Backport of: https://review.gluster.org/16829 My patch at https://review.gluster.org/16419 is resulting in core dumps everytime I run tests/features/nuke.t. Turns out dht, upon successfully "nuking" a directory, which was initiated through a setxattr, unwinds the operation with rmdir fop signature, resulting in readdir-ahead casting a struct iatt (preparent) to dict_t, leading to a crash. Change-Id: I480f80170c31a6da3ff7809852449e6f44f7d047 BUG: 1428739 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/16836 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht Fix error assignment in dht_*xattr2 functionsN Balachandran2017-02-201-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Corrected the op_errno assignments and NULL checks in the dht_sexattr2 and dht_removexattr2 functions. Earlier, they unwound with the default EINVAL op_errno if the file had been deleted. > Change-Id: Iaf837a473d769cea40132487a966c7f452990071 > BUG: 1421653 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/16610 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> (cherry picked from commit 028626a86ea409f908783b9007c02877f20be43e) Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: Icc53ca52281a533fc0c8eeafb4c889c818c313e1 BUG: 1424921 Reviewed-on: https://review.gluster.org/16679 Tested-by: N Balachandran <nbalacha@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Don't update layout in rebalance_task_completionN Balachandran2017-02-071-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating the layout in the dht inode_ctx in rebalance_task_completion after the file is migrated is erroneous in case of files with hardlinks. This step can be skipped as the layout will be set in the syncop_lookup call post the migration in dht_migrate_file. > Change-Id: I24ac798a919585d91a117d6a207e6a31b88486c6 > BUG: 1415761 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: https://review.gluster.org/16457 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: Susant Palai <spalai@redhat.com> (cherry picked from commit ddf05f3d1e39cc920251c809e9ba42fe42b2c5f2) Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I46176f8605cfb782aa17c2bc9ffd7a5f0d07dd88 BUG: 1419855 Reviewed-on: https://review.gluster.org/16554 Tested-by: N Balachandran <nbalacha@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* Readdir-ahead : Honour readdir-optimise option of dhtPoornima G2017-01-271-0/+13
| | | | | | | | | | | | | | | | | | | | | | | >Change-Id: I9c5e65b32e316e6a2fc7e1f5c79fce79386b78e2 >BUG: 1401812 >Signed-off-by: Poornima G <pgurusid@redhat.com> >Reviewed-on: https://review.gluster.org/16071 >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I9c5e65b32e316e6a2fc7e1f5c79fce79386b78e2 BUG: 1417027 Signed-off-by: Poornima G <pgurusid@redhat.com> (cherry picked from commit 7c6538f6c8f9a015663b4fc57c640a7c451c87f7) Reviewed-on: https://review.gluster.org/16461 Tested-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht/rebalance Estimate time to complete rebalanceN Balachandran2017-01-241-1/+101
| | | | | | | | | | | | | | | | | | | | | | | | | The estimates will be logged to the rebalance log on running gluster v rebalance <vol> status > Change-Id: I9d51b139cd4c8dfde1ff2c2050720ae606c13fc6 > BUG: 1396004 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: http://review.gluster.org/15893 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 2edd75ec8de17da89004859375844f60890a4df0) Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I58b0550210149443966798d9a26e73cb598eeb6a BUG: 1415915 Reviewed-on: https://review.gluster.org/16458 Tested-by: N Balachandran <nbalacha@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* tier : Tier as a servicehari gowtham2017-01-164-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tierd is implemented by separating from rebalance process. The commands affected: 1) Attach tier will trigger this process instead of old one 2) tier start and tier start force will also trigger this process. 3) volume status [tier] will show tier daemon as a process instead of task and normal tier status and tier detach status works. 4) tier stop implemented. 5) detach tier implemented separately along with new detach tier status 6) volume tier volname status will work using the changes. 7) volume set works This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separate to the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command. The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework. The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of : *) spawning the daemon, killing it and other such processes. *) volume set options , which are written on the volfile. *) restart and reconfigure functions. Restart is to restart the daemon at two points 1)after gluster goes down and comes up. 2) to stop detach tier. *) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. it has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick). With this patch the log, pid, and volfile are separated and put into respective directories. Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399 BUG: 1313838 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/13365 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* feature/dht: undo partially successful dir renameCsaba Henk2017-01-114-7/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As with dht, dirs are present on all subvolumes, renaming them is a compound operation and thus a partial success + partial failure scenario is possible, resulting in an inconsistent state. For purposes of reproduction, such a scenario can easily be produced by stopping the volume, edit the volfile of a certain subvolume to get at an "option read-only on" setting, and then restart the volume. Thus those operations that are to make change on the affected subvolume will fail with EROFS. To handle such scenarios, we introduce an in-memory cache where we record the return values obtained from the subvolumes. At the final stage of the dir rename operation we check if it's a partial success/fail situation. If yes, then we perform a reverse rename op on those subvolumes where the operation succeeded. Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d BUG: 1412069 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/15739 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* dht: At places needed use STACK_WIND_COOKIEPoornima G2017-01-0910-654/+664
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: frame has a void * cookie pointer. In case of STACK_WIND_COOKIE frame->cookie is assigned to what is sent by the caller. In case of STACK_WIND frame->cookie is assigned to point point to the frame itself. For ease of coding, at many places, the cookie in the cbk is used to get the pointer to the next xl. This is inconsistent when STACK_WIND_TAIL comes into picture. Eg: dht_setxattr () { for (i = 0 ; i < conf->subvolume_cnt ; i++) { STACK_WIND (..dht_checking_pathinfo_cbk, conf->subvolumes[i] ..); } dht_checking_pathinfo_cbk (...void *cookie...) { prev = cookie; ... for (i = 0; i < conf->subvolume_cnt; i++) { if (conf->subvolumes[i] == prev->this) { ... } } } Consider the below graph: dht (parent) readdir-ahead => Doesn't define setxattr and uses STACK_WIND_TAIL protocol-client With this graph, when dht_checking_pathinfo_cbk is called, cookie will have frame pointing to protocol-client. i.e. prev->this will be protocol-client. But dht was expecting it to be readdir-ahead as it has stored in conf->subvolumes[i] Solution: Hence, as a thumb rule, if cbk is using cookie, then we explicitly call STACK_WIND_COOKIE. Change-Id: I83aea1e24c809c5a91a0db7283e908e125471bd4 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16039 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Incorrect migration checks in fsyncN Balachandran2017-01-081-5/+6
| | | | | | | | | | | | | | | | | Fixed the order of the migration phase checks in dht_fsync_cbk. Phase1 should never be hit if op_ret is non zero. Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I9222692e04868bffa93498059440f0aa553c83ec BUG: 1410777 Reviewed-on: http://review.gluster.org/16350 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/rebalance: remove errno check for failure detectionSusant Palai2017-01-061-16/+13
| | | | | | | | | | | BUG: 1410355 Change-Id: I867419ca36a81ef7209e6911a46c1c2c898b8eab Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/16328 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Do rename cleanup as rootPranith Kumar K2017-01-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: Rename linkfile cleanup is done as non-root which may not have priviliges to do the rename so it fails with EACCESS. MKDIR on that name in future will start to hole on this subvolume. It is not easy to hit on fuse mounts because vfs takes care of the permission checks even before rename fop is wound. But with nfs-ganesha mounts it happens. Fix: Do rename cleanup as root BUG: 1409727 Change-Id: I414c1eb6dce76b4516a6c940557b249e6c3f22f4 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16317 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: Fix dict_leak in migration check tasksN Balachandran2017-01-021-0/+7
| | | | | | | | | | | | | | | | Fixed a memleak where dict was not being unrefed in the dht_migration_complete_check_task and dht_rebalance_inprogress_task functions. Change-Id: I3d42e9a2e5c8596c985bf6431a68fd3905227383 BUG: 1409186 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/16308 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht/rebalance: reverify lookup failuresSusant Palai2016-12-222-58/+130
| | | | | | | | | | | | | | | | | | race: readdirp has read one entry, and doing a lookup on that entry, but user might have renamed/removed that entry just after readdirp but before lookup. Since remove-brick is a costly opertaion,will ingore any ENOENT/ESTALE failures and move on. Change-Id: I62c7fa93c0b9b7e764065ad1574b97acf51b5996 BUG: 1408115 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/15846 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: Fix memory corruption while accessing regex stored inRaghavendra G2016-12-083-45/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | private If reconfigure is executed parallely (or concurrently with dht_init), there are races that can corrupt memory. One such race is modification of regexes stored in conf (conf->rsync_regex_valid and conf->extra_regex_valid) through dht_init_regex. With change [1], reconfigure codepath can get executed parallely (with itself or with dht_init) and this fix is needed. Also, a reconfigure can race with any thread doing dht_layout_search, resulting in dht_layout_search accessing regex freed up by reconfigure (like in bz 1399134). [1] http://review.gluster.org/15046 Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830 BUG: 1399134 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/15945 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht/md-cache: Filter invalidate if the file is made a linkto filePoornima G2016-12-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | Upcall as a part of setattr, sends an invalidation and the invalidation carries the resulting stat value. When a file is converted to linkto files, even then an invalidation is set and as a result the mountpoint shows the sticky bit in the stat of the file. eg: ---------T. 945 root root 0 Nov 8 10:14 hardlink.999 Fix: When dht recieves a notification of sticky bit change, it updates the flag, to indicate md-cache to send the subsequent lookup. Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606 BUG: 1392713 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15789 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* dht/rename : Incase of failure remove linkto file properlyJiffin Tony Thottan2016-12-012-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generally linkto file is created using root user. Consider following case, a user is trying to rename a file which he is not permitted. So the rename fails with EACESS and when rename tries to cleanup the linkto file, it fails. The above issue happens when rename/00.t test executed on nfs-ganesha clients : Steps executed in script * create a file "abc" using root * rename the file "abc" to "xyz" using a non root user, it fails with EACESS * delete "abc" * create directory "abc" using root * again try ot rename "abc" to "xyz" using non root user, test hungs here which slowly leds to OOM kill of ganesha process RCA put forwarded by Du for OOM kill of ganesha Note that when we hit this bug, we've a scenario of a dentry being present as: * a linkto file on one subvol * a directory on rest of subvols When a lookup happens on the dentry in such a scenario, the control flow goes into an infinite loop of: dht_lookup_everywhere dht_lookup_everywhere_cbk dht_lookup_unlink_cbk dht_lookup_everywhere_done dht_lookup_directory (as local->dir_count > 0) dht_lookup_dir_cbk (sets to local->need_selfheal = 1 as the entry is a linkto file on one of the subvol) dht_lookup_everywhere (as need_selfheal = 1). This infinite loop can cause increased consumption of memory due to: 1) dht_lookup_directory assigns a new layout to local->layout unconditionally 2) Most of the functions in this loop do a stack_wind of various fops. This results in growing of call stack (note that call-stack is destroyed only after lookup response is received by fuse - which never happens in this case) Thanks Du for root causing the oom kill and Sushant for suggesting the fix Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d BUG: 1397052 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15894 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: A hard link is lost during rebalance + lookupMohit Agrawal2016-11-291-40/+62
| | | | | | | | | | | | | | | | | | | | | | Problem: A hard link is lost during rebalance + lookup.Rebalance skip files if file has hardlink.In dht_migrate_file __is_file_migratable () function checks if a file has hardlink, if yes file is not migrated but if link is created after call this function then link will lost. Solution: Call __check_file_has_hardlink to check hardlink existence after (S+T) bits in migration process ,if file has hardlink then skip the file for migrate rebalance process. BUG: 1396048 Change-Id: Ia53c07ef42f1128c2eedf959a757e8df517b9d12 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15866 Reviewed-by: Susant Palai <spalai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht Set layout after mkdir as rootN Balachandran2016-11-211-0/+3
| | | | | | | | | | | | | | | | | | | | DHT does not set the layout for newly created directories as root. This causes EPERM failures when a non-root user with insufficient permissions creates directories. credit: srangana@redhat.com for RCA Change-Id: Ia646e41665ce172c43c5f01d2707455e8eb374ed BUG: 1392772 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15794 Reviewed-by: Susant Palai <spalai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-211-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* events: Add FMT_WARN for gf_eventPranith Kumar K2016-11-181-1/+1
| | | | | | | | | | | | | | | | | Raghavendra G found that posix is trying to print %s but passing an int when HEALTH_CHECK fails in posix. These are the kind of bugs that should be caught at compilation itself. Also fixed the problematic gf_event() callers. BUG: 1386097 Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15671 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>