summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht
Commit message (Collapse)AuthorAgeFilesLines
* tier/create: Dynamically allocate gfid memoryMohammed Rafi KC2015-12-291-2/+13
| | | | | | | | | | | | | | | | Currently we are storing the memory as a static pointer. There is a chance to go that variable in out of scope. So we should allocate in Dynamic way. Change-Id: I096876deb8055ac3a44681599591a0a032bc0c24 BUG: 1290677 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13102 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/unlink: symlink failed to unlinkMohammed Rafi KC2015-12-291-1/+5
| | | | | | | | | | | | | | | | | | | | | during unlink of a file, we will get stat just after deleting the file, to see if the file is under migration or not. but this stat call will fail for symlink if the actual file was deleted. So it is better not to send stat request from client if it is a symlink as we are not migrating symlink. Change-Id: Idc033b24fa3522b5261e579889d2195b43419682 BUG: 1293963 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13074 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* quota: limit xattr for subdir not healed on newly added bricksvmallika2015-12-291-1/+19
| | | | | | | | | | | | | | DHT after creating missing directory, tries to heal the xattrs. This xattrs operation fails as INTERNAL FOP key was not set Change-Id: I819d373cf7073da014143d9ada908228ddcd140c BUG: 1294479 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/13100 Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* Tier: "tier start force" command implementationhari gowtham2015-12-222-3/+3
| | | | | | | | | | | | | | | | The start command doesnt restart the tier deamon if the deamon is running at one node. hence to bring up the tierd on the nodes where the deamon is down, the force command is implemented. It skips the check for tierd running. Change-Id: I0037d3e5ecfe56637d0da201a97903c435d26436 BUG: 1292112 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12983 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/dht : Ftruncate on migrating file fails with EINVALN Balachandran2015-12-229-80/+317
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | What: If dht_open is called on a migrating file after the inode_ctx is set, subsequent FOPs on that fd do not open the fd on the dst subvol. This is seen when the open-ftruncate-close sequence is repeatedly called on a migrating file. A second call to the sequence described above causes dht_truncate_cbk to call dht_truncate2 as the dht_inode_ctx was already set by the first call. As dht_rebalance_in_progress_check is not called, the fd is not opened on the dst subvol. On a distributed-replicate volume, this causes AFR to open the fd using afr_fix_open, but with the wrong flags, causing posix_ftruncate to fail with EINVAL. The fix: We require fd specific information to make a decision while handling migrating files. Set the fd_ctx to indicate the fd has been opened on the dst subvol and check if it has been set while processing Phase1/Phase2 checks in the FOP callback functions. Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14 BUG: 1284823 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12985 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier:delete the linkfile if data file creation failsMohammed Rafi KC2015-12-224-82/+288
| | | | | | | | | | | | | | | | | | | | | | | If we are creating data file in a hot subvolume then we will create a linkfile in cold subvolume. Linkfile creation happens first. If linkfile creation was successful and data file creation failed, then linkfile in cold subvolume will become stale. This patch will delete the linkfile as well, if data file creation fails. Also this code duplicates dht_create to make tier_create Change-Id: I377a90dad47f288e9576c7323b23cf694a91a7a3 BUG: 1290677 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12948 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: do not block in synctask created from pause tierDan Lambright2015-12-213-18/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | We had run sleep() in the pause tier callback. Blocking within a synctask is dangerous. The sleep() call does not inform the synctask scheduler that a thread is no longer running. It therefore believes it is running. If a second synctask already exists, it may not be able to run. This occurs if the thread limit in the pool has been reached. Note the pool size only grows when a synctask is created, not when it is moved from wait state to run state, as is the case when an FOP completes. When the tier is paused during migration, synctasks already exist waiting for responses to FOPs to the server with high probability. The fix is to yield() in the RPC callback, which will place the synctask into the wait queue and free up a thread for the FOP callback. A timer wakes the callback after sufficient time has elapsed for the pause to occur. Change-Id: I6a947ee04c6e5649946cb6d8207ba17263a67fc6 BUG: 1267950 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12987 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* tier: Demotion failed if the file was renamed when it was in coldMohammed Rafi KC2015-12-211-11/+26
| | | | | | | | | | | | | | | | | | | | During migration if the file is present we just open the file in hashed subvol. Now if the linkfile present on hashed is just linkfile to another subvol, we actually open in hashed subvol. But subsequent operation will go to linkto subvol ie, to non-hashed subvol. This operation will get failed since we haven't opened d on non-hashed. Change-Id: I9753ad3a48f0384c25509612ba76e7e10645add3 BUG: 1292067 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12980 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/dht : Properly free file descriptors during data migrationJoseph Fernandes2015-12-211-6/+5
| | | | | | | | | | | | | | | While tier migration, free src and dst fd's when create of destination or open of source fails. Change-Id: I62978a669c6c9fbab5fed9df2716b9b2ba00ddf1 BUG: 1291566 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12969 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix tier-max-files bookeeping and helpDan Lambright2015-12-182-2/+1
| | | | | | | | | | | | | | | | | | The option tier-max-files represents the maximum number of files trasferred by a node in a gives cycle. Fix help message to reflect the "per node" aspect. The code transferred one more file per cycle than the given value. Also change the default values of max file and max bytes to very large values, effectively we will not throttle migration unless the administrator requests it via CLI. Change-Id: Ic2949ed3d8c35afe7c9ae4db72195603cfb2e28f BUG: 1292671 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12984 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Handle failure in getxattrSusant Palai2015-12-161-0/+7
| | | | | | | | | | | | | | | | | Problem: Currently even if we have received xattrs from any one of the subvolume, we unwind with error in case the last subvol (which unwinds) received a negative response. To handle the case check if any of the subvolume has received a response and pass it down. Change-Id: Ia12a1f9671a6764f7550e6dc223324b1039fcc51 BUG: 1287539 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12845 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier:unlink during migrationMohammed Rafi KC2015-12-165-75/+382
| | | | | | | | | | | | | | | | | | | | | | files deleted during promotion were not deleting as the files are moving from hashed to non-hashed. On deleting a file that is undergoing promotion, the unlink call is not sent to the dst file as the hashed subvol == cached subvol. This causes the file to reappear once the migration is complete. This patch also fixes a problem with stale linkfile deleting. Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5 BUG: 1282390 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12829 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht/rebalance: Use seperate return variable for destination cleanupJoseph Fernandes2015-12-121-3/+3
| | | | | | | | | | | | | | | Use seperate local return variable for destination cleanup, without messing with the function return variable. Change-Id: Iaea9ed2927234fdb888aef7a31ec362090e98196 BUG: 1290975 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12956 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/dht: files are still going to decommissioned subvolMohammed Rafi KC2015-12-091-3/+27
| | | | | | | | | | | | | | After detach tier start, creates are still going to hot tier. Because when creating data files we are not checking for decommissioned bricks. Change-Id: I8e28258d9b2367dcc8ad6e5e91d0e54d92fdf771 BUG: 1289602 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12914 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier : Spawn promotion or demotion thread depending on local bricksJoseph Fernandes2015-12-091-22/+29
| | | | | | | | | | | | | | | | | | | Spawn Promotion or Demotion depending if there are any Cold or Hot bricks present localy. IF the local HOT brick list is empty dont spawn demote thread. IF the local COLD brick list is empty dont spawn promote thread. Signed-off-by: Joseph Fernandes <josferna@redhat.com> Change-Id: I524730e59414dd156c78ec0bd7a3629212697e6e BUG: 1289578 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12912 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier : Fix double free in tier processN Balachandran2015-12-071-3/+1
| | | | | | | | | | | | | | | | | The tier process tries to free ipc_ctr_params twice if the syncop_ipc call in tier_process_ctr_query fails. ipc_ctr_params is freed when ctr_ipc_in_dict is freed. But ctr_ipc_out_dict is NULL when syncop_ipc fails, causing GF_FREE to be called on a non-NULL ipc_ctr_params ptr again. Change-Id: Ia15f36dfbcd97be5524588beb7caad5cb79efdb4 BUG: 1288995 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12890 Reviewed-by: Joseph Fernandes Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* tier/dht: Fix mem leak from lookup response dictJoseph Fernandes2015-12-061-0/+6
| | | | | | | | | | | | | Fixing memory leak from response dict during a parent lookup to get the path. Change-Id: I60c23d0b25e7f763f0e53c40e71ee053aba6d555 BUG: 1288019 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12867 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes
* tier/tier: Ignoring status of already migrated filesJoseph Fernandes2015-12-031-5/+16
| | | | | | | | | | | | | Ignore the status of already migrated files and in the process don't count. Change-Id: Idba6402508d51a4285ac96742c6edf797ee51b6a BUG: 1276141 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12758 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix loading tier.so into glusterdN Balachandran2015-12-037-41/+72
| | | | | | | | | | | | | | | | | glusterd occasionally loads shared libraries of translators. This failed for tiering due to a reference to dht_methods which is defined as a global variable which is not necessary. The global variable has been removed and this is now a member of dht_conf and is now initialised in the *_init calls. Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953 BUG: 1287842 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12863 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht : changing variable type to avoid overflowSakshi Bansal2015-11-261-7/+9
| | | | | | | | | | | | | | | | | For layout computation we find total size of the cluster and store it in an unsigned 32 bit variable. For large clusters this value may overflow which leads to wrong computations and for some bricks the layout may overflow. Hence using unsigned 64 bit to handle large values. Change-Id: I7c3ba26ea2c4158065ea9e74705a7ede1b6759c7 BUG: 1282751 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12597 Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: readdirp to cold tier onlyDan Lambright2015-11-236-109/+496
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible a file would get migrated in the middle of a readdir operation. If there are four subvolumes A,B,C,D, and if readdir reads them in order and reaches subvol B, then, if a file is moved from D to A, it will not be included in the readdir output. This phenonema has pre-existed in DHT migration but is more apparent in tiering. When a file is moved off the hashed subvolume a T file is created. For tiering, we will make the cold subvolume the hashed subvolume. This will ensure the creation of a T file. Readdir will not skip T files in the tier translator. Making the cold subvolume the hashed subvolume ensures the T files created on promotions or creates will be less likely to fill the volume. Creates still put the data on the hot subvolume. Change-Id: Ifde557d3d0e94a4570ca9f115adee3db2ee75407 BUG: 1281598 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12530 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: Do not delete linkto file on demotionN Balachandran2015-11-182-2/+9
| | | | | | | | | | | | | | | | | The current DHT migration code will always delete the src linkto file after migration as dht always moves files to the hashed subvol. This is not the case in tiering. The lack of linkto files causes rename to fail leaving 2 files with the same name but different gfids on the volume. Modified to leave the linkto file behind if the source volume is the hashed subvolume. Change-Id: I2b99f7d34b4b719aee6232dc40c6a8f8ba88225d BUG: 1279376 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12551 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: loc should store proper gfidSakshi Bansal2015-11-171-0/+1
| | | | | | | | | | Change-Id: Ic1393d44a9ed4aaba23d7c9ddea45977b9dae5e4 BUG: 1281265 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12574 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: set proper errno when hashed subvol is not foundSakshi Bansal2015-11-171-5/+5
| | | | | | | | | | | Change-Id: I0c4c72e2f5a9f8a7c60ef65251c596b54de89479 BUG: 1279705 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12559 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* core: use syscall wrappers instead of direct syscalls - tailKaleb S KEITHLEY2015-11-161-2/+2
| | | | | | | | | | | | | | | | | | | | tail, as in dog chasing its tail. These are the unwrapped syscalls that have crept in (or were missed) in the previous patches. various xlators and other components are invoking system calls directly instead of using the libglusterfs/syscall.[ch] wrappers. If not using the system call wrappers there should be a comment in the source explaining why the wrapper isn't used. Change-Id: If183487de92fc7cbc47d4c5aa3f3e80eae50b84f BUG: 1267967 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12589 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr: replica pair going offline does not require CHILD_MODIFIED eventSakshi Bansal2015-11-161-0/+6
| | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 BUG: 1281230 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12573 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/tier: Disallow detach commit when detach in progressDan Lambright2015-11-162-7/+5
| | | | | | | | | | | | | 1. Check if detach is running, disallow detach commit if so. 2. Cleanup shutdown of tier daemon on detach: do not rerun fix-layout, do not send incorrect status back to glusterd. Change-Id: I97202f748773c1176396a4ffd32a4c7fa9b9c1bc BUG: 1279637 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12560 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* dht: heal directory path if the directory is not presentMohammed Rafi KC2015-11-083-7/+246
| | | | | | | | | | | | | | | | | | | After a successful nameless lookup if the directory is not present on any of the subvol, then we will get the path of the directory and will recursively send a named lookp on each parent directory. This will help particularly for the scenarios like add brick and attach-tier. Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12376 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: update cached subvolume during readdirp cbkMohammed Rafi KC2015-11-081-32/+62
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bb2370514598a99e6ab268af81df57dc16caa2c5. issue and impact: readdirp_cbk was not resetting the layout for files, this causes problem if the files is moved from one cached subvolume and if the layout was not proper, then there is chance to fail entry fops if the fops executed with out a lookup. Because the cached subvolume will not change and the application assumes the presence of file in cached subvol. so it fails with ENOENT. The patch preset the layout information in readdirp cbk for each files in the entry. That leaves the problem the commit bb2370514598a99e6ab268af81df57dc16caa2c5 try to fix. We will fix the problem in a separate patch. Change-Id: I878ec32f44edde2fb9d4f132d9b1b547cde993d9 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12449 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier correct promotion cycle calculationDan Lambright2015-11-071-3/+9
| | | | | | | | | | | | | | | The tier translator should only choose candidate files for promotion from the most recent cycle, not a multiple of the most recent cycles. Otherwise user observed behavior can be inconsistent. Remove related test in tier.t that is subject to race condition. Change-Id: I9ad1523cac00f904097ce468efa6ddd515857024 BUG: 1275524 Signed-off-by: root <root@rhs-cli-15.gdev.lab.eng.bos.redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12480 Reviewed-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tier/libgfdb: Replacing ASCII query file with binaryJoseph Fernandes2015-11-062-194/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Earlier, when the database was queried we used to save all the queried records in an ASCII format in the query file. This caused issues like filename having ASCII delimiter and used to take a lot of space. The tier.c file also had a lot of parsing code. Here we changed the format of the query file to binary. All the logic of serialization and formating of query record is done by libgfdb. Libgfdb provides API, gfdb_write_query_record() and gfdb_read_query_record(), which the user i.e tier migrator and CTR xlator can use to write to and read from query file. With this binary format we save on disk space i.e reduce to 50% atleast as we are saving GFID's in binary format 16 bytes and not the string format which takes 36 bytes + We are not saving path of the file + we are also saving on ASCII delimiters. The on disk format of query record is as follows, +---------------------------------------------------------------------------+ | Length of serialized query record | Serialized Query Record | +---------------------------------------------------------------------------+ 4 bytes Length of serialized query record | | -------------------------------------------------| | | V Serialized Query Record Format: +---------------------------------------------------------------------------+ | GFID | Link count | <LINK INFO> |..... | FOOTER | +---------------------------------------------------------------------------+ 16 B 4 B Link Length 4 B | | | | -----------------------------| | | | | | V | Each <Link Info> will be serialized as | +-----------------------------------------------+ | | PGID | BASE_NAME_LENGTH | BASE_NAME | | +-----------------------------------------------+ | 16 B 4 B BASE_NAME_LENGTH | | | ------------------------------------------------------------------------| | | V FOOTER is a magic number 0xBAADF00D indicating the end of the record. This also serves as a serialized schema validator. Change-Id: I9db7416fd421e118dd44eafab8b535caafe50d5a BUG: 1272207 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12354 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix lookup-unhashed on tiered volumesDan Lambright2015-11-051-2/+1
| | | | | | | | | | | | During attach tier the commit hash must be copied to the hot tier. Change-Id: I91b92fd8e98696993433856e1436409b657c439d BUG: 1277716 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12498 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* tier/dht: Ignoring replica for migration countingJoseph Fernandes2015-11-041-7/+24
| | | | | | | | | | | | | | | | | | | We used to count replica files for migration counting even though they were ignore for migration as the replica brick didnt have the ownership (as per the replication xlator either AFR/EC). As a result the number of files migrated would show a wrong count, i.e each replicated file would be counted 1 + number of replica. This patch ignores such cases. Change-Id: I91aa352ee3b0a5029790653266e9333f3947d0ac BUG: 1276141 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12453 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier : Files skipped during tier query parsingN Balachandran2015-11-031-2/+2
| | | | | | | | | | | | | | | | | The tier query parsing code was using fscanf to read each record. As space is a delimiter for fscanf, filenames containing spaces caused the parsing to return unexpected values causing various issues in the tier process, including crashes due to buffer overflows. Change-Id: Ife602cb7ecb158fccbc2c89e4d2959bd97098a87 BUG: 1276562 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12469 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* quota: add version to quota xattrsvmallika2015-11-021-2/+2
| | | | | | | | | | | | | | | | | | | | | When a quota is disable and the clean-up process terminated without completely cleaning-up the quota xattrs. Now when quota is enabled again, this can mess-up the accounting A version number is suffixed for all quota xattrs and this version number is specific to marker xaltor, i.e when quota xattrs are requested by quotad/client marker will remove the version suffix in the key before sending the response Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb BUG: 1272411 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12386 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht/rebalance: rebalance failure handlingSusant Palai2015-10-292-25/+161
| | | | | | | | | | | | | | | | | | | | | | | | At current state rebalance aborts basically on any failure like fix-layout of a directory, readdirp, opendir etc. Unless it is not a remove-brick process we can ignore these failures. Major impact: Any failure in the gf_defrag_process_dir means there are files left unmigrated in the directory. Fix-layout(setxattr) failure will impact it's child subtree i.e. the child subtree will not be rebalanced. Settle-hash (commit-hash)failure will trigger lookup_everywhere for immediate children until the next commit-hash. Note: Remove-brick opertaion is still sensitive to any kind of failure. Change-Id: I08ab71909bc832f03cc1517172525376f7aed14a BUG: 1257076 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12013 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* core: use syscall wrappers instead of direct syscalls - miscellaneousKaleb S. KEITHLEY2015-10-282-6/+8
| | | | | | | | | | | | | | | various xlators and other components are invoking system calls directly instead of using the libglusterfs/syscall.[ch] wrappers. If not using the system call wrappers there should be a comment in the source explaining why the wrapper isn't used. Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c BUG: 1267967 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12276 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/tier do not log error message on lookup heal for files on hot tierDan Lambright2015-10-282-17/+25
| | | | | | | | | | | | | | | | | | On fix-layout heal files are scanned. Files found are exist on the hot or cold subvolume. Those not found in the cold tier would exist on the hot. They should not be flagged as an error. Replace INFO with TRACE for common tier migration logs. Frequent migration was growing the log files too quickly. On migratation failures, do not acrue files towards cycle limit's budget. Change-Id: Ie832ee07c43bce5477ae81c939d1fe8416a11615 BUG: 1275383 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12430 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes
* cluster/tier: add pause tier for snapshotsDan Lambright2015-10-216-7/+227
| | | | | | | | | | | | | | | | | | | | Snaps of tiered volumes cannot handle files undergoing migration. We implement a helper mechanism to "pause" migration. Any files undergoing migration are aborted. Clean up is done to remove sticky bits and data at the destination. Migration is restarted after snap completes. For testing an internal switch is added. It is not exposed externally. gluster volume set vol1 tier-pause [true|false] Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799 BUG: 1267950 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12304 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht : op_ret not set correctly in dht_fsync_cbkN Balachandran2015-10-211-1/+1
| | | | | | | | | | | | | local->op_ret was not set correctly in dht_fsync_cbk in case of files being migrated Change-Id: If73ae04368ea0c7f6868c8704dfc2deb2faee753 BUG: 1273372 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12401 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/tier do not abort migration if a single brick is downDan Lambright2015-10-201-12/+12
| | | | | | | | | | | | | | When a bricks are down, promotion/demotion should still be possible. For example, if an EC brick is down, the other bricks are able to recover the data and migrate it. Change-Id: I8e650c640bce22a3ad23d75c363fbb9fd027d705 BUG: 1273215 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12397 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes
* cluster/tier remove suprious log messages on valid failed migrationDan Lambright2015-10-191-1/+8
| | | | | | | | | | | | | | On a write to a replica volume, we record in all brick's databases an entry. When the tier daemon runs, it will only move the file if it is the true owner of the file as defined by the XATTR_NODE_UUID_KEY. Change-Id: Ib82717f87a3f94f3d0d9f969773de9e88d6aaf22 BUG: 1273043 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12391 Reviewed-by: Joseph Fernandes Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht : Do not migrate files with POSIX locks heldN Balachandran2015-10-151-10/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dht_migrate_file does not migrate file locks to the dst file. Any locks held on the source file are lost once the migration is complete. This issue is magnified in the case of a tier volume as file migrations occur more frequently and repeatedly as compared to a DHT rebalance. The fix makes 2 changes: 1. Before starting the actual migration process, check if there are any locks held on the file. If yes, do not migrate the file. 2. The rebalance process tries to lock on the entire file just before moving into the Phase 2 of the file migration. If the lock acquisition fails, the file migration does not proceed. If the lock is granted, the file migration proceeds. This still leaves a small window where conflicting locks can be granted to different clients. If client1 requests a lock on the src file just after it is converted to a linkto file and client2 requests a lock on the dst data file, they will both be granted, but all FOPs will be redirected to the dst data file. This issue will be taken up in a subsequent patch. Change-Id: I8c895fc3cced50dd2894259d40a827c7b43d58ac BUG: 1271148 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12347 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/ctr: CTR DB named lookup heal of cold tier during attach tierJoseph Fernandes2015-10-101-2/+128
| | | | | | | | | | | | | | | | | | | | | Heal hardlink in the db for already existing data in the cold tier during attach tier. i.e during fix layout do lookup to files in the cold tier. CTR xlator on the brick/server side does db update/insert of the hardlink on a namelookup. Currently the namedlookup is done synchronous to the fixlayout that is triggered by attach tier. This is not performant, adding more time to fixlayout. The performant approach is record the hardlinks on a compressed datastore and then do the namelookup asynchronously later, giving the ctr db eventual consistency Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9 BUG: 1252586 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: add watermarks and policy driverDan Lambright2015-10-105-100/+456
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix introduces infrastructure to support different policies for promotion and demotion. Currently the tier feature automatically promotes and demotes files periodically based on access. This is good for testing but too stringent for most real workloads. It makes it difficult to fully utilize a hot tier- data will be demoted before it is touched- its unlikely a 100GB hot SSD will have all its data touched in a window of time. A new parameter "mode" allows the user to pick promotion/demotion polcies. The "test mode" will be used for *.t and other general testing. This is the current mechanism. The "cache mode" introduces watermarks. The watermarks represent levels of data residing on the hot tier. "cache mode" policy: The % the hot tier is full is called P. Do not promote or demote more than D MB or F files. A random number [0-100] is called R. Rules for migration: if (P < watermark_low) don't demote, always promote. if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P. if (P > watermark_hi) always demote, don't promote. gluster volume set {vol} cluster.watermark-hi % gluster volume set {vol} cluster.watermark-low % gluster volume set {vol} cluster.tier-max-mb {D} gluster volume set {vol} cluster.tier-max-files {F} gluster volume set {vol} cluster.tier-mode {test|cache} Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2 BUG: 1257911 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12039 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tier/ctr: Solution for db locks for tier migrator and ctr using sqlite ↵Joseph Fernandes2015-10-083-16/+297
| | | | | | | | | | | | | | | | | | | | | | | | | | | version less than 3.7 i.e rhel 6.7 Problem: On RHEL 6.7, we have sqlite version 3.6.2 which doesnt support WAL journaling mode, as this journaling mode is only available in sqlite 3.7 and above. As a result we cannot have to progreses concurrently accessing sqlite, without running into db locks! Well WAL is also need for performace on CTR side. Solution: This solution is to use CTR db connection for doing queries when WAL mode is absent. i,e tier migrator will send sync_op ipc calls to CTR, which in turn will do the query and create/update the query file suggested by tier migrator. Pending: Well this solution will stop the db locks but the performance is still an issue for CTR. We are developing an in-Memory Transaction Log (iMeTaL) which will help boost the CTR performance by doing in memory udpates on the IO path and later flush the updates to the db in a batch/segment flush. Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124 BUG: 1240577 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12191 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Luis Pabon <lpabon@redhat.com>
* dht/rebalance: fix layout and dict leaksSusant Palai2015-10-062-0/+11
| | | | | | | | | Change-Id: Ib3911dfa1f950ff9decbe249ad798e97226dd06d BUG: 1266877 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12295 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/rebalance: fix mem-leak in migration code pathSusant Palai2015-10-011-5/+21
| | | | | | | | | Change-Id: I37faf983fc02996541f3d96a17cb2a2c2cdb6781 BUG: 1266877 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12235 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd, dht: volume set for use-readdirp in dhtPranith Kumar K2015-10-011-0/+3
| | | | | | | | | | | | Change-Id: Icab246b1d02808864d878d949fa56f9f889b538a BUG: 1265677 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12221 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S. KEITHLEY2015-09-245-5/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi. As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files. To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...` --- on linux anyway) and only export the minimum required symbols from the xlator sharedlib. N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation. Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 1248669 Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c Reviewed-on: http://review.gluster.org/11814 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>