path: root/xlators/cluster/dht/src
Commit message | Author | Age | Files | Lines
...
* cluster/tier: make attach/detach work with new rebalance logic | Dan Lambright | 2015-09-02 | 2 | -25/+31
  This is a backport of 10795.

  > The new rebalance performance improvements added new data structures which were not initialized in the tier case. Function dht_find_local_subvol_cbk() needs to accept a list built by lower level DHT translators in order to build the local subvolumes list.
  >
  > Change-Id: Iab03fc8e7fadc22debc08cd5bc781b9e3e270497
  > BUG: 1222088
  > Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  > Reviewed-on: http://review.gluster.org/10795
  > Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>

  Change-Id: Icbd51c96ae4d367d1edf41cdd0edb35095195699
  BUG: 1259079
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12085
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: maintain start state of rebalance daemon across graph switch. | Dan Lambright | 2015-09-02 | 1 | -3/+12
  This is a backport of fix 10977.

  > When we did a graph switch on a rebalance daemon, a second call to gf_defrag_start() was done. This led to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.
  >
  > Change-Id: I931ca7fe600022f245e3dccaabb1ad004f732c56
  > BUG: 1226005

  Change-Id: I163d2d04692eba36c986ea9835f588962c92b93f
  BUG: 1259078
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12082
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
* cluster/tier: account for reordered layouts | Dan Lambright | 2015-09-02 | 2 | -14/+32
  This is a backport of 11092.

  > For a tiered volume the cold subvolume is always at a fixed position in the graph. DHT's layout array, on the other hand, may have the cold subvolume in either the first or second index, therefore code cannot make any assumptions. The fix searches the layout for the correct position dynamically rather than statically.
  >
  > The bug manifested itself in NFS, in which a newly attached subvolume had not received an existing directory. This case is a "stale entry" and marked as such in the layout for that directory. The code did not see this, because it looked at the wrong index in the layout array.
  >
  > The fix also adds the check for decommissioned bricks, and fixes a problem in detach tier related to starting the rebalance process: we never received the right defrag command and it did not get directed to the tier translator.
  >
  > Change-Id: I77cdf9fbb0a777640c98003188565a79be9d0b56
  > BUG: 1214289
  > Signed-off-by: Dan Lambright <dlambrig@redhat.com>

  Change-Id: Idb2eec9ba25812f41de7f960a0314c92341d6b5d
  BUG: 1259081
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12086
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
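The dynamic search this commit describes can be sketched as follows. This is a minimal Python model and not the DHT code itself: `LayoutEntry`, `find_layout_entry`, and the subvolume names are hypothetical, and using `errno.ESTALE` as the stale marker is an assumption made for illustration.

```python
import errno
from dataclasses import dataclass


@dataclass
class LayoutEntry:
    subvol: str  # name of the subvolume this layout slot describes
    err: int     # 0 when healthy; non-zero marks a problem (assumed: ESTALE for a stale entry)


def find_layout_entry(layout, subvol):
    """Search the layout for `subvol` instead of assuming a fixed index."""
    for entry in layout:
        if entry.subvol == subvol:
            return entry
    return None


# The cold subvolume may sit at index 0 or index 1 of the layout array.
layout_a = [LayoutEntry("cold", 0), LayoutEntry("hot", errno.ESTALE)]
layout_b = [LayoutEntry("hot", errno.ESTALE), LayoutEntry("cold", 0)]

for layout in (layout_a, layout_b):
    hot = find_layout_entry(layout, "hot")
    # The stale entry is found regardless of where the hot subvolume sits.
    print(hot.subvol, hot.err == errno.ESTALE)
```

Because the lookup is by name, swapping the two entries does not change the result, whereas reading a hard-coded index would inspect the wrong subvolume in one of the two layouts.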
* cluster/dht: Don't set posix acls on linkto files | Nithya Balachandran | 2015-08-31 | 1 | -0/+34
  Posix acls on a linkto file change the file's permission bits and cause DHT to treat it as a non-linkto file. This happens on the migration failure of a file on which posix acls were set.

  The fix prevents posix acls from being set on a linkto file and copies them across only after a file has been successfully migrated.

  Change-Id: Iccf7ff6fba49fe05d691d9b83bf76a240848b212
  BUG: 1258377
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12025
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12062
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: avoid mknod on decommissioned brick | Susant Palai | 2015-08-27 | 2 | -35/+334
  BUG: 1256702
  Change-Id: I0795720cb77a9c77e608f34fbb69574fd2acb542
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/11998
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12024
* dht: block/handle create op falling to decommissioned brick | Susant Palai | 2015-08-26 | 5 | -56/+455
  Problem: From remove-brick start until the commit phase, the client layout may not be in sync with the disk layout because of a lack of lookups. Hence, a create call may fall on the decommissioned brick.

  Solution: Acquire a lock on the hashed subvol, so that a fix-layout or selfheal cannot step on the layout while it is being read. Even if we read a layout before the remove-brick fix-layout and the file falls on the decommissioned brick, the file should be migrated to a new brick as per the fix-layout.

  BUG: 1256283
  Change-Id: I3ef1adaf20dfb9524396a3648d1a664464eda8c1
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/11260
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12001
* dht/tiering : create new dictionary during migration | Mohammed Rafi KC | 2015-08-19 | 1 | -2/+10
  To avoid setting a wrong xattr while creating the link file.

  Back port of:
  > Change-Id: Iad8de3521eae17e510035ed42e3e01933d647096
  > BUG: 1250828
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  > Reviewed-on: http://review.gluster.org/11838
  > Reviewed-by: N Balachandran <nbalacha@redhat.com>
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

  (cherry picked from commit a3faffb259d5288907fac33a2822a8f61c3e86fe)

  Change-Id: I76ef168cd881c8fd828283a1ae70ed251fc44aaa
  BUG: 1254438
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11945
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht/tier :rename fails with EBUSY | Mohammed Rafi KC | 2015-08-19 | 1 | -8/+20
  When the file was in the hot tier and the lookup was already done, the hashed and cached subvolume will be the hot tier. Once the file is moved from the hot tier to the cold tier, a subsequent lookup will send a revalidate lookup to the hot tier, and it will find out that the file was actually moved and there is only a link in the cached subvolume. So dht will return ESTALE to fuse.

  Upon receiving ESTALE for a lookup, fuse will create a new inode and send a fresh lookup. This lookup will be successful, and it will locate the file properly. Then fuse tries to link the inode, but the older inode is already there in the in-memory inode cache with the same gfid, and it is also shared with the fuse kernel. So inode_link will return the older inode itself.

  So the subsequent rename fop will come to gluster with the older inode. From dht_rename, we will take a lock on the inode, and after a successful inodelk on the inode dht will send a lookup before creating a link. This lookup will again find out that the file is a link file, and then dht will think that the file is migrating/migrated in the meantime, and will send EBUSY.

  Back port of:
  > Change-Id: Ib3a01e5b1d7f64514b04bb6234026d049f082679
  > BUG: 1248306
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  > Reviewed-on: http://review.gluster.org/11768
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  > Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  > Tested-by: Dan Lambright <dlambrig@redhat.com>

  (cherry picked from commit 0ad26041fbf65ab36856a0ad178c32e51bf87319)

  Change-Id: I1278a2c2ccc2cadcbe147db836f0526f079f6038
  BUG: 1254437
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11944
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier : Use dht_* versions for xlator_fops | N Balachandran | 2015-08-19 | 1 | -16/+28
  The tier xlator was using the default_* versions for some xlator_fops. Changed to use the dht_* versions for all xlator_fops.

  Change-Id: I8252fb3911b8a48a55e9eee42b89bd66bbacf799
  BUG: 1254468
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  (cherry picked from commit 0c20107a60726804030f98a7f79b94c677e6a7b6)
  Reviewed-on: http://review.gluster.org/11951
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix demotion when cold tier is EC | Dan Lambright | 2015-08-12 | 1 | -0/+2
  This is a backport of 11855.

  We did not set the gfid in the loc structure in tier demotion. EC has a sanity check which fails FOPs when the loc gfid does not match the file attribute. When the FOP failed, demotion was aborted.

  > Change-Id: I69022c9ccb135b86e1feea93b01801b6a4100509
  > BUG: 1251121
  > Signed-off-by: Dan Lambright <dlambrig@redhat.com>

  Change-Id: I266d554e3e0a2ff024a5ba3a7e9ca40866688eae
  BUG: 1252907
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11901
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: Adding log messages to the new logging framework | arao | 2015-07-27 | 15 | -335/+961
  Backported from: http://review.gluster.org/10021

  > Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f
  > BUG: 1194640
  > Reviewed-on: http://review.gluster.org/10021
  > Reviewed-by: N Balachandran <nbalacha@redhat.com>
  > Reviewed-by: Joseph Fernandes <josferna@redhat.com>
  > Tested-by: Joseph Fernandes <josferna@redhat.com>
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  > Tested-by: Raghavendra G <rgowdapp@redhat.com>
  > Signed-off-by: arao <arao@redhat.com>

  BUG: 1217722
  Change-Id: Ide79c6c1e6a466fb52f955c90a2b22711bec794a
  Signed-off-by: arao <arao@redhat.com>
  Signed-off-by: Anusha Rao <arao@redhat.com>
  Reviewed-on: http://review.gluster.org/11350
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: send lookup even for fd based operations during rebalance | Ravishankar N | 2015-07-24 | 1 | -22/+30
  Backport of http://review.gluster.org/11713

  Problem: dht_rebalance_inprogress_task() was not sending lookups to the destination subvolume for a file undergoing writes during rebalance. Due to this, afr was not able to populate the read_subvol and failed the write with EIO.

  Fix: Send lookup for fd based operations as well.

  Thanks to Raghavendra G for helping with the RCA.

  Change-Id: Iaa427666328109bbdf228876e62c13b75b7df88e
  BUG: 1245934
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11744
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr | Krutika Dhananjay | 2015-07-21 | 1 | -2/+2
  Backport of: http://review.gluster.org/11467

  Change-Id: I9effecbb1296d11cf1629b5e5cc38192f84cfcb3
  BUG: 1243655
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/11689
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* syncop: Include iatt to 'syncop_link' args | Soumya Koduri | 2015-07-15 | 1 | -1/+1
  Include iatt in the 'syncop_link' args to fetch proper attributes of the newly linked inode.

  This is a backport of the fix below:
  http://review.gluster.org/11611

  Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492
  BUG: 1243408
  Reviewed-on: http://review.gluster.org/11611
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
  Reviewed-on: http://review.gluster.org/11677
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/tier: fixes for migration over ec as cold tier | Dan Lambright | 2015-07-14 | 2 | -3/+9
  This is a backport of fix 11433.

  > An opendir is done in rebalance. The graph constructed when EC is used in tiering may have no local volumes (if all the hot volumes are on one node and all the others on another node). Previously the opendir only sent fops down the local subvolumes for migration. They must be sent down both the hot and cold subvolumes for tiering.
  >
  > When setxattr2() received a NULL subvolume, it dereferenced an uninitialized variable.
  >
  > When a lookup is done during creation of the destination file, the xattr dict is "polluted" with virtual xattrs. These cause subsequent xattrs in the new file to not be written by posix. They are required by EC.
  >
  > The inode gfid for "entry_loc" in gf_defrag_migrate_single_file() was not initialized. This made underlying translators think the gfid was 0, and failed migration.
  >
  > Change-Id: I6ccda8ca8e43485b9b354341bbfcb302496f632c
  > BUG: 1236212
  > Signed-off-by: Dan Lambright <dlambrig@redhat.com>

  Change-Id: I9b26725e055eecfec235c4291ee90b0e53d0ea62
  BUG: 1242274
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11652
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Fix Null pointer dereference while logging | Raghavendra G | 2015-07-06 | 1 | -8/+8
  Change-Id: I1ea358b83267b0bcdf654ce18fe881fd4a6bf08d
  BUG: 1233158
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/11314
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: use refcount to manage memory used to store migration | Raghavendra G | 2015-07-01 | 2 | -21/+46
  information.

  Without refcounting, we might free up memory while other fops are still accessing it.

  BUG: 1235928
  Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/11419
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
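The idea of reference-counting shared migration information can be sketched as below. This is a hedged Python model, not the DHT implementation: the `MigrationInfo` class, its `ref`/`unref` methods, and the `freed` flag are hypothetical names, and the flag merely stands in for the point at which C code would release the memory.

```python
import threading


class MigrationInfo:
    """Hypothetical holder for migration info shared by concurrent fops."""

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst
        self.freed = False           # stands in for the memory being released
        self._refs = 1               # the creator holds the first reference
        self._lock = threading.Lock()

    def ref(self):
        # A fop takes a reference before it starts using the info.
        with self._lock:
            self._refs += 1
        return self

    def unref(self):
        # The info is released only when the last holder drops its reference.
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.freed = True


info = MigrationInfo("subvol-1", "subvol-2")
in_flight = info.ref()   # a fop starts using the info
info.unref()             # the owner resets the inode ctx and drops its reference
still_valid = not info.freed
in_flight.unref()        # the fop finishes; only now is the info released
```

The point of the pattern is the ordering: the owner dropping its reference does not invalidate the data for a fop that is still in flight.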
* tier/ctr: Ignore creation of T file and Ctr Lookup heal improvements | Dan Lambright | 2015-06-27 | 1 | -3/+2
  This is a back port of 11334.

  1) Ignore creation of T file in ctr_mknod
  2) Ignore lookup for T file in ctr_lookup
  3) Ctr_lookup:
     a. If the gfid and pgfid are empty, don't record
     b. Decreased log level for multiple heal attempts
     c. Inode/File heal happens after an expiry period, which is configurable.
     d. Hardlink heal happens after an expiry period, which is configurable.

  > Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452
  > BUG: 1235269
  > Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  > Reviewed-on: http://review.gluster.org/11334
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Tested-by: Dan Lambright <dlambrig@redhat.com>

  Change-Id: Ia28a5cf975e41d318906f707deca447aaa35630f
  BUG: 1236288
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11446
  Reviewed-by: Joseph Fernandes
  Tested-by: Joseph Fernandes
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tier/dht: Fixing non atomic promotion/demotion w.r.t to frequency period | Joseph Fernandes | 2015-06-27 | 1 | -40/+59
  This fixes the ping-pong issue, i.e. files getting demoted immediately after promotion, caused by out-of-sync promotion/demotion processes. The solution is to do promotion/demotion referring to the system time. For the fix to work, all the file serving nodes should have their system time synchronized with each other, either manually or using an NTP server.

  NOTE: The ping-pong issue can re-appear even with this fix if the admin has set a different promotion frequency period and demotion frequency period, but this would be under the control of the admin.

  Backport of http://review.gluster.org/#/c/11110/ to 3.7.x:
  > Change-Id: I1b33a5881d0cac143662ddb48e5b7b653aeb1271
  > BUG: 1218717
  > Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  > Reviewed-on: http://review.gluster.org/11110
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Tested-by: Dan Lambright <dlambrig@redhat.com>
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>

  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Change-Id: I81bd1d677487ebc0fc46df4980500102571de68e
  BUG: 1230857
  Reviewed-on: http://review.gluster.org/11191
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
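Why referring to the system time removes the drift between nodes can be illustrated with a small sketch. It is only a model under stated assumptions: the helper names `relative_ticks` and `aligned_ticks` are hypothetical, times are whole seconds, and "referring to the system time" is interpreted here as firing on wall-clock multiples of the frequency period, which the commit message does not spell out.

```python
def relative_ticks(start, freq, count):
    """Cycle times when each daemon counts from its own start time."""
    return [start + freq * i for i in range(1, count + 1)]


def aligned_ticks(start, freq, count):
    """Cycle times when daemons fire on wall-clock multiples of freq."""
    first = (start // freq + 1) * freq  # first multiple strictly after start
    return [first + freq * i for i in range(count)]


FREQ = 120  # assumed: the same period for promotion and demotion

# Two daemons started 50 seconds apart.
promote_rel = relative_ticks(0, FREQ, 2)    # fires at 120, 240
demote_rel = relative_ticks(50, FREQ, 2)    # fires at 170, 290: out of step

promote_abs = aligned_ticks(0, FREQ, 2)     # fires at 120, 240
demote_abs = aligned_ticks(50, FREQ, 2)     # fires at 120, 240: in step
```

With relative timers, a file promoted at 120 can be picked up by a demotion cycle at 170. With wall-clock alignment and equal periods the cycles coincide, which is also why the note about differing promotion and demotion periods applies.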
* tiering/rebalance: tier daemon stopped without updating status | Mohammed Rafi KC | 2015-06-26 | 1 | -3/+0
  When a subvol goes down, the tier daemon stops immediately, and the status still shows as "Progressing".

  With this change, when a subvol goes offline the tier xlator will update the status to failed.

  Back port of:
  > Change-Id: I9f722ed0d35cda8c7fc1a7e75af52222e2d0fdb7
  > BUG: 1227803
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  > Reviewed-on: http://review.gluster.org/11068
  > Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  > Tested-by: Dan Lambright <dlambrig@redhat.com>

  (cherry picked from commit d3714f252d91f4d1d5df05c4dcc8bc7c2ee75326)

  Change-Id: I268b7e2631779d15db5d9aec495dfd00ded94c5c
  BUG: 1235203
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11414
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* tiering:static function called from a non static inline function | Mohammed Rafi KC | 2015-06-22 | 1 | -2/+2
  Backport of http://review.gluster.org/#/c/11032/

  gcc v5.1.1 throws a warning for calling a static function from a non-static inline function.

  <snippet from compiler warning>
    CC tier.lo
  tier.c:610:15: warning: 'tier_migrate_using_query_file' is static but used in inline function 'tier_migrate_files_using_qfile' which is not static
    ret = tier_migrate_using_query_file ((void *)query_cbk_args);
    ^
  tier.c:585:47: warning: 'tier_process_brick_cbk' is static but used in inline function 'tier_build_migration_qfile' which is not static
    ret = dict_foreach (args->brick_list, tier_process_brick_cbk,
    ^
  tier.c:565:176: warning: 'demotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:565:158: warning: 'promotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:563:58: warning: 'demotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:563:40: warning: 'promotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
    ret = remove (GET_QFILE_PATH (is_promotion));
    ^
    CCLD tier.la
  </snip>

  Change-Id: I46046feeb79ab4e2724b0ba6b02c9ec8b121ff4e
  BUG: 1231767
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11232
  Reviewed-by: Joseph Fernandes
  Tested-by: Joseph Fernandes
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht : Error value check before performing rebalance complete | Sakshi | 2015-06-22 | 1 | -5/+13
  Backport of http://review.gluster.org/#/c/11097/

  > Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95
  > BUG: 1188242
  > Signed-off-by: Sakshi <sabansal@redhat.com>

  Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95
  BUG: 1233632
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/11329
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Prevent use after free bug | Pranith Kumar K | 2015-06-18 | 1 | -1/+3
  Backport of http://review.gluster.org/11209

  BUG: 1233042
  Change-Id: If3685c9ed84a6720d8696d11773005e9786b503f
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/11305
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tier/volume set: Validate volume set option for tier | Mohammed Rafi KC | 2015-06-16 | 1 | -0/+6
  Volume set options related to tiering can only be set on a tier volume. Also, currently every volume set option for tier accepts a non-negative integer. This patch validates both conditions.

  Back port of:
  > Change-Id: I3611af048ff4ab193544058cace8db205ea92336
  > BUG: 1216960
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  > Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  > Reviewed-on: http://review.gluster.org/10751
  > Tested-by: Gluster Build System <jenkins@build.gluster.com>
  > Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  > Reviewed-by: Joseph Fernandes

  (cherry picked from commit f6a062044a3447bea5bf0fcf21a3f85c00fb6c7d)

  Change-Id: Ic6081f0ce7ae7effac69ba192bd35c8d382a11d5
  BUG: 1230560
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11173
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Joseph Fernandes
  Tested-by: Joseph Fernandes
* features/changelog: Avoid setattr fop logging during rename | Saravanakumar Arumugam | 2015-06-11 | 3 | -23/+34
  Problem: When a file is renamed and the renamed file's hash falls on a different brick, DHT creates a special file (linkto file) on that brick (the hashed subvolume) and carries out a setattr operation on that file.

  Currently, changelog records this setattr operation in the hashed subvolume. glusterfind in turn records it as a MODIFY operation.

  So, there is a NEW entry in the cached subvolume and a MODIFY entry in the hashed subvolume for the same file.

  Solution: Avoid logging the setattr operation by marking it as an internal fop using xdata. In the changelog translator, check whether the setattr is marked as an internal fop and skip it accordingly.

  Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd
  BUG: 1230687
  Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
  Reviewed-on: http://review.gluster.org/11183
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht: Add lookup-optimize configuration option for DHT | Shyam | 2015-06-05 | 4 | -16/+66
  Currently with commit 4eaaf5 a mixed version cluster would have issues if lookup-unhashed is set to auto, as older clients would fail to validate the layouts if newer clients (i.e. 3.7 or upwards) create directories. Also, in a mixed version cluster the rebalance daemon would set the commit hash for some subvolumes and not for the others.

  This commit fixes this problem by moving the enabling of the functionality introduced in the above mentioned commit to a new dht option. This option also has an op_version of 3_7_1, thereby preventing it from being set in a mixed version cluster. It brings in the following changes:
  - The option can be set only if the min version of the cluster is 3.7.1 or more
  - Rebalance and mkdir update the layout with the commit hashes only if this option is set, hence ensuring rebalance works in a mixed version cluster, and also directories created by newer clients do not cause layout errors when read by older clients
  - This option also supersedes lookup-unhashed, to make enabling the lookup optimization more deterministic and not conflict with lookup-unhashed settings.

  Option added is cluster.lookup-optimize, which is a boolean.

  Usage: # gluster volume set VOLNAME cluster.lookup-optimize on

  Change-Id: Ifd1d4ce3f6438fcbcd60ffbfdbfb647355ea1ae0
  BUG: 1225940
  Signed-off-by: Shyam <srangana@redhat.com>
  Reviewed-on: http://review.gluster.org/10976
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: fix incorrect dst subvol info in inode_ctx | Nithya Balachandran | 2015-06-04 | 6 | -68/+156
  Stashing additional information in the inode_ctx to help decide whether the migration information is stale, which could happen if a file was migrated several times but FOPs only detected the P1 migration phase. If no FOP detects the P2 phase, the inode ctx1 is never reset.

  We now save the src subvol as well as the dst subvol in the inode ctx. The src subvol is the subvol on which the FOP was sent when the mig info was set in the inode ctx. This information is considered stale if:
  1. The subvol on which the current FOP is sent is the same as the dst subvol in the ctx
  2. The subvol on which the current FOP is sent is not the same as the src subvol in the ctx

  This does not handle the case where the same file might have been renamed such that the src subvol is the same but the dst subvol is different. However, that is unlikely to happen very often.

  Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f
  BUG: 1225809
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/10967
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
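The two staleness conditions listed in this commit reduce to a single predicate. The sketch below is an illustrative Python rendering only: the function name `mig_info_is_stale` and the string subvolume names are hypothetical, and the real check operates on subvolume pointers stored in the inode ctx.

```python
def mig_info_is_stale(fop_subvol, ctx_src, ctx_dst):
    """Return True when the stored migration info no longer applies.

    Stale if the current FOP goes to the stored dst subvol (condition 1)
    or to anything other than the stored src subvol (condition 2).
    """
    return fop_subvol == ctx_dst or fop_subvol != ctx_src


# Migration info recorded while a FOP was sent to s1 and the file moved to s2.
fresh = mig_info_is_stale("s1", ctx_src="s1", ctx_dst="s2")      # still on src
at_dst = mig_info_is_stale("s2", ctx_src="s1", ctx_dst="s2")     # condition 1
elsewhere = mig_info_is_stale("s3", ctx_src="s1", ctx_dst="s2")  # condition 2
```

As the commit message notes, a case where the src subvol matches but the real destination has changed is not caught by this predicate.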
* dht/rebalance : Fixed rebalance failure | Nithya Balachandran | 2015-06-04 | 2 | -3/+17
  The rebalance process determines the local subvols for the node it is running on and only acts on files in those subvols. If a dist-rep or dist-disperse volume is created on 2 nodes by dividing the bricks equally across the nodes, one process might determine it has no local_subvols. When trying to update the commit hash, the function attempts to lock all local subvols. On the node with no local_subvols the dht inode lock operation fails, in turn causing the rebalance to fail.

  In a dist-rep volume with 2 nodes, if brick 0 of each replica set is on node1 and brick 1 is on node2, node2 will find that it has no local subvols.

  Change-Id: I7d73b5b4bf1c822eae6df2e6f79bd6a1606f4d1c
  BUG: 1221656
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on-master: http://review.gluster.org/10786
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/10788
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fix dht_setxattr to follow files under migration | Nithya Balachandran | 2015-06-04 | 1 | -17/+346
  If a file is under migration, then any xattrs created on it are lost post migration of the file. This is because the xattrs are set only on the cached subvol of the source and as the source is under migration, it becomes a linkto file post migration.

  Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0
  BUG: 1225839
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/10968
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: pass a destination subvol to fop2 variants to avoid races. | Raghavendra G | 2015-06-04 | 5 | -134/+193
  The destination subvol used in the fop2 variants is either stored in inode-ctx1 or local->cached_subvol. However, it is not guaranteed that a value stored in these locations before invocation of fop2 is still present after the invocation as these locations are shared among different concurrent operations. So, to preserve the atomicity of "check dst-subvol and invoke fop2 variant if dst-subvol found", we pass down the dst-subvol to fop2 variant.

  This patch also fixes error handling in some fop2 variants.

  Change-Id: Icc226228a246d3f223e3463519736c4495b364d2
  BUG: 1225809
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/10966
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Don't rely on linkto xattr to find destination subvol | Raghavendra G | 2015-06-03 | 1 | -101/+31
  during phase 2 of migration.

  The linkto xattr on the source file cannot be relied on to find where the data file currently resides. This can happen if there are multiple migrations before phase 2 detection by a client. For example:

  * migration (M1, node1, node2) starts.
  * application writes some data. DHT correctly stores the state in inode context that phase-1 of migration is in progress
  * migration M1 completes
  * migration (M2, node2, node3) is triggered and completed
  * application resumes writes to the file. DHT identifies it as phase-2 of migration. However, linkto xattr on node1 points to node2, but the file is on node3. A lookup correctly identifies node3 as cached subvol

  TBD: When we identify phase-2 of a previous migration (say M1), there might be a migration in progress, say (M3, node3, node4). In this case we need to send writes to both (node3, node4), not just node3. Also, the inode state needs to correctly indicate that it is in phase-1 of migration. I'll send this as a different patch.

  Change-Id: I1a861f766258170af2f6c0935468edb6be687b95
  BUG: 1225809
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/10805
  Reviewed-on: http://review.gluster.org/10965
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
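The M1/M2 scenario in this commit can be replayed with a toy model. The sketch below is a simplification under stated assumptions: the `bricks` dictionary, `migrate`, and `lookup_cached_subvol` are hypothetical stand-ins, each node holds either the data file or a linkto pointer, and a lookup is modeled as simply finding the node that holds the data.

```python
def migrate(bricks, src, dst):
    """Move the data file from src to dst; src is left as a linkto file."""
    bricks[dst] = {"data": True, "linkto": None}
    bricks[src] = {"data": False, "linkto": dst}


def lookup_cached_subvol(bricks):
    """Model of a lookup: the cached subvol is wherever the data lives."""
    for node, entry in bricks.items():
        if entry["data"]:
            return node
    return None


bricks = {"node1": {"data": True, "linkto": None}}

migrate(bricks, "node1", "node2")   # M1
migrate(bricks, "node2", "node3")   # M2, completed before the client notices M1

stale_hint = bricks["node1"]["linkto"]   # still names node2
actual = lookup_cached_subvol(bricks)    # node3
```

Following the linkto xattr on node1 leads to node2, which no longer holds the data, while the lookup lands on node3. That gap is why the destination is taken from a lookup rather than from the xattr.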
* dht/rebalance: Change log_level to DEBUG | Susant Palai | 2015-06-01 | 1 | -2/+3
| | | | | | | | | | | BUG: 1221503 Change-Id: I4ac87bb69e05e2dd445fc0a5bcf48d5bd3d0020b Reviewed-on: http://review.gluster.org/10756 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10778
* cluster/tier: load libgfdb.so properly in all cases | Niels de Vos | 2015-06-01 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | We should load libgfdb.so.0, not libgfdb.so Cherry picked from commit 628406f28364f6019261a3bb37335a494ccf8dda: > Change-Id: I7a0d64018ccd9893b1685de391e99b5392bd1879 > BUG: 1222092 > Signed-off-by: Dan Lambright <dlambrig@redhat.com> > Reviewed-on: http://review.gluster.org/10796 > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> > Reviewed-by: Joseph Fernandes > Reviewed-by: Niels de Vos <ndevos@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I7a0d64018ccd9893b1685de391e99b5392bd1879 BUG: 1221534 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10799 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
* tiering/rebalance: Use separate pid/socket file for tiering | Mohammed Rafi KC | 2015-05-31 | 1 | -2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Back port of http://review.gluster.org/10792 When promotion/demotion daemon starts, it uses the same pidfile as rebalance. This patch will introduce a different pid file for the same. >Change-Id: Ic484c53f51e00ae6b2d697748a9600b14829e23b >BUG: 1221970 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> >Reviewed-on: http://review.gluster.org/10792 >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Tested-by: NetBSD Build System Change-Id: Idda13e983ffd443672aee0873ee51e8cc7089c49 BUG: 1221969 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10980 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes Reviewed-by: Kaushal M <kaushal@redhat.com>
* tier: Do not allow detach-tier commands on a non-tiered volume | Mohammed Rafi KC | 2015-05-31 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Back port of http://review.gluster.org/10773 >Change-Id: Ic92d25db68e40ef4a4388ef42affd1b3ee5a7ec6 >BUG: 1221270 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> >Reviewed-on: http://review.gluster.org/10773 >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >Reviewed-by: Kaushal M <kaushal@redhat.com> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Tested-by: NetBSD Build System >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Change-Id: I4b52da590dfcca8edc7e2b7e0c24c5dab7983c10 BUG: 1221967 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10979 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Joseph Fernandes Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: add counter support for tiered volumes | Dan Lambright | 2015-05-28 | 3 | -1/+22
Back port of http://review.gluster.org/10292

This fix adds support to view the number of promoted or demoted files from the cli. The mechanism is isomorphic to checking the status of volumes being rebalanced.

gluster volume rebalance <vol> tier status

>Change-Id: I1b11ca27355ceec36c488967c23531202030e205
>BUG: 1213063
>Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
>Signed-off-by: Dan Lambright <dlambrig@redhat.com>
>Reviewed-on: http://review.gluster.org/10292
>Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>

Change-Id: I543e886f17132b544274c83fdecca5a8da9d092a
BUG: 1221477
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10775
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* dht/tier/rebalancer: Fix reset of tiering client pid | Joseph Fernandes | 2015-05-10 | 4 | -3/+5
In the patch http://review.gluster.org/#/c/9657 the client pid set by tiering migration was getting overwritten in dht_start_rebalance_task(). Just corrected it in dht_setxattr() before calling dht_start_rebalance_task() and removed it from dht_start_rebalance_task().

> http://review.gluster.org/#/c/10502/
> Cherry picked from commit a5fe0f594d41e1a11661d9074bb19e9c2e2c4776
> Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870
> BUG: 1217937
> Signed-off-by: Joseph Fernandes <josferna@redhat.com>
> Reviewed-on: http://review.gluster.org/10502
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Dan Lambright <dlambrig@redhat.com>

Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>

Conflicts:
xlators/cluster/dht/src/dht-common.c
xlators/cluster/dht/src/tier.c

Change-Id: Id513114c9a880c6196162dd4b35bbf1155a8cd09
BUG: 1219027
Reviewed-on: http://review.gluster.org/10609
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: make lookup-unhashed=auto do something actually useful | Jeff Darcy | 2015-05-09 | 6 | -48/+577
| | | | | | | | | | | | | | | | | | | | | | | | | The key concept here is to determine whether a directory is "clean" by comparing its last-known-good topology to the current one for the volume. These are stored as "commit hashes" on the directory and the volume root respectively. The volume's commit hash changes whenever a brick is added or removed, and a fix-layout is done. A directory's commit hash changes only when a full rebalance (not just fix-layout) is done on it. If all bricks are present and have a directory commit hash that matches the volume commit hash, then we can assume that every file is in its "proper" place. Therefore, if we look for a file in that proper place and don't find it, we can assume it's not on any other subvolume and *safely* skip the global (broadcast to all) lookup. Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3 BUG: 1220064 Reviewed-on-master: http://review.gluster.org/#/c/7702/ Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/10729 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
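The "clean directory" test described above reduces to a comparison of hashes. A minimal sketch, assuming the per-brick directory commit hashes have already been gathered into a dict; the function name and argument shapes are hypothetical and do not mirror the DHT source.

```python
def can_skip_global_lookup(volume_hash, dir_hashes, brick_count):
    """Return True when the broadcast lookup can safely be skipped.

    volume_hash  -- commit hash stored on the volume root
    dir_hashes   -- {brick name: directory commit hash} for bricks that answered
    brick_count  -- number of bricks the volume is expected to have
    """
    # Every brick must be present, otherwise we cannot vouch for the layout.
    if len(dir_hashes) != brick_count:
        return False
    # Every brick's directory hash must match the volume's current hash.
    return all(h == volume_hash for h in dir_hashes.values())
```

If any brick is missing or carries an older hash, the caller falls back to the global lookup, which is the conservative choice.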
* core: use reference counting for mem_acct structures | Jeff Darcy | 2015-05-09 | 2 | -12/+4
When freeing memory, our memory-accounting code expects to be able to dereference from the (previously) allocated block to its owning translator. However, as we have already found once in option validation and twice in logging, that translator might itself have been freed and the dereference attempt causes one of our daemons to crash with SIGSEGV. This patch attempts to fix that as follows:

* We no longer embed a struct mem_acct directly in a struct xlator, but instead allocate it separately.
* Allocated memory blocks now contain a pointer to the mem_acct instead of the xlator.
* The mem_acct structure contains a reference count, manipulated in both the normal and translator allocate/free code using atomic increments and decrements.
* Because it's now a separate structure, we can defer freeing the mem_acct until its reference count reaches zero (either way).
* Some unit tests were disabled, because they embedded their own copies of the implementation for what they were supposedly testing. Life's too short to spend time fixing tests that seem designed to impede progress by requiring a certain implementation as well as behavior.

Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
BUG: 1219026
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/10417
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10723
Tested-by: NetBSD Build System
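The ownership change can be illustrated with a toy model. This is a sketch under stated assumptions: `MemAcct` and `Block` are hypothetical stand-ins, and a lock replaces the atomic increments and decrements the patch uses in C.

```python
import threading


class MemAcct:
    """Accounting record that outlives its translator while blocks still use it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.refcnt = 1      # the reference held by the owning translator
        self.freed = False   # stands in for actually releasing the structure

    def ref(self):
        with self._lock:
            self.refcnt += 1
        return self

    def unref(self):
        with self._lock:
            self.refcnt -= 1
            if self.refcnt == 0:
                self.freed = True


class Block:
    """An allocated block points at the MemAcct, not at the translator."""

    def __init__(self, acct):
        self.acct = acct.ref()

    def free(self):
        self.acct.unref()
```

Whichever side lets go last, translator or block, triggers the release, so a block freed after its translator never dereferences freed memory.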
* glusterd: support for tier volumes 'detach start' and 'detach commit' | Dan Lambright | 2015-05-09 | 4 | -8/+62
Back port of http://review.gluster.org/10108

These commands work in a manner analogous to rebalancing when removing a brick. The existing migration daemon detects "detach start" and switches to moving data off the hot tier. While in this state all lookups are directed to the cold tier.

gluster v detach-tier <vol> start
gluster v detach-tier <vol> commit

The status and stop cli commands shall be submitted separately.

>Change-Id: I24fda5cc3ba74f5fb8aa9a3234ad51f18b80a8a0
>BUG: 1205540
>Signed-off-by: Dan Lambright <dlambrig@redhat.com>
>Signed-off-by: root <root@localhost.localdomain>
>Signed-off-by: Dan Lambright <dlambrig@redhat.com>
>Reviewed-on: http://review.gluster.org/10108
>Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>

Change-Id: I212d748d077fb5870ee84b316c653acbafbea3f7
BUG: 1220047
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10708
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: change log level of developer logs to DEBUG | Vijay Bellur | 2015-05-09 | 1 | -2/+2
Backport of: http://review.gluster.org/10281

A few log messages in dht directory self heal at log level INFO are useful only for developers, and these logs tend to cause excessive logs in our log files. Hence moving the log level of such logs to DEBUG.

Change-Id: I8a543f4ddeb5c20b2978a0f7b18d8baccc935a54
BUG: 1217949
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/10281
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-on: http://review.gluster.org/10704
Tested-by: NetBSD Build System
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cluster/tier: don't use hot tier until subvolumes ready | Dan Lambright | 2015-05-08 | 2 | -1/+6
This is a backport of fix 10435 to Gluster 3.7.

When we attach a tier, the hot tier becomes the hashed subvolume. But directories may not yet have been replicated by the fix layout process. Hence lookups to those directories will fail on the hot subvolume. We should only go to the hashed subvolume once the layout has been fixed. This is known if the layout for the parent directory does not have an error. If there is an error, the cold tier is considered the hashed subvolume. The exception to this rule is ENOTCONN, in which case we do not know where the file is and must abort.

Note we may revalidate a lookup for a directory even if the inode has not yet been populated by FUSE. This case can happen in tiering (where one tier has completed a lookup but the other has not, in which case we revalidate one tier when we call lookup the second time). Such inodes are still invalid and should not be consulted for validation.

> http://review.gluster.org/#/c/10435/
> Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
> BUG: 1214289
> Signed-off-by: Dan Lambright <dlambrig@redhat.com>
> Reviewed-on: http://review.gluster.org/10435
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> Signed-off-by: Dan Lambright <dlambrig@redhat.com>

Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
BUG: 1219547
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10649
Tested-by: NetBSD Build System
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
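The selection rule in the first paragraph of this message can be written as a three-way decision. A minimal sketch: the function name and arguments are hypothetical, the "not connected" error is read here as the standard errno ENOTCONN, and raising an exception stands in for aborting the lookup.

```python
import errno


def pick_hashed_subvol(parent_layout_err, hot, cold):
    """Choose the subvolume to treat as hashed for a tiered volume.

    parent_layout_err -- 0 if the parent directory's layout is healthy,
                         otherwise an errno value recorded for it
    hot, cold         -- the two tier subvolumes (any objects)
    """
    if parent_layout_err == 0:
        # Layout has been fixed: the hot tier is safe to use.
        return hot
    if parent_layout_err == errno.ENOTCONN:
        # We cannot tell where the file is, so give up instead of guessing.
        raise ConnectionError(errno.ENOTCONN, "subvolume not connected")
    # Any other layout error: fall back to the cold tier.
    return cold
```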
* features/shard: Implement [f]truncate fops | Krutika Dhananjay | 2015-05-08 | 1 | -2/+2
| | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10631 To-Do: * Make ftruncate work even in the absence of path * Aggregate and update ia_blocks appropriately when a file is truncated to a lower size. Change-Id: Icd424430066233ba61a030e72fdddf692d2b3f22 BUG: 1214247 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: rename handling in dht volume | Nithya Balachandran | 2015-05-08 | 1 | -0/+82
Background:
Glusterfs changelogs are stored in each brick and record the changes that happened in that brick. Georep runs on all the nodes of master and processes changelogs "independently". Changelogs are processed at brick level, but all the fops are replayed on the "slave mount" point.

Problem:
With a DHT volume, "internal fops" are NOT recorded in the changelog. In the rename case, the rename is recorded in the "hashed" brick changelog. (DHT's internal fops, like creating a linkto file or unlink, are NOT recorded.) This leads to inconsistent rename operations.

For example, take a Distribute volume created with two bricks B1, B2.

// Consider master volume mounted @ /mnt/master and following operations executed:
cd /mnt/master
touch f1 // f1 falls on B1 Hash
mv f1 f2 // f2 falls on B2 Hash

// Here, Changelogs are recorded as below:
@B1
CREATE f1
@B2
RENAME f1 f2

Here, a race exists between bricks B1 and B2. Say B2 gets executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (current implementation). We have this problem when the rename falls in another brick and the file is unlinked in master.

A similar issue exists in the following case too (multiple renames):
CREATE f1
RENAME f1 f2
RENAME f2 f1

Solution:
Instead of carrying out "changelogging" at the "HASHED volume", carry it out at the "CACHED volume". This way rename operations are carried out where the actual files are present.

So the changelog is recorded as:
@B1
CREATE f1
RENAME f1 f2

credit: sarumuga@redhat.com

PS: Some races, such as the one below, are _NOT_ fixed by this patch:
* f1 and f2 exist. B1 and B2 are their respective cached subvols. For both files hashed-subvol == cached-subvol
* mv f1 f2 on master.
* B1 has change-log entry of rename f1 f2
* rebalance migrates f2 from B1 to B2
* mv f2 f1 on master.
* B2 has change-log entry of rename f2 f1

Since changelog entries (rename f1 f2) and (rename f2 f1) are processed independently by gsyncds, which of f1 and f2 survives on slave is subject to a race. Note that on master it's file f1 with name f1 which survived. On slave it can be either file f1 with name f1 or file f2 with name f2, based on who wins the race of processing the changelog.

BUG: 1219412
Change-Id: I43725d69635e2ce065135691ef629014e8df7d50
Original-Author: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10410
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/10628
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
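The touch/mv race described in this message can be replayed with a toy model. This is only a sketch: `replay` and the tuple format are hypothetical, and the rule that a rename with a missing source still creates the target follows the "current implementation" remark in the message.

```python
def replay(changelogs, brick_order):
    """Replay per-brick changelogs on a slave, one brick after another.

    changelogs  -- {brick: [("CREATE", name) | ("RENAME", src, dst), ...]}
    brick_order -- order in which the independent workers happen to run
    Returns the set of file names present on the slave afterwards.
    """
    slave = set()
    for brick in brick_order:
        for op in changelogs.get(brick, []):
            if op[0] == "CREATE":
                slave.add(op[1])
            elif op[0] == "RENAME":
                _, src, dst = op
                slave.discard(src)  # no-op when the source is not there yet
                slave.add(dst)      # target is created even if source is missing
    return slave


# Rename logged on the hashed brick (B2) versus on the cached brick (B1).
hashed = {"B1": [("CREATE", "f1")], "B2": [("RENAME", "f1", "f2")]}
cached = {"B1": [("CREATE", "f1"), ("RENAME", "f1", "f2")]}

racy = replay(hashed, ["B2", "B1"])  # B2 wins the race
safe = replay(cached, ["B2", "B1"])  # ordering no longer matters
```

With the hashed layout and B2 running first, the slave ends up with both f1 and f2, while the master only has f2. Logging at the cached brick keeps both entries in one ordered changelog.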
* guster/dht: tiered volumes may not allow access to files undergoing migration | Dan Lambright | 2015-05-08 | 1 | -0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of fix 10324 to Gluster 3.7. If a read IO occurs against a file that has reached rebalance phase 2, we redirect the IO to the destination. For tiered volumes, when we try to reopen the file (on the destination), the lower level DHT receives the open call and fails; it does not have a "cached subvol". Fix is to "teach" the lower level DHT of the new location by sending a locate before the open. > http://review.gluster.org/#/c/10324/ > Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f > BUG: 1214048 > Signed-off-by: Dan Lambright <dlambrig@redhat.com> > Signed-off-by: root <root@gprfs018.sbu.lab.eng.bos.redhat.com> > Signed-off-by: Dan Lambright <dlambrig@redhat.com> > Reviewed-on: http://review.gluster.org/10324 > Tested-by: NetBSD Build System > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: Dan Lambright <dlambrig@redhat.com> Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f BUG: 1219608 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10654 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Joseph Fernandes Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Restore build on non Linux systems | Emmanuel Dreyfus | 2015-05-07 | 2 | -4/+7
| | | | | | | | | | | | | | | | | | | | | | This change broke the build on NetBSD, FreeBSD, and MacOS X: http://review.gluster.org/10526/ We restore the build with two fixes: - Use POSIX-compliant sysconf(_SC_NPROCESSORS_ONLN) to get the number of processors, instead of Linux specific get_nprocs(). That let us remove Linux-specific #include <sys/sysinfo.h> - Only define MAX() if it is not already defined. NetBSD defines it in <sys/param.h> which is already included Backport of: I62341c670598670e47ea2f69ab94864f96588b18 BUG: 1212676 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Change-Id: I0f098153e76954bb85b5dca3f054a069e31dd94c Reviewed-on: http://review.gluster.org/10653 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
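The patch itself is C, but the same POSIX query is reachable from Python, which makes the portability point easy to try out. A minimal sketch: `online_cpus` is a hypothetical helper, and the fallback path is an addition for platforms where the name is unavailable.

```python
import os


def online_cpus():
    """Number of online processors via POSIX sysconf, with a fallback."""
    try:
        # Same name the C code passes as _SC_NPROCESSORS_ONLN.
        n = os.sysconf("SC_NPROCESSORS_ONLN")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or this name is missing on the platform.
        n = os.cpu_count() or 1
    return n if n and n > 0 else 1
```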
* dht/rebalance: Throttle rebalance | Susant Palai | 2015-05-07 | 3 | -4/+133
Throttle value will be "normal" by default. To throttle down, a thread is put to sleep. To throttle up, gf_defrag_process_dir wakes up the sleeping threads.

Change-Id: I4892ab14982a1ff305aeb2d8bbd33c79d6877b69
BUG: 1219579
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10526
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10629
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
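One way to picture the sleep/wake behaviour is a condition variable guarding an "active thread" limit. This is a sketch under assumptions, not the rebalance code: the `Throttle` class, its method names, and the idea of indexing threads against a limit are all hypothetical.

```python
import threading


class Throttle:
    """Lets only threads whose index is below the current limit proceed."""

    def __init__(self, active_limit):
        self._cond = threading.Condition()
        self._limit = active_limit

    def wait_for_turn(self, thread_index):
        # Throttled-down threads sleep here until the limit is raised.
        with self._cond:
            while thread_index >= self._limit:
                self._cond.wait()

    def set_limit(self, active_limit):
        # Throttling up (or down) changes the limit and wakes every sleeper
        # so each can re-check whether it may run.
        with self._cond:
            self._limit = active_limit
            self._cond.notify_all()
```

A migration thread would call `wait_for_turn(i)` before picking up work, and the component that changes the throttle setting would call `set_limit()`.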
* dht/rebalancer: Marking tiering migration fops | Joseph Fernandes | 2015-05-07 | 3 | -6/+34
This is a follow up patch for http://review.gluster.org/#/c/10080

In the above, the suggested change in http://review.gluster.org/#/c/10080/7/xlators/cluster/dht/src/dht-rebalance.c does not work. The reason it doesn't work is that promotion and demotion are done in a multithreaded way. Whenever a promotion or demotion thread is called, the frame of the old sync_op thread is not carried with it. As a result the frame->root->pid is not set.

Solution:
When the file is being migrated, we get a tiering.migration key-value in the xattr dict. We pass this dict key-value when we call syncop_setxattr() to do data migration, and set frame->root->pid to GF_CLIENT_PID_TIER_DEFRAG in dht_setxattr() just before calling dht_start_rebalance_task().

> http://review.gluster.org/#/c/10266/
> Change-Id: I86fef2d961b32fdd2c0c69d8512cbe846b393404
> BUG: 1194753
> Signed-off-by: Joseph Fernandes <josferna@redhat.com>
> Reviewed-on: http://review.gluster.org/10266
> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>

Change-Id: I6ab42b2c7a3c3e21c461d097b7558ee967b62c62
BUG: 1218959
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10266
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10601
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rebalance: Introducing local crawl and parallel migration | Susant Palai | 2015-05-07 | 6 | -252/+1134
The current patch addresses two parts of the proposed design:
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points.

1. Rebalance multiple files in parallel:
-------------------------------------
The existing rebalance engine is single threaded. Hence, this patch introduces multiple threads which run in parallel to the crawler. The current rebalance migration is converted to a "Producer-Consumer" framework, where
Producer is : Crawler
Consumer is : Migrating Threads

Crawler: Crawler is the main thread. The job of the crawler is now limited to fix-layout of each directory and adding the files which are eligible for migration to a global queue in a round robin manner, so that we use all the disk resources efficiently. Hence, the crawler will not be "blocked" by the migration process.

Consumer: Consumer will monitor the global queue. If any file is added to this queue, it will dequeue that entry and migrate the file. Currently 20 migration threads are spawned at the beginning of the rebalance process. Hence, multiple file migrations happen in parallel.

2. Crawl only bricks that belong to the current node:
--------------------------------------------------
As the rebalance process is spawned per node, it migrates only the files that belong to its own node for the sake of load balancing. But it also reads entries from the whole cluster, which is not necessary, as readdir hits other nodes.

New Design:
As part of the new design the rebalancer decides the subvols that are local to the rebalancer node by checking the node-uuid of the root directory before the crawler starts. Hence, readdir won't hit the whole cluster, as it already has the context of local subvols, and the node-uuid request for each file can also be avoided. This makes the rebalance process "more scalable".

Change-Id: I6f1b44086a09df8ca23935fd213509c70cc0c050
BUG: 1217381
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10466
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: N Balachandran <nbalacha@redhat.com>
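The crawler and migration-thread split described in this message is a standard queue-based producer/consumer arrangement. A minimal sketch, assuming a single shared queue: `run_rebalance` and the `migrate` callback are hypothetical names, and only the default of 20 threads comes from the message.

```python
import queue
import threading


def run_rebalance(files, migrate, num_threads=20):
    """Crawl `files` on the calling thread and migrate them on worker threads."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def migrator():
        # Consumer: block on the global queue, migrate whatever appears.
        while True:
            item = work.get()
            if item is None:  # sentinel: crawl finished, no more work
                break
            outcome = migrate(item)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=migrator) for _ in range(num_threads)]
    for t in threads:
        t.start()

    # Producer: the crawler only enqueues, so it is never blocked by migration.
    for name in files:
        work.put(name)
    for _ in threads:
        work.put(None)  # one sentinel per consumer

    for t in threads:
        t.join()
    return results
```

Results arrive in completion order rather than crawl order, so callers that care about ordering need to sort or tag them.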
* cluster/afr,dht: Fix memleak after syncop_readlink | Pranith Kumar K | 2015-05-01 | 1 | -0/+1
| | | | | | | | | | | | Backport of http://review.gluster.org/10305 BUG: 1216302 Change-Id: Icb0f2d6bbff806e1c5827fabcbf46b9b7983491f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10441 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>