path: root/xlators/cluster/dht/src
Commit message | Author | Age | Files | Lines
...
* cluster/tier fix bug with sql includes introduced by 12031 | Dan Lambright | 2015-09-11 | 2 | -3/+3
  We accidentally introduced a bug where client translators have a dependency on SQL. This broke the FreeBSD smoke tests. The fix is to abstract those dependencies away from the client.
  Change-Id: I7152573a489bacc8f32e6eb139f9ff4408288f5b
  BUG: 1260730
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12155
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht/remove-brick: Avoid data loss for hard link migration | Susant Palai | 2015-09-09 | 1 | -6/+36
  Problem: If the hashed subvol of a file has reached cluster.min-free-disk, then for a create operation a linkto file is created on the hashed subvol and the data file is created on some other brick. To create the linkfile we populate the dictionary with the linkto key, with the cached subvol as its value. After successful linkto file creation, the linkto key-value pair is not deleted from the dictionary, and hence the data file also gets a linkto xattr, which points to itself. This looks something like this:
      client-0                  client-1
      -------T file             rwx------ file
      linkto.xattr=client-1     linkto.xattr=client-1
  Now coming to the data loss part. Hardlink migration depends heavily on this linkto xattr on the data file. Its value should be the new hashed subvol of the first hardlink encountered post fix-layout. But when migration tries to read the linkto xattr, it gets the same target as where the file is sitting. Now the source and destination are the same for migration. At the end of migration the source file is truncated and deleted, which in this case is the destination and also the only data file itself, resulting in data loss.
  Change-Id: I36b1d105752bd9467757ecf3f103b45c666783d6
  BUG: 1260051
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12105
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
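The dictionary reuse described in this commit can be illustrated with a small model. This is only a sketch, not the DHT code: bricks are modelled as plain dicts, and the function names and the `clear_key` switch are hypothetical. Only the `linkto.xattr` name and the `client-0`/`client-1` labels come from the commit message.

```python
LINKTO_KEY = "linkto.xattr"  # key name as written in the commit message


def create_with_linkto(hashed, cached, name, clear_key):
    """Model a create where the hashed subvol is full (hypothetical helper).

    A linkto file goes to `hashed`, the data file goes to `cached`.
    `clear_key` switches between the buggy and the fixed behaviour.
    """
    bricks = {hashed: {}, cached: {}}
    xattrs = {LINKTO_KEY: cached}
    bricks[hashed][name] = dict(xattrs)  # linkto file points at the data file
    if clear_key:
        del xattrs[LINKTO_KEY]           # the fix: drop the pair after linkto creation
    bricks[cached][name] = dict(xattrs)  # data file gets whatever is left in the dict
    return bricks


def migration_target(bricks, cached, name):
    """Hard link migration reads its destination from the data file's linkto xattr."""
    return bricks[cached][name].get(LINKTO_KEY)


buggy = create_with_linkto("client-0", "client-1", "file", clear_key=False)
fixed = create_with_linkto("client-0", "client-1", "file", clear_key=True)
```

In the buggy model the data file on `client-1` names `client-1` as its own migration target, which is the source-equals-destination situation the commit describes.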
* dht: NULL dereferencing causes crash | Mohammed Rafi KC | 2015-09-08 | 1 | -2/+2
  If linkfile_create fails for some reason, we try to dereference a NULL variable.
  Change-Id: I3c6ff3715821b9b993d1bab7b90167de2861e190
  BUG: 1260147
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12106
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/ctr: Solving DB Lock issue due to write contention from db connections | Joseph Fernandes | 2015-09-08 | 3 | -44/+105
  Problem: The DB on the brick is accessed by CTR for writes and by the tier migrator for reads and writes. The write from the tier migrator resets the heat counters after a cycle. Since we are using sqlite, two connections trying to write cause DB lock contention. As a result, CTR used to fail to update the DB.
  Solution: Use the same DB connection as CTR for resetting the heat counters.
  1) Introduced a new IPC FOP for CTR.
  2) After the query, do an ipc syncop to the underlying client xlator associated with the brick.
  3) CTR in the brick will catch the IPC FOP and clear the heat counters.
  Change-Id: I53306bfc08dcdba479deb4ccc154896521336150
  BUG: 1260730
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/12031
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
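The contention described in this commit can be reproduced outside of Gluster with Python's standard `sqlite3` module. The sketch below is only an illustration: the `heat` table, its columns, and the function names are made up for the example and are not the CTR schema or API. It shows a second connection failing to start a write while the first holds the write lock, and the same two writes succeeding when they share one connection.

```python
import os
import sqlite3
import tempfile


def second_writer_is_locked_out(db_path):
    """Return True if a second connection fails to start a write transaction."""
    # isolation_level=None: transactions are controlled explicitly.
    # timeout=0: fail at once instead of waiting for the lock.
    ctr = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    migrator = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        ctr.execute("CREATE TABLE heat (gfid TEXT PRIMARY KEY, writes INTEGER)")
        ctr.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        ctr.execute("INSERT INTO heat VALUES ('f1', 1)")
        try:
            migrator.execute("BEGIN IMMEDIATE")  # second writer, separate connection
        except sqlite3.OperationalError:
            return True  # SQLite reports the database as locked
        migrator.execute("ROLLBACK")
        return False
    finally:
        if ctr.in_transaction:
            ctr.execute("ROLLBACK")
        ctr.close()
        migrator.close()


def reset_on_same_connection(db_path):
    """Do the counter update and the reset through a single connection."""
    conn = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        conn.execute("CREATE TABLE heat (gfid TEXT PRIMARY KEY, writes INTEGER)")
        conn.execute("INSERT INTO heat VALUES ('f1', 5)")  # counter update
        conn.execute("UPDATE heat SET writes = 0")         # heat reset
        return conn.execute("SELECT writes FROM heat").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Run both scenarios against throwaway database files."""
    with tempfile.TemporaryDirectory() as tmp:
        locked = second_writer_is_locked_out(os.path.join(tmp, "two.db"))
        value = reset_on_same_connection(os.path.join(tmp, "one.db"))
    return locked, value
```

Routing the reset through the connection that already owns the writes removes the second writer, which is the design choice the commit makes with its IPC FOP.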
* cluster/tier: avoid filling /var/run with tiering files | Dan Lambright | 2015-09-02 | 1 | -4/+28
  We failed to delete old promote/demote workfiles in /var/run. This fix removes the <pid> postfix so there will be only a single pair of files.
  Change-Id: Ib9aafe7b4a9d4b0c05cf03a94cc1057a423a27d2
  BUG: 1253970
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11931
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
* build: Fix build on Mac OS X, boolean | Kaleb S. KEITHLEY | 2015-09-01 | 1 | -2/+1
  bool and true conflict with clang macros in clang on Mac OS X, possibly with newer (?) versions of clang on Linux.
  Change-Id: Ia8c56ae68b4ebffb99b0684ac72d68ec50eaa7fa
  BUG: 1249391
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/11816
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* all: reduce "inline" usage | Jeff Darcy | 2015-09-01 | 5 | -18/+11
  There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect.
  In the process, several pieces of dead code were flagged by the compiler, and were removed.
  Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
  BUG: 1245331
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/11769
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/dht: Don't set posix acls on linkto files | Nithya Balachandran | 2015-08-31 | 1 | -0/+34
  Posix acls on a linkto file change the file's permission bits and cause DHT to treat it as a non-linkto file. This happens on the migration failure of a file on which posix acls were set.
  The fix prevents posix acls from being set on a linkto file and copies them across only after a file has been successfully migrated.
  Change-Id: Iccf7ff6fba49fe05d691d9b83bf76a240848b212
  BUG: 1247563
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12025
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
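Why changed permission bits matter can be sketched as follows. This assumes, for illustration only, that a linkto file is recognised by having the sticky bit as its only mode bit (the `-------T` form shown in another commit on this page) together with a linkto xattr. The function name and the exact check are hypothetical and not taken from the DHT source.

```python
import stat

# Assumed marker mode for a linkto file: sticky bit only, i.e. ---------T.
LINKFILE_MODE = stat.S_ISVTX


def looks_like_linkto(st_mode, has_linkto_xattr):
    """Hypothetical check: exact marker mode plus the linkto xattr."""
    return stat.S_IMODE(st_mode) == LINKFILE_MODE and has_linkto_xattr


# A freshly created linkto file.
plain_linkto = stat.S_IFREG | stat.S_ISVTX

# Setting an ACL with a mask entry shows up in the group permission bits,
# so the mode is no longer exactly the marker value.
linkto_with_acl = plain_linkto | stat.S_IRWXG
```

Under this assumption, any ACL that alters the mode makes the exact-match test fail, so the file would be handled as a regular data file.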
* fd: Do fd_bind on successful open | Pranith Kumar K | 2015-08-28 | 3 | -0/+6
  - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list.
  - successful open/opendir should perform fd_bind.
  Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169
  BUG: 1207735
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/11044
  Reviewed-by: Anoop C S <anoopcs@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht : lock on subvols to prevent lookup vs rmdir race | Sakshi | 2015-08-27 | 5 | -84/+331
  It is possible that, while an rmdir has completed on some non-hashed subvols and is proceeding to the others, a lookup selfheal recreates the same directory on those subvols for which the rmdir had succeeded. The fix is to take a blocking inodelk on the subvols before starting rmdir. Since selfheal requires a lock on all subvols, acquiring the locks will fail if an rmdir is in progress, and vice versa.
  Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb
  BUG: 1245065
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/11725
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: avoid mknod on decommissioned brick | Susant Palai | 2015-08-25 | 2 | -35/+334
  Change-Id: I8c39ce38e257758e27e11ccaaff4798138203e0c
  BUG: 1256243
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/11998
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: block/handle create op falling to decommissioned brick | Susant Palai | 2015-08-23 | 5 | -57/+455
  Problem: After remove-brick start and until the commit phase, the client layout may not be in sync with the on-disk layout because of a lack of lookups. Hence, a create call may fall on the decommissioned brick.
  Solution: Acquire a lock on the hashed subvol so that a fix-layout or selfheal cannot step on the layout while it is being read. Even if we read a layout before the remove-brick fix-layout and the file falls on the decommissioned brick, the file should be migrated to a new brick as per the fix-layout.
  Change-Id: If84a12ec34f981adb2b9b224e80f535cfe5bf9f2
  BUG: 1232378
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/11260
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier : Use dht_* versions for xlator_fops | N Balachandran | 2015-08-18 | 1 | -16/+28
  The tier xlator was using the default_* versions for some xlator_fops. Changed to use the dht_* versions for all xlator_fops.
  Change-Id: I8252fb3911b8a48a55e9eee42b89bd66bbacf799
  BUG: 1254451
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/11948
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: return non NULL xattr,xdata for ret >= 0 | Susant Palai | 2015-08-13 | 1 | -2/+2
  Change-Id: I4a3dd8c00894ceeed4af77df2d960f372281a03b
  BUG: 1235989
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/11409
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/tier :rename fails with EBUSY | Mohammed Rafi KC | 2015-08-13 | 1 | -8/+20
  When a file was in the hot tier and a lookup had already been done, both the hashed and the cached subvolume are the hot tier. Once the file is moved from the hot tier to the cold tier, a subsequent lookup sends a revalidate lookup to the hot tier and finds that the file was actually moved and that only a link file remains in the cached subvolume. So dht returns ESTALE to fuse.
  Upon receiving ESTALE for a lookup, fuse creates a new inode and sends a fresh lookup. This lookup succeeds and locates the file properly. Fuse then tries to link the inode, but the older inode is already present in the in-memory inode cache with the same gfid, and it is also shared with the fuse kernel module. So inode_link returns the older inode itself, and the subsequent rename fop comes to gluster with the older inode.
  In dht_rename, we take a lock on the inode, and after a successful inodelk dht sends a lookup before creating a link. This lookup again finds that the file is a link file, so dht concludes that the file is migrating or was migrated in the meantime, and returns EBUSY.
  Change-Id: Ib3a01e5b1d7f64514b04bb6234026d049f082679
  BUG: 1248306
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11768
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht : updating return value for layout set function | Sakshi | 2015-08-12 | 1 | -2/+2
  Change-Id: I7fd89e00b418391afe0a13c2033919c979cc8bbb
  BUG: 789278
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/10869
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/libgfdb : Setting Freq counters of un-selected files to zero | Joseph Fernandes | 2015-08-12 | 1 | -2/+2
  Change Time Recorder increments the write/read frequency counters on a read or write of a file if "features.record-counters" is "on". It is the responsibility of the tiering migrator to reset these counters to zero for un-selected files, as frequency counters are a function of promotion/demotion cycles. If the counters are not set to zero, then
  1) the counters may overflow in the DB, and
  2) the file may be wrongly promoted or demoted.
  This fix resets the freq counters of un-selected files to zero after each promotion/demotion frequency period.
  Change-Id: Ideea2c76a52d421a7e67c37fb0c823f552b3da7a
  BUG: 1242504
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/11648
  Tested-by: Joseph Fernandes
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix demotion when cold tier is EC | Dan Lambright | 2015-08-12 | 1 | -0/+2
  We did not set the gfid in the loc structure in tier demotion. EC has a sanity check which fails FOPs when the loc gfid mismatches with the file attribute. When the FOP failed, demotion was aborted.
  Change-Id: I69022c9ccb135b86e1feea93b01801b6a4100509
  BUG: 1251121
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11855
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/dht: Reset source file mode bits on migration failure | Nithya Balachandran | 2015-08-12 | 1 | -3/+95
  DHT rebalance uses the sgid and sticky bits to indicate that a file is being migrated. These were not removed if the file migration failed. The fix resets these bits to the original values.
  Change-Id: I9801bfc0bd80c0800251ccd66c1c91a51cffd909
  BUG: 1236512
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/11454
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
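The bit handling this commit describes can be sketched with the standard `stat` constants. The helper names below are hypothetical and the code is a model of the idea, not the rebalance implementation. The point it shows is that the reset restores the two marker bits to their original values instead of clearing them blindly.

```python
import stat

# The commit says the sgid and sticky bits together mark a file under migration.
MIGRATION_BITS = stat.S_ISGID | stat.S_ISVTX


def mark_migrating(mode):
    """Set both marker bits on the source file's mode."""
    return mode | MIGRATION_BITS


def is_migrating(mode):
    """True only when both marker bits are set."""
    return (mode & MIGRATION_BITS) == MIGRATION_BITS


def restore_on_failure(current_mode, original_mode):
    """Put the two marker bits back to what they were before migration started."""
    return (current_mode & ~MIGRATION_BITS) | (original_mode & MIGRATION_BITS)


original = stat.S_IFREG | 0o644
during = mark_migrating(original)
after_failure = restore_on_failure(during, original)
```

Restoring from the saved mode matters for files that legitimately had one of the bits set beforehand. A file whose mode included the sticky bit keeps it after a failed migration.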
* dht/tiering : create new dictionary during migration | Mohammed Rafi KC | 2015-08-06 | 1 | -2/+10
  To avoid setting a wrong xattr while creating the link file.
  Change-Id: Iad8de3521eae17e510035ed42e3e01933d647096
  BUG: 1250828
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11838
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dict: dict_set_bin() should never free the pointer on error | Niels de Vos | 2015-07-24 | 2 | -0/+4
  dict_set_bin() handles the pointer that it is passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee.
  It is cleaner to have the caller that allocated the pointer free it when dict_set_bin() returns an error. When dict_set_bin() returns success, the given pointer will be free'd when dict_unref() calls data_destroy().
  Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not are corrected with this change too.
  Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966
  BUG: 1242280
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/11638
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: send lookup even for fd based operations during rebalance | Ravishankar N | 2015-07-22 | 1 | -23/+30
  Problem: dht_rebalance_inprogress_task() was not sending lookups to the destination subvolume for a file undergoing writes during rebalance. Due to this, afr was not able to populate the read_subvol and failed the write with EIO.
  Fix: Send lookup for fd based operations as well.
  Thanks to Raghavendra G for helping with the RCA.
  Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19
  BUG: 1244165
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11713
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr | Krutika Dhananjay | 2015-07-15 | 1 | -2/+2
  Change-Id: Icd8984976812bb47ae7129426f6c1aa9393b3ab9
  BUG: 1232391
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/11467
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/tier: fixes for migration over ec as cold tier | Dan Lambright | 2015-07-13 | 2 | -4/+10
  An opendir is done in rebalance. The graph constructed when EC is used in tiering may have no local volumes (if all the hot volumes are on one node and all the others on another node). Previously the opendir only sent fops down the local subvolumes for migration. They must be sent down both the hot and cold subvolumes for tiering.
  When setxattr2() received a NULL subvolume, it dereferenced an uninitialized variable.
  When a lookup is done during creation of the destination file, the xattr dict is "polluted" with virtual xattrs. These cause subsequent xattrs in the new file to not be written by posix. They are required by EC.
  The inode gfid for "entry_loc" in gf_defrag_migrate_single_file() was not initialized. This made underlying translators think the gfid was 0, and failed migration.
  Change-Id: I6ccda8ca8e43485b9b354341bbfcb302496f632c
  BUG: 1236212
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11433
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* syncop: Include iatt to 'syncop_link' args | Soumya Koduri | 2015-07-10 | 1 | -1/+1
  Add iatt to the 'syncop_link' args to fetch the proper attributes of the newly linked inode.
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
  Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492
  BUG: 1241788
  Reviewed-on: http://review.gluster.org/11611
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/dht: use refcount to manage memory used to store migration | Raghavendra G | 2015-07-01 | 2 | -21/+46
  information.
  Without refcounting, we might free up memory while other fops are still accessing it.
  BUG: 1235927
  Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/11418
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/tier: stop tier migration after graph switch | Dan Lambright | 2015-06-26 | 1 | -0/+16
  On a graph switch, a new xlator and private structures are created. The tier migration daemon must stop using the old xlator and private structures and begin using the new ones. Otherwise, when RPCs arrive (such as counter queries from glusterd), the new xlator will be consulted but it will not have up to date information. The fix detects a graph switch and exits the daemon in this case. Typical graph switches for the tier case would be turning off performance translators.
  Change-Id: Ibfbd4720dc82ea179b77c81b8f534abced21e3c8
  BUG: 1226005
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11372
* tier/ctr: Ignore creation of T file and Ctr Lookup heal improvememnts | Joseph Fernandes | 2015-06-26 | 1 | -3/+2
  1) Ignore creation of T file in ctr_mknod
  2) Ignore lookup for T file in ctr_lookup
  3) Ctr_lookup:
     a. If the gfid and pgfid are empty, don't record.
     b. Decreased log level for multiple heal attempts.
     c. Inode/File heal happens after an expiry period, which is configurable.
     d. Hardlink heal happens after an expiry period, which is configurable.
  Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452
  BUG: 1235269
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/11334
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* tiering/rebalance: tier daemon stopped with out updating status | Mohammed Rafi KC | 2015-06-25 | 1 | -3/+0
  When a subvol goes down, the tier daemon stops immediately, and the status still shows as "Progressing".
  With this change, when a subvol goes offline, the tier xlator updates the status to failed.
  Change-Id: I9f722ed0d35cda8c7fc1a7e75af52222e2d0fdb7
  BUG: 1227803
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11068
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: Adding log messages to the new logging framework | arao | 2015-06-23 | 15 | -342/+975
  Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f
  BUG: 1194640
  Signed-off-by: arao <arao@redhat.com>
  Reviewed-on: http://review.gluster.org/10021
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Joseph Fernandes
  Tested-by: Joseph Fernandes
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht : Error value check before performing rebalance complete | Sakshi | 2015-06-19 | 1 | -5/+13
  Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95
  BUG: 1188242
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/11097
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fix Null pointer dereference while logging | Raghavendra G | 2015-06-18 | 1 | -8/+8
  Change-Id: I1ea358b83267b0bcdf654ce18fe881fd4a6bf08d
  BUG: 1233139
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/11313
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: Prevent use after free bug | Pranith Kumar K | 2015-06-17 | 1 | -1/+3
  Change-Id: I2d1f5bb2dd27f6cea52c059b4ff08ca0fa63b140
  BUG: 1231425
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/11209
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* tier/dht: Fixing non atomic promotion/demotion w.r.t to frequency period | Joseph Fernandes | 2015-06-11 | 1 | -40/+59
  This fixes the ping-pong issue, i.e. files getting demoted immediately after promotion, caused by out-of-sync promotion/demotion processes. The solution is to do promotion/demotion with reference to the system time. For the fix to work, all the file serving nodes should have their system time synchronized with each other, either manually or using an NTP server.
  NOTE: The ping-pong issue can reappear even with this fix if the admin sets different promotion and demotion frequency periods, but this would be under the control of the admin.
  Change-Id: I1b33a5881d0cac143662ddb48e5b7b653aeb1271
  BUG: 1218717
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/11110
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
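One way to read "promotion/demotion with reference to the system time" is that a cycle fires when the wall clock sits on a multiple of the period, rather than a fixed interval after each daemon happened to start. The sketch below contrasts the two under that assumption. Both function names are hypothetical, and the modulo rule is an illustration rather than a statement of what the tier daemon does.

```python
def due_relative_to_start(now, started_at, freq):
    """Timer relative to daemon start: nodes started at different times drift apart."""
    return (now - started_at) % freq == 0


def due_on_system_time(now, freq):
    """Timer aligned to the system clock: every synchronized node agrees."""
    return now % freq == 0


FREQ = 120  # seconds, illustrative period

# Two nodes whose daemons started at different moments.
node_a_start, node_b_start = 7, 50

# With start-relative timers, node A fires at t=127 but node B does not.
relative_at_127 = (
    due_relative_to_start(127, node_a_start, FREQ),
    due_relative_to_start(127, node_b_start, FREQ),
)

# With clock-aligned timers, both nodes evaluate the same expression at t=240.
aligned_at_240 = (due_on_system_time(240, FREQ), due_on_system_time(240, FREQ))
```

This also shows why the commit requires synchronized clocks: the agreement holds only if every node observes the same value of `now`.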
* cluster/tier: account for reordered layouts | Dan Lambright | 2015-06-11 | 2 | -14/+32
  For a tiered volume the cold subvolume is always at a fixed position in the graph. DHT's layout array, on the other hand, may have the cold subvolume in either the first or second index, therefore code cannot make any assumptions. The fix searches the layout for the correct position dynamically rather than statically.
  The bug manifested itself in NFS, in which a newly attached subvolume had not received an existing directory. This case is a "stale entry" and marked as such in the layout for that directory. The code did not see this, because it looked at the wrong index in the layout array.
  The fix also adds the check for decommissioned bricks, and fixes a problem in detach tier related to starting the rebalance process: we never received the right defrag command and it did not get directed to the tier translator.
  Change-Id: I77cdf9fbb0a777640c98003188565a79be9d0b56
  BUG: 1214289
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11092
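The difference between a fixed index and a dynamic search can be sketched as below. The layout is modelled as a list of dicts with hypothetical `subvol` and `stale` fields, which is an assumption for the example and not the DHT layout structure.

```python
def find_layout_entry(layout, subvol_name):
    """Search the layout for a subvolume instead of assuming its index."""
    for index, entry in enumerate(layout):
        if entry["subvol"] == subvol_name:
            return index, entry
    return None


# The same two subvolumes can appear in either order in a directory's layout.
layout_a = [{"subvol": "cold", "stale": False}, {"subvol": "hot", "stale": True}]
layout_b = [{"subvol": "hot", "stale": True}, {"subvol": "cold", "stale": False}]

# A fixed index reads different subvolumes depending on the order.
fixed_index_reads = (layout_a[1]["subvol"], layout_b[1]["subvol"])

# The search finds the hot entry, and its stale marker, in both layouts.
hot_in_a = find_layout_entry(layout_a, "hot")
hot_in_b = find_layout_entry(layout_b, "hot")
```

With the fixed index, the stale marker on the newly attached subvolume is missed whenever the order differs from the one assumed, which matches the NFS symptom in the commit message.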
* features/changelog: Avoid setattr fop logging during rename | Saravanakumar Arumugam | 2015-06-11 | 3 | -23/+34
  Problem: When a file is renamed and the (renamed) file's hash falls into a different brick, DHT creates a special file (linkto file) in that brick (the hashed subvolume) and carries out a setattr operation on that file.
  Currently, Changelog records this (setattr) operation in the hashed subvolume. glusterfind in turn records this operation as a MODIFY operation.
  So, there is a NEW entry in the cached subvolume and a MODIFY entry in the hashed subvolume for the same file.
  Solution: Avoid logging the setattr operation by marking the operation as an internal fop using xdata.
  In the changelog translator, check whether the setattr is marked as an internal fop and skip accordingly.
  Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd
  BUG: 1230007
  Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
  Reviewed-on: http://review.gluster.org/11137
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* tier/volume set: Validate volume set option for tier | Mohammed Rafi KC | 2015-06-10 | 1 | -0/+6
  Volume set options related to tier volumes can only be set on a tier volume. Also, currently all volume set options for tier accept a non-negative integer.
  This patch validates both conditions.
  Change-Id: I3611af048ff4ab193544058cace8db205ea92336
  BUG: 1216960
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/10751
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Joseph Fernandes
* cluster/dht: fix incorrect dst subvol info in inode_ctx | Nithya Balachandran | 2015-06-02 | 6 | -88/+182
  Stashing additional information in the inode_ctx to help decide whether the migration information is stale, which could happen if a file was migrated several times but FOPs only detected the P1 migration phase. If no FOP detects the P2 phase, the inode ctx1 is never reset.
  We now save the src subvol as well as the dst subvol in the inode ctx. The src subvol is the subvol on which the FOP was sent when the mig info was set in the inode ctx. This information is considered stale if:
  1. The subvol on which the current FOP is sent is the same as the dst subvol in the ctx
  2. The subvol on which the current FOP is sent is not the same as the src subvol in the ctx
  This does not handle the case where the same file might have been renamed such that the src subvol is the same but the dst subvol is different. However, that is unlikely to happen very often.
  Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f
  BUG: 1142423
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/10834
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
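The two staleness conditions listed in this commit translate directly into a predicate. The function and parameter names below are hypothetical, and subvolumes are represented as plain strings for the sake of the example.

```python
def migration_info_is_stale(fop_subvol, ctx_src, ctx_dst):
    """Apply the two rules from the commit message.

    Stale if the current FOP is sent to the stored dst subvol (rule 1),
    or if it is sent to anything other than the stored src subvol (rule 2).
    """
    return fop_subvol == ctx_dst or fop_subvol != ctx_src


# Migration info recorded while the file was moving from "subvol-a" to "subvol-b".
ctx_src, ctx_dst = "subvol-a", "subvol-b"

still_valid = migration_info_is_stale("subvol-a", ctx_src, ctx_dst)     # FOP on src
stale_on_dst = migration_info_is_stale("subvol-b", ctx_src, ctx_dst)    # rule 1
stale_elsewhere = migration_info_is_stale("subvol-c", ctx_src, ctx_dst) # rule 2
```

As the commit notes, a check of this shape cannot detect the case where the src subvol is unchanged but the real destination now differs from the stored one.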
* cluster/dht: pass a destination subvol to fop2 variants to avoid races. | Raghavendra G | 2015-06-02 | 5 | -179/+206
  The destination subvol used in the fop2 variants is either stored in inode-ctx1 or local->cached_subvol. However, it is not guaranteed that a value stored in these locations before invocation of fop2 is still present after the invocation as these locations are shared among different concurrent operations. So, to preserve the atomicity of "check dst-subvol and invoke fop2 variant if dst-subvol found", we pass down the dst-subvol to fop2 variant.
  This patch also fixes error handling in some fop2 variants.
  Change-Id: Icc226228a246d3f223e3463519736c4495b364d2
  BUG: 1142423
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/10943
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
* tiering:static function called from a non static inline function | Mohammed Rafi KC | 2015-06-02 | 1 | -2/+2
  gcc v5.1.1 throws warnings for calling a static function from a non-static inline function.
  <snippet from compiler warning>
  CC tier.lo
  tier.c:610:15: warning: 'tier_migrate_using_query_file' is static but used in inline function 'tier_migrate_files_using_qfile' which is not static
   ret = tier_migrate_using_query_file ((void *)query_cbk_args);
   ^
  tier.c:585:47: warning: 'tier_process_brick_cbk' is static but used in inline function 'tier_build_migration_qfile' which is not static
   ret = dict_foreach (args->brick_list, tier_process_brick_cbk,
   ^
  tier.c:565:176: warning: 'demotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:565:158: warning: 'promotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:563:58: warning: 'demotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
  tier.c:563:40: warning: 'promotion_qfile' is static but used in inline function 'tier_build_migration_qfile' which is not static
   ret = remove (GET_QFILE_PATH (is_promotion));
   ^
  CCLD tier.la
  </snip>
  Change-Id: I46046feeb79ab4e2724b0ba6b02c9ec8b121ff4e
  BUG: 1226881
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11032
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Anoop C S <achiraya@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/tier: make attach/detach work with new rebalance logic (Dan Lambright, 2015-06-02, 2 files, -23/+28)
  The new rebalance performance improvements added new data structures which were not initialized in the tier case. Function dht_find_local_subvol_cbk() needs to accept a list built by lower level DHT translators in order to build the local subvolumes list.

  Change-Id: Iab03fc8e7fadc22debc08cd5bc781b9e3e270497
  BUG: 1222088
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/10795
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht: Add lookup-optimize configuration option for DHT (Shyam, 2015-06-02, 4 files, -16/+66)
  Currently with commit 4eaaf5 a mixed version cluster would have issues if lookup-unhashed is set to auto, as older clients would fail to validate the layouts if newer clients (i.e. 3.7 or upwards) create directories. Also, in a mixed version cluster the rebalance daemon would set the commit hash for some subvolumes and not for the others.

  This commit fixes this problem by moving the enabling of the functionality introduced in the above mentioned commit to a new dht option. This option also has an op_version of 3_7_1, thereby preventing it from being set in a mixed version cluster. It brings in the following changes:
  - Option can be set only if min version of the cluster is 3.7.1 or more
  - Rebalance and mkdir update the layout with the commit hashes only if this option is set, hence ensuring rebalance works in a mixed version cluster, and also directories created by newer clients do not cause layout errors when read by older clients
  - This option also supersedes lookup-unhashed, so that enabling the lookup optimization is more deterministic and does not conflict with lookup-unhashed settings.

  Option added is cluster.lookup-optimize, which is a boolean.

  Usage: # gluster volume set VOLNAME cluster.lookup-optimize on

  Change-Id: Ifd1d4ce3f6438fcbcd60ffbfdbfb647355ea1ae0
  BUG: 1222126
  Signed-off-by: Shyam <srangana@redhat.com>
  Reviewed-on: http://review.gluster.org/10797
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* DHT/permission: Let setattr consume stat built from lookup in heal path (Susant Palai, 2015-06-01, 1 file, -2/+0)
  The setattr call after mkdir (selfheal) ends up using the mode bits returned by mkdir, which lack the required suid, sgid and sticky bits. Hence, the fix is to use the mode bits from local->stbuf, which was used to create the missing directories.

  Change-Id: I478708c80e28edc6509b784b0ad83952fc074a5b
  BUG: 1110262
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/8208
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: maintain start state of rebalance daemon across graph switch. (Dan Lambright, 2015-06-01, 1 file, -2/+9)
  When we did a graph switch on a rebalance daemon, a second call to gf_defrag_start() was made. This led to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.

  Change-Id: I931ca7fe600022f245e3dccaabb1ad004f732c56
  BUG: 1226005
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/10977
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* build: do not #include "config.h" in each file (Niels de Vos, 2015-05-29, 20 files, -99/+0)
  Instead of including config.h in each file, have config.h included from the compiler command line (-include option).

  When a .c file tested for a certain #define and config.h was not included, incorrect assumptions were made. With this change, that cannot happen again.

  BUG: 1222319
  Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/10808
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tiering/rebalance: Use separate pid/socket file for tiering (Mohammed Rafi KC, 2015-05-28, 1 file, -2/+3)
  When the promotion/demotion daemon starts, it uses the same pidfile as rebalance. This patch introduces a separate pid file for the tiering daemon.

  Change-Id: Ic484c53f51e00ae6b2d697748a9600b14829e23b
  BUG: 1221970
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/10792
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
* tier: Do not allow detach-tier commands on a non-tiered volume (Mohammed Rafi KC, 2015-05-28, 1 file, -1/+2)
  Change-Id: Ic92d25db68e40ef4a4388ef42affd1b3ee5a7ec6
  BUG: 1221270
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/10773
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
* xlators/cluster/dht: Fix Explicit null dereferenced (CID 1291727). (Günther Deschner, 2015-05-28, 1 file, -1/+1)
  Coverity CID 1291727.

  Guenther

  Change-Id: I95f01b638f74370f0ef04383f0f9d5799abe31f5
  BUG: 789278
  Signed-off-by: Guenther Deschner <gd@samba.org>
  Reviewed-on: http://review.gluster.org/10300
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Fix dht_setxattr to follow files under migration (Nithya Balachandran, 2015-05-28, 1 file, -17/+374)
  If a file is under migration, any xattrs created on it are lost post migration of the file. This is because the xattrs are set only on the cached subvol of the source and as the source is under migration, it becomes a linkto file post migration.

  Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0
  BUG: 1193636
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/10212
  Tested-by: NetBSD Build System
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Don't rely on linkto xattr to find destination subvol during phase 2 of migration. (Raghavendra G, 2015-05-28, 1 file, -101/+31)
  The linkto xattr on the source file cannot be relied upon to find where the data file currently resides. This can happen if there are multiple migrations before phase 2 detection by a client. For example:

  * migration (M1, node1, node2) starts.
  * application writes some data. DHT correctly stores the state in inode context that phase-1 of migration is in progress.
  * migration M1 completes.
  * migration (M2, node2, node3) is triggered and completed.
  * application resumes writes to the file. DHT identifies it as phase-2 of migration. However, the linkto xattr on node1 points to node2, but the file is on node3. A lookup correctly identifies node3 as the cached subvol.

  TBD: When we identify phase-2 of a previous migration (say M1), there might be a migration in progress, say (M3, node3, node4). In this case we need to send writes to both (node3, node4), not just node3. Also, the inode state needs to correctly indicate that it is in phase-1 of migration. I'll send this as a different patch.

  Change-Id: I1a861f766258170af2f6c0935468edb6be687b95
  BUG: 1142423
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/10805
  Tested-by: NetBSD Build System
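The scenario in this commit message can be replayed with a toy model. Everything below is hypothetical (three nodes, one file, arrays instead of xattrs and bricks) and only mirrors the M1/M2 sequence: after two migrations, the linkto recorded on the first node is stale, while a scan standing in for lookup finds the node that actually holds the data.

```c
#include <assert.h>

enum { NODE1, NODE2, NODE3, NODE_COUNT };

/* Toy model of one file across three nodes. */
struct file_state {
    int has_data[NODE_COUNT]; /* 1 if the data file lives on this node */
    int linkto[NODE_COUNT];   /* target recorded on this node, -1 if none */
};

/* Complete a migration: data moves to dst, src keeps only a linkto. */
void migrate(struct file_state *f, int src, int dst)
{
    assert(src >= 0 && src < NODE_COUNT && dst >= 0 && dst < NODE_COUNT);
    f->has_data[src] = 0;
    f->has_data[dst] = 1;
    f->linkto[src] = dst;
    f->linkto[dst] = -1;
}

/* What a client gets if it trusts the linkto recorded on the old source. */
int dst_from_linkto(const struct file_state *f, int src)
{
    return f->linkto[src];
}

/* Stand-in for a fresh lookup: find the node that really has the data. */
int dst_from_lookup(const struct file_state *f)
{
    for (int i = 0; i < NODE_COUNT; i++)
        if (f->has_data[i])
            return i;
    return -1;
}
```

Starting with the data on NODE1 and applying `migrate(NODE1, NODE2)` followed by `migrate(NODE2, NODE3)`, `dst_from_linkto(&f, NODE1)` still answers NODE2 while `dst_from_lookup(&f)` answers NODE3.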