summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* afr : glfs-heal implementationAnuradha2015-01-0610-38/+400
| | | | | | | | | | | | | Backport of http://review.gluster.org/6529 and http://review.gluster.org/9119 Change-Id: Ie420efcb399b5119c61f448b421979c228b27b15 BUG: 1173528 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9335 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Change log level to avoid annoying logsVijay Bellur2015-01-051-1/+1
| | | | | | | | | | | [2015-01-04 08:03:23.820376] I [dht-common.c:1822:dht_lookup_cbk] 0-patchy-dht: Entry /tls missing on subvol patchy-replicate-0 Change-Id: Id9eae47213ed39e8bf969e82cc7b935dfada4598 BUG: 1138385 Reviewed-on: http://review.gluster.org/9382 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1Pranith Kumar K2015-01-041-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9227 Problem: entry self-heal in 3.6 and above, takes full lock on the directory only for the duration of figuring out the xattrs of the directories where as 3.5 takes locks through out the entry-self-heal. If the cluster is heterogeneous then there is a chance that 3.6 self-heal is triggered and then 3.5 self-heal will also triggered and both the self-heal daemons of 3.5 and 3.6 do self-heal. Fix: In 3.6.x and above get an entry lock on a very long name before entry self-heal begins so that 3.5 entry self-heal will not get locks until 3.6.x entry self-heal completes. BUG: 1177418 Change-Id: Iecf49d794c6b480e38563e39599a40067b3a21cb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9355 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix mutex related coverity scan issuesXavier Hernandez2015-01-042-11/+19
| | | | | | | | | | | | | | | | | | | | | | | This patch solves 3 issues detected by coverity scan: CID1241484 Data race condition CID1241486 Data race condition CID1256173 Thread deadlock CID1257622 Thread deadlock With this patch, inode lock is never acquired inside a region locked with fop->lock. This is a backport of http://review.gluster.org/9230/ and http://review.gluster.org/9263/ Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a BUG: 1170954 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9244 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* Avoid spurious directory metedata split brainEmmanuel Dreyfus2014-12-231-1/+115
| | | | | | | | | | | | | | | | | | | | | When directory content is modified, [mc]time is updated. On Linux, the filesystem does it, while at least on NetBSD, the kernel file-system independant code does it. This means that when entries are added while bricks are down, the kernel sends a SETATTR [mc]time which will cause metadata split brain for the directory. In this case, clear the split brain by finding the source with the most recent modification date. Backport of: Ic0177e0df753a4748624d0b906834ed54593adb9 BUG: 1138897 Change-Id: Ic2e697be4f0074e2bbd3f23d6ad40a2d2d126a2c Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9319 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix self-healing issues.Xavier Hernandez2014-12-2110-312/+549
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three problems have been detected: 1. Self healing is executed in background, allowing the fop that detected the problem to continue without blocks nor delays. While this is quite interesting to avoid unnecessary delays, it can cause spurious failures of self-heal because it may try to recover a file inside a directory that a previous self-heal has not recovered yet, causing the file self-heal to fail. 2. When a partial self-heal is being executed on a directory, if a full self-heal is attempted, it won't be executed because another self-heal is already in process, so the directory won't be fully repaired. 3. Information contained in loc's of some fop's is not enough to do a complete self-heal. To solve these problems, I've made some changes: * Improved ec_loc_from_loc() to add all available information to a loc. * Before healing an entry, it's parent is checked and partially healed if necessary to avoid failures. * All heal requests received for the same inode while another self-heal is being processed are queued. When the first heal completes, all pending requests are answered using the results of the first heal (without full execution), unless the first heal was a partial heal. In this case all partial heals are answered, and the first full heal is processed normally. * An special virtual xattr (not physically stored on bricks) named 'trusted.ec.heal' has been created to allow synchronous self-heal of files. Now, the recommended way to heal an entire volume is this: find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \; Some minor changes: * ec_loc_prepare() has been renamed to ec_loc_update(). * All loc management functions return 0 on success and -1 on error. * Do not delay fop unlocks if heal is needed. * Added basic ec xattrs initially on create, mkdir and mknod fops. * Some coding style changes This is a backport of http://review.gluster.org/9072/ Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985 BUG: 1159484 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9073 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* telldir()/seekdir() portability fixesEmmanuel Dreyfus2014-12-201-6/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX says that an offset obtained from telldir() can only be used on the same DIR *. Linux is abls to reuse the offset accross closedir()/opendir() for a given directory, but this is not portable and such a behavior should be fixed. An incomplete fix for the posix xlator was merged in http://review.gluster.org/8933 This change set completes it. - Perform the same fix index xlator. - Use appropriate casts and variable types so that 32 bit signed offsets obtained by telldir() do not get clobbered when copied into 64 bit signed types. - modify afr-self-heald.c so that it does not use anonymous fd, since this will cause closedir()/opendir() between each syncop_readdir(). On failure we fallback to anonymous fs only for Linux so that we can cope with updated client vs not updated brick. - Avoid sending an EINVAL when the client request for the EOF offset. Here we fix an error in previous fix for posix xlator: since we fill each directory entry with the offset of the next entry, we must consider as EOF the offset of the last entry, and not the value of telldir() after we read it. This is a backport of I59fb7f06a872c4f98987105792d648141c258c6a BUG: 1138897 Change-Id: I1e9f3e4a7d780b98adf6d9f197ee2198d43ef94d Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9084 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Remove O_APPEND from flags on create and open.Xavier Hernandez2014-12-184-53/+60
| | | | | | | | | | | | | | | | | | | | | | Allowing O_APPEND flag to pass through to the brick files corrupts fragment contents because writes are not stored on the desired place. Write fop has been modified so that it uses current file size as its write offset. This guarantees that all writes, even those comming from different file descriptors and clients, will write to the end of the file. This is backport of http://review.gluster.org/9079/ Change-Id: I9f721f12217a98231fe52e344166d1c94172c272 BUG: 1161885 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9080 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix incorrect value of EC_MAX_NODESXavier Hernandez2014-12-182-1/+4
| | | | | | | | | | | | | | | | | EC_MAX_NODES was incorrectly calculated. Now the value if computed as the minimum between the theoretical maximum and the limit imposed by the Galois Field. This is a backport of http://review.gluster.org/9193/ Change-Id: I75a8345147f344f051923d66be2c10d405370c7b BUG: 1170959 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9245 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix return errors when not enough bricksXavier Hernandez2014-12-128-16/+80
| | | | | | | | | | | | | | | | | | | | | | | | | Changes introduced by this patch: * Fix an incorrect error propagation when the state of the life cycle of a fop returns an error. * Fix incorrect unlocking of failed locks. * Return ENOTCONN if there aren't enough bricks online. * In readdir(p) check that the fd has been successfully open by a previous opendir. This is a backport of http://review.gluster.org/9098/ Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe BUG: 1161066 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9107 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-1223-369/+138
| | | | | | | | | | | | | This is a backport of http://review.gluster.org/9201/ Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1170515 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Eliminate locking in sh domain in metadata self-healKrutika Dhananjay2014-12-091-35/+2
| | | | | | | | | | | | | Backport of: http://review.gluster.org/9240 Change-Id: I2ab2ad9a02d88c299cfb32e0cf6baa44d1c2ee12 BUG: 1171077 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9246 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/marker: Filter internal xattrs in lookupPranith Kumar K2014-11-163-7/+21
| | | | | | | | | | | | | | Backport of http://review.gluster.org/9061 Afr should ignore quota-size-key as part of self-heal but should heal quota-limit key. BUG: 1163569 Change-Id: I93d203002eac4fe20b70730c27c852d783c16d7f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9110 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve errno in case of failures on all subvolsPranith Kumar K2014-11-161-5/+16
| | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/8984 Problem: When quorum is enabled and the fop fails on all the subvolumes, op_errno is set to EROFS which overrides the actual errno returned from bricks. Fix: Don't override the errno when fop fails on all subvols. Change-Id: I61e57bbf1a69407230ec172a983de18d1c624fd2 BUG: 1162122 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9087 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Avoid self-heal on directories on (f)stat callsXavier Hernandez2014-11-151-1/+2
| | | | | | | | | | | | | | | | | | | | | To avoid inconsistent directory listings, a full self-heal cannot happen on a directory until all its contents have been healed. This is controlled by a manual command using getfattr recursively and in post-order. While navigating the directories, sometimes an (f)stat fop can be sent. This fop caused a full self-heal of the directory. This patch makes that (f)stat only initiates a partial self-heal. This is a backport of http://review.gluster.org/9117/ Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a BUG: 1159498 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Correctly handle quota xattrsXavier Hernandez2014-11-151-0/+53
| | | | | | | | | | | This is a backport of http://review.gluster.org/8990/ Change-Id: I35e11d83c318210d44b918e847cf13db35b01510 BUG: 1158088 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8992 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/afr: Perform post-op in entry selfheal inside locksKrutika Dhananjay2014-10-311-3/+31
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/9020 Take entrylks in xlator domain before doing post-op (undo-pending) in entry self-heal. This is to prevent a parallel name self-heal on an entry under @fd->inode from reading pending xattrs while it is being modified by SHD after entry sh below, given that name self-heal takes locks ONLY in xlator domain and is free to read pending changelog in the absence of the following locking. Change-Id: I0bc92978efc0741d6e3f2439540d008e31472313 BUG: 1136821 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9030 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* ec: Correctly handle xtime extended attributeXavier Hernandez2014-10-281-2/+39
| | | | | | | | | Change-Id: I2bd34f063d6bf1835d5ae57a8e9aa03f3ec3deb3 BUG: 1156405 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8976 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix rebalance issuesXavier Hernandez2014-10-275-113/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some issues in ec xlator made that rebalance didn't complete successfully and generated some warnings and errors in the log. The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and they shared the same lock. This explains the problem: 1. A setxattr is issued. 2. setxattr: ec locks the inode before updating the xattr. 3. setxattr: The xattr is updated. 4. setxattr: Upper xlator is notified that the operation completed. 5. setxattr: A background task is initiated to update the version of the file. 6. A stat is issued on the same file. 7. stat: Since the lock is already acquired, it's reused. 8. stat: A lookup is issued to determine version and size information of the file. At this point, operations 5 and 8 can interfere. This can make that lookup sees different information on each brick, determining that some bricks are corrupted and incorrectly excluding them from the operation and initiating a self-heal. In some cases this false detection combined with self-heal could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be. This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problems. To solve this, now the background update is executed atomically with the posterior unlock. This avoids some reuses of the lock while updating. However this reduces performance because the window in which new requests can reuse the lock is much smaller now. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock). Some minor changes also introduced in this patch: * Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space. * Uninitialized variable. * trusted.ec.config was not created for regular files created with mknod. * An invalid state was used in access fop. This is a backport of http://review.gluster.org/8947/ Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395 BUG: 1152903 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8948 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* logs: Do selective logging for errnosPranith Kumar K2014-10-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/8918 http://review.gluster.org/8955 Problem: Just after replace-brick the mount logs are filled with ENOENT/ESTALE warning logs because the file is yet to be self-healed now that the brick is new. Fix: Do conditional logging for the logs. ENOENT/ESTALE will be logged at lower log level. Only when debug logs are enabled, these logs will be written to the logfile. Change-Id: If203d09e2479e8c2415ebc14fb79d4fbb81dfc95 BUG: 1155027 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Add afr-v1 xattr compatibilityPranith Kumar K2014-10-226-83/+330
| | | | | | | | | | | | | | | Backport of http://review.gluster.org/8536 All the special cases v1 handles and also self-accusing pending changelog from v1 pre-op also is handled in this patch. BUG: 1155017 Change-Id: I86cf6b80492be5c1f240c74f91a0e1b0dd9b58b2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8956 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix self-heal issuesXavier Hernandez2014-10-2213-302/+391
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. This is a backport of http://review.gluster.org/8916/ Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149727 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8946 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix 32 bits issues on file size calculationXavier Hernandez2014-10-202-6/+4
| | | | | | | | | | | | | | Some additional 32 bits issues have been added by a recent patch. This patch solves it. This is a backport of http://review.gluster.org/8882/ Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906 BUG: 1146904 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8883 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix incorrect management of healed bricksXavier Hernandez2014-10-203-7/+16
| | | | | | | | | | | | | | | | | | | | The final lookup made to restore final file attributes after a self-heal did clear the mask of bad bricks, causing that the final setattr won't modify any brick at all. This caused that some attriutes, specially the modification time of the file didn't get updated properly. Now the mask of healed bricks is saved before doing the last lookup. It's also used to correctly report the repaired bricks. This is a backport of http://review.gluster.org/8905/ Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc BUG: 1149725 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8906 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Add state dump supportXavier Hernandez2014-10-041-0/+30
| | | | | | | | | | | | This is a backport of http://review.gluster.org/8891/ Change-Id: I4504f3050674dde217e79af28cb4d2b5370fe2d5 BUG: 1148093 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix memory leak caused by undestroyed fopsXavier Hernandez2014-10-031-1/+1
| | | | | | | | | | | | | | | | | Operations processed by ec_dispatch_one() were not correctly completed by ec_complete(), leaving some structures in memory. Now ec_complete() also calls ec_resume() for this type of fops. This is a backport of http://review.gluster.org/8896/ Change-Id: Iaf0f2e8227399ebb735db9f1bd007593e0ece041 BUG: 1148521 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8897 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Fix invalid seekdir() usagev3.6.0beta3Emmanuel Dreyfus2014-09-301-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to POSIX, seekdir() should only be given offset obtained from telldir() on the same DIR * http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html Code from afr-self-heald.c and index.c is operating outside of the specification, by doing using seekdir() with offset from a previously open/close/re-open directory. This seems to work on Linux (although with no guarantee it will always in the future). On NetBSD the seekdir() with a in invalid offset is a nilpotent operation, and causes an infinite loop, since index_fill_readdir() always restart from the beginning of the directory. The situation is fixed by using a non anonymous fd in afr-self-heald.c: we explicitely open the directory so that it remains open on the brick side during the timeframe where we want to reuse offsets in seekdir(). This requires adding an opendir fop in index xlator. If the brick was not updated, the opendir will fail and we fallback to the standard violating approach for backward compatibility on Linux. On other systems we fail since it never worked. While there, add tests to check seekdir() success in index and posix xlators, so that incorrect usage from calling code produce an explicit error instead of an infinite loop. We can only do it on non Linux systems, for the sake of backward compatibility when the brick was updated but not the client. Backport of I88ca90acfcfee280988124bd6addc1a1893ca7ab BUG: 1138897 Change-Id: I5446a9a17d5451ec5aab8fbd10d381da9a0a23ad Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/8860 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : Fix incorrect looping of index healerAnuradha2014-09-291-4/+9
| | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/8868 Sending appropriate return value from afr_selfheal() fixes the issue. Credits : Krutika Dhananjay and Pranith Kumar. Change-Id: I1dc8105078e99dbc12295ef557a407bf3d8cfec3 BUG: 1147486 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8884 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Launch self-heal only when all the brick status is knownPranith Kumar K2014-09-291-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: File goes into split-brain because of wrong erasing of xattrs. RCA: The issue happens because index self-heal is triggered even before all the bricks are up. So what ends up happening while erasing the xattrs is, xattrs are erased only on the sink brick for the brick that it thinks is up leading to split-brain Example: lets say the xattrs before heal started are: brick 2: trusted.afr.vol1-client-2=0x000000020000000000000000 trusted.afr.vol1-client-3=0x000000020000000000000000 brick 3: trusted.afr.vol1-client-2=0x000010040000000000000000 trusted.afr.vol1-client-3=0x000000000000000000000000 if only brick-2 came up at the time of triggering the self-heal only 'trusted.afr.vol1-client-2' is erased leading to the following xattrs: brick 2: trusted.afr.vol1-client-2=0x000000000000000000000000 trusted.afr.vol1-client-3=0x000000020000000000000000 brick 3: trusted.afr.vol1-client-2=0x000010040000000000000000 trusted.afr.vol1-client-3=0x000000000000000000000000 So the file goes into split-brain. BUG: 1142612 Change-Id: I0c8b66e154f03b636db052c97745399a7cca265b Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8756 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fix inode leakKrutika Dhananjay2014-09-291-0/+2
| | | | | | | | | | | | Backport of: http://review.gluster.org/8875 Change-Id: Ib000be1238d38f8d63ff25b3873bb813bf72beec BUG: 1145914 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8876 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: More dict_t leak fixesKrutika Dhananjay2014-09-262-33/+68
| | | | | | | | | | | | Backport of: http://review.gluster.org/8852 Change-Id: I2c927092dc5a834fabdd3495a7f7a3527604a6d2 BUG: 1136831 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8869 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fix locking issues in entry self-healKrutika Dhananjay2014-09-261-92/+123
| | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/8837/ Original reporter of the bug & designer of the solution: Pranith Kumar K <pkarampu@redhat.com> Change-Id: I6e36326e1d9398ede82166b358ab438367dd3011 BUG: 1136829 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8845 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fix spurious metadata self-healsv3.6.0beta2Pranith Kumar K2014-09-247-29/+86
| | | | | | | | | | | | | | | Backport of http://review.gluster.org/8709 - Added logging for metadata and data self-heals which helped in debugging this issue. - Added checks to skip self-heals when no sinks are available to heal BUG: 1145987 Change-Id: Ide03af4f531a1280ec8ad95b627285df4d7bc42d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8832 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fixed mem leaks in self-heal code path.Anuradha2014-09-242-1/+17
| | | | | | | | | | | | | | | | | backport of: http://review.gluster.org/8821 AFR_STACK_RESET previously didn't cleanup afr_local_t, leading to memory leaks. With this patch, cleanup is done. All credit goes to Pranith Kumar Karampuri. Change-Id: I26506dfd9273b917eff5127c3e0cf9421e60f228 BUG: 1145914 Reviewed-on: http://review.gluster.org/8831 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Add config information in an xattrXavier Hernandez2014-09-237-1/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To simplify backward compatibility of the ec xlator when some parameter or the implementation itself is changed, a new xattr is added to each file with the configuration needed to recover it. The new attribute is called 'trusted.ec.config', and it's a 64-bit value containing the following information: 8 bits: version of the config information (currently always 0) 8 bits: algorithm used to encode the file (currently always 0) 8 bits: size of the galois field (currently always 8) 8 bits: number of bricks 8 bits: redundancy 24 bits: chunk size (currently 512) This new xattr could allow, in a future version, to have different configurations per file. This is a backport of http://review.gluster.org/8770/ Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4 BUG: 1140862 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8825 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Do not reset pending xattrs on gfid or type mismatch in entry-shKrutika Dhananjay2014-09-231-18/+79
| | | | | | | | | | | | Backport of: http://review.gluster.org/8816 Change-Id: I8463a579f542a2336b02edba8f5fbfea0edbbffe BUG: 1136829 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8823 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Don't start heal when lookup succeeds on < 2 childrenPranith Kumar K2014-09-236-8/+29
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/8698 Problem: When self-heal code doesn't see at least 2 successes on looking up children, then self-heal can't be done. What is happening now is if all the lookups fail then the pending changelog is all zeros in xattrs so all the children are becoming sources and leading to crashes when the code paths further assume that some data structures are populated properly Fix: Don't proceed with self-heals when < 2 children succeed lookups. BUG: 1145726 Change-Id: I65465843f0e554c8ccdd8fa930ab42ac123ec023 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8824 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Set all the xattrs needed by index xlatorAnuradha2014-09-212-41/+30
| | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/8652 Index xlator removes the index file from indices xattrop directory in case the value for keys sent are zero. If all the required keys are not set by afr then index file might be removed in an invalid way. With this change all the keys required by index xlator are set by afr such that invalid removal of files does not occur. Change-Id: I1b77904920c8566057415c52242179aec6a015e2 BUG: 1144744 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8788 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix dict_t leaks in rebalance process' execution pathKrutika Dhananjay2014-09-201-4/+7
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/8763 Two dict_t objects are leaked for every file migrated in success codepath. It is the caller's responsibility to unref dict that it gets from calls to syncop_getxattr(); and rebalance performs two syncop_getxattr()s per file without freeing them. Also, syncop_getxattr() on GF_XATTR_LINKINFO_KEY doesn't seem to be using the response dict. Hence, NULL is now passed as opposed to @dict to syncop_getxattr(). Change-Id: I48926389db965e006da151bf0ccb6bcaf3585199 BUG: 1144640 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8785 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix some size_t vars to 64 bits even on 32 bits machinesXavier Hernandez2014-09-196-21/+20
| | | | | | | | | | | | | | | | | | | The 64 bits 'trusted.ec.size' extended attribute was incorrectly computed on 32 bits machines due to an overflow on negative numbers. Also changed some potentially dangerous uses of size_t in other places. This is a backport of http://review.gluster.org/8738/ Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662 BUG: 1144407 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8779 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix invalid inode lock in ftruncateXavier Hernandez2014-09-195-21/+21
| | | | | | | | | | | | | | | | | | | | | | The fops 'truncate' and 'ftruncate' share some code and inodelk() was always made against the inode inside the loc_t structure instead of that of fd_t. Since ftruncate has the loc initialized to NULL, this fop was executed without any lock, allowing some concurrent modifications in the file size. Also changed the way in which 'fop' and 'ffop' are differentiated in shared code. Now it uses 'id' field instead of checking if 'fd' is NULL. This is a backport of http://review.gluster.org/8695/ Change-Id: Ibd18accf2652193b395a841b9029729e5f4867c6 BUG: 1140847 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8780 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: perform list-xattr during lookupRavishankar N2014-09-194-11/+222
| | | | | | | | | | | | | Detect and heal mismatching user extended attributes during lookup. Backport of: http://review.gluster.org/8558 Change-Id: Id03c9746f083ffd3014711d0b3a2e5a71a45eed4 BUG: 1144274 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/8773 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : Mark pending changelog xattrs for new creationsAnuradha2014-09-187-90/+148
| | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/8555 Based on type of file, set appropriate pending changelogs for new entries. Change-Id: Icf9af866fe9a9e511210e8ad097e968e2307d8ee BUG: 1141787 Reviewed-on: http://review.gluster.org/8555 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8748 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Cluster/DHT: Changing rename log severityNithya Balachandran2014-09-181-6/+5
| | | | | | | | | | | | | | Changing log level for a rename message from debug to info to improve debuggability Change-Id: I53031fcf97fffd62095692477330ecde0cf47dcd BUG: 1138395 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on-master: http://review.gluster.org/8582 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/8626
* cluster/dht: Changed log level to DEBUGVenkatesh Somyajulu2014-09-181-4/+4
| | | | | | | | | | | | Change-Id: I7a4ee0c5a6a94bd4f31aff510a2971750913ed45 BUG: 1142402 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on-master: http://review.gluster.org/8621 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8749
* cluster/dht: Fixed double UNWIND in lookup everywhere codeShyam2014-09-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | In dht_lookup_everywhere_done: Line: 1194 we call DHT_STACK_UNWIND and in the same if condition we go ahead and call, goto unwind_hashed_and_cached; which at Line 1371 calls another UNWIND. As is obvious, higher frames could cleanup their locals and on receiving the next unwind could cause a coredump of the process. Fixed the same by calling the required return post the first unwind Change-Id: Ic5d57da98255b8616a65b4caaedabeba9144fd49 BUG: 1142409 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on-master: http://review.gluster.org/8666 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/8751
* cluster/dht: fix memory corruption in locking api.Raghavendra G2014-09-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | <man 3 qsort> The contents of the array are sorted in ascending order according to a comparison function pointed to by compar, which is called with two arguments that "point to the objects being compared". </man 3 qsort> qsort passes "pointers to members of the array" to comparision function. Since the members of the array happen to be (dht_lock_t *), the arguments passed to dht_lock_request_cmp are of type (dht_lock_t **). Previously we assumed them to be of type (dht_lock_t *), which resulted in memory corruption. Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96 BUG: 1142406 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on-master: http://review.gluster.org/8659 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8750
* cluster/dht: Rename should not fail post hardlink creationShyam2014-09-162-41/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the rename path, we wind the creation of newname hardlink and linkto file in dst hashed a the same time. If the linkto creation fails, but the link creation succeeds, we enter the failure code and cleanup the created newname hardlink. In the interim if another client looks up newname and finds it as a hardlink from FUSE, it could send an unlink for oldname instead of a rename. This combined with the above cleanup code could end up losing all the files copies, and thereby losing data. This fix separates these steps into 2 parts, creating the linkto first and then the link file, so that post link file creation no failures would cleanup the newname file. If linkto fails then link is not attempted, thereby not polluting the name space with newname. Change-Id: I61da8e906060da16a31ea1076eec2f01fd617f44 BUG: 1138395 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on-master: http://review.gluster.org/8570 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8615
* ec: Optimize read/write performanceXavier Hernandez2014-09-1613-268/+706
| | | | | | | | | | | | | | | | | | | | This patch significantly improves performance of read/write operations on a dispersed volume by reusing previous inodelk/ entrylk operations on the same inode/entry. This reduces the latency of each individual operation considerably. Inode version and size are also updated when needed instead of on each request. This gives an additional boost. This is a backport of http://review.gluster.org/8369/ Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac BUG: 1140844 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Only heal data/metadata when inode has enough informationXavier Hernandez2014-09-161-0/+8
| | | | | | | | | | | | | | | | Sometimes loc_t structure in a heal request doesn't contain enough information to do an inodelk call (basically the gfid is missing). In these cases, self heal only recovers entry information. This is a backport of http://review.gluster.org/8368/ Change-Id: I459990c7df728ff4baf164df046672ddcde3efa5 BUG: 1140626 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>