summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec/src/ec-common.c
Commit message (Collapse)AuthorAgeFilesLines
* ec: Fix posix compliance failuresXavier Hernandez2015-02-111-23/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves some problems that caused dispersed volumes to not pass posix smoke tests: * Problems in open/create with O_WRONLY Opening files with -w- permissions using O_WRONLY returned an EACCES error because internally O_WRONLY was replaced with O_RDWR. * Problems with entrylk on renames. When source and destination were the same, ec tried to acquire the same entrylk twice, causing a deadlock. * Overwrite of a variable when reordering locks. On a rename, if the second lock needed to be placed at the beggining of the list, the 'lock' variable was overwritten and later its timer was cancelled, cancelling the incorrect one. * Handle O_TRUNC in open. When O_TRUNC was received in an open call, it was blindly propagated to child subvolumes. This caused a discrepancy between real file size and the size stored into trusted.ec.size xattr. This has been solved by removing O_TRUNC from open and later calling ftruncate. This is a backport of http://review.gluster.org/9420 Change-Id: I20c3d6e1c11be314be86879be54b728e01013798 BUG: 1159471 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9501 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix mutex related coverity scan issuesXavier Hernandez2015-01-041-2/+14
| | | | | | | | | | | | | | | | | | | | | | | This patch solves 3 issues detected by coverity scan: CID1241484 Data race condition CID1241486 Data race condition CID1256173 Thread deadlock CID1257622 Thread deadlock With this patch, inode lock is never acquired inside a region locked with fop->lock. This is a backport of http://review.gluster.org/9230/ and http://review.gluster.org/9263/ Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a BUG: 1170954 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9244 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix self-healing issues.Xavier Hernandez2014-12-211-44/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three problems have been detected: 1. Self healing is executed in background, allowing the fop that detected the problem to continue without blocks nor delays. While this is quite interesting to avoid unnecessary delays, it can cause spurious failures of self-heal because it may try to recover a file inside a directory that a previous self-heal has not recovered yet, causing the file self-heal to fail. 2. When a partial self-heal is being executed on a directory, if a full self-heal is attempted, it won't be executed because another self-heal is already in process, so the directory won't be fully repaired. 3. Information contained in loc's of some fop's is not enough to do a complete self-heal. To solve these problems, I've made some changes: * Improved ec_loc_from_loc() to add all available information to a loc. * Before healing an entry, it's parent is checked and partially healed if necessary to avoid failures. * All heal requests received for the same inode while another self-heal is being processed are queued. When the first heal completes, all pending requests are answered using the results of the first heal (without full execution), unless the first heal was a partial heal. In this case all partial heals are answered, and the first full heal is processed normally. * An special virtual xattr (not physically stored on bricks) named 'trusted.ec.heal' has been created to allow synchronous self-heal of files. Now, the recommended way to heal an entire volume is this: find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \; Some minor changes: * ec_loc_prepare() has been renamed to ec_loc_update(). * All loc management functions return 0 on success and -1 on error. * Do not delay fop unlocks if heal is needed. * Added basic ec xattrs initially on create, mkdir and mknod fops. * Some coding style changes This is a backport of http://review.gluster.org/9072/ Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985 BUG: 1159484 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9073 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix return errors when not enough bricksXavier Hernandez2014-12-121-14/+23
| | | | | | | | | | | | | | | | | | | | | | | | | Changes introduced by this patch: * Fix an incorrect error propagation when the state of the life cycle of a fop returns an error. * Fix incorrect unlocking of failed locks. * Return ENOTCONN if there aren't enough bricks online. * In readdir(p) check that the fd has been successfully open by a previous opendir. This is a backport of http://review.gluster.org/9098/ Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe BUG: 1161066 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9107 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-121-16/+6
| | | | | | | | | | | | | This is a backport of http://review.gluster.org/9201/ Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1170515 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Avoid self-heal on directories on (f)stat callsXavier Hernandez2014-11-151-1/+2
| | | | | | | | | | | | | | | | | | | | | To avoid inconsistent directory listings, a full self-heal cannot happen on a directory until all its contents have been healed. This is controlled by a manual command using getfattr recursively and in post-order. While navigating the directories, sometimes an (f)stat fop can be sent. This fop caused a full self-heal of the directory. This patch makes that (f)stat only initiates a partial self-heal. This is a backport of http://review.gluster.org/9117/ Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a BUG: 1159498 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix rebalance issuesXavier Hernandez2014-10-271-110/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some issues in ec xlator made that rebalance didn't complete successfully and generated some warnings and errors in the log. The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and they shared the same lock. This explains the problem: 1. A setxattr is issued. 2. setxattr: ec locks the inode before updating the xattr. 3. setxattr: The xattr is updated. 4. setxattr: Upper xlator is notified that the operation completed. 5. setxattr: A background task is initiated to update the version of the file. 6. A stat is issued on the same file. 7. stat: Since the lock is already acquired, it's reused. 8. stat: A lookup is issued to determine version and size information of the file. At this point, operations 5 and 8 can interfere. This can make that lookup sees different information on each brick, determining that some bricks are corrupted and incorrectly excluding them from the operation and initiating a self-heal. In some cases this false detection combined with self-heal could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be. This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problems. To solve this, now the background update is executed atomically with the posterior unlock. This avoids some reuses of the lock while updating. However this reduces performance because the window in which new requests can reuse the lock is much smaller now. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock). Some minor changes also introduced in this patch: * Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space. * Uninitialized variable. * trusted.ec.config was not created for regular files created with mknod. * An invalid state was used in access fop. This is a backport of http://review.gluster.org/8947/ Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395 BUG: 1152903 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8948 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix self-heal issuesXavier Hernandez2014-10-221-140/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. This is a backport of http://review.gluster.org/8916/ Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149727 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8946 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix 32 bits issues on file size calculationXavier Hernandez2014-10-201-4/+2
| | | | | | | | | | | | | | Some additional 32 bits issues have been added by a recent patch. This patch solves it. This is a backport of http://review.gluster.org/8882/ Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906 BUG: 1146904 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8883 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix incorrect management of healed bricksXavier Hernandez2014-10-201-1/+7
| | | | | | | | | | | | | | | | | | | | The final lookup made to restore final file attributes after a self-heal did clear the mask of bad bricks, causing that the final setattr won't modify any brick at all. This caused that some attriutes, specially the modification time of the file didn't get updated properly. Now the mask of healed bricks is saved before doing the last lookup. It's also used to correctly report the repaired bricks. This is a backport of http://review.gluster.org/8905/ Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc BUG: 1149725 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8906 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix memory leak caused by undestroyed fopsXavier Hernandez2014-10-031-1/+1
| | | | | | | | | | | | | | | | | Operations processed by ec_dispatch_one() were not correctly completed by ec_complete(), leaving some structures in memory. Now ec_complete() also calls ec_resume() for this type of fops. This is a backport of http://review.gluster.org/8896/ Change-Id: Iaf0f2e8227399ebb735db9f1bd007593e0ece041 BUG: 1148521 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8897 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Add config information in an xattrXavier Hernandez2014-09-231-1/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To simplify backward compatibility of the ec xlator when some parameter or the implementation itself is changed, a new xattr is added to each file with the configuration needed to recover it. The new attribute is called 'trusted.ec.config', and it's a 64-bit value containing the following information: 8 bits: version of the config information (currently always 0) 8 bits: algorithm used to encode the file (currently always 0) 8 bits: size of the galois field (currently always 8) 8 bits: number of bricks 8 bits: redundancy 24 bits: chunk size (currently 512) This new xattr could allow, in a future version, to have different configurations per file. This is a backport of http://review.gluster.org/8770/ Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4 BUG: 1140862 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8825 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix some size_t vars to 64 bits even on 32 bits machinesXavier Hernandez2014-09-191-1/+1
| | | | | | | | | | | | | | | | | | | The 64 bits 'trusted.ec.size' extended attribute was incorrectly computed on 32 bits machines due to an overflow on negative numbers. Also changed some potentially dangerous uses of size_t in other places. This is a backport of http://review.gluster.org/8738/ Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662 BUG: 1144407 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8779 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Optimize read/write performanceXavier Hernandez2014-09-161-148/+397
| | | | | | | | | | | | | | | | | | | | This patch significantly improves performance of read/write operations on a dispersed volume by reusing previous inodelk/ entrylk operations on the same inode/entry. This reduces the latency of each individual operation considerably. Inode version and size are also updated when needed instead of on each request. This gives an additional boost. This is a backport of http://review.gluster.org/8369/ Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac BUG: 1140844 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/ec: Added erasure code translatorXavier Hernandez2014-07-111-0/+1109
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7749 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>