glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/ec: Handle CHILD UP/DOWN in all cases	Pranith Kumar K	2015-03-30	1	-0/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/9396 Problem: When all the bricks are down at the time of mounting the volume, then mount command hangs. Fix: 1. Ignore all CHILD_CONNECTING events comming from subvolumes. 2. On timer expiration (without enough up or down childs) send CHILD_DOWN. 3. Once enough up or down subvolumes are detected, send the appropriate event. When rest of the subvols go up/down without changing the overall ec-up/ec-down send CHILD_MODIFIED to parent subvols. BUG: 1188471 Change-Id: If92bd84107d49495cd104deb34601afe7f9b155c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9551 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
*	cluster/ec: Handle internal xattr get/set	Pranith Kumar K	2015-02-04	1	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/9385 Problem: Internal xattrs of EC like trusted.ec.size/config/version can be modified by users and that can lead to misbehavior in EC. Fix: Don't let the user modify the xattrs. Hide these xattrs in getfattr outputs. BUG: 1182490 Change-Id: Ie32ebb95ee67cabbb9488951097a517172b45bcf Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9455 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
*	ec: Correctly handle quota xattrs	Xavier Hernandez	2014-11-15	1	-0/+58
\| \| \| \| \| \| \| \| \| \| \|	This is a backport of http://review.gluster.org/8990/ Change-Id: I35e11d83c318210d44b918e847cf13db35b01510 BUG: 1158088 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8992 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	ec: Fix rebalance issues	Xavier Hernandez	2014-10-27	3	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some issues in ec xlator made that rebalance didn't complete successfully and generated some warnings and errors in the log. The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and they shared the same lock. This explains the problem: 1. A setxattr is issued. 2. setxattr: ec locks the inode before updating the xattr. 3. setxattr: The xattr is updated. 4. setxattr: Upper xlator is notified that the operation completed. 5. setxattr: A background task is initiated to update the version of the file. 6. A stat is issued on the same file. 7. stat: Since the lock is already acquired, it's reused. 8. stat: A lookup is issued to determine version and size information of the file. At this point, operations 5 and 8 can interfere. This can make that lookup sees different information on each brick, determining that some bricks are corrupted and incorrectly excluding them from the operation and initiating a self-heal. In some cases this false detection combined with self-heal could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be. This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problems. To solve this, now the background update is executed atomically with the posterior unlock. This avoids some reuses of the lock while updating. However this reduces performance because the window in which new requests can reuse the lock is much smaller now. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock). Some minor changes also introduced in this patch: * Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space. * Uninitialized variable. * trusted.ec.config was not created for regular files created with mknod. * An invalid state was used in access fop. This is a backport of http://review.gluster.org/8947/ Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395 BUG: 1152903 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8948 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	ec: Fix self-heal issues	Xavier Hernandez	2014-10-22	2	-47/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. This is a backport of http://review.gluster.org/8916/ Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149727 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8946 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	test/ec: Fix spurious failures caused by self-heal	Xavier Hernandez	2014-10-21	12	-41/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sha1sum of a file may update the access time of that file. If this happens while a brick is down, as it is forced in the test, that brick doesn't get the update, getting out of sync. When the brick is restarted, self-heal repairs the file, but the test shouldn't access brick contents until self-heal finishes. If this is combined with a kill of another brick before self-heal has finished repairing the file, the volume could become inaccessible. Since the purpose of these tests is only to check ec functionality (there is another test that checks self-heal), the test that corrupts the file has been removed. Additional checks to validate the state of the volume have been added to avoid some timing issues. This is a backport of http://review.gluster.org/8892/ BUG: 1149118 Change-Id: I8a40b7f07fc8ecd2c721bad1bcdd351dd8504155 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8902 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	ec: Fix incorrect management of healed bricks	Xavier Hernandez	2014-10-20	1	-32/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The final lookup made to restore final file attributes after a self-heal did clear the mask of bad bricks, causing that the final setattr won't modify any brick at all. This caused that some attriutes, specially the modification time of the file didn't get updated properly. Now the mask of healed bricks is saved before doing the last lookup. It's also used to correctly report the repaired bricks. This is a backport of http://review.gluster.org/8905/ Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc BUG: 1149725 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8906 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Add state dump support	Xavier Hernandez	2014-10-04	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \|	This is a backport of http://review.gluster.org/8891/ Change-Id: I4504f3050674dde217e79af28cb4d2b5370fe2d5 BUG: 1148093 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Removed SSE2 dependency	Xavier Hernandez	2014-09-12	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the Galois Field multiplications using pure C code without any assembler support. This makes the ec xlator portable to other architectures. In the future it will be possible to use an optimized implementation of the multiplications using architecture dependent facilities (it will be automatically detected and configured). To allow bricks with different machine word sizes to be able to work seamlessly in the same volume, the minimum fragment length to be stored in any brick has been fixed to 512 bytes. Otherwise, different implementations will corrupt the data (SSE2 used 128 bytes, while new implementation would have used 64). This patch also removes the '-msse2' option added on patch http://review.gluster.org/8396/ This is a backport of http://review.gluster.org/8413/ Change-Id: Iaf6e4ef3dcfda6c68f48f16ca46fc4fb61a215f4 BUG: 1140845 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8701 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Test volume mount point in a better way	Xavier Hernandez	2014-09-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An 'ls -a1' on an empty volume seems to return 3 entries instead of the expected 2 ('.' and '..') in the build servers. I changed the test to a simple 'stat', which is enough and more reliable. This is a backport of http://review.gluster.org/8313. Change-Id: I12d0f47394ad378b40fc9b86507cdb3543f99970 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8313 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8417
*	cluster/ec: Fix incorrect management of NFS requests	Xavier Hernandez	2014-08-11	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some operations, specially those comming from NFS, do not use a regular fd and use an anonymous fd (i.e. a previous open call has not been sent). Any context information created during open or create will not be present on these fd's, so we simply return NULL for contexts of those fd. Also it seems that NFS can send write requests with a very big buffer (higher that the default value of 128 KB). Some changes have been made to correctly handle these large buffers. This is a backport of http://review.gluster.org/8367. Change-Id: I281476bd0d2cbaad231822248d6a616fcf5d4003 BUG: 1126734 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8367 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8416
*	cli/glusterd: Added support for dispersed volumes	Xavier Hernandez	2014-07-11	10	-0/+597
	Two new options have been added to the 'create' command of the cli interface: disperse [<count>] redundancy <count> Both are optional. A dispersed volume is created by specifying, at least, one of them. If 'disperse' is missing or it's present but '<count>' does not, the number of bricks enumerated in the command line is taken as the disperse count. If 'redundancy' is missing, the lowest optimal value is assumed. A configuration is considered optimal (for most workloads) when the disperse count - redundancy count is a power of 2. If the resulting redundancy is 1, the volume is created normally, but if it's greater than 1, a warning is shown to the user and he/she must answer yes/no to continue volume creation. If there isn't any optimal value for the given number of bricks, a warning is also shown and, if the user accepts, a redundancy of 1 is used. If 'redundancy' is specified and the resulting volume is not optimal, another warning is shown to the user. A distributed-disperse volume can be created using a number of bricks multiple of the disperse count. Change-Id: Iab93efbe78e905cdb91f54f3741599f7ea6645e4 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7782 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>