| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/9396
Problem:
When all the bricks are down at the time of mounting the volume, then mount
command hangs.
Fix:
1. Ignore all CHILD_CONNECTING events comming from subvolumes.
2. On timer expiration (without enough up or down childs) send
CHILD_DOWN.
3. Once enough up or down subvolumes are detected, send the appropriate event.
When rest of the subvols go up/down without changing the overall
ec-up/ec-down send CHILD_MODIFIED to parent subvols.
BUG: 1188471
Change-Id: If92bd84107d49495cd104deb34601afe7f9b155c
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9551
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/9385
Problem:
Internal xattrs of EC like trusted.ec.size/config/version
can be modified by users and that can lead to misbehavior
in EC.
Fix:
Don't let the user modify the xattrs. Hide these xattrs
in getfattr outputs.
BUG: 1182490
Change-Id: Ie32ebb95ee67cabbb9488951097a517172b45bcf
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9455
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of http://review.gluster.org/8990/
Change-Id: I35e11d83c318210d44b918e847cf13db35b01510
BUG: 1158088
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8992
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some issues in ec xlator made that rebalance didn't complete
successfully and generated some warnings and errors in the
log. The most critical error was a race condition that caused
false corruption detection when two specific operations were
executed sequentially and they shared the same lock.
This explains the problem:
1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: Upper xlator is notified that the operation completed.
5. setxattr: A background task is initiated to update the version
of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it's reused.
8. stat: A lookup is issued to determine version and size
information of the file.
At this point, operations 5 and 8 can interfere. This can make that
lookup sees different information on each brick, determining that
some bricks are corrupted and incorrectly excluding them from the
operation and initiating a self-heal. In some cases this false
detection combined with self-heal could lead to invalid updates of
the trusted.ec.size xattr, leaving the file smaller than it should
be.
This only happens if the first operation does not perform a lookup,
because chained operations reuse the information returned by the
previous one, avoiding this kind of problems.
To solve this, now the background update is executed atomically with
the posterior unlock. This avoids some reuses of the lock while
updating. However this reduces performance because the window in
which new requests can reuse the lock is much smaller now. This has
been alleviated by using the same technique implemented in AFR (i.e.
waiting some time before releasing the lock).
Some minor changes also introduced in this patch:
* Bug in management of 'trusted.glusterfs.pathinfo' that was writing
beyond the allocated space.
* Uninitialized variable.
* trusted.ec.config was not created for regular files created with
mknod.
* An invalid state was used in access fop.
This is a backport of http://review.gluster.org/8947/
Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152903
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8948
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Doing an 'ls' of a directory that has been modified while one
of the bricks was down, sometimes returns the old directory
contents.
Cause: Directories are not marked when they are modified as files are.
The ec xlator balances requests amongst available and healthy
bricks. Since there is no way to detect that a directory is
out of date in one of the bricks, it is used from time to time
to return the directory contents.
Solution: Basically the solution consists in use versioning information
also for directories, however some additional changes have
been necessary.
Changes:
* Use directory versioning:
This required to lock full directory instead of a single entry for
all requests that add or remove entries from it. This is needed to
allow atomic version update. This affects the following fops:
create, mkdir, mknod, link, symlink, rename, unlink, rmdir
Another side effect is that opendir requires to do a previous
lookup to get versioning information and discard out of date
bricks for subsequent readdir(p) calls.
* Restrict directory self-heal:
Till now, when one discrepancy was found in lookup, a self-heal
was automatically started. This caused the versioning information
of a bad directory to be healed instantly, making the original
problem to reapear again.
To solve this, when a missing directory is detected in one or more
bricks on lookup or opendir fops, only a partial self-heal is
performed on it. A partial self-heal basically creates the
directory but does not restore any additional information.
This avoids that an 'ls' could repair the directory and cause the
problem to happen again. With this change, output of 'ls' is
always consistent. However, since the directory has been created
in the brick, this allows any other operation on it (create new
files, for example) to succeed on all bricks and not add additional
work to the self-heal process.
To force a self-heal of a directory, any other operation must be
done on it. For example a getxattr.
With these changes, the correct healing procedure that would avoid
inconsistent directory browsing consists on a post-order traversal
of directoriesi being healed. This way, the directory contents will
be healed before healing the directory itslef.
* Additional changes to fix self-heal errors
- Don't use fop->fd to decide between fd/loc.
open, opendir and create have an fd, but the correct data is in
loc.
- Fix incorrect management of bad bricks per inode/fd.
- Fix incorrect selection of fop's target bricks when there are bad
bricks involved.
- Improved ec_loc_parent() to always return a parent loc as
complete as possible.
This is a backport of http://review.gluster.org/8916/
Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
BUG: 1149727
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8946
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sha1sum of a file may update the access time of that file.
If this happens while a brick is down, as it is forced in the
test, that brick doesn't get the update, getting out of sync.
When the brick is restarted, self-heal repairs the file, but
the test shouldn't access brick contents until self-heal finishes.
If this is combined with a kill of another brick before self-heal
has finished repairing the file, the volume could become inaccessible.
Since the purpose of these tests is only to check ec functionality
(there is another test that checks self-heal), the test that corrupts
the file has been removed.
Additional checks to validate the state of the volume have been added
to avoid some timing issues.
This is a backport of http://review.gluster.org/8892/
BUG: 1149118
Change-Id: I8a40b7f07fc8ecd2c721bad1bcdd351dd8504155
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8902
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final lookup made to restore final file attributes after a self-heal
did clear the mask of bad bricks, causing that the final setattr won't
modify any brick at all. This caused that some attriutes, specially the
modification time of the file didn't get updated properly.
Now the mask of healed bricks is saved before doing the last lookup.
It's also used to correctly report the repaired bricks.
This is a backport of http://review.gluster.org/8905/
Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc
BUG: 1149725
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8906
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of http://review.gluster.org/8891/
Change-Id: I4504f3050674dde217e79af28cb4d2b5370fe2d5
BUG: 1148093
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8899
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the Galois Field multiplications using pure C
code without any assembler support. This makes the ec xlator portable
to other architectures.
In the future it will be possible to use an optimized implementation
of the multiplications using architecture dependent facilities (it
will be automatically detected and configured). To allow bricks with
different machine word sizes to be able to work seamlessly in the
same volume, the minimum fragment length to be stored in any brick
has been fixed to 512 bytes. Otherwise, different implementations
will corrupt the data (SSE2 used 128 bytes, while new implementation
would have used 64).
This patch also removes the '-msse2' option added on patch
http://review.gluster.org/8396/
This is a backport of http://review.gluster.org/8413/
Change-Id: Iaf6e4ef3dcfda6c68f48f16ca46fc4fb61a215f4
BUG: 1140845
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8701
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An 'ls -a1' on an empty volume seems to return 3 entries instead
of the expected 2 ('.' and '..') in the build servers. I changed
the test to a simple 'stat', which is enough and more reliable.
This is a backport of http://review.gluster.org/8313.
Change-Id: I12d0f47394ad378b40fc9b86507cdb3543f99970
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8313
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/8417
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some operations, specially those comming from NFS, do not use a
regular fd and use an anonymous fd (i.e. a previous open call has
not been sent). Any context information created during open or
create will not be present on these fd's, so we simply return NULL
for contexts of those fd.
Also it seems that NFS can send write requests with a very big
buffer (higher that the default value of 128 KB). Some changes
have been made to correctly handle these large buffers.
This is a backport of http://review.gluster.org/8367.
Change-Id: I281476bd0d2cbaad231822248d6a616fcf5d4003
BUG: 1126734
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8367
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/8416
|
|
Two new options have been added to the 'create' command of the cli
interface:
disperse [<count>] redundancy <count>
Both are optional. A dispersed volume is created by specifying, at
least, one of them. If 'disperse' is missing or it's present but
'<count>' does not, the number of bricks enumerated in the command
line is taken as the disperse count.
If 'redundancy' is missing, the lowest optimal value is assumed. A
configuration is considered optimal (for most workloads) when the
disperse count - redundancy count is a power of 2. If the resulting
redundancy is 1, the volume is created normally, but if it's greater
than 1, a warning is shown to the user and he/she must answer yes/no
to continue volume creation. If there isn't any optimal value for
the given number of bricks, a warning is also shown and, if the user
accepts, a redundancy of 1 is used.
If 'redundancy' is specified and the resulting volume is not optimal,
another warning is shown to the user.
A distributed-disperse volume can be created using a number of bricks
multiple of the disperse count.
Change-Id: Iab93efbe78e905cdb91f54f3741599f7ea6645e4
BUG: 1118629
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/7782
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|