| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
loc->parent may not always be populated. Even in those cases,
self-heal should happen if it can be completed using nameless loc.
Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9717
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This xattr will be incremented before each data modifying operation and
decremented after it. This will add the possibility to detect partially
updated writes and refuse them on reads.
It will also be useful for interacting with index xlator and have a way
to heal dispersed files from the self-heal daemon.
Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a
BUG: 1190581
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9607
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Three problems have been detected:
1. Self healing is executed in background, allowing the fop that
detected the problem to continue without blocks nor delays.
While this is quite interesting to avoid unnecessary delays,
it can cause spurious failures of self-heal because it may
try to recover a file inside a directory that a previous
self-heal has not recovered yet, causing the file self-heal
to fail.
2. When a partial self-heal is being executed on a directory,
if a full self-heal is attempted, it won't be executed
because another self-heal is already in process, so the
directory won't be fully repaired.
3. Information contained in loc's of some fop's is not enough
to do a complete self-heal.
To solve these problems, I've made some changes:
* Improved ec_loc_from_loc() to add all available information
to a loc.
* Before healing an entry, it's parent is checked and partially
healed if necessary to avoid failures.
* All heal requests received for the same inode while another
self-heal is being processed are queued. When the first heal
completes, all pending requests are answered using the results
of the first heal (without full execution), unless the first
heal was a partial heal. In this case all partial heals are
answered, and the first full heal is processed normally.
* An special virtual xattr (not physically stored on bricks)
named 'trusted.ec.heal' has been created to allow synchronous
self-heal of files.
Now, the recommended way to heal an entire volume is this:
find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
Some minor changes:
* ec_loc_prepare() has been renamed to ec_loc_update().
* All loc management functions return 0 on success and -1 on
error.
* Do not delay fop unlocks if heal is needed.
* Added basic ec xattrs initially on create, mkdir and mknod
fops.
* Some coding style changes
Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985
BUG: 1161588
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9072
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
| |
|
|
|
|
|
|
|
|
| |
Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b
BUG: 1168167
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9201
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some issues in ec xlator made that rebalance didn't complete
successfully and generated some warnings and errors in the
log. The most critical error was a race condition that caused
false corruption detection when two specific operations were
executed sequentially and they shared the same lock.
This explains the problem:
1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: Upper xlator is notified that the operation completed.
5. setxattr: A background task is initiated to update the version
of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it's reused.
8. stat: A lookup is issued to determine version and size
information of the file.
At this point, operations 5 and 8 can interfere. This can make that
lookup sees different information on each brick, determining that
some bricks are corrupted and incorrectly excluding them from the
operation and initiating a self-heal. In some cases this false
detection combined with self-heal could lead to invalid updates of
the trusted.ec.size xattr, leaving the file smaller than it should
be.
This only happens if the first operation does not perform a lookup,
because chained operations reuse the information returned by the
previous one, avoiding this kind of problems.
To solve this, now the background update is executed atomically with
the posterior unlock. This avoids some reuses of the lock while
updating. However this reduces performance because the window in
which new requests can reuse the lock is much smaller now. This has
been alleviated by using the same technique implemented in AFR (i.e.
waiting some time before releasing the lock).
Some minor changes also introduced in this patch:
* Bug in management of 'trusted.glusterfs.pathinfo' that was writing
beyond the allocated space.
* Uninitialized variable.
* trusted.ec.config was not created for regular files created with
mknod.
* An invalid state was used in access fop.
Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152902
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8947
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Doing an 'ls' of a directory that has been modified while one
of the bricks was down, sometimes returns the old directory
contents.
Cause: Directories are not marked when they are modified as files are.
The ec xlator balances requests amongst available and healthy
bricks. Since there is no way to detect that a directory is
out of date in one of the bricks, it is used from time to time
to return the directory contents.
Solution: Basically the solution consists in use versioning information
also for directories, however some additional changes have
been necessary.
Changes:
* Use directory versioning:
This required to lock full directory instead of a single entry for
all requests that add or remove entries from it. This is needed to
allow atomic version update. This affects the following fops:
create, mkdir, mknod, link, symlink, rename, unlink, rmdir
Another side effect is that opendir requires to do a previous
lookup to get versioning information and discard out of date
bricks for subsequent readdir(p) calls.
* Restrict directory self-heal:
Till now, when one discrepancy was found in lookup, a self-heal
was automatically started. This caused the versioning information
of a bad directory to be healed instantly, making the original
problem to reapear again.
To solve this, when a missing directory is detected in one or more
bricks on lookup or opendir fops, only a partial self-heal is
performed on it. A partial self-heal basically creates the
directory but does not restore any additional information.
This avoids that an 'ls' could repair the directory and cause the
problem to happen again. With this change, output of 'ls' is
always consistent. However, since the directory has been created
in the brick, this allows any other operation on it (create new
files, for example) to succeed on all bricks and not add additional
work to the self-heal process.
To force a self-heal of a directory, any other operation must be
done on it. For example a getxattr.
With these changes, the correct healing procedure that would avoid
inconsistent directory browsing consists on a post-order traversal
of directoriesi being healed. This way, the directory contents will
be healed before healing the directory itslef.
* Additional changes to fix self-heal errors
- Don't use fop->fd to decide between fd/loc.
open, opendir and create have an fd, but the correct data is in
loc.
- Fix incorrect management of bad bricks per inode/fd.
- Fix incorrect selection of fop's target bricks when there are bad
bricks involved.
- Improved ec_loc_parent() to always return a parent loc as
complete as possible.
Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
BUG: 1149726
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8916
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some additional 32 bits issues have been added by a recent patch.
This patch solves it.
Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906
BUG: 1146903
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8882
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final lookup made to restore final file attributes after a self-heal
did clear the mask of bad bricks, causing that the final setattr won't
modify any brick at all. This caused that some attriutes, specially the
modification time of the file didn't get updated properly.
Now the mask of healed bricks is saved before doing the last lookup.
It's also used to correctly report the repaired bricks.
Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc
BUG: 1149723
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8905
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To simplify backward compatibility of the ec xlator when some
parameter or the implementation itself is changed, a new xattr
is added to each file with the configuration needed to recover
it.
The new attribute is called 'trusted.ec.config', and it's a 64-bit
value containing the following information:
8 bits: version of the config information (currently always 0)
8 bits: algorithm used to encode the file (currently always 0)
8 bits: size of the galois field (currently always 8)
8 bits: number of bricks
8 bits: redundancy
24 bits: chunk size (currently 512)
This new xattr could allow, in a future version, to have different
configurations per file.
Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4
BUG: 1140861
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8770
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 64 bits 'trusted.ec.size' extended attribute was incorrectly
computed on 32 bits machines due to an overflow on negative
numbers.
Also changed some potentially dangerous uses of size_t in other
places.
Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662
BUG: 1125312
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8738
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch significantly improves performance of read/write
operations on a dispersed volume by reusing previous inodelk/
entrylk operations on the same inode/entry. This reduces the
latency of each individual operation considerably.
Inode version and size are also updated when needed instead
of on each request. This gives an additional boost.
Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac
BUG: 1122586
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8369
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361
BUG: 1118629
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/7749
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|