| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This was leading to hangs when get_size_and_version fails
Change-Id: Iad9408c2dacc9a74594b8d2f94c95f402533b0f1
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10390
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the version numbers do not match, then writes are performed only on at least
N-R bricks which have same version. But if we want to do healing of files which
are constantly modified we need to allow writes on subvols that are undergoing
heal. Data healing will mark 62nd bit while the heal is going on. When the data
transaction sees that this bit is set it needs to perform the fop on that
subvol irrespective of whether the versions match or do not match. Fop is
considered successful only if N-R non-healing bricks succeed.
Change-Id: I69a17582df397aaf6e8ca4b5e746c7ca802cbbde
BUG: 1215265
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10372
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with the inode quota feature, quota size is now
increased from 64bit to 192bits which contains
values of 'file size', 'file count' and 'dir count'
This change in quota size xattr needs to be handled
in disperse xattr aggregation
Signed-off-by: vmallika <vmallika@redhat.com>
Change-Id: I5fd28aa9f5b8b6cba83a98360236417a97ac16ee
BUG: 1207967
Reviewed-on: http://review.gluster.org/10112
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Sachin Pandit <spandit@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Directory deletion should always happen with 'rm -rf' flag, otherwise the
call may fail with ENOTEMPTY.
- Instead of doing an explicit 'link' call, perform mknod call with
GLUSTERFS_INTERNAL_FOP_KEY which acts as 'link' if the
gfid already exists.
Change-Id: I8826f92170421db37efb67dfc00afad4ab695907
BUG: 1207085
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10045
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
gf_deitransform returns the glbal client-id in the complete graph. So except
for the first disperse subvolume under dht, all the other disperse subvolumes
will return a client-id greater than ec->nodes, so readdir will always error
out in those subvolumes.
Fix:
Get the client subvolume whose client-id matches the client-id returned by
gf_deitransform of offset.
Change-Id: I26aa17504352d48d7ff14b390b62f49d7ab2d699
BUG: 1209113
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10165
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Id60107e9fb96588d24fa2f3be85c764b7f08e3d1
BUG: 1207712
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10077
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
iobuf_get and iobref_add implicitly
ref the iobuf.
Hence, it is necessary to unref iobuf
before setting it to NULL.
Change-Id: Icadd8925574cf04fe708d8090868e49356653a8e
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/9818
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for xdata in both the
request and response path of syncops.
Few calls like lookup already had the support;
have renamed variables in few places to maintain
uniformity.
xdata passed downwards is known as xdata_in
and xdata passed upwards is known as xdata_out.
There is an old patch by Jeff Darcy at
http://review.gluster.org/#/c/8769/3 which does the
same for some selected calls. It also brings in
xdata support at gfapi level.
xdata support at gfapi level would be introduced
in subsequent patches.
Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc
BUG: 1158621
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/9859
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ec_manager_xxx() function for [f]set/[f]remove xattr is exactly same except the
reporting part. So moved that to common function and use same
ec_manager_xattr() function for all these fops.
Change-Id: Iaa57023b800f8d1f3f6a827f4ceba9b0a0337336
BUG: 1199767
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10036
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All _cbk() functions in inode-write.c do same things, i.e. store
op_ret/op_errno, stat structures if they are available and combine them. Moved
this common operation into one function ec_inode_write_cbk() and made all the
other _cbk() functions to use this instead.
Change-Id: I2387b9f2d9598ced6299a26ea1900e9deb9fadc4
BUG: 1199767
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9981
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs relies on Linux uuid implementation, which
API is incompatible with most other systems's uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non Linux systems.
This implementation is incompatible with systtem's
built in, but the symbols have the same names.
Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However
there is a problem when a program not linked with
-lglusterfs will dlopen() glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.
A possible workaround is to use pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also lists the files that are on-going I/O, which
will be fixed later.
Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327
BUG: 1203581
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10020
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I5d3aca101c8cdda406d31d06c40404fa6a2b7170
BUG: 1192378
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9995
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Changed the implementation of marker xattr handling to take just a
function which populates important data that is different from
default 'gauge' values and subvolumes where the call needs to be
wound.
- Removed duplicate code I found while reading the code and moved it to
cluster_marker_unwind. Removed unused structure members.
- Changed dht/afr/stripe implementations to follow the new implementation
- Implemented marker xattr handling for ec.
Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4
BUG: 1200372
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9892
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Also fixed iatt_combine to go over all the valid iatts
Change-Id: I1d52d705ed0437f602357acde3e479cedb748681
BUG: 1199767
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9827
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
position in the graph rather than relative (local) to a particular
translator.
Encoding the volume in this way allows a single translator to manage
which brick is currently being scanned for directory entries. Using a
single translator minimizes allocated bits in the d_off. It also allows
multiple DHT translators in the same graph to have a common frame of
reference (the graph position) for which brick is being read. Multiple
DHT translators are needed for the Tiering feature.
The fix builds off a previous change (9332) which removed subvolume
encoding from AFR. The fix makes an equivalent change to the EC
translator.
More background can be found in fix 9332 and gluster-dev discussions [1].
DHT and AFR/EC are responsibile (as before) for choosing which brick to
enumerate directory entries in over the readdir lifecycle.
The client translator receiving the readdir fop encodes the dht_t. It
is referred to as the "leaf node" in the graph and corresponds to the
brick being scanned.
When DHT decodes the d_off, it translates the leaf node to a local
subvolume, which represents the next node in the graph leading to
the brick.
Tracking of leaf nodes is done in common utility functions. Leaf nodes
counts and positional information are updated on a graph switch.
[1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
BUG: 1190734
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9688
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the changes required in ec xlator to handle
index/full heal.
Index healer threads:
Ec xlator start an index healer thread per local brick. This thread keeps
waking up every minute to check if there are any files to be healed based on
the indices kept in index directory. Whenever child_up event comes, then also
this index healer thread wakes up and crawls the indices and triggers heal.
When self-heal-daemon is disabled on this particular volume then the healer
thread keeps waiting until it is enabled again to perform heals.
Full healer threads:
Ec xlator starts a full healer thread for the local subvolume provided by
glusterd to perform full crawl on the directory hierarchy to perform heals.
Once the crawl completes the thread exits if no more full heals are issued.
Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic.
Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9787
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
loc->parent may not always be populated. Even in those cases,
self-heal should happen if it can be completed using nameless loc.
Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9717
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This xattr will be incremented before each data modifying operation and
decremented after it. This will add the possibility to detect partially
updated writes and refuse them on reads.
It will also be useful for interacting with index xlator and have a way
to heal dispersed files from the self-heal daemon.
Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a
BUG: 1190581
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9607
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anonymous file descriptors need to be handled specially because
they can be used in some non standard ways (i.e. an anonymous fd
can be used without having been opened).
This caused NFS to fail on some operations because ec always
expected to have a previous successful opendir call (from patch
http://review.gluster.org/9098/).
This patch treats all anonymous fd as opened on all subvolumes.
Change-Id: I09dbbce2ffc1ae3a5bcbb328bed55b84f4f0b9f8
BUG: 1187474
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9513
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to prevent spurious heals that can result in self-heal.
Change-Id: I0b27c1c1fc7a58e2683cb1ca135117a85efcc6c9
BUG: 1179180
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9523
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When all the bricks are down at the time of mounting the volume, then mount
command hangs.
Fix:
1. Ignore all CHILD_CONNECTING events comming from subvolumes.
2. On timer expiration (without enough up or down childs) send
CHILD_DOWN.
3. Once enough up or down subvolumes are detected, send the appropriate event.
When rest of the subvols go up/down without changing the overall
ec-up/ec-down send CHILD_MODIFIED to parent subvols.
Change-Id: Ie0194dbadef2dce36ab5eb7beece84a6bf3c631c
BUG: 1179180
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9396
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves some problems that caused dispersed volumes to not
pass posix smoke tests:
* Problems in open/create with O_WRONLY
Opening files with -w- permissions using O_WRONLY returned an EACCES
error because internally O_WRONLY was replaced with O_RDWR.
* Problems with entrylk on renames.
When source and destination were the same, ec tried to acquire
the same entrylk twice, causing a deadlock.
* Overwrite of a variable when reordering locks.
On a rename, if the second lock needed to be placed at the beggining
of the list, the 'lock' variable was overwritten and later its timer
was cancelled, cancelling the incorrect one.
* Handle O_TRUNC in open.
When O_TRUNC was received in an open call, it was blindly propagated
to child subvolumes. This caused a discrepancy between real file
size and the size stored into trusted.ec.size xattr. This has been
solved by removing O_TRUNC from open and later calling ftruncate.
Change-Id: I20c3d6e1c11be314be86879be54b728e01013798
BUG: 1161886
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9420
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For volumes with replicate, disperse xlators, self-heal daemon should do
healing. This patch provides enable/disable functionality for the xlators to be
part of self-heal-daemon. Replicate already had this functionality with
'gluster volume set cluster.self-heal-daemon on/off'. But this patch makes it
uniform for both types of volumes. Internally it still does 'volume set' based
on the volume type.
Change-Id: Ie0f3799b74c2afef9ac658ef3d50dce3e8072b29
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9358
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When command 'clear-locks' from cli is executed, a getxattr request
is received by ec. This request was handled as usual, first locking
the inode. Once this request was processed by the bricks, all locks
were removed, including the lock used by ec.
When ec tried to unlock the previously acquired lock (which was
already released), caused a crash in glusterfsd.
This fix executes the getxattr request without any lock acquired
for the clear-locks command.
Change-Id: I77e550d13c4673d2468a1e13fe6e2fed20e233c6
BUG: 1179050
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9440
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An incorrectly placed 'inline' keyword caused compilation warnings
with gcc 5.
Change-Id: I2bf8c39b1514ea7dac13e82eb3b6ff4b98e62c79
BUG: 1182267
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9452
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allowing O_APPEND flag to pass through to the brick files
corrupts fragment contents because writes are not stored on
the desired place.
Write fop has been modified so that it uses current file
size as its write offset. This guarantees that all writes,
even those comming from different file descriptors and
clients, will write to the end of the file.
Change-Id: I9f721f12217a98231fe52e344166d1c94172c272
BUG: 1161621
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9079
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Internal xattrs of EC like trusted.ec.size/config/version
can be modified by users and that can lead to misbehavior
in EC.
Fix:
Don't let the user modify the xattrs. Hide these xattrs
in getfattr outputs.
Change-Id: I39cec96ae12826b506b496fda7da74201015fd75
BUG: 1178688
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9385
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
EC heal tries to heal quota-size, selinux xattrs as well. quota-size is
private to the brick but since quotad accesses them using the standard
interface as well, they can not be filtered in the fops.
Fix:
Ignore QUOTA_SIZE_KEY and SELINUX xattrs during heal.
Change-Id: I1572f9e2fcba7f120b4265e034953a15ff297f04
BUG: 1179640
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9401
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves CID 1257622.
Change-Id: I95680c7de49cd84011d2ad38f02e5fad82e15c90
BUG: 1170254
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9263
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes introduced by this patch:
* Fix an incorrect error propagation when the state of the life
cycle of a fop returns an error.
* Fix incorrect unlocking of failed locks.
* Return ENOTCONN if there aren't enough bricks online.
* In readdir(p) check that the fd has been successfully open by
a previous opendir.
Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe
BUG: 1162805
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9098
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves 3 issues detected by coverity scan:
CID1241484 Data race condition
CID1241486 Data race condition
CID1256173 Thread deadlock
With this patch, inode lock is never acquired inside a region locked
with fop->lock.
Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a
BUG: 1170254
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9230
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC_MAX_NODES was incorrectly calculated. Now the value if computed
as the minimum between the theoretical maximum and the limit imposed
by the Galois Field.
Change-Id: I75a8345147f344f051923d66be2c10d405370c7b
BUG: 1167419
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9193
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Three problems have been detected:
1. Self healing is executed in background, allowing the fop that
detected the problem to continue without blocks nor delays.
While this is quite interesting to avoid unnecessary delays,
it can cause spurious failures of self-heal because it may
try to recover a file inside a directory that a previous
self-heal has not recovered yet, causing the file self-heal
to fail.
2. When a partial self-heal is being executed on a directory,
if a full self-heal is attempted, it won't be executed
because another self-heal is already in process, so the
directory won't be fully repaired.
3. Information contained in loc's of some fop's is not enough
to do a complete self-heal.
To solve these problems, I've made some changes:
* Improved ec_loc_from_loc() to add all available information
to a loc.
* Before healing an entry, it's parent is checked and partially
healed if necessary to avoid failures.
* All heal requests received for the same inode while another
self-heal is being processed are queued. When the first heal
completes, all pending requests are answered using the results
of the first heal (without full execution), unless the first
heal was a partial heal. In this case all partial heals are
answered, and the first full heal is processed normally.
* An special virtual xattr (not physically stored on bricks)
named 'trusted.ec.heal' has been created to allow synchronous
self-heal of files.
Now, the recommended way to heal an entire volume is this:
find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
Some minor changes:
* ec_loc_prepare() has been renamed to ec_loc_update().
* All loc management functions return 0 on success and -1 on
error.
* Do not delay fop unlocks if heal is needed.
* Added basic ec xattrs initially on create, mkdir and mknod
fops.
* Some coding style changes
Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985
BUG: 1161588
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9072
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b
BUG: 1168167
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9201
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To avoid inconsistent directory listings, a full self-heal
cannot happen on a directory until all its contents have
been healed. This is controlled by a manual command using
getfattr recursively and in post-order.
While navigating the directories, sometimes an (f)stat fop
can be sent. This fop caused a full self-heal of the directory.
This patch makes that (f)stat only initiates a partial self-heal.
Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a
BUG: 1163760
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9117
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I35e11d83c318210d44b918e847cf13db35b01510
BUG: 1158008
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8990
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I2bd34f063d6bf1835d5ae57a8e9aa03f3ec3deb3
BUG: 1156404
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8972
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some issues in ec xlator made that rebalance didn't complete
successfully and generated some warnings and errors in the
log. The most critical error was a race condition that caused
false corruption detection when two specific operations were
executed sequentially and they shared the same lock.
This explains the problem:
1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: Upper xlator is notified that the operation completed.
5. setxattr: A background task is initiated to update the version
of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it's reused.
8. stat: A lookup is issued to determine version and size
information of the file.
At this point, operations 5 and 8 can interfere. This can make that
lookup sees different information on each brick, determining that
some bricks are corrupted and incorrectly excluding them from the
operation and initiating a self-heal. In some cases this false
detection combined with self-heal could lead to invalid updates of
the trusted.ec.size xattr, leaving the file smaller than it should
be.
This only happens if the first operation does not perform a lookup,
because chained operations reuse the information returned by the
previous one, avoiding this kind of problems.
To solve this, now the background update is executed atomically with
the posterior unlock. This avoids some reuses of the lock while
updating. However this reduces performance because the window in
which new requests can reuse the lock is much smaller now. This has
been alleviated by using the same technique implemented in AFR (i.e.
waiting some time before releasing the lock).
Some minor changes also introduced in this patch:
* Bug in management of 'trusted.glusterfs.pathinfo' that was writing
beyond the allocated space.
* Uninitialized variable.
* trusted.ec.config was not created for regular files created with
mknod.
* An invalid state was used in access fop.
Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152902
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8947
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Doing an 'ls' of a directory that has been modified while one
of the bricks was down, sometimes returns the old directory
contents.
Cause: Directories are not marked when they are modified as files are.
The ec xlator balances requests amongst available and healthy
bricks. Since there is no way to detect that a directory is
out of date in one of the bricks, it is used from time to time
to return the directory contents.
Solution: Basically the solution consists in use versioning information
also for directories, however some additional changes have
been necessary.
Changes:
* Use directory versioning:
This required to lock full directory instead of a single entry for
all requests that add or remove entries from it. This is needed to
allow atomic version update. This affects the following fops:
create, mkdir, mknod, link, symlink, rename, unlink, rmdir
Another side effect is that opendir requires to do a previous
lookup to get versioning information and discard out of date
bricks for subsequent readdir(p) calls.
* Restrict directory self-heal:
Till now, when one discrepancy was found in lookup, a self-heal
was automatically started. This caused the versioning information
of a bad directory to be healed instantly, making the original
problem to reapear again.
To solve this, when a missing directory is detected in one or more
bricks on lookup or opendir fops, only a partial self-heal is
performed on it. A partial self-heal basically creates the
directory but does not restore any additional information.
This avoids that an 'ls' could repair the directory and cause the
problem to happen again. With this change, output of 'ls' is
always consistent. However, since the directory has been created
in the brick, this allows any other operation on it (create new
files, for example) to succeed on all bricks and not add additional
work to the self-heal process.
To force a self-heal of a directory, any other operation must be
done on it. For example a getxattr.
With these changes, the correct healing procedure that would avoid
inconsistent directory browsing consists on a post-order traversal
of directoriesi being healed. This way, the directory contents will
be healed before healing the directory itslef.
* Additional changes to fix self-heal errors
- Don't use fop->fd to decide between fd/loc.
open, opendir and create have an fd, but the correct data is in
loc.
- Fix incorrect management of bad bricks per inode/fd.
- Fix incorrect selection of fop's target bricks when there are bad
bricks involved.
- Improved ec_loc_parent() to always return a parent loc as
complete as possible.
Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
BUG: 1149726
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8916
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some additional 32 bits issues have been added by a recent patch.
This patch solves it.
Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906
BUG: 1146903
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8882
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final lookup made to restore final file attributes after a self-heal
did clear the mask of bad bricks, causing that the final setattr won't
modify any brick at all. This caused that some attriutes, specially the
modification time of the file didn't get updated properly.
Now the mask of healed bricks is saved before doing the last lookup.
It's also used to correctly report the repaired bricks.
Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc
BUG: 1149723
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8905
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Operations processed by ec_dispatch_one() were not correctly
completed by ec_complete(), leaving some structures in memory.
Now ec_complete() also calls ec_resume() for this type of fops.
Change-Id: Iaf0f2e8227399ebb735db9f1bd007593e0ece041
BUG: 1148520
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8896
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4504f3050674dde217e79af28cb4d2b5370fe2d5
BUG: 1148010
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8891
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To simplify backward compatibility of the ec xlator when some
parameter or the implementation itself is changed, a new xattr
is added to each file with the configuration needed to recover
it.
The new attribute is called 'trusted.ec.config', and it's a 64-bit
value containing the following information:
8 bits: version of the config information (currently always 0)
8 bits: algorithm used to encode the file (currently always 0)
8 bits: size of the galois field (currently always 8)
8 bits: number of bricks
8 bits: redundancy
24 bits: chunk size (currently 512)
This new xattr could allow, in a future version, to have different
configurations per file.
Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4
BUG: 1140861
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8770
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fops 'truncate' and 'ftruncate' share some code and inodelk()
was always made against the inode inside the loc_t structure
instead of that of fd_t. Since ftruncate has the loc initialized
to NULL, this fop was executed without any lock, allowing some
concurrent modifications in the file size.
Also changed the way in which 'fop' and 'ffop' are differentiated
in shared code. Now it uses 'id' field instead of checking if 'fd'
is NULL.
Change-Id: Ibd18accf2652193b395a841b9029729e5f4867c6
BUG: 1140396
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8695
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 64 bits 'trusted.ec.size' extended attribute was incorrectly
computed on 32 bits machines due to an overflow on negative
numbers.
Also changed some potentially dangerous uses of size_t in other
places.
Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662
BUG: 1125312
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8738
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch significantly improves performance of read/write
operations on a dispersed volume by reusing previous inodelk/
entrylk operations on the same inode/entry. This reduces the
latency of each individual operation considerably.
Inode version and size are also updated when needed instead
of on each request. This gives an additional boost.
Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac
BUG: 1122586
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8369
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sometimes loc_t structure in a heal request doesn't contain enough
information to do an inodelk call (basically the gfid is missing).
In these cases, self heal only recovers entry information.
Change-Id: I459990c7df728ff4baf164df046672ddcde3efa5
BUG: 1122581
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8368
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is no need to handle inode invalidation requests,
so this callback has been removed.
Change-Id: I0ac2e47679bf62b1493e0403178305923bc036e8
BUG: 1126932
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8420
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|