| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a replica pair unlike files, directories may not have their
content in same order, so readdir for same (offset, size) may
not give same entries on both the sobvolumes of replica pair.
Switching over from one subvolume to another may not be a good
idea sometimes. It may lead to duplicate entries or fewer entries
or both. This patch provides a way to disable readdir-failover
so that applications like rebalance can retry if they want to.
Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0
BUG: 859387
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4159
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4aef1c79743ee08b62e04d7b709f3e8c6b9dc56a
BUG: 881517
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4244
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
afr_sh_data_fxattrop() currently allocates and sends a single xattr
dict_t instance to each replica. The callback codepath references
the returned object in the self-heal in-memory state for the
particular replica. If storage/posix is in the same address-space
(i.e., running a single glusterfs client with a fuse->afr->posix
graph), the same object is modified and returned for each child,
causing corrupted in-memory state and afr xattrs.
Allocate and send independent xattr dict_t's for each replica. This
allows self-heal to work correctly in a single address-space
graph.
BUG: 868478
Change-Id: I42832e85b5d1abb6098c28944c717e129300109e
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4149
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current failure to handle short writes on writev fops leaves
us open to file corruption. A short write on a user request is
ignored and leaves replicas in an inconsistent state. A short write
during a self-heal is ignored and incorrectly marks the files as
consistent if the heal completes.
Modify user writev handling to return the best case return value
from each of the replicas. Short writes that occur relative to this
value are marked as failed and will require a heal. Modify
self-heal to set an error on a short write and abort the heal.
BUG: 853690
Change-Id: I18b30f58702326249230eeebb361b29e40b535f5
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4150
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values from all children need to be aggregated into a dictionary
and serialized buffer of this aggregated dictionary has to be
the value of GF_XATTR_LOCKINFO_KEY in the dict sent as a result of
fgetxattr.
Change-Id: Ie877f7c637c07feaee4c44d7ef86aa967a17b7e7
BUG: 808400
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Reviewed-on: http://review.gluster.org/4121
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functional issue is described by the subject line. This patch also
addresses several efficiency/structure issues, such as...
* Calling dict_set_ptr once for each txn type, instead of once overall.
* Calling afr_index_for_transaction_type once per iteration instead of
once per call (or better yet zero since the conversion is unnecessary).
* Implementation of inner functions in a different file than their one
caller, creating a spurious header-file dependency.
Change-Id: I29e0df906a820533b66b9ced73e015dfe77267d2
BUG: 865825
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Whenever gluster volume heal vol full command is executed, the entries
stored in the circual buffer for sh->healed are added in the dictionary
in the _crawl_post_sh_action function irrespective of whether actual self heal
(due to non-zero values in chage log) takes place or not.
Fix:
Value of key (actual-sh-done) will be set to 1 whenever self heal takes place
due to non-zero change log values and if for some FOP self heal daemon finds
that no self heal required after examining the pending matrix, the value will
be 0.
Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc
BUG: 863068
Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4181
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Eager locking lk-owner decision is taken before transaction
type is set. Default transaction type is DATA so all transactions
are treated as DATA transactions at the time of eager-locking
decision.
Fix:
Move the code that takes lk-owner decision after the transaction
type is set.
Test:
Checked that the transaction type is set properly in gdb at
the time of the lk-owner decision.
Change-Id: I7607c7ff4f88c7ced5416a1cddb6586cf45d88f9
BUG: 861335
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4220
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I9f7562d28c8bc798552c403164397f929a7bd1e7
BUG: 860246
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4052
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the callings of GF_CALLOC can seldom come to a failure, glusterfs client
will crash due to segment fault. We should have returned once the variables
of transaction's local can't be alloced.
Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b
BUG: 861335
Signed-off-by: linbaiye <linbaiye@gmail.com>
Reviewed-on: http://review.gluster.org/4005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iebf821ff720c63ab6da4b219d82c7f1d00769992
BUG: 862838
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4032
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch contains several xml related changes which fix some bugs and
introduce xml output for commands which were missing it. These include,
* XML output for rebalance & remove-brick status
* XML output for replace-brick
* XML output for 'volume status all' in on xml document
* proper XML output for "volume {create|start|stop|delete}"
* type & status of a volume in 'volume info' is now given as a string as well
This patch also cleans up the '#if (HAVE_LIB_XML)' sections from the code-base,
so that it is not littered around.
Change-Id: I5bb022adf0fedf7e3ead92b4b79bfa02b0b5fef5
BUG: 828131
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/3869
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ic2506561367bfec9022dc53e9b17b03dc343df95
BUG: 859411
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4055
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Eager locking lk-owner decision is taken before transaction
type is set. Default transaction type is DATA so all transactions
are treated as DATA transactions at the time of eager-locking
decision.
Fix:
Move the code that takes lk-owner decision after the transaction
type is set.
Test:
Checked that the transaction type is set properly in gdb at
the time of the lk-owner decision.
Change-Id: Ib1c886866f28788aed67622982e86d667b2cdb80
BUG: 864786
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/4053
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Automake provides a separate variable for preprocessor flags
(*_CPPFLAGS). They are already uses in a few places, so make it
consistent and use it everywhere. Note that cflags obtained from
pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves with
into AM_CPPFLAGS, for example.
Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4029
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFLAGS
libtool will automatically add "-fPIC" to the compiler command line as
needed, so there is no need to specify it separately.
"-shared" is normally a linker flag and has an odd effect when used with
libtool --mode=compile, namely that it inhibits production of static
objects. For that however, using AC_DISABLE_STATIC is a lot simpler.
Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4027
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "-nostartfiles" is a discouraged option and is documented to
potentially result in undesired behavior. Since I see no reason why it
should be in glusterfs, remove it.
Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4018
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some -D flags are present in all files, so collect them.
This adds -D${GF_HOST_OS} to some compiler command lines,
but this should not be a problem.
Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca
BUG: 862082
Original-author: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4013
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Entry self-heal does lookups on all the entries that are read
in readdir. More the size of readdir more number of lookups happen
in parallel. It is observed that it leads to HUGE cpu spikes
rendering everything else on the system unusable.
Fix:
Provided the option self-heal-readdir-size to configure the size.
Default value is at 1KB.
Tests:
Checked that the readdirs are happening with the configured value
in entry-self-heal.
Change-Id: Icaa937ad88857e6f9a12375b1e7f6a49192bc8b1
BUG: 860895
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/4002
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The index in the child that comes online is generally empty
because the changes would have happened on the other child which
has been up. So the sync begins when the other child's poll
time-out happens (i.e. 10 minutes). The expectation is that the
sync must be triggered as soon as the connection with any brick
is established.
Fix:
Whenever any child_up happens trigger the index self-heal on all
local children in the replicate subvolume.
Tests:
1) Checked that the self-heal is triggered on all local children
whenever any child comes online.
2) Checked that the volume heal commands are working fine.
Change-Id: I4f64737866470a2f989349a889ea52782930e11d
BUG: 852741
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/3972
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The problem is observed when kernel untar is done. One file untar
happens every second. The reason for this is, setattr lock is blocked
on the prev fd data-transaction full-lock (because of eager-lock).
Because of post-op-delay the post-op (xattrop + unlock) of the prev
data-transaction happens after 1 sec.
Until this the setattr is blocked resulting in performance problems
in untar.
Fix:
Whenever an loc data, meta-data transaction comes, it should wakeup
the prev-post-op on the same process' fd.
Tests:
The performance problem in untar went away. I put a breakpoint in
client_finodelk for a 2G file dd and the inodelk is hit only 4 times.
This confirms that the change does not affect post-op-delay in a
-ve way.
Change-Id: Ice3c2a1211f4dca6520a19bc4ba6cb9efb2902ad
BUG: 845754
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/3975
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I48733967facc526fb523a8dc9bd068f8c5cc5971
BUG: 764282
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/3950
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Jules Wang <lancelotds@163.com>
Change-Id: I6c7dd337c758e82e9d58d4d65f53b5aa72ac5dfb
BUG: 764890
Reviewed-on: http://review.gluster.org/3895
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* ie, don't dereference dict_t pointer, instead use APIs everywhere
* other than dict_t only 'data_t' should be the valid export from dict.h
* added 'dict_foreach_fnmatch()' API
* changed dict_lookup() to use data_t, instead of data_pair_t
Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 850917
Reviewed-on: http://review.gluster.org/3829
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Entry/Data self-heal is orthogonal to meta-data self-heal.
meta-data split-brain should not affect entry/data self-heal.
Fix:
Prevented aborting rest of the self-heals when metadata split-brain
happens.
Tests:
1) Simulated meta-data split-brain then checked data-self-heal
succeed on regular file, entry-self-heal succeed on dir.
2) Reset meta-data change-log on one of the subvols and checked
that meta-data self-heal also completes.
3) Executed self-heal sanity script.
Change-Id: I05ca222d855d3a6000703e3775471d0f874d35d6
BUG: 851451
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.org/3853
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <obdurodon@gmail.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- (Excessive) Logging has been very useful as 'bread-crumbs' in
many a root-cause analyses. This patch aims at avoiding logging when
the information could be reconstructed using the xattrs, statedump,
and/or "volume heal" CLI commands.
Change-Id: Iebc6b10ae18f0dd9704bdc6dd03bcfe0f2a09abd
BUG: 844804
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/3805
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ie56228dfbdc7e519a344681487164a835488a470
BUG: 835423
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/3826
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
In case of dir fops create, mknod, mkdir, link, symlink, rename
if the fop fails on read-child then unwinds are happening with
all-zero pre/post iatt-bufs. The bug occurs because the parent
bufs are not saved if the response is not from read-child.
Fix:
Save the pre/post-bufs for the first response. If the response
comes from read-child, overwrite whatever we have cached.
Tests:
Attached the mount process to gdb.
Tested that the unwinds happen with proper pre/post iatt bufs in
the following cases:
1) All success case
2) Failure on read-child
3) Failure on non-read-child
4) Failure on all children.
Tested soft-link self-heal to test the change made in that.
Tested errno ENOTEMPTY for rmdir, rename fops.
Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755
BUG: 845242
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3775
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
When an fd is opened while a brick is down, after the brick
comes back up afr issues open on the other brick. It can
fail for a number of reasons (enoent etc). While the system
is in that state, inode/entrylks pre-op happen only on the
brick that is up and fd is opened for fd-fops. post-op should
consider only the bricks where both pre-op and fop succeeded
as success, rest of them as failures. Code now marks only the
children that are down as failures as opposed to child_down &
fd-not-opened. This makes change-log appear as success on the
subvolume where we did not do any fop leading to no change-log
but differences in data/metadata for reg-files.
Fix:
Mark non-participants of fop as failure. This is tracked in
transaction.pre_op[].
Tests:
Simulated the scenario using err-gen on top of one of the client
xlator which fails all fops always. Performed fops and the changelog
represented pending fops on the brick with err-gen loaded. Tested
the case of brick down and perform entry/metadata/data operations
to confirm they still work as expected.
Change-Id: I41905936126b19abba56ca581c0301a894507e1a
BUG: 844987
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3765
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
Afr crashes when a last fop response fails and
'fop output' arguments are NULL. Afr does not handle
these gracefully.
Fix:
Changed the fops to not access the 'fop output' arguments
in case of failures.
Tests:
Changed afr wind_cbk code to fail the last response by setting
op_ret as -1 and op_errno as ENOMEM and setting all other output
variables as NULL to test the change. Removed the code to verify
success cases. No crashes or errors seen.
Change-Id: Iad9bc54db093a162f85bfb8dbeeda5b95acd21d8
BUG: 844689
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3760
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
inode passed to inode_link is not assigned any gfid if the
inode with that gfid is already linked, so loc for opendir
does not have a valid inode
Fix:
Use the linked_inode returned by inode_link in the loc to
perform further operations on the entry.
Tests:
Checked that opendir comes with an loc with valid inode.
Checked that re-opendir happens successfully. Tested index,
full self-heal work fine with the fix.
Change-Id: Idf4ced4cc2320133744962059d363e373af0e5ec
BUG: 826580
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3748
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA
The bug is observed because the decision to mark
a file in split-brain is taken outside appropriate locks.
Lookup gathers xattrs outside any lock. The xattrs being
in split-brain in lookup should only be taken as a hint.
Appropriate inodelks should be taken before confirming
a split-brain. Self-heal confirms this at the moment.
If data/metadata self-heal is turned off, inspecting of
xattrs could not be performed so split-brain behavior
does not work correctly if the self-heal options are turned off.
Fix
Self-heals are launched to inspect xattrs even when the
data/metadata self-heal options are turned off. The decision
to heal data/metadata after the xattrs are inspected is based
on whether the options are turned on/off. So decision to set/reset
split-brain flag is taken inside appropriate locks.
Testcases:
tests 33-36 in
https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh
Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343
BUG: 833727
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
When open was done while a brick is down, afr opens the file after
the brick comes backup. If this happens after the self-heal on the file
is completed by self-heald etc, the file will end up in truncated state.
Fix:
Filter O_TRUNC while afr-fix-open because afr_open turns O_TRUNC
into truncate transaction, so there will be pending changelog for
the subvolume on which open fails.
Testing:
Had to simulate the race by stopping fix-open until self-heald completes
self-heal on the file after brick online.
Change-Id: I32759cc37f4bb34f206d01606a279f17b246dba4
BUG: 841840
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3705
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A crash occurs when attempting to link a named pipe on a striped,
replicated volume. The cause for this crash is attempting to deref
a NULL inode pointer in stripe_link_cbk(). The RCA for this bug
uncovered a couple of problems:
- AFR ignores the inode pointer it receives on failure (returning
NULL).
- stripe assumes the inode pointer is valid on failure.
Either one of these changes addresses the crash, but this patch
includes both changes. AFR is modified to pass along the inode
pointer it receives (which could still be NULL). stripe is
modified to not assume the inode pointer is valid on fop failure.
BUG: 842825
Change-Id: I9cb2cc918552620929c3ecbd69bc66d4635eafdc
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3727
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RCA:
Data self-heal for non regular files open the files
and then proceeds using that fd. This approach
does not work for symlinks because open on symlink opens
the file resolved by it.
Fix:
If the file is not a regular file then perform self-heal using
loc. It needs to get 'big' lock and then perform lookup to get
changelog then erase data part of chagelog, then unlock.
Test cases:
Automated at
https://github.com/pranithk/gluster-tests/blob/master/afr/special-file-self-heal-test.sh
Change-Id: I924a922f5135872efe2cccf2e712ada082c5689f
BUG: 811317
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3724
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A striped, replicated volume spits an error on file creation because
stripe requires xdata to process stripe information and AFR isn't
passing it back.
This fix was suggested by Amar Tumballi.
BUG: 842373
Change-Id: Ia7063590ca5e873d4a4e155989cf067e8a07501f
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3713
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
See comments in http://bugzilla.redhat.com/839925 for
the code to perform this change.
Signed-off-by: Jim Meyering <meyering@redhat.com>
BUG: 839925
Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
Reviewed-on: http://review.gluster.com/3661
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the struct volume_options, the "data-self-heal"
.default_value = "" setting appeared before a setting of
.default_value = "on". Remove the former.
Change-Id: Ieddcc18f61581f9448d806cd8bf8eefaaf0118b9
BUG: 789278
Signed-off-by: Jim Meyering <meyering@redhat.com>
Reviewed-on: http://review.gluster.com/3589
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
post-op-delay introduces an artificial delay between the OP and
POST-OP-CHANGELOG phases of a write transaction to increase the
probability of changelog-piggyback and eager-locking to work
more efficiently.
Also enable eager-locking by default.
Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2
BUG: 836033
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3621
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Historically PID (frame->root->pid) was used by the locks translator
to identify a locker (and make decisions about which locks contend
or cooperate/merge). Since the introduction of lock_owner parameter
the usage of PID (for locks) was deprecated and is now unused. This
patch nukes the usage of PID in AFR
The usage of lk_owner has also ended up being a mess, because of the
differentiation required between ->lk() and ->inodelk(), (->lk() needs
to be identified by the process (roughly) and ->inodelk() needs to be
identified by the transaction) and also because of optimizations like
eager locking (locks are no more identified by the transaction as they
now get inherited by the next transaction).
The scheme (and technique) now is:
- All FOPs (the third phase of the transaction) happen with the lk_owner
which is set by the topmost layer (FUSE, NFS etc.)
- All entrylks are issued with lk_owner set to the frame->root address.
- Inodelks which will not be subject to eager locking are issued with
lk_owner set to frame->root.
- Inodelks which are subject to eager locking are issued with lk_owner
set to the address of fd_t (which are the only type of frames which
get subject to the eager locking optimization)
- At the start of the transaction, the transaction frame's lk_owner is
set to the either frame->root or fd_t (and never unmodified) depending
on the type of transaction.
- Just before the third phase (FOP phase) the set lk_owner is "saved"
away and overwritten by the lk_owner submitted by the top layer (FUSE
or NFS)
- Right after the third phase, the saved lk_owner is "restored" to resume
the transaction into the POST-OP and eventually UNLOCK using the same
lk_owner which was used during the LOCK phase.
Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929
BUG: 836033
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3620
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
read subvolume is a nice option to set prefred read child if you have a
replication over 2 datacenter. if you have 2 datacenter and have a
distributed replication where one set of servers are in datacenter one
and the other (the replicated) are in the other datacenter
read-subvolume it not very handy since it goes over name and the
subvolume name is different for each replication pair. i added a new
option called read-subvolume-index which take the number of the
subvolume to choose. 0 fo first , 1 for second and so on subvolume in
every replication. this option can now be used in the --xlator-option
mount option to choose the prefered read child for all replication at
once. For Example on all clients in datacenter one you can use
--xlator-option=volumename-replication-*.read-subvolume-index=0 to
prefer read from the servers in datacenter one. when you expand or
shrink the volume no changes are needed to the client config since the
wildcard will set this option automatic on reconfigure.
Change-Id: I3b47432f77037c380ff4a6296636c6f8fc953db9
BUG: 837420
Original-author: domwo <glusterfs@wollina.de>
Signed-off-by: domwo <glusterfs@wollina.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3615
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
changed order of prevered read child in afr_select_read_child_from_policy
when a read child is set over config option read-subvolume it shoudl be first to return
Change-Id: I1c5a8171379bb2bad76f6653e9d68a9349d55142
BUG: 833750
Original-author: domwo <glusterfs@wollina.de>
Signed-off-by: domwo <glusterfs@wollina.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3614
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 804606
Change-Id: I8cefcb6efa687fac4ad412403c085b3767218f72
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3586
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 831151
Change-Id: I6ecc099cf5f3ae58b19dfb00ed0b3f9959e711e5
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3571
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Integer volume options which specified only the min value as 0, would not be
validated during "volume set".
The range check for an option happened only if both min and max were not 0. In
the above case, even though a minium was specified, the range check did not
happen as both min and max were 0.
To allow forced validation in such cases, a new member, "validate", has been
added to volume_options_t. This member takes the values GF_OPT_VALIDATE_BOTH,
GF_OPT_VALIDATE_MIN and GF_OPT_VALIDATE_MAX (GF_OPT_VALIDATE_BOTH is the
default).
Change-Id: I351de0eedb6028120e5c0b073ee5d9c141dee717
BUG: 809847
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.com/3084
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gfid_req is set only by the fuse xlator. Fresh lookups
performed by self-heal-daemon, rebalance will not have
gfid at all.
Change-Id: I6712e3063067ecc5f19956e75d28c86bfc19fc65
BUG: 829203
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3529
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Controlled by the "choose-local" option (on by default).
Change-Id: I560f27c81703f2c9c62fdb51532c8eb763826df7
BUG: 806462
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I559a3ff507b9487b1dfca7871c188a05d89ea6d6
BUG: 826580
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3515
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both the first-to-respond method and the round-robin method are susceptible
to clients repeatedly choosing the same servers across a series of opens,
creating hot spots. Also, the code to handle a replica being down will
ignore both methods and just choose the first remaining (which is not an
issue for two-way but can be otherwise). The hashed method more reliably
avoids such hot spots. There are three values/modes.
0: use the old (broken) methods.
1: select a read-child based on a hash of the file's GFID, so all clients
will choose the same subvolume for a file (ensuring maximum consistency)
but will distribute load for a set of files.
2: select a read-child based on a hash of the file's GFID plus the client's
PID, so different children will distribute load even for one file.
Mode 2 will probably be optimal for most cases. Using response time when we
open the file is problematic, both because a single sample might not have
been representative even then and because load might have shifted in the
hours or days since (for long-lived files). Trying to use more current load
information can lead to "herd following" behavior which is just as bad.
Pseudo-random distribution is likely to be the best we can reasonably do,
just as it is for DHT.
Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia2944f891dd62e72f3c79678c3a1fed389854a90
BUG: 811970
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3158
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|