| Commit message (Collapse) | Author | Age | Files | Lines |
| |
RCA:
For the directory fops create, mknod, mkdir, link, symlink and rename,
if the fop fails on the read-child, the unwind happens with all-zero
pre/post iatt bufs. The bug occurs because the parent bufs are not
saved unless the response comes from the read-child.
Fix:
Save the pre/post-bufs for the first response. If the response
comes from read-child, overwrite whatever we have cached.
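
The caching rule in the fix can be sketched roughly as follows. This is a toy model, not the AFR code: `parent_bufs`, `reply_cache` and `cache_reply` are hypothetical names, and the sketch assumes only replies that carry usable bufs are passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pre/post parent iatt pair. */
struct parent_bufs {
    long preparent_ino;
    long postparent_ino;
};

struct reply_cache {
    bool have_bufs;           /* true once any reply has been cached */
    struct parent_bufs bufs;  /* what the unwind would report */
};

/* Cache the first reply seen; a reply from the read-child always
 * overwrites whatever is already cached. */
static void cache_reply(struct reply_cache *cache, int child, int read_child,
                        const struct parent_bufs *reply)
{
    assert(cache != NULL && reply != NULL);
    if (!cache->have_bufs || child == read_child) {
        cache->bufs = *reply;
        cache->have_bufs = true;
    }
}
```

With this rule the unwind never sees all-zero bufs as long as at least one child replied, and the read-child's view still wins when it is available.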
Tests:
Attached the mount process to gdb.
Tested that the unwinds happen with proper pre/post iatt bufs in
the following cases:
1) All success case
2) Failure on read-child
3) Failure on non-read-child
4) Failure on all children.
Tested soft-link self-heal to test the change made in that.
Tested errno ENOTEMPTY for rmdir, rename fops.
Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755
BUG: 845242
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3775
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA:
When an fd is opened while a brick is down, afr issues an open on the
other brick after it comes back up. That open can fail for a number of
reasons (ENOENT etc.). While the system is in that state, the
inodelk/entrylk and pre-op happen only on the brick that is up and has
the fd opened for fd-fops. Post-op should treat only the bricks where
both pre-op and fop succeeded as successes, and the rest as failures.
The code currently marks only the children that are down as failures,
as opposed to child_down & fd-not-opened. This makes the change-log
appear as success on a subvolume where no fop was done, leaving no
pending change-log despite differences in data/metadata for reg-files.
Fix:
Mark non-participants of fop as failure. This is tracked in
transaction.pre_op[].
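
A minimal sketch of the marking rule, assuming plain boolean arrays in place of the real transaction state (`mark_failures`, `fop_ok` and `failed` are hypothetical names; only `pre_op[]` is mentioned in the message):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A child counts as a success only if it took part in the pre-op AND the
 * fop succeeded there. Every other child is marked as failed so that a
 * pending changelog is recorded against it.
 * Returns the number of children marked as failed. */
static size_t mark_failures(const bool *pre_op, const bool *fop_ok,
                            bool *failed, size_t child_count)
{
    assert(pre_op != NULL && fop_ok != NULL && failed != NULL);
    size_t nfailed = 0;
    for (size_t i = 0; i < child_count; i++) {
        failed[i] = !(pre_op[i] && fop_ok[i]);
        if (failed[i])
            nfailed++;
    }
    return nfailed;
}
```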
Tests:
Simulated the scenario using err-gen on top of one of the client
xlator which fails all fops always. Performed fops and the changelog
represented pending fops on the brick with err-gen loaded. Tested
the case of brick down and perform entry/metadata/data operations
to confirm they still work as expected.
Change-Id: I41905936126b19abba56ca581c0301a894507e1a
BUG: 844987
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3765
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA:
Afr crashes when the last fop response fails and the
'fop output' arguments are NULL. Afr does not handle
these gracefully.
Fix:
Changed the fops to not access the 'fop output' arguments
in case of failures.
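
The pattern looks roughly like this. It is a sketch with hypothetical types (`stat_like`, `local_state`) and a hypothetical callback name, not an actual afr callback:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct stat_like { long size; };  /* hypothetical 'fop output' argument */

struct local_state {
    int  op_ret;
    int  op_errno;
    long size;      /* copied from the reply only on success */
};

/* On failure the output arguments may be NULL, so they are not touched. */
static void example_wind_cbk(struct local_state *local, int op_ret,
                             int op_errno, const struct stat_like *buf)
{
    assert(local != NULL);
    local->op_ret = op_ret;
    if (op_ret < 0) {
        local->op_errno = op_errno;
        return;                 /* buf is never dereferenced here */
    }
    local->size = buf->size;
}
```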
Tests:
Changed afr wind_cbk code to fail the last response by setting
op_ret as -1 and op_errno as ENOMEM and setting all other output
variables as NULL to test the change. Removed the code to verify
success cases. No crashes or errors seen.
Change-Id: Iad9bc54db093a162f85bfb8dbeeda5b95acd21d8
BUG: 844689
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3760
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA:
The inode passed to inode_link is not assigned any gfid if an
inode with that gfid is already linked, so the loc for opendir
does not have a valid inode.
Fix:
Use the linked_inode returned by inode_link in the loc to
perform further operations on the entry.
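
The behaviour can be illustrated with a toy inode table. Everything here (`toy_inode`, `toy_table`, `toy_inode_link`) is hypothetical and much simpler than the real inode_link; it only models "return the already-linked inode if the gfid is known".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_inode { unsigned char gfid[16]; int linked; };
struct toy_table { struct toy_inode *slots[8]; size_t count; };

/* Link `in` under `gfid`. If an inode with that gfid is already linked,
 * `in` is left untouched (no gfid assigned) and the existing inode is
 * returned; callers must continue with the returned inode. */
static struct toy_inode *toy_inode_link(struct toy_table *t,
                                        struct toy_inode *in,
                                        const unsigned char gfid[16])
{
    for (size_t i = 0; i < t->count; i++)
        if (memcmp(t->slots[i]->gfid, gfid, 16) == 0)
            return t->slots[i];
    assert(t->count < 8);
    memcpy(in->gfid, gfid, 16);
    in->linked = 1;
    t->slots[t->count++] = in;
    return in;
}
```

Building the loc from the return value rather than from the inode that was passed in is the point of the fix.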
Tests:
Checked that opendir comes with a loc with a valid inode.
Checked that re-opendir happens successfully. Tested that index
and full self-heal work fine with the fix.
Change-Id: Idf4ced4cc2320133744962059d363e373af0e5ec
BUG: 826580
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3748
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
cluster/stripe write callback handling is broken in the event of
server side errors and short writes due to crudely summing up the
return values from each node. This can produce incorrect results
or cause an application to rewrite the wrong portions of a buffer
in an attempt to handle this condition.
Modify cluster/stripe writev handling to record the requested size
of each write and use this data to return the number of consecutive
bytes written from the original request. This allows an application
to retry a write at the point of error (and potentially consume
said error).
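
A rough sketch of the accounting, assuming the per-node replies are already ordered by their position in the original request. `write_reply` and `consecutive_written` are hypothetical names, and returning -1 when the very first chunk fails is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

struct write_reply {
    size_t  requested;  /* bytes this node was asked to write */
    ssize_t op_ret;     /* bytes it reports written, or -1 on error */
};

/* Count bytes written contiguously from the start of the request,
 * stopping at the first error or short write. */
static ssize_t consecutive_written(const struct write_reply *r, size_t n)
{
    assert(r != NULL);
    ssize_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i].op_ret < 0)
            return total > 0 ? total : -1;
        total += r[i].op_ret;
        if ((size_t)r[i].op_ret < r[i].requested)
            break;  /* short write: later chunks are not contiguous */
    }
    return total;
}
```

Compared with summing every op_ret, this never counts bytes that lie beyond a gap, so a retry from the returned offset rewrites the right part of the buffer.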
BUG: 809975
Change-Id: Ic35cb1e092c29545205aa32e352485c507534ce0
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3700
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA
The bug is observed because the decision to mark
a file in split-brain is taken outside appropriate locks.
Lookup gathers xattrs outside any lock. The xattrs being
in split-brain in lookup should only be taken as a hint.
Appropriate inodelks should be taken before confirming
a split-brain. Self-heal confirms this at the moment.
If data/metadata self-heal is turned off, the xattrs are never
inspected, so split-brain handling does not work correctly
when the self-heal options are turned off.
Fix
Self-heals are launched to inspect xattrs even when the
data/metadata self-heal options are turned off. The decision
to heal data/metadata after the xattrs are inspected is based
on whether the options are turned on/off. So the decision to set/reset
the split-brain flag is taken inside appropriate locks.
Testcases:
tests 33-36 in
https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh
Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343
BUG: 833727
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA:
When an open is done while a brick is down, afr opens the file after
the brick comes back up. If this happens after self-heal on the file
has been completed (by self-heald etc.), the file ends up truncated.
Fix:
Filter O_TRUNC in afr-fix-open, because afr_open turns O_TRUNC
into a truncate transaction, so there will be a pending changelog for
the subvolume on which open fails.
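
The flag filtering itself is a one-line mask. A minimal sketch, with `fix_open_flags` as a hypothetical helper name:

```c
#include <assert.h>
#include <fcntl.h>

/* Flags to use when re-opening an fd on a brick that came back up:
 * the same as the original open, minus O_TRUNC, so the late open can
 * never truncate a file that self-heal has already repaired. */
static int fix_open_flags(int original_flags)
{
    return original_flags & ~O_TRUNC;
}
```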
Testing:
Had to simulate the race by stopping fix-open until self-heald completes
self-heal on the file after the brick comes online.
Change-Id: I32759cc37f4bb34f206d01606a279f17b246dba4
BUG: 841840
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3705
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
A crash occurs when attempting to link a named pipe on a striped,
replicated volume. The cause for this crash is attempting to deref
a NULL inode pointer in stripe_link_cbk(). The RCA for this bug
uncovered a couple of problems:
- AFR ignores the inode pointer it receives on failure (returning
NULL).
- stripe assumes the inode pointer is valid on failure.
Either one of these changes addresses the crash, but this patch
includes both changes. AFR is modified to pass along the inode
pointer it receives (which could still be NULL). stripe is
modified to not assume the inode pointer is valid on fop failure.
BUG: 842825
Change-Id: I9cb2cc918552620929c3ecbd69bc66d4635eafdc
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3727
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
RCA:
Data self-heal for non-regular files opens the file
and then proceeds using that fd. This approach
does not work for symlinks because open on a symlink opens
the file it resolves to.
Fix:
If the file is not a regular file then perform self-heal using
the loc. It needs to take the 'big' lock, perform a lookup to get the
changelog, erase the data part of the changelog, and then unlock.
Test cases:
Automated at
https://github.com/pranithk/gluster-tests/blob/master/afr/special-file-self-heal-test.sh
Change-Id: I924a922f5135872efe2cccf2e712ada082c5689f
BUG: 811317
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3724
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
cluster/stripe broke directory rename. Only check for fctx on regular
files.
BUG: 842652
Change-Id: I8a1e7ff30d57c994082cb10471f610023713ee53
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3720
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Changing the log-level to DEBUG.
Xattr mismatch can occur when parallel setxattr's race, or when
one of the bricks was down. A subsequent setxattr will fix the
condition when all the subvols are up. In this case, the 'user.swift'
xattr used by ufo was out of sync, but did not cause any other error.
Change-Id: I6fdff78869b8ff72c305bbe122033e6c1d9d3cff
BUG: 838197
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.com/3722
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Mohammed Junaid <junaid@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
A striped, replicated volume spits an error on file creation because
stripe requires xdata to process stripe information and AFR isn't
passing it back.
This fix was suggested by Amar Tumballi.
BUG: 842373
Change-Id: Ia7063590ca5e873d4a4e155989cf067e8a07501f
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3713
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
As 'stripe-coalesce' is an internal key, there is no need to expose it
on the mount point.
Change-Id: Iab836e73d59c42774db8a2eee13fe3b0cd994bc9
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 801887
Reviewed-on: http://review.gluster.com/3680
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
| |
So far there has been a global glusterfs_ctx_t object which
represents the running instance of the filesystem (client or server).
It contains the various graphs, the connection to the management daemon
over which new graphs are obtained, the call stacks issued on this
filesystem, and a number of similar things.
With the introduction of libgfapi, it is no longer true that there will
be only one filesystem context in a process. Applications can
be written to use libgfapi and obtain several instances of different
filesystems/volumes in the same process.
This involves messy untangling of the assumption inside libglusterfs that
there would be only one global glusterfs_ctx_t, and offloading that
assumption to glusterfsd/ and cli/ (where it is true).
Change-Id: Ifd7d1259428c26076140a5764a2dc7361694139c
BUG: 839950
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3678
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
| |
See comments in http://bugzilla.redhat.com/839925 for
the code to perform this change.
Signed-off-by: Jim Meyering <meyering@redhat.com>
BUG: 839925
Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
Reviewed-on: http://review.gluster.com/3661
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
BUG: 764890
Change-Id: I3eb626eeaa2a09f0e248444f560c2a0eaf46c642
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.com/3660
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
In the struct volume_options, the "data-self-heal"
.default_value = "" setting appeared before a setting of
.default_value = "on". Remove the former.
Change-Id: Ieddcc18f61581f9448d806cd8bf8eefaaf0118b9
BUG: 789278
Signed-off-by: Jim Meyering <meyering@redhat.com>
Reviewed-on: http://review.gluster.com/3589
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
post-op-delay introduces an artificial delay between the OP and
POST-OP-CHANGELOG phases of a write transaction, to increase the
probability that changelog-piggyback and eager-locking work
more efficiently.
Also enable eager-locking by default.
Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2
BUG: 836033
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3621
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
| |
Historically PID (frame->root->pid) was used by the locks translator
to identify a locker (and make decisions about which locks contend
or cooperate/merge). Since the introduction of the lock_owner parameter
the usage of PID (for locks) was deprecated and is now unused. This
patch nukes the usage of PID in AFR.
The usage of lk_owner has also ended up being a mess, because of the
differentiation required between ->lk() and ->inodelk() (->lk() needs
to be identified by the process (roughly) and ->inodelk() needs to be
identified by the transaction) and also because of optimizations like
eager locking (locks are no longer identified by the transaction as they
now get inherited by the next transaction).
The scheme (and technique) now is:
- All FOPs (the third phase of the transaction) happen with the lk_owner
which is set by the topmost layer (FUSE, NFS etc.)
- All entrylks are issued with lk_owner set to the frame->root address.
- Inodelks which will not be subject to eager locking are issued with
lk_owner set to frame->root.
- Inodelks which are subject to eager locking are issued with lk_owner
set to the address of fd_t (which are the only type of frames which
get subject to the eager locking optimization)
- At the start of the transaction, the transaction frame's lk_owner is
set to either frame->root or fd_t (and never left unmodified) depending
on the type of transaction.
- Just before the third phase (FOP phase) the set lk_owner is "saved"
away and overwritten by the lk_owner submitted by the top layer (FUSE
or NFS)
- Right after the third phase, the saved lk_owner is "restored" to resume
the transaction into the POST-OP and eventually UNLOCK using the same
lk_owner which was used during the LOCK phase.
Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929
BUG: 836033
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3620
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
| |
read-subvolume is a nice option for setting the preferred read child when
replicating across two datacenters. But with a distributed-replicated
volume where one set of servers is in datacenter one and the other set
(the replicas) is in the other datacenter, read-subvolume is not very
handy, since it selects by name and the subvolume name is different for
each replication pair. I added a new option called
read-subvolume-index, which takes the number of the subvolume to choose
in every replica set: 0 for the first, 1 for the second, and so on.
This option can now be used in the --xlator-option mount option to
choose the preferred read child for all replica sets at once. For
example, on all clients in datacenter one you can use
--xlator-option=volumename-replication-*.read-subvolume-index=0 to
prefer reads from the servers in datacenter one. When you expand or
shrink the volume, no changes are needed to the client config, since the
wildcard will set this option automatically on reconfigure.
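
As an illustration of selecting by index rather than by name, here is a small sketch. `read_child_from_index` is a hypothetical helper, and returning -1 when the index is out of range or the child is down (so that the caller falls back to its usual policy) is an assumption of the sketch, not something the message specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pick the read child by its position in the replica set. The same index
 * works for every replica set, whatever the subvolume names are. */
static int read_child_from_index(int read_subvol_index,
                                 const bool *child_up, int child_count)
{
    assert(child_up != NULL);
    if (read_subvol_index < 0 || read_subvol_index >= child_count)
        return -1;  /* option unset or out of range */
    return child_up[read_subvol_index] ? read_subvol_index : -1;
}
```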
Change-Id: I3b47432f77037c380ff4a6296636c6f8fc953db9
BUG: 837420
Original-author: domwo <glusterfs@wollina.de>
Signed-off-by: domwo <glusterfs@wollina.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3615
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Changed the order of the preferred read child in afr_select_read_child_from_policy:
when a read child is set via the config option read-subvolume, it should be the first to be returned.
Change-Id: I1c5a8171379bb2bad76f6653e9d68a9349d55142
BUG: 833750
Original-author: domwo <glusterfs@wollina.de>
Signed-off-by: domwo <glusterfs@wollina.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3614
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
glusterfs_ctx->notify can be used by any xlator to talk to
glusterfsd-mgmt.
Note: this is for any rpc communication initiated by the xlator,
and not from glusterd.
Change-Id: Ic0e4af106fe1e98d797ca621facda8839b87598a
BUG: 835757
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.com/3618
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
BUG: 804606
Change-Id: I8cefcb6efa687fac4ad412403c085b3767218f72
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3586
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
BUG: 831151
Change-Id: I6ecc099cf5f3ae58b19dfb00ed0b3f9959e711e5
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3571
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
The coalesce file format for cluster/stripe condenses the striped
files to a contiguous layout. The elimination of holes in striped
files eliminates space wasted via local filesystem preallocation
heuristics and significantly improves read performance.
Coalesce mode is implemented with a new 'coalesce' xlator option,
which is user-configurable and disabled by default. The format of
newly created files is marked with a new 'stripe-coalesce' xattr.
Cluster/stripe handles/preserves the format of files regardless
of the current mode of operation (i.e., a volume can
simultaneously consist of coalesced and non-coalesced files).
Files without the stripe-coalesce attribute are assumed to have
the traditional format to provide backward compatibility.
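
One way to picture the two formats is as a mapping from a file offset to an offset within the backing file on one node. The sketch below assumes round-robin striping, a traditional layout that stores each stripe at its original file offset (hence the holes), and a coalesced layout that packs a node's stripes back to back. The function names are hypothetical and the real xlator may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Which node holds the stripe containing file offset `off`. */
static unsigned stripe_child(uint64_t off, uint64_t stripe_size, unsigned nodes)
{
    assert(stripe_size > 0 && nodes > 0);
    return (unsigned)((off / stripe_size) % nodes);
}

/* Traditional format: data sits at the same offset as in the file,
 * leaving holes for the stripes that live on other nodes. */
static uint64_t traditional_offset(uint64_t off)
{
    return off;
}

/* Coalesced format: the node's stripes are stored contiguously. */
static uint64_t coalesced_offset(uint64_t off, uint64_t stripe_size,
                                 unsigned nodes)
{
    assert(stripe_size > 0 && nodes > 0);
    uint64_t line = off / (stripe_size * nodes);  /* full rounds so far */
    return line * stripe_size + off % stripe_size;
}
```

For example, with a stripe size of 4 bytes on 2 nodes, file offset 9 lives on node 0 at offset 9 in the traditional format and at offset 5 in the coalesced one.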
extras/stripe-merge: support traditional and coalesce stripe formats
Update the stripe-merge recovery tool to handle the traditional
and coalesced file formats. The format of the file is detected
automatically (and verified) via the stripe-coalesce attributes.
BUG: 801887
Change-Id: I682f0b4e819f496ddb68c9a01c4de4688280fdf8
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.com/3282
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Integer volume options which specified only a min value of 0 would not be
validated during "volume set".
The range check for an option happened only if min and max were not both 0. In
the above case, even though a minimum was specified, the range check did not
happen as both min and max were 0.
To allow forced validation in such cases, a new member, "validate", has been
added to volume_options_t. This member takes the values GF_OPT_VALIDATE_BOTH,
GF_OPT_VALIDATE_MIN and GF_OPT_VALIDATE_MAX (GF_OPT_VALIDATE_BOTH is the
default).
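
A sketch of how such a member could drive the range check. The enumerator names come from the message above; their numeric values, the `int_option` struct and `int_option_in_range` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values are assumed; BOTH is 0 so that it is the zero-initialized default. */
enum opt_validate {
    GF_OPT_VALIDATE_BOTH = 0,
    GF_OPT_VALIDATE_MIN,
    GF_OPT_VALIDATE_MAX
};

struct int_option {
    long min;
    long max;
    enum opt_validate validate;
};

static bool int_option_in_range(const struct int_option *opt, long value)
{
    assert(opt != NULL);
    switch (opt->validate) {
    case GF_OPT_VALIDATE_MIN:
        return value >= opt->min;   /* forced check, even if min is 0 */
    case GF_OPT_VALIDATE_MAX:
        return value <= opt->max;
    case GF_OPT_VALIDATE_BOTH:
    default:
        /* old behaviour: min == max == 0 means "no range given" */
        if (opt->min == 0 && opt->max == 0)
            return true;
        return value >= opt->min && value <= opt->max;
    }
}
```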
Change-Id: I351de0eedb6028120e5c0b073ee5d9c141dee717
BUG: 809847
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.com/3084
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
gfid_req is set only by the fuse xlator. Fresh lookups
performed by self-heal-daemon or rebalance will not have
a gfid at all.
Change-Id: I6712e3063067ecc5f19956e75d28c86bfc19fc65
BUG: 829203
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3529
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
currently working on obvious resource leak reports in coverity
Change-Id: I261f4c578987b16da399ab5a504ad0fda0b176b1
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 789278
Reviewed-on: http://review.gluster.com/3265
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Controlled by the "choose-local" option (on by default).
Change-Id: I560f27c81703f2c9c62fdb51532c8eb763826df7
BUG: 806462
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I559a3ff507b9487b1dfca7871c188a05d89ea6d6
BUG: 826580
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3515
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Also no need to free the xlator object after rebalance is over, as the process
is about to be killed.
Change-Id: I6973e43c0353b5de61c0b39e52a22c618be361f4
BUG: 826584
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.com/3495
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
with this, inode-linking it in readdirp_cbk will be neater.
Change-Id: Ie2cd646438f851e1755e9b6a3fc9898059bee359
Signed-off-by: Amar Tumballi <amar@gluster.com>
BUG: 816140
Reviewed-on: http://review.gluster.com/2717
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Both the first-to-respond method and the round-robin method are susceptible
to clients repeatedly choosing the same servers across a series of opens,
creating hot spots. Also, the code to handle a replica being down will
ignore both methods and just choose the first remaining (which is not an
issue for two-way but can be otherwise). The hashed method more reliably
avoids such hot spots. There are three values/modes.
0: use the old (broken) methods.
1: select a read-child based on a hash of the file's GFID, so all clients
will choose the same subvolume for a file (ensuring maximum consistency)
but will distribute load for a set of files.
2: select a read-child based on a hash of the file's GFID plus the client's
PID, so different children will distribute load even for one file.
Mode 2 will probably be optimal for most cases. Using response time when we
open the file is problematic, both because a single sample might not have
been representative even then and because load might have shifted in the
hours or days since (for long-lived files). Trying to use more current load
information can lead to "herd following" behavior which is just as bad.
Pseudo-random distribution is likely to be the best we can reasonably do,
just as it is for DHT.
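
Modes 1 and 2 could look roughly like the sketch below. The hash is FNV-1a, used here only as a stand-in (the message does not say which hash function is used), and `hashed_read_child` is a hypothetical name. Mode 0 returns -1 to signal "use the old methods".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME        16777619u

/* FNV-1a, continued from a previous hash value `h`. */
static uint32_t fnv1a(const unsigned char *data, size_t len, uint32_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* mode 1: hash of the GFID only, so every client picks the same child.
 * mode 2: hash of GFID plus client PID, so clients spread out even for
 *         a single file. */
static int hashed_read_child(int mode, const unsigned char gfid[16],
                             uint32_t pid, unsigned child_count)
{
    assert(child_count > 0);
    if (mode != 1 && mode != 2)
        return -1;
    uint32_t h = fnv1a(gfid, 16, FNV_OFFSET_BASIS);
    if (mode == 2)
        h = fnv1a((const unsigned char *)&pid, sizeof pid, h);
    return (int)(h % child_count);
}
```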
Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
The new type is DHT_HASH_TYPE_DM_USER=1 (on disk in network byte order) and
we treat it the same as DHT_HASH_TYPE_DM except that we don't stomp on it
during rebalance.
Change-Id: I893571a9b89577acdea2fe868915b18d3663fd77
BUG: 807312
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3004
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: Ia2944f891dd62e72f3c79678c3a1fed389854a90
BUG: 811970
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3158
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I9d76ddbd2cf8e4e8e4ad70529ba3a70178489a68
BUG: 765194
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3435
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Added a run-time value field to the cli output of rebalance/remove-brick.
A new cluster/distribute boolean option, rebalance-stats, logs the time
taken for the migration of each file when set to ON.
With rebalance-stats OFF (default), rebalance logs will only have
entries showing the time spent in each directory.
Change-Id: I02a8918621120068cd71ffaf2999d30b3a2d10a2
BUG: 821987
Signed-off-by: shishir gowda <shishirng@gluster.com>
Reviewed-on: http://review.gluster.com/3303
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I58271e1ac5a116b5bc717d7cad9f03eb7dc8a1a4
BUG: 811551
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3417
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Creating a local synctask_env can lead to the creation of many more
syncop threads than required. The current syncop logic can handle
the scale-up/scale-down of threads depending on the load. Hence,
it's neater to use the global synctask env.
Change-Id: Id46f963a0190c0154513317ae03323db155ac15a
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 823774
Reviewed-on: http://review.gluster.com/3412
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
BUG: 823255
Change-Id: Ic6ad33518ea42c9518a21381518bd4f4afdd87cb
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3382
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
On Linux, basename() and dirname() return a pointer within the string
passed as argument. On BSD flavors, basename() and dirname() return
static storage, or pthread-specific storage. Both behaviours are
compliant, but calling free on the result in the second case is a bug.
BUG: 764655
Change-Id: Ic82414aff1f8db2a7544b16315761ce1c05276c4
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.com/3377
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
| |
Change-Id: I45be4ea7f04ee79b67a83134fe8ebd18067a707f
BUG: 820355
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3373
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
The xattrop order in pre/post op on all the subvols
is client-0, client-1... client-n where n is (replica-count - 1).
This order can lead to invalid split-brains if the brick
dies in the middle of xattrops.
Example: the transaction completed pre-op, so on all the subvolumes
the xattrs have a change-log of '1'. Now post-op is sent to both subvols.
On subvol-0 the change-log of client-0 is decremented to 0, but the brick
dies before the change-log of client-1 is decremented to 0.
This change-log status on subvol-0 says that the change succeeded
on subvol-0 but failed on subvol-1, which is not what happened.
Changes done while subvol-0 was down will lead to a pending
change-log on subvol-1 for subvol-0, which is correct.
When subvol-0 is brought back up, the change-log will be in a
split-brain state even though it is not a legitimate split-brain.
If the brick dies in the middle of xattrops it should remain fool.
Pre-op should perform xattrop of the local change-log first and
post-op should perform xattrop of the local change-log last.
In case of optimistic changelogs txn_changelog should be done
last on local if it succeeds, first if it fails.
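
The ordering rule for the plain (non-optimistic) case can be sketched as below. `changelog_order` is a hypothetical helper that only computes the order in which the per-client change-logs would be updated on one subvolume; it does not perform any xattrop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fill `order` with the indices of the change-logs to update on the
 * subvolume `local_child`: its own change-log goes first in pre-op and
 * last in post-op. `order` must have room for child_count entries. */
static void changelog_order(int *order, int child_count, int local_child,
                            bool is_pre_op)
{
    assert(order != NULL);
    assert(local_child >= 0 && local_child < child_count);
    int n = 0;
    if (is_pre_op)
        order[n++] = local_child;
    for (int i = 0; i < child_count; i++)
        if (i != local_child)
            order[n++] = i;
    if (!is_pre_op)
        order[n++] = local_child;
}
```

With this order, a brick that dies partway through still has a non-zero change-log for itself, so it accuses itself rather than only its peers.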
Change-Id: Ib6eeb20cdc49b0b1fd2f454f25a9c8e08388c6e7
BUG: 765194
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3226
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I96d59ad239c2c5efee14dd4b01a10a3f565d491e
BUG: 765587
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3091
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: Ia27ee996bed8f5915c154718bf6e859b6a2fc335
BUG: 765587
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3090
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I4500f39a49ee16e6e88451dcf147d9f49b1d749e
BUG: 765587
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3089
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: Ifab37db2af8d489cd516e992b7423c765dcabc4f
BUG: 765587
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3088
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I631e5bf4b3615b553b72e7ac7f490714b3b995f9
BUG: 821395
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3329
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
When an iobuf is created it has reference count = 1.
After iobref_add (local->iobref, iobuf); the reference count becomes 2.
After iobref_unref(local->iobref); it becomes 1 and never becomes 0.
So the iobuf is never freed, and this causes a memory leak.
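
The counting can be reproduced with a toy reference counter. The `toy_*` names are hypothetical stand-ins, not the iobuf API; dropping the creator's own reference once the buffer has been handed to the ref holder is the usual remedy and is assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buf    { int refs; bool freed; };
struct toy_bufref { struct toy_buf *buf; };

/* A new buffer starts with one reference, owned by its creator. */
static void toy_buf_init(struct toy_buf *b)
{
    b->refs = 1;
    b->freed = false;
}

static void toy_buf_unref(struct toy_buf *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0)
        b->freed = true;  /* stands in for returning the buffer to the pool */
}

/* The holder takes its own reference on the buffer. */
static void toy_bufref_add(struct toy_bufref *r, struct toy_buf *b)
{
    r->buf = b;
    b->refs++;
}

/* Releasing the holder drops only the holder's reference. */
static void toy_bufref_unref(struct toy_bufref *r)
{
    toy_buf_unref(r->buf);
    r->buf = NULL;
}
```

If the creator never calls `toy_buf_unref`, the count stays at 1 after `toy_bufref_unref` and the buffer is never released, which matches the sequence described above.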
I emulated it, creating files on brick manually.
After 5 mins of:
while true; do dd if=file of=/dev/zero bs=16384; done
top showed me this:
4618 root 20 0 1721m 1.5g 1868 S 0.0 16.2 5:41.77 glusterfs
1.5 GB of memory has leaked.
What is this check for? Can it be true under normal conditions?
if ((local->replies[i].op_ret < local->replies[i].requested_size) &&
(local->stbuf_size > (local->offset + op_ret))) {
Maybe delete it entirely?
Change-Id: I17c115ab566e5bba662dd809e0c747db3c0310c8
BUG: 822378
Signed-off-by: Alexander Bersenev <bay@hackerdom.ru>
Reviewed-on: http://review.gluster.com/3340
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
| |
Change-Id: I45767e26288ef6de6446ddf2ea82ed31e128d227
BUG: 796579
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3277
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|