| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
After fix http://review.gluster.org/4282 (libglusterfsterfs/syncop: do not
hold ref on the fd in cbk) was pushed, syncop_open does not take a ref anymore.
BUG: 910661
Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4608
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, linkfile creation happens as root.
use uid/gid returned from _cbk (link/rename) to set the correct ownership of
the link files.
Also added test/dht.rc to implement common dht functions
BUG: 884597
Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4607
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since directories have presence on all subvolumes there is
no definite meaning of ->hashed_subvol or ->cached_subvol.
getxattr() code path chooses ->cached_subvol for pathinfo
extended attribute. While this makes sense of files, it makes
less sense for directories. Further if a hashed or a cached
subvolume is down, and there's a getxattr request for a
directory, we return with an errno.
This patch changes pathinfo extended attribute contents by
aggregating information from all subvolumes that are up.
Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d
Signed-off-by: Venky Shankar <vshankar@redhat.com>
BUG: 856455
Reviewed-on: http://review.gluster.org/4047
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8
BUG: 859835
Signed-off-by: Anand Avati <avati@redhat.com>
Original-author: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
Reviewed-on: http://review.gluster.org/3967
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Typically this lock was not needed in practice, but with
http://review.gluster.org/3842, this code gets executed in multiple
threads for different servers and we lose a count. This results in
leaked lock and a hang for a future transaction.
Change-Id: I377ed20e44f2a45cff522289dfef181f0653eca2
BUG: 765564
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4480
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method deals with the case where swapping might gain a bigger overlap
for the xlator currently under consideration, but sacrifices even more from
the xlator we're swapping with. For example:
A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554)
B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9)
C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff)
Here, the new range for B has a bigger overlap with the old C than with the
old B (0x33333333 vs. 0x22222222 to be precise) so looking only at that
might lead us to swap. However, such a swap turns the new C's overlap from
0x55555556 (vs. old C) to *zero* (vs. old B). In other words, we've gained
0x11111111 for B but lost 0x55555556 for C, so it's a bad idea.
The new algorithm accounts for all effects of the swap, so it not only avoids
bad swaps but can make some good ones that would have been missed previously.
For example, if swapping a range X with a later range Y would not increase the
overlap for X we would previously have skipped it even if the swap would
increase Y's overlap without affecting X's. This is the normal case when we're
adding a new brick (which initially has zero overlap with any old range) so
finding more good swaps is probably even more important than avoiding bad ones.
Also, the logic in dht_overlap_calc was completely broken before, causing
integer overflows instead of providing correct values, so no matter what
higher-level algorithm was in place the GIGO effect would have resulted in
bad decisions.
Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f
BUG: 853258
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/3908
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I7049c0c64e36a9dfa4cc0e0b34de7ec111d2f6c1
BUG: 908302
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4076
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no necessity for the delayed-post-op to wait until
the next fop phase on the fd completes. Change-log,
locks are inherited by the time next fop phase is attempted
so the wakeup can happen just before the fop phase is started.
Change-Id: I0b8e591f591b0f7565eb55265ab51f476ed2b165
BUG: 908302
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4073
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Files were being created in subvol which had less than
min_free_disk available even in the cases where other
subvols with more space were available.
Solution:
Changed the logic to look for subvol which has more
space available.
In cases where all the subvols have lesser than
Min_free_disk available , the one with max space and
atleast one inode is available.
Known Issue: Cannot ensure that first file that is
created right after min-free-value is crossed on a
brick will get created in other brick because disk
usage stat takes some time to update in glusterprocess.
Will fix that as part of another bug.
Change-Id: If3ae0bf5a44f8739ce35b3ee3f191009ddd44455
BUG: 858488
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/4420
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
* 'CONNECTING' is taken as CHILD_UP.
* Sending notifications (default_notify()) for all the events individually
while mounting.
Solution:
* Consider Child up only after the event CHILD_UP is received.
* Send a single notification for all the children's events only
while mounting.
Change-Id: I1b7de127e12f5bfb8f80702dbdce02019e138bc8
BUG: 885072
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4356
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dht_mkdir_cbk, EEXIST error is treated like a true error. Because
of this the following sequence of events can happen, eventually
resulting in GFID mismatch and (and possibly leaked locks and hang,
in the presence of replicate.)
The issue exists when many clients concurrently attempt creation of
directory and subdirectory (e.g mkdir -p /mnt/gluster/dir1/subdir)
0. First mkdir happens by one client on the hashed subvolume. Only
one client wins the race. Others racing mkdirs get EEXIST. Yet
other "laggers" in the race encounter the just-created directory
in lookup() on the hash dir.
1. At least one "lagger" lookup() notices that there are missing
directories on other subvolumes (which the "winner" mkdir is yet
to create), and starts off self-heal of the directory.
2. At least on some subvolumes, self-heal's mkdir wins the race
against the "winner" mkdir and creates the directory first. This
causes the "winner" mkdir to experience EEXIST error on those
subvolumes.
3. On other subvolumes where "winner" mkdir won the race, self-heal
experiences EEXIST error, but self-heal is properly translating
that into a success (but mkdir code path is not -- which is the
bug.)
4. Both mkdir and self-heal assign hash layouts to the just created
directory. But self-heal distributes hash range across N (total)
subvolumes, whereas mkdir distributes hash range across N - M
(where M is the number of subvolumes where mkdir lost the race).
Both the clients "cache" their respective layouts in the near
future for all future creates inside them (evidence in logs)
5. During the creation of the subdirectory, two clients race again.
Ideally winner performs mkdir() on the hashed subvolume and proceeds
to create other dirs, loser experiences EEXIST error on the hashed
subvolume and backs off. But in this case, because the two clients
have different layout views of the parent directory (because of
different hash splits and assignements), the hashed subvolumes for
the new directory can end up being different. Therefore, both clients
now win the race (they were never fighting against each other on a
common server), assigning different GFIDs to the directory on their
respective (different) subvolumes. Some of the remaining subvolumes
get GFID1, others GFID2.
Conclusion/Fix:
Making mkdir translate EEXIST error as success (just the way self-heal
is already rightly doing) will bring back truth to the design claim
that concurrent mkdir/self-heals perform deterministic + idempotent
operations. This will prevent the differing "hash views" by different
clients and thereby also avoid GFID mismatch by forcing all clients
to have a "fair race", because the hashed subvolume for all will be
the same (and thereby avoiding leaked locks and hangs.)
Change-Id: I84592fb9b8a3f739a07e2afb23b33758a0a9a157
BUG: 907072
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4459
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iaf119f839cb2113b8f8efb7bf7636d471b6541bf
BUG: 866440
Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4385
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Default_fops uses stack_wind_tail. It winds without creating the frame leading
into wrong subvol return in the cookie. To avoid the problem caused by the
same, we're getting the subvol by passing the cookie.
Change-Id: I51ee79b22c89e4fb0b89e9a0bc3ac96c5b469f8f
BUG: 893338
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4388
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Do not do fd_ref in cbks of the fops which return a fd (such as
open, opendir, create).
Change-Id: Ic2f5b234c5c09c258494f4fb5d600a64813823ad
BUG: 885008
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4282
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When one of the subvolume is down, then lock request is not attempted
on that subvolume and move on to the next subvolume.
/* skip over children that are down */
while ((child_index < priv->child_count)
&& !local->child_up[child_index])
child_index++;
In the above case if there are 2 subvolumes and 2nd subvolume is down (subvolume
1 from afr's view), then after attempting lock on 1st child (i.e subvolume 0)
child index is calculated to be 1. But since the 2nd child is down child_index
is incremented to 2 as per the above logic and lock request is STACK_WINDed to
the child with child_index 2. Since there are only 2 children for afr the child
(i.e the xlator_t pointer) for child_index will be NULL. The process crashes
when it dereference the NULL xlator object.
Change-Id: Icd9b5ad28bac1b805e6e80d53c12d296526bedf5
BUG: 765564
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4438
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I5d84ef72615f9d71b4af210976e2449de6e02326
BUG: 888174
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4446
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generally inode-write fops do transaction.unwind then
transaction.resume, but writev needs to make sure that
delayed post-op frame is placed in fdctx before unwind
happens. This prevents the race of flush doing the
changelog wakeup first in fuse thread and then this
writev placing its delayed post-op frame in fdctx.
This helps flush make sure all the delayed post-ops are
completed.
Change-Id: Ia78ca556f69cab3073c21172bb15f34ff8c3f4be
BUG: 888174
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4428
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
entrylk
when the expected lock count is equal to the attempted lock count, then before
deciding that lock is failed on all the nodes, make sure the lock type is
checked properly.
Change-Id: I1f362d54320cb6ec5654c5c69915c0f61c91d8c7
BUG: 765564
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4436
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of http://review.gluster.org/2828, the blocking lock code
path's condition for checking completion of locking atempt is
broken. The condition -
if ((child_index == priv->child_count) || ...)
and
if ((child_index == priv->child_count) && ...)
which is retained to check completion of blocking lock attempts
for DATA/METADATA transaction will _always_ fail because a few
lines above we have -
child_index = cookie % priv->child_count;
So child_index will never equal priv->child_count. This leaves
the correctness at the mercy of the next part of the
conditional -
.. (int_lock->lock_count == int_lock->lk_expected_count) ..
This "works" as long as no server went down during the transaction.
If a server goes down in the middle of the transaction, then this
condition also fails, and the code wraps around and starts a
blocking lock attempt loop all the way again from from the first server.
This results in double locks getting acquired on those servers, and
eventually the second condition gets hit (first condition is _never_
hit) and we come out of locking phase.
During unlock phase we perform only one unlock per server leaving the
other lock "leaked" forever.
Change-Id: I7189cdf3f70901b04647516fe1d1e189f36cc8dd
BUG: 765564
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4433
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier logic used to check if (layout-spread-count <= subvol_cnt -
decommissioned bricks). With this if a subvol was down, and layout-spread was >
upsubvols, a mkdir ended up creating holes in the layout.
The fix is to consider only the combination of subvols which are usable (not
down or not decommissioned).
Change-Id: I61ad3bcaf4589f5a75f7887cfa595c98311ae3bb
BUG: 902610
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4412
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* There are upto 3 entry lockees that may be needed to perform
entrylk'ing in posix dir-write operations.
* For eg, rmdir ("/a/b") needs to acquire locks on two entities,
- entrylk ("/a", "b")
- entrylk ("/a/b", null)
* Changed existing entrylk/rename/selfheal (entrylk) transactions
to use the new book-keeping structures
* Fixed few issues in afr_trace_entry_lk{in,out} functions. Tracing is now
aware of the new entry lockee structure.
Implementation notes:
* Changed 'cookie' sent in stack_wind to encode lockee_entity_no
and subvol_no.
cookie is a non-negative integer such that 0 <= cookie < replica_count,
When more than one lock is being acquired across the subvolumes,
cookie % replica_count gives the subvol_no
cookie / replica_count gives the lockee_entity_no.
Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153
BUG: 765564
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Reviewed-on: http://review.gluster.org/2828
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Leaving option frame-work un-changed for backward compatibility.
Change-Id: I40bce1ec360801307e67f09e53b0721f64efab37
BUG: 886998
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4309
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ic1c9559aec59c1fb9dfede4aba8895f3b86f32f1
BUG: 861015
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4098
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0db00b7334bb9707ab48bd661ac03a3ad818d6e4
BUG: 893458
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4393
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* warnings on 'void *' arguments
* warnings on empty initializations
* warnings on empty array (array[0])
Change-Id: Iae440f54cbd59580eb69f3ecaed5a9926c0edf95
BUG: 875913
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4219
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When "gluster volume heal <volname> info is executed, crawl's
process_entry is not going to populate iatt structure so the
iatt's gfid will be empty. So inode_links are failing.
Fix:
inode_link should be done only after lookup i.e. when heal is
performed. So moved the inode_link related code to just after
the lookup which is triggered when self-heal is done.
Tests:
The testcase that gives this issue does not give the inode-link
failures anymore. glustershd heal, info commands are working as
expected.
Wrote basic automation tests for proactive-self-heal-daemon
https://github.com/pranithk/gluster-tests/blob/master/afr/proactive-self-heal.sh
Change-Id: Ic112bf104a4d553a64d3d8559f681a25ae1a5362
BUG: 861015
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4090
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we follow a linkfile, and the lookup returns a ENOTCONN error, return
the error, as the cached subvol is down, and lookup_everywhere wont succeed,
but actually ends up clearing the linkfile, and clearing the namespace.
Change-Id: I772bf71531bc646e8fb62d3e8549a5fe0f3896da
BUG: 893378
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4383
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When eager-lock is disabled, inodelks for write-fops on same
fd conflict with each other. If eager-lock is disabled but
delayed post-op is enabled then each write fop's inodelk unlock
waits for post-op-delay-secs. So the conflicting write fop
acquires inodelk after post-op-delay-secs. This results in
post-op-delay-secs delay for every write fop on the fd for
sequential writes (Ex: dd).
Fix:
Disable delayed-post-op when eager-lock is off.
Change-Id: I87ea4c8d1c7bb269b9b174388ae50f37e82629b7
BUG: 895235
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4391
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Afr prevents opens on a file in split-brian but the
fd that is already open still has the capability to perform
both reads and writes to the file.
Fix:
Fail readvs on a file with EIO.
Change-Id: I8e07f24c36fab800499b36ab374f984b743332cd
BUG: 873962
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4199
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most important errno logic historically only prioritized ESTALE
over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT
to ensure that split-brain was reported when it occurs in
conjunction with bricks missing the file entry. The unintended side
effect of this change is that (non split-brain) EIO errors reported
from the bricks themselves are now reported to the client when the
expectation is that afr should squash said errors in favor of
marking the file inconsistent.
The high-level problem is that EIO is overloaded with different
meanings from different contexts. This commit adds an eio parameter
to the errno priority logic to conditionally flag when EIO is of
higher priority and should be propagated to the client.
BUG: 892730
Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4376
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
afr_more_important_error() is written to return whether a new errno
should override an existing errno for high-level operations that
could span multiple sub-operations. It specifically prioritizes
ESTALE over EIO over ENOENT, and otherwise defaults to the latest
error passed having priority.
This change preserves current behavior, but rewrites the logic to
return the higher priority error of the existing and new errno. The
purpose of the change is to make the logic a bit more clear and set
the stage for future changes to make the logic flexible based on
context.
BUG: 892730
Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4375
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Used local->postparent(contains merged iatt of all succesful calls) instead
of postparent for dht ctx time update.
2. dht_inode_ctx_time_update avoided in case of opret -1.
Change-Id: Ie04a7842a41c241f911b6a3f76267b996d27fb43
BUG: 881013
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/4338
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When fop fails post-op is always performed
over the network irrespective of whether pre-op is piggybacked
or not. Decrementing Pre-op-done count even for the piggybacked
ones is wrong.
I have added an assert for pre_op_done to be non-zero and when
dd of=a if=/dev/urandom bs=5M count=1000 is executed and a brick
is taken down, the mount is crashing.
Fix:
Decrement pre-op-done count only when the post-op is not
piggybacked.
Change-Id: Ie837251a43bfb437f0fada191302eeee60be1601
BUG: 863939
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4310
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iebfa6770a688e89c051666b46977862188061738
BUG: 802417
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4034
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I9fa7744b02791180ccb93adef10c363a1b38aa31
BUG: 838204
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By failing over readdir (default behaviour), rebalance could get duplicate
files, as readdir would re-read from offset 0. Rebalance should not attempt
to migrate these files again.
Additionally, we need to handle these cases as failure in rebalance crawl.
No test case provided, as we cannot determine the read child in afr.
Change-Id: If07508b4f92dacc17e0f695b48a866c7c66004be
BUG: 859387
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4300
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Along with this change, fixed the race of setting the
split-brain status in inode-ctx after unwinding the fop from
self-heal in case of back-ground self-heal.
Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad
BUG: 873962
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4198
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When subvols-per-directory option is used, with bricks addition/removal
the layouts might get distributed to other subvols, which were not part
of the layout before. We need to clean up layouts on old subvolumes, to
prevent overlaps.
Also, we need to make sure if layout-cnt is never less
than subvolume-cnt.
Change-Id: I00994a092ca0c99aedcc41bd9412d43460f88a04
BUG: 884455
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4281
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterd does not allow empty string as default value. Changed
afr option values to disallow empty string as value.
Change-Id: I92a2d658907dbc6101e1139dd91f548acb5506f5
BUG: 859927
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4271
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When create/mknod fails on some of the nodes, appropriate pending
data/metadata changelogs are not assigned. This was not considered
to be an issue because entry self-heal would do the assigning of
appropriate changelog after creating new entries. But using
the combination of rebalance and remove brick we can construct a
case where a file with same name and gfid can be created in a dir
with different data and link-to xattr without any changelog.
Fix:
When a create/mknod failure is observed mark the appropriate
changelog on the new file created.
Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6
BUG: 858212
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4207
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Self-heal data lock contention between clients and glustershd
instances can lead to long wait and user response times if the
client ends up pending its lock on glustershd self-heal of a large
file. We have reports of guest vm instances going completely
unresponsive during self-heal of virtual disk images.
Optimize the read/write self-heal trigger codepath
(i.e., afr_open_fd_fix()) to trylock for self-heal and skip the
self-heal otherwise to minimize the likelihood of a running/active
guest of competing with glustershd on arrival of a brick. Note that
lock contention is still possible from the client (e.g., via
lookup).
BUG: 874045
Change-Id: I406443c061ff6acd2a851179626b78352caa5c03
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a block flag to support an optional blocking or
non-blocking mode in the self-heal data locking mechanism. All
callers are modified to use blocking mode, which is the current
default behavior (no change in behavior is introduced by this
commit).
BUG: 874045
Change-Id: Ib7ff9984578fa11de4e3b6981508100cdddd37cd
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4257
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flush is historically a transaction to ensure all previous writes
were complete. This is no longer required as write-behind has
learned to make flush a barrier operation (re: conversation w/
Avati).
Flush taking a full file lock causes VMs running on afr volumes
to stall when a migration occurs and self-heal is in progress.
Make afr_flush() a non-transactional operation.
BUG: 874045
Change-Id: If2db83823e280c86b1b29b41361eed7081601632
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4261
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many people have asked for behavior like the old NUFA, which builds and
seems to run but was previously impossible to enable/configure in a
standard way. This change allows NUFA to be enabled instead of DHT from
the command line, with automatic selection of the local subvolume on each
host.
Change-Id: I0065938db3922361fd450a6c1919a4cbbf6f202e
BUG: 882278
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4234
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* write-behind: free the inode context in wb_forget
* distribute: in readdirp callback put the allocated context to the inode
* distribute: check if the layout is NULL before accessing it in layout_unref
Change-Id: I7698f81b85b99d06bf6b01fc1a6e51e1593b5e27
BUG: 790709
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4250
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a replica pair unlike files, directories may not have their
content in same order, so readdir for same (offset, size) may
not give same entries on both the sobvolumes of replica pair.
Switching over from one subvolume to another may not be a good
idea sometimes. It may lead to duplicate entries or fewer entries
or both. This patch provides a way to disable readdir-failover
so that applications like rebalance can retry if they want to.
Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0
BUG: 859387
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4159
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4aef1c79743ee08b62e04d7b709f3e8c6b9dc56a
BUG: 881517
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4244
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If any subvolume is down, and a layout is re-written and hash
values change, entry names in the downed subvol can be reused
in the other subvol which got the same hash range. when the
downed subvol is brought back up, duplicate entried might appear
Also separated handling of ENOSPC and ENOTCONN error.
Change-Id: I5ed93990425a4cee70df2dab7c7c119fdc87ad56
BUG: 860663
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4000
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Identify mismatching uid/gid in lookup, and trigger a syncop
heal. uid/gid of subvol with latest ctime is trusted (local->prebuf).
Change-Id: Ib5c4bc438e7f4b1f33080e73593f40f400e997f0
BUG: 862967
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/3964
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
afr_sh_data_fxattrop() currently allocates and sends a single xattr
dict_t instance to each replica. The callback codepath references
the returned object in the self-heal in-memory state for the
particular replica. If storage/posix is in the same address-space
(i.e., running a single glusterfs client with a fuse->afr->posix
graph), the same object is modified and returned for each child,
causing corrupted in-memory state and afr xattrs.
Allocate and send independent xattr dict_t's for each replica. This
allows self-heal to work correctly in a single address-space
graph.
BUG: 868478
Change-Id: I42832e85b5d1abb6098c28944c717e129300109e
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4149
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|