| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
rmtab support is brutally hacked out in FB branch, these tests
are doomed to be sad.
Test Plan:
Reviewers: sshreyas
Subscribers:
Tasks:
Blame Revision:
Change-Id: I7bc3c061108682a12f051edcc4379721d5a216df
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16884
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Fixes case where when a brick is completely wiped, the AFR2 discovery
mechanism would potentially (1/R chance where R is your replication
factor) pin a NFSd or client to the wiped brick. This would in turn
prevent the client from seeing the contents of the (degraded)
subvolume.
- The fix proposed in this patch is to force the entry-self
heal code path when the discovery process happens. And furthermore,
forcing a conservative merge in the case where no brick is found to be
degraded.
- This also restores the property of our 3.4.x builds where-by bricks
automagically rebuild via the SHDs without having to run any sort of
"full heal". SHDs are given enough signal via this patch to figure
out what they need to heal.
Test Plan:
Run "prove -v tests/bugs/fb8149516.t"
Output: https://phabricator.fb.com/P19989638
Prove test showing failed run on v3.6.3-fb_10 without the patch -> https://phabricator.fb.com/P19989643
Reviewers: dph, moox, sshreyas
Reviewed By: sshreyas
FB-commit-id: 3d6f171
Change-Id: I7e0dec82c160a2981837d3f07e3aa6f6a701703f
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16862
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
Try to get a passing smoke test by crudely hacking out failing tests.
Test Plan:
Commit & hope for happy smoke.
Reviewers: sshreyas
Subscribers:
Tasks:
Blame Revision:
Change-Id: I564ac50557276a839d8de3a89a5c154c751b7503
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16856
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
It was observed while testing the SHD threading code, that under high loads SHD/AFR related
SyncOps & SyncTasks can actually hang/deadlock as the transport
disconnected event (for frame timeouts) never gets bubbled up correctly. Various
tests indicated the ping timeouts worked fine, while "frame timeouts"
did not. The only difference? Ping timeouts actually disconnect
the transport while frame timeouts did not. So from a high-level we
know this prevents deadlock as subsequent tests showed the deadlocks
no longer ocurred (after this change). That said, there may be some
more elegant solution. For now though, forcing a reconnect is
preferential vs hanging clients or deadlocking the SHD.
Test Plan:
It's fairly difficult to write a good prove test for this since it requires human eyes to observe if the SHD is deadlocked (I'm open to ideas). Here's the repro though:
1. Create a 3x replicated cluster on a host.
2. Set the frame-timeout low (say 2 sec)
3. Down a brick, and write a pile of files (maybe 2000)
4. Bring up the downed brick and let the SHD begin healing files
5. During the heal process, kill -STOP <pid of brick> (hang) one of the bricks
Without this patch the SHD will be deadlocked, even though the frame timed out after 2 seconds. With the patch, the plug is pulled on the transport, a disconnect is bubbled up
to the syncop and the SHD resumes.
Reviewers: dph, meyering, cjh
Reviewed By: cjh
Subscribers: ethanr
Conflicts:
rpc/rpc-lib/src/rpc-clnt.c
FB-commit-id: c99357c
Change-Id: I344079161492b195267c2d64b6eab0b441f12ded
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16846
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
|\| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When GlusterD is restarted on a multi node cluster, while syncing the
global options from other GlusterD, it checks for quorum and based on
which it decides whether to stop/start a brick. However we handle the
return code of this function in which case if we don't want to start any
bricks the ret will be non zero and we will end up failing the import
which is incorrect.
Fix is just to ignore the ret code of glusterd_restart_bricks ()
>Reviewed-on: https://review.gluster.org/16574
>Smoke: Gluster Build System <jenkins@build.gluster.org>
>NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
>CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
>Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
>Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
>(cherry picked from commit 55625293093d485623f3f3d98687cd1e2c594460)
Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553
BUG: 1420993
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: https://review.gluster.org/16594
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
This test is failing consistently on both the upstream 3.98 branch and
3.8-fb, just nuke.
Test Plan:
Check in, hope smoke test passes.
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: I888a9f424340dff128a4a5273f96e6d3fbc323a9
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: https://review.gluster.org/16647
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
|\| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Problem:
The various split-brain resolution policies (favorite-child-policy based,
CLI based and mount (get/setfattr) based) attempt to resolve split-brain
even when not all bricks of replica are up. This can be a problem when
say in a replica 3, the only good copy is down and the other 2 bricks
are up and blame each other (i.e. split-brain). We end up healing the
file in such a case and allow I/O on it.
Fix:
A decision on whether the file is in split-brain or not must be taken
only if we are able to examine the afr xattrs of *all* bricks of a given
replica.
> Reviewed-on: https://review.gluster.org/16476
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
(cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)
Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1420984
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/16589
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|\| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When overwriting an existing file with O_TRUNC, the 'atime' was set to
0, meaning the Epoch (01-Jan-1970 UTC). However, the 'mtime' gets
updated correcty.
In case 'atime' or 'mtime' is not passed in the 'struct iatt', the time
values passed to the systemcall are taken from the current values are
returned by lstat().
Cherry picked from commit 9bed81ada6f91f998e9abd915b18e3f06557cdcb:
> Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3
> BUG: 1401777
> Reported-by: Eivind Sarto <eivindsarto@gmail.com>
> Signed-off-by: Niels de Vos <ndevos@redhat.com>
> Reviewed-on: http://review.gluster.org/16034
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Change-Id: I7021b7161dcd6c9a3e515d98f6d4847533c434b3
BUG: 1411011
Reported-by: Eivind Sarto <eivindsarto@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/16356
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Problem:
Currently, I/O on a split-brained file fails even when the
favorite-child-policy is set until the self-heal is complete.
Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect
and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
refresh the inode and then proceed with the read or write transaction.
The resetting itself happens in the self-heal code and hence can also
happen in the client side background-heal or by the shd's index-heal in
addition to the txn code path explained above. When it happens in via
heal, we also add checks in undo-pending to not reset the sink xattrs
again.
> Reviewed-on: http://review.gluster.org/15673
> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1378547
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com>
Reviewed-on: http://review.gluster.org/16091
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
dd is doing a statfs and failing with ENOSPC instead of writign and
getting EDQUOTA. Make either error a success in this test.
Test Plan:
prove bug-1292020.t
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: I9f580d9e4a4dd293df55a1d954f86a9862fcae7b
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16352
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
Test relied on rmtab setting to find NFS config dir, but in FB branch
rmtab is disabled by default. Just hardcoded the location (test already
contained hardcoded path, as it turns out).
Test Plan:
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: I4a50a00ed550832ca8d91981e6c5af4d8c81b466
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16336
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|\|
| |
| |
| |
| | |
Change-Id: I844adf2aef161a44d446f8cd9b7ebcb224ee618a
Signed-off-by: Kevin Vigor <kvigor@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Problem:
In CentOS-7, the file was receving an extra removexattr(security.ima)
FOP which changed its ctime, breaking the assumption that a particular brick
had the latest ctime based on the writevs done in the .t
Fix:
1. Compare the ctime of both files in the backend and pick the one with
the latest ctime for the fav-child policy. Also unmount the volume
before comparing, to avoid any further FOPS on the file that
can possibly modify the timestamps.
2. Added floating point handling in stat function. Thanks to Pranith for
the helping debugging the regex.
> Reviewed-on: http://review.gluster.org/16288
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
(cherry picked from commit 76fff8cb2a164b596ca67e65c99623f5b68361fd)
Change-Id: I06041a0f39a29d2593b867af8685d65c7cd99150
BUG: 1410073
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/16324
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of: http://review.gluster.org/16286
PROBLEM:
Consider a volume with granular-entry-heal and sharding enabled. When
a replica is down and a shard is created as part of a write, the name
index is correctly created under indices/entry-changes/<dot-shard-gfid>.
Now when a read on the same region triggers another MKNOD, the fop
fails on the online bricks with EEXIST. By virtue of this being a
symmetric error, the failed_subvols[] array is reset to all zeroes.
Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be
set, causing the name index, which was created in the previous MKNOD
operation, to be wrongly deleted in THIS MKNOD operation.
FIX:
The ideal fix would have been for a transaction to delete the name
index ONLY if it knows it is the one that created the index in the first
place. This would involve gathering information as to whether THIS xattrop
created the index from individual bricks, aggregating their responses and
based on the various posisble combinations of responses, decide whether to
delete the index or not. This is rather complex. Simpler fix would be
for post-op to examine local->op_ret in the event of no failed_subvols
to figure out whether to delete the name index or not. This can occasionally
lead to creation of stale name indices but they won't be affecting the IO path
or mess with pending changelogs in any way and self-heal in its crawl of
"entry-changes" directory would take care to delete such indices.
Change-Id: Icc642a987d1b6a5097562315aecf1263ed35ceb6
BUG: 1408786
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16293
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of: http://review.gluster.org/16193
Replace the EXPECT '00000001' with EXPECT_NOT '00000000'. This is
because occasionally a name-heal is performing new-entry marking on
'c' causing the pending entry changelog on it to become '00000002'.
Change-Id: Ib7b0d64c8de2498c2ffb3b8e06228694f2c55755
BUG: 1406740
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16224
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of: http://review.gluster.org/16169
Check that shd is up before executing 'volume heal' command
Change-Id: I43634a979791fcb92bfebf93ec48eff42af2bb97
BUG: 1405890
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16190
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After sending SIGTERM to gluster process we immediately
check if process exited. We should wait for some time
before checking process state.
> Reviewed-on: http://review.gluster.org/16162
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Avra Sengupta <asengupt@redhat.com>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit e9d8525a0d34130ba2a582109937b8e79eecf6ab)
BUG: 1405450
Change-Id: Iaba0067f6e880a7fe38e11b9fa0fe9bd103b19e2
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/16164
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Enable data, metadata and entry self-heal as xlator-options so that glfs-heal.c
can heal split-brain files even if they are disabled on the volume via volume
set commands.
> Reviewed-on: http://review.gluster.org/11333
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit 209c2d447be874047cb98d86492b03fa807d1832)
Change-Id: Ic191a1017131db1ded94d97c932079d7bfd79457
BUG: 1405130
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/16144
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of: http://review.gluster.org/16075
Incorrect initialisation of local->optimistic_change_log was leading
to skipped pre-op and post-op even when a brick didn't participate in
the txn because it was down.
The result - missing granular name index resulting in some entries
never getting healed.
FIX:
Initialise local->optimistic_change_log just before pre-op.
Also fixed granular entry heal to create the granular name index in
pre-op as opposed to post-op. This is to prevent loss of granular
information when during an entry txn, the good (src) brick goes
offline before the post-op is done. This would cause self-heal to
do conservative merge (since dirty xattr is the only information
available), which when granular-entry-heal is enabled, expects
granular indices, the lack of which can lead to loss of data in
the worst case.
Change-Id: Ibc0fbfb3fa21c578e28868d9e30b274e33c12064
BUG: 1403646
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16105
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Problem:
The issue as seen by the user is detailed in the BZ but what is
happening is if the no. of items in the wait queue == max-qlen,
syncop_mt_dir_scan() does a pthread_cond_wait until the launched
synctask workers dequeue the queue. But if for some reason the worker
fails, the queue is never emptied due to which further invocations of
syncop_mt_dir_scan() are blocked forever.
Fix: Made some changes to _dir_scan_job_fn
- If a worker encounters error while processing an entry, notify the
readdir loop in syncop_mt_dir_scan() of the error but continue to process
other entries in the queue, decrementing the qlen as and when we dequeue
elements, and ending only when the queue is empty.
- If the readdir loop in syncop_mt_dir_scan() gets an error form the
worker, stop the readdir+queueing of further entries.
> Reviewed-on: http://review.gluster.org/16073
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
(cherry picked from commit 2d012c4558046afd6adb3992ff88f937c5f835e4)
Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88
BUG: 1403192
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/16096
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
During snapd graph generation we should check if SSL is
enabled on main volume or not. This is because clients
will communicate with snapd as if it is communicating to
a brick.
> Reviewed-on: http://review.gluster.org/15979
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Kaushal M <kaushal@redhat.com>
(cherry picked from commit 182f0d12040dab5081ca645a3f370f65cd68b528)
Change-Id: I0d7fe86c567b297a8528a48faf06161d4c3cb415
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
BUG: 1400459
Reviewed-on: http://review.gluster.org/15986
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of: http://review.gluster.org/15747
When there are already existing non-granular indices created that are
yet to be healed, if granular-entry-heal option is toggled from 'off' to
'on', AFR self-heal whenever it kicks in, will try to look for granular
indices in 'entry-changes'. Because of the absence of name indices,
granular entry healing logic will fail to heal these directories, and
worse yet unset pending extended attributes with the assumption that
are no entries that need heal.
To get around this, a new CLI is introduced which will invoke glfsheal
program to figure whether at the time an attempt is made to enable
granular entry heal, there are pending heals on the volume OR there
are one or more bricks that are down. If either of them is true, the
command will be failed with the appropriate error.
New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable}
Change-Id: I342e0390f847fcb015a50ef58aedfcbcb58f4ed3
BUG: 1398501
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15942
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Issue:
=====
In certain cases, there was no unwind of read
from read-ahead xlator, thus resulting in hang.
RCA:
====
In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead
of STACK_WIND(). One such case is when inode_ctx for that file
is not present (can happen if readdirp was called, and populates
md-cache and serves all the lookups from cache).
Consider the following graph:
...
io-cache (parent)
|
readdir-ahead
|
read-ahead
...
Below is the code snippet of ioc_readv calling STACK_WIND_TAIL:
ioc_readv()
{
...
if (!inode_ctx)
STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
FIRST_CHILD (frame->this)->fops->readv, fd,
size, offset, flags, xdata);
/* Ideally, this stack_wind should wind to readdir-ahead:readv()
but it winds to read-ahead:readv(). See below for
explaination.
*/
...
}
STACK_WIND_TAIL (frame, obj, fn, ...)
{
frame->this = obj;
/* for the above mentioned graph, frame->this will be readdir-ahead
* frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which
* is as expected
*/
...
THIS = obj;
/* THIS will be read-ahead instead of readdir-ahead!, as obj expands
* to "FIRST_CHILD (frame->this)" and frame->this was pointing
* to readdir-ahead in the previous statement.
*/
...
fn (frame, obj, params);
/* fn will call read-ahead:readv() instead of readdir-ahead:readv()!
* as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
* frame->this was pointing ro readdir-ahead in the first statement
*/
...
}
Thus, the readdir-ahead's readv() implementation will be skipped, and
ra_readv() will be called with frame->this = "readdir-ahead" and
this = "read-ahead". This can lead to corruption / hang / other problems.
But in this perticular case, when 'frame->this' and 'this' passed
to ra_readv() doesn't match, it causes ra_readv() to call ra_readv()
again!. Thus the logic of read-ahead readv() falls apart and leads to
hang.
Solution:
=========
Modify STACK_WIND_TAIL() as:
STACK_WIND_TAIL (frame, obj, fn, ...)
{
next_xl = obj /* resolve obj as the variables passed in obj macro
can be overwritten in the further instrucions */
next_xl_fn = fn /* resolve fn and store in a tmp variable, before
modifying any variables */
frame->this = next_xl;
...
THIS = next_xl;
...
next_xl_fn (frame, next_xl, params);
...
}
BUG: 1399018
Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15934
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
during crawl
Backport of: http://review.gluster.org/15880
If granular name indices are already in existence for a volume, and
before they are healed, granular entry heal be disabled, a crawl on
indices/xattrop will clear the changelogs on these directories. When
their corresponding entry-changes indices are crawled subsequently,
if it is found that the directories don't need heal anymore, the
granular indices are not cleaned up.
This patch fixes that problem by ensuring that the zero-xattrop
also deletes the stale indices at the level of index translator.
Change-Id: Iae0a560c1c9d37b083cad89f16d3dcf83c4f7dc7
BUG: 1398501
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15927
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of http://review.gluster.org/#/c/15005/9.
GlusterD as of now was blindly assuming that the brick port which was
already allocated would be available to be reused and that assumption
is absolutely wrong.
Solution : On first attempt, we thought GlusterD should check if the
already allocated brick ports are free, if not allocate new port and
pass it to the daemon. But with that approach there is a possibility
that if PMAP_SIGNOUT is missed out, the stale port will be given back
to the clients where connection will keep on failing. Now given the
port allocation always start from base_port, if everytime a new port
has to be allocated for the daemons, the port range will still be
under control. So this fix tries to clean up old port using
pmap_registry_remove () if any and then goes for pmap_registry_alloc ()
This patch is being ported to 3.8 branch because, the brick process
blindly re-using old port, without registering with the pmap server,
causes snapd daemon to not start properly, even though snapd registers
with the pmap server. With this patch, all the brick processes and
snapd will register with the pmap server to either get the same port,
or a new port, and avoid port collision.
> Reviewed-on: http://review.gluster.org/15005
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Avra Sengupta <asengupt@redhat.com>
(cherry picked from commit c3dee6d35326c6495591eb5bbf7f52f64031e2c4)
Change-Id: If54a055d01ab0cbc06589dc1191d8fc52eb2c84f
BUG: 1369766
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/15308
Tested-by: Avra Sengupta <asengupt@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Backport of http://review.gluster.org/15826
On recieving a rename fop, marker_rename() stores the,
oldloc and newloc in its 'local' struct, once the rename
is done, the xtime marker(last updated time) is set on
the file, but sending a setxattr fop. When upcall
receives the setxattr fop, the loc->inode is NULL and
it crashes. The loc->inode can be NULL only in one valid
case, i.e. in rename case where the inode of new loc
can be NULL. Hence, marker should have filled the inode
of the new_loc before issuing a setxattr.
> Reviewed-on: http://review.gluster.org/15826
> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Kotresh HR <khiremat@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
(cherry picked from commit 46e5466850311ee69e6ae9a11c2bba2aabadd5de)
Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977
BUG: 1396418
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15878
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
Halo prove tests were racy in a couple of ways. First, they raced
against the self-heal daemon (e.g. write to volume with two bricks
up and then assert that only two bricks have data file; but shd will
properly copy file to third brick sooner or later). Fix by disabling
shd in such tests.
Second, tests rely on pings to complete and set halo state as
expected, but do not check for this. If writing begins before initial
pings complete, all bricks may be up and receive the data. Fix by adding
explicit check for halo child states.
Test Plan:
prove tests/basic/halo*.t
(prior to this changeset, would fail within ~10 iterations on my
devserver and almost always on centos regression. Now runs overnight
without failure on my devserver).
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: If6823540dd4e23a19cc495d5d0e8b0c6fde9a3bd
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16325
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Implements "Hybrid" halo mounts which perform write FOPs synchronously
to all regions and read FOPs asynchronously to those nodes within the
region.
- Leverages the fact that the inode cache stores "hints" as to the
clean/dirty state of upto 16 replicas in the cluster; upon refresh of
an inode we can do a fresh lookup to get a "wise" node if none exist
in our region
- Activated via the cluster.halo-hybrid-mode option
Test Plan:
- Run AFR prove tests
- Run halo prove tests
Reviewers: kvigor, sshreyas
Reviewed By: kvigor, sshreyas
FB-commit-id: aca760757afd45e4de2e28dc31b87a73ee52f12d
Change-Id: Ic6728ce93b7b96e3151dccdebccd30e007f4750c
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16306
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- SHD is now excluded from the max-replicas policy. We'd need
to make an SHD specific tunable for this to make tests reliably
pass, and frankly it probably makes things more intuitive having
SHD excluded (i.e. SHD can always see everything).
- Updated the halo-failover-enabled test, I think it's a bit more clear
now, and works reliably. halo.t fixed after fixing the SHD
max-replicas bug.
Test Plan: - Run prove tests -> https://phabricator.fb.com/P19872728
Reviewers: dph, sshreyas
Reviewed By: sshreyas
FB-commit-id: e425e6651cd02691d36427831b6b8ca206d0f78f
Change-Id: I57855ef99628146c32de59af475b096bd91d6012
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16305
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Changes halo-decision to be based on the lowest halo value observed
- Adds halo-min-sample option to wait until N latency samples have been
gathered prior to activating halos.
- Fixed 3 edge cases where halo's weren't being correctly
config'd, or not configured as quickly as is possible. Namely:
1. Don't mark a child down if there's no better alternative (and you'd
no longer satisfy min/max replicas); fixes unneccessary flapping.
2. If a child goes down and this causes us to fall below max_replicas,
swap in a warm child immediately if it is within our halo latency
(don't wait around for the next "ping"); swaps in a new child
immediately helping with resiliency.
3. If the child latency is within the halo, and it's currently marked
up, mark it down if it's the highest latency child and the number of
children is > max_replicas; this will allow us to support the
SHD use-case where we can "beam" a single copy to a geo and have it
replicate within the geo after that.
- More commenting
Test Plan:
- Run halo prove tests
- Pointed compiled code at gfsglobal.prn2, tested out an NFS daemon and
FUSE mounts to ensure they worked as expected on a large scale
cluster.
Reviewers: dph, jackl, cjh, mmckeen
Reviewed By: mmckeen
FB-commit-id: 7e2e8ae6b8ec62a5e0b31c9fd6100c81795b3424
Change-Id: Iba2b2f1bc848b4546cb96117ff1895f83953a4f8
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16304
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Adds "halo-failover-enabled" option to enable/disable failing over to a brick outside of the defined halo to satisfy min-replicas
- There are some use-cases where failing over to a brick which is out of region will be undesirable. I such cases we will more than likely opt to have more replicas within the region to tolerate the loss of a single replica in that region without losing quorum.
- Fixed quorum accounting problem as well, now correctly goes RO in case where we lose a brick and aren't able to swap one in for some reason (fail-over not enabled or otherwise)
Test Plan:
- run prove -v tests/basic/halo.t
- run prove -v tests/basic/halo-disable.t
- run prove -v tests/basic/halo-failover-enabled.t
- run prove -v tests/basic/halo-failover-disabled.t
Reviewers: dph, cjh, jackl, mmckeen
Reviewed By: mmckeen
Conflicts:
xlators/cluster/afr/src/afr.h
xlators/mount/fuse/utils/mount.glusterfs.in
Change-Id: Ia3ebf83f34b53118ca4491a3c4b66a178cc9795e
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16275
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
Several prove tests use the 'launch_cluster' function to set up a
clustered volume. This replies on using multiple local IP
addresses, one for each server. Since IPV6 provides only ::1 as
a local address, as opposed to IPv4's complete 127.x.x.x subnet,
this cannot work in a pure IPv6 environment.
However, FB systems do at least have enough IPv4 stack to talk
locally, so fix launch_cluster to work properly when default
transport is IPv6.
To do this:
1) explicitly set transport.address-family volume option to inet in
launch_cluster().
2) teach glusterd to honor transport.address-family when connecting
to peer glusterds in glusterd_friend_rpc_create(). Previously
transport.address-family was used only for binding local socket,
not for communicating with peers.
Test Plan:
prove -f --timer ./tests/basic/glusterd/arbiter-volume-probe.t
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: I077d8549dcdbe4919ac7df34856a4b2d1428cdb6
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16225
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Enforces FUSE/gNFSd/SHD/rebalance rejection of writes when all subvolumes are
beyond the value set in "cluster.min-free-disk"
- Fixes existing code paths to be more intuitive & straightforward
- Write path now honors min-free-disk
- Adds test to ensure feature doesn't break in future
- This is a port of D2981282 to 3.8
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I76923bf76178fe589aa1a26bd1970cf8d009642a
Reviewed-on: http://review.gluster.org/16153
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Tested-by: Shreyas Siravara <sshreyas@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- After the afr-common.c refactor the quorum accounting broke, this diff
fixes things so when an alternate brick is swapped in to provide the
min-replicas quorum accounting works correctly and the FS doesn't go
RO. In short, the magic which makes halo clusters seamlessly fail-over
to a remote region for writes broke :).
Test Plan:
prove -v tests/basic/halo-failover.t ->
https://phabricator.fb.com/P12179467
Reviewers: dph, jackl, cjh
Reviewed By: cjh
Subscribers: meyering
Differential Revision: https://phabricator.fb.com/D1390670
Tasks: 4117827
Change-Id: I2d7fb8ca1e80cd1b21cc12b11b0a3db812321080
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16203
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
When building tests using build_tester in FB environment, we need to
pass additional library flags. Plumb up to the --with-fbextras option.
Test Plan:
prove -f --timer ./tests/basic/gfapi/anonymous_fd.t
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: Ibd04851234f9367d6a3192ba2d4440ce3fa4a45b
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16204
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Adds the fops-sanity test for GFProxy so we have more coverage for
both regular FUSE mounts and GFProxy FUSE mounts.
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I0f17315098a48b7295d6eb5f92616e9c7dfc278a
Reviewed-on: http://review.gluster.org/16189
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Add a configurable minimum free space for bricks, using the new
options storage.min-free-disk (analagous to cluster.min-free-disk,
and using the same units: either a percentage or an
absolute number of bytes) and storage.freespace-check-interval
(how frequently to check free space, in seconds).
- This is a cherry-pick of D2920210 to 3.8
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I4b87e421aad023e49b5972c6e61539670a818411
Reviewed-on: http://review.gluster.org/16176
Tested-by: Shreyas Siravara <sshreyas@fb.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Captures the error of an operation in the FOP sample.
- Cherry-pick of D3306106 to io-stats
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: Ia32a5b34bbd36981ac693a8829c70fa74b02d38d
Reviewed-on: http://review.gluster.org/16175
Tested-by: Shreyas Siravara <sshreyas@fb.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
Don't blindly 'df', target the volume in question.
Test Plan:
runtests.sh
Reviewers:
Subscribers:
Tasks:
Blame Revision:
Change-Id: Ic2c5883dd102835db64be9594657257e20711ba0
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16182
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Shreyas Siravara <sshreyas@fb.com>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Master option for halo geo-replication
- Added prove test for halo mode
- Updated options do values which should work "out of the box" for most use-cases, just run "enable" halo mode and you are done, the options can be tweaked as needed from there.
Test Plan:
- Enabled option, verified halo works, disabled option verified async
behavior is disabled
- Ran "prove -v tests/basic/halo.t" -> https://phabricator.fb.com/P12074204
Reviewers: jackl, dph, cjh
Reviewed By: cjh
Subscribers: meyering
Differential Revision: https://phabricator.fb.com/D1386675
Tasks: 4117827
Conflicts:
xlators/cluster/afr/src/afr-self-heal-common.c
xlators/cluster/afr/src/afr.h
Change-Id: Ib704528d438c3a42150e30974eb6bb01d9e795ae
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16172
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- This diff adds the ability to track paths in our FOP samples.
- It adds a function called `attach_iosstat_to_inode()` which
attaches a special struct containing the filename, to the inode.
- This diff attaches a struct, `ios_local` to each frame, and tracks
paths, locs, fds, and inodes depending on the fop that it is
executing.
- Operations done on this inode can then reference this path.
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: Ie43b2193f66d8c7f59b5d07293e07d6120e3b20a
Reviewed-on: http://review.gluster.org/16149
Tested-by: Shreyas Siravara <sshreyas@fb.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summmary:
- Adds a new server-side daemon called gfproxyd & a new FUSE client
called gfproxy-client
- Adds a new translator called AHA (Advanced High-Availability) that
manages failover between gfproxy servers & FOP replay for failed FOPS.
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: Iba0bd54e6b4035b8d7914aab64bcac9e93089dd7
Reviewed-on: http://review.gluster.org/16136
Tested-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
page_size and window_size
Summary:
- It adds a configurable option for "trickling-writes".
- Makes `__wb_preprocess_winds()` use `wb_inode->window_conf` rather
than `page_size`, so that the window-size option is actually
respected.
- This is a port of D3576122 & D3738605 to 3.8.
Test Plan:
- Prove test which looks @ brick-level FOPs and ensures
that they fall in the right write-size bucket.
Reviewed By: rwareing
Signature: t1:3576122:1468892648:6923a6a19b18888577ce5173b5c9cb9531f941e7
Change-Id: I379a9f2f0c4768c9052b7e9dd71c5f0469cb2d68
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Reviewed-on: http://review.gluster.org/16079
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- http://review.gluster.org/#/c/16078/ made rebalance faster and broke
the test.
- We made the file bigger so rebalance takes longer.
Change-Id: I86f08d3d53bbff8373e954b8ae57a3a9a5942b74
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Reviewed-on: http://review.gluster.org/16133
Reviewed-by: Kevin Vigor <kvigor@fb.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Retry the volfile server when the initial connection fails. The
default connect attempts is currently 200.
- This is a port D2174716 & D3792748 to 3.8.
Test Plan: Tested retry functionality on devserver.
Reviewed By: rwareing
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I22810d52b43107cc156483649fc160612677858a
Reviewed-on: http://review.gluster.org/16077
Tested-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary:
- Motivation: Prevents cluster instability by mis-behaving clients
causing bricks to OOM due to inode/entry lock pile-ups.
- Adds option to strip clients of entry/inode locks after N seconds
- Adds option to clear ALL locks should the revocation threshold get hit
- Adds option to clear all or granted locks should the max-blocked
threshold get hit (can be used in combination w/ revocation-clear-all).
- Options are:
features.locks-revocation-secs <integer; 0 to disable>
features.locks-revocation-clear-all [on/off]
features.locks-revocation-max-blocked <integer>
- Adds monkey-locking option to ignore 1% of unlock requests (dev only)
features.locks-monkey-unlocking [on/off]
- Adds logging to indicate revocation event & reason
Test Plan:
First you will need TWO fuse mounts for this repro. Call them /mnt/patchy1 & /mnt/patchy2.
1. Enable monkey unlocking on the volume:
gluster vol set patchy features.locks-monkey-unlocking on
2. From the "patchy1", use DD or some other utility to begin writing to a file,
eventually the dd will hang due to the dropped unlocked requests. This now
simulates the broken client. Run:
for i in {1..1000};do dd if=/dev/zero of=/mnt/patchy1/testfile bs=1k count=10;done'
...this will eventually hang as the unlock request has been lost.
3. Goto another window and setup the mount "patchy2" @ /mnt/patchy2, and
observe that 'echo "hello" >> /mnt/patchy2/testfile" will hang due to the
inability of the client to take out the required lock.
4. Next, re-start the test this time enabling lock revocation; use a timeout of
2-5 seconds for testing:
'gluster vol set patchy features.locks-revocation-secs <2-5>'
5. Wait 2-5 seconds before executing step 3 above this time. Observe that this
time the access to the file will succeed, and the writes on patchy1 will
unblock until they hit another failed unlock request due to
"monkey-unlocking".
Change-Id: I814b9f635fec53834a26db634d1300d9a61057d8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14816
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-on: http://review.gluster.org/16086
Tested-by: Shreyas Siravara <sshreyas@fb.com>
Reviewed-by: Kevin Vigor <kvigor@fb.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- This diff changes all locations in the code to prefer inet6 family
instead of inet. This will allow change GlusterFS to operate
via IPv6 instead of IPv4 for all internal operations while still
being able to serve (FUSE or NFS) clients via IPv4.
- The changes apply to NFS as well.
- This diff ports D1892990, D1897341 & D1896522 to the 3.8 branch.
Test Plan: Prove tests!
Reviewers: dph, rwareing
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33
Reviewed-on: http://review.gluster.org/16059
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Tested-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|