| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...in various self-heal code paths.
Originally found by Pranith in __afr_selfheal_name_impunge ()
Also change __afr_selfheal_assign_gfid() to send lookup only on those
bricks that don't have a gfid matching that of the source.
> Reviewed-on: https://review.gluster.org/18065
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit d594900dbca92c356152be65fce16f77c402117c)
Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a
BUG: 1487319
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/18173
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment when we have replica 3 or arbiter setup, even when
lk succeeds on just one brick we give success to application which
is wrong
Fix:
Consider quorum-number of successes as success when quorum is enabled.
BUG: 1461792
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17524
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
AFR was returning the node uuid of the first node for every file if
the replica set was healthy, which was resulting in only one node
migrating all the files.
Fix:
With this patch AFR returns the list of node_uuids to the upper layer,
so that they can decide on which node to migrate which files, resulting
in improved performance. Ordering of node uuids will be maintained based
on the ordering of the bricks. If a brick is down, then the node uuid
for that will be set to all zeros.
Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31
BUG: 1366817
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17084
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster. Clients can also write synchronously to the cluster
simply by specifying a halo-latency which is very large (e.g. 10seconds) which
will include all bricks.
In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.
There are a few new volume options due to this feature:
halo-shd-latency: The threshold below which self-heal daemons will
consider children (bricks) connected.
halo-nfsd-latency: The threshold below which NFS daemons will consider
children (bricks) connected.
halo-latency: The threshold below which all other clients will
consider children (bricks) connected.
halo-min-replicas: The minimum number of replicas which are to
be enforced regardless of latency specified in the above 3 options.
If the number of children falls below this threshold the next
best (chosen by latency) shall be swapped in.
New FUSE mount options:
halo-latency & halo-min-replicas: As descripted above.
This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.
Operational Notes:
- Global consistency is gaurenteed for synchronous clients, this is provided by
the existing entry-locking mechanism.
- Asynchronous clients on the other hand and merely consistent to their region.
Writes & deletes will be protected via entry-locks as usual preventing
concurrent writes into files which are undergoing replication. Read operations
on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
other regions. The take away from this is care should be taken to ensure
multiple writers do not write the same files resulting in a gfid split-brain
which will require resolution via split-brain policies (majority, mtime &
size). Recommended method for preventing this is using the nfs-auth feature to
define which region for each share has RW permissions, tiers not in the origin
region should have RO perms.
TODO:
- Synchronous clients (including the SHD) should choose clients from their own
region as preferred sources for reads. Most of the plumbing is in place for
this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
(i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
to synchronously write to
Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests
Reviewers: jackl, dph, cjh, meyering
Reviewed By: meyering
Subscribers: ethanr
Differential Revision: https://phabricator.fb.com/D1272053
Tasks: 4117827
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
__afr_set_in_flight_sb_status(), which resets event_gen to zero, is
called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
is true even if the brick was down *before* the transaction started.
Hence say if 1 brick is down in a replica-3, every writev that comes
will trigger an inode refresh because of this resetting, as seen from
the no. of FSTATs in the profile info in the BZ.
Fix:
Reset event gen only if the brick was previously a valid read child and
the FOP failed on it the first time.
Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because
the function only resets event gen and not the data/metadata readable.
Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
BUG: 1409206
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/16309
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Even on errors like ENOENT, AFR logs split-brain after
read-txn refresh, introduced by commit a07ddd8f.
This can be a cause of much panic and confusion and needs to be fixed.
* Also fixed this issue in write-txns.
* Fixed afr read txns to log about split-brain only after knowing that
there is no split-brain choice configured.
* Removed code duplication
* Fixed incorrect passing of error code in afr_write_txn_refresh_done()
(the function was passing -0 as errno to gf_msg().
Change-Id: I354f454ce5bf0e5f00bc27916eb597367cb7d927
BUG: 1411625
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16362
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before http://review.gluster.org/#/c/15673/, after inode refresh, we
failed read txns in case of EIO or event_generation being zero. For
write transactions, the check was only for EIO. 15673 re-factored the
code to fail both read and write when event_generation=0. This seems to
have caused a regression as explained in the BZ.
This patch restores that behaviour in afr_txn_refresh_done().
Change-Id: Ib8e116506badce6f58b55827dbe403d95069d744
BUG: 1406224
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/16205
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bugs found and fixed:
1. Use correct subvolume index in pre-op-writev compound cbk
2. Prevent use-after-free of local->compound_args members in
compound fops cbk in protocol/client
3. Fix xdata and xattr leaks in client_process_response
4. Fix possible leak of xdata in client_pre_writev() in
test mode.
5. Free req->compound_req_array.compound_req_array_val as well
after freeing its members
6. Free tmp_rsp->flock.lk_owner.lk_owner_val in LK fop.
Change-Id: I15b646d7d4e0e5cd4ea3d2d6452c815cf2eaf68f
BUG: 1401218
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/16020
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of cache invalidation(upcall).
Issue:
------
When a cache invalidation is recieved as a result of changing
pending xattr, the read_subvol is reset. Consider the below chain
of execution:
CHILD_DOWN
...
afr_readv
...
afr_inode_refresh
...
afr_inode_read_subvol_reset <- as a result of pending xattr set by
some other client GF_EVENT_UPCALL will
be sent
afr_refresh_done -> this results in an EIO, as the read subvol was
reset by the end of the afr_inode_refresh
Solution:
---------
When GF_EVENT_UPCALL is recieved, instead of resetting read_subvol,
set a variable need_refresh in inode_ctx, the next time some one
starts a txn, along with event gen, need_rrefresh also needs to
be checked.
Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58
BUG: 1396952
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15892
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.
Fix:
Implemented a common framework which can be used by
[f]inodelk/[f]entrylk. Used quorum for the same.
Change-Id: I239f13875a065298630d266941df10cfa3addc85
BUG: 1369077
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15802
Tested-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Consider a replica setup, where one mount writes data to a
file and the other mount reads the file. In afr, read operations
are not transaction based, a brick(read subvolume) is chosen as
a part of lookup or other operations, read is always wound only
to the read subvolume, even if there was write from a different client
that failed on this brick. This stale read continues until there is
a lookup or any write operation from the mount point. Currently, this
is not a major issue, as a lookup is issued before every read and it will
switch the read subvolume to a correct one. But with the plan of
increasing md-cache timeout to 600s, the stale read problem will be
more pronounced, i.e. stale read can continue for 600s(or more if cascaded
with readdirp), as there will be no lookups.
Solution:
Afr doesn't have any built-in solution for stale read(without affecting
the performance). The solution that came up, was to use upcall. When a file
on any brick is marked bad for the first time, upcall sends a notification
to all the clients that had recently accessed the file. The solution has
2 parts:
- Identifying when a file is marked bad, on any of the bricks,
for the first time
- Client side actions on recieving the notifications
Identifying when a file is marked bad on any of the bricks for the first time:
-----------------------------------------------------------------------------
The idea is to track xattrop in upcall. xattrop currently comes with 2 afr
xattrs - afr dirty bit and afr pending xattrs.
Dirty xattr is set to 1 before every write, and is unset if write succeeds.
In certain scenarios, dirty xattr can be 0 and still the file could be bad
copy. Hence do not track dirty xattr.
Pending xattr is set on the good copy, indicating the other bricks that have
bad copy. It is still not as simple as, notifying when any of the pending xattrs
change. It could lead to flood of notifcations, in case the other brick is
completely down or consistantly failing. Hence it is important to notify only
once, the first time a good copy is marked bad.
Client side actions on recieving pending xattr change, notification:
--------------------------------------------------------------------
md-cache will invalidate the cache of that file, so that further lookup is
passed down to afr and hence update the read subvolume. Invalidating only in
md-cache is not enough, consider the folling oder of opertaions:
- pending xattr invalidation - invalidate md-cache
- readdirp on the bad read subvolume - fill md-cache
- lookup (served from md-cache)
- read - wound to the old read subvol.
Hence, along with invalidating md-cache, it is very important to reset the
read subvolume for that file, in afr.
Design Credit: Anuradha Talur, Ravishankar N
1. xattrop doesn't carry info saying post op/pre op.
2. Pre xattrop will have 0 value for all pending xattrs,
the cbk of pre xattrop carries the on-disk xattr value.
Non zero indicated healing is required.
3. Post xattrop will have non zero value for any of the
pending xattrs, if the fop failed on any of the bricks.
Change-Id: I469cbc111714c433984fe1c922be2ef113c25804
BUG: 1211863
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15398
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib06ece3cce1b10d28d6d2953da28444f5c2457ad
BUG: 1290304
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/15014
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When tiering/rebalance does migrations and afr with 2-way replica is in
picture, migration can read stale data if the source brick goes down and writes
to the destination. After this deletion of the file leads to permanent loss of
data after migration.
Fix:
Rebalance/tiering should migrate only when the data is definitely not stale. So
introduce an option in afr called consistent-io which will be enabled in
migration daemons.
BUG: 1306398
Change-Id: I750f65091cc70a3ed4bf3c12f83d0949af43920a
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13425
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cyclic order
When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from admin to resolve the split-brain
before the IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.
This way, we still have one good copy left despite the split-brain situation
which when it is back online, will be chosen as source to do the heal.
Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
BUG: 1363721
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15080
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...and remove their definitons from EC and AFR.
Also `s/alloca+memset0/alloca0` wherever it is used.
Change-Id: I3b71e596d12a7d8900f5d761af6b98305c8874d5
BUG: 1366226
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/15147
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks to Krutika for a cleaner way to track inode refs in
afr_set_split_brain_choice().
Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df
BUG: 1355604
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/14895
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there is a failure afr was not unwinding xdata to xlators above.
xdata need not be NULL on failures. So it is important to send it
to parent xlators.
Change-Id: Ic36aac10a79fa91121961932dd1920cb1c2c3a4c
BUG: 1340623
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14567
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir
failures because of stale layout. But AFR was unwinding null xdata_rsp in case
of failures. This was leading to mkdir failures just after remove-brick. Unwind
the xdata_rsp in case of failures to make sure the response from brick reaches
dht.
BUG: 1340623
Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14553
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce cluster.favorite-child-policy which when enabled with
[ctime|mtime|size|majority], automatically heals files that are in
split-brian.
The majority policy will not pick a source if there is no majority.
The other three policies pick the first brick with a valid reply and
non-zero ctime/mtime/size as source.
Change-Id: I3c099a0404082213860f74f2c9b4d207cfaedb76
BUG: 1328224
Original-author: Richard Wareing <rwareing@fb.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/14026
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If a named fresh-lookup is done on an loc and the fop fails on one of the
bricks or not sent on one of the bricks, but by the time response comes to afr,
if the brick is up, 'can_interpret' will be set to false in afr_lookup_done(),
this will lead to inode-ctx for that inode to be not set, this can lead to EIO
in case of a transaction as it depends on 'readable' array to be available by
that point.
Fix:
Refresh inode for inode-write fops for the ctx to be set if it is not already
done at the time of named fresh-lookup or if the file is in split-brain where
we need to perform one more refresh before failing the fop to check if the file
is still in split-brain or not.
BUG: 1336612
Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14368
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I52da41dff5619492b656c2217f4716a6cdadebe0
BUG: 1269461
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12442
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: During mount, afr waits for response from all its children before
notifying the parent xlator. In a 1x2 replica volume , if one of the nodes is
down, the mount will hang for more than a minute until child down is received
from the client xlator for that node.
Fix:
When parent up is received by afr, start a 10 second timer. In the timer call
back, if we receive a successful child up from atleast one brick, propagate the
event to the parent xlator.
Change-Id: I31e57c8802c1a03a4a5d581ee4ab82f3a9c8799d
BUG: 1054694
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11113
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Locking schemes in afr-v1 were locking the directory/file completely during
self-heal. Newer schemes of locking don't require Full directory, file locking.
But afr-v2 still has compatibility code to work-well with older clients, where
in entry-self-heal it takes a lock on a special 256 character name which can't
be created on the fs. Similarly for data self-heal there used to be a lock on
(LLONG_MAX-2, 1). Old locking scheme requires heal info to take sh-domain locks
before examining heal-state. If it doesn't take sh-domain locks, then there is
a possibility of heal-info hanging till self-heal completes because of
compatibility locks. But the problem with heal-info taking sh-domain locks is
that if two heal-info or shd, heal-info try to inspect heal state in parallel
using trylocks on sh-domain, there is a possibility that both of them assuming
a heal is in progress. This was leading to spurious entries being shown in
heal-info.
Fix:
As long as there is afr-v1 way of locking, we can't fix this problem with
simple solutions. If we know that the cluster is running newer versions of
locking schemes, in those cases we can give accurate information in heal-info.
So introduce a new option called 'locking-scheme' which if it is 'granular'
will give correct information in heal-info. Not only that, Extra network hops
for taking compatibility locks, sh-domain locks in heal info will not be
necessary anymore. Thus it improves performance.
BUG: 1322850
Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13873
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
BUG: 1221737
Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13755
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is part two change to prevent data loss
in a replicate volume on doing a add-brick operation.
Problem: After doing add-brick, there is a chance
that self heal might happen from the newly added
brick rather than the source brick, leading to data loss.
Solution: Mark pending changelogs on afr children for
the new afr-child so that heal is performed in the
correct direction.
Change-Id: I11871e55eef3593aec874f92214a2d97da229b17
BUG: 1276203
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12454
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is better to choose local brick as source if possible to prevent
over the wire read thus saving on bandwidth. Also changed code to not
attempt data-heal if 'source' is selected as arbiter.
Change-Id: I9a328d0198422280b13a30ab99545370a301dfea
BUG: 1314150
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13585
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Krutika Dhananjay <kdhananj@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a heal is needed after inode refresh (lookup, read_txn), launch it in
the background instead of blocking the fop (that triggered refresh) until the
heal happens.
afr_replies_interpret() is modified such that the heal is
launched only if atleast one sink brick is up.
Max. no of heals that can happen in parallel is configurable via the
'background-self-heal-count' volume option. Any number greater than that
is put in a wait queue whose length is configurable via
'heal-wait-queue-leng' volume option. If the wait queue is also full,
further heals will be ignored.
Default values: background-self-heal-count=8, heal-wait-queue-leng=128
Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70
BUG: 1297172
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/13207
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
seek() is like a read(), copied the same semantics.
Change-Id: I100b741d9bfacb799df318bb081f2497c0664927
BUG: 1220173
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/11483
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now heal-info does an open() on the file being examined so that
the client at some point sees open-fd count being > 1 and releases
the eager-lock so that heal-info doesn't remain blocked forever
until IO completes.
Change-Id: Icc478098e2bc7234408728b54d8185102b3540dc
BUG: 1297695
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/13326
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Things done :
1) during lookup and inode_refresh as part of read_txn,
request is sent to detect if heal is required or not.
2) If heal is required, be conservative in setting the
readdirp entry inodes to NULL, otherwise don't be.
3) Self-heal-daemon now crawls both indices/xattrop
and indices/dirty directory while healing.
Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7
BUG: 1250803
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12507
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when
a brick is down, the self-heal does not write the zeroes to the sink
after it comes up. Consequenty, there is a mismatch in disk-usage
amongst the bricks of the replica.
Fix: If we definitely know that the file is not sparse, then write the
zeroes to the sink even if the checksums match.
Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864
BUG: 1272460
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12371
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails,
transaction.failed_subvols[i] is set. If if fails on all chidren, we can
directly proceed to unlock (via afr_changelog_post_op_now) without trying
to wind the write, fail and then go to unlock.
2. 'fop_subvols' seems to be an unused variable, hence removing it.
Change-Id: I9525628daf48082e979b0093fa0478934495e61f
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12368
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
replace-brick setxattr is not performed inside a synctask. This can lead
to hangs if the setxattr is executed by epoll thread, as the epoll
thread will be waiting for replies to come where as epoll thread is the
thread that needs to epoll_ctl for reading from socket and listen.
Fix:
Move replace-brick to synctask to prevent epoll thread hang.
This patch is in line with the fix performed in
http://review.gluster.org/#/c/12163/
Change-Id: I6a71038bb7819f9e98f7098b18a6cee34805868f
BUG: 1262345
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12169
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On executing `getfattr -n replica.split-brain-status <file>` on mount,
there is a possibility that the mount hangs. To avoid this hang,
fetch the split-brain-status of a file in synctask.
Change-Id: I87b781419ffc63248f915325b845e3233143d385
BUG: 1262345
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12163
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When xlators above afr do [f]xattrop when one of the bricks is down, after the
brick comes backup, the metadata is not healed because [f]xattrop is not
considered a transaction.
Fix:
Treat [f]xattrop as transaction so that changes done by xlators above afr are
marked for heal when some of the bricks were down at the time of [f]xattrop.
Change-Id: Iea180f9a456509847c3cd8d5d59a0cdc2712d334
BUG: 1248887
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: AFR serves statfs from the brick having the least free space
available. Since the size to be allocated to the arbiter brick in a 3
way replica is supposed to be considerably lesser than the other 2
bricks, statfs will be served from this brick which is incorrect.
Fix: Don't serve statfs from the arbiter brick.
Change-Id: I5af098b9c50626f52cf3d7dbb060bf754c797f05
BUG: 1251346
Reported-by: Fredrik Brandt <fredrikb@denlillaplaneten.se>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11857
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For directories, block metadata FOPS.
For non-directories, block data and metadata FOPS.
Do not block entry FOPS.
Change-Id: Id7f656f4a513b9d33c457dd7f2d58028dbef8e61
BUG: 1235007
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11371
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
| |
calculation
Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e
BUG: 1235216
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11373
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Provided setfattr command to set timeout for split-brain
choice.
2) If split-brain inspection/resolution is being done
from the mount for a file, ref the inode when
split-brain-choice is set.
This inode will be unconditionally unref-ed after timeout
seconds set by the user/default otherwise.
3) Updated the doc and testcase to reflect the changes.
Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1
BUG: 1209104
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10134
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add logic in afr to work in conjunction with the arbiter xlator when a
replica 3 arbiter volume is created. More specifically, this patch:
* Enables full locks for afr data transaction for such volumes.
* Removes the upfront marking of pending xattrs at the time of pre-op
and defer it to post-op. (This is an arbiter independent change and is made for all afr transactions.)
* After pre-op stage, check if we can proceed with the fop stage without
ending up in split-brain by examining the changelog xattrs.
* Unwinds the fop with failure if only one source was available at the
time of pre-op and the fop happened to fail on particular source brick.
* Skips data self-heal if arbiter brick is the only source available.
* Adds the arbiter-count option to the shd graph.
This patch is a part of the arbiter logic implementation for 3 way AFR
details of which can be found at http://review.gluster.org/#/c/9656/
Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d
BUG: 1199985
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logic for adding the 'glusterd_brickinfo->group' member and using it to
find the brick positon has been taken from http://review.gluster.org/#/c/9919.
Thanks to Jeff Darcy for that.
This patch is a part of the arbiter logic implementation for 3 way AFR
details of which can be found at http://review.gluster.org/#/c/9656/
Change-Id: Idbfe4f29ee8e098e0102def8f38b32314316b188
BUG: 1199985
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10257
Tested-by: NetBSD Build System
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Changed the implementation of marker xattr handling to take just a
function which populates important data that is different from
default 'gauge' values and subvolumes where the call needs to be
wound.
- Removed duplicate code I found while reading the code and moved it to
cluster_marker_unwind. Removed unused structure members.
- Changed dht/afr/stripe implementations to follow the new implementation
- Implemented marker xattr handling for ec.
Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4
BUG: 1200372
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9892
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 2/2 patch to enable users analyze and resolve
split-brain.
This patch enables :
1) Users to inspect the files in data and metadata split-brain.
2) Resolve the split-brain.
Both using a series of setfattr commands.
Consider a volume "test" with 2 bricks.
1) To inspect a file f1:
setfattr -n replica.split-brain-choice -v test-client-0 f1
After the execution of this command, if no read_subvol
is found, reads will be served from test-client-0 (corresponding
to brick-0).
2) To resolve split-brain :
setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
Execution of this command will lead to the resolution
of data and metadata split-brain with subvol mentioned in the
command (test-client-0 here) as the source and the rest as sink.
Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
BUG: 1191396
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/9743
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having this particular check which was introduced by
commit c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in
performance in readdirp. So the behavior is made configurable with this
patch.
Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c
BUG: 1202669
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/9917
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a way of disabling reads when quorum is not met.
Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5
BUG: 1187885
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9543
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw,
syncop_dir_scan functions also into syncop-utils.c
Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9740
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is one part to enable users analyze and resolve
split-brain.
Problem : To know if a file is in data/metadata split-brain
Solution : Performing "getfattr -n afr.split-brain-status
<path-to-file>" from the mount provides this information.
Also provides the list of afr children to analyse to
get more information.
Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9
BUG: 1191396
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/9633
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the AFR heal command to include automated split-brain resolution.
This patch [3/3] is the final patch for afr automated split-brain resolution
implementation.
"gluster volume heal <VOLNAME> [full | statistics [heal-count [replica
<HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain
{bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]"
The new additions being:
1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
Locates the replica containing the FILE, selects bigger-file as source and
completes heal.
2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
<FILE>
Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal.
3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME>
Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes
heal.
Note: <FILE> can be either the full file name as seen from the root of the
volume (or) the gfid-string representation of the file, which sometimes gets
displayed in the heal info command's output.
Entry/gfid split-brain resolution is not supported.
Example can be found in the test case.
Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3
BUG: 1136769
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/9377
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Afr winds inodelk calls without any order, so blocking inodelks
from two different mounts can lead to dead lock when mount1 gets
the lock on brick-1 and blocked on brick-2 where as mount2 gets
lock on brick-2 and blocked on brick-1
Fix:
Serialize the inodelks whether they are blocking inodelks or
non-blocking inodelks.
Non-blocking locks also need to be serialized.
Otherwise there is a chance that both the mounts which issued same
non-blocking inodelk may endup not acquiring the lock on any-brick.
Ex:
Mount1 and Mount2 request for full length lock on file f1. Mount1 afr may
acquire the partial lock on brick-1 and may not acquire the lock on brick-2
because Mount2 already got the lock on brick-2, vice versa. Since both the
mounts only got partial locks, afr treats them as failure in gaining the locks
and unwinds with EAGAIN errno.
Change-Id: Ie6cc3d564638ab3aad586f9a4064d81e42d52aef
BUG: 1176008
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9372
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|