glusterfs.git/xlators/cluster/afr/src/afr-common.c, branch v3.8.3

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T10:22:36+00:00

        Backport of: http://review.gluster.org/15080

When bricks are brought offline and then back online in cyclic
order while writes are in progress on a file, the inode refresh
done in write txns makes AFR fail most write attempts made while
the only good copy is offline. However, there is still a remote
possibility that the file runs into split-brain if the brick that
has the lone good copy goes offline *after* the inode refresh but
*before* the write txn completes (called in-flight split-brain in
the patch for ease of reference). The admin then has to resolve
the split-brain before IO can resume normally on the file. To get
around this, the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, one good copy is still left despite the split-brain situation;
once it is back online, it will be chosen as the source for the heal.
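
The three steps above can be sketched roughly as follows. This is a
simplified illustration, not the code from the patch: struct write_state,
is_inflight_split_brain() and finish_write() are hypothetical names, and
EIO is only assumed here as the "appropriate errno".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-write state; not the real AFR structures. */
struct write_state {
    size_t      child_count;
    const bool *readable;   /* good copies as seen at inode refresh */
    const bool *failed;     /* bricks on which this write failed */
};

/* True if there was at least one good copy and the write failed on all
 * of them, i.e. the in-flight split-brain case. */
static bool
is_inflight_split_brain(const struct write_state *ws)
{
    bool any_good = false;

    for (size_t i = 0; i < ws->child_count; i++) {
        if (!ws->readable[i])
            continue;
        any_good = true;
        if (!ws->failed[i])
            return false;   /* a good copy took the write */
    }
    return any_good;
}

/* Fills accused[] and *keep_dirty; returns 0 or a negative errno.
 * EIO is an assumption, the commit only says "appropriate errno". */
static int
finish_write(const struct write_state *ws, bool *accused, bool *keep_dirty)
{
    bool inflight = is_inflight_split_brain(ws);

    for (size_t i = 0; i < ws->child_count; i++) {
        /* ii) never accuse a good copy in the in-flight case */
        accused[i] = ws->failed[i] && !(inflight && ws->readable[i]);
    }
    *keep_dirty = inflight;          /* i) retain the dirty xattrs */
    return inflight ? -EIO : 0;      /* iii) fail this write */
}
```

With readable = {1, 0, 0} and failed = {1, 0, 0}, the sketch leaves brick 0
unaccused, keeps the dirty marker and fails the write.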

> Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
> BUG: 1363721
> Signed-off-by: Krutika Dhananjay 
> Reviewed-on: http://review.gluster.org/15080
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
> Reviewed-by: Oleksandr Natalenko 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Pranith Kumar Karampuri 
(cherry picked from commit fcb5b70b1099d0379b40c81f35750df8bb9545a5)

Change-Id: I157f1025aebd6624fa3d412abc69a4ae6f2fe9e0
BUG: 1367272
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Oleksandr Natalenko 
Reviewed-on: http://review.gluster.org/15221
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

afr: some coverity fixes

2016-07-28T13:54:49+00:00

Note: This is a backport of http://review.gluster.org/14895.
It contains:
i) fixes that prevent deadlocks (afr-common.c).
ii) a fix for op-errno=ENOMEM being overwritten with other possible
values (afr-inode-read.c).
iii) a fix that prevents further operations on a NULL dictionary if
allocation fails (afr-self-heal-data.c).
iv) a fix that prevents falsely marking a sink as healed if metadata
heal fails midway (afr-self-heal-metadata.c).
v) other minor fixes.

Since the above are not trivial fixes, the patch is a good
candidate for merging into the 3.8 branch.

Thanks to Krutika for a cleaner way to track inode refs in
afr_set_split_brain_choice().

Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df
BUG: 1360556
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/15018
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Unwind with xdata in inode-write fops

2016-06-13T10:22:31+00:00

When a fop failed, afr was not unwinding xdata to the xlators above.
xdata need not be NULL on failures, so it is important to send it
to the parent xlators.

 >Change-Id: Ic36aac10a79fa91121961932dd1920cb1c2c3a4c
 >BUG: 1340623
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14567
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Jeff Darcy 

BUG: 1342178
Change-Id: Idd74d2bc898fe5aef537ab48c1754510030c8825
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14618
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Unwind xdata_rsp even in case of failures

2016-06-10T15:36:21+00:00

DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp when mkdir
fails because of a stale layout. But AFR was unwinding a NULL xdata_rsp on
failures, which led to mkdir failures just after remove-brick. Unwind
xdata_rsp on failures as well, so that the response from the brick reaches
DHT.
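
A toy model of the behaviour change, assuming a much simplified reply
structure. struct brick_reply, the helper names and the placeholder key
string are all hypothetical; real xdata is a dictionary, and the real key
behind GF_PREOP_CHECK_FAILED is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a brick reply; xdata is modelled as one key. */
struct brick_reply {
    int         op_ret;
    int         op_errno;
    const char *xdata_key;   /* NULL if the brick sent no xdata */
};

/* Old behaviour: xdata is dropped whenever the fop failed. */
static const char *
unwind_xdata_old(const struct brick_reply *r)
{
    return (r->op_ret < 0) ? NULL : r->xdata_key;
}

/* New behaviour: xdata is passed up regardless of op_ret. */
static const char *
unwind_xdata_new(const struct brick_reply *r)
{
    return r->xdata_key;
}

/* What a parent xlator (DHT in the commit) would check for.
 * "preop-check-failed" is a placeholder, not the real key. */
static int
parent_sees_preop_failure(const char *xdata_key)
{
    return xdata_key != NULL &&
           strcmp(xdata_key, "preop-check-failed") == 0;
}
```

In this model a failed mkdir reply carrying the placeholder key is visible
to the parent only through unwind_xdata_new().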

 >BUG: 1340623
 >Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14553
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Ravishankar N 
 >Smoke: Gluster Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Anuradha Talur 
 >Reviewed-by: Krutika Dhananjay 

BUG: 1342178
Change-Id: Iaacadcad0f76979fb250bd008b8e43f0e7acf642
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14617
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Niels de Vos

cluster/afr: Do not inode_link in afr

2016-05-25T10:31:27+00:00

The race is explained at
https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0

This patch also handles performing self-heal with shd-pid, and
performs the healing with this->itable's inode rather than the
main itable's.

 >BUG: 1337405
 >Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14422
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Krutika Dhananjay 

BUG: 1337870
Change-Id: Ifb476eeed2ff73a44e481d64074599ab0707c725
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14455
Smoke: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Refresh inode for inode-write fops in need

2016-05-24T21:42:30+00:00

Problem:
If a named fresh-lookup is done on a loc and the fop fails on one of the
bricks, or is not sent to one of the bricks, but that brick is up by the
time the response reaches afr, 'can_interpret' is set to false in
afr_lookup_done(). As a result the inode-ctx for that inode is not set,
which can lead to EIO in a transaction, since the transaction depends on
the 'readable' array being available by that point.

Fix:
Refresh the inode for inode-write fops so that the ctx gets set if that
was not already done at the time of the named fresh-lookup. Also refresh
when the file is in split-brain, where one more refresh is needed before
failing the fop to check whether the file is still in split-brain.

 >BUG: 1336612
 >Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14368
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Ravishankar N 
 >CentOS-regression: Gluster Build System 

BUG: 1337822
Change-Id: I0f904ebaa78b99cbb11546e08c9fc1562e9a3eef
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14449
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Anuradha Talur 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Handle non-zero source in heal-info decision

2016-05-14T14:12:38+00:00

        Backport of http://review.gluster.org/14302

Problem:
Spurious entries are reported in heal info when the mount is on the
second/third brick of the replica pair, because local-child is given
preference in selecting the source. The code is supposed to report that
the file needs heal only if (source < 0) (the failure code path), but it
is written such that any non-zero value is considered a failure.

Fix:
Treat a positive source as a success case as well.
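
The difference between the two checks can be shown in isolation. This is
a reduced sketch: needs_heal_buggy() and needs_heal_fixed() are
hypothetical helpers, and the real heal-info decision looks at more than
the source index.

```c
#include <stdbool.h>

/* 'source' is the index of the chosen source brick (0, 1, 2, ...),
 * or a negative value when source selection failed. */

/* Old check: any non-zero source, including a valid index such as 1
 * or 2, was treated as failure. */
static bool
needs_heal_buggy(int source)
{
    return source != 0;
}

/* Intended check: only a negative source is a failure. */
static bool
needs_heal_fixed(int source)
{
    return source < 0;
}
```

For a mount local to the second brick, source is 1: the old check flags
the file, the corrected check does not.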

BUG: 1335433
Change-Id: Iede983b6560622964e91306405587da3f1de5748
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14303
Reviewed-by: Krutika Dhananjay 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Anuradha Talur 
Reviewed-by: Niels de Vos

cluster/afr: Entry self-heal performance enhancements

2016-04-30T01:21:56+00:00

Change-Id: I52da41dff5619492b656c2217f4716a6cdadebe0
BUG: 1269461
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/12442
Reviewed-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

afr: propagate child up event after timeout

2016-04-27T07:35:19+00:00

Problem: During mount, afr waits for a response from all its children
before notifying the parent xlator. In a 1x2 replica volume, if one of the
nodes is down, the mount will hang for more than a minute until child down
is received from the client xlator for that node.

Fix:
When parent up is received by afr, start a 10 second timer. In the timer
callback, if a successful child up has been received from at least one
brick, propagate the event to the parent xlator.
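
The notification rule can be modelled as a small state machine. This is a
sketch under assumptions: struct startup_state and the function names are
hypothetical, the timer itself is left out, and each child is assumed to
report exactly once.

```c
#include <stdbool.h>

/* Hypothetical, simplified startup state; not the real AFR private data. */
struct startup_state {
    unsigned child_count;   /* number of bricks in the replica set */
    unsigned up_count;      /* child up events seen so far */
    unsigned down_count;    /* child down events seen so far */
    bool     notified;      /* parent has already been notified */
};

/* Notify the parent at most once. */
static bool
notify_once(struct startup_state *s)
{
    if (s->notified)
        return false;
    s->notified = true;
    return true;
}

/* Called for every child event. Returns true if the parent should be
 * notified now, which in this path requires all children to have
 * responded (the pre-existing behaviour). */
static bool
on_child_event(struct startup_state *s, bool up)
{
    if (up)
        s->up_count++;
    else
        s->down_count++;

    if (s->up_count + s->down_count < s->child_count)
        return false;   /* still waiting for some children */
    return notify_once(s);
}

/* Called when the timer started at parent up fires. Returns true if
 * the parent should be notified now: at least one child must be up. */
static bool
on_timer_expiry(struct startup_state *s)
{
    if (s->up_count == 0)
        return false;
    return notify_once(s);
}
```

In a 1x2 volume with one node down, the surviving brick's child up arrives
first and the timer expiry then triggers the notification, instead of the
mount waiting for the late child down.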

Change-Id: I31e57c8802c1a03a4a5d581ee4ab82f3a9c8799d
BUG: 1054694
Signed-off-by: Ravishankar N 
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/11113
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

cluster/afr: Fix spurious entries in heal info

2016-04-20T11:51:20+00:00

Problem:
Locking schemes in afr-v1 locked the directory/file completely during
self-heal. Newer locking schemes don't require full directory/file locking.
But afr-v2 still has compatibility code to work well with older clients:
in entry self-heal it takes a lock on a special 256-character name which
can't be created on the fs, and similarly data self-heal used to take a
lock on (LLONG_MAX-2, 1). The old locking scheme requires heal info to take
sh-domain locks before examining heal-state. If it doesn't take sh-domain
locks, heal-info can hang till self-heal completes because of the
compatibility locks. But the problem with heal-info taking sh-domain locks
is that if two heal-info instances, or shd and heal-info, try to inspect
heal state in parallel using trylocks on sh-domain, both of them may
assume that a heal is in progress. This was leading to spurious entries
being shown in heal-info.

Fix:
As long as the afr-v1 way of locking exists, we can't fix this problem with
simple solutions. If we know that the cluster is running the newer locking
schemes, we can give accurate information in heal-info. So introduce a new
option called 'locking-scheme' which, if set to 'granular', gives correct
information in heal-info. In addition, the extra network hops for taking
compatibility locks and sh-domain locks in heal info are no longer
necessary, which improves performance.

BUG: 1322850
Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/13873
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey 
Reviewed-by: Anuradha Talur 
Reviewed-by: Krutika Dhananjay