glusterfs.git/xlators/cluster/afr/src, branch v3.8.4

cluster/afr: copy loc before passing to syncop

2016-08-23T15:08:42+00:00

Problem:
When io-threads is enabled on the client side, io-threads destroys the
call-stub in which the loc is stored as soon as the c-stack unwinds.
afr creates a syncop with the address of the loc passed in setxattr, so
by the time the syncop tries to access it, io-threads has already freed
the call-stub. This leads to a crash.

Fix:
Copy loc to frame->local and use its address.
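
The ownership pattern behind the fix can be sketched as follows. This is
only an illustration, not the patch itself: demo_loc, demo_local,
demo_loc_copy() and demo_loc_wipe() are hypothetical stand-ins for the
real loc and frame-local structures, and the sketch assumes a loc that
holds nothing but a path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a loc; the real structure has more fields. */
struct demo_loc {
    char *path;
};

/* Hypothetical stand-in for the per-frame local state. */
struct demo_local {
    struct demo_loc loc; /* owned copy that outlives the caller's loc */
};

/* Deep-copy src into dst so dst stays valid after src's owner is freed. */
static int demo_loc_copy(struct demo_loc *dst, const struct demo_loc *src)
{
    assert(dst != NULL && src != NULL && src->path != NULL);

    size_t len = strlen(src->path) + 1;
    dst->path = malloc(len);
    if (dst->path == NULL)
        return -1;
    memcpy(dst->path, src->path, len);
    return 0;
}

/* Release whatever the loc owns. */
static void demo_loc_wipe(struct demo_loc *loc)
{
    free(loc->path);
    loc->path = NULL;
}
```

The deferred task is then handed the address of the copy inside the
local structure, so freeing the caller's storage no longer matters.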

> Reviewed-on: http://review.gluster.org/15070
> CentOS-regression: Gluster Build System 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Ravishankar N 

BUG: 1369042
Change-Id: I16987e491e24b0b4e3d868a6968e802e47c77f7a
Signed-off-by: Pranith Kumar K 
Signed-off-by: Oleksandr Natalenko 
Reviewed-on: http://review.gluster.org/15233
Smoke: Gluster Build System 
Reviewed-by: Ravishankar N 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T10:22:36+00:00

        Backport of: http://review.gluster.org/15080

When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from admin to resolve the split-brain
before the IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation;
once it is back online, it will be chosen as the source for the heal.
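
The decision in i) to iii) can be sketched roughly as below. This is a
simplified model, not the code in the patch: demo_finish_write() and its
parameters are hypothetical, and EIO is only assumed here as the
"appropriate errno".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * good[i]   : brick i held a good copy before the write
 * failed[i] : the write failed on brick i (for example, it went offline)
 * accuse[i] : output, brick i should be marked bad
 * keep_dirty: output, leave the dirty xattrs set on the file
 *
 * Returns 0 if the write may succeed, or -EIO (an assumed errno) if no
 * good copy took the write.
 */
static int demo_finish_write(const bool *good, const bool *failed,
                             bool *accuse, bool *keep_dirty, size_t n)
{
    size_t survivors = 0;

    assert(good && failed && accuse && keep_dirty);

    for (size_t i = 0; i < n; i++)
        if (good[i] && !failed[i])
            survivors++;

    if (survivors == 0) {
        /* The last good copy went down mid-write: accuse nobody,
         * keep the dirty xattrs and fail this write. */
        for (size_t i = 0; i < n; i++)
            accuse[i] = false;
        *keep_dirty = true;
        return -EIO;
    }

    /* Normal case: blame only the bricks on which the write failed. */
    for (size_t i = 0; i < n; i++)
        accuse[i] = failed[i];
    *keep_dirty = false;
    return 0;
}
```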

> Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
> BUG: 1363721
> Signed-off-by: Krutika Dhananjay 
> Reviewed-on: http://review.gluster.org/15080
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
> Reviewed-by: Oleksandr Natalenko 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Pranith Kumar Karampuri 
(cherry picked from commit fcb5b70b1099d0379b40c81f35750df8bb9545a5)

Change-Id: I157f1025aebd6624fa3d412abc69a4ae6f2fe9e0
BUG: 1367272
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Oleksandr Natalenko 
Reviewed-on: http://review.gluster.org/15221
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Bug fixes in txn codepath

2016-08-17T10:22:15+00:00

        Backport of: http://review.gluster.org/15145

AFR sets transaction.pre_op[] array even before actually doing the
pre-op on-disk. Therefore, AFR must not only consider the pre_op[] array
but also the failed_subvols[] information before setting the pre_op_done[]
flag. This patch fixes that.
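
A minimal sketch of the combined check, assuming the three arrays are
plain per-child byte flags. demo_set_pre_op_done() is a hypothetical
helper and not a function from the tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mark the pre-op as done on child i only if it was attempted
 * (pre_op[i]) and did not fail there (failed_subvols[i]).
 */
static void demo_set_pre_op_done(const unsigned char *pre_op,
                                 const unsigned char *failed_subvols,
                                 unsigned char *pre_op_done,
                                 size_t child_count)
{
    assert(pre_op && failed_subvols && pre_op_done);

    for (size_t i = 0; i < child_count; i++)
        pre_op_done[i] = (pre_op[i] && !failed_subvols[i]) ? 1 : 0;
}
```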

Change-Id: I726b2acd4025e2e75a87dea547ca6e088bc82c00
BUG: 1367272
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15164
Reviewed-by: Ravishankar N 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
Reviewed-by: Anuradha Talur 
CentOS-regression: Gluster Build System

afr: some coverity fixes

2016-07-28T13:54:49+00:00

Note: This is a backport of http://review.gluster.org/14895.
It contains:
i) fixes that prevent deadlocks (afr-common.c).
ii) fixes over-writing op-errno=ENOMEM with possible other values
(afr-inode-read.c).
iii) prevents doing further operations with a NULL dictionary if
allocation fails (afr-self-heal-data.c).
iv) prevents falsely marking a sink as healed if metadata heal fails
midway (afr-self-heal-metadata.c).
v) other minor fixes.

Considering the above are not trivial fixes, the patch is a good
candidate for merging in 3.8 branch.

Thanks to Krutika for a cleaner way to track inode refs in
afr_set_split_brain_choice().

Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df
BUG: 1360556
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/15018
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Pranith Kumar Karampuri

afr, index: Clean up stale directory and file indices in granular entry sh

2016-07-15T13:43:13+00:00

	Backport of: http://review.gluster.org/14832

Specifically when a directory tree is removed (rm -rf)
while a brick is down, both the directory index and the
name indices of the files and subdirs under it will remain.
Self-heal will need to pick up these and remove them.

Towards this, afr sh will now also crawl indices/entry-changes
and call an rmdir on the dir if the directory index is stale.

On the brick side, rmdir fop has been implemented for index xl,
which, in a synctask, deletes the directory index and its contents
if present.

Change-Id: I08f45201adca56737ec2be1aab5433aebaefefd0
BUG: 1355609
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/14920
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Jeff Darcy

core, shard: Make shards inherit main file's O_DIRECT flag if present

2016-06-27T15:17:55+00:00

        Backport of: http://review.gluster.org/14191

If the application opens a file with O_DIRECT, the shards'
anon fds would also need to inherit the flag. Towards this,
shard xl would be passing the odirect flag in the @flags parameter
to the WRITEV fop. This will be used in anon fd resolution
and subsequent opening by posix xl.

 >Change-Id: I3a0593fa46cc25e390a5762a0354b469c2a1532d
 >BUG: 1342903
 >Signed-off-by: Krutika Dhananjay 
 >Reviewed-on: http://review.gluster.org/14663
 >Smoke: Gluster Build System 
 >CentOS-regression: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Jeff Darcy 

Change-Id: Ibfc164aa7f9eecd6993255f1c03557f2ec35ac8c
BUG: 1347553
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14754
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

afr:Don't wind reads for files in metadata split-brain

2016-06-27T12:19:55+00:00

Backport of http://review.gluster.org/#/c/13389/

Problem: For a read on a file in metadata split-brain:
1. lookup_done resets event_generation to zero.
2. readv is issued, goes to inode refresh due to mismatching event_gen.
3. After refresh is successful, we update event_generation, data and
metadata readable.
4. We then call afr_read_txn_refresh_done() which in turn calls
afr_inode_get_readable() but doesn't check for EIO. So afr_readv_wind
is called with local->readable (which is populated with data_readable),
thus winding the read to a brick.
5. Also, further parallel reads that come in go directly to the wind path
because there is no inode_refresh needed.

Fix:
1. For any afr_read_txn(), readable must be an intersection of data and metadata
readable.
2. Check for EIO in afr_read_txn_refresh_done().
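
The two points of the fix can be sketched together as below. This is a
simplified model under stated assumptions: demo_pick_readable() is a
hypothetical helper, the readable sets are modelled as per-child byte
flags, and picking the first readable child stands in for whatever read
policy the real code applies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Fill readable[] with the intersection of data_readable[] and
 * metadata_readable[]. Returns the index of the first readable child,
 * or -EIO when no child is readable for both, so the caller can fail
 * the read instead of winding it to a brick.
 */
static int demo_pick_readable(const unsigned char *data_readable,
                              const unsigned char *metadata_readable,
                              unsigned char *readable, size_t child_count)
{
    int first = -1;

    assert(data_readable && metadata_readable && readable);

    for (size_t i = 0; i < child_count; i++) {
        readable[i] = (data_readable[i] && metadata_readable[i]) ? 1 : 0;
        if (readable[i] && first < 0)
            first = (int)i;
    }
    return first >= 0 ? first : -EIO;
}
```

With a file in metadata split-brain the metadata set is empty, so the
intersection is empty and the read fails with EIO even though
data_readable alone would have named a brick.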

Change-Id: I22dd221fdfaf96d7aced2f474e28ed1337d69f0e
BUG: 1349879
Signed-off-by: Ravishankar N 
(cherry picked from commit 7a1c1e2904701496968ed14b6d7479fb706c3188)
Reviewed-on: http://review.gluster.org/14790
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Unwind with xdata in inode-write fops

2016-06-13T10:22:31+00:00

When there is a failure, afr was not unwinding xdata to the xlators above.
xdata need not be NULL on failures, so it is important to send it
to the parent xlators.

 >Change-Id: Ic36aac10a79fa91121961932dd1920cb1c2c3a4c
 >BUG: 1340623
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14567
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Jeff Darcy 

BUG: 1342178
Change-Id: Idd74d2bc898fe5aef537ab48c1754510030c8825
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14618
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

afr: Consider ENOSPC and EDQUOT as symmetric errors

2016-06-13T10:14:49+00:00

Backport of http://review.gluster.org/#/c/14604/

Problem:
Since commit 8eaa3506ead4f11b81b146a9e56575c79f3aad7b, in replica 3, if a
brick is down and a create fails on the other 2 bricks with EDQUOT, we consider
it an asymmetric error and hence do not do post-op. So the dirty xattr
remains set on the parent dir, leading to conservative merges during heal when
all bricks are up, i.e. a file deleted on the source might re-appear after heal.

Fix:
Consider ENOSPC and EDQUOT as symmetric errors since partial inode or
entry modification operations are not possible when quota is enabled.
IOW, if quota reports EDQUOT, the no. of bytes written
(or not written) will be the same on all bricks of the replica.
Likewise, the entry operation (create, mkdir...) will either succeed or
not succeed on all bricks.
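
A rough sketch of the resulting rule, not the actual function from the
tree: struct demo_reply and demo_is_symmetric_error() are hypothetical,
and the sketch assumes that a brick which is down simply has no valid
reply.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-brick reply; valid is false for a brick that is down. */
struct demo_reply {
    bool valid;
    int op_ret;
    int op_errno;
};

/*
 * A failure is symmetric when every brick that replied failed with the
 * same errno and either all bricks replied, or the errno is ENOSPC or
 * EDQUOT (assumed to be identical on the brick that is down as well).
 */
static bool demo_is_symmetric_error(const struct demo_reply *replies,
                                    size_t n)
{
    bool all_replied = true;
    bool seen = false;
    int op_errno = 0;

    assert(replies != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!replies[i].valid) {
            all_replied = false;
            continue;
        }
        if (replies[i].op_ret != -1)
            return false; /* succeeded somewhere: not symmetric */
        if (seen && replies[i].op_errno != op_errno)
            return false; /* bricks disagree on the error */
        op_errno = replies[i].op_errno;
        seen = true;
    }

    if (!seen)
        return false;
    if (all_replied)
        return true;
    return op_errno == ENOSPC || op_errno == EDQUOT;
}
```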

Change-Id: Iacb1108e9ef4a918e36242fb4a957455133744e9
BUG: 1344559
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/14687
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Niels de Vos

cluster/afr: Unwind xdata_rsp even in case of failures

2016-06-10T15:36:21+00:00

DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir
failures because of stale layout. But AFR was unwinding null xdata_rsp in case
of failures. This was leading to mkdir failures just after remove-brick. Unwind
the xdata_rsp in case of failures to make sure the response from brick reaches
dht.

 >BUG: 1340623
 >Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14553
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Ravishankar N 
 >Smoke: Gluster Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Anuradha Talur 
 >Reviewed-by: Krutika Dhananjay 

BUG: 1342178
Change-Id: Iaacadcad0f76979fb250bd008b8e43f0e7acf642
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14617
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Niels de Vos