glusterfs.git/xlators/cluster/afr/src/afr-common.c, branch v3.10.12

cluster/afr: Honor default timeout of 5min for analyzing split-brain files

2017-11-27T13:50:23+00:00

Problem:
After setting split-brain-choice option to analyze the file to resolve
the split brain using the command
"setfattr -n replica.split-brain-choice -v "choiceX" "
should allow to access the file from mount for default timeout of 5mins.
But the timeout was not honored and was able to access the file even after
the timeout.

Fix:
Call the inode_invalidate() in afr_set_split_brain_choice_cbk() so that
it will triger the cache invalidate after resetting the timer and the
split brain choice. So the next calls to access the file will fail with EIO.

Change-Id: I698cb833676b22ff3e4c6daf8b883a0958f51a64
BUG: 1514388
Signed-off-by: karthik-us 
(cherry picked from commit 933ec57ccda2c1ba5ce6f207313c3b6802e67ca3)

afr: heal metadata in discover code path

2017-09-17T13:37:32+00:00

******************************************************
Backport of: https://review.gluster.org/18202
Also added loc_is_nameless() to libglusterfs since the patch that
introduced it in master was not backported to release-3.10.

Note: 18202 is a squash of  17850 and 18187 in master.
******************************************************

During graph switch, if fuse sends nameless (gfid) lookups, afr takes
the discover code path to serve it. If there are pending metadata heals,
they do not happen unless an inode refresh happens as a part of
discover (which is not guaranteed to happen always).

This patch fixes it by attempting metadata heal as a part of discover,
just like how it is done in lookup code path.

Change-Id: I87c493045b9225741cad173bf3f645848697032e
BUG: 1492010
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/18304
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: check validity of afr_reply

2017-09-17T12:53:24+00:00

...in various self-heal code paths.

Originally found by Pranith in __afr_selfheal_name_impunge ()

Also change __afr_selfheal_assign_gfid() to send lookup only on those
bricks that don't have a gfid matching that of the source.

> Reviewed-on: https://review.gluster.org/18065
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
(cherry picked from commit d594900dbca92c356152be65fce16f77c402117c)
Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a
BUG: 1491995
Signed-off-by: Pranith Kumar K 
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/18303
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: add errno to afr_inode_refresh_done()

2017-06-16T21:54:56+00:00

Backport of https://review.gluster.org/17413 and
https://review.gluster.org/17436

Problem:
When parellel `rm -rf`s were being done from cifs clients, opendir might
fail on some replicas with ENOENT. DHT ignores partial opendir failures
in dht_fd_cbk() and winds readdirs on those replicas. Afr inode refresh
(as a part of readdirp read_txn) sees in its fd context that the state
of the fds is *not* AFR_FD_OPENED and bails out to
afr_inode_refresh_done() without doing a refresh. When this happens, the
errno is set as EIO due to lack of readable subvols, logging split-brain
messages in the logs.

Fix:
Introduce an errno argument to afr_inode_refresh_do() to bail out with
the right error value when inode refresh is not performed.

Change-Id: Id88e604278abb8df47750d45258d9e6dde710600
BUG: 1457732
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17516
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Return the list of node_uuids for the subvolume

2017-05-30T13:13:27+00:00

Problem:
AFR was returning the node uuid of the first node for every file if
the replica set was healthy, which was resulting in only one node
migrating all the files.

Fix:
With this patch AFR returns the list of node_uuids to the upper layer,
so that they can decide on which node to migrate which files, resulting
in improved performance. Ordering of node uuids will be maintained based
on the ordering of the bricks. If a brick is down, then the node uuid
for that will be set to all zeros.

> Reviewed-on: https://review.gluster.org/17084
> Reviewed-by: Pranith Kumar Karampuri 
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
(cherry picked from commit 0a50167c0a8f950f5a1c76442b6c9abea466200d)

Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31
BUG: 1451561
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/17337
Tested-by: Ravishankar N 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra Talur

afr: send the correct iatt values in fsync cbk

2017-05-14T08:27:33+00:00

Problem:
afr unwinds the fsync fop with an iatt buffer from one of its children
on whom fsync was successful. But that child might not be a valid read
subvolume for that inode because of pending heals or because it happens
to be the arbiter brick etc. Thus we end up sending the wrong iatt to
mdcache which will in turn serve it to the application on a subsequent
stat call as reported in the BZ.

Fix:
Pick a child on whom the fsync was successful *and* that is readable as
indicated in the inode context.

> Reviewed-on: https://review.gluster.org/17227
> CentOS-regression: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
(cherry picked from commit 1a8fa910ccba7aa941f673302c1ddbd7bd818e39)

Change-Id: Ie8647289219cebe02dde4727e19a729b3353ebcf
BUG: 1444892
RCA'ed-by: Miklós Fokin 
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17247
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra Talur 
CentOS-regression: Gluster Build System

afr: all children of AFR must be up to resolve s-brain

2017-02-15T12:31:45+00:00

Problem:
The various split-brain resolution policies (favorite-child-policy based,
CLI based and mount (get/setfattr) based) attempt to resolve split-brain
even when not all bricks of replica are up. This can be a problem when
say in a replica 3, the only good copy is down and the other 2 bricks
are up and blame each other (i.e. split-brain). We end up healing the
file in such a  case and allow I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken
only if we are able to examine the afr xattrs of *all* bricks of a given
replica.

Signed-off-by: Ravishankar N 
> Reviewed-on: https://review.gluster.org/16476
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 

(cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)
Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1420982
Reviewed-on: https://review.gluster.org/16587
Tested-by: Ravishankar N 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: Avoid resetting event_gen when brick is always down

2017-01-13T02:54:02+00:00

Problem:
__afr_set_in_flight_sb_status(), which resets event_gen to zero, is
called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
is true even if the brick was down *before* the transaction started.
Hence say if 1 brick is down in  a replica-3, every writev that comes
will trigger an inode refresh because of this resetting, as seen from
the no. of FSTATs in the profile info in the BZ.

Fix:
Reset event gen only if the brick was previously a valid read child and
the FOP failed on it the first time.

Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because
the function only resets event gen and not the data/metadata readable.

Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
BUG: 1409206
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/16309
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

ec: Invalidations in disperse volume should not update the stat

2017-01-06T05:12:18+00:00

Issue:
In disperse volume, the file is present across bricks, hence the stat
from one brick doesn't carry the valid size of the file. Therefore
the upcall from one brick updating the md-cache results in wrong size
being updated.

Fix:
If the notification is cache invalidation then, indicate md-cache that
the attributes is invalid.

BUG: 1410375
Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/16329
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

afr: use accused matrix instead of readable matrix for deciding heals

2016-12-27T06:34:02+00:00

Problem:
afr_replies_interpret() used the 'readable' matrix to trigger client
side heals after inode refresh. But for arbiter, readable is always
zero. So when `dd` is run with a data brick down, spurious data heals
are are triggered. These heals open an fd, causing eager lock to be
disabled (open fd count >1) in afr transactions, leading to extra FXATTROPS

Fix:
Use the accused matrix (derived from interpreting the afr pending
xattrs) to decide whether we can start heal or not.

Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9
BUG: 1408395
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/16277
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri