glusterfs.git/xlators/cluster, branch v3.12.12

afr: don't update readables if inode refresh failed on all children

2018-07-11T14:04:25+00:00

Backport of: https://review.gluster.org/#/c/20029/
3.12 still supports quorum-reads, hence modified afr_inode_refresh_done() to
support that.

If inode refresh failed on all children of afr due to ENOENT (say file
migrated by dht), it resets the readables to zero. Any inflight txn which
then later comes on the inode fails with EIO because no readable
children present for the inode.

Fix:
Don't update readables when inode refresh fails on *all* children of
afr. In that way any inflight txns will either proceed with its own inode
refresh if needed and fail it with the right errno or use the old value
of readables and continue with the txn.

Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we
seem to be winding the lock and checking for quorum only in pre-op pahse.

Note: This should ideally fix BZ 1329505 since the stop gap fix for
it is has been reverted at https://review.gluster.org/#/c/20028.

Change-Id: I82990769f01be918a073fec83fc67ba4b3be24b1
BUG: 1599247
Signed-off-by: Ravishankar N

afr: heal gfids when file is not present on all bricks

2018-07-11T14:03:47+00:00

Backport of https://review.gluster.org/#/c/20271/ (only change is in .t)

commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
BUG: 1598121
fixes: bz#1598121
Signed-off-by: Ravishankar N 
(cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)

afr: fix bug-1363721.t failure

2018-07-09T10:03:07+00:00

Backport of https://review.gluster.org/#/c/20036/
Note:  We need to update inode context's write_subvol even in case of compound
fops. This is not there in master and 4.1 since compound FOPS was removed in it.

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as failure but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() was set to true and we were
unwinding writev with success to DHT and then catching the failure in
post-op in the background.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with readable
subvols obtained in pre-op cbk) does not have at least 1 good brick which was up
when the transaction started.

Change-Id: I4a1fef4569609c31cffeaef591a64c10870e8d0b
BUG: 1598720
Signed-off-by: Ravishankar N

afr: add quorum checks in pre-op

2018-07-06T01:42:54+00:00

Backport of https://review.gluster.org/#/c/19781/

Problem:
We seem to be winding the FOP if pre-op did not succeed on quorum bricks
and then failing the FOP with EROFS since the fop did not meet quorum.
This essentially masks the actual error due to which pre-op failed. (See
BZ).

Fix:
Skip FOP phase if pre-op quorum is not met and go to post-op.

Change-Id: Ie58a41e8fa1ad79aa06093706e96db8eef61b6d9
BUG: 1597154
Signed-off-by: Ravishankar N

afr: capture the correct errno in post-op quorum check

2018-07-05T05:56:24+00:00

If the post-op phase of txn did not meet quorm checks, use that errno to
unwind the FOP rather than blindly setting ENOTCONN.

Change-Id: I0cb0c8771ec75a45f9a25ad4cd8601103deddf0c
BUG: 1597120
Signed-off-by: Ravishankar N 
(cherry picked from commit 440a048f24b006c80af3d7bcd0a1f13fe3459d87)

cluster/dht: act as passthrough for renames on single child DHT

2018-07-05T05:54:11+00:00

Various synchronization present in dht_rename while handling
directories and files is necessary only if we have more than only one
child.


Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247
fixes: bz#1563513
Signed-off-by: Raghavendra G

afr: add quorum checks in post-op

2018-07-04T04:04:22+00:00

afr relies on pending changelog xattrs to identify source and sinks and the
setting of these xattrs happen in post-op. So if post-op fails, we need to
unwind the write txn with a failure.

Change-Id: I0f019ac03890108324ee7672883d774918b20be1
BUG: 1597120
Signed-off-by: Ravishankar N 
(cherry picked from commit a40a87ec3b226ae86a6ed8f4af25b45965a20cad)

afr: don't treat all cases all bricks being blamed as split-brain

2018-07-04T04:03:45+00:00

Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk.  Due to these partial post-ops and partial heals (healing only when
2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view i.e each brick is blamed by atleast one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
are in progress.

Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.

Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1597123
Signed-off-by: Ravishankar N 
(cherry picked from commit 0e6e8216823c2d9dafb81aae0f6ee3497c23d140)

cluster/dht: Remove EIO from dht_inode_missing

2018-07-04T04:02:48+00:00

Removed EIO from the list of errnos that triggered
a migrate check task.

(cherry picked from commit c925962b91c67c8cd2391df7dd0251e0cbf66648)

Change-Id: I7f89c7a16056421588f1af2377cebe6affddcb47
BUG: 1579673
Signed-off-by: N Balachandran

cluster/dht: Fix dht_rename lock order

2018-05-09T05:04:20+00:00

Fixed dht_order_rename_lock to use the same inodelk ordering
as that of the dht selfheal locks (dictionary order of
lock subvolumes).

Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d
BUG: 1570475
Signed-off-by: N Balachandran