glusterfs.git/xlators, branch release-3.13

afr: capture the correct errno in post-op quorum check

2018-02-06T14:28:04+00:00

If the post-op phase of the transaction did not meet quorum checks, use
the errno from that phase to unwind the FOP rather than blindly setting
ENOTCONN.
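
A rough illustration of the idea follows. It is a Python model, not the
C code in afr: the function name, the per-brick errno list, and the
"first recorded failure wins" rule are assumptions made for this sketch.

```python
import errno


def post_op_unwind_errno(post_op_errnos, quorum_count):
    """Return 0 if post-op met quorum, else the errno to unwind with.

    post_op_errnos: one entry per brick, 0 for success or an errno value.
    quorum_count: hypothetical number of successful bricks required.
    """
    successes = sum(1 for e in post_op_errnos if e == 0)
    if successes >= quorum_count:
        return 0
    # Quorum was not met: report an errno actually seen on the bricks,
    # falling back to ENOTCONN only when none was recorded.
    failures = [e for e in post_op_errnos if e != 0]
    return failures[0] if failures else errno.ENOTCONN
```

With two of three bricks failing the post-op with ENOSPC, the caller
would see ENOSPC instead of a generic ENOTCONN.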

Change-Id: I0cb0c8771ec75a45f9a25ad4cd8601103deddf0c
BUG: 1536346
Signed-off-by: Ravishankar N 
(cherry picked from commit 440a048f24b006c80af3d7bcd0a1f13fe3459d87)

afr: don't treat all cases of all bricks being blamed as split-brain

2018-02-06T14:27:55+00:00

Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk. Due to these partial post-ops and partial heals (healing only
when 2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view, i.e. each brick is blamed by at least one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
is in progress.

Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.
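
The source-selection rule above can be modelled roughly as follows.
This is a Python sketch, not the afr implementation: the boolean blame
matrix and the function name are assumptions, standing in for the
pending changelog xattrs.

```python
def pick_sources(blames):
    """blames[i][j] is True when brick i holds a pending xattr blaming brick j.

    Returns the list of brick indices treated as sources.
    """
    n = len(blames)
    majority = n // 2 + 1  # 2 of 3 in a replica 3 volume
    sinks = set()
    for j in range(n):
        accusers = sum(1 for i in range(n) if i != j and blames[i][j])
        if accusers >= majority:
            sinks.add(j)
    # Bricks not blamed by a majority stay sources.
    return [b for b in range(n) if b not in sinks]
```

In a circular blame (0 blames 1, 1 blames 2, 2 blames 0) no brick has a
majority against it, so all three remain sources in this model.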

Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1541458
Signed-off-by: Ravishankar N 
(cherry picked from commit 0e6e8216823c2d9dafb81aae0f6ee3497c23d140)

cluster/afr: remove unnecessary child_up initialization

2018-02-05T09:15:06+00:00

The child_up array was initialized with all elements set to -1 to
allow afr_notify() to differentiate down bricks from bricks that
haven't reported yet. With the current implementation this is no
longer needed, and it was causing unexpected results when other parts
of the code assumed that child_up[i] != 0 meant the brick was up.
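
The pitfall can be shown with a small Python model (the helper names
are hypothetical; only the -1/0/1 states come from the description
above):

```python
def count_up_nonzero(child_up):
    # Pattern described above: any non-zero entry is taken to mean "up".
    return sum(1 for state in child_up if state != 0)


def count_up_strict(child_up):
    # Only an explicit 1 counts as "up".
    return sum(1 for state in child_up if state == 1)


# Three bricks that have not reported yet, under the old -1 initialization.
not_reported_old = [-1, -1, -1]
# The same bricks when the array simply starts at 0.
not_reported_new = [0, 0, 0]
```

With the old initialization the non-zero test counts three bricks as up
before any of them has reported; starting from 0 makes both tests agree.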

Backport of:
> BUG: 1541038

Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472
BUG: 1541929
Signed-off-by: Xavier Hernandez

cluster/ec: Do lock conflict check correctly for wait-list

2018-02-02T15:05:01+00:00

Problem:
ec_link_has_lock_conflict() traverses only owner_list, but the
function is also called for checks against wait_list.

Fix:
Modify ec_link_has_lock_conflict() to traverse lists correctly.
Updated the callers to reflect the changes.
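
A minimal Python model of the intended behaviour, under stated
assumptions: the Lock class, the waiters_too flag, and the rule that
only two shared requests are compatible are illustrative and are not
taken from the ec sources.

```python
from dataclasses import dataclass, field


@dataclass
class Lock:
    owners: list = field(default_factory=list)   # shared flags of current owners
    waiters: list = field(default_factory=list)  # shared flags of queued requests


def has_lock_conflict(request_shared, lock, waiters_too):
    """True if the new request conflicts with a request already on the lock."""
    queued = lock.owners + (lock.waiters if waiters_too else [])
    # Assumed rule: two requests are compatible only when both are shared.
    return any(not (request_shared and other) for other in queued)
```

A shared request arriving behind an exclusive waiter is only detected
as conflicting when the wait list is actually traversed.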

BUG: 1540896
Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91
Signed-off-by: Pranith Kumar K

afr: add quorum checks in post-op

2018-02-01T13:57:42+00:00

afr relies on pending changelog xattrs to identify sources and sinks,
and these xattrs are set in post-op. So if post-op fails, we need to
unwind the write txn with a failure.
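
Sketched in Python (a model only; the function names and the
quorum_count parameter are assumptions, not afr identifiers):

```python
def post_op_has_quorum(post_op_ok, quorum_count):
    """post_op_ok: one boolean per brick, True if the xattr update landed."""
    return sum(1 for ok in post_op_ok if ok) >= quorum_count


def write_txn_succeeds(fop_ok, post_op_ok, quorum_count):
    # A write whose changelog xattrs did not reach quorum cannot be
    # healed reliably later, so it is reported as failed.
    return fop_ok and post_op_has_quorum(post_op_ok, quorum_count)
```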

Change-Id: I0f019ac03890108324ee7672883d774918b20be1
BUG: 1536346
Signed-off-by: Ravishankar N 
(cherry picked from commit a40a87ec3b226ae86a6ed8f4af25b45965a20cad)

selinux-xlator : validate dict before calling dict_rename_key()

2018-01-19T14:57:52+00:00

Upstream reference :
>Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
>BUG: 1535772
>Signed-off-by: Jiffin Tony Thottan 
>(cherry picked from commit bee06ccd7b80e3f5804f0c7c7c56936fed6d2b4e)

Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
BUG: 1536294

cluster/afr: Adding option to take full file lock

2018-01-19T14:24:52+00:00

Problem:
In replica 3 volumes there is a possibility of ending up in a
split-brain scenario when multiple clients write data to the same file
at non-overlapping regions in parallel.

Scenario:
- Initially all the copies are good and all the clients get the value
  of data-readables as all good.
- Client C0 performs write W1 which fails on brick B0 and succeeds on
  other two bricks.
- C1 performs write W2 which fails on B1 and succeeds on other two bricks.
- C2 performs write W3 which fails on B2 and succeeds on other two bricks.
- All the 3 writes above happen in parallel and fall on different ranges
  so afr takes granular locks and all the writes are performed in parallel.
  Since each client had data-readables as good, it does not see
  file going into split-brain in the in_flight_split_brain check, hence
  performs the post-op marking the pending xattrs. Now all the bricks
  are being blamed by each other, ending up in split-brain.
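
The scenario can be replayed with a small Python model. The blame
matrix stands in for the pending xattrs; the function names are
hypothetical.

```python
def run_parallel_writes(failed_bricks, bricks=3):
    """failed_bricks: for each write, the index of the brick it failed on.

    Returns blames, where blames[i][j] is True when brick i blames brick j.
    """
    blames = [[False] * bricks for _ in range(bricks)]
    for failed in failed_bricks:
        # Post-op on the bricks where the write succeeded marks the
        # failed brick as pending.
        for good in range(bricks):
            if good != failed:
                blames[good][failed] = True
    return blames


def all_blamed(blames):
    n = len(blames)
    return all(any(blames[i][j] for i in range(n) if i != j) for j in range(n))
```

Replaying W1, W2 and W3 as run_parallel_writes([0, 1, 2]) leaves every
brick blamed by the other two.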

Fix:
Have an option to take either a full lock or a range lock on files
while doing data transactions, to prevent the possibility of ending up
in split-brain. With this change, files take a full lock by default
while doing I/O. To use the old range lock behaviour, change the value
of "cluster.full-lock" to "no".
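
Why a full lock serializes the three writes can be seen in a short
Python model. The (offset, length) representation, with length 0
meaning "to end of file" as in POSIX byte-range locks, is an assumption
of this sketch, as are the function names.

```python
def ranges_conflict(a, b):
    """a and b are (offset, length) byte ranges; length 0 means to end of file."""
    a_start, a_len = a
    b_start, b_len = b
    a_end = float("inf") if a_len == 0 else a_start + a_len
    b_end = float("inf") if b_len == 0 else b_start + b_len
    return a_start < b_end and b_start < a_end


def lock_range(offset, length, full_lock):
    # With full_lock the whole file is locked regardless of the write range.
    return (0, 0) if full_lock else (offset, length)
```

Two writes at offsets 0 and 8192 do not conflict under range locks and
run in parallel; under full locks they conflict and are serialized.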

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us

cluster/afr: Fixing the flaws in arbiter becoming source patch

2018-01-18T18:20:08+00:00

Problem:
Setting the write_subvol value to read_subvol in case of metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of arbiter becoming source.

Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx->read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on
   the disk
5) read/lookup comes on the same file and refreshes read_subvols back
   to all good
6) a metadata transaction happens, which assigns ctx->read_subvol
   (all good) to ctx->write_subvol
7) w2 succeeds on brick1 and fails on brick0 and this will update the
   brick in reverse order leading to arbiter becoming source

Fix:
Instead of setting ctx->write_subvol to ctx->read_subvol in the
pre-op stage when there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() whether it is a data or a
metadata transaction. Use the value of ctx->write_subvol for a data
transaction and the value of ctx->read_subvol for other transactions.

With this patch we assign the value of ctx->write_subvol in
afr_transaction_perform_fop() from the on-disk value, instead of
assigning it in afr_changelog_pre_op() from the in-memory value.
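
The selection rule can be sketched as below. This is a Python model:
the dict-based ctx and the function name are assumptions, while
write_subvol and read_subvol mirror the fields named above.

```python
DATA_TXN = "data"
METADATA_TXN = "metadata"


def subvol_for_in_flight_check(txn_type, ctx):
    """ctx is a dict with 'write_subvol' and 'read_subvol' entries."""
    if txn_type == DATA_TXN:
        # Data transactions keep using the value tracked for writes, so
        # a refresh of read_subvol by a lookup cannot reset it.
        return ctx["write_subvol"]
    return ctx["read_subvol"]
```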

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1516313
Signed-off-by: karthik-us 
(cherry picked from commit ba149bac92d169ae2256dbc75202dc9e5d06538e)

cluster/ec: OpenFD heal implementation for EC

2018-01-18T18:18:53+00:00

Existing EC code doesn't try to heal open FDs, although doing so
would avoid unnecessary healing of the data later.

The fix heals open FDs before carrying out file operations on them,
by attempting to open the FDs on the required up nodes.
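
A minimal Python model of that step follows. All names are
hypothetical; the real code works on EC's per-fd context rather than
on sets of node ids.

```python
def heal_open_fd(fd_open_on, up_nodes, open_on_node):
    """Try to open the fd on every up node where it is not open yet.

    fd_open_on: set of node ids where the fd is already open.
    up_nodes: iterable of node ids currently reachable.
    open_on_node: callable(node) -> bool, a hypothetical per-node open attempt.
    Returns the updated set of nodes holding an open fd.
    """
    healed = set(fd_open_on)
    for node in up_nodes:
        if node not in healed and open_on_node(node):
            healed.add(node)
    return healed
```

A node whose open attempt fails is simply left out, so the file
operation proceeds on the nodes that do hold the fd.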

>BUG: 1431955
>Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
>Signed-off-by: Sunil Kumar Acharya 

Upstream Patch: https://review.gluster.org/#/c/17077/

BUG: 1533023
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya

posix: delete stale gfid handles in nameless lookup

2018-01-16T20:18:35+00:00

..in order for self-heal of symlinks to work properly (see BZ for
details).

Backport of https://review.gluster.org/#/c/19070/
Signed-off-by: Ravishankar N 

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1534842