glusterfs.git/xlators/cluster/ec/src/ec-locks.c, branch v3.9.0rc2

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T09:38:36+00:00

When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from the admin to resolve the split-brain
before IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation.
When it is back online, it will be chosen as the source for the heal.
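
A minimal sketch of steps i) to iii), not the AFR code itself. The
struct, the function name settle_write(), the bitmask layout and the
choice of EIO as the errno are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of post-processing one write transaction. */
struct write_outcome {
    uint32_t accused;  /* bricks that will be marked bad */
    bool keep_dirty;   /* retain the dirty xattr on the file */
    int op_errno;      /* 0 on success, an errno value otherwise */
};

/*
 * good_before: bit i set if brick i held a good copy at inode refresh.
 * failed:      bit i set if the write failed on brick i.
 */
static struct write_outcome
settle_write(uint32_t good_before, uint32_t failed)
{
    struct write_outcome out = {0, false, 0};
    uint32_t good_after = good_before & ~failed;

    if (good_before != 0 && good_after == 0) {
        /* No good copy survived the write: in-flight split-brain.
         * Do not accuse the copies that were good before the write. */
        out.accused = failed & ~good_before;
        out.keep_dirty = true;
        out.op_errno = EIO; /* assumed errno, the message does not name it */
    } else {
        /* At least one good copy remains, so accuse the failed bricks. */
        out.accused = failed;
    }
    return out;
}
```

With a lone good copy on brick 0, settle_write(0x1, 0x1) accuses
nothing, keeps the dirty xattr and fails the write.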

Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
BUG: 1363721
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15080
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Oleksandr Natalenko 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails

2016-06-14T10:48:54+00:00

Thanks to Rafi for hinting a while back that he had once seen this
kind of problem. I didn't think the theory was valid.
I could have caught it earlier if I had tested his theory.

Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
BUG: 1344836
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14703
Reviewed-by: Xavier Hernandez 
Smoke: Gluster Build System 
Tested-by: mohammed rafi  kc 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Fix tracking of good bricks

2015-08-06T17:12:22+00:00

The bitmask of good and bad bricks was kept in the context of the
corresponding inode or fd. This was problematic when an external
process (another client or the self-heal process) healed the
bricks but nothing updated the bitmask held by other clients.

This patch removes the bitmask stored in the context and calculates
which bricks are healthy after locking them and doing the initial
xattrop. After that, it's updated using the result of each fop.
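
A minimal sketch of this per-operation tracking. struct fop_state,
good_init() and good_update() are hypothetical names, not the
structures used in the ec translator.

```c
#include <stdint.h>

/* Hypothetical per-operation state; nothing is cached in inode/fd ctx. */
struct fop_state {
    uint32_t good; /* bit i set: brick i is currently considered healthy */
};

/* After locking: healthy bricks are those that were locked and whose
 * initial xattrop answer was accepted. */
static void
good_init(struct fop_state *st, uint32_t locked, uint32_t xattrop_ok)
{
    st->good = locked & xattrop_ok;
}

/* After each fop: drop every brick on which the fop did not succeed. */
static void
good_update(struct fop_state *st, uint32_t fop_ok)
{
    st->good &= fop_ok;
}
```

Because the mask is rebuilt under the lock on every operation, a heal
done by another client is picked up the next time the lock is taken.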

Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c
BUG: 1236065
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/11844
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Minimize usage of EIO error

2015-07-28T11:12:17+00:00

Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891
BUG: 1245276
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/11741
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: NetBSD Build System

cluster/ec: Propagate correct errno in case of failures

2015-07-15T00:05:00+00:00

- Also remove internal-fop setting in create/mknod etc xattrs.

Rebalance was failing because ec returned EIO when lock acquisition failed
because the file/dir did not exist. posix_create/mknod were not setting the
config xattr because the internal-fop key is present in the dict; setxattr
fails for that key, which makes setting the rest of the xattrs fail.

Change-Id: Ifb429c8db9df7cd51e4f8ce53fdf1e1b975c9993
BUG: 1242254
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/11639
Reviewed-by: Raghavendra G 
Tested-by: Gluster Build System 
Reviewed-by: Xavier Hernandez 
Tested-by: NetBSD Build System

ec: Porting messages to new logging framework

2015-06-26T15:51:59+00:00

Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
BUG: 1194640
Signed-off-by: Nandaja Varma 
Reviewed-on: http://review.gluster.org/10465
Tested-by: NetBSD Build System 
Reviewed-by: Xavier Hernandez

cluster/ec: Prevent unnecessary self-heals

2015-05-15T08:24:51+00:00

When a blocking lock is requested, the lock request is treated as successful
as soon as ec->fragment locks are acquired in the non-blocking locking phase.
The fop then succeeds only on the bricks where the locks were acquired, which
makes self-heals necessary. To prevent these unnecessary self-heals, if the
remaining locks fail with EAGAIN in the non-blocking lock phase, try the
blocking locking phase instead.
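
A minimal sketch of the decision taken after the non-blocking phase.
The enum, after_nonblocking() and the bitmask arguments are
hypothetical; the real code works on the ec lock structures.

```c
#include <stdint.h>

enum lock_next {
    LOCK_DONE,           /* continue with the fop */
    LOCK_RETRY_BLOCKING, /* fall back to the blocking lock phase */
    LOCK_FAIL            /* not enough bricks could be locked */
};

/* Count the bits set in mask. */
static int
count_bits(uint32_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * all:       bricks the lock was sent to
 * locked:    bricks that granted the lock without blocking
 * eagain:    bricks that answered EAGAIN (lock held by someone else)
 * fragments: minimum number of bricks needed to serve the fop
 */
static enum lock_next
after_nonblocking(uint32_t all, uint32_t locked, uint32_t eagain,
                  int fragments)
{
    if (locked == all)
        return LOCK_DONE;
    if (eagain != 0)
        return LOCK_RETRY_BLOCKING; /* even if 'fragments' locks are held */
    if (count_bits(locked) >= fragments)
        return LOCK_DONE; /* remaining bricks failed for other reasons */
    return LOCK_FAIL;
}
```

The second check is the point of the change: contention on some
bricks leads to a blocking retry even when enough locks are already
held.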

Change-Id: I940969e39acc620ccde2a876546cea77f7e130b6
BUG: 1221145
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/10770
Tested-by: Gluster Build System 
Reviewed-by: Xavier Hernandez

ec: Fix failures with missing files

2015-05-10T00:29:46+00:00

When a file does not exist on a brick but it does on others, there
could be problems trying to access it because there were some loc_t
structures with a null 'pargfid' but with 'name' set. This forced
inode resolution based on <pargfid>/name instead of <gfid>, which
would be the correct one. To solve this problem, 'name' is always
set to NULL when 'pargfid' is not present.
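
A minimal sketch of that rule. struct loc_sketch is a simplified
stand-in for loc_t, and loc_fix_name() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for loc_t: only the two fields discussed here. */
struct loc_sketch {
    unsigned char pargfid[16]; /* parent gfid, all zeroes when unknown */
    const char *name;          /* basename relative to the parent */
};

/* A gfid is treated as null when all 16 bytes are zero. */
static bool
gfid_is_null(const unsigned char gfid[16])
{
    for (size_t i = 0; i < 16; i++) {
        if (gfid[i] != 0)
            return false;
    }
    return true;
}

/* Without a parent gfid the name cannot be resolved, so clear it. */
static void
loc_fix_name(struct loc_sketch *loc)
{
    if (gfid_is_null(loc->pargfid))
        loc->name = NULL;
}
```

A loc with a null parent gfid is then resolved by its own gfid only.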

Another problem was caused by an incorrect management of errors
while doing incremental locking. The only allowed error during an
incremental locking was ENOTCONN, but missing files on a brick can
be returned as ESTALE. This caused an EIO on the operation.

This patch ignores errors during incremental locking. At the end
of the operation it checks whether enough bricks were successfully
locked to continue.
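
A minimal sketch of that final check. enough_locked() and its
arguments are hypothetical names; per-brick results are modelled as
an array of errno values.

```c
#include <errno.h> /* errs[] holds values such as ENOTCONN or ESTALE */
#include <stdbool.h>
#include <stdint.h>

/*
 * errs[i] is 0 if brick i was locked, or the errno it returned.
 * Individual errors are not inspected; only the final count matters.
 * On return, *locked has bit i set for every locked brick.
 */
static bool
enough_locked(const int *errs, int nbricks, int fragments,
              uint32_t *locked)
{
    int count = 0;

    *locked = 0;
    for (int i = 0; i < nbricks && i < 32; i++) {
        if (errs[i] == 0) {
            *locked |= UINT32_C(1) << i;
            count++;
        }
    }
    return count >= fragments;
}
```

With six bricks of which four are required, one ESTALE and one
ENOTCONN still leave enough locked bricks to continue.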

Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a
BUG: 1176062
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9407
Tested-by: NetBSD Build System
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

ec: Fix return errors when not enough bricks

2014-12-05T11:39:07+00:00

Changes introduced by this patch:

* Fix an incorrect error propagation when the state of the life
  cycle of a fop returns an error.

* Fix incorrect unlocking of failed locks.

* Return ENOTCONN if there aren't enough bricks online.

* In readdir(p) check that the fd has been successfully opened by
  a previous opendir.

Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe
BUG: 1162805
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9098
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

ec: Change license

2014-12-03T18:45:26+00:00

Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b
BUG: 1168167
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9201
Reviewed-by: Kaleb KEITHLEY 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur