glusterfs.git/xlators/cluster/ec/src/ec-locks.c, branch release-3.7

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T10:05:08+00:00

        Backport of: http://review.gluster.org/15080

When bricks are brought offline and then online in cyclic
order while writes are in progress on a file, the inode refresh
done in write txns makes AFR fail the write attempt in most cases
when the only good copy is offline. However, there is still a
remote possibility that the file runs into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference). An
admin then has to resolve the split-brain before IO can resume
normally on the file. To get around this, the patch does the
following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation;
once it is back online, it will be chosen as the source for the heal.
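
The three steps above can be illustrated with a minimal sketch. It is not
the AFR code: the struct, the function name and the bitmask representation
are hypothetical, and EIO is only an assumed stand-in for "the appropriate
errno", which the message does not name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcome of one write txn; not an AFR structure. */
struct sketch_write_result {
    uint32_t accused;    /* bricks to be marked bad (accused) */
    int      keep_dirty; /* non-zero: retain the dirty xattr on the file */
    int      op_errno;   /* 0 on success, otherwise the errno for the write */
};

/*
 * good:   bricks that held a good copy right after the inode refresh
 * failed: bricks on which the write failed (e.g. went offline mid-txn)
 */
static struct sketch_write_result
sketch_finish_write(uint32_t good, uint32_t failed)
{
    struct sketch_write_result res = {0, 0, 0};
    uint32_t survivors = good & ~failed;

    if (good != 0 && survivors == 0) {
        /* The last good copy went down during the write: do not accuse
         * it, keep the dirty marker, and fail this write. */
        res.accused = failed & ~good;
        res.keep_dirty = 1;
        res.op_errno = EIO; /* assumed errno, see the note above */
    } else {
        /* At least one good copy took the write: accuse the failures. */
        res.accused = failed;
    }
    return res;
}
```

With brick 0 as the lone good copy, sketch_finish_write(0x1, 0x1) accuses
nobody and fails the write, so brick 0 can still act as heal source later.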

Change-Id: I7c13c6ddd5b8fe88b0f2684e8ce5f4a9c3a24a08
BUG: 1367270
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15222
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Oleksandr Natalenko 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails

2016-07-30T01:04:22+00:00

Thanks to Rafi for hinting a while back that he had once seen this
kind of problem. I didn't think the theory was valid at the time.
I could have caught this earlier if I had tested his theory.

 >Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
 >BUG: 1344836
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14703
 >Reviewed-by: Xavier Hernandez 
 >Smoke: Gluster Build System 
 >Tested-by: mohammed rafi  kc 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 

BUG: 1361402
Change-Id: If9ccf0b3db7159b87ddcdc7b20e81cde8c3c76f0
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15040
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System

cluster/ec: Fix tracking of good bricks

2015-08-14T09:02:21+00:00

The bitmask of good and bad bricks was kept in the context of the
corresponding inode or fd. This was problematic when an external
process (another client or the self-heal process) healed the
bricks, because nothing updated the bitmasks held by other clients.

This patch removes the bitmask stored in the context and calculates
which bricks are healthy after locking them and doing the initial
xattrop. After that, it's updated using the result of each fop.

> Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c
> BUG: 1236065
> Signed-off-by: Xavier Hernandez 
> Reviewed-on: http://review.gluster.org/11844
> Tested-by: NetBSD Build System 
> Tested-by: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 

Change-Id: Idbe68b28b865c4b28366703ad1e96ae16ba44b66
BUG: 1235964
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/11867
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Minimize usage of EIO error

2015-08-08T15:36:57+00:00

>Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891
>BUG: 1245276
>Signed-off-by: Xavier Hernandez 
>Reviewed-on: http://review.gluster.org/11741
>Tested-by: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>Tested-by: NetBSD Build System 

Change-Id: Ifd3d63f88a686a2963c5ba2e62110249f84f338d
BUG: 1250864
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/11852
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System

cluster/ec: Propagate correct errno in case of failures

2015-07-22T06:28:19+00:00

- Also remove internal-fop setting in create/mknod etc xattrs.

Rebalance was failing because ec returned EIO when lock acquisition failed
because the file/dir doesn't exist. Posix_create/mknod were not setting the
config xattr because the internal-fop key is present in the dict; setxattr for
it fails, which in turn makes setting the rest of the xattrs fail.

 >Change-Id: Ifb429c8db9df7cd51e4f8ce53fdf1e1b975c9993
 >BUG: 1242254
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/11639
 >Reviewed-by: Raghavendra G 
 >Tested-by: Gluster Build System 
 >Reviewed-by: Xavier Hernandez 
 >Tested-by: NetBSD Build System 

BUG: 1243654
Change-Id: Iedb90d6a7d980fb88d6dfa6a6c978a165a4be3fd
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/11688
Reviewed-by: Xavier Hernandez 
Tested-by: Gluster Build System

ec: Porting messages to new logging framework

2015-06-27T09:19:38+00:00

This is a backport of http://review.gluster.org/#/c/10465/

cherry-picked from commit b0b9eaea9dbb4e9a535f5e969defc4556a9e2204
>Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
>BUG: 1194640
>Signed-off-by: Nandaja Varma 

Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
BUG: 1217722
Signed-off-by: Nandaja Varma 
Reviewed-on: http://review.gluster.org/11429
Tested-by: Gluster Build System 
Tested-by: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Fix all EIO errors in EC

2015-05-28T11:12:06+00:00

        Backport of http://review.gluster.org/10770
        Backport of http://review.gluster.org/10806
        Backport of http://review.gluster.org/10787
        Backport of http://review.gluster.org/10868
        Backport of http://review.gluster.com/10852

 - When a blocking lock is requested, the lock request is treated as successful
even when only ec->fragment number of locks are acquired in the non-blocking
locking phase. The fop then succeeds only on the bricks where the locks were
acquired, which makes self-heals necessary. To prevent these unnecessary
self-heals, if the remaining locks fail with EAGAIN in the non-blocking lock
phase, try the blocking locking phase instead.

 -  Handle lookup failures while op in progress

 - cluster/ec: Correctly cleanup delayed locks
When a delayed lock is pending, a graph switch doesn't correctly
terminate it. This means that the update of version and size xattrs
is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN
event to correctly finish pending updates before completing the
graph switch.

 - Fix use after free crash
ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate
adds this fop to ec->pending_fops; because ec_manager is not run on this heal
fop, it is never removed from ec->pending_fops. When it is accessed after being
freed, it leads to a crash. It is better not to add HEAL fops to
ec->pending_fops because we don't want a graph switch to hang the mount because
of a BIG file/directory heal.

 - Forced unlock when lock contention is detected
EC uses an eager lock mechanism to optimize multiple read/write
requests on the same entry or inode. This increases performance
but can have adverse results when other clients try to access the
same entry/inode. To solve this, this patch adds functionality
to detect when this happens and force an earlier release so that
other clients are not blocked.

The method consists of requesting GF_GLUSTERFS_INODELK_COUNT and
GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this
count is greater than one, the lock is marked to be released. All
fops already waiting for this lock will be executed normally before
releasing the lock, but new requests that also require it will be
blocked and restarted after the lock has been released and
reacquired.

Another problem was that some operations did correctly lock the
parent of an entry when needed, but got the size and version xattrs
from the entry instead of the parent.

This patch solves this problem by binding all queries of size and
version to each lock and replacing all entrylk calls by inodelk ones
to remove concurrent updates on directory metadata.  This also allows
rename to correctly update source and destination directories.

BUG: 1225279
Change-Id: I02a6084b138dd38e018a462347cd9ce38610c7ef
Reviewed-on: http://review.gluster.org/10926
Tested-by: NetBSD Build System
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

ec: Fix failures with missing files

2015-05-10T00:30:19+00:00

      Backport of http://review.gluster.com/9407

When a file does not exist on a brick but does on others, there
could be problems trying to access it because some loc_t
structures had a null 'pargfid' but 'name' was set. This forced
inode resolution based on /name instead of  which
would be the correct one. To solve this problem, 'name' is always
set to NULL when 'pargfid' is not present.
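
A minimal sketch of that rule, using a cut-down stand-in for loc_t. The
struct below is hypothetical and keeps only the two fields mentioned in the
message; the helper names are illustrative as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for loc_t with only the fields needed. */
struct sketch_loc {
    unsigned char pargfid[16]; /* parent gfid, all zeros when unknown */
    const char   *name;        /* entry name inside the parent */
};

/* Returns non-zero when all 16 bytes of the gfid are zero. */
static int
sketch_gfid_is_null(const unsigned char gfid[16])
{
    static const unsigned char zero[16];
    return memcmp(gfid, zero, sizeof(zero)) == 0;
}

/* Without a parent gfid the name cannot be resolved reliably, so drop it
 * and let resolution rely on the other fields of the location. */
static void
sketch_loc_sanitize(struct sketch_loc *loc)
{
    assert(loc != NULL);
    if (sketch_gfid_is_null(loc->pargfid))
        loc->name = NULL;
}
```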

Another problem was caused by an incorrect management of errors
while doing incremental locking. The only allowed error during an
incremental locking was ENOTCONN, but missing files on a brick can
be returned as ESTALE. This caused an EIO on the operation.

This patch ignores errors during incremental locking. At the end of
the operation it checks whether enough bricks were successfully
locked to continue.
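
The final check can be sketched like this. It assumes that "enough" means
at least as many locked bricks as a hypothetical fragments parameter; the
function name and the per-brick errno array are illustrative, not EC code.

```c
#include <assert.h>
#include <errno.h>  /* callers fill lock_errnos with values such as ESTALE */
#include <stddef.h>

/*
 * lock_errnos[i] is 0 if brick i was locked, otherwise the errno it
 * returned. Individual errors are not inspected here; only the number of
 * successful locks decides whether the operation may continue.
 * Returns 1 when at least `fragments` bricks were locked, 0 otherwise.
 */
static int
sketch_enough_locks(const int *lock_errnos, size_t nodes, size_t fragments)
{
    size_t i, locked = 0;

    assert(lock_errnos != NULL && fragments > 0 && fragments <= nodes);

    for (i = 0; i < nodes; i++) {
        if (lock_errnos[i] == 0)
            locked++;
    }
    return locked >= fragments;
}
```

With six bricks of which one returns ESTALE and one ENOTCONN, four locks
remain, which passes for fragments = 4 and fails for fragments = 5.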

BUG: 1220011
Change-Id: I4a1e6235d80e20ef7ef12daba0807b859ee5c435
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/10701
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System

ec: Fix return errors when not enough bricks

2014-12-05T11:39:07+00:00

Changes introduced by this patch:

* Fix an incorrect error propagation when the state of the life
  cycle of a fop returns an error.

* Fix incorrect unlocking of failed locks.

* Return ENOTCONN if there aren't enough bricks online.

* In readdir(p) check that the fd has been successfully opened by
  a previous opendir.

Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe
BUG: 1162805
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9098
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

ec: Change license

2014-12-03T18:45:26+00:00

Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b
BUG: 1168167
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9201
Reviewed-by: Kaleb KEITHLEY 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur