<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster, branch v4.1.6</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>afr/lease: Read child nodes from lease structure</title>
<updated>2018-10-31T00:31:24+00:00</updated>
<author>
<name>root</name>
<email>root@rhs-srv-17.storage-dev.lab.eng.bos.redhat.com</email>
</author>
<published>2018-10-25T09:47:23+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=bc453a0075c1305be18cc64440e35a86624daed4'/>
<id>bc453a0075c1305be18cc64440e35a86624daed4</id>
<content type='text'>
For the lease operation, we allocate and store the child nodes'
data in the lease structure. Use the same in afr_lease_cbk()
while checking for quorum.

Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6
fixes: bz#1644474
Signed-off-by: Soumya Koduri &lt;skoduri@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For the lease operation, we allocate and store the child nodes'
data in the lease structure. Use the same in afr_lease_cbk()
while checking for quorum.

Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6
fixes: bz#1644474
Signed-off-by: Soumya Koduri &lt;skoduri@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: prevent winding inodelks twice for arbiter volumes</title>
<updated>2018-10-10T12:55:44+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-10-10T12:27:33+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=5b1a94468863451d1762063e954785f4ef065374'/>
<id>5b1a94468863451d1762063e954785f4ef065374</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/21380/

Problem:
In an arbiter volume, if there is a pending data heal of a file only on
arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks
it only once, leaving behind a stale lock on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code-bug to take the lock only once. This bug was introduced in
master with commit eb472d82a083883335bc494b87ea175ac43471ff.

Thanks to Pranith Kumar K &lt;pkarampu@redhat.com&gt; for finding the RCA.

fixes: bz#1637953
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/21380/

Problem:
In an arbiter volume, if there is a pending data heal of a file only on
arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks
it only once, leaving behind a stale lock on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code-bug to take the lock only once. This bug was introduced in
master with commit eb472d82a083883335bc494b87ea175ac43471ff.

Thanks to Pranith Kumar K &lt;pkarampu@redhat.com&gt; for finding the RCA.

fixes: bz#1637953
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: fix incorrect reporting of directory split-brain</title>
<updated>2018-10-05T14:43:20+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-09-27T12:13:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=e3e13d2d727bab46ce168c4a3b4cce2d476638ca'/>
<id>e3e13d2d727bab46ce168c4a3b4cce2d476638ca</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/21135/

Problem:
When a directory has dirty xattrs due to failed post-ops or when
replace/reset brick is performed, AFR does a conservative merge as
expected, but heal-info reports it as split-brain because there are no
clear sources.

Fix:
Modify the pending flag to contain information about pending heals and
split-brains. For directories, if the split-brain flag is not set, just
show them as needing heal and not as being in split-brain.

Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
fixes: bz#1633634
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/21135/

Problem:
When a directory has dirty xattrs due to failed post-ops or when
replace/reset brick is performed, AFR does a conservative merge as
expected, but heal-info reports it as split-brain because there are no
clear sources.

Fix:
Modify the pending flag to contain information about pending heals and
split-brains. For directories, if the split-brain flag is not set, just
show them as needing heal and not as being in split-brain.

Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
fixes: bz#1633634
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Make data eager-lock decision based on number of locks</title>
<updated>2018-10-05T14:38:52+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-09-18T06:45:57+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=3d2ea677f1fe8fbbd6390eac340c87722ea999db'/>
<id>3d2ea677f1fe8fbbd6390eac340c87722ea999db</id>
<content type='text'>
For both Virt and block workloads the file is opened multiple times,
which leads to eager-lock being dynamically turned off for the
workload. Changing the logic to depend on the number of inodelks
instead of the number of open fds gives better performance than the
earlier logic. When there is an eager-lock and the number of inodelks
is more than 1, we know that there is a conflicting lock, so depend on
that information to decide whether the current transaction should go
through delayed post-op or not.

The locks xlator has no implementation to query the number of locks in
fxattrop in releases older than 3.10, so to keep things backward
compatible in 3.12, data transactions will use the new logic whereas
fxattrop transactions will use the old logic. I am planning to send one
more patch which makes metadata domain locks also depend on
inodelk-count.

Profile info for a dd of 500MB to a file with another fd opened
on the file using exec 250&gt;filename

Without this patch:
 0.14      67.41 us      16.72 us    3870.82 us  892 FINODELK
 0.59     279.87 us      95.71 us    2085.89 us  898 FXATTROP
 3.46     366.43 us      81.75 us    6952.79 us 4000 WRITE
95.79  148733.99 us   50568.12 us  919127.86 us  273 FSYNC

With this patch:
 0.00      51.01 us      38.07 us      80.16 us    4 FINODELK
 0.00     235.43 us     235.43 us     235.43 us    1 TRUNCATE
 0.00     125.07 us      56.80 us     193.33 us    2 GETXATTR
 0.00     135.86 us      62.13 us     209.59 us    2  INODELK
 0.00     197.88 us     155.39 us     253.90 us    4 FXATTROP
 0.00     450.59 us     394.28 us     506.89 us    2  XATTROP
 0.00      56.96 us      19.06 us     406.59 us   23    FLUSH
37.81  273648.93 us      48.43 us 6017657.05 us   44   LOOKUP
62.18    4951.86 us      93.80 us 1143154.75 us 3999    WRITE

PostgreSQL benchmark performance changed from ~1130 TPS to ~2300 TPS.
A random-I/O fio job inside an oVirt-based VM went from ~600 IOPS to ~2000 IOPS.

fixes bz#1635980
Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For both Virt and block workloads the file is opened multiple times,
which leads to eager-lock being dynamically turned off for the
workload. Changing the logic to depend on the number of inodelks
instead of the number of open fds gives better performance than the
earlier logic. When there is an eager-lock and the number of inodelks
is more than 1, we know that there is a conflicting lock, so depend on
that information to decide whether the current transaction should go
through delayed post-op or not.

The locks xlator has no implementation to query the number of locks in
fxattrop in releases older than 3.10, so to keep things backward
compatible in 3.12, data transactions will use the new logic whereas
fxattrop transactions will use the old logic. I am planning to send one
more patch which makes metadata domain locks also depend on
inodelk-count.

Profile info for a dd of 500MB to a file with another fd opened
on the file using exec 250&gt;filename

Without this patch:
 0.14      67.41 us      16.72 us    3870.82 us  892 FINODELK
 0.59     279.87 us      95.71 us    2085.89 us  898 FXATTROP
 3.46     366.43 us      81.75 us    6952.79 us 4000 WRITE
95.79  148733.99 us   50568.12 us  919127.86 us  273 FSYNC

With this patch:
 0.00      51.01 us      38.07 us      80.16 us    4 FINODELK
 0.00     235.43 us     235.43 us     235.43 us    1 TRUNCATE
 0.00     125.07 us      56.80 us     193.33 us    2 GETXATTR
 0.00     135.86 us      62.13 us     209.59 us    2  INODELK
 0.00     197.88 us     155.39 us     253.90 us    4 FXATTROP
 0.00     450.59 us     394.28 us     506.89 us    2  XATTROP
 0.00      56.96 us      19.06 us     406.59 us   23    FLUSH
37.81  273648.93 us      48.43 us 6017657.05 us   44   LOOKUP
62.18    4951.86 us      93.80 us 1143154.75 us 3999    WRITE

PostgreSQL benchmark performance changed from ~1130 TPS to ~2300 TPS.
A random-I/O fio job inside an oVirt-based VM went from ~600 IOPS to ~2000 IOPS.

fixes bz#1635980
Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Batch writes in same lock even when multiple fds are open</title>
<updated>2018-10-05T14:38:52+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-09-06T09:39:42+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8610bafb07334f37f61cce423db4fe7b2a104722'/>
<id>8610bafb07334f37f61cce423db4fe7b2a104722</id>
<content type='text'>
Problem:
When eager-lock is disabled because multiple fds are open and app
writes come on conflicting regions, the number of locks grows very
fast, leading to all the CPU being spent just on locking and unlocking
by traversing huge queues in the locks xlator for granting locks.

Fix:
Reduce the number of locks in transit by bundling the writes in the
same lock and disabling delayed piggy-backing when we learn that
multiple fds are open on the file. This will reduce the size of the
queues in the locks xlator. This also reduces the number of network
calls like inodelk/fxattrop.

Please note that this problem can still happen if eager-lock is
disabled as the writes will not be bundled in the same lock.

fixes bz#1635979
Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When eager-lock is disabled because multiple fds are open and app
writes come on conflicting regions, the number of locks grows very
fast, leading to all the CPU being spent just on locking and unlocking
by traversing huge queues in the locks xlator for granting locks.

Fix:
Reduce the number of locks in transit by bundling the writes in the
same lock and disabling delayed piggy-backing when we learn that
multiple fds are open on the file. This will reduce the size of the
queues in the locks xlator. This also reduces the number of network
calls like inodelk/fxattrop.

Please note that this problem can still happen if eager-lock is
disabled as the writes will not be bundled in the same lock.

fixes bz#1635979
Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Delegate name-heal when possible</title>
<updated>2018-09-21T13:27:00+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-08-27T07:10:16+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a642f594a00e943e1fa1e121a3f4d331fed1c70b'/>
<id>a642f594a00e943e1fa1e121a3f4d331fed1c70b</id>
<content type='text'>
Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when a lot of clients access a directory which
needs name heal and all of them trigger heals and end up
waiting for other clients to complete the heal.

Fix:
When a name-heal is needed but a quorum number of names have the
file and pending xattrs exist on the parent, it is better to
delegate the heal to SHD, where it will be completed as part of
entry-heal of the parent directory. We could also do the same
when a quorum number of names is not present, but we don't have
any known use-case where that is a frequent occurrence, so that
part is not changed at the moment. When there is a gfid
mismatch or a missing gfid it is important to complete the heal
so that the next rename doesn't assume everything is fine and
perform the rename anyway.

fixes bz#1625575
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when a lot of clients access a directory which
needs name heal and all of them trigger heals and end up
waiting for other clients to complete the heal.

Fix:
When a name-heal is needed but a quorum number of names have the
file and pending xattrs exist on the parent, it is better to
delegate the heal to SHD, where it will be completed as part of
entry-heal of the parent directory. We could also do the same
when a quorum number of names is not present, but we don't have
any known use-case where that is a frequent occurrence, so that
part is not changed at the moment. When there is a gfid
mismatch or a missing gfid it is important to complete the heal
so that the next rename doesn't assume everything is fine and
perform the rename anyway.

fixes bz#1625575
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Delegate metadata heal with pending xattrs to SHD</title>
<updated>2018-09-21T13:27:00+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-08-27T06:16:33+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=00dae0e2ea1260a082af33dac5b06ca6aa68ccc3'/>
<id>00dae0e2ea1260a082af33dac5b06ca6aa68ccc3</id>
<content type='text'>
Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when a lot of clients access a directory which
needs metadata heal and all of them trigger heals and end up
waiting for other clients to complete the heal.

Fix:
Trigger the metadata heal that could block lookup only when the heal
is needed but the pending xattrs are not set. This is the only case
where, without a heal, different clients may serve different metadata,
which should be avoided.

Updates bz#1625575
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when a lot of clients access a directory which
needs metadata heal and all of them trigger heals and end up
waiting for other clients to complete the heal.

Fix:
Trigger the metadata heal that could block lookup only when the heal
is needed but the pending xattrs are not set. This is the only case
where, without a heal, different clients may serve different metadata,
which should be avoided.

Updates bz#1625575
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dht: Delete MDS internal xattr from dict in dht_getxattr_cbk</title>
<updated>2018-08-16T14:31:37+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2018-05-30T09:39:29+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=b21b83af0baab3c651b6fd2e4657ca66080a7bcb'/>
<id>b21b83af0baab3c651b6fd2e4657ca66080a7bcb</id>
<content type='text'>
Problem: When AFR fetches xattrs in order to heal them, it is not
         able to fetch the xattr because posix_getxattr has a check
         to ignore the xattr if its name is the MDS internal xattr.

Solution: To keep ignoring the same xattr, add the check in
          dht_getxattr_cbk instead of having it in posix_getxattr.

Backport of:
 &gt; BUG: 1584098
 &gt; Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
 &gt; Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
fixes: bz#1611116
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: When AFR fetches xattrs in order to heal them, it is not
         able to fetch the xattr because posix_getxattr has a check
         to ignore the xattr if its name is the MDS internal xattr.

Solution: To keep ignoring the same xattr, add the check in
          dht_getxattr_cbk instead of having it in posix_getxattr.

Backport of:
 &gt; BUG: 1584098
 &gt; Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
 &gt; Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;

Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
fixes: bz#1611116
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: switch lk_owner only when pre-op succeeds</title>
<updated>2018-07-23T23:42:15+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-07-17T15:05:42+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a8156e843e651be7c18e9e24d08116f752227a35'/>
<id>a8156e843e651be7c18e9e24d08116f752227a35</id>
<content type='text'>
Problem:
In a disk-full scenario, we take a failure path in afr_transaction_perform_fop()
and go to the unlock phase. But we change the lk-owner before that, causing the
unlock to fail. When the mount issues another fop that takes locks on that file,
it hangs.

Fix:
Change lk-owner only when we are about to perform the fop phase.
Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop.

Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise, truncate
to zero will fail the pre-op phase with ENOSPC when the user is actually trying
to free up space.

Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353
fixes: bz#1603056
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit ec0d7d77de3e4bd485a4fa2e53c9137e25c71ce7)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In a disk-full scenario, we take a failure path in afr_transaction_perform_fop()
and go to the unlock phase. But we change the lk-owner before that, causing the
unlock to fail. When the mount issues another fop that takes locks on that file,
it hangs.

Fix:
Change lk-owner only when we are about to perform the fop phase.
Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop.

Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise, truncate
to zero will fail the pre-op phase with ENOSPC when the user is actually trying
to free up space.

Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353
fixes: bz#1603056
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit ec0d7d77de3e4bd485a4fa2e53c9137e25c71ce7)
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Prevent execution of code after call_count decrementing</title>
<updated>2018-07-10T08:49:30+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-07-06T06:58:53+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8eb786940cfd40f07671ef117333657ef4c05218'/>
<id>8eb786940cfd40f07671ef117333657ef4c05218</id>
<content type='text'>
Problem:
When call_count is decremented by one thread, another thread can
go ahead with the operation, leading to undefined behavior for the
thread executing statements after decrementing the call count.

Fix:
Do the operations necessary before decrementing call count.

fixes bz#1599629
Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
(cherry picked from commit 03f1f5bdc46076178f1afdf8e2a76c5b973fe11f)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When call_count is decremented by one thread, another thread can
go ahead with the operation, leading to undefined behavior for the
thread executing statements after decrementing the call count.

Fix:
Do the operations necessary before decrementing call count.

fixes bz#1599629
Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
(cherry picked from commit 03f1f5bdc46076178f1afdf8e2a76c5b973fe11f)
</pre>
</div>
</content>
</entry>
</feed>
