glusterfs.git/tests/basic/afr, branch v5.1

afr: thin-arbiter read txn changes

2018-09-05T08:28:23+00:00

If both data bricks are up, read subvol will be based on read_subvols.

If only one data brick is up:
- First query the data-brick that is up. If it blames the other brick,
allow the reads.

- If it doesn't, query the TA to obtain the source of truth.
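
The read decision above can be sketched as follows. This is only an illustrative model, not the AFR code: the function name, the callback parameters, and the rule that a read fails when the TA blames the surviving brick are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def choose_read_subvol(
    up: Sequence[bool],                         # up[i]: is data brick i reachable?
    read_subvols: Sequence[int],                # readable bricks when both are up
    brick_blames_other: Callable[[int], bool],  # hypothetical query to a data brick
    ta_blames: Callable[[int], bool],           # hypothetical query to the thin-arbiter
) -> Optional[int]:
    """Return the index of the brick to read from, or None to fail the read."""
    if all(up):
        # Both data bricks are up: rely on read_subvols.
        return read_subvols[0] if read_subvols else None
    if not any(up):
        return None
    alive = 0 if up[0] else 1
    # Step 1: ask the brick that is up. If it blames the other brick,
    # it is known to be a good copy and the read is allowed.
    if brick_blames_other(alive):
        return alive
    # Step 2: otherwise the TA is the source of truth (assumed rule:
    # the read is refused if the TA marks the surviving brick as bad).
    return None if ta_blames(alive) else alive
```

Note that the TA is contacted only when the surviving brick cannot prove on its own that it is the good copy.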

TODO: See if in-memory state can be maintained for read txns (BZ 1624358).

updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N

cluster/afr: Delegate name-heal when possible

2018-09-04T01:52:02+00:00

Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when a lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
When a name-heal is needed but a quorum number of names have the
file and pending xattrs exist on the parent, it is better to
delegate the heal to SHD, which will complete it as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present, but we don't have
any known use-case where this is a frequent occurrence, so
that part is not being changed at the moment. When there is a gfid
mismatch or missing gfid, it is important to complete the heal
so that the next rename doesn't assume everything is fine and
perform a rename etc.
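
As a rough sketch of the rule described in the fix, assuming a name-heal has already been found necessary. The enum, the function name, and the parameters are hypothetical and do not correspond to identifiers in the AFR source.

```python
from enum import Enum


class NameHealAction(Enum):
    HEAL_IN_LOOKUP = "heal in lookup"    # blocks the lookup until done
    DELEGATE_TO_SHD = "delegate to SHD"  # healed later by entry-heal of the parent


def name_heal_action(
    names_present: int,               # bricks on which the name exists
    quorum: int,                      # quorum count for the replica
    parent_has_pending_xattrs: bool,
    gfid_mismatch_or_missing: bool,
) -> NameHealAction:
    # A gfid mismatch or missing gfid must be fixed right away so that a
    # following rename does not act on an inconsistent name.
    if gfid_mismatch_or_missing:
        return NameHealAction.HEAL_IN_LOOKUP
    # Quorum of names present and the parent is already marked for heal:
    # SHD will pick it up as part of entry-heal of the parent directory.
    if names_present >= quorum and parent_has_pending_xattrs:
        return NameHealAction.DELEGATE_TO_SHD
    # All other cases (including quorum-number of names absent) keep the
    # existing behaviour of healing in the lookup path.
    return NameHealAction.HEAL_IN_LOOKUP
```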

fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K

cluster/afr: Delegate metadata heal with pending xattrs to SHD

2018-08-28T14:12:55+00:00

Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when a lot of clients are accessing a directory which
needs metadata heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
Only when the heal is needed but the pending xattrs are not set,
trigger metadata heal that could block lookup. This is the only
case where different clients may give different metadata to the
clients without heals, which should be avoided.
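
The condition in the fix reduces to a small predicate. The sketch below uses a hypothetical function name and is meant only to make the truth table explicit.

```python
def metadata_heal_blocks_lookup(heal_needed: bool, pending_xattrs_set: bool) -> bool:
    """True if the mount should run metadata heal in the lookup path.

    When pending xattrs are set, the heal is left to SHD and the
    lookup is not blocked.
    """
    return heal_needed and not pending_xattrs_set


for needed in (False, True):
    for pending in (False, True):
        print(needed, pending, metadata_heal_blocks_lookup(needed, pending))
```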

Updates bz#1622821
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K

performance/readdir-ahead: keep stats of cached dentries in sync with modifications

2018-08-18T07:28:53+00:00

PROBLEM:

Stats of dentries that are readdirp'd ahead can become stale due to
fops like writes, truncate etc. that modify the file pointed to by
dentries. When a readdir is finally wound at offset corresponding to
these entries, the iatts that are returned to the application come
from readdir-ahead's cache, which are stale by now. This problem gets
further aggravated when caching translators/modules cache and continue
to serve this stale information.

FIX:

* Store the iatt in context of the inode pointed by dentry.
* Whenever the inode pointed by dentry undergoes modification, in cbk
  of modification fop, update the iatt stored in inode-ctx to reflect
  the modification.
* When serving a readdirp response to the application, update iatts of
  dentries with the iatts stored in the context of inodes pointed by
  these dentries.
* Some fops don't have valid iatts in their responses. For example, a write
  response whose data is still cached in write-behind will have a zeroed
  out stat. In this case keep only ia_type and ia_gfid and reset the rest
  of the iatt members to zero.
  - fuse-bridge in this case just sends "entry" information back to
    kernel and attr is not sent.
  - gfapi sets entry->inode to NULL and zeroes out the entire stat
* There is one tiny race between the entry creation and a readdirp on
  its parent dir, which could cause the inode-ctx setting and inode
  ctx reading to happen on two different inode objects. To prevent
  this, when entry->inode doesn't equal linked_inode,
  - fuse-bridge is made to send only "entry" information without
    attributes
  - gfapi sets entry->inode to NULL and zeroes out the entire stat.
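
The first four points of the fix can be modelled with a small in-memory cache. This is a simplified sketch and not the readdir-ahead implementation: the `Iatt` fields beyond ia_gfid and ia_type, the class name, and the use of `None` to stand for an invalid (zeroed out) post-op stat are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Iatt:
    ia_gfid: str
    ia_type: str
    ia_size: int = 0
    ia_mtime: int = 0


class InodeCtxCache:
    """Keeps the latest known iatt per inode, keyed by gfid."""

    def __init__(self) -> None:
        self._ctx: Dict[str, Iatt] = {}

    def store(self, iatt: Iatt) -> None:
        # Called when a dentry is read ahead: remember its iatt.
        self._ctx[iatt.ia_gfid] = iatt

    def on_modify(self, gfid: str, poststat: Optional[Iatt]) -> None:
        # Called from the cbk of a modifying fop.
        old = self._ctx.get(gfid)
        if old is None:
            return
        if poststat is None:
            # No valid stat in the response: keep only type and gfid,
            # reset every other member to zero.
            self._ctx[gfid] = Iatt(ia_gfid=old.ia_gfid, ia_type=old.ia_type)
        else:
            self._ctx[gfid] = poststat

    def refresh(self, entries: List[Iatt]) -> List[Iatt]:
        # Called when serving readdirp: replace cached dentry stats with
        # the ones stored in the inode context, if any.
        return [self._ctx.get(e.ia_gfid, e) for e in entries]
```

A consumer that sees an entry with zeroed members is expected to treat its attributes as unknown, which matches the fuse-bridge and gfapi handling listed above.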

Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Raghavendra G

tests: Add thin-arbiter.rc for writing tests for thin-arbiter

2018-08-18T04:20:15+00:00

fixes bz#1615789
Change-Id: I1f42e78fec5ddaf2a425dc4b82c9a20472aa146d
Signed-off-by: Pranith Kumar K

tests: Fix for gfid-mismatch-resolution-with-fav-child-policy.t failure

2018-08-14T04:30:25+00:00

This test was retried once on build
https://build.gluster.org/job/regression-on-demand-multiplex/174/
(logs for the first try are not available with this build)
The test case was failing at line #47, where it was checking for the
heal count to be 0. Line #51 had passed, which means the file got the gfid
split brain resolved and both the bricks had the same gfids.
At line #54 it failed again; that line checks the md5sum on both the
bricks. At this point the md5sum of the brick where the file got
impunged was the same as that of the newly created empty file. This
means the data heal had not happened for the file.
At line #64 enabling granular-entry-heal failed, but without the logs
it is not possible to debug this issue.

Change-Id: I56d854dbb9e188cafedfd24a9d463603ae79bd06
fixes: bz#1615331
Signed-off-by: karthik-us

tests: fix replace-brick-self-heal.t failure

2018-08-13T12:10:13+00:00

Please see BZ for details.

Change-Id: Id9273432874bc6a452ac96b2b8c7a61ea6c5b98d
Fixes: bz#1615239
Signed-off-by: Ravishankar N

tests: potential fixes for tests/basic/afr/add-brick-self-heal.t

2018-08-13T04:56:22+00:00

Please see bug description for details.

Change-Id: Ieb6bce6d1d5c4c31f1878dd1a1c3d007d8ff81d5
fixes: bz#1614654
Signed-off-by: Ravishankar N

tests: Set heal-timeout to 5 seconds

2018-08-09T11:32:09+00:00

Shd keeps doing heals in a loop as long as it healed at least one entry in the
previous run. A heal is termed successful only if it completes both the metadata
heal and the entry/data heal, i.e. the entry needs to be completely healed by
just that healer.

In tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is old and brick-1
is new. After replace-brick only root-gfid will be present in brick-0's index.
1) shd-thread corresponding to brick-0 does metadata heal, this creates
root-gfid in brick-0's 'dirty' index.
2) Both healer threads corresponding to brick-0 and brick-1 now try to heal
root-gfid and brick-1 gets the heal-domain lock. brick-0's shd-thread will
experience a failure and it goes back to waiting for 10 minutes
(cluster.heal-timeout).
3) When brick-1's healer-thread completes healing root-gfid it creates 5 files
which create indices in brick-0, so until brick-0 triggers one more
heal, the heal won't happen. $HEAL_TIMEOUT is set at 120 seconds, which is less
than cluster.heal-timeout, so cluster.heal-timeout is decreased to 5 seconds so
that the next heal is triggered which will do the heals.
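
The timing argument can be checked with simple arithmetic. The sketch below assumes the worst case, where the healer has just gone idle when the test starts waiting; the function and constant names are made up for the example.

```python
def crawl_happens_within(wait_budget_s: int, heal_timeout_s: int) -> bool:
    """True if an idle healer wakes up again before the test stops waiting."""
    return heal_timeout_s <= wait_budget_s


TEST_WAIT_S = 120          # $HEAL_TIMEOUT used by the test
DEFAULT_HEAL_TIMEOUT_S = 600  # 10 minutes, the wait mentioned in step 2
NEW_HEAL_TIMEOUT_S = 5

print(crawl_happens_within(TEST_WAIT_S, DEFAULT_HEAL_TIMEOUT_S))  # False
print(crawl_happens_within(TEST_WAIT_S, NEW_HEAL_TIMEOUT_S))      # True
```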

fixes bz#1613807
Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798
Signed-off-by: Pranith Kumar K

Revert "performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache"

2018-08-03T22:18:11+00:00

This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c.

With the latest master, I created a single brick volume and some files
inside it.

[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
again"; ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
----------. 0 root root 0 Jan  1  1970 file-1
----------. 0 root root 0 Jan  1  1970 file-2
----------. 0 root root 0 Jan  1  1970 file-3
----------. 0 root root 0 Jan  1  1970 file-4
----------. 0 root root 0 Jan  1  1970 file-5
d---------. 0 root root 0 Jan  1  1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
d---------. 0 root root  0 Jan  1  1970 subdir
[root@rhgs313-6 ~]#

The conversation can be followed on gluster-devel in the thread with subject:
tests/bugs/distribute/bug-1122443.t - spurious failure. git bisect
pointed to this patch as the culprit.

Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862
Signed-off-by: Raghavendra G 
Updates: bz#1390050