<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs, branch v7.0rc0</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>afr: restore timestamp of parent dir during entry-heal</title>
<updated>2019-08-21T11:45:24+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-07-30T11:35:22+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=044c66460560ee06968a3ff5801191c70fd5e628'/>
<id>044c66460560ee06968a3ff5801191c70fd5e628</id>
<content type='text'>
Fixes: bz#1741041
Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)
</content>
</entry>
<entry>
<title>features/shard: Send correct size when reads are sent beyond file size</title>
<updated>2019-08-21T11:41:58+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-08-07T06:42:43+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=175b796d08ade751ed4929661f246f33d9bf420f'/>
<id>175b796d08ade751ed4929661f246f33d9bf420f</id>
<content type='text'>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1740316
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
</content>
</entry>
<entry>
<title>cluster/afr: Fix incorrect reporting of gfid &amp; type mismatch</title>
<updated>2019-07-20T07:35:44+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-06-12T06:27:02+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=908d3bc0b34489c506fafa0f796f6d710768d934'/>
<id>908d3bc0b34489c506fafa0f796f6d710768d934</id>
<content type='text'>
Problems:
1. When checking for type and gfid mismatch, if the type or gfid
is unknown because the gfid handle and the gfid xattr are missing,
it will be reported as a type or gfid mismatch and the heal will
not complete.

2. If the source selected during entry heal has a null gfid, the same
will be sent to afr_lookup_and_heal_gfid(). In this function, when
we try to assign the gfid on the bricks where it does not exist,
we use that same null gfid and try to assign it on those bricks.
This will fail in posix_gfid_set() since the gfid sent is null.

Fix:
If the gfid sent to afr_lookup_and_heal_gfid() is null, choose a
valid gfid before proceeding to assign the gfid on the bricks
where it is missing.

In afr_selfheal_detect_gfid_and_type_mismatch(), do not report
type/gfid mismatch if the type/gfid is unknown or not set.
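
A minimal standalone C sketch of the two guards (helper names here are
hypothetical; the real checks live in afr_selfheal_detect_gfid_and_type_mismatch()
and afr_lookup_and_heal_gfid()):

#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

typedef uint8_t gfid_t[16];

/* A gfid is "unknown" when it is all zeros (null gfid). */
static int gfid_is_null(const gfid_t g) {
    static const gfid_t null_gfid = {0};
    return memcmp(g, null_gfid, sizeof(gfid_t)) == 0;
}

/* Guard 1: never report a mismatch against an unknown gfid. */
static int gfid_mismatch(const gfid_t a, const gfid_t b) {
    if (gfid_is_null(a) || gfid_is_null(b))
        return 0; /* unknown, not a mismatch */
    return memcmp(a, b, sizeof(gfid_t)) != 0;
}

/* Guard 2: if the chosen source has a null gfid, fall back to any
 * brick that has a valid one before trying to assign it. */
static const uint8_t *pick_valid_gfid(gfid_t bricks[], int n, int source) {
    if (!gfid_is_null(bricks[source]))
        return bricks[source];
    for (int i = 0; i &lt; n; i++)
        if (!gfid_is_null(bricks[i]))
            return bricks[i];
    return NULL; /* no brick knows the gfid yet */
}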

Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da
fixes: bz#1729481
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix bug-1717819-metadata-split-brain-detection.t failure</title>
<updated>2019-07-15T09:16:38+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-07-15T06:21:58+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=dab7adafff8b70b62ebd73c24991ee352a705757'/>
<id>dab7adafff8b70b62ebd73c24991ee352a705757</id>
<content type='text'>
Problem:
tests/bugs/replicate/bug-1717819-metadata-split-brain-detection.t fails
intermittently in test cases #49 &amp; #50, which compare the user-set
xattr values after enabling the heal. We are not waiting for the heal
to complete before comparing those values, which might lead those
tests to fail.

Fix:
Wait till HEAL-TIMEOUT before comparing the xattr values. Also check
that the shd comes up and that the bricks connect to the shd process
in another case.

Change-Id: I0e245b328da9df23ce70c5300278fad1c1d9f7ff
fixes: bz#1729895
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>test: Fix spurious failures in bug-1040275-brick-uid-reset-on-volume-restart.t</title>
<updated>2019-07-09T09:09:37+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawal@redhat.com</email>
</author>
<published>2019-06-24T15:38:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=56e3d63fe9961831ad8c773eaadf3c0513f248a6'/>
<id>56e3d63fe9961831ad8c773eaadf3c0513f248a6</id>
<content type='text'>
Problem: the test case fails when the stat command is run on the
         mount point just after starting the volume, and the client
         gets the error "transport endpoint is not connected"

Solution: To avoid the error, make sure all brick instances are up
          and the mount point is active

&gt; Change-Id: I49553a04d5b13e155ee02f4a1888a07fe3ee2ff5
&gt; fixes: bz#1721590
&gt; Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
&gt; (cherry picked from commit 283b77805cca3027e333a11c9b00ac611662c9ee)

Change-Id: I49553a04d5b13e155ee02f4a1888a07fe3ee2ff5
fixes: bz#1728182
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</content>
</entry>
<entry>
<title>encryption/crypt: remove from volume file</title>
<updated>2019-06-20T11:51:33+00:00</updated>
<author>
<name>Amar Tumballi</name>
<email>amarts@redhat.com</email>
</author>
<published>2019-06-17T11:19:48+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1725880dabd2bac8739043c4cb5f9d844557f86e'/>
<id>1725880dabd2bac8739043c4cb5f9d844557f86e</id>
<content type='text'>
The feature is not supported and has been moved out of the codebase
since the glusterfs-5.x release. It doesn't make sense to keep the
code to support it.

For those who want to upgrade from a version supporting it to a higher
version, please do a 'gluster volume reset $VOL encryption reset' and
then continue with the upgrade process.

updates: bz#1648169
Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1
Signed-off-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile</title>
<updated>2019-06-17T10:27:15+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2019-06-11T04:22:06+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1984fddf10fd3ed8ddedfab45424db9656b271a8'/>
<id>1984fddf10fd3ed8ddedfab45424db9656b271a8</id>
<content type='text'>
... without which volume creation fails with "volume create: &lt;xyz&gt;: failed:
Failed to create volume files"
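
An illustrative C sketch of the omission (the enum values mirror
glusterd's transport types, but this switch is a simplified stand-in
for the real glusterd_get_gfproxy_client_volfile() logic):

typedef enum {
    GF_TRANSPORT_TCP,
    GF_TRANSPORT_RDMA,
    GF_TRANSPORT_BOTH_TCP_RDMA,
} gf_transport_type;

/* Without an arm for BOTH_TCP_RDMA, ret stays -1 and volume creation
 * fails while generating the client volfiles. */
static int gfproxy_client_volfile_ok(gf_transport_type type) {
    int ret = -1;
    switch (type) {
    case GF_TRANSPORT_TCP:
    case GF_TRANSPORT_BOTH_TCP_RDMA: /* the case this patch adds */
        ret = 0;
        break;
    case GF_TRANSPORT_RDMA:
        ret = 0;
        break;
    }
    return ret;
}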

Fixes: bz#1716812
Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Add missing NFS test tag to the testfile</title>
<updated>2019-06-15T03:56:13+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2019-06-14T12:04:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=e7211901e9f987d0efcd7d699c995634fc9f24d6'/>
<id>e7211901e9f987d0efcd7d699c995634fc9f24d6</id>
<content type='text'>
$SRC/glusterfs/bugs/nfs/showmount-many-clients.t

Change-Id: I48758cc66fcb55f48c4a8a0a738b06867f6814a1
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
Updates: bz#1193929
</content>
</entry>
<entry>
<title>Cluster/afr: Don't treat all bricks having metadata pending as split-brain</title>
<updated>2019-06-10T14:48:11+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-06-06T05:29:42+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1b0b869d91d4e5bedc69922128551602dc4bbc13'/>
<id>1b0b869d91d4e5bedc69922128551602dc4bbc13</id>
<content type='text'>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as the sink. If there is no majority, all are sources.
Once we pick a source, self-heal will then do the heal instead of erroring
out due to split-brain.
This patch also adds the restriction that all the bricks must be up to
perform metadata heal, to avoid any metadata loss.
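
A standalone C sketch of the source-picking rule for a 3-brick replica
(hypothetical names; the real logic lives in afr's metadata self-heal):

#define BRICKS 3

/* blames[i][j] != 0 means brick i blames brick j for metadata. */
static void pick_sources(int blames[BRICKS][BRICKS], int is_source[BRICKS]) {
    for (int j = 0; j &lt; BRICKS; j++) {
        int blamed_by = 0;
        for (int i = 0; i &lt; BRICKS; i++)
            if (i != j &amp;&amp; blames[i][j])
                blamed_by++;
        /* Blamed by both of the other bricks (a majority) makes the
         * brick a sink; with no majority against anyone, every brick
         * stays a source and heal proceeds instead of erroring out. */
        is_source[j] = (blamed_by &lt; 2);
    }
}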

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1717819
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>features/shard: Fix extra unref when inode object is lru'd out and added back</title>
<updated>2019-06-09T17:28:07+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-04-05T06:59:23+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=dc119e9c2f3db6d029ab1c5a81c171180db58192'/>
<id>dc119e9c2f3db6d029ab1c5a81c171180db58192</id>
<content type='text'>
Long tale of double unref! But do read...

In cases where a shard base inode is evicted from the lru list while
still being part of the fsync list but is added back shortly before its
unlink, there could be an extra inode_unref() leading to premature
inode destruction and a crash.

One such specific case is the following -

Consider features.shard-deletion-rate = features.shard-lru-limit = 2.
This is an oversimplified example but explains the problem clearly.

First, a file is FALLOCATE'd to a size such that the number of shards
under /.shard = 3 &gt; lru-limit.
Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
Resultant lru list:
                               1 -----&gt; 2
refs on base inode -          (1)  +   (1) = 2
3 needs to be resolved. So 1 is lru'd out. Resultant lru list -
                               2 -----&gt; 3
refs on base inode -          (1)  +   (1) = 2

Note that 1 is inode_unlink()d but not destroyed because there are
non-zero refs on it since it is still participating in this ongoing
FALLOCATE operation.

FALLOCATE is sent on all participant shards. In the cbk, all of them are
added to the fsync list.
Resulting fsync list -
                               1 -----&gt; 2 -----&gt; 3 (order doesn't matter)
refs on base inode -          (1)  +   (1)  +   (1) = 3
Total refs = 3 + 2 = 5

Now an attempt is made to unlink this file. Background deletion is triggered.
The first $shard-deletion-rate shards need to be unlinked in the first batch.
So shards 1 and 2 need to be resolved. inode_resolve() fails on 1 but
succeeds on 2, and so it's moved to the tail of the list.
lru list now -
                              3 -----&gt; 2
No change in refs.

shard 1 is looked up. In lookup_cbk, it's linked and added back to lru list
at the cost of evicting shard 3.
lru list now -
                              2 -----&gt; 1
refs on base inode:          (1)  +   (1) = 2
fsync list now -
                              1 -----&gt; 2 (again order doesn't matter)
refs on base inode -         (1)  +   (1) = 2
Total refs = 2 + 2 = 4
After eviction, it is found that 3 needs fsync. So fsync is wound, yet to be ack'd.
So it is still inode_link()d.

Now deletion of shards 1 and 2 completes. lru list is empty. Base inode unref'd and
destroyed.
In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able.
It is added back to the lru list, but the base inode passed to
__shard_update_shards_inode_list() is NULL since the inode is destroyed. But
its ctx-&gt;inode still contains the base inode ptr from the first addition to
the lru list, with no additional ref on it.
lru list now -
                              3
refs on base inode -         (0)
Total refs on base inode = 0
Unlink is sent on 3. It completes. Now, since the ctx contains a ptr to the
base inode and the shard is part of the lru list, the base inode is unref'd,
leading to a crash.

FIX:
When a shard is re-added to the lru list, copy the base inode pointer as-is
into its inode ctx, even if it is NULL. This is needed to prevent double
unrefs at the time of deleting it.
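
A hedged C sketch of the fix with simplified types (the hypothetical
shard_ctx_set_base() stands in for the real assignment inside
__shard_update_shards_inode_list()):

typedef struct inode inode_t;

typedef struct {
    inode_t *base_inode; /* consulted by the unlink/unref path */
} shard_inode_ctx_t;

static void shard_ctx_set_base(shard_inode_ctx_t *ctx, inode_t *base_inode) {
    /* Copy the pointer as-is, even when NULL: a NULL records that the
     * base inode is already destroyed, so the deletion path skips the
     * extra unref instead of double-unref'ing a freed inode. */
    ctx-&gt;base_inode = base_inode;
}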

Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
</entry>
</feed>
