<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/shard, branch master</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>features/shard: optimization over shard lookup in case of prealloc</title>
<updated>2020-08-20T06:02:19+00:00</updated>
<author>
<name>Vinayakswami Hariharmath</name>
<email>vharihar@redhat.com</email>
</author>
<published>2020-08-06T09:09:59+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=2ede911d07c6dc07a0f729526ab590ace77341ae'/>
<id>2ede911d07c6dc07a0f729526ab590ace77341ae</id>
<content type='text'>
Assume that we are preallocating a VM of size 1TB with a shard
block size of 64MB; then there will be ~16k shards.

This creation happens in 2 steps in the shard_fallocate() path, i.e.

1. lookup for the shards, if any are already present, and
2. mknod over those shards that do not exist.

But in the case of fresh creation, we don't have to look up
shards which are not present, as the file size will be 0.
Through this, we can save a lookup on every shard that is not
present. This optimization is quite useful in the case of
preallocating a big VM.

Also, if the file is already present and the call is to
extend it to a bigger size, then we need not look up non-
existent shards. Just look up the preexisting shards, populate
their inodes and issue mknod for the extended size.
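
A minimal sketch of the idea (toy stubs and units, not the actual
shard_fallocate() code):

    /* Toy sketch (hypothetical helpers): shards past the current EOF
     * cannot exist, so skip their lookups and go straight to mknod. */
    #include &lt;inttypes.h&gt;
    #include &lt;stdio.h&gt;

    static void lookup_shard(uint64_t n) { printf("LOOKUP shard %" PRIu64 "\n", n); }
    static void mknod_shard(uint64_t n) { printf("MKNOD shard %" PRIu64 "\n", n); }

    int main(void)
    {
        uint64_t block_size = 64;   /* toy units */
        uint64_t cur_size = 100;    /* 0 for a fresh prealloc */
        uint64_t new_size = 64 * 5; /* preallocating to 5 shards */

        uint64_t last_existing = cur_size ? (cur_size - 1) / block_size : 0;
        uint64_t last_needed = (new_size - 1) / block_size;

        /* shard 0 is the base file; shards 1.. live under /.shard */
        for (uint64_t i = 1; i &lt;= last_needed; i++) {
            if (cur_size &amp;&amp; i &lt;= last_existing)
                lookup_shard(i); /* may already exist: resolve it */
            else
                mknod_shard(i);  /* beyond EOF: cannot exist, create */
        }
        return 0;
    }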

Fixes: #1425
Change-Id: I60036fe8302c696e0ca80ff11ab0ef5bcdbd7880
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Assume that we are preallocating a VM of size 1TB with a shard
block size of 64MB; then there will be ~16k shards.

This creation happens in 2 steps in the shard_fallocate() path, i.e.

1. lookup for the shards, if any are already present, and
2. mknod over those shards that do not exist.

But in the case of fresh creation, we don't have to look up
shards which are not present, as the file size will be 0.
Through this, we can save a lookup on every shard that is not
present. This optimization is quite useful in the case of
preallocating a big VM.

Also, if the file is already present and the call is to
extend it to a bigger size, then we need not look up non-
existent shards. Just look up the preexisting shards, populate
their inodes and issue mknod for the extended size.
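
A minimal sketch of the idea (toy stubs and units, not the actual
shard_fallocate() code):

    /* Toy sketch (hypothetical helpers): shards past the current EOF
     * cannot exist, so skip their lookups and go straight to mknod. */
    #include &lt;inttypes.h&gt;
    #include &lt;stdio.h&gt;

    static void lookup_shard(uint64_t n) { printf("LOOKUP shard %" PRIu64 "\n", n); }
    static void mknod_shard(uint64_t n) { printf("MKNOD shard %" PRIu64 "\n", n); }

    int main(void)
    {
        uint64_t block_size = 64;   /* toy units */
        uint64_t cur_size = 100;    /* 0 for a fresh prealloc */
        uint64_t new_size = 64 * 5; /* preallocating to 5 shards */

        uint64_t last_existing = cur_size ? (cur_size - 1) / block_size : 0;
        uint64_t last_needed = (new_size - 1) / block_size;

        /* shard 0 is the base file; shards 1.. live under /.shard */
        for (uint64_t i = 1; i &lt;= last_needed; i++) {
            if (cur_size &amp;&amp; i &lt;= last_existing)
                lookup_shard(i); /* may already exist: resolve it */
            else
                mknod_shard(i);  /* beyond EOF: cannot exist, create */
        }
        return 0;
    }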

Fixes: #1425
Change-Id: I60036fe8302c696e0ca80ff11ab0ef5bcdbd7880
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Use fd lookup post file open</title>
<updated>2020-06-11T11:20:41+00:00</updated>
<author>
<name>Vinayakswami Hariharmath</name>
<email>vharihar@redhat.com</email>
</author>
<published>2020-06-03T13:28:56+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=71dd19f710b81136f318b3a95ae430971198ee70'/>
<id>71dd19f710b81136f318b3a95ae430971198ee70</id>
<content type='text'>
Issue:
When a process has an open fd on a file and the same file is
unlinked in the middle of its operations, name-based
lookup fails with ENOENT or ESTALE (stale file handle).

Solution:
When the file is already open and an fd is available, use fstat
to get the file attributes.
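
A small, self-contained illustration of the principle behind the fix
(plain POSIX calls, not the translator code itself):

    /* POSIX demo (not gluster code): once a file is open, fstat(2) on
     * the fd keeps working even after the path is unlinked, while a
     * fresh name-based stat(2) fails with ENOENT. */
    #include &lt;fcntl.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/stat.h&gt;
    #include &lt;unistd.h&gt;

    int main(void)
    {
        const char *path = "/tmp/fstat-demo";
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd &lt; 0) { perror("open"); return 1; }

        unlink(path); /* the name is gone from the namespace ... */

        struct stat st;
        if (stat(path, &amp;st) &lt; 0)  /* ... so name-based lookup fails */
            perror("stat");       /* expected: ENOENT */
        if (fstat(fd, &amp;st) == 0)  /* ... but the open fd still works */
            printf("fstat ok, size=%lld\n", (long long)st.st_size);

        close(fd);
        return 0;
    }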

Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4
Fixes: #1281
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Issue:
When a process has an open fd on a file and the same file is
unlinked in the middle of its operations, name-based
lookup fails with ENOENT or ESTALE (stale file handle).

Solution:
When the file is already open and an fd is available, use fstat
to get the file attributes.
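
A small, self-contained illustration of the principle behind the fix
(plain POSIX calls, not the translator code itself):

    /* POSIX demo (not gluster code): once a file is open, fstat(2) on
     * the fd keeps working even after the path is unlinked, while a
     * fresh name-based stat(2) fails with ENOENT. */
    #include &lt;fcntl.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/stat.h&gt;
    #include &lt;unistd.h&gt;

    int main(void)
    {
        const char *path = "/tmp/fstat-demo";
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd &lt; 0) { perror("open"); return 1; }

        unlink(path); /* the name is gone from the namespace ... */

        struct stat st;
        if (stat(path, &amp;st) &lt; 0)  /* ... so name-based lookup fails */
            perror("stat");       /* expected: ENOENT */
        if (fstat(fd, &amp;st) == 0)  /* ... but the open fd still works */
            printf("fstat ok, size=%lld\n", (long long)st.st_size);

        close(fd);
        return 0;
    }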

Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4
Fixes: #1281
Signed-off-by: Vinayakswami Hariharmath &lt;vharihar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Aggregate file size, block-count before unwinding removexattr</title>
<updated>2020-05-26T09:47:44+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-05-22T07:55:26+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=32519525108a2ac6bcc64ad931dc8048d33d64de'/>
<id>32519525108a2ac6bcc64ad931dc8048d33d64de</id>
<content type='text'>
The posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
The shard translator, in its current state, passes these values through without
updating them with the aggregated file size and block-count.

This patch fixes this problem.
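
A rough sketch of the idea (toy types and names, not the shard
translator's actual code):

    /* Toy model (hypothetical struct, not gluster's iatt): before
     * unwinding, overwrite the sizes in the returned iatts with the
     * shard-aggregated values, so caches like md-cache see the size
     * of the whole file rather than of the base shard. */
    #include &lt;stdint.h&gt;

    struct iatt {
        uint64_t ia_size;
        uint64_t ia_blocks;
    };

    static void
    set_aggregated_sizes(struct iatt *prebuf, struct iatt *postbuf,
                         uint64_t agg_size, uint64_t agg_blocks)
    {
        prebuf-&gt;ia_size = agg_size;
        prebuf-&gt;ia_blocks = agg_blocks;
        postbuf-&gt;ia_size = agg_size;
        postbuf-&gt;ia_blocks = agg_blocks;
    }

    int main(void)
    {
        struct iatt pre = {0}, post = {0};
        set_aggregated_sizes(&amp;pre, &amp;post, 1048576, 2048);
        return post.ia_size == 1048576 ? 0 : 1;
    }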

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
The shard translator, in its current state, passes these values through without
updating them with the aggregated file size and block-count.

This patch fixes this problem.
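
A rough sketch of the idea (toy types and names, not the shard
translator's actual code):

    /* Toy model (hypothetical struct, not gluster's iatt): before
     * unwinding, overwrite the sizes in the returned iatts with the
     * shard-aggregated values, so caches like md-cache see the size
     * of the whole file rather than of the base shard. */
    #include &lt;stdint.h&gt;

    struct iatt {
        uint64_t ia_size;
        uint64_t ia_blocks;
    };

    static void
    set_aggregated_sizes(struct iatt *prebuf, struct iatt *postbuf,
                         uint64_t agg_size, uint64_t agg_blocks)
    {
        prebuf-&gt;ia_size = agg_size;
        prebuf-&gt;ia_blocks = agg_blocks;
        postbuf-&gt;ia_size = agg_size;
        postbuf-&gt;ia_blocks = agg_blocks;
    }

    int main(void)
    {
        struct iatt pre = {0}, post = {0};
        set_aggregated_sizes(&amp;pre, &amp;post, 1048576, 2048);
        return post.ia_size == 1048576 ? 0 : 1;
    }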

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Aggregate size, block-count in iatt before unwinding setxattr</title>
<updated>2020-05-21T20:06:32+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-05-15T05:59:36+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=29ec66c6ab77e2d6893c6e213a3d1fb148702c99'/>
<id>29ec66c6ab77e2d6893c6e213a3d1fb148702c99</id>
<content type='text'>
The posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
The shard translator, in its current state, passes these values through without
updating them with the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
The shard translator, in its current state, passes these values through without
updating them with the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: do not truncate file offsets and sizes to 32-bit</title>
<updated>2020-04-15T07:58:33+00:00</updated>
<author>
<name>Dmitry Antipov</name>
<email>dmantipov@yandex.ru</email>
</author>
<published>2020-04-10T08:13:12+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=0407d10f147862eb1c7854aed3adfbc3e1503c5d'/>
<id>0407d10f147862eb1c7854aed3adfbc3e1503c5d</id>
<content type='text'>
Do not truncate file offsets and sizes to 32 bits, to
prevent spurious test failures on &gt;2GB files.
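
A minimal illustration of the bug class (illustration only, not the
test code itself):

    /* On a 32-bit type, large offsets silently wrap; off_t with
     * _FILE_OFFSET_BITS=64 (or an explicit 64-bit type) stays exact. */
    #define _FILE_OFFSET_BITS 64 /* must precede any system header */
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;

    int main(void)
    {
        uint64_t five_gib = 5ULL * 1024 * 1024 * 1024;
        uint32_t wrapped = (uint32_t)five_gib; /* wraps to 1 GiB */
        off_t safe = (off_t)five_gib;          /* 64-bit even on 32-bit hosts */
        printf("wrapped=%u safe=%lld\n", wrapped, (long long)safe);
        return 0;
    }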

Signed-off-by: Dmitry Antipov &lt;dmantipov@yandex.ru&gt;

Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2
Fixes: #1161
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not truncate file offsets and sizes to 32 bits, to
prevent spurious test failures on &gt;2GB files.
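
A minimal illustration of the bug class (illustration only, not the
test code itself):

    /* On a 32-bit type, large offsets silently wrap; off_t with
     * _FILE_OFFSET_BITS=64 (or an explicit 64-bit type) stays exact. */
    #define _FILE_OFFSET_BITS 64 /* must precede any system header */
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;

    int main(void)
    {
        uint64_t five_gib = 5ULL * 1024 * 1024 * 1024;
        uint32_t wrapped = (uint32_t)five_gib; /* wraps to 1 GiB */
        off_t safe = (off_t)five_gib;          /* 64-bit even on 32-bit hosts */
        printf("wrapped=%u safe=%lld\n", wrapped, (long long)safe);
        return 0;
    }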

Signed-off-by: Dmitry Antipov &lt;dmantipov@yandex.ru&gt;

Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2
Fixes: #1161
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: Specify bs for dd</title>
<updated>2019-10-16T06:49:55+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-10-15T10:41:00+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9e7f5346910b337f87c10e7ae39eecf0bd829e01'/>
<id>9e7f5346910b337f87c10e7ae39eecf0bd829e01</id>
<content type='text'>
On some distros, dd's default block size (bs) makes writes very slow,
and the test takes close to 2 minutes instead of 20 seconds.

fixes: bz#1761769
Change-Id: If10d595a7ca05f053237f3c5ffbb09c5151eab35
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On some distros, dd's default block size (bs) makes writes very slow,
and the test takes close to 2 minutes instead of 20 seconds.

fixes: bz#1761769
Change-Id: If10d595a7ca05f053237f3c5ffbb09c5151eab35
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests/shard: Remove dependence on distributed cache</title>
<updated>2019-09-27T05:42:24+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-09-27T05:22:05+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=227713229a7bba19426186f1615fd891e4e52bb7'/>
<id>227713229a7bba19426186f1615fd891e4e52bb7</id>
<content type='text'>
fixes: bz#1756211
Change-Id: Iee5b37af89ab624c16a45df364806003238280e5
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fixes: bz#1756211
Change-Id: Iee5b37af89ab624c16a45df364806003238280e5
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Send correct size when reads are sent beyond file size</title>
<updated>2019-08-12T13:30:20+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-08-07T06:42:43+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4'/>
<id>51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4</id>
<content type='text'>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1738419
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1738419
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix extra unref when inode object is lru'd out and added back</title>
<updated>2019-06-09T17:28:07+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-04-05T06:59:23+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=dc119e9c2f3db6d029ab1c5a81c171180db58192'/>
<id>dc119e9c2f3db6d029ab1c5a81c171180db58192</id>
<content type='text'>
Long tale of double unref! But do read...

In cases where a shard base inode is evicted from the lru list while still
being part of the fsync list, but is added back soon before its unlink,
there could be an extra inode_unref(), leading to premature inode
destruction and a crash.

One such specific case is the following -

Consider features.shard-deletion-rate = features.shard-lru-limit = 2.
This is an oversimplified example but explains the problem clearly.

First, a file is FALLOCATE'd to a size such that the number of shards under
/.shard = 3 &gt; lru-limit.
Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
Resultant lru list:
                               1 -----&gt; 2
refs on base inode -          (1)  +   (1) = 2
3 needs to be resolved. So 1 is lru'd out. Resultant lru list -
                               2 -----&gt; 3
refs on base inode -          (1)  +   (1) = 2

Note that 1 is inode_unlink()d but not destroyed because there are
non-zero refs on it since it is still participating in this ongoing
FALLOCATE operation.

FALLOCATE is sent on all participant shards. In the cbk, all of them are
added to the fsync_list.
Resulting fsync list -
                               1 -----&gt; 2 -----&gt; 3 (order doesn't matter)
refs on base inode -          (1)  +   (1)  +   (1) = 3
Total refs = 3 + 2 = 5

Now an attempt is made to unlink this file. Background deletion is triggered.
The first $shard-deletion-rate shards need to be unlinked in the first batch.
So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds
on 2, and so it's moved to the tail of the list.
lru list now -
                              3 -----&gt; 2
No change in refs.

Shard 1 is looked up. In lookup_cbk, it's linked and added back to the lru
list at the cost of evicting shard 3.
lru list now -
                              2 -----&gt; 1
refs on base inode:          (1)  +   (1) = 2
fsync list now -
                              1 -----&gt; 2 (again order doesn't matter)
refs on base inode -         (1)  +   (1) = 2
Total refs = 2 + 2 = 4
After eviction, it is found that 3 needs an fsync. So the fsync is wound but
not yet ack'd, and 3 is therefore still inode_link()d.

Now deletion of shards 1 and 2 completes. The lru list is empty. The base
inode is unref'd and destroyed.
In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able.
It is added back to the lru list, but the base inode passed to
__shard_update_shards_inode_list() is NULL since that inode was destroyed.
However, its ctx-&gt;inode still contains the base inode ptr from its first
addition to the lru list, even though no additional ref was taken on it.
lru list now -
                              3
refs on base inode -         (0)
Total refs on base inode = 0
Unlink is sent on 3 and completes. Now, since the ctx contains a ptr to the
base_inode and the shard is part of the lru list, the base inode is unref'd
once more, leading to a crash.

FIX:
When a shard is re-added to the lru list, copy the base inode pointer as-is
into its inode ctx, even if it is NULL. This is needed to prevent double
unrefs at the time of deleting it.
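
A toy model of the failure mode (not gluster's inode table; no xlator
machinery, just refcounts):

    /* Toy model: a stale cached pointer plus an unref that was never
     * matched by a ref drops the count past zero while someone still
     * holds the object, which is the shape of this crash. */
    #include &lt;stdio.h&gt;

    struct toy_inode { int refs; };

    static void toy_unref(struct toy_inode *i)
    {
        if (!i)
            return;
        i-&gt;refs--;
        if (i-&gt;refs == 0)
            printf("inode destroyed\n");
        else if (i-&gt;refs &lt; 0)
            printf("BUG: unref past zero (the crash)\n");
    }

    int main(void)
    {
        struct toy_inode base = { .refs = 1 };  /* the lru list's ref */
        struct toy_inode *stale_ctx = &amp;base;    /* cached, but no ref taken */

        toy_unref(&amp;base);     /* lru list empties: inode destroyed */
        toy_unref(stale_ctx); /* the extra, unmatched unref */
        return 0;
    }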

Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Long tale of double unref! But do read...

In cases where a shard base inode is evicted from the lru list while still
being part of the fsync list, but is added back soon before its unlink,
there could be an extra inode_unref(), leading to premature inode
destruction and a crash.

One such specific case is the following -

Consider features.shard-deletion-rate = features.shard-lru-limit = 2.
This is an oversimplified example but explains the problem clearly.

First, a file is FALLOCATE'd to a size such that the number of shards under
/.shard = 3 &gt; lru-limit.
Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
Resultant lru list:
                               1 -----&gt; 2
refs on base inode -          (1)  +   (1) = 2
3 needs to be resolved. So 1 is lru'd out. Resultant lru list -
                               2 -----&gt; 3
refs on base inode -          (1)  +   (1) = 2

Note that 1 is inode_unlink()d but not destroyed because there are
non-zero refs on it since it is still participating in this ongoing
FALLOCATE operation.

FALLOCATE is sent on all participant shards. In the cbk, all of them are
added to the fsync_list.
Resulting fsync list -
                               1 -----&gt; 2 -----&gt; 3 (order doesn't matter)
refs on base inode -          (1)  +   (1)  +   (1) = 3
Total refs = 3 + 2 = 5

Now an attempt is made to unlink this file. Background deletion is triggered.
The first $shard-deletion-rate shards need to be unlinked in the first batch.
So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds
on 2, and so it's moved to the tail of the list.
lru list now -
                              3 -----&gt; 2
No change in refs.

Shard 1 is looked up. In lookup_cbk, it's linked and added back to the lru
list at the cost of evicting shard 3.
lru list now -
                              2 -----&gt; 1
refs on base inode:          (1)  +   (1) = 2
fsync list now -
                              1 -----&gt; 2 (again order doesn't matter)
refs on base inode -         (1)  +   (1) = 2
Total refs = 2 + 2 = 4
After eviction, it is found that 3 needs an fsync. So the fsync is wound but
not yet ack'd, and 3 is therefore still inode_link()d.

Now deletion of shards 1 and 2 completes. The lru list is empty. The base
inode is unref'd and destroyed.
In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able.
It is added back to the lru list, but the base inode passed to
__shard_update_shards_inode_list() is NULL since that inode was destroyed.
However, its ctx-&gt;inode still contains the base inode ptr from its first
addition to the lru list, even though no additional ref was taken on it.
lru list now -
                              3
refs on base inode -         (0)
Total refs on base inode = 0
Unlink is sent on 3 and completes. Now, since the ctx contains a ptr to the
base_inode and the shard is part of the lru list, the base inode is unref'd
once more, leading to a crash.

FIX:
When a shard is re-added to the lru list, copy the base inode pointer as-is
into its inode ctx, even if it is NULL. This is needed to prevent double
unrefs at the time of deleting it.
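
A toy model of the failure mode (not gluster's inode table; no xlator
machinery, just refcounts):

    /* Toy model: a stale cached pointer plus an unref that was never
     * matched by a ref drops the count past zero while someone still
     * holds the object, which is the shape of this crash. */
    #include &lt;stdio.h&gt;

    struct toy_inode { int refs; };

    static void toy_unref(struct toy_inode *i)
    {
        if (!i)
            return;
        i-&gt;refs--;
        if (i-&gt;refs == 0)
            printf("inode destroyed\n");
        else if (i-&gt;refs &lt; 0)
            printf("BUG: unref past zero (the crash)\n");
    }

    int main(void)
    {
        struct toy_inode base = { .refs = 1 };  /* the lru list's ref */
        struct toy_inode *stale_ctx = &amp;base;    /* cached, but no ref taken */

        toy_unref(&amp;base);     /* lru list empties: inode destroyed */
        toy_unref(stale_ctx); /* the extra, unmatched unref */
        return 0;
    }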

Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix block-count accounting upon truncate to lower size</title>
<updated>2019-06-04T07:30:12+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-05-08T07:30:51+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=400b66d568ad18fefcb59949d1f8368d487b9a80'/>
<id>400b66d568ad18fefcb59949d1f8368d487b9a80</id>
<content type='text'>
The way delta_blocks is computed in shard is incorrect when a file
is truncated to a lower size. The accounting only considers the change
in size of the last of the truncated shards.

FIX:

Get the block-count of each shard just before its unlink at posix, in
xdata. Their summation, plus the change in size of the last shard
(from an actual truncate), is used to compute delta_blocks, which is
used in the xattrop for the size update.
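
A minimal sketch of the arithmetic (hypothetical helper, not the
translator's code):

    /* Hypothetical helper: delta_blocks for a truncate to a lower
     * size is the blocks freed by the unlinked shards plus the
     * (negative) change in the last surviving shard. */
    #include &lt;stddef.h&gt;
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;

    static int64_t
    compute_delta_blocks(const int64_t *unlinked_shard_blocks, size_t n,
                         int64_t last_shard_blocks_before,
                         int64_t last_shard_blocks_after)
    {
        int64_t delta = last_shard_blocks_after - last_shard_blocks_before;
        for (size_t i = 0; i &lt; n; i++)
            delta -= unlinked_shard_blocks[i]; /* freed by unlink */
        return delta; /* negative for a shrinking truncate */
    }

    int main(void)
    {
        int64_t unlinked[] = { 128, 128 }; /* two shards go away */
        /* last kept shard shrinks from 128 to 40 blocks: -344 total */
        printf("%lld\n", (long long)compute_delta_blocks(unlinked, 2, 128, 40));
        return 0;
    }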

Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The way delta_blocks is computed in shard is incorrect when a file
is truncated to a lower size. The accounting only considers the change
in size of the last of the truncated shards.

FIX:

Get the block-count of each shard just before its unlink at posix, in
xdata. Their summation, plus the change in size of the last shard
(from an actual truncate), is used to compute delta_blocks, which is
used in the xattrop for the size update.
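
A minimal sketch of the arithmetic (hypothetical helper, not the
translator's code):

    /* Hypothetical helper: delta_blocks for a truncate to a lower
     * size is the blocks freed by the unlinked shards plus the
     * (negative) change in the last surviving shard. */
    #include &lt;stddef.h&gt;
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;

    static int64_t
    compute_delta_blocks(const int64_t *unlinked_shard_blocks, size_t n,
                         int64_t last_shard_blocks_before,
                         int64_t last_shard_blocks_after)
    {
        int64_t delta = last_shard_blocks_after - last_shard_blocks_before;
        for (size_t i = 0; i &lt; n; i++)
            delta -= unlinked_shard_blocks[i]; /* freed by unlink */
        return delta; /* negative for a shrinking truncate */
    }

    int main(void)
    {
        int64_t unlinked[] = { 128, 128 }; /* two shards go away */
        /* last kept shard shrinks from 128 to 40 blocks: -344 total */
        printf("%lld\n", (long long)compute_delta_blocks(unlinked, 2, 128, 40));
        return 0;
    }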

Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
