glusterfs.git/xlators/features/shard, branch master

features/shard: optimization over shard lookup in case of prealloc

2020-08-20T06:02:19+00:00

Assume that we are preallocating a VM of size 1TB with a shard
block size of 64MB then there will be ~16k shards.

This creation happens in 2 steps shard_fallocate() path i.e

1. lookup for the shards if any already present and
2. mknod over those shards do not exist.

But in case of fresh creation, we dont have to lookup for all
shards which are not present as the the file size will be 0.
Through this, we can save lookup on all shards which are not
present. This optimization is quite useful in the case of
preallocating big vm.

Also if the file is already present and the call is to
extend it to bigger size then we need not to lookup for non-
existent shards. Just lookup preexisting shards, populate
the inodes and issue mknod on extended size.

Fixes: #1425
Change-Id: I60036fe8302c696e0ca80ff11ab0ef5bcdbd7880
Signed-off-by: Vinayakswami Hariharmath

features/shard: Convert shard block indices to uint64

2020-07-08T13:28:04+00:00

This patch fixes a crash in FOPs that operate on really large sharded
files where number of participant shards could sometimes exceed
signed int32 max.

The patch also adds GF_ASSERTs to ensure that number of participating
shards is always greater than 0 for files that do have more than one
shard.

Change-Id: I354de58796f350eb1aa42fcdf8092ca2e69ccbb6
Fixes: #1348
Signed-off-by: Krutika Dhananjay

features/shard: Use fd lookup post file open

2020-06-11T11:20:41+00:00

Issue:
When a process has the open fd and the same file is
unlinked in middle of the operations, then file based
lookup fails with ENOENT or stale file

Solution:
When the file already open and fd is available, use fstat
to get the file attributes

Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4
Fixes: #1281
Signed-off-by: Vinayakswami Hariharmath

features/shard: Aggregate file size, block-count before unwinding removexattr

2020-05-26T09:47:44+00:00

Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay

features/shard: Aggregate size, block-count in iatt before unwinding setxattr

2020-05-21T20:06:32+00:00

Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay

features/shard: Fix crash during shards cleanup in error cases

2020-03-26T04:44:39+00:00

A crash is seen during a reattempt to clean up shards in background
upon remount. And this happens even on remount (which means a remount
is no workaround for the crash).

In such a situation, the in-memory base inode object will not be
existent (new process, non-existent base shard).
So local->resolver_base_inode will be NULL.

In the event of an error (in this case, of space running out), the
process would crash at the time of logging the error in the following line -

        gf_msg(this->name, GF_LOG_ERROR, local->op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local->resolver_base_inode->gfid));

Fixed that by using local->base_gfid as the source of gfid when
local->resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay

multiple: fix bad type cast

2020-01-10T00:58:19+00:00

When using inode_ctx_get() or inode_ctx_set(), a 'uint64_t *' is expected.
In many cases, the value to retrieve or store is a pointer, which will be
of smaller size in some architectures (for example 32-bits). In this case,
directly passing the address of the pointer casted to an 'uint64_t *' is
wrong and can cause memory corruption.

Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
Signed-off-by: Xavi Hernandez 
Fixes: bz#1785611

tests/shard: fix tests/bugs/shard/unlinks-and-renames.t failure

2019-11-01T01:03:51+00:00

on rhel8 machine cleanup of shards is not happening properly for a
sharded file with hard-links. It needs to refresh the hard link count
to make it successful

The problem occurs when a sharded file with hard-links gets removed.
When the last link file is removed, all shards need to be cleaned up.
But in the current code structure shard xlator, instead of sending a lookup
to get the link count uses stale cache values of inodectx. Therby removing
the base shard but not the shards present in /.shard directory.

This fix will make sure that it marks in the first unlink's callback that
the inode ctx needs a refresh so that in the next operation, it will be
refreshed by looking up the file on-disk.

fixes: bz#1764110
Change-Id: I81625c7451dabf006c0864d859b1600f3521b648
Signed-off-by: Sheetal Pamecha

afr: support split-brain CLI for replica 3

2019-10-09T06:35:43+00:00

Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3 but we do
see (data/metadata) split-brain cases once in a while which indicate that there are
a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit  8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1756938
Signed-off-by: Ravishankar N

features/shard: Send correct size when reads are sent beyond file size

2019-08-12T13:30:20+00:00

Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes bz#1738419
Signed-off-by: Krutika Dhananjay