<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/features/shard, branch release-7</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>features/shard: Convert shard block indices to uint64</title>
<updated>2020-07-13T06:43:52+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-09-12T05:37:10+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c5159b8da1462a8d56188bbec48be95b742a4709'/>
<id>c5159b8da1462a8d56188bbec48be95b742a4709</id>
<content type='text'>
This patch fixes a crash in FOPs that operate on very large sharded
files, where the number of participating shards can sometimes exceed
the signed int32 maximum.

The patch also adds GF_ASSERTs to ensure that the number of participating
shards is always greater than 0 for files that have more than one shard.
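
An illustrative sketch of the overflow being guarded against (the block
size and offset below are made up, not taken from the patch):

  uint64_t block_size = 4ULL * 1024 * 1024;        /* 4MB shards */
  uint64_t offset = 2147483648ULL * block_size;    /* offset deep inside a huge file */

  /* The derived block index is 2147483648, one past INT32_MAX: it no
   * longer fits in a signed 32-bit integer, but is fine as a uint64_t. */
  int32_t narrow = (int32_t)(offset / block_size); /* truncated/wrapped */
  uint64_t wide = offset / block_size;             /* correct */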

Change-Id: I354de58796f350eb1aa42fcdf8092ca2e69ccbb6
Fixes: #1348
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cdf01cc47eb2efb427b5855732d9607eec2abc8a)
</content>
</entry>
<entry>
<title>features/shard: Aggregate file size, block-count before unwinding removexattr</title>
<updated>2020-07-13T06:29:35+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-05-22T07:55:26+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=d8982b2eb65652d50bfc85bf8d9176301965305b'/>
<id>d8982b2eb65652d50bfc85bf8d9176301965305b</id>
<content type='text'>
The posix translator returns the prebuf and postbuf iatts in the dict in
{F}REMOVEXATTR fops. These iatts are then cached at higher layers such as
md-cache. The shard translator, in its current state, simply passes these
values through without updating them with the aggregated file size and
block count.

This patch fixes that.
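
A minimal sketch of the idea (the aggregate variables are illustrative;
the patch takes the values from the shard translator's own bookkeeping):

  /* Before unwinding, overwrite the base-shard values in the returned
   * iatt with the aggregated values for the whole file (the prebuf is
   * handled the same way). */
  postbuf-&gt;ia_size = aggregated_file_size;
  postbuf-&gt;ia_blocks = aggregated_block_count;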

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 32519525108a2ac6bcc64ad931dc8048d33d64de)
</content>
</entry>
<entry>
<title>features/shard: Aggregate size, block-count in iatt before unwinding setxattr</title>
<updated>2020-06-15T12:12:00+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-05-15T05:59:36+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=cf7298e2db5a18cdba70b23a8e94952a2aed7280'/>
<id>cf7298e2db5a18cdba70b23a8e94952a2aed7280</id>
<content type='text'>
The posix translator returns the prebuf and postbuf iatts in the dict in
{F}SETXATTR fops. These iatts are then cached at higher layers such as
md-cache. The shard translator, in its current state, simply passes these
values through without updating them with the aggregated file size and
block count.

This patch fixes that.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 29ec66c6ab77e2d6893c6e213a3d1fb148702c99)
</content>
</entry>
<entry>
<title>features/shard: Fix crash during shards cleanup in error cases</title>
<updated>2020-04-07T11:11:25+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-03-23T06:17:10+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=6909b249139679acc2ffb04486b5743b7bef8a10'/>
<id>6909b249139679acc2ffb04486b5743b7bef8a10</id>
<content type='text'>
A crash is seen when the cleanup of shards is reattempted in the
background after a remount. Since the crash recurs on every remount,
remounting is not a workaround.

In this situation the in-memory base inode object does not exist (new
process, non-existent base shard), so local-&gt;resolver_base_inode will
be NULL.

When an error occurs (in this case, running out of space), the process
crashes while logging the error in the following line -

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local-&gt;resolver_base_inode-&gt;gfid));

This is fixed by using local-&gt;base_gfid as the source of the gfid when
local-&gt;resolver_base_inode is NULL.
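
A hedged sketch of the fix (the exact code in the patch may differ):

        uuid_t gfid = {0};

        if (local-&gt;resolver_base_inode)
            gf_uuid_copy(gfid, local-&gt;resolver_base_inode-&gt;gfid);
        else
            gf_uuid_copy(gfid, local-&gt;base_gfid);

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s", uuid_utoa(gfid));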

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
</content>
</entry>
<entry>
<title>multiple: fix bad type cast</title>
<updated>2020-03-16T08:21:00+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2019-12-20T13:14:32+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=dfaaace24d26d8e39f7783e99ac7440eafeced74'/>
<id>dfaaace24d26d8e39f7783e99ac7440eafeced74</id>
<content type='text'>
inode_ctx_get() and inode_ctx_set() expect a 'uint64_t *'. In many cases
the value to retrieve or store is a pointer, which is smaller than 64 bits
on some architectures (for example, 32-bit ones). In that case, directly
passing the address of the pointer cast to a 'uint64_t *' is wrong and can
cause memory corruption.
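
A minimal sketch of the wrong and the safe pattern (the context type and
variable names are illustrative, not taken from the patch):

  my_ctx_t *ctx = NULL;
  uint64_t value = 0;

  /* Wrong: on a 32-bit architecture &amp;ctx points at a 4-byte slot, but
   * inode_ctx_get() stores a full 8-byte uint64_t through it. */
  inode_ctx_get(inode, this, (uint64_t *)&amp;ctx);

  /* Safe: read into a real uint64_t and convert the value afterwards. */
  if (inode_ctx_get(inode, this, &amp;value) == 0)
      ctx = (my_ctx_t *)(uintptr_t)value;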

Backport of:
&gt; Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Fixes: bz#1785611

Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Fixes: bz#1785323
</content>
</entry>
<entry>
<title>tests/shard: fix tests/bugs/shard/unlinks-and-renames.t failure</title>
<updated>2019-11-13T05:05:54+00:00</updated>
<author>
<name>Sheetal Pamecha</name>
<email>spamecha@redhat.com</email>
</author>
<published>2019-10-22T09:04:06+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=7808a10af0949ee19ba4c0ab000732ba7eb67f02'/>
<id>7808a10af0949ee19ba4c0ab000732ba7eb67f02</id>
<content type='text'>
On RHEL 8 machines, the cleanup of shards does not happen properly for a
sharded file with hard links: the hard-link count needs to be refreshed
for the cleanup to succeed.

The problem occurs when a sharded file with hard links gets removed.
When the last link to the file is removed, all shards need to be cleaned up.
But in the current code, the shard xlator uses stale cached values from the
inode ctx instead of sending a lookup to get the link count, thereby
removing the base shard but not the shards present in the /.shard directory.

This fix marks, in the first unlink's callback, that the inode ctx needs a
refresh, so that the next operation refreshes it by looking the file up
on-disk.
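
A hedged sketch of the idea (the ctx field and its placement are
illustrative, not necessarily what the patch does):

  /* In the callback of the first unlink: mark the cached inode ctx
   * (including the link count) as stale ... */
  ctx-&gt;refresh = _gf_true;

  /* ... so that the next FOP on this file looks it up on-disk instead
   * of trusting the cached values. */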

&gt;fixes: bz#1764110
&gt;Change-Id: I81625c7451dabf006c0864d859b1600f3521b648
&gt;Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
&gt;(Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/23585/)

Fixes: bz#1768760
Change-Id: I81625c7451dabf006c0864d859b1600f3521b648
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: support split-brain CLI for replica 3</title>
<updated>2019-11-13T05:04:07+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-09-28T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=fc5a3dd8757ffc80869173e4758d068be2cf5d19'/>
<id>fc5a3dd8757ffc80869173e4758d068be2cf5d19</id>
<content type='text'>
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates that
there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).
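
A hedged sketch of the special-casing described above (the surrounding
logic is simplified and the variable name is illustrative):

  gf_boolean_t from_heald = _gf_false;

  /* glfsheal tags all of its fops with this pid; let its lookups through
   * even when quorum leaves no readable subvolume, so the split-brain
   * CLI can resolve the file. */
  if (frame-&gt;root-&gt;pid == GF_CLIENT_PID_GLFS_HEALD)
      from_heald = _gf_true;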

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760791
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)
</content>
</entry>
<entry>
<title>features/shard: Send correct size when reads are sent beyond file size</title>
<updated>2019-08-21T11:41:58+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-08-07T06:42:43+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=175b796d08ade751ed4929661f246f33d9bf420f'/>
<id>175b796d08ade751ed4929661f246f33d9bf420f</id>
<content type='text'>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes bz#1740316
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
</content>
</entry>
<entry>
<title>graph/shd: Use glusterfs_graph_deactivate to free the xl rec</title>
<updated>2019-07-24T10:23:18+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-06-24T10:19:04+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=b2531751a186e31d8c0ca3f7e3e44475294931b6'/>
<id>b2531751a186e31d8c0ca3f7e3e44475294931b6</id>
<content type='text'>
We were using glusterfs_graph_fini to free the xl rec from
glusterfs_process_volfp as well as glusterfs_graph_cleanup.

Instead, we can use glusterfs_graph_deactivate, which does the fini as
well as the other common rec frees.
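
A minimal sketch of the substitution (the call site shown is simplified;
the actual patch touches glusterfs_process_volfp and glusterfs_graph_cleanup):

  /* before: only ran the translators' fini */
  glusterfs_graph_fini(graph);

  /* after: deactivate runs fini and also frees the other common records */
  glusterfs_graph_deactivate(graph);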

Backport of: https://review.gluster.org/22904
&gt;Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1
&gt;Updates: bz#1716695
&gt;Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;

Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1
Updates: bz#1730229
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: cleanup iovec functions</title>
<updated>2019-06-11T13:28:07+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2019-05-31T16:40:30+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=952cf7e4f4393fcd9cf8c16b013d8f28915c990e'/>
<id>952cf7e4f4393fcd9cf8c16b013d8f28915c990e</id>
<content type='text'>
This patch cleans some iovec code and creates two additional helper
functions to simplify management of iovec structures.

  iov_range_copy(struct iovec *dst, uint32_t dst_count, uint32_t dst_offset,
                 struct iovec *src, uint32_t src_count, uint32_t src_offset,
                 uint32_t size);

    This function copies up to 'size' bytes from 'src' at offset
    'src_offset' to 'dst' at 'dst_offset'. It returns the number of
    bytes copied.

  iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);

    This function removes the initial 'size' bytes from 'iovec' and
    returns the updated number of iovec vectors remaining.

The signature of iov_subset() has also been modified to make it safer
and easier to use. The new signature is:

  iov_subset(struct iovec *src, int src_count, uint32_t start, uint32_t size,
             struct iovec **dst, int32_t dst_count);

    This function creates a new iovec array containing the subset of the
    'src' vector starting at 'start' with size 'size'. The resulting
    array is allocated if '*dst' is NULL, or copied to '*dst' if it fits
    (based on 'dst_count'). It returns the number of iovec vectors used.

A new set of functions to iterate through an iovec array have been
created. They can be used to simplify the implementation of other
iovec-based helper functions.
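
A hedged usage sketch of the two new helpers (the buffers and sizes below
are made up for illustration):

  struct iovec src[2] = {
      { .iov_base = "hello ", .iov_len = 6 },
      { .iov_base = "world", .iov_len = 5 },
  };
  char out[16] = { 0 };
  struct iovec dst = { .iov_base = out, .iov_len = sizeof(out) };

  /* Copy 5 bytes starting at offset 4 of 'src' ("o wor") into 'dst';
   * 'copied' becomes 5. */
  uint32_t copied = iov_range_copy(&amp;dst, 1, 0, src, 2, 4, 5);

  /* Drop the first 6 bytes of 'src' ("hello "); only the "world" vector
   * remains, so 'remaining' becomes 1. */
  uint32_t remaining = iov_skip(src, 2, 6);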

Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
Updates: bz#1193929
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
</feed>
