glusterfs.git/xlators, branch v7.5

posix: log aio_error return codes in posix_fs_health_check

2020-04-14T14:00:17+00:00

Problem: Sometime brick is going down to health check thread is
         failed without logging error codes return by aio system calls.
         As per aio_error man page it returns a positive error number
         if the asynchronous I/O operation failed.

Solution: log aio_error return codes in error message

> Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
> (Cherry picked from commit 032862fa3944fc7152140aaa13cdc474ae594a51)
> (Reviwed on upstream link https://review.gluster.org/#/c/glusterfs/+/23284/
> Signed-off-by: Mohit Agrawal 

Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
Fixes: #1168
Signed-off-by: Mohit Agrawal

mount/fuse: Wait for 'mount' child to exit before dying

2020-04-14T12:49:34+00:00

Problem:
tests/bugs/protocol/bug-1433815-auth-allow.t fails
sometimes because of stale mount. This stale mount
comes into picture when parent process dies without
waiting for the child process which mounts fuse fs
to die

Fix:
Wait for mounting child process to die before dying.

Fixes: #1152
Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
Signed-off-by: Pranith Kumar K

features/utime: Don't access frame after stack-wind

2020-04-07T11:22:36+00:00

Problem:
frame is accessed after stack-wind. This can lead to crash
if the cbk frees the frame.

Fix:
Use new frame for the wind instead.

Fixes: #832
Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
Signed-off-by: Pranith Kumar K

write-behind: fix data corruption

2020-04-07T11:20:05+00:00

There was a bug in write-behind that allowed a previous completed write
to overwrite the overlapping region of data from a future write.

Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset of W2:

    W2.offset = W3.offset = W1.offset + W1.size

Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.

Suppose write-behind processes the requests from 2 concurrent threads:

    Thread 1                    Thread 2

    
                                
    wb_enqueue_tempted(W1)
    /* W1 is assigned gen X */
                                wb_enqueue_tempted(W2)
                                /* W2 is assigned gen X */

                                wb_process_queue()
                                  __wb_preprocess_winds()
                                    /* W1 and W2 are sequential and all
                                     * other requisites are met to merge
                                     * both requests. */
                                    __wb_collapse_small_writes(W1, W2)
                                    __wb_fulfill_request(W2)

                                  __wb_pick_unwinds() -> W2
                                  /* In this case, since the request is
                                   * already fulfilled, wb_inode->gen
                                   * is not updated. */

                                wb_do_unwinds()
                                  STACK_UNWIND(W2)

                                /* The application has received the
                                 * result of W2, so it can send W3. */
                                

                                wb_enqueue_tempted(W3)
                                /* W3 is assigned gen X */

                                wb_process_queue()
                                  /* Here we have W1 (which contains
                                   * the conflicting W2) and W3 with
                                   * same gen, so they are interpreted
                                   * as concurrent writes that do not
                                   * conflict. */
                                  __wb_pick_winds() -> W3

                                wb_do_winds()
                                  STACK_WIND(W3)

    wb_process_queue()
      /* Eventually W1 will be
       * ready to be sent */
      __wb_pick_winds() -> W1
      __wb_pick_unwinds() -> W1
        /* Here wb_inode->gen is
         * incremented. */

    wb_do_unwinds()
      STACK_UNWIND(W1)

    wb_do_winds()
      STACK_WIND(W1)

So, as we can see, W3 is sent before W1, which shouldn't happen.

The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
that future requests are assigned to the same generation, which could
be internally reordered.

Solution:

Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.

Special thanks to Stefan Ring for writing a reproducer that has been
crucial to identify the issue.

Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez 
Fixes: #884

utime: resolve an issue of permission denied logs

2020-04-07T11:14:40+00:00

In case where uid is not set to be 0, there are possible errors
from acl xlator. So, set `uid = 0;` with pid indicating this is
set from UTIME activity.

The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887]

Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c
Updates: #832
Signed-off-by: Amar Tumballi 
(cherry picked from commit eb916c057036db8289b41265797e5dce066d1512)

features/shard: Fix crash during shards cleanup in error cases

2020-04-07T11:11:25+00:00

A crash is seen during a reattempt to clean up shards in background
upon remount. And this happens even on remount (which means a remount
is no workaround for the crash).

In such a situation, the in-memory base inode object will not be
existent (new process, non-existent base shard).
So local->resolver_base_inode will be NULL.

In the event of an error (in this case, of space running out), the
process would crash at the time of logging the error in the following line -

        gf_msg(this->name, GF_LOG_ERROR, local->op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local->resolver_base_inode->gfid));

Fixed that by using local->base_gfid as the source of gfid when
local->resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)

afr: mark pending xattrs as a part of metadata heal

2020-04-07T11:09:25+00:00

...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it can be possible that we end up with xattrs inadvertendly
deleted from all bricks, as explained in the  BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.

Updates: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N 
(cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)

open-behind: fix missing fd reference

2020-04-07T10:32:07+00:00

Open behind was not keeping any reference on fd's pending to be
opened. This makes it possible that a concurrent close and an entry
fop (unlink, rename, ...) caused destruction of the fd while it
was still being used.

Change-Id: Ie9e992902cf2cd7be4af1f8b4e57af9bd6afd8e9
Fixes: #1028
Signed-off-by: Xavi Hernandez

cluster/ec: Change handling of heal failure to avoid crash

2020-03-17T14:27:36+00:00

Problem:
ec_getxattr_heal_cbk was called with NULL as second argument
in case heal was failing.
This function was dereferencing "cookie" argument which caused crash.

Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop->data, so even in the case when fop is NULL in error
case, there won't be any NULL dereference.

Thanks to Xavi for the suggestion about the fix.

Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
updates: #1061

glusterd: Brick process fails to come up with brickmux on

2020-03-17T06:04:40+00:00

Issue:
1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1,
vol2, vol3 with 3 bricks (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- All bricks processes not running on N1.

Root Cause -
Since, There is a diff in volfile versions in N1 as compared
to N2 and N3 therefore glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo and deletes
old_volinfo and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). Now, since the volinfo
has been deleted by glusterd_delete_stale_volume()
from priv->volumes list before glusterd_start_bricks() and
glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order is called after glusterd_start_bricks(),
therefore the attach RPC req gets an empty volfile path
and that causes the brick to crash.

Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services before
glusterd_start_bricks() cal is made in glusterd_import_friend_volume

> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC 
(cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)

Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
fixes: bz#1808964
Signed-off-by: Sanju Rakonde