glusterfs.git/tests, branch v7.7

features/shard: Aggregate file size, block-count before unwinding removexattr

2020-07-13T06:29:35+00:00

Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 32519525108a2ac6bcc64ad931dc8048d33d64de)

cluster/afr: Prioritize ENOSPC over other errors

2020-06-22T05:42:20+00:00

Problem:
In a replicate/arbiter volume if file creations or writes fails on
quorum number of bricks and on one brick it is due to ENOSPC and
on other brick it fails for a different reason, it may fail with
errors other than ENOSPC in some cases.

Fix:
Prioritize ENOSPC over other lesser priority errors and do not set
op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any
error_no which can be misinterpreted by __afr_dir_write_finalize().

Also removing the function afr_has_arbiter_fop_cbk_quorum() which
might consider a successful reply form a single brick as quorum
success in some cases, whereas we always need fop to be successful
on quorum number of bricks in arbiter configuration.

Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08
Fixes: #1254
Signed-off-by: karthik-us 
(cherry picked from commit fa63b45ca5edf172b1b89b28b5db3c5129cc57b6)

afr: more quorum checks in lookup and new entry marking

2020-06-18T05:57:18+00:00

Problem: See github issue for details.

Fix:
-In lookup if the entry exists in 2 out of 3 bricks, don't fail the
lookup with ENOENT just because there is an entrylk on the parent.
Consider quorum before deciding.

-If entry FOP does not succeed on quorum no. of bricks, do not perform
new entry mark.

Fixes: #1303
Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5
Signed-off-by: Ravishankar N 
(cherry picked from commit c4a6748f25d2c1ab3ebcf89952278ebf94c8d371)

features/shard: Aggregate size, block-count in iatt before unwinding setxattr

2020-06-15T12:12:00+00:00

Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 29ec66c6ab77e2d6893c6e213a3d1fb148702c99)

open-behind: rewrite of internal logic

2020-06-15T10:53:24+00:00

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez

tests: skip tests on absence of reflink in xfs

2020-06-10T10:27:33+00:00

Fixes: #1223
Change-Id: I36cb72d920ffd77405051546615c5262c392daef
Signed-off-by: Pranith Kumar K

afr: event gen changes

2020-05-09T12:14:06+00:00

The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.

Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.

Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets even get (in afr_lookup_done) if there are non zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client side heal on the next lookup. That can be acheived by
setting the need_refresh flag in the inode ctx. So replaced all
occurences of resetting even gen to zero with a call to
afr_inode_need_refresh_set().

Change #3:
In both lookup and discover path, we are doing an inode refresh which is
not required since all 3 essentially do the same thing- update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do it when we call
refresh during the read and write txns and not in the lookup path.

The .ts which relied on inode refresh in lookup path to trigger heals are
now changed to do read txn so that inode refresh and the heal happens.

Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N 
Reported-by: Erik Jacobson 
(cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)

snap_scheduler: python3 compatibility and new test case

2020-04-14T12:31:14+00:00

Problem:
"snap_scheduler.py init" command failing with the below traceback:

[root@dhcp43-104 ~]# snap_scheduler.py init
Traceback (most recent call last):
  File "/usr/sbin/snap_scheduler.py", line 941, in 
    sys.exit(main(sys.argv[1:]))
  File "/usr/sbin/snap_scheduler.py", line 851, in main
    initLogger()
  File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
    logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
  File "/usr/lib64/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

Solution:

Added the 'universal_newlines' flag to Popen to support backward compatibility.

Added a basic test for snapshot scheduler.

Backport of:

   >Upstream patch: 
   >https://review.gluster.org/#/c/glusterfs/+/24257/
   >Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
   >Fixes: #1134
   >Signed-off-by: Sunny Kumar 
   >(cherry picked from commit a7d7ec066e56ac03bf252c26beb20fdc2c3b6772)

Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
Fixes: #1134
Signed-off-by: Sunny Kumar

features/utime: Don't access frame after stack-wind

2020-04-07T11:22:36+00:00

Problem:
frame is accessed after stack-wind. This can lead to crash
if the cbk frees the frame.

Fix:
Use new frame for the wind instead.

Fixes: #832
Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
Signed-off-by: Pranith Kumar K

write-behind: fix data corruption

2020-04-07T11:20:05+00:00

There was a bug in write-behind that allowed a previous completed write
to overwrite the overlapping region of data from a future write.

Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset of W2:

    W2.offset = W3.offset = W1.offset + W1.size

Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.

Suppose write-behind processes the requests from 2 concurrent threads:

    Thread 1                    Thread 2

    
                                
    wb_enqueue_tempted(W1)
    /* W1 is assigned gen X */
                                wb_enqueue_tempted(W2)
                                /* W2 is assigned gen X */

                                wb_process_queue()
                                  __wb_preprocess_winds()
                                    /* W1 and W2 are sequential and all
                                     * other requisites are met to merge
                                     * both requests. */
                                    __wb_collapse_small_writes(W1, W2)
                                    __wb_fulfill_request(W2)

                                  __wb_pick_unwinds() -> W2
                                  /* In this case, since the request is
                                   * already fulfilled, wb_inode->gen
                                   * is not updated. */

                                wb_do_unwinds()
                                  STACK_UNWIND(W2)

                                /* The application has received the
                                 * result of W2, so it can send W3. */
                                

                                wb_enqueue_tempted(W3)
                                /* W3 is assigned gen X */

                                wb_process_queue()
                                  /* Here we have W1 (which contains
                                   * the conflicting W2) and W3 with
                                   * same gen, so they are interpreted
                                   * as concurrent writes that do not
                                   * conflict. */
                                  __wb_pick_winds() -> W3

                                wb_do_winds()
                                  STACK_WIND(W3)

    wb_process_queue()
      /* Eventually W1 will be
       * ready to be sent */
      __wb_pick_winds() -> W1
      __wb_pick_unwinds() -> W1
        /* Here wb_inode->gen is
         * incremented. */

    wb_do_unwinds()
      STACK_UNWIND(W1)

    wb_do_winds()
      STACK_WIND(W1)

So, as we can see, W3 is sent before W1, which shouldn't happen.

The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
that future requests are assigned to the same generation, which could
be internally reordered.

Solution:

Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.

Special thanks to Stefan Ring for writing a reproducer that has been
crucial to identify the issue.

Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez 
Fixes: #884