glusterfs.git/tests/basic, branch v8.1

cluster/afr: Delay post-op for fsync

2020-07-28T13:21:05+00:00

Problem:
AFR doesn't delay post-op for fsync fop. For fsync heavy workloads
this leads to un-necessary fxattrop/finodelk for every fsync leading
to bad performance.

Fix:
Have delayed post-op for fsync. Add special flag in xdata to indicate
that afr shouldn't delay post-op in cases where either the
process will terminate or graph-switch would happen. Otherwise it leads
to un-necessary heals when the graph-switch/process-termination
happens before delayed-post-op completes.

Fixes: #1253
Change-Id: I531940d13269a111c49e0510d49514dc169f4577
Signed-off-by: Pranith Kumar K

open-behind: rewrite of internal logic

2020-06-29T12:51:52+00:00

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez

tests: skip tests on absence of reflink in xfs

2020-05-29T07:02:11+00:00

Fixes: #1223
Change-Id: I36cb72d920ffd77405051546615c5262c392daef
Signed-off-by: Pranith Kumar K 
(cherry picked from commit b85f01abab658d1d704cd6caf84dd64eddafbff7)

afr: event gen changes

2020-04-24T12:50:13+00:00

The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.

Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.

Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets even get (in afr_lookup_done) if there are non zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client side heal on the next lookup. That can be acheived by
setting the need_refresh flag in the inode ctx. So replaced all
occurences of resetting even gen to zero with a call to
afr_inode_need_refresh_set().

Change #3:
In both lookup and discover path, we are doing an inode refresh which is
not required since all 3 essentially do the same thing- update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do it when we call
refresh during the read and write txns and not in the lookup path.

The .ts which relied on inode refresh in lookup path to trigger heals are
now changed to do read txn so that inode refresh and the heal happens.

Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N 
Reported-by: Erik Jacobson 
(cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)

tests: do not truncate file offsets and sizes to 32-bit

2020-04-15T07:58:33+00:00

Do not truncate file offsets and sizes to 32-bit to
prevent tests from spurious failures on >2Gb files.

Signed-off-by: Dmitry Antipov 

Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2
Fixes: #1161

Posix: Optimize posix code to improve file creation

2020-04-06T10:43:26+00:00

Problem: Before executing a fop in POSIX xlator it builds an internal
         path based on GFID.To validate the path it call's (l)stat
         system call and while .glusterfs is heavily loaded kernel takes
         time to lookup inode and due to that performance drops

Solution: In this patch we followed two ways to improve the performance.
          1) Keep open fd specific to first level directory(gfid[0])
             in .glusterfs, it would force to kernel keep the inodes
             from all those files in cache. In case of memory pressure
             kernel won't uncache first level inodes. We need to open
             256 fd's per brick to access the entry faster.
          2) Use at based call's to access relative path to reduce
             path based lookup time.

Note: To verify the patch we have executed kernel untar 100 times on 6
      different clients after enabling metadata group-cache and some
      other option.We were getting more than 20 percent improvement in
      kenel untar after applying the patch.

Credits: Xavi Hernandez 
Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343
updates: #891
Signed-off-by: Mohit Agrawal

cluster/ec: Add test for reset-brick command

2020-04-06T10:39:36+00:00

Following tests are done -

1 - After finishing reset-brick all the bricks should be up.
2 - Heal should be completed.
3 - Check number of entries present on brick which was reset.

Change-Id: I9314bed180293a99d400d94bb8cc7ece999da29e
Updates: #1144

snap_scheduler: python3 compatibility and new test case

2020-04-03T07:16:58+00:00

Problem:
"snap_scheduler.py init" command failing with the below traceback:

[root@dhcp43-104 ~]# snap_scheduler.py init
Traceback (most recent call last):
  File "/usr/sbin/snap_scheduler.py", line 941, in 
    sys.exit(main(sys.argv[1:]))
  File "/usr/sbin/snap_scheduler.py", line 851, in main
    initLogger()
  File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
    logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
  File "/usr/lib64/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

Solution:

Added the 'universal_newlines' flag to Popen to support backward compatibility.

Added a basic test for snapshot scheduler.

Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
Fixes: #1134
Signed-off-by: Sunny Kumar

tests: fix afr-lock-heal-* failure

2020-03-16T07:15:24+00:00

When brick-mux is enabled:

i)brick statedumps seem to be listing the same lock information multiple times.
While that is getting fixed, make changes to the .ts to check for unique values.

ii)detecting a brick as online via brick_up_status() seems to be taking
longer time when delaygen is enabled. Hence bump up PROCESS_UP_TIMEOUT to
90 for afr-lock-heal-advanced.t

Updates: #1042
Change-Id: Ife76008f7a99dd1f1fe5791a32577366baaab4b3
Signed-off-by: Ravishankar N

cluster/afr: Fixes for halo

2020-03-13T13:20:37+00:00

Current implementation assumes that ping-event will come after connect event
but that may not be the case in the cases where after socket connection fds
need to be re-opened which would consume more time. So handle any order of the
ping/child-up events.

fixes: bz#1800583
Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05
Signed-off-by: Pranith Kumar K