glusterfs.git/libglusterfs/src, branch v8.1

posix: Implement a janitor thread to close fd

2020-08-21T10:38:07+00:00

Problem: In the commit fb20713b380e1df8d7f9e9df96563be2f9144fd6 we use
         syntask to close fd but we have found the patch is reducing the
         performance

Solution: Use janitor thread to close fd's and save the pfd ctx into
          ctx janitor list and also save the posix_xlator into pfd object to
          avoid the race condition during cleanup in brick_mux environment

Change-Id: Ifb3d18a854b267333a3a9e39845bfefb83fbc092
Fixes: #1396
Signed-off-by: Mohit Agrawal 
(cherry picked from commit 41b9616435cbdf671805856e487e373060c9455b)

Issue with gf_fill_iatt_for_dirent

2020-08-19T13:27:22+00:00

In "gf_fill_iatt_for_dirent()", while calculating inode_path for loc,
the inode should be of parent's. Instead it is loc.inode which results in error
 and eventually lookup/readdirp fails.

This patch fixes the same.

Change-Id: Ied086234a4634e8cb13520521ac547c87b3c76b5
Fixes: #1351
Signed-off-by: Soumya Koduri 
(cherry picked from commit ab8308333aaf033e07dbbdf2f69f9313a7e311f3)

open-behind: rewrite of internal logic

2020-06-29T12:51:52+00:00

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez

syncop: improve scaling and implement more tools

2020-05-30T05:17:12+00:00

The current scaling of the syncop thread pool is not working properly
and can leave some tasks in the run queue more time than necessary
when the maximum number of threads is not reached.

This patch provides a better scaling condition to react faster to
pending work.

Condition variables and sleep in the context of a synctask have also
been implemented. Their purpose is to replace regular condition
variables and sleeps that block synctask threads and prevent other
tasks to be executed.

The new features have been applied to several places in glusterd.

Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf
Fixes: #1116
Signed-off-by: Xavi Hernandez

tests: skip tests on absence of reflink in xfs

2020-05-29T07:02:11+00:00

Fixes: #1223
Change-Id: I36cb72d920ffd77405051546615c5262c392daef
Signed-off-by: Pranith Kumar K 
(cherry picked from commit b85f01abab658d1d704cd6caf84dd64eddafbff7)

Posix: Optimize posix code to improve file creation

2020-04-06T10:43:26+00:00

Problem: Before executing a fop in POSIX xlator it builds an internal
         path based on GFID.To validate the path it call's (l)stat
         system call and while .glusterfs is heavily loaded kernel takes
         time to lookup inode and due to that performance drops

Solution: In this patch we followed two ways to improve the performance.
          1) Keep open fd specific to first level directory(gfid[0])
             in .glusterfs, it would force to kernel keep the inodes
             from all those files in cache. In case of memory pressure
             kernel won't uncache first level inodes. We need to open
             256 fd's per brick to access the entry faster.
          2) Use at based call's to access relative path to reduce
             path based lookup time.

Note: To verify the patch we have executed kernel untar 100 times on 6
      different clients after enabling metadata group-cache and some
      other option.We were getting more than 20 percent improvement in
      kenel untar after applying the patch.

Credits: Xavi Hernandez 
Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343
updates: #891
Signed-off-by: Mohit Agrawal

core[brick_mux]: brick crashed when creating and deleting volumes over time

2020-03-27T15:19:20+00:00

Problem: In brick_mux environment, while volumes are created/stopped in a loop
         after running a long time the main brick is crashed.The brick is crashed
         because the main brick process was not cleaned up memory for all objects
         at the time of detaching a volume.
         Below are the objects that are missed at the time of detaching a volume
         1) xlator object for a brick graph
         2) local_pool for posix_lock xlator
         3) rpc object cleanup at quota xlator
         4) inode leak at brick xlator

Solution: To avoid the crash resolve all leak at the time of detaching a brick
Change-Id: Ibb6e46c5fba22b9441a88cbaf6b3278823235913
updates: #977
Signed-off-by: Mohit Agrawal

Posix: Use simple approach to close fd

2020-03-20T04:08:42+00:00

Problem: posix_release(dir) functions add the fd's into a ctx->janitor_fds
         and janitor thread closes the fd's.In brick_mux environment it is
         difficult to handle race condition in janitor threads because brick
         spawns a single janitor thread for all bricks.

Solution: Use synctask to execute posix_release(dir) functions instead of 
          using background a thread to close fds.

Credits: Pranith Karampuri 
Change-Id: Iffb031f0695a7da83d5a2f6bac8863dad225317e
Fixes: bz#1811631
Signed-off-by: Mohit Agrawal

utime: resolve an issue of permission denied logs

2020-03-13T12:15:21+00:00

In case where uid is not set to be 0, there are possible errors
from acl xlator. So, set `uid = 0;` with pid indicating this is
set from UTIME activity.

The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887]

Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c
Fixes: #832
Signed-off-by: Amar Tumballi

Segmentation fault occurs during truncate

2020-02-24T18:31:04+00:00

Problem:
Segmentation fault occurs when bricks are nearly full 100% and in
parallel truncate of a file is attempted (No space left on device).
Prerequicite is that performance xlators are activated
(read-ahead, write-behind etc)
while stack unwind of the frames following an error responce
from brick (No space left on device) frame->local includes a memory
location that is not allocated via mem_get but via calloc.
The destroyed frame is always ra_truncate_cbk winded from ra_ftruncate
and the inode ptr is copied to the frame local in the wb_ftruncate.

Fix:
extra check is added for the pool ptr

Change-Id: Ic5d3bd0ab7011e40b2811c6dece063b256e4d9d1
Fixes: bz#1797882
Signed-off-by: kinsu