glusterfs.git/xlators/features/shard/src/shard.h, branch v7.4

features/shard: Fix block-count accounting upon truncate to lower size

2019-06-04T07:30:12+00:00

The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.

FIX:

Get the block-count of each shard just before an unlink at posix in
xdata.  Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.

Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay

features/shard: Fix integer overflow in block count accounting

2019-05-06T10:49:04+00:00

... by holding delta_blocks in 64-bit int as opposed to 32-bit int.

Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb
updates: bz#1705884
Signed-off-by: Krutika Dhananjay

features/shard: Fix launch of multiple synctasks for background deletion

2019-01-11T08:35:55+00:00

PROBLEM:

When multiple sharded files are deleted in quick succession, multiple
issues were observed:
1. misleading logs corresponding to a sharded file where while one log
   message said the shards corresponding to the file were deleted
   successfully, this was followed by multiple logs suggesting the very
   same operation failed. This was because of multiple synctasks
   attempting to clean up shards of the same file and only one of them
   succeeding (the one that gets ENTRYLK successfully), and the rest of
   them logging failure.

2. multiple synctasks to do background deletion would be launched, one
   for each deleted file but all of them could readdir entries from
   .remove_me at the same time could potentially contend for ENTRYLK on
   .shard for each of the entry names. This is undesirable and wasteful.

FIX:
Background deletion will now follow a state machine. In the event that
there are multiple attempts to launch synctask for background deletion,
one for each file deleted, only the first task is launched. And if while
this task is doing the cleanup, more attempts are made to delete other
files, the state of the synctask is adjusted so that it restarts the
crawl even after reaching end-of-directory to pick up any files it may
have missed in the previous iteration.

This patch also fixes uninitialized lk-owner during syncop_entrylk()
which was leading to multiple background deletion synctasks entering
the critical section at the same time and leading to illegal memory access
of base inode in the second syntcask after it was destroyed post shard deletion
by the first synctask.

Change-Id: Ib33773d27fb4be463c7a8a5a6a4b63689705324e
updates: bz#1662368
Signed-off-by: Krutika Dhananjay

libglusterfs: Move devel headers under glusterfs directory

2018-12-05T21:47:04+00:00

libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR

Land clang-format changes

2018-09-12T11:52:48+00:00

Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b

Multiple files: calloc -> malloc

2018-09-04T04:58:14+00:00

xlators/storage/posix/src/posix-inode-fd-ops.c:
xlators/storage/posix/src/posix-helpers.c:
xlators/storage/bd/src/bd.c:
xlators/protocol/client/src/client-lk.c:
xlators/performance/quick-read/src/quick-read.c:
xlators/performance/io-cache/src/page.c
xlators/nfs/server/src/nfs3-helpers.c
xlators/nfs/server/src/nfs-fops.c
xlators/nfs/server/src/mount3udp_svc.c
xlators/nfs/server/src/mount3.c
xlators/mount/fuse/src/fuse-helpers.c
xlators/mount/fuse/src/fuse-bridge.c
xlators/mgmt/glusterd/src/glusterd-utils.c
xlators/mgmt/glusterd/src/glusterd-syncop.h
xlators/mgmt/glusterd/src/glusterd-snapshot.c
xlators/mgmt/glusterd/src/glusterd-rpc-ops.c
xlators/mgmt/glusterd/src/glusterd-replace-brick.c
xlators/mgmt/glusterd/src/glusterd-op-sm.c
xlators/mgmt/glusterd/src/glusterd-mgmt.c
xlators/meta/src/subvolumes-dir.c
xlators/meta/src/graph-dir.c
xlators/features/trash/src/trash.c
xlators/features/shard/src/shard.h
xlators/features/shard/src/shard.c
xlators/features/marker/src/marker-quota.c
xlators/features/locks/src/common.c
xlators/features/leases/src/leases-internal.c
xlators/features/gfid-access/src/gfid-access.c
xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c
xlators/features/bit-rot/src/bitd/bit-rot.c
xlators/features/bit-rot/src/bitd/bit-rot-scrub.c
bxlators/encryption/crypt/src/metadata.c
xlators/encryption/crypt/src/crypt.c
xlators/performance/md-cache/src/md-cache.c:

Move to GF_MALLOC() instead of GF_CALLOC() when possible

It doesn't make sense to calloc (allocate and clear) memory
when the code right away fills that memory with data.
It may be optimized by the compiler, or have a microscopic
performance improvement.

In some cases, also changed allocation size to be sizeof some
struct or type instead of a pointer - easier to read.
In some cases, removed redundant strlen() calls by saving the result
into a variable.

1. Only done for the straightforward cases. There's room for improvement.
2. Please review carefully, especially for string allocation, with the
terminating NULL string.

Only compile-tested!

.. and allocate memory as much as needed.

xlators/nfs/server/src/mount3.c :

Don't blindly allocate PATH_MAX, but strlen() the string and allocate
appropriately.
Also, align error messges.

updates: bz#1193929
Original-Author: Yaniv Kaul 
Signed-off-by: Yaniv Kaul 
Signed-off-by: Yaniv Kaul 
Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5

features/shard: Make lru limit of inode list configurable

2018-07-23T07:32:32+00:00

Currently this lru limit is hard-coded to 16384. This patch makes it
configurable to make it easier to hit the lru limit and enable testing
of different cases that arise when the limit is reached.

The option is features.shard-lru-limit. It is by design allowed to
be configured only in init() but not in reconfigure(). This is to avoid
all the complexity associated with eviction of least recently used shards
when the list is shrunk.

Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295
updates: bz#1605056
Signed-off-by: Krutika Dhananjay

features/shard: Perform shards deletion in the background

2018-06-20T09:44:56+00:00

A synctask is created that would scan the indices from
.shard/.remove_me, to delete the shards associated with the
gfid corresponding to the index bname and the rate of deletion
is controlled by the option features.shard-deletion-rate whose
default value is 100.
The task is launched on two accounts:
1. when shard receives its first-ever lookup on the volume
2. when a rename or unlink deleted an inode

Change-Id: Ia83117230c9dd7d0d9cae05235644f8475e97bc3
updates: bz#1568521
Signed-off-by: Krutika Dhananjay

features/shard: Introducing ".shard/.remove_me" for atomic shard deletion (part 1)

2018-06-13T09:57:14+00:00

PROBLEM:
Shards are deleted synchronously when a sharded file is unlinked or
when a sharded file participating as the dst in a rename() is going to
be replaced. The problem with this approach is it makes the operation
really slow, sometimes causing the application to time out, especially
with large files.

SOLUTION:
To make this operation atomic, we introduce a ".remove_me" directory.
Now renames and unlinks will simply involve two steps:
1. creating an empty file under .remove_me named after the gfid of the file
participating in unlink/rename
2. carrying out the actual rename/unlink
A synctask is created (more on that in part 2) to scan this directory
after every unlink/rename operation (or upon a volume mount) and clean
up all shards associated with it. All of this happens in the background.
The task takes care to delete the shards associated with the gfid in
.remove_me only if this gfid doesn't exist in backend, ensuring that the
file was successfully renamed/unlinked and its shards can be discarded now
safely.

Change-Id: Ia1d238b721a3e99f951a73abbe199e4245f51a3a
updates: bz#1568521
Signed-off-by: Krutika Dhananjay

features/shard: Add option to barrier parallel lookup and unlink of shards

2018-04-23T15:53:27+00:00

Also move the common parallel unlink callback for GF_FOP_TRUNCATE and
GF_FOP_FTRUNCATE into a separate function.

Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332
updates: bz#1568521
Signed-off-by: Krutika Dhananjay