glusterfs.git/xlators/performance, branch v7.0rc2

md-cache: only update generation for inode at upcall and NULL stat

2019-06-19T09:13:03+00:00

1. For parallel writes from nfs-ganesha, two fops with two generations,
   but the fops reply maybe returned disordered.
2. The inode md-cache timeout should not increase conf->generation.

With this patch,
1, Fop only gets generation from inode md-cache or conf, does not increase it.
2. The generation is increased at upcall invalidate, estal/enoent error
   invalidate, reply with zeroed out stat from write-behind.

Change-Id: I897ecaa143fd18bc024c1948c7d1a6f831fd53da
Updates: bz#1683594
Signed-off-by: Kinglong Mee

multiple files: another attempt to remove includes

2019-06-14T16:50:32+00:00

There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )

Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.

Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul

libglusterfs: cleanup iovec functions

2019-06-11T13:28:07+00:00

This patch cleans some iovec code and creates two additional helper
functions to simplify management of iovec structures.

  iov_range_copy(struct iovec *dst, uint32_t dst_count, uint32_t dst_offset,
                 struct iovec *src, uint32_t src_count, uint32_t src_offset,
                 uint32_t size);

    This function copies up to 'size' bytes from 'src' at offset
    'src_offset' to 'dst' at 'dst_offset'. It returns the number of
    bytes copied.

  iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);

    This function removes the initial 'size' bytes from 'iovec' and
    returns the updated number of iovec vectors remaining.

The signature of iov_subset() has also been modified to make it safer
and easier to use. The new signature is:

  iov_subset(struct iovec *src, int src_count, uint32_t start, uint32_t size,
             struct iovec **dst, int32_t dst_count);

    This function creates a new iovec array containing the subset of the
    'src' vector starting at 'start' with size 'size'. The resulting
    array is allocated if '*dst' is NULL, or copied to '*dst' if it fits
    (based on 'dst_count'). It returns the number of iovec vectors used.

A new set of functions to iterate through an iovec array have been
created. They can be used to simplify the implementation of other
iovec-based helper functions.

Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
Updates: bz#1193929
Signed-off-by: Xavi Hernandez

lcov: improve line coverage

2019-06-03T08:43:13+00:00

upcall: remove extra variable assignment and use just one
        initialization.
open-behind: reduce the overall number of lines, in functions
             not frequently called
selinux: reduce some lines in init failure cases

updates: bz#1693692
Change-Id: I7c1de94f2ec76a5bfe1f48a9632879b18e5fbb95
Signed-off-by: Amar Tumballi

performance/write-behind: remove request from wip list in wb_writev_cbk

2019-05-06T14:23:02+00:00

There is a race in the way O_DIRECT writes are handled. Assume two
overlapping write requests w1 and w2.

* w1 is issued and is in wb_inode->wip queue as the response is still
  pending from bricks. Also wb_request_unref in wb_do_winds is not yet
  invoked.

       list_for_each_entry_safe (req, tmp, tasks, winds) {
		list_del_init (&req->winds);

                if (req->op_ret == -1) {
			call_unwind_error_keep_stub (req->stub, req->op_ret,
		                                     req->op_errno);
                } else {
                        call_resume_keep_stub (req->stub);
		}

                wb_request_unref (req);
        }

* w2 is issued and wb_process_queue is invoked. w2 is not picked up
  for winding as w1 is still in wb_inode->wip. w1 is added to todo
  list and wb_writev for w2 returns.

* response to w1 is received and invokes wb_request_unref. Assume
  wb_request_unref in wb_do_winds (see point 1) is not invoked
  yet. Since there is one more refcount, wb_request_unref in
  wb_writev_cbk of w1 doesn't remove w1 from wip.

* wb_process_queue is invoked as part of wb_writev_cbk of w1. But, it
  fails to wind w2 as w1 is still in wip.

* wb_requet_unref is invoked on w1 as part of wb_do_winds. w1 is
  removed from all queues including w1.

* After this point there is no invocation of wb_process_queue unless
  new request is issued from application causing w2 to be hung till
  the next request.

This bug is similar to bz 1626780 and bz 1379655.

Change-Id: Iaa47437613591699d4c8ad18bc0b32de6affcc31
Signed-off-by: Raghavendra G 
Fixes: bz#1705865

performance/decompounder: remove the translator as the feature is not used anymore

2019-04-29T05:29:06+00:00

updates: bz#1693692
Change-Id: Id5932b11e115ca6da1c2bfff7ae1460787109e06
Signed-off-by: Amar Tumballi

io-threads.c: Potentially skip a lock.

2019-03-12T03:09:33+00:00

Before going into the lock, verify stub_cnt != 0.
Otherwise, let's skip this code.

Unrelated, switch a CALLOC to MALLOC, as we
initialize all members right away. This allocation
is done also under lock, so also should help a bit.

Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul 

Change-Id: Ie2fe6adff41ae4969abff95eff945b54e1a01d32

performance/readdir-ahead: fix deadlock

2019-03-07T15:10:48+00:00

This deadlock happens while processing dentry corresponding to current
directory (.) in rda_fill_readdirp. In this case following order is
followed:

LOCK(directory_fd_ctx->lock);
  rda_inode_ctx_get_iatt -> LOCK(directory_inode->lock);

However, in rda_mark_inode_dirty following lock order is followed:
LOCK(directory_inode->lock);
  LOCK(directory_fd_ctx->lock);

these two codepaths when executed concurrently resulted in a deadlock.

Current patch fixes this by removing locking directory inode and
fd-ctx in rda_fill_readdirp. This is fine as directory inode's stat
won't change due to writes to files within directory.

Change-Id: Ic93a67a0dac8229bb0d79582e526a512e6f2569c
fixes: bz#1674412
Signed-off-by: Raghavendra Gowdappa 
Fixes:bz#1674412

io-threads: Prioritize fops with NO_ROOT_SQUASH pid

2019-03-05T13:55:21+00:00

There was 30% regression observed in mkdir path with commit
b139bc58eb504adf5ef81658896c9283ae21f390. On analysis it is found
that io-threads xlator deprioritzes fops with all -ve pid.

Some context in to the no-root-squash pid requirement:
DHT xlator does some of the internal fops with root privileges. This is
needed so that operations like layout healing should not be abandoned
because a non root user is operating.  If root-squash option is enabled
the layout set operation looses its root privilege as server xlator
converts the uid and pid to random numbers. Hence, the above mentioned
commit converted pid to GF_CLIENT_PID_NO_ROOT_SQUASH to continue fops
as root.

Combining the above I am proposing not to deprioritize fops with
no-root-squash pid.

Change-Id: I54d056c01b25729304a77f9242fbaff39c5672ba
fixes: bz#1676430
Signed-off-by: Susant Palai

md-cache: Adapt integer data types to avoid integer overflow

2019-02-20T07:54:09+00:00

The "struct iatt" in iatt.h is using int64_t types for storing
the atime, mtime and ctime. Therefore the struct 'struct md_cache' in
md-cache.c should also use this types to avoid an integer overflow.

This can happen e.g. if someone uses a very high default-retention-period
in the WORM-Xlator.

Change-Id: I605268d300ab622b9c8ab30e459dc00d9340aad1
fixes: bz#1678726
Signed-off-by: David Spisla