glusterfs.git/xlators, branch v5.7

cluster/ec: honor contention notifications for partially acquired locks

2019-06-28T11:07:08+00:00

EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but others have not yet), we can receive notifications
from the bricks that already granted it. These should be honored, since
we may not receive more notifications after that.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into consideration the notifications received before
the full lock acquisition has completed. Once the lock is acquired, it
will be released as soon as possible.
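
The behaviour described above can be pictured with a small model. This is
only a sketch under stated assumptions: the sketch_lock_* names and the
struct layout are hypothetical and are not taken from the EC sources. It
shows a contention notification being remembered while only some bricks
have granted the lock, so that the release decision can be made as soon as
acquisition completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a lock spanning several bricks.
 * None of these names come from the EC sources. */
typedef struct {
    int bricks_total;   /* bricks that must grant the lock */
    int bricks_granted; /* bricks that have granted it so far */
    bool contended;     /* a granting brick reported contention */
} sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *lk, int bricks_total)
{
    lk->bricks_total = bricks_total;
    lk->bricks_granted = 0;
    lk->contended = false;
}

/* One more brick granted its part of the lock. */
static void sketch_lock_granted(sketch_lock_t *lk)
{
    assert(lk->bricks_granted < lk->bricks_total);
    lk->bricks_granted++;
}

static bool sketch_lock_acquired(const sketch_lock_t *lk)
{
    return lk->bricks_granted == lk->bricks_total;
}

/* A brick that already granted the lock reports contention. The
 * notification is recorded even if acquisition is still in progress,
 * because it may never be repeated. */
static void sketch_lock_contention(sketch_lock_t *lk)
{
    if (lk->bricks_granted > 0)
        lk->contended = true;
}

/* True when the lock should be released right after the current
 * operation instead of being kept until the eager-lock timeout. */
static bool sketch_lock_release_now(const sketch_lock_t *lk)
{
    return sketch_lock_acquired(lk) && lk->contended;
}
```

In this model, a notification that arrives after the first of three grants
is kept, and sketch_lock_release_now() starts returning true once the third
grant arrives.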

Backport of:
> BUG: bz#1708156
> Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
> Signed-off-by: Xavi Hernandez 

Fixes: bz#1717282
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez 
Signed-off-by: Hari Gowtham

ec: fix truncate lock to cover the write in truncate clean

2019-05-08T14:20:13+00:00

ec_truncate_clean writes under the lock granted for truncate, but the
lock range is calculated by ec_adjust_offset_up, so the write in
ec_truncate_clean falls outside the locked range.
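
The arithmetic behind this can be illustrated with a short sketch. The
helpers below are hypothetical and are not the EC implementation. The
4096-byte stripe size and the choice of the rounded-down offset as a lock
start that covers the write are assumptions made only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; they only mimic rounding to a stripe boundary. */
static uint64_t sketch_round_down(uint64_t offset, uint64_t stripe)
{
    return offset - (offset % stripe);
}

static uint64_t sketch_round_up(uint64_t offset, uint64_t stripe)
{
    uint64_t rem = offset % stripe;

    return rem == 0 ? offset : offset + (stripe - rem);
}

/* The lock is modelled as the open-ended range [lock_start, infinity).
 * A write is covered only if it starts at or after lock_start. */
static bool sketch_lock_covers(uint64_t lock_start, uint64_t write_start)
{
    return write_start >= lock_start;
}
```

With a stripe of 4096 bytes and a truncate to offset 5000, rounding up gives
8192, so a write that touches byte 5000 is not covered by a lock starting at
8192, while a lock starting at the rounded-down offset 4096 covers it.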

Updates: bz#1699500
Change-Id: I15ed1b0807d75c5eb817323f1c227e97d03e0e7c
Signed-off-by: Kinglong Mee 
(cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)

performance/write-behind: remove request from wip list in wb_writev_cbk

2019-05-08T14:07:56+00:00

There is a race in the way O_DIRECT writes are handled. Assume two
overlapping write requests w1 and w2.

* w1 is issued and is in wb_inode->wip queue as the response is still
  pending from bricks. Also wb_request_unref in wb_do_winds is not yet
  invoked.

       list_for_each_entry_safe (req, tmp, tasks, winds) {
               list_del_init (&req->winds);

               if (req->op_ret == -1) {
                       call_unwind_error_keep_stub (req->stub, req->op_ret,
                                                    req->op_errno);
               } else {
                       call_resume_keep_stub (req->stub);
               }

               wb_request_unref (req);
       }

* w2 is issued and wb_process_queue is invoked. w2 is not picked up
  for winding as w1 is still in wb_inode->wip. w2 is added to the todo
  list and wb_writev for w2 returns.

* response to w1 is received and invokes wb_request_unref. Assume
  wb_request_unref in wb_do_winds (see point 1) is not invoked
  yet. Since there is one more refcount, wb_request_unref in
  wb_writev_cbk of w1 doesn't remove w1 from wip.

* wb_process_queue is invoked as part of wb_writev_cbk of w1. But, it
  fails to wind w2 as w1 is still in wip.

* wb_request_unref is invoked on w1 as part of wb_do_winds. w1 is
  removed from all queues including wip.

* After this point there is no invocation of wb_process_queue unless
  the application issues a new request, so w2 hangs until the next
  request.
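
The sequence above can be reduced to a small single-threaded model. This is
a sketch under stated assumptions, not the write-behind code: sketch_req_t
and the sketch_* functions are hypothetical, and the remove_from_wip flag
exists only to contrast the behaviour before and after the change named in
the subject line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a write-behind request. */
typedef struct {
    int refcount; /* references still held on the request */
    bool in_wip;  /* request is on the work-in-progress list */
} sketch_req_t;

/* Drop one reference. Queue membership is only cleaned up when the
 * last reference goes away, as in the scenario described above. */
static void sketch_unref(sketch_req_t *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        req->in_wip = false;
}

/* Write callback. With remove_from_wip set, the request leaves the
 * wip list immediately, regardless of who else still holds a ref. */
static void sketch_writev_cbk(sketch_req_t *req, bool remove_from_wip)
{
    if (remove_from_wip)
        req->in_wip = false;
    sketch_unref(req);
}

/* An overlapping request can be wound only when the earlier one is
 * no longer in progress. */
static bool sketch_can_wind(const sketch_req_t *earlier)
{
    return !earlier->in_wip;
}
```

Starting from a refcount of 2 (one reference for the pending response, one
still held by the winding loop), the callback without explicit removal
leaves w1 on the wip list and w2 cannot be wound. With explicit removal, w2
can be wound from the same callback.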

This bug is similar to bz 1626780 and bz 1379655.

Change-Id: Iaa47437613591699d4c8ad18bc0b32de6affcc31
Signed-off-by: Raghavendra G 
Fixes: bz#1707198
(cherry picked from commit 6454132342c0b549365d92bcf3572ecd914f7fa8)

cluster/afr: Remove local from owners_list on failure of lock-acquisition

2019-05-08T14:07:12+00:00

When eager-lock lock acquisition fails, for example because of network
failures, the local is not removed from owners_list. This leads to an
accumulation of waiting frames, and the application hangs because the waiting
frames assume that another transaction is still acquiring the lock, since
owners_list is not empty. This patch handles that case as well and adds asserts
to make it easier to find these problems in the future.
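
A reduced model of the bookkeeping makes the hang easier to see. This is a
sketch under stated assumptions: sketch_eager_lock_t and the sketch_*
functions are hypothetical and do not mirror the AFR data structures. The
remove_owner flag only contrasts the behaviour with and without the
cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the eager-lock bookkeeping. */
typedef struct {
    int owners;    /* transactions on the owners list */
    int waiting;   /* frames parked until the owners finish */
    bool acquired; /* true once the lock is actually held */
} sketch_eager_lock_t;

/* A transaction asks for the lock. Returns true if it has to wait
 * because another transaction appears to be acquiring the lock. */
static bool sketch_txn_begin(sketch_eager_lock_t *lk)
{
    if (lk->owners > 0 && !lk->acquired) {
        lk->waiting++;
        return true;
    }
    lk->owners++;
    return false;
}

/* Lock acquisition failed for the current owner. With remove_owner
 * set, the owner is taken off the owners list. */
static void sketch_lock_failed(sketch_eager_lock_t *lk, bool remove_owner)
{
    assert(!lk->acquired);
    if (remove_owner) {
        assert(lk->owners > 0);
        lk->owners--;
    }
}

/* Waiting frames can make progress only when nobody is still listed
 * as acquiring the lock, or when the lock is held. */
static bool sketch_waiters_can_resume(const sketch_eager_lock_t *lk)
{
    return lk->owners == 0 || lk->acquired;
}
```

If the failed owner stays on the list, sketch_waiters_can_resume() keeps
returning false and the waiting frame never runs. Removing the owner on
failure lets the waiting frame proceed.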

fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K

cluster/dht: Request linkto xattrs in dht_rmdir opendir

2019-04-10T07:51:40+00:00

If parallel-readdir is enabled, the rda xlator is loaded
below dht in the graph and proactively lists and caches
entries when an opendir is performed. dht_rmdir checks if
the directory being deleted contains stale linkto files by
performing a readdirp on its child subvols. However, the
entries are actually read in during the opendir operation,
which does not request the linkto xattr, so no linkto xattrs are
present for the entries. This causes dht to incorrectly identify
them as data files and fail the rmdir operation with ENOTEMPTY.
DHT now always adds the linkto xattr in the list of xattrs
requested in the opendir.

Change-Id: I0711198e66c59146282eb8b88084170bedfb4018
fixes: bz#1695399
Signed-off-by: N Balachandran 
(cherry picked from commit 110006bbcd5bb3e814b4cfe7d74cb41891ac3b0c)

glusterd: fix txn-id mem leak

2019-04-09T11:06:46+00:00

This commit ensures the following:
1. Don't send the commit op request to the remote nodes when
'gluster v status all' is executed. For the status all transaction the
local commit gets the names of the volumes, and the remote commit ops are
technically no-ops, so there is no need for additional rpc requests.
2. In the op state machine flow, if the transaction is in the staged state
and op_info.skip_locking is true, there is no need to set the txn id in
the priv->glusterd_txn_opinfo dictionary, which never gets freed.

Fixes: bz#1694612
Change-Id: Ib6a9300ea29633f501abac2ba53fb72ff648c822
Signed-off-by: Atin Mukherjee 
(cherry picked from commit 34e010d64905b7387de57840d3fb16a326853c9b)

client-rpc: Fix the payload being sent on the wire

2019-04-08T14:14:29+00:00

The fops allocate three kinds of payload (buffer) in the client xlator:
- fop payload, this is the buffer allocated by the write and put fop
- rsphdr payload, this is the buffer required by the reply cbk of
  some fops like lookup, readdir.
- rsp_payload, this is the buffer required by the reply cbk of fops like
  readv etc.

Currently, in the lookup and readdir fop the rsphdr is sent as payload,
hence the allocated rsphdr buffer is also sent on the wire, increasing
the bandwidth consumption on the wire.

With this patch, the issue is fixed.

Fixes: bz#1673058
Change-Id: Ie8158921f4db319e60ad5f52d851fa5c9d4a269b
Signed-off-by: Poornima G

cluster/dht: Fix lookup selfheal and rmdir race

2019-04-08T14:06:17+00:00

A race between the lookup selfheal and rmdir can cause
directories to be healed only on non-hashed subvols.
This can prevent the directory from being listed from
the mount point and in turn causes rm -rf to fail with
ENOTEMPTY.
Fix: Update the layout information correctly and reduce
the call count only after processing the response.

Change-Id: I812779aaf3d7bcf24aab1cb158cb6ed50d212451
fixes: bz#1695403
Signed-off-by: N Balachandran 
(cherry picked from commit b0f1d782fc45313fce4e1c0e74127401d5342d05)

cluster/afr: Send truncate on arbiter brick from SHD

2019-03-12T07:21:26+00:00

Problem:
In an arbiter volume configuration, SHD will not send any writes to the arbiter
brick even if there is a data pending marker for the arbiter brick. If we have an
arbiter setup on the geo-rep master and there are data pending markers for the files
on the arbiter brick, SHD will not mark any data changelog during healing. While
syncing the data from master to slave, if the arbiter brick is considered ACTIVE,
then there is a chance that the slave will miss some data. If the arbiter brick is
being newly added or replaced, there is a chance of the slave missing all the data
during sync.

Fix:
If there is a data pending marker for the arbiter brick, send truncate on the arbiter
brick during heal, so that it records the truncate as the data transaction in the
changelog.

Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687687
Signed-off-by: karthik-us

core: make compute_cksum function op_version compatible

2019-03-11T00:31:01+00:00

Problem: commit 5a152a changed the mechanism for computing the
checksum. In a heterogeneous cluster, peers end up in the
rejected state because upgraded and non-upgraded nodes use
different cksum computation mechanisms.

Solution: add a check for op-version so that all the nodes
in the cluster follow the same mechanism for computing the
cksum.
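
The idea of gating the computation on the op-version can be sketched as
follows. Everything here is an assumption made for illustration:
SKETCH_OP_VERSION_NEW_CKSUM is not a real glusterd constant, and the two
checksum routines are toy stand-ins, not the algorithms used by
compute_cksum before or after commit 5a152a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; not a real glusterd op-version constant. */
#define SKETCH_OP_VERSION_NEW_CKSUM 2

/* Toy stand-in for the older computation: plain byte sum. */
static uint32_t sketch_cksum_legacy(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Toy stand-in for the newer computation: multiplicative hash. */
static uint32_t sketch_cksum_new(const unsigned char *buf, size_t len)
{
    uint32_t h = 0;

    for (size_t i = 0; i < len; i++)
        h = h * 31u + buf[i];
    return h;
}

/* Pick the computation from the cluster-wide op-version, not from the
 * version of the software installed on the local node, so every peer
 * produces the same value for the same input. */
static uint32_t sketch_compute_cksum(const unsigned char *buf, size_t len,
                                     int cluster_op_version)
{
    if (cluster_op_version >= SKETCH_OP_VERSION_NEW_CKSUM)
        return sketch_cksum_new(buf, len);
    return sketch_cksum_legacy(buf, len);
}
```

In this sketch an upgraded node in a cluster still running at the older
op-version keeps using the legacy routine, so its result matches that of
the non-upgraded peers.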

fixes: bz#1684569

> Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
> BUG: bz#1685120
> Signed-off-by: Sanju Rakonde 

Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
Signed-off-by: Sanju Rakonde