glusterfs.git/xlators, branch v4.1.9

performance/write-behind: remove request from wip list in wb_writev_cbk

2019-05-07T06:05:40+00:00

There is a race in the way O_DIRECT writes are handled. Assume two
overlapping write requests w1 and w2.

* w1 is issued and is in wb_inode->wip queue as the response is still
  pending from bricks. Also wb_request_unref in wb_do_winds is not yet
  invoked.

        list_for_each_entry_safe (req, tmp, tasks, winds) {
                list_del_init (&req->winds);

                if (req->op_ret == -1) {
                        call_unwind_error_keep_stub (req->stub, req->op_ret,
                                                     req->op_errno);
                } else {
                        call_resume_keep_stub (req->stub);
                }

                wb_request_unref (req);
        }

* w2 is issued and wb_process_queue is invoked. w2 is not picked up
  for winding as w1 is still in wb_inode->wip. w2 is added to the todo
  list and wb_writev for w2 returns.

* response to w1 is received and invokes wb_request_unref. Assume
  wb_request_unref in wb_do_winds (see point 1) is not invoked
  yet. Since there is one more refcount, wb_request_unref in
  wb_writev_cbk of w1 doesn't remove w1 from wip.

* wb_process_queue is invoked as part of wb_writev_cbk of w1. But, it
  fails to wind w2 as w1 is still in wip.

* wb_request_unref is invoked on w1 as part of wb_do_winds. w1 is
  removed from all queues including wip.

* After this point there is no invocation of wb_process_queue unless
  a new request is issued by the application, so w2 hangs until the
  next request.
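
The sequence above can be modelled with a minimal single-threaded
sketch. It is not the write-behind code: struct sketch_req and the
sketch_* functions are hypothetical names, and the wip list is reduced
to one flag. It only shows why a callback that relies on the last unref
cannot wind the next request while another reference is outstanding,
and why leaving wip explicitly in the callback avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a write-behind request. */
struct sketch_req {
    int  refcount; /* one ref for the wind path, one for the callback */
    bool in_wip;   /* true while the request sits on the wip list */
};

/* Drop one reference; only the last unref takes the request off wip. */
static void sketch_unref(struct sketch_req *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        req->in_wip = false;
}

/* An overlapping write can be wound only once the earlier one left wip. */
static bool sketch_can_wind_next(const struct sketch_req *earlier)
{
    return !earlier->in_wip;
}

/* Callback that relies on unref to clear wip (the racy behaviour). */
static bool sketch_cbk_before(struct sketch_req *w1)
{
    sketch_unref(w1);
    return sketch_can_wind_next(w1); /* models the queue processing step */
}

/* Callback that leaves wip explicitly before dropping its reference. */
static bool sketch_cbk_after(struct sketch_req *w1)
{
    w1->in_wip = false;
    sketch_unref(w1);
    return sketch_can_wind_next(w1);
}
```

With a refcount of 2 on entry, sketch_cbk_before reports that the next
request cannot be wound, while sketch_cbk_after reports that it can.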

This bug is similar to bz 1626780 and bz 1379655.

Change-Id: Iab47437613591699d4c8ad18bc0b32de6affcc31
Signed-off-by: Raghavendra G 
Fixes: bz#1707200

cluster/dht: Fix rename journal in changelog

2019-04-02T06:54:33+00:00

With patch [1], renames are journalled only
on the cached subvolume. The dht sends the special
key on the cached subvolume so that the changelog
journals the rename. With a single distribute
subvolume, the key is not set. This patch
fixes that.

[1] https://review.gluster.org/10410

Backport of:
> Patch: https://review.gluster.org/20093
> BUG: 1583018
> Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961
> Signed-off-by: Kotresh HR 

fixes: bz#1660225
Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961
Signed-off-by: Kotresh HR

performance/write-behind: fix use after free in readdirp_cbk

2019-03-27T05:18:12+00:00

wb_inode->lock is accessed after inode_unref (inode), which is a
use-after-free as the inode_unref can potentially free up the inode
and hence the inode-ctx (wb_inode). Instead inode_unref has to happen
after the last access of wb_inode.
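
The ordering rule can be illustrated with a small sketch. The types and
functions below (sketch_inode, sketch_ctx, sketch_inode_unref,
sketch_finish_entry) are hypothetical and only mimic a refcounted
object that owns a context; they are not the glusterfs inode API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context owned by a refcounted object. */
struct sketch_ctx {
    int pending;
};

struct sketch_inode {
    int                refcount;
    struct sketch_ctx *ctx;
};

/* Allocate an object holding one reference and an empty context. */
static struct sketch_inode *sketch_inode_new(void)
{
    struct sketch_inode *inode = calloc(1, sizeof(*inode));
    if (inode == NULL)
        return NULL;
    inode->ctx = calloc(1, sizeof(*inode->ctx));
    if (inode->ctx == NULL) {
        free(inode);
        return NULL;
    }
    inode->refcount = 1;
    return inode;
}

/* Drop a reference; the last unref frees the object and its context. */
static void sketch_inode_unref(struct sketch_inode *inode)
{
    assert(inode->refcount > 0);
    if (--inode->refcount > 0)
        return;
    free(inode->ctx);
    free(inode);
}

/* Safe ordering: finish every access to ctx, then drop the reference. */
static int sketch_finish_entry(struct sketch_inode *inode)
{
    struct sketch_ctx *ctx = inode->ctx;
    int remaining = --ctx->pending; /* last access to ctx */

    sketch_inode_unref(inode);      /* may free both inode and ctx */
    return remaining;               /* plain value, safe to use afterwards */
}
```

Swapping the two statements in sketch_finish_entry would read ctx after
it may already have been freed, which is the pattern the commit removes.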

Change-Id: Ie1a8bb5e44a668578e7d6bcedc77df52618a36e2
Signed-off-by: Raghavendra G 
Fixes: bz#1691292

cluster/dht: sync brick root perms on add brick

2019-03-27T05:18:12+00:00

If a single brick is added to the volume and the
newly added brick is the first to respond to a
dht_revalidate call, its stbuf will not be merged
into local->stbuf as the brick does not yet have
a layout. The is_permission_different check therefore
fails to detect that an attr heal is required as it
only considers the stbuf values from existing bricks.
To fix this, merge all stbuf values into local->stbuf
and use local->prebuf to store the correct directory
attributes.

Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc
fixes: bz#1693057
Signed-off-by: N Balachandran

cluster/afr: Send truncate on arbiter brick from SHD

2019-03-21T10:36:15+00:00

Problem:
In an arbiter volume configuration, SHD does not send any writes to the arbiter
brick even if there is a data pending marker for the arbiter brick. If we have an
arbiter setup on the geo-rep master and there are data pending markers for the files
on the arbiter brick, SHD will not record any data changelog during healing. While
syncing the data from master to slave, if the arbiter brick is considered ACTIVE,
there is a chance that the slave will miss some data. If the arbiter brick is being
newly added or replaced, there is a chance of the slave missing all the data during sync.

Fix:
If there is a data pending marker for the arbiter brick, send a truncate to the arbiter
brick during heal, so that it records the truncate as the data transaction in the changelog.

Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687746
Signed-off-by: karthik-us

leases: Reset lease_ctx->timer post deletion

2019-01-09T15:38:39+00:00

To avoid use_after_free, reset lease_ctx->timer back to NULL
after the structure has been freed.
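
A minimal sketch of the pattern, assuming a context that owns a
heap-allocated timer. The names sketch_timer, sketch_lease_ctx and
sketch_cancel_timer are hypothetical and do not correspond to the
leases xlator's functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical timer object owned by a lease context. */
struct sketch_timer {
    int armed;
};

struct sketch_lease_ctx {
    struct sketch_timer *timer; /* NULL when no timer is pending */
};

/* Free the timer and clear the pointer so later code cannot reuse it. */
static void sketch_cancel_timer(struct sketch_lease_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->timer == NULL)
        return;        /* nothing pending, or already cancelled */
    free(ctx->timer);
    ctx->timer = NULL; /* no dangling pointer is left behind */
}
```

Because the pointer is reset, a second cancel becomes a harmless no-op
instead of a double free.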

Change-Id: Icd213ec809b8af934afdb519c335a4680a1d6cdc
updates: bz#1655532
Signed-off-by: Soumya Koduri 
(cherry picked from commit a9b0003c717087ff168bc143c70559162e53e0d5)

leases: Do not conflict with internal fops

2019-01-03T09:38:17+00:00

Internal fops (with frame->root->pid < 0) are used to heal
or move data and maintain data integrity. That is, they do not
modify the data of the client which holds the lease. Hence there
is no need to recall the lease for such fops.

Note: As with locks, the rebalance and self-heal daemon
processes would also need to heal lease state.

Change-Id: I8988693fef8d00e17c19dcc842e2238f9eb5ab48
updates: bz#1655532
Signed-off-by: Soumya Koduri

lease: Treat unlk request as noop if lease not found

2019-01-03T09:38:16+00:00

When the glusterfs server recalls the lease, it expects the
client to flush data and unlock the lease. If the client does not,
the server sets a timer (starting from the time it sends the RECALL
request) and revokes the lease after the timeout.

Here we could have a race wherein the client did send the UNLK
lease request, but because of network delay it reached the server
after the server revoked the lease. To handle such situations, treat
such requests as a noop and return success.
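
A minimal sketch of this behaviour, assuming a small fixed-size table
of lease ids. The table layout, SKETCH_MAX_LEASES and sketch_unlk are
hypothetical and are not the leases xlator's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LEASES 4

/* Hypothetical lease table: a NULL slot means no lease is recorded. */
struct sketch_lease_table {
    const char *ids[SKETCH_MAX_LEASES];
};

/*
 * Unlock a lease. Returns 0 on success in both cases; *released reports
 * whether a lease was actually found and removed.
 */
static int sketch_unlk(struct sketch_lease_table *t, const char *lease_id,
                       bool *released)
{
    assert(t != NULL && lease_id != NULL && released != NULL);
    *released = false;
    for (int i = 0; i < SKETCH_MAX_LEASES; i++) {
        if (t->ids[i] != NULL && strcmp(t->ids[i], lease_id) == 0) {
            t->ids[i] = NULL;
            *released = true;
            return 0;
        }
    }
    return 0; /* lease already revoked or never held: noop, still success */
}
```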

Change-Id: I166402d10273f4f115ff04030ecbc14676a01663
updates: bz#1655532
Signed-off-by: Soumya Koduri

io-cache: xdata needs to be passed for readv operations

2019-01-03T09:38:16+00:00

The io-cache xlator has been skipping xdata references when the
data needs to be read into the page cache. This patch fixes that.

Note: similar changes may be needed for other fops as well
which are handled by io-cache.

Change-Id: I28d73d4ba471d13eb55d0fd0b5197d222df77a2a
updates: bz#1655532
Signed-off-by: Soumya Koduri 
(cherry picked from commit b3d88a0904131f6851f4185e43f815ecc3353ab5)

leases: Fix incorrect inode_ref/unrefs

2018-12-26T16:59:17+00:00

From testing and code reading, found a couple of places where
we incorrectly unref the inode, resulting in a use_after_free
crash or ref leaks. This patch addresses a couple of them.

a) When we try to grant the very first lease for an inode,
inode_ref is taken in __add_lease. This ref should be active
till all the leases granted to that inode are released (i.e.,
as long as lease_cnt > 0). In addition, even after lease_cnt becomes '0',
the inode should be active till all the blocked fops are resumed.

Hence release this ref after resuming all those fops. To avoid
granting new leases while those fops are being resumed, a new boolean
(blocked_fops_resuming) is defined in the lease_ctx to flag it.

b) 'new_lease_inode', which creates a new lease_inode_entry and
takes a ref on the inode, is used while adding that entry to
client_list and recall_list.

Use its counterpart function '__destroy_lease_inode', which does
the unref, while removing those entries from those lists.

c) inode ref is also taken when added to timer->data. Unref the same
after processing timer->data.

Change-Id: Ie77c78ff4a971e0d9a66178597fb34faf39205fb
updates: bz#1655532
Signed-off-by: Soumya Koduri