<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster, branch release-4.0</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/dht: Handle file migrations when brick down</title>
<updated>2018-04-18T13:21:54+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2018-04-06T10:36:51+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=281a019e03342537345e98548a0062b96a8e1539'/>
<id>281a019e03342537345e98548a0062b96a8e1539</id>
<content type='text'>
The decision as to which node would migrate a file
was based on the file's gfid. Files were divided
among the nodes of the replica/disperse set. However,
if a brick was down when the rebalance started, its
node-uuid would be saved as NULL and a set of files
would not be migrated.

Now, if the nodeuuid is NULL, the first non-null entry in
the set is the node responsible for migrating the file.
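
A minimal sketch of the selection logic, assuming a node_uuids[]
array for the replica/disperse set and a gfid_hash() helper (both
hypothetical names; the real code lives in dht's rebalance path):

    /* Pick the node responsible for migrating this file. */
    idx = gfid_hash(gfid) % set_size;
    if (uuid_is_null(node_uuids[idx])) {
        /* Brick was down when rebalance started: fall back to
           the first non-null entry in the set. */
        for (idx = 0; idx &lt; set_size; idx++) {
            if (!uuid_is_null(node_uuids[idx]))
                break;
        }
    }
    migrate_here = !uuid_compare(node_uuids[idx], my_node_uuid);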

Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
fixes: bz#1566822
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;

(cherry picked from commit 1f0765242a689980265c472646c64473a92d94c0)

Change-Id: I3072ca1f2975eb7ad3c38798e65d60d2312fd057
</content>
</entry>
<entry>
<title>cluster/dht: Wind open to all subvols</title>
<updated>2018-04-13T14:46:46+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2018-04-05T16:11:44+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=deb81726437d6a62a4e6b1ebb4dabe394e84446e'/>
<id>deb81726437d6a62a4e6b1ebb4dabe394e84446e</id>
<content type='text'>
dht_opendir should wind the open to all subvols
whether or not local-&gt;subvols is set. This is
because dht_readdirp winds the calls to all subvols.
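
A minimal sketch of the winding loop (callback and argument names
abbreviated; the real fop follows dht's usual STACK_WIND pattern):

    /* Wind the opendir to every subvolume, not just local-&gt;subvols. */
    for (i = 0; i &lt; conf-&gt;subvolume_cnt; i++) {
        STACK_WIND(frame, dht_fd_cbk, conf-&gt;subvolumes[i],
                   conf-&gt;subvolumes[i]-&gt;fops-&gt;opendir,
                   loc, fd, xdata);
    }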

(cherry picked from commit c4251edec654b4e0127577e004923d9729bc323d)

Change-Id: I67a96b06dad14a08967c3721301e88555aa01017
updates: bz#1566822
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: Prevent ping-event handling on shd</title>
<updated>2018-04-09T12:48:51+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-03-29T15:58:33+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=af31a3d7a1367acadc341ab37ae211a7fa3ae3c6'/>
<id>af31a3d7a1367acadc341ab37ae211a7fa3ae3c6</id>
<content type='text'>
On shd, we shouldn't treat any brick as down based
on latency; otherwise self-heal will never happen.
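
A minimal sketch of the guard, assuming the check sits in
afr_notify() (the early-out shape is an assumption; iamshd is
AFR's existing flag):

    /* Never mark a brick down from ping latency when running
       inside the self-heal daemon, or heals could starve. */
    if ((event == GF_EVENT_CHILD_PING) &amp;&amp; priv-&gt;shd.iamshd)
        goto out;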

fixes: bz#1562728
Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7
BUG: 1562728
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: send list-node-uuids request to all subvolumes</title>
<updated>2018-03-28T18:32:44+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-03-28T09:34:49+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=abb20b91136c2d03e54ab94e52bb88997396fdfb'/>
<id>abb20b91136c2d03e54ab94e52bb88997396fdfb</id>
<content type='text'>
The xattr trusted.glusterfs.list-node-uuids was only requested from a
single subvolume. This returned null uuids for the other subvolumes,
as if they were down.

This fix forces that xattr to be requested from all subvolumes.
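
A minimal sketch of the change, assuming EC's minimum-count
convention for winding fops (constant names as used elsewhere
in EC):

    /* list-node-uuids is only meaningful if every subvolume
       answers. */
    if (name &amp;&amp; !strcmp(name, GF_XATTR_LIST_NODE_UUIDS_KEY))
        minimum = EC_MINIMUM_ALL;   /* wind to all subvolumes */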

Backport of:
&gt; BUG: 1561406

Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f
BUG: 1561721
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: fix SHD crash for null gfid's</title>
<updated>2018-03-21T16:32:26+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-03-20T09:57:13+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=82106bc92143f18ef66fbfb20301e9a331033db6'/>
<id>82106bc92143f18ef66fbfb20301e9a331033db6</id>
<content type='text'>
When the self-heal daemon is doing a full sweep it uses readdirp to
get extra stat information from each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries and then each entry is stat()ed to get additional
info. Between these two steps, it's possible that the file is removed
by the user, so we'll get an error, leaving stat info empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem, a check has been added: if we detect a null gfid,
we simply ignore it and continue healing.
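
The guard itself is tiny; a sketch placed where the heal loop walks
the readdirp entries (loop context assumed):

    /* A removed file leaves d_stat empty: skip it rather than
       pass a null gfid down to protocol/client, which asserts
       on it. */
    if (gf_uuid_is_null(entry-&gt;d_stat.ia_gfid))
        continue;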

Backport of:
&gt; BUG: 1558016

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1559079
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Change default read policy to gfid-hash</title>
<updated>2018-03-19T08:52:04+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2018-03-13T08:33:20+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a644d52a4fe000827020b28736062a54c9a91b44'/>
<id>a644d52a4fe000827020b28736062a54c9a91b44</id>
<content type='text'>
Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. It then uses the
stat information to check whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent
to different sets of data bricks. This balances the read
fops across all the bricks and avoids overloading the
same set of bricks.

Due to small differences in clock speed, it is possible
that we get minor differences in atime, mtime or ctime
across bricks. That can cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could still have been used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads for a file to go to the same set of
bricks.
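
The change amounts to a one-line default in EC's option table; a
sketch following the usual xlator volume_options layout
(description shortened):

    {.key = {"read-policy"},
     .type = GF_OPTION_TYPE_STR,
     .value = {"round-robin", "gfid-hash"},
     .default_value = "gfid-hash",  /* previously "round-robin" */
     .description = "Selects which data bricks serve reads."
    },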

&gt;Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
&gt;BUG: 1554743
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: avoid delays in self-heal</title>
<updated>2018-03-15T07:20:10+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>jahernan@redhat.com</email>
</author>
<published>2018-02-21T16:47:37+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8fb21afdd033c5d466854400c6a7604fcf5241c3'/>
<id>8fb21afdd033c5d466854400c6a7604fcf5241c3</id>
<content type='text'>
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they do nothing but wait for the next
sweep. Sweeps happen once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal is running in bursts and taking a lot to finish, specially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
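
A minimal sketch of both changes, with hypothetical helper names
(ec_shd_wake_sweepers(), the rerun flag):

    /* 1. After the root directory checks out, wake every index
       sweeper instead of letting it sleep out the rest of the
       minute. */
    ec_shd_wake_sweepers(ec);                       /* hypothetical */

    /* 2. Rerun the sweep immediately whenever a directory was
       healed, so index entries added mid-scan aren't left for
       the next round. */
    do {
        rerun = 0;
        ret = ec_shd_index_sweep_once(healer, &amp;rerun);  /* hypothetical */
    } while ((ret == 0) &amp;&amp; rerun);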

Backport of:
&gt; BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez &lt;jahernan@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: Fix dict-leak in pre-op</title>
<updated>2018-03-03T04:33:26+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-02-28T12:28:31+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=de000fd58d783e8a8fce3a6c7f5b85be3c1b7178'/>
<id>de000fd58d783e8a8fce3a6c7f5b85be3c1b7178</id>
<content type='text'>
At the time of pre-op, pre_op_xdata is populated with the xattrs we get
from the disk, and at the time of post-op it gets overwritten without
unreffing the previously stored value, leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e
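
A minimal sketch of the fix, using the real dict_ref()/dict_unref()
API (the exact field path is taken from the message, not verified):

    /* Drop the dict saved at pre-op before storing the post-op
       one, so the earlier reference isn't leaked. */
    if (local-&gt;transaction.pre_op_xdata[i])
        dict_unref(local-&gt;transaction.pre_op_xdata[i]);
    local-&gt;transaction.pre_op_xdata[i] = dict_ref(xdata);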

 &gt;BUG: 1550078
 &gt;Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;(cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33)

BUG: 1550808
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
</content>
</entry>
<entry>
<title>cluster/dht: Handle single dht child in dht_lookup</title>
<updated>2018-02-23T15:05:17+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2018-02-19T04:14:29+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=543b458c082b74ac70ca7f87ab60a0f960acf3c6'/>
<id>543b458c082b74ac70ca7f87ab60a0f960acf3c6</id>
<content type='text'>
This patch limits itself to only handling the case
where no file (data or linkto) exists on the subvol.
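
A minimal sketch of the handled case (unwind shape assumed):

    /* With a single dht child there is nowhere else to look: if
       no file exists there, fail the lookup directly instead of
       winding dht_lookup_everywhere(). */
    if ((conf-&gt;subvolume_cnt == 1) &amp;&amp; (op_errno == ENOENT)) {
        DHT_STACK_UNWIND(lookup, frame, -1, ENOENT,
                         NULL, NULL, NULL, NULL);
        return 0;
    }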

Additional cases to be handled:
1. A linkto file was found on the only child subvol. This currently
calls dht_lookup_everywhere which eventually deletes it. It can be
deleted directly as it will not be pointing to a valid subvol.
2. Directory lookups - locking might be unnecessary in some cases.

&gt; Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4
&gt; BUG: 1546620
&gt; Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;

Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4
BUG: 1548271
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/dht: Ignore ENODATA from getxattr for posix acls</title>
<updated>2018-02-23T15:04:23+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2018-02-20T14:38:11+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=94660f330737ececb94d00ddac87701bc1244df4'/>
<id>94660f330737ececb94d00ddac87701bc1244df4</id>
<content type='text'>
dht_migrate_file no longer prints an error if getxattr for
posix acls fails with ENODATA/ENOATTR.
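
A minimal sketch of the quieter handling (log call shape assumed;
ret is the -errno result of the getxattr):

    /* ENODATA/ENOATTR just means no posix acl is set on the
       file, so stay silent for those. */
    if ((ret &lt; 0) &amp;&amp; (-ret != ENODATA) &amp;&amp; (-ret != ENOATTR))
        gf_msg(this-&gt;name, GF_LOG_ERROR, -ret, 0,
               "%s: failed to get xattr from %s",
               loc-&gt;path, from-&gt;name);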

&gt; Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c
&gt; BUG: 1546954
&gt; Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;

Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c
BUG: 1548264
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
</entry>
</feed>
