glusterfs.git/xlators/cluster/ec/src, branch release-4.0

cluster/ec: send list-node-uuids request to all subvolumes

2018-03-28T18:32:44+00:00

The xattr trusted.glusterfs.list-node-uuids was only sent to a single
subvolume. This was returning null uuids from the other subvolumes as
if they were down.

This fix forces that xattr to be requested from all subvolumes.

Backport of:
> BUG: 1561406

Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f
BUG: 1561721
Signed-off-by: Xavi Hernandez

cluster/ec: fix SHD crash for null gfid's

2018-03-21T16:32:26+00:00

When the self-heal daemon is doing a full sweep it uses readdirp to
get extra stat information from each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries and then each entry is stated to get additional
info. Between these two steps, it's possible that the file is removed
by the user, so we'll get an error, leaving stat info empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.

Backport of:
> BUG: 1558016

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1559079
Signed-off-by: Xavi Hernandez

cluster/ec: Change default read policy to gfid-hash

2018-03-19T08:52:04+00:00

Problem:
Whenever we read data from file over NFS, NFS reads
more data then requested and caches it. Based on the
stat information it makes sure that the cached/pre-read
data is valid or not.

Consider 4 + 2 EC volume and all the bricks are on
differnt nodes.

In EC, with round-robin read policy, reads are sent on
different set of data bricks. This way, it balances the
read fops to go on all the bricks and avoid heating UP
(overloading) same set of bricks.

Due to small difference in clock speed, it is possible
that we get minor difference for atime, mtime or ctime
for different bricks. That might cause a different stat
returned to NFS based on which NFS will discard
cached/pre-read data which is actually not changed and
could be used.

Solution:
Change read policy for EC as gfid-hash. That will force
all the read to go to same set of bricks.

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey 

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey

cluster/ec: avoid delays in self-heal

2018-03-15T07:20:10+00:00

Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal is running in bursts and taking a lot to finish, specially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.

Backport of:
> BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez

cluster/ec: Do lock conflict check correctly for wait-list

2018-02-02T15:04:04+00:00

Problem:
ec_link_has_lock_conflict() is traversing over only owner_list
but the function is also getting called with wait_list.

Fix:
Modify ec_link_has_lock_conflict() to traverse lists correctly.
Updated the callers to reflect the changes.

BUG: 1540882
Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91
Signed-off-by: Pranith Kumar K

cluster/ec : EC options for GD2

2018-01-22T08:53:36+00:00

Updates #302

Change-Id: I31b4648f7b1a394fceece5cba8120c579c66edd9
Signed-off-by: Sunil Kumar Acharya

locks: added inodelk/entrylk contention upcall notifications

2018-01-16T10:37:22+00:00

The locks xlator now is able to send a contention notification to
the current owner of the lock.

This is only a notification that can be used to improve performance
of some client side operations that might benefit from extended
duration of lock ownership. Nothing is done if the lock owner decides
to ignore the message and to not release the lock. For forced
release of acquired resources, leases must be used.

Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
Signed-off-by: Xavier Hernandez

cluster/ec: OpenFD heal implementation for EC

2018-01-05T06:55:44+00:00

Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.

Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.

BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya

posix: Introduce flags for validity of iatt members

2017-12-29T17:13:25+00:00

v1 of the patch started off as adding new fields to iatt that can be
filled up using statx but the discussions were more around introducing
masks to check the validity of different fields from a RIO perspective.
To that extent, I have dropped the statx call in this version and
introduced a 64 bit mask for existing fields. The masks I have defined
are similar with the statx() flags' masks.

I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID,
IATT_GFID_VALID etc before blindly copying from struct iatt to struct.

Also fixed warnings in xlators because of atime/mtime/ctime seconds
field change from uint32_t to int64_t.

Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b
Signed-off-by: Ravishankar N

cluster/ec: Fix possible shift overflow

2017-12-22T16:19:53+00:00

A coverity scan has revelaed a potential shift overflow while scanning
the bitmap of available subvolumes. The actual overflow cannot happen,
but I've changed to test used to control the limit to make it explicit.

Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83
BUG: 789278
Signed-off-by: Xavier Hernandez