glusterfs.git/xlators, branch v6.6

cluster/ec: fix EIO error for concurrent writes on sparse files

2019-10-24T09:25:08+00:00

EC doesn't allow concurrent writes on overlapping areas; they are
serialized. However, non-overlapping writes are serviced in parallel.
When a write is not aligned, EC first needs to read the entire chunk
from disk, apply the modified fragment and write it again.

The problem appears on sparse files because a write to an offset
implicitly creates data on offsets below it (so, in some way, they
are overlapping). For example, if a file is empty and we read 10 bytes
from offset 10, read() will return 0 bytes. Now, if we write one byte
at offset 1M and retry the same read, the system call will return 10
bytes (all containing 0's).
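
The read behaviour described above can be reproduced on a local POSIX
filesystem. The following is a minimal sketch using Python's os.pread and
os.pwrite on a temporary file; the helper name demo_sparse_read is
illustrative and is not part of GlusterFS.

```python
import os
import tempfile


def demo_sparse_read():
    """Read 10 bytes at offset 10 before and after a write at offset 1 MiB."""
    fd, path = tempfile.mkstemp()
    try:
        # The file is empty, so a read at offset 10 is past end-of-file.
        before = os.pread(fd, 10, 10)
        # Writing one byte at 1 MiB extends the file; every offset below
        # it now reads back as zeros.
        os.pwrite(fd, b"x", 1024 * 1024)
        after = os.pread(fd, 10, 10)
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


before, after = demo_sparse_read()
print(len(before), len(after))
```

The first read returns no data and the second returns 10 zero bytes, even
though no write ever touched offsets 10 to 19.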

So if we have two writes, the first one at offset 10 and the second one
at offset 1M, EC will send both in parallel because they do not overlap.
However, the first one will try to read missing data from the first chunk
(i.e. offsets 0 to 9) to recombine the entire chunk and do the final write.
This read will happen in parallel with the write to 1M. What could happen
is that half of the bricks process the write before the read, and the
other half do the read before the write. Some bricks will return 10 bytes
of data while the others will return 0 bytes (because the file on the
brick has not been expanded yet).

When EC tries to recombine the answers from the bricks, it can't, because
it needs more than half consistent answers to recover the data. So this
read fails with an EIO error. This error is propagated to the parent write,
which is aborted, and EIO is returned to the application.
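
A rough model of why the read fails is sketched below. It is not EC's
decoding logic: it only counts identical answers and applies the "more than
half" rule stated above. The function name combine_answers and the
six-brick layout are assumptions made for illustration.

```python
import errno
from collections import Counter


def combine_answers(answers):
    """Return the answer given by more than half of the bricks, else raise EIO."""
    value, count = Counter(answers).most_common(1)[0]
    if count * 2 <= len(answers):
        raise OSError(errno.EIO, "not enough consistent answers")
    return value


# Three bricks processed the write at 1M first, three did the read first.
split = [b"\x00" * 10] * 3 + [b""] * 3
try:
    combine_answers(split)
    result = None
except OSError as exc:
    result = exc.errno
```

With an even split neither answer reaches a majority, so the sketch raises
EIO, mirroring the error that is propagated to the parent write.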

The issue happened because EC assumed that a write to a given offset
implies that offsets below it exist.

This fix prevents the read of the chunk from bricks if the current size
of the file is smaller than the read chunk offset. This size is
correctly tracked, so this fixes the issue.

This patch also modifies Test #13 in ec-stripe.t. With this patch, if the
file size is less than the offset we are writing, we fill the head and tail
with zeros and do not count it as a stripe cache miss. That makes sense
because we know what data that part holds, so there is no need to read it
from the bricks.
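
The idea of the fix can be sketched as follows. This is an illustrative
model, not the EC code: load_chunk, fake_read and their parameters are
hypothetical names, and the comparison includes the equal case (file size
equal to the chunk offset), which is an assumption of this sketch since the
text above only says "smaller than".

```python
def load_chunk(file_size, chunk_offset, chunk_size, read_from_bricks):
    """Return the current contents of a chunk for a read-modify-write."""
    if file_size <= chunk_offset:
        # No byte of this chunk exists yet, so it can only hold zeros:
        # skip the brick read entirely.
        return bytes(chunk_size)
    return read_from_bricks(chunk_offset, chunk_size)


calls = []


def fake_read(offset, size):
    # Hypothetical stand-in for the read that would be sent to the bricks.
    calls.append((offset, size))
    return b"\x01" * size


empty = load_chunk(0, 0, 512, fake_read)        # empty file: no brick read
existing = load_chunk(4096, 0, 512, fake_read)  # data exists: read it
```

Only the second call reaches fake_read, so a concurrent write that extends
the file can no longer make the bricks disagree about the first chunk.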


Backport of:
 > Patch: https://review.gluster.org/#/c/glusterfs/+/23066/
 > Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
 > BUG: 1730715
 > Signed-off-by: Xavi Hernandez 
 
(cherry picked from commit b01a43586c5abc23a874e5528a063c508f952cbd)

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739451
Signed-off-by: Xavi Hernandez

features/shard: Send correct size when reads are sent beyond file size

2019-10-24T09:24:22+00:00

Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1737141
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)

dht: Rebalance causing IO Error - File descriptor in bad state

2019-10-17T10:55:05+00:00

Problem : When a file is migrated, dht attempts to re-open all open
          fds on the new cached subvol. Earlier, if dht had not opened the fd,
          the client xlator would be unable to find the remote fd and would
          fall back to using an anon fd for the fop. That behavior changed with
          https://review.gluster.org/#/c/glusterfs/+/15804, causing fops to fail
          with EBADFD if the fd was not available on the cached subvol.
          The client xlator returns EBADFD if the remote fd is not found but
          dht only checks for EBADF before re-opening fds on the new cached subvol.

Solution: Handle EBADFD in the dht code path as well, so that fds are
          re-opened on the new cached subvol in that case too.
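
A minimal sketch of the changed check, written in Python rather than the
C used by dht. The function name needs_fd_reopen is hypothetical; only the
two errno values come from the description above.

```python
import errno

# EBADFD is Linux-specific; fall back to its Linux value (77) elsewhere.
EBADFD = getattr(errno, "EBADFD", 77)


def needs_fd_reopen(op_errno):
    """True if a failed fop should trigger re-opening fds on the new subvol."""
    # Before the fix only EBADF was treated this way.
    return op_errno in (errno.EBADF, EBADFD)
```

For example, needs_fd_reopen(EBADFD) is true, whereas a check against EBADF
alone would have let the fop fail.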

>Change-Id: I43c51995cdd48d05b12e4b2889c8dbe2bb2a72d8
>Fixes: bz#1758579
>(cherry picked from commit 9314a9fbf487614c736cf6c4c1b93078d37bb9df)
>(Reviewed on upstream https://review.gluster.org/#/c/glusterfs/+/23518/)

Change-Id: I43c51995cdd48d05b12e4b2889c8dbe2bb2a72d8
Fixes: bz#1761907

cluster/afr: Heal entries when there is a source & no healed_sinks

2019-10-17T10:52:54+00:00

Problem:
In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame
anything for entry heal, heal will not complete even though we have a
clear source and sinks. This happens because, in
afr_selfheal_find_direction(), only the bricks which are blamed by
non-accused bricks are considered as sinks. Later,
__afr_selfheal_entry_finalize_source() tries to mark all the
non-sources as sinks but fails to do so because there are no
healed_sinks marked and no witness present, yet there is a source.

Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do conservative merge.
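
The rule in the fix can be modelled as below. This is a simplified sketch,
not the afr code: finalize_entry_heal is a hypothetical name, the three
arrays stand in for the per-brick sources, healed_sinks and locked_on
state, and the example values for the B1/B2/B3 scenario are an assumption.

```python
def finalize_entry_heal(sources, healed_sinks, locked_on):
    """If a source exists but no healed sink, fall back to conservative merge."""
    has_source = any(s and l for s, l in zip(sources, locked_on))
    has_sink = any(h and l for h, l in zip(healed_sinks, locked_on))
    if has_source and not has_sink:
        for i, locked in enumerate(locked_on):
            if locked:
                sources[i] = 0       # no brick is trusted as the sole source
                healed_sinks[i] = 1  # every locked brick gets healed
    return sources, healed_sinks


# B1 and B2 blame each other, B3 blames nobody: B3 is the only source
# and no sink has been marked.
sources, healed_sinks = finalize_entry_heal([0, 0, 1], [0, 0, 0], [1, 1, 1])
```

After the call every locked brick is a sink and none is a source, which is
the state in which a conservative merge is performed.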

Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1760706
Signed-off-by: karthik-us

afr: support split-brain CLI for replica 3

2019-10-17T10:51:33+00:00

Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates
that there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760792
Signed-off-by: Ravishankar N 
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)

system/posix-acl: update ctx only if iatt is non-NULL

2019-10-15T05:42:30+00:00

We need to safeguard against possible zeroing out of the iatt
structure in the acl ctx, which can cause many issues.

> fixes: bz#1668286
> Change-Id: Ie81a57d7453a6624078de3be8c0845bf4d432773
> Signed-off-by: Amar Tumballi 
> (cherry picked from commit 6bf9637a93011298d032332ca93009ba4e377e46)

Change-Id: I992d25f1c1282d50aa0232d01586d2df2216551c
fixes: bz#1741402
Signed-off-by: Mohit Agrawal

perf/write-behind: Clear frame->local on conflict error

2019-10-04T05:22:36+00:00

WB saves the wb_inode in frame->local for the truncate and
ftruncate fops. This value is not cleared in case of error
on a conflicting write request. FRAME_DESTROY finds a non-null
frame->local and tries to free it using mem_put. However,
wb_inode is allocated using GF_CALLOC, causing the
process to crash.

credit: vpolakis@gmail.com

Change-Id: I217f61470445775e05145aebe44c814731c1b8c5
fixes: bz#1755679
Signed-off-by: N Balachandran

fuse: add missing GF_FREE to fuse_interrupt

2019-09-27T12:16:19+00:00

Change-Id: Id7e003e4a53d0a0057c1c84e1cd704c80a6cb015
fixes: bz#1753571
Signed-off-by: N Balachandran

ctime/rebalance: Heal ctime xattr on directory during rebalance

2019-09-27T11:34:25+00:00

After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on the new brick. This patch fixes
that.

Note that ctime still doesn't support consistent time across
distribute sub-volumes.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.

Backport of:
 > Patch: https://review.gluster.org/23127
 > Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
 > BUG: 1734026
 > Signed-off-by: Kotresh HR 

Patch: https://review.gluster.org/23127
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1752413
Signed-off-by: Kotresh HR

dht: Custom xattrs are not healed in case of add-brick

2019-09-27T11:26:15+00:00

Problem: If any custom xattrs are set on the directory before
         adding a brick, the xattrs are not healed on the directory
         after the brick is added.

Solution: xattrs are not healed because dht_selfheal_dir_mkdir_lookup_cbk
          checks the value of MDS and, if the MDS value is not negative,
          the selfheal code path does not take a reference to the MDS
          xattrs. Change the condition to take a reference to the MDS
          xattrs so that custom xattrs are populated on the newly added
          brick.

Backport of:
 > Patch: https://review.gluster.org/22520
 > BUG: bz#1702299
 > Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
 > Signed-off-by: Mohit Agrawal 

fixes: bz#1753561
Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
Signed-off-by: Kotresh HR