glusterfs.git/xlators, branch v7.0rc0

storage/posix: set the op_errno to proper errno during gfid set

2019-08-22T05:58:47+00:00

In posix_gfid_set, the proper error is not captured in one of
the failure cases.

Change-Id: I1c13f0691a15d6893f1037b3a5fe385a99657e00
Fixes: bz#1736481
Signed-off-by: Raghavendra Bhat 
(cherry picked from commit ed7a3793073670e787063c47e55010fc7c963064)

cluster/ec: Update lock->good_mask on parent fop failure

2019-08-22T05:56:58+00:00

When discard/truncate performs a write fop, it should do so
after updating lock->good_mask to make sure readv happens
on the correct mask.

fixes: bz#1739424
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K

cluster/ec: Fix reopen flags to avoid misbehavior

2019-08-22T05:56:58+00:00

Problem:
When a file needs to be re-opened, the O_APPEND and O_EXCL
flags are not filtered in EC.

- O_APPEND should be filtered because EC doesn't send O_APPEND below EC for
open to make sure writes happen on the individual fragments instead of at the
end of the file.

- O_EXCL should be filtered because shd could have created the file, so
open should succeed even when the file already exists.

- O_CREAT should be filtered because open happens with gfid as parameter. So
open fop will create just the gfid which will lead to problems.

Fix:
Filter out these flags in reopen.
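
A minimal sketch of the filtering described above, written in Python
rather than the C used by EC. The helper name reopen_flags is
hypothetical; only the os.O_* constants are real. The list above names
three flags, so the sketch masks all three.

```python
import os

# Flags that, per the description above, should not be sent on reopen.
FILTERED = os.O_APPEND | os.O_EXCL | os.O_CREAT


def reopen_flags(original_flags):
    """Return the original open flags with the filtered bits cleared.

    Hypothetical helper for illustration; not the EC implementation.
    """
    return original_flags & ~FILTERED


flags = os.O_RDWR | os.O_APPEND | os.O_CREAT | os.O_EXCL
assert reopen_flags(flags) == os.O_RDWR
```

Clearing bits with a mask leaves every other flag (access mode,
O_SYNC and so on) untouched.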

Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4
Fixes: bz#1739426
Signed-off-by: Pranith Kumar K

cluster/ec: Always read from good-mask

2019-08-22T05:56:58+00:00

There are cases where fop->mask may have fop->healing added
and readv shouldn't be wound on fop->healing. To avoid this
always wind readv to lock->good_mask
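
The idea can be pictured with plain bitmasks, one bit per brick. This is
only an illustrative sketch in Python: the variable names mirror the
fields mentioned above, but the values and the bricks_in helper are
assumptions made for the example.

```python
def bricks_in(mask):
    """List the brick indexes whose bit is set in mask (hypothetical helper)."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


good_mask = 0b001111             # assumed: bricks 0-3 hold consistent data
healing = 0b010000               # assumed: brick 4 is still being healed
fop_mask = good_mask | healing   # a fop mask may include healing bricks

# Reads are wound only to the good bricks, never to the healing one.
read_targets = bricks_in(good_mask)
assert 4 in bricks_in(fop_mask)
assert 4 not in read_targets
```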

updates: bz#1739424
Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87
Signed-off-by: Pranith Kumar K

cluster/ec: fix EIO error for concurrent writes on sparse files

2019-08-22T05:56:58+00:00

EC doesn't allow concurrent writes on overlapping areas; they are
serialized. However, non-overlapping writes are serviced in parallel.
When a write is not aligned, EC first needs to read the entire chunk
from disk, apply the modified fragment and write it again.

The problem appears on sparse files because a write to an offset
implicitly creates data on offsets below it (so, in some way, they
are overlapping). For example, if a file is empty and we read 10 bytes
from offset 10, read() will return 0 bytes. Now, if we write one byte
at offset 1M and retry the same read, the system call will return 10
bytes (all containing 0's).
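
The read behaviour described above can be reproduced on a local POSIX
filesystem. The following is a small Python sketch; the file name and
the sparse_read_demo helper are made up for the example, and it does not
involve Gluster at all.

```python
import os
import tempfile


def sparse_read_demo():
    """Read 10 bytes at offset 10 before and after a write at offset 1M."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            before = os.pread(fd, 10, 10)      # file is empty: nothing to read
            os.pwrite(fd, b"x", 1024 * 1024)   # extends the file, leaving a hole
            after = os.pread(fd, 10, 10)       # the hole reads back as zeros
        finally:
            os.close(fd)
    return before, after


before, after = sparse_read_demo()
print(len(before), len(after))  # 0 10
```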

So if we have two writes, the first one at offset 10 and the second one
at offset 1M, EC will send both in parallel because they do not overlap.
However, the first one will try to read missing data from the first chunk
(i.e. offsets 0 to 9) to recombine the entire chunk and do the final write.
This read will happen in parallel with the write to 1M. What could happen
is that half of the bricks process the write before the read, and the
other half do the read before the write. Some bricks will return 10 bytes
of data while the others will return 0 bytes (because the file on the
brick has not been expanded yet).

When EC tries to recombine the answers from the bricks, it can't, because
it needs more than half consistent answers to recover the data. So this
read fails with EIO error. This error is propagated to the parent write,
which is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset
implies that offsets below it exist.

This fix prevents the read of the chunk from bricks if the current size
of the file is smaller than the read chunk offset. This size is
correctly tracked, so this fixes the issue.
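
A rough sketch of that decision, in Python for brevity. load_chunk and
read_from_bricks are hypothetical names, not EC functions. The text
above says "smaller than"; the sketch also skips the read when the size
equals the chunk offset, on the assumption that no byte of the chunk
exists in that case either.

```python
def load_chunk(file_size, chunk_offset, chunk_size, read_from_bricks):
    """Return the current contents of a chunk before a partial write.

    Hypothetical illustration: read_from_bricks(offset, size) stands in
    for the real read path and may return fewer than size bytes.
    """
    if file_size <= chunk_offset:
        # Assumption: nothing is stored at or beyond chunk_offset, so the
        # chunk is known to be all zeros and no read is needed.
        return bytes(chunk_size)
    data = read_from_bricks(chunk_offset, chunk_size)
    # Pad a short read (chunk crossing end of file) with zeros.
    return data + bytes(chunk_size - len(data))


calls = []


def fake_read(offset, size):
    calls.append((offset, size))
    return b"abc"


# Empty file: the chunk is synthesized without touching the bricks.
assert load_chunk(0, 0, 8, fake_read) == bytes(8)
assert calls == []
# File with data in the chunk: the read happens and is zero-padded.
assert load_chunk(3, 0, 8, fake_read) == b"abc" + bytes(5)
assert calls == [(0, 8)]
```

Because the tracked file size is known locally, the racy read is avoided
entirely in the sparse case instead of being retried.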

Also modify Test #13 in the ec-stripe.t file.
With this patch, if the file size is less than the offset we are writing
to, we fill the head and tail with zeros and do not count it as a stripe
cache miss. That makes sense, as we know what data that part holds and
there is no need to read it from the bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739427
Signed-off-by: Xavi Hernandez

cluster/ec: inherit healing from lock when it has info

2019-08-22T05:56:23+00:00

If lock has info, fop should inherit healing mask from it.
Otherwise, fop cannot inherit right healing when changed_flags is zero.

Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f
updates: bz#1739424
Signed-off-by: Kinglong Mee

afr: restore timestamp of parent dir during entry-heal

2019-08-21T11:45:24+00:00

Fixes: bz#1741041
Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
Signed-off-by: Ravishankar N 
(cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)

features/shard: Send correct size when reads are sent beyond file size

2019-08-21T11:41:58+00:00

Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes: bz#1740316
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)

event: rename event_XXX with gf_ prefixed

2019-08-21T06:13:38+00:00

I hit a crash when using libgfapi.

libgfapi calls glfs_poller() --> event_dispatch() in file
api/src/glfs.c:721, and event_dispatch() is defined locally by
libglusterfs. The problem is that the name event_dispatch() is exactly
the same as the one exported by the libevent package from the OS.

For example, if an executable program Foo uses and links both libevent
and libgfapi at the same time, I can hit the crash, like:

kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.

This is because glfs_poller() is calling event_dispatch() from
libevent, not from libglusterfs.

The gfapi link info:
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It works without any problem.

And if Foo loads a private library, such as handler_glfs.so, with
dlopen(), and handler_glfs.so links GFAPI_LIBS directly while Foo does
not, then the crash is hit every time.

The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

But this is ugly, since Foo doesn't use any APIs from GFAPI_LIBS.

And in some cases, when the --as-needed link option is added (on many
distributions it is added by default), the crash is back again and the
above workaround won't work.

Backport of:
> https://review.gluster.org/#/c/glusterfs/+/23110/
> Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
> Fixes: #699
> Signed-off-by: Xiubo Li 

Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
updates: bz#1740519
Signed-off-by: Xiubo Li 
(cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)

locks/fencing: Address hang while lock preemption

2019-08-21T06:13:23+00:00

The fop_wind_count can go negative on the unwind path of the IO when
fencing is enabled, leading to a hang.

Also changed the code so that fop_wind_count needs to be maintained only
while fencing is enabled on the file.

> updates: bz#1717824
> Change-Id: Icd04b42bc16cd3d50eaa581ee57233910194f480
> signed-off-by: Susant Palai 
(backport of https://review.gluster.org/#/c/glusterfs/+/23088/)

fixes: bz#1740077
Change-Id: Icd04b42bc16cd3d50eaa581ee57233910194f480
Signed-off-by: Susant Palai