glusterfs.git/xlators/cluster, branch v6.8

cluster/ec: Change handling of heal failure to avoid crash

2020-02-28T06:06:57+00:00

Problem:
ec_getxattr_heal_cbk was called with NULL as second argument
in case heal was failing.
This function was dereferencing "cookie" argument which caused crash.

Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop->data, so even in the case when fop is NULL in error
case, there won't be any NULL dereference.

Thanks to Xavi for the suggestion about the fix.

Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
fixes: bz#1806836

afr: prevent spurious entry heals leading to gfid split-brain

2020-02-28T06:06:10+00:00

Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804594
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N 
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)

cluster/thin-arbiter: Wait for TA connection before ta-file lookup

2020-02-26T11:18:54+00:00

Problem:
When we mount a ta volume, as soon as 2 data bricks are connected
we consider that the mount is done and then send a lookup/create on
ta file on ta node. However, this connection with ta node might not
have been completed.
Due to this delay, ta replica id file will not be created and we
will see ENOTCONN error in log file if we do lookup.

Solution:
As we know that this ta node could have a higher latency, we should
wait for reasonable time for connection to happen before sending
lookup/create on replica id file.

fixes: bz#1804546
Change-Id: I36f90865afe617e4e84cee57fec832a16f5dd6cc
(cherry picked from commit a7fa54ddea3fe429f143b37e4de06a93b49d776a)

cluster/ec: skip updating ctx->loc again when ec_fix_open/opendir

2020-02-26T11:09:08+00:00

The ec_manager_open/opendir memsets ctx->loc which causes
memory/inode leak, and ec_fheal uses ctx->loc out of fd->lock
that loc_copy may copy bad data when memset it.

This patch skips updating ctx->loc when it is initilizaed.
With it, ctx->loc is filled once, and never updated.

Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a
fixes: bz#1806838
Signed-off-by: Kinglong Mee

Cluster/afr: Don't treat all bricks having metadata pending as split-brain

2020-02-25T07:06:51+00:00

Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view i.e each brick is blamed by atleast one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds restriction of all the bricks to be up to perform
metadata heal to avoid any metadata loss.

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1805097
Signed-off-by: karthik-us

cluster/dht: Correct fd processing loop

2019-12-30T07:11:08+00:00

The fd processing loops in the
dht_migration_complete_check_task and the
dht_rebalance_inprogress_task functions were unsafe
and could cause an open to be sent on an already freed
fd. This has been fixed.

> Change-Id: I0a3c7d2fba314089e03dfd704f9dceb134749540
> Fixes: bz#1757399
> Signed-off-by: N Balachandran 
> (cherry picked from commit 9b15867070b0cc241ab165886292ecffc3bc0aed)

Change-Id: I0a3c7d2fba314089e03dfd704f9dceb134749540
Fixes: bz#1786983
Signed-off-by: Mohit Agrawal

cluster/ec: Update lock->good_mask on parent fop failure

2019-10-30T08:18:19+00:00

When discard/truncate performs write fop, it should do so
after updating lock->good_mask to make sure readv happens
on the correct mask

fixes: bz#1739449
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K

cluster/ec: Fix reopen flags to avoid misbehavior

2019-10-30T08:18:19+00:00

Problem:
when a file needs to be re-opened O_APPEND and O_EXCL
flags are not filtered in EC.

- O_APPEND should be filtered because EC doesn't send O_APPEND below EC for
open to make sure writes happen on the individual fragments instead of at the
end of the file.

- O_EXCL should be filtered because shd could have created the file so even
when file exists open should succeed

- O_CREAT should be filtered because open happens with gfid as parameter. So
open fop will create just the gfid which will lead to problems.

Fix:
Filter out these two flags in reopen.

Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4
Fixes: bz#1739450
Signed-off-by: Pranith Kumar K

cluster/ec: Always read from good-mask

2019-10-30T08:18:18+00:00

There are cases where fop->mask may have fop->healing added
and readv shouldn't be wound on fop->healing. To avoid this
always wind readv to lock->good_mask

updates: bz#1739449
Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87
Signed-off-by: Pranith Kumar K

cluster/ec: inherit healing from lock when it has info

2019-10-30T08:18:18+00:00

If lock has info, fop should inherit healing mask from it.
Otherwise, fop cannot inherit right healing when changed_flags is zero.

Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f
updates: bz#1739449
Signed-off-by: Kinglong Mee