<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate, branch master</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: Heal directory rename without rmdir/mkdir</title>
<updated>2020-10-01T12:03:25+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2020-04-13T14:01:51+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9ecbd69127d373ac000e9e1be00c1829e49e64a4'/>
<id>9ecbd69127d373ac000e9e1be00c1829e49e64a4</id>
<content type='text'>
Problem 1:
When a directory is renamed while a brick is down, entry-heal always
did an rm -rf on that directory on the sink at the old location, then
did a mkdir and recreated the directory hierarchy at the new location.
This is inefficient.

Problem 2:
The order in which a renamed directory is healed may lead to a
scenario where the directory in the new location is created before it
is deleted from the old location, leaving 2 directories with the same
gfid in posix.

Fix:
As part of heal, if the old location is healed first and the directory
is not present on the source brick, always rename it into a hidden
directory inside the sink brick, so that when heal is triggered for the
new location, shd can rename it from this hidden directory to the new
location.

If heal of the new location is triggered first and it detects that the
directory already exists on the brick, then it should skip healing the
directory until it appears in the hidden directory.

Credits: Ravi for rename-data-loss.t script

Fixes: #1211
Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: more quorum checks in lookup and new entry marking</title>
<updated>2020-06-16T04:53:03+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-05-27T15:20:46+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c4a6748f25d2c1ab3ebcf89952278ebf94c8d371'/>
<id>c4a6748f25d2c1ab3ebcf89952278ebf94c8d371</id>
<content type='text'>
Problem: See the GitHub issue for details.

Fix:
- In lookup, if the entry exists on 2 out of 3 bricks, don't fail the
lookup with ENOENT just because there is an entrylk on the parent.
Consider quorum before deciding.

- If an entry FOP does not succeed on a quorum number of bricks, do not
perform new-entry marking.

Fixes: #1303
Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: Prioritize ENOSPC over other errors</title>
<updated>2020-06-05T03:53:28+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2020-05-21T09:48:59+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=fa63b45ca5edf172b1b89b28b5db3c5129cc57b6'/>
<id>fa63b45ca5edf172b1b89b28b5db3c5129cc57b6</id>
<content type='text'>
Problem:
In a replicate/arbiter volume, if file creations or writes fail on a
quorum number of bricks, and on one brick the failure is due to ENOSPC
while on another brick it fails for a different reason, the operation
may fail with errors other than ENOSPC in some cases.

Fix:
Prioritize ENOSPC over other lower-priority errors, and do not set
op_errno in posix_gfid_set if op_ret is 0, to avoid receiving any
errno that can be misinterpreted by __afr_dir_write_finalize().

Also remove the function afr_has_arbiter_fop_cbk_quorum(), which
might consider a successful reply from a single brick as quorum
success in some cases, whereas we always need the fop to be successful
on a quorum number of bricks in an arbiter configuration.

Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08
Fixes: #1254
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix bug-1101647.t test case failure</title>
<updated>2020-04-29T04:39:45+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2020-04-09T07:41:47+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=bc6ef2847cd0c00269313a83854e091d2e4589b0'/>
<id>bc6ef2847cd0c00269313a83854e091d2e4589b0</id>
<content type='text'>
Problem: The tests/bugs/replicate/bug-1101647.t test case fails
sporadically in the volume heal, since shd's connection to the bricks
was not being checked before running the index heal.
Build link: https://build.gluster.org/job/regression-test-burn-in/5007/

Fix: Check that shd is connected to the bricks before performing the
index heal.

Change-Id: Ie7060f379b63bef39fd4f9804f6e22e0a25680c1
Updates: #1154
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: mark pending xattrs as a part of metadata heal</title>
<updated>2020-04-02T04:32:57+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-12-24T07:30:19+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=2d5ba449e9200b16184b1e7fc84cabd015f1f779'/>
<id>2d5ba449e9200b16184b1e7fc84cabd015f1f779</id>
<content type='text'>
...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it is possible that we end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.

Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: prevent spurious entry heals leading to gfid split-brain</title>
<updated>2020-02-18T09:11:40+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-11T09:04:48+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=06453d77d056fbaa393a137ca277a20e38d2f67e'/>
<id>06453d77d056fbaa393a137ca277a20e38d2f67e</id>
<content type='text'>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1801624
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: restore timestamp of files during metadata heal</title>
<updated>2020-01-24T04:35:38+00:00</updated>
<author>
<name>Sheetal Pamecha</name>
<email>spamecha@redhat.com</email>
</author>
<published>2020-01-02T06:35:12+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=7dc4122b84c93cfa6355002fa651a806706e4990'/>
<id>7dc4122b84c93cfa6355002fa651a806706e4990</id>
<content type='text'>
For files: during metadata heal, we restore timestamps only for
non-regular (char, block, etc.) files. Extend this to regular files as
well, since the timestamp can also be updated via the touch command.

fixes: bz#1787274
Change-Id: I26fe4fb6dff679422ba4698a7f828bf62ca7ca18
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix spurious failure</title>
<updated>2019-10-16T12:31:03+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-10-14T04:59:31+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1cb05730763d98f65c28246c8808217e6e9d6948'/>
<id>1cb05730763d98f65c28246c8808217e6e9d6948</id>
<content type='text'>
fixes: bz#1759002
Change-Id: I4d49e1c2ca9b3c1d74b9dd5a30f1c66983a76529
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>Fix spurious failure in bug-1744548-heal-timeout.t</title>
<updated>2019-10-09T07:17:49+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-10-07T06:57:01+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=31db2360b6076734f9506e17c4d2c93f7b6ee475'/>
<id>31db2360b6076734f9506e17c4d2c93f7b6ee475</id>
<content type='text'>
The script was assuming that the heal would have been triggered by the
time the test was executed, which may not be the case. This can lead
to the following failures when the race happens:

...
18:29:45 not ok  14 [     85/      1] &lt;  26&gt; '[ 331 == 333 ]' -&gt; ''
...
18:29:45 not ok  16 [  10097/      1] &lt;  33&gt; '[ 668 == 666 ]' -&gt; ''

Heal on the 3rd brick didn't start completely the first time the
command was executed, so the extra count got added to the next profile
info.

Fixed it by depending on cumulative stats and waiting until the count
is satisfied, using EXPECT_WITHIN.

fixes: bz#1759002
Change-Id: I3b410671c902d6b1458a757fa245613cb29d967d
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: Heal entries when there is a source &amp; no healed_sinks</title>
<updated>2019-10-09T07:16:11+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-09-05T10:44:50+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8d17a94a65ac968b179afa718321105856610e63'/>
<id>8d17a94a65ac968b179afa718321105856610e63</id>
<content type='text'>
Problem:
In a situation where B1 blames B2, B2 blames B1, and B3 doesn't blame
anything for entry heal, heal will not complete even though we have a
clear source and sinks. This happens because, while doing
afr_selfheal_find_direction(), only the bricks which are blamed by
non-accused bricks are considered as sinks. Later, in
__afr_selfheal_entry_finalize_source(), when it tries to mark all the
non-sources as sinks, it fails to do so because there won't be any
healed_sinks marked, no witness present, and there will be a source.

Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do a conservative merge.

Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1749322
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
</feed>
