glusterfs.git/xlators/cluster, branch v8.0rc0

cluster/ec: Return correct error code and log message

2020-05-28T07:16:40+00:00

If readdir is sent with an FD on which opendir failed, that FD is
unusable and the call must fail. Currently, we fail it with EINVAL
without logging any message to the log file.

Return the correct error code and also log a message to make debugging easier.
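The commit does not name the new error code. The sketch below shows the shape of the fix: check a per-FD record of the opendir outcome, log why the FD is unusable, and fail with an errno that describes a bad FD. The struct, the function and the choice of EBADF are all hypothetical, not the ec xlator's actual code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-FD context that records the outcome of opendir. */
struct sketch_fd_ctx {
    bool opendir_ok;   /* true if opendir succeeded on this FD */
    int opendir_errno; /* errno from the failed opendir, 0 otherwise */
};

/*
 * Hypothetical readdir entry point: returns 0 on success or a negative
 * errno. Instead of silently failing with EINVAL, it logs why the FD is
 * unusable. EBADF is an assumed choice of "correct" error code.
 */
static int sketch_readdir(const struct sketch_fd_ctx *ctx, const char *path)
{
    if (ctx == NULL || !ctx->opendir_ok) {
        fprintf(stderr,
                "W: readdir on %s: FD unusable, opendir had failed (errno=%d)\n",
                path, ctx != NULL ? ctx->opendir_errno : 0);
        return -EBADF;
    }
    /* ... the normal readdir path would run here ... */
    return 0;
}
```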

fixes: #1220
Change-Id: Iaf035254b9c5aa52fa43ace72d328be622b06169
(cherry picked from commit af70cb5eedd80207cd184e69f2a4fb252b72d070)

afr: event gen changes

2020-04-24T12:50:13+00:00

The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.

Change #1:
For a read txn, in the inode refresh cbk, if event_generation is
found to be zero, we fail the read fop. This is not needed,
because a change in event gen is only a marker that the next inode refresh
should happen and should not be taken into account by the current read txn.
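The before/after behaviour of Change #1 can be sketched with a simplified inode ctx. The struct, the helper names and the EIO errno below are hypothetical illustrations, not afr's real types. The point is that the "after" callback never looks at event_gen.

```c
#include <errno.h>

/* Hypothetical, simplified view of the inode ctx used by a read txn. */
struct sketch_inode_ctx {
    unsigned int event_gen;     /* 0 was used to signal "refresh needed" */
    unsigned int readable_mask; /* bit i set => child i holds a good copy */
};

/* Return the first readable child, or a negative errno if none exists.
 * EIO is an assumed errno for illustration. */
static int sketch_pick_readable(unsigned int mask)
{
    for (int i = 0; i < 32; i++) { /* assumes a 32-bit unsigned int */
        if (mask & (1u << i))
            return i;
    }
    return -EIO;
}

/* Before: a zero event_gen in the refresh callback failed the read. */
static int sketch_refresh_cbk_before(const struct sketch_inode_ctx *ctx)
{
    if (ctx->event_gen == 0)
        return -EIO;
    return sketch_pick_readable(ctx->readable_mask);
}

/* After: event_gen only tells the *next* txn to refresh, so the current
 * read txn ignores it and serves the read from the refreshed good copies. */
static int sketch_refresh_cbk_after(const struct sketch_inode_ctx *ctx)
{
    return sketch_pick_readable(ctx->readable_mask);
}
```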

Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets event gen (in afr_lookup_done) if there are non-zero afr
xattrs. The reset is done only to trigger an inode refresh and a
possible client-side heal on the next lookup. That can be achieved by
setting the need_refresh flag in the inode ctx, so all
occurrences of resetting event gen to zero were replaced with a call to
afr_inode_need_refresh_set().
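A minimal sketch of Change #2, assuming a simplified inode ctx with an event_gen counter and a need_refresh flag. The struct and the sketch_* functions are hypothetical. sketch_need_refresh_set() only mimics what afr_inode_need_refresh_set() is described as doing; it is not its real signature. Both patterns make the next txn refresh, but only the new one keeps event_gen moving forward.

```c
#include <stdbool.h>

/* Hypothetical, simplified inode ctx. */
struct sketch_inode_ctx {
    unsigned int event_gen; /* should only move forward ("causal order") */
    bool need_refresh;      /* consumed by the next inode refresh */
};

/* Old pattern: force the next refresh by zeroing event_gen. */
static void sketch_lookup_done_old(struct sketch_inode_ctx *ctx,
                                   bool nonzero_afr_xattrs)
{
    if (nonzero_afr_xattrs)
        ctx->event_gen = 0; /* event_gen goes backwards */
}

/* Stand-in for what afr_inode_need_refresh_set() is described as doing. */
static void sketch_need_refresh_set(struct sketch_inode_ctx *ctx)
{
    ctx->need_refresh = true;
}

/* New pattern: keep event_gen intact and set the flag instead. */
static void sketch_lookup_done_new(struct sketch_inode_ctx *ctx,
                                   bool nonzero_afr_xattrs)
{
    if (nonzero_afr_xattrs)
        sketch_need_refresh_set(ctx);
}

/* A txn refreshes if the flag is set or its event_gen is out of date. */
static bool sketch_should_refresh(const struct sketch_inode_ctx *ctx,
                                  unsigned int current_gen)
{
    return ctx->need_refresh || ctx->event_gen != current_gen;
}
```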

Change #3:
In both the lookup and discover paths, we do an inode refresh, which is
not required since all three (lookup, discover and inode refresh) essentially
do the same thing: update the inode ctx with the good/bad copies from the
brick replies. Inode refresh also triggers background heals, but I think it is
okay to do that when refresh is called during read and write txns rather than
in the lookup path.
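The shared step of deriving good/bad copies from brick replies can be sketched as a bitmask computation. The reply struct and its `blamed` field are hypothetical simplifications. Real afr derives this from its pending xattrs and reply state in more detail than shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified reply from one brick (child). */
struct sketch_reply {
    bool valid;  /* a reply was received from this child */
    int op_ret;  /* 0 on success, -1 on failure */
    bool blamed; /* another brick's pending xattrs accuse this copy */
};

/* Derive the good-copy bitmask that would be stored in the inode ctx:
 * a child is good if it replied successfully and nobody blames it. */
static unsigned int sketch_good_copies(const struct sketch_reply *replies,
                                       size_t n)
{
    unsigned int mask = 0;
    for (size_t i = 0; i < n && i < 32; i++) { /* 32-bit mask assumed */
        if (replies[i].valid && replies[i].op_ret == 0 && !replies[i].blamed)
            mask |= 1u << i;
    }
    return mask;
}
```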

The .t tests that relied on inode refresh in the lookup path to trigger heals
have been changed to perform a read txn, so that the inode refresh and the heal happen.

Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N 
Reported-by: Erik Jacobson 
(cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)

dht - Remove "tier" code (part 1)

2020-04-17T04:59:18+00:00

This patch removes some of the "tier" code from the dht xlator, as it is no longer
used.
Not all of the unneeded code is removed at once, to make reviewing easier.
Follow-up patches will remove the remaining unused code.

This is based on the work done in https://review.gluster.org/#/c/glusterfs/+/23935/

Change-Id: I3cb6a0c5d8f14afcd87cf021ef8f74b91c0f908a
updates: #1097
Signed-off-by: Barak Sason Rofman

dht - fixing a permission update issue

2020-04-08T06:57:53+00:00

When a downed brick is brought back and a lookup is performed from the client
side, the permissions on that brick are not updated on the first lookup,
but only on the second.

This patch modifies the permission update logic so that the first lookup
triggers a permission update on the previously downed brick.

LIMITATIONS OF THE PATCH:
The choice of source depends on whether the directory has a layout or not.
Directories on a newly added brick also have a layout xattr [zeroed], but the same is not true for the root directory.
Hence, if only the newly added bricks in the entire cluster are up [and the others are down], any permission change made during this time will be overwritten by the older permissions when the cluster is restarted.
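The permission fix can be pictured as: pick a source copy, then on the first lookup mark every copy whose mode differs from it. This is a minimal sketch, assuming a simplified rule of "first copy with a layout is the source". The struct, the functions and that rule are hypothetical and are not dht's actual selection logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical view of one subvolume's copy of a directory. */
struct sketch_subvol_dir {
    bool has_layout; /* layout xattr present (simplified meaning) */
    mode_t mode;     /* mode seen by the lookup */
};

/* Assumed rule: the first copy that has a layout is the source. */
static int sketch_pick_source(const struct sketch_subvol_dir *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (d[i].has_layout)
            return (int)i;
    return -1;
}

/* On the first lookup, mark every copy whose permission bits differ from
 * the source so it can be updated now. Returns the number marked. */
static size_t sketch_mark_perm_heal(const struct sketch_subvol_dir *d,
                                    size_t n, bool *needs_update)
{
    int src = sketch_pick_source(d, n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        needs_update[i] = src >= 0 && (size_t)src != i &&
                          (d[i].mode & 07777) != (d[src].mode & 07777);
        if (needs_update[i])
            count++;
    }
    return count;
}
```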

fixes: #999
Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33
Signed-off-by: Barak Sason Rofman

cluster/afr: Removing unsupported options from code base to improve coverage

2020-04-07T04:26:33+00:00

Support for gluster volume heal <VOLNAME> info healed/heal-failed
was removed by commit bb02cfb56ae08f56df4452c2b948fa962ae1212b in
release-3.6. The CLI parser displays the usage message in all
supported versions whenever these commands are run, which leaves some
dead code in the latest branches. Since support for these commands
was removed long ago, removing the code should not cause any
backward-compatibility issues either. Hence, remove the dead code from
the code base, which will also improve code coverage in the
regression runs.

Updates: #1052
Change-Id: I0c2b061469caf233c06d9699b0d159ce48e240b9
Signed-off-by: karthik-us

afr: mark pending xattrs as a part of metadata heal

2020-04-02T04:32:57+00:00

...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, we can end up with xattrs inadvertently deleted from
all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
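The fix can be modelled with a small pending matrix, where pending[i][j] != 0 means brick i blames brick j. This is a simplified, hypothetical model of AFR pending xattrs, and all names below are illustrative. When the matrix is all zero, every chosen source is made to blame every sink before the heal. Assuming those marks are cleared only after a successful heal, a retry after an interrupted heal picks the same sources.

```c
#include <stdbool.h>

#define SKETCH_N 3 /* hypothetical replica count */

static bool sketch_all_zero(int pending[SKETCH_N][SKETCH_N])
{
    for (int i = 0; i < SKETCH_N; i++)
        for (int j = 0; j < SKETCH_N; j++)
            if (pending[i][j] != 0)
                return false;
    return true;
}

/* Only when no pending xattrs exist: make each source blame each sink
 * before metadata is copied. Returns true if marks were added. */
static bool sketch_prepare_metadata_heal(int pending[SKETCH_N][SKETCH_N],
                                         const bool is_source[SKETCH_N])
{
    if (!sketch_all_zero(pending))
        return false;
    for (int i = 0; i < SKETCH_N; i++) {
        if (!is_source[i])
            continue;
        for (int j = 0; j < SKETCH_N; j++)
            if (!is_source[j])
                pending[i][j] += 1;
    }
    return true;
}

/* A brick is a source candidate if no other brick blames it. */
static void sketch_pick_sources(int pending[SKETCH_N][SKETCH_N],
                                bool out[SKETCH_N])
{
    for (int j = 0; j < SKETCH_N; j++) {
        out[j] = true;
        for (int i = 0; i < SKETCH_N; i++)
            if (i != j && pending[i][j] != 0)
                out[j] = false;
    }
}
```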

Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N

dht: gf_defrag_process_dir is called even if gf_defrag_fix_layout has failed

2020-03-24T05:14:07+00:00

Currently, even when gf_defrag_fix_layout fails with ENOENT or ESTALE,
gf_defrag_process_dir is still called afterwards, leading to a rebalance failure.
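The control-flow change can be sketched as: run the fix-layout step and stop if it fails, instead of going on to process the directory. The function-pointer stand-ins below are hypothetical and do not reproduce the real signatures of gf_defrag_fix_layout and gf_defrag_process_dir. Whether the caller then treats ENOENT/ESTALE as non-fatal is left open here.

```c
#include <errno.h>

/* Hypothetical step signature: 0 on success or a negative errno. */
typedef int (*sketch_step_fn)(const char *dir);

/* Only process the directory if fixing its layout succeeded. */
static int sketch_fix_then_process(const char *dir, sketch_step_fn fix_layout,
                                   sketch_step_fn process_dir)
{
    int ret = fix_layout(dir);
    if (ret < 0)
        return ret; /* e.g. -ENOENT or -ESTALE: do not call process_dir */
    return process_dir(dir);
}

/* Demo stubs standing in for the real steps. */
static int sketch_process_calls; /* counts process_dir invocations */

static int sketch_fix_layout_gone(const char *dir)
{
    (void)dir;
    return -ESTALE; /* directory vanished underneath us */
}

static int sketch_fix_layout_ok(const char *dir)
{
    (void)dir;
    return 0;
}

static int sketch_process_dir(const char *dir)
{
    (void)dir;
    sketch_process_calls++;
    return 0;
}
```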

fixes: #1102
Change-Id: Ib0c309fd78e89a000fed3feb4bbe2c5b48e61478
Signed-off-by: Susant Palai

cluster/afr: Fixes for halo

2020-03-13T13:20:37+00:00

The current implementation assumes that the ping event arrives after the connect
event, but that may not hold when fds need to be re-opened after the socket
connects, which takes more time. So handle the ping and child-up events in
any order.
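One way to handle the two events in either order is to record each one independently and let halo evaluate the child only once both have arrived. The struct and handlers below are a hypothetical sketch, not afr's halo code. Each handler returns true at the moment the child becomes ready, so the caller knows when to re-run halo selection.

```c
#include <stdbool.h>

/* Hypothetical per-child state for halo bookkeeping. */
struct sketch_halo_child {
    bool child_up;      /* child-up (connect) event received */
    bool ping_received; /* latency from a ping event received */
    long latency_ms;
};

/* Halo can only evaluate a child once both events have been seen. */
static bool sketch_halo_ready(const struct sketch_halo_child *c)
{
    return c->child_up && c->ping_received;
}

static bool sketch_on_child_up(struct sketch_halo_child *c)
{
    c->child_up = true;
    return sketch_halo_ready(c);
}

static bool sketch_on_ping(struct sketch_halo_child *c, long latency_ms)
{
    c->latency_ms = latency_ms;
    c->ping_received = true;
    return sketch_halo_ready(c);
}
```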

fixes: bz#1800583
Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05
Signed-off-by: Pranith Kumar K

dht - selfheal code cleaning

2020-03-12T07:31:30+00:00

1 - Converted methods to static
2 - Removed unused code

Change-Id: I49db3e28116da1c3c9ff0a33dcce7281bc3856f7
updates: bz#1193929
Signed-off-by: Barak Sason Rofman

dht/rebalance - fixing failure occurrence due to rebalance stop

2020-03-04T09:40:24+00:00

Problem description:
When stopping rebalance, the following error messages appear in the
rebalance log file:
[2020-01-28 14:31:42.452070] W [dht-rebalance.c:3447:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2020-01-28 14:31:42.452764] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31
[2020-01-28 14:31:42.453498] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30

To avoid these error messages, the error handling mechanism has been
modified.
In addition, several log messages have been added to make debugging more efficient.
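The commit does not detail the new error handling. The sketch below assumes one plausible approach: when a directory step fails because rebalance was stopped, report an "aborted" result at info level and let parents pass it up without logging new errors. The logs above show the current behaviour, where every ancestor directory logs its own error. All enums and functions here are hypothetical.

```c
#include <stdio.h>

/* Hypothetical rebalance status and per-directory result. */
enum sketch_defrag_status { SKETCH_STARTED, SKETCH_STOPPED };
enum sketch_dir_result { SKETCH_DIR_OK, SKETCH_DIR_FAILED, SKETCH_DIR_ABORTED };

/* Classify a failed directory step: a stop request is logged at info
 * level and reported as ABORTED rather than as an error. */
static enum sketch_dir_result
sketch_classify(int step_ret, enum sketch_defrag_status status, const char *dir)
{
    if (step_ret == 0)
        return SKETCH_DIR_OK;
    if (status == SKETCH_STOPPED) {
        fprintf(stderr, "I: rebalance stopped while processing %s\n", dir);
        return SKETCH_DIR_ABORTED;
    }
    fprintf(stderr, "E: processing failed for directory %s\n", dir);
    return SKETCH_DIR_FAILED;
}

/* A parent logs an error only for a real failure; ABORTED passes through. */
static enum sketch_dir_result
sketch_parent(enum sketch_dir_result child, const char *dir)
{
    if (child == SKETCH_DIR_FAILED)
        fprintf(stderr, "E: fix layout failed for %s\n", dir);
    return child;
}
```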

fixes: bz#1800956
Change-Id: Ifc82dae79ab3da9fe22ee25088a2a6b855afcfcf
Signed-off-by: Barak Sason Rofman