glusterfs.git/tests, branch v7.4

glusterd: Brick process fails to come up with brickmux on

2020-03-17T06:04:40+00:00

Issue:
1- In a cluster of 3 nodes N1, N2, N3, create 3 volumes vol1,
vol2, vol3 with 3 bricks each (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- The brick processes are not running on N1.

Root Cause -
Since there is a difference in volfile versions on N1 compared
to N2 and N3, glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo, deletes
old_volinfo, and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an RPC
request to glusterfs_handle_attach(). However, the volinfo
has already been deleted from the priv->volumes list by
glusterd_delete_stale_volume() before glusterd_start_bricks() runs,
while glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order() are called only after glusterd_start_bricks().
As a result, the attach RPC request gets an empty volfile path,
which causes the brick to crash.

Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services() before the
glusterd_start_bricks() call is made in glusterd_import_friend_volume().

> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC 
(cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)

Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
fixes: bz#1808964
Signed-off-by: Sanju Rakonde

afr: prevent spurious entry heals leading to gfid split-brain

2020-02-25T17:07:04+00:00

Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.
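
The gating rule in the fix can be illustrated with a small sketch. This is a toy model in Python, not the afr code itself; the function name and the list-of-booleans representation of brick connectivity are assumptions made for illustration.

```python
def should_attempt_entry_heal(brick_connected):
    """Return True only when every brick of the replica is reachable.

    brick_connected is a hypothetical list of booleans, one per brick,
    standing in for the connection state that shd sees.
    """
    # An empty replica set never qualifies; otherwise all bricks must be up.
    return bool(brick_connected) and all(brick_connected)


# With one brick down, the heal is skipped instead of running on 2 bricks.
print(should_attempt_entry_heal([True, True, False]))
print(should_attempt_entry_heal([True, True, True]))
```

Under this rule the index entry and the afr xattrs are left untouched until the third brick is back, which is the behaviour the fix describes.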

fixes: bz#1804591
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N 
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)

server: Mount fails after reboot 1/3 gluster nodes

2020-02-10T08:03:04+00:00

Problem: When one server node (1x3) comes back up after a reboot,
the client is unmounted. The client is unmounted because it
receives an AUTH_FAILED event and calls fini for the graph. The
client receives AUTH_FAILED because the brick is not attached to a
graph at that moment.

Solution: To avoid unmounting the client graph, return an ENOENT error
          from the server if the brick is not attached to the server
          at the time of client authentication.

> Credits: Xavi Hernandez 
> Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
> Fixes: bz#1793852
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit f6421dff22a6ddaf14134f6894deae219948c89d)

Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
Fixes: bz#1794019
Signed-off-by: Mohit Agrawal

rpc: Cleanup SSL specific data at the time of freeing rpc object

2020-02-10T07:50:06+00:00

Problem: When an rpc object is cleaned up, its SSL-specific data
         is not freed, which results in a leak.

Solution: To avoid the leak, clean up the SSL-specific data when
          the rpc object is cleaned up.

> Credits: l17zhou 
> Fixes: bz#1768407
> Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
> (cherry picked from commit 54ed71dba174385ab0d8fa415e09262f6250430c)

Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
Fixes: bz#1795540
Signed-off-by: Mohit Agrawal

geo-rep: Fix ssh-port validation

2020-02-10T07:29:57+00:00

If a non-standard ssh port is used, geo-rep can be configured to use it
through the ssh-port config option. The value should be non-negative and
within the allowed port range.

At present the option accepts negative values and values outside the allowed
port range, which is incorrect.

Many Linux kernels use the port range 32768 to 61000.
IANA suggests it should be in the range 1 to 2^16 - 1, so this patch keeps the same range.

$ gluster volume geo-replication master 127.0.0.1::slave config ssh-port -22
geo-replication config updated successfully
$ gluster volume geo-replication master 127.0.0.1::slave config ssh-port 22222222
geo-replication config updated successfully

This patch fixes the above issue and adds a few validations for it
to the test cases.
Upstream Patch:
https://review.gluster.org/#/c/glusterfs/+/24035/
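
The range check described above can be sketched as follows. This is a minimal illustration in Python, assuming the limits 1 to 2^16 - 1 stated in this message; the function name and constants are hypothetical and are not taken from the geo-rep sources.

```python
import re

SSH_PORT_MIN = 1
SSH_PORT_MAX = 2**16 - 1  # 65535


def is_valid_ssh_port(value):
    """Return True if value is a plain decimal integer in [1, 65535]."""
    text = str(value).strip()
    # Only ASCII digits are accepted, so "-22", "22.5" and "" are rejected
    # before any numeric conversion is attempted.
    if re.fullmatch(r"[0-9]+", text) is None:
        return False
    return SSH_PORT_MIN <= int(text) <= SSH_PORT_MAX


# The two values from the example above are both rejected by this check.
for candidate in ("-22", "22222222", "2222"):
    print(candidate, is_valid_ssh_port(candidate))
```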

Backport of:
>   Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
>   Fixes: bz#1792276
>   Signed-off-by: Sunny Kumar 
>   (cherry picked from commit 485212e858bddd97573a3b2b811357b0d822005a)

Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
Fixes: bz#1793412
Signed-off-by: Sunny Kumar

performance/md-cache: Do not skip caching of null character xattr values

2019-12-19T12:42:32+00:00

A null character string is a valid xattr value in a file system. But for
xattrs processed by md-cache, it does not update its entries if the
value is null ('\0'). This results in ENODATA when those xattrs are
later queried via getxattr(), causing failures in basic operations
like create and copy in a specially configured Samba setup for Mac OS
clients.

On the other hand, snapview-server internally sets an empty string ("")
as the value for xattrs received as part of listxattr(), and these are not
intended to be cached. Therefore we try to maintain that behaviour by using
an additional dictionary key to prevent entries from being updated in the
getxattr() and fgetxattr() callbacks in md-cache.
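
The caching rule described above can be modelled with a short sketch. It is a toy model in Python rather than the md-cache implementation; the marker key name, the function name and the dictionary-based cache are all assumptions made for illustration.

```python
# Hypothetical marker key; the real key name used by the patch is not
# given in this message.
SKIP_CACHE_MARKER = "example.skip-xattr-cache"


def update_xattr_cache(cache, name, value, response_dict):
    """Cache an xattr value, including empty ones, unless the marker is set.

    Returns True if the cache entry was updated.
    """
    # A missing value (None) is not the same as an empty value (b"").
    if value is None:
        return False
    # The sender asked for this response not to be cached.
    if SKIP_CACHE_MARKER in response_dict:
        return False
    cache[name] = value
    return True


cache = {}
update_xattr_cache(cache, "user.example", b"", {})
print("user.example" in cache)
```

The point of the sketch is that an empty value is stored like any other, while the marker key gives a sender such as snapview-server a way to opt out.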

Credits: Poornima G 

Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7
Fixes: bz#1785228
Signed-off-by: Anoop C S 
(cherry picked from commit b4b683736367d93daad08a5ee6ca95778c07c5a4)

test: fix non-root test case for geo-rep

2019-12-18T10:55:31+00:00

Problem:
On a freshly installed system non-root geo-rep test case gets blocked.

Solution:

On a freshly installed system, the remote key needs to be accepted automatically by ssh-copy-id.

Credits: M. Scherer 

Backport of:
>    Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
>    Fixes: bz#1779742
>    Signed-off-by: Sunny Kumar 

Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
Fixes: bz#1784790
Signed-off-by: Sunny Kumar

test: fix suspicious non-root geo-rep test failures

2019-11-27T06:16:11+00:00

Export of env variable is required for ssh-copy-id command.

Backport of:

    >fixes: bz#1765426
    >Change-Id: Icaf7a848cb8f4ae9f887d885a8c5bb71f26633b4
    >Signed-off-by: Sunny Kumar 
    >(cherry picked from commit febfa9f2ec9dfc5dbf4a68c3518f98364ebc461)

Change-Id: Ic244b065db9959c0c6ba952955f0f68e3f96e925
fixes: bz#1765431
Signed-off-by: Sunny Kumar

cluster/afr: Heal entries when there is a source & no healed_sinks

2019-11-14T13:18:45+00:00

Problem:
In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame
anything for entry heal, heal will not complete even though we have a
clear source and sinks. This happens because, while doing
afr_selfheal_find_direction() only the bricks which are blamed by
non-accused bricks are considered as sinks. Later in
__afr_selfheal_entry_finalize_source() when it tries to mark all the
non-sources as sinks it fails to do so because there won't be any
healed_sinks marked, no witness present and there will be a source.

Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do conservative merge.
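
The fallback in the fix can be sketched as below. This is a simplified Python model, not the afr C code; the function name and the per-brick 0/1 lists are assumptions, and "locked" is read here as "bricks on which the heal holds a lock".

```python
def finalize_entry_sources(sources, healed_sinks, locked_on):
    """Fall back to conservative merge when a source exists but no sink does.

    All three arguments are per-brick lists of 0/1 flags. New lists are
    returned and the inputs are left unmodified.
    """
    sources = list(sources)
    healed_sinks = list(healed_sinks)
    if any(sources) and not any(healed_sinks):
        for i, locked in enumerate(locked_on):
            if locked:
                # No brick is trusted as the sole source; every locked
                # brick takes part in the merge as a sink.
                sources[i] = 0
                healed_sinks[i] = 1
    return sources, healed_sinks


# B3 is the only source and nothing is marked as a healed sink.
print(finalize_entry_sources([0, 0, 1], [0, 0, 0], [1, 1, 1]))
```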

Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1760699
Signed-off-by: karthik-us

afr: support split-brain CLI for replica 3

2019-11-13T05:04:07+00:00

Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates
that there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760791
Signed-off-by: Ravishankar N 
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)