glusterfs.git/tests/bugs, branch v3.11.0rc1

rda, glusterd: Change the max of rda-cache-limit to INFINITY

2017-05-22T15:05:43+00:00

Issue:
The max value of rda-cache-limit is 1GB before this patch.
When parallel-readdir is enabled, there will be many instances of
readdir-ahead, hence the rda-cache-limit depends on the number of
instances. Eg: On a volume with distribute count 4, rda-cache-limit
when parallel-readdir is enabled, will be 4GB instead of 1GB.
Consider a followinf sequence of operations:
- Enable parallel readdir
- Set rda-cache-limit to lets say 3GB
- Disable parallel-readdir, this results in one instance of readdir-ahead
  and the rda-cache-limit will be back to 1GB, but the current value is 3GB
  and hence the mount will stop working as 3GB > max 1GB.

Solution:
To fix this, we can limit the cache to 1GB even when parallel-readdir
is enabled. But there is no necessity to limit the cache to 1GB, it
can be increased if the system has enough resources. Hence getting rid
of the rda-cache-limit max value is more apt. If we just change the
rda-cache-limit max to INFINITY, we will render older(<3.11) clients
broken, when the rda-cache-limit is set to > 1GB (as the older clients
still expect a value < 1GB). To safely change the max value of
rda-cache-limit to INFINITY, add a check in glusted to verify all
the clients are > 3.11 if the value exceeds 1GB.

>Reviewed-on: https://review.gluster.org/17338
>Smoke: Gluster Build System 
>Reviewed-by: Atin Mukherjee 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>(cherry picked from commit e43b40296956d132c70ffa3aa07b0078733b39d4)

Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032
BUG: 1453152
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/17354
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: Don't spawn new glusterfsds on node reboot with brick-mux

2017-05-22T14:17:23+00:00

With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there wasn't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibilty check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.

> Reviewed-on: https://review.gluster.org/17307
> Reviewed-by: Atin Mukherjee 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

(cherry picked from commit 13e7b3b354a252ad4065f7b2f0f805c40a3c5d18)

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1453086
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17351
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

Fixes quota aux mount failure

2017-05-17T23:26:20+00:00

The aux mount is created on the first limit/remove_limit/list command
and it remains until volume is stopped / deleted / (quota is disabled)
, where we do a lazy unmount. If the process is uncleanly terminated,
then the mount entry remains and we get (Transport disconnected) error
on subsequent attempts to run quota list/limit-usage/remove commands.

Second issue, There is also a risk of inadvertent rm -rf on the
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing mount/unmount on each command is that we cannot
use same mount point for both list and limit commands.
The reason for this is that list command needs mount to be accessible
in cli after response from glusterd, So it could be unmounted by a
limit command if executed in parallel (had we used same mount point)
Hence we use separate mount points for list and limit commands.

>Reviewed-on: https://review.gluster.org/16938
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Manikandan Selvaganesh 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Raghavendra G 
>Reviewed-by: Atin Mukherjee 
>(cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1449775
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/17240
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

glusterd: Make reset-brick work correctly if brick-mux is on

2017-05-16T00:29:48+00:00

Reset brick currently kills of the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.

> Reviewed-on: https://review.gluster.org/17128
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 

(cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1449933
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17245
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

afr: include quorum type and count when dumping afr priv

2017-05-12T13:36:40+00:00

Squash of  https://review.gluster.org/17196 and
           https://review.gluster.org/17215

Dump the client quorum type ('auto', 'fixed' or 'none'). If it is 'fixed',
also dump the quorum-count. This information will be available in the client
statedump and in
//.meta/graphs/active/testvol-replicate-X/private.

Also added a test-case.

Change-Id: I91367c5250b26efb35e5f7d7c397def09cc77cbc
BUG: 1449921
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17243
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: socketfile & pidfile related fixes for brick multiplexing feature

2017-05-10T14:05:52+00:00

Problem: While brick-muliplexing is on after restarting glusterd, CLI is
         not showing pid of all brick processes in all volumes.

Solution: While brick-mux is on all local brick process communicated through one
          UNIX socket but as per current code (glusterd_brick_start) it is trying
          to communicate with separate UNIX socket for each volume which is populated
          based on brick-name and vol-name.Because of multiplexing design only one
          UNIX socket is opened so it is throwing poller error and not able to
          fetch correct status of brick process through cli process.
          To resolve the problem write a new function glusterd_set_socket_filepath_for_mux
          that will call by glusterd_brick_start to validate about the existence of socketpath.
          To avoid the continuous EPOLLERR erros in  logs update socket_connect code.

Test:     To reproduce the issue followed below steps
          1) Create two distributed volumes(dist1 and dist2)
          2) Set cluster.brick-multiplex is on
          3) kill glusterd
          4) run command gluster v status
          After apply the patch it shows correct pid for all volumes

> BUG: 1444596
> Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: https://review.gluster.org/17101
> Smoke: Gluster Build System 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

Change-Id: Ia95b9d36e50566b293a8d6350f8316dafc27033b
BUG: 1449004
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17212
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Prashanth Pai 
CentOS-regression: Gluster Build System

afr: don't do a post-op on a brick if op failed

2017-04-19T02:29:25+00:00

Problem:
In afr-v2, self-blaming xattrs are not there by design. But if the FOP
failed on a brick due to an error other than ENOTCONN (or even due to
ENOTCONN, but we regained connection before postop was wound), we wind
the post-op also on the failed brick, leading to setting self-blaming
xattrs on that brick. This can lead to undesired results like healing of
files in split-brain etc.

Fix:
If a fop failed on a brick on which pre-op was successful, do not
perform post-op on it. This also produces the desired effect of not
resetting the dirty xattr on the brick, which is how it should be
because if the fop failed on a brick, there is no reason to clear the
dirty bit which actually serves as an indication of the failure.

Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
BUG: 1438255
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/16976
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

dht: Add readdir-ahead in rebalance graph if parallel-readdir is on

2017-04-18T06:16:11+00:00

Issue:
The value of linkto xattr is generally the name of the dht's
next subvol, this requires that the next subvol of dht is not
changed for the life time of the volume. But with parallel
readdir enabled, the readdir-ahead loaded below dht, is optional.
The linkto xattr for first subvol, when:
- parallel readdir is enabled : "-readdir-head-0"
- plain distribute volume : "-client-0"
- distribute replicate volume : "-afr-0"

The value of linkto xattr is "-readdir-head-0" when
parallel readdir is enabled, and is "-client-0" if
its disabled. But the dht_lookup takes care of healing if it
cannot identify which linkto subvol, the xattr points to.

In dht_lookup_cbk, if linkto xattr is found to be "-client-0"
and parallel readdir is enabled, then it cannot understand the
value "-client-0" as it expects "-readdir-head-0".
In that case, dht_lookup_everywhere is issued and then the linkto file
is unlinked and recreated with the right linkto xattr. The issue is
when parallel readdir is enabled, mount point accesses the file
that is currently being migrated. Since rebalance process doesn't
have parallel-readdir feature, it expects "-client-0"
where as mount expects "-readdir-head-0". Thus at some point
either the mount or rebalance will fail.

Solution:
Enable parallel-readdir for rebalance as well and then do not
allow enabling/disabling parallel-readdir if rebalance is in
progress.

Change-Id: I241ab966bdd850e667f7768840540546f5289483
BUG: 1436090
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/17056
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Raghavendra G

cluster/dht: Make rebalance honor min-free-disk

2017-04-13T09:57:42+00:00

test:  Manual

created files of size 1K on 2 brick(of size 1GB) setup .
added a brick of size 16GB.
set min-free-disk to 12GB(so that first two bricks won't receive any files).
removed one of the 1st brick of size 1GB.

Logs from test:
[2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space]
 0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1.
Looking for new subvol.

[2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space]
0-test1-dht: new target found - test1-client-2 for file - /tile32

 - Post migration we have two files. The new destination (/brick/1) has the data file
[root@vm1 ~]# ll /brick/1/tile32
-rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32

 - On the old target the linkto file is there with linkto xattr pointing to /brick/1
[root@vm1 ~]# ll /tmp/2/tile32
---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32
[root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32
getfattr: Removing leading '/' from absolute path names
security.selinux="unconfined_u:object_r:user_tmp_t:s0"
trusted.gfid="����:Aс�#�/'b2"
trusted.glusterfs.dht.linkto="test1-client-2"

Marking ./tests/features/worm_sh.t as bad test.
Reason being, this patch failed on master branch as well and it has nothing
to do with rebalance/remove-brick.

BUG: 1441508
Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb
Signed-off-by: Susant Palai 
Reviewed-on: https://review.gluster.org/17034
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Raghavendra G 
CentOS-regression: Gluster Build System

glusterd: Add client details to get-state output

2017-04-13T03:43:08+00:00

This commit optionally adds client details corresponding to the
locally running bricks to the get-state output. Since getting
the client details involves sending RPC requests to the respective
local bricks, this is a relatively more costly operation. These
client details would be added to the get-state output only if the
get-state command is invoked with the 'detail' option.

This commit therefore also changes the get-state CLI usage. The
modified usage is as follows:

 # gluster get-state [] [[odir ] \
[file ]] [detail]

Change-Id: I42cd4ef160f9e96d55a08a10d32c8ba44e4cd3d8
BUG: 1431183
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17003
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee