glusterfs.git/tests/bugs/glusterd, branch v3.10.2

Fixes quota aux mount failure

2017-05-13T21:12:07+00:00

The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted, or quota is
disabled, at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a
(Transport disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use, and removing it should not cause any
data loss on persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing mount/unmount on each command is that we cannot
use the same mount point for both list and limit commands.
The reason is that the list command needs the mount to remain accessible
in the cli after the response from glusterd, so it could be unmounted by
a limit command executed in parallel (had we used the same mount point).
Hence we use separate mount points for list and limit commands.
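
The mount-point separation described above can be sketched roughly as
follows. This is an illustrative Python sketch, not the glusterd code: the
base directory, the `_quota_list`/`_quota_limit` naming, and the function
name `aux_mount_path` are all hypothetical.

```python
import os

# Hypothetical base directory; the path glusterd really uses may differ.
AUX_MOUNT_BASE = "/var/run/gluster"

# Commands that only read usage vs. commands that change limits.
LIST_COMMANDS = {"list"}
LIMIT_COMMANDS = {"limit-usage", "remove"}


def aux_mount_path(volname: str, command: str) -> str:
    """Return a per-volume aux mount point that differs for list and limit commands."""
    if command in LIST_COMMANDS:
        suffix = "list"
    elif command in LIMIT_COMMANDS:
        suffix = "limit"
    else:
        raise ValueError(f"unsupported quota command: {command}")
    # Distinct paths mean a limit command unmounting its own mount point
    # cannot pull the mount out from under a concurrent list command.
    return os.path.join(AUX_MOUNT_BASE, f"{volname}_quota_{suffix}")
```

With this layout, a list and a limit-usage command on the same volume
resolve to different directories, so each can mount and unmount
independently.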

> Reviewed-on: https://review.gluster.org/16938
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
> Reviewed-by: Manikandan Selvaganesh 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1449779
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/17241
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra Talur

glusterd: Make reset-brick work correctly if brick-mux is on

2017-05-12T19:59:02+00:00

Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
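
The decision described above can be modelled in a few lines. This is a
simplified Python sketch under stated assumptions: a brick process is
represented as a plain list of brick names, and the function name
`stop_brick_for_reset` is hypothetical. The real code sends an rpc or
kills a process rather than returning a list.

```python
from typing import List


def stop_brick_for_reset(process_bricks: List[str], brick: str,
                         brick_mux_enabled: bool) -> List[str]:
    """Return the bricks still served by the process after the reset."""
    if brick not in process_bricks:
        raise ValueError(f"{brick} is not served by this process")
    if brick_mux_enabled:
        # Stands in for sending GLUSTERD_BRICK_TERMINATE: only the brick
        # being reset is detached, the other bricks stay available.
        return [b for b in process_bricks if b != brick]
    # Without multiplexing the whole process is terminated, so nothing
    # is served by it afterwards.
    return []
```

The point of the sketch is that with multiplexing on, the other bricks
attached to the same process survive the reset.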

> Signed-off-by: Samikshan Bairagya 
> Reviewed-on: https://review.gluster.org/17128
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 

(cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1449934
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17253
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

glusterd: socketfile & pidfile related fixes for brick multiplexing feature

2017-05-10T10:42:05+00:00

Problem: With brick multiplexing on, after restarting glusterd the CLI
         does not show the pid of all brick processes in all volumes.

Solution: With brick-mux on, all local brick processes communicate through one
          UNIX socket, but the current code (glusterd_brick_start) tries
          to communicate over a separate UNIX socket for each volume, which is
          populated based on brick-name and vol-name. Because the multiplexing
          design opens only one UNIX socket, this throws a poller error and the
          cli process cannot fetch the correct status of the brick process.
          To resolve the problem, write a new function, glusterd_set_socket_filepath_for_mux,
          that is called by glusterd_brick_start to validate the existence of the socket path.
          To avoid continuous EPOLLERR errors in the logs, update the socket_connect code.

Test:     To reproduce the issue, follow the steps below:
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all volumes

> BUG: 1444596
> Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: https://review.gluster.org/17101
> Smoke: Gluster Build System 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

Change-Id: I1892c80b9ffa93974f20c92d421660bcf93c4cda
BUG: 1449002
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17210
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Prashanth Pai

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-03-27T13:58:29+00:00

Commit 086436a introduced a generation number (cleanup_gen) to ensure that
the rpc layer doesn't end up cleaning up the connection object if the
application layer has already destroyed it. cleanup_gen was bumped only
in rpc_clnt_connection_cleanup (). However, the same is needed in
rpc_clnt_reconnect_cleanup () too: without it, if the object gets destroyed
through the reconnect event in the application layer, the rpc layer will
still try to delete the object, resulting in a double free and a crash.
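
The generation-number idea can be illustrated with a small model. This is
a Python sketch under stated assumptions, not a translation of the rpc
code: the `Connection` class, its `destroy_count` field, and the
`rpc_layer_destroy` helper are hypothetical stand-ins, and only the name
cleanup_gen comes from the commit message.

```python
class Connection:
    """Toy connection object guarded by a cleanup generation counter."""

    def __init__(self) -> None:
        self.cleanup_gen = 0
        self.destroy_count = 0  # how many times the rpc layer tore it down

    def connection_cleanup(self) -> None:
        # Application-side cleanup: bump the generation.
        self.cleanup_gen += 1

    def reconnect_cleanup(self) -> None:
        # The fix: the reconnect path must bump the generation as well.
        self.cleanup_gen += 1


def rpc_layer_destroy(conn: Connection, seen_gen: int) -> bool:
    """Destroy only if no cleanup has run since seen_gen was recorded."""
    if conn.cleanup_gen != seen_gen:
        return False  # someone else already cleaned up; skip to avoid a double free
    conn.destroy_count += 1
    return True
```

In this model, if `reconnect_cleanup` did not increment the counter, the
rpc layer would see an unchanged generation and tear the object down a
second time.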

Peer probing an invalid host/IP was the basic test to catch this issue.

>Reviewed-on: https://review.gluster.org/16914
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Milind Changire 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716)

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1434399
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16936
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: ignore return code of glusterd_restart_bricks

2017-02-10T11:41:31+00:00

When GlusterD is restarted on a multi-node cluster, while syncing the
global options from another GlusterD, it checks for quorum and, based on
that, decides whether to stop or start a brick. However, we currently
handle the return code of this function, so if we don't want to start any
bricks the ret will be non-zero and we will end up failing the import,
which is incorrect.

The fix is just to ignore the ret code of glusterd_restart_bricks ().

>Reviewed-on: https://review.gluster.org/16574
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Samikshan Bairagya 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 55625293093d485623f3f3d98687cd1e2c594460)

Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553
BUG: 1420991
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16593
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Samikshan Bairagya 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: double-check brick liveness for remove-brick validation

2017-02-03T13:46:53+00:00

Same problem as https://review.gluster.org/#/c/16509/ in a different
place.  Tests detach bricks without glusterd's knowledge, so
glusterd's internal brick state is out of date and we have to re-check
(via the brick's pidfile) as well.
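
A pidfile-based liveness re-check of the kind mentioned above could look
like the following. This is a minimal Python sketch for POSIX systems, not
the glusterd implementation; the function names and the idea of combining
it with a boolean "internal state" flag are assumptions for illustration.

```python
import os


def pidfile_process_alive(pidfile_path: str) -> bool:
    """Return True if the pidfile names a process that currently exists."""
    try:
        with open(pidfile_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed pidfile
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def brick_is_up(internal_state_started: bool, pidfile_path: str) -> bool:
    """Trust the cached state only if the pidfile check agrees."""
    return internal_state_started and pidfile_process_alive(pidfile_path)
```

The second function captures the "double-check": the cached state alone
is not enough when bricks can be detached behind glusterd's back.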

Backport of:
> BUG: 1385758
> Change-Id: I169538c1c62d72a685a49d57ef65fb6c3db6eab2
> Reviewed-on: https://review.gluster.org/16529

BUG: 1418091
Change-Id: Id0b597bc60807ed090f6ecdba549c5cf3d758f98
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16537
Reviewed-by: Atin Mukherjee 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

core: run many bricks within one glusterfsd process

2017-02-02T00:54:58+00:00

This patch adds support for multiple brick translator stacks running in
a single brick server process.  This reduces our per-brick memory usage
by approximately 3x, and our appetite for TCP ports even more.  It also
creates potential to avoid process/thread thrashing, and to improve QoS
by scheduling more carefully across the bricks, but realizing that
potential will require further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global
option.  By default it's off, and bricks are started in separate
processes as before.  If multiplexing is enabled, then *compatible*
bricks (mostly those with the same transport options) will be started in
the same process.
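
As a rough illustration of "compatible bricks share a process", the
sketch below groups bricks by their transport options. It is a simplified
Python model: the real compatibility check in glusterd is more involved,
and the function name, the tuple layout, and the option names used in the
example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Brick = Tuple[str, Dict[str, str]]  # (brick name, transport options)


def group_bricks_for_multiplexing(bricks: List[Brick],
                                  multiplex_enabled: bool) -> List[List[str]]:
    """Return one list of brick names per brick server process."""
    if not multiplex_enabled:
        # Default behaviour: every brick gets its own process.
        return [[name] for name, _ in bricks]
    groups = defaultdict(list)
    for name, transport_opts in bricks:
        # Bricks with identical transport options are treated as compatible.
        key = tuple(sorted(transport_opts.items()))
        groups[key].append(name)
    return list(groups.values())
```

For example, two bricks using the same transport options end up in one
group, while a brick with different options gets a group of its own.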

Backport of:
> Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
> BUG: 1385758
> Reviewed-on: https://review.gluster.org/14763

Change-Id: I4bce9080f6c93d50171823298fdf920258317ee8
BUG: 1418091
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16496
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: daemon restart logic should adhere server side quorum

2017-01-30T14:13:52+00:00

Just like brick processes, other daemon services should also follow the same
quorum-check logic to decide whether a particular service needs to come up when
glusterd is restarted or an incoming friend add/update request is received
(in the glusterd_restart_bricks () function).

>Reviewed-on: https://review.gluster.org/15626
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Prashanth Pai 

Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
BUG: 1417042
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16472
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
Reviewed-by: Samikshan Bairagya 
Reviewed-by: Prashanth Pai

glusterd: bypass add-brick validation with force

2017-01-19T03:50:33+00:00

Commit c916a2f added a validation to restrict the add-brick operation if the
replica configuration is changed and any of the bricks belonging to the
volume is down. However, we should bypass this validation with the force
option if users really want add-brick to go through, at the risk of
corner-case data loss.

The original problem of add-brick failing when the layout is not set
will remain a problem with the force option, as that issue has to be
taken care of in the DHT layer.

Change-Id: I0ed3df91ea712f77674eb8afc6fdfa577f25a7bb
BUG: 1406411
Signed-off-by: Atin Mukherjee 
Reviewed-on: http://review.gluster.org/16358
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ravishankar N 
CentOS-regression: Gluster Build System

tier : Tier as a service

2017-01-17T04:49:47+00:00

tierd is implemented by separating it from the rebalance process.

The commands affected:

1) Attach tier will trigger this process instead of the old one.
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show the tier daemon as a process instead
of a task, and normal tier status and tier detach status work.
4) tier stop is implemented.
5) detach tier is implemented separately, along with the new detach tier
status.
6) volume tier volname status will work using the changes.
7) volume set works.

This patch separates the tier translator from the legacy
DHT rebalance code. The RPCs from the CLI to glusterd are now
sent separately from the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.

The code for the validation and commit phases is the same
as the earlier tier validation code in DHT rebalance.

The “brickop” phase has been changed so that the status
command can use this framework.

The service management framework is now used.
DHT rebalance does not use this framework.

This service framework takes care of:

*) spawning the daemon, killing it, and other such processes.
*) volume set options, which are written to the volfile.
*) restart and reconfigure functions. Restart restarts
the daemon at two points:
        1) after gluster goes down and comes up.
        2) to stop detach tier.
*) reconfigure, which is used to make immediate volfile changes.
By doing this, we don’t restart the daemon.
It also has the code to rewrite the volfile for topological
changes (which comes into play during add and remove brick).

With this patch the log, pid, and volfile are separated
and put into respective directories.

Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham 
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System 
Tested-by: hari gowtham 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Dan Lambright 
Reviewed-by: Atin Mukherjee