glusterfs.git/tests/bugs/glusterd, branch release-3.10

glusterd: delete source brick only once in reset-brick commit force

2017-10-31T18:08:14+00:00

While stopping the brick which is to be reset and replaced delete_brick
flag was passed as true which resulted glusterd to free up to source
brick before the actual operation. This results commit force to fail
failing to find the source brickinfo.

> mainline patch : https://review.gluster.org/#/c/18581/

Change-Id: I1aa7508eff7cc9c9b5d6f5163f3bb92736d6df44
BUG: 1507880
Signed-off-by: Atin Mukherjee 
(cherry picked from commit 0fb8acaa6ff80c43e46deac0ce66b29ae0df0ca4)

glusterd: Gluster should keep PID file in correct location

2017-10-25T14:03:54+00:00

Currently Gluster keeps process pid information of all the daemons
and brick processes in Gluster configuration file directory
(ie., /var/lib/glusterd/*).

These pid files should be seperate from configuration files.
Deletion of the configuration file directory might result into serious problems.
Also, /var/run/gluster is the default placeholder directory for pid files.

So, with this fix Gluster will keep all process pid information of all
processes in /var/run/gluster/* directory.

> Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
> BUG: 1258561
> Signed-off-by: Gaurav Kumar Garg 
> Signed-off-by: Saravanakumar Arumugam 
> Reviewed-on: https://review.gluster.org/13580
> Tested-by: MOHIT AGRAWAL 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (Cherry pick from commit 220d406ad13d840e950eef001a2b36f87570058d)

BUG: 1491059
Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
Signed-off-by: Mohit Agrawal

libglusterfs : Fix crash in glusterd while peer probing

2017-06-20T04:56:08+00:00

glusterd crashes when port is being set explcitly to a
range which is outside greater than short data type range.
Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In above case glusterd crashes while parsing the port.

With this fix glusterd will be able to handle port range
between INT_MIN to INT_MAX

> Reviewed-on: https://review.gluster.org/17359
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
>  Reviewed-by: Samikshan Bairagya 
> Reviewed-by: Atin Mukherjee 
> Reviewed-by: Niels de Vos 
> Reviewed-by: Jeff Darcy 

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1459760
Signed-off-by: Gaurav Yadav 
Reviewed-on: https://review.gluster.org/17494
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Samikshan Bairagya 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra Talur

glusterd: Don't spawn new glusterfsds on node reboot with brick-mux

2017-05-30T13:36:40+00:00

With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there wasn't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibilty check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.

> Reviewed-on: https://review.gluster.org/17307
> Reviewed-by: Atin Mukherjee 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

(cherry picked from commit 13e7b3b354a252ad4065f7b2f0f805c40a3c5d18)

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1453087
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17352
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Raghavendra Talur

Fixes quota aux mount failure

2017-05-13T21:12:07+00:00

The aux mount is created on the first limit/remove_limit/list command
and it remains until volume is stopped / deleted / (quota is disabled)
, where we do a lazy unmount. If the process is uncleanly terminated,
then the mount entry remains and we get (Transport disconnected) error
on subsequent attempts to run quota list/limit-usage/remove commands.

Second issue, There is also a risk of inadvertent rm -rf on the
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing mount/unmount on each command is that we cannot
use same mount point for both list and limit commands.
The reason for this is that list command needs mount to be accessible
in cli after response from glusterd, So it could be unmounted by a
limit command if executed in parallel (had we used same mount point)
Hence we use separate mount points for list and limit commands.

> Reviewed-on: https://review.gluster.org/16938
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
> Reviewed-by: Manikandan Selvaganesh 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1449779
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/17241
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra Talur

glusterd: Make reset-brick work correctly if brick-mux is on

2017-05-12T19:59:02+00:00

Reset brick currently kills of the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.

> Signed-off-by: Samikshan Bairagya 
> Reviewed-on: https://review.gluster.org/17128
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 

(cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1449934
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17253
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

glusterd: socketfile & pidfile related fixes for brick multiplexing feature

2017-05-10T10:42:05+00:00

Problem: While brick-muliplexing is on after restarting glusterd, CLI is
         not showing pid of all brick processes in all volumes.

Solution: While brick-mux is on all local brick process communicated through one
          UNIX socket but as per current code (glusterd_brick_start) it is trying
          to communicate with separate UNIX socket for each volume which is populated
          based on brick-name and vol-name.Because of multiplexing design only one
          UNIX socket is opened so it is throwing poller error and not able to
          fetch correct status of brick process through cli process.
          To resolve the problem write a new function glusterd_set_socket_filepath_for_mux
          that will call by glusterd_brick_start to validate about the existence of socketpath.
          To avoid the continuous EPOLLERR erros in  logs update socket_connect code.

Test:     To reproduce the issue followed below steps
          1) Create two distributed volumes(dist1 and dist2)
          2) Set cluster.brick-multiplex is on
          3) kill glusterd
          4) run command gluster v status
          After apply the patch it shows correct pid for all volumes

> BUG: 1444596
> Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: https://review.gluster.org/17101
> Smoke: Gluster Build System 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

Change-Id: I1892c80b9ffa93974f20c92d421660bcf93c4cda
BUG: 1449002
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17210
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Prashanth Pai

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-03-27T13:58:29+00:00

Commit 086436a introduced generation number (cleanup_gen) to ensure that
rpc layer doesn't end up cleaning up the connection object if
application layer has already destroyed it. Bumping up cleanup_gen was
done only in rpc_clnt_connection_cleanup (). However the same is needed
in rpc_clnt_reconnect_cleanup () too as with out it if the object gets destroyed
through the reconnect event in the application layer, rpc layer will
still end up in trying to delete the object resulting into double free
and crash.

Peer probing an invalid host/IP was the basic test to catch this issue.

>Reviewed-on: https://review.gluster.org/16914
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Milind Changire 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716)

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1434399
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16936
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: ignore return code of glusterd_restart_bricks

2017-02-10T11:41:31+00:00

When GlusterD is restarted on a multi node cluster, while syncing the
global options from other GlusterD, it checks for quorum and based on
which it decides whether to stop/start a brick. However we handle the
return code of this function in which case if we don't want to start any
bricks the ret will be non zero and we will end up failing the import
which is incorrect.

Fix is just to ignore the ret code of glusterd_restart_bricks ()

>Reviewed-on: https://review.gluster.org/16574
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Samikshan Bairagya 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 55625293093d485623f3f3d98687cd1e2c594460)

Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553
BUG: 1420991
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16593
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Samikshan Bairagya 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: double-check brick liveness for remove-brick validation

2017-02-03T13:46:53+00:00

Same problem as https://review.gluster.org/#/c/16509/ in a different
place.  Tests detach bricks without glusterd's knowledge, so
glusterd's internal brick state is out of date and we have to re-check
(via the brick's pidfile) as well.

Backport of:
> BUG: 1385758
> Change-Id: I169538c1c62d72a685a49d57ef65fb6c3db6eab2
> Reviewed-on: https://review.gluster.org/16529

BUG: 1418091
Change-Id: Id0b597bc60807ed090f6ecdba549c5cf3d758f98
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16537
Reviewed-by: Atin Mukherjee 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System