glusterfs.git/tests, branch v6.3

cluster/ec: honor contention notifications for partially acquired locks

2019-06-03T04:08:06+00:00

EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but others have not yet), we can receive notifications
from the bricks that already granted it. These should be honored, since
we may not receive any more notifications after that.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into consideration the notifications received before
the full lock acquisition has completed. After that, the lock will
be released as soon as possible.
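The behaviour change can be modeled as a small state machine. This is a hypothetical Python sketch, not the EC translator's C code: the class EagerLock, its methods and the policy strings are illustrative names. It assumes, as part of the model, that only a brick which has already granted the lock can report contention on it.

```python
class EagerLock:
    """Hypothetical model of an EC eager lock spread over several bricks."""

    def __init__(self, bricks):
        self.bricks = set(bricks)   # bricks that must grant the lock
        self.granted = set()        # bricks that have granted it so far
        self.contended = False      # remembered contention notification

    def on_grant(self, brick):
        self.granted.add(brick)

    def on_contention(self, brick):
        # In this model only a brick that already granted the lock can
        # report contention on it. The fix: remember the notification even
        # while acquisition is still partial, since that brick may not
        # send another one.
        if brick in self.granted:
            self.contended = True

    def fully_acquired(self):
        return self.granted == self.bricks

    def release_policy(self):
        if not self.fully_acquired():
            return "acquiring"
        # Without the remembered notification the lock would be kept
        # until the eager-lock timeout expires.
        return "release-asap" if self.contended else "hold-until-timeout"
```

For example, if brick b0 grants the lock and immediately reports contention while b1 and b2 are still pending, the model still releases the lock as soon as the acquisition completes instead of waiting for the timeout.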

Backport of:
> BUG: bz#1708156
> Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
> Signed-off-by: Xavi Hernandez 

Fixes: bz#1714172
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez

geo-rep: Fix sync hang with tarssh

2019-05-21T05:14:53+00:00

Problem:
Geo-rep sync hangs when tarssh is used as the sync
engine under heavy workload.

Analysis and Root cause:
It was found that the tar process was hung.
Further debugging showed that the stderr pipe
buffer of the tar process on the master was full (64k).
When the buffer was read out into a file via /proc/pid/fd/2,
the hang was resolved.

This can happen when files picked by the tar process
for syncing no longer exist on the master. If the number
of such files reaches around 1k, the stderr buffer fills up.

Fix:
The tar process is executed using Popen with stderr as PIPE.
The final pipeline looks something like this:

tar | ssh  root@slave tar --overwrite -xf - -C 

The code waited on the ssh process first using communicate() and only
then on tar. Note that communicate() reads stdout and stderr of the
process it is called on. So when the stderr pipe of the tar process
fills up, nothing reads from it until the untar via ssh completes.
That never happens, because tar blocks writing to its full stderr and
stops feeding ssh, which leads to a deadlock.
Hence we should wait on both processes in parallel, so that stderr is
read from both of them.
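The "wait on both processes in parallel" idea can be sketched in Python. This is not the gsyncd code: wait_all() and run_pipeline() are hypothetical helpers, and the test uses two small Python child processes as stand-ins for tar and ssh. Each process gets its own thread calling communicate(), so every stderr pipe is drained while the other process is still running.

```python
import os
import subprocess
import sys
import threading


def wait_all(procs):
    """Call communicate() on every process concurrently, one thread each,
    so each stderr pipe is drained while the others are still running."""
    results = [None] * len(procs)

    def reap(index, proc):
        results[index] = proc.communicate()

    threads = [threading.Thread(target=reap, args=(i, p))
               for i, p in enumerate(procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [(p.returncode, out, err) for p, (out, err) in zip(procs, results)]


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer`, capturing stderr of both processes."""
    read_end, write_end = os.pipe()
    consumer = subprocess.Popen(consumer_cmd, stdin=read_end,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    producer = subprocess.Popen(producer_cmd, stdout=write_end,
                                stderr=subprocess.PIPE)
    # The children hold their own copies; closing ours lets the consumer
    # see EOF once the producer exits.
    os.close(read_end)
    os.close(write_end)
    return wait_all([producer, consumer])


if __name__ == "__main__":
    # Stand-ins for tar and ssh: the producer writes ~200 KB to stderr
    # (more than a 64k pipe buffer) before emitting its data on stdout.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.write('x' * 200000); sys.stderr.flush();"
             " sys.stdout.write('data')"]
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_pipeline(noisy, upper)[1][1])
```

Calling communicate() on the consumer first and on the producer afterwards would hang on this input, because the producer blocks on its full stderr and never closes the pipe the consumer is reading.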

Backport of:
 > Patch: https://review.gluster.org/22684/
 > Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
 > BUG: 1707728
 > Signed-off-by: Kotresh HR 

Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
fixes: bz#1709738
Signed-off-by: Kotresh HR

tests/geo-rep: Fix arequal checksum comparison

2019-05-21T05:14:53+00:00

The arequal checksum comparison always returned success,
even when the checksums did not match. This patch fixes that.
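The point of the fix is that the helper's status must come from the comparison itself. The geo-rep test helpers are shell scripts; the following is only a Python sketch of the principle, with compare_checksums() a hypothetical name.

```python
import difflib


def compare_checksums(master: str, slave: str) -> int:
    """Return 0 if two arequal-checksum outputs match, 1 otherwise."""
    diff = list(difflib.unified_diff(master.splitlines(), slave.splitlines(),
                                     "master", "slave", lineterm=""))
    for line in diff:
        print(line)
    # Derive the status from the diff; returning 0 unconditionally is
    # exactly what makes a checker report success on a mismatch.
    return 1 if diff else 0
```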

Backport of:
> Patch: https://review.gluster.org/22682
> Change-Id: I5083da25c0954126e452d06311d2d376f8540555
> BUG: 1707742
> Signed-off-by: Kotresh HR 
(cherry picked from commit 288cffd1ab7180cccfcdea36d0c469b9fa52108f)

Change-Id: I5083da25c0954126e452d06311d2d376f8540555
fixes: bz#1712220
Signed-off-by: Kotresh HR

geo-rep: Fix sync-method config

2019-05-17T07:47:53+00:00

Problem:
When 'use_tarssh' is set to true, the command exits with a success
message, but the default 'rsync' is still used as the sync engine.
The new config 'sync-method' cannot be set from the CLI.

Analysis and Fix:
The 'use_tarssh' config is deprecated in the new
config framework, and 'sync-method' is the new
config used to choose the sync engine, i.e. tarssh or rsync.
This patch fixes the 'sync-method' config so that it can be
set. The allowed values are tarssh and rsync.
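Restricting a config option to a fixed set of values can be sketched as follows. This is an illustrative Python sketch, not gsyncd's config framework: set_sync_method(), sync_engine() and the plain dict used as the config store are hypothetical. The allowed values and the rsync default come from the text above.

```python
ALLOWED_SYNC_METHODS = ("rsync", "tarssh")


def set_sync_method(config: dict, value: str) -> None:
    """Store 'sync-method' only if it is one of the allowed values."""
    if value not in ALLOWED_SYNC_METHODS:
        raise ValueError(f"invalid sync-method {value!r}; allowed values: "
                         f"{', '.join(ALLOWED_SYNC_METHODS)}")
    config["sync-method"] = value


def sync_engine(config: dict) -> str:
    """Engine actually used for syncing; rsync is the default."""
    return config.get("sync-method", "rsync")
```

Rejecting unknown values up front avoids the failure mode described above, where the CLI reports success but the default engine keeps being used.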

Backport of:
 > Patch: https://review.gluster.org/22683
 > Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
 > BUG: 1707686
 > Signed-off-by: Kotresh HR 

Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
fixes: bz#1709737
Signed-off-by: Kotresh HR

geo-rep: Fix rename with existing destination with same gfid

2019-05-17T07:47:53+00:00

Problem:
   Geo-rep fails to sync a rename properly if the destination exists.
As a result, the source remains on the slave, leaving more files on
the slave than on the master. Also, heavy rename workloads such as
logrotate caused a lot of ESTALE errors.

Cause:
   Geo-rep fails to sync a rename with an existing destination if the
creation of the source file also falls into the same batch of changelogs
being processed. This is because, after the problematic gfids were fixed
by verifying them against the master, the CREATE was also re-processed
along with the original entries, creating extra files on the slave and
causing the rename to fail.

Solution:
   Entries need to be removed from the retry list after fixing the
problematic gfids on the slave, so that they are not re-created on the slave.
   Also, treat ESTALE as EEXIST so that the error is handled properly by
verifying the op on the master volume.
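Both parts of the solution can be sketched in a few lines. This is a hypothetical Python sketch, not the geo-rep syncdaemon code: Entry, classify_error() and entries_to_retry() are illustrative names, and an entry is simplified to an op name plus a gfid.

```python
import errno
from dataclasses import dataclass


@dataclass
class Entry:
    op: str    # changelog operation, e.g. "CREATE" or "RENAME"
    gfid: str


def classify_error(err: int) -> int:
    """Treat ESTALE like EEXIST so the entry is verified against the master."""
    return errno.EEXIST if err == errno.ESTALE else err


def entries_to_retry(failed, fixed_gfids):
    """Failed entries that still need replaying on the slave.

    Entries whose gfid was already fixed are dropped, so that, for example,
    a CREATE is not replayed and does not leave an extra file behind."""
    return [e for e in failed if e.gfid not in fixed_gfids]
```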

Backport of:
 > Patch: https://review.gluster.org/22519/
 > Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
 > BUG: 1694820
 > Signed-off-by: Kotresh HR 
 > Signed-off-by: Sunny Kumar 

Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
fixes: bz#1709734
Signed-off-by: Kotresh HR

cluster/dht: refactor dht lookup functions

2019-05-08T14:00:05+00:00

Part 1: refactor the dht_lookup_dir_cbk
and dht_selfheal_directory functions.
Added a simple dht selfheal directory test.

Change-Id: I1410c26359e3c14b396adbe751937a52bd2fcff9
updates: bz#1707393
Signed-off-by: N Balachandran

glusterd: define dumpops in the xlator_api of glusterd

2019-05-08T13:57:24+00:00

Problem: statedump is not capturing information related to glusterd

Solution: statedump is not capturing glusterd info because
trav->dumpops is null in gf_proc_dump_single_xlator_info (),
where trav is the glusterd xlator object. trav->dumpops is null
because dumpops was not defined in the xlator_api of glusterd.
Defining dumpops in the xlator_api of glusterd fixes the issue.

fixes: bz#1703759
Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c
Signed-off-by: Sanju Rakonde 
(cherry picked from commit 5d866c13efdcdeddf184f012aa88a652e90ff22e)

extras/hooks: syntactical errors in SELinux hooks, script logic improved

2019-05-08T13:56:20+00:00

Fixes: bz#1701818
Change-Id: Ia5fa1df81bbaec3a84653d136a331c76b457f42c
Signed-off-by: Milan Zink 
(cherry picked from commit 1ad201a9fd6748d7ef49fb073fcfe8c6858d557d)

cluster/ec: fix fd reopen

2019-05-08T13:54:59+00:00

Currently EC tries to reopen fds that were opened while a brick
was down. This is done as part of regular write operations, just after
having acquired the locks, and the reopen is sent as a sub-fop of the
main write fop.

There were two problems:

1. The reopen was attempted on all UP bricks, even on those where a
previous lock didn't succeed. This is incorrect because the open will
most probably fail there.

2. If the reopen is sent and fails, the error is propagated to the main
operation, causing it to fail when it shouldn't.

To fix this, we only attempt reopens on bricks where the current fop
owns a lock, and we prevent any error from being propagated to the main
fop.

To implement this behaviour, an argument used to indicate the minimum
number of required answers has been overloaded to also carry some flags.
To keep the change consistent, it was necessary to rename the argument,
which means that a lot of files have been changed. However, the rename
itself introduces no functional changes.
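The two mechanisms described above, overloading a "minimum answers" argument with flags and restricting the reopen to bricks where the fop owns a lock, can be modeled in Python. This is a hypothetical sketch, not the EC translator's C code: the bit layout (low 16 bits for the minimum, higher bits for flags), the flag names and run_subfop() are illustrative assumptions. Brick sets are bitmasks, as is common in EC.

```python
import errno

# Hypothetical packing: the low 16 bits keep the old "minimum number of
# required answers" meaning; higher bits carry flags.
MINIMUM_MASK = 0xFFFF
FLAG_LOCKED_ONLY = 1 << 16     # only target bricks where the fop owns a lock
FLAG_IGNORE_ERRORS = 1 << 17   # never propagate failures to the main fop


def select_bricks(arg, up_mask, locked_mask):
    """Bitmask of the bricks the sub-fop should be sent to."""
    if arg & FLAG_LOCKED_ONLY:
        return up_mask & locked_mask
    return up_mask


def run_subfop(arg, up_mask, locked_mask, send):
    """Send the sub-fop to each selected brick; send(idx) returns 0 or an errno.

    Returns (bricks_contacted, error_for_main_fop)."""
    targets = select_bricks(arg, up_mask, locked_mask)
    contacted, successes, first_error = [], 0, 0
    for idx in range(targets.bit_length()):
        if targets & (1 << idx):
            contacted.append(idx)
            err = send(idx)
            if err == 0:
                successes += 1
            elif first_error == 0:
                first_error = err
    if arg & FLAG_IGNORE_ERRORS:
        return contacted, 0
    if successes < (arg & MINIMUM_MASK):
        return contacted, first_error or errno.EIO
    return contacted, 0
```

With bricks 0-2 up but the lock held only on bricks 0 and 1, a reopen sent with both flags contacts just bricks 0 and 1, and a failure on one of them is not reported to the main fop.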

This change has also uncovered a problem in the discard code, which
didn't correctly process small requests: no real discard fop was being
processed, only a write of 0's on some region. In this case some fields
of the fop remained uninitialized or had incorrect values. To fix this,
a new function has been created to simulate success on a fop, and it is
used in the discard case.

Thanks to Pranith for providing a test script that has also detected an
issue in this patch. This patch includes a small modification of this
script to force data to be written into bricks before stopping them.

Backport of:
> Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
> BUG: bz#1699866
> Signed-off-by: Xavi Hernandez 

Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
Fixes: bz#1699917
Signed-off-by: Xavi Hernandez

geo-rep: fix integer config validation

2019-04-17T13:58:52+00:00

ssh-port validation is specified as `validation=int` in the
`gsyncd.conf` template, but this validation was not applied during
geo-rep config set.
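Applying a template's `validation=int` during config set can be sketched as follows. This is a hypothetical Python sketch, not gsyncd's code: the dict-based template, VALIDATORS and config_set() are illustrative. Only the `validation=int` setting for ssh-port comes from the text above.

```python
def validate_int(value: str) -> int:
    try:
        return int(value)
    except ValueError:
        raise ValueError(f"{value!r} is not a valid integer") from None


# Map each validation name used in the template to a checker.
VALIDATORS = {"int": validate_int}


def config_set(template: dict, config: dict, name: str, value: str) -> None:
    """Set `name` after running the validation declared in the template."""
    validation = template.get(name, {}).get("validation")
    if validation is not None:
        if validation not in VALIDATORS:
            # Fail loudly instead of silently skipping an unknown validation.
            raise ValueError(f"no validator for {validation!r}")
        value = VALIDATORS[validation](value)
    config[name] = value
```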

Backport of https://review.gluster.org/22418

Fixes: bz#1695445
Change-Id: I3f19d9b471b0a3327e4d094dfbefcc58ed2c34f6
Signed-off-by: Aravinda VK 
(cherry picked from commit c574984e19d59e351372eacce0ce11fb36e96dd4)