glusterfs.git, branch v6.2

doc: Added release notes for 6.2

2019-05-23T13:24:14+00:00

Fixes: bz#1701203

Change-Id: Id105192610726e370fa977df2c29723201b94695
Signed-off-by: Hari Gowtham

geo-rep: Convert gfid conflict resolution logs into debug

2019-05-21T05:14:53+00:00

The gfid conflict resolution code path is not expected
to be hit in the common case. But some heavy rename
workloads (BUG: 1694820) hit it routinely, so logging
the entries to be fixed at INFO floods the log for
these particular workloads. Hence convert them to DEBUG.
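
The effect of the level change can be sketched with Python's standard
logging module. This is only an illustration: the logger name, the
message text and the ListHandler helper are hypothetical and are not
taken from the geo-rep sources.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_conflicts(entries, logger_level=logging.INFO):
    """Log one line per entry at DEBUG and return what was actually emitted."""
    logger = logging.getLogger("georep-sketch")  # hypothetical logger name
    logger.setLevel(logger_level)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        for entry in entries:
            # Logged at DEBUG, so it is dropped unless the level is DEBUG.
            logger.debug("fixing gfid conflict: %s", entry)
        return handler.messages
    finally:
        logger.removeHandler(handler)
```

With the logger at INFO nothing is emitted for these entries; the lines
only show up when the log level is lowered to DEBUG.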

Backport of:
 > Patch: https://review.gluster.org/22720
 > BUG: 1709653
 > Change-Id: I4d5e102b87be5fe5b54f78f329e588882d72b9d9
 > Signed-off-by: Kotresh HR 

fixes: bz#1712223
Change-Id: I4d5e102b87be5fe5b54f78f329e588882d72b9d9
Signed-off-by: Kotresh HR

geo-rep: Fix sync hang with tarssh

2019-05-21T05:14:53+00:00

Problem:
Geo-rep sync hangs when tarssh is used as the sync
engine under heavy workload.

Analysis and Root cause:
The tar process was found to be hung. Further debugging
showed that the stderr buffer of the tar process on master
was full, i.e., 64k. When the buffer was copied to a file
from /proc/pid/fd/2, the hang was resolved.

This can happen when files picked by the tar process
to sync no longer exist on master. If this count
grows to around 1k, the stderr buffer fills up.

Fix:
The tar process is executed using Popen with stderr as PIPE.
The final execution is something like below.

tar | ssh  root@slave tar --overwrite -xf - -C 

The code waited on the ssh process first using communicate() and then
on tar. Note that communicate() reads stdout and stderr. So when the
stderr of the tar process fills up, nothing reads it until the untar
via ssh has completed. But the untar cannot complete while tar is
blocked, which leads to a deadlock.
Hence we should wait on both processes in parallel, so that stderr is
read on both processes.
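
A minimal sketch of waiting on both ends of a pipeline in parallel is
given below. It is not the code from the patch: two small Python child
processes stand in for tar and for the untar via ssh, and the names
PRODUCER, CONSUMER, run_pipeline and drain are made up for this
illustration.

```python
import subprocess
import sys
import threading

# Stand-in for the local tar: floods stderr well past a 64k pipe buffer
# before writing its payload to stdout.
PRODUCER = "import sys; sys.stderr.write('x' * 200000); sys.stdout.write('payload')"
# Stand-in for the remote untar: consumes stdin to EOF, reports on stderr.
CONSUMER = "import sys; n = len(sys.stdin.read()); sys.stderr.write('got %d bytes' % n)"


def run_pipeline():
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                                stdin=producer.stdout, stderr=subprocess.PIPE)
    # Drop the parent's copy of the pipe so the consumer sees EOF
    # as soon as the producer exits.
    producer.stdout.close()

    results = {}

    def drain(name, proc):
        # Read stderr to EOF so the child never blocks on a full pipe,
        # then reap the child and record its exit status.
        err = proc.stderr.read()
        results[name] = (proc.wait(), err)

    threads = [threading.Thread(target=drain, args=(name, proc))
               for name, proc in (("producer", producer),
                                  ("consumer", consumer))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Draining the consumer first and the producer only afterwards would hang
here, because the producer blocks on its full stderr pipe before it ever
writes the payload the consumer is waiting for.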

Backport of:
 > Patch: https://review.gluster.org/22684/
 > Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
 > BUG: 1707728
 > Signed-off-by: Kotresh HR 

Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
fixes: bz#1709738
Signed-off-by: Kotresh HR

tests/geo-rep: Fix arequal checksum comparison

2019-05-21T05:14:53+00:00

The arequal checksum comparison always returned success,
even when the checksums did not match. Fixed the same.

Backport of:
> Patch: https://review.gluster.org/22682
> Change-Id: I5083da25c0954126e452d06311d2d376f8540555
> BUG: 1707742
> Signed-off-by: Kotresh HR 
(cherry picked from commit 288cffd1ab7180cccfcdea36d0c469b9fa52108f)

Change-Id: I5083da25c0954126e452d06311d2d376f8540555
fixes: bz#1712220
Signed-off-by: Kotresh HR

geo-rep: Fix sync-method config

2019-05-17T07:47:53+00:00

Problem:
When 'use_tarssh' is set to true, the command exits with a success
message, but the default 'rsync' is still used as the sync engine.
The new config 'sync-method' cannot be set from the cli.

Analysis and Fix:
The 'use_tarssh' config is deprecated in the new
config framework and 'sync-method' is the new
config to choose the sync method, i.e. tarssh or rsync.
This patch fixes the 'sync-method' config. The allowed
values are tarssh and rsync.
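
The restriction to the two allowed values could be expressed roughly as
below. This is a sketch only: ALLOWED_SYNC_METHODS and
validate_sync_method are hypothetical names and do not reflect how the
geo-rep config framework actually validates options.

```python
# The two values the commit message lists as allowed for 'sync-method'.
ALLOWED_SYNC_METHODS = ("rsync", "tarssh")


def validate_sync_method(value):
    """Return value unchanged if it is an allowed sync method, else raise."""
    if value not in ALLOWED_SYNC_METHODS:
        raise ValueError(
            "sync-method must be one of %s, got %r"
            % (", ".join(ALLOWED_SYNC_METHODS), value)
        )
    return value
```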

Backport of:
 > Patch: https://review.gluster.org/22683
 > Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
 > BUG: 1707686
 > Signed-off-by: Kotresh HR 

Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
fixes: bz#1709737
Signed-off-by: Kotresh HR

geo-rep: Fix rename with existing destination with same gfid

2019-05-17T07:47:53+00:00

Problem:
   Geo-rep fails to sync a rename properly if the destination exists.
The source is left behind on the slave, resulting in extra files
on the slave. Also, heavy rename workloads like logrotate caused
a lot of ESTALE errors.

Cause:
   Geo-rep fails to sync a rename when the destination exists and the
creation of the source file falls into the same batch of changelogs
being processed. This is because, after fixing the problematic gfids by
verifying them against master, the original entries were re-processed,
including the CREATE, which caused extra files on the slave and the
rename to fail.

Solution:
   Entries need to be removed from the retry list after fixing the
problematic gfids on the slave so that they are not re-created on the
slave.
   Also treat ESTALE as EEXIST so that the error is properly handled
by verifying the op on the master volume.
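
The two parts of the solution can be sketched as follows. The entry
layout (a dict with "gfid" and "op" keys) and both function names are
assumptions made for this illustration, not the structures used in the
geo-rep code.

```python
import errno


def classify_entry_error(err):
    """Map ESTALE onto EEXIST so both take the same verification path."""
    if err in (errno.EEXIST, errno.ESTALE):
        return errno.EEXIST
    return err


def prune_retry_list(retry_entries, fixed_gfids):
    """Drop entries whose gfid was already fixed on the slave.

    Keeping them would re-process operations such as CREATE and
    re-create files that the fix has just cleaned up.
    """
    return [entry for entry in retry_entries
            if entry["gfid"] not in fixed_gfids]
```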

Backport of:
 > Patch: https://review.gluster.org/22519/
 > Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
 > BUG: 1694820
 > Signed-off-by: Kotresh HR 
 > Signed-off-by: Sunny Kumar 

Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
fixes: bz#1709734
Signed-off-by: Kotresh HR

geo-rep: Fix entries and metadata counters in geo-rep status

2019-05-17T07:47:53+00:00

The entries counter was incremented twice and decremented only
once. Also, the entries count was being used in place of the
metadata count. This patch fixes both of them.

Backport of:
 > Patch: https://review.gluster.org/22603
 > BUG: 1512093
 > Change-Id: I5601a5fe8d25c9d65b72eb529171e7117ebbb67f
 > Signed-off-by: Kotresh HR 
  (cherry picked from commit e0a6941af6ed352911698012ada895d1296b549e)

fixes: bz#1709685
Change-Id: I5601a5fe8d25c9d65b72eb529171e7117ebbb67f
Signed-off-by: Kotresh HR

cluster/ec: Reopen shouldn't happen with O_TRUNC

2019-05-15T10:36:50+00:00

Problem:
Re-opening with O_TRUNC truncates the fragment even when that is not
needed, which results in extra heals.

Fix:
At the time of re-open don't use O_TRUNC.
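
The idea can be illustrated in Python with plain POSIX-style file flags.
This is a sketch only: the actual change is in the C code of cluster/ec,
and reopen_flags and demo are hypothetical helpers.

```python
import os
import tempfile


def reopen_flags(original_flags):
    """Flags for a re-open: the original open flags minus O_TRUNC."""
    return original_flags & ~os.O_TRUNC


def demo():
    """Re-open a file whose original open used O_TRUNC; return its size."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"fragment data")
        os.close(fd)
        original_flags = os.O_RDWR | os.O_TRUNC
        # Re-opening with original_flags would cut the file to 0 bytes.
        fd = os.open(path, reopen_flags(original_flags))
        os.close(fd)
        return os.path.getsize(path)
    finally:
        os.unlink(path)
```

Because O_TRUNC is masked out for the re-open, the data written before
the re-open is still there afterwards.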

fixes bz#1709660
Change-Id: Idc6408968efaad897b95a5a52481c66e843d3fb8
Signed-off-by: Pranith Kumar K

afr: thin-arbiter lock release fixes

2019-05-15T04:16:52+00:00

- pass fop state instead of afr local to
afr_ta_dom_lock_check_and_release()

- avoid afr_lock_release_synctask() being called simultaneously from
notify code path and transaction (post-op) code path due to races.

- Check if the post-op on TA is valid based on event_gen checks.

- Invalidate in-memory information when we get TA child down.

Note: This patch addresses some pending review comments of commit
053b1309dc8fbc05fcde5223e734da9f694cf5cc
(https://review.gluster.org/#/c/glusterfs/+/20095/)

fixes: bz#1709130
Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c
Signed-off-by: Ravishankar N 
(cherry picked from commit 9ab2747da78061882f6734df4b265bce11adaef1)

cluster/afr : TA: Return actual error code in case of failure

2019-05-13T05:36:51+00:00

In afr_ta_post_op_do, we were sending EIO for every failure.
However, the original error code should be sent.

Change-Id: I9fdc15dac00d758baf8e6f14db244f526481a63a
updates: bz#1709143
Signed-off-by: Ashish Pandey 
(cherry picked from commit 63159cdb5374f458d7d2bffec24d4720ffc96d6c)