<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster, branch v4.0dev</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/dht: Fixed crash in dht_rmdir_is_subvol_empty</title>
<updated>2017-07-20T12:45:59+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2017-07-19T16:14:55+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=fdc431063f33cf4f5572771742e5502565f2a3ca'/>
<id>fdc431063f33cf4f5572771742e5502565f2a3ca</id>
<content type='text'>
local-&gt;call_cnt was being accessed and updated inside the loop
that processed the entries and wound the calls. As a result,
local-&gt;call_cnt could drop to 0 before the processing was
complete, causing a crash when the next entry was processed.
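
A minimal sketch of the pattern behind the fix (hypothetical names;
the actual change is in dht_rmdir_is_subvol_empty): take the full
call count before winding anything, so the counter cannot reach zero
while entries are still being processed.

/* Sketch, not the actual DHT code: winding inside the loop while
 * bumping a shared in-flight counter is racy, because an early
 * callback can see the counter hit zero and destroy `local` while
 * the loop is still iterating. Counting up front avoids this. */

typedef struct entry { int needs_unlink; struct entry *next; } entry_t;
typedef struct { int call_cnt; } local_t;

extern void wind_unlink (entry_t *e);   /* hypothetical async call */

static void
wind_entries (local_t *local, entry_t *head)
{
        entry_t *e;
        int count = 0;

        /* Pass 1: take the full count before winding anything. */
        for (e = head; e; e = e-&gt;next)
                if (e-&gt;needs_unlink)
                        count++;
        local-&gt;call_cnt = count;

        /* Pass 2: wind the calls; call_cnt can no longer reach
         * zero before every entry has been processed. */
        for (e = head; e; e = e-&gt;next)
                if (e-&gt;needs_unlink)
                        wind_unlink (e);
}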

Change-Id: I930f61f1a1d1948f90d4e58e80b7d6680cf27f2f
BUG: 1472949
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17825
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: Name threads on creation</title>
<updated>2017-07-19T14:16:19+00:00</updated>
<author>
<name>Raghavendra Talur</name>
<email>rtalur@redhat.com</email>
</author>
<published>2017-07-18T06:06:19+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=33db9aff1deaa028f30516e49fdb1e8d6e31bb73'/>
<id>33db9aff1deaa028f30516e49fdb1e8d6e31bb73</id>
<content type='text'>
Set thread names on creation for easier
debugging; a naming sketch follows the listings below.

Output of top -H -p &lt;PID-OF-GLUSTERFSD&gt;
Before:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd

After:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustertimer
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustermemsweep
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc0
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc1
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll0
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteridxwrker
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteriotwr0
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrssign
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrswrker
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterclogecon
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd0
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd1
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd2
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixjan
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixfsy
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll1
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll2
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixhc
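
A minimal, self-contained sketch of the naming call (assumed shape;
the actual helper lives in libglusterfs). On Linux,
pthread_setname_np() caps names at 16 bytes including the NUL, which
is why the names above are truncated (e.g. glusteridxwrker).

#define _GNU_SOURCE
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;

static void *
worker (void *arg)
{
        (void) arg;
        return NULL;
}

int
main (void)
{
        pthread_t thr;
        char name[16];

        if (pthread_create (&amp;thr, NULL, worker, NULL) != 0)
                return 1;

        /* "gluster" prefix plus the thread's role. */
        snprintf (name, sizeof (name), "gluster%s", "timer");
        pthread_setname_np (thr, name);

        pthread_join (thr, NULL);
        return 0;
}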

Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>cluster/afr: GFID split-brain resolution with existing CLI</title>
<updated>2017-07-18T15:24:54+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2017-06-07T10:26:13+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=657d78dbad118e511e1fca8b1badb9f8ae7a6f60'/>
<id>657d78dbad118e511e1fca8b1badb9f8ae7a6f60</id>
<content type='text'>
Problem:
Currently there is no way for the admin to resolve gfid
split-brain from the CLI based on a policy such as choice of
brick, mtime, or size.

Fix:
With the existing CLI options based on size, mtime, and choice of
brick, we do a lookup on the parent of the specified file. As
part of the lookup, if we find a gfid mismatch, we resolve it
based on the policy and return. If the file is not in gfid
split-brain, we then check for data and metadata split-brain in
the getxattr code path, and resolve them if present.

This works only when the absolute path of the file is passed to
the CLI, not its gfid. Hence the source-brick policy without a
file path will also not resolve gfid split-brain, since it
operates on the gfids of the files. It can still resolve any
other type of split-brain, skipping the gfid mismatch resolution
with the usual error message.
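
For reference, the resolution policies correspond to the existing
heal CLI forms (per the gluster heal documentation; &lt;FILE&gt; is the
absolute path from the volume root):

gluster volume heal &lt;VOLNAME&gt; split-brain bigger-file &lt;FILE&gt;
gluster volume heal &lt;VOLNAME&gt; split-brain latest-mtime &lt;FILE&gt;
gluster volume heal &lt;VOLNAME&gt; split-brain source-brick &lt;HOSTNAME:BRICKNAME&gt; &lt;FILE&gt;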

The change https://review.gluster.org/17290 is reverted, as this
patch resolves the issue.

Fixes gluster/glusterfs#135

Change-Id: Iaeba6fc32f184a34255d03be87cda02773130a09
BUG: 1459530
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17485
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Non-disruptive upgrade on EC volume fails</title>
<updated>2017-07-14T00:26:04+00:00</updated>
<author>
<name>Sunil Kumar Acharya</name>
<email>sheggodu@redhat.com</email>
</author>
<published>2017-07-05T11:11:38+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=d2650feb4bfadf3fb0cdb90236bc78c33b5ea451'/>
<id>d2650feb4bfadf3fb0cdb90236bc78c33b5ea451</id>
<content type='text'>
Problem:
Enabling optimistic changelog on an EC volume did not
handle node-down scenarios appropriately, resulting in
volume data inaccessibility.

Solution:
Update the dirty xattr appropriately on the good bricks
whenever nodes are down. This fixes the metadata
information as part of heal and thus ensures data
accessibility.
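
A minimal sketch of the decision described above (hypothetical
helper, not the actual EC code):

/* With optimistic changelog, the dirty xattr may be skipped when
 * every brick takes part in the fop; once any brick is down, the
 * dirty xattr must be set on the good bricks so heal can later
 * reconcile the bricks that missed the update. */
#include &lt;stdbool.h&gt;

static bool
ec_must_set_dirty (int good_count, int brick_count)
{
        /* degraded fop: mark the good bricks dirty for heal */
        return good_count &lt; brick_count;
}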

BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya &lt;sheggodu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: mark non sources as sinks in metadata heal</title>
<updated>2017-07-13T17:22:58+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-07-06T14:19:47+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=77c1ed5fd299914e91ff034d78ef6e3600b9151c'/>
<id>77c1ed5fd299914e91ff034d78ef6e3600b9151c</id>
<content type='text'>
Problem:
In a 3-way replica, when the source brick does not have pending xattrs
for the sinks, but the 2 sinks blame each other, metadata heal was not
happening because we were not setting all non-sources as sinks.

Fix: Mark all non-sources as sinks, as is done in data and entry
heal.
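
A minimal sketch of the fix (assumed shape; the real logic lives in
AFR's metadata self-heal):

/* Every brick that was not picked as a source becomes a sink, so
 * two sinks that blame each other are both healed from the source. */
static void
mark_non_sources_as_sinks (const int *sources, int *sinks,
                           int child_count)
{
        int i;

        for (i = 0; i &lt; child_count; i++)
                sinks[i] = !sources[i];
}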

Change-Id: I534978940f5087302e307fcc810a48ffe898ce08
BUG: 1468279
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17717
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Get size of file in EC [f]xattrop</title>
<updated>2017-07-13T08:10:11+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-07-06T11:10:07+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=f367671d451ae0fc3e178d26cae1880d59eb6ebd'/>
<id>f367671d451ae0fc3e178d26cae1880d59eb6ebd</id>
<content type='text'>
Problem:
To allow parallel writes, we shouldn't depend on ia_size being the same
for all the bricks in each write_cbk(). But we do need to make sure the
backend size is correct on all the bricks and that no crashes/manual
modifications have happened.

Fix:
At the time of get_size_version() we do one check to make sure the size
of the file is the same across the bricks. From then on, each FOP
reports its own status, so we rely on that information to track which
bricks are good/bad.
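
A minimal sketch of the one-time check (hypothetical helper, not the
actual EC code):

/* Verify once that every participating brick reports the same
 * backend size; afterwards, per-fop status is enough to classify
 * bricks as good or bad. */
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

static bool
ec_sizes_consistent (const uint64_t *ia_size, int brick_count)
{
        int i;

        for (i = 1; i &lt; brick_count; i++)
                if (ia_size[i] != ia_size[0])
                        return false;
        return true;
}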

Updates #251
Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17741
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
</content>
</entry>
<entry>
<title>cluster/rebalance: Fix hardlink migration failures</title>
<updated>2017-07-13T05:38:40+00:00</updated>
<author>
<name>Susant Palai</name>
<email>spalai@redhat.com</email>
</author>
<published>2017-07-12T06:31:40+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=0d75e39834d4880dce0cb3c79bef4b70bb32874d'/>
<id>0d75e39834d4880dce0cb3c79bef4b70bb32874d</id>
<content type='text'>
A brief overview of how hardlink migration works:
  - Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be the same. Rebalance picks up the first
hardlink, calculates its hash (call it TARGET) and sets the hashed
subvolume as an xattr on the data file.
  - All the hardlinks that come after this will fetch that xattr and
create linkto files on TARGET (all linkto files for the hardlinks will
be hardlinks to each other on TARGET).
  - When the number of hardlinks on the source equals the number of
hardlinks on TARGET, the data migration happens.

RACE:1
  Since rebalance is multi-threaded, the first lookup (which decides what
the TARGET subvol should be) can be called by two hardlink migrations in
parallel, and they may end up creating linkto files on two different
TARGET subvols. Hence, hardlinks won't be migrated.

Fix: Rely on the xattr response of the lookup inside
gf_defrag_handle_hardlink, since it is executed under a synclock.

RACE:2
  The linkto files on TARGET can also be created by other clients doing
lookups on the hardlinks. Consider a scenario with 100 hardlinks. When
rebalance is migrating the 99th hardlink, continuous lookups from another
client can make the link count on TARGET equal to the source link count,
so rebalance migrates the data on the 99th hardlink itself. On the 100th
hardlink migration, the hardlink will have TARGET as its cached
subvolume. If its hash is also the same, a migration would be triggered
from TARGET to TARGET, leading to data loss.

Fix: Make sure that, before the final data migration, the source is not
the same as the destination (see the sketch below).
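
A minimal sketch of this guard (hypothetical names; the real check
sits in DHT's migration path):

/* Bail out before the final data migration if the cached (source)
 * subvol already equals TARGET, which would otherwise migrate the
 * file onto itself and lose data. */
typedef struct xlator xlator_t;

static int
dht_check_migration_sanity (xlator_t *src_subvol, xlator_t *dst_subvol)
{
        if (src_subvol == dst_subvol)
                return -1;   /* nothing to migrate; avoid data loss */
        return 0;
}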

RACE:3
  Since a hardlink can be migrating to a non-hashed subvolume, a lookup
from another client, or even rebalance itself, might delete the linkto
file on TARGET, leading to hardlinks never getting migrated.

This will be addressed in a separate patch in the future.

Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98
BUG: 1469964
Signed-off-by: Susant Palai &lt;spalai@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17755
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/dht: Clear clean_dst flag on target change</title>
<updated>2017-07-11T05:06:13+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2017-07-10T10:15:04+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=bd71ca4fdf2554dd22c0db70af132a11b966ef38'/>
<id>bd71ca4fdf2554dd22c0db70af132a11b966ef38</id>
<content type='text'>
If the target of a file migration was changed because
of min-free-disk limits, the dst_fd was closed but the
clean_dst flag was not set to false. If the file could
not be created on the new target for some reason, the
ftruncate call to clean up the dst was sent on the
now-invalid fd, causing the process to deadlock.
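
A minimal sketch of the fix (hypothetical wrapper; the actual change
is in DHT's file migration code):

/* When the migration target changes, the old dst fd is closed, so
 * clean_dst must be cleared with it; otherwise a later failure
 * triggers an ftruncate on a stale fd. */
#include &lt;stdbool.h&gt;
#include &lt;unistd.h&gt;

struct migration_state {
        int  dst_fd;      /* fd on the current target subvol   */
        bool clean_dst;   /* "truncate dst if migration fails" */
};

static void
dht_switch_target (struct migration_state *st)
{
        close (st-&gt;dst_fd);     /* old target fd is now invalid */
        st-&gt;dst_fd    = -1;
        st-&gt;clean_dst = false;  /* the fix: clear the flag too  */
}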

Change-Id: I5bfa80f519b04567413d84229cf62d143c6e2f04
BUG: 1469029
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17735
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/dht: Fix fd check race</title>
<updated>2017-07-11T05:02:52+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2017-07-10T04:08:54+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=f7a450c17fee7e43c544473366220887f0534ed7'/>
<id>f7a450c17fee7e43c544473366220887f0534ed7</id>
<content type='text'>
There is another race between the cached subvol
being updated in the inode_ctx and the fd being opened on
the target:

1. fop1 -&gt; fd1 -&gt; subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol
   changed to subvol1 in inode_ctx
3. fop2 -&gt; fd1 -&gt; subvol1 [takes new cached subvol]
4. fop2 -&gt; checks fd ctx (fd not open on subvol1) -&gt; opens fd1 on subvol1
5. fop1 -&gt; checks fd ctx (fd not open on subvol0)
   -&gt; tries to open fd1 on subvol0 -&gt; fails with "No such file or directory".

Fix:
If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind the fop to the
old subvol and let the phase1/phase2 checks handle it.
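
A minimal sketch of the fallback (hypothetical helper; the real
logic is around dht_fd_open_on_dst):

/* If opening the fd on the new cached subvol fails with ENOENT or
 * ESTALE, fall back to winding on the old subvol and let the
 * phase1/phase2 migration checks handle it. */
#include &lt;errno.h&gt;

static int
dht_pick_subvol_after_open (int open_errno, int *use_old_subvol)
{
        if (open_errno == ENOENT || open_errno == ESTALE) {
                *use_old_subvol = 1;  /* wind to the old subvol */
                return 0;
        }
        return -1;                    /* genuine failure: propagate */
}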

Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
BUG: 1465075
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17731
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/dht: Use size to calculate estimates</title>
<updated>2017-07-10T14:35:34+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2017-07-03T07:43:35+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9156a743aa76c955d18c9bfcb7c1a38ba00da890'/>
<id>9156a743aa76c955d18c9bfcb7c1a38ba00da890</id>
<content type='text'>
The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.

The new approach gets the total data size and uses
that information to estimate how long the rebalance
will take.
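
A minimal sketch of a size-based estimate (hypothetical names; the
actual calculation is in DHT's rebalance code):

/* Estimate the time left from observed byte throughput instead of
 * the file count, so a few huge files cannot skew the estimate. */
#include &lt;stdint.h&gt;

static uint64_t
estimate_seconds_left (uint64_t size_total, uint64_t size_processed,
                       uint64_t elapsed_secs)
{
        uint64_t rate;

        if (size_processed == 0 || elapsed_secs == 0 ||
            size_processed &gt;= size_total)
                return 0;   /* not enough data yet, or already done */

        rate = size_processed / elapsed_secs;   /* bytes per second */
        if (rate == 0)
                return 0;

        return (size_total - size_processed) / rate;
}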

Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: MOHIT AGRAWAL &lt;moagrawa@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
</entry>
</feed>
