glusterfs.git, branch v3.8.11

doc : release-notes for GlusterFS-3.8.11

2017-04-10T15:50:45+00:00

BUG: 1431410
Change-Id: Iaf1d9603221bc0c70ad1695f5aa0afc2d651d737
Signed-off-by: Jiffin Tony Thottan 
Reviewed-on: https://review.gluster.org/17028
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System

features/shard: Fix vm corruption upon fix-layout

2017-04-10T15:45:51+00:00

        Backport of: https://review.gluster.org/17010

shard's writev implementation, as part of identifying
presence of participant shards that aren't in memory,
first sends an MKNOD on these shards, and upon EEXIST error,
looks up the shards before proceeding with the writes.

The VM corruption was caused when the following happened:
1. DHT had n subvolumes initially.
2. Upon add-brick + fix-layout, the layout of .shard changed
   although the existing shards under it were yet to be migrated
   to their new hashed subvolumes.
3. During this time, there were writes on the VM falling in regions
   of the file whose corresponding shards were already existing under
   .shard.
4. Sharding xl sent MKNOD on these shards, now creating them in their
   new hashed subvolumes although there already exist shard blocks for
   this region with valid data.
5. All subsequent writes were wound on these newly created copies.

The net outcome is that neither copy of the shard had the correct
data. This left the affected VMs unbootable.

FIX:
For want of better alternatives in DHT, the fix changes shard fops to do
a LOOKUP before the MKNOD and upon EEXIST error, perform another lookup.
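
The sequence above can be reproduced with a toy model. This is only a
sketch, not the shard translator code: the Dht class, write_old(),
write_fixed() and scenario() are hypothetical names, DHT is reduced to one
dict per subvolume, and LOOKUP is assumed to search all subvolumes while
MKNOD only consults the hashed one.

```python
class Dht:
    """Toy DHT: one dict per subvolume plus the currently hashed index."""

    def __init__(self, nsubvols):
        self.subvols = [dict() for _ in range(nsubvols)]
        self.hashed = 0  # subvolume the current layout hashes the shard to

    def lookup(self, name):
        # Assumption: LOOKUP finds the file on any subvolume, so it still
        # sees shards that have not been migrated yet.
        for idx, sv in enumerate(self.subvols):
            if name in sv:
                return idx
        return None

    def mknod(self, name):
        # Assumption: MKNOD only checks the hashed subvolume.
        sv = self.subvols[self.hashed]
        if name in sv:
            raise FileExistsError(name)
        sv[name] = b""
        return self.hashed


def write_old(dht, name, data):
    """Pre-fix order: MKNOD first, LOOKUP only on EEXIST."""
    try:
        idx = dht.mknod(name)
    except FileExistsError:
        idx = dht.lookup(name)
    dht.subvols[idx][name] = data
    return idx


def write_fixed(dht, name, data):
    """Post-fix order: LOOKUP first, MKNOD only if the shard is absent."""
    idx = dht.lookup(name)
    if idx is None:
        try:
            idx = dht.mknod(name)
        except FileExistsError:
            idx = dht.lookup(name)  # lost a creation race: look up again
    dht.subvols[idx][name] = data
    return idx


def scenario(write):
    dht = Dht(nsubvols=2)
    dht.subvols[0]["shard.1"] = b"valid"  # shard created before add-brick
    dht.hashed = 1                        # fix-layout changed the layout
    write(dht, "shard.1", b"new")
    return [sv.get("shard.1") for sv in dht.subvols]


print(scenario(write_old))    # [b'valid', b'new']: two diverging copies
print(scenario(write_fixed))  # [b'new', None]: the existing shard is updated
```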

Change-Id: I1a5d3515b42e2e5583c407d1b4aff44d7ce472eb
BUG: 1440635
RCA'd-by: Raghavendra Gowdappa 
Reported-by: Mahdi Adnan 
Signed-off-by: Krutika Dhananjay 
Reviewed-on: https://review.gluster.org/17019
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: jiffin tony Thottan

features/shard: Initialize local->fop in readv

2017-04-10T11:27:44+00:00

        Backport of: https://review.gluster.org/17014

Change-Id: I4d2f0a3f533009038d48579db5a8a2a048b77ca1
BUG: 1440635
Signed-off-by: Krutika Dhananjay 
Reviewed-on: https://review.gluster.org/17020
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

features/worm: Adding implementation for ftruncate

2017-04-07T12:09:30+00:00

Problem:
The worm feature did not handle the ftruncate fop. As a result, a
truncate followed by a write on a worm-retained/worm file returned
the EROFS error but still truncated the file, which is not correct.
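
The behaviour can be illustrated with a small model. It is a sketch under
assumptions, not the worm translator: Backend, Worm and
truncate_and_write() are hypothetical names, and the handle_ftruncate flag
simply switches between the behaviour before and after this patch.

```python
import errno


class Backend:
    """Stands in for the translators below worm; holds the file contents."""

    def __init__(self, data):
        self.data = bytearray(data)

    def ftruncate(self, size):
        del self.data[size:]
        return 0


class Worm:
    """Toy worm layer for a file that is already write-protected."""

    def __init__(self, child, handle_ftruncate):
        self.child = child
        self.handle_ftruncate = handle_ftruncate

    def writev(self, offset, buf):
        # Writes were already rejected before the patch.
        return -errno.EROFS

    def ftruncate(self, size):
        if self.handle_ftruncate:
            return -errno.EROFS            # with the patch: rejected too
        return self.child.ftruncate(size)  # before: passed straight down


def truncate_and_write(worm):
    worm.ftruncate(0)
    return worm.writev(0, b"new")


for handled in (False, True):
    backend = Backend(b"retained")
    ret = truncate_and_write(Worm(backend, handle_ftruncate=handled))
    print(handled, ret == -errno.EROFS, bytes(backend.data))
# False True b''          -> EROFS reported, but the data is gone
# True True b'retained'   -> EROFS reported and the data is intact
```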

> Change-Id: I1a7e904655210d78bce9e01652ac56f3783b5aed
> BUG: 1438810
> Signed-off-by: karthik-us 
> Reviewed-on: https://review.gluster.org/16995
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Niels de Vos 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
> Smoke: Gluster Build System 
> Reviewed-by: Amar Tumballi 
> Reviewed-by: Raghavendra Talur 
(cherry picked from commit c5a4a77848024d2adf8cd4f35d550ba90c174fc7)

Change-Id: Ic5e904b5bb3d76954a143f92fbfd8959fec884b8
BUG: 1439112
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/17000
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Niels de Vos

cluster/dht: Modify local->loc.gfid in thread safe manner

2017-04-07T12:08:00+00:00

	Backport of https://review.gluster.org/16986

Problem:
local->loc.gfid in dht_lookup_directory() will be null-gfid for a fresh lookup.
dht_lookup_dir_cbk() updates local->loc.gfid while, in another thread,
dht_lookup_directory() is still winding lookup calls to subvolumes, so there is
a chance of a partial gfid being seen by EC.

On a 12x(4+2) volume we saw EC receive a loc whose gfid has its last 10 bytes
matching the gfid of the directory and its first 6 bytes all zeros. This makes
EC fail the lookup with EINVAL, which in turn makes NFS fail the lookup with EIO.

snip from gdb:
$37 = (dht_local_t *) 0x7fde5de5b3cc
(gdb) p /x $37->loc.gfid
$39 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5,
0x6c, 0x2c, 0xb8, 0x56}
(gdb) fr 7
state=) at ec-generic.c:837
837	                ec_lookup_rebuild(fop->xl->private, fop, cbk);
(gdb) p /x fop->loc[0].gfid
$40 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c,
0x2c, 0xb8, 0x56}
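
The two gfid values printed by gdb above can be compared byte by byte. The
snippet below is not part of the patch; it only uses the values from the
gdb output, and leading_zero_bytes() and matching_suffix() are helper names
made up for this illustration.

```python
import uuid

# Values copied from the gdb output: dht's local->loc.gfid and the gfid EC saw.
full = bytes([0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14,
              0xa0, 0xc6, 0x08, 0xf5, 0x6c, 0x2c, 0xb8, 0x56])
seen = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x43, 0x14,
              0xa0, 0xc6, 0x08, 0xf5, 0x6c, 0x2c, 0xb8, 0x56])


def leading_zero_bytes(buf):
    """Number of zero bytes at the start of buf."""
    count = 0
    for byte in buf:
        if byte != 0:
            break
        count += 1
    return count


def matching_suffix(a, b):
    """Number of trailing bytes that are identical in a and b."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


print(uuid.UUID(bytes=full))     # 3b82105e-4065-4314-a0c6-08f56c2cb856
print(uuid.UUID(bytes=seen))     # 00000000-0000-4314-a0c6-08f56c2cb856
print(leading_zero_bytes(seen))  # 6
print(matching_suffix(full, seen))  # 10
```

So the gfid seen by EC is a partially copied one: 6 zero bytes followed by
the last 10 bytes of the real gfid.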

snip from log:
[2017-01-29 03:22:30.132328] W [MSGID: 122019]
[ec-helpers.c:354:ec_loc_gfid_check] 0-butcher-disperse-4: Mismatching GFID's
in loc
[2017-01-29 03:22:30.132709] W [MSGID: 112199]
[nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3:
/linux-4.9.5/Documentation => (XID: b27b9474, MKDIR: NFS: 5(I/O error), POSIX:
5(Input/output error)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000 [Invalid argument]

Fix:
Update local->loc.gfid in the last call to make sure there are no races.
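
The "update only in the last call" idea can be sketched as follows. This is
a simplified model and not the dht code: Local, lookup_dir_cbk() and run()
are hypothetical names, and the shared field that in-flight winds read is
modelled by Local.gfid.

```python
import threading


class Local:
    def __init__(self, call_cnt):
        self.lock = threading.Lock()
        self.call_cnt = call_cnt   # replies still outstanding
        self.gfid = bytes(16)      # shared field, null-gfid for a fresh lookup
        self.reply_gfid = None     # private copy taken from the replies


def lookup_dir_cbk(local, reply_gfid):
    """Callback for one subvolume reply; returns True for the last call."""
    with local.lock:
        if local.reply_gfid is None:
            local.reply_gfid = reply_gfid
        local.call_cnt -= 1
        is_last = local.call_cnt == 0
    if is_last:
        # Every reply has arrived, so no other callback touches the
        # shared field any more. Publish the gfid exactly once.
        local.gfid = local.reply_gfid
    return is_last


def run(nsubvols=6):
    local = Local(nsubvols)
    gfid = bytes(range(1, 17))
    results = []
    threads = [
        threading.Thread(
            target=lambda: results.append(lookup_dir_cbk(local, gfid)))
        for _ in range(nsubvols)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local, results


local, results = run()
print(results.count(True), local.gfid == bytes(range(1, 17)))  # 1 True
```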

 >BUG: 1438411
 >Change-Id: Ifcb7e911568c1f1f83123da6ff0cf742b91800a0
 >Signed-off-by: Pranith Kumar K 

BUG: 1438424
Change-Id: If039956205cfac5e798c2c90e92a9a47b404e804
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/16988
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-04-07T12:05:09+00:00

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writing to its own files)
and a client meanwhile does a readdirp, that client starts to receive
upcalls. In this scenario the client disconnects with an "rpc decode failed"
error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers (rpchdr,
proghdr) can be freed, which is not the case. Especially under high workload,
rpc_transport_submit_request() may not be able to write to the socket
immediately; it then adds the msg to its own queue and returns success. Thus
we have a use after free for rpchdr and proghdr. The clients get a garbage
rpchdr and proghdr and fail to decode the rpc, resulting in a disconnect.

To prevent this, we need to add the rpchdr and proghdr to an iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket
and has to add the msg to its queue. Thus we do not have a use after free.
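
The effect of the extra reference can be shown with a toy reference-counting
model. It is a sketch only: the Iobuf, Iobref and Transport classes and
callback_submit() are hypothetical Python stand-ins, both headers are
allocated in one function for brevity, and a freed buffer is modelled by
setting its payload to None.

```python
class Iobuf:
    def __init__(self, payload):
        self.payload = payload
        self.refs = 1

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.payload = None  # buffer goes back to the pool


class Iobref:
    """Holds one reference on every iobuf added to it."""

    def __init__(self):
        self.bufs = []
        self.refs = 1

    def add(self, iobuf):
        self.bufs.append(iobuf.ref())

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            for buf in self.bufs:
                buf.unref()


class Transport:
    """Socket that is always busy: submit() queues and reports success."""

    def __init__(self):
        self.queue = []

    def submit(self, rpchdr, proghdr, iobref=None):
        if iobref is not None:
            iobref.ref()  # keep the buffers alive while queued
        self.queue.append((rpchdr, proghdr, iobref))
        return 0

    def flush(self):
        sent = []
        for rpchdr, proghdr, iobref in self.queue:
            sent.append((rpchdr.payload, proghdr.payload))
            if iobref is not None:
                iobref.unref()
        self.queue.clear()
        return sent


def callback_submit(transport, use_iobref):
    rpchdr, proghdr = Iobuf(b"rpchdr"), Iobuf(b"proghdr")
    iobref = None
    if use_iobref:
        iobref = Iobref()
        iobref.add(rpchdr)
        iobref.add(proghdr)
    transport.submit(rpchdr, proghdr, iobref)
    rpchdr.unref()   # the caller drops its references right after submit
    proghdr.unref()
    if iobref is not None:
        iobref.unref()


for use_iobref in (False, True):
    transport = Transport()
    callback_submit(transport, use_iobref)
    print(use_iobref, transport.flush())
# False [(None, None)]               -> headers were freed before the write
# True [(b'rpchdr', b'proghdr')]     -> headers survive until the write
```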

Thank You for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai 
> Smoke: Gluster Build System 
> Reviewed-by: soumya k 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422788
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16638
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

cluster/ec: Add/Modify description for eager-lock option

2017-04-07T11:59:33+00:00

This patch provides a description for the disperse.eager-lock
option for disperse volumes.

It also modifies the description of the cluster.eager-lock
option to indicate that this option is only for replica
volumes.

>Change-Id: Ie73298947fcaaa6aaf825978bc2d27ceaff386d2
>BUG: 1327171
>Signed-off-by: Ashish Pandey 
>Reviewed-on: http://review.gluster.org/13999
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Ravishankar N 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 

BUG: 1435645
Change-Id: I48b091e002b5c3308d6fbf2feb024a7f2fe08969
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/16943
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez

glusterd: support filesystems with dynamic inode sizes

2017-04-07T11:57:22+00:00

btrfs and zfs are two filesystems that do not have fixed sizes for
inodes. Instead of logging an error, skip checking and mark the size as
"N/A" like other properties that cannot be reported.

The error message that was reported by users on the mailing list looks
like:

  [glusterd-utils.c:5458:glusterd_add_inode_size_to_dict] 0-management: could not find (null) to getinode size for /dev/vdb (btrfs): (null) package missing?
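
The intended behaviour can be sketched like this. It is a hedged
illustration, not the glusterd code: inode_size_field(), the tool table and
the probe callback are all hypothetical, and the mapping of filesystems to
tools is an assumption made only for the example.

```python
# Assumption for illustration: tool used to query the inode size of
# filesystems that have a fixed inode size.
FIXED_INODE_TOOLS = {
    "xfs": "xfs_info",
    "ext3": "tune2fs",
    "ext4": "tune2fs",
}

# Filesystems named in this commit as having no fixed inode size.
DYNAMIC_INODE_FS = {"btrfs", "zfs"}


def inode_size_field(fs_name, probe):
    """Return the value to report as inode size for a brick.

    probe is a callable taking the tool name and returning the size as a
    string, or None when the size could not be determined.
    """
    if fs_name in DYNAMIC_INODE_FS:
        return "N/A"  # nothing to check, and nothing to log as an error
    tool = FIXED_INODE_TOOLS.get(fs_name)
    if tool is None:
        return "N/A"
    size = probe(tool)
    return size if size is not None else "N/A"


def fake_probe(tool):
    # Stand-in for running the tool against the brick device.
    return {"xfs_info": "512", "tune2fs": "256"}.get(tool)


for fs in ("xfs", "ext4", "btrfs", "zfs"):
    print(fs, inode_size_field(fs, fake_probe))
# xfs 512
# ext4 256
# btrfs N/A
# zfs N/A
```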

Cherry picked from commit 12921693b572f642156d3167d1c92d3449dfc8ec:
> Change-Id: Ib10b7a3669f2f4221075715d9fd44ce1ffc35324
> Reported-by: Arman Khalatyan 
> URL: http://lists.gluster.org/pipermail/gluster-users/2017-March/030189.html
> BUG: 1433425
> Signed-off-by: Niels de Vos 
> Reviewed-on: https://review.gluster.org/16867
> Smoke: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

Change-Id: Ib10b7a3669f2f4221075715d9fd44ce1ffc35324
Reported-by: Arman Khalatyan 
BUG: 1436412
Signed-off-by: Niels de Vos 
Reviewed-on: https://review.gluster.org/16960
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Zhou Zhengping

cluster/afr: Undo pending xattrs only on the up bricks

2017-04-07T11:56:55+00:00

Problem:
While doing conservative merge, the pending xattrs are reset even for a
brick that is down. When that brick comes up, the heal will consider
it as the source and remove the entries on the other bricks, which
leads to data loss.

Fix:
Undo pending only for the bricks which are up.
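
A toy model of the problem and the fix is given below. It is an
interpretation under assumptions, not the afr code: pending[i][j] > 0 is
taken to mean "brick i blames brick j", a brick counts as a source when no
other brick blames it, and undo_pending(), sources() and scenario() are
hypothetical names.

```python
def undo_pending(pending, up, only_up_bricks):
    """Clear pending markers after a conservative merge."""
    for i in pending:
        if not up[i]:
            continue  # xattrs stored on a down brick cannot be changed
        for j in pending[i]:
            if only_up_bricks and not up[j]:
                continue  # fix: keep the blame against bricks not healed
            pending[i][j] = 0


def sources(pending):
    """Bricks that no other brick blames."""
    return [b for b in pending
            if all(pending[o].get(b, 0) == 0 for o in pending if o != b)]


def scenario(only_up_bricks):
    # b0 and b1 blame each other and the down brick b2; b2 still carries
    # old markers against b0 and b1 on its own disk.
    pending = {
        "b0": {"b1": 1, "b2": 1},
        "b1": {"b0": 1, "b2": 1},
        "b2": {"b0": 1, "b1": 1},
    }
    up = {"b0": True, "b1": True, "b2": False}
    undo_pending(pending, up, only_up_bricks)
    # b2 comes back up: who would the next heal pick as source?
    return sources(pending)


print(scenario(only_up_bricks=False))  # ['b2']: the stale brick wins
print(scenario(only_up_bricks=True))   # []: no single source, merge again
```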

> Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303
> BUG: 1433571
> Signed-off-by: karthik-us 
> Reviewed-on: https://review.gluster.org/16913
> Reviewed-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
(cherry picked from commit f91596e6566c605e70a31a60523d11f78a097c3c)

Change-Id: Id20c9ce53ee59f005d977494903247e2a8024ed1
BUG: 1436231
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/16956
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Metadata healing fails to update the version

2017-04-07T11:55:32+00:00

During metadata heal, the version was not updated when all the
inode attributes were already in sync.

Updated the code to adjust version when all the inode
attributes are in sync.
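
As a rough illustration of the change, consider the model below. It is an
assumption-laden sketch and not the ec heal code: bricks are plain dicts
with "attrs" and "version" keys, the brick with the highest version is
taken as the source, and heal_metadata() and needs_heal() are hypothetical
names.

```python
def heal_metadata(bricks, adjust_version_when_in_sync):
    """Heal metadata in place and return the list of bricks."""
    source = max(bricks, key=lambda b: b["version"])
    for brick in bricks:
        if brick is source:
            continue
        if brick["attrs"] != source["attrs"]:
            # Attributes differ: copy them and bring the version along.
            brick["attrs"] = dict(source["attrs"])
            brick["version"] = source["version"]
        elif adjust_version_when_in_sync:
            # Attributes already match: only the version needs adjusting.
            brick["version"] = source["version"]
    return bricks


def needs_heal(bricks):
    """Bricks with differing versions are still reported as needing heal."""
    return len({b["version"] for b in bricks}) > 1


def make_bricks():
    attrs = {"mode": 0o644, "uid": 0, "gid": 0}
    return [
        {"attrs": dict(attrs), "version": 5},
        {"attrs": dict(attrs), "version": 5},
        {"attrs": dict(attrs), "version": 4},  # in sync, but version lags
    ]


print(needs_heal(heal_metadata(make_bricks(), False)))  # True
print(needs_heal(heal_metadata(make_bricks(), True)))   # False
```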

>BUG: 1425703
>Change-Id: I6723be3c5f748b286d4efdaf3c71e9d2087c7235
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/16772
>Smoke: Gluster Build System 
>Reviewed-by: Xavier Hernandez 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>CentOS-regression: Gluster Build System 

BUG: 1434298
Change-Id: I5b74423253138957644b1bfa543d4abb2532c377
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/16935
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez