glusterfs.git/rpc, branch exp

socket: Keepalives should happen on IPv6 as well as IPv4

2016-12-16T20:30:05+00:00

Summary:
- Check for AF_INET *and* AF_INET6.
- This is a cherry-pick of D3057373 to 3.8
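A minimal sketch of the check described above, assuming a plain BSD-sockets API. `family_wants_keepalive()` and `maybe_enable_keepalive()` are hypothetical helpers, not the functions in rpc/rpc-transport/socket. The point is only that the address-family test has to accept AF_INET6 as well as AF_INET before SO_KEEPALIVE is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: TCP keepalives make sense for both IPv4 and IPv6
 * sockets, so accept AF_INET *and* AF_INET6. */
static bool family_wants_keepalive(sa_family_t family)
{
    return family == AF_INET || family == AF_INET6;
}

/* Enable SO_KEEPALIVE on fd if its address family qualifies.
 * Returns 0 on success or when keepalive does not apply, -1 on error. */
static int maybe_enable_keepalive(int fd, sa_family_t family)
{
    int on = 1;

    if (!family_wants_keepalive(family))
        return 0; /* e.g. AF_UNIX: nothing to do */

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

AF_UNIX sockets are skipped, since TCP keepalives do not apply to them.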

Signed-off-by: Shreyas Siravara 
Change-Id: I53eb79284eddfee6e13821c6570809f575b96769
BUG: 1405478
Reviewed-on: http://review.gluster.org/16167
Reviewed-by: Jeff Darcy 
Tested-by: Jeff Darcy 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur

rpc: fix for race between rpc and protocol/client

2016-12-05T14:09:48+00:00

It is possible that the notification thread which notifies the
protocol/client layer about the disconnection is put to sleep,
and meanwhile a fuse thread or a timer thread initiates and
completes reconnection to the brick. When the notification thread
is then woken up, the protocol/client layer updates its flags to
indicate that the network is disconnected. No reconnection is
initiated after that, because reconnection is the rpc-lib layer's
responsibility and its flags indicate that the connection is up.

Fix: Serialize connect and disconnect notify
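One way to serialize the two notifications is sketched below, assuming a mutex and condition variable per connection. The struct and function names are hypothetical, and this is not the code from review 15916. The transport marks a disconnect as pending as soon as it detects it, and the reconnect path blocks until that DISCONNECT has been delivered, so a stale DISCONNECT can no longer reach the upper layer after CONNECT.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-connection notification state. */
struct notify_state {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            disconnect_pending; /* DISCONNECT seen, not yet delivered */
    bool            upper_connected;    /* what protocol/client believes */
};

static void notify_state_init(struct notify_state *ns, bool connected)
{
    pthread_mutex_init(&ns->lock, NULL);
    pthread_cond_init(&ns->cond, NULL);
    ns->disconnect_pending = false;
    ns->upper_connected = connected;
}

/* Transport: called the moment the disconnect is detected. */
static void mark_disconnect_pending(struct notify_state *ns)
{
    pthread_mutex_lock(&ns->lock);
    ns->disconnect_pending = true;
    pthread_mutex_unlock(&ns->lock);
}

/* Notification thread: deliver DISCONNECT, then wake any waiting CONNECT. */
static void deliver_disconnect(struct notify_state *ns)
{
    pthread_mutex_lock(&ns->lock);
    ns->upper_connected = false;      /* stand-in for the upper-layer callback */
    ns->disconnect_pending = false;
    pthread_cond_broadcast(&ns->cond);
    pthread_mutex_unlock(&ns->lock);
}

/* Reconnect path (fuse or timer thread): wait until any earlier
 * DISCONNECT has been delivered before reporting CONNECT. */
static void deliver_connect(struct notify_state *ns)
{
    pthread_mutex_lock(&ns->lock);
    while (ns->disconnect_pending)
        pthread_cond_wait(&ns->cond, &ns->lock);
    ns->upper_connected = true;
    pthread_mutex_unlock(&ns->lock);
}

/* Thread entry point used to run the reconnect path concurrently. */
static void *connect_thread(void *arg)
{
    deliver_connect(arg);
    return NULL;
}
```

Whichever thread runs first, the upper layer ends up seeing DISCONNECT followed by CONNECT, never the reverse.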

Credit: Raghavendra Talur 
Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1386626
Signed-off-by: Rajesh Joseph 
Reviewed-on: http://review.gluster.org/15916
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System 
Reviewed-by: Raghavendra Talur 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

dht/md-cache: Filter invalidate if the file is made a linkto file

2016-12-02T10:46:58+00:00

Upcall, as a part of setattr, sends an invalidation, and the
invalidation carries the resulting stat value. Even when a file
is converted to a linkto file, an invalidation is sent, and as a
result the mountpoint shows the sticky bit in the stat of the file,
e.g.: ---------T. 945 root root 0 Nov  8 10:14 hardlink.999

Fix:
When dht receives a notification of a sticky bit change, it sets a
flag indicating that md-cache should send the subsequent lookup
instead of serving the stat from its cache.

Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606
BUG: 1392713
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15789
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Niels de Vos 
CentOS-regression: Gluster Build System 
Reviewed-by: Susant Palai 
Reviewed-by: Rajesh Joseph

cluster/afr: CLI for granular entry heal enablement/disablement

2016-11-28T11:56:33+00:00

When there are existing non-granular indices that are yet to be
healed and the granular-entry-heal option is toggled from 'off' to
'on', AFR self-heal, whenever it kicks in, will look for granular
indices in 'entry-changes'. Because the name indices are absent,
the granular entry healing logic will fail to heal these directories
and, worse yet, will unset the pending extended attributes on the
assumption that there are no entries that need heal.

To get around this, a new CLI is introduced which invokes the glfsheal
program to find out whether, at the time an attempt is made to enable
granular entry heal, there are pending heals on the volume OR one or
more bricks are down. If either of these is true, the command fails
with an appropriate error.

New CLI: gluster volume heal  granular-entry-heal {enable,disable}
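The gate the new CLI applies can be sketched as below, assuming the glfsheal output has already been reduced to two counts. `struct heal_status` and `can_enable_granular_entry_heal()` are hypothetical, and the order in which the two conditions are checked is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what glfsheal reports for a volume. */
struct heal_status {
    size_t pending_entries; /* entries still waiting to be healed */
    size_t bricks_down;     /* bricks that are currently unreachable */
};

enum gate_result {
    GATE_OK = 0,
    GATE_PENDING_HEALS,
    GATE_BRICKS_DOWN,
};

/* Refuse to enable granular-entry-heal while old-style (non-granular)
 * indices may still exist, or while a down brick may be missing updates. */
static enum gate_result can_enable_granular_entry_heal(const struct heal_status *st)
{
    if (st->bricks_down > 0)
        return GATE_BRICKS_DOWN;
    if (st->pending_entries > 0)
        return GATE_PENDING_HEALS;
    return GATE_OK;
}
```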

Change-Id: I1f4fe8162813b9068e198965d94169fee4adc099
BUG: 1370410
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15747
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UP

2016-11-21T09:32:05+00:00

Currently there are a few events related to child up/down:
GF_EVENT_CHILD_UP : Issued when any of the protocol clients
connects.
GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec
GF_EVENT_CHILD_DOWN : Issued when any of the protocol clients
disconnects.
These events get modified at the dht/afr/ec layers, as
summarized below.

DHT:
- Once all the subvolumes have reported and at least one child is
  up, GF_EVENT_CHILD_UP is issued
- On connect, GF_EVENT_CHILD_UP is issued
- On disconnect, GF_EVENT_CHILD_MODIFIED is issued
- When all the subvolumes have disconnected, GF_EVENT_CHILD_DOWN is issued

AFR:
- When the first subvolume comes up, GF_EVENT_CHILD_UP is issued
- When subsequent subvolumes come up, GF_EVENT_CHILD_MODIFIED is issued
- When any of the subvolumes goes down, GF_EVENT_SOME_CHILD_DOWN is issued
- When the last subvolume that is up goes down, GF_EVENT_CHILD_DOWN is issued

Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP,
GF_EVENT_CHILD_MODIFIED was issued by afr/dht whenever any of the
subvolumes went up or down.

With the md-cache changes, it is now necessary to differentiate between
a child going up and a child going down. Hence, this patch introduces
GF_EVENT_SOME_DESCENDENT_DOWN/UP and gets rid of GF_EVENT_CHILD_MODIFIED.
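A sketch of an AFR-style mapping once GF_EVENT_CHILD_MODIFIED is replaced, assuming the aggregating xlator only tracks how many of its children are up. The enum values are hypothetical shorthands for GF_EVENT_CHILD_UP, GF_EVENT_CHILD_DOWN, GF_EVENT_SOME_DESCENDENT_UP and GF_EVENT_SOME_DESCENDENT_DOWN; the real afr/dht/ec code tracks considerably more state.

```c
#include <assert.h>

/* Hypothetical shorthands for the GF_EVENT_* values named above. */
enum ev {
    EV_NONE,
    EV_CHILD_UP,
    EV_CHILD_DOWN,
    EV_SOME_DESCENDENT_UP,
    EV_SOME_DESCENDENT_DOWN,
};

/* Map a change in the number of up children to the event sent upward:
 * the first child up and the last child down keep their old events,
 * while every change in between now says which direction it went. */
static enum ev map_child_event(int up_before, int up_after)
{
    if (up_after > up_before)
        return up_before == 0 ? EV_CHILD_UP : EV_SOME_DESCENDENT_UP;
    if (up_after < up_before)
        return up_after == 0 ? EV_CHILD_DOWN : EV_SOME_DESCENDENT_DOWN;
    return EV_NONE;
}
```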

[1] http://review.gluster.org/12573

Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58
BUG: 1396038
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15764
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: N Balachandran 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Rajesh Joseph

Revert "rpc: Fix the race between notification and reconnection"

2016-11-16T09:25:31+00:00

This reverts commit a6b63e11b7758cf1bfcb67985e25ec02845f0995.

Nithya and Rajesh found that mounts sometimes fail after this patch
was merged, so it is being reverted.

BUG: 1386626
Change-Id: I959a5b6c7da61368cf4c67c98193c6e8fdd1755d
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15838
Reviewed-by: N Balachandran 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System

glusterd/quota: upgrade quota.conf file during an upgrade

2016-11-08T04:37:05+00:00

Problem
=======
When quota is enabled on 3.6, the quota conf version recorded in quota.conf
is v1.1. If this node is upgraded to 3.7, it keeps quota conf version v1.1
until a quota enable/disable/set limit operation is initiated. If no such
operation is initiated and this node then peer probes a node that is a
fresh install of 3.7 (and therefore has quota conf version v1.2), the
peer ends up in the "Peer rejected" state. This patch fixes the issue.

Solution
========
When an upgrade happens from 3.6 to 3.7, the quota.conf file needs
to be modified as well. With 3.6, the version in quota.conf is
v1.1, and from 3.7 onwards it needs to be v1.2, because the inode quota
feature was introduced in 3.7. So when an op-version bump happens,
quota.conf needs to be upgraded to quota conf version v1.2, and every
16-byte uuid entry needs to be changed to a 17-byte entry as well.

Previously, when the cluster version was upgraded to 3.7, quota.conf
was upgraded as well, but only when a quota enable/disable/set limit
operation was performed. With this patch, the upgrade is also done
during a cluster op-version bump.
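The entry conversion can be sketched as follows, assuming quota.conf entries are stored back to back after the version header and that the extra byte added in v1.2 records the limit type. The constant `LIMIT_TYPE_USAGE`, its value, and `upgrade_entries()` are hypothetical, and rewriting the header line is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define V11_ENTRY_LEN    16                   /* gfid only */
#define V12_ENTRY_LEN    (V11_ENTRY_LEN + 1)  /* gfid + limit type */
/* Hypothetical tag value: v1.1 predates inode quota, so every upgraded
 * entry is assumed to be a usage limit. */
#define LIMIT_TYPE_USAGE 1

/* Convert v1.1 entries in `in` into v1.2 entries in `out`.
 * Returns the number of bytes written, or -1 if the input holds a
 * truncated entry or the output buffer is too small. */
static long upgrade_entries(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_cap)
{
    size_t n, i;

    if (in_len % V11_ENTRY_LEN != 0)
        return -1;
    n = in_len / V11_ENTRY_LEN;
    if (out_cap < n * V12_ENTRY_LEN)
        return -1;

    for (i = 0; i < n; i++) {
        /* copy the 16-byte gfid, then append the type byte */
        memcpy(out + i * V12_ENTRY_LEN, in + i * V11_ENTRY_LEN, V11_ENTRY_LEN);
        out[i * V12_ENTRY_LEN + V11_ENTRY_LEN] = LIMIT_TYPE_USAGE;
    }
    return (long)(n * V12_ENTRY_LEN);
}
```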

Change-Id: Idb5ba29d3e1ea0e45c85d87c952c75da9e0f99f0
BUG: 1371539
Signed-off-by: Manikandan Selvaganesh 
Reviewed-on: http://review.gluster.org/15352
Tested-by: Atin Mukherjee 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Atin Mukherjee

rpc: Fix the race between notification and reconnection

2016-10-25T06:42:20+00:00

Problem:
There was a hang because unlock on an entry failed with
ENOTCONN. The client thinks the connection is down, whereas
the server thinks the connection is up.

This is the race we are seeing:
1) Connection from client to the brick disconnects.
2) Saved frames unwind is called which unwinds all
   frames that were wound before disconnect.
3) The client reconnects to the brick and performs
   setvolume.
4) The disconnect notification for the connection in 1)
   arrives now and calls client_rpc_notify(), which
   marks the connection as offline even though the
   connection is up.

This happens because I/O can retrigger the connection
before the disconnect notification is sent to the higher
layers in rpc.

Fix:
Notify the higher layers that a disconnect happened and then
go ahead with reconnect logic.
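The ordering in the fix can be sketched as below, with a hypothetical event log standing in for the rpc and protocol/client state. In the real code the reconnect is asynchronous; the sketch only shows that the disconnect notification is delivered before any reconnect is allowed to start.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical connection state with an event log to expose ordering. */
struct conn {
    const char *events[MAX_EVENTS];
    int         nevents;
    int         upper_connected; /* what protocol/client believes */
};

static void record(struct conn *c, const char *what)
{
    if (c->nevents < MAX_EVENTS)
        c->events[c->nevents++] = what;
}

static void notify_upper_disconnect(struct conn *c)
{
    c->upper_connected = 0;
    record(c, "notify-disconnect");
}

static void start_reconnect(struct conn *c)
{
    record(c, "reconnect");
    c->upper_connected = 1; /* setvolume succeeded, CONNECT delivered */
}

/* The fix: notify the higher layers first, then reconnect, so a late
 * DISCONNECT can no longer overwrite the state of a newer connection. */
static void on_transport_disconnect(struct conn *c)
{
    notify_upper_disconnect(c);
    start_reconnect(c);
}
```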

For the logs which point to the information above check:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1

Thanks to Raghavendra G for suggesting the correct fix.

BUG: 1386626
Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15681
NetBSD-regression: NetBSD Build System 
Reviewed-by: Niels de Vos 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Raghavendra G

compound fops: Fix file corruption issue

2016-10-24T14:11:08+00:00

1. The address of a local variable @args is copied into state->req
in server3_3_compound (). But after that function has returned,
this pointer is still accessed and dereferenced in
server_compound_resume (). This patch fixes that.

2. Compound fops, by virtue of NOT having a vector sizer (like the one
writev has), end up having both the header and the data (in case one of
their member fops is WRITEV) in the same hdr_iobuf. This buffer was not
being preserved through the lifetime of the compound fop, so it could
be overwritten by a parallel write fop even while the writev associated
with the currently executing compound fop was yet to hit the disk, thereby
corrupting the file's data. This is fixed by associating the hdr_iobuf with
the iobref so that its memory remains valid through the lifetime of the fop.

3. Also fixed a use-after-free bug in the compound fops cbk in
protocol/client, which was missed on Linux but caught on NetBSD.
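A sketch of the pattern behind item 1, with hypothetical `decode_compound()` and `resume_compound()` standing in for server3_3_compound () and server_compound_resume (): the decoded arguments are copied into memory owned by the request state instead of storing the address of a stack variable. Whether the actual fix copies the struct in this way is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced compound request. */
struct compound_args {
    int nfops;
};

struct req_state {
    struct compound_args *req; /* must outlive decode_compound() */
};

/* Buggy pattern: state->req = &args; the pointer dangles on return.
 * Fixed pattern: give the state its own heap copy of the arguments. */
static int decode_compound(struct req_state *state, int nfops)
{
    struct compound_args args;

    memset(&args, 0, sizeof(args));
    args.nfops = nfops;            /* stand-in for XDR decoding */

    state->req = malloc(sizeof(*state->req));
    if (state->req == NULL)
        return -1;
    *state->req = args;            /* copy the value, not &args */
    return 0;
}

/* Runs later, after decode_compound() has returned. */
static int resume_compound(const struct req_state *state)
{
    return state->req->nfops;      /* safe: heap memory owned by state */
}

static void free_state(struct req_state *state)
{
    free(state->req);
    state->req = NULL;
}
```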

Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their
help in debugging this file corruption issue.
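Item 2 relies on reference counting: the header buffer stays valid as long as some holder keeps a reference to it. A minimal sketch, with hypothetical `struct buf` and `struct ref_bag` types standing in for iobuf and iobref (this is not the libglusterfs API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an iobuf. */
struct buf {
    int  refs;
    char data[64];
};

static struct buf *buf_new(void)
{
    struct buf *b = calloc(1, sizeof(*b));
    if (b)
        b->refs = 1;               /* creator's reference */
    return b;
}

static struct buf *buf_ref(struct buf *b)
{
    b->refs++;
    return b;
}

/* Drop one reference; returns 1 if this call freed the buffer. */
static int buf_unref(struct buf *b)
{
    if (--b->refs == 0) {
        free(b);
        return 1;
    }
    return 0;
}

/* Hypothetical stand-in for an iobref: holds references for the
 * lifetime of one fop. */
struct ref_bag {
    struct buf *held[4];
    int         n;
};

static int bag_add(struct ref_bag *bag, struct buf *b)
{
    if (bag->n == 4)
        return -1;
    bag->held[bag->n++] = buf_ref(b);
    return 0;
}

/* Called when the fop completes; returns how many buffers were freed. */
static int bag_release(struct ref_bag *bag)
{
    int freed = 0;

    while (bag->n > 0)
        freed += buf_unref(bag->held[--bag->n]);
    return freed;
}
```

Once the bag holds its own reference, the transport dropping its reference no longer releases the buffer, so a parallel write cannot reuse that memory while the compound fop is still in flight.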

Change-Id: I6d5c04f400ecb687c9403a17a12683a96c2bf122
BUG: 1378778
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15654
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

rpc/socket.c : Modify socket_poller code in case of ENODATA error code.

2016-10-24T05:30:52+00:00

Problem:  ENODATA warning messages are logged continuously from
          socket_rwv while SSL is enabled.

Solution: To avoid the warning messages, update a condition in the
          socket_poller loop that is checked before breaking out of
          the loop when the poll functions return an error.

BUG: 1386450
Change-Id: I19b3a92d4c3ba380738379f5679c1c354f0ab9b1
Signed-off-by: Mohit Agrawal 
Reviewed-on: http://review.gluster.org/15677
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G