glusterfs.git/rpc/rpc-lib/src, branch release-3.8

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-07-11T13:25:40+00:00

Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. Bumping up cleanup_gen
was done only in rpc_clnt_connection_cleanup (). However, the same is
needed in rpc_clnt_reconnect_cleanup () too: without it, if the object
gets destroyed through the reconnect event in the application layer,
the rpc layer will still try to delete the object, resulting in a
double free and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.

Cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716:
> Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
> BUG: 1433578
> Signed-off-by: Atin Mukherjee 
> Reviewed-on: https://review.gluster.org/16914
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Milind Changire 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1462447
Signed-off-by: Niels de Vos 
Reviewed-on: https://review.gluster.org/17743
Smoke: Gluster Build System 
Reviewed-by: Milind Changire 
CentOS-regression: Gluster Build System

rpc/clnt: remove locks while notifying CONNECT/DISCONNECT

2017-07-11T13:25:31+00:00

Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. That change was made
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However, as part of handling DISCONNECT, protocol/client
unwinds (with failure) the saved frames that are waiting for responses.
This saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to the same brick. Also, it's not
good practice to pass control to other xlators while holding a lock, as
it can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.

To fix [2], two changes are present in this patch:

* notify DISCONNECT before cleaning up rpc connection (same as commit
  a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
  connection and does a start while handling a DISCONNECT event from
  rpc. Note that patch [3] was reverted as rpc_clnt_start called in
  quick_reconnect path of protocol/client didn't invoke connect on
  transport as the connection was not cleaned up _yet_ (as cleanup was
  moved post notification in rpc-clnt). This resulted in clients never
  attempting connect to bricks.

Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECT events
across epochs and ignore DISCONNECT events if they don't belong to the
current epoch. However, this approach is a bit complex to implement and
requires time. So, the current patch is a hacky stop-gap fix until we
come up with a cleaner solution.

[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681

Cherry picked from commit 773f32caf190af4ee48818279b6e6d3c9f2ecc79:
> Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
> Signed-off-by: Raghavendra G 
> BUG: 1427012
> Reviewed-on: https://review.gluster.org/16784
> Smoke: Gluster Build System 
> Reviewed-by: Milind Changire 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Reported-by: Markus Stockhausen 
Signed-off-by: Niels de Vos 
BUG: 1462447
Reviewed-on: https://review.gluster.org/17733
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Milind Changire 
Reviewed-by: Raghavendra G

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-04-07T12:05:09+00:00

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writes to its own files)
and, meanwhile, a client does a readdirp, the client which did the
readdirp will start receiving upcalls. In this scenario the client
disconnects with an "rpc decode failed" error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers
(rpchdr, proghdr) can be freed, which is not the case. Especially under
high workload, rpc_transport_submit_request() may not be able to write to
the socket immediately; it then adds the msg to its own queue and returns
success. Thus, we have a use after free for rpchdr and proghdr. Hence the
client gets garbage rpchdr and proghdr and fails to decode the rpc,
resulting in a disconnect.

To prevent this, we need to add the rpchdr and proghdr to an iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket
and is adding the msg to the queue. Thus we do not have a use after free.
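
The effect of the extra reference can be sketched with a toy
reference-counted buffer. This is not the iobuf/iobref API: buf_t,
txqueue_t and the functions below are hypothetical, single-threaded
stand-ins that only show why the queued data stays valid after the
submitter drops its own reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer (stand-in for iobuf/iobref). */
typedef struct {
    int   refs;
    char *data;
    int  *freed_flag; /* set to 1 when freed; demonstration only */
} buf_t;

buf_t *buf_new(const char *payload, int *freed_flag)
{
    buf_t *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    size_t len = strlen(payload) + 1;
    b->data = malloc(len);
    if (!b->data) {
        free(b);
        return NULL;
    }
    memcpy(b->data, payload, len);
    b->refs = 1;
    b->freed_flag = freed_flag;
    return b;
}

buf_t *buf_ref(buf_t *b)
{
    assert(b && b->refs > 0);
    b->refs++;
    return b;
}

void buf_unref(buf_t *b)
{
    assert(b && b->refs > 0);
    if (--b->refs == 0) {
        if (b->freed_flag)
            *b->freed_flag = 1;
        free(b->data);
        free(b);
    }
}

/* Hypothetical transport queue: holds its own reference while the
 * write is pending. */
typedef struct { buf_t *pending; } txqueue_t;

void transport_submit(txqueue_t *q, buf_t *b) { q->pending = buf_ref(b); }

void transport_flush(txqueue_t *q)
{
    if (q->pending) {
        buf_unref(q->pending); /* write done: drop the queue's ref */
        q->pending = NULL;
    }
}
```

The submitter can call buf_unref () right after transport_submit ()
returns; the memory is released only once transport_flush () has dropped
the queue's reference.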

Thank You for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai 
> Smoke: Gluster Build System 
> Reviewed-by: soumya k 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422788
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16638
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

rpc: fix for race between rpc and protocol/client

2016-12-20T17:12:25+00:00

It is possible that the notification thread which notifies
protocol/client layer about the disconnection is put to sleep
and meanwhile, a fuse thread or a timer thread initiates and
completes reconnection to the brick. The notification thread
is then woken up and protocol/client layer updates its flags
to indicate that network is disconnected. No reconnection is
initiated because reconnection is rpc-lib layer's responsibility
and its flags indicate that connection is connected.

Fix: Serialize connect and disconnect notify

> Credit: Raghavendra Talur 
> Reviewed-on: http://review.gluster.org/15916
> Reviewed-by: Raghavendra G 
> Smoke: Gluster Build System 
> Reviewed-by: Raghavendra Talur 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
(cherry picked from commit aa22f24f5db7659387704998ae01520708869873)

Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1401534
Signed-off-by: Rajesh Joseph 
Reviewed-on: http://review.gluster.org/16025
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra Talur

cluster/afr: CLI for granular entry heal enablement/disablement

2016-11-29T10:52:25+00:00

Backport of: http://review.gluster.org/15747

When there are already existing non-granular indices created that are
yet to be healed, if the granular-entry-heal option is toggled from 'off'
to 'on', AFR self-heal, whenever it kicks in, will try to look for
granular indices in 'entry-changes'. Because of the absence of name
indices, the granular entry healing logic will fail to heal these
directories and, worse yet, unset pending extended attributes on the
assumption that there are no entries that need heal.

To get around this, a new CLI is introduced which will invoke the glfsheal
program to figure out whether, at the time an attempt is made to enable
granular entry heal, there are pending heals on the volume OR there
are one or more bricks that are down. If either of them is true, the
command will fail with the appropriate error.

New CLI: gluster volume heal  granular-entry-heal {enable,disable}

Change-Id: I342e0390f847fcb015a50ef58aedfcbcb58f4ed3
BUG: 1398501
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15942
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

rpc: increase RPC/XID with each callback

2016-09-21T03:11:30+00:00

The RPC/XID for callbacks has been hardcoded to GF_UNIVERSAL_ANSWER. In
Wireshark these RPC-calls are marked as "RPC retransmissions" because of
the repeating RPC/XID. This is most confusing when verifying the
callbacks that the upcall framework sends. There is no way to see the
difference between real retransmissions and new callbacks.

This change was verified by creating and removing files through
different Gluster clients. The RPC/XID is increased on a per-connection
(or per-client) basis. The expectations of the RPC protocol are met this way.

> Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2
> BUG: 1377097
> Signed-off-by: Niels de Vos 
> Reviewed-on: http://review.gluster.org/15524
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 
(cherry picked from commit e9b39527d5dcfba95c4c52a522c8ce1f4512ac21)

Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2
BUG: 1377290
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/15528
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

changelog/rpc: Fix rpc_clnt_t mem leaks

2016-07-25T06:40:30+00:00

Backport of: http://review.gluster.org/13658

PROBLEM:
   1. Freeing up rpc_clnt object might lead to crashes. Well,
      it was not a necessity to free rpc-clnt object till now
      because all the existing use cases need to reconnect
      back on disconnects. Hence timer code was not taking
      ref on rpc-clnt object.

      Glusterd had some use-cases that led to crash due to
      ping-timer and they fixed only those code paths that
      involve ping-timer.

      Now, since changelog has a use-case where rpc-clnt
      needs to be freed up, we need to fix timer code to take
      refs.

   2. In changelog, because of issue 1, only mydata was being
      freed which is incorrect. And there are races where
      rpc-clnt object would access the freed mydata which
      would lead to crashes.

      Since changelog xlator resides on brick side and is a long
      living process, if multiple libgfchangelog consumers
      register to changelog and disconnect/reconnect multiple
      times, it would result in leak of 'rpc-clnt' object
      for every connect/disconnect.

SOLUTION:
   1. Handle ref/unref of 'rpc_clnt' structure in timer
      functions properly.
   2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
      after disabling timers and free mydata on RPC_CLNT_DESTROY.

RPC SETUP IN CHANGELOG:
   1. changelog xlator initiates rpc server say 'changelog_rpc_server'
   2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
   3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
   4. In return changelog_rpc_server initiates a rpc client and connects back
      to 'libgfchangelog_rpc_server'

REF/UNREF HANDLING IN TIMER FUNCTIONS:
Let's say rpc clnt refcount = 1
   1. Take the ref before registering callback to timer queue
           >>>>  rpc_clnt_ref (say ref count becomes = 2)
   2. Register a callback to timer say 'callback1'
   3. If register fails:
           >>>> rpc_clnt_unref (ref count = 1)
   4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
      in 'callback1'. This is corresponding to ref taken in step 1
           >>>> rpc_clnt_unref (ref count = 1)
   5. The cycle from step-1 to step-4 continues....until timer cancel event happens
   6. timer cancel of say 'callback1'
           If timer cancel fails:
                 Do nothing, Step-4 would have unrefd
           If timer cancel succeeds:
                 >>>> rpc_clnt_unref (ref count = 1)

Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
BUG: 1359364
Signed-off-by: Kotresh HR 
(cherry picked from commit 637ce9e2e27e9f598a4a6c5a04cd339efaa62076)
Reviewed-on: http://review.gluster.org/14994
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

glusterd: add defence mechanism to avoid brick port clashes

2016-05-10T07:11:33+00:00

Intro:
Currently glusterd maintains the portmap registry, which contains the ports
that are free to use between 49152 - 65535. This registry is initialized
once, and updated as and when glusterd sees that ports are being used.

Glusterd first checks for a port within the portmap registry and gets a FREE
port marked in it, then checks if that port is currently free using a connect()
call, then passes it to the brick process, which has to bind on it.

Problem:
There is a time gap between glusterd checking the port with connect() and
the brick process actually binding on it. In this time gap, any other
process could occupy this port, because of which the brick will fail to
bind and exit.

Case 1:
To avoid the gluster client process occupying the port supplied by glusterd:

we have separated the client port map range from the brick port map range;
more at http://review.gluster.org/#/c/13998/

Case 2: (Handled by this patch)
To avoid any other foreign process occupying the port supplied by glusterd:

To handle the above situation, this patch implements a mechanism to return
the EADDRINUSE error code to glusterd, upon which a new port is allocated
and glusterd tries to restart the brick process with the newly allocated port.

Note: In case of glusterd restarts, i.e. runner_run_nowait(), there is no way
to handle Case 2, because runner_run_nowait() will not wait to get the
return/exit code of the executed command (brick process). Hence, as of now,
in such a case we cannot know with what error the brick has failed to connect.

This patch also fixes runner_end() to perform some cleanup w.r.t.
return values.

Backport of:
> Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e
> BUG: 1322805
> Signed-off-by: Prasanna Kumar Kalever 
> Reviewed-on: http://review.gluster.org/14043
> Tested-by: Prasanna Kumar Kalever 
> Reviewed-by: Atin Mukherjee 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Id7d8351a0082b44310177e714edc0571ad0f7195
BUG: 1333711
Signed-off-by: Prasanna Kumar Kalever 
Reviewed-on: http://review.gluster.org/14235
Tested-by: Prasanna Kumar Kalever 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

protocol: add setactivelk () fop

2016-05-02T01:04:52+00:00

Change-Id: I60fe2d59c454095febce4c0fbef87a2dad9636e4
BUG: 1326085
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/14013
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

protocol: add getactivelk () fop

2016-05-02T01:04:31+00:00

Change-Id: Ie38198db990f133fe163ba160cdf647e34f83f4f
BUG: 1326085
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/13994
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System