glusterfs.git/rpc, branch v3.10.1

build: errors generating xdr stubs+headers with `make -j`

2017-03-28T12:08:12+00:00

Running a makebomb (`make -j`), on f23 at least, blows up when
generating the xdr headers and stubs. (It works reliably on f25
though, go figure.) This change appears to mitigate the race on f23.

Master change https://review.gluster.org/16941
Master BZ: 1429696

Change-Id: I006066f0e7c3f8b65189f97c70089f3422e3e08b
BUG: 1430512
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/16942
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-03-27T13:58:29+00:00

Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. cleanup_gen was bumped
only in rpc_clnt_connection_cleanup (). However, the same is needed in
rpc_clnt_reconnect_cleanup () too: without it, if the object gets
destroyed through the reconnect event in the application layer, the rpc
layer will still try to delete the object, resulting in a double free
and a crash.
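
A minimal sketch of the generation-number idea described above, not
the actual rpc-clnt code. The struct and function names (conn,
conn_cleanup, conn_reconnect_cleanup, conn_should_teardown) are
hypothetical; only cleanup_gen comes from the commit message. The
assumption is that the rpc layer snapshots the generation before
handing control to the application and tears the object down only if
the generation is unchanged afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rpc connection object. */
struct conn {
    uint64_t cleanup_gen; /* bumped by every cleanup path */
    bool     connected;
};

/* Regular cleanup path: bumps the generation. */
void conn_cleanup(struct conn *c)
{
    assert(c != NULL);
    c->connected = false;
    c->cleanup_gen++;
}

/* Reconnect cleanup path: must bump the generation as well,
 * otherwise the rpc layer cannot tell that cleanup already ran. */
void conn_reconnect_cleanup(struct conn *c)
{
    assert(c != NULL);
    c->connected = false;
    c->cleanup_gen++;
}

/* The rpc layer passes in the generation it saw before notifying
 * the application. Tear down only if nobody cleaned up since. */
bool conn_should_teardown(const struct conn *c, uint64_t gen_before)
{
    assert(c != NULL);
    return c->cleanup_gen == gen_before;
}
```

If either cleanup path forgot the increment, conn_should_teardown ()
would return true after the application had already cleaned up, which
is the double-free scenario the commit describes.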

Peer probing an invalid host/IP was the basic test to catch this issue.

>Reviewed-on: https://review.gluster.org/16914
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Milind Changire 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Jeff Darcy 
>(cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716)

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1434399
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16936
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

build: libgfxdr.so calls GF_FREE(), needs to link with -lglusterfs

2017-03-21T17:12:15+00:00

The previous change to remove the xdrgen script exposed (or
created) a recursive build dependency: libglusterfs needs the
generated headers, and libgfxdr should be linked with libglusterfs
for GF_FREE/__gf_free.

(Much grumbling about libglusterfs being the kitchen sink of gluster
elided. This would not be necessary if there were two more libs,
a gluster "runtime" library with common gluster code shared by the
xlators and daemons, and a utility library with things like the
rbtree, memory allocation, and whatnot.)

So. Link at build time or link at runtime? For truth-and-beauty, link
with libglusterfs.so at build time. Without truth-and-beauty, don't
link with libglusterfs and rely on the other libs that link with
libglusterfs to provide resolution of __gf_free().

Truth-and-beauty it is. But how to generate the headers first, then
build libglusterfs, then come back and build libgfxdr? Autotools is a
maze of twisty passages, all different. Things that work with gnu
make on linux don't work with the BSD make. Finally I hit on this
solution. Add a shadow directory where make only generates the headers,
then build libglusterfs using the generated headers, and finally build
libgfxdr and link with libglusterfs.

See original BZ 1330604
change http://review.gluster.org/14085

master BZ 1429696
master change: https://review.gluster.org/#/c/16873/

Change-Id: Iede8a30e3103176cb8f0b054885f30fcb352492b
BUG: 1430512
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/16874
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

rpc/clnt: remove locks while notifying CONNECT/DISCONNECT

2017-03-06T16:20:33+00:00

Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. That change was made
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However, as part of handling DISCONNECT, protocol/client
unwinds (with failure) the saved frames that are waiting for
responses. This saved_frames_unwind can be a costly operation and
hence ideally shouldn't be included in the critical section of
notifylock, as it unnecessarily delays the reconnection to the same
brick. Also, it's not good practice to pass control to other xlators
while holding a lock, as it can lead to deadlocks. So, this patch
removes locking in rpc-clnt while notifying parent xlators.
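
A minimal sketch of notifying outside the critical section, assuming
a pthread mutex and a single callback. All names here (client,
client_set_state, parent_notify and so on) are hypothetical and are
not the rpc-clnt API. State is updated under the lock, the callback
pointer is copied out, and the callback runs only after the lock is
released, so it may be slow or may re-enter the client.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum conn_event { EV_CONNECT, EV_DISCONNECT };

typedef void (*notify_fn)(void *owner, enum conn_event ev);

struct client {
    pthread_mutex_t lock;
    int             connected;
    notify_fn       notify;
    void           *owner;
};

/* Reads state under the same lock the notifier path uses. */
int client_is_connected(struct client *cl)
{
    int up;

    pthread_mutex_lock(&cl->lock);
    up = cl->connected;
    pthread_mutex_unlock(&cl->lock);
    return up;
}

void client_set_state(struct client *cl, enum conn_event ev)
{
    notify_fn fn;
    void     *owner;

    assert(cl != NULL);

    /* Critical section: only the state change and pointer copies. */
    pthread_mutex_lock(&cl->lock);
    cl->connected = (ev == EV_CONNECT);
    fn = cl->notify;
    owner = cl->owner;
    pthread_mutex_unlock(&cl->lock);

    /* The callback runs without the lock held. */
    if (fn)
        fn(owner, ev);
}

/* Example parent that re-enters the client from its callback. */
struct parent {
    struct client *cl;
    int            seen_up;
    int            calls;
};

void parent_notify(void *owner, enum conn_event ev)
{
    struct parent *p = owner;

    (void)ev;
    p->seen_up = client_is_connected(p->cl); /* takes cl->lock */
    p->calls++;
}
```

Had client_set_state () called fn while still holding cl->lock,
parent_notify () would block forever on a default mutex. The price of
dropping the lock is that ordering between concurrent notifications is
no longer guaranteed by the lock itself.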

To fix [2], two changes are present in this patch:

* notify DISCONNECT before cleaning up rpc connection (same as commit
  a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up the
  rpc connection and does a start while handling a DISCONNECT event
  from rpc. Note that patch [3] was reverted because rpc_clnt_start,
  called in the quick_reconnect path of protocol/client, didn't invoke
  connect on the transport as the connection was not cleaned up _yet_
  (as cleanup was moved post notification in rpc-clnt). This resulted
  in clients never attempting to connect to bricks.

Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECT events
across epochs and ignore DISCONNECT events if they don't belong to the
current epoch. However, this approach is a bit complex to implement
and requires time. So, the current patch is a hacky stop-gap fix till
we come up with a cleaner solution.
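
The epoch idea mentioned above, which this patch does not implement,
could look roughly like the sketch below. The names (link_state,
on_connect, on_disconnect) are hypothetical. It assumes every CONNECT
starts a new epoch and every DISCONNECT carries the epoch of the
connection it belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct link_state {
    uint64_t epoch; /* incremented on every CONNECT */
    bool     up;
};

/* Start a new epoch and return it, so that the matching
 * DISCONNECT can be tagged with the same value later. */
uint64_t on_connect(struct link_state *s)
{
    s->up = true;
    return ++s->epoch;
}

/* Apply a DISCONNECT only if it belongs to the current epoch.
 * Returns true if applied, false if it was stale and ignored. */
bool on_disconnect(struct link_state *s, uint64_t event_epoch)
{
    /* An event cannot come from an epoch that has not started. */
    assert(event_epoch <= s->epoch);

    if (event_epoch != s->epoch)
        return false;
    s->up = false;
    return true;
}
```

With this scheme a late DISCONNECT from an old connection cannot mark
a freshly reconnected link as down, without any lock being held across
the notification.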

[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681

>Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
>Signed-off-by: Raghavendra G 
>BUG: 1427012
>Reviewed-on: https://review.gluster.org/16784
>Smoke: Gluster Build System 
>Reviewed-by: Milind Changire 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
(cherry picked from commit 773f32caf190af4ee48818279b6e6d3c9f2ecc79)

Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Signed-off-by: Raghavendra G 
BUG: 1428670
Reviewed-on: https://review.gluster.org/16835
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-02-16T14:46:53+00:00

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writes to its own
files) and, meanwhile, a client does a readdirp, the client which did
the readdirp will start to receive upcalls. In this scenario the
client disconnects with an rpc decode failed error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers
(rpchdr, proghdr) can be freed, which is not the case. Under especially
high workload, rpc_transport_submit_request() may not be able to write
to the socket immediately, and hence adds the msg to its own queue and
returns success. Thus we have a use after free for rpchdr and proghdr.
The client then gets a garbage rpchdr and proghdr and fails to decode
the rpc, resulting in a disconnect.

To prevent this, we need to add the rpchdr and proghdr to an iobref and
send it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the
socket and is adding the msg to the queue. Thus we do not have a use
after free.
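
The ownership rule above can be illustrated with a small
single-threaded sketch. It does not use the real iobuf/iobref API; buf,
transport and their functions are hypothetical stand-ins, and the
reference count is a plain int rather than an atomic. The point shown
is that the queueing layer takes its own reference, so the submitter
can drop its reference as soon as submit returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    int    refs;
    size_t len;
    char   data[64];
};

struct buf *buf_new(const char *payload)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->refs = 1; /* reference owned by the caller */
    b->len = strlen(payload);
    if (b->len >= sizeof(b->data))
        b->len = sizeof(b->data) - 1;
    memcpy(b->data, payload, b->len);
    return b;
}

struct buf *buf_ref(struct buf *b)
{
    assert(b->refs > 0);
    b->refs++;
    return b;
}

/* Returns 1 if this call freed the buffer, 0 otherwise. */
int buf_unref(struct buf *b)
{
    assert(b->refs > 0);
    if (--b->refs > 0)
        return 0;
    free(b);
    return 1;
}

/* Hypothetical transport with a one-slot send queue. */
struct transport {
    struct buf *queued;
    int         writable;
};

/* Reports success even when the data is only queued, so it must
 * keep the buffer alive with a reference of its own. */
int transport_submit(struct transport *t, struct buf *b)
{
    if (t->writable)
        return 0; /* pretend the data was written synchronously */
    t->queued = buf_ref(b);
    return 0;
}

/* Later, when the socket becomes writable: send and drop the ref.
 * Returns the number of bytes that were pending. */
size_t transport_flush(struct transport *t)
{
    size_t n = 0;

    if (t->queued) {
        n = t->queued->len;
        buf_unref(t->queued);
        t->queued = NULL;
    }
    return n;
}
```

Without the buf_ref () in transport_submit (), the caller's buf_unref ()
would free memory that the queue still points to, which matches the
garbage rpchdr/proghdr symptom described above.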

Thank you for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai 
> Smoke: Gluster Build System 
> Reviewed-by: soumya k 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422363
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16623
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: add a cli command to trigger a statedump on a client

2017-02-14T01:53:10+00:00

With this, we will be able to trigger statedumps on remote Gluster
clients, mainly targeted at applications using libgfapi.

Design:
The SIGUSR signal is the most common way of taking a statedump in
Gluster. But it cannot be used for libgfapi based processes, as the
process loading the library might have already consumed the SIGUSR
signal. Hence we go the command way.

One has to issue a Gluster command to initiate a statedump on the
libgfapi based client. The command takes a hostname and a PID as
arguments. All the glusterds in the cluster check if they are
connected to the specified hostname, and send an RPC request to all
the connected clients from that hostname (via the mgmt connection).

> URL: http://review.gluster.org/16357
> BUG: 1169302
> Signed-off-by: Poornima G 
> [ndevos: minor fixes and split patch in smaller pieces]
> Reviewed-on-master: https://review.gluster.org/9228
> Reviewed-by: Niels de Vos 
> Tested-by: Niels de Vos 
> Reviewed-by: Kaleb KEITHLEY 
> Reviewed-by: Samikshan Bairagya 

BUG: 1418981
Change-Id: Icbe4d2f026b32a2c7d5535e1bfb2cdaaff042e91
Signed-off-by: Shyam 
Reviewed-on: https://review.gluster.org/16601
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

Tier: remove warning related to the enum

2017-02-08T14:16:04+00:00

Backport of: https://review.gluster.org/#/c/16539/

PROBLEM: In the tier-as-a-service patch, the enums for tier (from
gf1_op_command and gf_defrag_command) are put into a single enum,
gf_defrag_command, which causes a warning that makes the build
fail.

FIX: send both enums and eliminate the warning.

>Change-Id: I899ff622dfb07134e6459aa65f65ea7252765293
>BUG: 1418973
>Signed-off-by: hari gowtham 
>Reviewed-on: https://review.gluster.org/16539
>Smoke: Gluster Build System 
>Tested-by: hari gowtham 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Atin Mukherjee 

Change-Id: I8d2ec89b8689066091ab58406d225e1058f435cf
BUG: 1419846
Signed-off-by: hari gowtham 
Reviewed-on: https://review.gluster.org/16553
Smoke: Gluster Build System 
Tested-by: hari gowtham 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Atin Mukherjee 
CentOS-regression: Gluster Build System

socket: GF_REF_PUT should be called outside lock

2017-02-07T11:49:43+00:00

GF_REF_PUT was called inside the lock. It can call
socket_poller_mayday, which in turn tries to take the
same lock. This can lead to a deadlock.
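
A minimal sketch of the pattern, assuming a pthread mutex and a C11
atomic reference count. It is not the GF_REF implementation; sock,
sock_put, sock_release and sock_finish_work are hypothetical names,
with sock_release standing in for a release hook that takes the same
lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct sock {
    pthread_mutex_t lock;
    atomic_int      refs;
    int             released; /* set by the release hook */
    int             pending;  /* work items still outstanding */
};

void sock_init(struct sock *s, int pending)
{
    pthread_mutex_init(&s->lock, NULL);
    atomic_init(&s->refs, 1);
    s->released = 0;
    s->pending = pending;
}

/* Release hook: takes s->lock, so it must never be reached
 * from a context that already holds s->lock. */
void sock_release(struct sock *s)
{
    pthread_mutex_lock(&s->lock);
    s->released = 1;
    pthread_mutex_unlock(&s->lock);
}

/* Drop one reference; the last put runs the release hook. */
void sock_put(struct sock *s)
{
    int prev = atomic_fetch_sub(&s->refs, 1);

    assert(prev > 0);
    if (prev == 1)
        sock_release(s);
}

/* Finish one unit of work: update state under the lock, then
 * drop the reference only after the lock has been released. */
void sock_finish_work(struct sock *s)
{
    pthread_mutex_lock(&s->lock);
    s->pending--;
    pthread_mutex_unlock(&s->lock);

    sock_put(s); /* outside the lock, so the hook can take it */
}
```

Moving sock_put () inside the locked region of sock_finish_work ()
would make the last put block forever on a default mutex.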

>Reviewed-on: https://review.gluster.org/16343
>Reviewed-by: Raghavendra G 
>CentOS-regression: Gluster Build System 
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 

BUG: 1419503
Change-Id: Ib3b161bcfeac810bd3593dc04c10ef984f996b17
Signed-off-by: Rajesh Joseph 
Reviewed-on: https://review.gluster.org/16548
Tested-by: Atin Mukherjee 
Reviewed-by: Raghavendra G 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System

socket: retry connect immediately if it fails

2017-02-03T00:43:24+00:00

Previously we relied on a complex dance of setting flags, shutting
down the socket, tearing stuff down, getting an event, tearing more
stuff down, and waiting for a higher-level retry.  What we really
need, in the case where we're just trying to connect prematurely e.g.
to a brick that hasn't fully come up yet, is a simple retry of the
connect(2) call.
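
A simple retry of connect(2) could look roughly like the sketch below.
It is not the code from socket.c. The helper names, the retry limit,
the delay, and the choice to retry only on ECONNREFUSED are all
assumptions made for illustration. Each attempt uses a fresh socket,
since POSIX leaves the state of a socket unspecified after a failed
connect. flaky_attempt is a stand-in for a brick that is still coming
up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* One attempt: returns a connected fd, or -1 with errno set. */
typedef int (*attempt_fn)(void *ctx);

/* Retry only while the peer refuses the connection, i.e. while
 * it is probably just not listening yet. */
int connect_with_retry(attempt_fn attempt, void *ctx,
                       int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000,
                              (delay_ms % 1000) * 1000000L };
    int fd = -1;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        fd = attempt(ctx);
        if (fd >= 0 || errno != ECONNREFUSED)
            break; /* success, or an error a retry won't fix */
        if (i + 1 < max_tries)
            nanosleep(&delay, NULL);
    }
    return fd;
}

/* Real attempt over a stream socket. */
struct tcp_target {
    const struct sockaddr *addr;
    socklen_t              len;
};

int tcp_attempt(void *ctx)
{
    struct tcp_target *t = ctx;
    int fd = socket(t->addr->sa_family, SOCK_STREAM, 0);
    int err;

    if (fd < 0)
        return -1;
    if (connect(fd, t->addr, t->len) == 0)
        return fd;
    err = errno;
    close(fd); /* do not reuse a socket after a failed connect */
    errno = err;
    return -1;
}

/* Stand-in peer: refuses failures_left times, then "connects". */
struct flaky {
    int failures_left;
    int calls;
};

int flaky_attempt(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = ECONNREFUSED;
        return -1;
    }
    return 42; /* placeholder value, not a real descriptor */
}
```

A caller would pass tcp_attempt with a filled-in struct tcp_target for
real use, and can substitute flaky_attempt to exercise the retry logic
without a network.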

This was discovered by observing failures in ec-new-entry.t with
multiplexing enabled, but probably fixes other random failures as
well.

Backport of:
> Change-Id: Ibedb8942060bccc96b02272a333c3002c9b77d4c
> BUG: 1385758
> Reviewed-on: https://review.gluster.org/16510

BUG: 1418091
Change-Id: I4bac26929a12cabcee4f9e557c8b4d520948378b
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16533
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpc/socket.c : Bonnie++ hangs during rewrites in ganesha + SSL

2017-02-02T17:33:45+00:00

Problem: Bonnie++ rewrite operation hangs in a ganesha + SSL environment

Solution: The hang was caused by ssl_do blocking on a poll call,
          because no POLLOUT event was arriving on the socket. The
          socket does not get a POLLOUT event because all other
          threads are waiting for the lock, and ssl_do does not
          release the lock because it is not getting any event from
          poll. To correct this, update the condition in ssl_do to
          match the one used when handling SSL_ERROR_WANT_READ.

Test:     The following procedure was used to test the patch:
          1) Set up a 2x2 Ganesha + SSL environment.
          2) Run bonnie from 3 NFS clients in parallel.
          3) Without the patch, bonnie hangs once it reaches the
             "Rewrite" operation.
          4) With the patch applied, it does not hang.

> BUG: 1418213
> Change-Id: I5985cbbc4cfdac5d287268d791e31c274abc3c8d
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: https://review.gluster.org/16501
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Jeff Darcy 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Vijay Bellur 
> (cherry picked from commit d7077bca4b372a056d23416294e729637e9af94e)

Change-Id: Id029c71382025477bb5ff31f28ec537e4fe58b03
BUG: 1418541
Reviewed-on: https://review.gluster.org/16513
Tested-by: MOHIT AGRAWAL 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan