glusterfs.git/rpc, branch release-3.8-fb

event: Idle connection management

2017-09-12T15:12:21+00:00

Summary:
- This diff adds support for detecting and tracking idle client connections.
- It allows *service translators* (server, nfs) to opt in to detecting and closing idle client connections.
- For now it is explicitly restricted to the NFS service as a safety measure.

Here are the debug logs when a client connection gets closed:

  [2016-03-29 17:27:06.154232] W [socket.c:2426:socket_timeout_handler] 0-socket: Shutting down idle client connection (idle=20s,fd=20,conn=[2401:db00:11:d0af:face:0:3:0:957]->[2401:db00:11:d0af:face:0:3:0:2049])!
  [2016-03-29 17:27:06.154292] D [event-epoll.c:655:__event_epoll_timeout_slot] 0-epoll: Connection on slot->fd=9 was idle for 20 seconds!
  [2016-03-29 17:27:06.163282] D [socket.c:629:__socket_rwv] 0-socket.nfs-server: EOF on socket
  [2016-03-29 17:27:06.163298] D [socket.c:2474:socket_event_handler] 0-transport: disconnecting now
  [2016-03-29 17:27:06.163316] D [event-epoll.c:614:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=9 from gen=4 to slot->gen=5, fd=20, slot->fd=20
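
As an illustration only, the opt-in idle check described in the summary could be sketched as below. The struct and function names (conn_slot, conn_touch, conn_should_close) are hypothetical and are not taken from socket.c or event-epoll.c; the sketch assumes a per-connection "last activity" timestamp and a per-service opt-in flag.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection bookkeeping; not the actual GlusterFS structs. */
struct conn_slot {
    int    fd;            /* socket descriptor */
    time_t last_active;   /* updated whenever the fd sees traffic */
    bool   idle_close_ok; /* true only if the owning service opted in */
};

/* Record activity on the connection. */
static void conn_touch(struct conn_slot *slot, time_t now)
{
    slot->last_active = now;
}

/* Return true if the connection has been idle for at least idle_timeout
 * seconds and the owning service allows idle connections to be closed. */
static bool conn_should_close(const struct conn_slot *slot, time_t now,
                              time_t idle_timeout)
{
    if (!slot->idle_close_ok || idle_timeout <= 0)
        return false;
    return (now - slot->last_active) >= idle_timeout;
}
```

With a 20 second timeout, as in the logs above, a slot last touched at t=100 would be reported as closable from t=120 onwards, and never if the service did not opt in.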

Test Plan: - Used stuck NFS mounts to create idle clients and unstuck them.

Reviewers: kvigor, rwareing

Reviewed By: rwareing

Subscribers: dld, moox, dph

Differential Revision: https://phabricator.fb.com/D3112099

Change-Id: Ic06c89e03f87daabab7f07f892390edd1a1fcc20
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/18265
Reviewed-by: Jeff Darcy 
Tested-by: Jeff Darcy 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System

Merge remote-tracking branch 'origin/release-3.8' into release-3.8-fb

2017-08-31T19:33:59+00:00

Change-Id: Ie35cd1c8c7808949ddf79b3189f1f8bf0ff70ed8

gNFSd: Auto re-register NFS/Mount programs with rpcbind periodically

2017-08-30T01:23:51+00:00

Summary:
Every once in a while rpcbind crashes and the NFS endpoints disappear.
This diff periodically re-registers the NFS/Mount programs so that we
should almost never have NFS up while rpcbind is down, which causes bad
endpoints and hanging mounts for our customers.

Test Plan: Added prove tests + tested on dev server

Reviewers: dph, moox, rwareing

Reviewed By: rwareing

Differential Revision: https://phabricator.fb.com/D2571724

Tasks: 8803558

Change-Id: I35acb2d731185a7b20020cb57bdd4d879e978df4
Signature: t1:2571724:1445555327:3276a4dcc4da71346b09d4aeb46c69dddcc7c5ba
Reviewed-on: https://review.gluster.org/17961
Smoke: Gluster Build System 
Reviewed-by: Shreyas Siravara 
CentOS-regression: Gluster Build System

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-07-11T13:25:40+00:00

Commit 086436a introduced a generation number (cleanup_gen) to ensure that
the rpc layer doesn't end up cleaning up the connection object if the
application layer has already destroyed it. cleanup_gen was bumped
only in rpc_clnt_connection_cleanup (). However, the same is needed
in rpc_clnt_reconnect_cleanup () too: without it, if the object gets destroyed
through the reconnect event in the application layer, the rpc layer will
still try to delete the object, resulting in a double free
and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.
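
A toy model of the generation check, for illustration only. The struct and helpers below are hypothetical stand-ins and do not reproduce the rpc-clnt code; the sketch assumes the rpc layer snapshots cleanup_gen before notifying the application layer and skips its own cleanup if the value changed in the meantime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rpc connection object. */
struct conn {
    uint64_t cleanup_gen;  /* bumped by every cleanup path */
    int      cleanups_run; /* counts how often teardown actually ran */
};

/* Every cleanup path must bump the generation, otherwise a later
 * generation check cannot tell that cleanup already happened. */
static void conn_cleanup(struct conn *c)
{
    c->cleanup_gen++;
    c->cleanups_run++;
}

/* Deferred cleanup in the rpc layer: seen_gen is the generation observed
 * before the application layer was notified. If it changed, somebody else
 * already cleaned up, so do nothing. Returns true if cleanup ran here. */
static bool conn_cleanup_if_unchanged(struct conn *c, uint64_t seen_gen)
{
    if (c->cleanup_gen != seen_gen)
        return false;
    conn_cleanup(c);
    return true;
}
```

If one cleanup path forgets to bump the counter, the check passes even though teardown already ran, which is the double-cleanup situation this commit closes for the reconnect path.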

Cherry picked from commit 39e09ad1e0e93f08153688c31433c38529f93716:
> Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
> BUG: 1433578
> Signed-off-by: Atin Mukherjee 
> Reviewed-on: https://review.gluster.org/16914
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Milind Changire 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1462447
Signed-off-by: Niels de Vos 
Reviewed-on: https://review.gluster.org/17743
Smoke: Gluster Build System 
Reviewed-by: Milind Changire 
CentOS-regression: Gluster Build System

rpc/clnt: remove locks while notifying CONNECT/DISCONNECT

2017-07-11T13:25:31+00:00

Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. That locking was added
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However, as part of handling DISCONNECT, protocol/client
unwinds (with failure) the saved frames that are waiting for responses.
This saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to the same brick. Also, it's not
good practice to pass control to other xlators while holding a lock, as it
can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.

To fix [2], two changes are present in this patch:

* notify DISCONNECT before cleaning up rpc connection (same as commit
  a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up the rpc
  connection and does a start while handling a DISCONNECT event from
  rpc. Note that patch [3] was reverted because rpc_clnt_start, called in
  the quick_reconnect path of protocol/client, didn't invoke connect on
  the transport, as the connection was not cleaned up _yet_ (cleanup was
  moved post notification in rpc-clnt). This resulted in clients never
  attempting to connect to bricks.

Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECT events across
epochs and ignore DISCONNECT events if they don't belong to the current
epoch. However, this approach is a bit complex to implement and
requires time. So, the current patch is a hacky stop-gap fix till we come
up with a cleaner solution.
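
For illustration, the epoch idea mentioned above (which this patch does NOT implement) might look roughly like the following sketch. All names are hypothetical; it assumes every CONNECT starts a new epoch and every DISCONNECT carries the epoch of the connection it belongs to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state kept by a parent xlator for one rpc connection. */
struct link_state {
    uint64_t epoch;     /* incremented on every CONNECT */
    bool     connected;
};

/* CONNECT starts a new epoch; the returned value would be attached to the
 * DISCONNECT event later generated for this same connection. */
static uint64_t on_connect(struct link_state *s)
{
    s->epoch++;
    s->connected = true;
    return s->epoch;
}

/* Apply a DISCONNECT only if it belongs to the current epoch. A stale
 * DISCONNECT from an older connection is ignored. Returns true if applied. */
static bool on_disconnect(struct link_state *s, uint64_t event_epoch)
{
    if (event_epoch != s->epoch)
        return false;
    s->connected = false;
    return true;
}
```

With this scheme a late DISCONNECT from the previous connection, delivered after the CONNECT of the new one, would leave the state as connected without any lock being held across the notification.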

[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681

Cherry picked from commit 773f32caf190af4ee48818279b6e6d3c9f2ecc79:
> Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
> Signed-off-by: Raghavendra G 
> BUG: 1427012
> Reviewed-on: https://review.gluster.org/16784
> Smoke: Gluster Build System 
> Reviewed-by: Milind Changire 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Reported-by: Markus Stockhausen 
Signed-off-by: Niels de Vos 
BUG: 1462447
Reviewed-on: https://review.gluster.org/17733
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Milind Changire 
Reviewed-by: Raghavendra G

Build/test fixes - build_env, tirpc, mem-pool, cleanup

2017-07-06T19:05:23+00:00

Differential Revision: https://phabricator.intern.facebook.com/D5376801

Change-Id: I5bf733a395ef2b85065200fa5810ced27ee0d682
Reviewed-on: https://review.gluster.org/17719
Smoke: Gluster Build System 
Tested-by: Jeff Darcy 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-04-07T12:05:09+00:00

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writes to its own files)
and meanwhile the clients do a readdirp, the client which did
a readdirp will then receive the upcalls. In this scenario the client
disconnects with an rpc decode failed error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers (rpchdr,
proghdr) can be freed, which is not the case. Under especially high workload,
rpc_transport_submit_request() may not be able to write to the socket immediately
and hence adds the msg to its own queue and returns success. Thus, we have a
use after free for rpchdr and proghdr. Hence the client gets garbage rpchdr
and proghdr and fails to decode the rpc, resulting in a disconnect.

To prevent this, we need to add the rpchdr and proghdr to an iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket and
has to add the msg to the queue. Thus we do not have a use after free.
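
The ownership rule can be illustrated with a toy reference-counted buffer. This is a sketch only: rbuf, txq and the transport_* helpers are hypothetical stand-ins and are not the iobuf/iobref or rpc_transport API. It assumes the transport cannot write immediately and therefore queues the message.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy reference-counted buffer (stand-in for an iobuf held via an iobref). */
struct rbuf {
    int    refs;
    size_t len;
    char   data[64];
};

static int rbuf_live = 0; /* buffers currently allocated, for the demo only */

static struct rbuf *rbuf_new(const char *payload)
{
    struct rbuf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->refs = 1;
    b->len = strlen(payload);
    if (b->len >= sizeof(b->data))
        b->len = sizeof(b->data) - 1;
    memcpy(b->data, payload, b->len);
    rbuf_live++;
    return b;
}

static struct rbuf *rbuf_ref(struct rbuf *b)
{
    b->refs++;
    return b;
}

static void rbuf_unref(struct rbuf *b)
{
    if (--b->refs == 0) {
        rbuf_live--;
        free(b);
    }
}

/* Toy transport that cannot write right now: it queues the buffer and
 * takes its own reference, yet still reports success to the caller. */
struct txq {
    struct rbuf *pending;
};

static int transport_submit(struct txq *q, struct rbuf *b)
{
    q->pending = rbuf_ref(b);
    return 0;
}

/* Later, when the socket becomes writable, the queued data is sent and
 * the transport drops its reference. */
static void transport_flush(struct txq *q)
{
    /* a real transport would write q->pending->data to the socket here */
    rbuf_unref(q->pending);
    q->pending = NULL;
}
```

Because the queue holds its own reference, the submitter can drop its reference right after transport_submit() returns and the queued data stays valid until transport_flush() runs.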

Thank You for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai 
> Smoke: Gluster Build System 
> Reviewed-by: soumya k 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422788
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16638
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

Increase RPC ping timeout to 180 seconds for larger clusters

2017-03-06T21:19:30+00:00

Summary:
- Large clusters explode with such a low timeout since the peer info
  exchange is serialized.

Test Plan: - Built and pushed to gfsbudev.ash3c06, where the problem was first observed

Reviewers: dph, moox, sshreyas

Reviewed By: sshreyas

FB-commit-id: 82f7af1

Change-Id: Id7c2f408eeb8847118e0ad53465c9fca4c6d9fb5
Signed-off-by: Kevin Vigor 
Reviewed-on: https://review.gluster.org/16857
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Shreyas Siravara 
Smoke: Gluster Build System

Update tirpc registration to "force" unregister old mapping before re-registering

2017-03-05T21:52:13+00:00

Summary: Per title

Test Plan: Run prove tests to make sure we didn't break anything

Reviewers: dph, rwareing

Reviewed By: rwareing

FB-commit-id: 78a9a0c

Change-Id: I05ed6b7c715a71e5819fbe8116e7c3146010f836
Signed-off-by: Kevin Vigor 
Reviewed-on: https://review.gluster.org/16849
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Shreyas Siravara

Prevent frame-timeouts from hanging syncops

2017-03-05T18:46:32+00:00

Summary:
It was observed while testing the SHD threading code that under high loads
SHD/AFR related SyncOps & SyncTasks can actually hang/deadlock, as the transport
disconnected event (for frame timeouts) never gets bubbled up correctly. Various
tests indicated the ping timeouts worked fine, while "frame timeouts"
did not. The only difference? Ping timeouts actually disconnect
the transport while frame timeouts do not. So from a high level we
know this change prevents the deadlock, as subsequent tests showed the deadlocks
no longer occurred (after this change). That said, there may be a
more elegant solution. For now though, forcing a reconnect is
preferable to hanging clients or deadlocking the SHD.

Test Plan:
It's fairly difficult to write a good prove test for this since it requires human eyes to observe if the SHD is deadlocked (I'm open to ideas). Here's the repro though:
1. Create a 3x replicated cluster on a host.
2. Set the frame-timeout low (say 2 sec)
3. Down a brick, and write a pile of files (maybe 2000)
4. Bring up the downed brick and let the SHD begin healing files
5. During the heal process, kill -STOP (hang) one of the bricks

Without this patch the SHD will be deadlocked, even though the frame timed out after 2 seconds. With the patch, the plug is pulled on the transport, a disconnect is bubbled up
to the syncop and the SHD resumes.

Reviewers: dph, meyering, cjh

Reviewed By: cjh

Subscribers: ethanr

Conflicts:
rpc/rpc-lib/src/rpc-clnt.c
FB-commit-id: c99357c

Change-Id: I344079161492b195267c2d64b6eab0b441f12ded
Signed-off-by: Kevin Vigor
Reviewed-on: https://review.gluster.org/16846
CentOS-regression: Gluster Build System
NetBSD-regression: NetBSD Build System
Smoke: Gluster Build System
Reviewed-by: Shreyas Siravara