glusterfs.git/xlators/protocol, branch v3.10.6

build: miscellaneous spelling fixes

2017-04-03T01:36:51+00:00

Debian builds detected spelling issues with GlusterFS 3.10.1. Instead of
carrying the patch in the Debian sources, let's include the fixes here
too.

Change-Id: I38db6adf142f7ec247bffd47aa1e6ff1a0c49e00
Reviewed-on-master: https://review.gluster.org/16973
Reported-by: Patrick Matthäi 
BUG: 1437854
Signed-off-by: Niels de Vos 
Reviewed-on: https://review.gluster.org/16974
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Zhou Zhengping 
Reviewed-by: Kaleb KEITHLEY

protocol : fix auth-allow regression

2017-03-30T14:06:31+00:00

One of the brick multiplexing patches (commit 1a95fc3) changed the
gf_auth () and server_setvolume () functions in a way that broke the
auth-allow feature: a mount fails even when the client is part of the
auth-allow list. This fix does the following:

1. Reintroduce the peer-info data in gf_auth () so that fnmatch has
valid input and can decide on the result.

2. The config-params dict should capture key-value pairs for all the
bricks when brick multiplexing is on. When brick multiplexing isn't
enabled, config-params should carry the attributes from protocol/server
so that all rpc auth related attributes stay intact in the
dictionary.
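
As an illustration of why fnmatch needs valid peer info, here is a minimal
sketch of matching a peer address against a comma-separated allow list. The
helper name peer_is_allowed and the list format are assumptions made for this
example; this is not the gf_auth () code.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper, not the GlusterFS gf_auth () code: returns 1 if
 * peer_addr matches any comma-separated glob pattern in allow_list. */
static int
peer_is_allowed (const char *allow_list, const char *peer_addr)
{
        char        pat[64];
        const char *p = allow_list;

        /* Without peer info there is nothing to match against, which is
         * the shape of the regression described above. */
        if (!allow_list || !peer_addr)
                return 0;

        while (*p) {
                size_t len = strcspn (p, ",");

                /* Copy one pattern out of the list; skip oversized ones. */
                if (len > 0 && len < sizeof (pat)) {
                        memcpy (pat, p, len);
                        pat[len] = '\0';
                        if (fnmatch (pat, peer_addr, 0) == 0)
                                return 1;
                }
                p += len;
                if (*p == ',')
                        p++;
        }
        return 0;
}
```

With the list "192.168.1.*,10.0.0.5", the address 192.168.1.20 is accepted,
10.0.0.6 is rejected, and a missing (NULL) peer address is always rejected.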

>Reviewed-on: https://review.gluster.org/16920
>Tested-by: Jeff Darcy 
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Jeff Darcy 
>Reviewed-by: MOHIT AGRAWAL 
>(cherry picked from commit 0bd58241143e91b683a3e5c4335aabf9eed537fe)

Change-Id: I007c4c6d78620a896b8858a29459a77de8b52412
BUG: 1429117
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16967
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

protocol-client: Initialize the list_head before using

2017-03-27T14:59:42+00:00

In client3_3_readdir(p)_cbk, on error paths, it is possible
that the list_head is used before it has been initialized.
Hence move the initialization before the first use.
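
The pattern can be sketched as follows. The struct and function names are
simplified stand-ins chosen for this example (not the GlusterFS list.h macros
or the real callback); the point is only that the head is initialized before
any branch can jump to the common cleanup path.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list head. */
struct list_head {
        struct list_head *next;
        struct list_head *prev;
};

static void
init_list_head (struct list_head *head)
{
        head->next = head;
        head->prev = head;
}

static int
list_is_empty (const struct list_head *head)
{
        return head->next == head;
}

/* Hypothetical callback shape: 'failed' models a decode or RPC error. */
static int
readdir_cbk_sketch (int failed)
{
        struct list_head entries;
        int              ret = 0;

        /* Initialize first, so the exit path is safe on every branch. */
        init_list_head (&entries);

        if (failed) {
                ret = -1;
                goto out;
        }
        /* ... decode the response and append entries here ... */
out:
        /* Cleanup inspects the list; with an uninitialized head this
         * would read garbage pointers. */
        assert (entries.next != NULL);
        if (!list_is_empty (&entries)) {
                /* ... free each entry ... */
        }
        return ret;
}
```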

>Reviewed-on: https://review.gluster.org/16948
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Raghavendra G  

Change-Id: Ie58902d079fdc58416d17b5fa5f61375decb1c99
BUG: 1435946
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16949
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpc/clnt: remove locks while notifying CONNECT/DISCONNECT

2017-03-06T16:20:33+00:00

Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However, as part of handling DISCONNECT, protocol/client
unwinds (with failure) the saved frames waiting for responses. This
saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to the same brick. Also, it's not
good practice to pass control to other xlators while holding a lock, as
it can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.
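
A minimal sketch of the general idea, assuming a hypothetical struct conn and
callback type (these are not the rpc-clnt structures): state is updated under
the lock, and the parent is called only after the lock has been dropped. Note
that dropping the lock also drops the ordering guarantee it provided, which is
why the patch needs the additional changes described below.

```c
#include <assert.h>
#include <pthread.h>

enum conn_event { EV_NONE, EV_CONNECT, EV_DISCONNECT };

typedef void (*notify_fn) (enum conn_event ev, void *data);

/* Hypothetical connection object, not the rpc-clnt structure. */
struct conn {
        pthread_mutex_t lock;
        int             connected;
        notify_fn       notify;   /* parent's callback; may be slow */
        void           *data;
};

/* Update state under the lock, then notify the parent without it. */
static void
conn_set_state (struct conn *c, int connected)
{
        notify_fn       fn   = NULL;
        void           *data = NULL;
        enum conn_event ev   = EV_NONE;

        pthread_mutex_lock (&c->lock);
        if (c->connected != connected) {
                c->connected = connected;
                ev   = connected ? EV_CONNECT : EV_DISCONNECT;
                fn   = c->notify;
                data = c->data;
        }
        pthread_mutex_unlock (&c->lock);

        /* The callback runs outside the critical section, so a costly
         * handler cannot block other users of the lock. */
        if (ev != EV_NONE && fn)
                fn (ev, data);
}

/* Example parent callback: records the event it saw. */
static void
record_event (enum conn_event ev, void *data)
{
        *(enum conn_event *) data = ev;
}
```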

To fix [2], two changes are present in this patch:

* notify DISCONNECT before cleaning up rpc connection (same as commit
  a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
  connection and does a start while handling a DISCONNECT event from
  rpc. Note that patch [3] was reverted as rpc_clnt_start called in
  quick_reconnect path of protocol/client didn't invoke connect on
  transport as the connection was not cleaned up _yet_ (as cleanup was
  moved post notification in rpc-clnt). This resulted in clients never
  attempting connect to bricks.

Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECTS across
epochs and ignore DISCONNECT events if they don't belong to current
epoch. However, this approach is a bit complex to implement and
requires time. So, the current patch is a hacky stop-gap fix until we
come up with a cleaner solution.
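
For reference, the generation-number idea mentioned above could look roughly
like this. It is a single-threaded sketch with hypothetical names, and it is
not what this patch implements; a real version would also need to protect the
state against concurrent access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state. */
struct conn_state {
        uint64_t generation;  /* bumped on every CONNECT */
        int      connected;
};

/* Returns the generation that a later DISCONNECT for this connection
 * must carry in order to be honoured. */
static uint64_t
on_connect (struct conn_state *s)
{
        s->connected = 1;
        return ++s->generation;
}

/* Returns 1 if the DISCONNECT was applied, 0 if it belonged to an
 * older epoch and was ignored. */
static int
on_disconnect (struct conn_state *s, uint64_t event_generation)
{
        if (event_generation != s->generation)
                return 0;
        s->connected = 0;
        return 1;
}
```

If a reconnect happens before the old DISCONNECT is delivered, the stale event
carries the previous generation and is dropped instead of tearing down the new
connection.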

[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681

>Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
>Signed-off-by: Raghavendra G 
>BUG: 1427012
>Reviewed-on: https://review.gluster.org/16784
>Smoke: Gluster Build System 
>Reviewed-by: Milind Changire 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
(cherry picked from commit 773f32caf190af4ee48818279b6e6d3c9f2ecc79)

Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Signed-off-by: Raghavendra G 
BUG: 1428670
Reviewed-on: https://review.gluster.org/16835
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-02-16T14:46:53+00:00

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writes to its own files)
and, meanwhile, one of the clients does a readdirp, the client which did
the readdirp will then receive the upcalls. In this scenario the client
disconnects with an rpc decode failed error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers (rpchdr,
proghdr) can be freed, which is not the case. Especially under high workload,
rpc_transport_submit_request() may not be able to write to the socket
immediately, and hence adds the msg to its own queue and returns success.
Thus, we have a use after free for rpchdr and proghdr. Hence the client gets
garbage rpchdr and proghdr and thus fails to decode the rpc, resulting in a
disconnect.

To prevent this, we need to add the rpchdr and proghdr to an iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket
and is adding the msg to the queue. Thus we do not have a use after free.
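
The ownership rule can be illustrated with a small reference-counting sketch.
All types and functions here (struct buf, transport_submit, transport_flush)
are simplified stand-ins invented for the example, not the iobuf/iobref or
rpc-transport APIs, and the counter is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int buffers_freed;   /* counts real free () calls, for illustration */

/* Simplified stand-in for a reference-counted buffer. */
struct buf {
        int  refs;
        char data[32];
};

static struct buf *
buf_new (const char *payload)
{
        struct buf *b = calloc (1, sizeof (*b));

        if (!b)
                return NULL;
        b->refs = 1;
        strncpy (b->data, payload, sizeof (b->data) - 1);
        return b;
}

static struct buf *
buf_ref (struct buf *b)
{
        b->refs++;
        return b;
}

static void
buf_unref (struct buf *b)
{
        if (--b->refs == 0) {
                buffers_freed++;
                free (b);
        }
}

static struct buf *queued;  /* the hypothetical transport's pending write */

/* The transport cannot write right now, so it queues the buffer and
 * takes its own reference before reporting success. */
static int
transport_submit (struct buf *b)
{
        queued = buf_ref (b);
        return 0;
}

/* Later, when the socket is writable: returns the first byte sent. */
static char
transport_flush (void)
{
        char first;

        assert (queued != NULL);
        first = queued->data[0];
        buf_unref (queued);
        queued = NULL;
        return first;
}
```

The caller can drop its own reference as soon as the submit returns; the
buffer stays alive until the queued write has been flushed.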

Thank you for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai 
> Smoke: Gluster Build System 
> Reviewed-by: soumya k 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422363
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16623
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

protocol/client: Fix double free of client fdctx destroy

2017-02-15T12:33:24+00:00

This patch fixes the race between fd re-open code and fd release code,
both of which free the fd context due to a race in certain variable
checks as explained below:

1. client process (shd in the case of this BZ) sends an opendir to its
children (client xlators) which send the fop to the bricks to get a valid fd.

2. Client xlator loses connection to the brick. fdctx->remotefd is -1

3. Client re-establishes connection. After handshake, it reopens the dir
and sets fdctx->remotefd to a valid fd in client3_3_reopendir_cbk().

4. Meanwhile, shd sends a fd unref after it is done with the opendir.
This triggers a releasedir (since fd->refcount becomes 0).

5. client3_3_releasedir() sees that fdctx->remotefd is a valid number
(i.e. not -1), sets fdctx->released=1 and calls client_fdctx_destroy().

6. As a continuation of step 3, client_reopen_done() is called by
client3_3_reopendir_cbk(), which sees that fdctx->released==1 and
again calls client_fdctx_destroy().

Depending on when step 5 does GF_FREE(fdctx), we may crash at any place in
step 6 in client3_3_reopendir_cbk() when it tries to access
fdctx->{whatever}.
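
One way to guarantee a single destroy, sketched under assumptions: both paths
decide under the same lock who is responsible for the final free. The struct,
the reopen_in_flight flag and the function names below are hypothetical and
are not necessarily how this patch resolves the race.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical context, not the protocol/client fd context. */
struct fdctx_sketch {
        pthread_mutex_t lock;
        int             released;          /* release has been requested */
        int             reopen_in_flight;  /* a reopen callback is pending */
        int             destroy_calls;     /* stands in for the real free */
};

static void
destroy (struct fdctx_sketch *ctx)
{
        ctx->destroy_calls++;
}

/* Release path: destroy now only if no reopen is pending. */
static void
release_path (struct fdctx_sketch *ctx)
{
        int do_destroy;

        pthread_mutex_lock (&ctx->lock);
        ctx->released = 1;
        do_destroy = !ctx->reopen_in_flight;
        pthread_mutex_unlock (&ctx->lock);

        if (do_destroy)
                destroy (ctx);
}

/* Reopen completion: destroy only if release ran while in flight. */
static void
reopen_done_path (struct fdctx_sketch *ctx)
{
        int do_destroy;

        pthread_mutex_lock (&ctx->lock);
        ctx->reopen_in_flight = 0;
        do_destroy = ctx->released;
        pthread_mutex_unlock (&ctx->lock);

        if (do_destroy)
                destroy (ctx);
}
```

Whichever path runs second performs the destroy, so the context is torn down
exactly once in either ordering.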


> Reviewed-on: https://review.gluster.org/16521
> CentOS-regression: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 

(cherry picked from commit 25fc74f9d1f2b1e7bab76485a99f27abadd10b7b)
Change-Id: Ia50873d11763e084e41d2a1f4d53715438e5e947
BUG: 1422350
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/16619
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

libglusterfs+changetimerecorder: reduce log noise

2017-02-10T19:23:08+00:00

The logging about translator options is so verbose that it
significantly slows down scalability tests - sometimes even to the
point where it induces timing-related failures.  Quiet, please.

Backport of:
> Change-Id: If0766e2a80746bba586e67e6019ff7084d68b425
> Reviewed-on: https://review.gluster.org/16569

Change-Id: I65117e69427ce1d6a2490832c5c9ab57ee29004e
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16599
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

core: run many bricks within one glusterfsd process

2017-02-02T00:54:58+00:00

This patch adds support for multiple brick translator stacks running in
a single brick server process.  This reduces our per-brick memory usage
by approximately 3x, and our appetite for TCP ports even more.  It also
creates potential to avoid process/thread thrashing, and to improve QoS
by scheduling more carefully across the bricks, but realizing that
potential will require further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global
option.  By default it's off, and bricks are started in separate
processes as before.  If multiplexing is enabled, then *compatible*
bricks (mostly those with the same transport options) will be started in
the same process.

Backport of:
> Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
> BUG: 1385758
> Reviewed-on: https://review.gluster.org/14763

Change-Id: I4bce9080f6c93d50171823298fdf920258317ee8
BUG: 1418091
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16496
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: Add info on op-version for clients in vol status output

2017-01-12T18:20:59+00:00

Currently the `gluster volume status  clients` command
gives us the following information on clients:
1. Brick name
2. Client count for each brick
3. hostname:port for each client
4. Bytes read and written for each client

There is no information regarding op-version for each client. This
patch adds that to the output.

Change-Id: Ib2ece93ab00c234162bb92b7c67a7d86f3350a8d
BUG: 1409078
Signed-off-by: Samikshan Bairagya 
Reviewed-on: http://review.gluster.org/16303
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

socket: socket disconnect should wait for poller thread exit

2016-12-22T04:49:19+00:00

When SSL is enabled or the "transport.socket.own-thread" option is set,
socket_poller runs in a separate thread. Currently, during a
disconnect or PARENT_DOWN scenario, we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
clean up resources used by socket_poller.

Therefore, before disconnect we should wait for the poller thread to exit.
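
A minimal sketch of the ordering, assuming a hypothetical struct sock_sketch
with a stop flag (this is not the socket transport code): the disconnect path
asks the poller to stop, joins it, and only then frees what the poller uses.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical socket object with its own poller thread. */
struct sock_sketch {
        pthread_t       poller;
        pthread_mutex_t lock;
        int             stop;       /* asks the poller to exit */
        int            *resources;  /* shared with the poller thread */
        int             polls;
};

static void *
poller_main (void *arg)
{
        struct sock_sketch *s    = arg;
        int                 stop = 0;

        while (!stop) {
                pthread_mutex_lock (&s->lock);
                s->resources[0]++;  /* touches shared state each round */
                s->polls++;
                stop = s->stop;
                pthread_mutex_unlock (&s->lock);
                sched_yield ();
        }
        return NULL;
}

static int
sock_start (struct sock_sketch *s)
{
        s->stop = 0;
        s->polls = 0;
        s->resources = calloc (1, sizeof (int));
        if (!s->resources)
                return -1;
        pthread_mutex_init (&s->lock, NULL);
        if (pthread_create (&s->poller, NULL, poller_main, s) != 0) {
                pthread_mutex_destroy (&s->lock);
                free (s->resources);
                s->resources = NULL;
                return -1;
        }
        return 0;
}

static void
sock_disconnect (struct sock_sketch *s)
{
        pthread_mutex_lock (&s->lock);
        s->stop = 1;
        pthread_mutex_unlock (&s->lock);

        /* Wait for the poller to exit before freeing what it uses. */
        pthread_join (s->poller, NULL);

        free (s->resources);
        s->resources = NULL;
        pthread_mutex_destroy (&s->lock);
}
```

Freeing the resources before the join would let the poller dereference freed
memory on its next iteration.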

Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph 
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kaushal M 
Reviewed-by: Raghavendra Talur 
Reviewed-by: Raghavendra G