glusterfs.git/xlators/protocol/server/src/server.c, branch v3.12.8

glusterfsd: Memleak in glusterfsd process while brick mux is on

2018-04-06T12:47:34+00:00

Problem: At the time of stopping the volume while brick multiplex is
         enabled memory is not cleanup from all server side xlators.

Solution: To cleanup memory for all server side xlators call fini
          in glusterfs_handle_terminate after send GF_EVENT_CLEANUP
          notification to top xlator.

> BUG: 1544090
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit 7c3cc485054e4ede1efb358552135b432fb7047a)

>Note: Run all test-cases in separate build (https://review.gluster.org/19574)
>      with same patch after enable brick mux forcefully, all test cases are
>      passed.

BUG: 1549473
Signed-off-by: Mohit Agrawal 
Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc

protocol-auth: use the proper validation method

2017-10-25T11:35:25+00:00

Currently, server protocol's init and glusterd's option
validation methods are different, causing an issue. They
should be same for having consistent behavior

Change-Id: Ibbf9a18c7192b2d77f9b7675ae7da9b8d2fe5de4
BUG: 1501315
Signed-off-by: Amar Tumballi

rpc: destroy transport after client_t

2017-09-07T06:49:51+00:00

Problem:
1. Ref counting increment on the client_t object is done in
   rpcsvc_request_init() which is incorrect.
2. Ref not taken when delegating to grace_time_handler()

Solution:
1. Only fop requests which require processing down the graph via
   stack 'frames' now ref count the request in get_frame_from_request()
2. Take ref on client_t object in server_rpc_notify() but avoid
   dropping in RPCSVC_EVENT_TRANSPORT_DESRTROY. Drop the ref
   unconditionally when exiting out of grace_time_handler().
   Also, avoid dropping ref on client_t in
   RPCSVC_EVENT_TRANSPORT_DESTROY when ref mangement as been
   delegated to grace_time_handler()

mainline:
> BUG: 1481600
> Reported-by: Raghavendra G 
> Signed-off-by: Milind Changire 
> Reviewed-on: https://review.gluster.org/17982
> Tested-by: Raghavendra G 
> Reviewed-by: Raghavendra G 
> CentOS-regression: Gluster Build System 
> Smoke: Gluster Build System 
(cherry picked from commit 24b95089a18a6a40e7703cb344e92025d67f3086)

Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
BUG: 1487033
Reported-by: Raghavendra G 
Signed-off-by: Milind Changire 
Reviewed-on: https://review.gluster.org/18156
Smoke: Gluster Build System 
Reviewed-by: Raghavendra G 
Tested-by: Raghavendra G 
CentOS-regression: Gluster Build System

glusterfsd: allow subdir mount

2017-08-04T13:34:21+00:00

Changes:

1. Take subdir mount option in client (mount.gluster / glusterfsd)
2. Pass the subdir mount to server-handshake (from client-handshake)
3. Handle subdir-mount dir's lookup in server-first-lookup and handle
   all fops resolution accordingly with proper gfid of subdir
4. Change the auth/addr module to handle the multiple subdir entries
   in option, and valid parsing.

How to use the feature:

`# mount -t glusterfs $hostname:/$volname/$subdir /$mount_point`
Or
`# mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point`

Option can be set like:

`# gluster volume set  auth.allow "/subdir1(192.168.1.*),/(192.168.10.*),/subdir2(192.168.8.*)"`

Updates #175

> Reviewed-At: https://review.gluster.org/17141/

Change-Id: I7ea57f76ddbe6c3862cfe02e13f89e8a39719e11
Signed-off-by: Amar Tumballi 
Reviewed-on: https://review.gluster.org/17968
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

mgtm/core : use sha hash function for volfile check

2017-07-10T05:07:11+00:00

We are storing the entire volfile and using this to check
volfile change. With brick multiplexing there will be lot
of graphs per process which will increase the memory foot
print of the process. So instead of storing the entire
graph we could use sha256 and we can compare the hash to
see whether volfile change happened or not.

Also with Brick multiplexing, the direct comparison of vol
file is not correct. There are two problems.

Problem 1:

We are currently storing one single graph (the last
updated volfile) whereas, what we need is the entire
graph with all atttached bricks.

If we fix this issue, we have second problem

Problem 2:
With multiplexing we have a graph that contains multiple
bricks. But what we are checking as part of the reconfigure
is, comparing the entire graph with one single graph,
which will always fail.

Solution:
We create list in glusterfs_ctx_t that stores sha256 hash
of individual brick graphs. When a graph changes happens
we compare the stored hash and the current hash. If the
hash matches, then no need for reconfigure. Otherwise we
first do the reconfigure and then update the hash.

For now, gfapi has not changed this way. Meaning when gfapi
volfile fetch or reconfigure happens, we still store the
entire graph and compare, each memory.

This is fine, because libgfapi will not load brick graphs.
But changing the libgfapi will make the code similar in
both glusterfsd-mgmt and api. Also it helps to reduce some
memory.

Change-Id: I9df917a771a52b95622ab8f63af34ec390163a77
BUG: 1467986
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: https://review.gluster.org/17709
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Amar Tumballi

Link against missed libraries to resolve symbols

2017-07-03T10:58:14+00:00

When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL)
on some shared objects like xlators, it can fail with dlerror set to
error string "undefined symbol ".

This was observed for the following shared objects: fuse.so, quota.so,
quotad.so, server.so, libgfrpc.so and socket.so

P.S: This was found while running a go program which fetches the list
of xlator options (volume_option_t) from xlator's shared object.

BUG: 1193929
Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b
Signed-off-by: Prashanth Pai 
Reviewed-on: https://review.gluster.org/17659
Smoke: Gluster Build System 
Reviewed-by: Amar Tumballi 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Jeff Darcy

protocol/server: make listen backlog value as configurable

2017-06-08T15:32:30+00:00

problem:

When we call listen from protocol/server, we are giving a
hard coded valie of 10 if it is not manually given.
With multiplexing, especially when glusterd restarts all
clients may try to connect to the server at a time.
Which will result in overflowing the queue, and kernel
will complain about the errors.

Solution:

This patch will introduce a volume set command to make backlog
value as a configurable. This patch also changes the default
values for backlog from 10 to 128. This changes is only applicable
for sockets listening from protocol.

Example:

gluster volume set  transport.listen-backlog 1024

Note: 1 Brick has to be restarted to get this value in effect
      2 This changes won't be reflected in glusterd, or other
        xlators which calls listen. If you need, you have to
        add this option to the volfile.

Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
BUG: 1456405
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: https://review.gluster.org/17411
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Raghavendra Talur 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Niels de Vos 
Reviewed-by: Jeff Darcy

glusterfs: Not able to mount running volume after enable brick mux and stopped any volume

2017-05-31T20:43:53+00:00

Problem: After enabled brick mux if any volume has down and then try ot run mount
         with running volume , mount command is hung.

Solution: After enable brick mux server has shared one data structure server_conf
          for all associated subvolumes.After down any subvolume in some
          ungraceful manner (remove brick directory) posix xlator sends
          GF_EVENT_CHILD_DOWN event to parent xlatros and server notify
          updates the child_up to false in server_conf.When client is trying
          to communicate with server through mount it checks conf->child_up
          and it is FALSE so it throws message "translator are not yet ready".
          From this patch updated structure server_conf to save child_up status
          for xlator wise. Another improtant correction from this patch is
          cleanup threads from server side xlators after stop the volume.

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra Talur 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

libglusterfs, dht, locks, glusterd: Coverity fixes

2017-02-23T12:14:33+00:00

Fix up use after free bugs and dead code

Change-Id: I8f79ed6b5108926c1fac31c147b5ecba79d10785
BUG: 1424905
Signed-off-by: Nigel Babu 
Reviewed-on: https://review.gluster.org/16666
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Shyamsundar Ranganathan

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-02-16T04:09:36+00:00

Issue:
When fio is run on multiple clients (each client writes to its own files),
and meanwhile the clients does a readdirp, thus the client which did
a readdirp will now recieve the upcalls. In this scenario the client
disconnects with rpc decode failed error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns the msg is written on to socket and thus the buffers(rpchdr, proghdr)
can be freed, which is not the case. In especially high workload,
rpc_transport_submit_request() may not be able to write to socket immediately
and hence adds it to its own queue and returns as successful. Thus, we have
use after free, for rpchdr and proghdr. Hence the clients gets garbage rpchdr
and proghdr and thus fails to decode the rpc, resulting in disconnect.

To prevent this, we need to add the rpchdr and proghdr to a iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref, if it cannot write to socket and
is adding to the queue. Thus we do not have use after free.

Thank You for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai 
Smoke: Gluster Build System 
Reviewed-by: soumya k 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G