glusterfs.git/xlators/protocol/server/src/server.c, branch v3.12.0

glusterfsd: allow subdir mount

2017-08-04T13:34:21+00:00

Changes:

1. Take subdir mount option in client (mount.gluster / glusterfsd)
2. Pass the subdir mount to server-handshake (from client-handshake)
3. Handle subdir-mount dir's lookup in server-first-lookup and handle
   all fops resolution accordingly with proper gfid of subdir
4. Change the auth/addr module to handle the multiple subdir entries
   in option, and valid parsing.

How to use the feature:

`# mount -t glusterfs $hostname:/$volname/$subdir /$mount_point`
Or
`# mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point`

Option can be set like:

`# gluster volume set  auth.allow "/subdir1(192.168.1.*),/(192.168.10.*),/subdir2(192.168.8.*)"`

Updates #175

> Reviewed-At: https://review.gluster.org/17141/

Change-Id: I7ea57f76ddbe6c3862cfe02e13f89e8a39719e11
Signed-off-by: Amar Tumballi 
Reviewed-on: https://review.gluster.org/17968
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

mgtm/core : use sha hash function for volfile check

2017-07-10T05:07:11+00:00

We are storing the entire volfile and using this to check
volfile change. With brick multiplexing there will be lot
of graphs per process which will increase the memory foot
print of the process. So instead of storing the entire
graph we could use sha256 and we can compare the hash to
see whether volfile change happened or not.

Also with Brick multiplexing, the direct comparison of vol
file is not correct. There are two problems.

Problem 1:

We are currently storing one single graph (the last
updated volfile) whereas, what we need is the entire
graph with all atttached bricks.

If we fix this issue, we have second problem

Problem 2:
With multiplexing we have a graph that contains multiple
bricks. But what we are checking as part of the reconfigure
is, comparing the entire graph with one single graph,
which will always fail.

Solution:
We create list in glusterfs_ctx_t that stores sha256 hash
of individual brick graphs. When a graph changes happens
we compare the stored hash and the current hash. If the
hash matches, then no need for reconfigure. Otherwise we
first do the reconfigure and then update the hash.

For now, gfapi has not changed this way. Meaning when gfapi
volfile fetch or reconfigure happens, we still store the
entire graph and compare, each memory.

This is fine, because libgfapi will not load brick graphs.
But changing the libgfapi will make the code similar in
both glusterfsd-mgmt and api. Also it helps to reduce some
memory.

Change-Id: I9df917a771a52b95622ab8f63af34ec390163a77
BUG: 1467986
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: https://review.gluster.org/17709
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Amar Tumballi

Link against missed libraries to resolve symbols

2017-07-03T10:58:14+00:00

When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL)
on some shared objects like xlators, it can fail with dlerror set to
error string "undefined symbol ".

This was observed for the following shared objects: fuse.so, quota.so,
quotad.so, server.so, libgfrpc.so and socket.so

P.S: This was found while running a go program which fetches the list
of xlator options (volume_option_t) from xlator's shared object.

BUG: 1193929
Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b
Signed-off-by: Prashanth Pai 
Reviewed-on: https://review.gluster.org/17659
Smoke: Gluster Build System 
Reviewed-by: Amar Tumballi 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Jeff Darcy

protocol/server: make listen backlog value as configurable

2017-06-08T15:32:30+00:00

problem:

When we call listen from protocol/server, we are giving a
hard coded valie of 10 if it is not manually given.
With multiplexing, especially when glusterd restarts all
clients may try to connect to the server at a time.
Which will result in overflowing the queue, and kernel
will complain about the errors.

Solution:

This patch will introduce a volume set command to make backlog
value as a configurable. This patch also changes the default
values for backlog from 10 to 128. This changes is only applicable
for sockets listening from protocol.

Example:

gluster volume set  transport.listen-backlog 1024

Note: 1 Brick has to be restarted to get this value in effect
      2 This changes won't be reflected in glusterd, or other
        xlators which calls listen. If you need, you have to
        add this option to the volfile.

Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
BUG: 1456405
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: https://review.gluster.org/17411
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Raghavendra Talur 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Niels de Vos 
Reviewed-by: Jeff Darcy

glusterfs: Not able to mount running volume after enable brick mux and stopped any volume

2017-05-31T20:43:53+00:00

Problem: After enabled brick mux if any volume has down and then try ot run mount
         with running volume , mount command is hung.

Solution: After enable brick mux server has shared one data structure server_conf
          for all associated subvolumes.After down any subvolume in some
          ungraceful manner (remove brick directory) posix xlator sends
          GF_EVENT_CHILD_DOWN event to parent xlatros and server notify
          updates the child_up to false in server_conf.When client is trying
          to communicate with server through mount it checks conf->child_up
          and it is FALSE so it throws message "translator are not yet ready".
          From this patch updated structure server_conf to save child_up status
          for xlator wise. Another improtant correction from this patch is
          cleanup threads from server side xlators after stop the volume.

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra Talur 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

libglusterfs, dht, locks, glusterd: Coverity fixes

2017-02-23T12:14:33+00:00

Fix up use after free bugs and dead code

Change-Id: I8f79ed6b5108926c1fac31c147b5ecba79d10785
BUG: 1424905
Signed-off-by: Nigel Babu 
Reviewed-on: https://review.gluster.org/16666
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Shyamsundar Ranganathan

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

2017-02-16T04:09:36+00:00

Issue:
When fio is run on multiple clients (each client writes to its own files),
and meanwhile the clients does a readdirp, thus the client which did
a readdirp will now recieve the upcalls. In this scenario the client
disconnects with rpc decode failed error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf->ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf->ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns the msg is written on to socket and thus the buffers(rpchdr, proghdr)
can be freed, which is not the case. In especially high workload,
rpc_transport_submit_request() may not be able to write to socket immediately
and hence adds it to its own queue and returns as successful. Thus, we have
use after free, for rpchdr and proghdr. Hence the clients gets garbage rpchdr
and proghdr and thus fails to decode the rpc, resulting in disconnect.

To prevent this, we need to add the rpchdr and proghdr to a iobref and send
it in msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref, if it cannot write to socket and
is adding to the queue. Thus we do not have use after free.

Thank You for discussing, debugging and fixing along:
Prashanth Pai 
Raghavendra G 
Rajesh Joseph 
Kotresh HR 
Mohammed Rafi KC 
Soumya Koduri 

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G 
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai 
Smoke: Gluster Build System 
Reviewed-by: soumya k 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

libglusterfs+changetimerecorder: reduce log noise

2017-02-10T13:17:25+00:00

The logging about translator options is so verbose that it
significantly slows down scalability tests - sometimes even to the
point where it induces timing-related failures.  Quiet, please.

Change-Id: If0766e2a80746bba586e67e6019ff7084d68b425
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/16569
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
Reviewed-by: Milind Changire

core: run many bricks within one glusterfsd process

2017-01-31T00:13:58+00:00

This patch adds support for multiple brick translator stacks running
in a single brick server process.  This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more.  It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global option.  By
default it's off, and bricks are started in separate processes as before.  If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.

Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur

glusterd: Add info on op-version for clients in vol status output

2017-01-12T18:20:59+00:00

Currently the `gluster volume status  clients` command
gives us the following information on clients:
1. Brick name
2. Client count for each brick
3. hostname:port for each client
4. Bytes read and written for each client

There is no information regarding op-version for each client. This
patch adds that to the output.

Change-Id: Ib2ece93ab00c234162bb92b7c67a7d86f3350a8d
BUG: 1409078
Signed-off-by: Samikshan Bairagya 
Reviewed-on: http://review.gluster.org/16303
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee