| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Program
Since poller thread bears the brunt of execution till the request is
handed over to io-threads, poller thread experiencies lock
contention(s) in the control flow till io-threads, which slows it
down. This delay invariably affects reading ping requests from network
and responding to them, resulting in increased ping latencies, which
sometimes results in a ping-timer-expiry on client leading to
disconnect of transport. So, this patch aims to free up poller thread
from executing code of Glusterfs Program. We do this by making
* Glusterfs Program registering itself asking rpcsvc to execute its
actors in its own threads.
* GF-DUMP Program registering itself asking rpcsvc to _NOT_ execute
its actors in its own threads. Otherwise program's ownthreads become
bottleneck in processing ping traffic. This means that poller thread
reads a ping packet, invokes its actor and hands the response msg to
transport queue.
Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1421938
Reviewed-on: https://review.gluster.org/17105
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
init_openssl_mt() wasn't explicitly invoked and was run implicitly before
dlopen() returned as it was tagged as __attribute__ ((constructor)). This
function used to call GF_CALLOC() which wasn't available or initialized
when socket.so is dlopen()ed by an external program. The program used to
crash with SIGSEGV as follows:
0x00007ffff5efe1ad in __gf_calloc (nmemb=41, size=40, type=158, typestr=0x7ffff63eb3d6 "gf_sock_mt_lock_array")
at mem-pool.c:109
0x00007ffff63e6acf in init_openssl_mt () at socket.c:4016
0x00007ffff7de90da in call_init.part () from /lib64/ld-linux-x86-64.so.2
0x00007ffff7de91eb in _dl_init () from /lib64/ld-linux-x86-64.so.2
0x00007ffff7dedde1 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
0x00007ffff7de8f84 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
0x00007ffff7ded339 in _dl_open () from /lib64/ld-linux-x86-64.so.2
This change moves call to init_openssl_mt() from being a constructor function
to the init() function and introduces fini_openssl_mt() which cleans up
resources (called in destructor).
BUG: 1193929
Change-Id: Iab690897ec34e24c33f6b43f8d8d9f8fd75ac607
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17753
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL)
on some shared objects like xlators, it can fail with dlerror set to
error string "undefined symbol <some-type>".
This was observed for the following shared objects: fuse.so, quota.so,
quotad.so, server.so, libgfrpc.so and socket.so
P.S: This was found while running a go program which fetches the list
of xlator options (volume_option_t) from xlator's shared object.
BUG: 1193929
Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17659
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most common pattern, both in our code and elsewhere, is this:
struct _xyz {
...
};
typedef struct _xyz xyz_t;
These exceptions - especially call_frame/call_stack - have been slowing
down code navigation for years. By converging on a single pattern,
navigating from xyz_t in code to the actual definition of struct _xyz
(i.e. without having to visit the typedef first) might even be
automatable.
Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
Reviewed-on: https://review.gluster.org/17650
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
problem:
When we call listen from protocol/server, we are giving a
hard coded valie of 10 if it is not manually given.
With multiplexing, especially when glusterd restarts all
clients may try to connect to the server at a time.
Which will result in overflowing the queue, and kernel
will complain about the errors.
Solution:
This patch will introduce a volume set command to make backlog
value as a configurable. This patch also changes the default
values for backlog from 10 to 128. This changes is only applicable
for sockets listening from protocol.
Example:
gluster volume set <volname> transport.listen-backlog 1024
Note: 1 Brick has to be restarted to get this value in effect
2 This changes won't be reflected in glusterd, or other
xlators which calls listen. If you need, you have to
add this option to the volfile.
Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
BUG: 1456405
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: https://review.gluster.org/17411
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stopped any volume
Problem: After enabled brick mux if any volume has down and then try ot run mount
with running volume , mount command is hung.
Solution: After enable brick mux server has shared one data structure server_conf
for all associated subvolumes.After down any subvolume in some
ungraceful manner (remove brick directory) posix xlator sends
GF_EVENT_CHILD_DOWN event to parent xlatros and server notify
updates the child_up to false in server_conf.When client is trying
to communicate with server through mount it checks conf->child_up
and it is FALSE so it throws message "translator are not yet ready".
From this patch updated structure server_conf to save child_up status
for xlator wise. Another improtant correction from this patch is
cleanup threads from server side xlators after stop the volume.
BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In socket reconfigure, reconfigurations are doing with old
dict values. It should be with new reconfigured dict values
Change-Id: Iac5ad4382fe630806af14c99bb7950a288756a87
BUG: 1456405
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: https://review.gluster.org/17412
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: socket poller is throwing Input/Output messages during volume operation
Solution: Update the code in socket_connect function before call
socket_spwan.
BUG: 1450559
Change-Id: I5f275fe7a4b730b16d7b0a0407e76288b07ceaef
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/17280
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reading the entire rpc message from the wire
Currently socket is added back for future events after higher layers
(rpc, xlators etc) have processed the message. If message processing
involves signficant delay (as in writev replies processed by Erasure
Coding), performance takes hit. Hence this patch modifies
transport/socket to add back the socket for polling of events
immediately after reading the entire rpc message, but before
notification to higher layers.
credits: Thanks to "Kotresh Hiremath Ravishankar"
<khiremat@redhat.com> for assitance in fixing a regression in
bitrot caused by this patch.
Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800
BUG: 1448364
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/15036
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: If an error has occured with rdma_create_id() in gf_rdma_connect(),
process will jump to the 'unlock' label and then call gf_rdma_teardown()
which call __gf_rdma_teardown().
Presently, __gf_rdma_teardown() checks InifiniBand QP with peer->cm_id->qp!
Unfortunately, cm_id is not allocated and will be crushed in this situation :)
Solution: If 'this->private->peer->cm_id' member is null, do not check
'this->private->peer->cm_id->qp'.
Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
BUG: 1449495
Signed-off-by: Ji-Hyeon Gim <potatogim@potatogim.net>
Reviewed-on: https://review.gluster.org/17249
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Ji-Hyeon Gim
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I57ad970411db1ccd3d2c56c504c7da9cc221051f
BUG: 1448692
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17198
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They snuck in with the HALO patch (07cc8679c)
Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running 'make cscope' on a clean configured tree (before building
anything), it fails because of missing files unders rpc/xdr/src/. These
files are symlinked from generated files in the rpc/xdr/gen/ directory.
In order to create the symlinks for 'make cscope', a target needs to
specify how these symlinks are created.
Change-Id: I473c90e10d915ee438425cf0f806c0531b9f582a
BUG: 1447966
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/17176
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Spurious __gf_free() assertion failures seen all over the place with
header->magic being overwritten when running port probing tests with
'nmap'
Solution:
Fix sequence of:
1. add accept()ed socket connection fd to epoll set
2. add newly created rpc_transport_t object in RPCSVC service list
Correct sequence is #2 followed by #1.
Reason:
Adding new fd returned by accept() to epoll set causes an epoll_wait()
to return immediately with a POLLIN event. This races ahead to a readv()
which returms with errno:104 (Connection reset by peer) during port
probing using 'nmap'. The error is then handled by POLLERR code to
remove the new transport object from RPCSVC service list and later
unref and destroy the rpc transport object.
socket_server_event_handler() then catches up with registering the
unref'd/destroyed rpc transport object. This is later manifest as
assertion failures in __gf_free() with the header->magic field botched
due to invalid address references.
All this does not result in a Segmentation Fault since the address
space continues to be mapped into the process and pages still being
referenced elsewhere.
As a further note:
This race happens only in accept() codepath. Only in this codepath,
the notify will be referring to two transports:
1, listener transport and
2. newly accepted transport
All other notify refer to only one transport i.e., the transport/socket
on which the event is received. Since epoll is ONE_SHOT another event won't
arrive on the same socket till the current event is processed. However, in
the accept() codepath, the current event - ACCEPT - and the new event -
POLLIN/POLLER - arrive on two different sockets:
1. ACCEPT on listener socket and
2. POLLIN/POLLERR on newly registered socket.
Also, note that these two events are handled different thread contexts.
Cleanup:
Critical section in socket_server_event_handler() has been removed.
Instead, an additional ref on new_trans has been used to avoid ref/unref
race when notifying RPCSVC.
Change-Id: I4417924bc9e6277d24bd1a1c5bcb7445bcb226a3
BUG: 1438966
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/17139
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster. Clients can also write synchronously to the cluster
simply by specifying a halo-latency which is very large (e.g. 10seconds) which
will include all bricks.
In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.
There are a few new volume options due to this feature:
halo-shd-latency: The threshold below which self-heal daemons will
consider children (bricks) connected.
halo-nfsd-latency: The threshold below which NFS daemons will consider
children (bricks) connected.
halo-latency: The threshold below which all other clients will
consider children (bricks) connected.
halo-min-replicas: The minimum number of replicas which are to
be enforced regardless of latency specified in the above 3 options.
If the number of children falls below this threshold the next
best (chosen by latency) shall be swapped in.
New FUSE mount options:
halo-latency & halo-min-replicas: As descripted above.
This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.
Operational Notes:
- Global consistency is gaurenteed for synchronous clients, this is provided by
the existing entry-locking mechanism.
- Asynchronous clients on the other hand and merely consistent to their region.
Writes & deletes will be protected via entry-locks as usual preventing
concurrent writes into files which are undergoing replication. Read operations
on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
other regions. The take away from this is care should be taken to ensure
multiple writers do not write the same files resulting in a gfid split-brain
which will require resolution via split-brain policies (majority, mtime &
size). Recommended method for preventing this is using the nfs-auth feature to
define which region for each share has RW permissions, tiers not in the origin
region should have RO perms.
TODO:
- Synchronous clients (including the SHD) should choose clients from their own
region as preferred sources for reads. Most of the plumbing is in place for
this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
(i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
to synchronously write to
Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests
Reviewers: jackl, dph, cjh, meyering
Reviewed By: meyering
Subscribers: ethanr
Differential Revision: https://phabricator.fb.com/D1272053
Tasks: 4117827
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The following line of code intended to remove pmap entry for the
connection during disconnects:
pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);
However, no pmap entry will have it's type set to GF_PMAP_PORT_NONE
at any point in time. So a call to pmap_registry_search_by_xprt() in
pmap_registry_remove() will always fail to find a match.
Fix:
Optionally ignore pmap entry's type in pmap_registry_search_by_xprt().
BUG: 1193929
Change-Id: I705f101739ab1647ff52a92820d478354407264a
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/17129
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit optionally adds client details corresponding to the
locally running bricks to the get-state output. Since getting
the client details involves sending RPC requests to the respective
local bricks, this is a relatively more costly operation. These
client details would be added to the get-state output only if the
get-state command is invoked with the 'detail' option.
This commit therefore also changes the get-state CLI usage. The
modified usage is as follows:
# gluster get-state [<daemon>] [[odir </path/to/output/dir/>] \
[file <filename>]] [detail]
Change-Id: I42cd4ef160f9e96d55a08a10d32c8ba44e4cd3d8
BUG: 1431183
Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
Reviewed-on: https://review.gluster.org/17003
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Default values for handling socket timeouts for brick responses are
insufficient for aggressive applications such as databases.
Solution:
Add 1:1 gluster options for keepalive, keepalive-idle,
keepalive-interval and keepalive-timeout as per the socket level options
available as per tcp(7) man page.
Default values for options are NOT agressive and continue to be values
which result in default timeout when only the keep alive option is
turned on.
These options are Linux specific and will not be applicable to the
*BSDs.
Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14
BUG: 1426059
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/16731
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0487337223a54a52e73088cb6dd812ce6d47178d
BUG: 1429696
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16994
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Tested-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem :
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
(wait for replication to finish)
- peer detach the old server
- after doing above steps, glusterd fails to restart.
Solution:
With the fix detach peer will populate an error : "N2 is part of
existing snapshots. Remove those snapshots before proceeding".
While doing so we force user to stay with that peer or to delete
all snapshots.
Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using a makebomb, on f23 at least, blows up when generating the
xdr headers and stubs. (Works reliably on f25 though, go figure.)
This change appears to mitigate the race on f23.
BUG: 1429696
Change-Id: Icca2de035255a7759563bf06d853950dddc5724d
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/16954
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using a makebomb, on f23 at least, blows up when generating the
xdr headers and stubs. (Works reliably on f25 though, go figure.)
This change appears to mitigate the race on f23.
Change-Id: I006066f0e7c3f8b65189f97c70089f3422e3e08b
BUG: 1429696
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/16941
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Anoop C S <anoopcs@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous change to remove the xdrgen script exposed (or
created) a recursive build dependency: libglusterfs needs the
generated headers, and libgfxdr should be linked with libglusterfs
for GF_FREE/__gf_free.
(Much grumbling about libglusterfs being the kitchen sink of gluster
elided. This would not be necessary if there were two or more libs,
a gluster "runtime" library with common gluster code shared by the
xlators and daemons, and a utility library with things like the
rbtree, memory allocation, and whatnot.)
So. Link at build time or link at runtime? For truth-and-beauty, link
with libglusterfs.so at build time. Without truth-and-beauty, don't
link with libglusterfs and rely on the other things that link with
libglusterfs to provide resolution of __gf_free().
Truth-and-beauty it is. But how to generate the headers first, then
build libglusterfs, then come back and build libgfxdr? Autotools is a
maze of twisty passages, all different. Things that work with gnu
make on linux don't work with the BSD make. Finally I hit on this
solution. Add a shadow directory where make only generates the headers,
then build libglusterfs using the generated headers, and finally build
libgfxdr and link with libglusterfs.
See original BZ 1330604
change http://review.gluster.org/14085
Change-Id: Iede8a30e3103176cb8f0b054885f30fcb352492b
BUG: 1429696
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/16873
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 086436a introduced generation number (cleanup_gen) to ensure that
rpc layer doesn't end up cleaning up the connection object if
application layer has already destroyed it. Bumping up cleanup_gen was
done only in rpc_clnt_connection_cleanup (). However the same is needed
in rpc_clnt_reconnect_cleanup () too as with out it if the object gets destroyed
through the reconnect event in the application layer, rpc layer will
still end up in trying to delete the object resulting into double free
and crash.
Peer probing an invalid host/IP was the basic test to catch this issue.
Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace complex and slow port selection code with bind(0) which
already respects privileged ports.
Change-Id: I408a8528e58e1aafcd32eba6a8f1a759e0bf274e
BUG: 1405628
Reviewed-on-release-3.8-fb: http://review.gluster.org/16150
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/16178
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid logging Success in the event of failure especially when errno has
no meaningful value w.r.t. the failure. In this case the errno is set to
zero when there's indeed a failure at the RPC level.
Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591
BUG: 1426032
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/16730
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Omitting this return causes a confusing error message in the next block,
which is also a case of use after free. This bug was found by Coverity
scan.
BUG: 789278
Change-Id: Ifd4932de437e8ae875ff191033ea43cff81b701d
Signed-off-by: Nigel Babu <nigelb@redhat.com>
Reviewed-on: https://review.gluster.org/16790
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Log more about the different paths leading to socket disconnect for ease
of debugging.
Log via gf_log_callingfn() in __socket_disconnect() at loglevel TRACE if
socket connection is being torn down.
Change-Id: I1e551c2d685784b5ec747f481179f64d524c0461
BUG: 1426125
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/16732
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However as part of handling DISCONNECT protocol/client
does unwind saved frames (with failure) waiting for responses. This
saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to same brick. Also, its not a
good practise to pass control to other xlators holding a lock as it
can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.
To fix [2], two changes are present in this patch:
* notify DISCONNECT before cleaning up rpc connection (same as commit
a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
connection and does a start while handling a DISCONNECT event from
rpc. Note that patch [3] was reverted as rpc_clnt_start called in
quick_reconnect path of protocol/client didn't invoke connect on
transport as the connection was not cleaned up _yet_ (as cleanup was
moved post notification in rpc-clnt). This resulted in clients never
attempting connect to bricks.
Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECTS across
epochs and ignore DISCONNECT events if they don't belong to current
epoch. However, this approach is a bit complex to implement and
requires time. So, current patch is a hacky stop-gap fix till we come
up with a more cleaner solution.
[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681
Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1427012
Reviewed-on: https://review.gluster.org/16784
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since strcmp return a int, and since the spec
of strcmp do not tell the return value, it
could return 256 and this would overflow.
Found by Coverity scan.
(thanks to Stéphane Marcheusin who explained
the details to me)
Change-Id: I5195e05b44f8b537226e6cee178d95a1ab904e96
BUG: 789278
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16738
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Michael Scherer <misc@fedoraproject.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Avoid flodding of SSL messages in case of failure/success
Solution: 1) Ideally ssl_setup_connection should be call after getting success on
connect so update the condition before call socket_spawn in socket_connect.
2) Change the message type to debug in case of success.
BUG: 1427018
Change-Id: Icb6101e49304d5fe539609b4afacfb1b50b62f84
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/16767
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity warn of use after free here. I assume that
under pressure, this might crash the whole process.
Change-Id: I15fb5cfc9b509705e96e4156b739988d816bbef5
BUG: 789278
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16719
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Michael Scherer <misc@fedoraproject.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I003e38b238704d3345d46688355bcf3702455ba1
BUG: 1399593
Signed-off-by: Mateusz Slupny <mateusz.slupny@appeartv.com>
[ndevos: rebased after I8ff5d1a32 moved the code around]
Reviewed-on: https://review.gluster.org/15969
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
When fio is run on multiple clients (each client writes to its own files),
and meanwhile the clients does a readdirp, thus the client which did
a readdirp will now recieve the upcalls. In this scenario the client
disconnects with rpc decode failed error.
RCA:
Upcall calls rpcsvc_request_submit to submit the request to socket:
rpcsvc_request_submit currently:
rpcsvc_request_submit () {
iobuf = iobuf_new
iov = iobuf->ptr
fill iobuf to contain xdrised upcall content - proghdr
rpcsvc_callback_submit (..iov..)
...
if (iobuf)
iobuf_unref (iobuf)
}
rpcsvc_callback_submit (... iov...) {
...
iobuf = iobuf_new
iov1 = iobuf->ptr
fill iobuf to contain xdrised rpc header - rpchdr
msg.rpchdr = iov1
msg.proghdr = iov
...
rpc_transport_submit_request (msg)
...
if (iobuf)
iobuf_unref (iobuf)
}
rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns the msg is written on to socket and thus the buffers(rpchdr, proghdr)
can be freed, which is not the case. In especially high workload,
rpc_transport_submit_request() may not be able to write to socket immediately
and hence adds it to its own queue and returns as successful. Thus, we have
use after free, for rpchdr and proghdr. Hence the clients gets garbage rpchdr
and proghdr and thus fails to decode the rpc, resulting in disconnect.
To prevent this, we need to add the rpchdr and proghdr to a iobref and send
it in msg:
iobref_add (iobref, iobufs)
msg.iobref = iobref;
The socket layer takes a ref on msg.iobref, if it cannot write to socket and
is adding to the queue. Thus we do not have use after free.
Thank You for discussing, debugging and fixing along:
Prashanth Pai <ppai@redhat.com>
Raghavendra G <rgowdapp@redhat.com>
Rajesh Joseph <rjoseph@redhat.com>
Kotresh HR <khiremat@redhat.com>
Mohammed Rafi KC <rkavunga@redhat.com>
Soumya Koduri <skoduri@redhat.com>
Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: soumya k <skoduri@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PROBLEM: In the tier as a service patch the enums for tier (from
gf1_op_command and gf_defrag_command) are put into a single enum
gf_defrag_command which causes a warning that will make the build
fail.
FIX: send both the enum and eliminate the warning.
Change-Id: I899ff622dfb07134e6459aa65f65ea7252765293
BUG: 1418973
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: https://review.gluster.org/16539
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GF_REF_PUT was called inside lock which can call
socket_poller_mayday which inturn tries to take the
same lock. This can lead to deadlock scenario.
BUG: 1410701
Change-Id: Ib3b161bcfeac810bd3593dc04c10ef984f996b17
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: https://review.gluster.org/16343
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we relied on a complex dance of setting flags, shutting
down the socket, tearing stuff down, getting an event, tearing more
stuff down, and waiting for a higher-level retry. What we really
need, in the case where we're just trying to connect prematurely e.g.
to a brick that hasn't fully come up yet, is a simple retry of the
connect(2) call.
This was discovered by observing failures in ec-new-entry.t with
multiplexing enabled, but probably fixes other random failures as
well.
Change-Id: Ibedb8942060bccc96b02272a333c3002c9b77d4c
BUG: 1385758
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/16510
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some functions were allocating 64K booleans, which are (crazily) mapped to
4-byte ints, for a total of 256KB per call. Changed to use bitfields instead,
so usage is now only 8KB per call. This was the impediment to changing the
io-threads stack size, so that has been adjusted too.
Change-Id: I8781c4f2c8f2b830f4535e366995fac8dd0a8653
BUG: 1418095
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/15745
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Bonnie++ rewrite operation hangs in ganesha + SSL environment
Solution: Bonnie++ hangs during execution of rewrite operation in
ganesha + SSL environment.It was hanged due to blocking on poll
call in ssl_do because no POLLOUT event was getting on socket.
Socket is not getting POLLOUT event because all other threads
are waiting to get lock and lock is not released ssl_do
because it is not getting any event on poll.To correct it
update the condition in ssl_do as same in getting error
SSL_ERROR_WANT_READ.
Test: To test the patch followed below procedure
1) Setup 2X2 Ganesha + SSL environment.
2) Run bonnie from 3 nfs client parallely
3) After run "Rewwrite operation" by bonnie it is hanged.
4) After apply the patch it is not hanged.
BUG: 1418213
Change-Id: I5985cbbc4cfdac5d287268d791e31c274abc3c8d
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/16501
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the functions called to free the refcounted structure are doing a
typecast from (void*) to their own type taht is being free'd. This
really is not needed and the refcount interface is made a little simpler
without the requirement of typecasting.
With this small improvement in the API, all callers are updated too.
Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1
BUG: 1416889
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16471
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for multiple brick translator stacks running
in a single brick server process. This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more. It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.
Multiplexing is controlled by the "cluster.brick-multiplex" global option. By
default it's off, and bricks are started in separate processes as before. If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this, we will be able to trigger statedumps on remote Gluster
clients, mainly targetted for applications using libgfapi.
Design:
SIGUSR signal is the most comman way of taking a statedump in Gluster.
But it cannot be used for libgfapi based processes, as the process
loading the library might have already consumed SIGUSR signal. Hence
going by the command way.
One has to issue a Gluster command to initiate a statedump on the
libgfapi based client. The command takes hostname and PID as an
argument. All the glusterds in the cluster, check if they are connected
to the specified hostname, and send an RPC request to all the connected
clients from that hostname (via the mgmt connection).
URL: http://review.gluster.org/16357
Change-Id: Icbe4d2f026b32a2c7d5535e1bfb2cdaaff042e91
BUG: 1169302
Signed-off-by: Poornima G <pgurusid@redhat.com>
[ndevos: minor fixes and split patch in smaller pieces]
Reviewed-on: https://review.gluster.org/9228
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tierd is implemented by separating from rebalance process.
The commands affected:
1) Attach tier will trigger this process instead of old one
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show tier daemon as a process instead
of task and normal tier status and tier detach status works.
4) tier stop implemented.
5) detach tier implemented separately along with new detach tier
status
6) volume tier volname status will work using the changes.
7) volume set works
This patch has separated the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separate to the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.
The code for the validation and commit phase are the same
as the earlier tier validation code in DHT rebalance.
The “brickop” phase has been changed so that the status
command can use this framework.
The service management framework is now used.
DHT rebalance does not use this framework.
This service framework takes care of :
*) spawning the daemon, killing it and other such processes.
*) volume set options , which are written on the volfile.
*) restart and reconfigure functions. Restart is to restart
the daemon at two points
1)after gluster goes down and comes up.
2) to stop detach tier.
*) reconfigure is used to make immediate volfile changes.
By doing this, we don’t restart the daemon.
it has the code to rewrite the volfile for topological
changes too (which comes into place during add and remove brick).
With this patch the log, pid, and volfile are separated
and put into respective directories.
Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID = 1356502
BUG: 789278
Change-Id: I11a814addc6607902c12aca8f4efec5741cbd7d3
Signed-off-by: Muthu-vigneshwaran <mvignesh@redhat.com>
Reviewed-on: http://review.gluster.org/16028
Tested-by: Muthu Vigneshwaran <muthuvigneshwaran77@gmail.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When SSL is enabled or if "transport.socket.own-thread" option is set
then socket_poller is run as different thread. Currently during
disconnect or PARENT_DOWN scenario we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
cleanup resources used by socket_poller.
Therefore before disconnect we should wait for poller thread to exit.
Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Check for AF_INET *and* AF_INET6.
- This is a cherry-pick of D3057373 to 3.8
Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
Change-Id: I53eb79284eddfee6e13821c6570809f575b96769
BUG: 1405478
Reviewed-on: http://review.gluster.org/16167
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Jeff Darcy <jdarcy@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that the notification thread which notifies
protocol/client layer about the disconnection is put to sleep
and meanwhile, a fuse thread or a timer thread initiates and
completes reconnection to the brick. The notification thread
is then woken up and protocol/client layer updates its flags
to indicate that network is disconnected. No reconnection is
initiated because reconnection is rpc-lib layer's responsibility
and its flags indicate that connection is connected.
Fix: Serialize connect and disconnect notify
Credit: Raghavendra Talur <rtalur@redhat.com>
Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1386626
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/15916
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upcall as a part of setattr, sends an invalidation and the
invalidation carries the resulting stat value. When a file
is converted to linkto files, even then an invalidation
is set and as a result the mountpoint shows the sticky
bit in the stat of the file.
eg: ---------T. 945 root root 0 Nov 8 10:14 hardlink.999
Fix:
When dht recieves a notification of sticky bit change, it updates
the flag, to indicate md-cache to send the subsequent lookup.
Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606
BUG: 1392713
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15789
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are already existing non-granular indices created that are
yet to be healed, if granular-entry-heal option is toggled from 'off' to
'on', AFR self-heal whenever it kicks in, will try to look for granular
indices in 'entry-changes'. Because of the absence of name indices,
granular entry healing logic will fail to heal these directories, and
worse yet unset pending extended attributes with the assumption that
are no entries that need heal.
To get around this, a new CLI is introduced which will invoke glfsheal
program to figure whether at the time an attempt is made to enable
granular entry heal, there are pending heals on the volume OR there
are one or more bricks that are down. If either of them is true, the
command will be failed with the appropriate error.
New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable}
Change-Id: I1f4fe8162813b9068e198965d94169fee4adc099
BUG: 1370410
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/15747
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|