glusterfs.git/rpc/rpc-lib/src, branch v7.1

rpcsvc: fix subnet_mask_v4 check

2019-11-28T10:03:04+00:00

The subnet mask validation performed its checks in the wrong
order. The fix corrects the order of the `inet_pton()` calls.
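
As an illustration only (this is not the code touched by the commit), a
minimal sketch of validating an IPv4 "address/prefix" string might look like
the following. The helper name `is_valid_subnet_v4()` and the accepted format
are assumptions made for this example; the only library behaviour relied on
is that `inet_pton()` returns 1 on success.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glusterfs: returns true when `spec`
 * has the form "a.b.c.d/len" with a valid IPv4 address and a prefix
 * length between 0 and 32. */
bool
is_valid_subnet_v4(const char *spec)
{
    char buf[INET_ADDRSTRLEN + 4]; /* room for "255.255.255.255/32" + NUL */
    struct in_addr addr;
    char *slash = NULL;
    char *end = NULL;
    long prefix = 0;

    if (spec == NULL || strlen(spec) >= sizeof(buf))
        return false;
    strcpy(buf, spec);

    /* Split the string at the '/' before parsing either half. */
    slash = strchr(buf, '/');
    if (slash == NULL)
        return false;
    *slash = '\0';

    /* Validate the address part first. */
    if (inet_pton(AF_INET, buf, &addr) != 1)
        return false;

    /* Only then validate the prefix length. */
    if (!isdigit((unsigned char)slash[1]))
        return false;
    prefix = strtol(slash + 1, &end, 10);
    if (*end != '\0' || prefix > 32)
        return false;

    return true;
}
```

For example, `is_valid_subnet_v4("192.168.1.0/24")` is accepted, while
`"192.168.1.0"`, `"300.1.1.1/24"` and `"10.0.0.0/33"` are rejected.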

Fixes: bz#1777769
Change-Id: I5d31468eb917aa94cbb85f573b37c60023e9daf3
Signed-off-by: Amar Tumballi 
(cherry picked from commit d60935d1011e387115e0445629976196f566b3b1)

event: rename event_XXX with gf_ prefixed

2019-08-21T06:13:38+00:00

I hit a crash when using libgfapi.

libgfapi calls glfs_poller() --> event_dispatch() in file
api/src/glfs.c:721, and this event_dispatch() is defined locally
by libglusterfs. The problem is that its name is exactly the same
as the event_dispatch() in the libevent package from the OS.

For example, if an executable program Foo uses and links both
libevent and libgfapi at the same time, it can hit the crash,
like:

kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.

This is because glfs_poller() is calling the event_dispatch() from
libevent, not the one from libglusterfs.

The gfapi link info :
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It will work well without any problem.

And if Foo loads a private lib such as handler_glfs.so with dlopen(),
and handler_glfs.so links the GFAPI_LIBS directly while Foo does not,
then the crash is hit every time.

The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS.

And in some cases, when the --as-needed link option is added (many
distributions add it by default), the crash comes back and the above
workaround no longer works.

Backport of:
> https://review.gluster.org/#/c/glusterfs/+/23110/
> Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
> Fixes: #699
> Signed-off-by: Xiubo Li 

Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
updates: bz#1740519
Signed-off-by: Xiubo Li 
(cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)

multiple files: another attempt to remove includes

2019-06-14T16:50:32+00:00

There are many include statements that are not needed.
A previous, more ambitious attempt failed because of the *BSD platforms
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )

Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of include
lines we need to look at in the future to understand and fix
the mess later on.

Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul

Fix some "Null pointer dereference" coverity issues

2019-05-26T13:59:13+00:00

This patch fixes the following CIDs:

  * 1124829
  * 1274075
  * 1274083
  * 1274128
  * 1274135
  * 1274141
  * 1274143
  * 1274197
  * 1274205
  * 1274210
  * 1274211
  * 1288801
  * 1398629

Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558
Updates: bz#789278
Signed-off-by: Xavi Hernandez

Revert "rpc: implement reconnect back-off strategy"

2019-05-21T08:36:32+00:00

This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea.

This revert is done as a 'possible' fix for frequent regression
failures, which are also random in nature (i.e., different tests fail
in different runs).

Why exactly this patch? Because it seemed the most probable candidate
among the patches merged in the last 15 days, after which regressions
have been failing more often.

Updates: bz#1711827
Change-Id: I35333162fcd4064f9609525ca93c666053c6d959

rpc: implement reconnect back-off strategy

2019-05-11T14:25:53+00:00

When a connection failure happens, gluster tries to reconnect every 3
seconds. In some cases the failure is spurious, so a delay of 3 seconds
could be unnecessarily long.

This patch implements a back-off strategy that tries a reconnect as soon
as 1 tenth of a second. If this fails, the time is doubled until it's
around 3 seconds. After that, the reconnect is attempted every 3 seconds
as before.
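
The schedule described above can be sketched as follows. This is a minimal
illustration, not the code from the patch: the function name
`next_reconnect_delay_ms()`, the millisecond units, and the choice to clamp
the last step to exactly 3000 ms (rather than letting it land "around" 3
seconds) are all assumptions.

```c
#include <stdint.h>

#define RECONNECT_MIN_MS 100u  /* first retry after 1 tenth of a second */
#define RECONNECT_MAX_MS 3000u /* steady-state retry interval */

/* Hypothetical helper: given the delay used for the previous attempt
 * (0 if there was none), return the delay for the next attempt. */
uint32_t
next_reconnect_delay_ms(uint32_t prev_ms)
{
    if (prev_ms == 0)
        return RECONNECT_MIN_MS;

    /* Doubling would reach or exceed the cap: stay at the cap from now on. */
    if (prev_ms >= RECONNECT_MAX_MS / 2)
        return RECONNECT_MAX_MS;

    return prev_ms * 2;
}
```

Starting from 0, repeated calls yield 100, 200, 400, 800, 1600 and then
3000 ms for every later attempt.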

Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149
Updates: bz#1193929
Signed-off-by: Xavier Hernandez

rpclib: slow floating point math and libm

2019-04-03T04:40:15+00:00

In release-6 rpc/rpc-lib (libgfrpc) added the function
get_rightmost_set_bit() which calls log2(3), a call that takes
a floating point parameter.

It's used thusly:
    right_most_unset_bit = get_rightmost_set_bit(...);

(So is it really the right-most unset bit, or the right-most set bit?)

It's unclear to me whether this is in the data path or not. If it is,
it's rather scary to think about integer-to-float conversions and slow
calls to libm functions in the data path.

gcc and clang have __builtin_ctz() which returns the same result as
get_rightmost_set_bit(), and does it substantially faster. Approx
20M iterations of get_rightmost_set_bit() took ~33sec of wall clock
time on my devel machine, while 20M iterations of __builtin_ctz()
took < 9sec; get_rightmost_set_bit() is 3x slower than __builtin_ctz().
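
For illustration, an integer-only version can be sketched like this. The
wrapper names below are hypothetical, and returning -1 for an input of 0 is
an assumption of this sketch (`__builtin_ctz()` itself is undefined for 0).
The loop variant is included only as a portable reference to compare against;
neither function needs libm.

```c
#include <stdint.h>

/* Index of the lowest set bit using the gcc/clang builtin.
 * __builtin_ctz(0) is undefined, so zero is handled explicitly. */
int
rightmost_set_bit_ctz(uint32_t n)
{
    return n ? __builtin_ctz(n) : -1;
}

/* Portable reference: shift right until the lowest bit is set. */
int
rightmost_set_bit_loop(uint32_t n)
{
    int i = 0;

    if (n == 0)
        return -1;
    while ((n & 1u) == 0) {
        n >>= 1;
        i++;
    }
    return i;
}
```

Both functions return 3 for an input of 8 and 4 for an input of 0x50.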

And as a side benefit, we can again eliminate the need to link libgfrpc
with libm.

Change-Id: If9e7e80874577c52223f8125b385fc930de20699
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLEY

mgmt/shd: Implement multiplexing in self heal daemon

2019-04-01T03:44:23+00:00

Problem:

The shd daemon is per node, which means it creates a single graph
containing all volumes. While this is great for utilizing
resources, it is not so good in terms of performance and manageability.

Self-heal daemons don't have the capability to automatically
reconfigure their graphs. So each time a configuration change
happens to the volumes (replicate/disperse), we need to restart
shd to bring the changes into the graph.

Because of this, all ongoing heals for all other volumes have to be
stopped in the middle and restarted all over again.

Solution:

This change makes shd a per-volume daemon, so that a graph
will be generated for each volume.

When we want to start/reconfigure shd for a volume, we first search
for an existing shd running on the node. If there is none, we
start a new process. If a daemon is already running for shd, then
we simply detach the graph for that volume and reattach the updated
graph for the volume. This won't touch any of the ongoing operations
for any other volumes on the shd daemon.
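
The start-or-reattach decision above can be modelled with a toy sketch. None
of the names below (`struct shd_process`, `shd_attach_volume()`, the
`graph_version` field) come from the glusterfs sources; they are hypothetical
and only illustrate that updating one volume's graph leaves the others alone.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SHD_MAX_VOLUMES 8
#define SHD_NAME_LEN 32

/* Toy model of one shd process holding one graph per volume. */
struct shd_process {
    bool running;
    int nvolumes;
    char volume[SHD_MAX_VOLUMES][SHD_NAME_LEN];
    int graph_version[SHD_MAX_VOLUMES];
};

/* Start or reconfigure shd for `vol`. Returns the slot used, or -1. */
int
shd_attach_volume(struct shd_process *p, const char *vol, int version)
{
    int i;

    if (strlen(vol) >= SHD_NAME_LEN)
        return -1;

    /* No shd on this node yet: "start" a new, empty process. */
    if (!p->running) {
        p->nvolumes = 0;
        p->running = true;
    }

    /* Volume already attached: swap in the updated graph only. */
    for (i = 0; i < p->nvolumes; i++) {
        if (strcmp(p->volume[i], vol) == 0) {
            p->graph_version[i] = version;
            return i;
        }
    }

    /* New volume: attach its graph next to the existing ones. */
    if (p->nvolumes == SHD_MAX_VOLUMES)
        return -1;
    snprintf(p->volume[p->nvolumes], SHD_NAME_LEN, "%s", vol);
    p->graph_version[p->nvolumes] = version;
    return p->nvolumes++;
}
```

In this model, reattaching "vol1" with a newer version changes only the slot
for "vol1"; the entry for any other attached volume is untouched.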

Example of an shd graph when it is per volume

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /         |             \
                   /          |              \
              ---------    ---------      ----------
              | AFR-1 |    | AFR-2 |      |  AFR-3 |
              ---------    ---------      ----------

A running shd daemon with 3 volumes will look like this:

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /           |           \
                   /            |            \
              ------------   ------------  ------------
              | volume-1 |   | volume-2 |  | volume-3 |
              ------------   ------------  ------------

Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
fixes: bz#1659708
Signed-off-by: Mohammed Rafi KC

rpc: Remove duplicate code

2019-03-28T05:35:25+00:00

rpc_clnt_disable() and rpc_clnt_disconnect() have the same code.
Removed rpc_clnt_disconnect() and changed its callers to call
rpc_clnt_disable() instead.

updates bz#1193929
Change-Id: I965f57cc1d5af36d266810125558b6f5e5f279d4
Signed-off-by: Pranith Kumar K

build: link libgfrpc with MATH_LIB (libm, -lm)

2019-03-26T11:31:45+00:00

tl;dnr: libgfrpc.so calls log2(3) from libm; it should be explicitly
linked with -lm

The autoconf/automake/libtool stack is more or less forgiving
depending on the distribution. On forgiving systems libtool will
semi-magically link with implicit dependencies. But on Ubuntu, which
seems to be tending toward being less forgiving, the link of libgfrpc
will fail with an unresolved reference to log2(3).

Change-Id: I9fae09ddb81e49004fbea4d7d83b95fb64a484b0
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLEY