<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/rpc/rpc-transport/socket/src/socket.c, branch v6.8</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>socket: fix error handling</title>
<updated>2019-12-13T11:48:19+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2019-12-11T17:21:14+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=78b63e0feed1937cae66156ea455a3977847a2fd'/>
<id>78b63e0feed1937cae66156ea455a3977847a2fd</id>
<content type='text'>
When __socket_proto_state_machine() detected a problem in the size of
the request or it couldn't allocate an iobuf of the requested size, it
returned -ENOMEM (-12). However the caller was expecting only -1 in
case of error. For this reason the error passes undetected initially,
adding back the socket to the epoll object. On further processing,
however, the error is finally detected and the connection terminated.
Meanwhile, another thread could receive a poll_in event from the same
connection, which could cause races with the connection destruction.
When this happened, the process crashed.

To fix this, all error detection conditions have been hardened to be
more strict on what is valid and what not. Also, we don't return
-ENOMEM anymore. We always return -1 in case of error.

An additional change has been done to prevent destruction of the
transport object while it may still be needed.

Backport of:
&gt; Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
&gt; Fixes: bz#1782495
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;

Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
Fixes: bz#1749625
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When __socket_proto_state_machine() detected a problem in the size of
the request or it couldn't allocate an iobuf of the requested size, it
returned -ENOMEM (-12). However the caller was expecting only -1 in
case of error. For this reason the error passes undetected initially,
adding back the socket to the epoll object. On further processing,
however, the error is finally detected and the connection terminated.
Meanwhile, another thread could receive a poll_in event from the same
connection, which could cause races with the connection destruction.
When this happened, the process crashed.

To fix this, all error detection conditions have been hardened to be
more strict on what is valid and what not. Also, we don't return
-ENOMEM anymore. We always return -1 in case of error.

An additional change has been done to prevent destruction of the
transport object while it may still be needed.

Backport of:
&gt; Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
&gt; Fixes: bz#1782495
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;

Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
Fixes: bz#1749625
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>event: rename event_XXX with gf_ prefixed</title>
<updated>2019-08-28T08:34:45+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2019-07-26T04:34:52+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=f7d1fa51cf7308a6d68ed9fa1e8cc4a7c66ad121'/>
<id>f7d1fa51cf7308a6d68ed9fa1e8cc4a7c66ad121</id>
<content type='text'>
I hit one crash issue when using the libgfapi.

In the libgfapi it will call glfs_poller() --&gt; event_dispatch()
in file api/src/glfs.c:721, and the event_dispatch() is defined
by libgluster locally, the problem is the name of event_dispatch()
is the extremly the same with the one from libevent package form
the OS.

For example, if a executable program Foo, which will also use and
link the libevent and the libgfapi at the same time, I can hit the
crash, like:

kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.

This is because the glfs_poller() is calling the event_dispatch() from
the libevent, not the libglsuter.

The gfapi link info :
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It will works well without any problem.

And if Foo call one private lib, such as handler_glfs.so, and the
handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't
and it will dlopen(handler_glfs.so), then the crash will be hit everytime.

The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS.

And in some cases when the --as-needed link option is added(on many dists
it is added as default), then the crash is back again, the above workaround
won't work.

Backport of:
&gt; https://review.gluster.org/#/c/glusterfs/+/23110/
&gt; Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
&gt; Fixes: #699
&gt; Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;

Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
updates: bz#1740525
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
(cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I hit one crash issue when using the libgfapi.

In the libgfapi it will call glfs_poller() --&gt; event_dispatch()
in file api/src/glfs.c:721, and the event_dispatch() is defined
by libgluster locally, the problem is the name of event_dispatch()
is the extremly the same with the one from libevent package form
the OS.

For example, if a executable program Foo, which will also use and
link the libevent and the libgfapi at the same time, I can hit the
crash, like:

kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.

This is because the glfs_poller() is calling the event_dispatch() from
the libevent, not the libglsuter.

The gfapi link info :
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It will works well without any problem.

And if Foo call one private lib, such as handler_glfs.so, and the
handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't
and it will dlopen(handler_glfs.so), then the crash will be hit everytime.

The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)

But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS.

And in some cases when the --as-needed link option is added(on many dists
it is added as default), then the crash is back again, the above workaround
won't work.

Backport of:
&gt; https://review.gluster.org/#/c/glusterfs/+/23110/
&gt; Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
&gt; Fixes: #699
&gt; Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;

Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
updates: bz#1740525
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
(cherry picked from commit 799edc73c3d4f694c365c6a7c27c9ab8eed5f260)
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: glusterd start is failed and throwing an error Address already in use</title>
<updated>2019-08-19T11:45:19+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawal@redhat.com</email>
</author>
<published>2019-08-13T13:15:43+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c34338cfc6466bcdb2f6b1a8a3bde7f19c24d240'/>
<id>c34338cfc6466bcdb2f6b1a8a3bde7f19c24d240</id>
<content type='text'>
Problem: Some of the .t are failed due to bind is throwing
         an error EADDRINUSE

Solution: After killing all gluster processes .t is trying
          to start glusterd but somehow if kernel has not cleaned
          up resources(socket) then glusterd startup is failed due to
          bind system call failure.To avoid the issue retries to call
          bind 10 times to execute system call succesfully

&gt; Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629
&gt; Fixes: bz#1743020
&gt; Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
&gt; (cherry picked from commit c370c70f77079339e2cfb7f284f3a2fb13fd2f97)

Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629
Fixes: bz#1743219
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: Some of the .t are failed due to bind is throwing
         an error EADDRINUSE

Solution: After killing all gluster processes .t is trying
          to start glusterd but somehow if kernel has not cleaned
          up resources(socket) then glusterd startup is failed due to
          bind system call failure.To avoid the issue retries to call
          bind 10 times to execute system call succesfully

&gt; Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629
&gt; Fixes: bz#1743020
&gt; Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
&gt; (cherry picked from commit c370c70f77079339e2cfb7f284f3a2fb13fd2f97)

Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629
Fixes: bz#1743219
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>transport/socket: log shutdown msg occasionally</title>
<updated>2019-04-16T13:38:58+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2019-03-22T05:10:45+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=b1186532c7fd093232617979aa8f3e932a645278'/>
<id>b1186532c7fd093232617979aa8f3e932a645278</id>
<content type='text'>
Change-Id: If3fc0884e7e2f45de2d278b98693b7a473220a5e
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Fixes: bz#1679904
(cherry picked from commit ec1b84300fe267dd12c1e42e7e91905db935f1e2)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: If3fc0884e7e2f45de2d278b98693b7a473220a5e
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Fixes: bz#1679904
(cherry picked from commit ec1b84300fe267dd12c1e42e7e91905db935f1e2)
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: socket event handlers now return void</title>
<updated>2019-03-02T11:54:24+00:00</updated>
<author>
<name>Milind Changire</name>
<email>mchangir@redhat.com</email>
</author>
<published>2019-02-15T08:50:07+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=4cb1d6d94ac85c5e79171f8989b545ca098b61d9'/>
<id>4cb1d6d94ac85c5e79171f8989b545ca098b61d9</id>
<content type='text'>
Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.

Solution:
Change return type of all socket event handlers to 'void'

mainline:
&gt; Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
&gt; Fixes: bz#1651246
&gt; Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
&gt; Reviewed-on: https://review.gluster.org/c/glusterfs/+/22221

Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1683900
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
(cherry picked from commit 776ba851c6ee6c265253d44cf1d6e4e3d4a21772)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.

Solution:
Change return type of all socket event handlers to 'void'

mainline:
&gt; Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
&gt; Fixes: bz#1651246
&gt; Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
&gt; Reviewed-on: https://review.gluster.org/c/glusterfs/+/22221

Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1683900
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
(cherry picked from commit 776ba851c6ee6c265253d44cf1d6e4e3d4a21772)
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: don't pass return value from protocol handler to event handler</title>
<updated>2019-01-22T06:59:37+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2019-01-03T09:57:38+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=cd5714554627fe90ee2c77685cb410a8fb25eceb'/>
<id>cd5714554627fe90ee2c77685cb410a8fb25eceb</id>
<content type='text'>
Event handler handles socket level error only, while protocol handler
handles in protocol level error. If protocol handler decides to
disconnect on error in any case, it should call disconnect instead of
return an error back to event handler.

Change-Id: I9375be98cc52cb969085333f3c7229a91207d1bd
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Event handler handles socket level error only, while protocol handler
handles in protocol level error. If protocol handler decides to
disconnect on error in any case, it should call disconnect instead of
return an error back to event handler.

Change-Id: I9375be98cc52cb969085333f3c7229a91207d1bd
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: fix issue when socket read return with EAGAIN</title>
<updated>2019-01-22T06:52:30+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2018-12-29T08:26:58+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=4d9935a4db67be0237db5fc6a2b51086635571f6'/>
<id>4d9935a4db67be0237db5fc6a2b51086635571f6</id>
<content type='text'>
In the case socket read returns EAGAIN, positive value about remaining
vector to send is returned. This return value will be passed all the way
back to event handler, making it complains.

[2018-12-29 08:02:25.603199] T [socket.c:1640:__socket_read_simple_payload] 0-test-client-0-extra.0: partial read on non-blocking socket.
[2018-12-29 08:02:25.603201] T [rpc-clnt.c:654:rpc_clnt_reply_init] 0-test-client-2-extra.1: received rpc message (RPC XID: 0xfa6 Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 12) from rpc-transport (test-client-2-extra.1)
[2018-12-29 08:02:25.603207] T [socket.c:3129:socket_event_handler] 0-test-client-0-extra.0: (sock:32) socket_event_poll_in returned 1

Formerly, in socket_proto_state_machine, return value of socket_readv is
used to check if message is all read-in. In this commit, it is checked
whether size of bytes indicated in header are all read in. In this way,
only 0 and -1 will be returned from socket_proto_state_machine(),
indicating whether there is error in the underlying socket.

Change-Id: I8be0d178b049f0720d738a03aec41c4b375d2972
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the case socket read returns EAGAIN, positive value about remaining
vector to send is returned. This return value will be passed all the way
back to event handler, making it complains.

[2018-12-29 08:02:25.603199] T [socket.c:1640:__socket_read_simple_payload] 0-test-client-0-extra.0: partial read on non-blocking socket.
[2018-12-29 08:02:25.603201] T [rpc-clnt.c:654:rpc_clnt_reply_init] 0-test-client-2-extra.1: received rpc message (RPC XID: 0xfa6 Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 12) from rpc-transport (test-client-2-extra.1)
[2018-12-29 08:02:25.603207] T [socket.c:3129:socket_event_handler] 0-test-client-0-extra.0: (sock:32) socket_event_poll_in returned 1

Formerly, in socket_proto_state_machine, return value of socket_readv is
used to check if message is all read-in. In this commit, it is checked
whether size of bytes indicated in header are all read in. In this way,
only 0 and -1 will be returned from socket_proto_state_machine(),
indicating whether there is error in the underlying socket.

Change-Id: I8be0d178b049f0720d738a03aec41c4b375d2972
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: fix issue when socket write return with EAGAIN</title>
<updated>2019-01-17T08:29:48+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2018-12-29T07:51:13+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=0301a66bda44582e3a48519f2a5d365b0c38090d'/>
<id>0301a66bda44582e3a48519f2a5d365b0c38090d</id>
<content type='text'>
In the case socket write return with EAGAIN, the remaining vector count
is return all way back to event handler, making followup pollin event to
skip handling and dispatch loop complains about failure. Even thought
temporary write failure is not an error.

[2018-12-29 07:31:41.772310] E [MSGID: 101191] [event-epoll.c:674:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Change-Id: Idf03d120b5f7619eda19720a583cbcc3e7da2504
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the case socket write return with EAGAIN, the remaining vector count
is return all way back to event handler, making followup pollin event to
skip handling and dispatch loop complains about failure. Even thought
temporary write failure is not an error.

[2018-12-29 07:31:41.772310] E [MSGID: 101191] [event-epoll.c:674:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Change-Id: Idf03d120b5f7619eda19720a583cbcc3e7da2504
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: fix counting of socket total_bytes_read and total_bytes_write</title>
<updated>2019-01-17T08:29:32+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2018-12-27T06:13:48+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=22778ca88977fa061c468ca257aec74d4e7d09f4'/>
<id>22778ca88977fa061c468ca257aec74d4e7d09f4</id>
<content type='text'>
Change-Id: If35d0dbae963facf00ab6bcf07c6e4d1706ed982
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: If35d0dbae963facf00ab6bcf07c6e4d1706ed982
updates: bz#1666143
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: Remove redundant in_lock in incoming message handling</title>
<updated>2018-12-26T05:15:08+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2018-12-21T04:28:16+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a1e7acc93a416fec6b87cc5601a9922759156771'/>
<id>a1e7acc93a416fec6b87cc5601a9922759156771</id>
<content type='text'>
A given epoll thread can handle only one incoming (POLLIN) request.
And until the socket is rearmed for listening, it is guaranteed that
there won't be any new incoming requests. As a result, the priv-&gt;in_lock
which guards the socket proto state machine seems redundant.

This patch removes priv-&gt;in_lock.

Change-Id: I26b6ddd852aba8c10385833b85ffd2e53e46cb8c
updates: bz#1467614
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A given epoll thread can handle only one incoming (POLLIN) request.
And until the socket is rearmed for listening, it is guaranteed that
there won't be any new incoming requests. As a result, the priv-&gt;in_lock
which guards the socket proto state machine seems redundant.

This patch removes priv-&gt;in_lock.

Change-Id: I26b6ddd852aba8c10385833b85ffd2e53e46cb8c
updates: bz#1467614
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
