<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/rpc/rpc-lib/src/rpc-clnt.c, branch v8.2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>rpc: fix missing unref on reconnect</title>
<updated>2019-10-02T08:23:17+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2018-01-24T08:06:08+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=36075d4de78cb562b0e7a8bea85c868251f0e961'/>
<id>36075d4de78cb562b0e7a8bea85c868251f0e961</id>
<content type='text'>
On protocol client connecting to brick, client will firstly contact
glusterd to get port, then reconnect to glusterfsd. Reconnect cancels
the reconnect timer and start a new one. However, cancelling the timer
does not unref rpc ref-ed for it. That leads to refcount leak.

Fix this issue by unref-ing rpc if reconnect timer is canceled.

Change-Id: Ice89dcd93cb283a0c7250c369cc8961d52fb2022
Fixes: bz#1538900
BUG: 1538900
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On protocol client connecting to brick, client will firstly contact
glusterd to get port, then reconnect to glusterfsd. Reconnect cancels
the reconnect timer and start a new one. However, cancelling the timer
does not unref rpc ref-ed for it. That leads to refcount leak.

Fix this issue by unref-ing rpc if reconnect timer is canceled.

Change-Id: Ice89dcd93cb283a0c7250c369cc8961d52fb2022
Fixes: bz#1538900
BUG: 1538900
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>graph/cleanup: Fix race in graph cleanup</title>
<updated>2019-09-05T16:14:44+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-07-05T14:42:59+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=43635716e6bd5bd5925fa9194b0853ee919a742d'/>
<id>43635716e6bd5bd5925fa9194b0853ee919a742d</id>
<content type='text'>
We were unconditionally cleaning up the grap when we get
child_down followed by parent_down. But this is prone to
race condition when some of the bricks are already disconnected.
In this case, even before the last child down is executed in the
client xlator code,we might have freed the graph. Because the
child_down event is alreadt recevied.

To fix this race, we have introduced a check to see if all client
xlator have cleared thier reconnect chain, and called the child_down
for last time.

Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
fixes: bz#1727329
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We were unconditionally cleaning up the grap when we get
child_down followed by parent_down. But this is prone to
race condition when some of the bricks are already disconnected.
In this case, even before the last child down is executed in the
client xlator code,we might have freed the graph. Because the
child_down event is alreadt recevied.

To fix this race, we have introduced a check to see if all client
xlator have cleared thier reconnect chain, and called the child_down
for last time.

Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
fixes: bz#1727329
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "rpc: implement reconnect back-off strategy"</title>
<updated>2019-05-21T08:36:32+00:00</updated>
<author>
<name>Amar Tumballi</name>
<email>amarts@redhat.com</email>
</author>
<published>2019-05-21T01:37:47+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=2f50486f431b37f4a16541911f50f2042049b0e6'/>
<id>2f50486f431b37f4a16541911f50f2042049b0e6</id>
<content type='text'>
This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea.

This revert is done as a 'possible' fix for frequent regression
failures, which are random in nature too (ie, different tests fails
in different runs).

Why exactly this patch? Because this patch seemed like most probable
candidate which got merged in last 15days, and after which regressions
are failing more often.

Updates: bz#1711827
Change-Id: I35333162fcd4064f9609525ca93c666053c6d959
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea.

This revert is done as a 'possible' fix for frequent regression
failures, which are random in nature too (ie, different tests fails
in different runs).

Why exactly this patch? Because this patch seemed like most probable
candidate which got merged in last 15days, and after which regressions
are failing more often.

Updates: bz#1711827
Change-Id: I35333162fcd4064f9609525ca93c666053c6d959
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: implement reconnect back-off strategy</title>
<updated>2019-05-11T14:25:53+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>jahernan@redhat.com</email>
</author>
<published>2018-01-19T11:18:13+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=59841f7e1ff0511b04884015441a181a56d07bea'/>
<id>59841f7e1ff0511b04884015441a181a56d07bea</id>
<content type='text'>
When a connection failure happens, gluster tries to reconnect every 3
seconds. In some cases the failure is spurious, so a delay of 3 seconds
could be unnecessarily long.

This patch implements a back-off strategy that tries a reconnect as soon
as 1 tenth of a second. If this fails, the time is doubled until it's
around 3 seconds. After that, the reconnect is attempted every 3 seconds
as before.

Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149
Updates: bz#1193929
Signed-off-by: Xavier Hernandez &lt;xhernandez@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a connection failure happens, gluster tries to reconnect every 3
seconds. In some cases the failure is spurious, so a delay of 3 seconds
could be unnecessarily long.

This patch implements a back-off strategy that tries a reconnect as soon
as 1 tenth of a second. If this fails, the time is doubled until it's
around 3 seconds. After that, the reconnect is attempted every 3 seconds
as before.

Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149
Updates: bz#1193929
Signed-off-by: Xavier Hernandez &lt;xhernandez@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: Remove duplicate code</title>
<updated>2019-03-28T05:35:25+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-03-28T05:33:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=108e4f3481225f98c12f7c283e1ef0388863cf8b'/>
<id>108e4f3481225f98c12f7c283e1ef0388863cf8b</id>
<content type='text'>
rpc_clnt_disable() and rpc_clnt_disconnect() have same code.
Removed rpc_clnt_disconnect() and moved calls to rpc_clnt_disconnect()
to rpc_clnt_disable()

updates bz#1193929
Change-Id: I965f57cc1d5af36d266810125558b6f5e5f279d4
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rpc_clnt_disable() and rpc_clnt_disconnect() have same code.
Removed rpc_clnt_disconnect() and moved calls to rpc_clnt_disconnect()
to rpc_clnt_disable()

updates bz#1193929
Change-Id: I965f57cc1d5af36d266810125558b6f5e5f279d4
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc/transport: Missing a ref on dict while creating transport object</title>
<updated>2019-03-20T13:24:44+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-02-26T12:34:18+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=f2f07591b2de9ba45bbc3eb4f601d1e9a327190b'/>
<id>f2f07591b2de9ba45bbc3eb4f601d1e9a327190b</id>
<content type='text'>
while creating rpc_tranpsort object, we store a dictionary without
taking a ref on dict but it does an unref during the cleaning of the
transport object.

So the rpc layer expect the caller to take a ref on the dictionary
before passing dict to rpc layer. This leads to a lot of confusion
across the code base and leads to ref leaks.

Semantically, this is not correct. It is the rpc layer responsibility
to take a ref when storing it, and free during the cleanup.

I'm listing down the total issues or leaks across the code base because
of this confusion. These issues are currently present in the upstream
master.

1) changelog_rpc_client_init

2) quota_enforcer_init

3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma.

4) quotad_aggregator_init

5) glusterd: init

6) nfs3_init_state

7) server: init

8) client:init

This patch does the cleanup according to the semantics.

Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
while creating rpc_tranpsort object, we store a dictionary without
taking a ref on dict but it does an unref during the cleaning of the
transport object.

So the rpc layer expect the caller to take a ref on the dictionary
before passing dict to rpc layer. This leads to a lot of confusion
across the code base and leads to ref leaks.

Semantically, this is not correct. It is the rpc layer responsibility
to take a ref when storing it, and free during the cleanup.

I'm listing down the total issues or leaks across the code base because
of this confusion. These issues are currently present in the upstream
master.

1) changelog_rpc_client_init

2) quota_enforcer_init

3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma.

4) quotad_aggregator_init

5) glusterd: init

6) nfs3_init_state

7) server: init

8) client:init

This patch does the cleanup according to the semantics.

Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>clnt/rpc: ref leak during disconnect.</title>
<updated>2019-02-12T07:05:58+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-01-23T16:25:01+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=34e6028e4ceaff5ceb1165317a3a90d02e0da4ac'/>
<id>34e6028e4ceaff5ceb1165317a3a90d02e0da4ac</id>
<content type='text'>
During disconnect cleanup, we are not cancelling reconnect
timer, which causes a ref leak each time when a disconnect
happen.

Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During disconnect cleanup, we are not cancelling reconnect
timer, which causes a ref leak each time when a disconnect
happen.

Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc-clnt: reduce transport connect log for EINPROGRESS</title>
<updated>2019-01-07T03:19:34+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2018-12-20T08:58:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8b4822d457cd62c1525b6fbaac8668c79f3951c0'/>
<id>8b4822d457cd62c1525b6fbaac8668c79f3951c0</id>
<content type='text'>
quotad and ganesha.nfsd prints many logs as,

[rpc-clnt.c:1739:rpc_clnt_submit ] 0-&lt;VOLUME_NAME&gt;-quota: error returned while attempting to connect to host: (null), port 0

Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef
Updates: bz#1596787
Signed-off-by: Kinglong Mee &lt;mijinlong@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
quotad and ganesha.nfsd prints many logs as,

[rpc-clnt.c:1739:rpc_clnt_submit ] 0-&lt;VOLUME_NAME&gt;-quota: error returned while attempting to connect to host: (null), port 0

Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef
Updates: bz#1596787
Signed-off-by: Kinglong Mee &lt;mijinlong@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs: Move devel headers under glusterfs directory</title>
<updated>2018-12-05T21:47:04+00:00</updated>
<author>
<name>ShyamsundarR</name>
<email>srangana@redhat.com</email>
</author>
<published>2018-11-29T19:08:06+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5'/>
<id>20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5</id>
<content type='text'>
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation &lt;&gt; in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation &lt;&gt; in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpcsvc: provide each request handler thread its own queue</title>
<updated>2018-11-29T01:19:12+00:00</updated>
<author>
<name>Raghavendra Gowdappa</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2018-10-31T10:40:58+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=95e380eca19b9f0d03a53429535f15556e5724ad'/>
<id>95e380eca19b9f0d03a53429535f15556e5724ad</id>
<content type='text'>
A single global per program queue is contended by all request handler
threads and event threads. This can lead to high contention. So,
reduce the contention by providing each request handler thread its own
private queue.

Thanks to "Manoj Pillai"&lt;mpillai@redhat.com&gt; for the idea of pairing a
single queue with a fixed request-handler-thread and event-thread,
which brought down the performance regression due to overhead of
queuing significantly.

Thanks to "Xavi Hernandez"&lt;xhernandez@redhat.com&gt; for discussion on
how to communicate the event-thread death to request-handler-thread.

Thanks to "Karan Sandha"&lt;ksandha@redhat.com&gt; for voluntarily running
the perf benchmarks to qualify that performance regression introduced
by ping-timer-fixes is fixed with this patch and patiently running
many iterations of regression tests while RCAing the issue.

Thanks to "Milind Changire"&lt;mchangir@redhat.com&gt; for patiently running
the many iterations of perf benchmarking tests while RCAing the
regression caused by ping-timer-expiry fixes.

Change-Id: I578c3fc67713f4234bd3abbec5d3fbba19059ea5
Fixes: bz#1644629
Signed-off-by: Raghavendra Gowdappa &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A single global per program queue is contended by all request handler
threads and event threads. This can lead to high contention. So,
reduce the contention by providing each request handler thread its own
private queue.

Thanks to "Manoj Pillai"&lt;mpillai@redhat.com&gt; for the idea of pairing a
single queue with a fixed request-handler-thread and event-thread,
which brought down the performance regression due to overhead of
queuing significantly.

Thanks to "Xavi Hernandez"&lt;xhernandez@redhat.com&gt; for discussion on
how to communicate the event-thread death to request-handler-thread.

Thanks to "Karan Sandha"&lt;ksandha@redhat.com&gt; for voluntarily running
the perf benchmarks to qualify that performance regression introduced
by ping-timer-fixes is fixed with this patch and patiently running
many iterations of regression tests while RCAing the issue.

Thanks to "Milind Changire"&lt;mchangir@redhat.com&gt; for patiently running
the many iterations of perf benchmarking tests while RCAing the
regression caused by ping-timer-expiry fixes.

Change-Id: I578c3fc67713f4234bd3abbec5d3fbba19059ea5
Fixes: bz#1644629
Signed-off-by: Raghavendra Gowdappa &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
