<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/afr/src/afr-transaction.c, branch exp</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: Fix per-txn optimistic changelog initialisation</title>
<updated>2016-12-12T16:38:49+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-12-08T17:19:48+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=2b76520ca3e41cbac8f9318dce87e0b8d670c0ee'/>
<id>2b76520ca3e41cbac8f9318dce87e0b8d670c0ee</id>
<content type='text'>
Incorrect initialisation of local-&gt;optimistic_change_log was leading
to skipped pre-op and post-op even when a brick didn't participate in
the txn because it was down.
The result - missing granular name index resulting in some entries
never getting healed.

FIX:
Initialise local-&gt;optimistic_change_log just before pre-op.

Also fixed granular entry heal to create the granular name index in
pre-op as opposed to post-op. This is to prevent loss of granular
information when, during an entry txn, the good (src) brick goes
offline before the post-op is done. This would cause self-heal to
do conservative merge (since dirty xattr is the only information
available), which when granular-entry-heal is enabled, expects
granular indices, the lack of which can lead to loss of data in
the worst case.

Change-Id: Ia3ad716d6fb1821555f02180e86e8711a79f958d
BUG: 1402730
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16075
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Incorrect initialisation of local-&gt;optimistic_change_log was leading
to skipped pre-op and post-op even when a brick didn't participate in
the txn because it was down.
The result - missing granular name index resulting in some entries
never getting healed.

FIX:
Initialise local-&gt;optimistic_change_log just before pre-op.

Also fixed granular entry heal to create the granular name index in
pre-op as opposed to post-op. This is to prevent loss of granular
information when, during an entry txn, the good (src) brick goes
offline before the post-op is done. This would cause self-heal to
do conservative merge (since dirty xattr is the only information
available), which when granular-entry-heal is enabled, expects
granular indices, the lack of which can lead to loss of data in
the worst case.

Change-Id: Ia3ad716d6fb1821555f02180e86e8711a79f958d
BUG: 1402730
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16075
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Serialize conflicting locks on all subvols</title>
<updated>2016-12-07T06:47:40+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-12-01T04:12:19+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a7d7ed90c9272a42168a91f92754d3a4be605da5'/>
<id>a7d7ed90c9272a42168a91f92754d3a4be605da5</id>
<content type='text'>
Problem:
1) When a blocking lock is issued and the parallel lock phase fails
on all subvolumes with EAGAIN, it does not switch to the serialized
locking phase.
2) When quorum is enabled and locks fail partially it is better
to give errno returned by brick rather than the default
quorum errno.

Fix:
Handled this error case and changed op_errno to reflect the actual
errno in case of quorum error.

BUG: 1369077
Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15984
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
1) When a blocking lock is issued and the parallel lock phase fails
on all subvolumes with EAGAIN, it does not switch to the serialized
locking phase.
2) When quorum is enabled and locks fail partially it is better
to give errno returned by brick rather than the default
quorum errno.

Fix:
Handled this error case and changed op_errno to reflect the actual
errno in case of quorum error.

BUG: 1369077
Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15984
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: fix bug in passing child index in afr_inode_write_fill</title>
<updated>2016-12-06T12:12:00+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2016-12-05T15:44:57+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=ca13525a5de8db745878c4cdf89a45b76a9e62c6'/>
<id>ca13525a5de8db745878c4cdf89a45b76a9e62c6</id>
<content type='text'>
Change-Id: I7b70de317a5f15a3bf483ffe40b971143deddc11
BUG: 1401218
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16029
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I7b70de317a5f15a3bf483ffe40b971143deddc11
BUG: 1401218
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16029
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr, client: More mem-leak fixes in COMPOUND fop cbk</title>
<updated>2016-12-05T01:28:40+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-12-03T03:39:15+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c89cb610f51e7a5df5c4b7e9378a7ac8ac513e46'/>
<id>c89cb610f51e7a5df5c4b7e9378a7ac8ac513e46</id>
<content type='text'>
Bugs found and fixed:
1. Use correct subvolume index in pre-op-writev compound cbk
2. Prevent use-after-free of local-&gt;compound_args members in
   compound fops cbk in protocol/client
3. Fix xdata and xattr leaks in client_process_response
4. Fix possible leak of xdata in client_pre_writev() in
   test mode.
5. Free req-&gt;compound_req_array.compound_req_array_val as well
   after freeing its members
6. Free tmp_rsp-&gt;flock.lk_owner.lk_owner_val in LK fop.

Change-Id: I15b646d7d4e0e5cd4ea3d2d6452c815cf2eaf68f
BUG: 1401218
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16020
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Bugs found and fixed:
1. Use correct subvolume index in pre-op-writev compound cbk
2. Prevent use-after-free of local-&gt;compound_args members in
   compound fops cbk in protocol/client
3. Fix xdata and xattr leaks in client_process_response
4. Fix possible leak of xdata in client_pre_writev() in
   test mode.
5. Free req-&gt;compound_req_array.compound_req_array_val as well
   after freeing its members
6. Free tmp_rsp-&gt;flock.lk_owner.lk_owner_val in LK fop.

Change-Id: I15b646d7d4e0e5cd4ea3d2d6452c815cf2eaf68f
BUG: 1401218
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16020
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: fix auto-quorum</title>
<updated>2016-11-29T06:41:24+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-11-17T15:42:02+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=77f03db0131c88d607886bb02dd2a4276ab584d4'/>
<id>77f03db0131c88d607886bb02dd2a4276ab584d4</id>
<content type='text'>
(1) afr_have_quorum is dead code.  It was copied to afr_has_quorum,
and everything else uses that, but the original was never deleted
(until now).

(2) Auto-quorum should be default for any N&gt;2.  Leaving quorum
disabled is BAD, but apparently deemed acceptable for N=2 because
there's no real quorum in that case.  For any larger number (including
arbiter configurations) there is such a thing as real quorum and we
should use it by default.  Note that for N=3 the answers we get from
"N % 2" (the old check) and "N &gt; 2" (the new one) are the same.

(3) The special case for even N in afr_has_quorum has been simplified and
explained more thoroughly in a comment.

Change-Id: I48b33c15093512fecf516b26dcf09afecb7ae33b
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15873
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(1) afr_have_quorum is dead code.  It was copied to afr_has_quorum,
and everything else uses that, but the original was never deleted
(until now).

(2) Auto-quorum should be default for any N&gt;2.  Leaving quorum
disabled is BAD, but apparently deemed acceptable for N=2 because
there's no real quorum in that case.  For any larger number (including
arbiter configurations) there is such a thing as real quorum and we
should use it by default.  Note that for N=3 the answers we get from
"N % 2" (the old check) and "N &gt; 2" (the new one) are the same.

(3) The special case for even N in afr_has_quorum has been simplified and
explained more thoroughly in a comment.

Change-Id: I48b33c15093512fecf516b26dcf09afecb7ae33b
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15873
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: Fix the EIO that can occur in afr_inode_refresh as a result</title>
<updated>2016-11-29T04:10:24+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2016-11-21T06:19:35+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=570aefeb280e53e98cb5060cf384f1d74379a521'/>
<id>570aefeb280e53e98cb5060cf384f1d74379a521</id>
<content type='text'>
     of cache invalidation(upcall).

Issue:
------
When a cache invalidation is received as a result of changing
pending xattr, the read_subvol is reset. Consider the below chain
of execution:

CHILD_DOWN
...
afr_readv
...
afr_inode_refresh
...
afr_inode_read_subvol_reset &lt;- as a result of pending xattr set by
                               some other client GF_EVENT_UPCALL will
                               be sent
afr_refresh_done -&gt; this results in an EIO, as the read subvol was
                    reset by the end of the afr_inode_refresh

Solution:
---------
When GF_EVENT_UPCALL is received, instead of resetting read_subvol,
set a variable need_refresh in inode_ctx. The next time someone
starts a txn, need_refresh also needs to be checked along with
event gen.

Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58
BUG: 1396952
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15892
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
     of cache invalidation(upcall).

Issue:
------
When a cache invalidation is received as a result of changing
pending xattr, the read_subvol is reset. Consider the below chain
of execution:

CHILD_DOWN
...
afr_readv
...
afr_inode_refresh
...
afr_inode_read_subvol_reset &lt;- as a result of pending xattr set by
                               some other client GF_EVENT_UPCALL will
                               be sent
afr_refresh_done -&gt; this results in an EIO, as the read subvol was
                    reset by the end of the afr_inode_refresh

Solution:
---------
When GF_EVENT_UPCALL is received, instead of resetting read_subvol,
set a variable need_refresh in inode_ctx. The next time someone
starts a txn, need_refresh also needs to be checked along with
event gen.

Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58
BUG: 1396952
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15892
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: allow I/O when favorite-child-policy is enabled</title>
<updated>2016-11-28T07:51:59+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2016-11-26T15:54:01+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a07ddd8fcc8dcdcf7ccfa61211d258f13b9f9229'/>
<id>a07ddd8fcc8dcdcf7ccfa61211d258f13b9f9229</id>
<content type='text'>
Problem:
Currently, I/O on a split-brained file fails even when the
favorite-child-policy is set until the self-heal is complete.

Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect
and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
refresh the inode and then proceed with the read or write transaction.

The resetting itself happens in the self-heal code and hence can also
happen in the client side background-heal or by the shd's index-heal in
addition to the txn code path explained above. When it happens via
heal, we also add checks in undo-pending to not reset the sink xattrs
again.

Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1386188
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reported-by: Simon Turcotte-Langevin &lt;simon.turcotte-langevin@ubisoft.com&gt;
Reviewed-on: http://review.gluster.org/15673
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Currently, I/O on a split-brained file fails even when the
favorite-child-policy is set until the self-heal is complete.

Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect
and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
refresh the inode and then proceed with the read or write transaction.

The resetting itself happens in the self-heal code and hence can also
happen in the client side background-heal or by the shd's index-heal in
addition to the txn code path explained above. When it happens via
heal, we also add checks in undo-pending to not reset the sink xattrs
again.

Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1386188
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reported-by: Simon Turcotte-Langevin &lt;simon.turcotte-langevin@ubisoft.com&gt;
Reviewed-on: http://review.gluster.org/15673
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fix bugs in [f]inodelk/[f]entrylk</title>
<updated>2016-11-26T15:34:59+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-11-07T09:17:34+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=6be7bd936eb30aa8d2b908061f60e1534e797657'/>
<id>6be7bd936eb30aa8d2b908061f60e1534e797657</id>
<content type='text'>
Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.

Fix:
Implemented a common framework which can be used by
[f]inodelk/[f]entrylk.  Used quorum for the same.

Change-Id: I239f13875a065298630d266941df10cfa3addc85
BUG: 1369077
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15802
Tested-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.

Fix:
Implemented a common framework which can be used by
[f]inodelk/[f]entrylk.  Used quorum for the same.

Change-Id: I239f13875a065298630d266941df10cfa3addc85
BUG: 1369077
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15802
Tested-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fix deadlock due to compound fops</title>
<updated>2016-11-26T11:10:31+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-11-25T10:24:30+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=2fe8ba52108e94268bc816ba79074a96c4538271'/>
<id>2fe8ba52108e94268bc816ba79074a96c4538271</id>
<content type='text'>
When an afr data transaction is eligible for using
eager-lock, this information is represented in
local-&gt;transaction.eager_lock_on. However, if non-blocking
inodelk attempt (which is a full lock) fails, AFR falls back
to blocking locks which are range locks. At this point,
local-&gt;transaction.eager_lock[] per brick is reset but
local-&gt;transaction.eager_lock_on is still true.
When AFR decides to compound post-op and unlock, it is after
confirming that the transaction did not use eager lock (well,
except for a small bug where local-&gt;transaction.locks_acquired[]
is not considered).

But within afr_post_op_unlock_do(), afr again incorrectly sets
the lock range to full-lock based on local-&gt;transaction.eager_lock_on
value. This is a bug and can lead to deadlock since the locks acquired
were range locks and a full unlock is being sent leading to unlock failure
and thereby every other lock request (be it from SHD or other clients or
glfsheal) getting blocked forever and the user perceives a hang.

FIX:
Unconditionally rely on the range locks in inodelk object for unlocking
when using compounded post-op + unlock.

Big thanks to Pranith for helping with the debugging.

Change-Id: Idb4938f90397fb4bd90921f9ae6ea582042e5c67
BUG: 1398566
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15929
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an afr data transaction is eligible for using
eager-lock, this information is represented in
local-&gt;transaction.eager_lock_on. However, if non-blocking
inodelk attempt (which is a full lock) fails, AFR falls back
to blocking locks which are range locks. At this point,
local-&gt;transaction.eager_lock[] per brick is reset but
local-&gt;transaction.eager_lock_on is still true.
When AFR decides to compound post-op and unlock, it is after
confirming that the transaction did not use eager lock (well,
except for a small bug where local-&gt;transaction.locks_acquired[]
is not considered).

But within afr_post_op_unlock_do(), afr again incorrectly sets
the lock range to full-lock based on local-&gt;transaction.eager_lock_on
value. This is a bug and can lead to deadlock since the locks acquired
were range locks and a full unlock is being sent leading to unlock failure
and thereby every other lock request (be it from SHD or other clients or
glfsheal) getting blocked forever and the user perceives a hang.

FIX:
Unconditionally rely on the range locks in inodelk object for unlocking
when using compounded post-op + unlock.

Big thanks to Pranith for helping with the debugging.

Change-Id: Idb4938f90397fb4bd90921f9ae6ea582042e5c67
BUG: 1398566
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15929
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Handle rpc errors, xdr failures etc with proper NULL checks</title>
<updated>2016-11-25T03:03:20+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-11-24T13:06:28+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=3a5169907b44d79e207c35941b1973b1f60d2079'/>
<id>3a5169907b44d79e207c35941b1973b1f60d2079</id>
<content type='text'>
Change-Id: Id8ba76ba116d056bc7299dc5ce0980680a5a23f8
BUG: 1398226
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15924
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: Id8ba76ba116d056bc7299dc5ce0980680a5a23f8
BUG: 1398226
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15924
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
