<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/performance/write-behind/src/write-behind.c, branch v4.0.0-2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>performance/write-behind: volume option fixes for GD2</title>
<updated>2018-01-22T10:18:08+00:00</updated>
<author>
<name>Csaba Henk</name>
<email>csaba@redhat.com</email>
</author>
<published>2018-01-06T01:53:45+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=b1abe3c1baa5c132d32f7c6eedf3f9255534ed37'/>
<id>b1abe3c1baa5c132d32f7c6eedf3f9255534ed37</id>
<content type='text'>
Updates #302

Change-Id: I2ade3dc4aef4da619c5b64058872594f0f1e004f
Signed-off-by: Csaba Henk &lt;csaba@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Updates #302

Change-Id: I2ade3dc4aef4da619c5b64058872594f0f1e004f
Signed-off-by: Csaba Henk &lt;csaba@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: fix bug while handling short writes</title>
<updated>2017-12-22T17:44:57+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2017-12-22T06:32:09+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=0bc22bef7f3c24663aadfb3548b348aa121e3047'/>
<id>0bc22bef7f3c24663aadfb3548b348aa121e3047</id>
<content type='text'>
The variabled "fulfilled" in wb_fulfill_short_write is not reset to 0
while handling every member of the list.

This has some interesting consequences:

* If we break from the loop while processing last member of the list
  head-&gt;winds, req is reset to head as the list is a circular
  one. However, head is already fulfilled and can potentially be
  freed. So, we end up adding a freed request to wb_inode-&gt;todo
  list. This is the RCA for the crash tracked by the bug associated
  with this patch (Note that we saw "holder" which is freed in todo
  list).

* If we break from the loop while processing any of the last but one
  member of the list head-&gt;winds, req is set to next member in the
  list, skipping the current request, even though it is not entirely
  synced. This can lead to data corruption.

The fix is very simple and we've to change the code to make sure
"fulfilled" reflects whether the current request is fulfilled or not
and it doesn't carry history of previous requests in the list.

Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
BUG: 1528558
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The variabled "fulfilled" in wb_fulfill_short_write is not reset to 0
while handling every member of the list.

This has some interesting consequences:

* If we break from the loop while processing last member of the list
  head-&gt;winds, req is reset to head as the list is a circular
  one. However, head is already fulfilled and can potentially be
  freed. So, we end up adding a freed request to wb_inode-&gt;todo
  list. This is the RCA for the crash tracked by the bug associated
  with this patch (Note that we saw "holder" which is freed in todo
  list).

* If we break from the loop while processing any of the last but one
  member of the list head-&gt;winds, req is set to next member in the
  list, skipping the current request, even though it is not entirely
  synced. This can lead to data corruption.

The fix is very simple and we've to change the code to make sure
"fulfilled" reflects whether the current request is fulfilled or not
and it doesn't carry history of previous requests in the list.

Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
BUG: 1528558
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>core: make gf_boolean_t a C99 bool instead of an enum</title>
<updated>2017-11-03T05:04:11+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@fb.com</email>
</author>
<published>2017-10-31T16:56:11+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=8e973d3ab96d290a32ae3fdbdd1cf867b7060483'/>
<id>8e973d3ab96d290a32ae3fdbdd1cf867b7060483</id>
<content type='text'>
This reduces the space used from four bytes to one, and allows
new code to use familiar C99 types/values interoperably with our
old cruft. It does *not* change current declarations or code;
that will be left for a separate - much larger - patch.

Updates: #80

Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e
Signed-off-by: Jeff Darcy &lt;jdarcy@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reduces the space used from four bytes to one, and allows
new code to use familiar C99 types/values interoperably with our
old cruft. It does *not* change current declarations or code;
that will be left for a separate - much larger - patch.

Updates: #80

Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e
Signed-off-by: Jeff Darcy &lt;jdarcy@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: Honor the client pid set</title>
<updated>2017-03-10T05:16:59+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-03-06T15:34:05+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=b9e1c911833ca1916055622e5265672d5935d925'/>
<id>b9e1c911833ca1916055622e5265672d5935d925</id>
<content type='text'>
write-behind xlator does not honor the client pid being
set. It doesn't pass down the client pid saved in
'frame-&gt;root-&gt;pid'. This patch fixes the same.

Change-Id: I838dcf43f56d6d0aa1d2c88811a2b271d9e88d05
BUG: 1430608
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16854
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
write-behind xlator does not honor the client pid being
set. It doesn't pass down the client pid saved in
'frame-&gt;root-&gt;pid'. This patch fixes the same.

Change-Id: I838dcf43f56d6d0aa1d2c88811a2b271d9e88d05
BUG: 1430608
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16854
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>events: use attribute(format(/printf)) to catch fmt string errors</title>
<updated>2017-02-26T19:15:14+00:00</updated>
<author>
<name>Kaleb S. KEITHLEY</name>
<email>kkeithle@redhat.com</email>
</author>
<published>2016-11-18T15:05:12+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=4638dfc1fee80f9338f2941f3cccb17bec63989a'/>
<id>4638dfc1fee80f9338f2941f3cccb17bec63989a</id>
<content type='text'>
and statedump too. Also "const char *" (versus just "char *") for the
fmt param.

Change-Id: Ic63734a673208a2cd49aebccce7659816e6179e3
BUG: 1399196
Signed-off-by: Kaleb S. KEITHLEY &lt;kkeithle@redhat.com&gt;
Reviewed-on: https://review.gluster.org/15881
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
and statedump too. Also "const char *" (versus just "char *") for the
fmt param.

Change-Id: Ic63734a673208a2cd49aebccce7659816e6179e3
BUG: 1399196
Signed-off-by: Kaleb S. KEITHLEY &lt;kkeithle@redhat.com&gt;
Reviewed-on: https://review.gluster.org/15881
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: access stub only if available during</title>
<updated>2017-02-02T10:02:30+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2017-01-20T10:39:13+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=85d7f1d1ee24ac400d4aa223478727643532693a'/>
<id>85d7f1d1ee24ac400d4aa223478727643532693a</id>
<content type='text'>
statedump

Change-Id: Ia5dd718458a5e32138012f81f014d13fc6b28be2
BUG: 1415115
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16440
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
statedump

Change-Id: Ia5dd718458a5e32138012f81f014d13fc6b28be2
BUG: 1415115
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16440
Reviewed-by: N Balachandran &lt;nbalacha@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: do __wb_request_unref within locks</title>
<updated>2017-01-27T03:04:40+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2017-01-24T08:48:03+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=a3b4c70afee89536374f6fa032465cc313437956'/>
<id>a3b4c70afee89536374f6fa032465cc313437956</id>
<content type='text'>
Since __wb_request_unref can remove the request from various lists,
calling it without holding wb_inode-&gt;lock results in corruptions when
other threads simultaneously try to access the lists this request is
part of.

Thanks to "Nithya Balachandran" &lt;nbalacha@redhat.com&gt; for pointing out
the bug.

Change-Id: I78fb6433c2e212500d07780f7b45c5a0e2bf9209
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16464
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since __wb_request_unref can remove the request from various lists,
calling it without holding wb_inode-&gt;lock results in corruptions when
other threads simultaneously try to access the lists this request is
part of.

Thanks to "Nithya Balachandran" &lt;nbalacha@redhat.com&gt; for pointing out
the bug.

Change-Id: I78fb6433c2e212500d07780f7b45c5a0e2bf9209
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16464
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: Add debug messages</title>
<updated>2017-01-09T04:33:43+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2016-12-26T09:46:10+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=521c55c53bd42bfdcc0919019ee81c81305382a2'/>
<id>521c55c53bd42bfdcc0919019ee81c81305382a2</id>
<content type='text'>
Change-Id: I2ea1350fcbe4b6c06dcb8093b28316f734cd3b48
BUG: 1379655
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16285
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I2ea1350fcbe4b6c06dcb8093b28316f734cd3b48
BUG: 1379655
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16285
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: Add more fops for checking dependency with</title>
<updated>2016-12-12T13:05:46+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2016-10-31T05:19:09+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9c769c6ee1d125b6bab513073767b628b60abeeb'/>
<id>9c769c6ee1d125b6bab513073767b628b60abeeb</id>
<content type='text'>
cached writes

Fops like readdirp, link, fallocate, discard, zerofill return iatt of
files in their responses. This iatt can be cached by md-cache. Hence
it is important that write-behind maintains relative ordering of these
fops with cached writes. Failure to do so, can result in md-cache
storing stale iatts and returning the same to applications.

Change-Id: Icfe12ad807e42fe9e52a9f63e47ce63f511c6946
BUG: 1390050
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15757
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cached writes

Fops like readdirp, link, fallocate, discard, zerofill return iatt of
files in their responses. This iatt can be cached by md-cache. Hence
it is important that write-behind maintains relative ordering of these
fops with cached writes. Failure to do so, can result in md-cache
storing stale iatts and returning the same to applications.

Change-Id: Icfe12ad807e42fe9e52a9f63e47ce63f511c6946
BUG: 1390050
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15757
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>performance/write-behind: fix flush stuck by former failed writes</title>
<updated>2016-11-02T04:36:35+00:00</updated>
<author>
<name>Ryan Ding</name>
<email>ryan.ding@open-fs.com</email>
</author>
<published>2016-09-01T07:40:35+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=9340b3c7a6c8556d6f1d4046de0dbd1946a64963'/>
<id>9340b3c7a6c8556d6f1d4046de0dbd1946a64963</id>
<content type='text'>
the issue is happened in this case:
assume a file is opened with fd1 and fd2.
1. some WRITE opto fd1 got error, they were add back to 'todo' queue
   because of those error.
2. fd2 closed, a FLUSH op is send to write-behind.
3. FLUSH can not be unwind because it's not a legal waiter for those
   failed write(as func __wb_request_waiting_on() say). and those failed
   WRITE also can not be ended if fd1 is not closed. fd2 stuck in close
   syscall.

to resolve this issue, we can change the way we determine 2 requests is
'conflict': flush/fsync is not conflict with those write that is not
belonged to them. so __wb_pick_winds() can wind the FLUSH op.

below is some information when the stuck issue happen:
glusterdump logs:
[xlator.performance.write-behind.wb_inode]
path=/ltp-F9eG0ZSOME/rw-buffered-16436
inode=0x7fdbe8039b9c
window_conf=1048576
window_current=249856
transit-size=0
dontsync=0

[.WRITE]
request-ptr=0x7fdbe8020200
refcount=1
wound=no
generation-number=4
req-&gt;op_ret=-1
req-&gt;op_errno=116
sync-attempts=3
sync-in-progress=no
size=131072
offset=1220608
lied=-1
append=0
fulfilled=0
go=0

[.WRITE]
request-ptr=0x7fdbe8068c30
refcount=1
wound=no
generation-number=5
req-&gt;op_ret=-1
req-&gt;op_errno=116
sync-attempts=2
sync-in-progress=no
size=118784
offset=1351680
lied=-1
append=0
fulfilled=0
go=0

[.FLUSH]
request-ptr=0x7fdbe8021cd0
refcount=1
wound=no
generation-number=6
req-&gt;op_ret=0
req-&gt;op_errno=0
sync-attempts=0

gdb detail about above 3 requests:
(gdb) print *((wb_request_t *)0x7fdbe8021cd0)
$2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next
= 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0,
    prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev =
0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev =
0x7fdbe8021d10}, wip = {
    next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub =
0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret =
0, op_errno = 0,
  refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner
= {len = 8, data = "W\322T\f\271\367y$", '\000' &lt;repeats 1015 times&gt;},
  iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering =
{size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0,
    go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8020200)
$3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next
= 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50,
    prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev =
0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev =
0x7fdbe8020240}, wip = {
    next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub =
0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0,
op_ret = -1,
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' &lt;repeats 1023 times&gt;},
  iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3,
ordering = {size = 131072, off = 1220608, append = 0, tempted = -1,
    lied = -1, fulfilled = 0, go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8068c30)
$4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next
= 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628,
    prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev =
0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev =
0x7fdbe8068c70}, wip = {
    next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub =
0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0,
op_ret = -1,
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' &lt;repeats 1023 times&gt;},
  iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2,
ordering = {size = 118784, off = 1351680, append = 0, tempted = -1,
    lied = -1, fulfilled = 0, go = 0}}

you can see they are all on 'todo' queue, and FLUSH op fd is not the
same WRITE op fd.

Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab
BUG: 1372211
Signed-off-by: Ryan Ding &lt;ryan.ding@open-fs.com&gt;
Reviewed-on: http://review.gluster.org/15380
Tested-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the issue is happened in this case:
assume a file is opened with fd1 and fd2.
1. some WRITE opto fd1 got error, they were add back to 'todo' queue
   because of those error.
2. fd2 closed, a FLUSH op is send to write-behind.
3. FLUSH can not be unwind because it's not a legal waiter for those
   failed write(as func __wb_request_waiting_on() say). and those failed
   WRITE also can not be ended if fd1 is not closed. fd2 stuck in close
   syscall.

to resolve this issue, we can change the way we determine 2 requests is
'conflict': flush/fsync is not conflict with those write that is not
belonged to them. so __wb_pick_winds() can wind the FLUSH op.

below is some information when the stuck issue happen:
glusterdump logs:
[xlator.performance.write-behind.wb_inode]
path=/ltp-F9eG0ZSOME/rw-buffered-16436
inode=0x7fdbe8039b9c
window_conf=1048576
window_current=249856
transit-size=0
dontsync=0

[.WRITE]
request-ptr=0x7fdbe8020200
refcount=1
wound=no
generation-number=4
req-&gt;op_ret=-1
req-&gt;op_errno=116
sync-attempts=3
sync-in-progress=no
size=131072
offset=1220608
lied=-1
append=0
fulfilled=0
go=0

[.WRITE]
request-ptr=0x7fdbe8068c30
refcount=1
wound=no
generation-number=5
req-&gt;op_ret=-1
req-&gt;op_errno=116
sync-attempts=2
sync-in-progress=no
size=118784
offset=1351680
lied=-1
append=0
fulfilled=0
go=0

[.FLUSH]
request-ptr=0x7fdbe8021cd0
refcount=1
wound=no
generation-number=6
req-&gt;op_ret=0
req-&gt;op_errno=0
sync-attempts=0

gdb detail about above 3 requests:
(gdb) print *((wb_request_t *)0x7fdbe8021cd0)
$2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next
= 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0,
    prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev =
0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev =
0x7fdbe8021d10}, wip = {
    next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub =
0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret =
0, op_errno = 0,
  refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner
= {len = 8, data = "W\322T\f\271\367y$", '\000' &lt;repeats 1015 times&gt;},
  iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering =
{size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0,
    go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8020200)
$3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next
= 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50,
    prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev =
0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev =
0x7fdbe8020240}, wip = {
    next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub =
0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0,
op_ret = -1,
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' &lt;repeats 1023 times&gt;},
  iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3,
ordering = {size = 131072, off = 1220608, append = 0, tempted = -1,
    lied = -1, fulfilled = 0, go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8068c30)
$4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next
= 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628,
    prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev =
0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev =
0x7fdbe8068c70}, wip = {
    next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub =
0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0,
op_ret = -1,
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' &lt;repeats 1023 times&gt;},
  iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2,
ordering = {size = 118784, off = 1351680, append = 0, tempted = -1,
    lied = -1, fulfilled = 0, go = 0}}

you can see they are all on 'todo' queue, and FLUSH op fd is not the
same WRITE op fd.

Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab
BUG: 1372211
Signed-off-by: Ryan Ding &lt;ryan.ding@open-fs.com&gt;
Reviewed-on: http://review.gluster.org/15380
Tested-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
