<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/features/shard/src, branch v6.10</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>features/shard: Fix crash during shards cleanup in error cases</title>
<updated>2020-07-08T02:31:44+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-03-23T06:17:10+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=663fb1c38150f2662d9f9c796ab9b0a57b7f9f97'/>
<id>663fb1c38150f2662d9f9c796ab9b0a57b7f9f97</id>
<content type='text'>
A crash is seen during a reattempt to clean up shards in background
upon remount. And this happens even on remount (which means a remount
is no workaround for the crash).

In such a situation, the in-memory base inode object will not be
existent (new process, non-existent base shard).
So local-&gt;resolver_base_inode will be NULL.

In the event of an error (in this case, of space running out), the
process would crash at the time of logging the error in the following line -

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local-&gt;resolver_base_inode-&gt;gfid));

Fixed that by using local-&gt;base_gfid as the source of gfid when
local-&gt;resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A crash is seen during a reattempt to clean up shards in background
upon remount. And this happens even on remount (which means a remount
is no workaround for the crash).

In such a situation, the in-memory base inode object will not be
existent (new process, non-existent base shard).
So local-&gt;resolver_base_inode will be NULL.

In the event of an error (in this case, of space running out), the
process would crash at the time of logging the error in the following line -

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local-&gt;resolver_base_inode-&gt;gfid));

Fixed that by using local-&gt;base_gfid as the source of gfid when
local-&gt;resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Send correct size when reads are sent beyond file size</title>
<updated>2019-10-24T09:24:22+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-08-07T06:42:43+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=02f26172c00143953310e9dd4f8ec08f3de66954'/>
<id>02f26172c00143953310e9dd4f8ec08f3de66954</id>
<content type='text'>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes bz#1737141
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
fixes bz#1737141
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: support split-brain CLI for replica 3</title>
<updated>2019-10-17T10:51:33+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-09-28T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=50dbcd45fa3165247608e2b889d6a802ba5d6323'/>
<id>50dbcd45fa3165247608e2b889d6a802ba5d6323</id>
<content type='text'>
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3 but we do
see (data/metadata) split-brain cases once in a while which indicate that there are
a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit  8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760792
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3 but we do
see (data/metadata) split-brain cases once in a while which indicate that there are
a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit  8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1760792
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix block-count accounting upon truncate to lower size</title>
<updated>2019-07-03T11:16:10+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-05-08T07:30:51+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=25e4a1249f3904a2a918194541566e3cda512c6b'/>
<id>25e4a1249f3904a2a918194541566e3cda512c6b</id>
<content type='text'>
Backport of:
&gt; BUG: bz#1705884
&gt; Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
&gt; Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;

The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.

FIX:

Get the block-count of each shard just before an unlink at posix in
xdata.  Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.

Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1716871
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 400b66d568ad18fefcb59949d1f8368d487b9a80)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of:
&gt; BUG: bz#1705884
&gt; Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
&gt; Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;

The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.

FIX:

Get the block-count of each shard just before an unlink at posix in
xdata.  Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.

Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1716871
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit 400b66d568ad18fefcb59949d1f8368d487b9a80)
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix integer overflow in block count accounting</title>
<updated>2019-07-03T11:16:01+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-05-03T05:20:40+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=1c635e5c6961ea0fc625af6399a4727542599653'/>
<id>1c635e5c6961ea0fc625af6399a4727542599653</id>
<content type='text'>
Backport of:
&gt; BUG: bz#1705884
&gt; Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb
&gt; Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;

... by holding delta_blocks in 64-bit int as opposed to 32-bit int.

Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb
updates: bz#1716871
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit e18e98659dd2b41eb59cf593fd625f1821a20abf)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of:
&gt; BUG: bz#1705884
&gt; Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb
&gt; Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;

... by holding delta_blocks in 64-bit int as opposed to 32-bit int.

Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb
updates: bz#1716871
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit e18e98659dd2b41eb59cf593fd625f1821a20abf)
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Ref shard inode while adding to fsync list</title>
<updated>2019-01-24T18:14:57+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2019-01-24T08:44:39+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=72922c1fd69191b220f79905a23395c3a87f86ce'/>
<id>72922c1fd69191b220f79905a23395c3a87f86ce</id>
<content type='text'>
PROBLEM:

Lot of the earlier changes in the management of shards in lru, fsync
lists assumed that if a given shard exists in fsync list, it must be
part of lru list as well. This was found to be not true.

Consider this - a file is FALLOCATE'd to a size which would make the
number of participant shards to be greater than the lru list size.
In this case, some of the resolved shards that are to participate in
this fop will be evicted from lru list to give way to the rest of the
shards. And once FALLOCATE completes, these shards are added to fsync
list but without a ref. After the fop completes, these shard inodes
are unref'd and destroyed while their inode ctxs are still part of
fsync list. Now when an FSYNC is called on the base file and the
fsync-list traversed, the client crashes due to illegal memory access.

FIX:

Hold a ref on the shard inode when adding to fsync list as well.
And unref under following conditions:
1. when the shard is evicted from lru list
2. when the base file is fsync'd
3. when the shards are deleted.

Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2
fixes: bz#1669077
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PROBLEM:

Lot of the earlier changes in the management of shards in lru, fsync
lists assumed that if a given shard exists in fsync list, it must be
part of lru list as well. This was found to be not true.

Consider this - a file is FALLOCATE'd to a size which would make the
number of participant shards to be greater than the lru list size.
In this case, some of the resolved shards that are to participate in
this fop will be evicted from lru list to give way to the rest of the
shards. And once FALLOCATE completes, these shards are added to fsync
list but without a ref. After the fop completes, these shard inodes
are unref'd and destroyed while their inode ctxs are still part of
fsync list. Now when an FSYNC is called on the base file and the
fsync-list traversed, the client crashes due to illegal memory access.

FIX:

Hold a ref on the shard inode when adding to fsync list as well.
And unref under following conditions:
1. when the shard is evicted from lru list
2. when the base file is fsync'd
3. when the shards are deleted.

Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2
fixes: bz#1669077
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix launch of multiple synctasks for background deletion</title>
<updated>2019-01-11T08:35:55+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2018-12-28T13:23:15+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=c0c2022e7d7097e96270a74f37813eda0c4e6339'/>
<id>c0c2022e7d7097e96270a74f37813eda0c4e6339</id>
<content type='text'>
PROBLEM:

When multiple sharded files are deleted in quick succession, multiple
issues were observed:
1. misleading logs corresponding to a sharded file where while one log
   message said the shards corresponding to the file were deleted
   successfully, this was followed by multiple logs suggesting the very
   same operation failed. This was because of multiple synctasks
   attempting to clean up shards of the same file and only one of them
   succeeding (the one that gets ENTRYLK successfully), and the rest of
   them logging failure.

2. multiple synctasks to do background deletion would be launched, one
   for each deleted file but all of them could readdir entries from
   .remove_me at the same time could potentially contend for ENTRYLK on
   .shard for each of the entry names. This is undesirable and wasteful.

FIX:
Background deletion will now follow a state machine. In the event that
there are multiple attempts to launch synctask for background deletion,
one for each file deleted, only the first task is launched. And if while
this task is doing the cleanup, more attempts are made to delete other
files, the state of the synctask is adjusted so that it restarts the
crawl even after reaching end-of-directory to pick up any files it may
have missed in the previous iteration.

This patch also fixes uninitialized lk-owner during syncop_entrylk()
which was leading to multiple background deletion synctasks entering
the critical section at the same time and leading to illegal memory access
of base inode in the second syntcask after it was destroyed post shard deletion
by the first synctask.

Change-Id: Ib33773d27fb4be463c7a8a5a6a4b63689705324e
updates: bz#1662368
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PROBLEM:

When multiple sharded files are deleted in quick succession, multiple
issues were observed:
1. misleading logs corresponding to a sharded file where while one log
   message said the shards corresponding to the file were deleted
   successfully, this was followed by multiple logs suggesting the very
   same operation failed. This was because of multiple synctasks
   attempting to clean up shards of the same file and only one of them
   succeeding (the one that gets ENTRYLK successfully), and the rest of
   them logging failure.

2. multiple synctasks to do background deletion would be launched, one
   for each deleted file but all of them could readdir entries from
   .remove_me at the same time could potentially contend for ENTRYLK on
   .shard for each of the entry names. This is undesirable and wasteful.

FIX:
Background deletion will now follow a state machine. In the event that
there are multiple attempts to launch synctask for background deletion,
one for each file deleted, only the first task is launched. And if while
this task is doing the cleanup, more attempts are made to delete other
files, the state of the synctask is adjusted so that it restarts the
crawl even after reaching end-of-directory to pick up any files it may
have missed in the previous iteration.

This patch also fixes uninitialized lk-owner during syncop_entrylk()
which was leading to multiple background deletion synctasks entering
the critical section at the same time and leading to illegal memory access
of base inode in the second syntcask after it was destroyed post shard deletion
by the first synctask.

Change-Id: Ib33773d27fb4be463c7a8a5a6a4b63689705324e
updates: bz#1662368
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Assign fop id during background deletion to prevent excessive logging</title>
<updated>2019-01-08T12:08:48+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2018-12-28T01:57:11+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=aa28fe32364e39981981d18c784e7f396d56153f'/>
<id>aa28fe32364e39981981d18c784e7f396d56153f</id>
<content type='text'>
... of the kind

"[2018-12-26 05:22:44.195019] E [MSGID: 133010]
[shard.c:2253:shard_common_lookup_shards_cbk] 0-volume1-shard: Lookup
on shard 785 failed. Base file gfid = cd938e64-bf06-476f-a5d4-d580a0d37416
[No such file or directory]"

shard_common_lookup_shards_cbk() has a specific check to ignore ENOENT error without
logging them during specific fops. But because background deletion is done in a new
frame (with local-&gt;fop being GF_FOP_NULL), the ENOENT check is skipped and the
absence of shards gets logged everytime.

To fix this, local-&gt;fop is initialized to GF_FOP_UNLINK during background deletion.

Change-Id: I0ca8d3b3bfbcd354b4a555eee520eb0479bcda35
updates: bz#1662368
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... of the kind

"[2018-12-26 05:22:44.195019] E [MSGID: 133010]
[shard.c:2253:shard_common_lookup_shards_cbk] 0-volume1-shard: Lookup
on shard 785 failed. Base file gfid = cd938e64-bf06-476f-a5d4-d580a0d37416
[No such file or directory]"

shard_common_lookup_shards_cbk() has a specific check to ignore ENOENT error without
logging them during specific fops. But because background deletion is done in a new
frame (with local-&gt;fop being GF_FOP_NULL), the ENOENT check is skipped and the
absence of shards gets logged everytime.

To fix this, local-&gt;fop is initialized to GF_FOP_UNLINK during background deletion.

Change-Id: I0ca8d3b3bfbcd354b4a555eee520eb0479bcda35
updates: bz#1662368
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>New xlator option to control enable/disable of xlators in Gd2</title>
<updated>2018-12-07T05:28:23+00:00</updated>
<author>
<name>Aravinda VK</name>
<email>avishwan@redhat.com</email>
</author>
<published>2018-12-06T09:39:26+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=40a9e53a44e887658fade3f03afc018e82b941b9'/>
<id>40a9e53a44e887658fade3f03afc018e82b941b9</id>
<content type='text'>
Since glusterd2 don't maintain the xlator option details in code, it
directly reads the xlators options table from `*.so` files. To support
enable and disable of xlator new option added to the option table with
the name same as xlator name itself.

This change will not affect the functionality with glusterd1.

Change-Id: I23d9e537f3f422de72ddb353484466d3519de0c1
updates: #302
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since glusterd2 don't maintain the xlator option details in code, it
directly reads the xlators options table from `*.so` files. To support
enable and disable of xlator new option added to the option table with
the name same as xlator name itself.

This change will not affect the functionality with glusterd1.

Change-Id: I23d9e537f3f422de72ddb353484466d3519de0c1
updates: #302
Signed-off-by: Aravinda VK &lt;avishwan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>all: add xlator_api to many translators</title>
<updated>2018-12-06T07:54:28+00:00</updated>
<author>
<name>Amar Tumballi</name>
<email>amarts@redhat.com</email>
</author>
<published>2018-11-28T04:35:39+00:00</published>
<link rel='alternate' type='text/html' href='http://dev.gluster.org/cgit/glusterfs.git/commit/?id=340e58f9b3bcdfe4314da65e592dcd5c2daf6fd9'/>
<id>340e58f9b3bcdfe4314da65e592dcd5c2daf6fd9</id>
<content type='text'>
Fixes: #164
Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f
Signed-off-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes: #164
Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f
Signed-off-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
