diff options
author | Pranith Kumar K <pkarampu@redhat.com> | 2017-06-28 09:10:53 +0530 |
---|---|---|
committer | Pranith Kumar Karampuri <pkarampu@redhat.com> | 2017-06-30 05:05:15 +0000 |
commit | 97defef2375b911c7b6a3924c242ba8ef4593686 (patch) | |
tree | 8021d5b796f9bcc40b73d9250d1e5178bb9bde2a | |
parent | 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0 (diff) |
features/shard: Remove ctx from LRU in shard_forget
Problem:
There is a race when the following two commands are executed on the mount in
parallel from two different terminals on a sharded volume,
which leads to use-after-free.
Terminal-1:
while true; do dd if=/dev/zero of=file1 bs=1M count=4; done
Terminal-2:
while true; do cat file1 > /dev/null; done
In the normal case this is the life-cycle of a shard-inode
1) Shard is added to LRU when it is first looked-up
2) For every operation on the shard it is moved up in LRU
3) When "unlink of the shard"/"LRU limit is hit" happens it is removed from LRU
But we are seeing a race where the inode stays in Shard LRU even after it is
forgotten which leads to Use-after-free and then some memory-corruptions.
These are the steps:
1) Shard is added to LRU when it is first looked-up
2) For every operation on the shard it is moved up in LRU
Reader-handler Truncate-handler
1) Reader handler needs shard-x to be read. 1) Truncate has just deleted shard-x
2) In shard_common_resolve_shards(), it does
inode_resolve() and that leads to
a hit in LRU, so it is going to call
__shard_update_shards_inode_list() to move the
inode to top of LRU
2) shard-x gets unlinked from the itable
and inode_forget(inode, 0) is called
to make sure the inode can be purged
upon last unref
3) when __shard_update_shards_inode_list() is
called it finds that the inode is not in LRU
so it adds it back to the LRU-list
Both these operations complete and call inode_unref(shard-x) which leads to the inode
getting freed and forgotten, even when it is in Shard LRU list. When more inodes are
added to LRU, use-after-free will happen and it leads to undefined behaviors.
Fix:
I see that the inode can be removed from LRU even by the protocol layers like gfapi/gNFS
when LRU limit is reached. So it is better to add a check in shard_forget() to remove itself
from LRU list if it exists.
BUG: 1466037
Change-Id: Ia79c0c5c9d5febc56c41ddb12b5daf03e5281638
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17644
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
-rw-r--r-- | tests/bugs/shard/parallel-truncate-read.t | 48 | ||||
-rw-r--r-- | xlators/features/shard/src/shard.c | 13 |
2 files changed, 61 insertions, 0 deletions
diff --git a/tests/bugs/shard/parallel-truncate-read.t b/tests/bugs/shard/parallel-truncate-read.t new file mode 100644 index 00000000000..4de876f58f6 --- /dev/null +++ b/tests/bugs/shard/parallel-truncate-read.t @@ -0,0 +1,48 @@ +#!/bin/bash + +#This test will crash if shard's LRU contains a shard's inode even after the +#inode is forgotten. Minimum time for crash to happen I saw was 180 seconds + +. $(dirname $0)/../../include.rc + +function keep_writing { + cd $M0; + while [ -f /tmp/parallel-truncate-read ] + do + dd if=/dev/zero of=file1 bs=1M count=16 + done + cd +} + +function keep_reading { + cd $M0; + while [ -f /tmp/parallel-truncate-read ] + do + cat file1 > /dev/null + done + cd +} + +cleanup; + +TEST touch /tmp/parallel-truncate-read +TEST glusterd +TEST pidof glusterd +TEST $CLI volume create $V0 replica 3 $H0:$B0/${V0}{0,1,2} +TEST $CLI volume set $V0 features.shard on +TEST $CLI volume set $V0 performance.quick-read off +TEST $CLI volume set $V0 performance.io-cache off +TEST $CLI volume set $V0 performance.stat-prefetch off +TEST $CLI volume set $V0 performance.read-ahead off +TEST $CLI volume start $V0 + +TEST $GFS --volfile-id=$V0 --volfile-server=$H0 $M0 +keep_writing & +keep_reading & +sleep 180 +TEST rm -f /tmp/parallel-truncate-read +wait +#test that the mount is operational +TEST stat $M0 + +cleanup; diff --git a/xlators/features/shard/src/shard.c b/xlators/features/shard/src/shard.c index d7526339591..f892fb69efa 100644 --- a/xlators/features/shard/src/shard.c +++ b/xlators/features/shard/src/shard.c @@ -5011,13 +5011,26 @@ shard_forget (xlator_t *this, inode_t *inode) { uint64_t ctx_uint = 0; shard_inode_ctx_t *ctx = NULL; + shard_priv_t *priv = NULL; + priv = this->private; inode_ctx_del (inode, this, &ctx_uint); if (!ctx_uint) return 0; ctx = (shard_inode_ctx_t *)ctx_uint; + /* When LRU limit reaches inode will be forcefully removed from the + * table, inode needs to be removed from LRU of shard as well. + */ + if (!list_empty (&ctx->ilist)) { + LOCK(&priv->lock); + { + list_del_init (&ctx->ilist); + priv->inode_count--; + } + UNLOCK(&priv->lock); + } GF_FREE (ctx); return 0; |