| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Fixed dht_order_rename_lock to use the same inodelk ordering
as that of the dht selfheal locks (dictionary order of
lock subvolumes).
Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d
BUG: 1570475
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The decision as to which node would migrate a file
was based on the gfid of the file. Files were divided
among the nodes for the replica/disperse set. However,
if a brick was down when rebalance started, the nodeuuids
would be saved as NULL and a set of files would not be migrated.
Now, if the nodeuuid is NULL, the first non-null entry in
the set is the node responsible for migrating the file.
Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
fixes: bz#1566820
Signed-off-by: N Balachandran <nbalacha@redhat.com>
(cherry picked from commit 1f0765242a689980265c472646c64473a92d94c0)
Change-Id: Id1a6e847b0191b6a40707bea789a2a35ea3d9f68
|
|
|
|
|
|
|
|
|
|
|
| |
dht_opendir should wind the open to all subvols
whether or not local->subvols is set. This is
because dht_readdirp winds the calls to all subvols.
Change-Id: I67a96b06dad14a08967c3721301e88555aa01017
updates: bz#1566820
Signed-off-by: N Balachandran <nbalacha@redhat.com>
(cherry picked from commit c4251edec654b4e0127577e004923d9729bc323d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of https://review.gluster.org/19045
Problem:
Setting the write_subvol value to read_subvol in case of metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of arbiter becoming source.
Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx->read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on
the disk
5) read/lookup comes on the same file and refreshes read_subvols back
to all good
6) metadata transaction happens which makes ctx->write_subvol to be
assigned with ctx->read_subvol which is all good
7) w2 succeeds on brick1 and fails on brick0 and this will update the
brick in reverse order leading to arbiter becoming source
Fix:
Instead of setting the ctx->write_subvol to ctx->read_subvol in the
pre-op statge, if there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() if it is a data/metadata
transaction. Use the value of ctx->write_subvol if it is a data
transactions and ctx->read_subvol value for other transactions.
With this patch we assign the value of ctx->write_subvol in the
afr_transaction_perform_fop() with the on disk value, instead of
assigning it in the afr_changelog_pre_op() with the in memory value.
Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1566131
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of https://review.gluster.org/#/c/18049/
Problem:
When eager-lock is on, and two writes happen in parallel on a FD
we were observing the following behaviour:
- First write fails on one data brick
- Since the post-op is not yet happened, the inode refresh will get
both the data bricks as readable and set it in the inode context
- In flight split brain check see both the data bricks as readable
and allows the second write
- Second write fails on the other data brick
- Now the post-op happens and marks both the data bricks as bad and
arbiter will become source for healing
Fix:
Adding one more variable called write_suvol in inode context and it
will have the in memory representation of the writable subvols. Inode
refresh will not update this value and its lifetime is pre-op through
unlock in the afr transaction. Initially the pre-op will set this
value same as read_subvol in inode context and then in the in flight
split brain check we will use this value instead of read_subvol.
After all the checks we will update the value of this and set the
read_subvol same as this to avoid having incorrect value in that.
Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1566131
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For skipped files, use a return value of 1 to prevent
error messages being logged.
> Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
> BUG: 1553598
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1555161
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
On shd, we shouldn't treat any brick down based
on latency, otherwise self-heal will never happen
fixes: 1562723
Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7
BUG: 1562723
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The xattr trusted.glusterfs.list-node-uuids was only sent to a single
subvolume. This was returning null uuids from the other subvolumes as
if they were down.
This fix forces that xattr to be requested from all subvolumes.
Backport of:
> BUG: 1561406
Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f
BUG: 1561731
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Whenever we read data from file over NFS, NFS reads
more data then requested and caches it. Based on the
stat information it makes sure that the cached/pre-read
data is valid or not.
Consider 4 + 2 EC volume and all the bricks are on
differnt nodes.
In EC, with round-robin read policy, reads are sent on
different set of data bricks. This way, it balances the
read fops to go on all the bricks and avoid heating UP
(overloading) same set of bricks.
Due to small difference in clock speed, it is possible
that we get minor difference for atime, mtime or ctime
for different bricks. That might cause a different stat
returned to NFS based on which NFS will discard
cached/pre-read data which is actually not changed and
could be used.
Solution:
Change read policy for EC as gfid-hash. That will force
all the read to go to same set of bricks.
>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1558352
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.
When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.
This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.
Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal is running in bursts and taking a lot to finish, specially
on volumes with many directory levels.
This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
Backport of:
> BUG: 1547662
Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555201
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
| |
ENOSPC returned by a file migration is no longer
considered a rebalance failure.
Change-Id: I21cf3a8acdc827bc478e138d6cb5db649d53a28c
BUG: 1555161
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Append on a file with split-brain succeeds. Open is intercepted by open-behind,
when write comes on the file, open-behind does open+write. Open succeeds
because afr doesn't fail it. Then write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.
Fix:
Fail open on split-brain, so that when open-behind does open+write open fails
which leads to write failure. Application will know about this failure.
Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1544635
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
(cherry picked from commit 786343abca3474ff01aa1017210112d97cbc4843)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch limits itself to only handling the case
where no file (data or linkto) exists on the subvol.
Additional cases to be handled:
1. A linkto file was found on the only child subvol. This currently
calls dht_lookup_everywhere which eventually deletes it. It can be
deleted directly as it will not be pointing to a valid subvol.
2. Directory lookups - locking might be unnecessary in some cases.
> Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4
> BUG: 1546620
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4
BUG: 1548270
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_migrate_file no longer prints an error if getxattr for
posix acls fails with ENODATA/ENOATTR.
> Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c
> BUG: 1546954
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c
BUG: 1548078
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaced "then" with "than"
> Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d
> BUG: 1547128
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d
BUG: 1547841
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_populate_inode_for_dentry tries to update the layout
for the '..' entry when listing the root of the volume.
This entry does not correspond to an entry in the volume
and therefore does not have a gfid or a layout on disk,
causing layout processing to fail.
> Change-Id: I2b7470e1c5e20d87b5545160697f24d041045140
> BUG: 1537457
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I2b7470e1c5e20d87b5545160697f24d041045140
BUG: 1539516
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like fallocate leaves a non-empty
file behind in case of some failures. We now
truncate the file to 0 bytes on failure in
__dht_rebalance_create_dst_file.
> Change-Id: Ia4ad7b94bb3624a301fcc87d9e36c4dc751edb59
> BUG: 1541916
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: Ia4ad7b94bb3624a301fcc87d9e36c4dc751edb59
BUG: 1542601
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Non-privileged users cannot delete linkto
files. However the failure to unlink a stale linkto
causes DHT to fail the lookup with EIO and hence
prevent access to the file.
> Change-Id: Id295362d41e52263790694602f36f1219f0646a2
> BUG: 1542318
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: Id295362d41e52263790694602f36f1219f0646a2
BUG: 1543016
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed an issue in dht_populate_inode_for_dentry where a layout is
set in the inode without checking if it is already set. This overwrites
the value each time without freeing the already existing layout.
Also includes the changes in https://review.gluster.org/19471
> Change-Id: I651bf539a0b82b4ddc4c355890c16a8e91f5f1fd
> BUG: 1541264
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I651bf539a0b82b4ddc4c355890c16a8e91f5f1fd
BUG: 1541267
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The child_up array was initialized with all elements being -1 to
allow afr_notify() to differentiate down bricks from bricks that
haven't reported yet. With current implementation this is not needed
anymore and it was causing unexpected results when other parts of
the code considered that if child_up[i] != 0, it meant that it was up.
Backport of:
> BUG: 1541038
Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472
BUG: 1541930
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dht_(f)xattrop implementation did not implement
migration phase1/phase2 checks which could cause issues
with rebalance on sharded volumes.
This does not solve the issue where fops may reach the target
out of order.
> Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
> BUG: 1471031
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
BUG: 1540224
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.
Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.
Backport of:
>BUG: 1431955
BUG: 1536334
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With heterogenous bricks now being supported in DHT
we could run into issues where files are not migrated
even though there is sufficient space in newly added bricks
which just happen to be considerably smaller than older
bricks. Using percentages instead of absolute available
space for space checks can mitigate that to some extent.
Marking bug-1247563.t as that used to depend on the easier
code to prevent a file from migrating. This will be removed
once we find a way to force a file migration failure.
> Change-Id: I3452520511f304dbf5af86f0632f654a92fcb647
> BUG: 1529440
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I3452520511f304dbf5af86f0632f654a92fcb647
BUG: 1530455
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Superflous dentries that cannot be fit in the buffer size provided by
kernel are thrown away by fuse-bridge. This means,
* the next readdir(p) seen by readdir-ahead would have an offset of a
dentry returned in a previous readdir(p) response. When readdir-ahead
detects non-monotonic offset it turns itself off which can result in
poor readdir performance.
* readdirp can be cpu-intensive on brick and there is no point to read
all those dentries just to be thrown away by fuse-bridge.
So, the best strategy would be to fill the buffer optimally - neither
overfill nor underfill.
> Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
> BUG: 1492625
> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit e785faead91f74dce7c832848f2e8f3f43bd0be5)
Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
BUG: 1478411
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... in readdirp response if dentry points to a directory inode. This
is a special case where the entire layout is stored in one single
subvolume and hence no need for lookup to construct the layout
>Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
>BUG: 1492625
>Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit 59d1cc720f52357f7a6f20bb630febc6a622c99c)
Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
BUG: 1478411
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
After setting split-brain-choice option to analyze the file to resolve
the split brain using the command
"setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>"
should allow to access the file from mount for default timeout of 5mins.
But the timeout was not honored and was able to access the file even after
the timeout.
Fix:
Call the inode_invalidate() in afr_set_split_brain_choice_cbk() so that
it will triger the cache invalidate after resetting the timer and the
split brain choice. So the next calls to access the file will fail with EIO.
Change-Id: I698cb833676b22ff3e4c6daf8b883a0958f51a64
BUG: 1514380
Signed-off-by: karthik-us <ksubrahm@redhat.com>
(cherry picked from commit 933ec57ccda2c1ba5ce6f207313c3b6802e67ca3)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
..
the brick file system does not support fallocate.
> Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
> BUG: 1488103
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
BUG: 1516691
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The trusted.SGI_ACL_FILE appears to set posix
ACLs on the linkto file that is a target of
file migration. This can mess up file permissions
and cause linkto identification to fail.
Now we remove all ACL xattrs from the results of
the listxattr call on the source before setting them
on the target.
> BUG: 1514329
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
Change-Id: I56802dbaed783a16e3fb90f59f4ce849f8a4a9b4
BUG: 1515042
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In DHT, after locks on all subvolumes are acquired, it would perform the
following steps sequentially,
1. send remove dir on all other subvolumes except the hashed one in a loop;
2. wait for all pending rmdir to be done
3. remove dir on the hashed subvolume
The problem is that in step 1 there is a check to skip hashed subvolume
in the loop. If the last subvolume to check is actually the
hashed one, and step 3 is quickly done before the last and hashed
subvolume is checked, by accessing shared context data be destroyed in
step 3, would cause a crash.
Fix by saving shared data in a local variable to access later in the
loop.
> BUG: 1490642
> Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
(cherry picked from commit 206120126d455417a81a48ae473d49be337e9463)
Change-Id: I8db7cf7cb262d74efcb58eb00f02ea37df4be4e2
BUG: 1505221
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing EC code updates the xattr on the subvolume
in a sequential pattern resulting in very poor performance.
With this fix EC now updates the xattr on the subvolume
in parallel which improves the xattr update performance.
>BUG: 1445663
>Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
>Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
BUG: 1499150
Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of:
> Change-Id: Ibab292ba705d993b475cd0303fb3318211fb2500
> Reviewed-on: https://review.gluster.org/18026
> BUG: 1480525
> cherry-picked from commit 1e2d6537875d16b783e3c50ada7ee61487c6d796
With this change, enabling choose-local (which means its state makes
transition from "off" to "on") will be effective after the first
gfid-lookup on "/" since volume-set was executed.
Change-Id: Ibab292ba705d993b475cd0303fb3318211fb2500
BUG: 1501022
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Comparing the uuid string of the local node against that stored in the
local_subvol information is inefficient, especially as it is
done for every file to be migrated. The code has now been changed
to set the value of info to 1 if the nodeuuid is that of the node
making the comparison so this becomes an integer comparison.
> BUG: 1451434
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> https://review.gluster.org/#/c/17851
(cherry picked from commit c4a608799a577a4f38139f6bb8a47da8efb0fec3)
Change-Id: I7491d59caad3b71dbf5facc94dcde0cd53962775
BUG: 1500472
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If a brick crashes after an entry (file or dir) is created but before
gfid is assigned, the good bricks will have pending entry heal xattrs
but the heal won't complete because afr_selfheal_recreate_entry() tries
to create the entry again and it fails with EEXIST.
Fix:
We could have fixed posx_mknod/mkdir etc to assign the gfid if the file
already exists but the right thing to do seems to be to trigger a lookup
on the bad brick and let it heal the gfid instead of winding an
mknod/mkdir in the first place.
(cherry picked from commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265)
Change-Id: I82f76665a7541f1893ef8d847b78af6466aff1ff
BUG: 1499202
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
> BUG: 1493539
(cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a)
Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec
BUG: 1492066
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
... for AFR_METADATA_TRANSACTION and just mark source and sinks if
metadata is the same.
(cherry picked from commit 24637d54dcbc06de8a7de17c75b9291fcfcfbc84)
Change-Id: I69e55d3c842c7636e3538d1b57bc4deca67bed05
BUG: 1496317
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problems:
As described in BZ 1491670, renaming hardlinks can result in data/mdata
split-brain of the DHT link-to files (T files) without any mismatch of
data and metadata.
As described in BZ 1486063, for a zero-byte file with only dirty bits
set, arbiter brick will likely be chosen as the source brick.
Fix:
For zero byte files in split-brain, pick first brick as
a) data source if file size is zero on all bricks.
b) metadata source if metadata is the same on all bricks
In arbiter case, if file size is zero on all bricks and there are no
pending afr xattrs, pick 1st brick as data source.
(cherry picked from commit 1719cffa911c5287715abfdb991bc8862f0c994e)
Change-Id: I0270a9a2f97c3b21087e280bb890159b43975e04
BUG: 1496317
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Rahul Hinduja <rhinduja@redhat.com>
Reported-by: Mabi <mabi@protonmail.ch>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Combined backport of https://review.gluster.org/17850 and
https://review.gluster.org/18187
During graph switch, if fuse sends nameless (gfid) lookups, afr takes
the discover code path to serve it. If there are pending metadata heals,
they do not happen unless an inode refresh happens as a part of
discover (which is not guaranteed to happen always).
This patch fixes it by attempting metadata heal as a part of discover,
just like how it is done in lookup code path.
Change-Id: I87c493045b9225741cad173bf3f645848697032e
BUG: 1488168
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/18202
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Karthik U S <ksubrahm@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
bug-797171.7 loaded error-gen xlator on the brick which sent EBADF for a
non fd-based fop, namely setattr. This caused
dht_check_and_open_fd_on_subvol_task() to crash as local->fd was NULL.
Fix:
Call dht_check_and_open_fd_on_subvol_task() from dht_file_setattr_cbk
only for dht_fsetattr and not dht_setattr or dht_setattr2
> Reviewed-on: https://review.gluster.org/18208
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit 47188e9eac59de416a5c86c7ec7540ed6aaa1c98)
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Change-Id: Iab4999e213bf2065804f3f8237e470ad454e3c99
BUG: 1489260
Reviewed-on: https://review.gluster.org/18222
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was no easy way to find out which files were
skipped during a rebalance.
Rebalance now logs a message for every skipped file
using msgid 109126, making it easier to find
all files that were skipped.
> BUG: 1480445
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/18021
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: hari gowtham <hari.gowtham005@gmail.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: I2cac7db7285e2f82354251f3ea4094827b0daf3e
BUG: 1486557
(cherry picked from commit a4c43ba9374b8f75a48d38a032353a0c7d311a73)
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/18149
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dht_discover finds data files on more than one subvol,
racing calls to dht_discover_cbk could end up calling
dht_aggregate_xattr which could delete dictionary data
that is being accessed by higher layer translators.
Fixed to call dht_aggregate_xattr only for directories and
consider only the first file to be found.
> BUG: 1484709
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/18137
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: I4f3d2a405ec735d4f1bb33a04b7255eb2d179f8a
BUG: 1486538
(cherry picked from commit 9420022df0962b1fa4f3ea3774249be81bc945cc)
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/18146
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...in various self-heal code paths.
Originally found by Pranith in __afr_selfheal_name_impunge ()
Also change __afr_selfheal_assign_gfid() to send lookup only on those
bricks that don't have a gfid matching that of the source.
> Reviewed-on: https://review.gluster.org/18065
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit d594900dbca92c356152be65fce16f77c402117c)
Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a
BUG: 1487319
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/18173
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier, rebalance performed a fix-layout on a directory
before healing its subdirectories. If there were a lot of
subdirs, it could take a while before all subdirs were
created on the newly added bricks. As dht_readdirp only lists
dirs from their hashed subvol, those dirs which hashed to
the newly added bricks but were not yet created on them were
not listed.
Now, the child dirs are listed and processed before the layout
of the parent is fixed. This introduces a change in behaviour
where files in subdirs are migrated before those in parent
directories.
Credit: Shyam <srangana@redhat.com>
Github issue: #239
> BUG: 1248393
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/18045
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit 96b33b4b278391ca8a7755cf274931d4f1808cb5)
Change-Id: I8ae7f24a510754cd8d1b31e5d608bcf1928599e2
BUG: 1483402
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/18071
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add EBADF handling for dht_fremovexattr and dht_fsetxattr.
> BUG: 1476665
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17999
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit 747a08d34e2a1e94d7fce68a3577370288bb1955)
Change-Id: Ide0d5812dae79655d2565157e5baabcd753b4309
BUG: 1479303
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/18012
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
> Reviewed-on: https://review.gluster.org/17981
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: Karthik U S <ksubrahm@redhat.com>
(cherry picked from commit bead74a6e085001225bc0704bad1a5db36dd75a1)
Change-Id: I5acb8bd0a19fc4e764d61e349bb690b5236ee610
BUG: 1479474
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17997
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT fd based fops used to check if the fd was open
on the cached subvol before winding the call. However,
this introduced a performance regression of about
30% for reads.
This check was introduced to handle cases where files
were migrated while IOs were happening. As this is not
the common case, dht will now check if the fd is
open on the cached subvol only if the call fails
with EBADF.
This will prevent a performance hit where a rebalance
is not running.
> BUG: 1476665
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17976
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit cdca1cb26a0aba390c6d8485c0d6d95e22ffc8bd)
Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808
BUG: 1479303
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17995
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
> Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff
> Bug: 1475282
> Signed-off-by: Susant Palai <spalai@redhat.com>
> Reviewed-on: https://review.gluster.org/17868
> Tested-by: Raghavendra G <rgowdapp@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff
Bug: 1477152
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17943
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To calculate available space on a subvolume we used to do
the following in __dht_check_free_space.
post_availspace = (dst_statfs.f_bavail * dst_statfs.f_frsize) - stbuf->ia_size
Now to subtracting the file size from available space is tricky here.
Sometime available space will be lesser than the file size and since all the
participating members in calculation are unsigned int, the result is a large
number (integer overflow).
Solution: We do not need to subtract the file size from the space available,
since fallocate would have reserved file size space already.
> Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb
> BUG: 1475282
> Signed-off-by: Susant Palai <spalai@redhat.com>
> Reviewed-on: https://review.gluster.org/17876
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb
BUG: 1477152
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17942
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of truncate, if writev or open fails on a brick,
in some cases it does not mark the failure onlock->good_mask.
This causes the update of size and version on all the bricks
even if it has failed on one of the brick. That ultimately
causes a data corruption.
Solution:
In callback of such writev and open calls, mark fop->good
for parent too.
Thanks Pranith Kumar K <pkarampu@redhat.com> for finding the
root cause.
>Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
>BUG: 1476205
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>
>Reviewed-on: https://review.gluster.org/17906
>Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
>Smoke: Gluster Build System <jenkins@build.gluster.org>
>CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476868
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17932
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Corrected the iterator for looping over the list of
decommissioned bricks while checking if the new target
determined because of min-free-disk values has been
decommissioned.
> BUG: 1474318
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17861
> Reviewed-by: Susant Palai <spalai@redhat.com>
(cherry picked from commit 8c3e766fe0a473734e8eca0f70d0318a2b909e2e)
Change-Id: Iee778547eb7370a8069e954b5d629fcedf54e59b
BUG: 1475181
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17872
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of non-migrated files was not added to the
size_processed causing incorrect rebalance estimate
calculations. This has been fixed.
> BUG: 1467209
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17867
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit 24ab0ef44a1646223b59e33d0109d8424f8eddd0)
Change-Id: I9f338c44da22b856e9fdc6dc558f732ae9a22f15
BUG: 1475192
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17873
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|