| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two reasons:
* ping responses from glusterd may not be relevant for Halo
replication. Instead, it might be interested in only knowing whether
the brick itself is responsive.
* When a brick is killed, propagating GF_EVENT_CHILD_PING of ping
response from glusterd results in GF_EVENT_DISCONNECT spuriously
propagated to parent xlators. These DISCONNECT events are from the
connections client establishes with glusterd as part of its
reconnect logic. Without GF_EVENT_CHILD_PING, the last event
propagated to parent xlators would be the first DISCONNECT event
from brick and hence subsequent DISCONNECTS to glusterd are not
propagated as protocol/client prevents same event being propagated
to parent xlators consecutively. propagating GF_EVENT_CHILD_PING for
ping responses from glusterd would change the last_sent_event to
GF_EVENT_CHILD_PING and hence protocol/client cannot prevent
subsequent DISCONNECT events
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Fixes: bz#1739334
Change-Id: I50276680c52f05ca9e12149a3094923622d6eaef
(cherry picked from commit 5d66eafec581fb3209af74595784be8854ca40a4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are logging a warning message saying unknown-key for
tier-enabled kay. although the tier xlator is deprecated,
this key is left behind for handling the peer rejection
issues in a heterogeneous cluster. We need not to log if
this key is not found/recognised.
updates: bz#1727008
Change-Id: Ia68661898a618f99a240ca8d8a124ff6a65ebe9d
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
(cherry picked from commit 7d270d05f835d41e32572501246b50181dc9be56)
|
|
|
|
|
|
|
|
|
| |
The current list of snapshots from priv->dirents is obtained outside
the lock.
Change-Id: I8876ec0a38308da5db058397382fbc82cc7ac177
Fixes: bz#1731512
(cherry picked from commit 8e795617fd6f5193d0d52a336059ce1a28108c0e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of thin arbiter, before index healer starts crawling the
indices at every heal-timeout interval, even if there is nothing to
be healed it will send an upcall notification to all the clients to
release any AFR_TA_DOM_NOTIFY locks that they hold. SHD will wait
for the upcall to return before proceeding with the heal even though
there is nothing to be healed. This will also invalidates the cached
information about the bricks states on the clients which leads to
extra calls on TA from clients for the next reads & writes if needed.
This will impact the IO performance.
Fix:
- Before sending the upcall to the clients, check for any pending heals
on TA without taking any locks.
- If there is nothing marked bad on TA, then continue with the index
crawl to heal any dirty markings present on the files due to any post-op
failure.
- If there is a brick marked as bad on TA, then take the
AFR_TA_DOM_NOTIFY lock on TA from SHD, get the state on TA and
continue with the current healing process.
Change-Id: Ieb477bc6cb18bbdfd4e7a0453c5ed79b574ec9d6
fixes: bz#1729483
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a normal volume, we are updating the pid from a the
process while we do a daemonization or at the end of the
init if it is no-daemon mode. Along with updating the pid
we also lock the file, to make sure that the process is
running fine.
With brick mux, we were updating the pidfile from gluterd
after an attach/detach request.
There are two problems with this approach.
1) We are not holding a pidlock for any file other than parent
process.
2) There is a chance for possible race conditions with attach/detach.
For example, shd start and a volume stop could race. Let's say
we are starting an shd and it is attached to a volume.
While we trying to link the pid file to the running process,
this would have deleted by the thread that doing a volume stop.
Backport of : https://review.gluster.org/#/c/glusterfs/+/22935/
>Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a
>Updates: bz#1722541
>Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a
Updates: bz#1732668
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were cleaning xlator from botton to top, which might
lead to problems when upper xlators trying to access
the xlator object loaded below.
One such scenario is when fd_unref happens as part of the
fini call which might lead to calling the releasedir to
lower xlator. This will lead to invalid mem access
Backport of:https://review.gluster.org/#/c/glusterfs/+/22968/
>Change-Id: I8a6cb619256fab0b0c01a2d564fc88287c4415a0
>Updates: bz#1716695
>Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Change-Id: I8a6cb619256fab0b0c01a2d564fc88287c4415a0
Updates: bz#1730229
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were using glusterfs_graph_fini to free the xl rec from
glusterfs_process_volfp as well as glusterfs_graph_cleanup.
Instead we can use glusterfs_graph_deactivate, which is does
fini as well as other common rec free.
Backportof : https://review.gluster.org/22904
>Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1
>Updates: bz#1716695
>Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1
Updates: bz#1730229
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...otherwise this leads to a crash when volume status is run on a
heterogeneous mode.
> Fixes: bz#1723658
> Change-Id: I0d39f412b2e5e9d3ef0a3462b90b38bb5364b09d
> Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
> (cherry picked from commit a72452fcf90679b28baec12d2769cbaa982bb4e4)
Fixes: bz#1728127
Change-Id: I0d39f412b2e5e9d3ef0a3462b90b38bb5364b09d
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problems:
1. When checking for type and gfid mismatch, if the type or gfid
is unknown because of missing gfid handle and the gfid xattr
it will be reported as type or gfid mismatch and the heal will
not complete.
2. If the source selected during entry heal has null gfid the same
will be sent to afr_lookup_and_heal_gfid(). In this function when
we try to assign the gfid on the bricks where it does not exist,
we are considering the same gfid and try to assign that on those
bricks. This will fail in posix_gfid_set() since the gfid sent
is null.
Fix:
If the gfid sent to afr_lookup_and_heal_gfid() is null choose a
valid gfid before proceeding to assign the gfid on the bricks
where it is missing.
In afr_selfheal_detect_gfid_and_type_mismatch(), do not report
type/gfid mismatch if the type/gfid is unknown or not set.
Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da
fixes: bz#1729481
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to send the commit req to peers in case of geo-rep
operations even though it is a no volname operation. In commit
phase peers try to set the txn_opinfo which will fail because
it is a no volname operation where we don't require a commit
phase. We mark skip_locking as true for no volname operations,
but we have to give an exception to geo-rep operations, so that
they can set txn_opinfo in commit phase.
Please refer to detailed RCA at the bug: 1730543
fixes: bz#1730543
Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
(cherry picked from commit b917974ee922d7a2e079692ad7d6f61f900b37b2)
|
|
|
|
|
|
|
|
|
|
|
|
| |
snapview server xlator makes use of "localhost" as the volfile server while
initing the new glfs instance to talk to a snapshot. While localhost is fine,
better use the same volfile server that was used to start the snapshot
daemon containing the snapview-server xlator.
Change-Id: I4485d39b0e3d066f481adc6958ace53ea33237f7
fixes: bz#1728391
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
(cherry picked from commit 36f6e6df0ff15d0464b869803710adca2b65e8ba)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: get-state does not show correct brick status if brick
status is not Started, it always shows started if any value
is set brickinfo->status
Solution: Check the value of brickinfo->status to show correct status
in get-state
Change-Id: I12a79619024c2cf59f338220d144f2f034059b3b
fixes: bz#1726905
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
(cherry picked from commit af989db23d1db00e087f2b9d3dfc43b13ef17153)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gluster volume create <VOLNAME> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2>
<thin-arbiter-host>:<path-to-store-replica-id-file> [force]
The changes have been made in a way that the last brick in the bricks list
will be treated as the thin-arbiter.
GD1 will be manipulated to consider replica count to be as 2 and continue creating the
volume like any other replica 2 volume but since thin-arbiter volumes need ta-brick
client xlator entries for each subvolume in fuse volfile, volfile generation is
modified in a way to inject these entries seperately in the volfile for every subvolume.
Few more additions -
1- Save the volinfo with new fields ta_bricks list and thin_arbiter_count.
2- Introduce a new option client.ta-brick-port to add remote-port to ta-brick xlator entry
in fuse volfiles. The option can be set using the following CLI syntax -
gluster volume set <VOLNAME> client.ta-brick-port <PORTNO.>
3- Volume Info will contain a Thin-Arbiter-path entry to distinguish
from other replicate volumes.
Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406
Updates #687
Signed-off-by: Vishal Pandey <vpandey@redhat.com>
(cherry picked from commit 9b223b15ab69fce4076de036ee162f36a058bcd2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the shd mux changes, shd was havinga a logfile
with volname of the first started volume.
This was creating a lot confusion, as other volumes data
is also logging to a logfile which has a different vol name.
With this changes the logfile will be changed to a unique name
ie "/var/log/glusterfs/glustershd.log". This was the same
logfile name before the shd mux
Change-Id: I2b94c1f0b2cf3c9493505dddf873687755a46dda
fixes: bz#1721601
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a small race window, where we have a shd proc
without having a connection. That is when we stopped the
last shd running on a process. The list was removed
outside of a lock just after stopping the process.
So there is a window where we stopped the process, but
the shd proc list contains the entry.
Change-Id: Id82a82509e5cd72acac24e8b7b87197626525441
fixes: bz#1722541
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Race:
Thread-1 Thread-2
1) Does ec_get_size_version() to perform
pre-op fxattrop as part of write-1
2) Calls ec_set_dirty_flag() in
ec_get_size_version() for write-2.
This sets dirty[] to 1
3) Completes executing
ec_prepare_update_cbk leading to
ctx->dirty[] = '1'
4) Takes LOCK(inode->lock) to check if there are
any flags and sets dirty-flag because
lock->waiting_flag is 0 now. This leads to
fxattrop to increment on-disk dirty[] to '2'
At the end of the writes the file will be marked for heal even when it doesn't need heal.
Fix:
Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked
as '1' in step 2) above
Updates bz#1593224
Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
On a EC volume, during upgrade from the older version where
ctime feature is not enabled(or not present) to the newer
version where the ctime feature is available (enabled default),
the self heal hangs and doesn't complete.
Cause:
The ctime feature has both client side code (utime) and
server side code (posix). The feature is driven from client.
Only if the client side sets the time in the frame, should
the server side sets the time attributes in xattr. But posix
setattr/fseattr was not doing that. When one of the server
nodes is updated, since ctime is enabled by default, it
starts setting xattr on setattr/fseattr on the updated node/brick.
On a EC volume the first two updated nodes(bricks) are not a
problem because there are 4 other bricks with consistent data.
However once the third brick is updated, the new attribute(mdata xattr)
will cause an inconsistency on metadata on 3 bricks, which
prevents the file to be repaired.
Fix:
Don't create mdata xattr with utimes/utimensat system call.
Only update if already present.
Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
fixes: bz#1720201
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If worm_create_cbk receives an error (op_ret == -1) fd will be NULL
and therefore performing fsetxattr would lead to a segfault and the
brick process crashes. To avoid this we allow setting fsetxattr only
if op_ret >= 0 . If an error happens we explicitly unwind
Change-Id: Ie7f8a198add93e5cd908eb7029cffc834c3b58a6
fixes: bz#1717757
Signed-off-by: David Spisla <david.spisla@iternity.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A name-less lookup does not contain parent's stat,
It is hard to check the lookuped file is at the right path.
This patch changes to a name lookup, and check file's gfid with
expected gfid. If the gfid is different, mark it estale.
fixes: bz#1702131
Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Network latency is an important factor selecting a read subvolume.
So this patch is adding two new policy.
1) We measure the latency of a child during a GF_DUMP rpc call.
Then use this latency to pick a read subvol having the least
latency.
2) Second one is an hybrid mode where it calculates the effective
latency by multiplying outstanding pending read request and
latency, and choose the least one.
Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97
fixes: #520
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1721474
Change-Id: Ic2a53fa3d1e9e23424c6898e0986f80d52c5e3f6
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The feature is not supported and is moved out of the codebase from
glusterfs-5.x release. Doesn't make sense to keep the code to
support it.
For those who want to upgrade from an version supporting it to higher
version, please do a 'gluster volume reset $VOL encryption reset' and
then continue with the upgrade process.
updates: bz#1648169
Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. For parallel writes from nfs-ganesha, two fops with two generations,
but the fops reply maybe returned disordered.
2. The inode md-cache timeout should not increase conf->generation.
With this patch,
1, Fop only gets generation from inode md-cache or conf, does not increase it.
2. The generation is increased at upcall invalidate, estal/enoent error
invalidate, reply with zeroed out stat from write-behind.
Change-Id: I897ecaa143fd18bc024c1948c7d1a6f831fd53da
Updates: bz#1683594
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
|
|
|
|
|
|
|
|
|
| |
Some internal DHT xattrs were not being
removed when calling getxattr in pass-through mode.
This has been fixed.
Change-Id: If7e3dbc7b495db88a566bd560888e3e9c167defa
fixes: bz#1721435
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BD xlator was removed some time ago. Remove it from the graph.
We can also remove the caps settings - only the BD xlator
was using it.
Lastly, remove the caps (which only BD was using) and the document
describing the translator.
Change-Id: Id0adcb2952f4832a5dc6301e726874522e07935d
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
| |
warning: ā%sā directive argument is null [-Wformat-overflow=]
Change-Id: I69b8d47f0002c58b00d1cc947fac6f1c64e0b295
updates: bz#1193929
Signed-off-by: SheetalPamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: commit d42221bec9 added a log message based on rsp.op_ret
check. but while running subdir-mount.t, this message is seen even on
successful mounts.
Solution: in __server_getspec(), return value of sys_read() is assigned to
ret, which will be a non-negative number in when sys_read() is success.
This non-zero value is assigned to rsp.op_ret. We should log an error only
when rsp.op_ret is negative.
fixes: bz#1718848
Change-Id: Ieef8ba33c2c7b4a97d4aef17543f58e66fd3b341
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
| |
... with out which volume creation fails with "volume create: <xyz>: failed:
Failed to create volume files"
Fixes: bz#1716812
Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If ctime and uss enabled, tar still complains with 'file
changed as we read it'
Cause:
To clear nfs cache (gluster-nfs), the ctime was incremented
in snap-view client on stat cbk.
Fix:
The ctime should not be incremented manually. Since gluster-nfs
is planning to be deprecated, this code is being removed to
fix the issue.
Change-Id: Iae7f100c20fce880a50b008ba716077350281404
fixes: bz#1720290
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We should free the mem_pool local_pool during an afr_fini.
Otherwise this will lead to mem leak for shd
Change-Id: I805a34a88077bf7b886c28b403798bf9eeeb1c0b
Updates: bz#1716695
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht-common.c: because there was a 'goto err' before assigning
the 'local' variable, there is possibility of NULL
dereference. As the check which was done wouldn't
ever be true, removed the check.
glusterd-geo-rep.c: a possible path where 'slave_host' could be
NULL when it gets passed to strcmp() is found.
strcmp() expects a valid string. Add a NULL check.
Updates: bz#1622665
Change-Id: I64c280bc1beac9a2b109e8fa88f2a5ce8b823c3a
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )
Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.
Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
| |
So, we will get more debug info.
fixes: #679
Change-Id: I3588e204ad25c20b69271c1a4ee17d0d158bd794
Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For nameless LOOKUPs, server creates a new inode which shall
remain invalid until the fop is successfully processed post
which it is linked to the inode table.
But incase if there is an already linked inode for that entry,
it discards that newly created inode which results in upcall
notification. This may result in client being bombarded with
unnecessary upcalls affecting performance if the data set is huge.
This issue can be avoided by looking up and storing the upcall
context in the original linked inode (if exists), thus saving up on
those extra callbacks.
Change-Id: I044a1737819bb40d1a049d2f53c0566e746d2a17
fixes: bz#1718338
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans some iovec code and creates two additional helper
functions to simplify management of iovec structures.
iov_range_copy(struct iovec *dst, uint32_t dst_count, uint32_t dst_offset,
struct iovec *src, uint32_t src_count, uint32_t src_offset,
uint32_t size);
This function copies up to 'size' bytes from 'src' at offset
'src_offset' to 'dst' at 'dst_offset'. It returns the number of
bytes copied.
iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);
This function removes the initial 'size' bytes from 'iovec' and
returns the updated number of iovec vectors remaining.
The signature of iov_subset() has also been modified to make it safer
and easier to use. The new signature is:
iov_subset(struct iovec *src, int src_count, uint32_t start, uint32_t size,
struct iovec **dst, int32_t dst_count);
This function creates a new iovec array containing the subset of the
'src' vector starting at 'start' with size 'size'. The resulting
array is allocated if '*dst' is NULL, or copied to '*dst' if it fits
(based on 'dst_count'). It returns the number of iovec vectors used.
A new set of functions to iterate through an iovec array have been
created. They can be used to simplify the implementation of other
iovec-based helper functions.
Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view i.e each brick is blamed by atleast one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.
Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds restriction of all the bricks to be up to perform
metadata heal to avoid any metadata loss.
Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.
Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1717819
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Long tale of double unref! But do read...
In cases where a shard base inode is evicted from lru list while still
being part of fsync list but added back soon before its unlink, there
could be an extra inode_unref() leading to premature inode destruction
leading to crash.
One such specific case is the following -
Consider features.shard-deletion-rate = features.shard-lru-limit = 2.
This is an oversimplified example but explains the problem clearly.
First, a file is FALLOCATE'd to a size so that number of shards under
/.shard = 3 > lru-limit.
Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
Resultant lru list:
1 -----> 2
refs on base inode - (1) + (1) = 2
3 needs to be resolved. So 1 is lru'd out. Resultant lru list -
2 -----> 3
refs on base inode - (1) + (1) = 2
Note that 1 is inode_unlink()d but not destroyed because there are
non-zero refs on it since it is still participating in this ongoing
FALLOCATE operation.
FALLOCATE is sent on all participant shards. In the cbk, all of them are
added to fync_list.
Resulting fsync list -
1 -----> 2 -----> 3 (order doesn't matter)
refs on base inode - (1) + (1) + (1) = 3
Total refs = 3 + 2 = 5
Now an attempt is made to unlink this file. Background deletion is triggered.
The first $shard-deletion-rate shards need to be unlinked in the first batch.
So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds
on 2 and so it's moved to tail of list.
lru list now -
3 -----> 2
No change in refs.
shard 1 is looked up. In lookup_cbk, it's linked and added back to lru list
at the cost of evicting shard 3.
lru list now -
2 -----> 1
refs on base inode: (1) + (1) = 2
fsync list now -
1 -----> 2 (again order doesn't matter)
refs on base inode - (1) + (1) = 2
Total refs = 2 + 2 = 4
After eviction, it is found 3 needs fsync. So fsync is wound, yet to be ack'd.
So it is still inode_link()d.
Now deletion of shards 1 and 2 completes. lru list is empty. Base inode unref'd and
destroyed.
In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able.
It is added back to lru list but base inode passed to __shard_update_shards_inode_list()
is NULL since the inode is destroyed. But its ctx->inode still contains base inode ptr
from first addition to lru list for no additional ref on it.
lru list now -
3
refs on base inode - (0)
Total refs on base inode = 0
Unlink is sent on 3. It completes. Now since the ctx contains ptr to base_inode and the
shard is part of lru list, base shard is unref'd leading to a crash.
FIX:
When shard is readded back to lru list, copy the base inode pointer as is into its inode ctx,
even if it is NULL. This is needed to prevent double unrefs at the time of deleting it.
Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
While we process a cleanup, there is a chance for a race between
async operations, for example ec_launch_replace_heal. So this can
lead to invalid mem access.
Solution:
Just like we track on going heal fops, we can also track fops like
ec_launch_replace_heal, so that we can decide when to send a
PARENT_DOWN request.
Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
| |
* Also some logging enhancements in snapview-server
Change-Id: I6a7646771cedf4bd1c62806eea69d720bbaf0c83
fixes: bz#1715921
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 146e4b45d0ce906ae50fd6941a1efafd133897ea enabled
storage.fips-mode-rchecksum option for all new volumes with op-version
>=GD_OP_VERSION_7_0 but `gluster vol get $volname
storage.fips-mode-rchecksum` was displaying it as 'off'. This patch fixes it.
fixes: bz#1717782
Change-Id: Ie09f89838893c5776a3f60569dfe8d409d1494dd
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I8cf0a153f84ef2d162e6dd03261441d211c07d40
updates: bz#1193929
Signed-off-by: Anoop C S <anoopcs@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Fixed a bug in the revalidate code path that wiped out
directory permissions if no mds subvol was found.
Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115
fixes: bz#1716830
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
All these checks are done after analyzing clang-scan report produced
by the CI job @ https://build.gluster.org/job/clang-scan
updates: bz#1622665
Change-Id: I590305af4ceb779be952974b2a36066ffc4865ca
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.
FIX:
Get the block-count of each shard just before an unlink at posix in
xdata. Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.
Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
1401716: Resource leak
1401714: Dereference before null check
updates: bz#789278
Change-Id: I8fb0b143a1d4b37ee6be7d880d9b5b84ba00bf36
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: While tier code was removed, the is_tier_enabled
related to tier wasn't handled for upgrade.
As this option was missing in the info file, the checksum
mismatch issue happens during upgrade.
This results in the peer rejections happening.
Fix: use the op_version check and note down the is_tier_enabled
always. This way it will be dummy key, but the future upgrades
will work fine.
NOTE: Just having the key from 3.10 to 7 will cause issues when
upgraded from 5 to 8 or any such upgrade which skips the version
where we handle it.
Change-Id: I9951e2b74f16e58e884e746c34dcf53e559c7143
fixes: bz#1714973
Signed-off-by: hari gowtham <hgowtham@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
upcall: remove extra variable assignment and use just one
initialization.
open-behind: reduce the overall number of lines, in functions
not frequently called
selinux: reduce some lines in init failure cases
updates: bz#1693692
Change-Id: I7c1de94f2ec76a5bfe1f48a9632879b18e5fbb95
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* locks/posix.c: key was not freed in one of the cases.
* locks/common.c: lock was being free'd out of context.
* nfs/exports: handle case of missing free.
* protocol/client: handle case of entry not freed.
* storage/posix: handle possible case of double free
CID: 1398628, 1400731, 1400732, 1400756, 1124796, 1325526
updates: bz#789278
Change-Id: Ieeaca890288bc4686355f6565f853dc8911344e8
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
storage.reserve-size option will take size as input
instead of percentage. If set, priority will be given to
storage.reserve-size over storage.reserve. Default value
of this option is 0.
fixes: bz#1651445
Change-Id: I7a7342c68e436e8bf65bd39c567512ee04abbcea
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
|
|
|
|
|
|
|
| |
updates: bz#1193929
Change-Id: Ieb5e35d454498bc389972f9f15fe46b640f1b97d
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|