| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.
Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.
Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets event gen (in afr_lookup_done) if there are non-zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client side heal on the next lookup. That can be achieved by
setting the need_refresh flag in the inode ctx. So replaced all
occurrences of resetting event gen to zero with a call to
afr_inode_need_refresh_set().
Change #3:
In both lookup and discover path, we are doing an inode refresh which is
not required since all 3 essentially do the same thing: update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do it when we call
refresh during the read and write txns and not in the lookup path.
The .ts which relied on inode refresh in lookup path to trigger heals are
now changed to do read txn so that inode refresh and the heal happens.
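A minimal sketch of the idea behind Change #2, assuming a simplified inode
ctx. The struct and the toy_* helpers are hypothetical illustrations, not the
actual AFR code: the generation is left untouched and a separate flag asks for
the next refresh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the AFR inode ctx. */
struct toy_inode_ctx {
    uint32_t event_generation; /* follows causal order, never reset to 0 */
    bool need_refresh;         /* asks the next txn to refresh the inode */
};

/* Request a refresh without touching the event generation. */
void toy_need_refresh_set(struct toy_inode_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->need_refresh = true;
}

/* A txn refreshes when flagged or when its generation is stale. */
bool toy_refresh_needed(const struct toy_inode_ctx *ctx, uint32_t current_gen)
{
    assert(ctx != NULL);
    return ctx->need_refresh || ctx->event_generation != current_gen;
}
```

With this shape a racing lookup can still force a refresh, but a read txn
never observes a zero event generation.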
Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
(cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for gluster volume heal <volname> info healed/heal-failed
was removed by commit bb02cfb56ae08f56df4452c2b948fa962ae1212b in
release-3.6. The cli parser displays the usage message in all the
supported versions whenever these clis are run, leaving some
dead code in the latest branches. Since support for these clis
was removed long back, removing it should not cause any backward
compatibility issues. Hence removing the dead code from
the code base, which will also lead to better code coverage by the
regression runs.
Updates: #1052
Change-Id: I0c2b061469caf233c06d9699b0d159ce48e240b9
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...if pending xattrs are zero for all children.
Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it is possible that we end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.
Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation assumes that the ping event will come after the
connect event, but that may not be the case when fds need to be re-opened
after the socket connection, which consumes more time. So handle any order
of the ping/child-up events.
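One way to be order-independent, sketched under assumptions: each event only
records its own fact and readiness is derived from both. The struct and toy_*
names are hypothetical and do not reflect the real client xlator state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child state; each event records only its own fact. */
struct toy_child {
    bool connected;    /* child-up event seen */
    bool ping_seen;    /* ping event seen */
    long latency_usec; /* value carried by the ping event */
};

void toy_on_child_up(struct toy_child *c)
{
    assert(c != NULL);
    c->connected = true;
}

void toy_on_ping(struct toy_child *c, long latency_usec)
{
    assert(c != NULL);
    c->ping_seen = true;
    c->latency_usec = latency_usec;
}

/* Readiness is derived from both facts, so arrival order is irrelevant. */
bool toy_child_ready(const struct toy_child *c)
{
    assert(c != NULL);
    return c->connected && c->ping_seen;
}
```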
fixes: bz#1800583
Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changes in the middle of sending the
requests to subvolumes which caused a discrepancy in the expected
number of replies and the actual number of sent requests.
This discrepancy caused AFR to continue executing requests before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a
use-after-free issue when the answer was finally received.
In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.
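A minimal sketch of one way to avoid the discrepancy, assuming the fix amounts
to taking a single snapshot of the 'up' children and using it both for the
expected reply count and for sending. The toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Copy the live 'up' state once. The returned count is the number of
 * replies to wait for, and send_to[] is the only array consulted when
 * winding requests, so later changes to child_up[] cannot skew either.
 */
int toy_snapshot_targets(const bool *child_up, int child_count, bool *send_to)
{
    int expected = 0;

    assert(child_up != NULL && send_to != NULL);
    for (int i = 0; i < child_count; i++) {
        send_to[i] = child_up[i];
        if (send_to[i])
            expected++;
    }
    return expected;
}
```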
Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: bz#1808875
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.
Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.
fixes: bz#1801624
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When we mount a ta volume, as soon as 2 data bricks are connected
we consider that the mount is done and then send a lookup/create on
ta file on ta node. However, this connection with ta node might not
have been completed.
Due to this delay, ta replica id file will not be created and we
will see ENOTCONN error in log file if we do lookup.
Solution:
As we know that this ta node could have a higher latency, we should
wait for reasonable time for connection to happen before sending
lookup/create on replica id file.
fixes: bz#1720463
Change-Id: I36f90865afe617e4e84cee57fec832a16f5dd6cc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In function afr_selfheal_data_block(), we only check for the lock count
to be equal to or greater than the number of sinks. There can be a case
where we have 2 source bricks and one sink and the locking is successful
on only the source brick(s). In this case we continue with the healing
on sink without having a lock, which is not correct.
Fix:
Check for lock on at least source & one sink before starting the data heal.
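A sketch of such a check, reading the fix as "at least one source and at least
one sink must be locked". That reading, and the toy_* name, are assumptions
rather than the actual afr_selfheal_data_block() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True only when a lock is held on some source AND on some sink. */
bool toy_data_heal_locks_ok(const bool *locked_on, const bool *sources,
                            const bool *sinks, int child_count)
{
    bool source_locked = false;
    bool sink_locked = false;

    assert(locked_on != NULL && sources != NULL && sinks != NULL);
    for (int i = 0; i < child_count; i++) {
        if (!locked_on[i])
            continue;
        if (sources[i])
            source_locked = true;
        if (sinks[i])
            sink_locked = true;
    }
    return source_locked && sink_locked;
}
```

Comparing only the lock count against the number of sinks would accept the
case of two locked sources and an unlocked sink, which this check rejects.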
Change-Id: Iebcb57dcaa4b31831fedfee63d6ca16e9d6c8df8
fixes: bz#1688115
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
heal-info code assumes that all indices in xattrop directory
definitely need heal. There is one corner case.
The very first xattrop on the file will lead to adding the
gfid to 'xattrop' index in fop path and in _cbk path it is
removed because the fop is zero-xattr xattrop in success case.
These gfids could be read by heal-info and shown as needing heal.
Fix:
Check the pending flag to see if the file definitely needs heal or
not, instead of which index is being crawled at the moment.
fixes: bz#1801623
Change-Id: I79f00dc7366fedbbb25ec4bec838dba3b34c7ad5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
For files: During metadata heal, we restore timestamps
only for non-regular (char, block etc.) files.
Extending it to regular files, as the timestamp can also be updated
via the touch command.
fixes: bz#1787274
Change-Id: I26fe4fb6dff679422ba4698a7f828bf62ca7ca18
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In many cases, we were freely allocating long keys with no need.
Smaller char arrays are just fine almost anywhere, so just went ahead
and looked where we can use smaller ones.
In some cases, annotated the functions as static and the prefixes
passed as const as it was easier to read and understand.
Where relevant, converted the dict functions to use known key length.
Change-Id: I882ab33ea20d90b63278336cd1370c09ffdab7f2
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This volume option was not made available to the `gluster volume set` CLI.
Reported-by: epolakis(https://github.com/kinsu) in
https://github.com/gluster/glusterfs/issues/781
fixes: bz#1787554
Change-Id: I7141bdd4e53ee99e22b354edde8d023bfc0b2cd7
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Perform AFR_COUNT() once, in afr_has_quorum() and pass the result
to afr_lookup_has_quorum()
2. Simplify afr_lookup_has_quorum() - pass fewer parameters to it.
(Via the change in item 1 above).
3. Make afr_is_add_replica_mount_lookup_on_root() static function.
4. Remove dead code - afr_decide_heal_info() which was not used.
Change-Id: If9168cd01e22788a0e60b91e315787d2aa60e97b
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code like:
f(..., uuid_utoa(x), uuid_utoa(y));
is not valid (causes undefined behaviour) because uuid_utoa()
uses a single static thread-local buffer which will be overwritten
by the subsequent call. All such cases should be converted to use
uuid_utoa_r() with an explicitly specified buffer.
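The pitfall can be reproduced with toy stand-ins. toy_utoa() and toy_utoa_r()
below are hypothetical analogues written for this note, not the libglusterfs
functions, and the shared buffer is a plain static here for portability.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_BUF_LEN 16

/* Analogue of uuid_utoa(): every call formats into the same buffer. */
const char *toy_utoa(unsigned id)
{
    static char buf[TOY_BUF_LEN];

    snprintf(buf, sizeof(buf), "id-%u", id);
    return buf;
}

/* Analogue of uuid_utoa_r(): the caller owns the destination buffer. */
const char *toy_utoa_r(unsigned id, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "id-%u", id);
    return buf;
}
```

Two toy_utoa() results passed to one call alias the same storage, so both
arguments show whichever value was formatted last. Giving each argument its
own buffer through toy_utoa_r() keeps them distinct.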
Change-Id: I5e72bab806d96a9dd1707c28ed69ca033b9c8d6c
Updates: bz#1193929
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
|
|
|
|
|
|
|
|
| |
convert gf_msg() into gf_smsg()
Change-Id: I8f5b7bbb9caa78902b06f67257502b67adab7405
Updates: #657
Signed-off-by: yatipadia <ypadia@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes in locks xlator:
Added support for per-domain inodelk count requests.
Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the
dict and then set each key with name
'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'.
In the response dict, the xlator will send the per domain count as
values for each of these keys.
Changes in AFR:
Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has
been added to make the latter behave same as the former, thus not
breaking the current heal info output behaviour.
fixes: bz#1774011
Change-Id: Ie9e83c162aa77f44a39c2ba7115de558120ada4d
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
As a follow up on https://review.gluster.org/#/c/glusterfs/+/23749/,
adding error logging for the entire method.
In addition, converted logging to structured logging in the method.
Fixes: bz#1778457
Change-Id: I1f412159e6849d6f6ddbde53ec4a85ad709bbdf4
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Added a log for a failure in order to avoid "unused variable" coverity
issue.
fixes: CID#1274209
Change-Id: Ibc6b0ab4bdff482096e42e88fd4c8c7eadfeeadb
Updates: bz#789278
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
| |
This reverts commit fce5f68bc72d448490a0d41be494ac54a9181b3c.
I merged the wrong patch by mistake! Hence reverting it.
updates: bz#1774011
Change-Id: Id7d6ed1d727efc02467c8a9aea3374331261ebd5
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes in locks xlator:
Added support for per-domain inodelk count requests.
Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the
dict and then set each key with name
'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'.
In the response dict, the xlator will send the per domain count as
values for each of these keys.
Changes in AFR:
Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has
been added to make the latter behave same as the former, thus not
breaking the current heal info output behaviour.
fixes: bz#1774011
Change-Id: I9ae08ce768b39aeb6ee230207b5b7fa744176952
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit "ccf33e789 - dict.c: remove redundant checks"
removed some NULL checks in certain dict functions. This caused
flooding of fuse mount logs when I/O was done on the mount on a replica
volume:
Message:
W [dict.c:1478:dict_get_with_refn]
(-->/usr/local/lib/libglusterfs.so.0(dict_get_uint32+0x4d)
[0x7ff9121ec963] -->/usr/local/lib/libglusterfs.so.0(dict_get_with_ref+0x90)
[0x7ff9121eb93f] -->/usr/local/lib/libglusterfs.so.0(+0x229be)
[0x7ff9121eb9be] ) 0-dict: dict OR key (glusterfs.lk.lkmode) is NULL [Invalid argument]
Fix:
In the relevant AFR functions, check that dict is not NULL before trying
to perform operations on it.
See bug description for more details.
fixes: bz#1772006
Change-Id: I30c89c0b5d6c80cc86a6047aae70127769412120
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implements lock healing for gluster-block fencing use case.
If mandatory lock is enabled:
- Add domain lock/unlock to afr_lk fop.
- Maintain a list of locks to be healed in afr_private_t.
- Add lock to the list if afr_lk(F_SETLK or F_SETLKW) was successful.
- Remove it from the list during afr_lk(F_UNLCK).
- On child_down, mark lock as needing heal on that child. If lock is
lost on quorum no. of bricks, remove it from the list and mark fd bad.
- For fds marked as bad, fail the subsequent fd based fops.
- On parent up, traverse the list and heal the locks IFF the client is
the lk owner and has quorum. (shd does not heal any locks).
updates: #613
Change-Id: I03c46ceaea30f5e6236d5ec13f71d843d827f1bc
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Afr adds its own xattrs to the req, so it should take a copy of the
dictionary to prevent parent xlator re-using the modified xattr-req
to another subvolume
fixes: bz#1765155
Change-Id: I268e2dbd1b12323135d369e90a22a8bdde2cf7c2
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
| |
fixes: bz#1760189
Change-Id: Iffbf8d6f4c50b8e2de8364658697bdbe96549f5d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
squash >50 warnings on padding of structs in afr structures.
The warnings were found by manually adding '-Wpadded' to the GCC
command line.
Change-Id: I961fbdeb33715cedf3dd10db8e4f8ef40cd3e867
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame
anything for entry heal, heal will not complete even though we have
clear source and sinks. This will happen because while doing
afr_selfheal_find_direction() only the bricks which are blamed by
non-accused bricks are considered as sinks. Later in
__afr_selfheal_entry_finalize_source() when it tries to mark all the
non-sources as sinks it fails to do so because there won't be any
healed_sinks marked, no witness present and there will be a source.
Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do conservative merge.
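A sketch of the fallback described in the fix, using plain bool arrays in
place of the real AFR bookkeeping. The toy_* name and the array layout are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * If a source exists but no healed sink was marked, turn every locked
 * brick into a sink so that the heal proceeds as a conservative merge.
 * Returns true when the fallback was applied.
 */
bool toy_fallback_to_conservative_merge(const bool *locked_on, bool *sources,
                                        bool *healed_sinks, int child_count)
{
    bool have_source = false;
    bool have_sink = false;

    assert(locked_on != NULL && sources != NULL && healed_sinks != NULL);
    for (int i = 0; i < child_count; i++) {
        if (sources[i])
            have_source = true;
        if (healed_sinks[i])
            have_sink = true;
    }
    if (!have_source || have_sink)
        return false;

    for (int i = 0; i < child_count; i++) {
        if (locked_on[i]) {
            sources[i] = false;
            healed_sinks[i] = true;
        }
    }
    return true;
}
```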
Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1749322
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.
The argument was that split-brains do not occur in replica 3 but we do
see (data/metadata) split-brain cases once in a while which indicate that there are
a few bugs/corner cases yet to be discovered and fixed.
Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.
Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).
Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1756938
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you are already under lock, just decrement the call count
directly instead of removing the lock, re-taking the lock
and decrementing.
Implements https://github.com/gluster/glusterfs/issues/728
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: I3fa20b4651fbdb826655c5a03baeed46e99b5487
|
|
|
|
|
|
|
|
|
|
| |
In 3 cases, there was a memory allocation and zeroing, followed
directly by populating it with content. Replaced with memory
allocation that did not zero the memory.
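The pattern in generic C terms, as a sketch with standard library calls rather
than the GlusterFS allocator wrappers. toy_dup_bytes() is a hypothetical
helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Every byte of the new buffer is overwritten by memcpy(), so zeroing
 * it first (as calloc() would) is wasted work; malloc() is enough.
 */
void *toy_dup_bytes(const void *src, size_t len)
{
    void *dst;

    assert(src != NULL && len > 0);
    dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

This only holds when the whole allocation is populated. A partially filled
buffer would still need zeroing.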
Change-Id: I4fbb5c924fb3a144e415d2368126b784dde760ea
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on new brick. This patch fixes the
same.
Note that ctime still doesn't support consistent time across
distribute sub-volume.
This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1734026
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
We were not passing xattr_req when doing a name self heal
as well as a meta data heal. Because of this, some xdata
was missing, which caused I/O errors.
Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
Fixes: bz#1728770
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...whenever shd is re-enabled after disabling or there is a change in
`cluster.heal-timeout`, without needing to restart shd or waiting for the
current `cluster.heal-timeout` seconds to expire.
See BZ 1743988 for more details.
Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1744548
Reported-by: Glen Kiessling <glenk1973@hotmail.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
-Minor change to if-else structure to avoid code duplication.
-Added logging in case method calls fail
CID: 1394654
Updates: bz#789278
Change-Id: Ibef4450dc89ddd3bf951303d5b87f503924fd250
Signed-off-by: Barak Sason <bsasonro@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1734370
Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Initializing a dictionary, for example, seems to be perfectly fine to do
before taking a lock.
Change-Id: Ib29516c4efa8f0e2b526d512beab488fcd16d2e7
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
| |
This function does length, allocation and serialization for you.
Change-Id: I142a259952a2fe83dd719442afaefe4a43a8e55e
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of thin arbiter, before index healer starts crawling the
indices at every heal-timeout interval, even if there is nothing to
be healed it will send an upcall notification to all the clients to
release any AFR_TA_DOM_NOTIFY locks that they hold. SHD will wait
for the upcall to return before proceeding with the heal even though
there is nothing to be healed. This will also invalidate the cached
information about the brick states on the clients which leads to
extra calls on TA from clients for the next reads & writes if needed.
This will impact the IO performance.
Fix:
- Before sending the upcall to the clients, check for any pending heals
on TA without taking any locks.
- If there is nothing marked bad on TA, then continue with the index
crawl to heal any dirty markings present on the files due to any post-op
failure.
- If there is a brick marked as bad on TA, then take the
AFR_TA_DOM_NOTIFY lock on TA from SHD, get the state on TA and
continue with the current healing process.
Change-Id: Ieb477bc6cb18bbdfd4e7a0453c5ed79b574ec9d6
fixes: bz#1724184
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problems:
1. When checking for type and gfid mismatch, if the type or gfid
is unknown because of missing gfid handle and the gfid xattr
it will be reported as type or gfid mismatch and the heal will
not complete.
2. If the source selected during entry heal has null gfid the same
will be sent to afr_lookup_and_heal_gfid(). In this function when
we try to assign the gfid on the bricks where it does not exist,
we are considering the same gfid and try to assign that on those
bricks. This will fail in posix_gfid_set() since the gfid sent
is null.
Fix:
If the gfid sent to afr_lookup_and_heal_gfid() is null choose a
valid gfid before proceeding to assign the gfid on the bricks
where it is missing.
In afr_selfheal_detect_gfid_and_type_mismatch(), do not report
type/gfid mismatch if the type/gfid is unknown or not set.
Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da
fixes: bz#1722507
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Network latency is an important factor in selecting a read subvolume.
So this patch adds two new policies.
1) We measure the latency of a child during a GF_DUMP rpc call.
Then use this latency to pick a read subvol having the least
latency.
2) The second one is a hybrid mode where it calculates the effective
latency by multiplying the number of outstanding read requests by the
latency, and chooses the least one.
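The two policies can be sketched as follows. The struct, field names and
toy_* functions are hypothetical; only the scoring rules (latency alone, or
latency times outstanding reads) come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-child statistics. */
struct toy_child_stats {
    bool readable;         /* child holds a good copy */
    int64_t latency_usec;  /* measured latency */
    int64_t pending_reads; /* outstanding read requests */
};

/* Policy 1 scores by latency; the hybrid policy scales it by queue depth. */
static int64_t toy_score(const struct toy_child_stats *c, bool hybrid)
{
    return hybrid ? c->latency_usec * c->pending_reads : c->latency_usec;
}

/* Index of the readable child with the lowest score, or -1 if none. */
int toy_pick_read_child(const struct toy_child_stats *children, int count,
                        bool hybrid)
{
    int best = -1;

    assert(children != NULL);
    for (int i = 0; i < count; i++) {
        if (!children[i].readable)
            continue;
        if (best < 0 ||
            toy_score(&children[i], hybrid) < toy_score(&children[best], hybrid))
            best = i;
    }
    return best;
}
```

Note that with a plain product, a child with no outstanding reads scores zero
regardless of its latency.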
Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97
fixes: #520
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We should free the mem_pool local_pool during an afr_fini.
Otherwise this will lead to mem leak for shd
Change-Id: I805a34a88077bf7b886c28b403798bf9eeeb1c0b
Updates: bz#1716695
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many include statements that are not needed.
A previous more ambitious attempt failed because of the *BSD platform
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )
Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later on.
Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.
Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
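A sketch of the source selection for a replica 3 set, assuming the blame
information is available as a simple matrix. The matrix layout and the
toy_* name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_REPLICA 3

/*
 * blames[i][j] is true when the xattrs on brick i blame brick j.
 * A brick blamed by two others becomes a sink; with no such majority
 * every brick stays a source.
 */
void toy_pick_sources(bool blames[TOY_REPLICA][TOY_REPLICA],
                      bool sources[TOY_REPLICA])
{
    assert(blames != NULL && sources != NULL);
    for (int j = 0; j < TOY_REPLICA; j++) {
        int accusers = 0;

        for (int i = 0; i < TOY_REPLICA; i++) {
            if (i != j && blames[i][j])
                accusers++;
        }
        sources[j] = accusers < 2;
    }
}
```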
This patch also adds restriction of all the bricks to be up to perform
metadata heal to avoid any metadata loss.
Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.
Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1717819
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In function "afr_selfheal_entry_granular", after completing the
heal we are not destroying the frame. This will lead to a crash
when we execute a statedump operation, where it tries to access the
xlator object. If this xlator object is freed as part of the
graph destroy, this will lead to an invalid memory access.
Change-Id: I0a5e78e704ef257c3ac0087eab2c310e78fbe36d
fixes: bz#1708926
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
We were not properly cleaning self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to cleanup all the resources during ec fini.
Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- pass fop state instead of afr local to
afr_ta_dom_lock_check_and_release()
- avoid afr_lock_release_synctask() being called simultaneously from
notify code path and transaction (post-op) code path due to races.
- Check if the post-op on TA is valid based on event_gen checks.
- Invalidate in-memory information when we get TA child down.
Note: This patch addresses some pending review comments of commit
053b1309dc8fbc05fcde5223e734da9f694cf5cc
(https://review.gluster.org/#/c/glusterfs/+/20095/)
fixes: bz#1698449
Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was working on a blog about troubleshooting AFR issues and I wanted to copy
the messages logged by self-heal for my blog. I then realized that AFR-v2 is not
logging *before* attempting data heal while it logs it for metadata and entry
heals.
I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do]
0-testvol-replicate-0: performing entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed data selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-testvol-replicate-0: performing metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
Adding it in this patch. Now there is a 'performing' and a corresponding
'Completed' message for every type of heal.
fixes: bz#1707746
Change-Id: I0b954cf1e17b48280aefa76640b5119b92133d61
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed coverity error, "Unchecked return value (CHECKED_RETURN)".
Checking return value & logging error message if afr_set_pending_dict
fails.
updates: bz#789278
Change-Id: Iab7da6b4f3cd0622b95b8e1c412b007a330467e5
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
| |
Updates: bz#1624701
Change-Id: I7152c28ad85925abccdcc4cd6de8cb2a2b847a51
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When eager-lock lock acquisition fails because of say network failures, the
local is not being removed from owners_list, this leads to accumulation of
waiting frames and the application will hang because the waiting frames are
under the assumption that another transaction is in the process of acquiring
lock because owner-list is not empty. Handled this case as well in this patch.
Added asserts to make it easier to find these problems in future.
fixes: bz#1696599
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
memdup() and gf_memdup() have the same implementation. Removed one API
as the presence of both can be confusing.
Change-Id: I562130c668457e13e4288e592792872d2e49887e
updates: bz#1193929
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
|