| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem1:
When a directory is renamed while a brick
is down entry-heal always did an rm -rf on that directory on
the sink on old location and did mkdir and created the directory
hierarchy again in the new location. This is inefficient.
Problem2:
Renamedir heal order may lead to a scenario where directory in
the new location could be created before deleting it from old
location leading to 2 directories with same gfid in posix.
Fix:
As part of heal, if oldlocation is healed first and is not present in
source-brick always rename it into a hidden directory inside the
sink-brick so that when heal is triggered in new-location shd can
rename it from this hidden directory to the new-location.
If new-location heal is triggered first and it detects that the
directory already exists in the brick, then it should skip healing the
directory until it appears in the hidden directory.
Credits: Ravi for rename-data-loss.t script
Fixes: #1211
Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: add-brick operation fails with multiple bricks on same
server error when replica count is increased.
This was happening because of extra runs in a loop to compare
hostnames and if bricks supplied were less than "replica" count,
the bricks will get compared to itself resulting in above error.
Fixes: #1508
Change-Id: I8668e964340b7bf59728bb838525d2db062197ed
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterfs so far constrained itself with an arbitrary limit (32)
for the number of groups read from /proc/[pid]/status (this was
the number of groups shown there prior to Linux commit
v3.7-9553-g8d238027b87e (v3.8-rc1~74^2~59); since this commit, all
groups are shown).
With this change we'll read groups up to the number Glusterfs
supports in general (64k).
Note: the actual number of groups that are made use of in a
regular Glusterfs setup shall still be capped at ~93 due to limitations
of the RPC transport. To be able to handle more groups than that,
brick side gid resolution (server.manage-gids option) can be used along
with NIS, LDAP or other such networked directory service (see
https://github.com/gluster/glusterdocs/blob/5ba15a2/docs/Administrator%20Guide/Handling-of-users-with-many-groups.md#limit-in-the-glusterfs-protocol
).
Also adding some diagnostic messages to frame_fill_groups().
Change-Id: I271f3dc3e6d3c44d6d989c7a2073ea5f16c26ee0
fixes: #1075
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* also add some time gap in other tests to see if we get things properly
* create a directory 'tests/000/', which can host any tests, which are flaky.
* move all the tests mentioned in the issue to above directory.
* as the above dir gets tested first, all flaky tests would be reported quickly.
* change `run-tests.sh` to continue tests even if flaky tests fail.
Reference: gluster/project-infrastructure#72
Updates: #1000
Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e
Signed-off-by: Amar Tumballi <amar@kadalu.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assume that we are preallocating a VM of size 1TB with a shard
block size of 64MB then there will be ~16k shards.
This creation happens in 2 steps shard_fallocate() path i.e
1. lookup for the shards if any already present and
2. mknod over those shards do not exist.
But in case of fresh creation, we dont have to lookup for all
shards which are not present as the the file size will be 0.
Through this, we can save lookup on all shards which are not
present. This optimization is quite useful in the case of
preallocating big vm.
Also if the file is already present and the call is to
extend it to bigger size then we need not to lookup for non-
existent shards. Just lookup preexisting shards, populate
the inodes and issue mknod on extended size.
Fixes: #1425
Change-Id: I60036fe8302c696e0ca80ff11ab0ef5bcdbd7880
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a cluster env: getspec() detects that volfile not found.
but further on, this return code is set by another call
so the error is lost and not handled.
As a result the server responds with ambiguous message:
{op_ret = -1, op_errno = 0..} - which cause the client to stuck.
Fix:
server side: don't override the failure error.
fixes: #1375
Change-Id: Id394954d4d0746570c1ee7d98969649c305c6b0d
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scenario of setting an xattr to a dir, killing one of the bricks,
removing the xattr, bringing back the brick results in xattr
inconsistency - The downed brick will still have the xattr, but the rest
won't.
This patch add a mechanism that will remove the extra xattrs during
lookup.
This patch is a modification to a previous patch based on comments that
were made after merge:
https://review.gluster.org/#/c/glusterfs/+/24613/
fixes: #1324
Change-Id: Ifec0b7aea6cd40daa8b0319b881191cf83e031d1
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Mark dir as missing in layout structure to be healed in
dht_selfheal_directory.
fixes: #1327
Change-Id: If2c69294bd8107c26624cfe220f008bc3b952a4e
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added test for volume options like
localtime-logging, fixed enable-shared-storage
to include function coverage and few negative
tests for other volume options to increase the
code coverage in the glusterd component.
Change-Id: Ib1706c1fd5bc98a64dcb5c8b15a121d639a597d7
Updates: #1052
Signed-off-by: nik-redhat <nladha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 620158475f462251c996901a8e24306ef6cb4c42.
The patch to revert is https://review.gluster.org/#/c/glusterfs/+/24613/
Reverting is required as comments were posted regarding a more
efficient implementation were made after the patch was merged.
A new patch will be posted to adress the comments will be posted.
updates: #1324
Change-Id: I59205baefe1cada033c736d41ce9c51b21727d3f
Signed-off-by: Barak Sason Rofman <redhat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scenario of setting an xattr to a dir, killing one of the bricks,
removing the xattr, bringing back the brick results in xattr
inconsistency - The downed brick will still have the xattr, but the rest
won't.
This patch add a mechanism that will remove the extra xattrs during
lookup.
fixes: #1324
Change-Id: Ibcc449bad6c7cb46bcae380e42e4496d733b453d
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: add-brick operation is failing when replica or disperse
count is not mentioned in the add-brick command.
Reason: with commit a113d93 we are checking brick order while
doing add-brick operation for replica and disperse volumes. If
replica count or disperse count is not mentioned in the command,
the dict get is failing and resulting add-brick operation failure.
fixes: #1306
Change-Id: Ie957540e303bfb5f2d69015661a60d7e72557353
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
Test Summary Report
-------------------
tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
(Wstat: 0 Tests: 23 Failed: 3)
Failed tests: 21-23
After glusterd restart, volume start is failing. Looks like, it need some
time to sync the data. Adding sleep for the same.
Note: All other changes are made to avoid spurious failures in the future.
fixes: #1272
Change-Id: Ib184757fb936e03b5b6208465e44a8e790b71c1c
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: See github issue for details.
Fix:
-In lookup if the entry exists in 2 out of 3 bricks, don't fail the
lookup with ENOENT just because there is an entrylk on the parent.
Consider quorum before deciding.
-If entry FOP does not succeed on quorum no. of bricks, do not perform
new entry mark.
Fixes: #1303
Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logs and other output carrying timestamps
will have now timezone offsets indicated, eg.:
[2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153
To this end,
- gf_time_fmt() now inserts timezone offset via %z strftime(3) template.
- A new utility function has been added, gf_time_fmt_tv(), that
takes a struct timeval pointer (*tv) instead of a time_t value to
specify the time. If tv->tv_usec is negative,
gf_time_fmt_tv(... tv ...)
is equivalent to
gf_time_fmt(... tv->tv_sec ...)
Otherwise it also inserts tv->tv_usec to the formatted string.
- Building timestamps of usec precision has been converted to
gf_time_fmt_tv, which is necessary because the method of appending
a period and the usec value to the end of the timestamp does not work
if the timestamp has zone offset, but it's also beneficial in terms of
eliminating repetition.
- The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to
be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space
to accommodate the zone offset and it's preferable to use a buffer
which is undisputedly large enough.
This change does *not* do the following:
- Retaining a method of timestamp creation without timezone offset.
As to my understanding we don't need such backward compatibility
as the code just emits timestamps to logs and other diagnostic
texts, and doesn't do any later processing on them that would rely
on their format. An exception to this, ie. a case where timestamp
is built for internal use, is graph.c:fill_uuid(). As far as I can
see, what matters in that case is the uniqueness of the produced
string, not the format.
- Implementing a single-token (space free) timestamp format.
While some timestamp formats used to be single-token, now all of
them will include a space preceding the offset indicator. Again,
I did not see a use case where this could be significant in terms
of representation.
- Moving the codebase to a single unified timestamp format and
dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv.
While the gf_timefmt_FT format is almost ubiquitous, there are
a few cases where different formats are used. I'm not convinced
there is any reason to not use gf_timefmt_FT in those cases too,
but I did not want to make a decision in this regard.
Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a
Updates: #837
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
When a process has the open fd and the same file is
unlinked in middle of the operations, then file based
lookup fails with ENOENT or stale file
Solution:
When the file already open and fd is available, use fstat
to get the file attributes
Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4
Fixes: #1281
Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Brick process are not properly attached on any cluster node while
some volume options are changed on peer node and glusterd is down on
that specific node.
Solution: At the time of restart glusterd it got a friend update request
from a peer node if peer node having some changes on volume.If the brick
process is started before received a friend update request in that case
brick_mux behavior is not workingproperly. All bricks are attached to
the same process even volumes options are not the same. To avoid the
issue introduce an atomic flag volpeerupdate and update the value while
glusterd has received a friend update request from peer for a specific
volume.If volpeerupdate flag is 1 volume is started by
glusterd_import_friend_volume synctask
Change-Id: I4c026f1e7807ded249153670e6967a2be8d22cb7
Credit: Sanju Rakaonde <srakonde@redhat.com>
fixes: #1290
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In a replicate/arbiter volume if file creations or writes fails on
quorum number of bricks and on one brick it is due to ENOSPC and
on other brick it fails for a different reason, it may fail with
errors other than ENOSPC in some cases.
Fix:
Prioritize ENOSPC over other lesser priority errors and do not set
op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any
error_no which can be misinterpreted by __afr_dir_write_finalize().
Also removing the function afr_has_arbiter_fop_cbk_quorum() which
might consider a successful reply form a single brick as quorum
success in some cases, whereas we always need fop to be successful
on quorum number of bricks in arbiter configuration.
Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08
Fixes: #1254
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a critical flaw in the previous implementation of open-behind.
When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.
To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.
To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.
The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.
Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.
Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- performance.cache-size has a flawed semantics, as it's
dispatched on two independent translators, io-cache
and quick-read.
- performance.qr-cache-timeout has a confusing name, as
other options affecting quick-read have an unabbreviated
"quick-read-..." prefix in their names.
We keep these options with unchanged operation, but in the
help output we indicate their deprecation.
The following better alternatives are introduced:
- performance.io-cache-size to tune cache-size option of io-cache
- performance.quick-read-cache-size to tune cache-size option of
quick-read
- performance.quick-read-cache-timeout as a preferred synonym for
performance.qr-cache-timeout
Fixes: #952
Change-Id: Ibd04fb638de8cac450ba992ad8a415154f9f4281
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.
This patch fixes this problem.
Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.
This patch fixes this problem.
Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: tests/bugs/replicate/bug-1101647.t test case fails sporadically
in the volume heal since connection to the bricks with shd was not being
checked before running the index heal.
Build link: https://build.gluster.org/job/regression-test-burn-in/5007/
Fix: Check for the connection status of the bricks with shd before
performing the index heal.
Change-Id: Ie7060f379b63bef39fd4f9804f6e22e0a25680c1
Updates: #1154
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Sometimes test case is failing at the time of creating files
on mount point after mounting the volume
Solution: After started the volume need to wait to make sure all
bricks instances are completely started so put a online_brick_count
check after just started the volume
Change-Id: I5020e7e417539377277ca00189f9c51d2cf877a6
Fixes: #1162
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Do not truncate file offsets and sizes to 32-bit to
prevent tests from spurious failures on >2Gb files.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2
Fixes: #1161
|
|
|
|
|
|
|
|
| |
Parts of the test weren't designed to run in mux mode, this is now fixed
Change-Id: I428c2fcce6d047e324ca5dcaef677ee1794e3dfe
updates: #1154
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tests/bugs/glusterd/serialize-shd-manager-glusterd-restart.t
Problem: Sometime volume status is failed after restart glusterd
in one cluster node
Solution: Wait to finish glusterd handshake on down cluster node
Change-Id: Ib23ca41c943caf2903c61ebf42dc437c1b9d6054
Fixes: #1158
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: The key "GF_PREOP_PARENT_KEY" has been populated by dht and
for non-distribute volume like 1x3 key is not populated so
posix_is_layout stale throw a message while a file is created
Solution: To avoid a log put a condition before delete a key
Change-Id: I813ee7960633e7f9f5e9ad2f42f288053d9eb71f
Fixes: #1150
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
tests/bugs/protocol/bug-1433815-auth-allow.t fails
sometimes because of stale mount. This stale mount
comes into picture when parent process dies without
waiting for the child process which mounts fuse fs
to die
Fix:
Wait for mounting child process to die before dying.
Fixes: #1152
Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When bringing back a downed brick and performing lookup from the client
side, the permission on said brick aren't updated on the first lookup,
but only on the second.
This patch modifies permission update logic so the first lookup will
trigger a permission update on the downed brick.
LIMITATIONS OF THE PATCH:
As the choice of source depends on whether the directory has layout or not.
Even the directories on the newly added brick will have layout xattr[zeroed], but the same is not true for a root directory.
Hence, in case in the entire cluster only the newly added bricks are up [and others are down], then any change in permission during this time will be overwritten by the older permissions when the cluster is restarted.
fixes: #999
Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Test should wait for process down notification to be received
by glusterd.
Fixes: #1153
Change-Id: I9162b58a92c1a909ca98097f14c0714f9086bdd1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
| |
Fixes: #1149
Change-Id: I38483fc7d76d7fe0ac9fb649669a46bdf9c82234
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a bug in write-behind that allowed a previous completed write
to overwrite the overlapping region of data from a future write.
Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset of W2:
W2.offset = W3.offset = W1.offset + W1.size
Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.
Suppose write-behind processes the requests from 2 concurrent threads:
Thread 1 Thread 2
<received W1>
<received W2>
wb_enqueue_tempted(W1)
/* W1 is assigned gen X */
wb_enqueue_tempted(W2)
/* W2 is assigned gen X */
wb_process_queue()
__wb_preprocess_winds()
/* W1 and W2 are sequential and all
* other requisites are met to merge
* both requests. */
__wb_collapse_small_writes(W1, W2)
__wb_fulfill_request(W2)
__wb_pick_unwinds() -> W2
/* In this case, since the request is
* already fulfilled, wb_inode->gen
* is not updated. */
wb_do_unwinds()
STACK_UNWIND(W2)
/* The application has received the
* result of W2, so it can send W3. */
<received W3>
wb_enqueue_tempted(W3)
/* W3 is assigned gen X */
wb_process_queue()
/* Here we have W1 (which contains
* the conflicting W2) and W3 with
* same gen, so they are interpreted
* as concurrent writes that do not
* conflict. */
__wb_pick_winds() -> W3
wb_do_winds()
STACK_WIND(W3)
wb_process_queue()
/* Eventually W1 will be
* ready to be sent */
__wb_pick_winds() -> W1
__wb_pick_unwinds() -> W1
/* Here wb_inode->gen is
* incremented. */
wb_do_unwinds()
STACK_UNWIND(W1)
wb_do_winds()
STACK_WIND(W1)
So, as we can see, W3 is sent before W1, which shouldn't happen.
The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
that future requests are assigned to the same generation, which could
be internally reordered.
Solution:
Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.
Special thanks to Stefan Ring for writing a reproducer that has been
crucial to identify the issue.
Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: #884
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
frame is accessed after stack-wind. This can lead to crash
if the cbk frees the frame.
Fix:
Use new frame for the wind instead.
Updates: #832
Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...if pending xattrs are zero for all children.
Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it can be possible that we end up with xattrs inadvertendly
deleted from all bricks, as explained in the BZ.
Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1,
vol2, vol3 with 3 bricks (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- All bricks processes not running on N1.
Root Cause -
Since, There is a diff in volfile versions in N1 as compared
to N2 and N3 therefore glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo and deletes
old_volinfo and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). Now, since the volinfo
has been deleted by glusterd_delete_stale_volume()
from priv->volumes list before glusterd_start_bricks() and
glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order is called after glusterd_start_bricks(),
therefore the attach RPC req gets an empty volfile path
and that causes the brick to crash.
Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services before
glusterd_start_bricks() cal is made in glusterd_import_friend_volume
Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
Fixes: bz#1773856
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.
Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.
fixes: bz#1801624
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Changelog creates threads even if the changelog is not enabled
Background:
Changelog xlator broadly does two things
1. Journalling - Cosumers are geo-rep and glusterfind
2. Event Notification for registered events like (open, release etc) -
Consumers are bitrot, geo-rep
The existing option "changelog.changelog" controls journalling and
there is no option to control event notification and is enabled by
default. So when bitrot/geo-rep is not enabled on the volume, threads
and resources(rpc and rbuf) related to event notifications consumes
resources and cpu cycle which is unnecessary.
Solution:
The solution is to have two different options as below.
1. changelog-notification : Event notifications
2. changelog : Journalling
This patch introduces the option "changelog-notification" which is
not exposed to user. When either bitrot or changelog (journalling)
is enabled, it internally enbales 'changelog-notification'. But
once the 'changelog-notification' is enabled, it will not be disabled
for the life time of the brick process even after bitrot and changelog
is disabled. As of now, rpc resource cleanup has lot of races and is
difficult to cleanup cleanly. If allowed, it leads to memory leaks
and crashes on enable/disable of bitrot or changelog (journal) in a
loop. Hence to be safer, the event notification is not disabled within
lifetime of process once enabled.
Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a
Updates: #475
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: With lookup-optimize set to on by default, a client with
stale-layout can create a new file on a wrong subvol. This will lead to
possible duplicate files if two different clients attempt to create the
same file with two different layouts.
Solution: Send in-memory layout to be cross checked at posix before
commiting a "create". In case of a mismatch, sync the client layout with
that of the server and attempt the create fop one more time.
test: Manual, testcase(attached)
fixes: bz#1786679
Change-Id: Ife0941f105113f1c572f4363cbcee65e0dd9bd6a
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
'nfsnobody' user and group is merged with
'nobody' user and group in RHEL8.
Tests are modified to use appropriate user and group.
BUG: 1756900
Change-Id: I59863da2262283b00b1cb417d3652ebe29a36407
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
For files: During metadata heal, we restore timestamps
only for non-regular (char, block etc.) files.
Extenting it for regular files as timestamp is updated
via touch command also
fixes: bz#1787274
Change-Id: I26fe4fb6dff679422ba4698a7f828bf62ca7ca18
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: At the time of coming up one server node(1x3) after reboot
client is unmounted.The client is unmounted because a client
is getting AUTH_FAILED event and client call fini for the graph.The
client is getting AUTH_FAILED because brick is not attached with a
graph at that moment
Solution: To avoid the unmounting the client graph throw ENOENT error
from server in case if brick is not attached with server at
the time of authenticate clients.
Credits: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
Fixes: bz#1793852
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: default option itransport.address-family is disappered
in volume info output after a volume reset.
Cause: with 3.8.0 onwards volume option transport.address-family
has default value, any volume which is created will have this
option set. So, volume info will show this in its output. But,
with reset volume, this option is not handled.
Solution: In glusterd_enable_default_options(), we should add this
option along with other default options. This function is called
by glusterd_options_reset() with volume reset command.
fixes: bz#1786478
Change-Id: I58f7aa24cf01f308c4efe6cae748cc3bc8b99b1d
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Sometime fops like posix_writev, posix_fallocate, posix_zerofile
failed and throw error ENOSPC if storage.reserve threshold limit
has reached even fops is overwriting the data
Solution: Retry the fops in case of overwrite if diskspace check
is failed
Credits: kinsu <vpolakis@gmail.com>
Change-Id: I987d73bcf47ed1bb27878df40c39751296e95fe8
Updates: #745
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Imagine the following set of operations:
1. touch $M0/file
2. ln $M0/file $M0/file.lnk
3. rm $M1/file
4. wait for md-cache-timeout
5. stat $M0/file.lnk $M0/file
stat on $M0/file in step 5 succeeds when md-cache-timeout is non-zero
even though it was removed from $M1. The reason is
1. fuse-bridge on $M0 would resolve both file and file.lnk to same
inode. This inode i1 is cached in md-cache.
2. After md-cache-timeout lookup on $M0/file.lnk would be sent to
backend. This lookup will be successful as file.lnk is present on
backend and would refresh the md-cache.
3. The lookup on $M0/file sent within md-cache-timeout after lookup on
$M0/file.lnk would be sent with i1. Since i1 was refreshed by
lookup on $M0/file.lnk, the cache is deemed valid and md-cache
responds lookup on $M0/file as success
To fix this failure we can either:
1. remove lookup on $M0/file.lnk, so that lookup on $M0/file is
actually sent to backend.
2. turn off md-cache
This patch chooses option 2.
credits: Csaba Henk <csaba@redhat.com>
Change-Id: I352c2acd377fe10c4bdf3b6e53c1de86a4e544c7
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Updates: bz#1756900
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old command for log rotate is still present removing
it completely. Also adding testcase to test the
log rotate command with both the old as well as the new command
and fixing testcase which use the old syntax to use the new
one.
Code to be removed:
1. In cli-cmd-volume.c from struct cli_cmd volume_cmds[]:
{"volume log rotate <VOLNAME> [BRICK]", cli_cmd_log_rotate_cbk,
"rotate the log file for corresponding volume/brick"
" NOTE: This is an old syntax, will be deprecated from next release."},
2. In cli-cmd-volume.c from cli_cmd_log_rotate_cbk():
||(strcmp("rotate", words[2]) == 0)))
3. In cli-cmd-parser.c from cli_cmd_log_rotate_parse()
if (strcmp("rotate", words[2]) == 0)
volname = (char *)words[3];
else
fixes: bz#1750387
Change-Id: I56e4d295044e8d5fd1fc0d848bc87e135e9e32b4
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Configure the list of gluster servers in the key
GLUSTERD_BRICK_SERVERS at the time of GETSPEC RPC CALL
and access the value in client side to update volfile
serve list so that client would be able to connect
next volfile server in case of current volfile server
is down
Updates #741
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I23f36ddb92982bb02ffd83937a8bd8a2c97e8104
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When one of the node is down in cluster,
rebalance status is not displaying detailed
information.
Cause: In glusterd_volume_rebalance_use_rsp_dict()
we are aggregating rsp from all the nodes into a
dictionary and sending it to cli for printing. While
assigning a index to keys we are considering all the
peers instead of considering only the peers which are
up. Because of which, index is not reaching till 1.
while parsing the rsp cli unable to find status-1
key in dictionary and going out without printing
any information.
Solution: The simplest fix for this without much
code change is to continue to look for other keys
when status-1 key is not found.
fixes: bz#1764119
Change-Id: I0062839933c9706119eb85416256eade97e976dc
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a rhel-8 machine, we need to have a key length
greater than or eaual to 2048. So changing the values
to 2048 to pass the test.
Credits: Mohit Agrawal <moagrawal@redhat.com>
Change-Id: I0f21db4d737203d0b2e44e7e61f50ae1279795ad
Updates: bz#1756900
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
| |
fixes: bz#1759002
Change-Id: I4d49e1c2ca9b3c1d74b9dd5a30f1c66983a76529
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|