| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Sometime brick is going down to health check thread is
failed without logging error codes return by aio system calls.
As per aio_error man page it returns a positive error number
if the asynchronous I/O operation failed.
Solution: log aio_error return codes in error message
> Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
> (Cherry picked from commit 032862fa3944fc7152140aaa13cdc474ae594a51)
> (Reviwed on upstream link https://review.gluster.org/#/c/glusterfs/+/23284/
> Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
Fixes: #1168
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
tests/bugs/protocol/bug-1433815-auth-allow.t fails
sometimes because of stale mount. This stale mount
comes into picture when parent process dies without
waiting for the child process which mounts fuse fs
to die
Fix:
Wait for mounting child process to die before dying.
Fixes: #1152
Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
"snap_scheduler.py init" command failing with the below traceback:
[root@dhcp43-104 ~]# snap_scheduler.py init
Traceback (most recent call last):
File "/usr/sbin/snap_scheduler.py", line 941, in <module>
sys.exit(main(sys.argv[1:]))
File "/usr/sbin/snap_scheduler.py", line 851, in main
initLogger()
File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
File "/usr/lib64/python3.6/posixpath.py", line 94, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
Solution:
Added the 'universal_newlines' flag to Popen to support backward compatibility.
Added a basic test for snapshot scheduler.
Backport of:
>Upstream patch:
>https://review.gluster.org/#/c/glusterfs/+/24257/
>Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
>Fixes: #1134
>Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
>(cherry picked from commit a7d7ec066e56ac03bf252c26beb20fdc2c3b6772)
Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
Fixes: #1134
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
frame is accessed after stack-wind. This can lead to crash
if the cbk frees the frame.
Fix:
Use new frame for the wind instead.
Fixes: #832
Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a bug in write-behind that allowed a previous completed write
to overwrite the overlapping region of data from a future write.
Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset of W2:
W2.offset = W3.offset = W1.offset + W1.size
Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.
Suppose write-behind processes the requests from 2 concurrent threads:
Thread 1 Thread 2
<received W1>
<received W2>
wb_enqueue_tempted(W1)
/* W1 is assigned gen X */
wb_enqueue_tempted(W2)
/* W2 is assigned gen X */
wb_process_queue()
__wb_preprocess_winds()
/* W1 and W2 are sequential and all
* other requisites are met to merge
* both requests. */
__wb_collapse_small_writes(W1, W2)
__wb_fulfill_request(W2)
__wb_pick_unwinds() -> W2
/* In this case, since the request is
* already fulfilled, wb_inode->gen
* is not updated. */
wb_do_unwinds()
STACK_UNWIND(W2)
/* The application has received the
* result of W2, so it can send W3. */
<received W3>
wb_enqueue_tempted(W3)
/* W3 is assigned gen X */
wb_process_queue()
/* Here we have W1 (which contains
* the conflicting W2) and W3 with
* same gen, so they are interpreted
* as concurrent writes that do not
* conflict. */
__wb_pick_winds() -> W3
wb_do_winds()
STACK_WIND(W3)
wb_process_queue()
/* Eventually W1 will be
* ready to be sent */
__wb_pick_winds() -> W1
__wb_pick_unwinds() -> W1
/* Here wb_inode->gen is
* incremented. */
wb_do_unwinds()
STACK_UNWIND(W1)
wb_do_winds()
STACK_WIND(W1)
So, as we can see, W3 is sent before W1, which shouldn't happen.
The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
that future requests are assigned to the same generation, which could
be internally reordered.
Solution:
Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.
Special thanks to Stefan Ring for writing a reproducer that has been
crucial to identify the issue.
Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: #884
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case where uid is not set to be 0, there are possible errors
from acl xlator. So, set `uid = 0;` with pid indicating this is
set from UTIME activity.
The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887]
Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c
Updates: #832
Signed-off-by: Amar Tumballi <amar@kadalu.io>
(cherry picked from commit eb916c057036db8289b41265797e5dce066d1512)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A crash is seen during a reattempt to clean up shards in background
upon remount. And this happens even on remount (which means a remount
is no workaround for the crash).
In such a situation, the in-memory base inode object will not be
existent (new process, non-existent base shard).
So local->resolver_base_inode will be NULL.
In the event of an error (in this case, of space running out), the
process would crash at the time of logging the error in the following line -
gf_msg(this->name, GF_LOG_ERROR, local->op_errno, SHARD_MSG_FOP_FAILED,
"failed to delete shards of %s",
uuid_utoa(local->resolver_base_inode->gfid));
Fixed that by using local->base_gfid as the source of gfid when
local->resolver_base_inode is NULL.
Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...if pending xattrs are zero for all children.
Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it can be possible that we end up with xattrs inadvertendly
deleted from all bricks, as explained in the BZ.
Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
Updates: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
|
|
|
|
|
|
|
|
|
|
|
| |
Open behind was not keeping any reference on fd's pending to be
opened. This makes it possible that a concurrent close and an entry
fop (unlink, rename, ...) caused destruction of the fd while it
was still being used.
Change-Id: Ie9e992902cf2cd7be4af1f8b4e57af9bd6afd8e9
Fixes: #1028
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
| |
Fixes: #1124
Change-Id: I02fbb97ea7ab7617086ebcece07213e67cb1aa67
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
ec_getxattr_heal_cbk was called with NULL as second argument
in case heal was failing.
This function was dereferencing "cookie" argument which caused crash.
Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop->data, so even in the case when fop is NULL in error
case, there won't be any NULL dereference.
Thanks to Xavi for the suggestion about the fix.
Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
updates: #1061
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue:
1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1,
vol2, vol3 with 3 bricks (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- All bricks processes not running on N1.
Root Cause -
Since, There is a diff in volfile versions in N1 as compared
to N2 and N3 therefore glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo and deletes
old_volinfo and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). Now, since the volinfo
has been deleted by glusterd_delete_stale_volume()
from priv->volumes list before glusterd_start_bricks() and
glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order is called after glusterd_start_bricks(),
therefore the attach RPC req gets an empty volfile path
and that causes the brick to crash.
Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services before
glusterd_start_bricks() cal is made in glusterd_import_friend_volume
> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
(cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)
Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
fixes: bz#1808964
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an event was generated and the target host was resolved to an IPv6
address, there was a memory overflow when that address was copied to a
fixed IPv4 structure (IPv6 addresses are longer than IPv4 ones).
This fix correctly handles IPv4 and IPv6 addresses returned by
getaddrinfo()
Backport of:
> Change-Id: I5864a0c6e6f1b405bd85988529570140cf23b250
> Fixes: bz#1790870
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: I5864a0c6e6f1b405bd85988529570140cf23b250
Fixes: #1030
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch addresses two problems:
1. During friend handshaking, if a volume is imported due to change in
the version, the old bricks were not stopped which would lead to a
situation where bricks will run with old volfiles.
2. As part of attaching shd service in glusterd_attach_svc, there might
be a case that the volume for which we're attempting to attach a shd
service might become stale and in the process of deletion and hence in
every retrials (if the rpc connection isn't ready) check for the
existance of the volume and then only attempt the further attach
request.
patch on master: https://review.gluster.org/#/c/glusterfs/+/23042/
> Bug: bz#1733425
> Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
> Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
> Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
fixes: bz#1812849
Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using inode_ctx_get() or inode_ctx_set(), a 'uint64_t *' is expected.
In many cases, the value to retrieve or store is a pointer, which will be
of smaller size in some architectures (for example 32-bits). In this case,
directly passing the address of the pointer casted to an 'uint64_t *' is
wrong and can cause memory corruption.
Backport of:
> Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
> Fixes: bz#1785611
Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: bz#1785323
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changes in the middle of sending the
requests to subvolumes which caused a discrepancy in the expected
number of replies and the actual number of sent requests.
This discrepancy caused that AFR continued executing requests before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a use-
after-free issue when the answer was finally received.
In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.
Backport of:
> Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
> Fixes: bz#1808875
Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: bz#1809438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
server.sin_family was set to AF_INET while creating socket connection,
this was failing if the input address is IPv6(`::1`).
With this patch, sin_family is set by reading the ai_family of
`getaddrinfo` result.
Backport of:
> Fixes: bz#1752330
> Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
> Signed-off-by: Aravinda VK <avishwan@redhat.com>
Fixes: bz#1807785
Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ec_manager_open/opendir memsets ctx->loc which causes
memory/inode leak, and ec_fheal uses ctx->loc out of fd->lock
that loc_copy may copy bad data when memset it.
This patch skips updating ctx->loc when it is initilizaed.
With it, ctx->loc is filled once, and never updated.
Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a
fixes: bz#1806843
Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.
Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.
fixes: bz#1804591
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Objects allocated from a per-thread memory pool keep a reference to it
to be able to return the object to the pool when not used anymore. The
object holding this reference can have a long life cycle that could
survive a glfs_fini() call.
This means that it's unsafe to destroy memory pools from glfs_fini().
Another side effect of destroying memory pools from glfs_fini() is that
the TLS variable that points to one of those pools cannot be reset for
all alive threads. This means that any attempt to allocate memory from
those threads will access already free'd memory, which is very
dangerous.
To fix these issues, mem_pools_fini() doesn't destroy pool lists
anymore. They should be destroyed when the library is unloaded or the
process is terminated, but this cannot be done right now because
gluster doesn't stop other threads before calling exit(), which could
cause some races.
This patch is the backport of 2 master patches:
> Change-Id: Ib189a5510ab6bdac78983c6c65a022e9634b0965
> Fixes: bz#1801684
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
>
> Change-Id: Id7cfb4407fcf208e28f03a7c3cdc3ef9c1f3bf9b
> Fixes: bz#1801684
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: Id7cfb4407fcf208e28f03a7c3cdc3ef9c1f3bf9b
Fixes: bz#1805668
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When we mount a ta volume, as soon as 2 data bricks are connected
we consider that the mount is done and then send a lookup/create on
ta file on ta node. However, this connection with ta node might not
have been completed.
Due to this delay, ta replica id file will not be created and we
will see ENOTCONN error in log file if we do lookup.
Solution:
As we know that this ta node could have a higher latency, we should
wait for reasonable time for connection to happen before sending
lookup/create on replica id file.
fixes: bz#1804058
Change-Id: I36f90865afe617e4e84cee57fec832a16f5dd6cc
(cherry picked from commit a7fa54ddea3fe429f143b37e4de06a93b49d776a)
|
|
|
|
|
|
|
| |
Fixes: bz#1803713
Change-Id: I0ff6c3152624b8c8f8f76057a1948aed2dc3cec0
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Thin-arbiter module makes use of 'pending-xattr' name for the translator
as the filename which gets created in thin-arbiter node. By making this
unique, we can host single thin-arbiter node for multiple clusters.
Updates: #763
Change-Id: Ib3c732e7e04e6dba229e71ae3e64f1f3cb6d794d
Signed-off-by: Amar Tumballi <amar@kadalu.io>
(cherry picked from commit 8db8202f716fd24c8c52f8ee5f66e169310dc9b1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
heal-info code assumes that all indices in xattrop directory
definitely need heal. There is one corner case.
The very first xattrop on the file will lead to adding the
gfid to 'xattrop' index in fop path and in _cbk path it is
removed because the fop is zero-xattr xattrop in success case.
These gfids could be read by heal-info and shown as needing heal.
Fix:
Check the pending flag to see if the file definitely needs or
not instead of which index is being crawled at the moment.
fixes: bz#1802449
Change-Id: I79f00dc7366fedbbb25ec4bec838dba3b34c7ad5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
(cherry picked from commit d27df94016b5526c18ee964d4a47508326329dda)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: At the time of coming up one server node(1x3) after reboot
client is unmounted.The client is unmounted because a client
is getting AUTH_FAILED event and client call fini for the graph.The
client is getting AUTH_FAILED because brick is not attached with a
graph at that moment
Solution: To avoid the unmounting the client graph throw ENOENT error
from server in case if brick is not attached with server at
the time of authenticate clients.
> Credits: Xavi Hernandez <xhernandez@redhat.com>
> Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
> Fixes: bz#1793852
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (cherry picked from commit > f6421dff22a6ddaf14134f6894deae219948c89d)
Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
Fixes: bz#1794019
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: At the time of cleanup rpc object ssl specific data
is not freeing so it has become a leak.
Solution: To avoid the leak cleanup ssl specific data at the
time of cleanup rpc object
> Credits: l17zhou <cynthia.zhou@nokia-sbell.com.cn>
> Fixes: bz#1768407
> Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
> (cherry picked from commit > > 54ed71dba174385ab0d8fa415e09262f6250430c)
Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
Fixes: bz#1795540
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlators/features/quota/src/quota.c quota_log_usage function.
The quota_log_helper() function applies memory
for path through inode_path(), should be GF_FREE().
Upstream Patch:
https://review.gluster.org/#/c/glusterfs/+/24018/
Backport of:
> fixes: bz#1792707
> Change-Id: I33143bdf272bf10837061df4a1b7b2fc146162d5
> Signed-off-by: Xi Jinyu <xijinyu@cmss.chinamobile.com>
> (cherry picked from commit 18549de12bcfafe4ac30fc2e11ad7a3f3c216b38)
fixes: bz#1791154
Change-Id: I33143bdf272bf10837061df4a1b7b2fc146162d5
Signed-off-by: Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
glfsheal program uses unix-socket-based volfile server.
volfile server will be the path to socket in this case.
gf_event expects this to be hostname in all cases. So getaddrinfo
will fail on the unix-socket path, events won't be sent in this case.
Fix:
In case of unix sockets, default to localhost
fixes: bz#1793085
Change-Id: I60d27608792c29d83fb82beb5fde5ef4754bece8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If non-standard ssh-port is used, Geo-rep can be configured to use ssh port
by using config option, the value should be in allowed port range and non negative.
At present it can accept negative value and outside allowed port range which is incorrect.
Many Linux kernels use the port range 32768 to 61000.
IANA suggests it should be in the range 1 to 2^16 - 1, so keeping the same.
$ gluster volume geo-replication master 127.0.0.1::slave config ssh-port -22
geo-replication config updated successfully
$ gluster volume geo-replication master 127.0.0.1::slave config ssh-port 22222222
geo-replication config updated successfully
This patch fixes the above issue and have added few validations around this
in test cases.
Upstream Patch:
https://review.gluster.org/#/c/glusterfs/+/24035/
Backport of:
> Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
> Fixes: bz#1792276
> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
> (cherry picked from commit 485212e858bddd97573a3b2b811357b0d822005a)
Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
Fixes: bz#1793412
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Winter is coming. So is gcc-10.
Compiling with gcc-10-20191219 snapshot reveals dupe defns of
cli_default_conn_timeout and cli_ten_minutes_timeout in
.../cli/src/cli.[ch] due to missing extern decl.
There are many changes coming in gcc-10 described in
https://gcc.gnu.org/gcc-10/changes.html
compiling cli.c with gcc-9 we see:
...
.quad .LC88
.comm cli_ten_minutes_timeout,4,4
.comm cli_default_conn_timeout,4,4
.text
.Letext0:
...
and with gcc-10:
...
.quad .LC88
.globl cli_ten_minutes_timeout
.bss
.align 4
.type cli_ten_minutes_timeout, @object
.size cli_ten_minutes_timeout, 4
cli_ten_minutes_timeout:
.zero 4
.globl cli_default_conn_timeout
.align 4
.type cli_default_conn_timeout, @object
.size cli_default_conn_timeout, 4
cli_default_conn_timeout:
.zero 4
.text
.Letext0:
...
which is reflected in the .o file as (gcc-9):
...
0000000000000004 C cli_ten_minutes_timeout
0000000000000004 C cli_default_conn_timeout
...
and (gcc-10):
...
0000000000000020 B cli_ten_minutes_timeout
0000000000000024 B cli_default_conn_timeout
...
See nm(1) and ld(1) for a description C (common) and B (BSS) and how
they are treated by the linker.
Note: gcc-10 will land in Fedora-32!
Change-Id: I54ea485736a4910254eeb21222ad263721cdef3c
Fixes: bz#1793492
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
| |
Fixes: bz#1791177
Change-Id: I0d732f82217ee4fecf1d32df4be4b7492022c0ca
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of:
> Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/24011/
>fixes: bz#1790748
>Change-Id: I1cb12c975142794139456d0f8e99fbdbb03c53a1
>Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
>(cherry picked from commit d73872e764214f8071c8915536a75bdac1e5e685)
fixes: bz#1790846
Change-Id: I1cb12c975142794139456d0f8e99fbdbb03c53a1
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. In dictionary values(), returns list in py2 and not in py3.
So explicitly convert it into list.
2. xattr module returns values in bytes. So explicitly convert
them to str to work both with py2 and py3
Backport of:
> fixes: bz#1789439
> Change-Id: I27a639cda4f7a4ece9744a97c3d16e247906bd94
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
> (cherry picked from commit 45894c39a4d05ed1f6a6f1bdbeafb5fe74ef29c3)
Change-Id: I27a639cda4f7a4ece9744a97c3d16e247906bd94
Fixes: bz#1790423
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
While we delete gluster volume the hook script 'S57glusterfind-delete-post.py'
is failed to execute and error message can be observed in glusterd log.
Traceback:
File "/var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post", line 69, in <module>
main()
File "/var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post", line 39, in main
glusterfind_dir = os.path.join(get_glusterd_workdir(), "glusterfind")
File "/usr/lib64/python3.7/posixpath.py", line 94, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib64/python3.7/genericpath.py", line 155, in _check_arg_types
raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
Solution:
Added the 'universal_newlines' flag to Popen to support backward compatibility.
Backport of:
> Change-Id: Ie5655b11b55535c5ad2338108d0448e6fdaacf4f
> Fixes: bz#1789478
> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
> (cherry picked from commit 33c3cbe71b67f523538b04334f1ef962953281ed)
Change-Id: Ie5655b11b55535c5ad2338108d0448e6fdaacf4f
Fixes: bz#1790438
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
glusterfind is unable to copy remote output file to local node when a
remove-brick is in progress on the remote node. After copying remote
files, in the --full output listing path, a "sort -u" command is run on
the collected files. However, "sort" exits with an error code if it
finds any file missing.
Solution:
Maintain a map of (pid, output file) when the node commands are started
and remove the mapping for the pid for which the command returns an
error. Use the list of files present in the map for the "sort" command.
Backport of:
> Change-Id: Ie6e019037379f4cb163f24b1c65eb382efc2fb3b
> fixes: bz#1410439
> Signed-off-by: Milind Changire <mchangir@redhat.com>
> Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
> (cherry picked from commit 42c1605f42b89520d4d05806d7074e9e93b63640)
Change-Id: Ie6e019037379f4cb163f24b1c65eb382efc2fb3b
Fixes: bz#1790428
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Glusterfs client process has memory leak if create serveral files under one folder, and delete the folder.
According to statedump, the ref counts of readdir-ahead is bigger than zero in the inode table. Readdir-ahead get parent inode by inode_parent in rda_mark_inode_dirty when each rda_writev_cbk,the inode ref count of parent folder will be increased in inode_parent, but readdir-ahead do not unref it later.
The correction is unref the parent inode at the end of rda_mark_inode_dirty
Backport of:
> Change-Id: Iee68ab1089cbc2fbc4185b93720fb1f66ee89524
> Fixes: bz#1779055
> Signed-off-by: HuangShujun <549702281@qq.com>
Change-Id: Iee68ab1089cbc2fbc4185b93720fb1f66ee89524
(cherry picked from commit 99044a5cedcff9a9eec40a07ecb32bd66271cd02)
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Fixes: bz#1789336
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In current timer implementation, each event has an absolute time at which
it will be fired. When the first timer of the queue has not elapsed yet,
a pthread_cond_timedwait() is used to wait until the expected time.
Apparently that's fine. However the time passed to that function was a
pointer to the timespec structure contained in the event itself. This is
problematic because of how pthread_cond_timedwait() works internally.
Simplifying a bit, pthread_cond_timedwait() basically queues itself as a
waiter for the given condition variable and releases the mutex. Then it
does the timed wait using the passed value.
With that in mind, the follwing case is possible:
Timer Thread Other Thread
------------ ------------
gf_timer_call_cancel()
pthread_mutex_lock() |
+ pthread_mutex_lock()
event = current_event() |
pthread_cond_timedwait(&event->at) |
+ pthread_mutex_unlock() |
| + remove_event()
| + destroy_event()
+ timed_wait(&event->at)
As we can see, the time is used after it has been destroyed, which means
we have a use-after-free problem.
This patch fixes the problem by copying the time to a local variable
before calling pthread_cond_timedwait()
Backport of:
> Change-Id: I0f4e8eded24fe3a1276dc75c6cf093bae973d26b
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
> Fixes: bz#1785208
Change-Id: I0f4e8eded24fe3a1276dc75c6cf093bae973d26b
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Fixes: bz#1767264
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of https://review.gluster.org/#/c/glusterfs/+/23960/
This volume option was not made avaialble to `gluster volume set` CLI.
Reported-by: epolakis(https://github.com/kinsu) in
https://github.com/gluster/glusterfs/issues/781
fixes: bz#1788785
Change-Id: I7141bdd4e53ee99e22b354edde8d023bfc0b2cd7
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added logrotate support for user serviceable snapshot's logs.
Backport of:
>Change-Id: Ic920eaa8ab5e44daf5937a027c6913d7bb26d517
>Fixes: bz#1786722
>Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
Change-Id: Ic920eaa8ab5e44daf5937a027c6913d7bb26d517
Fixes: bz#1786753
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
| |
Fixes: bz#1778047
Change-Id: I52f9ee376d6816ecaf522fab962ea340f20d13fb
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to safe-guard against possible zero'ing out of iatt
structure in acl ctx, which can cause many issues.
> fixes: bz#1668286
> Change-Id: Ie81a57d7453a6624078de3be8c0845bf4d432773
> Signed-off-by: Amar Tumballi <amarts@redhat.com>
> (cherry picked from commit 6bf9637a93011298d032332ca93009ba4e377e46)
fixes: bz#1785493
Change-Id: Ie81a57d7453a6624078de3be8c0845bf4d432773
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Null character string is a valid xattr value in file system. But for
those xattrs processed by md-cache, it does not update its entries if
value is null('\0'). This results in ENODATA when those xattrs are
queried afterwards via getxattr() causing failures in basic operations
like create, copy etc in a specially configured Samba setup for Mac OS
clients.
On the other side snapview-server is internally setting empty string("")
as value for xattrs received as part of listxattr() and are not intended
to be cached. Therefore we try to maintain that behaviour using an
additional dictionary key to prevent updation of entries in getxattr()
and fgetxattr() callbacks in md-cache.
Credits: Poornima G <pgurusid@redhat.com>
Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7
Fixes: bz#1785228
Signed-off-by: Anoop C S <anoopcs@redhat.com>
(cherry picked from commit b4b683736367d93daad08a5ee6ca95778c07c5a4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
On a freshly installed system non-root geo-rep test case gets blocked.
Solution:
On a freshly installed system, the remote key need to be accepted automatically by ssh-copy-id.
Credits: M. Scherer <mscherer@redhat.com>
Backport of:
> Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
> Fixes: bz#1779742
> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
Fixes: bz#1784790
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When __socket_proto_state_machine() detected a problem in the size of
the request or it couldn't allocate an iobuf of the requested size, it
returned -ENOMEM (-12). However the caller was expecting only -1 in
case of error. For this reason the error passes undetected initially,
adding back the socket to the epoll object. On further processing,
however, the error is finally detected and the connection terminated.
Meanwhile, another thread could receive a poll_in event from the same
connection, which could cause races with the connection destruction.
When this happened, the process crashed.
To fix this, all error detection conditions have been hardened to be
more strict on what is valid and what not. Also, we don't return
-ENOMEM anymore. We always return -1 in case of error.
An additional change has been done to prevent destruction of the
transport object while it may still be needed.
Backport of:
> Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
> Fixes: bz#1782495
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
Fixes: bz#1783227
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes in locks xlator:
Added support for per-domain inodelk count requests.
Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the
dict and then set each key with name
'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'.
In the response dict, the xlator will send the per domain count as
values for each of these keys.
Changes in AFR:
Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has
been added to make the latter behave same as the former, thus not
breaking the current heal info output behaviour.
fixes: bz#1783858
Change-Id: Ie9e83c162aa77f44a39c2ba7115de558120ada4d
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit d7e049160a9dea988ded5816491c2234d40ab6b3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In the commit faf5ac13c4ee00a05e9451bf8da3be2a9043bbf2 missed one
condition to come out from the loop so after reach the slot_used to
1024 loop has become infinite loop
Solution: Correct the code path to avoid the infinite loop
> Change-Id: Ia02a109571f0d8cc9902c32db3e9b9282ee5c1db
> Fixes: bz#1781440
> Credits: Xavi Hernandez <xhernandez@redhat.com>
> Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
> (cherry picked from commit 8030f9c0f092170ceb50cedf59b9c330022825b7)
Change-Id: Ia02a109571f0d8cc9902c32db3e9b9282ee5c1db
Fixes: bz#1782826
Credits: Xavi Hernandez <xhernandez@redhat.com>
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I88f494f16153d27ab6e2f2faf4d557e075671b10
Fixes: bz#1781483
Signed-off-by: Anoop C S <anoopcs@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I9ff54f2ca6f86bb5b2f4740485a0159e1fd7785f
Fixes: bz#1781486
Signed-off-by: yinkui <13965432176@163.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Current slot allocation/deallocation code path is not
synchronized.There are scenario when due to race condition
in slot allocation/deallocation code path brick is crashed.
Solution: Synchronize slot allocation/deallocation code path to
avoid the issue
> Change-Id: I4fb659a75234218ffa0e5e0bf9308f669f75fc25
> Fixes: bz#1763036
> Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
> (cherry picked from commit faf5ac13c4ee00a05e9451bf8da3be2a9043bbf2)
Change-Id: I4fb659a75234218ffa0e5e0bf9308f669f75fc25
Fixes: bz#1778175
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The check we had for subnet mask validation wasn't checking in
proper sequence. Corrected the order of calling `inet_pton()` as
the fix.
Fixes: bz#1777769
Change-Id: I5d31468eb917aa94cbb85f573b37c60023e9daf3
Signed-off-by: Amar Tumballi <amar@kadalu.io>
(cherry picked from commit d60935d1011e387115e0445629976196f566b3b1)
|