When eager-lock lock acquisition fails because of, say, network failures, the
local is not removed from the owners_list. This leads to an accumulation of
waiting frames, and the application hangs, because the waiting frames assume
that another transaction is in the process of acquiring the lock since the
owners_list is not empty. This patch handles that case as well, and adds
asserts to make such problems easier to find in the future.
fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
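A minimal sketch of the missing cleanup, with illustrative structures rather
than the actual AFR code: on lock-acquisition failure the transaction must
leave the owners list, otherwise the waiting frames keep assuming an owner
exists.

    #include <assert.h>

    /* stand-in for the frame-local transaction state */
    struct txn {
        struct txn *prev, *next;   /* owners-list linkage (circular) */
        int on_owners_list;
    };

    static void
    owners_list_remove(struct txn *t)
    {
        assert(t->on_owners_list); /* the kind of assert added here */
        t->prev->next = t->next;
        t->next->prev = t->prev;
        t->prev = t->next = t;
        t->on_owners_list = 0;
    }

    /* called when the inodelk request fails, e.g. on a network error */
    static void
    eager_lock_acquire_failed(struct txn *t)
    {
        owners_list_remove(t);     /* the step that was missing */
        /* ...then wake the waiting frames so one of them retries */
    }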
|
Problem:
In an arbiter volume configuration, SHD will not send any writes onto the
arbiter brick even if there is a data-pending marker for the arbiter brick.
If we have an arbiter setup on the geo-rep master and there are data-pending
markers for the files on the arbiter brick, SHD will not mark any data
changelog during healing. While syncing the data from master to slave, if the
arbiter brick is considered ACTIVE, there is a chance that the slave will
miss out on some data. If the arbiter brick is newly added or replaced, there
is a chance of the slave missing all the data during sync.
Fix:
If there is a data-pending marker for the arbiter brick, send a truncate on
the arbiter brick during heal, so that it records the truncate as the data
transaction in the changelog.
Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687687
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
If a fop to create an entry fails on one of the data bricks, we mark the
pending changelog on the entry on the brick for which it was successful.
This is done as part of the post-op phase to make sure that the entry gets
healed even if it gets renamed to some other path whose parent was not
marked as bad.
As this happens as part of post-op, we should consult the thin-arbiter to
check whether the brick that was successful is in fact the good brick.
This will avoid split-brain and other issues.
>Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1672314
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
Fixes: bz#1673268
Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
Problem: When the quorum-count option is updated, the change is not
reflected in the nfs-server.vol file. This is because in
get_checksum_for_file(), when the last part of the file read is smaller than
the buffer size, the read buffer still holds stale data from the previous
read along with the correct data.
Solution: Pass the number of bytes actually read, instead of the fixed
buffer size, when calculating the checksum.
Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589
fixes: bz#1672248
Signed-off-by: Varsha Rao <varao@redhat.com>
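The bug class is easy to reproduce in miniature. A self-contained sketch
(illustrative code, not the actual glusterd checksum routine): checksumming
sizeof(buf) bytes on the final short read picks up leftovers from the
previous read, while checksumming only the bytes returned is correct.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t
    weak_sum(const char *buf, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += (unsigned char)buf[i];
        return sum;
    }

    static uint32_t
    checksum_file(FILE *fp)
    {
        char buf[4096];
        uint32_t sum = 0;
        size_t n;

        /* the fix: sum n bytes, not sizeof(buf) bytes */
        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
            sum += weak_sum(buf, n);
        return sum;
    }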
|
PROBLEM:
Many of the earlier changes in the management of shards in the lru and fsync
lists assumed that if a given shard exists in the fsync list, it must be
part of the lru list as well. This was found to be untrue.
Consider this: a file is FALLOCATE'd to a size which makes the number of
participant shards greater than the lru list size. In this case, some of the
resolved shards that are to participate in this fop will be evicted from the
lru list to give way to the rest of the shards. Once the FALLOCATE
completes, these shards are added to the fsync list, but without a ref.
After the fop completes, these shard inodes are unref'd and destroyed while
their inode ctxs are still part of the fsync list. Now when an FSYNC is
called on the base file and the fsync list is traversed, the client crashes
due to illegal memory access.
FIX:
Hold a ref on the shard inode when adding it to the fsync list as well,
and unref under the following conditions:
1. when the shard is evicted from the lru list
2. when the base file is fsync'd
3. when the shards are deleted.
Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2
fixes: bz#1669382
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
(cherry picked from commit 72922c1fd69191b220f79905a23395c3a87f86ce)
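The ref discipline behind the fix, sketched with illustrative types rather
than the real shard translator structures: the fsync list takes its own ref,
so list membership can never outlive the object.

    #include <stdlib.h>

    struct shard_inode {
        int refcount;
        int on_fsync_list;
    };

    static struct shard_inode *
    inode_ref(struct shard_inode *in)
    {
        in->refcount++;
        return in;
    }

    static void
    inode_unref(struct shard_inode *in)
    {
        if (--in->refcount == 0)
            free(in);   /* safe: no list can still point here */
    }

    static void
    fsync_list_add(struct shard_inode *in)
    {
        if (!in->on_fsync_list) {
            in->on_fsync_list = 1;
            inode_ref(in);  /* the fix: the list holds its own ref */
        }
    }

    /* called on lru eviction, on fsync of the base file, on delete */
    static void
    fsync_list_del(struct shard_inode *in)
    {
        if (in->on_fsync_list) {
            in->on_fsync_list = 0;
            inode_unref(in);
        }
    }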
|
...when ctime is zero. ia_type and ia_gfid always need to be non-zero for
things to work correctly.
Problem:
Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt buffer
in the cbks of modification fops before unwinding if the ctime in the buffer
was zero. This was causing the fops to fail, noticeable when AFR's
'consistent-metadata' option was enabled. (AFR zeroes out the ctime when the
option is set; see commit 4c4624c9bad2edf27128cb122c64f15d7d63bbc8.)
Fixes:
- Do not zero out the ia_type and ia_gfid of the iatt buffer under any
circumstance.
- Also fixed _rda_inode_ctx_update_iatts() to always update these values
from the incoming buf when ctime is zero. Otherwise we end up with a zero
ia_type and ia_gfid the first time the function is called *and* the incoming
buf has ctime set to zero.
fixes: bz#1665145
Reported-by: Michael Hanselmann <public@hansmi.ch>
Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 09db11b0c020bc79d493c6d7e7ea4f3beb000c68)
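A sketch of the stated invariant, using a pared-down stand-in for the real
struct iatt: invalidation wipes every field except ia_type and ia_gfid.

    #include <stdint.h>
    #include <string.h>

    struct mini_iatt {          /* illustrative subset of struct iatt */
        uint8_t  ia_gfid[16];
        int      ia_type;
        uint64_t ia_size;
        uint64_t ia_ctime;
    };

    static void
    iatt_invalidate(struct mini_iatt *buf)
    {
        uint8_t gfid[16];
        int type = buf->ia_type;

        memcpy(gfid, buf->ia_gfid, sizeof(gfid));
        memset(buf, 0, sizeof(*buf));             /* invalidate... */
        memcpy(buf->ia_gfid, gfid, sizeof(gfid)); /* ...but keep gfid */
        buf->ia_type = type;                      /* ...and type */
    }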
|
rm -rf <dir> fails on dirs which contain linkto files that point to
themselves, because dht incorrectly treated them as cached files after
looking them up. The fix now treats them as invalid linkto files and
deletes them.
Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
fixes: bz#1671611
Signed-off-by: N Balachandran <nbalacha@redhat.com>
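The shape of the check, with illustrative names (the real dht code compares
xlator objects rather than strings): a linkto file whose target is the very
subvolume it sits on cannot point at a cached file.

    #include <stdbool.h>
    #include <string.h>

    static bool
    linkto_is_stale(const char *linkto_target, const char *this_subvol)
    {
        /* points back at itself => there is no cached file elsewhere,
         * so treat it as invalid and let rm -rf unlink it */
        return strcmp(linkto_target, this_subvol) == 0;
    }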
|
The inode LRU mechanism is moot in the fuse xlator (i.e. there is no limit
for the LRU list), as fuse inodes are referenced from kernel context, and
thus they can only be dropped on request of the kernel. This might result in
a high number of passive inodes which are useless for the glusterfs client,
causing a significant memory overhead.
This change tries to remedy this by extending the LRU semantics and allowing
a finite limit to be set on the fuse inode LRU.
A brief history of the problem:
When gluster's inode table was designed, fuse didn't have any 'invalidate'
method, which means a userspace application could never ask the kernel to
send a 'forget()' fop; instead it had to wait for the kernel to send it
based on the kernel's own parameters. The inode table remembers the number
of times the kernel has cached the inode based on the 'nlookup' parameter,
and the 'nlookup' field is not used by any other entry points (like
server-protocol, gfapi etc).
Hence the inode table of the fuse module always had to have an lru-limit of
'0', which means no limit. GlusterFS always had to keep all inodes in
memory, as the kernel would have had a reference to them. Again, the reason
for this is that the kernel's glusterfs inode reference was a pointer to the
'inode_t' structure in glusterfs. As it is a pointer, we could never free it
(to prevent segfaults or memory corruption).
Solution:
In the inode table, handle the prune case of inodes with 'nlookup' set
differently, and call an 'invalidator' method, which in this case is
fuse_invalidate(); it sends the request to the kernel to elicit the forget
request.
When the kernel sends the forget, it means it has dropped all references to
the inode, and it sends the 'nlookup' parameter along. We just need to make
sure to reduce the 'nlookup' value we hold when we get the forget. That
automatically causes the relevant prune to happen.
Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B
fixes: bz#1623107
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi <amarts@redhat.com>
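The extended prune logic in miniature, with illustrative types and an
assumed invalidator callback standing in for fuse_invalidate(): inodes the
kernel still counts are invalidated rather than freed, and the real free
happens later, when forget() brings nlookup down to zero.

    #include <stddef.h>
    #include <stdlib.h>

    struct xinode {
        unsigned long nlookup;      /* kernel's reference count */
        struct xinode *lru_next;
    };

    struct itable {
        struct xinode *lru_head;
        size_t lru_size, lru_limit; /* lru_limit is now finite */
        void (*invalidator)(struct xinode *); /* fuse_invalidate() */
    };

    static void
    table_prune(struct itable *t)
    {
        while (t->lru_size > t->lru_limit && t->lru_head) {
            struct xinode *in = t->lru_head;
            t->lru_head = in->lru_next;
            t->lru_size--;
            if (in->nlookup)
                t->invalidator(in); /* kernel answers with forget() */
            else
                free(in);           /* no kernel ref: free as before */
        }
    }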
|
Problem:
Creation of files/directories with non-ascii names fails to sync to the
slave. It crashes with the below traceback on the slave.
...
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 118, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 709, in entry_ops
[ESTALE, EINVAL, EBUSY])
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
return call(*arg)
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 83, in lsetxattr
cls.raise_oserr()
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 38, in raise_oserr
raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
Cause:
The length-calculation arguments passed to blob creation were computed
before encoding, and hence it was failing in the gfid-access layer.
Fix:
Calculating the length properly fixes this issue, but it would cause issues
elsewhere on 'python2' and not on 'python3'. Encoding and decoding each
required string to make geo-rep compatible with both 'python2' and 'python3'
is a nightmare and is not foolproof. Hence the 'python2' code is kept as is,
without encode/decode, and encode/decode is applied only on 'python3'.
Added non-ascii filename tests to the regression suite.
Backport of:
> Patch: https://review.gluster.org/21668
> BUG: 1650893
> Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
fixes: bz#1648642
Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
If a single brick is added to the volume and the newly added brick is the
first to respond to a dht_revalidate call, its stbuf will not be merged into
local->stbuf, as the brick does not yet have a layout. The
is_permission_different check therefore fails to detect that an attr heal is
required, as it only considers the stbuf values from existing bricks.
To fix this, merge all stbuf values into local->stbuf and use local->prebuf
to store the correct directory attributes.
Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc
fixes: bz#1660736
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
The default value of shard-block-size was changed from 4MB to 64MB some time
back. The script "fallocate"s a 6MB file and expects it to have 1 shard
under .shard. This worked when the shard-block-size was 4MB. With the
default now at 64MB, file "file1" won't have any shards under .shard, and
the stat on the first shard's path fails with ENOENT.
Changed the script to explicitly set shard-block-size to 4MB.
Change-Id: I7f1785922287d16d74c95fa57cbbe12e6e66e4f7
fixes: bz#1660932
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
(cherry picked from commit e69fc87593334b24432978dbf592fa73fe5fc38b)
|
2-domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on the TA takes the AFR_TA_DOM_NOTIFY range lock and the
AFR_TA_DOM_MODIFY full lock, does the xattrop on the TA, releases the
AFR_TA_DOM_MODIFY lock, and stores in memory which brick is bad.
- All further write-txn failures are handled based on this in-memory value,
without querying the TA.
- When shd heals the files, it does so by requesting a full lock on the
AFR_TA_DOM_NOTIFY domain. The client uses this as a cue (via upcall),
releases the AFR_TA_DOM_NOTIFY range lock, and invalidates its in-memory
notion of which brick is bad. The next write-txn failure is wound on the TA
to update the in-memory state again.
- Any write txns that are incomplete when the AFR_TA_DOM_NOTIFY upcall
release request arrives are completed before the lock is released.
- Any write txns received after the release request are kept in a ta_waitq.
- After the release is complete, the ta_waitq elements are spliced to a
separate queue, which is then processed one by one.
- For fops that come in parallel while the in-memory bad brick is still
unknown, only one is wound to the TA on the wire. The others are kept in a
ta_onwireq, which is processed after we get the response from the TA.
Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
Problem:
If the parent dir is in split-brain or has dirty xattrs set, and the file
has its gfid missing on one of the bricks, then name heal won't assign the
gfid.
Fix:
Use the brick we select the gfid from as the 'source'.
Note: The problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.
fixes: bz#1655545
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 4d58730c0cd6ab5db39aec8a15276f7bd3371b04)
|
Upcall notifications are received from the server via epoll, and the same
thread is used to forward these notifications to the application. This may
lead to deadlock and hangs in the following scenario.
Consider that, as part of handling these callbacks, the application has to
do some operations which involve sending I/Os into the gfapi stack, which in
turn have to wait for epoll threads to receive the response. This may lead
to deadlock if all the epoll threads are waiting to complete these callback
notifications.
To address it, instead of using the epoll thread itself, make use of a
synctask to send those notifications to the application.
Change-Id: If614e0d09246e4279b9d1f40d883a32a39c8fd90
updates: bz#1651323
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit ad35193718a99494ab1b852ca4cbdf054f73de88)
|
When the glusterfs server recalls a lease, it expects the client to flush
data and unlock the lease. If the client does not, the server sets a timer
(starting from the time it sends the RECALL request) and revokes the lease
after the timeout.
Here we could have a race where the client did send the UNLK lease request,
but because of network delay it reached the server after the lease was
revoked. To handle such situations, treat such requests as a no-op and
return success.
Change-Id: I166402d10273f4f115ff04030ecbc14676a01663
updates: bz#1651323
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit c2e758b54d8a3f778e3e63db0000bb8b63de9b25)
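The handling reduces to a small rule, sketched here with illustrative types
(the stub lookup stands in for the server's lease table): an UNLK for a
lease the server has already revoked is answered with success rather than an
error.

    #include <stddef.h>

    struct lease { int active; };

    /* stand-in lookup: returns NULL once the server has revoked and
     * dropped the lease after the recall timeout */
    static struct lease *
    find_lease(const char *lease_id)
    {
        (void)lease_id;
        return NULL;
    }

    static int
    handle_unlk(const char *lease_id)
    {
        struct lease *l = find_lease(lease_id);
        if (l == NULL)
            return 0;   /* already revoked: treat unlock as a no-op */
        l->active = 0;  /* normal release path */
        return 0;
    }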
|
With this change, when SHD starts the index crawl it requests all the
clients to release the AFR_TA_DOM_NOTIFY lock, so that clients know their
in-memory state is no longer valid and any new operations need to query the
thin-arbiter if required.
When SHD completes healing all the files without any failure, it again takes
the AFR_TA_DOM_NOTIFY lock and reads the xattrs on the TA to see whether any
new failures have happened in the meantime. If there are new failures marked
on the TA, SHD starts the crawl immediately to heal those failures as well.
If there are no new failures, SHD takes the AFR_TA_DOM_MODIFY lock and
unsets the xattrs on the TA, so that both data bricks are considered good
thereafter.
>Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
>fixes: bz#1579788
>Signed-off-by: karthik-us <ksubrahm@redhat.com>
(cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c)
Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1648205
|
With commit febf5ed4848, during the volume create op we set volinfo->caps
to 0 only if any of the bricks belong to the same node and brickinfo->vg[0]
is null. Previously, we used to set volinfo->caps to 0 when either a brick
doesn't belong to the same node or brickinfo->vg[0] is null.
With this patch, we set volinfo->caps to 0 when either a brick doesn't
belong to the same node or brickinfo->vg[0] is null (as we did before
commit febf5ed4848).
> BUG: bz#1635820
> Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49
> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
(cherry picked from commit aae1c402b74fd02ed2f6473b896f108d82aef8e3)
fixes: bz#1647968
Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
Patch https://review.gluster.org/#/c/glusterfs/+/19135/ optimised glusterd
test cases by clubbing similar test cases into a single test case. The
https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t
test case was deleted and added as a part of
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t.
In the original test case, we create a volume with two bricks, each on a
separate node (N1 & N2). From another node in the cluster (N3), we try to
detach a node which is hosting bricks. It fails.
In the new test, we created a volume with a single brick on N1, and from
another node in the cluster we tried to detach N1. We expect the peer detach
to fail, but it succeeded, as the node is hosting all the bricks of the
volume.
This change updates the new test case to cover the original test case
scenario.
Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to
understand why the new test case was not failing in centos-regression.
> BUG: bz#1642597
> Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
(cherry picked from commit 0ca6773eaf5aeb507ebc72d2c2f61902eeff414c)
fixes: bz#1643078
Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
Problem:
https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing
this .t spuriously. On checking one of the failure logs, I see:
22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful:
22:05:44 Self-heal daemon is not running. Check self-heal daemon log file.
22:05:44 not ok 20 , LINENUM:38
In glusterd log:
[2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file
But the tests which precede this one check, via a statedump, whether the shd
is connected to the bricks; those checks succeeded and healing had even
started. From glustershd.log:
[2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1 sinks=2
So the only reason I can see for launching heal via the CLI failing is a
race where shd has been spawned but glusterd has not yet updated its
in-memory state that shd is up, hence failing the CLI.
Fix:
Check for shd up status before launching heal via CLI
Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4
fixes: bz#1641872
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
|
Backport of:
> Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499
> Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
> (cherry picked from commit e627977)
> BUG: 1605056
In __shard_update_shards_inode_list(), the shard translator previously did
not hold a ref on the base inode whenever a shard was added to the lru
list. But if the base shard is forgotten and destroyed, either by fuse due
to memory pressure or due to the file being deleted at some point by a
different client while this client still holds stale shards in its lru
list, the client would crash at the time of locking lru_base_inode->lock,
owing to illegal memory access.
So now the base shard is ref'd into the inode ctx of every shard that is
added to the lru list, until it gets lru'd out.
The patch also handles the case where none of the shards associated with a
file that is about to be deleted are part of the LRU list, and where an
unlink at the beginning of the operation destroys the base inode (because
there are no refkeepers), so all of the shards that are about to be deleted
will be resolved without the existence of a base shard in memory. This, if
not handled properly, could lead to a crash.
Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499
updates: bz#1641440
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
This patch fixes the below issues in the gfapi lease code-path:
* 'glfs_setfsleaseid' should allow NULL input, to be able to reset the
leaseid.
* Applications should be allowed to (un)register for upcall notifications
of type GLFS_EVENT_LEASE_RECALL.
* APIs added to read the contents of the GLFS_EVENT_LEASE_RECALL argument,
which is of type "struct glfs_upcall_lease".
This is backport of the below mainline patch -
https://review.gluster.org/#/c/glusterfs/+/21391
Change-Id: I3320ddf235cc82fad561e13b9457ebd64db6c76b
updates: #350
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
|
Translators like readdir-ahead selectively retain the entry information of
the iatt (gfid and type) when the rest of the iatt is invalidated (a write
invalidates ia_size, (m)(c)times etc.). Fuse-bridge uses this information
and sends only the entry information in the readdirplus response. However,
no such option exists in gfapi. This patch modifies gfapi to populate the
stat by forcing an extra lookup.
Thanks to Shyamsundar Ranganathan <srangana@redhat.com> and Prashanth Pai
<ppai@redhat.com> for the tests.
Change-Id: Ieb5f8fc76359c327627b7d8420aaf20810e53000
Fixes: bz#1630804
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 6257276d9de3f15643f159b2ec627a67c84fc23d)
|
Backport of https://review.gluster.org/#/c/glusterfs/+/21135/
Problem:
When a directory has dirty xattrs due to failed post-ops, or when
replace/reset-brick is performed, AFR does a conservative merge as expected,
but heal-info reports it as split-brain because there are no clear sources.
Fix:
Modify the pending flag to carry information about both pending heals and
split-brains. For directories, if the split-brain flag is not set, just show
them as needing heal and not as being in split-brain.
Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
fixes: bz#1638163
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
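A sketch of the flag split (illustrative bit layout, not AFR's actual
encoding): one value now distinguishes "needs heal" from "in split-brain",
so a conservatively-merged directory is reported as healing rather than
split-brain.

    #include <stdio.h>

    enum {
        PFLAG_PENDING_HEAL = 1 << 0,
        PFLAG_SPLIT_BRAIN  = 1 << 1,
    };

    static const char *
    heal_info_status(unsigned flags)
    {
        if (flags & PFLAG_SPLIT_BRAIN)
            return "split-brain";
        if (flags & PFLAG_PENDING_HEAL)
            return "pending-heal";
        return "healthy";
    }

    int
    main(void)
    {
        /* dirty dir after a failed post-op: pending, not split-brain */
        printf("%s\n", heal_info_status(PFLAG_PENDING_HEAL));
        return 0;
    }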
|
Backport of https://review.gluster.org/#/c/glusterfs/+/21380/
Problem:
In an arbiter volume, if there is a pending data heal of a file only on the
arbiter brick, self-heal takes inodelks twice due to a code bug but unlocks
only once, leaving behind a stale lock on the brick. This causes the next
write to the file to hang.
Fix:
Fix the code bug to take the lock only once. This bug was introduced in
master with commit eb472d82a083883335bc494b87ea175ac43471ff.
Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA.
fixes: bz#1638159
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
Most applications are {c|m}time dependent, and very few are atime
dependent. So provide a noatime option to not update atime when the ctime
feature is enabled.
This option also has to be enabled with the ctime feature to avoid
unnecessary self-heal: since AFR/EC reads data from a single subvolume,
atime is only updated on one subvolume, triggering self-heal.
Backport of:
> Patch: https://review.gluster.org/21073
> BUG: 1593538
> Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 89636be4c73b12de2e11c75d8e59527bb243f147)
updates: bz#1633015
Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
The jenkins release-new job runs on a CentOS 7 box, which does not
have python3. As a result it runs (autogen.sh and) configure before
producing the dist tar file, converting all the python3 shebangs to
python2 shebangs in the dist tar file.
Then when that tar file is "carried" to, e.g., the Fedora koji build system
to build packages, the shebangs are incorrect, despite having originally
been correct in the git repo.
Change-Id: I5154baba3f6d29d3c4823bafc2b57abecbf90e5b
updates: #411
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
1. The '--ignore-missing-args' option for rsync is not being used even
though the rsync version is greater than 3.1.0. Fixed the same.
2. The '--existing' option for rsync is also not being used. Fixed the
same.
3. geo-rep config fails to set rsync-options, as the value contains '--'.
Interestingly, python argparse treats a value with '--' (e.g.,
--ignore-missing-args) as an option, but when passed as
--value=--ignore-missing-args, it succeeds. Fixed the same.
Backport of:
> Patch: https://review.gluster.org/21191
> Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
> BUG: 1629561
(cherry picked from commit b977b44dd0adfcd7a3b432844260de4b8d1c4adf)
Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
Signed-off-by: Kotresh HR <khiremat@redhat.com>
fixes: bz#1630673
|
experimental xlators removed from 5.0
Change-Id: I47219d8b95efc3d5875ec9224d1e79f8371e9f76
Updates: bz#1628620
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
Reverted the following:
- 248152767b0599986bbb6bb35fc27197f6be6964
- 09943beb499617212f2985ca8ea9ecd1ed1b470e
- d01f7244e9d9f7e3ef84e0ba7b48ef1b1b09d809
The reverts are redone by hand, due to clang format changes
that made using git to revert the changes more tedious.
Change-Id: I96489638a2b641fb2206a110298543225783f7be
Updates: bz#1628620
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
This reverts commit 384562b294e9a7847403961e878a4daa0fff33eb.
The revert is done manually owing to the clang format changes,
and the 4.1 patch that reverts this fix is used as the source
for the revert.
The 4.1 patch has the commit ID: 98376e0c0a
Change-Id: Ib2cbce9940f6a2a2755eb47bf332832147835e4d
Updates: bz#1628620
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
|
* One #!/usr/bin/env python and three #!/usr/bin/python were overlooked
in all the other python fixups. Ugh.
* Two new python files missed the memo about #!/usr/bin/python3.
* One #!/usr/bin/env bash.
Various distribution packaging policies have strong wording about
the use of #!/usr/bin/env ...
Note: this patch does not change the use of #!/usr/bin/env bash in
the two files extras/{clang-checker.sh,check_goto.pl} as these are
not included in any packages. (Although I'm not actually sure why
anyone would ever use '/usr/bin/env {sh,bash}' as I'm not aware of
any version-specific differences like there are with, e.g., python.)
* One #!/usr/bin/bash.
On Fedora and CentOS > 6, /bin is a symlink to /usr/bin, so it
makes little difference. But Debian & Ubuntu still have separate
/bin and /usr/bin; and sh and bash are in /bin, not /usr/bin.
(Historically, in BSD and SYSV Unix it was /bin/sh.)
Note: Fedora and CentOS package build runs a script that converts
all /bin/sh and /bin/bash to /usr/bin/sh and /usr/bin/bash.
Change-Id: I9171265829af78dd0cd7622c22b56d22179ff8a3
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
Adding checks to avoid glusterd's working directory being used as a brick
during volume creation.
fixes: bz#853601
Change-Id: I4b16a05f752e92216aa628f542a4fdbf59b3c669
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
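The core of such a check, sketched (illustrative code; the real validation
also needs to canonicalize both paths, e.g. with realpath(), before
comparing):

    #include <stdbool.h>
    #include <string.h>

    /* both paths are assumed to be absolute and canonicalized */
    static bool
    brick_inside_workdir(const char *workdir, const char *brick)
    {
        size_t wlen = strlen(workdir);

        if (strncmp(brick, workdir, wlen) != 0)
            return false;
        /* exact match, or workdir is a proper path prefix */
        return brick[wlen] == '\0' || brick[wlen] == '/';
    }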
|
The earlier implementation required the file to already exist
when trying to get the hashed subvol. The reworked implementation
allows a user to get the hashed subvol for any filename, whether
it exists or not.
Usage: getfattr -n "dht.file.hashed-subvol.<filename>" <parent dir>
E.g., to get the hashed subvol for file-1 inside dir-1:
getfattr -n "dht.file.hashed-subvol.file-1" /mnt/gluster/dir1
credit: rgowdapp@redhat.com
Change-Id: Iae20bd5f56d387ef48c1c0a4ffa9f692866bf739
fixes: bz#1624244
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
It wouldn't make sense to allow the iostats file to be written to *any*
directory. While the formatting makes sure we append an io-stats name to
the file, so the chance of overwriting an existing file is slim, it still
makes sense to restrict dumping to one directory.
Below are sample commands, and the files created for the corresponding
values:
$ setfattr -n trusted.io-stats-dump -v file-for-dump $M0
In this case, the file would be in /var/run/gluster/file-for-dump
$ setfattr -n trusted.io-stats-dump -v /dir1/dir2/file-for-dump $M0
In this case, then the dump file is in /var/run/gluster/dir1-dir2-file-for-dump
Note that the value passed for this virtual xattr is treated as a filename,
and even if the value has '/' in it, each '/' is changed to '-' for sanity.
Fixes: bz#1625106
Change-Id: Id9ae6a40a190b8937c51662e6e1c2a0f6c86a0e0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
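A sketch of the flattening and confinement described above (illustrative
code, not the actual io-stats translator):

    #include <stdio.h>

    #define DUMP_DIR "/var/run/gluster"

    /* flatten the xattr value into a filename confined to DUMP_DIR */
    static void
    build_dump_path(const char *value, char *out, size_t outlen)
    {
        char name[256];
        size_t i;

        if (value[0] == '/')
            value++;   /* "/a/b" flattens the same way as "a/b" */
        for (i = 0; value[i] != '\0' && i < sizeof(name) - 1; i++)
            name[i] = (value[i] == '/') ? '-' : value[i];
        name[i] = '\0';
        snprintf(out, outlen, "%s/%s", DUMP_DIR, name);
    }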
|
If both data bricks are up, the read subvol will be based on read_subvols.
If only one data brick is up:
- First query the data brick that is up. If it blames the other brick,
allow the reads.
- If it doesn't, query the TA to obtain the source of truth.
TODO: See if in-memory state can be maintained for read txns (BZ 1624358).
updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
New CLI option for the `glusterfsd` binary to get the path of the libexec
directory. This helps glusterd2 detect the installed path of `gsyncd` and
other binaries.
Usage: `glusterfsd --print-libexecdir`
Updates: bz#1193929
Change-Id: I8c1a74afd9acec7ee7bd3deabed9d9f20fe3fb5f
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
Move to GF_MALLOC() instead of GF_CALLOC() when possible in:
xlators/cluster/stripe/src/stripe-helpers.c, xlators/cluster/dht/src/tier.c,
xlators/cluster/dht/src/dht-layout.c, xlators/cluster/dht/src/dht-helper.c,
xlators/cluster/dht/src/dht-common.c, xlators/cluster/afr/src/afr.c,
xlators/cluster/afr/src/afr-inode-read.c,
tests/bugs/replicate/bug-1250170-fsync.c,
tests/basic/gfapi/gfapi-async-calls-test.c,
tests/basic/ec/ec-fast-fgetxattr.c, rpc/xdr/src/glusterfs3.h,
rpc/rpc-transport/socket/src/socket.c, rpc/rpc-lib/src/rpc-clnt.c,
extras/geo-rep/gsync-sync-gfid.c, cli/src/cli-xml-output.c,
cli/src/cli-rpc-ops.c, cli/src/cli-cmd-volume.c, cli/src/cli-cmd-system.c,
cli/src/cli-cmd-snapshot.c, cli/src/cli-cmd-peer.c,
cli/src/cli-cmd-global.c
It doesn't make sense to calloc (allocate and clear) memory when the code
right away fills that memory with data. The redundant zeroing may be
optimized away by the compiler; at best this is a microscopic performance
improvement.
In some cases, the allocation size was also changed to be sizeof of a
struct or type instead of a pointer - easier to read.
In some cases, redundant strlen() calls were removed by saving the result
into a variable.
1. Only done for the straightforward cases. There's room for improvement.
2. Please review carefully, especially the string allocations, with respect
to the terminating NUL.
Only compile-tested!
updates: bz#1193929
Original-Author: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2
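The pattern behind the change, illustrated with plain libc calls
(GF_MALLOC/GF_CALLOC are gluster's accounting wrappers around them):
zero-filling is wasted work when every byte is overwritten immediately, and
a saved strlen() result avoids recomputation.

    #include <stdlib.h>
    #include <string.h>

    static char *
    dup_string(const char *src)
    {
        size_t len = strlen(src);       /* computed once, reused */
        char *copy = malloc(len + 1);   /* was: calloc(1, len + 1) */

        if (copy == NULL)
            return NULL;
        memcpy(copy, src, len + 1);     /* overwrites every byte,
                                           including the NUL */
        return copy;
    }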
|
Problem:
When name self-heal is triggered on the mount, it blocks lookup until the
name self-heal completes. That can lead to hangs when a lot of clients are
accessing a directory which needs name heal and all of them trigger heals,
waiting for other clients to complete the heal.
Fix:
When a name heal is needed, but a quorum number of names have the file and
pending xattrs exist on the parent, it is better to delegate the heal to
SHD, which will complete it as part of the entry heal of the parent
directory. We could do the same when a quorum number of names are not
present, but we don't have any known use-case where that is a frequent
occurrence, so not changing that part at the moment. When there is a gfid
mismatch or a missing gfid, it is important to complete the heal, so that
the next rename doesn't assume everything is fine and perform a rename etc.
fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
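The decision above, condensed into a sketch with illustrative fields (not
the real AFR state):

    #include <stdbool.h>

    struct name_state {
        int  bricks_with_name;
        int  quorum;
        bool parent_has_pending_xattrs;
        bool gfid_mismatch_or_missing;
    };

    /* true => heal from lookup (blocking); false => leave it to SHD */
    static bool
    must_heal_inline(const struct name_state *s)
    {
        if (s->gfid_mismatch_or_missing)
            return true;   /* next rename must not see a bad entry */
        if (s->bricks_with_name >= s->quorum &&
            s->parent_has_pending_xattrs)
            return false;  /* SHD's entry-heal of the parent fixes it */
        return true;
    }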
|
see https://review.gluster.org/#/c/19788/,
https://review.gluster.org/#/c/19871/,
https://review.gluster.org/#/c/19952/,
https://review.gluster.org/#/c/20104/,
https://review.gluster.org/#/c/20162/,
https://review.gluster.org/#/c/20185/,
https://review.gluster.org/#/c/20207/,
https://review.gluster.org/#/c/20227/,
https://review.gluster.org/#/c/20307/,
https://review.gluster.org/#/c/20320/,
https://review.gluster.org/#/c/20332/,
https://review.gluster.org/#/c/20364/,
https://review.gluster.org/#/c/20441/, and
https://review.gluster.org/#/c/20484
shebangs changed from /usr/bin/python2 to /usr/bin/python3.
(Reminder, various distribution packaging guidelines require use
of explicit python version and don't allow '#!/usr/bin/env python',
regardless of how handy that idiom may be.)
glusterfs.spec(.in) package python{2,3}-gluster and python2 or
python3 dependencies as appropriate.
configure(.ac):
+ test for and use python2 or python3 as appropriate. If build
machine has python2 and python3, use python3. Override by
setting PYTHON=/usr/bin/python2 when running configure.
+ PYTHONDEV_CPPFLAGS from python[23]-config --includes is a
better match to the original python sysconfig.get_python_inc().
All the other extraneous flags break the build.
+ Only change the shebangs once. Changing them over and over
again, e.g., during a `make glusterrpms` in extras/LinuxRPM
just sends make (is it really make that's looping?) into an
infinite loop. If you figure out why, let me know.
+ Oldest python2 is python2.6 on CentOS 6 and Debian 8 (Jessie).
Everything else has 2.7 or 3.x
+ logic from https://review.gluster.org/c/glusterfs/+/21050, which
needs to be removed/merged after that patch is merged.
Builds on CentOS 6, CentOS 7, Fedora 28, Fedora rawhide, and the
mysterious RHEL > 7.
Change-Id: Idae21d3b6f58b32372e1daa0d234e491e563198f
updates: #411
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
The linkto file creation for the dst was done in parallel with the unlink
of the old src linkto. If these operations reached the brick out of order,
we could end up with a dst linkto file without a .glusterfs handle.
Fixed by performing the unlink only after the linkto file creation has
completed.
Change-Id: I4246f7655f5bc180f5ded7fd34d263b7828a8110
fixes: bz#1621981
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
Problem:
If the requested start time and end time don't fall into the first HTIME
file, the history API fails, even though continuous changelogs are
available for the requested range in other HTIME files. This is induced by
a changelog disable and enable, which creates a fresh HTIME index file.
Cause and Analysis:
Each HTIME index file represents the availability of continuous changelogs.
If changelog is disabled and enabled, a new HTIME index file is created;
the boundary represents the non-availability of continuous changelogs
across it. So as long as the requested start and end fall into a single
HTIME index file, and not across files, the history API should succeed.
But the history API checks for the changelogs only in the first HTIME index
file, and errors out if they are not available there.
Fix:
Check all HTIME index files for the availability of continuous changelogs
for the requested range.
fixes: bz#1622549
Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559
Signed-off-by: Kotresh HR <khiremat@redhat.com>
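The fix in miniature, with illustrative types (each index is reduced to the
timestamp span it covers): scan every HTIME index instead of only the
first.

    #include <stdbool.h>
    #include <stddef.h>

    struct htime_index {
        unsigned long min_ts, max_ts;  /* continuous-changelog span */
    };

    static bool
    history_available(const struct htime_index *idx, size_t n,
                      unsigned long start, unsigned long end)
    {
        for (size_t i = 0; i < n; i++)           /* was: i < 1 */
            if (start >= idx[i].min_ts && end <= idx[i].max_ts)
                return true;
        return false;   /* the span crosses a disable/enable gap */
    }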
|
Problem:
When metadata self-heal is triggered on the mount, it blocks lookup until
the metadata self-heal completes. That can lead to hangs when a lot of
clients are accessing a directory which needs metadata heal and all of them
trigger heals, waiting for other clients to complete the heal.
Fix:
Only when the heal is needed but the pending xattrs are not set, trigger
the metadata heal that could block lookup. This is the only case where,
without the heal, different clients may be served different metadata, which
should be avoided.
Updates bz#1622821
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
Change-Id: Iaeea470d040587027f37e0760ae27c4fc205a189
fixes: bz#1613098
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
Problem: If the ctr xlator is not required, it consumes resources
unnecessarily.
Solution: Call the ctr xlator init only when the feature is enabled.
Fixes: bz#1524323
Change-Id: I378113a390a286be20c4ade1b1bac170a8ef1b14
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
When tests time out, the timeout command sends the TERM signal to the
command being executed. In the case of run-tests.sh, it invokes prove,
which further invokes perl, and finally the test is run using bash. The
TERM signal does not seem to be reaching the bash that is actually
executing the tests, and hence when any test is terminated due to a
timeout, the cleanup routine in include.rc does not get a chance to run and
preserve the tarball.
Further, cleanup invokes tarball generation, but is invoked at the
beginning and end of every test, and at times in between as well. This
caused far too many tarballs if we decided to preserve every tarball
generated by cleanup.
This patch hence moves the tarball generation to run-tests.sh instead, and
further stores the tarballs named <test>-iteration-<n>.tar and prints the
tarball name generated and stored per iteration. This should help relate
failed runs to the tarball iteration # and to look at the relevant logs.
The patch also provides a -p option to run-tests.sh for unit-testing
purposes: running a test in a loop without the option will generate as many
tarballs, while using the option reduces this to preserving the last
tarball, saving space on smaller unit-test setups.
Fixes: bz#1614062
Change-Id: I0aee76c89df0691cf4d0c1fcd4c04dffe0d7c896
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
PROBLEM:
========
The USS design depends on the snapview-server translator communicating with
each individual snapshot via gfapi. So the snapview-server xlator maintains
the glfs instance (and thus the snapshot) to which an inode belongs by
storing it inside the inode context.
Suppose a file from a snapshot is opened by an application, and the fd is
still valid from the application's point of view (i.e. the application has
not yet closed the fd). Now, if the snapshot to which the opened file
belongs is deleted, then the glfs_t instance corresponding to the snapshot
is destroyed by snapview-server as part of the snap deletion. But if the
application then does I/O on the fd it has kept open, snapview-server tries
to send that request to the corresponding snap via the glfs instance stored
in the inode context for the file on which the application is sending the
fop. This results in the freed-up glfs_t pointer being accessed, and causes
a segfault.
FIX:
===
For fd-based operations, check whether the glfs instance that the inode
holds in its context is still valid or not.
For non-fd-based operations, lookup should usually guarantee that. But if
the file was already looked up, and the client accessing the snap data
(either NFS, or native glusterfs fuse) does not bother to send a lookup and
directly sends a path-based fop, then that path-based fop should ensure
that the fs instance is valid.
Change-Id: I881be15ec46ecb51aa844d7fd41d5630f0d644fb
updates: bz#1602070
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
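The fd-path check, sketched with illustrative types (glfs_like stands in
for glfs_t; the real code validates against snapview-server's list of live
snapshot instances):

    #include <stdbool.h>
    #include <stddef.h>

    struct glfs_like;               /* opaque stand-in for glfs_t */

    struct live_set {
        struct glfs_like *fs[16];   /* currently valid snapshots */
        size_t n;
    };

    static bool
    fs_is_valid(const struct live_set *live, const struct glfs_like *fs)
    {
        for (size_t i = 0; i < live->n; i++)
            if (live->fs[i] == fs)
                return true;
        return false;   /* snapshot deleted: fail the fop, don't
                           dereference the stale glfs_t pointer */
    }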
|
The value of trusted.pgfid.xx was always set to 1
in posix_mknod. This is incorrect if posix_mknod
calls posix_create_link_if_gfid_exists.
Change-Id: Ibe87ca6f155846b9a7c7abbfb1eb8b6a99a5eb68
fixes: bz#1619720
Signed-off-by: N Balachandran <nbalacha@redhat.com>