summaryrefslogtreecommitdiffstats
path: root/tests
Commit message (Collapse)AuthorAgeFilesLines
* afr: prevent spurious entry heals leading to gfid split-brainRavishankar N2020-04-062-14/+62
| | | | | | | | | | | | | | | | | | | | | Problem: In a hyperconverged setup with granular-entry-heal enabled, if a file is recreated while one of the bricks is down, and an index heal is triggered (with the brick still down), entry-self heal was doing a spurious heal with just the 2 good bricks. It was doing a post-op leading to removal of the filename from .glusterfs/indices/entry-changes as well as erroneous setting of afr xattrs on the parent. When the brick came up, the xattrs were cleared, resulting in the renamed file not getting healed and leading to gfid split-brain and EIO on the mount. Fix: Proceed with entry heal only when shd can connect to all bricks of the replica, just like in data and metadata heal. fixes: #1103 Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
* afr: mark pending xattrs as a part of metadata healRavishankar N2020-04-021-0/+59
| | | | | | | | | | | | | | | | | | | | ...if pending xattrs are zero for all children. Problem: If there are no pending xattrs and a metadata heal needs to be performed, it can be possible that we end up with xattrs inadvertendly deleted from all bricks, as explained in the BZ. Fix: After picking one among the sources as the good copy, mark pending xattrs on all sources to blame the sinks. Now even if this metadata heal fails midway, a subsequent heal will still choose one of the valid sources that it picked previously. Updates: #1067 Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
* Cluster/afr: Don't treat all bricks having metadata pending as split-brainkarthik-us2020-03-022-64/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the afr xattrs point of view i.e each brick is blamed by atleast one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks. Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. This patch also adds restriction of all the bricks to be up to perform metadata heal to avoid any metadata loss. Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up. Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09 fixes: bz#1806931 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* protocol/client: Do not fallback to anon-fd if fd is not openPranith Kumar K2020-03-021-0/+36
| | | | | | | | | | | | | | | | | | | | | | If an open comes on a file when a brick is down and after the brick comes up, a fop comes on the fd, client xlator would still wind the fop on anon-fd leading to wrong behavior of the fops in some cases. Example: If lk fop is issued on the fd just after the brick is up in the scenario above, lk fop will be sent on anon-fd instead of failing it on that client xlator. This lock will never be freed upon close of the fd as flush on anon-fd is invalid and is not wound below server xlator. As a fix, failing the fop unless the fd has FALLBACK_TO_ANON_FD flag. > Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc > fixes bz#1390914 > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc fixes bz#1808256 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* afr: wake up index healer threadsRavishankar N2020-02-271-0/+42
| | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/23288/ ...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1807431 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* tests: fix spurious failure of bug-1402841.t-mt-dir-scan-race.tRavishankar N2020-02-271-4/+5
| | | | | | | | | | | | | | | | | | | | | | | Problem: Since commit 600ba94183333c4af9b4a09616690994fd528478, shd starts healing as soon as it is toggled from disabled to enabled. This was causing the following line in the .t to fail on a 'fast' machine (always on my laptop and sometimes on the jenkins slaves). EXPECT_NOT "^0$" get_pending_heal_count $V0 because by the time shd was disabled, the heal was already completed. Fix: Increase the no. of files to be healed and make it a variable called FILE_COUNT, should we need to bump it up further because the machines become even faster. Also created pending metadata heals to increase the time taken to heal a file. fixes: bz#1807748 Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 724c657995a2e148243eeb78c68b620c6d7714a5)
* cluster/ec: fix EIO error for concurrent writes on sparse filesXavi Hernandez2020-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EC doesn't allow concurrent writes on overlapping areas, they are serialized. However non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the half do the read before the write. Some bricks will return 10 bytes of data while the otherw will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with EIO error. This error is propagated to the parent write, which is aborted and EIO is returned to the application. The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying ec-stripe.t file for Test #13 within it. In this patch, if a file size is less than the offset we are writing, we fill zeros in head and tail and do not consider it strip cache miss. That actually make sense as we know what data that part holds and there is no need of reading it from bricks. Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6 Fixes: bz#1805053 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/ec: Prevent double pre-op xattropsPranith Kumar K2020-02-241-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Race: Thread-1 Thread-2 1) Does ec_get_size_version() to perform pre-op fxattrop as part of write-1 2) Calls ec_set_dirty_flag() in ec_get_size_version() for write-2. This sets dirty[] to 1 3) Completes executing ec_prepare_update_cbk leading to ctx->dirty[] = '1' 4) Takes LOCK(inode->lock) to check if there are any flags and sets dirty-flag because lock->waiting_flag is 0 now. This leads to fxattrop to increment on-disk dirty[] to '2' At the end of the writes the file will be marked for heal even when it doesn't need heal. Fix: Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked as '1' in step 2) above Fixes: bz#1805050 Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* server: Mount fails after reboot 1/3 gluster nodesMohit Agrawal2020-02-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: At the time of coming up one server node(1x3) after reboot client is unmounted.The client is unmounted because a client is getting AUTH_FAILED event and client call fini for the graph.The client is getting AUTH_FAILED because brick is not attached with a graph at that moment Solution: To avoid the unmounting the client graph throw ENOENT error from server in case if brick is not attached with server at the time of authenticate clients. > Credits: Xavi Hernandez <xhernandez@redhat.com> > Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e > Fixes: bz#1793852 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > (cherry picked from commit f6421dff22a6ddaf14134f6894deae219948c89d) Fixes: bz#1804512 Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/ec: fix fd reopenXavi Hernandez2020-02-202-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop. There were two problems: 1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail. 2. If reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't. To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error to be propagated to the main fop. To implement this behaviour an argument used to indicate the minimum number of required answers has overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes. This change has also uncovered a problem in discard code, which didn't correctely process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop and it's used in the discard case. Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them. Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec Fixes: bz#1805047 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/afr: Heal entries when there is a source & no healed_sinkskarthik-us2019-10-111-0/+89
| | | | | | | | | | | | | | | | | | | | Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This will happen because while doing afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later in __afr_selfheal_entry_finalize_source() when it tries to mark all the non-sources as sinks it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1760710 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr/lookup: Pass xattr_req in while doing a selfheal in lookupMohammed Rafi KC2019-09-052-0/+53
| | | | | | | | | | | | | | | We were not passing xattr_req when doing a name self heal as well as a meta data heal. Because of this, some xdata was missing which causes i/o errors Backport of>https://review.gluster.org/#/c/glusterfs/+/23024/ >Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f >Fixes: bz#1728770 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Change-Id: I740ccdbfcf2667fe4a850c12ecdc4b9eeed08293 Fixes: bz#1749352 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfileAtin Mukherjee2019-07-181-1/+3
| | | | | | | | | | | | | ... with out which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" >Fixes: bz#1716812 >Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb >Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Fixes: bz#1721106 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/ec: honor contention notifications for partially acquired locksXavi Hernandez2019-06-281-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others not yet) we can receive notifications from acquired bricks, which should be honored, since we may not receive more notifications after that. Since EC was ignoring them, once the lock was acquired, it was not released until the eager-lock timeout, causing unnecessary delays on other clients. This fix takes into consideration the notifications received before having completed the full lock acquisition. After that, the lock will be releaed as soon as possible. Backport of: > BUG: bz#1708156 > Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: bz#1717282 Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* tests/utils: Fix py2/py3 util python scriptsKotresh HR2019-06-288-30/+261
| | | | | | | | | | | | | | | | | | | | | Following files are fixed. tests/bugs/distribute/overlap.py tests/utils/changelogparser.py tests/utils/create-files.py tests/utils/gfid-access.py tests/utils/libcxattr.py Have marked glupy as bad test. Backport of: > Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140 > BUG: 1193929 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140 Updates: bz#1629877 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/afr: Remove local from owners_list on failure of lock-acquisitionPranith Kumar K2019-05-081-0/+47
| | | | | | | | | | | | | When eager-lock lock acquisition fails because of say network failures, the local is not being removed from owners_list, this leads to accumulation of waiting frames and the application will hang because the waiting frames are under the assumption that another transaction is in the process of acquiring lock because owner-list is not empty. Handled this case as well in this patch. Added asserts to make it easier to find these problems in future. fixes bz#1699736 Change-Id: I3101393265e9827755725b1f2d94a93d8709e923 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Send truncate on arbiter brick from SHDkarthik-us2019-03-122-1/+39
| | | | | | | | | | | | | | | | | | | Problem: In an arbiter volume configuration SHD will not send any writes onto the arbiter brick even if there is data pending marker for the arbiter brick. If we have a arbiter setup on the geo-rep master and there are data pending markers for the files on arbiter brick, SHD will not mark any data changelog during healing. While syncing the data from master to slave, if the arbiter-brick is considered as ACTIVE, then there is a chance that slave will miss out some data. If the arbiter brick is being newly added or replaced there is a chance of slave missing all the data during sync. Fix: If there is data pending marker for the arbiter brick, send truncate on the arbiter brick during heal, so that it will record truncate as the data transaction in changelog. Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec fixes: bz#1687687 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelogAshish Pandey2019-02-181-6/+16
| | | | | | | | | | | | | | | | | | | If a fop to create an entry fails on one of the data brick, we mark the pending changelog on the entry on brick for which it was successful. This is done as part of post op phase to make sure that entry gets healed even if it gets renamed to some other path where its parent was not marked as bad. As it happens as part of post op, we should consider thin-arbiter to check if the brick, which was successful, is the good brick or not. This will avoide split brain and other issues. >Change-Id: I12686675be98f02f70a5186b3ed748c541514d53 >Signed-off-by: Ashish Pandey <aspandey@redhat.com> Change-Id: I12686675be98f02f70a5186b3ed748c541514d53 updates: bz#1672314 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* Bump up timeout for tests on AWSNigel Babu2019-02-073-3/+4
| | | | | | Fixes: bz#1673268 Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* libglusterfs/common-utils.c: Fix buffer size for checksum computationVarsha Rao2019-02-041-0/+35
| | | | | | | | | | | | | | Problem: When quorum count option is updated, the change is not reflected in the nfs-server.vol file. This is because in get_checksum_for_file(), when the last part of the file read has size less than buffer size, the read buffer stores old data value along with correct data value. Solution: Pass the bytes read instead of fixed buffer size, for calculating checksum. Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589 fixes: bz#1672248 Signed-off-by: Varsha Rao <varao@redhat.com>
* features/shard: Ref shard inode while adding to fsync listKrutika Dhananjay2019-02-041-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: Lot of the earlier changes in the management of shards in lru, fsync lists assumed that if a given shard exists in fsync list, it must be part of lru list as well. This was found to be not true. Consider this - a file is FALLOCATE'd to a size which would make the number of participant shards to be greater than the lru list size. In this case, some of the resolved shards that are to participate in this fop will be evicted from lru list to give way to the rest of the shards. And once FALLOCATE completes, these shards are added to fsync list but without a ref. After the fop completes, these shard inodes are unref'd and destroyed while their inode ctxs are still part of fsync list. Now when an FSYNC is called on the base file and the fsync-list traversed, the client crashes due to illegal memory access. FIX: Hold a ref on the shard inode when adding to fsync list as well. And unref under following conditions: 1. when the shard is evicted from lru list 2. when the base file is fsync'd 3. when the shards are deleted. Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2 fixes: bz#1669382 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 72922c1fd69191b220f79905a23395c3a87f86ce)
* readdir-ahead: do not zero-out iatt in fop cbkRavishankar N2019-02-041-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | ...when ctime is zero. ia_type and ia_gfid always need to be non-zero for things to work correctly. Problem: Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt buffer in the cbks of modification fops before unwinding if the ctime in the buffer was zero. This was causing the fops to fail: noticeable when AFR's 'consistent-metadata' option was enabled. (AFR zeros out the ctime when the option is set. See commit 4c4624c9bad2edf27128cb122c64f15d7d63bbc8). Fixes: -Do not zero out the ia_type and ia_gfid of the iatt buff under any circumstance. -Also, fixed _rda_inode_ctx_update_iatts() to always update these values from the incoming buf when ctime is zero. Otherwise we end up with zero ia_type and ia_gfid the first time the function is called *and* the incoming buf has ctime set to zero. fixes: bz#1665145 Reported-By:Michael Hanselmann <public@hansmi.ch> Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 09db11b0c020bc79d493c6d7e7ea4f3beb000c68)
* cluster/dht: Delete invalid linkto files in rmdirN Balachandran2019-02-041-0/+63
| | | | | | | | | | | | rm -rf <dir> fails on dirs which contain linkto files that point to themselves because dht incorrectly thought that they were cached files after looking them up. The fix now treats them as invalid linkto files and deletes them. Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643 fixes: bz#1671611 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* fuse: add --lru-limit optionAmar Tumballi2019-01-161-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1623107 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* geo-rep: Fix syncing of files with non-ascii filenamesKotresh HR2018-12-261-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Creation of files/directories with non-ascii names fails to sync to the slave. It crashes with below traceback on slave. ... File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 709, in entry_ops [ESTALE, EINVAL, EBUSY]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap return call(*arg) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 83, in lsetxattr cls.raise_oserr() File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 38, in raise_oserr raise OSError(errn, os.strerror(errn)) OSError: [Errno 12] Cannot allocate memory Cause: The length calculation arguments passed to blob creation was done before encoding. Hence was failing in gfid-access layer. Fix: It appears that the calculating lenght properly fixes this issue. But it will cause issues in other places in 'python2' and not in 'python3'. So encoding and decoding each required string to make geo-rep compatible with both 'python2' and 'python3' is a nightmare and is not fool proof. Hence kept 'python2' code as is with out encode/decode and applied encode/decode only to 'python3' Added non-ascii filename tests to regression Backport of: > Patch: https://review.gluster.org/21668 > BUG: 1650893 > Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1648642 Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/dht: sync brick root perms on add brickN Balachandran2018-12-261-6/+5
| | | | | | | | | | | | | | | | | If a single brick is added to the volume and the newly added brick is the first to respond to a dht_revalidate call, its stbuf will not be merged into local->stbuf as the brick does not yet have a layout. The is_permission_different check therefore fails to detect that an attr heal is required as it only considers the stbuf values from existing bricks. To fix this, merge all stbuf values into local->stbuf and use local->prebuf to store the correct directory attributes. Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc fixes: bz#1660736 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* tests: Fix zero-flag.t scriptKrutika Dhananjay2018-12-191-0/+1
| | | | | | | | | | | | | | | | The default value of shard-block-size was changed from 4MB to 64MB sometime back. The script "fallocate"s a 6MB file and expects it to have 1 shard under .shard. This worked when the shard-block-size was 4MB. With the default value now at 64MB, file "file1" won't have any shards under .shard and the stat on the 1st shard's path fails with ENOENT. Changed the script to explicitly set shard-block-size to 4MB. Change-Id: I7f1785922287d16d74c95fa57cbbe12e6e66e4f7 fixes: bz#1660932 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit e69fc87593334b24432978dbf592fa73fe5fc38b)
* afr: thin-arbiter 2 domain locking and in-memory stateRavishankar N2018-12-121-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2 domain locking + xattrop for write-txn failures: -------------------------------------------------- - A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad. - All further write txn failures are handled based on this in-memory value without querying the TA. - When shd heals the files, it does so by requesting full lock on AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall), releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory notion of which brick is bad. The next write txn failure is wound on TA to again update the in-memory state. - Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release request is got is completed before the lock is released. - Any write txns got after the release request are maintained in a ta_waitq. - After the release is complete, the ta_waitq elements are spliced to a separate queue which is then processed one by one. - For fops that come in parallel when the in-memory bad brick is still unknown, only one is wound to TA on wire. The other ones are maintained in a ta_onwireq which is then processed after we get the response from TA. Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64 updates: bz#1648205 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* afr: assign gfid during name heal when no 'source' is present.Ravishankar N2018-12-121-0/+149
| | | | | | | | | | | | | | | | | | | Problem: If parent dir is in split-brain or has dirty xattrs set, and the file has gfid missing on one of the bricks, then name heal won't assign the gfid. Fix: Use the brick we select the gfid from as the 'source'. Note: Problem was found while trying to debug a split-brain issue on Cynthia Zhou's setup. fixes: bz#1655545 Change-Id: Id088d4f0fb017aa35122de426654194e581ed742 Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 4d58730c0cd6ab5db39aec8a15276f7bd3371b04)
* gfapi: Offload callback notifications to synctaskSoumya Koduri2018-12-122-0/+374
| | | | | | | | | | | | | | | | | | | | | | | Upcall notifications are received from server via epoll and same thread is used to forward these notifications to the application. This may lead to deadlock and hang in the following scenario. Consider if as part of handling these callbacks, application has to do some operations which involve sending I/Os to gfapi stack which inturn have to wait for epoll threads to receive repsonse. Thus this may lead to deadlock if all the epoll threads are waiting to complete these callback notifications. To address it, instead of using epoll thread itself, make use of synctask to send those notificaitons to the application. Change-Id: If614e0d09246e4279b9d1f40d883a32a39c8fd90 updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit ad35193718a99494ab1b852ca4cbdf054f73de88)
* lease: Treat unlk request as noop if lease not foundSoumya Koduri2018-12-121-2/+2
| | | | | | | | | | | | | | | | | When the glusterfs server recalls the lease, it expects client to flush data and unlock the lease. If not it sets a timer (starting from the time it sends RECALL request) and post timeout, it revokes it. Here we could have a race where in client did send UNLK lease request but because of network delay it may have reached after server revokes it. To handle such situations, treat such requests as noop and return sucesss. Change-Id: I166402d10273f4f115ff04030ecbc14676a01663 updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit c2e758b54d8a3f778e3e63db0000bb8b63de9b25)
* cluster/afr: Use 2 domain locking in SHD for thin-arbiterkarthik-us2018-11-292-0/+230
| | | | | | | | | | | | | | | | | | | | | | | | With this change when SHD starts the index crawl it requests all the clients to release the AFR_TA_DOM_NOTIFY lock so that clients will know the in memory state is no more valid and any new operations needs to query the thin-arbiter if required. When SHD completes healing all the files without any failure, it will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on TA to see whether there are any new failures happened by that time. If there are new failures marked on TA, SHD will start the crawl immediately to heal those failures as well. If there are no new failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets the xattrs on TA, so that both the data bricks will be considered as good there after. >Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1 >fixes: bz#1579788 >Signed-off-by: karthik-us <ksubrahm@redhat.com> (cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c) Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1 fixes: bz#1648205
* glusterd: ensure volinfo->caps is set to correct value.Sanju Rakonde2018-11-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | With the commit febf5ed4848, during the volume create op, we are setting volinfo->caps to 0, only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. With this patch, we set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. (as we do earlier without commit febf5ed4848). > BUG: bz#1635820 > Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit aae1c402b74fd02ed2f6473b896f108d82aef8e3) fixes: bz#1647968 Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.tSanju Rakonde2018-11-081-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has optimised glusterd test cases by clubbing the similar test cases into a single test case. https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t test case has been deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t In the original test case, we create a volume with two bricks, each on a separate node(N1 & N2). From another node in cluster(N3), we try to detach a node which is hosting bricks. It fails. In the new test, we created volume with single brick on N1. and from another node in cluster, we tried to detach N1. we expect peer detach to fail, but peer detach was success as the node is hosting all the bricks of volume. Now, changing the new test case to cover the original test case scenario. Please refer https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to understand why the new test case is not failing in centos-regression. > BUG: bz#1642597 > Change-Id: Ifda12b5677143095f263fbb97a6808573f513234 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 0ca6773eaf5aeb507ebc72d2c2f61902eeff414c) fixes: bz#1643078 Change-Id: Ifda12b5677143095f263fbb97a6808573f513234 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: check for shd up status in bug-1637802-arbiter-stale-data-heal-lock.tRavishankar N2018-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing this .t spuriously. On checking one of the failure logs, I see: 22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful: 22:05:44 Self-heal daemon is not running. Check self-heal daemon log file. 22:05:44 not ok 20 , LINENUM:38 In glusterd log: [2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file But the tests which preceed this check whether via a statedump if the shd is conected to the bricks, and they have succeeded and even started healing. From glustershd.log: [2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1 sinks=2 So the only reason I can see launching heal via cli failing is a race where shd has been spawned but glusterd has not yet updated in-memory that it is up, and hence failing the CLI. Fix: Check for shd up status before launching heal via CLI Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4 fixes: bz#1641872 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
* features/shard: Hold a ref on base inode when adding a shard to lru listKrutika Dhananjay2018-10-255-7/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: > Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > (cherry picked from commit e627977) > BUG: 1605056 In __shard_update_shards_inode_list(), previously shard translator was not holding a ref on the base inode whenever a shard was added to the lru list. But if the base shard is forgotten and destroyed either by fuse due to memory pressure or due to the file being deleted at some point by a different client with this client still containing stale shards in its lru list, the client would crash at the time of locking lru_base_inode->lock owing to illegal memory access. So now the base shard is ref'd into the inode ctx of every shard that is added to lru list until it gets lru'd out. The patch also handles the case where none of the shards associated with a file that is about to be deleted are part of the LRU list and where an unlink at the beginning of the operation destroys the base inode (because there are no refkeepers) and hence all of the shards that are about to be deleted will be resolved without the existence of a base shard in-memory. This, if not handled properly, could lead to a crash. Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 updates: bz#1641440 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* gfapi: Bug fixes in leases processing code-pathSoumya Koduri2018-10-181-2/+2
| | | | | | | | | | | | | | | | | This patch fixes below issues in gfapi lease code-path * 'glfs_setfsleasid' should allow NULL input to be able to reset leaseid * Applications should be allowed to (un)register for upcall notifications of type GLFS_EVENT_LEASE_RECALL * APIs added to read contents of GLFS_EVENT_LEASE_RECALL argument which is of type "struct glfs_upcall_lease" This is backport of the below mainline patch - https://review.gluster.org/#/c/glusterfs/+/21391 Change-Id: I3320ddf235cc82fad561e13b9457ebd64db6c76b updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* api: fill out attribute information if not validRaghavendra Gowdappa2018-10-172-0/+137
| | | | | | | | | | | | | | | | | | translators like readdir-ahead selectively retain entry information of iatt (gfid and type) when rest of the iatt is invalidated (for write invalidating ia_size, (m)(c)times etc). Fuse-bridge uses this information and sends only entry information in readdirplus response. However such option doesn't exist in gfapi. This patch modifies gfapi to populate the stat by forcing an extra lookup. Thanks to Shyamsundar Ranganathan <srangana@redhat.com> and Prashanth Pai <ppai@redhat.com> for tests. Change-Id: Ieb5f8fc76359c327627b7d8420aaf20810e53000 Fixes: bz#1630804 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit 6257276d9de3f15643f159b2ec627a67c84fc23d)
* afr: fix incorrect reporting of directory split-brainRavishankar N2018-10-112-1/+63
| | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21135/ Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 fixes: bz#1638163 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: prevent winding inodelks twice for arbiter volumesRavishankar N2018-10-111-0/+44
| | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21380/ Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1638159 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* ctime: Provide noatime optionKotresh HR2018-10-021-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | Most of the applications are {c|m}time dependant and very few are atime dependant. So provide noatime option to not update atime when ctime feature is enabled. Also this option has to be enabled with ctime feature to avoid unnecessary self heal. Since AFR/EC reads data from single subvolume, atime is only updated in one subvolume triggering self heal. Backport of: > Patch: https://review.gluster.org/21073 > BUG: 1593538 > Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 89636be4c73b12de2e11c75d8e59527bb243f147) updates: bz#1633015 Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e Signed-off-by: Kotresh HR <khiremat@redhat.com>
* python3: assume python3 unless building _packages_ on sys without py3Kaleb S. KEITHLEY2018-09-2710-10/+1
| | | | | | | | | | | | | | | The jenkins release-new job runs on a CentOS 7 box, which does not have python3. As a result it runs (autogen.sh and) configure before producing the dist tar file, converting all the python3 shebangs to python2 shebangs in the dist tar file. Then when that tar file is "carried" to, e.g. Fedora koji build system to build packages, the shebangs are incorrect, despite having originally been correct in the git repo. Change-Id: I5154baba3f6d29d3c4823bafc2b57abecbf90e5b updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix issues related config setKotresh HR2018-09-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | 1. '--ignore-mising-args' option for rsync is not being used even though the rsync version is greater than 3.1.0. Fixed the same. 2. '--existing' option for rsync is also not being used. Fixed the same. 3. geo-rep config fails to set rsync-options as the value contains '--'. Interestingly, python argsparse treats the value with '--' (e.g., --ignore-missing-args) as option. But when passed with something like --value=--ignore-missing-args, it succeeds. Fixed the same. Backport of: > Patch: https://review.gluster.org/21191 > Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 > Signed-off-by: Kotresh HR <khiremat@redhat.com> > BUG: 1629561 (cherry picked from commit b977b44dd0adfcd7a3b432844260de4b8d1c4adf) Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1630673
* core: remove experimental xlators and associated testsShyamsundarR2018-09-175-252/+0
| | | | | | | | experimental xlators removed from 5.0 Change-Id: I47219d8b95efc3d5875ec9224d1e79f8371e9f76 Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* gfapi: revert several patchs that introduced pre/post attrsShyamsundarR2018-09-174-4/+4
| | | | | | | | | | | | | | Reverted the following: - 248152767b0599986bbb6bb35fc27197f6be6964 - 09943beb499617212f2985ca8ea9ecd1ed1b470e - d01f7244e9d9f7e3ef84e0ba7b48ef1b1b09d809 The reverts are redone by hand, due to clang format changes that made using git to revert the changes more tedious. Change-Id: I96489638a2b641fb2206a110298543225783f7be Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* gfapi: revert "gfapi: return pre/post attributes at callback for glfs api"ShyamsundarR2018-09-172-4/+2
| | | | | | | | | | | | | | This reverts commit 384562b294e9a7847403961e878a4daa0fff33eb. The revert is done manually owing to the clang format changes, and the 4.1 patch that reverts this fix is used as the source for the revert. The 4.1 patch has the commit ID: 98376e0c0a Change-Id: Ib2cbce9940f6a2a2755eb47bf332832147835e4d Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-1265-7508/+7651
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* Land clang-format changesGluster Ant2018-09-122-83/+108
| | | | Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* misc: fix misc. shebangsKaleb S. KEITHLEY2018-09-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * One #!/usr/bin/env python and three #!/usr/bin/python were overlooked in all the other python fixups. Ugh. * Two new python files missed the memo about #!/usr/bin/python3. * One #!/usr/bin/env bash. Various distribution packaging policies have strong wording about the use of #!/usr/bin/env ... Note: this patch does not change the use of #!/usr/bin/env bash in the two files extras/{clang-checker.sh,check_goto.pl} as these are not included in any packages. (Although I'm not actually sure why anyone would ever use '/usr/bin/env {sh,bash}' as I'm not aware of any version-specific differences like there are with, e.g., python.) * One #!/usr/bin/bash. On Fedora and CentOS > 6, /bin is a symlink to /usr/bin, so it makes little difference. But Debian & Ubuntu still have separate /bin and /usr/bin; and sh and bash are in /bin, not /usr/bin. (Historically, in BSD and SYSV Unix it was /bin/sh.) Note: Fedora and CentOS package build runs a script that converts all /bin/sh and /bin/bash to /usr/bin/sh and /usr/bin/bash. Change-Id: I9171265829af78dd0cd7622c22b56d22179ff8a3 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* glusterd: avoid using glusterd's working directory as a brickSanju Rakonde2018-09-081-0/+9
| | | | | | | | | Adding checks for avoiding glusterd's working directory used as a brick for volume creation. fixes: bz#853601 Change-Id: I4b16a05f752e92216aa628f542a4fdbf59b3c669 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>