path: root/tests
* cluster/ec: fix EIO error for concurrent writes on sparse files (Xavi Hernandez, 2019-08-22; 1 file, -1/+1)

  EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

  The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

  So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet).

  When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with EIO. The error is propagated to the parent write, which is aborted, and EIO is returned to the application.

  The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue.

  Also modifies Test #13 in ec-stripe.t: with this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not count it as a stripe cache miss. That makes sense because we already know what data that part holds and there is no need to read it from the bricks.

  Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
  Fixes: bz#1739427
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

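  The sparse-file behaviour described above can be reproduced on any local filesystem; a minimal shell sketch (the /tmp path is a placeholder, not part of the patch):

  ```
  truncate -s 0 /tmp/sparse-demo
  # read 10 bytes at offset 10 of an empty file: returns 0 bytes
  dd if=/tmp/sparse-demo bs=1 skip=10 count=10 2>/dev/null | wc -c
  # write one byte at offset 1M, implicitly creating data below it
  dd if=/dev/zero of=/tmp/sparse-demo bs=1 seek=1048576 count=1 conv=notrunc 2>/dev/null
  # the same read now returns 10 bytes, all zeros
  dd if=/tmp/sparse-demo bs=1 skip=10 count=10 2>/dev/null | wc -c
  ```
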
* afr: restore timestamp of parent dir during entry-heal (Ravishankar N, 2019-08-21; 1 file, -0/+78)

  Fixes: bz#1741041
  Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  (cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)

* features/shard: Send correct size when reads are sent beyond file size (Krutika Dhananjay, 2019-08-21; 1 file, -0/+29)

  Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b
  fixes bz#1740316
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  (cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)

* ctime: Set mdata xattr on legacy files (Kotresh HR, 2019-08-19; 1 file, -0/+83)

  Problem: Files created before ctime was enabled do not have the "trusted.glusterfs.mdata" xattr (which stores the time attributes). Upon fops that modify either ctime or mtime, the xattr gets created with the latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and take the rest from the backend.

  Solution: Creating the xattr with values from the brick is not possible as each brick of a replica set would have different times. So create the xattr upon successful lookup if it is not already present.

  Note to reviewers: The time attributes used to set the xattr are taken from a successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only the time attributes.

  Backport of:
  > Patch: https://review.gluster.org/22936
  > Change-Id: I5e535631ddef04195361ae0364336410a2895dd4
  > BUG: 1593542
  > Signed-off-by: Kotresh HR <khiremat@redhat.com>

  Change-Id: I5e535631ddef04195361ae0364336410a2895dd4
  updates: bz#1739430
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

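  A quick way to see whether a legacy file has the xattr is to inspect it directly on a brick; a sketch, assuming a hypothetical brick path and file name:

  ```
  # dump the trusted.* xattrs of the file as stored on the brick; legacy
  # files show no trusted.glusterfs.mdata entry until a lookup heals it
  getfattr -d -m 'trusted.*' -e hex /bricks/brick1/legacy-file
  ```
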
* cluster/afr: Fix incorrect reporting of gfid & type mismatch (karthik-us, 2019-07-20; 1 file, -0/+116)

  Problems:
  1. When checking for type and gfid mismatch, if the type or gfid is unknown because of a missing gfid handle and gfid xattr, it will be reported as a type or gfid mismatch and the heal will not complete.
  2. If the source selected during entry heal has a null gfid, the same will be sent to afr_lookup_and_heal_gfid(). In this function, when we try to assign the gfid on the bricks where it does not exist, we consider the same gfid and try to assign that on those bricks. This will fail in posix_gfid_set() since the gfid sent is null.

  Fix: If the gfid sent to afr_lookup_and_heal_gfid() is null, choose a valid gfid before proceeding to assign the gfid on the bricks where it is missing. In afr_selfheal_detect_gfid_and_type_mismatch(), do not report a type/gfid mismatch if the type/gfid is unknown or not set.

  Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da
  fixes: bz#1729481
  Signed-off-by: karthik-us <ksubrahm@redhat.com>

* glusterd: do not mark skip_locking as true for geo-rep operations (Sanju Rakonde, 2019-07-19; 1 file, -0/+3)

  We need to send the commit req to peers in case of geo-rep operations even though they are no-volname operations. In the commit phase peers try to set the txn_opinfo, which will fail because it is a no-volname operation where we don't require a commit phase.

  We mark skip_locking as true for no-volname operations, but we have to make an exception for geo-rep operations so that they can set txn_opinfo in the commit phase.

  Please refer to the detailed RCA at the bug: 1730543

  fixes: bz#1730543
  Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  (cherry picked from commit b917974ee922d7a2e079692ad7d6f61f900b37b2)

* tests: Fix bug-1717819-metadata-split-brain-detection.t failure (karthik-us, 2019-07-15; 1 file, -0/+6)

  Problem: tests/bugs/replicate/bug-1717819-metadata-split-brain-detection.t fails intermittently in test cases #49 & #50, which compare the values of the user-set xattrs after enabling the heal. We are not waiting for the heal to complete before comparing those values, which might lead those tests to fail.

  Fix: Wait till the HEAL-TIMEOUT before comparing the xattr values. Also check for the shd to come up and the bricks to connect to the shd process in another case.

  Change-Id: I0e245b328da9df23ce70c5300278fad1c1d9f7ff
  fixes: bz#1729895
  Signed-off-by: karthik-us <ksubrahm@redhat.com>

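  A minimal sketch of this kind of wait, using helpers that exist in the regression-test harness (EXPECT_WITHIN, HEAL_TIMEOUT and get_pending_heal_count come from the tests/*.rc includes); the exact lines in the .t file may differ:

  ```
  # trigger the heal, then poll until no entries are pending before
  # comparing the xattr values
  TEST $CLI volume heal $V0
  EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0
  ```
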
* test: Fix spurious failures in bug-1040275-brick-uid-reset-on-volume-restart.t (Mohit Agrawal, 2019-07-09; 1 file, -0/+8)

  Problem: The test case fails just after starting the volume, at the time of running the stat command on the mount point, and the client gets the error "transport endpoint is not connected".

  Solution: To avoid the error, make sure all brick instances are up and the mount point is active.

  > Change-Id: I49553a04d5b13e155ee02f4a1888a07fe3ee2ff5
  > fixes: bz#1721590
  > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
  > (cherry picked from commit 283b77805cca3027e333a11c9b00ac611662c9ee)

  Change-Id: I49553a04d5b13e155ee02f4a1888a07fe3ee2ff5
  fixes: bz#1728182
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

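  A sketch of how a .t test typically waits for a brick and the mount before issuing stat (brick_up_status and PROCESS_UP_TIMEOUT are existing harness helpers; the brick path below is a placeholder):

  ```
  # wait until glusterd reports the brick as up, then touch the mount
  EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status $V0 $H0 $B0/${V0}1
  TEST stat $M0
  ```
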
* glusterd/thin-arbiter: Thin-arbiter integration with GD1 (Vishal Pandey, 2019-07-04; 2 files, -0/+70)

  gluster volume create <VOLNAME> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2> <thin-arbiter-host>:<path-to-store-replica-id-file> [force]

  The changes have been made in a way that the last brick in the bricks list will be treated as the thin-arbiter. GD1 will be manipulated to consider the replica count to be 2 and continue creating the volume like any other replica 2 volume, but since thin-arbiter volumes need ta-brick client xlator entries for each subvolume in the fuse volfile, volfile generation is modified to inject these entries separately into the volfile for every subvolume.

  A few more additions:
  1. Save the volinfo with the new fields ta_bricks list and thin_arbiter_count.
  2. Introduce a new option client.ta-brick-port to add remote-port to the ta-brick xlator entry in fuse volfiles. The option can be set using the following CLI syntax:
     gluster volume set <VOLNAME> client.ta-brick-port <PORTNO.>
  3. Volume Info will contain a Thin-Arbiter-path entry to distinguish from other replicate volumes.

  Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406
  Updates #687
  Signed-off-by: Vishal Pandey <vpandey@redhat.com>
  (cherry picked from commit 9b223b15ab69fce4076de036ee162f36a058bcd2)

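  A sketch of the resulting CLI flow with placeholder host names, brick paths and port number (only the syntax quoted above is taken from the patch):

  ```
  # create a replica 2 volume backed by a thin-arbiter brick
  gluster volume create tavol replica 2 thin-arbiter 1 \
      host1:/bricks/b1 host2:/bricks/b2 ta-host:/bricks/ta
  # point clients at the thin-arbiter brick's listening port
  gluster volume set tavol client.ta-brick-port 24007
  # volume info should now carry a Thin-Arbiter-path entry
  gluster volume info tavol
  ```
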
* cluster/ec: Prevent double pre-op xattrops (Pranith Kumar K, 2019-06-22; 1 file, -0/+23)

  Problem: Race between Thread-1 and Thread-2:
  1) Thread-1 does ec_get_size_version() to perform the pre-op fxattrop as part of write-1.
  2) Thread-2 calls ec_set_dirty_flag() in ec_get_size_version() for write-2. This sets dirty[] to 1.
  3) Thread-1 completes executing ec_prepare_update_cbk, leading to ctx->dirty[] = '1'.
  4) Thread-2 takes LOCK(inode->lock) to check if there are any flags and sets the dirty flag because lock->waiting_flag is 0 now. This leads the fxattrop to increment the on-disk dirty[] to '2'.
  At the end of the writes the file will be marked for heal even when it doesn't need heal.

  Fix: Perform ec_set_dirty_flag() and the other checks inside LOCK() to prevent dirty[] from being marked as '1' in step 2) above.

  Updates bz#1593224
  Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* posix/ctime: Fix ctime upgrade issue (Kotresh HR, 2019-06-21; 3 files, -16/+202)

  Problem: On an EC volume, during an upgrade from an older version where the ctime feature is not enabled (or not present) to a newer version where the ctime feature is available (enabled by default), self heal hangs and doesn't complete.

  Cause: The ctime feature has both client-side code (utime) and server-side code (posix). The feature is driven from the client: only if the client side sets the time in the frame should the server side set the time attributes in the xattr. But posix setattr/fsetattr was not doing that. When one of the server nodes is updated, since ctime is enabled by default, it starts setting the xattr on setattr/fsetattr on the updated node/brick. On an EC volume the first two updated nodes (bricks) are not a problem because there are 4 other bricks with consistent data. However, once the third brick is updated, the new attribute (mdata xattr) causes a metadata inconsistency on 3 bricks, which prevents the file from being repaired.

  Fix: Don't create the mdata xattr with the utimes/utimensat system call. Only update it if it is already present.

  Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
  fixes: bz#1720201
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* encryption/crypt: remove from volume file (Amar Tumballi, 2019-06-20; 3 files, -457/+0)

  The feature is not supported and was moved out of the codebase from the glusterfs-5.x release. It doesn't make sense to keep the code to support it.

  For those who want to upgrade from a version supporting it to a higher version, please do a 'gluster volume reset $VOL encryption reset' and then continue with the upgrade process.

  updates: bz#1648169
  Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: subdir-mount.t is failing for brick_mux regression (Mohit Agrawal, 2019-06-17; 1 file, -3/+8)

  To avoid the failure, wait for the hook script S13create-subdir-mounts.sh to run after the add-brick command has been executed by the test case.

  Change-Id: I063b6d0f86a550ed0a0527255e4dfbe8f0a8c02e
  fixes: bz#1720993
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

* core: improve timer accuracy (Xavier Hernandez, 2019-06-17; 1 file, -3/+4)

  Also fixed some issues on test ec-1468261.t.

  Change-Id: If156f86af986d9eed13cdd1f15c5a7214cd11706
  Updates: bz#1193929
  Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

* glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile (Atin Mukherjee, 2019-06-17; 1 file, -1/+3)

  ... without which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files".

  Fixes: bz#1716812
  Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>

* tests: Add missing NFS test tag to the testfile (Aravinda VK, 2019-06-15; 1 file, -0/+2)

  $SRC/glusterfs/bugs/nfs/showmount-many-clients.t

  Change-Id: I48758cc66fcb55f48c4a8a0a738b06867f6814a1
  Signed-off-by: Aravinda VK <avishwan@redhat.com>
  Updates: bz#1193929

* gfapi: provide an api for setting statedump path (Amar Tumballi, 2019-06-14; 1 file, -0/+5)

  Currently, when a statedump is taken for an application that uses gfapi to access glusterfs, the info is dumped under the /var/run/gluster directory. This can be a concern because that directory may be owned by some other user, and taking the statedump may therefore fail. Such applications should have an option to use a different path. This patch provides an API to do so.

  Updates: bz#1689097
  Change-Id: I8918e002bc823d83614c972b6c738baa04681b23
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: keep glfsxmp in tests directory (Amar Tumballi, 2019-06-11; 10 files, -17/+1821)

  This is critical so that all the tests are contained in the same directory, and one can just 'cp -a tests/ <any-location>/' and run the glusterfs tests. Only 'glfsxmp.c' was an exception, as it was just copying the file from the api example directory. It is now moved into tests.

  updates: bz#1193929
  Change-Id: I00359d64be580bffc5b3c3a090968d86c2c6952a
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: Fix split-brain-favorite-child-policy.t failure (karthik-us, 2019-06-10; 1 file, -3/+4)

  Problem: The test case fails to heal the volume within $HEAL_TIMEOUT @195. This happens because, as part of split-brain resolution, the file gets expunged from the sink and the new-entry mark for that file is done on the source bricks as part of impunging. Since the source bricks' shd threads failed to get the heal-domain lock, they wait for the heal-timeout of 10 minutes, which is greater than $HEAL_TIMEOUT.

  Fix: Set cluster.heal-timeout to 5 seconds to trigger the heal so that one of the source bricks heals the file within $HEAL_TIMEOUT.

  Change-Id: Ie73c578cc5361c0d617a48ccc86026734d20ba8c
  fixes: bz#1718998
  Signed-off-by: karthik-us <ksubrahm@redhat.com>

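  A sketch of the option change described in the fix, using the harness variables $CLI and $V0 (the surrounding test steps are omitted):

  ```
  # lower the shd retry interval so a source brick re-attempts the heal
  # well within $HEAL_TIMEOUT
  TEST $CLI volume set $V0 cluster.heal-timeout 5
  TEST $CLI volume heal $V0
  ```
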
* Cluster/afr: Don't treat all bricks having metadata pending as split-brain (karthik-us, 2019-06-10; 2 files, -64/+130)

  Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the AFR xattrs' point of view, i.e. each brick is blamed by at least one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks.

  Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as the sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain.

  This patch also adds the restriction that all bricks must be up to perform metadata heal, to avoid any metadata loss.

  Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up.

  Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
  fixes: bz#1717819
  Signed-off-by: karthik-us <ksubrahm@redhat.com>

* tests: added cleanup for lock files (Sunny Kumar, 2019-06-10; 1 file, -0/+35)

  Problem: useradd fails with:
    Cannot allocate memory
    useradd: cannot lock /etc/passwd; try again later.

  Solution: Lock files should get removed automatically once the "useradd" or "groupadd" command finishes. But sometimes we encounter situations (bugs) where some of these files may not get properly unlocked after the execution of the command. In that case, when we execute useradd the next time, it may show the error "cannot lock /etc/password" or "unable to lock group file". So, to avoid any such errors, check for any lock files under /etc and remove them.

  updates: bz#1193929
  Change-Id: If6456a271c2bc0717f768d7101a40ce44a9af3d7
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>

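  A minimal sketch of such a cleanup step; the exact list of lock files the test harness removes is an assumption here:

  ```
  # remove stale shadow-utils lock files left behind by an interrupted
  # useradd/groupadd run before retrying
  rm -f /etc/passwd.lock /etc/group.lock /etc/shadow.lock /etc/gshadow.lock
  ```
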
* features/shard: Fix extra unref when inode object is lru'd out and added back (Krutika Dhananjay, 2019-06-09; 1 file, -0/+34)

  Long tale of double unref! But do read...

  In cases where a shard base inode is evicted from the lru list while still being part of the fsync list, but is added back shortly before its unlink, there could be an extra inode_unref() leading to premature inode destruction and a crash. One such specific case is the following.

  Consider features.shard-deletion-rate = features.shard-lru-limit = 2. This is an oversimplified example but explains the problem clearly.

  First, a file is FALLOCATE'd to a size such that the number of shards under /.shard = 3 > lru-limit. Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
    Resultant lru list: 1 -----> 2
    Refs on base inode: (1) + (1) = 2

  3 needs to be resolved. So 1 is lru'd out.
    Resultant lru list: 2 -----> 3
    Refs on base inode: (1) + (1) = 2
  Note that 1 is inode_unlink()d but not destroyed, because there are non-zero refs on it since it is still participating in this ongoing FALLOCATE operation.

  FALLOCATE is sent on all participant shards. In the cbk, all of them are added to the fsync list.
    Resulting fsync list: 1 -----> 2 -----> 3 (order doesn't matter)
    Refs on base inode: (1) + (1) + (1) = 3
    Total refs = 3 + 2 = 5

  Now an attempt is made to unlink this file. Background deletion is triggered. The first $shard-deletion-rate shards need to be unlinked in the first batch, so shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds on 2, so 2 is moved to the tail of the list.
    lru list now: 3 -----> 2
    No change in refs.

  Shard 1 is looked up. In lookup_cbk, it is linked and added back to the lru list at the cost of evicting shard 3.
    lru list now: 2 -----> 1
    Refs on base inode: (1) + (1) = 2
    fsync list now: 1 -----> 2 (again, order doesn't matter)
    Refs on base inode: (1) + (1) = 2
    Total refs = 2 + 2 = 4

  After eviction, it is found that 3 needs fsync. So fsync is wound, yet to be ack'd; 3 is therefore still inode_link()d.

  Now deletion of shards 1 and 2 completes. The lru list is empty. The base inode is unref'd and destroyed.

  In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able. It is added back to the lru list, but the base inode passed to __shard_update_shards_inode_list() is NULL since the inode is destroyed. Its ctx->inode still contains the base inode pointer from the first addition to the lru list, with no additional ref on it.
    lru list now: 3
    Refs on base inode: (0)
    Total refs on base inode = 0

  Unlink is sent on 3 and completes. Now, since the ctx contains a pointer to base_inode and the shard is part of the lru list, the base shard is unref'd, leading to a crash.

  FIX: When a shard is re-added to the lru list, copy the base inode pointer as-is into its inode ctx, even if it is NULL. This is needed to prevent double unrefs at the time of deleting it.

  Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
  Updates: bz#1696136
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>

* uss: Ensure that snapshot is deleted before creating a new snapshot (Raghavendra Bhat, 2019-06-08; 1 file, -0/+12)

  * Also some logging enhancements in snapview-server

  Change-Id: I6a7646771cedf4bd1c62806eea69d720bbaf0c83
  fixes: bz#1715921
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>

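  A sketch of the ordering the subject describes, with placeholder snapshot and volume names (the actual .t steps may differ):

  ```
  # delete the previous snapshot before creating the next one with the
  # same name
  TEST $CLI snapshot create snap1 $V0 no-timestamp
  TEST $CLI snapshot delete snap1
  TEST $CLI snapshot create snap1 $V0 no-timestamp
  ```
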
* tests/utils: Fix py2/py3 util python scripts (Kotresh HR, 2019-06-07; 7 files, -30/+258)

  The following files are fixed:
    tests/bugs/distribute/overlap.py
    tests/utils/changelogparser.py
    tests/utils/create-files.py
    tests/utils/gfid-access.py
    tests/utils/libcxattr.py

  Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140
  updates: bz#1193929
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* tests/quick-read-upcall: mark it bad (Amar Tumballi, 2019-06-07; 1 file, -0/+5)

  Frequent intermittent failures observed:
  ```
  08:59:24 ok 11 [ 10/ 3] < 36> 'write_to /mnt/glusterfs/0/test.txt test-message1'
  08:59:24 ok 12 [ 10/ 6] < 37> 'test-message1 cat /mnt/glusterfs/0/test.txt'
  08:59:24 ok 13 [ 10/ 4] < 38> 'test-message0 cat /mnt/glusterfs/1/test.txt'
  08:59:24 not ok 14 [ 3715/ 6] < 45> 'test-message1 cat /mnt/glusterfs/1/test.txt' -> 'Got "test-message0" instead of "test-message1"'
  08:59:24 ok 15 [ 10/ 162] < 47> 'gluster --mode=script --wignore volume set patchy features.cache-invalidation on'
  08:59:24 ok 16 [ 10/ 148] < 48> 'gluster --mode=script --wignore volume set patchy performance.qr-cache-timeout 15'
  ```

  updates: bz#1718191
  Change-Id: Ieb9e5a9a428995ff178f77bc4a5155b8298d3fa0
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests/volume-scale-shd-mux: mark as bad test (Amar Tumballi, 2019-06-07; 1 file, -0/+3)

  The test is giving frequent failures in regression. The error seen is normally like below:
  `09:09:24 not ok 58 [ 14/ 80343] < 104> '^3$ number_healer_threads_shd patchy_distribute1 __afr_shd_healer_wait' -> 'Got "1" instead of "^3$"'`

  updates: bz#1708929
  Change-Id: I240bdcfb76b1f953d75937a53c5dfabba134f282
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests/shd: Add test coverage for shd mux (Mohammed Rafi KC, 2019-06-06; 4 files, -0/+372)

  This patch adds more test cases for shd mux. The test cases include:
  1) Creating multiple volumes to check the attach and detach of self heal daemon requests.
  2) Making sure the healing happens in all scenarios.
  3) After a volume detach, making sure the threads of the detached volume are all cleaned up.
  4) Repeating all the above tests for EC volumes.
  5) Node reboot case.
  6) glusterd restart cases.
  7) Add-brick/remove-brick.
  8) Converting a distributed volume to a disperse volume.
  9) Converting a replicated volume to a distributed volume.

  Change-Id: I7c317ef9d23a45ffd831157e4890d7c83a8fce7b
  fixes: bz#1708929
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

* tests/geo-rep: Add geo-rep cli testcases (Kotresh HR, 2019-06-06; 2 files, -0/+19)

  Change-Id: Icf93b90bcac022a355d4718220698987dbc91ecf
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692

* lcov: run more fops on translators (Amar Tumballi, 2019-06-04; 6 files, -2/+159)

  Translators covered:
  * playground/template
  * debug/delay-gen
  * debug/error-gen
  * features/namespace
  * features/quiesce
  * meta

  updates: bz#1693692
  Change-Id: Ic8fde8efcb309ea492d8e819241f786f7ff467a1
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* features/shard: Fix block-count accounting upon truncate to lower size (Krutika Dhananjay, 2019-06-04; 1 file, -0/+32)

  The way delta_blocks is computed in shard is incorrect when a file is truncated to a lower size. The accounting only considers the change in size of the last of the truncated shards.

  FIX: Get the block count of each shard just before an unlink at posix in xdata. Their summation, plus the change in size of the last shard (from an actual truncate), is used to compute delta_blocks, which is used in the xattrop for the size update.

  Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
  fixes: bz#1705884
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>

* tests/geo-rep: Add geo-rep glusterd test cases (Kotresh HR, 2019-06-04; 3 files, -0/+223)

  1. Add geo-rep fanout test case
  2. Add glusterd geo-rep negative test cases
  3. Add glusterd geo-rep config test cases

  Change-Id: I856c087eb3216d8f0ffd1f266deac88e9a4effec
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692

* tests/geo-rep: Remove a rename test case on EC volume (Kotresh HR, 2019-06-04; 2 files, -5/+5)

  The rename-with-existing-name test case occasionally fails on EC volumes. Hence commenting it out until it is analysed.

  Change-Id: Icb2ad189b9e4d12101e8f5abcb8a033181360386
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1193929

* posix: add storage.reserve-size option (Sheetal Pamecha, 2019-06-03; 1 file, -0/+58)

  The storage.reserve-size option takes a size as input instead of a percentage. If set, priority is given to storage.reserve-size over storage.reserve. The default value of this option is 0.

  fixes: bz#1651445
  Change-Id: I7a7342c68e436e8bf65bd39c567512ee04abbcea
  Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>

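  A sketch of how the option could be exercised; the volume name and size value are placeholders:

  ```
  # reserve an absolute amount of space instead of a percentage
  gluster volume set $V0 storage.reserve-size 10MB
  # confirm the effective value
  gluster volume get $V0 storage.reserve-size
  ```
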
* tests/geo-rep: Add tests to cover glusterd geo-rep (Kotresh HR, 2019-05-31; 1 file, -0/+3)

  Change-Id: Ide59a3fde11b23f654b1ec03d72b4ec53b36a03b
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692

* glusterfsd/cleanup: Protect graph object under a lock (Mohammed Rafi KC, 2019-05-31; 1 file, -1/+3)

  While processing the cleanup_and_exit function, we access a graph object that is not protected under a lock. A parallel cleanup of the graph is quite possible, which might lead to an invalid memory access.

  Change-Id: Id05ca70d5b57e172b0401d07b6a1f5386c044e79
  fixes: bz#1708926
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

* tests/geo-rep: Add EC volume test case (Shwetha K Acharya, 2019-05-31; 2 files, -0/+447)

  Added geo-rep regression tests with EC volume.

  fixes: bz#1650095
  Change-Id: Ifb6e68e0a6103a98fced7f84d3088b8edf33d52f
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* lcov: more coverage to shard, old-protocol, sdfs (Amar Tumballi, 2019-05-31; 6 files, -6/+58)

  updates: bz#1693692
  Change-Id: If4c30572d4501d169bb4b0871c677d974515867c
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: add tests for different signal handling (Amar Tumballi, 2019-05-30; 4 files, -11/+74)

  Also some cleanup:
  * old-protocol.t was actually added to make sure we have line-coverage
  * first-test.t should have been removed as per the comment. It doesn't do anything.
  * add statvfs to rpc-coverage so we can cover statvfs in few xlators.

  updates: bz#1693692
  Change-Id: Ie8651ce007de484c4abced16b4de765aa5e517be
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* glusterd: bulkvoldict thread is not handling all volumes (Mohit Agrawal, 2019-05-27; 1 file, -6/+10)

  Problem: In commit ac70f66c5805e10b3a1072bd467918730c0aeeb4 I missed one condition to populate the volume dictionary in multiple threads while brick multiplexing is enabled. Due to that, glusterd is not sending the volume dictionary for all volumes to peers.

  Solution: Update the condition in the code, and also update the test case to avoid the issue.

  Change-Id: I06522dbdfee4f7e995d9cc7b7098fdf35340dc52
  fixes: bz#1711250
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

* tests: Add changelog api tests (Kotresh HR, 2019-05-27; 2 files, -0/+135)

  updates: bz#1193929
  Change-Id: Iee9aab8140882069165621189741f189fb2cc884
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* glusterd/tier: remove tier related code from glusterd (Hari Gowtham, 2019-05-27; 4 files, -179/+0)

  The handler functions are pointed to dummy functions. The switch-case handling for tier has also been moved to point to the default case, to avoid issues if it is reintroduced. The tier changes in DHT remain as they are.

  updates: bz#1693692
  Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc
  Signed-off-by: Hari Gowtham <hgowtham@redhat.com>

* tests: Add history api tests (Kotresh HR, 2019-05-27; 5 files, -0/+171)

  updates: bz#1193929
  Change-Id: Ic26ab5277f720c734f083150c1c541763dfa64aa
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* gfapi: add missing api to increase code coverage (Sheetal Pamecha, 2019-05-26; 1 file, -18/+340)

  Add tests for async read/write combinations:
    glfs_read_async/write_async
    glfs_pread_async/pwrite_async
    glfs_readv_async/writev_async
    glfs_preadv_async/pwritev_async
    ftruncate/ftruncate_async
    fsync/fsync_async
    fdatasync/fdatasync_async

  Updates: #655
  Change-Id: I12beb97029fd60bce79650a376d8fcd8d383ef16
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>

* api/glfsxmp.c: minor fixes (Sheetal Pamecha, 2019-05-26; 1 file, -0/+30)

  * add more fops: f{get,set,list,remove}xattr(), access(), fstat(), fsetattr(), getxattr(), lgetxattr(), llistxattr(), lsetxattr(), fgetxattr()
  * handle some error cases (like volume not found)

  Updates: #655
  Change-Id: I3334bdf3090eafd83a54e1be12036ea01b181089
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>

* cluster/ec: honor contention notifications for partially acquired locks (Xavi Hernandez, 2019-05-25; 1 file, -0/+54)

  EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others have not yet), we can receive notifications from the acquired bricks, and these should be honored, since we may not receive more notifications after that.

  Since EC was ignoring them, once the lock was acquired it was not released until the eager-lock timeout, causing unnecessary delays on other clients.

  This fix takes into consideration the notifications received before the full lock acquisition has completed. After that, the lock will be released as soon as possible.

  Fixes: bz#1708156
  Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

* tests: Fix spurious failures in ta-write-on-bad-brick.t (Pranith Kumar K, 2019-05-24; 5 files, -17/+36)

  Problem: afr_child_up_status_meta works only when LOOKUP on $M0 is successful. There are cases where quorum is not met and LOOKUP fails on $M0, which leads to failures similar to:
    grep: /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/private: Transport endpoint is not connected
  This was happening once in a while based on attribute-timeout and md-cache not serving the lookup.

  Fix: Find the child-up status based on the statedump instead. Also changed the mount options to include --entry-timeout=0 and --attribute-timeout=0.

  updates bz#1193929
  Change-Id: Ic0de72c3006d7399a5feb3e4d10d4748949b2ab3
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

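  A sketch of a fuse mount with the options mentioned in the fix (the host, volume and mount-point variables come from the test harness; the exact invocation in the .t may differ):

  ```
  # mount with kernel entry/attribute caching disabled so lookups always
  # reach the client stack
  TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 \
       --entry-timeout=0 --attribute-timeout=0 $M0
  ```
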
* tests: Test openfd heal doesn't truncate files (Pranith Kumar K, 2019-05-24; 2 files, -0/+218)

  fixes bz#1706603
  Change-Id: I0bfd30f787f157b7a54f71088f767ccfd7621208
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* tests/quick-read-with-upcall.t: increase the timeout (Amar Tumballi, 2019-05-21; 1 file, -1/+5)

  Running with a 2 second sleep at this place caused failures like:
    `not ok 14 [ 2014/ 7] < 41> 'test-message1 cat /mnt/glusterfs/1/test.txt' -> 'Got "test-message0" instead of "test-message1"'`
  in a few runs out of 100 iterations. But when increased to more than sleep 3, no failures were seen in 100 runs.

  While I don't know the exact reasons for the behavior yet, it looks like this increase in wait helps the regression pass without failures.

  updates: bz#1693692
  Change-Id: I0610b79bea53e36de3eea6c11234b7fc9dfd6232
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: change usleep() to sleep() (Sanju Rakonde, 2019-05-16; 2 files, -3/+3)

  While running a test case, the following warning messages are seen on the display. To avoid such warnings, change usleep() to sleep().
    warning: usleep is deprecated, and will be removed in near future!
    warning: use "sleep 0.25" instead...

  updates: bz#1193929
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  Change-Id: I48b79ede1c70b101f654635dd4cc83e50ea55b73

* features/shard: Fix crash during background shard deletion in a specific case (Krutika Dhananjay, 2019-05-16; 3 files, -1/+155)

  Consider the following case:
  1. A file gets FALLOCATE'd such that more than "shard-lru-limit" number of shards are created.
  2. And then it is deleted after that.

  The unique thing about FALLOCATE is that, unlike WRITE, all of the participant shards are resolved, created and fallocated in a single batch. This means that in this case, after the first "shard-lru-limit" number of shards are resolved and added to the lru list, some of the existing shards in the lru list will need to be evicted as part of the resolution of the remaining shards. These evicted shards are inode_unlink()d as part of eviction. Once the fop gets to the actual FALLOCATE stage, the lru'd-out shards get added to the fsync list.

  Two things to note at this point:
  i.  the lru'd-out shards are only part of the fsync list, so each holds 1 ref on the base shard;
  ii. the more recently used shards are part of both the fsync and lru lists, so each of these shards holds 2 refs on the base inode - one for being part of the fsync list, and the other for being part of the lru list.

  FALLOCATE completes successfully, then this very file is deleted and background shard deletion is launched. Here's where the ref counts get mismatched. First, as part of the inode_resolve()s during the deletion, the lru'd-out inodes return NULL, because they have been inode_unlink()'d by now. So these inodes need to be freshly looked up. But as part of linking them in lookup_cbk (precisely in shard_link_block_inode()), inode_link() returns the lru'd-out inode object, and its inode ctx is still valid, with ctx->base_inode still set from the last time it was added to the list.

  However, shard_common_lookup_shards_cbk() passes NULL in the place of base_pointer to __shard_update_shards_inode_list(). This means that, as part of adding the lru'd-out inode back to the lru list, the base inode is not ref'd since it is NULL. Whereas after unlinking this shard, during shard_unlink_block_inode(), ctx->base_inode is accessible and is unref'd because the shard was found to be part of the LRU list, although the matching ref never occurred. This eventually leads to the base_inode refcount becoming 0; it gets destroyed and released while some of its associated shards are still being unlinked in parallel, and the client crashes whenever it is accessed next.

  Fix is to pass the base shard correctly, if available, in shard_link_block_inode(). Also, the patch fixes the ret value check in tests/bugs/shard/shard-fallocate.c.

  Change-Id: Ibd0bc4c6952367608e10701473cbad3947d7559f
  Updates: bz#1696136
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
