glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	io-stats: dump io-stats info in /var/run/gluster	Amar Tumballi	2018-09-05	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It wouldn't make sense to allow iostats file to be written in any directory. While the formating makes sure we try to append io-stats-name for the file, so overwriting existing file is slim, but in any case it makes sense to restrict dumping to one directory. Below are the sample commands, and files created for the corresponding values: $ setfattr -n trusted.io-stats-dump -v file-for-dump $M0 In this case, the file would be in /var/run/gluster/file-for-dump $ setfattr -n trusted.io-stats-dump -v /dir1/dir2/file-for-dump $M0 In this case, then the dump file is in /var/run/gluster/dir1-dir2-file-for-dump Note that the value passed for this virtual xattr would be treated as a file, and even if the value has '/' in it, it would be changed to '-' for sanity. Fixes: bz#1625106 Change-Id: Id9ae6a40a190b8937c51662e6e1c2a0f6c86a0e0 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	multiple files: calloc -> malloc	Yaniv Kaul	2018-09-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xlators/cluster/stripe/src/stripe-helpers.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/tier.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-layout.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-helper.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-common.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/afr/src/afr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/afr/src/afr-inode-read.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/bugs/replicate/bug-1250170-fsync.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/basic/gfapi/gfapi-async-calls-test.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/basic/ec/ec-fast-fgetxattr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/xdr/src/glusterfs3.h: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/rpc-transport/socket/src/socket.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/rpc-lib/src/rpc-clnt.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible extras/geo-rep/gsync-sync-gfid.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-xml-output.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-rpc-ops.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-volume.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-system.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-snapshot.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-peer.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-global.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2
*	core: python3	Kaleb S. KEITHLEY	2018-09-03	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, https://review.gluster.org/#/c/19952/, https://review.gluster.org/#/c/20104/, https://review.gluster.org/#/c/20162/, https://review.gluster.org/#/c/20185/, https://review.gluster.org/#/c/20207/, https://review.gluster.org/#/c/20227/, https://review.gluster.org/#/c/20307/, https://review.gluster.org/#/c/20320/, https://review.gluster.org/#/c/20332/, https://review.gluster.org/#/c/20364/, https://review.gluster.org/#/c/20441/, and https://review.gluster.org/#/c/20484 shebangs changed from /usr/bin/python2 to /usr/bin/python3. (Reminder, various distribution packaging guidelines require use of explicit python version and don't allow '#!/usr/bin/env python', regardless of how handy that idiom may be.) glusterfs.spec(.in) package python{2,3}-gluster and python2 or python3 dependencies as appropriate. configure(.ac): + test for and use python2 or python3 as appropriate. If build machine has python2 and python3, use python3. Override by setting PYTHON=/usr/bin/python2 when running configure. + PYTHONDEV_CPPFLAGS from python[23]-config --includes is a better match to the original python sysconfig.get_python_inc(). All those other extraneous flags breaks the build. + Only change the shebangs once. Changing them over and over again, e.g., during a `make glusterrpms` in extras/LinuxRPM just sends make (is it really make that's looping?) into an infinite loop. If you figure out why, let me know. + Oldest python2 is python2.6 on CentOS 6 and Debian 8 (Jessie). Everything else has 2.7 or 3.x + logic from https://review.gluster.org/c/glusterfs/+/21050, which needs to be removed/merged after that patch is merged. Builds on CentOS 6, CentOS 7, Fedora 28, Fedora rawhide, and the mysterious RHEL > 7. Change-Id: Idae21d3b6f58b32372e1daa0d234e491e563198f updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	cluster/dht: In rename, unlink after creating linkto file	N Balachandran	2018-09-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The linkto file creation for the dst was done in parallel with the unlink of the old src linkto. If these operations reached the brick out of order, we end up with a dst linkto file without a .glusterfs handle. Fixed by the unlinking only after the linkto file creation has completed. Change-Id: I4246f7655f5bc180f5ded7fd34d263b7828a8110 fixes: bz#1621981 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	cluster/afr: Delegate metadata heal with pending xattrs to SHD	Pranith Kumar K	2018-08-28	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When metadata-self-heal is triggered on the mount, it blocks lookup until metadata-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs metadata heal and all of them trigger heals waiting for other clients to complete heal. Fix: Only when the heal is needed but the pending xattrs are not set, trigger metadata heal that could block lookup. This is the only case where different clients may give different metadata to the clients without heals, which should be avoided. Updates bz#1622821 Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	storage/posix: Increment trusted.pgfid in posix_mknod	N Balachandran	2018-08-23	1	-0/+57
\| \| \| \| \| \| \| \| \| \|	The value of trusted.pgfid.xx was always set to 1 in posix_mknod. This is incorrect if posix_mknod calls posix_create_link_if_gfid_exists. Change-Id: Ibe87ca6f155846b9a7c7abbfb1eb8b6a99a5eb68 fixes: bz#1619720 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	cli: Fix invalid access of option_str variable	ShyamsundarR	2018-08-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In function cli_cmd_volume_statedump_options_parse if the wordcount of arguments is exactly 3, then option_str would remain NULL, and hence the function will generate a segmentation fault on the strstr check in its body. This can be triggered when we run the command, `gluster volume statedump <volname>` The fix is to check if option_str is non-NULL before use and also to pass in a duplicated empty string to the dict key "options" when this is NULL. Fixes: bz#1619423 Change-Id: Ic029ab60b64890d92c7a0876a638929495d3aa59 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	cluster/afr: Fix bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t	karthik-us	2018-08-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In line #13 of the test case, it checks whether the file is present on first 2 bricks or not. If it is not present on even one of the bricks it will break the loop and checks for the dirty marking on the parent on the 3rd brick and checks for file not present on the 1st and 2nd bricks. The below scenario can happen in this case: - File gets created on 1st and 3rd bricks - In line #13 it sees file is not present on both 1st & 2nd bricks and breaks the loop - In line #51 test fails because the file will be present on the 1st brick - In line #53 test will fail because the file creation was not failed on quorum bricks and dirty marking will not be there on the parent on 3rd brick Fix: Don't break from the loop if file is present on either brick 1 or brick 2. Change-Id: I918068165e4b9124c1de86cfb373801b5b432bd9 fixes: bz#1612054 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	posix: Delete the entry if gfid link creation fails	karthik-us	2018-08-20	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If the gfid link file inside .glusterfs is not present for a file, the operations which are dependent on the gfid will fail, complaining the link file does not exists inside .glusterfs. Fix: If the link file creation fails, fail the entry creation operation and delete the original file. Change-Id: Id767511de2da46b1f45aea45cb68b98d965ac96d fixes: bz#1612037 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	snapshot/handshake: store description after strdup	Mohammed Rafi KC	2018-08-20	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	problem: During a handshake, when we import a friend data snap description variable was just referenced to dictionary value. Solution: snap description should have a separate memory allocated through gf_strdup Change-Id: I94da0c57919e1228919231d1563a001362b100b8 fixes: bz#1618004 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	performance/readdir-ahead: keep stats of cached dentries in sync with ↵	Krutika Dhananjay	2018-08-18	2	-0/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	modifications PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc that modify the file pointed by dentries. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in context of the inode pointed by dentry. * Whenever the inode pointed by dentry undergoes modification, in cbk of modification fop, update the iatt stored in inode-ctx to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed by these dentries. * Some fops don't have valid iatts in their responses. For eg., write response whose data is still cached in write-behind will have zeroed out stat. In this case keep only ia_type and ia_gfid and reset rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode doesn't eqaul to linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	tests: Increase timeout to 30 minutes to handle lcov slowness	Pranith Kumar K	2018-08-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This script on a normal setup takes 15 minutes. With lcov it needs to be increased. Considering we did 1.5X of the default $run_timeout in run-tests.sh, I am doing the same for this script. fixes bz#1614718 Change-Id: Ia571b33ff13deb8cbd8e48561769e876aa0b1aff Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	features/shard: Fix crash and test case in RENAME fop	Krutika Dhananjay	2018-08-14	1	-40/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting the refresh flag in inode ctx in shard_rename_src_cbk() is applicable only when the dst file exists and is sharded and has a hard link > 1 at the time of rename. But this piece of code is exercised even when dst doesn't exist. In this case, the mount crashes because local->int_inodelk.loc.inode is NULL. Change-Id: Iaf85a5ee3dff8b01a76e11972f10f2bb9dcbd407 Updates: bz#1611692 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	tests: fix brick check orders	Atin Mukherjee	2018-08-13	6	-34/+57
\| \| \| \| \| \| \| \| \| \| \| \|	fix brick checks for validating-server-quorum.t & quorum-validation.t ...and make brick_up_status_1 function more generic. Also fix a timing issue in bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t Change-Id: I797ef4cec5b160aafa979bae7151b1e99fcb48ac Updates: bz#1603063 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	tests/quick-read/bug-846240.t: fix a wrong test	Raghavendra G	2018-08-13	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Earlier this test did following things on M0 and M1 mounted on same volume: 1 create file M0/testfile 2 open an fd on M0/testfile 3 remove the file from M1, M1/testfile 4 echo "data" >> M0/testfile The test expects appending data to M0/testfile to fail. However, redirector ">>" creates a file if it doesn't exist. So, the only reason test succeeded was due to lookup succeeding due to stale stat in md-cache. This hypothesis is verified by two experiments: * Add a sleep of 10 seconds before append operation. md-cache cache expires and lookup fails followed by creation of file and hence append succeeds to new file. * set md-cache timeout to 600 seconds and test never fails even with sleep 10 before append operation. Reason is stale stat in md-cache survives sleep 10. So, the spurious nature of failure was dependent on whether lookup is done when stat is present in md-cache or not. The actual test should've been to write to the fd opened in step 2 above. I've changed the test accordingly. Note that this patch also remounts M0 after initial file creation as open-behind disables opening-behind on witnessing a setattr on the inode and touch involves a setattr. On remount, create operation is not done and hence file is opened-behind. Change-Id: I739f255e0a62ff0024f0824dad3539974955df99 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1615096
*	cluster/afr: Fix bug-1433571-undo-pending-only-on-up-bricks.t	karthik-us	2018-08-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The test case was checking for the entry pending marker reset on the root after performing client side lookup at line #60-63. But sometimes the entry heal was not getting completed immediately. Fix: Wait for the entry heal to complete before checking the changelog. Change-Id: I42fde21b04a126ab044ce58373a996d72f125d96 fixes: bz#1614730 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	tests: potential fixes to bugs/replicate/bug-1408712.t	Ravishankar N	2018-08-13	1	-2/+15
\| \| \| \| \| \| \| \|	See BZ for details. Change-Id: I2cc2064f14d80271ebcc21747103ce4cee848cbf fixes: bz#1615078 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests: fix tests/bugs/shard/configure-lru-limit.t	Atin Mukherjee	2018-08-13	1	-0/+4
\| \| \| \| \| \| \| \|	Check for the bricks to be up before attempting to mount. Change-Id: I1224908137016df3007f4467aa9760967ce0694d Fixes: bz#1615092 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	tests: Fix cleanup routine for some mux tests	ShyamsundarR	2018-08-13	2	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the mux tests, set a trap to catch test exit and call cleanup. This will cause cleanup to be invoked twice in case the test times out, or even otherwise, as include.rc also sets a trap to cleanup on exit (TERM and others). This leads to the tarballs generated on failures for these tests to be empty and does not aid debugging. This patch corrects this pattern across the tests to the more standard cleanup at the end. Fixes: bz#1615037 Change-Id: Ib83aeb09fac2aa591b390b9fb9e1f605bfef9a8b Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	Make sure EXPECT_WITHIN executes the statement multiple times	Pranith Kumar K	2018-08-12	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	When we pass a command to be executed in EXPECT_WITHIN and we use `` the value is passed by value, so if the first execution gives a result that is different from the expected value, EXPECT_WITHIN test will fail because the command will not be re-evaluated. Changed the expression with `` to a function. Added sleep(3) in afr.c for reconfigure to both RC and re-test after the change. fixes bz#1614662 Change-Id: I3bc8a75b996729261aa48067f6ed8da9c6273b13 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	glusterd: Compare volume_id before start/attach a brick	Mohit Agrawal	2018-08-10	1	-0/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: After reboot a node brick is not coming up because fsid comparison is failed before start a brick Solution: Instead of comparing fsid compare volume_id to resolve the same because fsid is changed after reboot a node but volume_id persist as a xattr on brick_root path at the time of creating a volume. Change-Id: Ic289aab1b4ebfd83bbcae8438fee26ae61a0fff4 fixes: bz#1612418 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests: Increase timeout for mpx restart crash test	ShyamsundarR	2018-08-09	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In lcov based regression testing environments, all tests take more time than what occurs in centos7 regressions. Possibly due to code instrumentation for lcov purposes. Due to this the test, bug-1432542-mpx-restart-crash.t constantly times out. This patch increases the timeout for the same to enable lcov tests to pass on a more regular basis. It was also noted by Nithya that the test at times generated an OOM kill on the regression machines. In order to reduce runtime memory foot print of the tests, FUSE mounts are unmounted as soon as the required test is complete. Fixes: bz#1608568 Change-Id: I37f8d4b45807a69c52c7c7df4923c0fc33fab4e4 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	glusterd: Bricks of a normal volumes should not attach to ↵	Sanju Rakonde	2018-08-03	2	-33/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster_shared_storage bricks Problem: In a brick multiplexing environment, Bricks of a normal volume created by user are getting attached to the bricks of a volume "gluster_shared_storage" which is created by enabling the enable-shared-storage option. Mounting gluster_shared_storage has strict authentication checks. when we attach bricks of a normal volume to bricks of gluster_shared_storage, mounting the normal volume created by user will fail due to strict authentication checks. Solution: We should not attach bricks of a normal volume to brick process of gluster_shared_storage volume and vice versa. fixes: bz#1610726 Change-Id: If1b5a2a02675789a2915ba480fb48c145449163d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	performance/open-behind: don't use anonymous fds for reads by default	Raghavendra G	2018-08-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	anonymous fds interfere with working of read-ahead as read-ahead won't be able to store its cache in fd. Also, as seen in bz 1455872, anonymous fds also affect performance of large file sequential reads as the cost of opening fd for each read on brick stack is significant. So, have a proper fd which enables read-ahead to store its cache and brick stack to reuse the fd during reads. With this change test tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t fails consistently. The failure can also be seen with open-behind off. bz 1611532 has been filed to track the issue with test. Thanks to Rafi <rkavunga@redhat.com> for assistance provided in debugging test failure. Change-Id: Ifa52d8ff017f115e83247f3396b9d27f0295ce3f Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1455872
*	performance/md-cache: update cache only from fops issued after previous ↵	Raghavendra G	2018-08-02	4	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	invalidation Invalidations are triggered mainly by two codepaths - upcall and write-behind unwinding a cached write with zeroed out stat. For the case of upcall, following race can happen: * stat s1 is fetched from brick * invalidation is detected on brick * invalidation is propagated to md-cache and cache is invalidated * s1 updates md-cache with a stale state For the case of write-behind, imagine following sequence of operations, * A stat s1 was issued from application thread t1 when size of file was s1 * stat s1 completes on brick stack, but yet to reach md-cache * A write w1 from application thread t2 extends file to size s2 is cached in write-behind and response is unwound with zeroed out stat * md-cache while handling write-cbk, invalidates cache * md-cache receives response for s1, updates cache with stale stat with size of s1 overwriting invalidation state Fix is to remember when s1 was incident on md-cache and update cache with results of s1 only if the it was incident after invalidation of cache. This patch identified some bugs in regression tests which is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop gap measure I am marking following tests as bad basic/afr/split-brain-resolution.t bugs/bug-1368312.t bugs/replicate/bug-1238398-split-brain-resolution.t bugs/replicate/bug-1417522-block-split-brain-resolution.t bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
*	glusterd: Add multiple checks before attach/start a brick	Mohit Agrawal	2018-07-27	3	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In brick mux scenario sometime glusterd is not able to start/attach a brick and gluster v status shows brick is already running Solution: 1) To make sure brick is running check brick_path in /proc/<pid>/fd , if a brick is consumed by the brick process it means brick stack is come up otherwise not 2) Before start/attach a brick check if a brick is mounted or not 3) At the time of printing volume status check brick is consumed by any brick process Test: To test the same followed procedure 1) Setup brick mux environment on a vm 2) Put a breaking point in gdb in function posix_health_check_thread_proc at the time of notify GF_EVENT_CHILD_DOWN event 3) unmount anyone brick path forcefully 4) check gluster v status it will show N/A for the brick 5) Try to start volume with force option, glusterd throw message "No device available for mount brick" 6) Mount the brick_root path 7) Try to start volume with force option 8) down brick is started successfully Change-Id: I91898dad21d082ebddd12aa0d1f7f0ed012bdf69 fixes: bz#1595320 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	features/shard: Make lru limit of inode list configurable	Krutika Dhananjay	2018-07-23	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently this lru limit is hard-coded to 16384. This patch makes it configurable to make it easier to hit the lru limit and enable testing of different cases that arise when the limit is reached. The option is features.shard-lru-limit. It is by design allowed to be configured only in init() but not in reconfigure(). This is to avoid all the complexity associated with eviction of least recently used shards when the list is shrunk. Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295 updates: bz#1605056 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	All: run codespell on the code and fix issues.	Yaniv Kaul	2018-07-22	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	build/core: location of xattr.h	Kaleb S. KEITHLEY	2018-07-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	starting with libattr-devel-2.4.48-1 in Fedora 28 <attr/xattr.h> has been removed from the package. On Fedora, RHEL/CentOS, and SUSE, the glibc-headers package has provided a near identical file, <sys/xattr.h>, in all the releases that we care about. On Debian, libc6-dev provides <sys/xattr.h> all the way back to 8/Jessie and presumably all Ubuntu derivatives since then. Note that on Debian the sys headers are installed in /usr/include/$DEB_HOST_GNU_TYPE/sys/... Change-Id: Id07c4b225bdaa6556bd54772604e75b8f346fb60 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	cluster/afr: Mark dirty for entry transactions for quorum failures	karthik-us	2018-07-16	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If an entry creation transaction fails on quprum number of bricks it might end up setting the pending changelogs on the file itself on the brick where it got created. But the parent does not have any entry pending marker set. This will lead to the entry not getting healed by the self heal daemon automatically. Fix: For entry transactions mark dirty on the parent if it fails on quorum number of bricks, so that the heal can do conservative merge and entry gets healed by shd. Change-Id: I56448932dd409b3ddb095e2ae32e037b6157a607 fixes: bz#1586020 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	Quota: Fix crawling of files	Sanoj Unnikrishnan	2018-07-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Running "find ." does not crawl files. It goes over the directories and lists all dentries with getdents system call. Hence the files are not looked up. Solution: explicitly triggerr stat on files with find . -exec stat {} \; since crawl can take slightly longer, updating timeout in test case Change-Id: If3c1fba2ed8e300c9cc08c1b5c1ba93cb8e4d6b6 fixes: bz#1533000 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
*	snapshot : remove stale entry	Sunny Kumar	2018-07-12	1	-0/+57
\| \| \| \| \| \| \| \| \| \| \|	During snap delete after removing brick-path we should remove snap-path too i.e. /var/run/gluster/snaps/<snap-name>. During snap deactivate also we should remove snap-path. Change-Id: Ib80b5d8844d6479d31beafa732e5671b0322248b fixes: bz#1597662 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	md-cache: Do not invalidate cache post set/remove xattr	Poornima G	2018-07-11	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since setxattr and removexattr fops cbk do not carry poststat, the stat cache was being invalidated in setxatr/remoxattr cbk. Hence the further lookup wouldn't be served from cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict. Co-authored with Xavi Hernandez. Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4 fixes: bz#1586018 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Poornima G <pgurusid@redhat.com>
*	dht: Inconsistent permission for directories after brick stop/start	Mohit Agrawal	2018-07-11	1	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Inconsistent access permissions on directories after bringing back the down sub-volumes, in case of directories dht_setattr first wind a call on MDS once call is finished on MDS then wind a call on NON-MDS.At the time of revalidating dht just compare the uid/gid with stbuf uid/gid and if anyone differs set a flag to heal the same. Solution: Add a condition to compare permission also in dht_revalidate_cbk to set a flag to call dht_dir_attr_heal. BUG: 1584517 Change-Id: I3e039607148005015b5d93364536158380d4c5aa fixes: bz#1584517 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	io-stats: Terminate dump thread when dump interval is set to 0	ShyamsundarR	2018-07-09	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	_ios_dump_thread is not terminated by the function _ios_destroy_dump_thread when the diagnostic interval is set to 0 (which means disable auto dumping). During reconfigure, if the value changes from 0 to another then the thread is started, but on reconfiguring this to 0 the thread is not being terminated. Further, if the value is changed from 0 to X to 0 to Y, where X and Y are 2 arbitrary duration numbers, the reconfigure code ends up starting one more thread (for each change from 0 to a valid interval). This patch fixes the same by terminating the thread when the value changes from non-zero to 0. NOTE: It would seem nicer to use conf->dump_thread and check its value for thread presence etc. but there is no documented invalid value for the same, and hence an invalid check is not feasible, thus introducing a new running bool to determine the same. Fixes: bz#1598548 Change-Id: I3e7d2ce8f033879542932ac730d57bfcaf14af73 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	core: Increase SCRIPT_TIMEOUT for bug-1432542-mpx-restart-crash.t	Mohit Agrawal	2018-07-09	1	-1/+1
\| \| \| \| \| \| \|	BUG: 1599250 Change-Id: I27bda2a0764580289e7154766e13a0c358cba3a8 fixes: bz#1599250 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	glusterd: Introduce daemon-log-level cluster wide option	Atin Mukherjee	2018-07-03	1	-0/+93
\| \| \| \| \| \| \| \| \| \| \| \|	This option, applicable to the node level daemons can be very helpful in controlling the log level of these services. Please note any daemon which is started prior to setting the specific value of this option (if not INFO) will need to go through a restart to have this change into effect. Change-Id: I7f6d2620bab2b094c737f5cc816bc093e9c9c4c9 fixes: bz#1597473 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	storage/posix: Fix posix_symlinks_match()	Pranith Kumar K	2018-06-26	1	-5/+10
\| \| \| \| \| \| \| \| \| \|	1) snprintf into linkname_expected should happen with PATH_MAX 2) comparison should happen with linkname_actual with complete string linkname_expected fixes bz#1595190 Change-Id: Ic3b3c362dc6c69c046b9a13e031989be47ecff14 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	core/various: python3 compat, prepare for python2 -> python3	Kaleb S. KEITHLEY	2018-06-21	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, https://review.gluster.org/#/c/19952/, https://review.gluster.org/#/c/20104/, https://review.gluster.org/#/c/20162/, https://review.gluster.org/#/c/20185/, https://review.gluster.org/#/c/20207/, https://review.gluster.org/#/c/20227/, and https://review.gluster.org/#/c/20307/ This patch fixes more selected comma white space (ws_comma) as suggested by the 2to3 utility. Note: Fedora packaging guidelines and SUSE rpmlint require explicit shebangs, so popular practices like #!/usr/bin/env python and or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, has_key, idioms, map, numliterals, raise, set_literal, types, urllib, and zip have already been applied. Also version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: these 2to3 fixes report no changes are necessary: asserts, buffer, exec, execfile, exitfunc, filter, getcwdu, imports2, input, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: I60932030813484803f73733a9b2b7b23c7a843fd updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	features/shard: Perform shards deletion in the background	Krutika Dhananjay	2018-06-20	5	-101/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A synctask is created that would scan the indices from .shard/.remove_me, to delete the shards associated with the gfid corresponding to the index bname and the rate of deletion is controlled by the option features.shard-deletion-rate whose default value is 100. The task is launched on two accounts: 1. when shard receives its first-ever lookup on the volume 2. when a rename or unlink deleted an inode Change-Id: Ia83117230c9dd7d0d9cae05235644f8475e97bc3 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	afr: heal gfids when file is not present on all bricks	Ravishankar N	2018-06-19	1	-0/+128
\| \| \| \| \| \| \| \| \| \| \|	commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression wherein if a file is present in only 1 brick of replica and doesn't have a gfid associated with it, it doesn't get healed upon the next lookup from the client. Fix it. Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599 fixes: bz#1591193 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	features/shard: Introducing ".shard/.remove_me" for atomic shard deletion ↵	Krutika Dhananjay	2018-06-13	5	-31/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(part 1) PROBLEM: Shards are deleted synchronously when a sharded file is unlinked or when a sharded file participating as the dst in a rename() is going to be replaced. The problem with this approach is it makes the operation really slow, sometimes causing the application to time out, especially with large files. SOLUTION: To make this operation atomic, we introduce a ".remove_me" directory. Now renames and unlinks will simply involve two steps: 1. creating an empty file under .remove_me named after the gfid of the file participating in unlink/rename 2. carrying out the actual rename/unlink A synctask is created (more on that in part 2) to scan this directory after every unlink/rename operation (or upon a volume mount) and clean up all shards associated with it. All of this happens in the background. The task takes care to delete the shards associated with the gfid in .remove_me only if this gfid doesn't exist in backend, ensuring that the file was successfully renamed/unlinked and its shards can be discarded now safely. Change-Id: Ia1d238b721a3e99f951a73abbe199e4245f51a3a updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	glusterd: address test failures with brick mux enabled	Atin Mukherjee	2018-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch addresses following: 1. On volume stop, for the last brick, pmap_registry_remove () is invoked by glusterd. 2. If a brick process is sigkilled, remove all the associated brick instances from the portmap. 3. Bump up PROCESS_UP_TIMEOUT to 45. 4. gf_attach to kill a brick takes more time in mux (which is an issue that needs a fix), but in the interim, give br-state-check.t more time to complete (there are 2 kill_bricks, each taking 120 seconds, and the test usually passes in 30 odd seconds, hence bumping this up to 350 seconds) 5. The test bug-1559004-EMLINK-handling.t is taking ~950 seconds at times on master without mux, in mux cases, when it fails, it is almost at the last iteration, hence bumping the timeout for this test case to reduce regression error rates Updates: bz#1577672 Change-Id: I1922675e112baca4c125c4c094eaa42a11e34e67 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	tests/bug-1543279: mark it as bad	Raghavendra G	2018-05-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	There seems to be races which are not fixed by commit 9704d203f0. Though the test itself is not bad, it is failing very frequently. So, till the issue is fixed, marking this test as bad. updates: bz#1543279 Change-Id: I4a5869da1586178914ffb9563414879e6beab183 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	Fix a test case that can fail for certain builds	Poornima G	2018-05-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	If certain builds have readdir-ahead disabled by default, the test case fails, as it asumes readdir-ahead is enabled by default. Hence explicitly enabling readdir-ahead. Change-Id: Ib5bef266707c2c557aeb2cf2ffbf4d0c92025c46 fixes: bz#1582051 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	test: Marking test case bug-1309462.t as bad	ShyamsundarR	2018-05-23	1	-1/+2
\| \| \| \| \| \| \| \| \|	Details in the bug Updates: bz#1581735 Change-Id: Id984e10b60daf274d5510e3ccbf7abf0cb19f368 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	afr: fix bug-1363721.t failure	Ravishankar N	2018-05-21	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In the .t, when the only good brick was brought down, writes on the fd were still succeeding on the bad bricks. The inflight split-brain check was marking the write as failure but since the write succeeded on all the bad bricks, afr_txn_nothing_failed() was set to true and we were unwinding writev with success to DHT and then catching the failure in post-op in the background. Fix: Don't wind the FOP phase if the write_subvol (which is populated with readable subvols obtained in pre-op cbk) does not have at least 1 good brick which was up when the transaction started. Note: This fix is not related to brick muliplexing. I ran the .t 10 times with this fix and brick-mux enabled without any failures. Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85 updates: bz#1577672 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	make posix return errors when gfid2path key is absent	Raghavendra Bhat	2018-05-18	1	-0/+70
\| \| \| \| \| \|	Change-Id: I3a8d452d00560dac5e0b7ff0b1835d1f20a59f91 updates: bz#1570962 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	glusterd: handling brick termination in brick-mux	Sanju Rakonde	2018-05-07	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: There's a race between the glusterfs_handle_terminate() response sent to glusterd from last brick of the process and the socket disconnect event that encounters after the brick process got killed. Solution: When it is a last brick for the brick process, instead of sending GLUSTERD_BRICK_TERMINATE to brick process, glusterd will kill the process (same as we do it in case of non brick multiplecing). The test case is added for https://bugzilla.redhat.com/show_bug.cgi?id=1549996 Change-Id: If94958cd7649ea48d09d6af7803a0f9437a85503 fixes: bz#1545048 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	cluster/dht: fixes to parallel renames to same destination codepath	Raghavendra G	2018-05-07	1	-0/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test case: # while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f \|\| break; echo "done:$uuid"; done This script was run in parallel from multiple mountpoints Along the course of getting the above usecase working, many issues were found: Issue 1: ======= consider a case of rename (src, dst). We can encounter a situation where, * dst is a file present at the time of lookup * dst is removed by the time rename fop reaches glusterfs In this scenario, acquring inodelk on dst fails with ESTALE resulting in failure of rename. However, as per POSIX irrespective of whether dst is present or not, rename should be successful. Acquiring entrylk provides synchronization even in races like this. Algorithm: 1. Take inodelks on src and dst (if dst is present) on respective cached subvols. These inodelks are done to preserve backward compatibility with older clients, so that synchronization is preserved when a volume is mounted by clients of different versions. Once relevant older versions (3.10, 3.12, 3.13) reach EOL, this code can be removed. 2. Ignore ENOENT/ESTALE errors of inodelk on dst. 3. protect namespace of src and dst. To protect namespace of a file, take inodelk on parent on hashed subvol, then take entrylk on the same subvol on parent with basename of file. inodelk on parent is done to guard against changes to parent layout so that hashed subvol won't change during rename. 4. <rest of rename continues> 5. unlock all locks Issue 2: ======== linkfile creation in lookup codepath can race with a rename. Imagine the following scenario: * lookup finds a data-file with gfid - gfid-dst - without a corresponding linkto file on hashed-subvol. It decides to create linkto file with gfid - gfid-dst. - Note that some codepaths of dht-rename deletes linkto file of dst as first step. So, a lookup racing with an in-progress rename can easily run into this situation. * a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file and hence gfid of data-file changes to gfid-src with path dst-path. * lookup proceeds and creates linkto file - dst-path - with gfid - dst-gfid - on hashed-subvol. * rename tries to create a linkto file dst-path with src-gfid on hashed-subvol, but it fails with EEXIST. But EEXIST is ignored during linkto file creation. Now we've ended with dst-path having different gfids - dst-gfid on linkto file and src-gfid on data file. Future lookups on dst-path will always fail with ESTALE, due to differing gfids. The fix is to synchronize linkfile creation in lookup path with rename using the same mechanism of protecting namespace explained in solution of Issue 1. Once locks are acquired, before proceeding with linkfile creation, we check whether conditions for linkto file creation are still valid. If not, we skip linkto file creation. Issue 3: ======== gfid of dst-path can change by the time locks are acquired. This means, either another rename overwrote dst-path or dst-path was deleted and recreated by a different client. When this happens, cached-subvol for dst can change. If rename proceeds with old-gfid and old-cached subvol, we'll end up in inconsistent state(s) like dst-path with different gfids on different subvols, more than one data-file being present etc. Fix is to do the lookup with a new inode after protecting namespace of dst. Post lookup, we've to compare gfids and correct local state appropriately to be in sync with backend. Issue 4: ======== During revalidate lookup, if following a linkto file doesn't lead to a valid data-file, local->cached-subvol was not reset to NULL. This means we would be operating on a stale state which can lead to inconsistency. As a fix, reset it to NULL before proceeding with lookup everywhere. Issue 5: ======== Stale dentries left out in inode table on brick resulted in failures of link fop even though the file/dentry didn't exist on backend fs. A patch is submitted to fix this issue. Please check the dependency tree of current patch on gerrit for details In short, we fix the problem by not blindly trusting the inode-table. Instead we validate whether dentry is present by doing lookup on backend fs. Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>