glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/afr: Prioritize ENOSPC over other errors	karthik-us	2020-07-08	1	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a replicate/arbiter volume if file creations or writes fails on quorum number of bricks and on one brick it is due to ENOSPC and on other brick it fails for a different reason, it may fail with errors other than ENOSPC in some cases. Fix: Prioritize ENOSPC over other lesser priority errors and do not set op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any error_no which can be misinterpreted by __afr_dir_write_finalize(). Also removing the function afr_has_arbiter_fop_cbk_quorum() which might consider a successful reply form a single brick as quorum success in some cases, whereas we always need fop to be successful on quorum number of bricks in arbiter configuration. Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08 Fixes: #1254 Signed-off-by: karthik-us <ksubrahm@redhat.com> (cherry picked from commit fa63b45ca5edf172b1b89b28b5db3c5129cc57b6)
*	afr: more quorum checks in lookup and new entry marking	Ravishankar N	2020-07-08	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: See github issue for details. Fix: -In lookup if the entry exists in 2 out of 3 bricks, don't fail the lookup with ENOENT just because there is an entrylk on the parent. Consider quorum before deciding. -If entry FOP does not succeed on quorum no. of bricks, do not perform new entry mark. Fixes: #1303 Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit c4a6748f25d2c1ab3ebcf89952278ebf94c8d371)
*	tests/bug-844688.t: test bug-844688.t is failing on master	Mohammed Rafi KC	2020-06-16	1	-11/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test case bug-844688.t is failing quite frequently on master. This test check for the existence of call_stack, frame creation time. But there is a chance that at a point in time, the stack count might become zero. So doing the check in EXPECT_WITHIN make more sense. Change-Id: Id2ede7f6fdcb5f016f52c5c0557ce6ac510d4e96 Fixes: #1307 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> (cherry picked from commit 08a9f198d576bbae3596536bbd2c4d34dadd1a93)
*	features/utime: Don't access frame after stack-wind	Pranith Kumar K	2020-04-22	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: frame is accessed after stack-wind. This can lead to crash if the cbk frees the frame. Fix: Use new frame for the wind instead. Updates: #832 Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	write-behind: fix data corruption	Xavi Hernandez	2020-04-20	2	-0/+307
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a bug in write-behind that allowed a previous completed write to overwrite the overlapping region of data from a future write. Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are sequential, and W3 writes at the same offset of W2: W2.offset = W3.offset = W1.offset + W1.size Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes. So W3 should always overwrite the overlapping part of W2. Suppose write-behind processes the requests from 2 concurrent threads: Thread 1 Thread 2 <received W1> <received W2> wb_enqueue_tempted(W1) /* W1 is assigned gen X / wb_enqueue_tempted(W2) / W2 is assigned gen X / wb_process_queue() __wb_preprocess_winds() / W1 and W2 are sequential and all * other requisites are met to merge * both requests. / __wb_collapse_small_writes(W1, W2) __wb_fulfill_request(W2) __wb_pick_unwinds() -> W2 / In this case, since the request is * already fulfilled, wb_inode->gen * is not updated. / wb_do_unwinds() STACK_UNWIND(W2) / The application has received the * result of W2, so it can send W3. / <received W3> wb_enqueue_tempted(W3) / W3 is assigned gen X / wb_process_queue() / Here we have W1 (which contains * the conflicting W2) and W3 with * same gen, so they are interpreted * as concurrent writes that do not * conflict. / __wb_pick_winds() -> W3 wb_do_winds() STACK_WIND(W3) wb_process_queue() / Eventually W1 will be * ready to be sent / __wb_pick_winds() -> W1 __wb_pick_unwinds() -> W1 / Here wb_inode->gen is * incremented. */ wb_do_unwinds() STACK_UNWIND(W1) wb_do_winds() STACK_WIND(W1) So, as we can see, W3 is sent before W1, which shouldn't happen. The problem is that wb_inode->gen is only incremented for requests that have not been fulfilled but, after a merge, the request is marked as fulfilled even though it has not been sent to the brick. This allows that future requests are assigned to the same generation, which could be internally reordered. Solution: Increment wb_inode->gen before any unwind, even if it's for a fulfilled request. Special thanks to Stefan Ring for writing a reproducer that has been crucial to identify the issue. Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: #884
*	afr: mark pending xattrs as a part of metadata heal	Ravishankar N	2020-04-20	1	-0/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...if pending xattrs are zero for all children. Problem: If there are no pending xattrs and a metadata heal needs to be performed, it can be possible that we end up with xattrs inadvertendly deleted from all bricks, as explained in the BZ. Fix: After picking one among the sources as the good copy, mark pending xattrs on all sources to blame the sinks. Now even if this metadata heal fails midway, a subsequent heal will still choose one of the valid sources that it picked previously. Updates: #1067 Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
*	glusterd: Brick process fails to come up with brickmux on	Vishal Pandey	2020-03-04	1	-1/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: 1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1, vol2, vol3 with 3 bricks (one from each node) 2- Set cluster.brick-multiplex on 3- Start all 3 volumes 4- Check if all bricks on a node are running on same port 5- Kill N1 6- Set performance.readdir-ahead for volumes vol1, vol2, vol3 7- Bring N1 up and check volume status 8- All bricks processes not running on N1. Root Cause - Since, There is a diff in volfile versions in N1 as compared to N2 and N3 therefore glusterd_import_friend_volume() is called. glusterd_import_friend_volume() copies the new_volinfo and deletes old_volinfo and then calls glusterd_start_bricks(). glusterd_start_bricks() looks for the volfiles and sends an rpc request to glusterfs_handle_attach(). Now, since the volinfo has been deleted by glusterd_delete_stale_volume() from priv->volumes list before glusterd_start_bricks() and glusterd_create_volfiles_and_notify_services() and glusterd_list_add_order is called after glusterd_start_bricks(), therefore the attach RPC req gets an empty volfile path and that causes the brick to crash. Fix- Call glusterd_list_add_order() and glusterd_create_volfiles_and_notify_services before glusterd_start_bricks() cal is made in glusterd_import_friend_volume > Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa > Bug: bz#1773856 > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa Fixes: bz#1808966 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	afr: prevent spurious entry heals leading to gfid split-brain	Ravishankar N	2020-02-28	2	-14/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a hyperconverged setup with granular-entry-heal enabled, if a file is recreated while one of the bricks is down, and an index heal is triggered (with the brick still down), entry-self heal was doing a spurious heal with just the 2 good bricks. It was doing a post-op leading to removal of the filename from .glusterfs/indices/entry-changes as well as erroneous setting of afr xattrs on the parent. When the brick came up, the xattrs were cleared, resulting in the renamed file not getting healed and leading to gfid split-brain and EIO on the mount. Fix: Proceed with entry heal only when shd can connect to all bricks of the replica, just like in data and metadata heal. fixes: bz#1804594 Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
*	Cluster/afr: Don't treat all bricks having metadata pending as split-brain	karthik-us	2020-02-25	2	-64/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the afr xattrs point of view i.e each brick is blamed by atleast one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks. Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. This patch also adds restriction of all the bricks to be up to perform metadata heal to avoid any metadata loss. Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up. Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09 fixes: bz#1805097 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	server: Mount fails after reboot 1/3 gluster nodes	Mohit Agrawal	2020-02-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: At the time of coming up one server node(1x3) after reboot client is unmounted.The client is unmounted because a client is getting AUTH_FAILED event and client call fini for the graph.The client is getting AUTH_FAILED because brick is not attached with a graph at that moment Solution: To avoid the unmounting the client graph throw ENOENT error from server in case if brick is not attached with server at the time of authenticate clients. > Credits: Xavi Hernandez <xhernandez@redhat.com> > Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e > Fixes: bz#1793852 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > (cherry picked from commit > f6421dff22a6ddaf14134f6894deae219948c89d) Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e Fixes: bz#1794020 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	features/shard: Send correct size when reads are sent beyond file size	Krutika Dhananjay	2019-10-24	1	-0/+29
\| \| \| \| \| \| \|	Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b fixes bz#1737141 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 51237eda7c4b3846d08c5d24d1e3fe9b7ffba1d4)
*	cluster/afr: Heal entries when there is a source & no healed_sinks	karthik-us	2019-10-17	1	-0/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This will happen because while doing afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later in __afr_selfheal_entry_finalize_source() when it tries to mark all the non-sources as sinks it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1760706 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	afr: support split-brain CLI for replica 3	Ravishankar N	2019-10-17	1	-0/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands would not work for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3 but we do see (data/metadata) split-brain cases once in a while which indicate that there are a few bugs/corner cases yet to be discovered and fixed. Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD, split-brain resolution commands will work for replica 3 volumes too. Likewise, the check is added in shard_lookup as well to permit resolving split-brains by specifying "/.shard/shard-file.xx" as the file name (which previously used to fail with EPERM). Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9 Fixes: bz#1760792 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 47dbd753187f69b3835d2e42fdbe7485874c4b3e)
*	ctime/rebalance: Heal ctime xattr on directory during rebalance	Kotresh HR	2019-09-27	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on new brick. This patch fixes the same. Note that ctime still doesn't support consistent time across distribute sub-volume. This patch also fixes the in-memory inconsistency of time attributes when metadata is self healed. Backport of: > Patch: https://review.gluster.org/23127 > Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df > BUG: 1734026 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Patch: https://review.gluster.org/23127 Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1752413 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	dht: Custom xattrs are not healed in case of add-brick	root	2019-09-27	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If any custom xattrs are set on the directory before add a brick, xattrs are not healed on the directory after adding a brick. Solution: xattr are not healed because dht_selfheal_dir_mkdir_lookup_cbk checks the value of MDS and if MDS value is not negative selfheal code path does not take reference of MDS xattrs.Change the condition to take reference of MDS xattr so that custom xattrs are populated on newly added brick Backport of: > Patch: https://review.gluster.org/22520 > BUG: bz#1702299 > Change-Id: Id14beedb98cce6928055f294e1594b22132e811c > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> fixes: bz#1753561 Change-Id: Id14beedb98cce6928055f294e1594b22132e811c Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	afr/lookup: Pass xattr_req in while doing a selfheal in lookup	Mohammed Rafi KC	2019-09-23	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were not passing xattr_req when doing a name self heal as well as a meta data heal. Because of this, some xdata was missing which causes i/o errors Backport of > https://review.gluster.org/#/c/glusterfs/+/23024/ >Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f >Fixes: bz#1728770 >Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Fixes: bz#1749307 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> (cherry picked from commit d026f0bcfd301712e4f0671ccf238f43f2e6dd30) Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
*	afr: wake up index healer threads	Ravishankar N	2019-09-05	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Backport of https://review.gluster.org/#/c/glusterfs/+/23288/) ...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1743988 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests: fix spurious failure of bug-1402841.t-mt-dir-scan-race.t	Ravishankar N	2019-09-05	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Since commit 600ba94183333c4af9b4a09616690994fd528478, shd starts healing as soon as it is toggled from disabled to enabled. This was causing the following line in the .t to fail on a 'fast' machine (always on my laptop and sometimes on the jenkins slaves). EXPECT_NOT "^0$" get_pending_heal_count $V0 because by the time shd was disabled, the heal was already completed. Fix: Increase the no. of files to be healed and make it a variable called FILE_COUNT, should we need to bump it up further because the machines become even faster. Also created pending metadata heals to increase the time taken to heal a file. fixes: bz#1749157 Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 724c657995a2e148243eeb78c68b620c6d7714a5)
*	performance/md-cache: Do not skip caching of null character xattr values	Anoop C S	2019-08-28	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Null character string is a valid xattr value in file system. But for those xattrs processed by md-cache, it does not update its entries if value is null('\0'). This results in ENODATA when those xattrs are queried afterwards via getxattr() causing failures in basic operations like create, copy etc in a specially configured Samba setup for Mac OS clients. On the other side snapview-server is internally setting empty string("") as value for xattrs received as part of listxattr() and are not intended to be cached. Therefore we try to maintain that behaviour using an additional dictionary key to prevent updation of entries in getxattr() and fgetxattr() callbacks in md-cache. Credits: Poornima G <pgurusid@redhat.com> Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7 Fixes: bz#1743782 Signed-off-by: Anoop C S <anoopcs@redhat.com> (cherry picked from commit b4b683736367d93daad08a5ee6ca95778c07c5a4)
*	afr: restore timestamp of parent dir during entry-heal	Ravishankar N	2019-08-26	1	-0/+78
\| \| \| \| \| \| \|	Fixes: bz#1741044 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)
*	glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing	Mohit Agrawal	2019-08-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: sometime ./tests/bugs/glusterd/bug-1595320.t is failing is failing at the time of checking brick_process after sending a kill signal to brick process Solution: Wait sometime after just sending a kill signal to brick process to make sure brick process is stopped > Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 > Fixes: bz#1743200 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (cherry picked from commit 8f1620ad7f5d3d040fee55c5f873349800e2268d) Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 Fixes: bz#1745421 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile	Atin Mukherjee	2019-07-16	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	... with out which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" >Fixes: bz#1716812 >Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb >Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Fixes: bz#1721105 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	features/shard: Fix block-count accounting upon truncate to lower size	Krutika Dhananjay	2019-07-03	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: > BUG: bz#1705884 > Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> The way delta_blocks is computed in shard is incorrect, when a file is truncated to a lower size. The accounting only considers change in size of the last of the truncated shards. FIX: Get the block-count of each shard just before an unlink at posix in xdata. Their summation plus the change in size of last shard (from an actual truncate) is used to compute delta_blocks which is used in the xattrop for size update. Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 fixes: bz#1716871 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 400b66d568ad18fefcb59949d1f8368d487b9a80)
*	tests/utils: Fix py2/py3 util python scripts	Kotresh HR	2019-06-27	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following files are fixed. tests/bugs/distribute/overlap.py tests/utils/changelogparser.py tests/utils/create-files.py tests/utils/gfid-access.py tests/utils/libcxattr.py Backport of: > Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140 > BUG: 1193929 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140 updates: bz#1679998 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	cluster/ec: honor contention notifications for partially acquired locks	Xavi Hernandez	2019-06-03	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others not yet) we can receive notifications from acquired bricks, which should be honored, since we may not receive more notifications after that. Since EC was ignoring them, once the lock was acquired, it was not released until the eager-lock timeout, causing unnecessary delays on other clients. This fix takes into consideration the notifications received before having completed the full lock acquisition. After that, the lock will be releaed as soon as possible. Backport of: > BUG: bz#1708156 > Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: bz#1714172 Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	glusterd: define dumpops in the xlator_api of glusterd	Sanju Rakonde	2019-05-08	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: statedump is not capturing information related to glusterd Solution: statdump is not capturing glusterd info because trav->dumpops is null in gf_proc_dump_single_xlator_info () where trav is glusterd xlator object. trav->dumpops is null because we missed to define dumpops in xlator_api of glusterd. defining dumpops in xlator_api of glusterd fixes the issue. fixes: bz#1703759 Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 5d866c13efdcdeddf184f012aa88a652e90ff22e)
*	extras/hooks: syntactical errors in SELinux hooks, scipt logic improved	Milan Zink	2019-05-08	1	-1/+3
\| \| \| \| \| \| \|	Fixes: bz#1701818 Change-Id: Ia5fa1df81bbaec3a84653d136a331c76b457f42c Signed-off-by: Milan Zink <zeten30@gmail.com> (cherry picked from commit 1ad201a9fd6748d7ef49fb073fcfe8c6858d557d)
*	cluster/ec: fix fd reopen	Pranith Kumar K	2019-05-08	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop. There were two problems: 1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail. 2. If reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't. To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error to be propagated to the main fop. To implement this behaviour an argument used to indicate the minimum number of required answers has overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes. This change has also uncovered a problem in discard code, which didn't correctely process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop and it's used in the discard case. Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them. Backport of: > Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec > BUG: bz#1699866 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec Fixes: bz#1699917 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	cluster/afr: Remove local from owners_list on failure of lock-acquisition	Pranith Kumar K	2019-04-16	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \|	When eager-lock lock acquisition fails because of say network failures, the local is not being removed from owners_list, this leads to accumulation of waiting frames and the application will hang because the waiting frames are under the assumption that another transaction is in the process of acquiring lock because owner-list is not empty. Handled this case as well in this patch. Added asserts to make it easier to find these problems in future. fixes bz#1699731 Change-Id: I3101393265e9827755725b1f2d94a93d8709e923 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	core: Log level changes do not effect on running client process	Mohit Agrawal	2019-04-16	1	-0/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level per xlator during reconfigure only for a brick process not for the client process. Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure about brick multiplex introudce a flag brick_mux at ctx->cmd_args. Note: There are two other changes done with this patch 1) Ignore client-log-level option to attach a brick with already running brick if brick_mux is enabled 2) Add a log to print pid of the running process to make easier debugging > Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > Fixes: bz#1696046 > (Cherry picked from commit 798aadbe51a9a02dd98a0f861cc239ecf7c8ed57) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22495/) Change-Id: If91682830f894ab8f6857f19dcb1797fc15ca64c Fixes: bz#1699715 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: Brick is not able to detach successfully in brick_mux environment	Mohit Agrawal	2019-04-16	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In brick_mux environment, while volumes are stopped in a loop bricks are not detached successfully. Brick's are not detached because xprtrefcnt has not become 0 for detached brick. At the time of initiating brick detach process server_notify saves xprtrefcnt on detach brick and once counter has become 0 then server_rpc_notify spawn a server_graph_janitor_threads for cleanup brick resources.xprtrefcnt has not become 0 because socket framework is not working due to assigning 0 as a fd for socket. In commit dc25d2c1eeace91669052e3cecc083896e7329b2 there was a change in changelog fini to close htime_fd if htime_fd is not negative, by default htime_fd is 0 so it close 0 also. Solution: Initialize htime_fd to -1 after just allocate changelog_priv by GF_CALLOC > Fixes: bz#1699025 > Change-Id: I5f7ca62a0eb1c0510c3e9b880d6ab8af8d736a25 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (cherry picked from commit b777d83001d8006420b6c7d2d88fe68950aa7e00) Change-Id: I7a2b6fc2d36405d51990376333e093661be48475 Fixes: bz#1699714 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	protocol/client: Do not fallback to anon-fd if fd is not open	Pranith Kumar K	2019-04-16	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an open comes on a file when a brick is down and after the brick comes up, a fop comes on the fd, client xlator would still wind the fop on anon-fd leading to wrong behavior of the fops in some cases. Example: If lk fop is issued on the fd just after the brick is up in the scenario above, lk fop will be sent on anon-fd instead of failing it on that client xlator. This lock will never be freed upon close of the fd as flush on anon-fd is invalid and is not wound below server xlator. As a fix, failing the fop unless the fd has FALLBACK_TO_ANON_FD flag. Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc fixes bz#1699198 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit 92ae26ae8039847e38c738ef98835a14be9d4296)
*	afr: thin-arbiter read txn fixes	Ravishankar N	2019-04-16	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Fixes afr_ta_read_txn() to handle inode refresh failures. code-path. - Fixes a double free issue of dict. Note: This patch address post-merge review comments for commit 69532c141be160b3fea03c1579ae4ac13018dcdf fixes: bz#1693992 Change-Id: Id5299b45b68569d47df6b73755918237a1592cb4 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 500bd0014128e6727e83b6cb77e8ac94304b8f4a)
*	cluster/ec: Don't enqueue an entry if it is already healing	Ashish Pandey	2019-04-16	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: 1 - heal-wait-qlength is by default 128. If shd is disabled and we need to heal files, client side heal is needed. If we access these files that will trigger the heal. However, it has been observed that a file will be enqueued multiple times in the heal wait queue, which in turn causes queue to be filled and prevent other files to be enqueued. 2 - While a file is going through healing and a write fop from mount comes on that file, it sends write on all the bricks including healing one. At the end it updates version and size on all the bricks. However, it does not unset dirty flag on all the bricks, even if this write fop was successful on all the bricks. After healing completion this dirty flag remain set and never gets cleaned up if SHD is disabled. Solution: 1 - If an entry is already in queue or going through heal process, don't enqueue next client side request to heal the same file. 2 - Unset dirty on all the bricks at the end if fop has succeeded on all the bricks even if some of the bricks are going through heal. Change-Id: Ia61ffe230c6502ce6cb934425d55e2f40dd1a727 updates: bz#1693223 Signed-off-by: Ashish Pandey <aspandey@redhat.com> (cherry picked from commit 313dcefe7a62bd16cd794040df068f9bec9c6927)
*	cluster/afr: Send truncate on arbiter brick from SHD	karthik-us	2019-03-12	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In an arbiter volume configuration SHD will not send any writes onto the arbiter brick even if there is data pending marker for the arbiter brick. If we have a arbiter setup on the geo-rep master and there are data pending markers for the files on arbiter brick, SHD will not mark any data changelog during healing. While syncing the data from master to slave, if the arbiter-brick is considered as ACTIVE, then there is a chance that slave will miss out some data. If the arbiter brick is being newly added or replaced there is a chance of slave missing all the data during sync. Fix: If there is data pending marker for the arbiter brick, send truncate on the arbiter brick during heal, so that it will record truncate as the data transaction in changelog. Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec fixes: bz#1687672 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	Bump up timeout for tests on AWS	Nigel Babu	2019-02-07	2	-2/+3
\| \| \| \| \| \|	Fixes: bz#1673267 Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	glusterd: manage upgrade to current master	Amar Tumballi	2019-02-04	3	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Scenarios tested: * Upgrade the node when there are stripe / tiering and regular type of volumes are present. - All volumes are started fine (as the change was not on brick volfile) - For tier, the functionality may not even work, as changetimerecorder is not present. - 'gluster volume info' properly shows as 'NOT SUPPORTED' for stripe and tier type of volume. * Upgrade in a rolling upgrade scenario, where an old version is able to connect to higher master. - on a normal volume, if the volfile-server was new, the newer client volfiles needed to have utime xlator conditionally. - with this one change, all other changes seem to work fine. Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8 updates: bz#1635688 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	readdir-ahead: do not zero-out iatt in fop cbk	Ravishankar N	2019-01-31	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...when ctime is zero. ia_type and ia_gfid always need to be non-zero for things to work correctly. Problem: Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt buffer in the cbks of modification fops before unwinding if the ctime in the buffer was zero. This was causing the fops to fail: noticeable when AFR's 'consistent-metadata' option was enabled. (AFR zeros out the ctime when the option is set. See commit 4c4624c9bad2edf27128cb122c64f15d7d63bbc8). Fixes: -Do not zero out the ia_type and ia_gfid of the iatt buff under any circumstance. -Also, fixed _rda_inode_ctx_update_iatts() to always update these values from the incoming buf when ctime is zero. Otherwise we end up with zero ia_type and ia_gfid the first time the function is called and the incoming buf has ctime set to zero. fixes: bz#1670253 Reported-By:Michael Hanselmann <public@hansmi.ch> Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	features/shard: Ref shard inode while adding to fsync list	Krutika Dhananjay	2019-01-24	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PROBLEM: Lot of the earlier changes in the management of shards in lru, fsync lists assumed that if a given shard exists in fsync list, it must be part of lru list as well. This was found to be not true. Consider this - a file is FALLOCATE'd to a size which would make the number of participant shards to be greater than the lru list size. In this case, some of the resolved shards that are to participate in this fop will be evicted from lru list to give way to the rest of the shards. And once FALLOCATE completes, these shards are added to fsync list but without a ref. After the fop completes, these shard inodes are unref'd and destroyed while their inode ctxs are still part of fsync list. Now when an FSYNC is called on the base file and the fsync-list traversed, the client crashes due to illegal memory access. FIX: Hold a ref on the shard inode when adding to fsync list as well. And unref under following conditions: 1. when the shard is evicted from lru list 2. when the base file is fsync'd 3. when the shards are deleted. Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2 fixes: bz#1669077 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	tests: run nfs tests only if --enable-gnfs is provided	Amar Tumballi	2019-01-24	39	-0/+79
\| \| \| \| \| \|	Fixes: bz#1665358 Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	tests/bug-brick-mux-restart: add extra information	Amar Tumballi	2019-01-24	1	-1/+12
\| \| \| \| \| \| \| \| \| \|	so that we can understand more about process memory and thread consumptions With this, we will also be able to understand more about the process details with brick-mux. updates: bz#1193929 Change-Id: I147a3e3814fc37dfb635217d0a0f0184fae0994f Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	afr: not resolve splitbrains when copies are of same size	Iraj Jamali	2019-01-22	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \|	Automatic Splitbrain with size as policy must not resolve splitbrains when the copies are of same size. Determining if the sizes of copies are same and returning -1 in that case. updates: bz#1655052 Change-Id: I3d8e8b4d7962b070ed16c3ee02a1e5a926fd5eab Signed-off-by: Iraj Jamali <ijamali@redhat.com>
*	cluster/dht: Delete invalid linkto files in rmdir	N Balachandran	2019-01-22	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \|	rm -rf <dir> fails on dirs which contain linkto files that point to themselves because dht incorrectly thought that they were cached files after looking them up. The fix now treats them as invalid linkto files and deletes them. Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643 fixes: bz#1667804 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	afr: Splitbrain with size as policy must not resolve for directory	Sheetal Pamecha	2019-01-21	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In automatic Splitbrain resolution when favorite child policy is set as size, split brain resolution must not work for directories. Currently, if a directory is in split brain with both copies having same size, the source is selected arbitrarily and healed. fixes: bz#1655050 Change-Id: I5739498639c17c89874cc577362e543adab55f5d Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
*	core: Feature added to accept CidrIp in auth.allow	Rinku Kothiya	2019-01-18	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added functionality to gluster volume set auth.allow command to accept CIDR IP addresses. Modified few functions to isolate cidr feature so that it prevents other gluster commands such as peer probe to use cidr format ip. The functions are modified in such a way that they have an option to enable accepting of cidr format for other gluster commands if required in furture. updates: bz#1138841 Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b Credits: Mohit Agrawal Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	core: Resolve dict_leak at the time of destroying graph	Mohit Agrawal	2019-01-14	1	-0/+112
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: In gluster code some of the places it call's get_new_dict to create a dictionary without taking reference so at the time of dict_unref it has become a leak Solution: To resolve the same call dict_new instead of get_new_dict updates bz#1650403 Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	libglusterfs/common-utils.c: Fix buffer size for checksum computation	Varsha Rao	2019-01-11	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When quorum count option is updated, the change is not reflected in the nfs-server.vol file. This is because in get_checksum_for_file(), when the last part of the file read has size less than buffer size, the read buffer stores old data value along with correct data value. Solution: Pass the bytes read instead of fixed buffer size, for calculating checksum. Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589 fixes: bz#1657744 Signed-off-by: Varsha Rao <varao@redhat.com>
*	performance/md-cache: Fix a crash when statfs caching is enabled	Vijay Bellur	2019-01-11	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	mem_put() in STACK_UNWIND_STRICT causes a crash if frame->local is not null as md-cache obtains local from CALLOC. Changed two occurrences of STACK_UNWIND_STRICT to MDC_STACK_UNWIND as the latter macro does not rely on STACK_UNWIND_STRICT for cleaning up frame->local. fixes: bz#1632503 Change-Id: I1b3edcb9372a164ef73119e99a49e747765d7166 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
*	tests: increase the timeout for distribute bug 1117851.t	Amar Tumballi	2019-01-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	The test is in borderline of 200seconds, and many a times, randomly takes little more time, and fails the whole regression. Better to keep timeout high, so we don't 'randomly' fail regression tests. updates: bz#1193929 Change-Id: Ib0d3a9f7a75ee44446ec6da5e0510cccf83eecaa Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	cluster/afr: Disable client side heals in AFR by default.	Sunil Kumar Acharya	2019-01-10	6	-0/+20
\| \| \| \| \| \| \| \| \|	With this changeset, default value for the AFR client side heal volume option is set to "off" fixes: bz#1663102 Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>