glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	features/locks:Use pthread_mutex_unlock() instead of pthread_mutex_lock()	Susant Palai	2018-11-08	1	-1/+1
	Fixes CID 1396581
	Change-Id: Ic04091b5783a75d8e1e605a9c1c28b77fea048d3
	updates: bz#1647972
	Signed-off-by: Vijay Bellur <vbellur@redhat.com>
	Signed-off-by: Susant Palai <spalai@redhat.com>
*	lock: Do not allow meta-lock count to be more than one	Susant Palai	2018-11-08	1	-1/+34
	In the current scheme of glusterfs, where lock migration is experimental, (ideally) the rebalance process which is migrating the file should request a metalock. Hence, the metalock count should not be more than one for an inode. In future, if there is a need for meta-lock from other clients, this patch can be reverted.
	Since pl_metalk is called as part of the setxattr operation, any client process (non-rebalance) residing outside the trusted network can exhaust the memory of the server node by issuing setxattr repeatedly on the metalock key. The current patch makes sure that more than one metalock cannot be granted on an inode.
	Fixes CVE-2018-14660
	updates: bz#1647972
	Change-Id: Ie1e697766388718804a9551bc58351808fe71069
	Signed-off-by: Susant Palai <spalai@redhat.com>
*	index: prevent arbitrary file creation outside entry-changes folder	Ravishankar N	2018-11-06	1	-0/+19
	Patch in master: https://review.gluster.org/#/c/glusterfs/+/21534/
	A compromised client can set arbitrary values for the GF_XATTROP_ENTRY_IN_KEY and GF_XATTROP_ENTRY_OUT_KEY during xattrop fop. These values are consumed by index as a filename to be created/deleted according to the key. Thus it is possible to create/delete random files even outside the gluster volume boundary.
	Fix: Index expects the filename to be a basename, i.e. it must not contain any pathname components like "/" or "../". Enforce this.
	Fixes: CVE-2018-14654
	Fixes: bz#1646200
	Change-Id: I35f2a39257b5917d17283d0a4f575b92f783f143
	Signed-off-by: Ravishankar N <ravishankar@redhat.com>
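	A minimal sketch of the basename rule described in the commit above, written in Python for brevity. The actual patch is C code in the index xlator and is not reproduced here. The function name `is_plain_basename` is hypothetical, and rejecting empty names and "." is an extra assumption beyond what the message states.

```python
def is_plain_basename(name):
    """Return True only if `name` has no pathname components."""
    # Assumption: empty names and the special entries "." and ".." are
    # rejected as well, since none of them names a regular index entry.
    if name in ("", ".", ".."):
        return False
    # Any "/" means the value carries a directory part, e.g. "../x" or "a/b".
    return "/" not in name
```

	With this check, a value such as "../../etc/passwd" is refused before it is ever used as a filename, while an ordinary entry name passes.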
*	glusterd: ensure volinfo->caps is set to correct value	Sanju Rakonde	2018-11-05	2	-0/+11
	With commit febf5ed4848, during the volume create op, we set volinfo->caps to 0 only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0 when either the brick doesn't belong to the same node or brickinfo->vg[0] is null.
	With this patch, we again set volinfo->caps to 0 when either the brick doesn't belong to the same node or brickinfo->vg[0] is null (as we did before commit febf5ed4848).
	> BUG: bz#1635820
	> Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49
	> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
	fixes: bz#1643052
	Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49
	Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	geo-rep/scripts: Fix traceback in gluster-mountbroker	Kotresh HR	2018-11-05	1	-1/+1
	When 'gluster-mountbroker status' is issued, it crashes in a corner case with "'str' object has no attribute 'get'". Fixed the same.
	Backport of:
	> Patch: https://review.gluster.org/21507
	> fixes: bz#1643929
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	> Change-Id: Iaf1a937ed0136b3b2058230c75fa89a215d8a5eb
	(cherry picked from commit 5987b3388126a3c5e77481913cbaa4142117d19a)
	fixes: bz#1644516
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
	Change-Id: Iaf1a937ed0136b3b2058230c75fa89a215d8a5eb
*	posix/ctime: Avoid log flood in posix_update_utime_in_mdata	Kotresh HR	2018-11-05	1	-5/+0
	posix_update_utime_in_mdata() unconditionally logs an error if the consistent time attributes feature is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch.
	fixes: bz#1644524
	Change-Id: I01736d2ed48d14f12ccd8a808521f59145e42ccb
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Add more intelligence to automatic error handling	Kotresh HR	2018-11-05	1	-22/+46
	Geo-rep's automatic error handling does gfid conflict resolution. But if there are ENOENT errors because the parent is not synced to slave, it doesn't handle them. This patch adds the intelligence to create missing parent directories on slave. It can create the missing directories up to a depth of 10.
	Backport of:
	> Patch: https://review.gluster.org/21498
	> BUG: 1643402
	> Change-Id: Ic97ed1fa5899c087e404d559e04f7963ed7bb54c
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	(cherry picked from commit 19775e0445411cca9ddd9d294fd54d0b6fbe6a03)
	fixes: bz#1644518
	Change-Id: Ic97ed1fa5899c087e404d559e04f7963ed7bb54c
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
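	The bounded parent creation described above can be sketched roughly as follows. This is an illustration on plain local paths only: the real geo-rep code works on gfids and entry operations on the slave, and the names `create_missing_parents` and `MAX_PARENT_DEPTH` are hypothetical. Only the depth limit of 10 comes from the commit message.

```python
import os

MAX_PARENT_DEPTH = 10  # depth limit quoted in the commit message


def create_missing_parents(path, max_depth=MAX_PARENT_DEPTH):
    """Create the missing ancestors of `path`, at most `max_depth` levels.

    Returns the directories that were created, outermost first.
    """
    missing = []
    parent = os.path.dirname(os.path.abspath(path))
    # Walk upwards until an existing directory is found.
    while not os.path.isdir(parent):
        if len(missing) == max_depth:
            raise OSError("more than %d missing parent directories" % max_depth)
        missing.append(parent)
        parent = os.path.dirname(parent)
    # Create from the outermost missing directory inwards.
    for directory in reversed(missing):
        os.mkdir(directory)
    return list(reversed(missing))
```

	Collecting the missing ancestors first and creating them afterwards means nothing is created at all when the depth limit is exceeded.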
*	features/locks: add buffer overflow checks in pl_getxattr	Ravishankar N	2018-11-02	1	-1/+4
	Problem: A compromised client can send a variable length buffer value for the GF_XATTR_CLRLK_CMD virtual xattr. If the length is greater than the size of the "key" used to send the response back, locks xlator can segfault when it tries to do a dict_set because of the buffer overflow in strncpy of pl_getxattr().
	Fix: Perform size checks while forming the 'key'.
	Note: This fix is already there in the master branch upstream as a part of the commit 052849983e51a061d7fb2c3ffd74fa78bb257084 (https://review.gluster.org/#/c/glusterfs/+/20933/). This patch just picks the code change needed to fix the vulnerability.
	Fixes: CVE-2018-14652
	fixes: bz#1645363
	Change-Id: I101693e91f9ea2bd26cef6c0b7d82527fefcb3e2
	Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr/lease: Read child nodes from lease structure	root	2018-10-31	1	-1/+1
	For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum.
	Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6
	fixes: bz#1644474
	Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	geo-rep: Fix issue in gfid-conflict-resolution	Kotresh HR	2018-10-30	1	-17/+24
	Problem: During gfid-conflict-resolution, geo-rep crashes with 'ValueError: list.remove(x): x not in list'
	Cause and Analysis: During gfid-conflict-resolution, the entry blob is passed back to master along with additional information to verify its integrity. If everything looks fine, the entry creation is ignored and the entry is deleted from the original list. But it crashes during removal of the entry from the list, saying the entry is not in the list. The reason is that the stat information in the entry blob was modified and sent back to master if present.
	Fix: Send back the correct stat information for gfid-conflict-resolution.
	Backport of:
	> BUG: 1642865
	> Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	(cherry picked from commit ff18121945bff394f3234e9f1a9d61ac97d4d493)
	fixes: bz#1644163
	Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
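	The failure mode described above can be reproduced in a few lines. This sketch only illustrates why `list.remove()` raises when the returned entry differs from the stored one; the entry layout (`op`, `gfid`, `stat`) and the helper name `drop_resolved` are hypothetical. The actual fix, per the commit message, was to send back the unmodified stat information, not to catch the exception.

```python
def drop_resolved(entries, returned):
    """Remove `returned` from `entries`; report whether it was found."""
    try:
        entries.remove(returned)  # list.remove() matches by equality (==)
        return True
    except ValueError:
        # Raised as "list.remove(x): x not in list" when nothing compares equal.
        return False


original = {"op": "CREATE", "gfid": "g1", "stat": {"mode": 0o644}}
pending = [original]

# A copy whose stat was altered on the way back no longer compares equal.
altered = {"op": "CREATE", "gfid": "g1", "stat": {"mode": 0o100644}}
```

	Here `drop_resolved(pending, altered)` finds nothing to remove, whereas an unmodified copy of `original` is removed as expected.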
*	tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t	Sanju Rakonde	2018-10-25	1	-9/+18
	Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has optimised glusterd test cases by clubbing the similar test cases into a single test case. The https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t test case has been deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t
	In the original test case, we create a volume with two bricks, each on a separate node (N1 & N2). From another node in the cluster (N3), we try to detach a node which is hosting bricks. It fails.
	In the new test, we created a volume with a single brick on N1, and from another node in the cluster we tried to detach N1. We expect peer detach to fail, but peer detach succeeded because the node is hosting all the bricks of the volume.
	Now, changing the new test case to cover the original test case scenario.
	Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to understand why the new test case is not failing in centos-regression.
	> BUG: bz#1642597
	> Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
	> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
	(cherry picked from commit 0ca6773eaf5aeb507ebc72d2c2f61902eeff414c)
	fixes: bz#1643075
	Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
	Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	leases:Mark the fop conflicting if lease_id not set	Soumya Koduri	2018-10-24	1	-4/+8
	Glusterfs leases expect lease_id to be set and sent for each fop to determine conflict resolution with the existing lease. In case it is not set (most likely if there is an older client in a mixed cluster), it makes sense to consider it a conflicting fop and recall the lease.
	Also fixed the return status check for __remove_lease(), wherein a non-negative value is considered a success case.
	This is a backport of the below mainline patch -
	https://review.gluster.org/21458
	Change-Id: I5bcfba4f7c71a5af7cdedeb03436d0b818e85783
	updates: #350
	Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	gfapi: Bug fixes in leases processing code-path	Soumya Koduri	2018-10-22	7	-31/+71
	This patch fixes the below issues in the gfapi lease code-path:
	* 'glfs_setfsleaseid' should allow NULL input to be able to reset leaseid
	* Applications should be allowed to (un)register for upcall notifications of type GLFS_EVENT_LEASE_RECALL
	* APIs added to read contents of the GLFS_EVENT_LEASE_RECALL argument, which is of type "struct glfs_upcall_lease"
	This is a backport of the below mainline patch -
	https://review.gluster.org/#/c/glusterfs/+/21391
	Change-Id: I3320ddf235cc82fad561e13b9457ebd64db6c76b
	updates: #350
	Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	tests: check for shd up status in bug-1637802-arbiter-stale-data-heal-lock.t	Ravishankar N	2018-10-22	1	-0/+1
	Problem: https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing this .t spuriously. On checking one of the failure logs, I see:
		22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful:
		22:05:44 Self-heal daemon is not running. Check self-heal daemon log file.
		22:05:44 not ok 20 , LINENUM:38
	In glusterd log:
		[2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file
	But the tests which precede this check via a statedump whether the shd is connected to the bricks, and they have succeeded and even started healing. From glustershd.log:
		[2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1 sinks=2
	So the only reason I can see for launching heal via cli failing is a race where shd has been spawned but glusterd has not yet updated in-memory that it is up, and hence fails the CLI.
	Fix: Check for shd up status before launching heal via CLI
	Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4
	fixes: bz#1641761
	Signed-off-by: Ravishankar N <ravishankar@redhat.com>
	(cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
*	afr: prevent winding inodelks twice for arbiter volumes	Ravishankar N	2018-10-10	2	-1/+45
	Backport of https://review.gluster.org/#/c/glusterfs/+/21380/
	Problem: In an arbiter volume, if there is a pending data heal of a file only on the arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang.
	Fix: Fix the code-bug to take the lock only once. This bug was introduced in master with commit eb472d82a083883335bc494b87ea175ac43471ff
	Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA.
	fixes: bz#1637953
	Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
	Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: fix incorrect reporting of directory split-brain	Ravishankar N	2018-10-05	9	-17/+85
	Backport of https://review.gluster.org/#/c/glusterfs/+/21135/
	Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources.
	Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if the split-brain flag is not set, just show them as needing heal and not being in split-brain.
	Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
	fixes: bz#1633634
	Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	glusterd: make sure that brickinfo->uuid is not null	Sanju Rakonde	2018-10-05	1	-1/+2
	Problem: After an upgrade from a version where the shared-brick-count option is not present to a version which introduced this option, an issue appears at the mount point: the size of the volume at the mount point is reduced by shared-brick-count value times.
	Cause: shared-brick-count is equal to the number of bricks that are sharing the file system. gd_set_shared_brick_count() calculates the shared-brick-count value based on the uuid of the node and the fsid of the brick. https://review.gluster.org/#/c/glusterfs/+/19484 handles setting of fsid properly during an upgrade path. That patch assumed that when the code path is reached, brickinfo->uuid is non-null. But brickinfo->uuid is null for all the bricks; as the uuid is null, https://review.gluster.org/#/c/glusterfs/+/19484 couldn't reach the code path to set the fsid for bricks. So we had fsid as 0 for all bricks, which resulted in gd_set_shared_brick_count() calculating shared-brick-count in a wrong way, i.e. the logic written in gd_set_shared_brick_count() didn't work as expected since fsid is 0.
	Solution: Before control reaches the code path written by https://review.gluster.org/#/c/glusterfs/+/19484, add a check for whether brickinfo->uuid is null; if it is null, calling glusterd_resolve_brick will set brickinfo->uuid to a proper value. When we have a proper uuid, fsid for the bricks will be set properly and the shared-brick-count value will be calculated correctly.
	Please take a look at the bug https://bugzilla.redhat.com/show_bug.cgi?id=1632889 for the complete RCA.
	Steps followed to test the fix:
	1. Created a 2 node cluster; the cluster is running with a binary which doesn't have the shared-brick-count option
	2. Created a 2x(2+1) volume and started it
	3. Mounted the volume, checked the size of the volume using df
	4. Upgraded to a version where shared-brick-count is introduced (upgraded the nodes one by one, i.e. stop the glusterd, upgrade the node and start the glusterd)
	5. After upgrading both the nodes, bumped up the cluster.op-version
	6. At the mount point, df shows the correct size for the volume
	> BUG: 1632889
	> Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b
	> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
	(cherry picked from commit f1e9b878ce2067db83a0baa5f384eda87287719d)
	fixes: bz#1633479
	Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b
	Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	cluster/afr: Make data eager-lock decision based on number of locks	Pranith Kumar K	2018-10-05	3	-10/+51
	For both Virt and block workloads the file is opened multiple times, leading to dynamically setting eager-lock to off for the workload. Instead of depending on the number-of-open-fds, if we change the logic to depend on the number of inodelks, then it will give better performance than the earlier logic. When there is an eager-lock and the number of inodelks is more than 1, we know that there is a conflicting lock, so depend on that information to decide whether to let the current transaction go through delayed-post-op or not.
	Locks xlator doesn't have an implementation to query the number of locks in fxattrop in releases older than 3.10, so to keep things backward compatible in 3.12, data transactions will use the new logic whereas fxattrop transactions will use the old logic. I am planning to send one more patch which makes metadata domain locks also depend on inodelk-count.
	Profile info for a dd of 500MB to a file with another fd opened on the file using exec 250>filename
	Without this patch:
		 0.14      67.41 us     16.72 us     3870.82 us    892  FINODELK
		 0.59     279.87 us     95.71 us     2085.89 us    898  FXATTROP
		 3.46     366.43 us     81.75 us     6952.79 us   4000  WRITE
		95.79  148733.99 us  50568.12 us   919127.86 us    273  FSYNC
	With this patch:
		 0.00      51.01 us     38.07 us       80.16 us      4  FINODELK
		 0.00     235.43 us    235.43 us      235.43 us      1  TRUNCATE
		 0.00     125.07 us     56.80 us      193.33 us      2  GETXATTR
		 0.00     135.86 us     62.13 us      209.59 us      2  INODELK
		 0.00     197.88 us    155.39 us      253.90 us      4  FXATTROP
		 0.00     450.59 us    394.28 us      506.89 us      2  XATTROP
		 0.00      56.96 us     19.06 us      406.59 us     23  FLUSH
		37.81  273648.93 us     48.43 us  6017657.05 us     44  LOOKUP
		62.18    4951.86 us     93.80 us  1143154.75 us   3999  WRITE
	postgresql benchmark performance changed from ~1130 TPS to ~2300 TPS
	randio fio job inside oVirt based VM went from ~600 IOPS to ~2000 IOPS
	fixes bz#1635980
	Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36
	Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Batch writes in same lock even when multiple fds are open	Pranith Kumar K	2018-10-05	1	-9/+2
	Problem: When eager-lock is disabled because of multiple fds opened and app writes come on conflicting regions, the number of locks grows very fast, leading to all the CPU being spent just in locking and unlocking by traversing huge queues in locks xlator for granting locks.
	Fix: Reduce the number of locks in transit by bundling the writes in the same lock and disable delayed piggy-back when we learn that multiple fds are open on the file. This will reduce the size of queues in the locks xlator. This also reduces the number of network calls like inodelk/fxattrop.
	Please note that this problem can still happen if eager-lock is disabled, as the writes will not be bundled in the same lock.
	fixes bz#1635979
	Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
	Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	mgmt/glusterd: use proper path to the volfile	Raghavendra Bhat	2018-10-04	3	-8/+26
	Till now, glusterd was generating the volfile path for the snapshot volume's bricks like this:
		/snaps/<snap name>/<brick volfile>
	But in reality, the path to the brick volfile for a snapshot volume is
		/snaps/<snap name>/<snap volume name>/<brick volfile>
	The above workaround was used to distinguish between a mount command used to mount the snapshot volume, and a brick of the snapshot volume, so that based on what is actually happening, glusterd can return the proper volfile (client volfile for the former and the brick volfile for the latter). But this was causing problems for snapshot restore when brick multiplexing is enabled. Because, with brick multiplexing, it tries to find the volfile and sends a GETSPEC rpc call to glusterd using the 2nd style of path, i.e.
		/snaps/<snap name>/<snap volume name>/<brick volfile>
	So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc request to glusterd for obtaining the brick volume file, glusterd was returning the client volume file of the snapshot volume instead of the brick volume file.
	Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e
	fixes: bz#1636218
	Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	georep: fix hard-coded paths in gsyncd.conf.in	Kaleb S. KEITHLEY	2018-09-23	1	-2/+2
	This is part of the reason why we use autoconf (i.e. configure). For an ordinary clone+autogen.sh+configure SBIN_DIR is /usr/local/sbin; for an rpm or dpkg build it will be /usr/sbin.
	I wonder how many more are lurking in our sources? /usr/libexec is one that frequently bites us on Debian and Ubuntu, which don't have /usr/libexec. (But it's all Linux, right?)
	See https://bugzilla.redhat.com/show_bug.cgi?id=1601532
	Reported-by: lohmaier+rhbz@gmail.com
	Change-Id: I6523894416cc06236ea1f99529efd36e957bd98e
	updates: bz#1632013
	Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	doc: Added release notes for release 4.1.5 v4.1.5	ShyamsundarR	2018-09-21	1	-0/+27
	Fixes: bz#1630186
	Change-Id: Ie5ea9b69fea22eab65d7e85215f8538b617da456
	Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	cluster/afr: Delegate name-heal when possible	Pranith Kumar K	2018-09-21	4	-27/+205
	Problem: When name-self-heal is triggered on the mount, it blocks lookup until name-self-heal completes. But that can lead to hangs when a lot of clients are accessing a directory which needs name heal and all of them trigger heals, waiting for other clients to complete the heal.
	Fix: When a name-heal is needed but a quorum number of names have the file and pending xattrs exist on the parent, then it is better to delegate the heal to SHD, which will complete it as part of entry-heal of the parent directory. We could also do the same for a quorum number of names not present, but we don't have any known use-case where this is a frequent occurrence, so not changing that part at the moment. When there is a gfid mismatch or missing gfid it is important to complete the heal so that the next rename doesn't assume everything is fine and perform a rename etc.
	fixes bz#1625575
	Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
	Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Delegate metadata heal with pending xattrs to SHD	Pranith Kumar K	2018-09-21	5	-51/+72
	Problem: When metadata-self-heal is triggered on the mount, it blocks lookup until metadata-self-heal completes. But that can lead to hangs when a lot of clients are accessing a directory which needs metadata heal and all of them trigger heals, waiting for other clients to complete the heal.
	Fix: Only when the heal is needed but the pending xattrs are not set, trigger a metadata heal that could block lookup. This is the only case where different clients may give different metadata to the clients without heals, which should be avoided.
	Updates bz#1625575
	Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
	Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	libgfchangelog: Fix changelog history API	Kotresh HR	2018-09-21	4	-5/+325
	Problem: If the requested start time and end time don't fall into the first HTIME file, then the history API fails even though continuous changelogs are available for the requested range in other HTIME files. This is induced by changelog disable and enable, which creates a fresh HTIME index file.
	Cause and Analysis: Each HTIME index file represents the availability of continuous changelogs. If changelog is disabled and enabled, a new HTIME index file is created, which marks the non-availability of continuous changelogs. So as long as the requested start and end fall into a single HTIME index file and not across files, the history API should succeed. But the history API checks for the changelogs only in the first HTIME index file and errors out if they are not available.
	Fix: Check in all HTIME index files for availability of continuous changelogs for the requested range.
	Backport of:
	> Patch: https://review.gluster.org/21016/
	> BUG: bz#1622549
	> Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	(cherry picked from commit 35aa67001c8fac99b040fbc61f36ef4f1b1590ac)
	fixes: bz#1630141
	Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix issues related config set	Kotresh HR	2018-09-21	8	-60/+251
	1. The '--ignore-missing-args' option for rsync is not being used even though the rsync version is greater than 3.1.0. Fixed the same.
	2. The '--existing' option for rsync is also not being used. Fixed the same.
	3. geo-rep config fails to set rsync-options as the value contains '--'. Interestingly, Python's argparse treats a value with '--' (e.g., --ignore-missing-args) as an option. But when passed with something like --value=--ignore-missing-args, it succeeds. Fixed the same.
	Backport of:
	> Patch: https://review.gluster.org/21191
	> Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	> BUG: 1629561
	Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
	fixes: bz#1630140
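	The argparse behaviour mentioned in point 3 can be seen with a small stand-alone parser. This is only a sketch: the `--value` option name follows the example in the message, while the parser itself and the helper `parse_value` are hypothetical and are not the geo-rep CLI.

```python
import argparse


def parse_value(argv):
    """Return the parsed --value, or None if argparse rejects the input."""
    parser = argparse.ArgumentParser(prog="config-set-sketch")
    parser.add_argument("--value")
    try:
        return parser.parse_args(argv).value
    except SystemExit:
        # argparse prints a usage error and exits when parsing fails.
        return None


# Separate token: "--ignore-missing-args" is taken for another option,
# so --value is left without an argument and parsing fails.
separate = parse_value(["--value", "--ignore-missing-args"])

# "=" form: the text after "=" is bound to --value explicitly.
joined = parse_value(["--value=--ignore-missing-args"])
```

	With the `=` form the value is attached to the option explicitly, so argparse never has to guess whether the next token is an option or an argument.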
*	geo-rep: Fix deadlock during worker start	Kotresh HR	2018-09-21	2	-4/+15
	Analysis: Monitor process spawns monitor threads (one per brick). Each monitor thread forks worker and agent processes. Each monitor thread, while initializing, updates the monitor status file. It is synchronized using flock. The race is that some thread can fork a worker while another thread has the status file open, resulting in the worker process holding a reference to the fd.
	Cause: flock gets unlocked either by specifically unlocking it or by closing all duplicate fds referring to the file. The code was relying on fd close, hence a reference in the worker/agent process by fork could cause the deadlock.
	Fix:
	1. flock is unlocked specifically.
	2. Also made sure to update the status file in appropriate places so that the reference is not leaked to the worker/agent process.
	With this fix, both the deadlock and possible fd leaks are solved.
	Backport of:
	> Patch: https://review.gluster.org/20704
	> BUG: bz#1614799
	> Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	fixes: bz#1630145
	Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
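	The first part of the fix, unlocking explicitly instead of relying on close, can be sketched as below. This is an assumption-laden illustration, not the geo-rep monitor code: the function name `update_status_file` and the way the file is rewritten are hypothetical; only the use of flock with an explicit unlock comes from the message.

```python
import fcntl
import os


def update_status_file(path, text):
    """Rewrite `path` under an exclusive flock that is released explicitly."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            os.ftruncate(fd, 0)
            os.write(fd, text.encode("utf-8"))
        finally:
            # Release the lock ourselves. A forked child that inherited this
            # fd would otherwise keep the lock alive after our close().
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

	Because the lock is dropped with LOCK_UN before the descriptor is closed, a copy of the descriptor held by a forked process can no longer keep other threads waiting on the status file.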
*	geo-rep/hook-script: Fix ssh/scp options	Kotresh HR	2018-09-21	5	-24/+84
	Always use ssh and scp with the "-oPasswordAuthentication=no" and "-oStrictHostKeyChecking=no" options. Otherwise the post script might hang, leading to geo-rep setup failure.
	Also increased the geo-rep timeout. Occasionally, it's taking more time to reach Active/Passive status, especially on the first start after create.
	Backport of:
	> Patch: https://review.gluster.org/20601
	> BUG: bz#1610405
	> Change-Id: I9560d64dbe0edf5db73446a9fc97dda19b88d233
	> Signed-off-by: Kotresh HR <khiremat@redhat.com>
	fixes: bz#1630144
	Change-Id: I9560d64dbe0edf5db73446a9fc97dda19b88d233
	Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned	Milind Changire	2018-09-21	1	-2/+9
	Problem: A return value of ENODATA was forcibly returned in the case where SSL_get_error(r) returned SSL_ERROR_SYSCALL. Sometimes SSL_ERROR_SYSCALL is a transient error which is identified by setting errno to EAGAIN. EAGAIN is not a fatal error and indicates that the syscall needs to be retried.
	Solution: Bubble up the errno in case SSL_get_error(r) returns SSL_ERROR_SYSCALL and let the upper layers handle it appropriately.
	fixes: bz#1601356
	Change-Id: I76eff278378930ee79abbf9fa267a7e77356eed6
	Signed-off-by: Milind Changire <mchangir@redhat.com>
*	storage/posix: Avoid log flood in posix_set_parent_ctime()	Vijay Bellur	2018-09-17	1	-4/+0
	posix_set_parent_ctime() unconditionally logs an error if the consistent time attributes feature is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch.
	Backport of:
	> Patch: https://review.gluster.org/20547/
	> Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75
	> BUG: 1607049
	> Signed-off-by: Vijay Bellur <vbellur@redhat.com>
	(cherry picked from commit e0df887ba044ce92e9a2822be9261d0f712b02bd)
	Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75
	fixes: bz#1629548
	Signed-off-by: Vijay Bellur <vbellur@redhat.com>
*	doc: Release notes for v4.1.4 v4.1.4	Jiffin Tony Thottan	2018-09-06	1	-0/+41
	Change-Id: Idfce8b9ec79303b92045e68ab98765f7e2f98940
	fixes: bz#1623161
	Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
*	posix: disable open/read/write on special files	Amar Tumballi	2018-09-06	1	-0/+33
	In the file system, the responsibility with respect to block and char device files is limited to supporting their creation (using mknod(2)). Once the device files are created, the read/write syscalls for the specific devices are handled by the device driver registered for the specific major number, and depending on the minor number, it knows where to read from. Hence, we are at risk of reading contents from devices which are handled by the host kernel on server nodes.
	By disabling open/read/write on the device file, we guard against the bypass one can achieve from the client side (using gfapi).
	Fixes: bz#1625096
	Change-Id: I48c776b0af1cbd2a5240862826d3d8918601e47f
	Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	protocol: don't use alloca	Amar Tumballi	2018-09-06	2	-103/+58
	The current implementation of alloca can cause issues when strings larger than the allocated buffer are passed to the xdr. Hence it makes sense to allow XDR decode functions to deal with memory allocations, which we can free later.
	Fixes: bz#1625097
	Change-Id: I3a05553f5702de9575c244649ca0e5ac9abaac94
	Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	io-stats: dump io-stats info in /var/run/gluster	Amar Tumballi	2018-09-06	2	-15/+31
	It wouldn't make sense to allow the iostats file to be written in any directory. While the formatting makes sure we try to append io-stats-name for the file, so the chance of overwriting an existing file is slim, in any case it makes sense to restrict dumping to one directory.
	Below are the sample commands, and files created for the corresponding values:
		$ setfattr -n trusted.io-stats-dump -v file-for-dump $M0
	In this case, the file would be in /var/run/gluster/file-for-dump
		$ setfattr -n trusted.io-stats-dump -v /dir1/dir2/file-for-dump $M0
	In this case, the dump file is in /var/run/gluster/dir1-dir2-file-for-dump
	Note that the value passed for this virtual xattr would be treated as a file, and even if the value has '/' in it, it would be changed to '-' for sanity.
	Fixes: bz#1625106
	Change-Id: Id9ae6a40a190b8937c51662e6e1c2a0f6c86a0e0
	Signed-off-by: Amar Tumballi <amarts@redhat.com>
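	The mapping from xattr value to dump file shown in the two examples above can be sketched as follows. This is an illustration in Python, not the io-stats C code: the helper name `io_stats_dump_path` is hypothetical, dropping the leading '/' is inferred from the second example, and the io-stats-name suffix mentioned in the message is left out because the examples do not show it.

```python
DUMP_DIR = "/var/run/gluster"  # directory named in the commit message


def io_stats_dump_path(value, dump_dir=DUMP_DIR):
    """Map the xattr value to a single file name inside `dump_dir`."""
    # Assumption: a leading "/" is dropped, as the second example suggests,
    # and every remaining "/" becomes "-" so the value cannot name a
    # location outside the dump directory.
    flat_name = value.lstrip("/").replace("/", "-")
    return dump_dir + "/" + flat_name
```

	Under these assumptions both sample values from the message map to the paths it lists, and no value can introduce an extra path component.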
*	server-protocol: don't allow '../' path in 'name'	Amar Tumballi	2018-09-06	2	-0/+18
	This will prevent any arbitrary file creation through glusterfs by modifying the client bits.
	Also check for the similar flaw inside posix too, so we prevent any changes in layers in-between.
	Fixes: bz#1625095
	Signed-off-by: Amar Tumballi <amarts@redhat.com>
	Change-Id: Id9fe0ef6e86459e8ed85ab947d977f058c5ae06e
*	dict: handle negative key/value length while unserialize	Amar Tumballi	2018-09-06	1	-1/+2
\| \| \| \| \| \|	Fixes: bz#1625089 Change-Id: Ie56df0da46c242846a1ba51ccb9e011af118b119 Signed-off-by: Amar Tumballi <amarts@redhat.com>
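The kind of validation the subject line refers to can be sketched as below. The layout is an assumption made for illustration (two 32-bit big-endian lengths followed by the payload) and is not claimed to match the real dict wire format; `read_pair_lengths` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a key length and a value length from buf and
 * validate them before they are used as sizes or offsets.
 * Returns 0 on success, -1 if the header is truncated, a length is
 * negative, or the lengths exceed the bytes remaining after the header.
 */
static int read_pair_lengths(const char *buf, size_t size,
                             int32_t *keylen, int32_t *vallen)
{
    uint32_t raw;

    if (buf == NULL || size < 8)
        return -1;

    memcpy(&raw, buf, sizeof raw);
    *keylen = (int32_t)ntohl(raw);
    memcpy(&raw, buf + 4, sizeof raw);
    *vallen = (int32_t)ntohl(raw);

    /* A negative length read off the wire must never reach memcpy. */
    if (*keylen < 0 || *vallen < 0)
        return -1;

    /* Both lengths together must fit in what is left of the buffer. */
    if ((size_t)*keylen + (size_t)*vallen > size - 8)
        return -1;

    return 0;
}
```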
*	posix: remove not supported get/set content	Amar Tumballi	2018-09-06	4	-192/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getting and setting a file's content using extended attribute worked great as a GET/PUT alternative when an object storage is supported on top of Gluster. But it needs application changes, and also, it skips some caching layers. It is not used over years, and not supported any more. Removing the dead code. Fixes: bz#1625102 Change-Id: Ide3b3f1f644f6ca58558bbe45561f346f96b95b7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	storage/posix: Increment trusted.pgfid in posix_mknod	N Balachandran	2018-08-29	2	-4/+71
\| \| \| \| \| \| \| \| \| \|	The value of trusted.pgfid.xx was always set to 1 in posix_mknod. This is incorrect if posix_mknod calls posix_create_link_if_gfid_exists. Change-Id: Ibe87ca6f155846b9a7c7abbfb1eb8b6a99a5eb68 fixes: bz#1623317 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	doc: Release notes for v4.1.3 (tag: v4.1.3)	ShyamsundarR	2018-08-27	1	-0/+36
\| \| \| \| \| \|	Change-Id: Ia944a2353f0a89b61f66bde7f270489dff3793b4 fixes: bz#1607945 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	gfapi : Handle the path == "" glfs_resolve_at	Jiffin Tony Thottan	2018-08-22	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Currently there is no check for path = "" in glfs_resolve_at. So if an application sends an empty path, the function resolves to the parent inode, which is incorrect. Also replaced "path" with "origpath" where possible in the same function. Change-Id: Ie5ff9ce4b771607b7dbb3fe00704fe670421792a fixes: bz#1618347 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> (cherry picked from commit febee007bb1a99d65300630c2a98cbb642b1c8dc)
*	Bash integration script should namespace variables	Mark Mielke	2018-08-21	1	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the originally submitted script, it looks like there was effort put into namespacing all global variables. However a few mistakes remained. GLUSTER_TOP_SUBOPTIONSx were defined, but TOP_SUBOPTIONSx were referenced. This was likely an unrecognized defect in the original code submission. These are now corrected to refer to GLUSTER_TOP_SUBOPTIONSx. FINAL_LIST, LIST, and TOP were leaked into all Bash shells and used by the command completion functions. The most problematic of these was TOP, which was declared with "-i", making it an integer. This caused other code which used TOP to define a path to fail like this: $ bash $ TOP=/abc bash: /abc: syntax error: operand expected (error token is "/abc") These are now qualified as GLUSTER_FINAL_LIST, GLUSTER_LIST, and GLUSTER_TOP to reduce impact on scripts that might choose to use these extremely common variable names. > Change-Id: Ic96eda8efd1f3238bbade6c6ddb69118e8d82158 > Signed-off-by: Mark Mielke <mark.mielke@gmail.com> (cherry picked from commit 89545e745e4075845c18078be67a31dea93a4e88) Change-Id: Ic96eda8efd1f3238bbade6c6ddb69118e8d82158 Fixes: bz#1425326 Signed-off-by: Mark Mielke <mark.mielke@gmail.com>
*	dht: Delete MDS internal xattr from dict in dht_getxattr_cbk	Mohit Agrawal	2018-08-16	2	-31/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr Backport of: > BUG: 1584098 > Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1611116 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	geo-rep : fix possible crash	Sunny Kumar	2018-08-16	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : In 'glusterd_verify_slave' while tokenizing error message we call 'strtok_r' and store return value in 'tmp' which can be NULL. We are passing this 'tmp' as 1st argument to 'strcmp' which will lead to segmentation fault. Solution : before calling 'strcmp' we should NULL check 'tmp'. Backport of: > Change-Id: Ifd3864b904afe6cd09d9e5a4b55c6d0578e22b9d > BUG: 1602121 > Signed-off-by: Sunny Kumar <sunkumar@redhat.com> Change-Id: Ifd3864b904afe6cd09d9e5a4b55c6d0578e22b9d fixes: bz#1611115 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
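The pattern described in the solution can be sketched as below. This is not the code from 'glusterd_verify_slave'; `first_token_is` is a hypothetical helper that only shows the NULL check placed between 'strtok_r' and 'strcmp'.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/*
 * Hypothetical helper: return 1 if the first space-separated token of msg
 * equals expected, 0 otherwise. msg is modified in place by strtok_r.
 */
static int first_token_is(char *msg, const char *expected)
{
    char *save = NULL;
    char *tmp = strtok_r(msg, " ", &save);

    /* strtok_r returns NULL when msg holds no token at all, for example an
     * empty or all-delimiter string; strcmp(NULL, ...) would crash. */
    if (tmp == NULL)
        return 0;

    return strcmp(tmp, expected) == 0;
}
```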
*	geo-rep: Fix issues with gfid conflict handling	Kotresh HR	2018-08-16	3	-64/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. MKDIR/RMDIR is recorded on all bricks. So if one brick succeeds in creating it, other bricks should ignore it. But this was not happening. The fix for rename of directories in hybrid crawl was trying to rename the directory to itself and in the process crashing with ENOENT if the directory is removed. 2. If a file is created, deleted and a directory is created with the same name, it was failing to sync. Again the issue is around the fix for rename of directories in hybrid crawl. Fixed the same. If the same case was done with a hardlink present for the file, it was failing. This patch fixes that too. Backport of: > BUG: 1598884 > Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611114 Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix symlink rename syncing issue	Kotresh HR	2018-08-16	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Geo-rep sometimes fails to sync the rename of a symlink if the I/O is as follows 1. touch file1 2. ln -s "./file1" sym_400 3. mv sym_400 renamed_sym_400 4. mkdir sym_400 The file 'renamed_sym_400' failed to sync to slave Cause: Assume there are three distribute subvolumes (brick1, brick2, brick3). The changelogs are recorded as follows for the above I/O pattern. Note that the MKDIR is recorded on all bricks. 1. brick1: ------- CREATE file1 SYMLINK sym_400 RENAME sym_400 renamed_sym_400 MKDIR sym_400 2. brick2: ------- MKDIR sym_400 3. brick3: ------- MKDIR sym_400 The operations on 'brick1' should be processed sequentially. But since MKDIR is recorded on all the bricks, 'brick2' or 'brick3' processed MKDIR before 'brick1', causing out of order syncing, and created directory sym_400 first. Now 'brick1' processed its changelog. CREATE file1 -> succeeds SYMLINK sym_400 -> No longer present in master. Ignored RENAME sym_400 renamed_sym_400 While processing RENAME, if the source ('sym_400') is not present, the destination ('renamed_sym_400') is created. But geo-rep stats the name 'sym_400' to confirm the source file's presence. In this race, since the source name 'sym_400' is present as a directory, it doesn't create the destination. Hence RENAME is ignored. Fix: The fix is to not rely only on a stat of the source name during RENAME. It should stat the name and, if the name is present, the gfid should be the same. Only then can it conclude the presence of the source. Backport of: > BUG: 1600405 > Change-Id: I9fbec4f13ca6a182798a7f81b356fe2003aff969 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611113 Change-Id: I9fbec4f13ca6a182798a7f81b356fe2003aff969 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix geo-rep for older versions of unshare	Kotresh HR	2018-08-16	3	-7/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Geo-rep mounts are private to worker. It uses mount namespace using unshare command to achieve the same. Well, the unshare command has to support '--propagation' option. So geo-rep breaks on the systems with older unshare version. The patch makes it fall back to lazy umount behaviour if the unshare does not support propagation option. Backport of: > BUG: 1589782 > Change-Id: Ia614f068aede288d63ac62fea4461b1865066054 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611111 Change-Id: Ia614f068aede288d63ac62fea4461b1865066054 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	glusterd: memory leak in geo-rep status	Sanju Rakonde	2018-08-16	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	Backport of: > BUG: 1580352 > Change-Id: I9648e73090f5a2edbac663a6fb49acdb702cdc49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> fixes: bz#1611110 Change-Id: I9648e73090f5a2edbac663a6fb49acdb702cdc49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	geo-rep/scheduler: Fix crash	Kotresh HR	2018-08-16	1	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix crash where session_name is referenced before assignment. Well, this is a corner case where the geo-rep session exists and the status output doesn't show any rows. This might happen when glusterd is down or when the system is in an inconsistent state w.r.t. glusterd. Backport of: > BUG: 1576179 > Change-Id: Iec1557e01b35068041b4b3c1aacee2bfa0e05873 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611108 Change-Id: Iec1557e01b35068041b4b3c1aacee2bfa0e05873 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	glusterd/geo-rep: Fix glusterd crash	Kotresh HR	2018-08-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using strdup instead of gf_strdup crashes during free if mempool is being used. gf_free checks the magic number in the header, which is not set up when strdup is used. Backport of: > BUG: 1576392 > Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611106 Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix upgrade issue	Kotresh HR	2018-08-15	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cause and Analysis: The last synced changelog for entry operations is marked in the current version to avoid re-processing of already processed entry operations in a batch during crash/restart of geo-rep. This was not present in previous versions. The marker is maintained in the dictionary with the key 'last_synced_entry' and the dictionary is persisted into the status file. So upgrading to the current version, in which the marker is present, was failing with KeyError. Solution: Load the dictionary with default keys first, which contains all the keys including the latest ones, and then load the values from the status file instead of doing otherwise. Backport of: > BUG: 1575490 > Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 23c1385b5f6f6103e820d15ecfe1df31940fdb45) fixes: bz#1611104 Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4 Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 23c1385b5f6f6103e820d15ecfe1df31940fdb45)