glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/ec: Change log level to DEBUG for lookup combine	Ashish Pandey	2018-11-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	As lookup is not a locked fop, we can not trust the data received in this to be same. Changing the log level to DEBUG in case lookup finds any difference. (cherry picked from commit 9be6bf3d90e3783b3ba559c93d41b933f8d53f03) Change-Id: I39499c44688a2455c7c6c69a798762d045d21b39 updates: bz#1644622 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	glusterd: ensure volinfo->caps is set to correct value.	Sanju Rakonde	2018-11-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the commit febf5ed4848, during the volume create op, we are setting volinfo->caps to 0, only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. With this patch, we set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. (as we do earlier without commit febf5ed4848). > BUG: bz#1635820 > Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit aae1c402b74fd02ed2f6473b896f108d82aef8e3) fixes: bz#1647968 Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	all: fix the format string exceptions	Amar Tumballi	2018-11-09	27	-79/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1647666 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	features/locks:Use pthread_mutex_unlock() instead of pthread_mutex_lock()	Vijay Bellur	2018-11-09	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fixes CID 1396581 Change-Id: Ic04091b5783a75d8e1e605a9c1c28b77fea048d3 updates: bz#1647962 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Susant Palai <spalai@redhat.com>
*	lock: Do not allow meta-lock count to be more than one	Susant Palai	2018-11-09	1	-1/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the current scheme of glusterfs where lock migration is experimental, (ideally) the rebalance process which is migrating the file should request for a metalock. Hence, the metalock count should not be more than one for an inode. In future, if there is a need for meta-lock from other clients, this patch can be reverted. Since pl_metalk is called as part of setxattr operation, any client process(non-rebalance) residing outside trusted network can exhaust memory of the server node by issuing setxattr repetitively on the metalock key. The current patch makes sure that more than one metalock cannot be granted on an inode. Fixes CVE-2018-14660 updates: bz#1647962 Change-Id: Ie1e697766388718804a9551bc58351808fe71069 Signed-off-by: Susant Palai <spalai@redhat.com>
*	server: don't allow '/' in basename	Amar Tumballi	2018-11-08	2	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Server stack needs to have all the sort of validation, assuming clients can be compromized. It is possible for a compromized client to send basenames with paths with '/', and with that create files without permission on server. By sanitizing the basename, and not allowing anything other than actual directory as the parent for any entry creation, we can mitigate the effects of clients not able to exploit the server. Fixes: CVE-2018-14651 Fixes: bz#1647663 Change-Id: I5dc0da0da2713452ff2b65ac2ddbccf1a267dc20 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	cluster/afr : Check for UP bricks before starting heal	Ashish Pandey	2018-11-08	3	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently for replica volume, even if only one brick is UP SHD will keep crawling index entries even if it can not heal anything. In thin-arbiter volume which is also a replica 2 volume, this causes inode lock contention which in turn sends upcall to all the clients to release notify locks, even if it can not do anything for healing. This will slow down the client performance and kills the purpose of keeping in memory information about bad brick. Solution: Before starting heal or even crawling, check if sufficient number of children are UP and available to check and heal entries. (cherry picked from commit f73b4476b15f9d6d3dc3c8e20c9742aacd857f9f) Change-Id: I011c9da3b37cae275f791affd56b8f1c1ac9255d updates: bz#1644645 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	glusterd: allow shared-storage to use bricks under glusterd working directory	Sanju Rakonde	2018-11-08	6	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With commit 44e4db, we are not allowing user to create a volume using glusterd's working directory as a brick or any sub directory under glusterd's working directory as a brick.This has broken shared-storage since the volume "gluster-shared-storage" is created using the bricks under glusterd's working directory. With this patch, we let the "gluster-shared-storage" volume to use bricks under glusterd's working directory. > BUG: bz#1647029 > Change-Id: Ifcbcf4576eea12cf46f199dea287b29bd3ec3bfd > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit bdb4ca184913c82ccf9552298f5d5b597794f2aa) fixes: bz#1647801 Change-Id: Ifcbcf4576eea12cf46f199dea287b29bd3ec3bfd Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	io-stats: prevent taking file dump on server side	Amar Tumballi	2018-11-08	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	By allowing clients taking dump in a file on brick process, we are allowing compromised clients to create io-stats dumps on server, which can exhaust all the available inodes. Fixes: CVE-2018-14659 Fixes: bz#1647665 Change-Id: I32bfde9d4fe646d819a45e627805b928cae2e1ca Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd-handshake: prevent a buffer overflow	Amar Tumballi	2018-11-08	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \|	as key size in xdr can be anything, it can be bigger than the 'NAME_MAX' allowed in the structure, which can allow for service denial attacks. Fixes: CVE-2018-14653 Fixes: bz#1647664 Change-Id: I2dc5e99af27ddf44c12c94b07e51adb8674cce80 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	protocol: remove the option 'verify-volfile-checksum'	Amar Tumballi	2018-11-08	3	-354/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'getspec' operation is not used between 'client' and 'server' ever since we have off-loaded volfile management to glusterd, ie, at least 7 years. No reason to keep the dead code! The removed option had no meaning, as glusterd didn't provide a way to set (or unset) this option. So, no regression should be observed from any of the existing glusterfs deployment, supported or unsupported. Updates: CVE-2018-14653 Updates: bz#1647664 Change-Id: I4a2e0f673c5bcd4644976a61dbd2d37003a428eb Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	posix/ctime: Avoid log flood in posix_update_utime_in_mdata	Kotresh HR	2018-11-08	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	posix_update_utime_in_mdata() unconditionally logs an error if consistent time attributes features is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch. Backport of: > Patch: https://review.gluster.org/21520/ > BUG: 1644129 > Change-Id: I9a1f9e7ada3366d2830f18d81f16a1461040092e > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1644526 Change-Id: I9a1f9e7ada3366d2830f18d81f16a1461040092e Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	afr/lease: Read child nodes from lease structure	root	2018-11-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum. Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	index: prevent arbitrary file creation outside entry-changes folder	Ravishankar N	2018-11-05	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch in master: https://review.gluster.org/#/c/glusterfs/+/21534/ Problem: A compromised client can set arbitrary values for the GF_XATTROP_ENTRY_IN_KEY and GF_XATTROP_ENTRY_OUT_KEY during xattrop fop. These values are consumed by index as a filename to be created/deleted according to the key. Thus it is possible to create/delete random files even outside the gluster volume boundary. Fix: Index expects the filename to be a basename, i.e. it must not contain any pathname components like "/" or "../". Enforce this. Fixes: CVE-2018-14654 Fixes: bz#1646204 Change-Id: I35f2a39257b5917d17283d0a4f575b92f783f143 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	leases:Mark the fop conflicting if lease_id not set	Soumya Koduri	2018-10-25	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Glusterfs leases expects lease_id to be set and sent for each fop to determine conflict resolution with the existing lease. Incase if not set (most likely if there is an older client in a mixed cluster), it makes sense to consider it as conflicitng fop and recall the lease. Also fixed the return status check for __remove_lease(), wherein non-negative value is considered as success case. Change-Id: I5bcfba4f7c71a5af7cdedeb03436d0b818e85783 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit cf5b13896d65b6916634976a3a5f61ddeefbc19c)
*	features/shard: Hold a ref on base inode when adding a shard to lru list	Krutika Dhananjay	2018-10-25	1	-14/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: > Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > (cherry picked from commit e627977) > BUG: 1605056 In __shard_update_shards_inode_list(), previously shard translator was not holding a ref on the base inode whenever a shard was added to the lru list. But if the base shard is forgotten and destroyed either by fuse due to memory pressure or due to the file being deleted at some point by a different client with this client still containing stale shards in its lru list, the client would crash at the time of locking lru_base_inode->lock owing to illegal memory access. So now the base shard is ref'd into the inode ctx of every shard that is added to lru list until it gets lru'd out. The patch also handles the case where none of the shards associated with a file that is about to be deleted are part of the LRU list and where an unlink at the beginning of the operation destroys the base inode (because there are no refkeepers) and hence all of the shards that are about to be deleted will be resolved without the existence of a base shard in-memory. This, if not handled properly, could lead to a crash. Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 updates: bz#1641440 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	storage/posix: Do not fail entry creation fops if gfid handle already exists	Krutika Dhananjay	2018-10-22	2	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: > Change-Id: I84a5e54d214b6c47ed85671a880bb1c767a29f4d > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > (cherry picked from commit 15c9976) > BUG: 1638453 PROBLEM: tests/bugs/shard/bug-1251824.t fails occasionally with EIO due to gfid mismatch across replicas on the same shard when dd is executed. CAUSE: Turns out this is due to a race between posix_mknod() and posix_lookup(). posix mknod does 3 operations, among other things: 1. creation of the entry itself under its parent directory 2. setting the gfid xattr on the file, and 3. creating the gfid link under .glusterfs. Consider a case where the thread doing posix_mknod() (initiated by shard) has executed steps 1 and 2 and is on its way to executing 3. And a parallel LOOKUP from another thread on noting that loc->inode->gfid is NULL, tries to perform gfid_heal where it attempts to create the gfid link under .glusterfs and succeeds. As a result, posix_gfid_set() through MKNOD (step 3) fails with EEXIST. In the older code, MKNOD under such conditions was NOT being treated as a failure. But commit e37ee6d changes this behavior by failing MKNOD, causing the entry creation to be undone in posix_mknod() (it's another matter that the stale gfid handle gets left behind if lookup has gone ahead and gfid-healed it). All of this happens on only one replica while on the other MKNOD succeeds. Now if a parallel write causes shard translator to send another MKNOD of the same shard (shortly after AFR releases entrylk from the first MKNOD), the file is created on the other replica too, although with a new gfid (since "gfid-req" that is passed now is a new UUID. This leads to a gfid-mismatch across the replicas. FIX: The solution is to not fail MKNOD (or any other entry fop for that matter that does posix_gfid_set()) if the .glusterfs link creation fails with EEXIST. Change-Id: I84a5e54d214b6c47ed85671a880bb1c767a29f4d fixes: bz#1641429 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	debug/io-stats: io stats filenames contain garbage	N Balachandran	2018-10-18	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As dict_unserialize does not null terminate the value, using snprintf adds garbage characters to the buffer used to create the filename. The code also used this->name in the filename which will be the same for all bricks for a volume. The files were thus overwritten if a node contained multiple bricks for a volume. The code now uses the conf->unique instead if available. Change-Id: I2c72534b32634b87961d3b3f7d53c5f2ca2c068c fixes: bz#1640392 Signed-off-by: N Balachandran <nbalacha@redhat.com> (cherry picked from commit 219cd649fdbd7bfd6c2268a0a4f66bcc15918e31)
*	afr: fix incorrect reporting of directory split-brain	Ravishankar N	2018-10-11	7	-16/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of https://review.gluster.org/#/c/glusterfs/+/21135/ Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 fixes: bz#1638163 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: prevent winding inodelks twice for arbiter volumes	Ravishankar N	2018-10-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of https://review.gluster.org/#/c/glusterfs/+/21380/ Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1638159 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	posix: Fix exporting default value for `export-statfs-size`	Aravinda VK	2018-10-08	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No default value was specified for `export-statfs-size` in posix option table. Glusterd2 sets default value as `off` since the option type is `bool`. Posix treats `export-statfs-size=on` if not specified in volfile(That means default value is `on`) This patch sets default value as `on` > Change-Id: I5c6341183be9b62a78fdbc94621220f9284e1382 > updates: #302 > Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit 07088d95e450f847722e5decbfa5da18a0dbd9de) Change-Id: Ib6b3accdb9921376c16040bd2312b99b0226a26f Fixes: bz#1636842 Signed-off-by: Aravinda VK <avishwan@redhat.com>
*	cluster/afr: Make data eager-lock decision based on number of locks	Pranith Kumar K	2018-10-05	3	-6/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For both Virt and block workloads the file is opened multiple times leading to dynamically setting eager-lock to off for the workload. Instead of depending on the number-of-open-fds, if we change the logic to depend on number of inodelks, then it will give better performance than the earlier logic. When there is an eager-lock and number of inodelks is more than 1 we know that there is a conflicting lock, so depend on that information to decide whether to keep the current transaction go through delayed-post-op or not. Locks xlator doesn't have implementation to query number of locks in fxattrop in releases older than 3.10 so to keep things backward compatible in 3.12, data transactions will use new logic where as fxattrop transactions will use old logic. I am planning to send one more patch which makes metadata domain locks also depend on inodelk-count Profile info for a dd of 500MB to a file with another fd opened on the file using exec 250>filename Without this patch: 0.14 67.41 us 16.72 us 3870.82 us 892 FINODELK 0.59 279.87 us 95.71 us 2085.89 us 898 FXATTROP 3.46 366.43 us 81.75 us 6952.79 us 4000 WRITE 95.79 148733.99 us 50568.12 us 919127.86 us 273 FSYNC With this patch: 0.00 51.01 us 38.07 us 80.16 us 4 FINODELK 0.00 235.43 us 235.43 us 235.43 us 1 TRUNCATE 0.00 125.07 us 56.80 us 193.33 us 2 GETXATTR 0.00 135.86 us 62.13 us 209.59 us 2 INODELK 0.00 197.88 us 155.39 us 253.90 us 4 FXATTROP 0.00 450.59 us 394.28 us 506.89 us 2 XATTROP 0.00 56.96 us 19.06 us 406.59 us 23 FLUSH 37.81 273648.93 us 48.43 us 6017657.05 us 44 LOOKUP 62.18 4951.86 us 93.80 us 1143154.75 us 3999 WRITE postgresql benchmark performance changed from ~1130 TPS to ~2300TPS randio fio job inside Ovirt based VM went from ~600IOPs to ~2000IOPS fixes bz#1635972 Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Batch writes in same lock even when multiple fds are open	Pranith Kumar K	2018-10-05	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When eager-lock is disabled because of multiple-fds opened and app writes come on conflicting regions, the number of locks grows very fast leading to all the CPU being spent just in locking and unlocking by traversing huge queues in locks xlator for granting locks. Fix: Reduce the number of locks in transit by bundling the writes in the same lock and disable delayed piggy-pack when we learn that multiple fds are open on the file. This will reduce the size of queues in the locks xlator. This also reduces the number of network calls like inodelk/fxattrop. Please note that this problem can still happen if eager-lock is disabled as the writes will not be bundled in the same lock. fixes bz#1635975 Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	mgmt/glusterd: use proper path to the volfile	Raghavendra Bhat	2018-10-05	3	-9/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Till now, glusterd was generating the volfile path for the snapshot volume's bricks like this. /snaps/<snap name>/<brick volfile> But in reality, the path to the brick volfile for a snapshot volume is /snaps/<snap name>/<snap volume name>/<brick volfile> The above workaround was used to distinguish between a mount command used to mount the snapshot volume, and a brick of the snapshot volume, so that based on what is actually happening, glusterd can return the proper volfile (client volfile for the former and the brick volfile for the latter). But, this was causing problems for snapshot restore when brick multiplexing is enabled. Because, with brick multiplexing, it tries to find the volfile and sends GETSPEC rpc call to glusterd using the 2nd style of path i.e. /snaps/<snap name>/<snap volume name>/<brick volfile> So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc request to glusterd for obtaining the brick volume file, glusterd was returning the client volume file of the snapshot volume instead of the brick volume file. Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e fixes: bz#1636162 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> (cherry picked from commit 83a89296a3d12a3fc2a643c0630be5ce659204ea)
*	mdcache: Fix asan reported potential heap buffer overflow	ShyamsundarR	2018-10-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The char pointer mdc_xattr_str in function mdc_xattr_list_populate is malloc'd and doing a strcat into a malloc'd region can overflow content allocated based on prior contents of the memory region. Added a NULL terimation to the malloc'd region to prevent the overflow, and treat it as an empty string. Change-Id: If0decab669551581230a8ede4c44c319ff04bac9 Updates: bz#1635373 Signed-off-by: ShyamsundarR <srangana@redhat.com> (cherry picked from commit d00a2a1b398346bbdc5ac9b3ba4b09fb1ce1e699)
*	ctime: Provide noatime option	Kotresh HR	2018-10-02	10	-9/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the applications are {c\|m}time dependant and very few are atime dependant. So provide noatime option to not update atime when ctime feature is enabled. Also this option has to be enabled with ctime feature to avoid unnecessary self heal. Since AFR/EC reads data from single subvolume, atime is only updated in one subvolume triggering self heal. Backport of: > Patch: https://review.gluster.org/21073 > BUG: 1593538 > Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 89636be4c73b12de2e11c75d8e59527bb243f147) updates: bz#1633015 Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	glusterd: acquire lock to update volinfo structure	Sanju Rakonde	2018-09-27	3	-34/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: With commit cb0339f92, we are using a separate syntask for restart_bricks. There can be a situation where two threads are accessing the same volinfo structure at the same time and updating volinfo structure. This can lead volinfo to have inconsistent values and assertion failures because of unexpected values. Solution: While updating the volinfo structure, acquire a store_volinfo_lock, and release the lock only when the thread completed its critical section part. > BUG: bz#1627610 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> > Change-Id: I545e4e2368e3285d8f7aa28081ff4448abb72f5d (cherry picked from commit 484f417da945cf83afdbf136bb4817311862a8d2) fixes: bz#1633552 Change-Id: I545e4e2368e3285d8f7aa28081ff4448abb72f5d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd: make sure that brickinfo->uuid is not null	Sanju Rakonde	2018-09-27	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: After an upgrade from the version where shared-brick-count option is not present to a version which introduced this option causes issue at the mount point i.e, size of the volume at mount point will be reduced by shared-brick-count value times. Cause: shared-brick-count is equal to the number of bricks that are sharing the file system. gd_set_shared_brick_count() calculates the shared-brick-count value based on uuid of the node and fsid of the brick. https://review.gluster.org/#/c/glusterfs/+/19484 handles setting of fsid properly during an upgrade path. This patch assumed that when the code path is reached, brickinfo->uuid is non-null. But brickinfo->uuid is null for all the bricks, as the uuid is null https://review.gluster.org/#/c/glusterfs/+/19484 couldn't reached the code path to set the fsid for bricks. So, we had fsid as 0 for all bricks, which resulted in gd_set_shared_brick_count() to calculate shared-brick-count in a wrong way. i.e, the logic written in gd_set_shared_brick_count() didn't work as expected since fsid is 0. Solution: Before control reaches the code path written by https://review.gluster.org/#/c/glusterfs/+/19484, adding a check for whether brickinfo->uuid is null and if brickinfo->uuid is having null value, calling glusterd_resolve_brick will set the brickinfo->uuid to a proper value. When we have proper uuid, fsid for the bricks will be set properly and shared-brick-count value will be caluculated correctly. Please take a look at the bug https://bugzilla.redhat.com/show_bug.cgi?id=1632889 for complete RCA Steps followed to test the fix: 1. Created a 2 node cluster, the cluster is running with binary which doesn't have shared-brick-count option 2. Created a 2x(2+1) volume and started it 3. Mouted the volume, checked size of volume using df 4. Upgrade to a version where shared-brick-count is introduced (upgraded the nodes one by one i.e, stop the glusterd, upgrade the node and start the glusterd). 5. after upgrading both the nodes, bumped up the cluster.op-version 6. At mount point, df shows the correct size for volume. > backport of https://review.gluster.org/#/c/glusterfs/+/21278/ > Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit f1e9b878ce2067db83a0baa5f384eda87287719d) fixes: bz#1633242 Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	geo-rep: Fix issues related config set	Kotresh HR	2018-09-19	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. '--ignore-mising-args' option for rsync is not being used even though the rsync version is greater than 3.1.0. Fixed the same. 2. '--existing' option for rsync is also not being used. Fixed the same. 3. geo-rep config fails to set rsync-options as the value contains '--'. Interestingly, python argsparse treats the value with '--' (e.g., --ignore-missing-args) as option. But when passed with something like --value=--ignore-missing-args, it succeeds. Fixed the same. Backport of: > Patch: https://review.gluster.org/21191 > Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 > Signed-off-by: Kotresh HR <khiremat@redhat.com> > BUG: 1629561 (cherry picked from commit b977b44dd0adfcd7a3b432844260de4b8d1c4adf) Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1630673
*	core: remove experimental xlators and associated tests	ShyamsundarR	2018-09-17	50	-5657/+2
\| \| \| \| \| \| \| \|	experimental xlators removed from 5.0 Change-Id: I47219d8b95efc3d5875ec9224d1e79f8371e9f76 Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	glusterd: Update op-version from 4.2 to 5.0	ShyamsundarR	2018-09-17	4	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Post changing the max op-version to 4.2, after release 4.1 branching, the decision was to go with increasing release numbers. Thus this needs to change to 5.0. This commit addresses the above change. Fixes: bz#1628668 Change-Id: Ifcc0c6da90fdd51e4eceea40749511110a432cce Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	gfapi: revert several patchs that introduced pre/post attrs	ShyamsundarR	2018-09-17	9	-23/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reverted the following: - 248152767b0599986bbb6bb35fc27197f6be6964 - 09943beb499617212f2985ca8ea9ecd1ed1b470e - d01f7244e9d9f7e3ef84e0ba7b48ef1b1b09d809 The reverts are redone by hand, due to clang format changes that made using git to revert the changes more tedious. Change-Id: I96489638a2b641fb2206a110298543225783f7be Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	build: cleanup xlator link, --no-undefined, libuuidv6dev	Kaleb S. KEITHLEY	2018-09-12	7	-17/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While attempting to build a (pre-)5.0 of glusterfs on Ubuntu bionic and cosmic, it became apparent that there are some gremlins hiding in the combination of the xlator export-symbols, the newish addition of -Wl,--no-undefined, and the new switch to libuuid from the old contrib/uuid. Note: even though Fedora 28 (and later) and Ubuntu bionic (and later) have the same nominal version of libtool, the Fedora version appears to do a better job of recursing through dependencies to determine the libraries to link with. Examination of the build logs showed that despite appearing to work on Fedora, not all xlators and shared libs were linked with -Wl, --no-undefined, and -luuid. And in the case of the gnfs xlator, it was not only not linked with -Wl,--no-undefined but alsos not linked with -lgfxdr and -lgfrpc. Added GF_XLATOR_LDFLAGS, similar to GF_XLATOR_DEFAULT_LDFLAGS. GF_XLATOR_DEFAULT_LDFLAGS is for xlators that export/expose the default or common set of symbols. GF_XLATOR_LDFLAGS is for those remaining xlators that export/expose non-default symbols, e.g. dht and glupy. This removes the need in the future to add things like $(UUID_LIBS) to every xlator's Makefile.am. Just add it to GF_XLATOR_LDFLAGS and GF_XLATOR_DEFAULT_LDFLAGS in configure.ac and you're done. This patch was tested on Fedora 28 (build, rpmbuild), Fedora Rawhide/30 (rpmbuild), RHEL8 (rpmbuild), CentOS7 (rpmbuild), Fedora koji --scratch build for f30/rawhide, and a Launchpad build for Ubuntu cosmic/18.10. Change-Id: Ieca104fa5c5d3c094e701c8ca4a73754dd0292b0 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	template files: revert clang	Amar Tumballi	2018-09-12	5	-1169/+1260
\| \| \| \| \| \|	Change-Id: If3925191d23afe83cbbdbc3cf0554c0a9c76d043 updates: bz#1564149 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	Land part 2 of clang-format changes	Gluster Ant	2018-09-12	304	-334839/+323946
\| \| \| \| \|	Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	Land clang-format changes	Gluster Ant	2018-09-12	257	-17145/+16171
\| \| \| \|	Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
*	dht: Use snprintf instead of strncpy	N Balachandran	2018-09-12	1	-9/+20
\| \| \| \| \| \| \| \| \| \| \|	The recent changes to use malloc instead of calloc left the new_name and new_path non-null terminated. We now use snprintf instead of strncpy to fix this. Change-Id: I1a31701ca9447efde38921be0ba2c73cde2e7976 fixes: bz#1626346 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	mount/fuse: convert ENOENT to ESTALE in open(dir)_resume	Raghavendra G	2018-09-11	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is continuation of commit fb4b914ce84bc83a5f418719c5ba7c25689a9251. <snip> mount/fuse: never fail open(dir) with ENOENT open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. </snip> Earlier commit failed to fix codepath where error response is sent back on gfid resolution failures in fuse_open(dir)_resume. Current patch completes that work Change-Id: Ia07e3cece404811703c8cfbac9b402ca5fe98c1e Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1627620
*	mgmt xlators: store boolean fields using integer	Yaniv Kaul	2018-09-11	2	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Surprisingly, there is not set_boolean() as there is a get_boolean() In fact, it is stored as an INT dictionary type. In some occasions it was stored using a string, and this caused errors such as: key gfproxy-server, integer type asked, has string type [Invalid argument] I've fixed what I saw in some logs, I'm sure there are more. The CORRECT fix is to create a boolean set and use it, but this requires a bit more work. I'll see if I can do it later on. Only compile-tested! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I45fd0c7a0824b2f42b8ce510296c9dfa4f32ad66
*	glusterd: NULL pointer dereferencing clang fix	Harpreet Lalwani	2018-09-11	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	problem: NULL point dereferencing solution: Adding a conditional statement before and then dereferencing it. Updates: bz#1622665 Change-Id: I562ca90aebf2a4882cfea10114a90364d9ef1996 Signed-off-by: Harpreet Lalwani <hlalwani@redhat.com>
*	misc: fix misc. shebangs	Kaleb S. KEITHLEY	2018-09-11	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* One #!/usr/bin/env python and three #!/usr/bin/python were overlooked in all the other python fixups. Ugh. * Two new python files missed the memo about #!/usr/bin/python3. * One #!/usr/bin/env bash. Various distribution packaging policies have strong wording about the use of #!/usr/bin/env ... Note: this patch does not change the use of #!/usr/bin/env bash in the two files extras/{clang-checker.sh,check_goto.pl} as these are not included in any packages. (Although I'm not actually sure why anyone would ever use '/usr/bin/env {sh,bash}' as I'm not aware of any version-specific differences like there are with, e.g., python.) * One #!/usr/bin/bash. On Fedora and CentOS > 6, /bin is a symlink to /usr/bin, so it makes little difference. But Debian & Ubuntu still have separate /bin and /usr/bin; and sh and bash are in /bin, not /usr/bin. (Historically, in BSD and SYSV Unix it was /bin/sh.) Note: Fedora and CentOS package build runs a script that converts all /bin/sh and /bin/bash to /usr/bin/sh and /usr/bin/bash. Change-Id: I9171265829af78dd0cd7622c22b56d22179ff8a3 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	storage/posix: Fix coverity issue - Unchecked return value	Ashish Pandey	2018-09-11	1	-1/+8
\| \| \| \| \| \| \| \| \| \|	Fixes CID: 1388886 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85287446&defectInstanceId=25997291&mergedDefectId=1388886 Change-Id: Ic4e558bba7e15d213c07bc31affb2e175ace5502 updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	performance/write-behind: remove the request from wip queue in ↵	Raghavendra G	2018-09-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	wb_fulfill_request The bug is very similar to bz 1379655 and the fix too very similar to commit a8b2a981881221925bb5edfe7bb65b25ad855c04. Before this patch, a request is removed from wip queue only when ref count of request hits 0. Though, wb_fulfill_request does an unref, it need not be the last unref and hence the request may survive in wip queue till the last unref. Let, T1: the time at which wb_fulfill_request is invoked T2: the time at which last unref is done on request Let's consider a case of T2 > T1. In the time window between T1 and T2, any other request (waiter) conflicting with request in liability queue (blocker - basically a write which has been lied) is blocked from winding. If T2 happens to be when wb_do_unwinds is invoked, no further processing of request list happens and "waiter" would get blocked forever. An example imaginary sequence of events is given below: 1. A write request w1 is picked up for winding in __wb_pick_winds and w1 is moved to wip queue. Let's call this invocation of wb_process_queue by wb_writev as PQ1. Note w1 is not unwound. 2. A dependent write (w2) hits write-behind and is unwound followed by a flush (f1) request. Since the liability queue of inode is not empty, w2 and f1 are not picked for unwinding. Let's call the invocation of wb_process_queue by wb_flush as PQ2. Note that invocation of wb_process_queue by w2 doesn't wind w2 instead unwinds it after which we hit PQ2 3. PQ2 continues and picks w1 for fulfilling and invokes wb_fulfill. As part of successful wb_fulfill_cbk, wb_fulfill_request (w1) is invoked. But, w1 is not freed (and hence not removed from wip queue) as w1 is not unwound _yet_ and a ref remains (PQ1 has not invoked wb_do_unwinds _yet_). 4. wb_fulfill_cbk (triggered by PQ2) invokes a wb_process_queue (let's say PQ3). w2 is not picked up for winding in PQ3 as w1 is still in wip queue. At this time, PQ2 and PQ3 are complete. 5. PQ1 continues, unwinds w1 and does last unref on w1 and w1 is freed (and removed from wip queue). Since PQ1 didn't invoke wb_fulfill on any other write requests, there won't be any future codepaths that would invoke wb_process_queue and w2 is stuck forever. This will prevent f2 too and hence close syscall is hung With this fix, w1 is removed from liability queue in step 3 above and PQ3 winds w2 in step 4 (as there are no requests conflicting with w2 in liability queue during execution of PQ3). Once w2 is complete, f1 is resumed. Change-Id: Ia972fad0858dc4abccdc1227cb4d880f85b3b89b Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1626787
*	mgmt/glusterd : Fix coverity issue	Ashish Pandey	2018-09-11	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \|	CID: 727146, 727066 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85393035&defectInstanceId=26034751&mergedDefectId=727146 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85392913&defectInstanceId=26034571&mergedDefectId=727066 updates: bz#789278 Change-Id: Ieaef33829ec88e68690dabce4ea21d2e61dad9f6 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	glusterd: NULL pointer dereferencing clang fix	Sheetal Pamecha	2018-09-11	1	-1/+2
\| \| \| \| \| \| \| \| \|	Added ternary operator to avoid NULL pointer dereferencing Updates: bz#1622665 Change-Id: I4b970176b6b555c2eda1da2848c493e45b1e4217 Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
*	features/snapview-server: remove the ret variable	Raghavendra Bhat	2018-09-11	1	-4/+2
\| \| \| \| \| \| \| \|	Used only once to store the return value of a function. Change-Id: Ib2b9db847b5f652ce46f220297d9c7c3503eaea2 fixes: #230 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	cluster/dht: Create a linkto file if required	N Balachandran	2018-09-10	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \|	Using the dht_filter_loc_subvol_key to create files on specific subvols did not create a linkto file. This can make the file inaccessible as lookup-optimize is now enabled by default. Change-Id: I78add5a31887378a479cb9c746b91678876b0dbe fixes: bz#1626394 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	posix: fix the duplicate macro definitions	Amar Tumballi	2018-09-10	2	-8/+0
\| \| \| \| \| \| \| \| \| \|	the macro 'PATH_SET_TIMESPEC_OR_TIMEVAL' is defined in posix.h, which seems to be good place for it updates: bz#1193929 Change-Id: I521a2a6efeb26891c24637a5f06b1d90c00b1135 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	cluster/dht: optimize readdir for 1xn vols	N Balachandran	2018-09-10	1	-12/+42
\| \| \| \| \| \| \| \| \|	Skip the hashed subvol check for volumes with distribute count of 1. Change-Id: I5703508b54a17c49a217c8a8e09884980705953a fixes: bz#1608175 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	ec-heal: remove a duplicate definition of alloca0	Amar Tumballi	2018-09-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	the same macro is defined in common-utils.h, which seems to be much better place for the same. Updates: bz#1193929 Change-Id: I409b719c291102136500b955e5827a550142ed96 Signed-off-by: Amar Tumballi <amarts@redhat.com>