summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* doc: release notes for release 5.2v5.2ShyamsundarR2018-12-121-0/+27
| | | | | | Fixes: bz#1649895 Change-Id: Ic8f59df9d99efafd56c930e30040f046654a1f9e Signed-off-by: ShyamsundarR <srangana@redhat.com>
* afr: thin-arbiter 2 domain locking and in-memory stateRavishankar N2018-12-127-81/+684
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2 domain locking + xattrop for write-txn failures: -------------------------------------------------- - A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad. - All further write txn failures are handled based on this in-memory value without querying the TA. - When shd heals the files, it does so by requesting full lock on AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall), releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory notion of which brick is bad. The next write txn failure is wound on TA to again update the in-memory state. - Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release request is got is completed before the lock is released. - Any write txns got after the release request are maintained in a ta_waitq. - After the release is complete, the ta_waitq elements are spliced to a separate queue which is then processed one by one. - For fops that come in parallel when the in-memory bad brick is still unknown, only one is wound to TA on wire. The other ones are maintained in a ta_onwireq which is then processed after we get the response from TA. Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64 updates: bz#1648205 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* features/bitrot: compare the signature with proper lengthRaghavendra Bhat2018-12-122-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | * The scrubber was comparing the checksum of the file that it calculated (by reading the file) with the on disk signature (stored via xattr) wrongly. It was using strlen to calculate the signature, while the actual length of the signature is given by the brick. Just use the actual length that the brick provides instead of trying to calculate the signature length via strlen API. * In posix, gfid2path was using the same string that contains the list of all the xattrs of file to save the value of the gfid2path xattr as well. This causes confusion when gfid2path xattr is queried by scrubber for getting the actual path of a corrupted file. Use separate string to fetch the value of the xattr instead of the string that contains the list of xattrs. Backport of: > Patch: https://review.gluster.org/21752/ > BUG: 1654805 > Change-Id: I2d664ab524d2b312233476cb35863dde3122e9a9 (cherry picked from commit f77fb6d568616592ab25501c402c140d15235ca9) Change-Id: I2d664ab524d2b312233476cb35863dde3122e9a9 fixes: bz#1654370 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: assign gfid during name heal when no 'source' is present.Ravishankar N2018-12-125-52/+201
| | | | | | | | | | | | | | | | | | | Problem: If parent dir is in split-brain or has dirty xattrs set, and the file has gfid missing on one of the bricks, then name heal won't assign the gfid. Fix: Use the brick we select the gfid from as the 'source'. Note: Problem was found while trying to debug a split-brain issue on Cynthia Zhou's setup. fixes: bz#1655545 Change-Id: Id088d4f0fb017aa35122de426654194e581ed742 Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 4d58730c0cd6ab5db39aec8a15276f7bd3371b04)
* leases: Do not conflict with internal fopsSoumya Koduri2018-12-121-0/+11
| | | | | | | | | | | | | | | Internal fops (with frame->root->pid < 0) are used to heal or move data and maintains data integrity. That is they do not modify client data which holds the lease. Hence no need to recall Lease for such fops. Note: Like for locks, we would need rebalance and self-heal daemon process to heal lease state as well. Change-Id: I8988693fef8d00e17c19dcc842e2238f9eb5ab48 updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit 080aa5b9e9d998552e23f7c33aed3afb0ca93c34)
* geo-rep: Fix traceback with symlink metadata syncKotresh HR2018-12-121-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While syncing metadata, 'os.chmod', 'os.chown', 'os.utime' should be used without de-reference. But python supports only 'os.chown' without de-reference. That's mostly because Linux doesn't support 'chmod' on symlink file itself but it does support 'chown'. So while syncing metadata ops, if it's symlink we should only sync 'chown' and not do 'chmod' and 'utime'. It will lead to tracebacks with errors like EROFS, EPERM, ACCESS, ENOENT. All the three errors (EPERM, ACCESS, ENOENT) were handled except EROFS. But the way it was handled was not fool proof. The operation is tried and failure was handled based on the errors. All the errors with symlink file for 'chown', 'utime' had to be passed to safe errors list of 'errno_wrap'. This patch handles it better by avoiding 'chmod' and 'utime' if it's symlink file. Backport of : < Patch: https://review.gluster.org/21546/ > BUG: bz#1646104 > Change-Id: Ic354206455cdc7ab2a87d741d81f4efe1f19d77d > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 3c6cf9a4a1b46cab2dc53c1ee0afca0fe993102e) fixes: bz#1654115 Change-Id: Ic354206455cdc7ab2a87d741d81f4efe1f19d77d Signed-off-by: Kotresh HR <khiremat@redhat.com>
* gfapi: Offload callback notifications to synctaskSoumya Koduri2018-12-124-10/+419
| | | | | | | | | | | | | | | | | | | | | | | Upcall notifications are received from server via epoll and same thread is used to forward these notifications to the application. This may lead to deadlock and hang in the following scenario. Consider if as part of handling these callbacks, application has to do some operations which involve sending I/Os to gfapi stack which inturn have to wait for epoll threads to receive repsonse. Thus this may lead to deadlock if all the epoll threads are waiting to complete these callback notifications. To address it, instead of using epoll thread itself, make use of synctask to send those notificaitons to the application. Change-Id: If614e0d09246e4279b9d1f40d883a32a39c8fd90 updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit ad35193718a99494ab1b852ca4cbdf054f73de88)
* lease: Treat unlk request as noop if lease not foundSoumya Koduri2018-12-122-4/+16
| | | | | | | | | | | | | | | | | When the glusterfs server recalls the lease, it expects client to flush data and unlock the lease. If not it sets a timer (starting from the time it sends RECALL request) and post timeout, it revokes it. Here we could have a race where in client did send UNLK lease request but because of network delay it may have reached after server revokes it. To handle such situations, treat such requests as noop and return sucesss. Change-Id: I166402d10273f4f115ff04030ecbc14676a01663 updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit c2e758b54d8a3f778e3e63db0000bb8b63de9b25)
* geo-rep: Fix permissions with non-root setupKotresh HR2018-11-291-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In non-root fail-over/fail-back(FO/FB), when slave is promoted as master, the session goes to 'Faulty' Cause: The command 'gluster-mountbroker <mountbroker-root> <group>' is run as a pre-requisite on slave in non-root setup. It modifies the permission and group of following required directories and files recursively [1] /var/lib/glusterd/geo-replication [2] /var/log/glusterfs/geo-replication-slaves In a normal setup, this is executed on slave node and hence doing it recursively is not an issue on [1]. But when original master becomes slave in non-root during FO/FB, it contains ssh public keys and modifying permissions on them causes geo-rep to fail with incorrect permissions. Fix: Don't do permission change recursively. Fix permissions for required files. Backport of: > Patch: https://review.gluster.org/#/c/glusterfs/+/21689/ > BUG: bz#1651498 > Change-Id: I68a744644842e3b00abc26c95c06f123aa78361d > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit b2776b1ec1ad845ba568c4439bca3b57cc4d2592) fixes: bz#1654117 Change-Id: I68a744644842e3b00abc26c95c06f123aa78361d Signed-off-by: Kotresh HR <khiremat@redhat.com>
* leases: Fix incorrect inode_ref/unrefsSoumya Koduri2018-11-292-3/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From testing & code-reading, found couple of places where we incorrectly unref the inode resulting in use_after_free crash or ref leaks. This patch addresses couple of them. a) When we try to grant the very first lease for a inode, inode_ref is taken in __add_lease. This ref should be active till all the leases granted to that inode are released (i.e, till lease_cnt > 0). In addition even after lease_cnt becomes '0', the inode should be active till all the blocked fops are resumed. Hence release this ref, after resuming all those fops. To avoid granting new leases while resuming those fops, defined a new boolean (blocked_fops_resuming) to flag it in the lease_ctx. b) 'new_lease_inode' which creates new lease_inode_entry and takes ref on inode, is used while adding that entry to client_list and recall_list. Use its counter function '__destroy_lease_inode' which does unref while removing those entries from those lists. c) inode ref is also taken when added to timer->data. Unref the same after processing timer->data. Change-Id: Ie77c78ff4a971e0d9a66178597fb34faf39205fb updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit b7aec05aa965202ab73120acf0da4c32fe0cf16c)
* gfapi: Send fop_attr dict as part of syncop_openSoumya Koduri2018-11-291-1/+1
| | | | | | | | | | | Leaseid (stored in thread locals) is sent to server via xdata. This dict variable is set but not passed as argument in glfs_h_open(). Fixed the same. Change-Id: Idd2f8a0ec184b4b6b1ad1e6e5d75df551c36a96d updates: bz#1651323 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit 04be5463b20ababc29942fa967017e763d0ae2af)
* afr: open_ftruncate_cbk should read fd from local->cont.open structSoumya Koduri2018-11-291-2/+2
| | | | | | | | | | | | | afr_open stores the fd as part of its local->cont.open struct but when it calls ftruncate (if open flags contain O_TRUNC), the corresponding cbk function (afr_ open_ftruncate_cbk) is incorrectly referencing uninitialized local->fd. This patch fixes the same. Change-Id: Icbdedbd1b8cfea11d8f41b6e5c4cb4b44d989aba updates: bz#1651322 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit fda594875c4cdb2a22e27aa13f5c66bee032ccb5)
* cluster/afr: Use 2 domain locking in SHD for thin-arbiterkarthik-us2018-11-295-89/+390
| | | | | | | | | | | | | | | | | | | | | | | | With this change when SHD starts the index crawl it requests all the clients to release the AFR_TA_DOM_NOTIFY lock so that clients will know the in memory state is no more valid and any new operations needs to query the thin-arbiter if required. When SHD completes healing all the files without any failure, it will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on TA to see whether there are any new failures happened by that time. If there are new failures marked on TA, SHD will start the crawl immediately to heal those failures as well. If there are no new failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets the xattrs on TA, so that both the data bricks will be considered as good there after. >Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1 >fixes: bz#1579788 >Signed-off-by: karthik-us <ksubrahm@redhat.com> (cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c) Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1 fixes: bz#1648205
* cluster/ec: prevent infinite loop in self-heal fullXavi Hernandez2018-11-291-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a problem in commit 7f81067 that caused infinite loop when full heal was triggered. The previous commit was made to prevent self-heal to go idle after a replace brick operation. One of the changes consisted on setting a flag to force an immediate scan of the dirty directory if a heal on a directory succeeded (assuming it could have generated newer entries). However that change was causing an issue with a full self-heal, since every time an already healed directory was checked and it returned suceessfully, it was also setting the flag, forcing self-heal to start over again. This patch fixes this issue by only setting the flag if the heal is not full. It's assumed that a full self-heal will already traverse all entries automatically, so there's no need to force a new scan later. >Change-Id: Id12dbfc04e622b18183e796cc6cc87ccc30a6d55 >fixes: bz#1636631 >Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> (cherry picked from commit 7150c51ad75ccba22045a35fc31e5037612d1ad4) Change-Id: Id12dbfc04e622b18183e796cc6cc87ccc30a6d55 fixes: bz#1651525 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* glfsheal: add a '--nolog' flagRavishankar N2018-11-284-17/+56
| | | | | | | | | | | | | | | | | | | | | | ....and if set, change the log level to GF_LOG_NONE. This is useful for monitoring applications which invoke the heal info set of commands once every minute, leading to un-necessary glfsheal* logs in /var/log/glusterfs/. For example, we can now run `gluster volume heal <VOLNAME> info --nolog` `gluster volume heal <VOLNAME> info split-brain --nolog` etc. The default log level is still retained at GF_LOG_INFO. The patch also changes glfsheal internally to accept '--xml' instead of 'xml'. Note: The --nolog flag is *not* displayed in the help anywhere, for the sake of consistency in how the other flags are not displayed anywhere in the help. fixes: bz#1654236 Change-Id: Ia08b6aa6e4a0548379db7e313dd4411ebc66f206 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit fc9889d0373c323aab0d93f8ca31d2d8151bd041)
* doc: Release notes for 5.1 release of GlusterFSv5.1ShyamsundarR2018-11-141-0/+51
| | | | | | Change-Id: I526bf9bfd889dd7aea19f71059042cd9a993e1d0 Signed-off-by: ShyamsundarR <srangana@redhat.com> Fixes: bz#1640685
* cluster/ec: Change log level to DEBUG for lookup combineAshish Pandey2018-11-141-1/+1
| | | | | | | | | | | | | | As lookup is not a locked fop, we can not trust the data received in this to be same. Changing the log level to DEBUG in case lookup finds any difference. (cherry picked from commit 9be6bf3d90e3783b3ba559c93d41b933f8d53f03) Change-Id: I39499c44688a2455c7c6c69a798762d045d21b39 updates: bz#1644622 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* gfapi: fix bad dict setting of lease-idKinglong Mee2018-11-121-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lease_id is a 16 bits opaque data, copying it by gf_strdup is wrong. Invalid read of size 2 at 0x483FA2F: memmove (vg_replace_strmem.c:1270) by 0xE2EF6FB: ??? (in /usr/lib64/libtirpc.so.3.0.0) by 0xE2EE047: xdr_opaque (in /usr/lib64/libtirpc.so.3.0.0) by 0x107A97DC: xdr_gfx_value (glusterfs4-xdr.c:207) by 0x107A98C0: xdr_gfx_dict_pair (glusterfs4-xdr.c:321) by 0xE2EF35E: xdr_array (in /usr/lib64/libtirpc.so.3.0.0) by 0x107A9A89: xdr_gfx_dict (glusterfs4-xdr.c:335) by 0x107AA97B: xdr_gfx_write_req (glusterfs4-xdr.c:897) by 0x107A181E: xdr_serialize_generic (xdr-generic.c:25) by 0x231044A2: client_submit_request (client.c:205) by 0x2314D3C1: client4_0_writev (client-rpc-fops_v2.c:3863) by 0x230FD5FA: client_writev (client.c:956) Address 0xad659e18 is 72 bytes inside a block of size 73 alloc'd at 0x483880B: malloc (vg_replace_malloc.c:299) by 0x106BA7EC: __gf_malloc (mem-pool.c:136) by 0x1064521E: gf_strndup (mem-pool.h:166) by 0x1064521E: gf_strdup (mem-pool.h:183) by 0x1064521E: get_fop_attr_thrd_key (glfs.c:627) by 0x1064D8E9: glfs_pwritev@@GFAPI_3.4.0 (glfs-fops.c:1154) by 0x10610C0C: glusterfs_write2 (handle.c:2092) by 0x54D30C: mdcache_write2 (mdcache_file.c:647) by 0x48A3FC: nfs4_write (nfs4_op_write.c:459) by 0x48A44D: nfs4_op_write (nfs4_op_write.c:487) by 0x4634F5: nfs4_Compound (nfs4_Compound.c:947) by 0x460155: nfs_rpc_process_request (nfs_worker_thread.c:1329) by 0x4608A3: nfs_rpc_valid_NFS (nfs_worker_thread.c:1539) by 0x488F12F: svc_vc_decode (svc_vc.c:825) Backport of: > Patch: https://review.gluster.org/21586/ > BUG: bz#1647651 > Change-Id: Ib9fff55c897bc43c15036a869888e763df133757 > Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> (cherry picked from commit 6d4cd8ce6c0d88d331ffed97c51d3061a3900561) Updates bz#1648923 Change-Id: Ib9fff55c897bc43c15036a869888e763df133757 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* geo-rep: Add more intelligence to automatic error handlingKotresh HR2018-11-091-24/+46
| | | | | | | | | | | | | | | | | | | | | Geo-rep's automatic error handling does gfid conflict resolution. But if there are ENOENT errors because the parent is not synced to slave, it doesn' handle them. This patch adds the intelligence to create missing parent directories on slave. It can create the missing directories upto the depth of 10. Backport of: > Patch: https://review.gluster.org/21498/ > fixes: bz#1643402 > Change-Id: Ic97ed1fa5899c087e404d559e04f7963ed7bb54c > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 19775e0445411cca9ddd9d294fd54d0b6fbe6a03) fixes: bz#1646896 Change-Id: Ic97ed1fa5899c087e404d559e04f7963ed7bb54c Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd: ensure volinfo->caps is set to correct value.Sanju Rakonde2018-11-092-0/+11
| | | | | | | | | | | | | | | | | | | | | | With the commit febf5ed4848, during the volume create op, we are setting volinfo->caps to 0, only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. With this patch, we set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. (as we do earlier without commit febf5ed4848). > BUG: bz#1635820 > Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit aae1c402b74fd02ed2f6473b896f108d82aef8e3) fixes: bz#1647968 Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* all: fix the format string exceptionsAmar Tumballi2018-11-0940-118/+126
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1647666 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/locks:Use pthread_mutex_unlock() instead of pthread_mutex_lock()Vijay Bellur2018-11-091-1/+1
| | | | | | | | | Fixes CID 1396581 Change-Id: Ic04091b5783a75d8e1e605a9c1c28b77fea048d3 updates: bz#1647962 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Susant Palai <spalai@redhat.com>
* lock: Do not allow meta-lock count to be more than oneSusant Palai2018-11-091-1/+34
| | | | | | | | | | | | | | | | | | | | In the current scheme of glusterfs where lock migration is experimental, (ideally) the rebalance process which is migrating the file should request for a metalock. Hence, the metalock count should not be more than one for an inode. In future, if there is a need for meta-lock from other clients, this patch can be reverted. Since pl_metalk is called as part of setxattr operation, any client process(non-rebalance) residing outside trusted network can exhaust memory of the server node by issuing setxattr repetitively on the metalock key. The current patch makes sure that more than one metalock cannot be granted on an inode. Fixes CVE-2018-14660 updates: bz#1647962 Change-Id: Ie1e697766388718804a9551bc58351808fe71069 Signed-off-by: Susant Palai <spalai@redhat.com>
* server: don't allow '/' in basenameAmar Tumballi2018-11-082-5/+19
| | | | | | | | | | | | | | | | Server stack needs to have all the sort of validation, assuming clients can be compromized. It is possible for a compromized client to send basenames with paths with '/', and with that create files without permission on server. By sanitizing the basename, and not allowing anything other than actual directory as the parent for any entry creation, we can mitigate the effects of clients not able to exploit the server. Fixes: CVE-2018-14651 Fixes: bz#1647663 Change-Id: I5dc0da0da2713452ff2b65ac2ddbccf1a267dc20 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/afr : Check for UP bricks before starting healAshish Pandey2018-11-083-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently for replica volume, even if only one brick is UP SHD will keep crawling index entries even if it can not heal anything. In thin-arbiter volume which is also a replica 2 volume, this causes inode lock contention which in turn sends upcall to all the clients to release notify locks, even if it can not do anything for healing. This will slow down the client performance and kills the purpose of keeping in memory information about bad brick. Solution: Before starting heal or even crawling, check if sufficient number of children are UP and available to check and heal entries. (cherry picked from commit f73b4476b15f9d6d3dc3c8e20c9742aacd857f9f) Change-Id: I011c9da3b37cae275f791affd56b8f1c1ac9255d updates: bz#1644645 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* glusterd: allow shared-storage to use bricks under glusterd working directorySanju Rakonde2018-11-086-11/+15
| | | | | | | | | | | | | | | | | | | | With commit 44e4db, we are not allowing user to create a volume using glusterd's working directory as a brick or any sub directory under glusterd's working directory as a brick.This has broken shared-storage since the volume "gluster-shared-storage" is created using the bricks under glusterd's working directory. With this patch, we let the "gluster-shared-storage" volume to use bricks under glusterd's working directory. > BUG: bz#1647029 > Change-Id: Ifcbcf4576eea12cf46f199dea287b29bd3ec3bfd > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit bdb4ca184913c82ccf9552298f5d5b597794f2aa) fixes: bz#1647801 Change-Id: Ifcbcf4576eea12cf46f199dea287b29bd3ec3bfd Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* io-stats: prevent taking file dump on server sideAmar Tumballi2018-11-081-0/+9
| | | | | | | | | | | | By allowing clients taking dump in a file on brick process, we are allowing compromised clients to create io-stats dumps on server, which can exhaust all the available inodes. Fixes: CVE-2018-14659 Fixes: bz#1647665 Change-Id: I32bfde9d4fe646d819a45e627805b928cae2e1ca Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd-handshake: prevent a buffer overflowAmar Tumballi2018-11-081-0/+7
| | | | | | | | | | | | as key size in xdr can be anything, it can be bigger than the 'NAME_MAX' allowed in the structure, which can allow for service denial attacks. Fixes: CVE-2018-14653 Fixes: bz#1647664 Change-Id: I2dc5e99af27ddf44c12c94b07e51adb8674cce80 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* protocol: remove the option 'verify-volfile-checksum'Amar Tumballi2018-11-083-354/+5
| | | | | | | | | | | | | | | | 'getspec' operation is not used between 'client' and 'server' ever since we have off-loaded volfile management to glusterd, ie, at least 7 years. No reason to keep the dead code! The removed option had no meaning, as glusterd didn't provide a way to set (or unset) this option. So, no regression should be observed from any of the existing glusterfs deployment, supported or unsupported. Updates: CVE-2018-14653 Updates: bz#1647664 Change-Id: I4a2e0f673c5bcd4644976a61dbd2d37003a428eb Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.tSanju Rakonde2018-11-081-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has optimised glusterd test cases by clubbing the similar test cases into a single test case. https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t test case has been deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t In the original test case, we create a volume with two bricks, each on a separate node(N1 & N2). From another node in cluster(N3), we try to detach a node which is hosting bricks. It fails. In the new test, we created volume with single brick on N1. and from another node in cluster, we tried to detach N1. we expect peer detach to fail, but peer detach was success as the node is hosting all the bricks of volume. Now, changing the new test case to cover the original test case scenario. Please refer https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to understand why the new test case is not failing in centos-regression. > BUG: bz#1642597 > Change-Id: Ifda12b5677143095f263fbb97a6808573f513234 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 0ca6773eaf5aeb507ebc72d2c2f61902eeff414c) fixes: bz#1643078 Change-Id: Ifda12b5677143095f263fbb97a6808573f513234 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* geo-rep/scripts: Fix traceback in gluster-mountbrokerKotresh HR2018-11-081-1/+1
| | | | | | | | | | | | | | | | When 'gluster-mountbroker status' was issued, it crashes in a corner case with 'str object has not attribute get'. Fixed the same. Backport of: > BUG: 1643929 > Signed-off-by: Kotresh HR <khiremat@redhat.com> > Change-Id: Iaf1a937ed0136b3b2058230c75fa89a215d8a5eb (cherry picked from commit 5987b3388126a3c5e77481913cbaa4142117d19a) fixes: bz#1644515 Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: Iaf1a937ed0136b3b2058230c75fa89a215d8a5eb
* posix/ctime: Avoid log flood in posix_update_utime_in_mdataKotresh HR2018-11-081-4/+0
| | | | | | | | | | | | | | | | | | posix_update_utime_in_mdata() unconditionally logs an error if consistent time attributes features is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch. Backport of: > Patch: https://review.gluster.org/21520/ > BUG: 1644129 > Change-Id: I9a1f9e7ada3366d2830f18d81f16a1461040092e > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1644526 Change-Id: I9a1f9e7ada3366d2830f18d81f16a1461040092e Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 to python3 compat - schedulerKotresh HR2018-11-082-2/+2
| | | | | | | | | | | | | | | | 1. scheduler - Popen 2. syncdutils - corner case on failure Backport of: > Patch: https://review.gluster.org/21505 > BUG: 1643932 > Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 33e96100e17e9a293db6d63d9d5449d6c2d69376) fixes: bz#1644514 Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* build/packaging: el-X (x > 7) ismsKaleb S. KEITHLEY2018-11-081-3/+8
| | | | | | | | | | | | | | | | | lvm2(-devel) 2.03.00 no longer has liblvm2app.so. (I expect a similar change in fedora-30 before too much longer, but for now fedora-30 still has lvm2 and lvm2-devel 2.02.181 rpcgen has been removed from glibc-common and unbundled rpcgen is now required. And I guess nobody has ever built rpms with '--without bd' or we would have discovered the attempted inclusion of .../storage/bd.so in the rpm when it hadn't actually been built. Change-Id: I71e26c3d06af5d329ae89cc249a4ad88664ddf53 updates: bz#1644314 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix issue in gfid-conflict-resolutionKotresh HR2018-11-081-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During gfid-conflict-resolution, geo-rep crashes with 'ValueError: list.remove(x): x not in list' Cause and Analysis: During gfid-conflict-resolution, the entry blob is passed back to master along with additional information to verify it's integrity. If everything looks fine, the entry creation is ignored and is deleted from the original list. But it is crashing during removal of entry from the list saying entry not in list. The reason is that the stat information in the entry blob was modified and sent back to master if present. Fix: Send back the correct stat information for gfid-conflict-resolution. Backport of: > BUG: bz#1642865 > Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit ff18121945bff394f3234e9f1a9d61ac97d4d493) fixes: bz#1644158 Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cliutils: python2 to python3 compatKotresh HR2018-11-081-1/+2
| | | | | | | | | | | | | | | Make Popen py2 and py3 compatiable Backport of: > BUG: 1643935 > Change-Id: Ife34cb38024dcdc0420436e7d76fd208223f9d86 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit bae584148761ad98cd3d5c380f8cea1ff83aa8c3) fixes: bz#1644161 Change-Id: Ife34cb38024dcdc0420436e7d76fd208223f9d86 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* afr/lease: Read child nodes from lease structureroot2018-11-081-1/+1
| | | | | | | | | | For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum. Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* index: prevent arbitrary file creation outside entry-changes folderRavishankar N2018-11-051-0/+17
| | | | | | | | | | | | | | | | | | | | | Patch in master: https://review.gluster.org/#/c/glusterfs/+/21534/ Problem: A compromised client can set arbitrary values for the GF_XATTROP_ENTRY_IN_KEY and GF_XATTROP_ENTRY_OUT_KEY during xattrop fop. These values are consumed by index as a filename to be created/deleted according to the key. Thus it is possible to create/delete random files even outside the gluster volume boundary. Fix: Index expects the filename to be a basename, i.e. it must not contain any pathname components like "/" or "../". Enforce this. Fixes: CVE-2018-14654 Fixes: bz#1646204 Change-Id: I35f2a39257b5917d17283d0a4f575b92f783f143 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* leases:Mark the fop conflicting if lease_id not setSoumya Koduri2018-10-251-5/+8
| | | | | | | | | | | | | | | | | Glusterfs leases expects lease_id to be set and sent for each fop to determine conflict resolution with the existing lease. Incase if not set (most likely if there is an older client in a mixed cluster), it makes sense to consider it as conflicitng fop and recall the lease. Also fixed the return status check for __remove_lease(), wherein non-negative value is considered as success case. Change-Id: I5bcfba4f7c71a5af7cdedeb03436d0b818e85783 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit cf5b13896d65b6916634976a3a5f61ddeefbc19c)
* tests: check for shd up status in bug-1637802-arbiter-stale-data-heal-lock.tRavishankar N2018-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing this .t spuriously. On checking one of the failure logs, I see: 22:05:44 Launching heal operation to perform index self heal on volume patchy has been unsuccessful: 22:05:44 Self-heal daemon is not running. Check self-heal daemon log file. 22:05:44 not ok 20 , LINENUM:38 In glusterd log: [2018-10-18 22:05:44.298832] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Self-heal daemon is not running. Check self-heal daemon log file But the tests which preceed this check whether via a statedump if the shd is conected to the bricks, and they have succeeded and even started healing. From glustershd.log: [2018-10-18 22:05:40.975268] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1 sinks=2 So the only reason I can see launching heal via cli failing is a race where shd has been spawned but glusterd has not yet updated in-memory that it is up, and hence failing the CLI. Fix: Check for shd up status before launching heal via CLI Change-Id: Ic88abf14ad3d51c89cb438db601fae4df179e8f4 fixes: bz#1641872 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 3dea105556130abd4da0fd3f8f2c523ac52398d1)
* features/shard: Hold a ref on base inode when adding a shard to lru listKrutika Dhananjay2018-10-256-21/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: > Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > (cherry picked from commit e627977) > BUG: 1605056 In __shard_update_shards_inode_list(), previously shard translator was not holding a ref on the base inode whenever a shard was added to the lru list. But if the base shard is forgotten and destroyed either by fuse due to memory pressure or due to the file being deleted at some point by a different client with this client still containing stale shards in its lru list, the client would crash at the time of locking lru_base_inode->lock owing to illegal memory access. So now the base shard is ref'd into the inode ctx of every shard that is added to lru list until it gets lru'd out. The patch also handles the case where none of the shards associated with a file that is about to be deleted are part of the LRU list and where an unlink at the beginning of the operation destroys the base inode (because there are no refkeepers) and hence all of the shards that are about to be deleted will be resolved without the existence of a base shard in-memory. This, if not handled properly, could lead to a crash. Change-Id: Ic15ca41444dd04684a9458bd4a526b1d3e160499 updates: bz#1641440 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* storage/posix: Do not fail entry creation fops if gfid handle already existsKrutika Dhananjay2018-10-222-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: > Change-Id: I84a5e54d214b6c47ed85671a880bb1c767a29f4d > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> > (cherry picked from commit 15c9976) > BUG: 1638453 PROBLEM: tests/bugs/shard/bug-1251824.t fails occasionally with EIO due to gfid mismatch across replicas on the same shard when dd is executed. CAUSE: Turns out this is due to a race between posix_mknod() and posix_lookup(). posix mknod does 3 operations, among other things: 1. creation of the entry itself under its parent directory 2. setting the gfid xattr on the file, and 3. creating the gfid link under .glusterfs. Consider a case where the thread doing posix_mknod() (initiated by shard) has executed steps 1 and 2 and is on its way to executing 3. And a parallel LOOKUP from another thread on noting that loc->inode->gfid is NULL, tries to perform gfid_heal where it attempts to create the gfid link under .glusterfs and succeeds. As a result, posix_gfid_set() through MKNOD (step 3) fails with EEXIST. In the older code, MKNOD under such conditions was NOT being treated as a failure. But commit e37ee6d changes this behavior by failing MKNOD, causing the entry creation to be undone in posix_mknod() (it's another matter that the stale gfid handle gets left behind if lookup has gone ahead and gfid-healed it). All of this happens on only one replica while on the other MKNOD succeeds. Now if a parallel write causes shard translator to send another MKNOD of the same shard (shortly after AFR releases entrylk from the first MKNOD), the file is created on the other replica too, although with a new gfid (since "gfid-req" that is passed now is a new UUID. This leads to a gfid-mismatch across the replicas. FIX: The solution is to not fail MKNOD (or any other entry fop for that matter that does posix_gfid_set()) if the .glusterfs link creation fails with EEXIST. Change-Id: I84a5e54d214b6c47ed85671a880bb1c767a29f4d fixes: bz#1641429 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* doc: Update release 5.0 release notesv5.0ShyamsundarR2018-10-181-3/+289
| | | | | | | | | - Added missing options - Added bugs fixed Change-Id: I3b788a093bc00fb977f792ce535ed50cc0cd9c9e Updates: bz#1628620 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* gfapi: Bug fixes in leases processing code-pathSoumya Koduri2018-10-187-23/+64
| | | | | | | | | | | | | | | | | This patch fixes below issues in gfapi lease code-path * 'glfs_setfsleasid' should allow NULL input to be able to reset leaseid * Applications should be allowed to (un)register for upcall notifications of type GLFS_EVENT_LEASE_RECALL * APIs added to read contents of GLFS_EVENT_LEASE_RECALL argument which is of type "struct glfs_upcall_lease" This is backport of the below mainline patch - https://review.gluster.org/#/c/glusterfs/+/21391 Change-Id: I3320ddf235cc82fad561e13b9457ebd64db6c76b updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* core: libuuid-devel breakageKaleb S. KEITHLEY2018-10-182-11/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The #include "uuid.h" left over from using .../contrib/uuid is debatably incorrect now that we use the "system header" file /usr/include/uuid/uuid.h from libuuid-devel. Unfortunately this is complicated by things like FreeBSD having its own /usr/include/uuid.h, and the e2fsprogs-libuuid uuid.h in installed - as most third-party packages in FreeBSD are - in /usr/local as /usr/local/include/uuid/uuid.h With a system header file it should at least be #include <uuid.h>, and even better as #include <uuid/uuid.h>, much like the way <sys/types.h> and <net/if.h> are included. Using #include <uuid/uuid.h> guarantees not getting the /usr/include/uuid.h on FreeBSD, but clang/cc knows to find "system" header files like this in /usr/local/include; with or without the -I/... from uuid.pc. Also using #include "uuid.h" leaves the compiler free to find a uuid.h from any -I option it might be passed. (Fortunately we don't have any at this time.) As we now require libuuid-devel or e2fsprogs-libuuid and configure will exit with an error if the uuid.pc file doesn't exist, the HAVE_LIBUUID (including the #elif FreeBSD) tests in compat-uuid.h are redundant. We are guaranteed to have it, so testing for it is a bit silly IMO. It may also break building third party configure scripts if they omit defining it. (Just how hard do we want to make things for third party developers?) Change-Id: I7317f63c806281a5d27de7d3b2208d86965545e1 updates: bz#1639688 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* debug/io-stats: io stats filenames contain garbageN Balachandran2018-10-181-4/+23
| | | | | | | | | | | | | | | | As dict_unserialize does not null terminate the value, using snprintf adds garbage characters to the buffer used to create the filename. The code also used this->name in the filename which will be the same for all bricks for a volume. The files were thus overwritten if a node contained multiple bricks for a volume. The code now uses the conf->unique instead if available. Change-Id: I2c72534b32634b87961d3b3f7d53c5f2ca2c068c fixes: bz#1640392 Signed-off-by: N Balachandran <nbalacha@redhat.com> (cherry picked from commit 219cd649fdbd7bfd6c2268a0a4f66bcc15918e31)
* api: fill out attribute information if not validRaghavendra Gowdappa2018-10-173-2/+147
| | | | | | | | | | | | | | | | | | translators like readdir-ahead selectively retain entry information of iatt (gfid and type) when rest of the iatt is invalidated (for write invalidating ia_size, (m)(c)times etc). Fuse-bridge uses this information and sends only entry information in readdirplus response. However such option doesn't exist in gfapi. This patch modifies gfapi to populate the stat by forcing an extra lookup. Thanks to Shyamsundar Ranganathan <srangana@redhat.com> and Prashanth Pai <ppai@redhat.com> for tests. Change-Id: Ieb5f8fc76359c327627b7d8420aaf20810e53000 Fixes: bz#1630804 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit 6257276d9de3f15643f159b2ec627a67c84fc23d)
* afr: fix incorrect reporting of directory split-brainRavishankar N2018-10-119-17/+85
| | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21135/ Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 fixes: bz#1638163 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: prevent winding inodelks twice for arbiter volumesRavishankar N2018-10-112-1/+45
| | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21380/ Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1638159 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cliutils : python2 to python3 compatSunny Kumar2018-10-101-13/+13
| | | | | | | | | | | This patch fixes import issue in cliutils. Provided solution is to use relative import. Change-Id: I14c9a0b528ef52e7c91f6b17b569c68c2ced8912 updates: #411 Signed-off-by: Sunny Kumar <sunkumar@redhat.com> (cherry picked from commit 8d4c5e022bba1b99786ce13f407c27024beccc23)