path: root/xlators
Commit message | Author | Age | Files | Lines
* glusterd: Free up svc->conn on volume delete | Atin Mukherjee | 2017-12-12 | 1 | -0/+4
  Daemons like snapd, tierd and gfproxyd are maintained on a per-volume basis, and on a volume delete we should destroy the rpc connection established for them.
  > mainline patch : https://review.gluster.org/#/c/18957/
  Change-Id: Id1440e39da07b990fdb9b207df18da04b1ca8014
  BUG: 1523048
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  (cherry picked from commit 36ce4c614a3391043a3417aa061d0aa16e60b2d3)
* Disable gfid2path by default on NetBSD | Emmanuel Dreyfus | 2017-12-08 | 1 | -0/+11
  NetBSD storage of extended attributes for UFS1 scales badly as the list of extended attribute names grows. gfid2path can add as many extended attribute names as we have files, hence we keep it disabled for performance's sake.
  > Change-Id: Id77b5f5ceb4d5eba1b3362b4b9fc693450ffbc2b
  > Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  > BUG: 1129939
  Change-Id: Id77b5f5ceb4d5eba1b3362b4b9fc693450ffbc2b
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  BUG: 1513258
* cluster/dht: don't overfill the buffer in readdir(p) | Raghavendra G | 2017-12-08 | 1 | -3/+18
  Superfluous dentries that cannot fit in the buffer size provided by the kernel are thrown away by fuse-bridge. This means:
  * the next readdir(p) seen by readdir-ahead would have the offset of a dentry returned in a previous readdir(p) response. When readdir-ahead detects a non-monotonic offset it turns itself off, which can result in poor readdir performance.
  * readdirp can be cpu-intensive on the brick, and there is no point in reading all those dentries just to have them thrown away by fuse-bridge.
  So the best strategy is to fill the buffer optimally: neither overfill nor underfill.
  > Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
  > BUG: 1492625
  > Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  (cherry picked from commit e785faead91f74dce7c832848f2e8f3f43bd0be5)
  Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
  BUG: 1478411
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
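The "neither overfill nor underfill" idea can be sketched as follows. This is only an illustration, not the dht code: `struct sketch_dentry` and `sketch_entries_that_fit` are hypothetical names, and the assumption is that each entry's size in the reply is known up front.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory entry: only the number of bytes
 * it would occupy in the readdir(p) reply matters for this sketch. */
struct sketch_dentry {
    size_t reply_size;
};

/* Return how many leading entries fit into a reply buffer of buf_size
 * bytes. The loop stops before the first entry that would overflow, so
 * nothing is read only to be discarded later. */
size_t sketch_entries_that_fit(const struct sketch_dentry *entries,
                               size_t count, size_t buf_size)
{
    size_t used = 0; /* invariant: used <= buf_size */
    size_t n = 0;

    assert(entries != NULL || count == 0);

    while (n < count && entries[n].reply_size <= buf_size - used) {
        used += entries[n].reply_size;
        n++;
    }
    return n;
}
```

A caller would send back only the first `n` entries and resume the next readdir(p) from the offset of entry `n`, which keeps the offsets seen by the upper layers monotonic.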
* cluster/dht: populate inode in dentry for single subvolume dht | Raghavendra G | 2017-12-06 | 2 | -1/+69
  ... in readdirp response if dentry points to a directory inode. This is a special case where the entire layout is stored in one single subvolume, and hence there is no need for a lookup to construct the layout.
  > Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
  > BUG: 1492625
  > Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  (cherry picked from commit 59d1cc720f52357f7a6f20bb630febc6a622c99c)
  Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
  BUG: 1478411
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: display gluster volume status, when quorum type is server | Sanju Rakonde | 2017-11-30 | 1 | -0/+6
  Problem: when server-quorum-type is server, after restarting glusterd on the node which is up, gluster volume status gives incorrect information.
  Fix: check whether server is blank before adding other keys into the dictionary.
  Change-Id: I926ebdffab330ccef844f23f6d6556e137914047
  BUG: 1511782
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  (cherry picked from commit 046c7e3199fca715592762e271e6061ac99b0c4b)
* cluster/afr: Honor default timeout of 5min for analyzing split-brain files | karthik-us | 2017-11-30 | 1 | -1/+5
  Problem: Setting the split-brain-choice option to analyze a file in split brain, using the command "setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>", should allow access to the file from the mount for the default timeout of 5 minutes. But the timeout was not honored, and the file remained accessible even after the timeout.
  Fix: Call inode_invalidate() in afr_set_split_brain_choice_cbk() so that it triggers the cache invalidation after resetting the timer and the split-brain choice. The next calls to access the file will then fail with EIO.
  Change-Id: I698cb833676b22ff3e4c6daf8b883a0958f51a64
  BUG: 1514380
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
  (cherry picked from commit 933ec57ccda2c1ba5ce6f207313c3b6802e67ca3)
* features/locks: Fix memory leaks | Xavier Hernandez | 2017-11-30 | 5 | -5/+11
  Backport of:
  > BUG: 1515161
  Change-Id: Ic1d2e17a7d14389b6734d1b88bd28c0a2907bbd6
  BUG: 1517689
  Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/dht: make rebalance use truncate in case | Susant Palai | 2017-11-23 | 3 | -71/+99
  ... the brick file system does not support fallocate.
  > Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
  > BUG: 1488103
  > Signed-off-by: Susant Palai <spalai@redhat.com>
  Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
  BUG: 1516691
  Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Don't set ACLs on linkto file | N Balachandran | 2017-11-20 | 1 | -0/+11
  The trusted.SGI_ACL_FILE xattr appears to set posix ACLs on the linkto file that is a target of file migration. This can mess up file permissions and cause linkto identification to fail. Now we remove all ACL xattrs from the results of the listxattr call on the source before setting them on the target.
  > BUG: 1514329
  > Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Change-Id: I56802dbaed783a16e3fb90f59f4ce849f8a4a9b4
  BUG: 1515042
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
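Removing names from a listxattr result can be sketched as an in-place filter over the NUL-separated name list that listxattr(2) produces. The predicate below (any name containing "acl" or "ACL") is an assumption made for illustration; the commit message does not say which names the real code treats as ACL xattrs, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed predicate: a name counts as an ACL xattr if it contains
 * "acl" or "ACL". The real dht check may differ. */
static bool sketch_is_acl_name(const char *name)
{
    return strstr(name, "acl") != NULL || strstr(name, "ACL") != NULL;
}

/* Compact a listxattr-style buffer (names separated and terminated by
 * NUL bytes) in place, dropping ACL names. Returns the new length. */
size_t sketch_strip_acl_names(char *list, size_t len)
{
    size_t in = 0;
    size_t out = 0;

    assert(list != NULL || len == 0);
    /* The buffer must end with a NUL so strlen() stays inside it. */
    assert(len == 0 || list[len - 1] == '\0');

    while (in < len) {
        size_t n = strlen(list + in) + 1; /* name plus its NUL */
        if (!sketch_is_acl_name(list + in)) {
            memmove(list + out, list + in, n);
            out += n;
        }
        in += n;
    }
    return out;
}
```

The filtered list would then drive the per-name copy of xattrs from the migration source to the target.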
* glusterd: restart the brick if quorum status is NOT_APPLICABLE_QUORUM | Atin Mukherjee | 2017-11-10 | 1 | -1/+2
  If a volume does not have server quorum enabled and, in a trusted storage pool, all the glusterd instances on other peers are down, then on restarting glusterd the brick start trigger doesn't happen, resulting in the brick not coming up.
  > mainline patch : https://review.gluster.org/#/c/18669/
  Change-Id: If1458e03b50a113f1653db553bb2350d11577539
  BUG: 1511301
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  (cherry picked from commit 635c1c3691a102aa658cf1219fa41ca30dd134ba)
* md-cache: avoid checking the xattr value buffer with string functions. | Günther Deschner | 2017-11-09 | 1 | -1/+1
  xattrs may very well contain binary, non-text data with leading 0 values. Using strcmp for checking empty values is not the appropriate thing to do: in the best case, it might prevent a binary xattr value starting with 0 from being cached (and hence also from being reported back with xattr). In the worst case, we might read beyond the end of a data blob that does not contain any zero byte.
  We fix this by checking the length of the data blob and checking the first byte against 0 if the length is one.
  > Signed-off-by: Guenther Deschner <gd@samba.org>
  > Pair-Programmed-With: Michael Adam <obnox@samba.org>
  > Change-Id: If723c465a630b8a37b6be58782a2724df7ac6b11
  > BUG: 1476324
  > Reviewed-on: https://review.gluster.org/17910
  > Reviewed-by: Michael Adam <obnox@samba.org>
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Poornima G <pgurusid@redhat.com>
  > Tested-by: Poornima G <pgurusid@redhat.com>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit ab4ffdac9dec1867f2d9b33242179cf2b347319d)
  Change-Id: If723c465a630b8a37b6be58782a2724df7ac6b11
  BUG: 1499892
  Signed-off-by: Günther Deschner <gd@samba.org>
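A length-based emptiness check of the kind described might look like the sketch below. The function name is hypothetical, and treating a zero-length blob as empty is an added assumption; the commit message only mentions the length-one case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether an xattr value is "empty" without any string function.
 * Only the length and, for a one-byte value, the first byte are
 * inspected, so a binary blob is never scanned for a terminator. */
bool sketch_xattr_value_is_empty(const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (len == 0) /* assumption: no bytes at all also counts as empty */
        return true;
    return len == 1 && data[0] == '\0';
}
```

With this shape, a four-byte binary value such as `{0, 1, 2, 3}` is not considered empty even though it starts with a zero byte, which is the case strcmp gets wrong.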
* glusterd : introduce timer in mgmt_v3_lock | Gaurav Yadav | 2017-11-06 | 4 | -17/+241
  Problem: In a multinode environment, if two op-sm transactions are initiated on one of the receiver nodes at the same time, glusterd may end up with a stale lock.
  Solution: During mgmt_v3_lock a registration is made with gf_timer_call_after, which releases the lock after a certain period of time.
  > mainline patch : https://review.gluster.org/#/c/18437/
  Change-Id: I16cc2e5186a2e8a5e35eca2468b031811e093843
  BUG: 1503239
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
* protocol/server: fix the comparison logic in case of subdir mount | Amar Tumballi | 2017-11-06 | 1 | -30/+30
  Without the fix, the stat entry on a file would return inode==1 for many files in case of a subdir mount.
  This happened because of confusion about the return value of 'gf_uuid_compare()': it behaves like strcmp rather than returning a gf_boolean, and hence resulted in the bug.
  Change-Id: I31b8cbd95eaa3af5ff916a969458e8e4020c86bb
  BUG: 1505527
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  (cherry picked from commit 2ade36cd98ea0f5bd2a8f619a19c20438318afaf)
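The strcmp-style pitfall can be illustrated with a memcmp-based stand-in. `sketch_uuid_compare` below is not the real `gf_uuid_compare()`; it only mimics the convention the commit message describes (0 means equal), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned char sketch_uuid_t[16];

/* strcmp-style comparison: returns 0 when the two ids are equal and a
 * non-zero value otherwise. */
int sketch_uuid_compare(const sketch_uuid_t a, const sketch_uuid_t b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof(sketch_uuid_t));
}

/* Boolean wrapper. Writing `if (sketch_uuid_compare(a, b))` to mean
 * "if equal" inverts the test, which is the kind of mistake described
 * above; the explicit `== 0` makes the intent visible. */
bool sketch_uuid_equal(const sketch_uuid_t a, const sketch_uuid_t b)
{
    return sketch_uuid_compare(a, b) == 0;
}
```

Wrapping the comparison in a predicate with a boolean name is one common way to keep call sites from misreading the return value.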
* protocol/client: handle the subdir handshake properly for add-brick | Amar Tumballi | 2017-11-06 | 1 | -1/+9
  The handshake for a subdir mount should be handled differently the first time than on subsequent graph changes.
  Change-Id: I2a7ba836433bb0a0f4a861809e2bb0d7fbc4da54
  BUG: 1505323
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  (cherry picked from commit 9aa574a51b84717c1f3949ed2e28a49e49840a93)
* md-cache: Use correct xattr keynames for virtual glusterfs ACLs. | Günther Deschner | 2017-11-06 | 1 | -2/+8
  The "glusterfs.posix_acl." prefix does not catch the glusterfs posix acl xattr keynames, which are
  * "glusterfs.posix.acl" and
  * "glusterfs.posix.default_acl"
  Using the GF_POSIX_ACL_ACCESS and GF_POSIX_ACL_DEFAULT defines directly is the safest option.
  Guenther
  > BUG: 1476295
  > Signed-off-by: Guenther Deschner <gd@samba.org>
  > Reviewed-on: https://review.gluster.org/17909
  > Reviewed-by: Michael Adam <obnox@samba.org>
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Niels de Vos <ndevos@redhat.com>
  > Tested-by: Niels de Vos <ndevos@redhat.com>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 5fe8555800cbc9818e7c976f63499795a378cd8d)
  > Signed-off-by: Günther Deschner <gd@samba.org>
  Change-Id: I5aba64b26b6cbec850ea02316dd9f069400e857f
  BUG: 1499889
  Signed-off-by: Günther Deschner <gd@samba.org>
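The mismatch is easy to reproduce in isolation. In the sketch below the two key strings are the ones quoted in the commit message; the `SKETCH_*` macros are local stand-ins rather than the real GF_POSIX_ACL_ACCESS and GF_POSIX_ACL_DEFAULT defines, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-ins for the key names quoted in the commit message. */
#define SKETCH_ACL_ACCESS  "glusterfs.posix.acl"
#define SKETCH_ACL_DEFAULT "glusterfs.posix.default_acl"
#define SKETCH_OLD_PREFIX  "glusterfs.posix_acl."

/* The old style of check: a prefix match that neither key satisfies,
 * because the keys use '.' where the prefix has '_'. */
bool sketch_matches_old_prefix(const char *key)
{
    assert(key != NULL);
    return strncmp(key, SKETCH_OLD_PREFIX, strlen(SKETCH_OLD_PREFIX)) == 0;
}

/* Exact comparison against the two key names. */
bool sketch_is_virtual_acl_key(const char *key)
{
    assert(key != NULL);
    return strcmp(key, SKETCH_ACL_ACCESS) == 0 ||
           strcmp(key, SKETCH_ACL_DEFAULT) == 0;
}
```

Comparing against the defines directly also means the check follows automatically if the key names ever change in one place.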
* glusterd: clean up portmap on brick disconnect | Atin Mukherjee | 2017-11-06 | 4 | -11/+46
  GlusterD's portmap entry for a brick is cleaned up when a PMAP_SIGNOUT event is initiated by the brick process at shutdown. But if the brick process crashes or gets killed through SIGKILL, this event is not initiated and glusterd ends up with a stale port. Since GlusterD's portmap traversal happens both ways, forward for allocation and backward for registry search, glusterd might end up running with a stale port for a brick, which eventually results in clients failing to connect to the bricks.
  The solution is to clean up the port entry, in case the process is down, as part of the brick disconnect event. Although this makes handling the PMAP_SIGNOUT event redundant in most cases, it is a safeguard against glusterd getting into stale port issues.
  > mainline patch : https://review.gluster.org/#/c/18541/
  Change-Id: I04c5be6d11e772ee4de16caf56dbb37d5c944303
  BUG: 1507747
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  (cherry picked from commit 30e0b86aae00430823f2523c6efa3c4ebbf0a478)
* glusterd: fix brick restart parallelism | Atin Mukherjee | 2017-11-06 | 6 | -32/+87
  glusterd's brick restart logic is not always sequential, as there are at least three different ways in which bricks are restarted:
  1. through friend-sm and glusterd_spawn_daemons ()
  2. through friend-sm and handling volume quorum action
  3. through friend handshaking when there is a mismatch on quorum on friend import.
  In a brick multiplexing setup, glusterd ended up trying to spawn the same brick process a couple of times, as two threads hit glusterd_brick_start () within a fraction of a millisecond of each other. glusterd had no way of rejecting either of them, because the brick start criteria were met in both cases.
  As a solution, it'd be better to control this madness with two different flags: one is a boolean called start_triggered, which indicates that a brick start has been triggered and stays true until the brick dies or is killed; the second is a mutex lock to ensure that for a particular brick we don't end up in glusterd_brick_start () more than once at the same point in time.
  Change-Id: I292f1e58d6971e111725e1baea1fe98b890b43e2
  BUG: 1508283
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  (cherry picked from commit 82be66ef8e9e3127d41a4c843daf74c1d8aec4aa)
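The flag-plus-mutex scheme can be sketched as below. Only the name `start_triggered` comes from the commit message; the struct, the other fields and the function names are hypothetical, and the real spawn is replaced by a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-brick state. */
struct sketch_brick {
    pthread_mutex_t restart_mutex; /* serialises start attempts */
    bool start_triggered;          /* true from trigger until the brick dies */
    int spawn_count;               /* stand-in for the actual spawn */
};

void sketch_brick_init(struct sketch_brick *b)
{
    assert(b != NULL);
    pthread_mutex_init(&b->restart_mutex, NULL);
    b->start_triggered = false;
    b->spawn_count = 0;
}

/* Returns true only for the caller that actually spawned the brick.
 * A second caller racing in finds the flag already set and backs off. */
bool sketch_brick_start(struct sketch_brick *b)
{
    bool spawned = false;

    assert(b != NULL);
    pthread_mutex_lock(&b->restart_mutex);
    if (!b->start_triggered) {
        b->start_triggered = true;
        b->spawn_count++;
        spawned = true;
    }
    pthread_mutex_unlock(&b->restart_mutex);
    return spawned;
}

/* Called when the brick process dies or is killed: allow a new start. */
void sketch_brick_died(struct sketch_brick *b)
{
    assert(b != NULL);
    pthread_mutex_lock(&b->restart_mutex);
    b->start_triggered = false;
    pthread_mutex_unlock(&b->restart_mutex);
}
```

Checking and setting the flag inside the same critical section is what closes the window in which two threads could both see the start criteria as met.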
* glusterd: delete source brick only once in reset-brick commit force | Atin Mukherjee | 2017-11-02 | 1 | -1/+1
  While stopping the brick which is to be reset and replaced, the delete_brick flag was passed as true, which caused glusterd to free up the source brick before the actual operation. This makes commit force fail because it cannot find the source brickinfo.
  > mainline patch : https://review.gluster.org/#/c/18581/
  Change-Id: I1aa7508eff7cc9c9b5d6f5163f3bb92736d6df44
  BUG: 1507877
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  (cherry picked from commit 0fb8acaa6ff80c43e46deac0ce66b29ae0df0ca4)
* glusterd: persist brickinfo's port change into glusterd's store | Gaurav Yadav | 2017-11-02 | 5 | -10/+61
  Problem: Consider a case where a node reboot is performed and, prior to the reboot, the brick was listening on 49153. Post reboot, glusterd assigned 49152 to the brick and started the brick process, but the new port was never persisted. Now when glusterd restarts, it always reads the port from its persisted store, i.e. 49153, whereas pmap signin happens with the correct port, i.e. 49152.
  Fix: Make sure that when glusterd_brick_start is called, glusterd_store_volinfo is eventually invoked.
  Change-Id: Ic0efbd48c51d39729ed951a42922d0e59f7115a1
  BUG: 1507748
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
* features/worm: Adding check to newloc when doing rename | luneo7 | 2017-10-25 | 1 | -2/+16
  Problem: Since rename didn't check whether newloc exists and what its retention state is, it was possible to rename a new file that wasn't in retention over an existing file that was in read-only state.
  Cherry picked from commit 00a4dc0:
  > Change-Id: I63c6bbabb7bb456ebedf201cc77b878ffda62229
  > BUG: 1484490
  > Signed-off-by: luneo7 <luneo7@gmail.com>
  > Reviewed-on: https://review.gluster.org/18104
  > Tested-by: jiffin tony Thottan <jthottan@redhat.com>
  > Tested-by: Prashanth Pai <ppai@redhat.com>
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Reviewed-by: Karthik U S <ksubrahm@redhat.com>
  > Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Change-Id: I63c6bbabb7bb456ebedf201cc77b878ffda62229
  BUG: 1484489
  Signed-off-by: luneo7 <luneo7@gmail.com>
* glusterd: documenting server.allow-insecure | Sanju Rakonde | 2017-10-25 | 1 | -1/+1
  Problem: "server.allow-insecure" is invisible in gluster volume set help.
  Fix: "server.allow-insecure" is defined as NO_DOC type; changing it to DOC type solves the problem.
  Change-Id: I327f1e4c1684ff846deb8b7df07d4d8a09073274
  BUG: 1505373
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  (cherry picked from commit c0b08f10ed07bfe06309e31a7fff85cadb733ce2)
* libgfchangelog: Fix possible null pointer dereference | Kotresh HR | 2017-10-25 | 1 | -6/+6
  If pthread_attr_init fails, gf_msg uses this->name where 'this' is not initialized yet. This patch fixes the same.
  > Change-Id: Ie004cbe1015a0d62fc3b5512e8954c5606eeeb5f
  > Signed-off-by: Kotresh HR <khiremat@redhat.com>
  > BUG: 1505325
  (cherry picked from commit 738c38f0efa7b4d4dab0cf23d00589d68e4eb88d)
  Change-Id: Ie004cbe1015a0d62fc3b5512e8954c5606eeeb5f
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  BUG: 1505856
* cluster/dht: fix crash when deleting directories | Zhang Huan | 2017-10-25 | 1 | -2/+4
  In DHT, after locks on all subvolumes are acquired, the following steps are performed sequentially:
  1. send remove dir to all subvolumes except the hashed one, in a loop;
  2. wait for all pending rmdir to be done;
  3. remove dir on the hashed subvolume.
  The problem is that in step 1 there is a check to skip the hashed subvolume in the loop. If the last subvolume to check is actually the hashed one, and step 3 completes quickly before that last, hashed subvolume is checked, then the check accesses shared context data that was destroyed in step 3, which causes a crash.
  Fix by saving the shared data in a local variable to access later in the loop.
  > BUG: 1490642
  > Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
  (cherry picked from commit 206120126d455417a81a48ae473d49be337e9463)
  Change-Id: I8db7cf7cb262d74efcb58eb00f02ea37df4be4e2
  BUG: 1505221
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
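The shape of the fix, copying what the loop needs out of the shared context before the first request goes out, can be sketched as follows. This is a single-threaded illustration with hypothetical names; winding an rmdir is replaced by recording the subvolume index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared context. In the scenario described above it may
 * be destroyed by a callback once the last non-hashed rmdir returns. */
struct sketch_rmdir_ctx {
    size_t subvol_count;
    size_t hashed_index;
};

/* Record which subvolumes would receive the rmdir in step 1. Both
 * fields are copied to locals up front, so no loop iteration reads ctx
 * after a request has been issued. */
size_t sketch_wind_rmdir_non_hashed(const struct sketch_rmdir_ctx *ctx,
                                    size_t *wound, size_t wound_cap)
{
    assert(ctx != NULL && wound != NULL);

    const size_t count = ctx->subvol_count;
    const size_t hashed = ctx->hashed_index;
    size_t n = 0;

    for (size_t i = 0; i < count && n < wound_cap; i++) {
        if (i == hashed) /* compares against the local copy, not ctx */
            continue;
        wound[n++] = i;  /* stand-in for winding rmdir to subvolume i */
    }
    return n;
}
```

The same rule applies to the loop bound: anything the loop condition reads has to live in a local too, otherwise the final iteration check is itself a use after free.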
* protocol-auth: use the proper validation method | Amar Tumballi | 2017-10-25 | 1 | -33/+7
  Currently, the server protocol's init and glusterd's option validation methods are different, causing an issue. They should be the same to get consistent behavior.
  Change-Id: Ibbf9a18c7192b2d77f9b7675ae7da9b8d2fe5de4
  BUG: 1501315
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* geo-rep: Filter out volume-mark xattr | Kotresh HR | 2017-10-23 | 1 | -0/+1
  The volume-mark xattr, maintained at the brick root of the slave volume, is specific to geo-replication and should be filtered out for all other clients. It should also be filtered out from list getxattr on all mounts, including the geo-rep mount, as it might cause rsync to read and set it.
  > Change-Id: If9eb5a3af18051083c853e70d93b2819e8eea222
  > BUG: 1500433
  > Signed-off-by: Kotresh HR <khiremat@redhat.com>
  (cherry picked from commit c64fd0d4b0ef313bb44aae68a376ec0c9ee8657a)
  Change-Id: If9eb5a3af18051083c853e70d93b2819e8eea222
  BUG: 1502104
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* mount/fuse : Fix parsing of vol_id for snapshot volume | Mohammed Rafi KC | 2017-10-15 | 1 | -2/+4
  To support sub-dir mount, we changed the volid, which means anything after a '/' in volume_id will be considered a sub-dir path.
  But a snapshot volume has a vol_id structure of /snaps/<volname>/<snapname>, which has to be taken into account during parsing.
  Note 1: sub-dir mount is not supported on snapshot volumes.
  Note 2: With the sub-dir mount changes, brick-based mount for quota cannot be executed via the mount command. It has to be a direct call via glusterfs.
  Backport of:
  > Change-Id: I0d824de0236b803db8a918f683dabb0cb523cb04
  > BUG: 1501235
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Change-Id: I0d824de0236b803db8a918f683dabb0cb523cb04
  BUG: 1501238
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  (cherry picked from commit 067f38063e13fc75d4e3f7adf93441d15099c557)
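The parsing rule can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and the assumption is that a snapshot vol_id can be recognised by its leading "/snaps/" component, as the structure quoted above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the sub-dir part of a volume id (starting at the
 * first '/'), or NULL when there is none. Ids of the form
 * /snaps/<volname>/<snapname> are treated as whole volume ids, since
 * sub-dir mount is not supported on snapshot volumes. */
const char *sketch_subdir_of_volume_id(const char *volume_id)
{
    assert(volume_id != NULL);

    if (strncmp(volume_id, "/snaps/", strlen("/snaps/")) == 0)
        return NULL;
    return strchr(volume_id, '/');
}
```

For example, `"vol0/dir/a"` yields `"/dir/a"`, while `"/snaps/vol0/snap1"` and a plain `"vol0"` both yield NULL.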
* cluster/ec: Improve performance with xattrop update | Sunil Kumar Acharya | 2017-10-12 | 1 | -24/+102
  Existing EC code updates the xattr on the subvolumes sequentially, resulting in very poor performance.
  With this fix EC now updates the xattr on the subvolumes in parallel, which improves xattr update performance.
  > BUG: 1445663
  > Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
  > Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
  BUG: 1499150
  Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
  Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* glusterd: Marking all the brick status as stopped when a process goes down in brick multiplexing | Sanju Rakonde | 2017-10-12 | 1 | -1/+58
  In a brick multiplexing environment, if a brick process goes down, e.g. if we kill it with SIGKILL, only the status of the brick for which the process first came up changes to stopped. All other brick statuses remain started. This happens because the process was killed abruptly with SIGKILL, so the signal handler wasn't invoked and further cleanup wasn't triggered.
  When we try to start a volume using force, it shows an error saying "Request timed out": since all the brickinfo->status values are still in started state, we wait for one of the brick processes to come up, which is never going to happen since the brick process was killed.
  To resolve this, in the disconnect event we check all the processes to find the one the disconnected brick belongs to. Once we have the process, we call a function named glusterd_mark_bricks_stopped_by_proc(), passing the brick_proc_t object as an argument.
  From the glusterd_brick_proc_t we can get all the bricks attached to that process, but these are duplicated ones. To get the original brickinfo we read volinfo from the brick. The volinfo holds the original brickinfo copies. We change brickinfo->status to stopped for all the bricks.
  > Change-Id: Ifb9054b3ee081ef56b39b2903ae686984fe827e7
  > BUG: 1499509
  > Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  > Reviewed-on: https://review.gluster.org/#/c/18444/
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > (cherry picked from commit 9422446d72bc054962d72ace9912ecb885946d49)
  Change-Id: Ifb9054b3ee081ef56b39b2903ae686984fe827e7
  BUG: 1501154
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* cluster/afr: Make choose-local "reconfigurable" | Krutika Dhananjay | 2017-10-12 | 1 | -0/+11
  Backport of:
  > Change-Id: Ibab292ba705d993b475cd0303fb3318211fb2500
  > Reviewed-on: https://review.gluster.org/18026
  > BUG: 1480525
  > cherry-picked from commit 1e2d6537875d16b783e3c50ada7ee61487c6d796
  With this change, enabling choose-local (which means its state transitions from "off" to "on") will take effect after the first gfid-lookup on "/" since volume-set was executed.
  Change-Id: Ibab292ba705d993b475cd0303fb3318211fb2500
  BUG: 1501022
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse/readdirp: Remove need_lookup from fuse_readdirp_cbk | Susant Palai | 2017-10-12 | 1 | -1/+0
  Background: Various xlators used to populate their ctx on an explicit lookup. That means that without a lookup, the translator will have either null or stale data to function with. E.g. dht would depend on lookup to create linkto files on the correct node/hashed subvol, afr would rely on this lookup to heal pending data/metadata, etc. So to complete the above actions a lookup used to be issued on files, even if their inode was populated in a readdirp_cbk. This was done by setting the need_lookup flag on all the files that were read by the readdirp fop.

  We tried a small test on "ACL client". For listing 50k files on root itself, it took around 50 seconds with readdirp enabled, while the same operation took 5-6 seconds with readdirp disabled. md-cache was enabled both times. We observed that in the first test case (readdirp enabled), a getxattr is done post readdirp. The number of getxattr calls depends on the number of acl xattrs (I saw requests on these two: system.posix_acl_default, system.posix_acl_access). Since the need_lookup flag is set, during fuse_resolve a nameless lookup is executed on the inode (getxattr being an inode operation, hence the nameless lookup). Since md-cache does not serve nameless lookups, a network hop is needed for each file, costing the time. With readdirp disabled, the getxattrs are served from md-cache itself (note: we are discussing the second attempt of the ls -l use case).

  _Current affairs around need of lookup for a file to populate its ctx_: For the xlators on the client stack we discussed quite extensively the need for a lookup fop post readdirp in all three cluster translators: afr, EC and dht. EC and dht don't really need a nameless lookup post readdirp. For afr too, the need for lookup was negated with patch (http://review.gluster.org/6010 - AFRV2), where afr added a function called afr_inode_refresh() which does a lookup and populates its inode context in case a FOP came to AFR without a lookup being issued prior to it.

  We ran a thread on gluster-devel asking for feedback on the need for an explicit lookup post readdirp. For responses refer to [1]. Refer to [2] for the discussion that happened on gerrit.

  After gathering inputs from [1] and [2], it looks like there is no xlator in its current state that requires an explicit lookup post readdirp to function properly.
  * A separate similar patch will be sent for gfapi/nfs/nfs-ganesha.
  Note: Only the file's inode is built with readdirp.
  [1] http://lists.gluster.org/pipermail/gluster-devel/2017-August/053505.html
  [2] https://review.gluster.org/#/c/17985/
  > Change-Id: Ie1d68ce7bea5e1f8a1fab9a62217f478322554f5
  > BUG: 1492996
  > Signed-off-by: Susant Palai <spalai@redhat.com>
  Change-Id: Ie1d68ce7bea5e1f8a1fab9a62217f478322554f5
  BUG: 1499123
  Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Don't store the entire uuid for subvols | N Balachandran | 2017-10-12 | 4 | -17/+45
  Comparing the uuid string of the local node against that stored in the local_subvol information is inefficient, especially as it is done for every file to be migrated. The code has now been changed to set the value of info to 1 if the nodeuuid is that of the node making the comparison, so this becomes an integer comparison.
  > BUG: 1451434
  > Signed-off-by: N Balachandran <nbalacha@redhat.com>
  > https://review.gluster.org/#/c/17851
  (cherry picked from commit c4a608799a577a4f38139f6bb8a47da8efb0fec3)
  Change-Id: I7491d59caad3b71dbf5facc94dcde0cd53962775
  BUG: 1500472
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
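Doing the expensive comparison once and keeping only an integer flag can be sketched as below. The field name `info` follows the commit message; the struct layout, the 16-byte id and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_UUID_LEN 16

/* Hypothetical per-node record kept with the subvolume information. */
struct sketch_nodeuuid_info {
    unsigned char uuid[SKETCH_UUID_LEN];
    int info; /* 1 if this entry is the local node, 0 otherwise */
};

/* One-time pass: compare every stored id with the local one and record
 * the outcome as an integer. */
void sketch_mark_local_nodes(struct sketch_nodeuuid_info *nodes,
                             size_t count, const unsigned char *my_uuid)
{
    assert(my_uuid != NULL);
    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        nodes[i].info =
            (memcmp(nodes[i].uuid, my_uuid, SKETCH_UUID_LEN) == 0) ? 1 : 0;
}

/* Per-file check during migration: a plain integer comparison. */
bool sketch_node_is_local(const struct sketch_nodeuuid_info *node)
{
    assert(node != NULL);
    return node->info == 1;
}
```

The marking pass runs once per node list, so the per-file cost during rebalance drops to a single integer test.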
* glusterd: disallow replace brick for dist only volumes | Atin Mukherjee | 2017-10-12 | 1 | -1/+11
  Allowing replace-brick on dist only volumes will lead to data loss. This patch makes replace-brick commit force fail if the volume is dist only.
  Also removing tests/basic/pump.t, as it's of no use as per the discussion in http://lists.gluster.org/pipermail/gluster-devel/2017-September/053652.html
  > Reviewed-on: https://review.gluster.org/18226
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: N Balachandran <nbalacha@redhat.com>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 7f70d38b66ce755f848ff0197814457a28b321df)
  Change-Id: Iabb0c16f865f3fc361b64a19bfcf0c0fbb5c2682
  BUG: 1493975
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* cli/afr: gluster volume heal info "healed" command output is not appropriate | Mohit Agrawal | 2017-10-11 | 1 | -0/+13
  Problem: The "gluster volume heal info [healed] [heal-failed]" command output on the terminal is not appropriate when any volume is down.
  Solution: To make the message more appropriate, change the condition in the function "gd_syncop_mgmt_brick_op".
  Test: To verify the fix, the following procedure was used:
  1) Create a 2*3 distribute replicate volume
  2) set self-heal daemon off
  3) kill two bricks (3, 6)
  4) create some files on the mount point
  5) bring bricks 3 and 6 up
  6) kill two other bricks (2 and 4)
  7) turn self-heal daemon on
  8) run "gluster v heal <vol-name>"
  Note: After applying the patch, the options (healed | heal-failed) will be deprecated from the command line.
  > BUG: 1388509
  > Change-Id: I229c320c9caeb2525c76b78b44a53a64b088545a
  > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  > (Cherry pick from commit d1f15cdeb609a1b720a04a502f7a63b2d3922f41)
  BUG: 1500662
  Change-Id: I229c320c9caeb2525c76b78b44a53a64b088545a
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterd: fix invalid memory reference returned | Xavier Hernandez | 2017-10-10 | 1 | -2/+9
  > BUG: 1490897
  > Reviewed-on: https://review.gluster.org/18263
  > Smoke: Gluster Build System <jenkins@build.gluster.org>
  > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
  > Reviewed-by: Gaurav Yadav <gyadav@redhat.com>
  Change-Id: I0823c7b33060b48040c1d86ad346a5f6e15bc190
  BUG: 1491178
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* afr: heal gfid as a part of entry heal | Ravishankar N | 2017-10-10 | 4 | -67/+120
  Problem: If a brick crashes after an entry (file or dir) is created but before the gfid is assigned, the good bricks will have pending entry heal xattrs, but the heal won't complete because afr_selfheal_recreate_entry() tries to create the entry again and fails with EEXIST.
  Fix: We could have fixed posix_mknod/mkdir etc. to assign the gfid if the file already exists, but the right thing to do seems to be to trigger a lookup on the bad brick and let it heal the gfid instead of winding an mknod/mkdir in the first place.
  (cherry picked from commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265)
  Change-Id: I82f76665a7541f1893ef8d847b78af6466aff1ff
  BUG: 1499202
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: fix client io-threads option for replicate volumes | Ravishankar N | 2017-10-09 | 6 | -34/+92
  Backport of https://review.gluster.org/#/c/18430/
  Problem: Commit ff075a3d6f9b142911d25c27fd209838782bfff0 disabled loading client-io-threads for replicate volumes (it was set to on by default in commit e068c1997314046658dd502e9118dab32decf879) due to performance issues, but in doing so inadvertently failed to load the xlator even if the user explicitly enabled the option using the volume set command. This was despite returning success for the volume set.
  Fix: Modify the check in perfxl_option_handler() and add checks in the volume create/add-brick/remove-brick code paths, tying it all to GD_OP_VERSION_3_12_2.
  Change-Id: Ib612973a999a7da818cc926f5c2601b1f0794fcf
  BUG: 1499158
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: spelling errors reported by Debian maintainer | Kaleb S. KEITHLEY | 2017-10-06 | 2 | -4/+4
  Reported-by: "Patrick Matthäi" <pmatthaei@debian.org>
  master https://review.gluster.org/18185
  Change-Id: I0dd6b7d88ddf3c98e8083b75f8dd848babcfd30a
  BUG: 1494523
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* Posix: fix additional inode ref | hari gowtham | 2017-10-06 | 1 | -4/+1
  Backport of https://review.gluster.org/#/c/18406/
  Problem: In the iteration, the inode is ref-ed twice and unref-ed once. This leads to a ref leak.
  Fix: Assign the parent to the inode instead of referencing it.
  > BUG: 1496379
  Change-Id: Ib154b12d38ad68220f8f5288bbc50081beccc2b9
  BUG: 1497084
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
* cluster/afr: Sending subvol up/down events when subvol comes up or goes down | karthik-us | 2017-10-06 | 1 | -0/+2
  > BUG: 1493539
  (cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a)
  Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec
  BUG: 1492066
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* features/locks: Maintain separation of lock->client_pid, flock->l_pid | Pranith Kumar K | 2017-10-06 | 2 | -34/+15
  Problem: grant_blocked_locks() constructs flock from lock. The locks xlator uses frame->root->pid interchangeably with flock->l_pid. With gNFS, frame->root->pid (which translates to lock->client_pid) is not the same as flock->l_pid. This leads to lk's cbk returning a flock with l_pid taken from lock->client_pid instead of the input flock->l_pid. This triggers EC's error code path, leading to failure of the lk call, because the response's flock->l_pid is different from the request's flock->l_pid.
  Fix: Maintain separation of lock->client_pid and flock->l_pid. Always unwind with a flock carrying the correct pid.
  > BUG: 1472961
  > Change-Id: Ifab35c458662cf0082b902f37782f8c5321d823d
  > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  > (cherry picked from commit 572b4bf889d903dcaed49a57a75270a763dc259d)
  BUG: 1496326
  Change-Id: Ifab35c458662cf0082b902f37782f8c5321d823d
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: don't check for file size in afr_mark_source_sinks_if_file_emptyRavishankar N2017-10-051-6/+7
| | | | | | | | | | ... for AFR_METADATA_TRANSACTION and just mark source and sinks if metadata is the same. (cherry picked from commit 24637d54dcbc06de8a7de17c75b9291fcfcfbc84) Change-Id: I69e55d3c842c7636e3538d1b57bc4deca67bed05 BUG: 1496317 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: retrieve uuid under mutex lockAtin Mukherjee2017-10-051-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a multi-node cluster, if one of the glusterd instances goes down and comes back, there can be a race in which glusterd needs to retrieve its uuid (glusterd_retrieve_uuid) while, as part of receiving a friend handshake from another peer, it iterates over the volume information received from the remote node and checks whether a brick is local by calling MY_UUID, which in turn calls glusterd_retrieve_uuid. The same applies to the glusterd_store_global_info () function. As a result, glusterd can end up generating two UUID files for the same node in /var/lib/glusterd. The following log snippet confirms this: [2017-09-01 03:09:24.458030] I [glusterd.c:146:glusterd_uuid_init] 0-management: retrieved UUID: fd46a495-7e33-468f-88f6-63c815fac640 // thread 1 retrieve uuid from glusterd.info [2017-09-01 03:09:24.458034] E [glusterd-store.c:2109:glusterd_retrieve_uuid] 0-: No previous uuid is present //thread 2 can not retrieve uuid, because in thread1 the file pointer has already become eof. [2017-09-01 03:09:24.458041] E [glusterd-store.c:2117:glusterd_retrieve_uuid] 0-: Returning -1 [2017-09-01 03:09:24.458076] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 190bb292-a296-4125-96da-42b247511cc4 [2017-09-01 03:09:24.458129] E [store.c:367:gf_store_save_value] 0-: Able to store key: UUID,value: 190bb292-a296-4125-96da-42b247511cc4 Fix is to retrieve the uuid under mutex lock. Credits : cynthia.zhou@nokia-sbell.com > Reviewed-on: : https://review.gluster.org/#/c/18333/ >(cherry picked from commit 898f0b7ce31ddf8ec02e572c5d22eff2e4205b4c) Change-Id: Ib9a5e159c3febf2aef13aa5e38f0a51fe409dadb BUG: 1495162 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
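The race described in this commit can be illustrated with a minimal sketch. It is written in Python rather than glusterd's C, and `NodeIdentity`, `my_uuid` and `collect_uuids` are hypothetical names invented for the illustration, not glusterd symbols. The only point it makes is that the "retrieve, else generate and save" sequence runs under a single lock, so concurrent callers cannot each conclude that no UUID exists yet.

```python
import threading
import uuid


class NodeIdentity:
    """Hypothetical stand-in for the node UUID that glusterd persists."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stored = None   # plays the role of the UUID saved on disk
        self.generated = 0    # how many times a fresh UUID was generated

    def my_uuid(self):
        # Retrieval and the generate-and-save fallback share one lock, so a
        # second caller cannot see "no previous uuid" while the first caller
        # is still in the middle of retrieving or saving it.
        with self._lock:
            if self._stored is None:
                self._stored = str(uuid.uuid4())
                self.generated += 1
            return self._stored


def collect_uuids(n_threads=8):
    """Call my_uuid() from several threads and return what each one saw."""
    node = NodeIdentity()
    seen = []
    seen_lock = threading.Lock()

    def worker():
        value = node.my_uuid()
        with seen_lock:
            seen.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node, seen
```

With the lock in place, every thread observes the same UUID and exactly one UUID is ever generated, which is the behaviour the fix aims for.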
* afr: auto-resolve split-brains for zero-byte filesRavishankar N2017-10-053-0/+79
| | | | | | | | | | | | | | | | | | | | | | | | | Problems: As described in BZ 1491670, renaming hardlinks can result in data/mdata split-brain of the DHT link-to files (T files) without any mismatch of data and metadata. As described in BZ 1486063, for a zero-byte file with only dirty bits set, arbiter brick will likely be chosen as the source brick. Fix: For zero byte files in split-brain, pick first brick as a) data source if file size is zero on all bricks. b) metadata source if metadata is the same on all bricks In arbiter case, if file size is zero on all bricks and there are no pending afr xattrs, pick 1st brick as data source. (cherry picked from commit 1719cffa911c5287715abfdb991bc8862f0c994e) Change-Id: I0270a9a2f97c3b21087e280bb890159b43975e04 BUG: 1496317 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Rahul Hinduja <rhinduja@redhat.com> Reported-by: Mabi <mabi@protonmail.ch>
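The selection rule in this commit message can be sketched as follows. This is a simplified Python illustration, not the AFR implementation: the function names, the list-of-sizes input and the `has_pending_xattrs` flag are assumptions made for the example, whereas the real code works on per-brick replies and afr xattrs.

```python
def pick_data_source(sizes):
    """Return brick index 0 if every brick reports a zero-byte file, else None."""
    if sizes and all(size == 0 for size in sizes):
        return 0
    return None


def pick_metadata_source(metadata):
    """Return brick index 0 if the metadata is identical on all bricks, else None."""
    if metadata and all(m == metadata[0] for m in metadata):
        return 0
    return None


def pick_data_source_arbiter(sizes, has_pending_xattrs):
    """Arbiter case: additionally require that no pending afr xattrs are set."""
    if has_pending_xattrs:
        return None
    return pick_data_source(sizes)
```

A return value of `None` here simply means the rule does not apply and the file is left to the existing split-brain handling.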
* mount/fuse: Make event-history feature configurableKrutika Dhananjay2017-10-053-14/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and disable it by default. Backport of: > Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 > Reviewed-on: https://review.gluster.org/18242 > BUG: 1467614 > cherry-picked from commit 956d43d6e89d40ee683547003b876f1f456f03b6 This is because having it disabled seems to improve performance. This could be due to lock contention among the different epoll threads on the circular buff lock in the fop cbks just before they write their response to /dev/fuse. Just to provide some data - in an ovirt-gluster hyperconverged environment, I saw an increase in IOPS of 12K with event-history disabled for a random read workload. Usage: mount -t glusterfs -o event-history=on $HOSTNAME:$VOLNAME $MOUNTPOINT OR glusterfs --event-history=on --volfile-server=$HOSTNAME --volfile-id=$VOLNAME $MOUNTPOINT Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 BUG: 1495397 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* eventsapi: Fix issue with CLIENT_CONNECT eventAravinda VK2017-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | A mismatch in the event format causes the error below in events.log when a CLIENT_CONNECT event is detected. [2017-09-19 09:35:06,785] WARNING [glustereventsd - 46:handle] - Unable to parse Event 1505793906 97 client_uid=f241-16363-2017/09/19-04:05:06:747558-gv1-client- 0-0-0;client_identifier=192.168.122.208:49150;server_identifier= 192.168.122.208:49152;brick_path=/bricks/b1,subdir_mount=(null) > Reviewed-on: https://review.gluster.org/18322 > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Prashanth Pai <ppai@redhat.com> > (cherry picked from commit 23649f9ab73448737ee5d9509502f96e4775dca3) BUG: 1492061 Change-Id: Ie6d507725a7e6b54fca44651f9c5e66eca2be244 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* features/shard: Change default shard-block-size to 64MBKrutika Dhananjay2017-09-281-1/+1
| | | | | | | | | | | | Backport of: > Change-Id: I55fa87e07136cff10b0d725ee24dd3151016e64e > BUG: 1489823 > Reviewed-on: https://review.gluster.org/18243 > cherry picked from commit e4a59b384f5bbaaeb937a53cef64f4e388f85153 Change-Id: I55fa87e07136cff10b0d725ee24dd3151016e64e BUG: 1492026 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* mount/fuse: Include sub-directory in source argument for mount()Vijay Bellur2017-09-111-1/+7
| | | | | | | | | | | | | | | | With this, mount of a sub-directory 'foo' gets listed in /proc/mounts as: <hostname>:<volname>/foo on /mnt/glusterfs type fuse.glusterfs (rw,relatime...) Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1490493 Change-Id: Ib1e1ac3741bf66e1a912d792f2948b748931f2b0 Reviewed-on: https://review.gluster.org/18210 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> (cherry picked from commit 84f8fb81d73b87463092eb082a5cc6a4055103f4)
* afr: heal metadata in discover code pathRavishankar N2017-09-112-17/+46
| | | | | | | | | | | | | | | | | | | | | | | Combined backport of https://review.gluster.org/17850 and https://review.gluster.org/18187 During graph switch, if fuse sends nameless (gfid) lookups, afr takes the discover code path to serve it. If there are pending metadata heals, they do not happen unless an inode refresh happens as a part of discover (which is not guaranteed to happen always). This patch fixes it by attempting metadata heal as a part of discover, just like how it is done in lookup code path. Change-Id: I87c493045b9225741cad173bf3f645848697032e BUG: 1488168 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/18202 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Karthik U S <ksubrahm@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* arbiter: return ENOSYS for 'non readable' FOPsNiels de Vos2017-09-101-10/+35
| | | | | | | | | | | | | | | | | | | | | | AFR marks the arbiter as 'non readable'. This has been introduced with commit 8ab87137 (afr: do not set arbiter as a readable subvol in inode context). arbiter_readv() should not get called anymore, so it could be removed. However, it is a good defensive approach to have all the inode read FOPs that can not be handled by the arbiter to return ENOSYS. > Reviewed-on: https://review.gluster.org/18103 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit b1352d0974328b367afa7360e9523585efb7178d) Change-Id: I6ea41680832859bd6790dc8d7440ee98d38205fc BUG: 1489511 Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/18227 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dht: add FOP check to dht_file_setattr_cbkRavishankar N2017-09-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: bug-797171.7 loaded error-gen xlator on the brick which sent EBADF for a non fd-based fop, namely setattr. This caused dht_check_and_open_fd_on_subvol_task() to crash as local->fd was NULL. Fix: Call dht_check_and_open_fd_on_subvol_task() from dht_file_setattr_cbk only for dht_fsetattr and not dht_setattr or dht_setattr2 > Reviewed-on: https://review.gluster.org/18208 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Susant Palai <spalai@redhat.com> > Reviewed-by: Amar Tumballi <amarts@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: N Balachandran <nbalacha@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 47188e9eac59de416a5c86c7ec7540ed6aaa1c98) Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: Iab4999e213bf2065804f3f8237e470ad454e3c99 BUG: 1489260 Reviewed-on: https://review.gluster.org/18222 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>