summaryrefslogtreecommitdiffstats
path: root/xlators/mount
Commit message (Collapse)AuthorAgeFilesLines
...
* mount/fuse: never fail open(dir) with ENOENTRaghavendra G2017-10-171-0/+7
| | | | | | | | | | open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* Revert "mount/fuse: report ESTALE as ENOENT"Raghavendra G2017-10-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 26d16b90ec7f8acbe07e56e8fe1baf9c9fa1519e. Consider rename (index.new, store.idx) and open (store.idx) being executed in parallel. When we break down operations following sequence is possible. * lookup (store.idx) - as part of open(store.idx) returns gfid1 as the result. * rename (index.new, store.idx) changes gfid of store.idx to gfid2. Note that gfid2 was the nodeid of index.new. Since rename is successful, gfid2 is associated with store.idx. * open (store.idx) resumes and issues open fop to glusterfs with gfid1. open in glusterfs fails as gfid1 doesn't exist and the error returned by glusterfs to kernel-fuse is ENOENT. * kernel passes back the same error to application as a result to open. This error could've been prevented if kernel retries open with gfid2. Interestingly kernel do retry open when it receives ESTALE error. Even though failure to find gfid resulted in ESTALE error, commit 26d16b90ec7f8acb converted that error to ENOENT while sending an error reply to kernel. This prevented kernel from retrying open resulting in error. Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4 BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse : Fix parsing of vol_id for snapshot volumeMohammed Rafi KC2017-10-151-2/+4
| | | | | | | | | | | | | | | | For supporting sub-dir mount, we changed the volid. Which means anything after a '/' in volume_id will be considered as sub-dir path. But snapshot volume has vol_id stracture of /snaps/<volname>/<snapname> which has to be considered as during the parsing. Note 1: sub-dir mount is not supported on snapshot volume Note 2: With sub-dir mount changes brick based mount for quota cannot be executed via mount command. It has to be a direct call via glusterfs Change-Id: I0d824de0236b803db8a918f683dabb0cb523cb04 BUG: 1501235 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* mount/fuse: Make event-history feature configurableKrutika Dhananjay2017-09-243-14/+42
| | | | | | | | | | | | | | | | | | | | | | ... and disable it by default. This is because having it disabled seems to improve performance. This could be due to the lock contention by the different epoll threads on the circular buff lock in the fop cbks just before writing their response to /dev/fuse. Just to provide some data - wrt ovirt-gluster hyperconverged environment, I saw an increase in IOPs by 12K with event-history disabled for randrom read workload. Usage: mount -t glusterfs -o event-history=on $HOSTNAME:$VOLNAME $MOUNTPOINT OR glusterfs --event-history=on --volfile-server=$HOSTNAME --volfile-id=$VOLNAME $MOUNTPOINT Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 BUG: 1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse/readdirp: Remove need_lookup from fuse_readdirp_cbkSusant Palai2017-09-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | background: Various xlators used to populate their ctx, on an explicit lookup. That means without a lookup, the translator will have either null or stale data to function. E.g. dht would depend on lookup to create linkto files on the correct node/hashed subvol, afr would rely on this lookup to heal pending data/metadata etc. So to complete above actions a lookup used to be issued on files, even their inode was populated on a readdirp_cbk. This was done by setting the need_lookup flag on all the files those were read on readdirp fop. We tried a small test on "ACL client". For listing 50k files on root itself, it took around 50seconds with readdirp enabled while the same operation took 5-6 seconds with readdirp disabled. Both the times md-cache was enabled. We observed that on the 1st test case (readdirp enabled), post readdirp a getxattr is done. The number of getxattr depends on the number of acl xattrs (I saw requests on these two: system.posix_acl_default, system.posix_acl_access). Since need_lookup flag is set, during fuse_resolve a nameless lookup is executed on the inode(getxattr being inode operation, hence the nameless lookup). Since md-cache does not serve nameless lookup, a network hop is needed for each file, costing the time. With readdirp disabled, the getxattrs are served from md-cache itself(note: we are discussing the 2nd attempt of ls -l use case). _Current affairs around need of lookup for a file to populate it's ctx_: For the xlators on client stack we discussed quite extensively about the need for a lookup fop post readdirp in all three cluster translators - afr, EC and dht. EC and dht don't really need a nameless lookup post readdirp. For afr too, the need for lookup was negated with patch (http://review.gluster.org/6010 - AFRV2), where afr added a function called afr_inode_refresh() which does a lookup and populates its inode context in case a FOP came to AFR without a lookup being issued prior to it. We ran a thread on gluster-devel asking for feedback on the need of explicit lookup post readdirp. For responses refer [1]. Refer [2] for discussions happened on gerrit. After gathering inputs from [1] and [2], it looks like there is no xlator in current state that requires an explicit lookup post readdirp to function properly. * A separate similar patch will be sent for gfapi/nfs/nfs-ganesha. Note: Only file's inode is built with readdirp. [1] http://lists.gluster.org/pipermail/gluster-devel/2017-August/053505.html [2] https://review.gluster.org/#/c/17985/ Change-Id: Ie1d68ce7bea5e1f8a1fab9a62217f478322554f5 BUG: 1492996 Signed-off-by: Susant Palai <spalai@redhat.com>
* mount/fuse: Include sub-directory in source argument for mount()Vijay Bellur2017-09-071-1/+7
| | | | | | | | | | | | | | | With this, mount of a sub-directory 'foo' gets listed in /proc/mounts as: <hostname>:<volname>/foo on /mnt/glusterfs type fuse.glusterfs (rw,relatime...) Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1488913 Change-Id: Ib1e1ac3741bf66e1a912d792f2948b748931f2b0 Reviewed-on: https://review.gluster.org/18210 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* scripts: mount.glusterfs contains non-portable bashismsKaleb S. KEITHLEY2017-09-051-2/+3
| | | | | | | | | | | | | | | | | | Debian's default shell is dash, i.e. /bin/sh -> dash, which doesn't support bash extensions Reported-by: "Michael Lundkvist" <brels.debian@solske.net> Reported-by: pmatthaei@debian.org Debian BZ: 873878 Change-Id: I33003183b9bc6459cae28c565125e6b2bd1eaa47 BUG: 1487830 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18184 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Infra to indentify processhari gowtham2017-08-162-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: currently we can't identify which process is running and how many instances of it are available. Fix: name the process when its spawned and send it to the server and save it in the client_t The processes that abide by this change from this patch are: 1) fuse mount, 2) rebalance, 3) selfheal, 4) tier, 5) quota, 6) snapshot, 7) brick. 8) gfapi (by default. gfapi.<processname> if processname is found) Note: fuse gets a process name as native-fuse-client by default. If the user gives a name for the fuse and spawns it, it will be of this type --process-name native-fuse-client.<name_specified>. This can be made use by the process like aux mount done by quota, geo-rep, etc by adding another option in the aux mount " -o process-name=gsync_mount" Updates: #178 Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: Ie4d02257216839338043737691753bab9a974d5e Reviewed-on: https://review.gluster.org/17957 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterfsd: allow subdir mountAmar Tumballi2017-08-041-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes: 1. Take subdir mount option in client (mount.gluster / glusterfsd) 2. Pass the subdir mount to server-handshake (from client-handshake) 3. Handle subdir-mount dir's lookup in server-first-lookup and handle all fops resolution accordingly with proper gfid of subdir 4. Change the auth/addr module to handle the multiple subdir entries in option, and valid parsing. How to use the feature: `# mount -t glusterfs $hostname:/$volname/$subdir /$mount_point` Or `# mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point` Option can be set like: `# gluster volume set <volname> auth.allow "/subdir1(192.168.1.*),/(192.168.10.*),/subdir2(192.168.8.*)"` Updates #175 Change-Id: I7ea57f76ddbe6c3862cfe02e13f89e8a39719e11 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17141 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs: Name threads on creationRaghavendra Talur2017-07-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set names to threads on creation for easier debugging. Output of top -H -p <PID-OF-GLUSTERFSD> Before: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd After: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703 BUG: 1254002 Updates: #271 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/11926 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* fuse: memory leak fixesDanny Couture2017-07-141-38/+30
| | | | | | | | | | | | | | | | | | | Fix fuse ctx memory leak in case an error occurs and the cleanup path is different than usual. Also fix a memory leak in logging if eh_save_history() fails. Change-Id: I7ec967c807b0ed91184e5b958be70702215c46c9 BUG: 1470220 Signed-off-by: Danny Couture <couture.danny@gmail.com> Reviewed-on: https://review.gluster.org/17759 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* groups: don't allocate auxiliary gid list on stackCsaba Henk2017-07-061-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When glusterfs wants to retrieve the list of auxiliary gids of a user, it typically allocates a sufficiently big gid_t array on stack and calls getgrouplist(3) with it. However, "sufficiently big" means to be of maximum supported gid list size, which in GlusterFS is GF_MAX_AUX_GROUPS = 64k. That means a 64k * sizeof(gid_t) = 256k allocation, which is big enough to overflow the stack in certain cases. A further observation is that stack allocation of the gid list brings no gain, as in all cases the content of the gid list eventually gets copied over to a heap allocated buffer. So we add a convenience wrapper of getgrouplist to libglusterfs called gf_getgrouplist which calls getgrouplist with a sufficiently big heap allocated buffer (it takes care of the allocation too). We are porting all the getgrouplist invocations to gf_getgrouplist and thus eliminate the huge stack allocation. BUG: 1464327 Change-Id: Icea76d0d74dcf2f87d26cb299acc771ca3b32d2b Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17706 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Link against missed libraries to resolve symbolsPrashanth Pai2017-07-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | When external programs perform a dlopen("..so", RTLD_LAZY|RTLD_LOCAL) on some shared objects like xlators, it can fail with dlerror set to error string "undefined symbol <some-type>". This was observed for the following shared objects: fuse.so, quota.so, quotad.so, server.so, libgfrpc.so and socket.so P.S: This was found while running a go program which fetches the list of xlator options (volume_option_t) from xlator's shared object. BUG: 1193929 Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: https://review.gluster.org/17659 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* fuse-bridge: cleanup first_lookup()Amar Tumballi2017-05-301-68/+13
| | | | | | | | | | | | | | | use syncop_lookup instead of synchronising stack_wind/unwind again. Updates #175 Change-Id: Iad4a181d8601235a999039979bfb7ec688675520 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17075 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* fuse: implement "-oauto_unmount"Csaba Henk2017-05-233-4/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libfuse has an auto_unmount option which, if enabled, ensures that the file system is unmounted at FUSE server termination by running a separate monitor process that performs the unmount when that occurs. (This feature would probably better be called "robust auto-unmount", as FUSE servers usually do try to unmount their file systems upon termination, it's just this mechanism is not crash resilient.) This change implements that option and behavior for glusterfs. Note that "auto unmount" (robust or not) is a leaky abstraction, as the kernel cannot guarantee that at the path where the FUSE fs is mounted is actually the toplevel mount at the time of the umount(2) call, for multiple reasons, among others, see: fuse-devel: "fuse: feasible to distinguish between umount and abort?" http://fuse.996288.n3.nabble.com/fuse-feasible-to-distinguish-between-umount-and-abort-tt14358.html https://github.com/libfuse/libfuse/issues/122 Updates #153 Change-Id: Ia4432580c9fd2c156d9c73c3a44f4bfd42437599 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17230 Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* mount/fuse: Handle racing notify on more than one graph properlyRaghavendra G2017-05-101-2/+6
| | | | | | | | | | | | | | | Make sure that we always use latest graph as a candidate for active-subvol. Change-Id: Ie37c818366f28ba6b1570d65a9eb17697d38a6c5 BUG: 1448364 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/17200 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* fuse: enhance fusedump to include timestamp and a signaturev3.12devCsaba Henk2017-04-301-12/+67
| | | | | | | | | | | | | | (Also referred to as "fusedump v2".) Change-Id: I837944024efd1b9055c2f5f91bd5723ef350e688 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16422 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: Replace GF_LOG_OCCASIONALLY with gf_log() to report fop failure ↵Krutika Dhananjay2017-04-301-5/+2
| | | | | | | | | | | | | at all times Change-Id: Ibd8e1c6172812951092ff6097ba4bed943051b7c BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17086 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* fuse: clean up mount flag processingCsaba Henk2017-04-271-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In general, when one invokes a mount helper program -- basically anything that mounts something based on its command line, so thinking of mount(8), mount.<fs-type> or fusermount, but also of FUSE servers in general, including glusterfs -- the command line arguments that are to affect mount(2) are mapped to a bitmask called the mount flags, which is passed to mount(2), so that the kernel can interpret the flag bits and adjusts properties of the mount accordingly. There is a traditional syntax for this mechanism as implemented in mount(8): one passes "-ocomma,separated,mount,options" and the individual option name strings are mapped to flag bits in mount(8). FUSE further explores this idea and typically the FUSE server command lines allow further option names to be used in the "-ooption,name,list" which are then separated from the kernel sanctioned option names (to which we'll refer as "system mount options") and are passed to a platform specific lower level fuse mount helper interface. The separation of system mount option names and FUSE specific option names is also platform specific, so the general mount interface function, which in case of glusterfs is gf_fuse_mount(), should abstract this away. Therefore we change the signature of this function from int gf_fuse_mount (const char *mountpoint, char *fsname, unsigned long mountflags, char *mnt_param, pid_t *mtab_pid, int status_fd); to int gf_fuse_mount (const char *mountpoint, char *fsname, char *mnt_param, pid_t *mtab_pid, int status_fd); and deal with flag extraction in platform specific mount code. Note that the sole purpose of the mountflags argument was to indicate read-only mounting. The other system mount option names were expected to reside in the comma-separated mnt_param string, but they were not properly processed (see the referred BUG). With the new gf_fuse_mount signature read-only mounting is to be indicated as a "ro" component in mnt_param. - For Darwin, which has a dedicated, separate gf_fuse_mount implementation, gf_fuse_mount was ignoring mountflags, so only the signature had to to be adjusted. However, as bonus, we gain read-only support for Darwin, which was missing so far, given that it was indicated via the ignored mountflags. Darwin's low level mount helper relies on the "ro" component of the option string, which agrees with the new calling convention of gf_fuse_mount. - On Linux, system mount option name handling (apart from the distinguished read-only option) used to have the inadvertent side effect of adding "nosuid,nodev" as indicated in BUG; since Ia89d975d1e27fcfa5ab2036ba546aa8fa0d2d1b0 this side effect is removed, but system mount option name handling was left broken (passing system mount options other than "ro" fails to mount). - On other platforms, system mount option name handling is broken (expect for the distinguished read-only option). As of this change, in the general (non-Darwin) implementation of gf_fuse_mount we take care of proper separation of system mount names and their conversion to mount flags. For Linux, we adopt the conversion table from FUSE upstream. For other systems we just provide a best effort to support those system mount options which are understood across all Unices (nosuid,nodev,noatime,noexec,ro). (This can be improved later to provide proper plaform support.) BUG: 1297182 Change-Id: I5d10b5df46feba7a02bf5bf1018db69e6b52260a Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16313 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com>
* fuse-bridge: fuse_getattr(): send root lookup() only in case of failure.Amar Tumballi2017-04-071-21/+37
| | | | | | | | | | | | | | this will make sure we don't fail in any current cases, and also will enable the performance better. Change-Id: Ia421e1913e1b00f0730a004bf7c84bf7e2a62636 BUG: 1437780 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/16945 Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-0/+10
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Fix a possible resource leak under GF_SOLARIS_HOST_OSSaurabh Badhwar2017-01-171-1/+3
| | | | | | | | | | | | | | in fuse-helpers.c Change-Id: Ie367a6dec2a0d5848631b19ebbe39ceafa954a60 BUG: 1412918 Signed-off-by: Saurabh Badhwar <sbsaurabhbadhwar9@gmail.com> Reviewed-on: http://review.gluster.org/16395 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: fix memory leak in setxattrXavier Hernandez2017-01-121-27/+21
| | | | | | | | | | | | | | | | | | | If there's some failed check in setxattr of mount/fuse before actually starting the operation, a fuse_state_t structure is leaked. This fix correctly releases allocated resources in case of error. Change-Id: I8b1cda67a613c13b6bc38947352e2ccfccf96a1d BUG: 1412174 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/16380 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: Fix the place where graph-switch event is loggedKrutika Dhananjay2017-01-041-3/+4
| | | | | | | | | | Change-Id: I3c8577b87db02a2a6ce6159e7d04cf58a2bda0c1 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16302 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: fix fuse dumping for FUSE_WRITECsaba Henk2016-09-221-1/+6
| | | | | | | | | | | | | | | | | Data coming with FUSE_WRITE requests are arranged with a special alignment, cf. 15d85ff1. fuse_dumper() was not aware of this and didn't dump the proper reqion for FUSE_WRITE. BUG: 1377427 Signed-off-by: Csaba Henk <csaba@redhat.com> Change-Id: I36255ca3336e95be6e2d256c8199761ddec41869 Reviewed-on: http://review.gluster.org/15525 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* build: out-of-tree builds generates files in the wrong directoryKaleb S KEITHLEY2016-09-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And minor cleanup of a few of the Makefile.am files while we're at it. Rewrite the make rules to do what xdrgen does. Now we can get rid of xdrgen. Note 1. netbsd6's sed doesn't do -i. Why are we still running smoke tests on netbsd6 and not netbsd7? We barely support netbsd7 as it is. Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with libglusterfs? A cut-and-paste mistake? It has no references to symbols in libglusterfs. Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_ regex that matches the same lines as the _extended_ regex "/#(ifndef|define|endif)/". To match the extended regex sed needs to be run with -r on Linux; with -E on *BSD. However NetBSD's and FreeBSD's sed helpfully also provide -r for compatibility. Using a basic regex avoids having to use a kludge in order to run sed with the correct option on OS X. Note 4. Not copying the bit of xdrgen that inserts copyright/license boilerplate. AFAIK it's silly to pretend that machine generated files like these can be copyrighted or need license boilerplate. The XDR source files have their own copyright and license; and their copyrights are bound to be more up to date than old boilerplate inserted by a script. From what I've seen of other Open Source projects -- e.g. gcc and its C parser files generated by yacc and lex -- IIRC they don't bother to add copyright/license boilerplate to their generated files. It appears that it's a long-standing feature of make (SysV, BSD, gnu) for out-of-tree builds to helpfully pretend that the source files it can find in the VPATH "exist" as if they are in the $cwd. rpcgen doesn't work well in this situation and generates files with "bad" #include directives. E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`, you get an #include directive in the generated .c file like this: ... #include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h" ... which (obviously) results in compile errors on out-of-tree build because the (generated) header file doesn't exist at that location. Compared to `rpcgen ./glusterfs3-xdr.x` where you get: ... #include "glusterfs3-xdr.h" ... Which is what we need. We have to resort to some Stupid Make Tricks like the addition of various .PHONY targets to work around the VPATH "help". Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/... looks exactly like -I$(top_srcdir)/rpc/xdr/... Don't be fooled though. And don't delete the -I$(top_builddir)/rpc/xdr/... bits Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e BUG: 1330604 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14085 Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* dict: Don't expose get_new_dict/dict_destroyPranith Kumar K2016-07-252-4/+1
| | | | | | | | | | | | | | | get_new_dict/dict_destroy is causing confusion where, dict_new/dict_destroy or get_new_dict/dict_unref are used instead of dict_new/dict_unref. Change-Id: I4cc69f5b6711d720823395e20fd624a0c6c1168c BUG: 1296043 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13183 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterfsd/main: Add ability to set oom_score_adjOleksandr Natalenko2016-06-011-0/+7
| | | | | | | | | | | | | | Give the administrator a possibility to set oom_score_adj for glusterfs process. Applies to Linux only. Change-Id: Iff13c2f4cb28457871c6ebeff6130bce4a8bf543 BUG: 1336818 Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-on: http://review.gluster.org/14399 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* fuse: accept the -s option to allow automountingNiels de Vos2016-05-311-1/+4
| | | | | | | | | | | | | | | | | | | autofs passes the -s option when mounting. All /sbin/mount.<fs> helpers accept this, except mount.glusterfs. Because the helper fails when -s is passed accessing the mountpoint through autofs gives the following error: $ ls /lan/storage.lan.example.net/repos ls: cannot open directory /lan/storage.lan.example.net/repos: Too many levels of symbolic links BUG: 1340936 Change-Id: I84755cdac59e630618cb745c0eb3228cc1e93a1a Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14559 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name>
* fuse: unref dict even if fuse_first_lookup failsKremmyda, Olia (NSN - GR/Athens)2016-05-231-2/+1
| | | | | | | | | | | | | | | | | In fuse_first_lookup function, "dict_unref (dict)" should be included in the out label, in case create_frame returns an empty pointer the dict to be unreferenced as well. Bug: 1338544 Change-Id: Ifb8a3378aec6521c1aa848f818968b6bfdb72089 Signed-off-by: Olia Kremmyda <olympia.kremmyda@nokia.com> Reviewed-on: http://review.gluster.org/14464 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: Log gfid and fd ptr as well when writev/readv failKrutika Dhananjay2016-05-121-3/+9
| | | | | | | | | | | | Change-Id: Iaed3850171155f2452fc29ebecd350c2da0b55cb BUG: 1335091 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14291 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* build: remove unneeded include <sys/user.h> for FreeBSDNiels de Vos2016-04-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The include <sys/user.h> causes a conflicting definition of an RPC 'struct pmap': --- fuse-helpers.lo --- In file included from /usr/include/rpc/rpc.h:73:0, from ../../../../libglusterfs/src/glusterfs-fops.h:35, from /usr/home/jenkins/root/workspace/freebsd-smoke/libglusterfs/src/glusterfs.h:32, from /usr/home/jenkins/root/workspace/freebsd-smoke/xlators/mount/fuse/src/fuse-bridge.h:22, from /usr/home/jenkins/root/workspace/freebsd-smoke/xlators/mount/fuse/src/fuse-helpers.c:26: /usr/include/rpc/pmap_prot.h:89:8: error: redefinition of 'struct pmap' struct pmap { ^ In file included from /usr/include/vm/pmap.h:90:0, from /usr/include/sys/user.h:52, from /usr/home/jenkins/root/workspace/freebsd-smoke/xlators/mount/fuse/src/fuse-helpers.c:19: /usr/include/machine/pmap.h:299:8: note: originally defined here struct pmap { ^ It seems that building on FreeBSD still functions without any additional warnings or errors, even when the include is removed. Change-id I98fc8cf7e4b631082c7b203b5a0a77111bec1fb9 identified this issue, and this build-fix is needed for applying I98fc8cf7. BUG: 1198849 Change-Id: Ib8241b7dc47eb2c3593d2f8ea1d196178e63d02d Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14093 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: report ESTALE as ENOENTRaghavendra G2016-04-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the inode/gfid is missing, brick report back as an ESTALE error. However, most of the applications don't accept ESTALE as an error for a file-system object missing, changing their behaviour. For eg., rm -rf ignores ENOENT errors during unlink of files/directories. But with ESTALE error it doesn't send rmdir on a directory if unlink had failed with ESTALE for any of the files or directories within it. Thanks to Ravishankar N <ravishankar@redhat.com>, here is a link as to why we split up ENOENT into ESTALE and ENOENT. http://review.gluster.org/#/c/6318/ Change-Id: I467df0fdf22734a8ef20c79ac52606410fad04d1 BUG: 1245065 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/13816 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* dht: add "nuke" functionality for efficient server-side deletionJeff Darcy2016-04-071-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This turns a special xattr into an rmdir with flags set. When that hits the posix translator on the server side, that causes the file/directory to be moved into the special "landfill" directory. From there, the posix janitor thread will take care of deleting it entirely on the server side - traversing it recursively if necessary. A couple of secondary issues were fixed to make this effective. * FUSE now ensures that setxattr values are NUL terminated. * The janitor thread now gets woken up immediately when something is placed in 'landfill' instead of only when file descriptors need to be closed. * The default landfill-emptying interval was reduced to 10s. To use the feature, issue a setxattr something like this: setfattr -n glusterfs.dht.nuke -v "" /mnt/glusterfs/vol/some_dir The value doesn't actually matter; the mere receipt of a request with this key is sufficient. Some day it might be useful to allow setting a required value as a sort of password, so that only those who know it can access the underlying special functionality. Change-Id: I8a343c2cdb40a76d5a06c707191fb67babb8514f Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/13878 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: cleanup an additional inode_ref()Vijay Bellur2016-03-161-1/+4
| | | | | | | | | | | | | | | | | commit ca515db0127 introduced a check in fuse_resolve_inode_simple(). This results in an additional ref being held on inodes which were obtained through readdirp. As a result, the inode table keeps growing and entries remain in the active list even after deletion of such inodes. Change-Id: I780ec5513990d6ef00ea051ec57ff20e4428081e BUG: 1317948 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/13689 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* fuse: Address the review comments in the backportPoornima G2016-03-093-26/+37
| | | | | | | | | | | | | | | | | Backport @ http://review.gluster.org/#/c/13626/3 Fix a typo error, consolidate the selinux and capability check in getxattr and setxattr. Change-Id: I4303de3d4dd00853169b07577311e03cbb912ed7 BUG: 1316327 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13653 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Add a new mount option capabilityPoornima G2016-03-073-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | Originally all security.* xattrs were forbidden if selinux is disabled, which was causing Samba's acl_xattr module to not work, as it would store the NTACL in security.NTACL. To fix this http://review.gluster.org/#/c/12826/ was sent, which forbid only security.selinux. This opened up a getxattr call on security.capability before every write fop and others. Capabilities can be used without selinux, hence if selinux is disabled, security.capability cannot be forbidden. Hence adding a new mount option called capability. Only when "--capability" or "--selinux" mount option is used, security.capability is sent to the brick, else it is forbidden. Change-Id: I77f60e0fb541deaa416159e45c78dd2ae653105e BUG: 1309462 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13540 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Bug fixes for IPv6 supportNithin D2016-02-202-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Glusterd not working using ipv6 transport. The idea is with proper glusterd.vol configuration, 1. glusterd needs to listen on default port (240007) as IPv6 TCP listner. 2. Volume creation/deletion/mounting/add-bricks/delete-bricks/peer-probe needs to work using ipv6 addresses. 3. Bricks needs to listen on ipv6 addresses. All the above functionality is needed to say that glusterd supports ipv6 transport and this is broken. Fix: When "option transport.address-family inet6" option is present in glusterd.vol file, it is made sure that glusterd creates listeners using ipv6 sockets only and also the same information is saved inside brick volume files used by glusterfsd brick process when they are starting. Tests Run: Regression tests using ./run-tests.sh IPv4: Ran manually till tests/basic/rpm.t . IPv6: (Need to add the above mentioned config and also add an entry for "hostname ::1" in /etc/hosts) Started failing at ./tests/basic/glusterd/arbiter-volume-probe.t and ran successfully till here Unit Tests using Ipv6 peer probe add-bricks remove-bricks create volume replace-bricks start volume stop volume delete volume Change-Id: Iebc96e6cce748b5924ce5da17b0114600ec70a6e BUG: 1117886 Signed-off-by: Nithin D <nithind1988@yahoo.in> Reviewed-on: http://review.gluster.org/11988 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* fuse: add support for SEEK_HOLE and SEEK_DATA through lseek()Niels de Vos2016-02-102-1/+67
| | | | | | | | | | | | | | | | | | | | The Linux FUSE kernel module has gained support for passing SEEK_HOLE and SEEK_DATA on through lseek(). This can greatly improve performance when working with sparse files. Linux FUSE introduced support for lseek() with version 4.5. The commit in mainline Linux is 0b5da8db145bfd44266ac964a2636a0cf8d7c286. URL: http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/14752 Change-Id: I12496d788e59461a3023ddd30e0ea3179007f77e BUG: 1220173 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11474 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Fuse: Add a check for NULL in fuse_itable_dumpAshish Pandey2016-02-081-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Immediately after starting a disperse volume (2+1) kill one brick and just after that try to mount it through fuse. This lead to crash. Our test scripts use process statedumps to determine various things like whether they are up, connected to bricks etc. It takes some time for an active_subvol to be be associated with fuse even after mount process is daemonized. This time is normally a function of completion of handshake with bricks. So, if we try to take statedump in this time window, fuse wouldn't have an active_subvol associated with it leading to this crash. This happened while executing ec-notify.t, which contains above steps. Solution: Check priv and priv->active_subvol for NULL before inode_table_dump. If priv->active_subvol is null its perfectly fine to skip dumping of inode table as inode table is associated with an active_subvol. A Null active_subvol indicates initialization in progress and fuse wouldn't even have started reading requests from /dev/fuse and hence there wouldn't be any inodes or file system activity. Change-Id: I323a154789edf8182dbd1ac5ec7ae07bf59b2060 BUG: 1299410 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13253 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: update fuse_kernel.h to version 23Ravishankar N2016-02-061-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following changes were made upstream: - add FUSE_WRITEBACK_CACHE - add time_gran to fuse_init_out - add reserved space to fuse_init_out - add FATTR_CTIME - add ctime and ctimensec to fuse_setattr_in - add FUSE_RENAME2 request - add FUSE_NO_OPEN_SUPPORT flag Including these changes will make it easier to backport support for lseek(). Because the fuse_init_out structure changed its size, older versions of FUSE would fail initializing. When an older version of FUSE is detected, the fuse_init_out structure is reduced to the previous size. This is harmless, as the attributes that are not passed, are not used for earlier versions anyway. BUG: 1220173 Change-Id: I58c74e161638b2d4ce12fc91a206fdc1b96de14d Signed-off-by: Ravishankar N <ravishankar@redhat.com> [ndevos: splitted from http://review.gluster.org/11474 old version fuse_init_out size correction] Reviewed-on: http://review.gluster.org/11537 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* fuse: fix inode and dentry leaksXavier Hernandez2016-02-031-19/+27
| | | | | | | | | | | | | | | | | | When a readdirp was executed, the nlookup count for each inode of the returned entries was incremented. However the kernel does not increment the counter for '.' and '..' entries. This caused kernel to send forgets with a counter smaller than the inode's current value. This prevented these inodes to be retired when ref count was 0. Change-Id: I31901af36ab7b4cdc3e6fa2f30a0263a1a2daef8 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/13327 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* fuse: use-after-free fix in fuse-bridge, revisitedKaleb S KEITHLEY2016-02-021-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prompted by the email exchange in gluster-devel between Oleksandr Natalenko, xavi, and soumyak, I looked at this because the fuse client on the longevity cluster has also been suffering from a serious memory leak for some time. (longevity cluster is currently running 3.7.6) The longevity cluster manifests the same kernel notifier loop terminated log message the Oleksandr sees, and some sample runs suggest that the length passed to the (sys_)write call is unexpectedly and abnormally large. Basically this fix a) uses correct types for len and rv, b) copies the len from potentially incorrectly aligned memory (in a way that should minimize potential performance issues related to accessing unaligned memory.) c) changes log level of the kernel notifier loop terminated message d) fixes a potential mutex lock/unlock issue Change-Id: Icedb3525706f59803878bb37ef6b4ffe4a986880 BUG: 1288857 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13274 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: pass standard mount options to the kernelNiels de Vos2016-01-241-1/+23
| | | | | | | | | | | | | | | | | | | | | Some of the default mount options were made invalid with glusterfs-3.6. The /sbin/mount.glusterfs script changed heavily and now requires all valid mount options to be listed. Earlier versions (glusterfs-3.5 and before) passed all unknown mount options on to fuse. With this change, all mount options from 'man 8 mount' are explicitly included in the /sbin/mount.glusterfs script. Some of the options are marked with TODO, these are not commonly used and may require some additional support in Gluster/FUSE too. BUG: 1294809 Change-Id: Ic312140d7318b54523996bb08772ff065af7eb27 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/13166 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* fuse: send lookup if inode_ctx is not setMohammed Rafi KC2016-01-132-6/+22
| | | | | | | | | | | | | | | During resolving of an entry or inode, if inode ctx was not set, we will send a lookup. This patch also make sure that inode_ctx will be created after every inode_link Change-Id: I4211533ca96a51b89d9f010fc57133470e52dc11 BUG: 1297311 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13225 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* fuse:sent at least one lookup before actual fopMohammed Rafi KC2016-01-131-2/+18
| | | | | | | | | | | | | | | | Fuse shoud sent atleast one lookup for an inode/gfid populated via readdirp before actual fop to populate inode ctx for xlators Change-Id: I5c02ed73f892924c9e404d91cbe0633a275accbd BUG: 1236032 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11892 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* no-mtab (-n) mount option ignore next mount optionJames Augustine2016-01-101-1/+0
| | | | | | | | | | | | | | | The -n option does not take any arguments. It seems like this shift is removing the next option. On my CentOS 7 system, automount calls mount.glusterfs with the parameters: host:/volume /mountpoint -n -o rw,acl,_netdev This causes the -o option to be siliently ignored. Change-Id: Ice3c877f6ab346b04292e3dfed968d04d15077a5 BUG: 1297195 Signed-off-by: James Augustine <jcaugust81@gmail.com> Reviewed-on: http://review.gluster.org/12988 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S KEITHLEY2015-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | Revisiting http://review.gluster.org/#/c/11814/, which unintentionally introduced warnings from libtool about the xlator .so names. According to [1], the -module option must appear in the Makefile.am file(s); if -module is defined in a macro, e.g. in configure(.ac), then libtool will not recognize that this is a module and will emit a warning. [1] http://www.gnu.org/software/automake/manual/automake.html#Libtool-Modules Change-Id: Ifa5f9327d18d139597791c305aa10cc4410fb078 BUG: 1248669 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13003 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* performance/write-behind: retry "failed syncs to backend"Raghavendra G2015-12-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. When sync fails, the cached-write is still preserved unless there is a flush/fsync waiting on it. 2. When a sync fails and there is a flush/fsync waiting on the cached-write, the cache is thrown away and no further retries will be made. In other words flush/fsync act as barriers for all the previous writes. The behaviour of fsync acting as a barrier is controlled by an option (see below for details). All previous writes are either successfully synced to backend or forgotten in case of an error. Without such barrier fop (especially flush which is issued prior to a close), we end up retrying for ever even after fd is closed. 3. If a fop is waiting on cached-write and syncing to backend fails, the waiting fop is failed. 4. sync failures when no fop is waiting are ignored and are not propagated to application. For eg., a. first attempt of sync of a cached-write w1 fails b. second attempt of sync of w1 succeeds If there are no fops dependent on w1 are issued b/w a and b, application won't know about failure encountered in a. 5. The effect of repeated sync failures is that, there will be no cache for future writes and they cannot be written behind. fsync as a barrier and resync of cached writes post fsync failure: ================================================================== Whether to keep retrying failed syncs post fsync is controlled by an option "resync-failed-syncs-after-fsync". By default, this option is set to "off". If sync of "cached-writes issued before fsync" (to backend) fails, this option configures whether to retry syncing them after fsync or forget them. If set to on, cached-writes are retried till a "flush" fop (or a successful sync) on sync failures. fsync itself is failed irrespective of the value of this option, when there is a sync failure of any cached-writes issued before fsync. Change-Id: I6097c9257bfb9ee5b15616fbe6a0576ae9af369a Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 1279730 Reviewed-on: http://review.gluster.org/12594