glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	fuse: diagnostic FLUSH interrupt	Csaba Henk	2018-11-06	3	-2/+69
	We add dummy interrupt handling for the FLUSH fuse message. It can be enabled by the "--fuse-flush-handle-interrupt" hidden command line option, or the "-ofuse-flush-handle-interrupt=yes" mount option. It serves only diagnostic and demonstrational purposes: to exercise the interrupt handling framework a bit and to give a usage example. Documentation is also provided that showcases interrupt handling via FLUSH. Change-Id: I522f1e798501d06b74ac3592a5f73c1ab0590c60 updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	fuse: interrupt handling framework	Csaba Henk	2018-11-06	3	-1/+512
	Add a sub-framework to send timed responses to the kernel, add an interrupt handler queue, and implement INTERRUPT. fuse_interrupt looks up handlers for interrupted messages in the queue. If a handler is found, it invokes the handler function; otherwise it responds with EAGAIN after a delay. See the spec at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.17#n148 and the explanation in comments. Change-Id: I1a79d3679b31f36e14b4ac8f60b7f2c1ea2badfb updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
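The lookup-or-EAGAIN behaviour described in this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: every type and function name here is hypothetical (not taken from fuse-bridge), locking is omitted, and the EAGAIN result is returned immediately rather than sent as a delayed response.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the real fuse-bridge structures. */
typedef void (*interrupt_handler_fn)(uint64_t unique, void *data);

struct interrupt_record {
    uint64_t unique;              /* id of the in-flight FUSE request */
    interrupt_handler_fn handler; /* called if that request is interrupted */
    void *data;                   /* opaque argument passed to the handler */
    struct interrupt_record *next;
};

struct interrupt_queue {
    struct interrupt_record *head;
};

/* Register a handler for an in-flight request (no locking in this sketch). */
void
interrupt_queue_insert(struct interrupt_queue *q, struct interrupt_record *rec)
{
    rec->next = q->head;
    q->head = rec;
}

/* Handle an INTERRUPT for request `unique`: unlink and invoke the matching
 * handler and return 0, or return EAGAIN when no handler is registered. */
int
interrupt_dispatch(struct interrupt_queue *q, uint64_t unique)
{
    struct interrupt_record **pp = &q->head;

    while (*pp != NULL) {
        struct interrupt_record *rec = *pp;

        if (rec->unique == unique) {
            *pp = rec->next; /* unlink before invoking */
            rec->next = NULL;
            rec->handler(unique, rec->data);
            return 0;
        }
        pp = &rec->next;
    }
    return EAGAIN;
}

/* Example handler: counts how often it was invoked via an int in `data`. */
void
count_interrupt(uint64_t unique, void *data)
{
    (void)unique;
    ++*(int *)data;
}
```

A caller would insert a record when it starts a request it is able to cancel, and call the dispatch function when the kernel sends INTERRUPT for that request id.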
*	all: fix the format string exceptions	Amar Tumballi	2018-11-05	1	-3/+3
	Currently, there are a few places where a user-controlled string (such as a filename or program parameter) can be passed as 'fmt' to printf(), which can lead to a segfault if the user's string contains '%s' or '%d'. While fixing this, it makes sense to check explicitly for such issues across the codebase by making the format calls properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
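The class of bug described here can be illustrated with a small C sketch. The helper name is hypothetical and is not a function from the gluster codebase; it only shows the general pattern of passing untrusted text as an argument to a fixed "%s" format rather than as the format itself.

```c
#include <stdio.h>

/* Hypothetical helper: copy untrusted text into a log buffer.
 *
 * Unsafe variant (do not use): snprintf(buf, len, user_input);
 * any '%s' or '%d' in user_input would be interpreted as a conversion
 * and read arguments that were never passed.
 */
int
format_user_string(char *buf, size_t len, const char *user_input)
{
    /* Safe: the format is a constant, the user's text is only data. */
    return snprintf(buf, len, "%s", user_input);
}
```

With the constant format, a string such as "100%s done %d" is copied literally instead of being parsed for conversions.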
*	mount.glusterfs: A more explicit check to avoid identical mounts	Han Han	2018-10-24	1	-1/+1
	Change the check condition from "[[:space:]+]${mount_point}[[:space:]+]fuse" to "[[:space:]+]${mount_point}[[:space:]+]fuse.glusterfs". This fixes false positive check results for mount points of other FUSE filesystems, such as "fuse.sshfs". Change-Id: I13898b50a651a8f5ecc3a94d01b3b5de37ec4cbc fixes: bz#1640026 Signed-off-by: Han Han <hhan@redhat.com>
*	mount/fuse: return ESTALE instead of ENOENT on all inode based operations	Raghavendra Gowdappa	2018-10-20	1	-2/+115
	This patch is a continuation of commit fb4b914ce84bc83a5f418719c5ba7c25689a9251. It extends that logic to all inode based operations, not just open(dir). <snip> mount/fuse: never fail open(dir) with ENOENT open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. </snip> Change-Id: I6313f520827e9af725485631cb6a9d9718243bc4 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1627620
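The error mapping described in this message can be sketched in C as below. The function and parameter names are hypothetical, and the real fuse-bridge code is structured differently; the sketch only shows the rule that an inode (gfid) based operation reports ESTALE where it would otherwise report ENOENT.

```c
#include <errno.h>

/* Hypothetical helper: choose the errno to send back to the kernel.
 * `inode_based` is nonzero when the operation was resolved by gfid
 * rather than by a (parent, name) pair. */
int
reply_errno_for_op(int op_errno, int inode_based)
{
    if (inode_based && op_errno == ENOENT)
        return ESTALE; /* lets the kernel retry after a revalidate lookup */
    return op_errno;
}
```

Name based operations keep ENOENT, since a missing name really is "no such entry".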
*	all: fix warnings on non 64-bits architectures	Xavi Hernandez	2018-10-10	1	-1/+1
	When compiling on other architectures, many warnings appear. Some of them are actual problems that prevent gluster from working correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	fuse: prevent error message "can't shift that many"	Niels de Vos	2018-10-01	1	-3/+3
	On systems where /bin/sh is not Bash, running plain mount.glusterfs gives the unhelpful error "can't shift that many". The argument parsing can be improved a little by adding a check for the number of arguments: at least two (Gluster ip:/volume and mountpoint), but possibly more (-o, -v etc.). With the additional check, running 'mount.glusterfs -h' now shows the following messages: Usage: /sbin/mount.glusterfs <server>:<volume/subdir> <mountpoint> -o<options> Options: man 8 mount.glusterfs To display the version number of the mount helper: /sbin/mount.glusterfs -V Change-Id: I50e3ade0c6217fab4155f35ad8cb35d99d52e133 Fixes: bz#1564890 Reported-by: Alexander Zimmermann <alexander.zimmermann96@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com>
*	Land part 2 of clang-format changes	Gluster Ant	2018-09-12	3	-5816/+5624
	Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
*	Land clang-format changes	Gluster Ant	2018-09-12	2	-361/+373
	Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
*	mount/fuse: convert ENOENT to ESTALE in open(dir)_resume	Raghavendra G	2018-09-11	1	-0/+9
	This patch is a continuation of commit fb4b914ce84bc83a5f418719c5ba7c25689a9251. <snip> mount/fuse: never fail open(dir) with ENOENT open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. </snip> The earlier commit failed to fix the codepath where an error response is sent back on gfid resolution failures in fuse_open(dir)_resume. The current patch completes that work. Change-Id: Ia07e3cece404811703c8cfbac9b402ca5fe98c1e Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1627620
*	Multiple files: calloc -> malloc	Yaniv Kaul	2018-09-04	2	-2/+4
	xlators/storage/posix/src/posix-inode-fd-ops.c xlators/storage/posix/src/posix-helpers.c xlators/storage/bd/src/bd.c xlators/protocol/client/src/client-lk.c xlators/performance/quick-read/src/quick-read.c xlators/performance/io-cache/src/page.c xlators/nfs/server/src/nfs3-helpers.c xlators/nfs/server/src/nfs-fops.c xlators/nfs/server/src/mount3udp_svc.c xlators/nfs/server/src/mount3.c xlators/mount/fuse/src/fuse-helpers.c xlators/mount/fuse/src/fuse-bridge.c xlators/mgmt/glusterd/src/glusterd-utils.c xlators/mgmt/glusterd/src/glusterd-syncop.h xlators/mgmt/glusterd/src/glusterd-snapshot.c xlators/mgmt/glusterd/src/glusterd-rpc-ops.c xlators/mgmt/glusterd/src/glusterd-replace-brick.c xlators/mgmt/glusterd/src/glusterd-op-sm.c xlators/mgmt/glusterd/src/glusterd-mgmt.c xlators/meta/src/subvolumes-dir.c xlators/meta/src/graph-dir.c xlators/features/trash/src/trash.c xlators/features/shard/src/shard.h xlators/features/shard/src/shard.c xlators/features/marker/src/marker-quota.c xlators/features/locks/src/common.c xlators/features/leases/src/leases-internal.c xlators/features/gfid-access/src/gfid-access.c xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c xlators/features/bit-rot/src/bitd/bit-rot.c xlators/features/bit-rot/src/bitd/bit-rot-scrub.c xlators/encryption/crypt/src/metadata.c xlators/encryption/crypt/src/crypt.c xlators/performance/md-cache/src/md-cache.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible. It doesn't make sense to calloc (allocate and clear) memory when the code fills that memory with data right away. It may be optimized by the compiler, or bring a microscopic performance improvement.
	In some cases, the allocation size was also changed to be the sizeof of some struct or type instead of a pointer, which is easier to read. In some cases, redundant strlen() calls were removed by saving the result in a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially string allocations, with the terminating NUL byte. Only compile-tested! .. and allocate only as much memory as needed. xlators/nfs/server/src/mount3.c: Don't blindly allocate PATH_MAX, but strlen() the string and allocate appropriately. Also, align error messages. updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5
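The pattern this message describes (no zeroing when the buffer is overwritten at once, one strlen() call whose result is reused, and space for the string terminator) can be sketched as follows. The sketch uses plain malloc() from the C library as a stand-in for GF_MALLOC(), and the function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a path string.
 * The buffer is completely overwritten right after allocation, so the
 * zeroing done by calloc() would be wasted work. */
char *
dup_path(const char *path)
{
    size_t len = strlen(path);    /* computed once, reused below */
    char *copy = malloc(len + 1); /* +1 for the terminating NUL byte */

    if (copy == NULL)
        return NULL;
    memcpy(copy, path, len + 1);  /* copies the terminator as well */
    return copy;
}
```

The caller owns the returned buffer and releases it with free().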
*	performance/readdir-ahead: keep stats of cached dentries in sync with modifications	Krutika Dhananjay	2018-08-18	1	-4/+13
	PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc. that modify the file pointed to by the dentries. When a readdir is finally wound at the offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache and are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in the context of the inode pointed to by the dentry. * Whenever the inode pointed to by the dentry undergoes modification, update the iatt stored in inode-ctx in the cbk of the modification fop to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed to by these dentries. * Some fops don't have valid iatts in their responses. For example, a write response whose data is still cached in write-behind will have a zeroed out stat. In this case keep only ia_type and ia_gfid and reset the rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode does not equal linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	All: run codespell on the code and fix issues.	Yaniv Kaul	2018-07-22	1	-1/+1
	Please review, it's not always just the comments that were fixed. Of course, I had to revert all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	add check if no matching password record was found with getpwuid_r(uid)	Vitaly Lipatov	2018-07-13	1	-0/+5
	Change-Id: Iae712828ee656008faf5fe2bc4e6f96fa12ea4cb fixes: bz#1600687 Signed-off-by: Vitaly Lipatov <lav@etersoft.ru>
*	fuse: avoid using the which command	John Mulligan	2018-06-21	1	-1/+1
	In mount.glusterfs, avoid using the which tool, as it may not exist on minimal system installs. Use the "command -v" builtin, as it is expected to be more portable. Remove an extra semicolon while we're at it. Change-Id: Ib682ed4955d5bad1beb94b65d10f4c44e9490767 fixes: bz#1593351 Signed-off-by: John Mulligan <jmulligan@redhat.com>
*	fuse: make sure the send lookup on root instead of getattr()	Amar Tumballi	2018-05-07	1	-0/+20
	This change was done in https://review.gluster.org/16945. While the changes added there were required, it was not necessary to remove the getattr part. As fuse's lookup on root (/) comes only as getattr, this change is very much required. The previous fix for this bug was to add the check for revalidation in lookup when it was sent on root. But I had removed the part that handles getattr coming on root. That removal was not required to fix the issue then. This patch adds back that part of the code, to make sure we have proper validation of the root inode in many places like acl, etc. updates: bz#1437780 Change-Id: I859c4ee1a3f407465cbf19f8934530848424ff50 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	mount,fuse: make fuse dumping available as mount option	Csaba Henk	2018-05-04	1	-0/+7
	Updates: bz#1193929 Change-Id: I4dd4d0e607f89650ebb74b893b911b554472826d Signed-off-by: Csaba Henk <csaba@redhat.com>
*	fuse: add support for kernel writeback cache	Csaba Henk	2018-05-04	3	-4/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]). - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]). - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]). [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on" [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT" [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode" [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace" Updates: #435 Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	fuse: do fd_resolve in fuse_getattr if fd is received	Susant Palai	2018-04-18	1	-5/+8
	Problem: With the current code, after a graph switch the old fd is received for fuse_getattr, and since it is associated with the old inode, it does not have the inode ctx across xlators in the new graph. Hence, dht errored out saying "no layout" for the fstat call, resulting in EINVAL. Solution: If an fd is passed, init and resolve the fd to carry on getattr. Test case: - Created a single brick distributed volume - Started untar - Added a new brick. Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
*	fuse: retire statvfs tweak	Csaba Henk	2018-04-16	1	-13/+0
	The fuse xlator used to override the filesystem block size of the storage backend to indicate its preferences. Now we retire this tweak and pass on what we get from the backend. This fixes the anomaly reported in the referred BUG. For more background, see the following email, which was sent out to the gluster-devel and gluster-users mailing lists to gauge whether anyone sees any use for this tweak: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054660.html http://lists.gluster.org/pipermail/gluster-users/2018-March/033775.html No one vetoed its removal, and it got an endorsement: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054686.html BUG: 1523219 Change-Id: I3b7111d3037a1b91a288c1589f407b2c48d81bfa Signed-off-by: Csaba Henk <csaba@redhat.com>
*	mount/fuse: Set default fuse reader thread count to 1	Krutika Dhananjay	2018-04-02	1	-1/+1
	Updates #412 Change-Id: Ida53d8b630feabb856a3551fa888f92382ade768 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	mount/fuse: Add support for multi-threaded fuse readers	Krutika Dhananjay	2018-04-02	5	-83/+168
	Usage: Use 'reader-thread-count=<NUM>' as a command line option to set the thread count at the time of mounting the volume. The next task is to make these threads auto-scale based on the load, instead of having the user remount the volume every time to change the thread count. Updates #412 Change-Id: I94aa1505e5ae6a133683d473e0e4e0edd139b76b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	fuse: enable proper "fgetattr"-like semantics	Csaba Henk	2018-03-06	1	-1/+13
	A GETATTR FUSE message can carry a file handle reference, in which case it serves as a hint to the FUSE server that the stat data is preferably acquired in the context of the given file handle (which we call '"fgetattr"-like semantics'). So far FUSE ignored the file handle provided by GETATTR and grabbed a file handle heuristically. This caused confusion in the caching layers, which has been tracked down as one of the causes of the referred BUG. With respect to the BUG, this is just a partial fix. BUG: 1512691 Change-Id: I67eebbf5407ca725ed111fbda4181ead10d03f6d Signed-off-by: Csaba Henk <csaba@redhat.com>
*	gfapi: return pre/post attributes from glfs_fsync/fdatasync	Kinglong Mee	2018-02-12	1	-1/+2
	Updates: #389 Change-Id: I4153df72d5eeecefa7579170899db4c340128bea Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	fuse: write out reverse notification to fuse dump	Csaba Henk	2018-01-17	1	-30/+59
	BUG: 1534602 Change-Id: Ide42cf9cffe462d0cc46272b327c2a05999f09ba Signed-off-by: Csaba Henk <csaba@redhat.com>
*	dict: add more types for values	Amar Tumballi	2018-01-05	2	-3/+3
	Added 2 more types which are present in the gluster codebase, mainly IATT and UUID. Updates #203 Change-Id: Ib6d6d6aefb88c3494fbf93dcbe08d9979484968f Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	libglusterfs: fix the call_stack_set_group() function	Csaba Henk	2017-11-24	1	-1/+1
	- call_stack_set_group() will take ownership of the passed buffer from the caller; - to indicate the change, its signature is changed from taking the buffer directly to taking a pointer to it; - either the content of the buffer is copied to the groups_small embedded buffer of the call stack, or the buffer is set as the groups_large member of the call stack; - the groups member of the call stack is set to, respectively, groups_small or groups_large, according to the memory management conventions of the call stack; - the buffer address is overwritten with junk to effectively prevent the caller from using it further on. Also move call_stack_set_group to stack.c from stack.h to prevent "defined but not used [-Wunused-function]" warnings (it is no longer used in the call_stack_alloc_group() implementation, which saved us from this so far). protocol/server: refactor gid_resolve() In gid_resolve there are two cases: either the gid_cache_lookup() call returns a value or not. The result is captured in the agl variable, and throughout the function, each particular stage of the implementation comes with an agl and a no-agl variant. In most cases this is explicitly indicated via an if (agl) { ... } else { ... } but some of these branches are expressed via goto constructs (obfuscating the fact we stated above, that is, each particular stage having an agl/no-agl variant). In the current refactor, we bring the agl conditional to the top, and present the agl/non-agl implementations sequentially. Also we take the opportunity to clean up and fix the agl case: - remove the spurious gl.gl_list = agl->gl_list; setting, as gl is not used in the agl case - populate the group list of the call stack from agl, thus fixing the referred BUG.
	Also fixes BUG: 1513920 Change-Id: I61f4574ba21969f7661b9ff0c9dce202b874025d BUG: 1513928 Signed-off-by: Csaba Henk <csaba@redhat.com>
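The ownership transfer described for call_stack_set_group() can be sketched in C as below. This is an illustration under assumptions, not the libglusterfs code: the struct, the function name and the small-buffer size are hypothetical, the sketch frees the caller's buffer after copying it into the embedded array (an assumption about what "taking ownership" implies in that branch), and it overwrites the caller's pointer with NULL where the commit message says "junk".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SMALL_GROUPS 4 /* hypothetical size of the embedded buffer */

/* Hypothetical stand-in for the group-related members of a call stack. */
struct demo_stack {
    uint32_t groups_small[DEMO_SMALL_GROUPS];
    uint32_t *groups_large; /* heap buffer, used only for large lists */
    uint32_t *groups;       /* points at groups_small or groups_large */
    int ngrps;
};

/* Take ownership of *groupbuf_p, which must be a malloc'ed array of
 * `ngrps` entries. On return the caller's pointer is no longer usable. */
void
demo_stack_set_groups(struct demo_stack *stack, int ngrps,
                      uint32_t **groupbuf_p)
{
    if (ngrps <= DEMO_SMALL_GROUPS) {
        /* Small list: copy into the embedded buffer, release the original. */
        memcpy(stack->groups_small, *groupbuf_p, ngrps * sizeof(uint32_t));
        free(*groupbuf_p);
        stack->groups_large = NULL;
        stack->groups = stack->groups_small;
    } else {
        /* Large list: adopt the caller's buffer as is. */
        stack->groups_large = *groupbuf_p;
        stack->groups = stack->groups_large;
    }
    stack->ngrps = ngrps;
    *groupbuf_p = NULL; /* the caller must not touch the buffer any more */
}
```

Passing a pointer to the buffer pointer is what allows the callee to invalidate the caller's reference, which makes the change of ownership visible at every call site.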
*	mount/fuse: use fstat in getattr implementation if any opened fd is available	Raghavendra G	2017-11-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The restriction of using fds opened by the same Pid means fds cannot be shared across threads of multithreaded application. Note that fops from kernel have different Pid for different threads. Imagine following sequence of operations: * Turn off performance.open-behind * Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of "file" is "nodeid-file". * Thread t2 does RENAME ("newfile", "file"). Let's assume nodeid of "newfile" as "nodeid-newfile". * t2 proceeds to do fstat (fd1) The above set of operations can sometimes result in ESTALE/ENOENT errors. RENAME overwrites "file" with "newfile" changing its nodeid from "nodeid-file" to "nodeid-newfile" and post RENAME, "nodeid-file" is removed from the backend. If fstat carries nodeid-file as argument, which can happen if lookup has not refreshed the nodeid of "file" and since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT which will fail as "nodeid-file" no longer exists. Since the above set of operations and sharing of fds across multiple threads are valid, this is a bug. The fix is to use any fd opened on the inode. In this specific example fuse_getattr_resume will find fd1 and winds down the call as fstat (fd1) which won't fail. Cross-checked with "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for any security issues with this solution and he approves the solution. Thanks to "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for all the pointers and discussions. Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c BUG: 1510401 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	core: make gf_boolean_t a C99 bool instead of an enum	Jeff Darcy	2017-11-03	1	-2/+5
	This reduces the space used from four bytes to one, and allows new code to use familiar C99 types/values interoperably with our old cruft. It does not change current declarations or code; that will be left for a separate, much larger patch. Updates: #80 Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e Signed-off-by: Jeff Darcy <jdarcy@fb.com>
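The difference between an enum based boolean and a C99 bool can be sketched as follows. The type and function names are hypothetical stand-ins, not the gluster definitions, and the exact sizes are implementation-defined: on common ABIs the enum occupies four bytes and bool one, which matches the figures in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the old and new definitions. */
typedef enum { OLD_FALSE = 0, OLD_TRUE = 1 } old_boolean_t; /* enum style */
typedef bool new_boolean_t;                                 /* C99 style */

/* Storage sizes of the two variants (implementation-defined). */
size_t
old_boolean_size(void)
{
    return sizeof(old_boolean_t);
}

size_t
new_boolean_size(void)
{
    return sizeof(new_boolean_t);
}

/* Converting to a C99 bool normalises any nonzero value to true (1),
 * whereas an enum variable would simply store the integer it is given. */
new_boolean_t
to_new_boolean(int value)
{
    return (new_boolean_t)value;
}
```

Because true and false from stdbool.h are the integers 1 and 0, they compare equal to the old enum constants, which is what lets old and new code interoperate.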
*	gfproxyd: Let glusterd manage gfproxy daemon	Poornima G	2017-10-18	3	-0/+18
	Updates: #242 BUG: 1428063 Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	mount/fuse: never fail open(dir) with ENOENT	Raghavendra G	2017-10-17	1	-0/+7
	open(dir), being an operation on an inode, should never fail with ENOENT. If the gfid is not present, the appropriate error is ESTALE. This will enable the kernel to retry open after a revalidate lookup. Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	Revert "mount/fuse: report ESTALE as ENOENT"	Raghavendra G	2017-10-17	1	-3/+0
	This reverts commit 26d16b90ec7f8acbe07e56e8fe1baf9c9fa1519e. Consider rename (index.new, store.idx) and open (store.idx) being executed in parallel. When we break down the operations, the following sequence is possible. * lookup (store.idx), as part of open(store.idx), returns gfid1 as the result. * rename (index.new, store.idx) changes the gfid of store.idx to gfid2. Note that gfid2 was the nodeid of index.new. Since the rename is successful, gfid2 is associated with store.idx. * open (store.idx) resumes and issues the open fop to glusterfs with gfid1. open in glusterfs fails as gfid1 doesn't exist, and the error returned by glusterfs to kernel-fuse is ENOENT. * The kernel passes back the same error to the application as the result of open. This error could have been prevented if the kernel retried open with gfid2. Interestingly, the kernel does retry open when it receives an ESTALE error. Even though the failure to find the gfid resulted in an ESTALE error, commit 26d16b90ec7f8acb converted that error to ENOENT while sending an error reply to the kernel. This prevented the kernel from retrying open, resulting in the error. Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4 BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	mount/fuse : Fix parsing of vol_id for snapshot volume	Mohammed Rafi KC	2017-10-15	1	-2/+4
	To support sub-dir mounts, we changed the volid, which means anything after a '/' in volume_id is considered a sub-dir path. But a snapshot volume has a vol_id structure of /snaps/<volname>/<snapname>, which has to be taken into account during parsing. Note 1: sub-dir mount is not supported on snapshot volumes. Note 2: With the sub-dir mount changes, a brick based mount for quota cannot be executed via the mount command. It has to be a direct call via glusterfs. Change-Id: I0d824de0236b803db8a918f683dabb0cb523cb04 BUG: 1501235 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
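The parsing rule described here can be sketched in C as follows. The function name is hypothetical, and the sketch assumes that an ordinary volume id has the form <volname> or <volname>/<subdir> without a leading slash; the actual parsing in the mount code may differ.

```c
#include <string.h>

/* Hypothetical helper: return the sub-directory part of a volume id
 * (starting at its '/'), or NULL when there is none.
 * Snapshot ids of the form /snaps/<volname>/<snapname> contain slashes
 * that belong to the id itself, so they never yield a sub-directory. */
const char *
volid_subdir(const char *vol_id)
{
    if (strncmp(vol_id, "/snaps/", 7) == 0)
        return NULL;            /* snapshot volume: no sub-dir mount */
    return strchr(vol_id, '/'); /* NULL when the id has no '/' */
}
```

Under these assumptions "vol/dir1" yields "/dir1", while "vol" and "/snaps/vol/snap1" yield no sub-directory.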
*	mount/fuse: Make event-history feature configurable	Krutika Dhananjay	2017-09-24	3	-14/+42
	... and disable it by default. This is because having it disabled seems to improve performance. This could be due to lock contention among the different epoll threads on the circular buffer lock in the fop cbks just before writing their response to /dev/fuse. To provide some data: in an ovirt-gluster hyperconverged environment, I saw an increase in IOPS of 12K with event-history disabled for a random read workload. Usage: mount -t glusterfs -o event-history=on $HOSTNAME:$VOLNAME $MOUNTPOINT OR glusterfs --event-history=on --volfile-server=$HOSTNAME --volfile-id=$VOLNAME $MOUNTPOINT Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 BUG: 1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	fuse/readdirp: Remove need_lookup from fuse_readdirp_cbk	Susant Palai	2017-09-20	1	-1/+0
	Background: Various xlators used to populate their ctx on an explicit lookup. That means that without a lookup, the translator has either null or stale data to work with. E.g. dht would depend on lookup to create linkto files on the correct node/hashed subvol, afr would rely on this lookup to heal pending data/metadata etc. So to complete the above actions, a lookup used to be issued on files even when their inode was populated in a readdirp_cbk. This was done by setting the need_lookup flag on all the files that were read by the readdirp fop. We tried a small test on "ACL client". Listing 50k files on root itself took around 50 seconds with readdirp enabled, while the same operation took 5-6 seconds with readdirp disabled. md-cache was enabled both times. We observed that in the first test case (readdirp enabled), a getxattr is done after readdirp. The number of getxattrs depends on the number of acl xattrs (I saw requests on these two: system.posix_acl_default, system.posix_acl_access). Since the need_lookup flag is set, a nameless lookup is executed on the inode during fuse_resolve (getxattr being an inode operation, hence the nameless lookup). Since md-cache does not serve nameless lookups, a network hop is needed for each file, costing the time. With readdirp disabled, the getxattrs are served from md-cache itself (note: we are discussing the second attempt of the ls -l use case). _Current affairs around the need of lookup for a file to populate its ctx_: For the xlators on the client stack, we discussed quite extensively the need for a lookup fop post readdirp in all three cluster translators: afr, EC and dht. EC and dht don't really need a nameless lookup post readdirp.
	For afr too, the need for lookup was negated with the patch (http://review.gluster.org/6010 - AFRV2) in which afr added a function called afr_inode_refresh(), which does a lookup and populates its inode context in case a FOP came to AFR without a lookup being issued prior to it. We ran a thread on gluster-devel asking for feedback on the need for an explicit lookup post readdirp. For the responses, refer to [1]. Refer to [2] for the discussions that happened on gerrit. After gathering inputs from [1] and [2], it looks like there is no xlator in the current state that requires an explicit lookup post readdirp to function properly. * A separate similar patch will be sent for gfapi/nfs/nfs-ganesha. Note: Only the file's inode is built with readdirp. [1] http://lists.gluster.org/pipermail/gluster-devel/2017-August/053505.html [2] https://review.gluster.org/#/c/17985/ Change-Id: Ie1d68ce7bea5e1f8a1fab9a62217f478322554f5 BUG: 1492996 Signed-off-by: Susant Palai <spalai@redhat.com>
*	mount/fuse: Include sub-directory in source argument for mount()	Vijay Bellur	2017-09-07	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this, mount of a sub-directory 'foo' gets listed in /proc/mounts as: <hostname>:<volname>/foo on /mnt/glusterfs type fuse.glusterfs (rw,relatime...) Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1488913 Change-Id: Ib1e1ac3741bf66e1a912d792f2948b748931f2b0 Reviewed-on: https://review.gluster.org/18210 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
*	scripts: mount.glusterfs contains non-portable bashisms	Kaleb S. KEITHLEY	2017-09-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Debian's default shell is dash, i.e. /bin/sh -> dash, which doesn't support bash extensions Reported-by: "Michael Lundkvist" <brels.debian@solske.net> Reported-by: pmatthaei@debian.org Debian BZ: 873878 Change-Id: I33003183b9bc6459cae28c565125e6b2bd1eaa47 BUG: 1487830 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18184 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	Infra to indentify process	hari gowtham	2017-08-16	2	-0/+18
	Problem: Currently we can't identify which process is running and how many instances of it are available. Fix: Name the process when it is spawned, send the name to the server and save it in the client_t. The processes that abide by this change from this patch are: 1) fuse mount, 2) rebalance, 3) selfheal, 4) tier, 5) quota, 6) snapshot, 7) brick, 8) gfapi (by default; gfapi.<processname> if processname is found). Note: fuse gets the process name native-fuse-client by default. If the user gives a name for the fuse mount and spawns it, it will be of this type: --process-name native-fuse-client.<name_specified>. This can be made use of by processes like the aux mounts done by quota, geo-rep, etc. by adding another option in the aux mount: " -o process-name=gsync_mount" Updates: #178 Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: Ie4d02257216839338043737691753bab9a974d5e Reviewed-on: https://review.gluster.org/17957 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	glusterfsd: allow subdir mount	Amar Tumballi	2017-08-04	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changes: 1. Take subdir mount option in client (mount.gluster / glusterfsd) 2. Pass the subdir mount to server-handshake (from client-handshake) 3. Handle subdir-mount dir's lookup in server-first-lookup and handle all fops resolution accordingly with proper gfid of subdir 4. Change the auth/addr module to handle the multiple subdir entries in option, and valid parsing. How to use the feature: `# mount -t glusterfs $hostname:/$volname/$subdir /$mount_point` Or `# mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point` Option can be set like: `# gluster volume set <volname> auth.allow "/subdir1(192.168.1.),/(192.168.10.),/subdir2(192.168.8.*)"` Updates #175 Change-Id: I7ea57f76ddbe6c3862cfe02e13f89e8a39719e11 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17141 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	libglusterfs: Name threads on creation	Raghavendra Talur	2017-07-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set names to threads on creation for easier debugging. Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703 BUG: 1254002 Updates: #271 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/11926 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
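The thread names in the listing above all share a "gluster" prefix and fit in 15 characters, which matches the Linux limit of 16 bytes (including the terminating NUL) for a thread name. As a rough illustration only, the sketch below names the calling thread with `prctl(PR_SET_NAME)`; `name_current_thread` is a hypothetical helper and Linux is assumed, so this is not necessarily how libglusterfs sets the names.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Linux thread names hold at most 15 characters plus the NUL byte. */
#define THREAD_NAME_LEN 16

/*
 * Hypothetical helper: name the calling thread "gluster<suffix>".
 * Longer names are silently truncated to fit the kernel limit.
 * Returns 0 on success, -1 on failure (as prctl does).
 */
int name_current_thread(const char *suffix)
{
    char name[THREAD_NAME_LEN];

    snprintf(name, sizeof(name), "gluster%s", suffix);
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}
```

A worker would call `name_current_thread("timer")` at the top of its start routine; the name then shows up in `top -H` and in `/proc/<pid>/task/<tid>/comm`.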
*	fuse: memory leak fixes	Danny Couture	2017-07-14	1	-38/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix fuse ctx memory leak in case an error occurs and the cleanup path is different than usual. Also fix a memory leak in logging if eh_save_history() fails. Change-Id: I7ec967c807b0ed91184e5b958be70702215c46c9 BUG: 1470220 Signed-off-by: Danny Couture <couture.danny@gmail.com> Reviewed-on: https://review.gluster.org/17759 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	groups: don't allocate auxiliary gid list on stack	Csaba Henk	2017-07-06	1	-14/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When glusterfs wants to retrieve the list of auxiliary gids of a user, it typically allocates a sufficiently big gid_t array on stack and calls getgrouplist(3) with it. However, "sufficiently big" means to be of maximum supported gid list size, which in GlusterFS is GF_MAX_AUX_GROUPS = 64k. That means a 64k * sizeof(gid_t) = 256k allocation, which is big enough to overflow the stack in certain cases. A further observation is that stack allocation of the gid list brings no gain, as in all cases the content of the gid list eventually gets copied over to a heap allocated buffer. So we add a convenience wrapper of getgrouplist to libglusterfs called gf_getgrouplist which calls getgrouplist with a sufficiently big heap allocated buffer (it takes care of the allocation too). We are porting all the getgrouplist invocations to gf_getgrouplist and thus eliminate the huge stack allocation. BUG: 1464327 Change-Id: Icea76d0d74dcf2f87d26cb299acc771ca3b32d2b Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17706 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
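The idea of calling getgrouplist(3) with a heap buffer instead of a 256k stack array can be sketched as follows. This is an illustrative stand-in, not the actual gf_getgrouplist: the name `heap_getgrouplist`, the initial capacity and the retry loop are assumptions, and the glibc behaviour of returning -1 while storing the required count in `*ngroups` is relied upon.

```c
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

#define MAX_AUX_GROUPS 65536 /* mirrors the 64k limit quoted above */

/*
 * Hypothetical wrapper: fetch the group list of `user` into a heap buffer.
 * On success returns the number of gids and stores the buffer in *out
 * (the caller frees it). Returns -1 on allocation failure or overflow.
 */
int heap_getgrouplist(const char *user, gid_t group, gid_t **out)
{
    int capacity = 16; /* small initial guess, grown on demand */
    gid_t *buf = malloc((size_t)capacity * sizeof(*buf));

    if (buf == NULL)
        return -1;

    for (;;) {
        int count = capacity;

        if (getgrouplist(user, group, buf, &count) != -1) {
            *out = buf;
            return count;
        }
        /* glibc reports the needed size in count; double as a fallback. */
        if (count <= capacity)
            count = capacity * 2;
        if (count > MAX_AUX_GROUPS) {
            free(buf);
            return -1;
        }
        gid_t *bigger = realloc(buf, (size_t)count * sizeof(*bigger));
        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;
        capacity = count;
    }
}
```

Since the result ends up on the heap anyway, the caller can keep the returned buffer directly rather than copying it out of a stack array.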
*	Link against missed libraries to resolve symbols	Prashanth Pai	2017-07-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When external programs perform a dlopen("..so", RTLD_LAZY\|RTLD_LOCAL) on some shared objects like xlators, it can fail with dlerror set to error string "undefined symbol <some-type>". This was observed for the following shared objects: fuse.so, quota.so, quotad.so, server.so, libgfrpc.so and socket.so P.S: This was found while running a go program which fetches the list of xlator options (volume_option_t) from xlator's shared object. BUG: 1193929 Change-Id: I7b958409cf11fb67c2be32a3f85a96fb1260236b Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: https://review.gluster.org/17659 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
*	fuse-bridge: cleanup first_lookup()	Amar Tumballi	2017-05-30	1	-68/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	use syncop_lookup instead of synchronising stack_wind/unwind again. Updates #175 Change-Id: Iad4a181d8601235a999039979bfb7ec688675520 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17075 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	fuse: implement "-oauto_unmount"	Csaba Henk	2017-05-23	3	-4/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libfuse has an auto_unmount option which, if enabled, ensures that the file system is unmounted at FUSE server termination by running a separate monitor process that performs the unmount when that occurs. (This feature would probably better be called "robust auto-unmount", as FUSE servers usually do try to unmount their file systems upon termination; it's just that this mechanism is not crash resilient.) This change implements that option and behavior for glusterfs. Note that "auto unmount" (robust or not) is a leaky abstraction, as the kernel cannot guarantee that the path where the FUSE fs is mounted is actually the toplevel mount at the time of the umount(2) call, for multiple reasons; among others, see: fuse-devel: "fuse: feasible to distinguish between umount and abort?" http://fuse.996288.n3.nabble.com/fuse-feasible-to-distinguish-between-umount-and-abort-tt14358.html https://github.com/libfuse/libfuse/issues/122 Updates #153 Change-Id: Ia4432580c9fd2c156d9c73c3a44f4bfd42437599 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17230 Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
*	mount/fuse: Handle racing notify on more than one graph properly	Raghavendra G	2017-05-10	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure that we always use latest graph as a candidate for active-subvol. Change-Id: Ie37c818366f28ba6b1570d65a9eb17697d38a6c5 BUG: 1448364 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/17200 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
*	Halo Replication feature for AFR translator	Kevin Vigor	2017-05-02	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10 seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As described above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is guaranteed for synchronous clients; this is provided by the existing entry-locking mechanism. - Asynchronous clients, on the other hand, are merely consistent within their region. 
Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
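The interaction of halo-latency and halo-min-replicas described above can be sketched as a small selection routine. Everything here is illustrative: `halo_select`, the millisecond unit, the inclusive threshold comparison and the use of a negative latency to mark an unreachable child are assumptions, not the AFR translator's code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: mark which children (bricks) a client treats as
 * connected. Children within the halo threshold are selected first; if
 * fewer than min_replicas qualify, the next best by latency are added.
 * A negative latency means the child is unreachable and is never chosen.
 * Returns the number of selected children.
 */
size_t halo_select(const long *latency_ms, size_t n, long halo_ms,
                   size_t min_replicas, bool *selected)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        selected[i] = latency_ms[i] >= 0 && latency_ms[i] <= halo_ms;
        if (selected[i])
            count++;
    }

    while (count < min_replicas) {
        size_t best = n; /* n means "no candidate found yet" */

        for (size_t i = 0; i < n; i++) {
            if (selected[i] || latency_ms[i] < 0)
                continue;
            if (best == n || latency_ms[i] < latency_ms[best])
                best = i;
        }
        if (best == n)
            break; /* no reachable child left to swap in */
        selected[best] = true;
        count++;
    }
    return count;
}
```

With a very large `halo_ms` every reachable child is selected, which corresponds to the synchronous mode mentioned in the summary.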
*	fuse: enhance fusedump to include timestamp and a signature (tag: v3.12dev)	Csaba Henk	2017-04-30	1	-12/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Also referred to as "fusedump v2".) Change-Id: I837944024efd1b9055c2f5f91bd5723ef350e688 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16422 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	mount/fuse: Replace GF_LOG_OCCASIONALLY with gf_log() to report fop failure at all times	Krutika Dhananjay	2017-04-30	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Change-Id: Ibd8e1c6172812951092ff6097ba4bed943051b7c BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17086 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
*	fuse: clean up mount flag processing	Csaba Henk	2017-04-27	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general, when one invokes a mount helper program -- basically anything that mounts something based on its command line, so thinking of mount(8), mount.<fs-type> or fusermount, but also of FUSE servers in general, including glusterfs -- the command line arguments that are to affect mount(2) are mapped to a bitmask called the mount flags, which is passed to mount(2), so that the kernel can interpret the flag bits and adjust properties of the mount accordingly. There is a traditional syntax for this mechanism as implemented in mount(8): one passes "-ocomma,separated,mount,options" and the individual option name strings are mapped to flag bits in mount(8). FUSE further explores this idea and typically the FUSE server command lines allow further option names to be used in the "-ooption,name,list" which are then separated from the kernel sanctioned option names (to which we'll refer as "system mount options") and are passed to a platform specific lower level fuse mount helper interface. The separation of system mount option names and FUSE specific option names is also platform specific, so the general mount interface function, which in case of glusterfs is gf_fuse_mount(), should abstract this away. Therefore we change the signature of this function from int gf_fuse_mount (const char *mountpoint, char *fsname, unsigned long mountflags, char *mnt_param, pid_t *mtab_pid, int status_fd); to int gf_fuse_mount (const char *mountpoint, char *fsname, char *mnt_param, pid_t *mtab_pid, int status_fd); and deal with flag extraction in platform specific mount code. Note that the sole purpose of the mountflags argument was to indicate read-only mounting. 
The other system mount option names were expected to reside in the comma-separated mnt_param string, but they were not properly processed (see the referred BUG). With the new gf_fuse_mount signature read-only mounting is to be indicated as a "ro" component in mnt_param. - For Darwin, which has a dedicated, separate gf_fuse_mount implementation, gf_fuse_mount was ignoring mountflags, so only the signature had to be adjusted. However, as a bonus, we gain read-only support for Darwin, which was missing so far, given that it was indicated via the ignored mountflags. Darwin's low level mount helper relies on the "ro" component of the option string, which agrees with the new calling convention of gf_fuse_mount. - On Linux, system mount option name handling (apart from the distinguished read-only option) used to have the inadvertent side effect of adding "nosuid,nodev" as indicated in BUG; since Ia89d975d1e27fcfa5ab2036ba546aa8fa0d2d1b0 this side effect is removed, but system mount option name handling was left broken (passing system mount options other than "ro" fails to mount). - On other platforms, system mount option name handling is broken (except for the distinguished read-only option). As of this change, in the general (non-Darwin) implementation of gf_fuse_mount we take care of proper separation of system mount names and their conversion to mount flags. For Linux, we adopt the conversion table from FUSE upstream. For other systems we just provide a best effort to support those system mount options which are understood across all Unices (nosuid,nodev,noatime,noexec,ro). (This can be improved later to provide proper platform support.) 
BUG: 1297182 Change-Id: I5d10b5df46feba7a02bf5bf1018db69e6b52260a Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16313 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com>