path: root/xlators
Commit message | Author | Age | Files | Lines
* build: many rpm-build fixes | Jeff Darcy | 2017-12-20 | 1 | -1/+1

    Summary: Highlights include:
     * Fixed GF_CONF_OPTS (dev builds) and RPM_BUILD_FLAGS (rpm builds)
     * Fixed version in configure.ac
     * Fixed handling of files only present when BUILD_FB_EXTRAS is set
     * Fixed disable-georeplication (upstream bug)
     * Fixed disable-tiering (upstream bug)
     * Removed .service files which should be generated from .in versions
     * Fixed tirpc (previously fbtirpc) references
     * Fixed init_enable problems
     * Removed delay-gen references

    Test Plan: Use build.sh to build an RPM, and install it.

    Differential Revision: https://phabricator.intern.facebook.com/D6611299
    Change-Id: If61a4964a149f782038ea47362a82b813e6b7738
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* build: merge our 3.6 and upstream 3.8 configure/specfile | Jeff Darcy | 2017-11-20 | 2 | -0/+34

    Summary: They have a common ancestor at 3.6, but there were hundreds of lines of changes for each file on each side of the fork. In both cases the easiest method was to take the upstream 3.8 version and re-apply our own changes since we branched. Some changes were dropped (e.g. runit) and a few other files needed new changes (e.g. pkg-version) to keep up. Then there was more hacking to fix stealth geo-rep dependencies, enable tirpc/IPv6, and so on.

    Also added buildrpm38 and makerelease38. These should probably not go upstream, but not sure what else to do with them.

    Test Plan: Build RPMs. Install, create volumes, mount, do I/O.

    Reviewers: sshreyas, #posix_storage
    Reviewed By: sshreyas
    Subscribers: jbacik, aquevedo, scientist, sshreyas, calvinowens, jweiner
    Differential Revision: https://phabricator.intern.facebook.com/D6259797
    Tasks: T20348589
    Tags: posix-2017h2, gluster, posix_storage
    Change-Id: I2d43fc6f7f5603293e406c21e4ec85bf19610b77
    Signature: 6259797:1510694123:fc5d2975fec134a51d4b70f7f983cd71971e175a
* glusterd: fix missing/renamed options | Jeff Darcy | 2017-10-02 | 1 | -3/+46

    Change-Id: I2ca0298ee9d166f58b8730256ea76a04e547ce5d
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* self-heal: fix automatic split-brain resolution options | Jeff Darcy | 2017-09-27 | 4 | -44/+188

    Differential Revision: https://phabricator.intern.facebook.com/D5927193
    Change-Id: Ife04c8738b9ee721e7be9bc843b2f6d54bbb468e
* io-threads: re-port changes since 3.6 on top of FB version | Jeff Darcy | 2017-09-18 | 3 | -22/+63

    Includes io-threads parts of the following patches:
      9e3fea1 performance/io-threads: Exit all threads on PARENT_DOWN
      2cfb7bc performance/io-threads: Exit threads in fini() as well

    Change-Id: Id7cc7720e75414fb8a3ac2db68a5fe63c459ffe2
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* io-stats: re-port changes since 3.6 on top of FB version | Jeff Darcy | 2017-09-15 | 3 | -55/+90

    Includes io-stats parts of the following patches:
      1e421a5 logging: Avoid re-initing log level in io-stats
      0facb11 io-stats: Fix overwriting of client profile by the bricks
      91004b0 debug/io-stats: Disable fop stats dump by default
      62f9659 all: fix various cppcheck warnings
      e62c0fe build: export minimum symbols from xlators for correct resolution
      1d0a0d1 core: use syscall wrappers instead of direct syscalls - tail
      0773ca6 all: reduce "inline" usage
      8a9328e build: do not #include "config.h" in each file
      320455b io-stats: Fixing dereference after null check.
      28397ca Avoid conflict between contrib/uuid and system uuid
      49d6894 io-stats : null dereference coverity fix.

    Change-Id: If1bdad6244e5749c6d8c456e6c64b5c5b483e273
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* Replace namespace/io-stats/io-threads with 3.6-fb versions | Jeff Darcy | 2017-09-15 | 10 | -1346/+2931

    This rolls up multiple patches related to namespace identification and throttling/QoS. This primarily includes the following, all by Michael Goulet <mgoulet@fb.com>.

      io-threads: Add weighted round robin queueing by namespace
      https://phabricator.facebook.com/D5615269

      io-threads: Add per-namespaces queue sizes to IO_THREADS_QUEUE_SIZE_KEY
      https://phabricator.facebook.com/D5683162

      io-threads: Implement better slot allocation algorithm
      https://phabricator.facebook.com/D5683186

      io-threads: Only enable weighted queueing on bricks
      https://phabricator.facebook.com/D5700062

      io-threads: Update queue sizes on drain
      https://phabricator.facebook.com/D5704832

      Fix parsing (-1) as default NS weight
      https://phabricator.facebook.com/D5723383

    Parts of the following patches have also been applied to satisfy dependencies.

      io-throttling: Calculate moving averages and throttle offending hosts
      https://phabricator.fb.com/D2516161
      Shreyas Siravara <sshreyas@fb.com>

      Hook up ODS logging for FUSE clients.
      https://phabricator.facebook.com/D3963376
      Kevin Vigor <kvigor@fb.com>

      Add the flag --skip-nfsd-start to skip the NFS daemon starting, even if it is enabled
      https://phabricator.facebook.com/D4575368
      Alex Lorca <alexlorca@fb.com>

    There are also some "standard" changes: dealing with code that moved, reindenting to comply with Gluster coding standards, gf_uuid_xxx, etc.

    This patch *does* revert some changes which have occurred upstream since 3.6; these will be re-applied as appropriate on top of this new base.

    Change-Id: I69024115da7a60811e5b86beae781d602bdb558d
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
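The first rolled-up patch adds weighted round robin queueing by namespace. As a rough illustration of that scheduling idea only (not the io-threads code itself), the sketch below drains per-namespace queues, letting each namespace dequeue up to its weight per round. The function name, the queue layout, and the default weight of 1 are assumptions made for the example.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield (namespace, request) pairs in weighted round-robin order.

    `queues` maps a namespace name to a deque of pending requests and
    `weights` maps a namespace name to how many requests it may dequeue
    per round. Both layouts are hypothetical, chosen for illustration.
    """
    while any(queues.values()):
        for ns, pending in queues.items():
            # A namespace gets at most `weight` turns per round.
            for _ in range(weights.get(ns, 1)):
                if not pending:
                    break
                yield ns, pending.popleft()


queues = {
    "ns_a": deque(["a1", "a2", "a3"]),
    "ns_b": deque(["b1", "b2"]),
}
order = [req for _, req in weighted_round_robin(queues, {"ns_a": 2, "ns_b": 1})]
```

With a weight of 2 for `ns_a` and 1 for `ns_b`, `ns_a` is served twice for every request served from `ns_b` until one of the queues runs dry.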
* nfs: Correctly reconfigure NFS options | Michael Goulet | 2017-09-14 | 1 | -6/+6

    Summary: A mistake was made in D2519423 where `ret` wasn't being set to `0` at the end of `nfs3_init_subvolume_options` since code was inserted between the final `ret = 0` and the return, causing the function to return phony positive ret values.

    This causes the code to interpret the reconfigure function as a failure, meaning that changes can't be persisted.

    This only affects the `reconfigure` path and not the `init` path, since the `reconfigure` path fails when `ret != 0` and the `init` path only fails when `ret == -1`...

    Test Plan: See that volume options are actually being set when the `nfs` xlator is alive, instead of simply on init.

    Reviewers: jdarcy, kvigor, dph, sshreyas
    Reviewed By: sshreyas
    Subscribers: #posix_storage
    Differential Revision: https://phabricator.intern.facebook.com/D5699888
    Change-Id: I89006ce3970f22a4206e58ca5630c21df536031c
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18293
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
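The failure mode described above can be modelled in a few lines. This is a simplified sketch, not the NFS xlator code: the function names and the helper that returns a positive count are hypothetical, and only the two caller conventions (`ret != 0` fails for reconfigure, `ret == -1` fails for init) come from the commit message.

```python
def init_subvolume_options_buggy(options):
    """Models the bug: work added after the final `ret = 0` overwrites it."""
    ret = 0
    # Hypothetical later addition: a helper whose result is a positive count.
    ret = len(options)
    return ret


def init_subvolume_options_fixed(options):
    """Models the fix: the success value is assigned last."""
    ret = len(options)  # intermediate result, not a status code
    ret = 0
    return ret


def init_succeeded(ret):
    # The init path only treats -1 as failure.
    return ret != -1


def reconfigure_succeeded(ret):
    # The reconfigure path treats any non-zero value as failure.
    return ret == 0


opts = {"nfs.example-option": "on"}  # placeholder option name
buggy = init_subvolume_options_buggy(opts)
fixed = init_subvolume_options_fixed(opts)
```

The same positive return value passes the init check and fails the reconfigure check, which is why the problem only showed up when options were changed on a running daemon.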
* Add options to disable new features | Michael Goulet | 2017-09-14 | 3 | -0/+25

    Summary: @sshreyas thought it best to roll out these new features in the default-off state. This diff adds a few options and modifies tests to make sure that this is done.

    Test Plan: The brick restart test works fine, but now it's default disabled on all bricks.

    Reviewers: sshreyas, jdarcy
    Reviewed By: jdarcy
    Subscribers: sshreyas, #posix_storage
    Differential Revision: https://phabricator.intern.facebook.com/D5653138

    Porting note: includes disconnected-reqs option; retart-bricks inapplicable

    Change-Id: I332339894d3cbfafdabeb8592e95c37f30f9751a
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18291
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* Get glusterfs to output p50, p90, and p95 latencies | Sheena Artrip | 2017-09-13 | 1 | -67/+100

    Summary:
    [done] separate p99 dumping into general funcs
    [done] add p95, p90, and p50 stats
     - add p95, p90, p50 within p99, and generalize
     - rename config to dump-percentile-lantencies

    Test Plan:
    make install glusterfs on dev machine.
    gluster volume create $name ...
    mount volume on /mnt/$name <brick1, brick2, ...>
    dd if=/dev/zero of=/mnt/$name/test
    check each brick for pn printing /var/lib/glusterd/stats/glusterfsd__$brick.dump

    Reviewers: sshreyas, kvigor, jdarcy
    Reviewed By: jdarcy
    Differential Revision: https://phabricator.intern.facebook.com/D5645951
    Change-Id: Ic8ada48d9772bf2d5b3a2ba3c845d91d4e03c9d3
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18279
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* Get glusterfs to output p50, p90, and p95 latencies | Sheena Artrip | 2017-09-13 | 2 | -52/+114

    Summary:
    [done] separate p99 dumping into general funcs
    [done] add p95, p90, and p50 stats
     - add p95, p90, p50 within p99, and generalize
     - rename config to dump-percentile-lantencies

    Test Plan:
    make install glusterfs on dev machine.
    gluster volume create $name ...
    mount volume on /mnt/$name <brick1, brick2, ...>
    dd if=/dev/zero of=/mnt/$name/test
    check each brick for pn printing /var/lib/glusterd/stats/glusterfsd__$brick.dump

    Reviewers: sshreyas, kvigor, jdarcy
    Reviewed By: jdarcy
    Differential Revision: https://phabricator.intern.facebook.com/D5645951
    Change-Id: I7bcd7201fc3753316db0ece809491a1cbdbefd32
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18278
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
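The generalization from a p99-only dump to p50/p90/p95/p99 amounts to one percentile routine called with several cut-offs. The sketch below uses the nearest-rank definition, which is an assumption: the commit message does not say which percentile method io-stats uses, and the function names are made up for the example.

```python
def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list; `pct` is an integer 1..100."""
    if not sorted_samples:
        raise ValueError("no samples")
    # Ceiling of pct * n / 100 using integer arithmetic only.
    rank = max(1, (pct * len(sorted_samples) + 99) // 100)
    return sorted_samples[rank - 1]


def dump_percentiles(latencies_usec, pcts=(50, 90, 95, 99)):
    """Return a dict such as {"p50": ..., "p90": ...} for the given samples."""
    ordered = sorted(latencies_usec)
    return {f"p{p}": percentile(ordered, p) for p in pcts}


# 100 synthetic latency samples: 1, 2, ..., 100 microseconds.
stats = dump_percentiles(range(1, 101))
```

Sorting once and indexing for each cut-off keeps the cost of adding more percentiles negligible compared with collecting the samples.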
* added p99 support to the samples logging | Augustus Wynn | 2017-09-13 | 2 | -0/+106

    Summary: added global and by-fop-type calculation of p99 latency to the sampled fop data

    Test Plan: build local glusterfs mount and looked at the stats while dd if=/dev/zero of=/mnt/fuse/groot/share1/test1 bs=5

    Reviewers: sshreyas, mgoulet, jdarcy
    Reviewed By: jdarcy
    Subscribers: jdarcy
    Differential Revision: https://phabricator.intern.facebook.com/D5597662
    Change-Id: I3f5cd9c0ea59ae4357827fcbd19bbf009e661c05
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18277
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* Disable brick daemon from incorrect brick directory | krad | 2017-09-13 | 6 | -1/+56

    Summary: Currently the bricks can open any mount directory from the given volume. This patch adds a provision to prevent bricks from opening brick directories that aren't created for them. This will help with operating gluster at large scale.

    We add a new xattr GF_XATTR_BRICK_NAME to the brick directory. When we start a brick daemon, we make sure the path on disk matches the config provided. For backward compatibility, if there is no value for GF_XATTR_BRICK_NAME we skip the check and set the current brick daemon's path as the value. We ignore GF_XATTR_BRICK_NAME during healing and reset GF_XATTR_BRICK_NAME on brick replace.

    Test Plan: Run fb-smoke

    Reviewers: jdarcy, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.intern.facebook.com/D5448921

    Porting note: disabled some checks to deal with the snapshot case

    Change-Id: I98e62033dfd07f30ad3b99ac003ce94c8d935e5f
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18275
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
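The start-up check described in this commit can be sketched as follows. Extended attributes are modelled as a plain dict, and the key string simply reuses the macro name GF_XATTR_BRICK_NAME because the actual on-disk attribute name is not given in the commit message. The function name is hypothetical.

```python
BRICK_NAME_KEY = "GF_XATTR_BRICK_NAME"  # placeholder for the real xattr name


def brick_may_start(xattrs, configured_path):
    """Return True if a brick daemon configured for `configured_path` may
    use the directory whose extended attributes are in `xattrs`."""
    stored = xattrs.get(BRICK_NAME_KEY)
    if stored is None:
        # Backward compatibility: an unstamped directory is accepted and
        # stamped with the current brick's path.
        xattrs[BRICK_NAME_KEY] = configured_path
        return True
    return stored == configured_path


directory_xattrs = {}
first_start = brick_may_start(directory_xattrs, "/data/brick1")
wrong_brick = brick_may_start(directory_xattrs, "/data/brick2")
```

The first start stamps the directory, so a later daemon configured for a different path is refused.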
* glusterd: make peerfile parsing more robust | Jeff Darcy | 2017-09-13 | 1 | -15/+32

    Summary: This will now skip files in the peer directory that don't have names or contents that match what we expect for a valid peerfile, instead of blowing up the entire glusterd initialization as soon as the first unexpected thing happens.

    Test Plan: Test (peer-parsing.t) included.

    Reviewers: #posix_storage, kvigor
    Reviewed By: kvigor
    Subscribers: kvigor
    Differential Revision: https://phabricator.intern.facebook.com/D5498639
    Tags: gluster, posix_storage
    Change-Id: Ifad9b047a828c2f76f97d0c39f305b7ec5a8ca4c
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18276
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* Wait for a brick's local filesystem to be mounted before starting. | Jeff Darcy | 2017-09-13 | 1 | -4/+13

    Summary: By far the most common reason why a brick's directory might not exist is that the local filesystem on which it lives hasn't finished mounting yet. This is unlike other checks we do, such as for a volume ID and GFID. Some of these are normal conditions when a brick is first created; others are often the result of operator/script error. In the singular case of the directory being absent, wait a little while to see if it comes up.

    Test Plan: Create a volume. Start/stop a volume once so everything gets initialized. Move a brick directory out of place. Try to start the volume. This should pause. Immediately move the brick directory back into place. This should break the pause.

    Reviewers: #posix_storage, sshreyas
    Reviewed By: sshreyas
    Subscribers: shreyas, sshreyas, ventullo, moox
    Differential Revision: https://phabricator.intern.facebook.com/D5063515
    Tags: gluster
    Change-Id: Ied7b07b1a60f54856a67d4cdbad35bfce9e196e4
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18274
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* inodelk-count: Add stats to count the number of lock objects | krad | 2017-09-13 | 6 | -1/+194

    Summary: We want to track the number of locks held by the locks xlator. One of the ways to do it would be to track the total number of pl_lock objects in the system. This patch tracks the total number of pl_lock objects and exposes the stat via the io-stats JSON dump.

    Test Plan: WIP, haven't got a pass. Putting up the diff to get a sense of whether this approach would yield what you guys are looking for.

    Reviewers: kvigor, sshreyas, jdarcy
    Reviewed By: jdarcy
    Differential Revision: https://phabricator.intern.facebook.com/D5303071
    Change-Id: I946debcbff61699ec28b4d6f243042440107a224
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18273
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* Remediation for XFS/DIO corruption problem. | Jeff Darcy | 2017-09-12 | 14 | -24/+551

    This adds a new volume option, shd-validate-data. When set, the self-heal code will fetch checksums for regular files along with all the usual xattrs. If the file seems OK but the checksums show a data mismatch, and if there is only one replica that's out of step with the others, then we modify the source/sink calculations to force a heal from one of the agreeing replicas to the odd one out. Combined with a tool to put files into the self-heal index (being developed separately), this provides a very rudimentary kind of scrubbing functionality.

    Validation is now conditional on the "trusted.glusterfs.validate-status" xattr having the specific value of "suspect" to avoid redoing validation (which is expensive) as we find the same file in multiple bricks' indices. When we decide to take action, we update this xattr to "clean" for copies that were in the majority and "repaired" for the odd one out that gets clobbered. We also copy the about-to-be-clobbered copy into an "orphans" directory to facilitate analysis of corruption patterns. The data goes into ${GFID}.data there, while ${GFID}.link is a symlink to the file's old location.

    Porting note: this is several internal diffs squashed together ("See Also")

    Differential Revision: https://phabricator.intern.facebook.com/D5092983
    See Also: https://phabricator.intern.facebook.com/D5126974
    See Also: https://phabricator.intern.facebook.com/D5127427
    See Also: https://phabricator.intern.facebook.com/D5132804
    See Also: https://phabricator.intern.facebook.com/D5209185
    See Also: https://phabricator.intern.facebook.com/D5370353
    Change-Id: Ie0ae18b368c408a5e47d0bf03ebac80b87b70aa9
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18269
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
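The "one replica out of step" rule can be illustrated with a small majority check over per-replica checksums. This is a sketch of the decision only, under the assumption that checksums arrive as a mapping from replica name to digest; the function name and data layout are hypothetical and the real source/sink calculation in AFR is more involved.

```python
from collections import Counter


def pick_heal_direction(checksums):
    """Return (source, sink) when exactly one replica disagrees with an
    agreeing group of at least two; otherwise return None.

    `checksums` maps a replica name to its data checksum.
    """
    counts = Counter(checksums.values())
    if len(counts) != 2:
        return None  # all replicas agree, or there is no single odd one out
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    if n_major < 2 or n_minor != 1:
        return None
    source = next(r for r, c in checksums.items() if c == majority)
    sink = next(r for r, c in checksums.items() if c == minority)
    return source, sink


decision = pick_heal_direction({"replica-0": "aa11", "replica-1": "aa11", "replica-2": "ff00"})
```

With only two replicas that disagree there is no majority, so the sketch declines to choose a direction rather than guess.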
* nfs: Enable multi-core epoll support in gNFSd | Shreyas Siravara | 2017-09-12 | 3 | -0/+25

    Summary:
    - Enables multi-core epoll support in the nfs daemon.
    - Option can be turned on using: gluster volume set <volname> nfs.event-threads <numthreads>

    Test Plan: Prove test!

    Reviewers: kvigor, rwareing
    Reviewed By: rwareing
    Subscribers: dld, moox, dph
    Differential Revision: https://phabricator.fb.com/D3117966
    Change-Id: Ie8a7b1ba04b0e83f5ec7a09f9d181fe59be479ca
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18266
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* event: Idle connection management | Shreyas Siravara | 2017-09-12 | 3 | -2/+70

    Summary:
    - This diff adds support for detecting and tracking idle client connections.
    - It allows *service translators* (server, nfs) to opt-in to detect and close idle client connections.
    - Right now it explicitly restricts the service to NFS as a safety.

    Here are the debug logs when a client connection gets closed:

    [2016-03-29 17:27:06.154232] W [socket.c:2426:socket_timeout_handler] 0-socket: Shutting down idle client connection (idle=20s,fd=20,conn=[2401:db00:11:d0af:face:0:3:0:957]->[2401:db00:11:d0af:face:0:3:0:2049])!
    [2016-03-29 17:27:06.154292] D [event-epoll.c:655:__event_epoll_timeout_slot] 0-epoll: Connection on slot->fd=9 was idle for 20 seconds!
    [2016-03-29 17:27:06.163282] D [socket.c:629:__socket_rwv] 0-socket.nfs-server: EOF on socket
    [2016-03-29 17:27:06.163298] D [socket.c:2474:socket_event_handler] 0-transport: disconnecting now
    [2016-03-29 17:27:06.163316] D [event-epoll.c:614:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=9 from gen=4 to slot->gen=5, fd=20, slot->fd=20

    Test Plan:
    - Used stuck NFS mounts to create idle clients and unstuck them.

    Reviewers: kvigor, rwareing
    Reviewed By: rwareing
    Subscribers: dld, moox, dph
    Differential Revision: https://phabricator.fb.com/D3112099
    Change-Id: Ic06c89e03f87daabab7f07f892390edd1a1fcc20
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18265
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* clusters/afr: Move root entry heal flow to SHD | Richard Wareing | 2017-09-11 | 1 | -2/+2

    Summary:
    - Improves upon D2387001 by moving the "forced" root gfid heal to the SHDs
    - Removed code which forced NFSd/FUSE clients through the entry heal for the root GFID, this will make them spin up just as fast as prior to D2387001 (i.e. instantly)

    Porting note: mostly inapplicable in 3.8, only one non-test change survived

    Test Plan:
    - Must pass tests/bugs/fb8149516.t

    Reviewers: dph, moox, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2722239
    Change-Id: I35f5827df6ead1bb0ff886ca0adabb2add2e7163
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18259
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* io-threads: nuke everything from a client when it disconnects | Jeff Darcy | 2017-09-09 | 1 | -3/+38

    Summary: These requests haven't been issued, let alone acknowledged. They would disappear if we crashed, which to the client is indistinguishable from any other kind of disconnection - if indeed the client itself isn't the one that died. So we're completely within our rights to discard these.

    There are strong hints that such "orphan" requests are part of how we get into the lock-revocation hangs we've been seeing for a while. Even if that theory doesn't pan out, there's no good reason to keep them around clogging up queues and so forth.

    This is a port of D5430057 & D5662545 to 3.8

    Change-Id: Ie4c88f7791aac85540631f60f5c639497468ad76
    Reviewed-on: https://review.gluster.org/18254
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
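Discarding a disconnected client's not-yet-issued requests is essentially a filter over the pending queue. The sketch below shows that idea with a single queue of dict records; the record layout and function name are assumptions for illustration, and the real io-threads code works on its own per-priority queues.

```python
from collections import deque


def drop_client_requests(queue, client_id):
    """Remove every queued request owned by `client_id`, keeping the
    relative order of the rest, and return how many were dropped."""
    kept = [req for req in queue if req["client"] != client_id]
    dropped = len(queue) - len(kept)
    queue.clear()
    queue.extend(kept)
    return dropped


pending = deque([
    {"client": "c1", "op": "write"},
    {"client": "c2", "op": "read"},
    {"client": "c1", "op": "fsync"},
])
dropped = drop_client_requests(pending, "c1")
```

Only requests still sitting in the queue are affected; anything already handed to a worker is outside the scope of this sketch.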
* posix: Add option to disable nftw() based deletes when purging the landfill directory | Shreyas Siravara | 2017-09-09 | 4 | -7/+54

    Summary:
    - We may have found an issue where certain directories were being moved into .landfill and then being quickly purged via nftw().
    - We would like to have an emergency option to disable these purges.

    Test Plan: Build, vol-set, read logs

    Reviewers: rwareing, dph
    Reviewed By: dph
    Subscribers: #posix_storage
    Differential Revision: https://phabricator.intern.facebook.com/D4862021
    Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
    Signature: t1:4862021:1491855616:51b9b5b8957b0bb97afe27766f2e5aa78ff9edd4
    Reviewed-on: https://review.gluster.org/18253
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: SHD should not use did_discovery code paths | Richard Wareing | 2017-09-09 | 1 | -1/+1

    Summary:
    - Exempt the SHD from the discover code path

    Test Plan:
    - prove -v tests/bugs/fb8149516.t
    - Make rc and canary on offending host (gfsdataswarm048.prn2)

    Reviewers: moox, dph, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2491694
    Change-Id: I691a990950e13be6e376c64fddb110cd6ceefe47
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18251
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* posix-acl: Add assume-permissive option for EACCES debugging / rug-sweeping. | Kevin Vigor | 2017-09-09 | 2 | -3/+163

    Summary: Add assume-permissive option for EACCES debugging / rug-sweeping. Re-fetch permissions when needed if they're absent.

    This is a port of D5104707 & D5131597 to 3.8

    Change-Id: I900fc66876ec8e73b04049f844c428b3d225d4ad
    Reviewed-on: https://review.gluster.org/18249
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: AFR2 Discovery entry heal flow should only happen on root gfid | Richard Wareing | 2017-09-08 | 1 | -3/+4

    Summary:
    - Prevents entry self-heal flow from happening on non-root GFIDs

    Test Plan:
    - Run prove -v tests/bugs/fb8149516.t

    Reviewers: dph, moox, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2470622
    Change-Id: Id8559f2cfeb6e1e5c26dc1571854c0fbc0b59e08
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18250
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* exports: Add a reference count to export_item_t structs and ensure they are correctly used | Shreyas Siravara | 2017-09-08 | 3 | -10/+77

    Summary: This diff fixes a bug in the NFS daemon where the auth cache would use an export item after it was freed by the auth params refresh thread. This usually manifests as a crash in production, when exports files are updated by chef.

    Since each auth cache entry holds a pointer to an export_item_t it makes sense that it should first get a reference to it. Freeing the export_item_t struct happens only in `exp_item_unref()`, once the reference count has dropped to 0.

    This diff also fixes a use-after-free bug in the auth-cache, in the insertion path. In _cache_item(), if we find an entry in the dict, we update that entry with a timestamp & ref the export item associated with it. However, if the item already existed and we called old_cache_insert() with the same key, we gave the dict permission to free the old entry. We then end up using that entry. The fix is to use dict_set_static_bin() instead of dict_set_bin() which informs the dict that the pointer we are giving it belongs to us.

    This is a port of D5780476, D5785038 to 3.8

    Change-Id: I5cdcdc1cc6abad26c7077d66a14f263da07678ac
    Reviewed-on: https://review.gluster.org/18248
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
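The reference-counting rule described above (take a reference before holding a pointer, free only when the count reaches zero) can be modelled as below. This is an illustrative Python stand-in, not the export_item_t code: the class, its `ref()` method, and the `freed` flag are hypothetical, and only the "free on the last unref" behaviour mirrors what the commit says about `exp_item_unref()`.

```python
import threading


class ExportItem:
    """Toy model of a reference-counted export item."""

    def __init__(self, name):
        self.name = name
        self.freed = False  # stands in for the memory being released
        self._refs = 1      # the creator holds the first reference
        self._lock = threading.Lock()

    def ref(self):
        with self._lock:
            self._refs += 1
        return self

    def unref(self):
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            self.freed = True  # freeing happens only on the last unref


item = ExportItem("/export/vol0")
cached = item.ref()        # the auth cache takes its own reference
item.unref()               # the refresh thread drops the original one
still_valid = not item.freed
cached.unref()             # the cache entry is evicted
```

Because the cache holds its own reference, the refresh thread dropping its reference no longer invalidates the entry the cache is still using.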
* nfs: Add locking to auth-cache, fix some bugs | Jeff Darcy | 2017-09-08 | 2 | -71/+216

    Summary: A lot of the diff "volume" is just refactoring, which should have no functional effect. It's preparation for adding a new implementation. The main functional change is locking around the external calls into this module, to prevent some of the races that we've seen.

    Additional fixes:
    - entry_data->data can be NULL, so we should check lookup_res before dereferencing it below.
    - It renames functions that need to be locked to have double underscores in front of them.

    This is a port of D5658875, D5658809 & D5762136 to 3.8

    Change-Id: If1b71b5c3268271f3a41c07394c215290a12c0ec
    Reviewed-on: https://review.gluster.org/18247
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Allow custom per-directory timeouts | Shreyas Siravara | 2017-09-08 | 1 | -14/+62

    Summary:
    - This diff looks for a custom xattr on a directory or file called 'trusted.glusterfs.md-cache-timeout' and uses that timeout if it finds it instead of the default timeout value for the cache.
    - For example, if we know that a customer has a fixed set of directories that never change, we can set that attribute on all their directories and cache directory metadata for the lifetime of the client (NFS or FUSE) process.
    - Port of D5430395 to 3.8

    Reviewed By: jdarcy
    Change-Id: Ieb232bc1365c59dd7c396c7a617f12973cc8ea01
    Reviewed-on: https://review.gluster.org/18241
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
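The lookup-with-fallback behaviour can be sketched as follows. The xattr name comes from the commit message; everything else is assumed for illustration, including the function name, modelling xattrs as a dict, and treating the value as a non-negative integer number of seconds.

```python
MD_CACHE_TIMEOUT_XATTR = "trusted.glusterfs.md-cache-timeout"


def effective_timeout(xattrs, default_timeout):
    """Return the per-inode cache timeout if the custom xattr is present
    and parses as a non-negative integer, else the volume default."""
    raw = xattrs.get(MD_CACHE_TIMEOUT_XATTR)
    if raw is None:
        return default_timeout
    try:
        value = int(raw)
    except ValueError:
        return default_timeout  # unparsable value: fall back to the default
    return value if value >= 0 else default_timeout


static_dir = {MD_CACHE_TIMEOUT_XATTR: "86400"}
timeout_static = effective_timeout(static_dir, default_timeout=1)
timeout_plain = effective_timeout({}, default_timeout=1)
```

Falling back to the default on a missing or malformed value keeps a bad attribute from disabling caching or pinning stale metadata by accident.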
* glusterd: Consider NULL peers to be invalid | Max Rijevski | 2017-09-08 | 1 | -0/+7

    Summary: Null peer UUIDs are assumed to be invalid. Glusterd should complain and bail if we try to load any on startup.

    This is a port of D5160925 to 3.8

    Reviewed By: sshreyas
    Change-Id: Ib8679c7501a4fc1fbf9b34fdbf47037f38ec7cb8
    Reviewed-on: https://review.gluster.org/18238
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* performance/io-threads: Add watchdog to cover up a possible thread leak | Jeff Darcy | 2017-09-08 | 3 | -16/+213

    Summary: There appears to be a thread leak somewhere, which causes io-threads to run out of threads to process a particular (priority-based) queue. The leak should obviously be fixed, but that might take a while and the consequences until then are severe - a brick essentially going offline without the courtesy of actually dying. This patch adds a watchdog that checks for stuck queues, and adds threads to break the deadlock. The same thing done manually on one afflicted cluster caused brick CPU usage to drop from 2600% to 400%, with latency quickly returning to normal.

    The controlling option is performance.iot-watchdog-secs, which is the number of seconds we'll watch for a non-empty queue with no items being dequeued. That's our signal to add a thread. A value of zero (the default) disables this watchdog feature.

    This is a port of D5177414 to 3.8.

    Test Plan: All the usual tests to determine safety. Use gdb to hack priv->queue_sizes to a non-zero value. This will make it look like the queue is non-empty, but since it does in fact have zero items there will be no dequeues. After watchdog-secs seconds, this should add a thread, with a corresponding entry in the brick log.

    Change-Id: Ic051e411d3e9351e1cf5e233bad8bbb5078cb259
    Reviewed-on: https://review.gluster.org/18239
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
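The watchdog condition (a non-empty queue with no dequeues for the configured number of seconds) can be expressed as a small state machine driven by a once-per-second tick. The class and method names below are hypothetical and the sketch leaves out threads entirely; it only shows when the "add a thread" signal would fire under that reading of the commit message.

```python
class QueueWatchdog:
    """Signals when a queue has been non-empty with no dequeues for
    `watchdog_secs` consecutive ticks. Zero disables the watchdog."""

    def __init__(self, watchdog_secs):
        self.watchdog_secs = watchdog_secs
        self.stalled_secs = 0
        self.last_dequeued = 0

    def tick(self, queue_size, total_dequeued):
        """Call once per second; returns True when a thread should be added."""
        if self.watchdog_secs == 0:
            return False
        progressed = total_dequeued != self.last_dequeued
        self.last_dequeued = total_dequeued
        if queue_size == 0 or progressed:
            self.stalled_secs = 0
            return False
        self.stalled_secs += 1
        if self.stalled_secs >= self.watchdog_secs:
            self.stalled_secs = 0  # start counting again after reacting
            return True
        return False


watchdog = QueueWatchdog(watchdog_secs=3)
# Queue holds 5 items and the dequeue counter never moves.
signals = [watchdog.tick(queue_size=5, total_dequeued=0) for _ in range(3)]
```

Any dequeue, or the queue becoming empty, resets the stall counter, so a slow but progressing queue never triggers the signal.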
* nfs: Enable NFS by default | Shreyas Siravara | 2017-09-08 | 1 | -1/+1

    Change-Id: I520894244063ef854b4416cb5418065bd9de7277
    Reviewed-on: https://review.gluster.org/18237
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Track outstanding requests | Junsong Li | 2017-09-08 | 1 | -2/+16

    Summary: Add outstanding-req field to track requests that have been sent down the stack and haven't come back.

    This is a port of D4908836 to 3.8

    Reviewers: sshreyas
    Change-Id: I5870f63008d553416109c1808a434f526f5a633d
    Reviewed-on: https://review.gluster.org/18236
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* [io-cache] New volume options for read sizes | Joshua Eilers | 2017-09-08 | 3 | -5/+38

    Summary: Two new volume options that control reads.

    performance.io-cache.read-size - Tells gluster how much it should try to read on each posix_readv call
    performance.io-cache.min-cached-read-size - Tells gluster the smallest files it should start caching, anything smaller is not cached

    This is a port of D4844662 to 3.8

    Change-Id: I5ba891906f97e514e7365cc34374619379434766
    Reviewed-on: https://review.gluster.org/18235
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Allow volume stop to succeed if certain processes are already dead | Shreyas Siravara | 2017-09-08 | 1 | -4/+14

    Summary:
    - Sometimes the process that glusterd is trying to kill is already dead.
    - In that case, if it can't find the pid, it should just continue on and not fail the entire operation.
    - This is a port of D4837916 to 3.8

    Change-Id: Ic96952a8d31927446f648830ede6ccd82512663f
    Reviewed-on: https://review.gluster.org/18234
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* storage/posix: Add limit to number of hard links. | Shreyas Siravara | 2017-09-08 | 3 | -2/+33
| | | | | | | | | | | | | | | | Summary: Too many hard links blow up btrfs by exceeding max xattr size (recording pgfid for each hardlink). Add a limit to prevent this explosion. This is a port of D4682329 to 3.8 Reviewed By: sshreyas Change-Id: I614a247834fb8f2b2743c0c67d11cefafff0dbaa Reviewed-on: https://review.gluster.org/18232 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Set AFR UP message as soon as quorum is obtained. | Shreyas Siravara | 2017-09-07 | 2 | -17/+41
| | | | | | | | | | | | | | | | | | | Summary: AFR currently waits for all children to respond before sending an UP message. This means that one dead host can cause us to wait for a TCP timeout (2 mins!) before declaring the volume up. Now we send an UP as soon as quorum is obtained. This is a port of D4701919 to 3.8. Reviewed By: sshreyas Change-Id: I642d4eb7dc7e0b289e89b7a16abf99a3f98aa8b3 Reviewed-on: https://review.gluster.org/18231 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
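The quorum-based UP decision described in this commit can be illustrated with a small Python sketch. This is not the AFR code (which is C); it assumes a simple-majority quorum, whereas AFR's real quorum policy is configurable, and the names `has_quorum` and `should_send_up` are hypothetical.

```python
def has_quorum(children_up, total_children):
    """Assumed policy: strictly more than half of the children are up."""
    return children_up * 2 > total_children


def should_send_up(responses, total_children):
    """Walk child responses in arrival order and stop once quorum is met.

    `responses` yields True for a child that came up and False for one
    that reported down. A child that never responds (for example a dead
    host) simply never appears in the iterable.
    """
    up = 0
    for child_is_up in responses:
        if child_is_up:
            up += 1
        if has_quorum(up, total_children):
            return True  # UP can be sent without waiting for the rest
    return False
```

With three children, two positive responses are enough under this assumption, so the third (possibly dead) child no longer delays the UP message.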
* md-cache: Invalidate inode metadata on flush | Shreyas Siravara | 2017-09-07 | 1 | -0/+35
| | | | | | | | | | | | | | | | | | Summary: - When you write a file and then stat it immediately, md-cache returns stale stat information. - This diff implements flush() in md-cache so that we can correctly invalidate inodes after a write. - This is a port of D4762171 to 3.8 Reviewers: kvigor, dph Reviewed By: kvigor Change-Id: I368b7870d61b14a7e390917d195cbccc67029eb7 Reviewed-on: https://review.gluster.org/18233 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* io-stats: Error count and rate collection | Shreyas Siravara | 2017-09-07 | 1 | -0/+24
| | | | | | | | | | | | | | | | | | | | | | Summary: - This diff adds error counts and rates to the regular io-stats dump. - It outputs keys that look like this: "storage.gluster.nfsd.groot.aggr.errors.<error_name>.count": "6", "storage.gluster.nfsd.groot.inter.errors.<error_name>.per_sec": "0.00" - <error_name> is the lowercase representation of errno values (e.g., ENOENT -> enoent, etc.) - This is a port of D4691581 to 3.8 Reviewers: dph, kvigor Reviewed By: kvigor Change-Id: I96857d4283c47f9d330ae1978f113013e7c78a87 Reviewed-on: https://review.gluster.org/18230 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
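The key naming scheme quoted in this commit message can be sketched in Python as follows. The function name `error_stat_keys` and the idea of passing the key prefix as a parameter are assumptions made for illustration; only the key layout and the lowercase errno naming come from the message itself.

```python
import errno


def error_stat_keys(prefix, err):
    """Build the aggregate-count and interval-rate keys for one errno value.

    The errno symbol is lowercased, so ENOENT becomes "enoent".
    """
    name = errno.errorcode[err].lower()
    count_key = f"{prefix}.aggr.errors.{name}.count"
    rate_key = f"{prefix}.inter.errors.{name}.per_sec"
    return count_key, rate_key
```

For example, `error_stat_keys("storage.gluster.nfsd.groot", errno.ENOENT)` yields the two key shapes shown in the commit message with `<error_name>` replaced by `enoent`.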
* posix: fadvise filedescriptors POSIX_FADV_RANDOM to bypass kernel read-size bug | Shreyas Siravara | 2017-09-07 | 4 | -1/+36
| | | | | | | | | | | | | | | | | | | Summary: - There is a known kernel bug that causes reads to disk to be limited by the RA setting in /sys/block/sd[a-z]/queue/read_ahead_kb. - The workaround is to fadvise POSIX_FADV_RANDOM on file descriptors before reading. - This is a port of D4585521 to 3.8 Test Plan: Still need to figure out a good test for this, other than simple inspection. Reviewers: rwareing, kvigor Reviewed By: kvigor Change-Id: I4a307573da620d9a1955fb5f4e8cd67154e11ace Reviewed-on: https://review.gluster.org/18229 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
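The workaround named in this commit, advising POSIX_FADV_RANDOM on a descriptor before reading from it, looks roughly like this when written against Python's `os` module on Linux. The actual patch is C inside the posix xlator; the helper name `read_with_random_advice` is hypothetical.

```python
import os


def read_with_random_advice(path, size, offset=0):
    """Open a file, mark its access pattern as random, then read from it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 apply the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        return os.pread(fd, size, offset)
    finally:
        os.close(fd)
```

The advice is given per file descriptor, which is why it has to be issued after each open rather than once per file.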
* features/namespace: Add namespace xlator and link into brick graph | Michael Goulet | 2017-09-07 | 7 | -1/+1409
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This translator tags namespaces with a unique hash that corresponds to the top-level directory (right under the gluster root) of the file the fop acts on. The hash information is injected into the call frame by this translator, so this namespace information can later be used to do throttling, QoS and other namespace-specific stats collection and actions in later xlators further down the stack. When the translator can't find a path directly for the fd_t or loc_t, it winds a GET_ANCESTRY_PATH_KEY down to the posix xlator to get the path manually. Caching this namespace information in the inode makes sure that most requests don't need to recalculate the hash, so that typically fops are just doing an inode_ctx_get instead of the more expensive code paths that this xlator can take. Right now the xlator is hard-coded to only hash the top-level directory, but this could be easily extended to more sophisticated matching by modification of the parse_path function. Test Plan: Run `prove -v tests/basic/namespace.t` to see that tagging works. Change-Id: I960ddadba114120ac449d27a769d409cc3759ebc Reviewed-on: https://review.gluster.org/18041 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
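The tagging idea in this commit, hashing the top-level directory of the path a fop acts on, can be sketched in Python as below. The hash function used by the xlator is not stated in the message, so `zlib.crc32` is only a stand-in, and `top_level_dir` and `namespace_tag` are hypothetical names; the inode-level caching the message describes is left out.

```python
import zlib


def top_level_dir(path):
    """Return the first path component under the volume root, or None."""
    parts = [p for p in path.split("/") if p]
    return parts[0] if parts else None


def namespace_tag(path):
    """Hash the top-level component so all fops under it share one tag."""
    top = top_level_dir(path)
    if top is None:
        return None  # the root itself has no top-level component
    return zlib.crc32(top.encode("utf-8"))  # stand-in hash, an assumption
```

Every path below the same top-level directory maps to the same tag, which is what later xlators would need for per-namespace throttling or stats.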
* md-cache: Cache statfs calls | Shreyas Siravara | 2017-09-07 | 1 | -0/+126
| | | | | | | | | | | | | | | | | | | | | | | | Summary: - This gives md-cache the ability to cache statfs calls - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>' - This is a port of D4652632 Test Plan: Tested functionality on devserver Reviewers: kvigor Reviewed By: kvigor Subscribers: #posix_storage Differential Revision: https://phabricator.intern.facebook.com/D4652632 Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4 Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705 Reviewed-on: https://review.gluster.org/18228 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Restrict io-thread queue depth stats to NFS | Shreyas Siravara | 2017-09-03 | 1 | -20/+21
| | | | | | | | | | | | | | | | Summary: - Fixes the unnecessary log spew in other daemons - This is a port of D3646627 to 3.8 Reviewers: rwareing, kvigor Reviewed By: kvigor Change-Id: Id54ab41cdfdd2006d3af2d8774c38025c566c523 Reviewed-on: https://review.gluster.org/18199 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* auditing: Sample creation & removal of filesystem entries as well as errors | Shreyas Siravara | 2017-09-03 | 2 | -2/+241
| | | | | | | | | | | | | | | Summary: - Adds the ability for gluster to log every single CREATE and UNLINK that happens on the bricks (right before invoking sys_unlink() or open(... | O_CREAT)) - Makes it so that CREATEs and UNLINKs are not downsampled in io-stats - This is a port of D3268156, D3778968, D3903894 & D3301527 to 3.8 Reviewed By: kvigor Change-Id: I1bce28068c02b7d202f094094237646b4d39794b Reviewed-on: https://review.gluster.org/18198 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* storage/posix: Fix crash in posix_make_ancestryfromgfid | Richard Wareing | 2017-09-03 | 1 | -0/+10
| | | | | | | | | | | | | | | | Summary: - Log an OOPS and bail when *parent is null just before going into posix_resolve code path (to avoid crash) Test Plan: - Prove test/canary on cluster Differential Revision: https://phabricator.fb.com/D2640497 Change-Id: I6140ef6fdb711748dad1c66d929aca36328bc574 Reviewed-on: https://review.gluster.org/17969 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* glusterd: Log & print old clients when doing a volume set operation | Shreyas Siravara | 2017-09-03 | 1 | -10/+37
| | | | | | | | | | | | | | | | | Summary: - Prior to this diff, Gluster would simply log "One or more clients cannot ..." - With this diff, we now show up to 20 clients that are mismatched. - This is a port of D3313082 to 3.8 Reviewers: rwareing, kvigor Reviewed By: kvigor Change-Id: Ia8830f18c922bda1aee787a2e3d6033164bb64d4 Reviewed-on: https://review.gluster.org/18196 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Fix multi-volume support for nfsd | Shreyas Siravara | 2017-09-03 | 2 | -11/+18
| | | | | | | | | | | | | | | | | | Summary: - Adds iamshd (iamnfsd already there due to fop throttling) options to io-stats xlator. - Leverages these options to correctly write multi-volume NFSd stats - This is a port of D2714648 to 3.8 Test Plan: - Tested on local dev server, verified multiple files are generated for multiple vols Change-Id: Id2014a135fe52045da462eaaa91f336f45cdf167 Reviewed-on: https://review.gluster.org/18195 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Fix gfid unsplit code when renamed filename exceeds NAME_MAX | Shreyas Siravara | 2017-09-03 | 1 | -1/+14
| | | | | | | | | | | | | | | | | | | Summary: - We noticed some folks name their files all the way up to NAME_MAX (usually 255) and when split-brain is encountered, we fail to heal the file. - This diff puts an upper bound on the number of bytes we will snprintf into the buffer so that we do not fail the rename. - This is a port of D3646254 to 3.8 Test Plan: Prove test -- can show it fails without patch as well. Reviewers: #posix_storage, rwareing Reviewed By: rwareing Change-Id: I51c6b28374d4a3f21e29044cb727b4b1da7b69e1 Reviewed-on: https://review.gluster.org/18194 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
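The bounding described in this commit, capping how many bytes of the original name are written so the renamed file still fits within NAME_MAX, can be illustrated in Python. The real fix bounds an snprintf in C; the suffix format used by the unsplit code is not given in the message, so `suffix` here is a hypothetical parameter, as is the helper name `bounded_rename_target`.

```python
NAME_MAX = 255  # the usual value, per the commit message


def bounded_rename_target(name, suffix):
    """Trim `name` so that name + suffix never exceeds NAME_MAX bytes."""
    tail = suffix.encode("utf-8")
    room = NAME_MAX - len(tail)
    if room < 0:
        raise ValueError("suffix alone exceeds NAME_MAX")
    # Cut on a byte budget, then drop any multi-byte character split by the cut.
    trimmed = name.encode("utf-8")[:room].decode("utf-8", errors="ignore")
    return trimmed + suffix
```

A name that is already 255 bytes long is shortened just enough to make room for the suffix, while short names pass through unchanged apart from the suffix.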
* nfs: Check if FQDN is authorized before unmounting clients | Shreyas Siravara | 2017-09-03 | 1 | -7/+39
| | | | | | | | | | | | | | | | | | | Summary: - We have a thread that checks if connected clients are "still" authorized for a mount. - This thread is currently only checking the IP (regression from the 3.4 -> 3.6 rebase, perhaps). - This diff adds code to check the IP *and* the FQDN before unmounting the client. Test Plan: Tested on devserver, auth prove tests. Reviewers: rwareing, kvigor Reviewed By: kvigor Change-Id: I441a4436d8df064d2f09a2539acb780ab53943f6 Reviewed-on: https://review.gluster.org/18193 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Adding stat for weighted & unweighted average latency | Richard Wareing | 2017-09-03 | 1 | -0/+30
| | | | | | | | | | | | | | | | | | | | | Summary: - Our current approach to measuring "average fop latency" is badly flawed in that it doesn't weight the FOPs correctly according to how many occurred in the time interval. This makes Statisticians very sad. This patch adds an internally computed weighted average latency which will be far more efficient to display via ODS, as well as having the benefit of not being complete nonsense. - This is a port of D3148415 & D3405772 to 3.8 Reviewers: kvigor, dph, sshreyas Reviewed By: sshreyas Change-Id: Ie3618f279b545610b7ed1a8482243fcc8dc53217 Reviewed-on: https://review.gluster.org/18192 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
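The difference between the two averages this commit talks about can be shown with a short Python sketch. It assumes each fop type is summarized as a `(count, average_latency)` pair for the interval; that data layout and the function names are illustrative and not taken from io-stats.

```python
def unweighted_avg_latency(fops):
    """Mean of per-fop averages, ignoring how often each fop occurred."""
    return sum(avg for _, avg in fops) / len(fops)


def weighted_avg_latency(fops):
    """Mean latency over all calls, weighting each fop by its call count."""
    total_calls = sum(count for count, _ in fops)
    return sum(count * avg for count, avg in fops) / total_calls


# 990 fast calls at 100 us and 10 slow calls at 10000 us (made-up numbers).
sample = [(990, 100.0), (10, 10000.0)]
print(unweighted_avg_latency(sample))  # 5050.0
print(weighted_avg_latency(sample))    # 199.0
```

In the made-up sample, the rare slow fop drags the unweighted figure to 5050 us even though the typical call takes 100 us; the weighted figure of 199 us reflects what callers actually experienced. Both helpers assume a non-empty list with at least one call.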
* features/lock-revocation: Remove 3.6 version dependency for lock revocation | Shreyas Siravara | 2017-09-03 | 1 | -4/+4
| | | | | | | | | | | | | | | | | | Summary: - Per title - This is a port of D2875451 to 3.8 Test Plan: Live? Reviewers: dph, moox, dld, rwareing Reviewed By: rwareing Change-Id: Ie2862bcbb49d1159cf2465d48cc506f629c527e0 Reviewed-on: https://review.gluster.org/18191 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>