Commit message | Author | Age | Files | Lines
* inodelk-count: Add stats to count the number of lock objects | krad | 2017-09-13 | 6 | -1/+194
    Summary: We want to track the number of locks held by the locks xlator. One way to do it would be to track the total number of pl_lock objects in the system. This patch tracks the total number of pl_lock objects and exposes the stat via the io-stats JSON dump.
    Test Plan: WIP, haven't got a pass yet. Putting up the diff to get a sense of whether this approach would yield what you guys are looking for.
    Reviewers: kvigor, sshreyas, jdarcy
    Reviewed By: jdarcy
    Differential Revision: https://phabricator.intern.facebook.com/D5303071
    Change-Id: I946debcbff61699ec28b4d6f243042440107a224
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18273
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
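A minimal sketch of the counting idea, assuming a process-wide C11 atomic counter that is incremented when a lock object is allocated and decremented when it is freed. `pl_lock_sketch_t`, `lock_object_new()`, `lock_object_free()` and `lock_object_count()` are hypothetical names, not the locks xlator's API; the actual patch may use GlusterFS's own atomic helpers and stats plumbing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the locks xlator's pl_lock object. */
typedef struct {
    int placeholder;
} pl_lock_sketch_t;

/* Process-wide count of live lock objects. */
static atomic_int_fast64_t live_lock_objects = 0;

static pl_lock_sketch_t *lock_object_new(void)
{
    pl_lock_sketch_t *lock = calloc(1, sizeof(*lock));
    if (lock != NULL)
        atomic_fetch_add(&live_lock_objects, 1); /* count only successful allocations */
    return lock;
}

static void lock_object_free(pl_lock_sketch_t *lock)
{
    if (lock == NULL)
        return;
    int_fast64_t before = atomic_fetch_sub(&live_lock_objects, 1);
    assert(before > 0); /* every free must match an earlier allocation */
    free(lock);
}

/* The value a stats dump (e.g. the io-stats JSON) would report. */
static int64_t lock_object_count(void)
{
    return (int64_t)atomic_load(&live_lock_objects);
}
```

Reading the counter without taking any lock is acceptable for a stats dump, since the reported value is only a point-in-time snapshot.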
* Remediation for XFS/DIO corruption problem. | Jeff Darcy | 2017-09-12 | 15 | -24/+669
    This adds a new volume option, shd-validate-data. When set, the self-heal code will fetch checksums for regular files along with all the usual xattrs. If the file seems OK but the checksums show a data mismatch, and if there is only one replica that's out of step with the others, then we modify the source/sink calculations to force a heal from one of the agreeing replicas to the odd one out. Combined with a tool to put files into the self-heal index (being developed separately), this provides a very rudimentary kind of scrubbing functionality.
    Validation is now conditional on the "trusted.glusterfs.validate-status" xattr having the specific value "suspect", to avoid redoing validation (which is expensive) as we find the same file in multiple bricks' indices. When we decide to take action, we update this xattr to "clean" for copies that were in the majority and "repaired" for the odd one out that gets clobbered. We also copy the about-to-be-clobbered copy into an "orphans" directory to facilitate analysis of corruption patterns. The data goes into ${GFID}.data there, while ${GFID}.link is a symlink to the file's old location.
    Porting note: this is several internal diffs squashed together ("See Also").
    Differential Revision: https://phabricator.intern.facebook.com/D5092983
    See Also: https://phabricator.intern.facebook.com/D5126974
    See Also: https://phabricator.intern.facebook.com/D5127427
    See Also: https://phabricator.intern.facebook.com/D5132804
    See Also: https://phabricator.intern.facebook.com/D5209185
    See Also: https://phabricator.intern.facebook.com/D5370353
    Change-Id: Ie0ae18b368c408a5e47d0bf03ebac80b87b70aa9
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18269
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
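The "only one replica is out of step" rule can be sketched as a search for a unique odd one out among per-replica checksums. This assumes fixed-size checksum buffers (`CKSUM_LEN` is an assumed size) and uses a hypothetical `find_odd_replica()`; the real self-heal code works on its own source/sink arrays and xattr replies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CKSUM_LEN 16 /* assumed checksum size in bytes */

/*
 * Returns the index of the single replica whose checksum disagrees with
 * all the others (which must agree among themselves), or -1 if there is
 * no unique odd one out: everything agrees, or there is no clear majority.
 */
static int find_odd_replica(const unsigned char cksums[][CKSUM_LEN], size_t n)
{
    assert(cksums != NULL || n == 0);
    if (n < 3)
        return -1; /* with two replicas there is no majority to trust */

    for (size_t odd = 0; odd < n; odd++) {
        size_t ref = (odd == 0) ? 1 : 0; /* a reference replica other than the candidate */
        int others_agree = 1;
        for (size_t i = 0; i < n && others_agree; i++) {
            if (i != odd && memcmp(cksums[i], cksums[ref], CKSUM_LEN) != 0)
                others_agree = 0;
        }
        if (others_agree && memcmp(cksums[odd], cksums[ref], CKSUM_LEN) != 0)
            return (int)odd;
    }
    return -1;
}
```

Returning -1 when there is no clear majority mirrors the commit's caution: a heal is only forced when exactly one copy disagrees with all of the others.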
* nfs: Enable multi-core epoll support in gNFSd | Shreyas Siravara | 2017-09-12 | 4 | -0/+65
    Summary:
    - Enables multi-core epoll support in the NFS daemon.
    - The option can be turned on using: gluster volume set <volname> nfs.event-threads <numthreads>
    Test Plan: Prove test!
    Reviewers: kvigor, rwareing
    Reviewed By: rwareing
    Subscribers: dld, moox, dph
    Differential Revision: https://phabricator.fb.com/D3117966
    Change-Id: Ie8a7b1ba04b0e83f5ec7a09f9d181fe59be479ca
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18266
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* event: Idle connection management | Shreyas Siravara | 2017-09-12 | 10 | -50/+373
    Summary:
    - This diff adds support for detecting and tracking idle client connections.
    - It allows *service translators* (server, nfs) to opt in to detecting and closing idle client connections.
    - Right now it explicitly restricts this to the NFS service as a safety measure.
    Here are the debug logs when a client connection gets closed:
    [2016-03-29 17:27:06.154232] W [socket.c:2426:socket_timeout_handler] 0-socket: Shutting down idle client connection (idle=20s,fd=20,conn=[2401:db00:11:d0af:face:0:3:0:957]->[2401:db00:11:d0af:face:0:3:0:2049])!
    [2016-03-29 17:27:06.154292] D [event-epoll.c:655:__event_epoll_timeout_slot] 0-epoll: Connection on slot->fd=9 was idle for 20 seconds!
    [2016-03-29 17:27:06.163282] D [socket.c:629:__socket_rwv] 0-socket.nfs-server: EOF on socket
    [2016-03-29 17:27:06.163298] D [socket.c:2474:socket_event_handler] 0-transport: disconnecting now
    [2016-03-29 17:27:06.163316] D [event-epoll.c:614:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=9 from gen=4 to slot->gen=5, fd=20, slot->fd=20
    Test Plan:
    - Used stuck NFS mounts to create idle clients and unstuck them.
    Reviewers: kvigor, rwareing
    Reviewed By: rwareing
    Subscribers: dld, moox, dph
    Differential Revision: https://phabricator.fb.com/D3112099
    Change-Id: Ic06c89e03f87daabab7f07f892390edd1a1fcc20
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18265
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
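A simplified sketch of the idle test, assuming each connection records the time of its last activity. The commit implements this inside the epoll event layer with per-slot timeouts; `conn_sketch_t` and `find_idle_connections()` are hypothetical and only illustrate the threshold check.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-connection record kept by a service translator. */
typedef struct {
    int fd;
    time_t last_activity; /* assumed to be updated on every read/write */
} conn_sketch_t;

/*
 * Stores the fds of connections idle for at least idle_secs into idle_fds
 * (which must have room for n entries) and returns how many were found.
 * A caller would then shut those connections down.
 */
static size_t find_idle_connections(const conn_sketch_t *conns, size_t n,
                                    time_t now, unsigned idle_secs,
                                    int *idle_fds)
{
    assert((conns != NULL && idle_fds != NULL) || n == 0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (difftime(now, conns[i].last_activity) >= (double)idle_secs)
            idle_fds[found++] = conns[i].fd;
    }
    return found;
}
```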
* cluster/afr: Add additional test coverage for unsplit flows | Richard Wareing | 2017-09-11 | 5 | -0/+615
    Summary:
    - Adds test coverage for unsplitting via SHD
    Test Plan:
    - Run prove -v tests/bugs/fb2506544* (https://phabricator.fb.com/P56056659)
    Reviewers: moox, dld, dph, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2770524
    Porting note: also added fb*.t tests to test_env.
    Change-Id: Iac28b595194925a45e62b6438611c9bade58b30b
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18261
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* clusters/afr: Move root entry heal flow to SHD | Richard Wareing | 2017-09-11 | 2 | -6/+10
    Summary:
    - Improves upon D2387001 by moving the "forced" root gfid heal to the SHDs
    - Removes code which forced NFSd/FUSE clients through the entry heal for the root GFID; this will make them spin up just as fast as prior to D2387001 (i.e. instantly)
    Porting note: mostly inapplicable in 3.8, only one non-test change survived
    Test Plan:
    - Must pass tests/bugs/fb8149516.t
    Reviewers: dph, moox, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2722239
    Change-Id: I35f5827df6ead1bb0ff886ca0adabb2add2e7163
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18259
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* io-threads: nuke everything from a client when it disconnects | Jeff Darcy | 2017-09-09 | 2 | -3/+39
    Summary: These requests haven't been issued, let alone acknowledged. They would disappear if we crashed, which to the client is indistinguishable from any other kind of disconnection, if indeed the client itself isn't the one that died. So we're completely within our rights to discard these.
    There are strong hints that such "orphan" requests are part of how we get into the lock-revocation hangs we've been seeing for a while. Even if that theory doesn't pan out, there's no good reason to keep them around clogging up queues and so forth.
    This is a port of D5430057 & D5662545 to 3.8
    Change-Id: Ie4c88f7791aac85540631f60f5c639497468ad76
    Reviewed-on: https://review.gluster.org/18254
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
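Discarding a disconnected client's queued requests amounts to an unlink-and-free pass over each queue, matching on the owning client. A minimal sketch on a singly linked list; `req_t` and `discard_client_requests()` are hypothetical, and a real io-threads queue would also need its lock held during the pass.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queued request: only the owning client matters here. */
typedef struct req {
    int client_id;
    struct req *next;
} req_t;

/*
 * Unlinks and frees every queued request owned by client_id.
 * Returns how many requests were discarded.
 */
static size_t discard_client_requests(req_t **head, int client_id)
{
    assert(head != NULL);
    size_t dropped = 0;
    req_t **link = head; /* points at the pointer that references cur */
    while (*link != NULL) {
        req_t *cur = *link;
        if (cur->client_id == client_id) {
            *link = cur->next; /* splice cur out of the queue */
            free(cur);
            dropped++;
        } else {
            link = &cur->next;
        }
    }
    return dropped;
}
```

Walking a pointer-to-pointer avoids special-casing the head of the queue when the first request belongs to the departed client.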
* posix: Add option to disable nftw() based deletes when purging the landfill directory | Shreyas Siravara | 2017-09-09 | 4 | -7/+54
    Summary:
    - We may have found an issue where certain directories were being moved into .landfill and then being quickly purged via nftw().
    - We would like to have an emergency option to disable these purges.
    Test Plan: Build, vol-set, read logs
    Reviewers: rwareing, dph
    Reviewed By: dph
    Subscribers: #posix_storage
    Differential Revision: https://phabricator.intern.facebook.com/D4862021
    Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
    Signature: t1:4862021:1491855616:51b9b5b8957b0bb97afe27766f2e5aa78ff9edd4
    Reviewed-on: https://review.gluster.org/18253
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* tests: Disable flaky read-size test | Shreyas Siravara | 2017-09-09 | 1 | -0/+1
    Change-Id: Ie5f2e085169000ed385f9911ea6222aac7ac46ad
    Reviewed-on: https://review.gluster.org/18252
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: SHD should not use did_discovery code paths | Richard Wareing | 2017-09-09 | 1 | -1/+1
    Summary:
    - Exempt the SHD from the discovery code path
    Test Plan:
    - prove -v tests/bugs/fb8149516.t
    - Make rc and canary on offending host (gfsdataswarm048.prn2)
    Reviewers: moox, dph, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2491694
    Change-Id: I691a990950e13be6e376c64fddb110cd6ceefe47
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18251
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* posix-acl: Add assume-permissive option for EACCES debugging / rug-sweeping. | Kevin Vigor | 2017-09-09 | 3 | -3/+177
    Summary: Add assume-permissive option for EACCES debugging / rug-sweeping. Re-fetch permissions when needed if they're absent.
    This is a port of D5104707 & D5131597 to 3.8
    Change-Id: I900fc66876ec8e73b04049f844c428b3d225d4ad
    Reviewed-on: https://review.gluster.org/18249
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: AFR2 Discovery entry heal flow should only happen on root gfid | Richard Wareing | 2017-09-08 | 2 | -7/+17
    Summary:
    - Prevents entry self-heal flow from happening on non-root GFIDs
    Test Plan:
    - Run prove -v tests/bugs/fb8149516.t
    Reviewers: dph, moox, sshreyas
    Reviewed By: sshreyas
    Differential Revision: https://phabricator.fb.com/D2470622
    Change-Id: Id8559f2cfeb6e1e5c26dc1571854c0fbc0b59e08
    Signed-off-by: Jeff Darcy <jdarcy@fb.com>
    Reviewed-on: https://review.gluster.org/18250
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* exports: Add a reference count to export_item_t structs and ensure they are correctly used | Shreyas Siravara | 2017-09-08 | 3 | -10/+77
    Summary: This diff fixes a bug in the NFS daemon where the auth cache would use an export item after it was freed by the auth params refresh thread. This usually manifests as a crash in production when exports files are updated by chef. Since each auth cache entry holds a pointer to an export_item_t, it makes sense that it should first get a reference to it. Freeing the export_item_t struct happens only in `exp_item_unref()`, once the reference count has dropped to 0.
    This diff also fixes a use-after-free bug in the auth-cache, in the insertion path. In _cache_item(), if we find an entry in the dict, we update that entry with a timestamp & ref the export item associated with it. However, if the item already existed and we called old_cache_insert() with the same key, we gave the dict permission to free the old entry. We then ended up using that entry. The fix is to use dict_set_static_bin() instead of dict_set_bin(), which informs the dict that the pointer we are giving it belongs to us.
    This is a port of D5780476, D5785038 to 3.8
    Change-Id: I5cdcdc1cc6abad26c7077d66a14f263da07678ac
    Reviewed-on: https://review.gluster.org/18248
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
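The ownership rule described above, where each auth-cache entry takes its own reference and only the final unref frees the item, can be sketched with a C11 atomic refcount. The struct and the `_sketch` functions are hypothetical stand-ins for `export_item_t` and `exp_item_unref()`, not their actual definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for export_item_t with an embedded refcount. */
typedef struct {
    atomic_int refcount;
    int id; /* placeholder payload */
} export_item_sketch_t;

/* Allocates an item owned by the caller (refcount starts at 1). */
static export_item_sketch_t *exp_item_new_sketch(int id)
{
    export_item_sketch_t *item = malloc(sizeof(*item));
    if (item == NULL)
        return NULL;
    atomic_init(&item->refcount, 1);
    item->id = id;
    return item;
}

/* Takes an extra reference, e.g. when an auth-cache entry stores the pointer. */
static export_item_sketch_t *exp_item_ref_sketch(export_item_sketch_t *item)
{
    assert(item != NULL);
    atomic_fetch_add(&item->refcount, 1);
    return item;
}

/* Drops a reference and frees the item when the last one goes; returns 1 if freed. */
static int exp_item_unref_sketch(export_item_sketch_t *item)
{
    assert(item != NULL);
    int before = atomic_fetch_sub(&item->refcount, 1);
    assert(before > 0); /* an unref without a matching ref is a bug */
    if (before == 1) {
        free(item);
        return 1;
    }
    return 0;
}
```

With this pattern the refresh thread can drop its own reference at any time; the item survives until the last cache entry that points at it is also released.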
* nfs: Add locking to auth-cache, fix some bugs | Jeff Darcy | 2017-09-08 | 2 | -71/+216
    Summary: A lot of the diff "volume" is just refactoring, which should have no functional effect. It's preparation for adding a new implementation. The main functional change is locking around the external calls into this module, to prevent some of the races that we've seen.
    Additional fixes:
    - entry_data->data can be NULL, so we should check lookup_res before dereferencing it below.
    - It renames functions that need to be locked to have double underscores in front of them.
    This is a port of D5658875, D5658809 & D5762136 to 3.8
    Change-Id: If1b71b5c3268271f3a41c07394c215290a12c0ec
    Reviewed-on: https://review.gluster.org/18247
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Allow custom per-directory timeouts | Shreyas Siravara | 2017-09-08 | 1 | -14/+62
    Summary:
    - This diff looks for a custom xattr called 'trusted.glusterfs.md-cache-timeout' on a directory or file and, if it finds one, uses that timeout instead of the cache's default timeout value.
    - For example, if we know that a customer has a fixed set of directories that never change, we can set that attribute on all their directories and cache directory metadata for the lifetime of the client (NFS or FUSE) process.
    - Port of D5430395 to 3.8
    Reviewed By: jdarcy
    Change-Id: Ieb232bc1365c59dd7c396c7a617f12973cc8ea01
    Reviewed-on: https://review.gluster.org/18241
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
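A sketch of the timeout selection, assuming the xattr value has already been read into a string and holds a decimal number of seconds (the commit does not spell out the value format). `parse_timeout_secs()` and `md_cache_timeout()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a plain decimal seconds value; returns false if value is not one. */
static bool parse_timeout_secs(const char *value, unsigned long *out)
{
    assert(out != NULL);
    if (value == NULL || value[0] < '0' || value[0] > '9')
        return false; /* absent, empty, signed, or whitespace-prefixed */
    errno = 0;
    char *end = NULL;
    unsigned long secs = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return false; /* overflow or trailing garbage */
    *out = secs;
    return true;
}

/*
 * Chooses the md-cache timeout for an inode: the per-directory xattr value
 * (NULL when the xattr is absent) if it parses, otherwise the default.
 */
static unsigned long md_cache_timeout(const char *xattr_value,
                                      unsigned long default_secs)
{
    unsigned long secs;
    return parse_timeout_secs(xattr_value, &secs) ? secs : default_secs;
}
```

Falling back to the default on any malformed value keeps a bad xattr from silently pinning stale metadata in the cache.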
* glusterd: Consider NULL peers to be invalid | Max Rijevski | 2017-09-08 | 1 | -0/+7
    Summary: Null peer UUIDs are assumed to be invalid. Glusterd should complain and bail if we try to load any on startup.
    This is a port of D5160925 to 3.8
    Reviewed By: sshreyas
    Change-Id: Ib8679c7501a4fc1fbf9b34fdbf47037f38ec7cb8
    Reviewed-on: https://review.gluster.org/18238
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
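The null-peer check boils down to testing whether all 16 UUID bytes are zero. A minimal sketch with a hypothetical `peer_uuid_is_valid()`; libuuid's `uuid_is_null()` performs the same test on a `uuid_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned char uuid_bytes_t[16]; /* a UUID as raw bytes */

/* A peer whose UUID is all zero bytes (the null UUID) is treated as invalid. */
static bool peer_uuid_is_valid(const uuid_bytes_t uuid)
{
    static const uuid_bytes_t null_uuid = { 0 };
    assert(uuid != NULL);
    return memcmp(uuid, null_uuid, sizeof(null_uuid)) != 0;
}
```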
* gfapi: build a *working* glfsxmp to show how it's done | Jeff Darcy | 2017-09-08 | 2 | -5/+16
    Summary: Previously, glfsxmp would fail way down in XDR code. The reasons are still a bit unclear, but exactly duplicating the build flags etc. we use for other programs seems to fix the issue. With this change, we have one example of one set of flags that can be used to build other GFAPI programs.
    This is a port of D5370316 to 3.8
    Reviewed By: sshreyas
    Change-Id: I74535a791545189f829f10f04caf34a8a07295f7
    Reviewed-on: https://review.gluster.org/18240
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* performance/io-threads: Add watchdog to cover up a possible thread leak | Jeff Darcy | 2017-09-08 | 3 | -16/+213
    Summary: There appears to be a thread leak somewhere, which causes io-threads to run out of threads to process a particular (priority-based) queue. The leak should obviously be fixed, but that might take a while and the consequences until then are severe: a brick essentially going offline without the courtesy of actually dying. This patch adds a watchdog that checks for stuck queues and adds threads to break the deadlock. The same thing done manually on one afflicted cluster caused brick CPU usage to drop from 2600% to 400%, with latency quickly returning to normal.
    The controlling option is performance.iot-watchdog-secs, which is the number of seconds we'll watch for a non-empty queue with no items being dequeued. That's our signal to add a thread. A value of zero (the default) disables this watchdog feature.
    This is a port of D5177414 to 3.8.
    Test Plan: All the usual tests to determine safety. Use gdb to hack priv->queue_sizes to a non-zero value. This will make it look like the queue is non-empty, but since it does in fact have zero items there will be no dequeues. After watchdog-secs seconds, this should add a thread, with a corresponding entry in the brick log.
    Change-Id: Ic051e411d3e9351e1cf5e233bad8bbb5078cb259
    Reviewed-on: https://review.gluster.org/18239
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
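The watchdog rule ("non-empty queue, no dequeues for `performance.iot-watchdog-secs` seconds, so add a thread") can be expressed as a pure function over a dequeue counter and a timestamp. This is a sketch under assumed names (`watchdog_state_t`, `watchdog_should_add_thread()`); the patch's actual bookkeeping and thread-spawning code will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-queue state tracked by the watchdog. */
typedef struct {
    uint64_t last_dequeues; /* dequeue counter value at last observed progress */
    time_t last_progress;   /* when that progress was observed */
} watchdog_state_t;

/*
 * Called periodically. Returns 1 when the queue has been non-empty with no
 * dequeues for at least watchdog_secs, meaning a thread should be added.
 * watchdog_secs == 0 disables the check, like the option's default.
 */
static int watchdog_should_add_thread(watchdog_state_t *st, size_t queue_size,
                                      uint64_t dequeues, time_t now,
                                      unsigned watchdog_secs)
{
    assert(st != NULL);
    if (watchdog_secs == 0)
        return 0;
    if (queue_size == 0 || dequeues != st->last_dequeues) {
        /* Empty queue or forward progress: restart the stall timer. */
        st->last_dequeues = dequeues;
        st->last_progress = now;
        return 0;
    }
    if (difftime(now, st->last_progress) >= (double)watchdog_secs) {
        st->last_progress = now; /* give the new thread a full interval */
        return 1;
    }
    return 0;
}
```

Resetting the timer after firing means a queue that stays stuck gets at most one extra thread per interval rather than one per check.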
* nfs: Enable NFS by default | Shreyas Siravara | 2017-09-08 | 1 | -1/+1
    Change-Id: I520894244063ef854b4416cb5418065bd9de7277
    Reviewed-on: https://review.gluster.org/18237
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Track outstanding requests | Junsong Li | 2017-09-08 | 2 | -2/+17
    Summary: Add outstanding-req field to track requests that have been sent down the stack and haven't come back.
    This is a port of D4908836 to 3.8
    Reviewers: sshreyas
    Change-Id: I5870f63008d553416109c1808a434f526f5a633d
    Reviewed-on: https://review.gluster.org/18236
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* [io-cache] New volume options for read sizes | Joshua Eilers | 2017-09-08 | 4 | -5/+103
    Summary: Two new volume options that control reads.
    performance.io-cache.read-size - Tells gluster how much it should try to read on each posix_readv call
    performance.io-cache.min-cached-read-size - Tells gluster the smallest files it should cache; anything smaller is not cached
    This is a port of D4844662 to 3.8
    Change-Id: I5ba891906f97e514e7365cc34374619379434766
    Reviewed-on: https://review.gluster.org/18235
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Allow volume stop to succeed if certain processes are already dead | Shreyas Siravara | 2017-09-08 | 1 | -4/+14
    Summary:
    - Sometimes the process that glusterd is trying to kill is already dead.
    - In that case, if it can't find the pid, it should just continue on and not fail the entire operation.
    - This is a port of D4837916 to 3.8
    Change-Id: Ic96952a8d31927446f648830ede6ccd82512663f
    Reviewed-on: https://review.gluster.org/18234
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* storage/posix: Add limit to number of hard links. | Shreyas Siravara | 2017-09-08 | 4 | -2/+77
    Summary: Too many hard links blow up btrfs by exceeding the maximum xattr size (a pgfid is recorded for each hard link). Add a limit to prevent this explosion.
    This is a port of D4682329 to 3.8
    Reviewed By: sshreyas
    Change-Id: I614a247834fb8f2b2743c0c67d11cefafff0dbaa
    Reviewed-on: https://review.gluster.org/18232
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Set AFR UP message as soon as quorum is obtained. | Shreyas Siravara | 2017-09-07 | 3 | -17/+103
    Summary: AFR currently waits for all children to respond before sending an UP message. This means that one dead host can cause us to wait a TCP timeout (2 mins!) before declaring the volume up. Now we send an UP as soon as quorum is obtained.
    This is a port of D4701919 to 3.8.
    Reviewed By: sshreyas
    Change-Id: I642d4eb7dc7e0b289e89b7a16abf99a3f98aa8b3
    Reviewed-on: https://review.gluster.org/18231
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
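A sketch of the "UP as soon as quorum is obtained" decision, assuming a plain strict-majority quorum over the replica children. AFR's configurable quorum rules may differ, especially for even replica counts; `afr_should_send_up()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true once a strict majority of children have reported UP,
 * instead of waiting for every child (including dead ones) to respond.
 */
static bool afr_should_send_up(const bool *child_up, size_t child_count)
{
    assert(child_up != NULL || child_count == 0);
    size_t up = 0;
    for (size_t i = 0; i < child_count; i++)
        if (child_up[i])
            up++;
    return child_count > 0 && up > child_count / 2;
}
```

Because the check only needs the children that have answered so far, one dead host no longer delays the UP event by a full TCP timeout.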
* md-cache: Invalidate inode metadata on flush | Shreyas Siravara | 2017-09-07 | 1 | -0/+35
    Summary:
    - When you write a file and then stat it immediately, md-cache returns stale stat information.
    - This diff implements flush() in md-cache so that we can correctly invalidate inodes after a write.
    - This is a port of D4762171 to 3.8
    Reviewers: kvigor, dph
    Reviewed By: kvigor
    Change-Id: I368b7870d61b14a7e390917d195cbccc67029eb7
    Reviewed-on: https://review.gluster.org/18233
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* io-stats: Error count and rate collection | Shreyas Siravara | 2017-09-07 | 2 | -0/+25
    Summary:
    - This diff adds error counts and rates to the regular io-stats dump.
    - It outputs keys that look like this:
      "storage.gluster.nfsd.groot.aggr.errors.<error_name>.count": "6",
      "storage.gluster.nfsd.groot.inter.errors.<error_name>.per_sec": "0.00"
    - <error_name> is the lowercase representation of errno values (e.g., ENOENT -> enoent, etc.)
    - This is a port of D4691581 to 3.8
    Reviewers: dph, kvigor
    Reviewed By: kvigor
    Change-Id: I96857d4283c47f9d330ae1978f113013e7c78a87
    Reviewed-on: https://review.gluster.org/18230
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
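A sketch of how an errno value might be turned into one of the lowercase keys shown above. The errno-to-name table is deliberately tiny and hypothetical, as is `error_count_key()`; only the key shape comes from the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Small illustrative errno -> symbolic name table (not exhaustive). */
static const char *errno_symbol(int err)
{
    switch (err) {
    case ENOENT: return "ENOENT";
    case EACCES: return "EACCES";
    case EIO:    return "EIO";
    case ENOSPC: return "ENOSPC";
    default:     return NULL;
    }
}

/* Writes "<prefix>.aggr.errors.<lowercased name>.count" into buf. */
static int error_count_key(char *buf, size_t len, const char *prefix, int err)
{
    char lower[32];
    const char *sym = errno_symbol(err);
    assert(buf != NULL && prefix != NULL);
    if (sym == NULL)
        return -1;
    size_t n = strlen(sym);
    if (n >= sizeof(lower))
        return -1;
    for (size_t i = 0; i <= n; i++) /* <= n also copies the terminating NUL */
        lower[i] = (char)tolower((unsigned char)sym[i]);
    return snprintf(buf, len, "%s.aggr.errors.%s.count", prefix, lower);
}
```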
* posix: fadvise filedescriptors POSIX_FADV_RANDOM to bypass kernel read-size bug | Shreyas Siravara | 2017-09-07 | 4 | -1/+36
    Summary:
    - There is a known kernel bug that causes reads to disk to be limited by the readahead (RA) setting in /sys/block/sd[a-z]/queue/read_ahead_kb.
    - The workaround is to fadvise POSIX_FADV_RANDOM on file descriptors before reading.
    - This is a port of D4585521 to 3.8
    Test Plan: Still need to figure out a good test for this, other than simple inspection.
    Reviewers: rwareing, kvigor
    Reviewed By: kvigor
    Change-Id: I4a307573da620d9a1955fb5f4e8cd67154e11ace
    Reviewed-on: https://review.gluster.org/18229
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
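The workaround itself is a single POSIX call. A minimal sketch using the standard `posix_fadvise()`; the wrapper name `fd_advise_random()` is hypothetical, and where exactly the posix xlator issues the call is not shown here.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>

/*
 * Tells the kernel that reads on fd will be random, which on Linux turns
 * off readahead for this descriptor. Returns 0 on success or an error
 * number: posix_fadvise() reports errors via its return value, not errno.
 */
static int fd_advise_random(int fd)
{
    assert(fd >= 0);
    /* offset 0 and len 0 apply the advice to the whole file */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}
```

Since the advice is only a hint, a caller would typically log a non-zero result and carry on reading rather than fail the request.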
* features/namespace: Add namespace xlator and link into brick graph | Michael Goulet | 2017-09-07 | 11 | -1/+1520
    Summary: This translator tags namespaces with a unique hash that corresponds to the top-level directory (right under the gluster root) of the file the fop acts on. The hash information is injected into the call frame by this translator, so this namespace information can later be used to do throttling, QoS and other namespace-specific stats collection and actions in later xlators further down the stack.
    When the translator can't find a path directly for the fd_t or loc_t, it winds a GET_ANCESTRY_PATH_KEY down to the posix xlator to get the path manually. Caching this namespace information in the inode makes sure that most requests don't need to recalculate the hash, so that typically fops are just doing an inode_ctx_get instead of the more expensive code paths that this xlator can take.
    Right now the xlator is hard-coded to only hash the top-level directory, but this could be easily extended to more sophisticated matching by modification of the parse_path function.
    Test Plan: Run `prove -v tests/basic/namespace.t` to see that tagging works.
    Change-Id: I960ddadba114120ac449d27a769d409cc3759ebc
    Reviewed-on: https://review.gluster.org/18041
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Cache statfs calls | Shreyas Siravara | 2017-09-07 | 1 | -0/+126
    Summary:
    - This allows md-cache to cache statfs calls
    - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>'
    - This is a port of D4652632
    Test Plan: Tested functionality on devserver
    Reviewers: kvigor
    Reviewed By: kvigor
    Subscribers: #posix_storage
    Differential Revision: https://phabricator.intern.facebook.com/D4652632
    Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
    Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
    Reviewed-on: https://review.gluster.org/18228
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Restrict io-thread queue depth stats to NFS | Shreyas Siravara | 2017-09-03 | 2 | -21/+22
    Summary:
    - Fixes the unnecessary log spew in other daemons
    - This is a port of D3646627 to 3.8
    Reviewers: rwareing, kvigor
    Reviewed By: kvigor
    Change-Id: Id54ab41cdfdd2006d3af2d8774c38025c566c523
    Reviewed-on: https://review.gluster.org/18199
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* auditing: Sample creation & removal of filesystem entries as well as errors | Shreyas Siravara | 2017-09-03 | 3 | -18/+312
    Summary:
    - Adds the ability for gluster to log every single CREATE and UNLINK that happens on the bricks (right before invoking sys_unlink() or open(... | O_CREAT))
    - Makes it so that CREATEs and UNLINKs are not downsampled in io-stats
    - This is a port of D3268156, D3778968, D3903894 & D3301527 to 3.8
    Reviewed By: kvigor
    Change-Id: I1bce28068c02b7d202f094094237646b4d39794b
    Reviewed-on: https://review.gluster.org/18198
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
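A sketch of "downsample everything except CREATE and UNLINK". It assumes a deterministic 1-in-N sampler keyed on a per-fop sequence number; io-stats' actual sampling scheme may differ, and `fop_type_t` and `should_record_fop()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of fop types. */
typedef enum { FOP_READ, FOP_WRITE, FOP_CREATE, FOP_UNLINK } fop_type_t;

/*
 * Decides whether to record this fop in the sample log. Ordinary fops are
 * downsampled to 1 in sample_rate; CREATE and UNLINK are always recorded.
 */
static bool should_record_fop(fop_type_t fop, uint64_t seq, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    if (fop == FOP_CREATE || fop == FOP_UNLINK)
        return true;
    return seq % sample_rate == 0;
}
```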
* storage/posix: Fix crash in posix_make_ancestryfromgfid | Richard Wareing | 2017-09-03 | 1 | -0/+10
    Summary:
    - Log an OOPS and bail when *parent is null just before going into posix_resolve code path (to avoid crash)
    Test Plan:
    - Prove test/canary on cluster
    Differential Revision: https://phabricator.fb.com/D2640497
    Change-Id: I6140ef6fdb711748dad1c66d929aca36328bc574
    Reviewed-on: https://review.gluster.org/17969
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
* Add a bounded queue implementation. | Kevin Vigor | 2017-09-03 | 4 | -2/+320
    Summary:
    - This queue will be used to hold the set of directory crawl / file migrate operations in the multi-threaded rebalance.
    - This is a port of D3712047 to 3.8
    Test Plan: Unit test included.
    Reviewed By: sshreyas
    Change-Id: I25497a64beba744430807b3512eaee5d90f089c4
    Reviewed-on: https://review.gluster.org/18197
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
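A minimal fixed-capacity FIFO (ring buffer) illustrating what "bounded" means here: pushes fail instead of growing the queue. This is a single-threaded sketch with hypothetical `bq_*` names; a queue shared by rebalance worker threads, as the commit intends, would additionally need a mutex and condition variables for blocking push/pop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Fixed-capacity FIFO ring buffer; not thread-safe on its own. */
typedef struct {
    void **slots;
    size_t capacity;
    size_t head;  /* index of the oldest element */
    size_t count; /* number of stored elements */
} bounded_queue_t;

static bool bq_init(bounded_queue_t *q, size_t capacity)
{
    assert(q != NULL && capacity > 0);
    q->slots = calloc(capacity, sizeof(*q->slots));
    q->capacity = capacity;
    q->head = 0;
    q->count = 0;
    return q->slots != NULL;
}

/* Returns false instead of growing when the queue is full. */
static bool bq_push(bounded_queue_t *q, void *item)
{
    if (q->count == q->capacity)
        return false;
    q->slots[(q->head + q->count) % q->capacity] = item;
    q->count++;
    return true;
}

/* Returns NULL when empty (so NULL items should not be queued). */
static void *bq_pop(bounded_queue_t *q)
{
    if (q->count == 0)
        return NULL;
    void *item = q->slots[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return item;
}

static void bq_destroy(bounded_queue_t *q)
{
    free(q->slots);
    q->slots = NULL;
}
```

Bounding the queue gives the crawler natural back-pressure: when migration workers fall behind, the producer is told to wait instead of piling up unbounded work in memory.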
* glusterd: Log & print old clients when doing a volume set operation | Shreyas Siravara | 2017-09-03 | 3 | -10/+79
    Summary:
    - Prior to this diff, Gluster would simply log "One or more clients cannot ..."
    - With this diff, we now show up to 20 clients that are mismatched.
    - This is a port of D3313082 to 3.8
    Reviewers: rwareing, kvigor
    Reviewed By: kvigor
    Change-Id: Ia8830f18c922bda1aee787a2e3d6033164bb64d4
    Reviewed-on: https://review.gluster.org/18196
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Fix multi-volume support for nfsd | Shreyas Siravara | 2017-09-03 | 4 | -15/+22
    Summary:
    - Adds the iamshd option to the io-stats xlator (iamnfsd is already there due to fop throttling).
    - Leverages these options to correctly write multi-volume NFSd stats
    - This is a port of D2714648 to 3.8
    Test Plan:
    - Tested on local dev server, verified multiple files are generated for multiple vols
    Change-Id: Id2014a135fe52045da462eaaa91f336f45cdf167
    Reviewed-on: https://review.gluster.org/18195
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Fix gfid unsplit code when renamed filename exceeds NAME_MAX | Shreyas Siravara | 2017-09-03 | 2 | -4/+88
    Summary:
    - We noticed some folks name their files all the way up to NAME_MAX (usually 255) and when split-brain is encountered, we fail to heal the file.
    - This diff puts an upper bound on the number of bytes we will snprintf into the buffer so that we do not fail the rename.
    - This is a port of D3646254 to 3.8
    Test Plan: Prove test -- can show it fails without patch as well.
    Reviewers: #posix_storage, rwareing
    Reviewed By: rwareing
    Change-Id: I51c6b28374d4a3f21e29044cb727b4b1da7b69e1
    Reviewed-on: https://review.gluster.org/18194
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
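A sketch of bounding the renamed filename so it never exceeds NAME_MAX: the original name is truncated, not the suffix, using snprintf's `%.*s` precision. The suffix format and `build_unsplit_name()` are hypothetical; the commit does not show the actual rename pattern.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255 /* typical Linux value, assumed if limits.h lacks it */
#endif

/*
 * Builds "<name><suffix>" into out (at least NAME_MAX + 1 bytes), cutting
 * name short so the result never exceeds NAME_MAX bytes. Returns the
 * resulting length, or -1 if the suffix alone cannot fit.
 */
static int build_unsplit_name(char *out, const char *name, const char *suffix)
{
    assert(out != NULL && name != NULL && suffix != NULL);
    size_t suffix_len = strlen(suffix);
    if (suffix_len >= NAME_MAX)
        return -1;
    size_t room = NAME_MAX - suffix_len; /* bytes left for the name part */
    size_t name_len = strlen(name);
    if (name_len > room)
        name_len = room; /* truncate the name, keep the suffix intact */
    return snprintf(out, NAME_MAX + 1, "%.*s%s", (int)name_len, name, suffix);
}
```

Truncating by bytes can split a multi-byte UTF-8 character at the cut point; that is harmless to the filesystem but worth knowing when reading healed names.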
* nfs: Check if FQDN is authorized before unmounting clients | Shreyas Siravara | 2017-09-03 | 1 | -7/+39
    Summary:
    - We have a thread that checks if connected clients are "still" authorized for a mount.
    - This thread is currently only checking the IP (regression from the 3.4 -> 3.6 rebase, perhaps).
    - This diff adds code to check the IP *and* the FQDN before unmounting the client.
    Test Plan: Tested on devserver, auth prove tests.
    Reviewers: rwareing, kvigor
    Reviewed By: kvigor
    Change-Id: I441a4436d8df064d2f09a2539acb780ab53943f6
    Reviewed-on: https://review.gluster.org/18193
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Adding stat for weighted & unweighted average latency | Richard Wareing | 2017-09-03 | 2 | -0/+73
    Summary:
    - Our current approach to measuring "average fop latency" is badly flawed in that it doesn't weight the FOPs correctly according to how many occurred in the time interval. This makes Statisticians very sad. This patch adds an internally computed weighted average latency which will be far more efficient to display via ODS, as well as having the benefit of not being complete nonsense.
    - This is a port of D3148415 & D3405772 to 3.8
    Reviewers: kvigor, dph, sshreyas
    Reviewed By: sshreyas
    Change-Id: Ie3618f279b545610b7ed1a8482243fcc8dc53217
    Reviewed-on: https://review.gluster.org/18192
    Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
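The difference between the two averages is easiest to see side by side: the unweighted mean gives every fop type equal say, while the weighted mean is sum(count_i * avg_i) / sum(count_i), the true mean over all fops in the interval. A sketch with a hypothetical `fop_stat_t` layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-fop stats for one interval (hypothetical layout). */
typedef struct {
    uint64_t count;     /* fops of this type completed in the interval */
    double avg_latency; /* their mean latency, e.g. in usec */
} fop_stat_t;

/* Unweighted: every fop type counts equally, however rarely it ran. */
static double unweighted_avg_latency(const fop_stat_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t used = 0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].count == 0)
            continue; /* fop types with no calls have no latency sample */
        sum += s[i].avg_latency;
        used++;
    }
    return used ? sum / (double)used : 0.0;
}

/* Weighted: sum(count_i * avg_i) / sum(count_i), the mean over all fops. */
static double weighted_avg_latency(const fop_stat_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    double total = 0.0;
    uint64_t fops = 0;
    for (size_t i = 0; i < n; i++) {
        total += (double)s[i].count * s[i].avg_latency;
        fops += s[i].count;
    }
    return fops ? total / (double)fops : 0.0;
}
```

With 99 fops at 10 usec and a single fop at 1000 usec, the unweighted figure is 505 usec while the weighted figure is 19.9 usec, which is what a typical request actually experienced.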
* features/lock-revocation: Remove 3.6 version dependency for lock revocationShreyas Siravara2017-09-031-4/+4
| | | | | | | | | | | | | | | | | | Summary: - Per title - This is a port of D2875451 to 3.8 Test Plan: Live? Reviewers: dph, moox, dld, rwareing Reviewed By: rwareing Change-Id: Ie2862bcbb49d1159cf2465d48cc506f629c527e0 Reviewed-on: https://review.gluster.org/18191 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* gfproxy: Make io-stats aware of the gfproxy daemonShreyas Siravara2017-09-032-0/+40
| | | | | | | | | | | | | | | | | | Summary: - This diff enables gfproxyd to output a stats file that looks like 'glusterfs_gfproxyd_{volname}.dump' - This is a port of D3753684 to 3.8 Test Plan: Tested on devserver, verified output. Reviewers: kvigor Reviewed By: kvigor Change-Id: I8559974e9d24976fd1c8b6145fbc81be40fd4134 Reviewed-on: https://review.gluster.org/18189 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: PGFID heal only when all children are upRichard Wareing2017-09-031-2/+10
| | | | | | | | | | | | | | | | | | | | Summary: - PGFID healing is pointless when a child is down, since the heal will fail for that reason (and we have no signal for this). Instead restrict PGFID healing to the case where all children are up. - This is a port of D3100450 to 3.8 Test Plan: Run prove -v tests/basic/afr/shd-pgfid-heal.t Reviewers: kvigor, sshreyas Reviewed By: sshreyas Change-Id: I88e542449e3b40415cd201ff39694e86eef65a6e Reviewed-on: https://review.gluster.org/18190 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
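The gating condition can be sketched as follows, assuming a per-child up/down flag array loosely modeled on AFR's child state; `should_heal_pgfid()` is a hypothetical name, not an AFR function.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical gate: child_up[i] is nonzero when replica child i is
 * connected. PGFID heal is attempted only if every child is up, since a
 * heal involving a down child would fail anyway.
 */
bool should_heal_pgfid(const unsigned char *child_up, size_t child_count)
{
    for (size_t i = 0; i < child_count; i++) {
        if (!child_up[i])
            return false;
    }
    return child_count > 0;
}
```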
* Log AFR quorum stats in io-stats translator.Kevin Vigor2017-09-037-58/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Add AFR quorum state to io-stats translator. Sample output: { "storage.gluster.nfsd.test-replicate-0.has-quorum": "1", "storage.gluster.nfsd.test-replicate-0.quorum-threshold": "1", "storage.gluster.nfsd.test-replicate-1.has-quorum": "1", "storage.gluster.nfsd.test-replicate-1.quorum-threshold": "1" } The quorum-threshold field shows the number of bricks that can be lost while still maintaining quorum. Negative numbers indicate that quorum has been lost and show the number of bricks that must be brought online to restore quorum. Additionally, I found that the code contained both afr_have_quorum() and afr_has_quorum(), which were mostly cut-n-pasted copies of each other, but with subtle differences. Mercifully, afr_have_quorum() was totally unused, so I nuked it in passing. This is a port of D4089969 to 3.8. Test Plan: Run, observe stats output. Kill brick, observe proper change. fb-smoke. Reviewers: #posix_storage, sshreyas Reviewed By: sshreyas Subscribers: sshreyas Differential Revision: https://phabricator.intern.facebook.com/D4089969 Change-Id: Ifddb351aebfe63998846bb52be8942415ce4c1a9 Reviewed-on: https://review.gluster.org/18188 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
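The relationship between the two fields can be sketched as simple arithmetic, assuming `up_count` bricks are online and `quorum_count` are required (both names hypothetical, not taken from the patch). With 3 bricks up and a quorum of 2 the threshold is 1; with only 1 up it is -1, meaning one brick must come back.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of quorum-threshold:
 *   > 0 : bricks that can still be lost while keeping quorum
 *   = 0 : quorum held, but with no margin
 *   < 0 : bricks that must come back online to restore quorum
 */
int quorum_threshold(int up_count, int quorum_count)
{
    return up_count - quorum_count;
}

/* has-quorum is simply "enough bricks are up". */
bool has_quorum(int up_count, int quorum_count)
{
    return up_count >= quorum_count;
}
```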
* mgmt/glusterd: Allow disabling of changelog xlator w/ old clients connectedRichard Wareing2017-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: - Reduces the version of the changelog option to 2 (3.4.x) so we can disable this *server*-side feature when older clients are attached Test Plan: - Was required as a hotfix for a broken cluster; after installing an RC with this patch, we were able to kill the feature and stabilize the cluster. Reviewers: sshreyas, moox, dph, dld, kvigor Reviewed By: kvigor Differential Revision: https://phabricator.fb.com/D2981552 Change-Id: I515e2bb520585e5efaa305b1acbab21ebc7218a9 Reviewed-on: https://review.gluster.org/18183 Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* tests: fix timing issues from the previous mergeJeff Darcy2017-08-312-1/+9
| | | | | | | | | | Change-Id: Ic77287c1b96ae426b927b4bf6f2826d6f3a3b17d Signed-off-by: Jeff Darcy <jdarcy@fb.com> Reviewed-on: https://review.gluster.org/18175 Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* Merge remote-tracking branch 'origin/release-3.8' into release-3.8-fbJeff Darcy2017-08-3192-627/+1579
|\ | | | | | | Change-Id: Ie35cd1c8c7808949ddf79b3189f1f8bf0ff70ed8
| * doc: release-notes for GlusterFS-3.8.15v3.8.15release-3.8Niels de Vos2017-08-161-0/+30
| | | | | | | | | | | | | | | | | | | | BUG: 1469558 Change-Id: Ia9a4e69e5d7dfd33933b20b7c4ea41e439d3c838 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/18039 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
| * api: memory leak in glfs_h_acl_get(), missing dict unrefKaleb S. KEITHLEY2017-08-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master review https://review.gluster.org/17092 circa April 2017 Fix already exists in release-3.12 and release-3.11 branches Hat tip to Shyam (srangana[at]redhat.com) who found the existing fix after sitting and debugging it with me for several hours. Reported-by: Kinglong Mee <mijinlong@open-fs.com> Change-Id: Ic7169fd05aff7bf46108e8ac7b1f29688a7f2358 BUG: 1481398 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18037 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
| * cli/xml: fix return handlingAtin Mukherjee2017-08-101-39/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xmlTextWriter* APIs return either the number of bytes written (which may be 0 because of buffering) or -1 in case of error. If the XML payload is small, the data to be written is usually buffered; as it grows, these APIs sometimes return the number of bytes actually written. It is therefore mandatory that every such call be followed by XML_RET_CHECK_AND_GOTO; otherwise we may end up returning a non-zero ret code, which causes the overall XML generation to fail. >Reviewed-on: https://review.gluster.org/17702 >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Amar Tumballi <amarts@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Gaurav Yadav <gyadav@redhat.com> Change-Id: I02ee7076e1d8c26cf654d3dc3e77b1eb17cbdab0 BUG: 1470495 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17766 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
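A self-contained sketch of the pattern, with a hypothetical `fake_writer_call()` standing in for libxml2's xmlTextWriter* functions so it compiles without libxml2. The `RET_CHECK_AND_GOTO` macro is written in the spirit of XML_RET_CHECK_AND_GOTO but is not Gluster's definition: it jumps out on -1 and otherwise resets `ret` to 0, so a trailing positive byte count can never leak out as the return value.

```c
/*
 * Hypothetical stand-in for an xmlTextWriter* call: returns the number
 * of bytes written (0 if buffered) or -1 on error.
 */
static int fake_writer_call(int bytes, int fail)
{
    return fail ? -1 : bytes;
}

/* Sketch of a check-and-normalize macro (not Gluster's definition). */
#define RET_CHECK_AND_GOTO(ret, label) \
    do {                               \
        if ((ret) < 0) {               \
            (ret) = -1;                \
            goto label;                \
        }                              \
        (ret) = 0; /* bytes >= 0 means success */ \
    } while (0)

/* Returns 0 on success or -1 on error, never a stray byte count. */
int write_doc(int fail_second)
{
    int ret;

    ret = fake_writer_call(0, 0);              /* buffered: 0 bytes */
    RET_CHECK_AND_GOTO(ret, out);

    ret = fake_writer_call(4096, fail_second); /* flushed: 4096 bytes */
    RET_CHECK_AND_GOTO(ret, out);

out:
    return ret;
}
```

Dropping the second check would make `write_doc(0)` return 4096, which a caller testing `ret != 0` treats as a failure, matching the symptom described in the commit message.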
| * storage/posix: Use the ret value of posix_gfid_heal()Krutika Dhananjay2017-08-102-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to make the change in commit acf8cfdf truly useful. Without this, a race between entry creation fops and lookup at posix layer can cause lookups to fail with ENODATA, as opposed to ENOENT. Backport of: > Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb > Reviewed-on: https://review.gluster.org/17821 > BUG: 1472758 > cherry-picked from 669868d23eaeba42809fca7be134137c607d64ed Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb BUG: 1480193 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/18015 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
| * afr: mark non sources as sinks in metadata healRavishankar N2017-07-283-3/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/17717/ Problem: In a 3-way replica, when the source brick does not have pending xattrs for the sinks but the 2 sinks blame each other, metadata heal was not happening because we were not marking all non-sources as sinks. Fix: Mark all non-sources as sinks, as is done in data and entry heal. Change-Id: I534978940f5087302e307fcc810a48ffe898ce08 BUG: 1471613 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17784 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
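A minimal sketch of the fix, assuming per-child flag arrays; the names `sources`, `usable`, `sinks`, and the helper itself are hypothetical, not AFR's code, and "usable" is an assumed stand-in for children that are up and locked. In the 3-way case above, child 0 is the only source; even though its pending xattrs blame no one, children 1 and 2 still end up as sinks.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: after source selection, every usable child that
 * is not a source becomes a sink, whether or not the source's pending
 * xattrs blamed it.
 */
void mark_non_sources_as_sinks(const unsigned char *sources,
                               const unsigned char *usable,
                               unsigned char *sinks, size_t child_count)
{
    for (size_t i = 0; i < child_count; i++)
        sinks[i] = usable[i] && !sources[i];
}
```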