author: Jeff Darcy <jdarcy@fb.com> 2017-06-02 14:02:02 -0700
committer: Shreyas Siravara <sshreyas@fb.com> 2017-09-08 04:34:13 +0000
commit: 8b6804f75cda612d13e3a691b3f9698028b2577d
tree: 9ce4705785c2e5b3d2c6eab01645ac74b2fc7f09 /xlators/performance/io-threads/src/io-threads.h
parent: 49d0f911bdc34913836f7014d6b70fd5d58367ca
performance/io-threads: Add watchdog to cover up a possible thread leak

Summary:
There appears to be a thread leak somewhere, which causes io-threads to run out of threads to process a particular (priority-based) queue. The leak should obviously be fixed, but that might take a while and the consequences until then are severe - a brick essentially going offline without the courtesy of actually dying. This patch adds a watchdog that checks for stuck queues, and adds threads to break the deadlock. The same thing done manually on one afflicted cluster caused brick CPU usage to drop from 2600% to 400%, with latency quickly returning to normal.

The controlling option is performance.iot-watchdog-secs, which is the number of seconds we'll watch for a non-empty queue with no items being dequeued. That's our signal to add a thread. A value of zero (the default) disables this watchdog feature.

This is a port of D5177414 to 3.8.

Test Plan:
All the usual tests to determine safety. Use gdb to hack priv->queue_sizes to a non-zero value. This will make it look like the queue is non-empty, but since it does in fact have zero items there will be no dequeues. After watchdog-secs seconds, this should add a thread, with a corresponding entry in the brick log.

Change-Id: Ic051e411d3e9351e1cf5e233bad8bbb5078cb259
Reviewed-on: https://review.gluster.org/18239
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>

Diffstat (limited to 'xlators/performance/io-threads/src/io-threads.h')
-rw-r--r-- xlators/performance/io-threads/src/io-threads.h | 7
1 files changed, 6 insertions, 1 deletions
diff --git a/xlators/performance/io-threads/src/io-threads.h b/xlators/performance/io-threads/src/io-threads.h
index 011d4a00f7f..4056eb5fe09 100644
--- a/xlators/performance/io-threads/src/io-threads.h
+++ b/xlators/performance/io-threads/src/io-threads.h
@@ -77,7 +77,12 @@ struct iot_conf {
         gf_boolean_t down; /*PARENT_DOWN event is notified*/
         gf_boolean_t mutex_inited;
         gf_boolean_t cond_inited;
-        struct iot_least_throttle throttle;
+        struct iot_least_throttle throttle;
+
+        int32_t watchdog_secs;
+        gf_boolean_t watchdog_running;
+        pthread_t watchdog_thread;
+        gf_boolean_t queue_marked[IOT_PRI_MAX];
 };
 typedef struct iot_conf iot_conf_t;
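
The header diff only adds fields; the logic that uses them lives in io-threads.c and is not shown here. As a rough illustration of the mechanism the commit message describes (mark a non-empty queue, clear the mark on every dequeue, and add a thread if the mark survives long enough), the following is a minimal single-threaded sketch. It is not the patch's code: `sketch_conf`, `sketch_dequeue`, `sketch_watchdog_tick`, `PRI_MAX`, `STALL_TICKS`, `thread_limit` and `bad_ticks` are all hypothetical names and values chosen for the example, and the real code would need to hold the conf mutex around these accesses.

```c
#include <assert.h>
#include <stdbool.h>

#define PRI_MAX     4   /* stand-in for IOT_PRI_MAX; the value is an assumption */
#define STALL_TICKS 5   /* hypothetical: marked ticks tolerated before adding a thread */

/* Hypothetical, trimmed-down stand-in for struct iot_conf. */
struct sketch_conf {
    int  queue_sizes[PRI_MAX];   /* items waiting at each priority */
    bool queue_marked[PRI_MAX];  /* set by the watchdog, cleared by any dequeue */
    int  thread_limit[PRI_MAX];  /* hypothetical per-priority thread allowance */
    int  bad_ticks[PRI_MAX];     /* consecutive ticks on which the mark survived */
};

/* Worker side: taking an item proves the queue is moving, so clear the mark. */
static void sketch_dequeue(struct sketch_conf *c, int pri)
{
    assert(pri >= 0 && pri < PRI_MAX);
    if (c->queue_sizes[pri] > 0)
        c->queue_sizes[pri]--;
    c->queue_marked[pri] = false;
}

/*
 * One watchdog pass. A mark that is still set means no dequeue happened
 * since the previous pass. After STALL_TICKS such passes in a row, raise
 * the thread allowance for that priority. Returns how many were raised.
 */
static int sketch_watchdog_tick(struct sketch_conf *c)
{
    int added = 0;

    for (int i = 0; i < PRI_MAX; i++) {
        if (c->queue_marked[i]) {
            if (++c->bad_ticks[i] >= STALL_TICKS) {
                c->thread_limit[i]++;
                c->bad_ticks[i] = 0;
                added++;
            }
        } else {
            c->bad_ticks[i] = 0;
        }
        /* Re-mark every non-empty queue; a dequeue will clear it again. */
        c->queue_marked[i] = (c->queue_sizes[i] > 0);
    }
    return added;
}
```

In a real watchdog thread, a pass like this would presumably run on a timer derived from `watchdog_secs`, so that the stall window adds up to roughly the configured number of seconds. Setting a queue size to a non-zero value without ever calling `sketch_dequeue` mirrors the gdb trick in the test plan: the mark is never cleared, and the allowance is raised once the window elapses.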