| author | Jeff Darcy <jdarcy@fb.com> | 2017-06-02 14:02:02 -0700 | 
|---|---|---|
| committer | Shreyas Siravara <sshreyas@fb.com> | 2017-09-08 04:34:13 +0000 | 
| commit | 8b6804f75cda612d13e3a691b3f9698028b2577d (patch) | |
| tree | 9ce4705785c2e5b3d2c6eab01645ac74b2fc7f09 /xlators/performance/io-threads/src/io-threads.h | |
| parent | 49d0f911bdc34913836f7014d6b70fd5d58367ca (diff) | |
performance/io-threads: Add watchdog to cover up a possible thread leak
Summary:
There appears to be a thread leak somewhere, which causes io-threads to
run out of threads to process a particular (priority-based) queue.
The leak should obviously be fixed, but that might take a while
and the consequences until then are severe - a brick essentially going
offline without the courtesy of actually dying.  This patch adds a
watchdog that checks for stuck queues, and adds threads to break the deadlock.
The same thing done manually on one afflicted cluster caused brick CPU usage
to drop from 2600% to 400%, with latency quickly returning to normal.
The controlling option is performance.iot-watchdog-secs,
which is the number of seconds we'll watch for a non-empty
queue with no items being dequeued.  That's our signal to
add a thread. A value of zero (the default) disables
this watchdog feature.
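
The detection rule described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: `sketch_conf`, `sketch_dequeue`, `sketch_watchdog_tick`, `stuck_secs`, `threads_added` and `SKETCH_PRI_MAX` are hypothetical names, the one-pass-per-second cadence is an assumption, and a counter stands in for actually starting a worker thread. Only `queue_sizes`, `queue_marked` and `watchdog_secs` echo names that appear in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PRI_MAX 4 /* hypothetical stand-in for IOT_PRI_MAX */

/* Hypothetical, simplified stand-in for struct iot_conf. */
struct sketch_conf {
        int32_t queue_sizes[SKETCH_PRI_MAX];  /* items waiting, per priority */
        bool    queue_marked[SKETCH_PRI_MAX]; /* set by watchdog, cleared on dequeue */
        int32_t stuck_secs[SKETCH_PRI_MAX];   /* passes seen with no dequeue */
        int32_t watchdog_secs;                /* 0 disables the watchdog */
        int32_t threads_added;                /* stands in for starting a thread */
};

/* A worker would call this each time it takes an item off a queue. */
static void
sketch_dequeue (struct sketch_conf *conf, int pri)
{
        assert (pri >= 0 && pri < SKETCH_PRI_MAX);
        if (conf->queue_sizes[pri] > 0)
                conf->queue_sizes[pri]--;
        conf->queue_marked[pri] = false; /* progress was made */
}

/*
 * One watchdog pass, assumed here to run once per second.
 * Returns how many threads this pass decided to add.
 */
static int
sketch_watchdog_tick (struct sketch_conf *conf)
{
        int added = 0;

        if (conf->watchdog_secs <= 0)
                return 0; /* feature disabled */

        for (int i = 0; i < SKETCH_PRI_MAX; i++) {
                if (conf->queue_sizes[i] > 0 && conf->queue_marked[i]) {
                        /* Mark survived a whole pass: non-empty, no dequeues. */
                        if (++conf->stuck_secs[i] >= conf->watchdog_secs) {
                                conf->threads_added++;
                                added++;
                                conf->stuck_secs[i] = 0;
                        }
                } else {
                        conf->stuck_secs[i] = 0;
                }
                /* Re-arm the mark for the next pass. */
                conf->queue_marked[i] = (conf->queue_sizes[i] > 0);
        }
        return added;
}
```

In this sketch the first pass over a non-empty queue only arms the mark. With `watchdog_secs` set to 3 and no dequeues, the fourth pass is the one that adds a thread. Any call to `sketch_dequeue` clears the mark and resets the count on the next pass.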
This is a port of D5177414 to 3.8.
Test Plan: All the usual tests to determine safety.
Use gdb to hack priv->queue_sizes to a non-zero value.  This will make it look like the queue is non-empty, but since it does in fact have zero items there will be no dequeues.  After watchdog-secs seconds, this should add a thread, with a corresponding entry in the brick log.
Change-Id: Ic051e411d3e9351e1cf5e233bad8bbb5078cb259
Reviewed-on: https://review.gluster.org/18239
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Diffstat (limited to 'xlators/performance/io-threads/src/io-threads.h')
| -rw-r--r-- | xlators/performance/io-threads/src/io-threads.h | 7 | 
1 files changed, 6 insertions, 1 deletions
diff --git a/xlators/performance/io-threads/src/io-threads.h b/xlators/performance/io-threads/src/io-threads.h
index 011d4a00f7f..4056eb5fe09 100644
--- a/xlators/performance/io-threads/src/io-threads.h
+++ b/xlators/performance/io-threads/src/io-threads.h
@@ -77,7 +77,12 @@ struct iot_conf {
         gf_boolean_t         down; /*PARENT_DOWN event is notified*/
         gf_boolean_t         mutex_inited;
         gf_boolean_t         cond_inited;
-	struct iot_least_throttle throttle;
+	      struct              iot_least_throttle throttle;
+
+        int32_t             watchdog_secs;
+        gf_boolean_t        watchdog_running;
+        pthread_t           watchdog_thread;
+        gf_boolean_t        queue_marked[IOT_PRI_MAX];
 };
 
 typedef struct iot_conf iot_conf_t;
