author | Niels de Vos <ndevos@redhat.com> | 2013-06-24 14:05:58 +0200 |
---|---|---|
committer | Vijay Bellur <vbellur@redhat.com> | 2013-07-03 22:34:54 -0700 |
commit | 98f62a731ca13296b937bfff14d0a2f8dfc49a54 (patch) | |
tree | 985d15b9f42c62c387abf46442b5ede8f9be672d /doc | |
parent | 37d2c255e46eea98df473fbc693931462882392e (diff) |
posix: add a simple health-checker
The goal of this health-checker is to detect fatal issues in the underlying
storage that is used for exporting a brick. The current implementation
relies on the filesystem to detect the storage error. Once an error is
detected, the health-checker notifies the parent xlators and exits the
glusterfsd (brick) process to prevent further trouble.
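As a rough illustration of that sequence, here is a minimal shell sketch. It is not the implementation added by this commit, which is a C thread inside the posix xlator. The function names, the `pid` argument and the 30-second grace period (inferred from the timestamps in the log excerpts further down) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: the real health-checker is a C thread in the posix xlator.

# Return 0 when the brick directory can still be stat()ed.
brick_is_healthy() {
    stat "$1" >/dev/null 2>&1
}

# Poll the brick; on the first failure, log and terminate the given process.
# $1 = brick path, $2 = interval in seconds, $3 = PID to stop (hypothetical)
health_check_loop() {
    brick=$1
    interval=$2
    pid=$3
    while brick_is_healthy "$brick"; do
        sleep "$interval"
    done
    echo "health-check failed, going down" >&2
    # The real checker notifies the parent xlators at this point.
    sleep 30    # grace period assumed from the log timestamps
    echo "still alive! -> SIGTERM" >&2
    kill -TERM "$pid"
}
```

The check itself is only a `stat` on the brick path, so it reports a failure only once the filesystem itself returns an error such as EIO.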
The interval at which the health-check runs can be configured per volume
with the storage.health-check-interval option. The default interval is 30
seconds.
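For illustration, the option can be changed with the regular `gluster volume set` command. The wrapper below is a hypothetical helper, not part of this change; it only adds a sanity check on the value before calling the CLI. According to the documentation hunk in this commit, a value of 0 disables the check.

```shell
# Hypothetical helper: set the health-check interval for one volume.
# $1 = volume name, $2 = interval in seconds (0 disables the check)
set_health_check_interval() {
    volume=$1
    seconds=$2
    case $seconds in
        ''|*[!0-9]*)
            echo "interval must be a non-negative integer" >&2
            return 1
            ;;
    esac
    gluster volume set "$volume" storage.health-check-interval "$seconds"
}
```

With the volume used in the test steps, `set_health_check_interval failing_xfs 10` would run the check every 10 seconds.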
It is not trivial to write an automated test-case with the current
prove-framework. These are the manual steps that can be done to verify
the functionality:
- set up a Logical Volume (/dev/bz970960/xfs) and format it as XFS for
brick usage
- create a volume with the one brick
# gluster volume create failing_xfs glufs1:/bricks/failing_xfs/data
# gluster volume start failing_xfs
- mount the volume and verify the functionality
- make the storage fail (use device-mapper, or pull disks)
# dmsetup table
..
bz970960-xfs: 0 196608 linear 7:0 2048
# echo 0 196608 error > dmsetup-error-target
# dmsetup load bz970960-xfs dmsetup-error-target
# dmsetup resume bz970960-xfs
# dmsetup table
...
bz970960-xfs: 0 196608 error
- notice the errors caught by syslog:
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem
Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed
Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
- these errors are in the log of the brick as well:
[2013-06-24 10:31:50.500607] W [posix-helpers.c:1102:posix_health_check_thread_proc] 0-failing_xfs-posix: stat() on /bricks/failing_xfs/data returned: Input/output error
[2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
[2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
- the glusterfsd process has exited correctly:
# gluster volume status
Status of volume: failing_xfs
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick glufs1:/bricks/failing_xfs/data N/A N N/A
NFS Server on localhost 2049 Y 1897
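The error table used in the device-mapper step above can be derived from the existing table instead of being typed by hand. The following is a small sketch under the assumption that the device has a single segment. `error_table_for` is a hypothetical helper name; it accepts a table line with or without the leading `name:` prefix.

```shell
# Hypothetical helper: turn one line of `dmsetup table` output into the
# matching "error" target line (same start sector and length).
error_table_for() {
    printf '%s\n' "$1" | awk '{
        i = ($1 ~ /:$/) ? 2 : 1      # skip the "name:" prefix if present
        print $i, $(i + 1), "error"
    }'
}
```

For example, `error_table_for "bz970960-xfs: 0 196608 linear 7:0 2048" > dmsetup-error-target` writes the same line as the `echo` in the steps above. The file can then be loaded with `dmsetup load` and activated with `dmsetup resume`, as shown there.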
Change-Id: Ic247fbefb97f7e861307a5998a9a7a3ecc80aa07
BUG: 971774
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/5176
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/admin-guide/en-US/markdown/admin_managing_volumes.md | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/doc/admin-guide/en-US/markdown/admin_managing_volumes.md b/doc/admin-guide/en-US/markdown/admin_managing_volumes.md index 6375bf5257e..dd8ed471015 100644 --- a/doc/admin-guide/en-US/markdown/admin_managing_volumes.md +++ b/doc/admin-guide/en-US/markdown/admin_managing_volumes.md @@ -153,6 +153,8 @@ To tune volume options server.grace-timeout Specifies the duration for the lock state to be maintained on the server after a network disconnection. 10 10 - 1800 secs server.statedump-path Location of the state dump file. /tmp directory of the brick New directory path + + storage.health-check-interval Number of seconds between health-checks done on the filesystem that is used for the brick(s). Defaults to 30 seconds, set to 0 to disable. /tmp directory of the brick New directory path ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- You can view the changed volume options using |