Diffstat (limited to 'doc/features')
-rw-r--r--  doc/features/brick-failure-detection.md   67
-rw-r--r--  doc/features/file-snapshot.md              91
-rw-r--r--  doc/features/nufa.md                       20
-rw-r--r--  doc/features/server-quorum.md              44
4 files changed, 222 insertions(+), 0 deletions(-)
diff --git a/doc/features/brick-failure-detection.md b/doc/features/brick-failure-detection.md
new file mode 100644
index 000000000..24f2a18f3
--- /dev/null
+++ b/doc/features/brick-failure-detection.md
@@ -0,0 +1,67 @@
+# Brick Failure Detection
+
+This feature attempts to identify storage/file system failures and disable the failed brick without disrupting the remainder of the node's operation.
+
+## Description
+
+Detecting failures on the filesystem that a brick uses makes it possible to handle errors that are caused outside of the Gluster environment.
+
+There have been cases of hanging brick processes when the underlying storage of a brick became unavailable. A hanging brick process can still use the network and respond to clients, but actual I/O to the storage is impossible and can cause noticeable delays on the client side.
+
+This feature provides better detection of storage subsystem failures and prevents brick processes from hanging when the storage hardware or the filesystem fails.
+
+A health-checker (thread) has been added to the posix xlator. This thread periodically checks the status of the filesystem (which implicitly checks that the storage hardware is functional).
+
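+Conceptually, the periodic check behaves like the following loop. This is only an illustration in shell; the actual implementation is the health-check thread in the posix xlator (see `posix-helpers.c` in the log messages below):
+
    while true; do                                        # illustrative sketch, not the actual code
        stat /bricks/<brick-directory> > /dev/null || break   # probe the brick filesystem
        sleep "${HEALTH_CHECK_INTERVAL:-30}"              # storage.health-check-interval
    done
+
+When the probe fails, the brick process logs the failure and shuts itself down, as shown in the log example below.
+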
+`glusterd` detects that the brick process has exited, and `gluster volume status` will show that the brick process is no longer running. System administrators checking the logs should be able to triage the cause.
+
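+An illustrative `gluster volume status` output with a brick that has been taken down by the health-checker (the exact columns and values vary between versions):
+
    # gluster volume status <VOLNAME>
    Status of volume: <VOLNAME>
    Gluster process                                  Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick vm130-32:/bricks/failing_xfs/data          N/A     N       N/A
+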
+## Usage and Configuration
+
+The health-checker is enabled by default and runs a check every 30 seconds. This interval can be changed per volume with:
+
+ # gluster volume set <VOLNAME> storage.health-check-interval <SECONDS>
+
+If `SECONDS` is set to 0, the health-checker will be disabled.
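+
+For example, to lower the interval to 10 seconds, or to disable the health-checker completely (the volume name is a placeholder):
+
    # gluster volume set <VOLNAME> storage.health-check-interval 10
    # gluster volume set <VOLNAME> storage.health-check-interval 0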
+
+## Failure Detection
+
+Errors are logged to the standard syslog (mostly `/var/log/messages`):
+
+ Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512
+ Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem
+ Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
+ Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed
+ Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
+ Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
+ Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
+
+The messages labelled with `GlusterFS` in the above output are also written to the logs of the brick process.
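+
+The brick logs are kept below `/var/log/glusterfs/bricks/` by default, in a file named after the brick path; for example (the exact path is illustrative):
+
    # less /var/log/glusterfs/bricks/bricks-failing_xfs-data.log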
+
+## Recovery after a failure
+
+When a brick process detects that the underlying storage is no longer responding, the process will exit. The brick process is not restarted automatically; the sysadmin will need to fix the problem with the storage first.
+
+After correcting the storage (hardware or filesystem) issue, the following command will start the brick process again:
+
+ # gluster volume start <VOLNAME> force
+
+## How To Test
+
+The health-checker thread that is part of each brick process is started automatically when a volume is started. Its functionality can be verified in several ways.
+
+On virtual hardware:
+
+* disconnect the disk from the VM that holds the brick
+
+On real hardware:
+
+* simulate a RAID-card failure by unplugging the card or cables
+
+On a system that uses LVM for the bricks:
+
+* use device-mapper to load an error-table for the disk, see [this description](http://review.gluster.org/5176); a minimal sketch of this approach follows below
+
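+A minimal sketch of the device-mapper approach (the device name is a placeholder; the linked change contains a complete test script):
+
    # DEV=vg0-brick1                                # dm name of the brick's logical volume
    # SIZE=$(blockdev --getsz /dev/mapper/$DEV)     # device size in 512-byte sectors
    # dmsetup suspend $DEV
    # dmsetup load $DEV --table "0 $SIZE error"     # replace the table with the error target
    # dmsetup resume $DEV                           # all further I/O to the brick now fails
+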
+On any system (by writing to random offsets of the block device; this failure is more difficult to trigger):
+
+1. cause corruption on the filesystem that holds the brick
+2. read contents from the brick, hoping to hit the corrupted area
+3. the filesystem should abort after hitting a bad spot, and the health-checker should notice that shortly afterwards
diff --git a/doc/features/file-snapshot.md b/doc/features/file-snapshot.md
new file mode 100644
index 000000000..7f7c419fc
--- /dev/null
+++ b/doc/features/file-snapshot.md
@@ -0,0 +1,91 @@
+# File Snapshot
+This feature gives the ability to take snapshots of files.
+
+## Description
+This feature adds file-snapshot support to GlusterFS. Snapshots can be created, deleted, and reverted.
+
+To take a snapshot of a file, the file must be in QCOW2 format, as the block-layer snapshot code has been taken from QEMU and brought into GlusterFS as a translator.
+
+With this feature, GlusterFS integrates better with OpenStack Cinder and, in general, gains the ability to take snapshots of files (typically VM images).
+
+A new extended attribute (xattr) is added to identify files that are 'snapshot managed' as opposed to raw files.
+
+## Volume Options
+The following volume option needs to be set on a volume to enable file snapshots:
+
+ # features.file-snapshot on
+
+## CLI Parameters
+The following parameters are passed with the `setfattr` command to create and manage file snapshots:
+
+ # trusted.glusterfs.block-format
+ # trusted.glusterfs.block-snapshot-create
+ # trusted.glusterfs.block-snapshot-goto
+
+## Full Example
+Download the GlusterFS 3.5 RPMs from download.gluster.org and install them.
+
+Start `glusterd` by using the command:
+
+ # service glusterd start
+Now create a volume by using the command:
+
    # gluster volume create <vol_name> <hostname>:<brick_path>
+Run the command below to make sure that the volume has been created:
+
+ # gluster volume info
+Now turn on the snapshot feature on the volume by using the command
+
+ # gluster volume set <vol_name> features.file-snapshot on
+Verify that the option is set by using the command
+
+ # gluster volume info
+You should now see an additional option in the volume info output:
+
+ # features.file-snapshot: on
+Now mount the volume using a FUSE mount:
+
    # mount -t glusterfs <hostname>:<vol_name> <mount_point>
+Change into the mount point and create an empty file:
+
    # cd <mount_point>
    # touch <file_name>
+The size of the file can be set, and its format changed to QCOW2, by running the command below. The file size can be specified in KB/MB/GB:
+
+ # setfattr -n trusted.glusterfs.block-format -v qcow2:<file_size> <file_name>
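+
+For example, to turn the file into a 1GB QCOW2-managed file (the file name is illustrative):
+
    # setfattr -n trusted.glusterfs.block-format -v qcow2:1GB vm-image.img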
+Now create another file and write some data to it by running the command:
+
+ # echo 'ABCDEFGHIJ' > <data_file1>
+Copy the data from that file into the snapshot-managed file by running the command:
+
    # dd if=<data_file1> of=<file_name> conv=notrunc
+Now take a snapshot of the file by running the command:
+
+ # setfattr -n trusted.glusterfs.block-snapshot-create -v <image1> <file_name>
+Add some more content to the file and take another file snapshot by doing the following steps:
+
+ # echo '1234567890' > <data_file2>
+ # dd if=<data_file2> of=<file_name> conv=notrunc
+ # setfattr -n trusted.glusterfs.block-snapshot-create -v <image2> <file_name>
+Now revert the file to each snapshot in turn, and copy the data out to separate files so that it can be compared:
+
+ # setfattr -n trusted.glusterfs.block-snapshot-goto -v <image1> <file_name>
+ # dd if=<file_name> of=<out-file1> bs=11 count=1
+ # setfattr -n trusted.glusterfs.block-snapshot-goto -v <image2> <file_name>
+ # dd if=<file_name> of=<out-file2> bs=11 count=1
+Now read the contents of the files and compare them:
+
    # cat <data_file1> <out_file1>   # the contents should be identical
    # cat <data_file2> <out_file2>   # the contents should be identical
+
+## One-line Description of the Variables Used
+file_name = the file created initially in the mount point.
+
+data_file1 = File which contains data 'ABCDEFGHIJ'
+
+image1 = First file snapshot which has 'ABCDEFGHIJ' + some null values.
+
+data_file2 = File which contains data '1234567890'
+
+image2 = second file snapshot which has '1234567890' + some null values.
+
+out_file1 = After reverting image1 this contains 'ABCDEFGHIJ'
+
+out_file2 = After reverting image2 this contains '1234567890'
diff --git a/doc/features/nufa.md b/doc/features/nufa.md
new file mode 100644
index 000000000..03b8194b4
--- /dev/null
+++ b/doc/features/nufa.md
@@ -0,0 +1,20 @@
+# NUFA Translator
+
+NUFA ("Non-Uniform File Access") is a variant of the DHT ("Distributed Hash
+Table") translator, intended for use with workloads that have a high locality
+of reference. Instead of placing new files pseudo-randomly, it places them on
+the same nodes where they are created so that future accesses can be made
+locally. For replicated volumes, this means that one copy will be local and
+others will be remote; the read-replica selection mechanisms will then favor
+the local copy for reads. For non-replicated volumes, the only copy will be
+local.
+
+## Interface
+
+Use of NUFA is controlled by a volume option, as follows.
+
+ gluster volume set myvolume cluster.nufa on
+
+This will cause the NUFA translator to be used wherever the DHT translator
+otherwise would be. The rest is all automatic.
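+
+Whether the option took effect can be checked with `gluster volume info`; the `Options Reconfigured:` section should then list it (output below is illustrative and abbreviated):
+
    gluster volume info myvolume
    ...
    Options Reconfigured:
    cluster.nufa: on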
+
diff --git a/doc/features/server-quorum.md b/doc/features/server-quorum.md
new file mode 100644
index 000000000..7b20084ce
--- /dev/null
+++ b/doc/features/server-quorum.md
@@ -0,0 +1,44 @@
+# Server Quorum
+
+Server quorum is a feature intended to reduce the occurrence of "split brain"
+after a brick failure or network partition. Split brain happens when different
+sets of servers are allowed to process different sets of writes, leaving data
+in a state that can not be reconciled automatically. The key to avoiding split
+brain is to ensure that there can be only one set of servers - a quorum - that
+can continue handling writes. Server quorum does this by the brutal but
+effective means of forcing down all brick daemons on cluster nodes that can no
+longer reach enough of their peers to form a majority. Because there can only
+be one majority, there can be only one set of bricks remaining, and thus split
+brain can not occur.
+
+## Options
+
+Server quorum is controlled by two parameters:
+
+ * **cluster.server-quorum-type**
+
+ This value may be "server" to indicate that server quorum is enabled, or
+ "none" to mean it's disabled.
+
+ * **cluster.server-quorum-ratio**
+
+ This is the percentage of cluster nodes that must be up to maintain quorum.
+ More precisely, this percentage of nodes *plus one* must be up.
+
+Note that these are cluster-wide flags. All volumes served by the cluster will
+be affected. Once these values are set, quorum actions - starting or stopping
+brick daemons in response to node or network events - will be automatic.
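+
+For example, enabling server quorum and setting the quorum ratio might look like this (the volume name is a placeholder; the ratio is commonly set on the special volume name `all` since it applies cluster-wide):
+
    gluster volume set <VOLNAME> cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%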
+
+## Best Practices
+
+If a cluster with an even number of nodes is split exactly down the middle,
+neither half can have quorum (which requires **more than** half of the total).
+This is particularly important when N=2, in which case the loss of either node
+leads to loss of quorum. Therefore, it is highly advisable to ensure that the
+cluster size is three or greater. The "extra" node in this case need not have
+any bricks or serve any data. It need only be present to preserve the notion
+of a quorum majority less than the entire cluster membership, allowing the
+cluster to survive the loss of a single node without losing quorum.
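+
+For example, a third node can be joined to the trusted pool without contributing any bricks (the hostname is a placeholder):
+
    gluster peer probe tiebreaker.example.com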
+
+
+