author: Xavier Hernandez <xhernandez@datalab.es> 2014-09-29 15:48:55 +0200
committer: Vijay Bellur <vbellur@redhat.com> 2014-09-30 09:37:26 -0700
commit: 36d2975714ed6ef98c0f86a2fac22fc382ea8a9d (patch)
tree: 923566fcc9edcd8ae83494a44b41ceffaf4a0b46 /doc/admin-guide
parent: 535c4259119ca3beef2fee1526930e8be42bdd5d (diff)
doc: added documentation for dispersed volumes
Change-Id: I8a8368bdbe31af30a239aaf8cc478429e10c3f57
BUG: 1147563
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8885
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Diffstat (limited to 'doc/admin-guide')
-rw-r--r-- doc/admin-guide/en-US/markdown/admin_setting_volumes.md | 170
1 files changed, 169 insertions, 1 deletions
diff --git a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
index 028cd30647a..395aa2c79a9 100644
--- a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
+++ b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
@@ -52,11 +52,24 @@ start it before attempting to mount it.
and performance is critical. In this release, configuration of
this volume type is supported only for Map Reduce workloads.
+ - **Dispersed** - Dispersed volumes are based on erasure codes,
+ providing space-efficient protection against disk or server failures.
+ They store an encoded fragment of the original file on each brick in
+ such a way that only a subset of the fragments is needed to recover the
+ original file. The number of bricks that can be missing without
+ losing access to data is configured by the administrator at volume
+ creation time.
+
+ - **Distributed Dispersed** - Distributed dispersed volumes distribute
+ files across dispersed subvolumes. This has the same advantages as
+ distributed replicated volumes, but uses dispersed subvolumes to store
+ the data on the bricks.
+
**To create a new volume**
- Create a new volume :
- `# gluster volume create [stripe | replica ] [transport tcp | rdma | tcp, rdma] `
+ `# gluster volume create [stripe | replica | disperse] [transport tcp | rdma | tcp, rdma] `
For example, to create a volume called test-volume consisting of
server3:/exp3 and server4:/exp4:
@@ -389,6 +402,161 @@ of this volume type is supported only for Map Reduce workloads.
> Use the `force` option at the end of command if you want to create the volume in this case.
+##Creating Dispersed Volumes
+
+Dispersed volumes are based on erasure codes. They stripe the encoded data of
+files, with some redundancy added, across multiple bricks in the volume. You
+can use dispersed volumes to get a configurable level of reliability with
+minimal wasted space.
+
+**Redundancy**
+
+Each dispersed volume has a redundancy value defined when the volume is
+created. This value determines how many bricks can be lost without
+interrupting the operation of the volume. It also determines the amount of
+usable space of the volume using this formula:
+
+ <Usable size> = <Brick size> * (#Bricks - Redundancy)
+
+All bricks of a disperse set should have the same capacity; otherwise, when
+the smallest brick becomes full, no additional data will be allowed in the
+disperse set.
+
+It's important to note that a configuration with 3 bricks and redundancy 1
+will have less usable space (66.7% of the total physical space) than a
+configuration with 10 bricks and redundancy 1 (90%). However, the first one
+will be safer than the second one (roughly, the probability of failure of
+the second configuration is more than 4.5 times that of the first one).
+
+For example, a dispersed volume composed of 6 bricks of 4TB and a redundancy
+of 2 will be completely operational even with two bricks inaccessible. However,
+a third inaccessible brick will bring the volume down because it won't be
+possible to read from or write to it. The usable space of the volume will be
+equal to 16TB.
+
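The capacity formula and the percentages quoted above can be checked with a short calculation. This is only an illustrative sketch: the function names `usable_size` and `space_efficiency` are hypothetical helpers, not part of GlusterFS, and the sketch assumes all bricks in the disperse set have the same size.

```python
def usable_size(brick_size, bricks, redundancy):
    """Usable capacity of one disperse set, in the same unit as brick_size.

    Implements <Usable size> = <Brick size> * (#Bricks - Redundancy).
    """
    return brick_size * (bricks - redundancy)


def space_efficiency(bricks, redundancy):
    """Percentage of the total physical space that is usable."""
    return 100 * (bricks - redundancy) / bricks


# 6 bricks of 4TB with redundancy 2, as in the example above.
print(usable_size(4, 6, 2))

# 3 bricks vs. 10 bricks, both with redundancy 1.
print(round(space_efficiency(3, 1), 1))
print(round(space_efficiency(10, 1), 1))
```

The first call corresponds to the 16TB example; the other two correspond to the 66.7% and 90% figures.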
+The implementation of erasure codes in GlusterFS limits the redundancy to a
+value smaller than #Bricks / 2 (or equivalently, redundancy * 2 < #Bricks).
+Having a redundancy equal to half of the number of bricks would be almost
+equivalent to a replica-2 volume, and probably a replicated volume will
+perform better in this case.
+
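The redundancy limit can be expressed as a simple predicate. The sketch below is an illustration of the rule stated in this document (redundancy greater than 0 and redundancy * 2 < #Bricks); `is_valid_redundancy` is a hypothetical name, not a GlusterFS function.

```python
def is_valid_redundancy(bricks, redundancy):
    """Return True if the brick count and redundancy satisfy the documented limits."""
    return redundancy > 0 and 2 * redundancy < bricks


print(is_valid_redundancy(6, 2))  # redundancy below half of the bricks
print(is_valid_redundancy(6, 3))  # redundancy equal to half of the bricks
print(is_valid_redundancy(2, 1))  # too few bricks for any redundancy
```

Under this rule the smallest accepted configuration is 3 bricks with redundancy 1.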
+**Optimal volumes**
+
+One of the worst things erasure codes have in terms of performance is the
+RMW (Read-Modify-Write) cycle. Erasure codes operate on blocks of a certain
+size and cannot work with smaller ones. This means that if a user issues
+a write of a portion of a file that doesn't fill a full block, the volume must
+read the remaining portion from the current contents of the file, merge them,
+compute the updated encoded block and, finally, write the resulting data.
+
+This adds latency, reducing performance when this happens. Some GlusterFS
+performance xlators can help to reduce or even eliminate this problem for
+some workloads, but it should be taken into account when using dispersed
+volumes for a specific use case.
+
+The current implementation of dispersed volumes uses blocks of a size that
+depends on the number of bricks and redundancy: 512 * (#Bricks - redundancy)
+bytes. This value is also known as the stripe size.
+
+Using combinations of #Bricks/redundancy that give a power of two for the
+stripe size will make the disperse volume perform better in most workloads
+because it's more typical to write information in blocks whose size is a
+power of two (for example databases, virtual machines and many applications).
+
+These combinations are considered *optimal*.
+
+For example, a configuration with 6 bricks and redundancy 2 will have a stripe
+size of 512 * (6 - 2) = 2048 bytes, so it's considered optimal. A configuration
+with 7 bricks and redundancy 2 would have a stripe size of 2560 bytes, needing
+an RMW cycle for many writes (of course this always depends on the use case).
+
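The stripe size rule and the notion of an *optimal* combination can be sketched as follows. The helper names are hypothetical and the code only restates the formula given above, 512 * (#Bricks - redundancy); it does not query a real volume.

```python
def stripe_size(bricks, redundancy):
    """Stripe size in bytes: 512 * (#Bricks - redundancy)."""
    return 512 * (bricks - redundancy)


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def is_optimal(bricks, redundancy):
    """A combination is considered optimal when its stripe size is a power of two."""
    return is_power_of_two(stripe_size(bricks, redundancy))


for bricks, redundancy in [(6, 2), (7, 2)]:
    print(bricks, redundancy, stripe_size(bricks, redundancy),
          is_optimal(bricks, redundancy))
```

Because 512 is itself a power of two, the check reduces to whether #Bricks - redundancy is a power of two.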
+**To create a dispersed volume**
+
+1. Create a trusted storage pool.
+
+2. Create the dispersed volume:
+
+ `# gluster volume create [disperse [<count>]] [redundancy <count>] [transport tcp | rdma | tcp,rdma]`
+
+ A dispersed volume can be created by specifying the number of bricks in a
+ disperse set, by specifying the number of redundancy bricks, or both.
+
+ If *disperse* is not specified, or the _&lt;count&gt;_ is missing, the
+ entire volume will be treated as a single disperse set composed of all
+ bricks listed on the command line.
+
+ If *redundancy* is not specified, it is computed automatically to be the
+ optimal value. If this value does not exist, it's assumed to be '1' and a
+ warning message is shown:
+
+ # gluster volume create test-volume disperse 4 server{1..4}:/bricks/test-volume
+ There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)
+
+ In all cases where *redundancy* is automatically computed and it's not
+ equal to '1', a warning message is displayed:
+
+ # gluster volume create test-volume disperse 6 server{1..6}:/bricks/test-volume
+ The optimal redundancy for this configuration is 2. Do you want to create the volume with this value ? (y/n)
+
+ _redundancy_ must be greater than 0, and the total number of bricks must
+ be greater than 2 * _redundancy_. This means that a dispersed volume must
+ have a minimum of 3 bricks.
+
+ If the transport type is not specified, *tcp* is used as the default. You
+ can also set additional options if required, like in the other volume
+ types.
+
+ > **Note**:
+
+ > - Make sure you start your volumes before you try to mount them or
+ > else client operations after the mount will hang.
+
+ > - GlusterFS will fail to create a dispersed volume if more than one brick of a disperse set is present on the same peer.
+
+ > ```
+ # gluster volume create <volname> disperse 3 server1:/brick{1..3}
+ volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
+ Do you still want to continue creating the volume? (y/n)```
+
+ > Use the `force` option at the end of command if you want to create the volume in this case.
+
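The automatic choice of *redundancy* described in step 2 can be approximated with the sketch below. This is an assumption about the selection logic, reconstructed only from the rules in this document (the redundancy must be valid, and it is optimal when the stripe size is a power of two); it is not taken from the GlusterFS source, and `auto_redundancy` is a hypothetical name.

```python
def auto_redundancy(bricks):
    """Return (redundancy, is_optimal) for a disperse set of `bricks` bricks.

    Assumed logic: pick the valid redundancy whose data brick count
    (bricks - redundancy) is a power of two; fall back to 1 if none exists.
    """
    if bricks < 3:
        raise ValueError("a dispersed volume needs at least 3 bricks")
    # Valid redundancy values satisfy 0 < r and 2 * r < bricks.
    for r in range(1, (bricks - 1) // 2 + 1):
        data_bricks = bricks - r
        if data_bricks & (data_bricks - 1) == 0:
            return r, True
    return 1, False


print(auto_redundancy(4))  # no optimal value, falls back to redundancy 1
print(auto_redundancy(6))  # optimal redundancy exists
```

For 4 and 6 bricks this reproduces the two cases shown in the warning messages above.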
+##Creating Distributed Dispersed Volumes
+
+Distributed dispersed volumes are the equivalent of distributed replicated
+volumes, but using dispersed subvolumes instead of replicated ones.
+
+**To create a distributed dispersed volume**
+
+1. Create a trusted storage pool.
+
+2. Create the distributed dispersed volume:
+
+ `# gluster volume create disperse <count> [redundancy <count>] [transport tcp | rdma | tcp,rdma]`
+
+ To create a distributed dispersed volume, the *disperse* keyword and
+ &lt;count&gt; are mandatory, and the number of bricks specified in the
+ command line must be a multiple of the disperse count.
+
+ *redundancy* is exactly the same as in the dispersed volume.
+
+ If the transport type is not specified, *tcp* is used as the default. You
+ can also set additional options if required, like in the other volume
+ types.
+
+ > **Note**:
+
+ > - Make sure you start your volumes before you try to mount them or
+ > else client operations after the mount will hang.
+
+ > - GlusterFS will fail to create a distributed dispersed volume if more than one brick of a disperse set is present on the same peer.
+
+ > ```
+ # gluster volume create <volname> disperse 3 server1:/brick{1..6}
+ volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
+ Do you still want to continue creating the volume? (y/n)```
+
+ > Use the `force` option at the end of command if you want to create the volume in this case.
+
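The requirement that the number of bricks be a multiple of the disperse count can be illustrated by splitting a brick list into disperse sets. This sketch assumes that consecutive bricks on the command line form one disperse set; that grouping order is an assumption made for illustration, and the function name and brick paths are hypothetical.

```python
def disperse_sets(bricks, disperse_count):
    """Split a brick list into disperse sets of `disperse_count` bricks each.

    Assumption: bricks are grouped in the order they were listed.
    """
    if disperse_count < 3:
        raise ValueError("a disperse set needs at least 3 bricks")
    if len(bricks) % disperse_count != 0:
        raise ValueError("number of bricks must be a multiple of the disperse count")
    return [bricks[i:i + disperse_count]
            for i in range(0, len(bricks), disperse_count)]


# Hypothetical brick list: one brick on each of six servers.
bricks = ["server%d:/bricks/test-volume" % n for n in range(1, 7)]
for subvolume in disperse_sets(bricks, 3):
    print(subvolume)
```

With six bricks and a disperse count of 3, the sketch yields two dispersed subvolumes across which files would be distributed.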
##Starting Volumes
You must start your volumes before you try to mount them.