-rw-r--r-- | doc/admin-guide/en-US/markdown/admin_setting_volumes.md | 7
-rw-r--r-- | doc/features/afr-arbiter-volumes.md | 53
2 files changed, 60 insertions, 0 deletions
diff --git a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
index e58bb63ab23..d66a6894152 100644
--- a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
+++ b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
@@ -266,6 +266,13 @@ high-availability and high-reliability are critical.
 > Use the `force` option at the end of command if you want to create the volume in this case.
+###Arbiter configuration for replica volumes
+Arbiter volumes are replica 3 volumes where the 3rd brick acts as the arbiter brick. This configuration has mechanisms that prevent split-brains.
+It can be created with the following command:
+`# gluster volume create <VOLNAME> replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3`
+More information about this configuration can be found at `doc/features/afr-arbiter-volumes.md`.
+Note that the arbiter configuration for replica 3 can be used to create distributed-replicate volumes as well.
+
 ##Creating Striped Volumes
 Striped volumes stripes data across bricks in the volume. For best
diff --git a/doc/features/afr-arbiter-volumes.md b/doc/features/afr-arbiter-volumes.md
new file mode 100644
index 00000000000..1348e5645b8
--- /dev/null
+++ b/doc/features/afr-arbiter-volumes.md
@@ -0,0 +1,53 @@
+Usage guide: Replicate volumes with arbiter configuration
+==========================================================
+Arbiter volumes are replica 3 volumes where the 3rd brick of the replica is
+automatically configured as an arbiter node. This means that the 3rd brick
+stores only the file name and metadata, but does not contain any file data.
+This configuration is helpful in avoiding split-brains while providing the same
+level of consistency as a normal replica 3 volume.
+
+The arbiter volume can be created with the following command:
+`gluster volume create <VOLNAME> replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3`
+
+Note that the syntax is similar to creating a normal replica 3 volume with the
+exception of the `arbiter 1` keyword. As seen in the command above, the only
+permissible values for the replica count and arbiter count are 3 and 1
+respectively. Also, the 3rd brick is always chosen as the arbiter brick; it is
+not possible to configure any other brick as the arbiter.
+
+Client/Mount behaviour:
+=======================
+By default, client quorum (`cluster.quorum-type`) is set to `auto` for a replica
+3 volume when it is created; i.e. at least 2 bricks need to be up to satisfy
+quorum and to allow writes. This setting must not be changed for arbiter
+volumes either. Additionally, the arbiter volume has some additional checks to
+prevent files from ending up in split-brain:
+
+* Clients take full file locks when writing to a file as opposed to range locks
+  in a normal replica 3 volume.
+
+* If 2 bricks are up and if one of them is the arbiter (i.e. the 3rd brick) *and*
+  it blames the other up brick, then all FOPS will fail with ENOTCONN (Transport
+  endpoint is not connected). If the arbiter doesn't blame the other brick,
+  FOPS will be allowed to proceed. 'Blaming' here is with respect to the values
+  of the AFR changelog extended attributes.
+
+* If 2 bricks are up and the arbiter is down, then FOPS will be allowed.
+
+* In all cases, if there is only one source before the FOP is initiated and if
+  the FOP fails on that source, the application will receive ENOTCONN.
+
+Note: It is possible to see if a replica 3 volume has arbiter configuration from
+the mount point. If
+`$mount_point/.meta/graphs/active/$V0-replicate-0/options/arbiter-count` exists
+and its value is 1, then it is an arbiter volume. Also, the client volume graph
+will have arbiter-count as a xlator option for AFR translators.
+
+Self-heal daemon behaviour:
+===========================
+Since the arbiter brick does not store any data for the files, data-self-heal
+from the arbiter brick will not take place. For example, if there are 2 source
+bricks B2 and B3 (B3 being the arbiter brick) and B2 is down, then data-self-heal
+will *not* happen from B3 to sink brick B1, and will be pending until B2 comes
+up and heal can happen from it. Note that metadata and entry self-heals can
+still happen from B3 if it is one of the sources.
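
Illustration only, not part of the patch: the mount-point check and the client-side rules described in the new usage guide can be sketched in POSIX shell. The function names `is_arbiter_volume` and `fop_outcome`, their argument conventions, and the `no-quorum` label are hypothetical and are not GlusterFS commands. The first function reads the `.meta` path quoted in the guide and looks only at the first replica subvolume (`replicate-0`). The second is a toy model of the quorum and blame rules, not the AFR implementation, and it does not cover the "only one source" case.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the mounted volume reports arbiter-count = 1.
#   $1 = mount point, $2 = volume name (written as $V0 in the usage guide)
is_arbiter_volume() {
    opt="$1/.meta/graphs/active/$2-replicate-0/options/arbiter-count"
    [ -f "$opt" ] && [ "$(tr -d '[:space:]' < "$opt")" = "1" ]
}

# Toy model of the client-side rules (an assumption-laden sketch).
# Arguments are 1 (true) or 0 (false):
#   $1 = data brick 1 is up     $2 = data brick 2 is up
#   $3 = arbiter brick is up    $4 = arbiter blames the data brick that is up
# Prints "allow", "ENOTCONN", or "no-quorum" (a label made up for this sketch;
# the guide does not say which error is returned when quorum is lost).
fop_outcome() {
    up=$(( $1 + $2 + $3 ))
    if [ "$up" -lt 2 ]; then
        echo no-quorum      # client quorum (auto) needs at least 2 bricks up
    elif [ "$up" -eq 2 ] && [ "$3" -eq 1 ] && [ "$4" -eq 1 ]; then
        echo ENOTCONN       # arbiter blames the only data brick that is up
    else
        echo allow          # arbiter down, no blame, or all 3 bricks up
    fi
}
```

For example, `is_arbiter_volume /mnt/glusterfs testvol && echo "arbiter volume"` would report on a volume named `testvol` mounted at `/mnt/glusterfs` (both names are placeholders), and `fop_outcome 1 0 1 1` models one data brick plus a blaming arbiter.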