author: raghavendra talur <raghavendra.talur@gmail.com> 2015-08-20 15:09:31 +0530
committer: Humble Devassy Chirammal <humble.devassy@gmail.com> 2015-08-31 02:27:22 -0700
commit: 9e9e3c5620882d2f769694996ff4d7e0cf36cc2b
tree: 3a00cbd0cc24eb7df3de9b2eeeb8d42ee9175f88 /done/Features/afr-arbiter-volumes.md
parent: f6055cdb4dedde576ed8ec55a13814a69dceefdc
Create basic directory structure
All new features specs go into in_progress directory. Once signed off, it should be moved to done directory. For now, This change moves all the Gluster 4.0 feature specs to in_progress. All other specs are under done/release-version. More cleanup required will be done incrementally.

Change-Id: Id272d301ba8c434cbf7a9a966ceba05fe63b230d
BUG: 1206539
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/11969
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Tested-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Diffstat (limited to 'done/Features/afr-arbiter-volumes.md')
-rw-r--r-- done/Features/afr-arbiter-volumes.md | 56
1 files changed, 56 insertions, 0 deletions
diff --git a/done/Features/afr-arbiter-volumes.md b/done/Features/afr-arbiter-volumes.md
new file mode 100644
index 0000000..e31bc31
--- /dev/null
+++ b/done/Features/afr-arbiter-volumes.md
@@ -0,0 +1,56 @@
+Usage guide: Replicate volumes with arbiter configuration
+==========================================================
+
+Arbiter volumes are replica 3 volumes where the 3rd brick of the replica is
+automatically configured as an arbiter node. This means that the 3rd brick
+stores only file names and metadata, and does not contain any file data.
+This configuration is helpful in avoiding split-brains while providing the same
+level of consistency as a normal replica 3 volume.
+
+The arbiter volume can be created with the following command:
+
+    gluster volume create <VOLNAME> replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3
+
+Note that the syntax is similar to creating a normal replica 3 volume with the
+exception of the `arbiter 1` keyword. As seen in the command above, the only
+permissible values for the replica count and arbiter count are 3 and 1
+respectively. Also, the 3rd brick is always chosen as the arbiter brick; no
+other brick can be configured as the arbiter.
+
+Client/Mount behaviour:
+=======================
+
+By default, client quorum (`cluster.quorum-type`) is set to `auto` for a replica
+3 volume when it is created; i.e. at least 2 bricks need to be up to satisfy
+quorum and to allow writes. This setting must not be changed for arbiter
+volumes either. In addition, arbiter volumes perform some extra checks to
+prevent files from ending up in split-brain:
+
+* Clients take full file locks when writing to a file as opposed to range locks
+ in a normal replica 3 volume.
+
+* If 2 bricks are up and if one of them is the arbiter (i.e. the 3rd brick) *and*
+ it blames the other up brick, then all FOPS will fail with ENOTCONN (Transport
+ endpoint is not connected). If the arbiter doesn't blame the other brick,
+ FOPS will be allowed to proceed. 'Blaming' here is with respect to the values
+ of the AFR changelog extended attributes.
+
+* If 2 bricks are up and the arbiter is down, then FOPS will be allowed.
+
+* In all cases, if there is only one source before the FOP is initiated and if
+ the FOP fails on that source, the application will receive ENOTCONN.
+
+Note: It is possible to see if a replica 3 volume has arbiter configuration from
+the mount point. If
+`$mount_point/.meta/graphs/active/$V0-replicate-0/options/arbiter-count` exists
+and its value is 1, then it is an arbiter volume. Also the client volume graph
+will have arbiter-count as a xlator option for AFR translators.
+
+Self-heal daemon behaviour:
+===========================
+Since the arbiter brick does not store any data for the files, data-self-heal
+from the arbiter brick will not take place. For example, if there are 2 source
+bricks B2 and B3 (B3 being arbiter brick) and B2 is down, then data-self-heal
+will *not* happen from B3 to sink brick B1, and will be pending until B2 comes
+up and heal can happen from it. Note that metadata and entry self-heals can
+still happen from B3 if it is one of the sources.
\ No newline at end of file
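
The mount-point check described in the note of the guide (looking for `arbiter-count` under `.meta`) can be scripted. The following is a minimal sketch and is not part of the commit. The function name `is_arbiter_volume` and its two arguments are hypothetical, and it assumes the AFR subvolume is named `<VOLNAME>-replicate-0`, matching the `$V0-replicate-0` path quoted in the guide. Only the `.meta` path layout is taken from the guide itself.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the volume mounted at $1 has arbiter
# configuration, by reading the arbiter-count option exposed under .meta.
# is_arbiter_volume is a hypothetical helper name, not a gluster command.
# Assumption: the AFR subvolume is called "<VOLNAME>-replicate-0".
is_arbiter_volume() {
    mount_point=$1
    volname=$2
    opt="${mount_point}/.meta/graphs/active/${volname}-replicate-0/options/arbiter-count"

    # The guide's rule: the file must exist and its value must be 1.
    [ -e "$opt" ] || return 1
    value=$(tr -d '[:space:]' < "$opt")
    [ "$value" = "1" ]
}
```

For example, with a hypothetical mount point and volume name: `is_arbiter_volume /mnt/glusterfs myvol && echo "arbiter volume"`. On volumes with more than one replica set, the other `replicate-N` subvolumes would need the same check.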