Goal
----

Thousand-node scalability for glusterd

Summary
=======

This "feature" is really a set of infrastructure changes that will
enable glusterd to manage a thousand servers gracefully.

Owners
======

Krishnan Parthasarathi <kparthas@redhat.com>
Jeff Darcy <jdarcy@redhat.com>

Current status
==============

Proposed, awaiting summit for approval.

Related Feature Requests and Bugs
=================================

N/A

Detailed Description
====================

There are three major areas of change included in this proposal.

- Replace the current order-n-squared heartbeat/membership protocol
  with a much smaller "monitor cluster" based on Paxos or
  [Raft](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf),
  to which I/O servers check in (a sketch follows this list).

- Use the monitor cluster to designate specific functions or roles -
  e.g. self-heal, rebalance, leadership in an NSR subvolume - to I/O
  servers in a coordinated and globally optimal fashion.

- Replace the current system of replicating configuration data on all
  servers (providing practically no guarantee of consistency if one is
  absent during a configuration change) with storage of configuration
  data in the monitor cluster.
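
As a concrete illustration of the first item, here is a minimal,
hypothetical sketch of how a modified glusterd might "check in" with an
etcd-style monitor cluster. It uses today's etcd v3 Go client purely for
illustration; the key layout (`/gluster/servers/...`), endpoint names,
and TTL are assumptions, not decided design.

```go
// Hypothetical sketch: a modified glusterd registers itself with the
// monitor cluster under a leased key. If the server dies, the lease
// expires and the key vanishes, so failure detection is one key per
// server instead of order-n-squared peer-to-peer pings.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Endpoints of the (few) monitor nodes; names are illustrative.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"monitor1:2379", "monitor2:2379", "monitor3:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A lease that expires unless this server keeps renewing it.
	lease, err := cli.Grant(context.Background(), 10) // 10-second TTL
	if err != nil {
		log.Fatal(err)
	}

	// Check in: the key's presence means "this server is alive".
	_, err = cli.Put(context.Background(),
		"/gluster/servers/server-42", "alive",
		clientv3.WithLease(lease.ID))
	if err != nil {
		log.Fatal(err)
	}

	// Renew the lease in the background for the life of the process.
	acks, err := cli.KeepAlive(context.Background(), lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range acks {
		// Each ack confirms the monitor cluster still sees us.
	}
}
```

The same family of primitives could plausibly serve the second item as
well: etcd's concurrency package provides leader election
(`concurrency.NewElection`), which could pick, say, one server to run
rebalance for a volume.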

Benefit to GlusterFS
====================

Scaling of our management plane to 1000+ nodes, enabling competition
with other projects such as HDFS or Ceph, which already have or claim
such scalability.

Scope
=====

Nature of proposed change
-------------------------

Functionality very similar to what we need in the monitor cluster
already exists in some of the Raft implementations, notably
[etcd](https://github.com/coreos/etcd). Such a component could provide
the services described above to a modified glusterd running on each
server. The changes to glusterd would mostly consist of removing the
current heartbeat and config-storage code, replacing it with calls into
(and callbacks from) the monitor cluster.
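
To make the "calls into (and callbacks from)" point concrete, below is a
hedged sketch of what the replacement config-storage path might look
like: a configuration change becomes a single consistent write to the
monitor cluster, and every glusterd picks it up through a watch callback
instead of the current all-servers replication. The etcd v3 Go client
and the `/gluster/config/...` key layout are again assumptions for
illustration only.

```go
// Hypothetical sketch: config changes as monitor-cluster writes plus
// watch callbacks, replacing n-way replication of /var/lib/glusterd.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"monitor1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// "Call into" the monitor cluster: a volume-option change is one
	// consistent write, not a replication across all servers.
	_, err = cli.Put(context.Background(),
		"/gluster/config/volumes/vol0/performance.cache-size", "256MB")
	if err != nil {
		log.Fatal(err)
	}

	// "Callback from" the monitor cluster: each glusterd watches the
	// config prefix and applies changes as they arrive, so a server
	// that was down during the change catches up from the one
	// authoritative copy instead of diverging.
	watch := cli.Watch(context.Background(),
		"/gluster/config/", clientv3.WithPrefix())
	for resp := range watch {
		for _, ev := range resp.Events {
			fmt.Printf("config %s: %s = %s\n",
				ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```

This is also where the consistency benefit from the third item in the
description above shows up: a server absent during the change never saw
the write, but its watch replays it on reconnect.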

Implications on manageability
-----------------------------

Enabling/starting monitor daemons on those few nodes that have them must
be done separately from starting glusterd. Since the changes are mostly
to how each glusterd interacts with others and with its own local
storage back end, interactions with the CLI or with glusterfsd need not
change.

Implications on presentation layer
----------------------------------

N/A

Implications on persistence layer
---------------------------------

N/A

Implications on 'GlusterFS' backend
-----------------------------------

N/A

Modification to GlusterFS metadata
----------------------------------

The monitor daemons need space for their data, much like that currently
maintained in /var/lib/glusterd.

Implications on 'glusterd'
--------------------------

Drastic. See sections above.

How To Test
===========

A new set of tests for the monitor-cluster functionality will need to be
developed, perhaps derived from those for the external project if we
adopt one. Most tests related to our multi-node testing facilities
(cluster.rc) will also need to change. Tests which merely invoke the CLI
should require little if any change.
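
One possible shape for such tests, sketched under heavy assumptions: run
a single-member monitor cluster in-process via etcd's embed package, so
monitor-cluster logic can be exercised without cluster.rc's multi-node
machinery. The package name, key layout, and choice of embedded etcd are
all illustrative.

```go
// Hypothetical test sketch: exercise server registration against an
// in-process single-member monitor cluster (embedded etcd).
package monitor_test

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/server/v3/embed"
)

func TestServerRegistration(t *testing.T) {
	cfg := embed.NewConfig()
	cfg.Dir = t.TempDir() // throwaway data directory
	srv, err := embed.StartEtcd(cfg)
	if err != nil {
		t.Fatal(err)
	}
	defer srv.Close()
	<-srv.Server.ReadyNotify() // wait for the member to come up

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://localhost:2379"}, // embed default
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()

	// Pretend a glusterd checked in, then verify the monitor sees it.
	_, err = cli.Put(context.Background(), "/gluster/servers/s1", "alive")
	if err != nil {
		t.Fatal(err)
	}
	resp, err := cli.Get(context.Background(),
		"/gluster/servers/", clientv3.WithPrefix())
	if err != nil {
		t.Fatal(err)
	}
	if resp.Count != 1 {
		t.Fatalf("expected 1 registered server, got %d", resp.Count)
	}
}
```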

User Experience
===============

Minimal change.

Dependencies
============

A mature/stable enough implementation of Raft or a similar protocol.
Failing that, we'd need to develop our own service along similar lines.

Documentation
=============

TBD.

Status
======

In design.

The choice of technology and approach is being discussed on the -devel
mailing list:

- "Proposal for Glusterd-2.0" -
  [1](http://www.gluster.org/pipermail/gluster-users/2014-September/018639.html)

: Though the discussion has gone quiet, the open question is whether to
  implement a consensus algorithm within our own project or to depend on
  an external project that provides a similar service.

- "Management volume proposal" -
  [2](http://www.gluster.org/pipermail/gluster-devel/2014-November/042944.html)

: This approach has a circular dependency that makes it infeasible.

Comments and Discussion
=======================