Goal
----

More partition-tolerant replication, with higher performance for most
use cases.

Summary
-------

NSR is a new synchronous replication translator, complementing or
perhaps someday replacing AFR.

Owners
------

- Jeff Darcy <jdarcy@redhat.com>
- Venky Shankar <vshankar@redhat.com>

Current status
--------------

Design and prototype are (nearly) complete; implementation is beginning.

Related Feature Requests and Bugs
---------------------------------

[AFR bugs related to "split
brain"](https://bugzilla.redhat.com/buglist.cgi?classification=Community&component=replicate&list_id=3040567&product=GlusterFS&query_format=advanced&short_desc=split&short_desc_type=allwordssubstr)

[AFR bugs related to
"perf"](https://bugzilla.redhat.com/buglist.cgi?classification=Community&component=replicate&list_id=3040572&product=GlusterFS&query_format=advanced&short_desc=perf&short_desc_type=allwordssubstr)

(Both lists are undoubtedly partial, because not all bugs in these areas
are filed using these specific words. In particular, "GFID mismatch"
bugs are really a kind of split brain, but aren't represented here.)

Detailed Description
--------------------

NSR is designed to have the following features:

- Server based - "chain" replication can use the bandwidth of both
  client and server instead of splitting client bandwidth N ways.

- Journal based - for reduced network traffic in normal operation,
  plus faster recovery and greater resistance to "split brain" errors.

- Variable consistency model - based on
  [Dynamo](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf),
  providing options that trade some consistency for greater availability
  and/or performance (see the quorum sketch after this list).

- Newer, smaller codebase - reduces technical debt, and enables higher
  replica counts, more informative status reporting and logging, and
  other future features (e.g. ordered asynchronous replication).

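To make the variable-consistency bullet concrete, here is a minimal
sketch of the Dynamo-style quorum arithmetic involved. The names, and
the framing of N/W/R as user-visible knobs, are assumptions for
illustration, not NSR's settled design.

```c
/* Minimal sketch of Dynamo-style quorum arithmetic; illustrative
 * names only, not NSR's actual interfaces. N = replica count,
 * W = write quorum, R = read quorum. */
#include <stdbool.h>
#include <stdio.h>

struct quorum_conf {
    int n;
    int w;
    int r;
};

/* R + W > N guarantees every read set overlaps the latest write set. */
static bool
reads_see_latest_write(const struct quorum_conf *q)
{
    return q->r + q->w > q->n;
}

int
main(void)
{
    struct quorum_conf cp = { .n = 3, .w = 2, .r = 2 }; /* CP-leaning */
    struct quorum_conf ap = { .n = 3, .w = 1, .r = 1 }; /* AP-leaning */

    printf("CP config overlaps: %d\n", reads_see_latest_write(&cp));
    printf("AP config overlaps: %d\n", reads_see_latest_write(&ap));
    return 0;
}
```

Lowering W or R buys availability and latency at the cost of windows
where a read can miss the newest write, which is exactly the trade-off
the tunable quorum levels are meant to expose.
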
Benefit to GlusterFS
====================

Faster, more robust, and more manageable/maintainable replication.

Scope
=====

Nature of proposed change
-------------------------

At least two new translators will be necessary:

- A simple client-side translator to route requests to the current
  leader among the bricks in a replica set (a minimal model of this
  routing follows this list).

- A server-side translator to handle the "heavy lifting" of
  replication, recovery, etc.

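As a rough model of the client-side piece, the sketch below shows the
routing decision in isolation. Every name here is a hypothetical
stand-in, not a GlusterFS interface; the real translator would wind
file operations to the leader subvolume rather than return an index.

```c
/* Hypothetical model of the client-side routing translator; these
 * types and helpers are illustrative, not GlusterFS interfaces. */
#include <stdio.h>

struct replica_set {
    int n_bricks;
    int leader; /* index of the current leader, kept up to date by
                   leader-election notifications from the servers */
};

/* All requests go to the leader; the leader then chains the write to
 * the other bricks server-side, so client bandwidth isn't split. */
static int
route_request(const struct replica_set *rs)
{
    return rs->leader;
}

/* Invoked when a leadership change is announced. */
static void
on_new_leader(struct replica_set *rs, int new_leader)
{
    rs->leader = new_leader;
}

int
main(void)
{
    struct replica_set rs = { .n_bricks = 3, .leader = 0 };
    printf("write goes to brick %d\n", route_request(&rs));
    on_new_leader(&rs, 2); /* simulated failover */
    printf("write goes to brick %d\n", route_request(&rs));
    return 0;
}
```
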
Implications on manageability
-----------------------------

At a high level, the commands to enable, configure, and manage NSR will
be very similar to those already used for AFR. At a lower level, the
options affecting things like quorum, consistency, and placement of
journals will all be completely different.

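For illustration only, day-to-day management might look like the
familiar volume-set workflow with new keys; every option name below is
a made-up placeholder, not a committed interface.

```
# Hypothetical option names: placeholders, not a committed interface.
gluster volume set myvol cluster.nsr.quorum-write 2   # acks required per write
gluster volume set myvol cluster.nsr.quorum-read 1    # copies consulted per read
gluster volume set myvol cluster.nsr.journal-dir /ssd/nsr-journal
```
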
Implications on presentation layer
----------------------------------

Minimal. Most changes will be to simplify or remove special handling for
AFR's unique behavior (especially around lookup vs. self-heal).

Implications on persistence layer
---------------------------------

N/A

Implications on 'GlusterFS' backend
-----------------------------------

The journal for each brick in an NSR volume might (for performance
reasons) be placed on one or more local volumes other than the one
containing the brick's data. Special requirements around AIO, fsync,
etc. will be less onerous than with AFR.

Modification to GlusterFS metadata
----------------------------------

NSR will not use the same xattrs as AFR, reducing the need for larger
inodes.

Implications on 'glusterd'
--------------------------

Volgen must be able to configure the client-side and server-side parts
of NSR, instead of AFR on the client side and index (which will no
longer be necessary) on the server side. Other interactions with
glusterd should remain mostly the same.

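A brick-side graph emitted by volgen might then contain a fragment
along these lines; the translator type cluster/nsr and its options are
hypothetical placeholders, and the rest of the graph is elided.

```
# Hypothetical volfile fragment: names are placeholders.
volume myvol-nsr-server
    type cluster/nsr
    option journal-dir /ssd/nsr-journal
    subvolumes myvol-posix
end-volume
```
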
How To Test
===========

Most basic AFR tests - e.g. reading/writing data, killing nodes,
starting/stopping self-heal - would apply to NSR as well. Tests that
embed assumptions about AFR xattrs or other internal artifacts will
need to be rewritten.

User Experience
===============

Minimal change, mostly related to new options.

Dependencies
============

NSR depends on a cluster-management framework that can provide
membership tracking, leader election, and robust, consistent key/value
data storage. This is expected to be developed in parallel as part of
the glusterd-scalability feature, but can be implemented (in simplified
form) within NSR itself if necessary.

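The shape of that dependency might be an interface like the sketch
below (etcd being one candidate backing store); all of these names are
assumptions for illustration, not an actual API.

```c
/* Hypothetical interface NSR needs from the cluster-management
 * framework; every name here is an illustrative assumption. */
#include <stddef.h>

typedef void (*member_cb)(void *ctx, int brick, int alive);
typedef void (*leader_cb)(void *ctx, int new_leader);

struct cluster_framework {
    /* Membership tracking: fire cb on every brick up/down event. */
    int (*watch_members)(member_cb cb, void *ctx);
    /* Leader election: fire cb whenever leadership changes. */
    int (*watch_leader)(leader_cb cb, void *ctx);
    /* Robust, consistent key/value storage, e.g. for journal metadata. */
    int (*kv_put)(const char *key, const void *val, size_t len);
    int (*kv_get)(const char *key, void *val, size_t *len);
};
```
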
Documentation
=============

TBD.

Status
======

Some parts of the earlier implementation have been updated to the
current tree; others are in the middle of being replaced.

- [New design](http://review.gluster.org/#/c/8915/)

- [Basic translator code](http://review.gluster.org/#/c/8913/) (needs
  updating to the new code-generation infrastructure)

- [GF\_FOP\_IPC](http://review.gluster.org/#/c/8812/)

- [etcd support](http://review.gluster.org/#/c/8887/)

- [New code-generation
  infrastructure](http://review.gluster.org/#/c/9411/)

- [New data-logging
  translator](https://forge.gluster.org/~jdarcy/glusterfs-core/jdarcys-glusterfs-data-logging)

Comments and Discussion
=======================

My biggest concern with journal-based replication comes from my previous
use of DRBD, which keeps an "activity log"[^1] that sounds like the same
basic concept. Once that log filled, I experienced cascading failures.
If the journal here can be filled faster than it is emptied, the same
problem could occur.

So what I'm looking to be convinced of is how journaled replication
maintains full redundancy, and how it will prevent journal input from
exceeding the capacity of journal output - or at least how it won't
fail if that should happen.

[jjulian](User:Jjulian "wikilink")
([talk](User talk:Jjulian "wikilink")) 17:21, 13 August 2013 (UTC)

<hr/>
This is akin to a CAP Theorem[^2][^3] problem. If your nodes can't
communicate, what do you do with writes? Our replication approach has
traditionally been CP - enforce quorum, allow writes only among the
majority - and for the sake of satisfying user expectations (or POSIX)
it pretty much has to remain CP, at least by default. I personally think
we need to allow an AP choice as well, which is why the quorum levels in
NSR are tunable to get that result.

So, what do we do if a node runs out of journal space? Well, it's unable
to function normally, i.e. it's failed, so it can't count toward quorum.
This would immediately lead to loss of write availability in a two-node
replica set, and could happen easily enough in a three-node replica set
if two similarly configured nodes ran out of journal space
simultaneously. A significant part of the complexity in our design is
around pruning no-longer-needed journal segments, precisely because this
is an icky problem, but even with all the pruning in the world it could
still happen eventually. The design therefore also includes the notion
of arbiters, which can be quorum-only or can also have their own
journals (with no or partial data), so your quorum for
admission/journaling purposes can be significantly higher than your
actual replica count. So what options do we have to avoid or deal with
journal exhaustion?

- Add more journal space (it's just files, so this can be done
  reactively during an extended outage).

- Add arbiters.

- Decrease the quorum levels.

- Manually kick a node out of the replica set.

- Add admission control, artificially delaying new requests as the
  journal becomes full. (This one requires more code; a sketch follows
  this list.)

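As a sketch of that last option, admission control could delay incoming
writes in proportion to journal fullness, so the pruner can catch up
before the journal fills; the thresholds and delay curve below are
arbitrary illustrations, not part of any actual design.

```c
/* Illustrative admission-control policy; thresholds and delays are
 * arbitrary, not part of any actual design. */
#include <stddef.h>

static unsigned int
admission_delay_ms(size_t journal_used, size_t journal_capacity)
{
    double fullness = (double)journal_used / (double)journal_capacity;

    if (fullness < 0.75)
        return 0;  /* plenty of headroom: no throttling */
    if (fullness < 0.90)
        return 10; /* gentle back-pressure on new writes */
    /* Near-full: ramp the delay steeply, capping around a second. */
    return 10 + (unsigned int)((fullness - 0.90) * 10000);
}
```
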
If you do *none* of these things then yeah, you're scrod. That said, do
you think these options seem sufficient?

[Jdarcy](User:Jdarcy "wikilink") ([talk](User talk:Jdarcy "wikilink"))
15:27, 29 August 2013 (UTC)

[^1]: <http://www.drbd.org/users-guide-emb/s-activity-log.html>

[^2]: <http://www.julianbrowne.com/article/viewer/brewers-cap-theorem>

[^3]: <http://henryr.github.io/cap-faq/>