Diffstat (limited to 'done/GlusterFS 3.7')
-rw-r--r--  done/GlusterFS 3.7/Archipelago Integration.md  93
-rw-r--r--  done/GlusterFS 3.7/BitRot.md  211
-rw-r--r--  done/GlusterFS 3.7/Clone of Snapshot.md  100
-rw-r--r--  done/GlusterFS 3.7/Data Classification.md  279
-rw-r--r--  done/GlusterFS 3.7/Easy addition of Custom Translators.md  129
-rw-r--r--  done/GlusterFS 3.7/Exports and Netgroups Authentication.md  134
-rw-r--r--  done/GlusterFS 3.7/Gluster CLI for NFS Ganesha.md  120
-rw-r--r--  done/GlusterFS 3.7/Gnotify.md  168
-rw-r--r--  done/GlusterFS 3.7/HA for Ganesha.md  156
-rw-r--r--  done/GlusterFS 3.7/Improve Rebalance Performance.md  277
-rw-r--r--  done/GlusterFS 3.7/Object Count.md  113
-rw-r--r--  done/GlusterFS 3.7/Policy based Split-brain Resolution.md  128
-rw-r--r--  done/GlusterFS 3.7/SE Linux Integration.md  4
-rw-r--r--  done/GlusterFS 3.7/Scheduling of Snapshot.md  229
-rw-r--r--  done/GlusterFS 3.7/Sharding xlator.md  129
-rw-r--r--  done/GlusterFS 3.7/Small File Performance.md  433
-rw-r--r--  done/GlusterFS 3.7/Trash.md  182
-rw-r--r--  done/GlusterFS 3.7/Upcall Infrastructure.md  747
-rw-r--r--  done/GlusterFS 3.7/arbiter.md  100
-rw-r--r--  done/GlusterFS 3.7/index.md  90
-rw-r--r--  done/GlusterFS 3.7/rest-api.md  152
21 files changed, 3974 insertions, 0 deletions
diff --git a/done/GlusterFS 3.7/Archipelago Integration.md b/done/GlusterFS 3.7/Archipelago Integration.md
new file mode 100644
index 0000000..69ce61d
--- /dev/null
+++ b/done/GlusterFS 3.7/Archipelago Integration.md
@@ -0,0 +1,93 @@
+Feature
+-------
+
+**Archipelago Integration**
+
+Summary
+-------
+
+This proposal is about adding support in libgfapi for better
+integration with Archipelago.
+
+Owners
+------
+
+Vijay Bellur <vbellur@redhat.com>
+
+Current status
+--------------
+
+Work in progress
+
+Detailed Description
+--------------------
+
+Please refer to discussion at:
+
+<http://lists.nongnu.org/archive/html/gluster-devel/2013-12/msg00011.html>
+
+Benefit to GlusterFS
+--------------------
+
+More interfaces in libgfapi.
+
+Scope
+-----
+
+To be explained better.
+
+### Nature of proposed change
+
+TBD
+
+### Implications on manageability
+
+N/A
+
+### Implications on presentation layer
+
+TBD
+
+### Implications on persistence layer
+
+No impact
+
+### Implications on 'GlusterFS' backend
+
+No impact
+
+### Modification to GlusterFS metadata
+
+No impact
+
+### Implications on 'glusterd'
+
+No impact
+
+How To Test
+-----------
+
+TBD
+
+User Experience
+---------------
+
+TBD
+
+Dependencies
+------------
+
+TBD
+
+Documentation
+-------------
+
+TBD
+
+Status
+------
+
+In development
+
+Comments and Discussion
+-----------------------
diff --git a/done/GlusterFS 3.7/BitRot.md b/done/GlusterFS 3.7/BitRot.md
new file mode 100644
index 0000000..deca9ee
--- /dev/null
+++ b/done/GlusterFS 3.7/BitRot.md
@@ -0,0 +1,211 @@
+Feature
+=======
+
+BitRot Detection
+
+1 Summary
+=========
+
+BitRot detection is a technique used to identify an “insidious” type of
+disk error where data is silently corrupted with no indication from the
+disk to the storage software layer that an error has occurred. BitRot
+detection is exceptionally useful when using JBOD (which has no way of
+knowing that the data is corrupted on disk) rather than RAID (esp. RAID6,
+which has a performance penalty for certain kinds of workloads).
+
+2 Use cases
+===========
+
+- Archival/Compliance
+- Openstack cinder
+- Gluster health
+
+Refer
+[here](http://supercolony.gluster.org/pipermail/gluster-devel/2014-December/043248.html)
+for an elaborate discussion on use cases.
+
+3 Owners
+========
+
+Venky Shankar <vshankar@redhat.com, yknev.shankar@gmail.com>
+Raghavendra Bhat <rabhat@redhat.com>
+Vijay Bellur <vbellur@redhat.com>
+
+4 Current Status
+================
+
+Initial approach is [here](http://goo.gl/TSjLJn). The document goes into
+some details on why one could end up with "rotten" data and approaches
+taken by block level filesystems to detect and recover from bitrot. Some
+of the design goals are carried forward and adapted to fit GlusterFS.
+
+Status as of 11th Feb 2015:
+
+Done
+
+- Object notification
+- Object expiry tracking using timer-wheel
+
+In Progress
+
+- BitRot server stub
+- BitRot Daemon
+
+5 Detailed Description
+======================
+
+**NOTE: Points marked with [NIS] are "Not in Scope" for 3.7 release.**
+
+The basic idea is to maintain file data/metadata checksums as an
+extended attribute. Checksum granularity is per file for now; however,
+this can be extended to per "block-size" blocks (chunks). A BitRot
+daemon per brick is responsible for checksum maintenance of files local
+to the brick. "Distributifying" enables scale and effective resource
+utilization of the cluster (memory, disk, etc.).
+
+BitD (BitRot Daemon)
+
+- Daemon per brick takes care of maintaining checksums for data local
+ to the brick.
+- Checksums are SHA256 (default) hash
+ - Of file data (regular files only)
+ - "Rolling" metadata checksum of extended attributes (GlusterFS
+ xattrs) **[NIS]**
+ - Master checksum: checksum of checksums (data + metadata)
+ **[NIS]**
+ - Hashtype is persisted along side the checksum and can be tuned
+ per file type
+
+- Checksum maintenance is "lazy"
+ - "not" inline to the data path (expensive)
+    - The list of changed files is notified by the filesystem, although
+      a single filesystem scan is needed to get to the current state.
+      BitD is built over the existing journaling infrastructure (a.k.a.
+      changelog)
+ - Laziness is governed by policies that determine when to
+ (re)calculate checksum. IOW, checksum is calculated when a file
+ is considered "stable"
+        - Release+Expiry: on a file descriptor release and
+          inactivity for "X" seconds.
+
+- Filesystem scan
+ - Required once after stop/start or for initial data set
+ - Xtime based scan (marker framework)
+ - Considerations
+ - Parallelize crawl
+ - Sort by inode \# to reduce disk seek
+ - Integrate with libgfchangelog
+
+Detection
+
+- Upon file/data access (expensive)
+ - open() or read() (disabled by default)
+- Data scrubbing
+ - Filesystem checksum validation
+ - "Bad" file marking
+ - Deep: validate data checksum
+ - Timestamp of last validity - used for replica repair **[NIS]**
+ - Repair **[NIS]**
+ - Shallow: validate metadata checksum **[NIS]**
+
+Repair/Recovery strategies **[NIS]**
+
+- Mirrored file data
+ - self-heal
+- Erasure Codes (ec xlator)
+
+It would also be beneficial to use the inbuilt bitrot capabilities of
+backend filesystems such as btrfs. For such cases, it's better to hand
+over the bulk of the work to the backend filesystem and keep a minimal
+implementation on the daemon side. This area needs to be explored
+further (i.e., ongoing and not for 3.7).
+
+6 Benefit to GlusterFS
+======================
+
+The ability to detect silent corruption (and even backend tinkering with
+a file) avoids reading bad data, using it as a trusted source to heal
+other copies, or even remotely replicating it to a backup node and
+damaging a good copy. Scrubbing allows pro-active detection of corrupt
+files so that they can be repaired before access.
+
+7 Design and CLI specification
+==============================
+
+- [Design document](http://goo.gl/Mjy4mD)
+- [CLI specification](http://goo.gl/2o12Fn)
+
+8 Scope
+=======
+
+8.1. Nature of proposed change
+------------------------------
+
+The most basic change is the introduction of a server-side daemon (per
+brick) to maintain file data checksums. Changes to the changelog and
+consumer library would be needed to support requirements for bitrot
+daemon.
+
+8.2. Implications on manageability
+----------------------------------
+
+Introduction of new CLI commands to enable bitrot detection, trigger
+scrub, query file status, etc.
+
+8.3. Implications on presentation layer
+---------------------------------------
+
+N/A
+
+8.4. Implications on persistence layer
+--------------------------------------
+
+Introduction of new extended attributes.
+
+8.5. Implications on 'GlusterFS' backend
+----------------------------------------
+
+As in 8.4
+
+8.6. Modification to GlusterFS metadata
+---------------------------------------
+
+BitRot related extended attributes
+
+8.7. Implications on 'glusterd'
+-------------------------------
+
+Supporting changes to CLI.
+
+9 How To Test
+=============
+
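+A minimal manual check could look like the sketch below, once the CLI
+from the specification linked in section 7 is available. The exact
+command syntax, mount point, brick path and xattr name here are
+illustrative assumptions:
+
+    # enable bitrot detection on the volume (syntax as proposed in the
+    # CLI specification; it may change)
+    gluster volume bitrot vol1 enable
+
+    # write a file through the mount, then tamper with it directly on a brick
+    dd if=/dev/urandom of=/mnt/vol1/file1 bs=1M count=10
+    echo "garbage" >> /bricks/b1/file1
+
+    # after the signer/scrubber has processed the file, inspect the
+    # (illustrative) bitrot xattrs on the brick copy
+    getfattr -d -m . -e hex /bricks/b1/file1 | grep -i bit-rot
+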
+10 User Experience
+==================
+
+Refer to Section \#7
+
+11 Dependencies
+===============
+
+Enhancement to changelog translator (and libgfchangelog) is the most
+prevalent change. Other dependencies include glusterd.
+
+12 Documentation
+================
+
+TBD
+
+13 Status
+=========
+
+- Initial set of patches merged
+- Bug fixing/enhancement in progress
+
+14 Comments and Discussion
+==========================
+
+More than welcome :-)
+
+- [BitRot tracker Bug](https://bugzilla.redhat.com/1170075)
+- [BitRot hash computation](https://bugzilla.redhat.com/914874) \ No newline at end of file
diff --git a/done/GlusterFS 3.7/Clone of Snapshot.md b/done/GlusterFS 3.7/Clone of Snapshot.md
new file mode 100644
index 0000000..ca6304c
--- /dev/null
+++ b/done/GlusterFS 3.7/Clone of Snapshot.md
@@ -0,0 +1,100 @@
+Feature
+-------
+
+Clone of a Snapshot
+
+Summary
+-------
+
+GlusterFS volume snapshot provides point-in-time copy of a GlusterFS
+volume. When we take a volume snapshot, the newly created snap volume is
+a read only volume.
+
+With this feature, the snap volume can later be 'cloned' to create a new
+regular volume which contains the same contents as the snapshot bricks.
+This is a space-efficient clone: it is created instantaneously and
+shares the disk space in the back-end, just like a snapshot and its
+origin volume.
+
+Owners
+------
+
+Mohammed Rafi KC <rkavunga@redhat.com>
+
+Current status
+--------------
+
+Requirement for OpenStack Manila.
+
+Detailed Description
+--------------------
+
+Snapshot create takes a point-in-time snapshot of a volume. Upon
+successful completion, it creates a new read-only volume. But the new
+volume is not considered a regular volume, which prevents us from
+performing any volume-related operations on this snapshot volume. The
+ultimate aim of this feature is to create a new regular volume out of
+this snap.
+
+For e.g.:
+
+ gluster snapshot create snap1 vol1
+
+The above command will create a read-only snapshot "snap1" from volume
+vol1.
+
+ gluster snapshot clone share1 snap1
+
+The above command will create a regular gluster volume share1 from
+snap1.
+
+Benefit to GlusterFS
+--------------------
+
+We will have a writable snapshot.
+
+Scope
+-----
+
+### Nature of proposed change
+
+Modification to glusterd snapshot code.
+
+### Implications on manageability
+
+glusterd, gluster CLI
+
+### Implications on 'GlusterFS' backend
+
+There will be performance degradation on the first write to each block
+of the main volume.
+
+### Modification to GlusterFS metadata
+
+none
+
+How To Test
+-----------
+
+Create a volume, take a snapshot and create a clone. Start the clone.
+The cloned volume should support all operations of a regular volume, as
+sketched below.
+
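+A possible command sequence for this flow (names are illustrative, and
+the bricks are assumed to be on thinly provisioned LVM, as required for
+snapshots; the clone syntax is the one proposed under User Experience):
+
+    gluster volume create vol1 replica 2 host1:/bricks/b1 host2:/bricks/b2
+    gluster volume start vol1
+    gluster snapshot create snap1 vol1
+    gluster snapshot clone share1 snap1
+    gluster volume start share1
+    gluster volume info share1    # the clone behaves like a regular volume
+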
+User Experience
+---------------
+
+There will be an additional CLI option for snapshot:
+```gluster snapshot clone <clonename> <snapname> [<description> <description text>] [force]```
+
+Dependencies
+------------
+
+Documentation
+-------------
+
+Status
+------
+
+In development
+
+Comments and Discussion
+-----------------------
diff --git a/done/GlusterFS 3.7/Data Classification.md b/done/GlusterFS 3.7/Data Classification.md
new file mode 100644
index 0000000..a3bb35c
--- /dev/null
+++ b/done/GlusterFS 3.7/Data Classification.md
@@ -0,0 +1,279 @@
+Goal
+----
+
+Support tiering and other policy-driven (as opposed to pseudo-random)
+placement of files.
+
+Summary
+-------
+
+"Data classification" is an umbrella term covering several things:
+locality-aware data placement, SSD/disk or
+normal/deduplicated/erasure-coded data tiering, HSM, etc. They share
+most of the same infrastructure, and so are proposed (for now) as a
+single feature.
+
+NB this has also been referred to as "DHT on DHT" in various places,
+though "unify on DHT" might be more accurate.
+
+Owners
+------
+
+Dan Lambright <dlambrig@redhat.com>
+
+Joseph Fernandes <josferna@redhat.com>
+
+Current status
+--------------
+
+Cache tiering under development upstream. Tiers may be added to existing
+volumes. Tiers are made up of bricks.
+
+Volume-granularity tiering has been prototyped (bugzilla \#9387) and
+merged in a branch (origin/fix\_9387) to the cache tiering forge
+project. This will allow existing volumes to be combined into a single
+one offering the functionality of both.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+N/A
+
+Detailed Description
+--------------------
+
+The basic idea is to layer multiple instances of a modified DHT
+translator on top of one another, each making placement/rebalancing
+decisions based on different criteria. The current consistent-hashing
+method is one possibility. Other possibilities involve matching
+file/directory characteristics to subvolume characteristics.
+
+- File/directory characteristics: size, age, access rate, type
+ (extension), ...
+
+- Subvolume characteristics: physical location, storage type (e.g.
+ SSD/disk/PCM, cache), encoding method (e.g. erasure coded or
+ deduplicated).
+
+- Either (arbitrary tags assigned by user): owner, security level,
+  HIPAA category
+
+For example, a first level might redirect files based on security level,
+a second level might match age or access rate vs. SSD-based or
+disk-based subvolumes, and then a third level might use consistent
+hashing across several similarly-equipped bricks.
+
+### Cache tier
+
+The cache tier will support data placement based on access frequency.
+Frequently accessed files shall exist on a "hot" subvolume. Infrequently
+accessed files shall reside on a "cold" subvolume. Files will migrate
+between the hot and cold subvolumes according to observed usage.
+
+Read caching is a desired future enhancement.
+
+When the "cold" subvolume is expensive to use (e.g. erasure coded), this
+feature will mitigate its overhead for many workloads.
+
+Some use cases:
+
+- fast subvolumes are SSDs, slow subvolumes are normal disks
+- fast subvolumes are normal disks, slow subvolumes are erasure coded.
+- fast subvolume is backed up more frequently than the slow tier.
+- read caching only, good in cases where the migration overhead is
+  unacceptable
+
+Benefit to GlusterFS
+--------------------
+
+By itself, data classification can be used to improve performance (by
+optimizing where "hot" files are placed) and security or regulatory
+compliance (by placing sensitive data only on the most secure storage).
+It also serves as an enabling technology for other enhancements by
+allowing users to combine more cost-effective or archivally oriented
+storage for the majority of their data with higher-performance storage
+to absorb the majority of their I/O load. This enabling effect applies
+e.g. to compression, deduplication, erasure coding, or bitrot detection.
+
+Scope
+-----
+
+### Nature of proposed change
+
+The most basic set of changes involves making the data-placement part of
+DHT more modular, and providing modules/plugins to do the various kinds
+of intelligent placement discussed above. Other changes will be
+explained in subsequent sections.
+
+### Implications on manageability
+
+Eventually, the CLI must provide users with a way to arrange bricks into
+a hierarchy, and assign characteristics such as storage type or security
+level at any level within that hierarchy. They must also be able to
+express which policy (plugin), with which parameters, should apply to
+any level. A data classification language has been proposed to help
+express these concepts; see the syntax proposal linked in the Status
+section below.
+
+The cache tier's graph is more rigid and can be expressed using the
+"volume attach-cache" command described below. Both a "hot" tier and
+"cold tier" are made up of dispersed / distributed / replicated bricks
+in the same manner as a normal volume, and they are combined with the
+tier translator.
+
+#### Cache Tier
+
+An "attach" command will declare an existing volume as "cold" and create
+a new "hot" volume which is appended to it. Together, the combination is
+a single "cache tiered" volume. For example:
+
+
+    gluster volume attach-tier [name] [redundancy #] brick1 brick2 .. brickN
+
+This will attach a hot tier made up of brick[1..N] to the existing
+volume [name].
+
+The tier can be detached. Data is first migrated off the hot volume, in
+the same manner as brick removal, and then the hot volume is removed
+from the volfile.
+
+    gluster volume detach-tier brick1,...,brickN
+
+To start cache tiering:
+
+    gluster volume rebalance [name] tier start
+
+Enable the change time recorder:
+
+    gluster volume set [name] features.ctr-enabled on
+
+Other cache parameters:
+
+- tier-demote-frequency: how often the thread wakes up to demote data
+- tier-promote-frequency: as above, to promote data
+
+To stop it:
+
+    gluster volume rebalance [name] tier stop
+
+To get status:
+
+    gluster volume rebalance [name] tier status
+
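+Putting these together, a hypothetical end-to-end session could look as
+follows (volume, host and brick names are illustrative, and the syntax
+follows the proposal above, so it may change):
+
+    # attach a hot tier of two SSD-backed bricks to existing volume vol1
+    gluster volume attach-tier vol1 ssd1:/bricks/hot1 ssd2:/bricks/hot2
+
+    # enable the change time recorder and start tiering
+    gluster volume set vol1 features.ctr-enabled on
+    gluster volume rebalance vol1 tier start
+
+    # observe promotion/demotion activity; detach when no longer needed
+    gluster volume rebalance vol1 tier status
+    gluster volume detach-tier ssd1:/bricks/hot1,ssd2:/bricks/hot2
+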
+Upcoming:
+
+A "pause-tier" command will allow users to stop using the hot tier.
+While paused, data will be migrated off the hot tier to the cold tier,
+and all I/Os will be forwarded to the cold tier. A status CLI will
+indicate how much data remains to be "flushed" from the hot tier to the
+cold tier.
+
+### Implications on presentation layer
+
+N/A
+
+### Implications on persistence layer
+
+N/A
+
+### Implications on 'GlusterFS' backend
+
+A tiered volume is a new volume type.
+
+Simple rules may be represented as volume "options" key-value pairs in
+the volfile. Eventually, for more elaborate graphs, some information about a
+brick's characteristics and relationships (within the aforementioned
+hierarchy) may be stored on the bricks themselves as well as in the
+glusterd configuration. In addition, the volume's "info" file may
+include an adjacency list to represent more elaborate graphs.
+
+### Modification to GlusterFS metadata
+
+There are no plans to change meta-data for the cache tier. However in
+the future, categorizing files and directories (especially with
+user-defined tags) may require additional xattrs.
+
+### Implications on 'glusterd'
+
+Volgen must be able to convert these specifications into a
+corresponding hierarchy of translators and options for those
+translators.
+
+Adding and removing tiers dynamically closely resembles the add and
+remove brick operations.
+
+How To Test
+-----------
+
+Eventually, new tests will be needed to set up multi-layer hierarchies,
+create files/directories, issue rebalance commands etc. and ensure that
+files end up in the right place(s). Many of the tests are
+policy-specific, e.g. to test an HSM policy one must effectively change
+files' ages or access rates (perhaps artificially).
+
+Interoperability tests between the Snap, geo-rep, and quota features are
+necessary.
+
+### Cache tier
+
+Automated tests are under development in the forge repository in the
+file tier.t. Tests should include:
+
+- The performance of "cache friendly" workloads (e.g. repeated access to
+a small set of files) is improved.
+
+- Performance is not substantially worse in "cache unfriendly" workloads
+(e.g. sequential writes over large numbers of files.)
+
+- Performance should not become substantially worse when the hot tier's
+bricks become full.
+
+User Experience
+---------------
+
+The hierarchical arrangement of bricks, with attributes and policies
+potentially at many levels, represents a fundamental change to the
+current "sea of identical bricks" model. Eventually, some commands that
+currently apply to whole volumes will need to be modified to work on
+sub-volume-level groups (or even individual bricks) as well.
+
+The cache tier must provide statistics on data migration.
+
+Dependencies
+------------
+
+Documentation
+-------------
+
+See below.
+
+Status
+------
+
+Cache tiering implementation in progress for 3.7; some bits for more
+general DC also done (fix 9387).
+
+- [Syntax
+ proposal](https://docs.google.com/presentation/d/1e8tuh9DKNi9eCMrdt5vetppn1D3BiJSmfR7lDW2wRvA/edit#slide=id.p)
+ (dormant).
+- [Syntax prototype](https://forge.gluster.org/data-classification)
+ (dormant, not part of cache tiering).
+- [Cache tier
+ design](https://docs.google.com/document/d/1cjFLzRQ4T1AomdDGk-yM7WkPNhAL345DwLJbK3ynk7I/edit)
+- [Bug 763746](https://bugzilla.redhat.com/763746) - We need an easy
+ way to alter client configs without breaking DVM
+- [Bug 905747](https://bugzilla.redhat.com/905747) - [FEAT] Tier
+ support for Volumes
+- [Working tree for
+ tiering](https://forge.gluster.org/data-classification/data-classification)
+- [Volgen changes for general DC](http://review.gluster.org/#/c/9387/)
+- [d\_off changes to allow stacked
+ DHTs](https://www.mail-archive.com/gluster-devel%40gluster.org/msg03155.html)
+ (prototyped)
+- [Video on the concept](https://www.youtube.com/watch?v=V4cvawIv1qA)
+  Efficient Data Maintenance in GlusterFS using DataBases: Data
+ Classification as the case study
+
+Comments and Discussion
+-----------------------
diff --git a/done/GlusterFS 3.7/Easy addition of Custom Translators.md b/done/GlusterFS 3.7/Easy addition of Custom Translators.md
new file mode 100644
index 0000000..487770e
--- /dev/null
+++ b/done/GlusterFS 3.7/Easy addition of Custom Translators.md
@@ -0,0 +1,129 @@
+Feature
+-------
+
+Easy addition of custom translators
+
+Summary
+-------
+
+I'd like to propose we add a way for people to easily add custom
+translators they've written (using C, Glupy, or whatever).
+
+Owners
+------
+
+Justin Clift <jclift@redhat.com>
+Anand Avati <avati@redhat.com>
+
+Current status
+--------------
+
+At present, when a custom translator has been developed it's difficult
+to get it included in generated .vol files properly.
+
+It **can** be done using the GlusterFS "filter" mechanism, but that's
+non-optimal and open to catastrophic failure.
+
+Detailed Description
+--------------------
+
+Discussed on the gluster-devel mailing list here:
+
+[http://lists.nongnu.org/archive/html/gluster-devel/2013-08/msg00074.html](http://lists.nongnu.org/archive/html/gluster-devel/2013-08/msg00074.html)
+
+We could have a new Gluster install sub-directory, which takes a .so/.py
+translator file, and a JSON fragment to say what to do with it. No CLI.
+
+This would suit deployment via packaging, and should be simple enough
+for developers to make use of easily as well.
+
+Benefit to GlusterFS
+--------------------
+
+Having an easily usable / deployable approach for custom translators is
+a key part of extending the Gluster Developer Community, especially in
+combination with rapid feature prototyping through Glupy.
+
+Scope
+-----
+
+### Nature of proposed change
+
+Modification of existing code, to enable much easier addition of custom
+translators.
+
+### Implications on packaging
+
+The gluster-devel package should include all the necessary header and
+library files to compile a standalone glusterfs translator.
+
+### Implications on development
+
+/usr/share/doc/gluster-devel/examples/translators/hello-world should
+contain skeleton translator code (well commented), README.txt and build
+files. This code becomes the starting point to implement a new
+translator. Make a few changes and you should be able to build, install,
+test and package your translator.
+
+Ideally, this would be implemented via a script.
+
+Similar to autoproject, "translator-gen NAME" should produce all the
+necessary skeleton translator code and associated files. This avoids
+erroneous find-replace steps.
+
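+A hypothetical developer workflow with such a script (translator-gen
+does not exist yet; the autotools-style build steps are assumptions
+modelled on how glusterfs itself builds):
+
+    translator-gen hello-world     # generate skeleton sources, README.txt and build files
+    cd hello-world
+    ./autogen.sh && ./configure    # standard autotools build
+    make && sudo make install      # install hello-world.so into the xlator directory
+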
+### Implications on manageability
+
+TBD
+
+### Implications on presentation layer
+
+N/A
+
+### Implications on persistence layer
+
+TBD
+
+### Implications on 'GlusterFS' backend
+
+TBD
+
+### Modification to GlusterFS metadata
+
+TBD
+
+### Implications on 'glusterd'
+
+TBD
+
+How To Test
+-----------
+
+TBD
+
+User Experience
+---------------
+
+TBD
+
+Dependencies
+------------
+
+No new dependencies.
+
+Documentation
+-------------
+
+At least "Getting Started" documentation and API documentation needs to
+be created, including libglusterfs APIs.
+
+Status
+------
+
+Initial concept proposal only.
+
+Comments and Discussion
+-----------------------
+
+- An initial potential concept for the JSON fragment is on the mailing
+ list:
+ - <http://lists.nongnu.org/archive/html/gluster-devel/2013-08/msg00080.html>
diff --git a/done/GlusterFS 3.7/Exports and Netgroups Authentication.md b/done/GlusterFS 3.7/Exports and Netgroups Authentication.md
new file mode 100644
index 0000000..03b43f0
--- /dev/null
+++ b/done/GlusterFS 3.7/Exports and Netgroups Authentication.md
@@ -0,0 +1,134 @@
+Feature
+-------
+
+Exports and Netgroups Authentication for NFS
+
+Summary
+-------
+
+This feature adds Linux-style exports & netgroups authentication to
+Gluster's NFS server. More specifically, this feature allows you to
+restrict access to specific clients & netgroups for both Gluster volumes
+and subdirectories within Gluster volumes.
+
+Owners
+------
+
+Shreyas Siravara
+Richard Wareing
+
+Current Status
+--------------
+
+Today, Gluster can restrict access to volumes through a simple IP list.
+This feature makes that capability more scalable by allowing large lists
+of IPs to be managed through a netgroup. It also allows more granular
+permission handling on volumes.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+- [Bug 1143880](https://bugzilla.redhat.com/1143880): Exports and
+ Netgroups Authentication for Gluster NFS mount
+
+Patches ([Gerrit
+link](http://review.gluster.org/#/q/project:glusterfs+branch:master+topic:bug-1143880,n,z)):
+
+- [\#1](http://review.gluster.org/9359): core: add generic parser
+ utility
+- [\#2](http://review.gluster.org/9360): nfs: add structures and
+ functions for parsing netgroups
+- [\#3](http://review.gluster.org/9361): nfs: add support for separate
+ 'exports' file
+- [\#4](http://review.gluster.org/9362): nfs: more fine grained
+ authentication for the MOUNT protocol
+- [\#5](http://review.gluster.org/9363): nfs: add auth-cache for the
+ MOUNT protocol
+- [\#6](http://review.gluster.org/8758): gNFS: Export / Netgroup
+ authentication on Gluster NFS mount
+- [\#7](http://review.gluster.org/9364): glusterd: add new NFS options
+ for exports/netgroups and related caching
+- [\#8](http://review.gluster.org/9365): glusterfsd: add
+ "print-netgroups" and "print-exports" command
+
+Detailed Description
+--------------------
+
+This feature allows users to restrict access to Gluster volumes (and
+subdirectories within a volume) to specific IPs (exports authentication)
+or a netgroup (netgroups authentication), or a combination of both.
+
+Benefit to GlusterFS
+--------------------
+
+This is a scalable security model and allows more granular permissions.
+
+Scope
+-----
+
+### Nature of proposed change
+
+This change modifies the NFS server code and the mount daemon code. It
+adds two parsers for the exports & netgroups files as well as some files
+relating to caching to improve performance.
+
+### Implications on manageability
+
+The authentication can be turned off with a simple volume setting
+('gluster vol set <VOLNAME> nfs.exports-auth-enable off'). The feature
+has some tunable parameters (how long authorizations should be cached,
+etc.) that can be adjusted through the CLI interface.
+
+### Implications on presentation layer
+
+Adds per-fileop authentication to the NFS server. No other elements of
+the presentation layer are affected.
+
+### Implications on persistence layer
+
+No implications.
+
+### Implications on 'GlusterFS' backend
+
+No implications.
+
+### Modification to GlusterFS metadata
+
+No modifications.
+
+### Implications on 'glusterd'
+
+Adds a few configuration options to NFS to tweak the authentication
+model.
+
+How To Test
+-----------
+
+Restrict some volume in the exports file to some IP, turn on the
+authentication through the Gluster CLI and see mounts/file-operations
+denied (or authorized depending on your setup).
+
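+For illustration, a minimal setup might look like the sketch below. The
+file locations and export line syntax are assumptions (the feature
+mimics the Linux /etc/exports and netgroup formats); the volume option
+is the one named under "Implications on manageability":
+
+    # exports file (assumed location: /var/lib/glusterd/nfs/exports):
+    # allow a subnet read-write and a netgroup read-only on testvol
+    /testvol 10.1.2.0/24(rw) @qa_hosts(ro)
+
+    # netgroups file (assumed location: /var/lib/glusterd/nfs/netgroups):
+    # netgroup entries are (host,user,domain) triples
+    qa_hosts (qa01.example.com,,) (qa02.example.com,,)
+
+    # enable authentication, then try mounting from allowed/denied clients
+    gluster vol set testvol nfs.exports-auth-enable on
+    mount -t nfs -o vers=3 server:/testvol /mnt/testvol
+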
+User Experience
+---------------
+
+Authentication can be toggled through the command line.
+
+Dependencies
+------------
+
+No external dependencies.
+
+Documentation
+-------------
+
+TBD
+
+Status
+------
+
+Feature complete, currently testing & working on enhancements.
+
+Comments and Discussion
+-----------------------
+
+TBD
diff --git a/done/GlusterFS 3.7/Gluster CLI for NFS Ganesha.md b/done/GlusterFS 3.7/Gluster CLI for NFS Ganesha.md
new file mode 100644
index 0000000..94028e4
--- /dev/null
+++ b/done/GlusterFS 3.7/Gluster CLI for NFS Ganesha.md
@@ -0,0 +1,120 @@
+Feature
+-------
+
+Gluster CLI support to manage nfs-ganesha exports.
+
+Summary
+-------
+
+NFS-ganesha support for GlusterFS volumes has been operational for quite
+some time now. In the upcoming release, we intend to provide gluster CLI
+commands to manage nfs-ganesha exports analogous to the commands
+provided for Gluster-NFS. CLI commands to support ganesha specific
+options shall also be introduced.
+
+Owners
+------
+
+Meghana Madhusudhan
+
+Current status
+--------------
+
+1. Options nfs-ganesha.enable and nfs-ganesha.host defined in
+ gluster-nfs code.
+2. Writing into config files and starting nfs-ganesha is done as part
+ of hook scripts.
+3. User has to manually stop gluster-nfs and configure the DBus
+    interface (required to add/remove exports dynamically).
+4. Volume level options
+
+ gluster vol set testvol nfs-ganesha.host 10.70.43.78
+ gluster vol set testvol nfs-ganesha.enable on
+
+Drawbacks
+---------
+
+1. Volume set options show success status irrespective of what the
+    outcome is. The post phase of the hook scripts does not allow us to
+    handle errors.
+2. Multi-headed ganesha scenarios were difficult to avoid in this
+ approach.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+Detailed Description
+--------------------
+
+Benefit to GlusterFS
+--------------------
+
+These CLI options are aimed at making the switch between gluster-nfs and
+nfs-ganesha seamless. The approach is to let the end user execute the
+kind of commands that they are already familiar with.
+
+Scope
+-----
+
+### Nature of proposed change
+
+The CLI integration would mean introduction of a number of options that
+are analogous to gluster-nfs. A dummy translator will be introduced on
+the client side for this purpose. Having it as a separate translator
+would provide the necessary modularity and the correct placeholder for
+all nfs-ganesha related functions. When the translator is loaded, all
+the options that are enabled for nfs-ganesha will be listed in that
+(nfs-ganesha) block. This approach will make the user experience with
+nfs-ganesha close to the one users are already familiar with.
+
+### Implications on manageability
+
+All the options related to nfs-ganesha will appear in the volfile once
+the nfs-ganesha translator is enabled.
+
+### Implications on presentation layer
+
+Gluster-nfs should be disabled to export any volume via nfs-ganesha.
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+None
+
+### Modification to GlusterFS metadata
+
+None
+
+### Implications on 'glusterd'
+
+Some code will be added to glusterd to manage nfs-ganesha options.
+
+How To Test
+-----------
+
+Execute CLI commands and check for expected behaviour.
+
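+Since the new CLI is still being designed, a basic check today can only
+use the existing volume-level options listed under "Current status"
+(host and volume names are illustrative):
+
+    # point the volume at the node running nfs-ganesha and export it
+    gluster vol set testvol nfs-ganesha.host 10.70.43.78
+    gluster vol set testvol nfs-ganesha.enable on
+
+    # verify the export and mount it from a client
+    showmount -e 10.70.43.78
+    mount -t nfs -o vers=4 10.70.43.78:/testvol /mnt/testvol
+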
+User Experience
+---------------
+
+User will be introduced to new CLI commands to manage nfs-ganesha
+exports. Most of the commands will be volume level options.
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+In development
+
+Comments and Discussion
+-----------------------
+
+The feature page is not complete as yet. This will be updated regularly.
diff --git a/done/GlusterFS 3.7/Gnotify.md b/done/GlusterFS 3.7/Gnotify.md
new file mode 100644
index 0000000..4f2597c
--- /dev/null
+++ b/done/GlusterFS 3.7/Gnotify.md
@@ -0,0 +1,168 @@
+Feature
+=======
+
+GlusterFS Backup API (a.k.a Gnotify)
+
+1 Summary
+=========
+
+Gnotify is analogous to inotify(7) for the Gluster distributed
+filesystem, used to monitor filesystem events. Currently a similar
+mechanism exists via libgfchangelog (per brick), but that is more of a
+notification + poll based model. This feature makes the notification
+purely callback based and provides an API that resembles inotify's
+blocking read() for events.
+There may be efforts to support filesystem notifications on the client
+at a volume level.
+
+2 Owners
+========
+
+Venky Shankar <vshankar@redhat.com>
+Aravinda V K <avishwan@redhat.com>
+
+3 Current Status
+================
+
+As of now, there exists a "notification + poll" based event consumption
+mechanism (used by Geo-replication). This has vastly improved
+performance (as the filesystem crawl goes away) and has a set of APIs
+that respond to event queries by an application. We call this the
+"higher level" API, as the application needs to deal with changelogs
+(user consumable journals), taking care of format, record position, etc.
+
+The proposed change is to make the API simple, elegant and "backup"
+friendly, apart from designing it to be "purely" notify based. Engaging
+the community is a must so as to identify how various backup utilities
+work and prototype APIs accordingly.
+
+4 Detailed Description
+======================
+
+The idea is to have a set of APIs used by applications to retrieve a
+list of changes in the filesystem. As of now, the changes are classified
+into three categories:
+
+- Entry operation
+ - Operations that act on filesystem namespace such as creat(),
+ unlink(), rename(), etc. fall into this category. These
+      operations require the parent inode and the basename as part of
+      the file operation method.
+
+- Data operation
+ - Operations that modify data blocks fall into this category:
+ write(), truncate(), etc.
+
+- Metadata operation
+    - Operations that modify inode metadata such as setattr(), setxattr()
+      [set extended attributes], etc. fall in this category.
+
+Details of the record format and the consumer library (libgfchangelog)
+are explained in this
+[document](https://github.com/gluster/glusterfs/blob/master/doc/features/geo-replication/libgfchangelog.md).
+Only operations that are persisted in the journal can be notified.
+Therefore, operations such as open() and close() are not notified (via
+journal consumption). It's beneficial that notifications for such
+operations be short-circuited directly from the changelog translator to
+libgfchangelog.
+
+For gnotify, we introduce a set of low level APIs. Using the low level
+interface relieves the application of knowing the record format and
+other details such as journal state, let alone periodic polling, which
+could be expensive at times. The low level interface induces a callback
+based programming model (and an inotify()-style blocking read() call)
+with minimal heavy lifting required from the application.
+
+Now we list the API prototypes for the same (NOTE: the prototypes are
+subject to change):
+
+- changelog\_low\_level\_register()
+
+- changelog\_put\_buffer()
+
+It's also necessary to provide an interface to get changes via
+filesystem crawl based on changed time (xtime): beneficial for initial
+crawl when journals are not available or after a stop/start.
+
+5 Benefit to GlusterFS
+======================
+
+Integrating backup applications with GlusterFS to incrementally back up
+the filesystem is a powerful functionality. Having notifications
+delivered to \*each\* client adds to the usefulness of this feature.
+Apart from the backup perspective, journals can be used by utilities
+such as the self-heal daemon and Geo-replication (which already uses the
+high level API).
+
+6 Scope
+=======
+
+6.1. Nature of proposed change
+------------------------------
+
+Changes to the changelog translator and consumer library (plus
+integration of parallel filesystem crawl and exposing a API)
+
+6.2. Implications on manageability
+----------------------------------
+
+None
+
+6.3. Implications on presentation layer
+---------------------------------------
+
+None
+
+6.4. Implications on persistence layer
+--------------------------------------
+
+None
+
+6.5. Implications on 'GlusterFS' backend
+----------------------------------------
+
+None
+
+6.6. Modification to GlusterFS metadata
+---------------------------------------
+
+Introduction of the 'xtime' extended attribute. This is nothing new, as
+it's already maintained by the marker translator. Now, with the 'xsync'
+crawl integrated into libgfchangelog, 'xtime' would additionally be
+maintained by the library.
+
+6.7. Implications on 'glusterd'
+-------------------------------
+
+None
+
+7 How To Test
+=============
+
+Test backup scripts integrated with the API, or use the shipped 'gfind'
+tool as an example.
+
+8 User Experience
+=================
+
+Easy to use backup friendly API, well integrated with GlusterFS
+ecosystem. Does away with polling or expensive duplication of filesystem
+crawl code.
+
+9 Dependencies
+==============
+
+None
+
+10 Documentation
+================
+
+TBD
+
+11 Status
+=========
+
+Design/Development in progress
+
+12 Comments and Discussion
+==========================
+
+More than welcome :-)
diff --git a/done/GlusterFS 3.7/HA for Ganesha.md b/done/GlusterFS 3.7/HA for Ganesha.md
new file mode 100644
index 0000000..fbd3192
--- /dev/null
+++ b/done/GlusterFS 3.7/HA for Ganesha.md
@@ -0,0 +1,156 @@
+Feature
+-------
+
+HA support for NFS-ganesha.
+
+Summary
+-------
+
+Automated resource monitoring and fail-over of the ganesha.nfsd in a
+cluster of GlusterFS and NFS-Ganesha servers.
+
+Owners
+------
+
+Kaleb Keithley
+
+Current status
+--------------
+
+Implementation is in progress.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+- [Gluster CLI for
+ Ganesha](Features/Gluster_CLI_for_ganesha "wikilink")
+- [Upcall Infrastructure](Features/Upcall-infrastructure "wikilink")
+
+Detailed Description
+--------------------
+
+The implementation uses the Corosync and Pacemaker HA solution. The
+implementation consists of three parts:
+
+1. a script for setup and teardown of the clustering,
+2. three new Pacemaker resource agent files, and
+3. use of the existing IPaddr and Dummy Pacemaker resource agents for
+    handling a floating Virtual IP address (VIP) and putting the
+    ganesha.nfsd into Grace.
+
+The three new resource agents are tentatively named ganesha\_grace,
+ganesha\_mon, and ganesha\_nfsd.
+
+The ganesha\_nfsd resource agent is cloned on all nodes in the cluster.
+Each ganesha\_nfsd resource agent is responsible for mounting and
+unmounting a shared volume used for persistent storage of the state of
+all the ganesha.nfsds in the cluster and starting the ganesha.nfsd
+process on each node.
+
+The ganesha\_mon resource agent is cloned on all nodes in the cluster.
+Each ganesha\_mon resource agent monitors the state of its ganesha.nfsd.
+If the daemon terminates for any reason it initiates the move of its VIP
+to another node in the cluster. A Dummy resource agent is created which
+represents the dead ganesha.nfsd. The ganesha\_grace resource agents use
+this resource to send the correct hostname in the dbus event they send.
+
+The ganesha\_grace resource agent is cloned on all nodes in the cluster.
+Each ganesha\_grace resource agent monitors the states of all
+ganesha.nfsds in the cluster. If any ganesha.nfsd has died, it sends a
+DBUS event to its own ganesha.nfsd to put it into Grace.
+
+IPaddr and Dummy resource agents are created on each node in the
+cluster. Each IPaddr resource agent has a unique name derived from the
+node name (e.g. mynodename-cluster\_ip-1) and manages an associated
+virtual IP address. There is one virtual IP address for each node.
+Initially each IPaddr and its virtual IP address is tied to its
+respective node, and moves to another node when its ganesha.nfsd dies
+for any reason. Each Dummy resource agent has a unique name derived from
+the node name (e.g. mynodename-trigger\_ip-1) and is used to ensure the
+proper order of operations, i.e. move the virtual IP, then send the dbus
+signal.
+
+N.B. Originally fail-back was outside the scope for the Everglades
+release. After a redesign we got fail-back for free. If the ganesha.nfsd
+is restarted on a node its virtual IP will automatically fail back.
+
+Benefit to GlusterFS
+--------------------
+
+GlusterFS is expected to be a common storage medium for NFS-Ganesha
+NFSv4 storage solutions. GlusterFS has its own built-in HA feature.
+NFS-Ganesha will ultimately support pNFS, a cluster-aware version of
+NFSv4, but does not have its own HA functionality. This will allow users
+to deploy HA NFS-Ganesha.
+
+Scope
+-----
+
+TBD
+
+### Nature of proposed change
+
+TBD
+
+### Implications on manageability
+
+Simplifies setup of HA by providing a supported solution with a recipe
+for basic configuration plus an automated setup.
+
+### Implications on presentation layer
+
+None
+
+### Implications on persistence layer
+
+A small shared volume is required. The ganesha\_nfsd resource agent mounts
+and unmounts the volume when it starts and stops.
+
+This volume is used by the ganesha.nfsd to persist things like its lock
+state and is used by another ganesha.nfsd after a fail-over.
+
+### Implications on 'GlusterFS' backend
+
+A small shared volume is required. The ganesha\_nfsd resource agent mounts
+and unmounts the volume when it starts and stops.
+
+This volume must be created before HA setup is attempted.
+
+### Modification to GlusterFS metadata
+
+None
+
+### Implications on 'glusterd'
+
+None
+
+How To Test
+-----------
+
+TBD
+
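+Until a formal test plan exists, a manual failover check could look like
+this, assuming a Pacemaker/Corosync cluster already built by the setup
+script (node names are hypothetical):
+
+    # on node1: kill the NFS server and watch its VIP move away
+    pkill ganesha.nfsd
+
+    # on any node: ganesha_mon/ganesha_grace should react and node1's
+    # VIP should now be hosted by a surviving node
+    pcs status
+    ip -4 addr show    # on the node that took over the VIP
+
+    # restart ganesha.nfsd on node1; its VIP should fail back automatically
+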
+User Experience
+---------------
+
+The user experience is intended to be as seamless and invisible as
+possible. There are a few new CLI commands added that will invoke the
+setup script. The Corosync/Pacemaker setup takes about 15-30 seconds on
+a four node cluster, so there is a short delay between invoking the CLI
+and the cluster being ready.
+
+Dependencies
+------------
+
+GlusterFS CLI and Upcall Infrastructure (see related features).
+
+Documentation
+-------------
+
+In development
+
+Comments and Discussion
+-----------------------
+
+The feature page is not complete as yet. This will be updated regularly.
diff --git a/done/GlusterFS 3.7/Improve Rebalance Performance.md b/done/GlusterFS 3.7/Improve Rebalance Performance.md
new file mode 100644
index 0000000..32a2eec
--- /dev/null
+++ b/done/GlusterFS 3.7/Improve Rebalance Performance.md
@@ -0,0 +1,277 @@
+Feature
+-------
+
+Improve GlusterFS rebalance performance
+
+Summary
+-------
+
+Improve the current rebalance mechanism in GlusterFS by utilizing the
+resources better, to speed up overall rebalance and also to speed up the
+brick removal and addition cases where data needs to be spread faster
+than the current rebalance mechanism does.
+
+Owners
+------
+
+Raghavendra Gowdappa <rgowdapp@redhat.com>
+Shyamsundar Ranganathan <srangana@redhat.com>
+Susant Palai <spalai@redhat.com>
+Venkatesh Somyajulu <vsomyaju@redhat.com>
+
+Current status
+--------------
+
+This section is split into 2 parts, explaining how the current rebalance
+works and what its limitations are.
+
+### Current rebalance mechanism
+
+Currently rebalance works as follows,
+
+A) Each node in the Gluster cluster kicks off a rebalance process for
+one of the following actions:
+
+- layout fixing
+- rebalance data, with space constraints in check
+ - Will rebalance data with file size and disk free availability
+ constraints, and move files that will not cause a brick
+ imbalance in terms of amount of data stored across bricks
+- rebalance data, based on layout precision
+ - Will rebalance data so that the layout is adhered to and hence
+ optimize lookups in the future (find the file where the layout
+ claims it is)
+
+B) Each node's process then uses the following algorithm to proceed:
+
+- 1: Open root of the volume
+- 2: Fix the layout of root
+- 3: Start a crawl on the current directory
+- 4: For each file in the current directory,
+ - 4.1: Determine if file is in the current node (optimize on
+ network reads for file data)
+ - 4.2: If it does, migrate file based on type of rebalance sought
+ - 4.3: End the file crawl once crawl returns no more entries
+- 5: For each directory in the current directory
+ - 5.1: Fix the layout, and iterate on starting the crawl for this
+ directory (goto step 3)
+- 6: End the directory crawl once crawl returns no more entries
+- 7: Cleanup and exit
+
+### Limitations and issues in the current mechanism
+
+The current mechanism spreads the work of rebalance to all nodes in the
+cluster, and also takes into account only files that belong to the node
+on which the rebalance process is running. This spreads the load of
+rebalance well and also optimizes network reads of data, by taking into
+account only files local to the current node.
+
+Where this becomes slow is in the following cases,
+
+1) It rebalances only one file at a time, as it uses the syncop
+infrastructure to start the rebalance of a file by issuing a setxattr
+with the special attribute "distribute.migrate-data" (see the sketch
+after this list), which in turn returns only after its synctask of
+migrating the file completes (synctask: rebalance\_task)
+
+- This reduces the bandwidth consumption of several resources, like
+ disk, CPU and network as we would read and write a single file at a
+ time
+
+2) Rebalance of data is serial between reads and writes of data, i.e.
+for a file a chunk of data is read from disk, written to the network,
+awaiting a response on the write from the remote node, and then
+proceeds with the next read
+
+- This makes read-write dependent on each other, and waiting for one
+ or the other to complete, so we either have the network idle when
+ reads from disk are in progress or vice-versa
+
+- This further makes serial use of resource like the disk or network,
+ reading or writing one block at a time
+
+3) Each rebalance process crawls the entire volume for files to
+migrate, and chooses only files that are local to it
+
+- This crawl could be expensive and as a node deals with files that
+ are local to it, based on the cluster size and number of nodes,
+ quite a proportion of the entries crawled would hence be dropped
+
+4) On a remove-brick, the current rebalance ends up rebalancing the
+entire cluster. If the interest is only in removing or replacing the
+brick(s), rebalancing the entire cluster can be costly.
+
+5) On addition of bricks, again the entire cluster is rebalanced. If
+bricks were added because of space constraints, it is sub-optimal to
+rebalance the entire cluster.
+
+6) In cases where AFR is below DHT, all the nodes in AFR participate in
+the rebalance, and end up rebalancing (or attempting to) the same set of
+files. This is racy, and could (maybe) be made better.
+
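+For reference, the per-file trigger described in (1) above can be
+exercised by hand from a client mount. The file name is illustrative and
+the exact xattr key spelling may differ; this only sketches what the
+rebalance process does internally today:
+
+    # ask DHT to migrate a single file to its hashed subvolume; the call
+    # returns only after the file's data has been moved (syncop based)
+    setfattr -n distribute.migrate-data -v force /mnt/vol1/file1
+
+    # the normal cluster-wide entry points remain:
+    gluster volume rebalance vol1 start
+    gluster volume rebalance vol1 status
+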
+Detailed Description
+--------------------
+
+The above limitations can be broken down into separate features to
+improve rebalance performance and to also provide options in rebalance
+when specific use cases like quicker brick removal is sought. The
+following sections detail out these improvements.
+
+### Rebalance multiple files in parallel
+
+Instead of rebalancing file by file (a consequence of using syncops to
+trigger the rebalance of a file's data using setxattr), use the wind
+infrastructure to migrate multiple files at a time. This would end up
+using the disk and
+network infrastructure better and can hence enable faster rebalance of
+data. This would even mean that when one file is blocked on a disk read
+the other parallel stream could be writing data to the network, hence
+starvation of the read-write-read model between disk and network could
+also be alleviated to a point.
+
+### Split reads and writes of files into separate tasks when rebalancing the data
+
+This is to reduce the wait between a disk read or a network write, and
+to ensure both these resources can be kept busy. By rebalancing more
+files in parallel, this improvement may not be needed, as the parallel
+streams would end up in keeping one or the other resource busy with
+better probability. Noting this enhancement down anyway to see if this
+needs consideration post increasing the parallelism of rebalance as
+above.
+
+### Crawl only bricks that belong to the current node
+
+As explained, the current rebalance takes into account only those files
+that belong to the current node. As this is a DHT level operation, we
+can hence choose not to send opendir/readdir calls to subvolumes that do
+not belong to the current node. This would reduce the crawls that are
+performed in rebalance for files at least and help in speeding up the
+entire process.
+
+NOTE: We would still need to evaluate the cost of this crawl vis-a-vis
+the overall rebalance process, to evaluate its benefits
+
+### Rebalance on access
+
+When removing bricks, one of the intentions is to drain the brick of all
+its data and to hence enable removing the brick as soon as possible.
+
+When adding bricks, one of the requirements could be that the cluster is
+reaching its capacity and hence we want to increase the same.
+
+In both the cases rebalancing the entire cluster could/would take time.
+Instead an alternate approach is being proposed, where we do 3 things
+essentially as follows,
+
+- Kick off rebalance to fix layout, and drain a brick of its data,
+ or rebalance files onto a newly added brick
+- On further access of data, if the access is leading to a double
+ lookup or redirection based on layout (due to older bricks data not
+ yet having been rebalanced), start a rebalance of this file in
+ tandem to IO access (call this rebalance on access)
+- Start a slower, or a later, rebalance of the cluster, once the
+ intended use case is met, i.e draining a brick of its data or
+ creating space in other bricks and filling the newly added brick
+ with relevant data. This is to get the cluster balanced again,
+ without requiring data to be accessed.
+
+### Make rebalance aware of IO path requirements
+
+One of the problems with improving resource consumption on a node by the
+rebalance process is that we could starve the IO path. So, further
+to some of the above enhancements, take into account IO path resource
+utilization (i.e disk/network/CPU) and slow down or speed up the
+rebalance process appropriately (say by increasing or decreasing the
+number of files that are rebalanced in parallel).
+
+NOTE: This requirement is being noted down, just to ensure we do not
+make the IO access to the cluster slow as rebalance is being made
+faster, resources to monitor and tune rebalance may differ when tested
+and experimented upon
+
+### Further considerations
+
+- We could consider some further layout optimization to reduce the
+ amount of data that is being rebalanced
+- Addition of scheduled rebalance, or the ability to stop and
+ continue rebalance from a point, could be useful for preventing IO
+ path slowness in cases where an admin could choose to run rebalance
+ on non-critical hours (do these even exist today?)
+- There are no performance xlators in the rebalance graph. We should
+ try experiments loading them.
+
+Benefit to GlusterFS
+--------------------
+
+Gluster is a grow-as-you-need distributed file system. With this in the
+picture, rebalance is key to growing the cluster in a relatively sane
+amount of time. This enhancement attempts to speed up rebalance in order
+to better serve this use case.
+
+Scope
+-----
+
+### Nature of proposed change
+
+This is intended as a modification to existing code only, there are no
+new xlators being introduced. BUT, as things evolve and we consider say,
+layout optimization based on live data or some such notions, we would
+need to extend this section to capture the proposed changes.
+
+### Implications on manageability
+
+The gluster command would need some extensions, for example the number
+of files to process in parallel, as we introduce these changes. As it is
+currently in the prototype phase, keeping this and the sections below as
+TBDs
+
+**Document TBD from here on...**
+
+### Implications on presentation layer
+
+*NFS/SAMBA/UFO/FUSE/libglusterfsclient Integration*
+
+### Implications on persistence layer
+
+*LVM, XFS, RHEL ...*
+
+### Implications on 'GlusterFS' backend
+
+*brick's data format, layout changes*
+
+### Modification to GlusterFS metadata
+
+*extended attributes used, internal hidden files to keep the metadata...*
+
+### Implications on 'glusterd'
+
+*persistent store, configuration changes, brick-op...*
+
+How To Test
+-----------
+
+*Description on Testing the feature*
+
+User Experience
+---------------
+
+*Changes in CLI, effect on User experience...*
+
+Dependencies
+------------
+
+*Dependencies, if any*
+
+Documentation
+-------------
+
+*Documentation for the feature*
+
+Status
+------
+
+Design/Prototype in progress
+
+Comments and Discussion
+-----------------------
+
+*Follow here*
diff --git a/done/GlusterFS 3.7/Object Count.md b/done/GlusterFS 3.7/Object Count.md
new file mode 100644
index 0000000..5c7c014
--- /dev/null
+++ b/done/GlusterFS 3.7/Object Count.md
@@ -0,0 +1,113 @@
+Feature
+-------
+
+Object Count
+
+Summary
+-------
+
+An efficient mechanism to retrieve the number of objects per directory
+or volume.
+
+Owners
+------
+
+Vijaikumar M <vmallika@redhat.com>
+Sachin Pandit <spandit@redhat.com>
+
+Current status
+--------------
+
+Currently, the only way to retrieve the number of files/objects in a
+directory or volume is to do a crawl of the entire directory/volume.
+This is expensive and is not scalable.
+
+The proposed mechanism will provide an easier alternative to determine
+the count of files/objects in a directory or volume.
+
+Detailed Description
+--------------------
+
+The new mechanism proposes to store the count of objects/files as part of an
+extended attribute of a directory. Each directory's extended attribute
+value will indicate the number of files/objects present in a tree with
+the directory being considered as the root of the tree.
+
+The count value can be accessed by performing a getxattr(). Cluster
+translators like afr, dht and stripe will perform aggregation of count
+values from various bricks when getxattr() happens on the key associated
+with file/object count.
+
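+A sketch of how the count might be consumed, assuming the feature is
+enabled through a volume option; both the option name and the xattr key
+below are placeholders, since the final names are yet to be decided:
+
+    # hypothetical volume option to turn accounting on
+    gluster volume set vol1 features.object-count on
+
+    # query the aggregated count for a directory (or the volume root)
+    # via getxattr from a client mount
+    getfattr -n glusterfs.object.count /mnt/vol1/somedir
+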
+Benefit to GlusterFS
+--------------------
+
+- Easy to query number of objects present in a volume.
+- Can serve as an accounting mechanism for quota enforcement based on
+ number of inodes.
+- This interface will be useful for integration with OpenStack Swift
+ and Ceilometer.
+
+Scope
+-----
+
+### Nature of proposed change
+
+- Marker translator to be modified to perform accounting on all
+ create/delete operations.
+
+- A new volume option to enable/disable this feature.
+
+### Implications on manageability
+
+- A new volume option to enable/disable this feature.
+- A new CLI interface to display this count at either a volume or
+ directory level.
+
+### Implications on presentation layer
+
+None
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+None
+
+### Modification to GlusterFS metadata
+
+A new extended attribute for storing count of objects at each directory
+level.
+
+### Implications on 'glusterd'
+
+TBD
+
+How To Test
+-----------
+
+TBD
+
+User Experience
+---------------
+
+TBD
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+TBD
+
+Status
+------
+
+Design Ready
+
+Comments and Discussion
+-----------------------
diff --git a/done/GlusterFS 3.7/Policy based Split-brain Resolution.md b/done/GlusterFS 3.7/Policy based Split-brain Resolution.md
new file mode 100644
index 0000000..f7a6870
--- /dev/null
+++ b/done/GlusterFS 3.7/Policy based Split-brain Resolution.md
@@ -0,0 +1,128 @@
+Feature
+-------
+
+This feature provides a way of resolving split-brains based on policies
+from the gluster CLI.
+
+Summary
+-------
+
+This feature provides a way of resolving split-brains based on policies.
+The goal is to provide different commands to resolve split-brains using
+policies like 'choose a specific brick as the source' or 'choose the
+bigger file as the source'.
+
+Owners
+------
+
+Ravishankar N
+Pranith Kumar Karampuri
+
+Current status
+--------------
+
+Feature completed.
+
+Detailed Description
+--------------------
+
+Till now, if there is a split-brain, manual intervention is required to
+resolve it. But most of the time, either the files from a particular
+brick are chosen as the source, or the file with the bigger size is
+chosen as the source. This feature provides CLIs that can be used to
+resolve the split-brains present in the system at that moment using
+these policies.
+
+Benefit to GlusterFS
+--------------------
+
+It improves the manageability of resolving split-brains.
+
+Scope
+-----
+
+### Nature of proposed change
+
+#### Added new gluster CLIs:
+
+1. ```gluster volume heal <VOLNAME> split-brain bigger-file <FILE>```
+
+Locates the replica containing the FILE, selects the bigger file as
+source and completes the heal.
+
+2. ```gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>```
+
+Selects ```<FILE>``` present in ```<HOSTNAME:BRICKNAME>``` as source and completes
+the heal.
+
+3. ```gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME>```
+
+Selects **all** split-brained files in ```<HOSTNAME:BRICKNAME>``` as source
+and completes the heal.
+
+Note: ```<FILE>``` can be either the full file name as seen from the root of
+the volume, or the gfid-string representation of the file, which
+sometimes gets displayed in the heal info command's output.
+
+### Implications on manageability
+
+New CLIs are added to improve the manageability of files in split-brain.
+
+### Implications on presentation layer
+
+None
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+None
+
+### Modification to GlusterFS metadata
+
+None
+
+### Implications on 'glusterd'
+
+None
+
+How To Test
+-----------
+
+Create files in data and metadata split-brain. Accessing the files from
+clients gives EIO. Use the CLI commands to pick the source file and
+trigger the heal. After the CLI returns success, the files should be
+identical on the replica bricks and must be accessible again by the
+clients.
+
+User Experience
+---------------
+
+New CLIs are introduced.
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+TODO: Add an md file in glusterfs/doc.
+
+Status
+------
+
+Feature completed. Main and dependency patches:
+
+<http://review.gluster.org/9377>
+<http://review.gluster.org/9375>
+<http://review.gluster.org/9376>
+<http://review.gluster.org/9439>
+
+Comments and Discussion
+-----------------------
+
+---
diff --git a/done/GlusterFS 3.7/SE Linux Integration.md b/done/GlusterFS 3.7/SE Linux Integration.md
new file mode 100644
index 0000000..8d282e6
--- /dev/null
+++ b/done/GlusterFS 3.7/SE Linux Integration.md
@@ -0,0 +1,4 @@
+SELinux Integration
+-------------------
+
+The work here is really to get SELinux to work with gluster (saving labels on gluster inodes etc.), and most of the work is outside Gluster. There is not much coding involved on the gluster side; the effort is to push the SELinux project for the right policies and the kernel code changes needed to deal with FUSE based filesystems. In the process we might discover issues on the gluster side (not sure what they are) - and I would like to fix those not-yet-known problems before 3.5. \ No newline at end of file
diff --git a/done/GlusterFS 3.7/Scheduling of Snapshot.md b/done/GlusterFS 3.7/Scheduling of Snapshot.md
new file mode 100644
index 0000000..0b2b49c
--- /dev/null
+++ b/done/GlusterFS 3.7/Scheduling of Snapshot.md
@@ -0,0 +1,229 @@
+Feature
+-------
+
+Scheduling Of Snapshot
+
+Summary
+-------
+
+GlusterFS volume snapshot provides point-in-time copy of a GlusterFS
+volume. Currently, GlusterFS volume snapshots can be easily scheduled by
+setting up cron jobs on one of the nodes in the GlusterFS trusted
+storage pool. This has a single point of failure (SPOF), as scheduled
+jobs can be missed if the node running the cron jobs dies.
+
+We can avoid the SPOF by distributing the cron jobs to all nodes of the
+trusted storage pool.
+
+Owner(s)
+--------
+
+Avra Sengupta <asengupt@redhat.com>
+
+Copyright
+---------
+
+Copyright (c) 2015 Red Hat, Inc. <http://www.redhat.com>
+
+This feature is licensed under your choice of the GNU Lesser General
+Public License, version 3 or any later version (LGPLv3 or later), or the
+GNU General Public License, version 2 (GPLv2), in all cases as published
+by the Free Software Foundation.
+
+Detailed Description
+--------------------
+
+The solution to the above problem involves the use of:
+
+- A shared storage - A gluster volume by the name of
+ "gluster\_shared\_storage" is used as a shared storage across nodes
+ to co-ordinate the scheduling operations. This shared storage is
+ mounted at /var/run/gluster/shared\_storage on all the nodes.
+
+- An agent - This agent will perform the actual snapshot commands,
+ instead of cron. It will contain the logic to perform coordinated
+ snapshots.
+
+- A helper script - This script will allow the user to initialise the
+ scheduler on the local node, enable/disable scheduling,
+ add/edit/list/delete snapshot schedules.
+
+- cronie - The default cron daemon shipped with RHEL. It invokes the
+  agent at the intervals specified by the user in the schedule, to
+  perform the snapshot operation on the volume named in the schedule.
+
+Initial Setup
+-------------
+
+The administrator needs to create a shared storage that can be available
+to nodes across the cluster. A GlusterFS volume by the name of
+"gluster\_shared\_storage" should be created for this purpose. It is
+preferable that the \*shared volume\* be a replicate volume to avoid
+SPOF.
+
+Once the shared storage is created, it should be mounted on all nodes in
+the trusted storage pool which will be participating in the scheduling.
+The location where the shared\_storage should be mounted
+(/var/run/gluster/shared\_storage) in these nodes is fixed and is not
+configurable. Each node participating in the scheduling then needs to
+perform an initialisation of the snapshot scheduler by invoking the
+following:
+
+snap\_scheduler.py init
+
+NOTE: This command needs to be run on all the nodes participating in the
+scheduling
+
+Helper Script
+-------------
+
+The helper script (snap\_scheduler.py) will initialise the scheduler on
+the local node, enable/disable scheduling, add/edit/list/delete snapshot
+schedules.
+
+a) snap\_scheduler.py init
+
+This command initialises the snap\_scheduler and interfaces it with the
+crond running on the local node. This is the first step, before
+executing any scheduling related commands from a node.
+
+NOTE: The helper script needs to be run with this option on all the
+nodes participating in the scheduling. Other options of the helper
+script can be run independently from any node, where initialisation has
+been successfully completed.
+
+b) snap\_scheduler.py enable
+
+The snap scheduler is disabled by default after initialisation. This
+command enables the snap scheduler.
+
+c) snap\_scheduler.py disable
+
+This command disables the snap scheduler.
+
+d) snap\_scheduler.py status
+
+This command displays the current status (Enabled/Disabled) of the snap
+scheduler.
+
+e) snap\_scheduler.py add "Job Name" "Schedule" "Volume Name"
+
+This command adds a new snapshot schedule. All the arguments must be
+provided within double-quotes (""). It takes three arguments:
+
+-\> Job Name: This name uniquely identifies this particular schedule,
+and can be used to reference this schedule for future events like
+edit/delete. If a schedule already exists for the specified Job Name,
+the add command will fail.
+
+-\> Schedule: The schedules are accepted in the format crond
+understands:
+
+    # Example of job definition:
+    # .---------------- minute (0 - 59)
+    # |  .------------- hour (0 - 23)
+    # |  |  .---------- day of month (1 - 31)
+    # |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
+    # |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
+    # |  |  |  |  |
+    # *  *  *  *  *  user-name  command to be executed
+
+Although we accept all valid cron schedules, the finest granularity of
+snapshot schedules we currently support is half-hourly snapshots.
+
+-\> Volume Name: The name of the volume on which the scheduled snapshot
+operation will be performed.
+
+f) snap\_scheduler.py edit "Job Name" "Schedule" "Volume Name"
+
+This command edits an existing snapshot schedule. It takes the same
+three arguments that the add option takes. All the arguments must be
+provided within double-quotes (""). If a schedule does not exist for the
+specified Job Name, the edit command will fail.
+
+g) snap\_scheduler.py delete "Job Name"
+
+This command deletes an existing snapshot schedule. It takes the job
+name of the schedule as its argument. The argument must be provided
+within double-quotes (""). If a schedule does not exist for the
+specified Job Name, the delete command will fail.
+
+h) snap\_scheduler.py list
+
+This command lists the existing snapshot schedules in the following
+manner:
+
+    # snap_scheduler.py list
+    JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
+    --------------------------------------------------------------------
+    Job0             * * * * *        Snapshot Create  test_vol
+
+The agent
+---------
+
+The snapshots scheduled with the help of the helper script are read by
+crond, which then invokes the agent (gcron.py) at the scheduled
+intervals to perform the snapshot operations on the specified volumes.
+The agent performs the scheduled snapshots using the following
+algorithm to coordinate across nodes.
+
+Pseudocode:
+
+    start_time = get current time
+    lock_file  = job name passed as an argument
+    vol_name   = volume name passed as an argument
+    try POSIX locking the $lock_file
+        if lock is obtained, then
+            mod_time = get modification time of $lock_file
+            if $mod_time < $start_time, then
+                take snapshot of $vol_name
+                if snapshot failed, then
+                    log the failure
+                update modification time of $lock_file to current time
+            unlock the $lock_file
+
+The coordination with the scripts running on other nodes is handled by
+the use of POSIX locks. All instances of the script attempt to lock the
+lock\_file, which is essentially an empty file named after the job, and
+the one that gets the lock takes the snapshot.
+
+To prevent redoing a completed task, the script makes use of the mtime
+attribute of the lock\_file. At the beginning of execution, the script
+saves its start time. Once the script obtains the lock on the
+lock\_file, before taking the snapshot, it compares the mtime of the
+lock\_file with the start time. The snapshot is only taken if the mtime
+is smaller than the start time. Once the snapshot command completes,
+the script updates the mtime of the lock\_file to the current time
+before unlocking.
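+
+Below is a minimal C sketch of this coordination step, assuming a
+blocking whole-file POSIX lock on the per-job lock file; the real agent
+is the Python script gcron.py, and the snapshot name used here is
+purely illustrative.
+
+    /* Sketch of the agent's coordination step (illustrative only).
+     * One instance runs on every node at the scheduled time; the mtime
+     * of the lock file records that this interval's snapshot is done. */
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <sys/stat.h>
+    #include <sys/time.h>
+    #include <time.h>
+    #include <unistd.h>
+
+    int take_scheduled_snapshot(const char *lock_file, const char *vol_name)
+    {
+        time_t start_time = time(NULL);
+        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
+        struct stat st;
+        char cmd[256];
+        int fd = open(lock_file, O_RDWR);
+
+        if (fd < 0)
+            return -1;
+
+        /* every node blocks here; the POSIX lock serialises them */
+        if (fcntl(fd, F_SETLKW, &fl) == -1) {
+            close(fd);
+            return -1;
+        }
+
+        if (fstat(fd, &st) == 0 && st.st_mtime < start_time) {
+            /* nobody has taken this interval's snapshot yet */
+            snprintf(cmd, sizeof(cmd),
+                     "gluster snapshot create %s_scheduled %s",  /* name is illustrative */
+                     vol_name, vol_name);
+            if (system(cmd) != 0)
+                fprintf(stderr, "snapshot of %s failed\n", vol_name);  /* log and move on */
+            futimes(fd, NULL);  /* mark the job done for this interval */
+        }
+
+        fl.l_type = F_UNLCK;
+        fcntl(fd, F_SETLK, &fl);
+        close(fd);
+        return 0;
+    }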
+
+If a snapshot command fails, the script will log the failure (in syslog)
+and continue with its operation. It will not attempt to retry the failed
+snapshot in the current schedule, but will attempt it again at the next
+scheduled run. It is left to the administrator to monitor the logs and
+decide what to do after a failure.
+
+Assumptions and Limitations
+---------------------------
+
+It is assumed that all nodes in the trusted storage pool have their
+times synced using NTP or any other mechanism. This is a hard
+requirement for this feature to work.
+
+The administrator needs to have python2.7 or higher, as well as the
+argparse module, installed to be able to use the helper script
+(snap\_scheduler.py).
+
+There is a latency of one minute between providing a command through
+the helper script and that command taking effect. Hence, we currently
+do not support snapshot schedules with per-minute granularity.
+
+The administrator can, however, leverage the scheduler to schedule
+snapshots at half-hourly/hourly/daily/weekly/monthly/yearly periodic
+intervals. They can also create customised schedules specifying the
+minute of the hour, the day of the week, the week of the month, and the
+month of the year at which the snapshot operation should run.
diff --git a/done/GlusterFS 3.7/Sharding xlator.md b/done/GlusterFS 3.7/Sharding xlator.md
new file mode 100644
index 0000000..b33d698
--- /dev/null
+++ b/done/GlusterFS 3.7/Sharding xlator.md
@@ -0,0 +1,129 @@
+Goal
+----
+
+Better support for striping.
+
+Summary
+-------
+
+The current stripe translator, below DHT, requires that bricks be added
+in a multiple of the stripe count times the replica/erasure count. It
+also means that failures or performance anomalies in one brick
+disproportionately affect one set of striped files (a fraction equal to
+stripe count divided by total bricks) while the rest remain unaffected.
+By moving above DHT, we can avoid both the configuration limit and the
+performance asymmetry.
+
+Owners
+------
+
+Vijay Bellur <vbellur@redhat.com>
+Jeff Darcy <jdarcy@redhat.com>
+Pranith Kumar Karampuri <pkarampu@redhat.com>
+Krutika Dhananjay <kdhananj@redhat.com>
+
+Current status
+--------------
+
+Proposed, waiting until summit for approval.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+None.
+
+Detailed Description
+--------------------
+
+The new sharding translator sits above DHT, creating "shard files" that
+DHT is then responsible for distributing. The shard translator is thus
+oblivious to the topology under DHT, even when that changes (or for that
+matter when the implementation of DHT changes). Because the shard files
+will each be hashed and placed separately by DHT, we'll also be using
+more combinations of DHT subvolumes and the effect of any imbalance
+there will be distributed more evenly.
+
+Benefit to GlusterFS
+--------------------
+
+More configuration flexibility and resilience to failures.
+
+Data transformations such as compression or de-duplication would benefit
+from sharding because portions of the file may be processed rather than
+exclusively at whole-file granularity. For example, to read a small
+extent from the middle of a compressed large file, only the shards
+overlapping the extent would need to be decompressed. Sharding could
+mean the "chunking" step is not needed at the dedupe level. For example,
+if a small portion of a de-duplicated file was modified, only the shard
+that changed would need to be reverted to an original non-deduped state.
+The untouched shards could continue as deduped and their savings
+maintained.
+
+The cache tiering feature would benefit from sharding. Currently large
+files must be migrated in full between tiers, even if only a small
+portion of the file is accessed. With sharding, only the shard accessed
+would need to be migrated.
+
+Scope
+-----
+
+### Nature of proposed change
+
+Most of the existing stripe translator remains applicable, except that
+it needs to be adapted to its new location above DHT instead of below.
+In particular, it needs to generate unique shard-file names and pass
+them all down to the same (DHT) subvolume, instead of using the same
+name across multiple (AFR/client) subvolumes.
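+
+As a rough illustration of the name generation involved, the sketch
+below maps a byte offset to a shard index and a shard-file name. The
+shard size and the "<gfid>.<index>" naming are assumptions made for
+illustration, not the final design.
+
+    /* Illustrative only: the block size and naming scheme below are
+     * assumptions, not the final design. */
+    #include <inttypes.h>
+    #include <stdint.h>
+    #include <stdio.h>
+
+    #define SHARD_BLOCK_SIZE (4ULL * 1024 * 1024)   /* assumed shard size: 4 MB */
+
+    static void shard_name_for_offset(const char *gfid_str, uint64_t offset,
+                                      char *name, size_t len)
+    {
+        uint64_t index = offset / SHARD_BLOCK_SIZE;  /* which shard holds this byte */
+
+        if (index == 0)
+            /* the first block could simply live in the base file */
+            snprintf(name, len, "%s", gfid_str);
+        else
+            /* later blocks become separate shard files, each hashed and
+             * placed independently by DHT */
+            snprintf(name, len, "%s.%" PRIu64, gfid_str, index);
+    }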
+
+### Implications on manageability
+
+None, except perhaps the name change ("shard" vs. "stripe").
+
+### Implications on presentation layer
+
+None.
+
+### Implications on persistence layer
+
+None.
+
+### Implications on 'GlusterFS' backend
+
+None.
+
+### Modification to GlusterFS metadata
+
+Possibly some minor xattr changes.
+
+### Implications on 'glusterd'
+
+None.
+
+How To Test
+-----------
+
+Current stripe tests should still be applicable. More should be written,
+since it's a little-used feature and not many tests exist currently.
+
+User Experience
+---------------
+
+None, except the name change.
+
+Dependencies
+------------
+
+None.
+
+Documentation
+-------------
+
+TBD, probably minor.
+
+Status
+------
+
+Work In Progress
+
+Comments and Discussion
+-----------------------
diff --git a/done/GlusterFS 3.7/Small File Performance.md b/done/GlusterFS 3.7/Small File Performance.md
new file mode 100644
index 0000000..3b868a6
--- /dev/null
+++ b/done/GlusterFS 3.7/Small File Performance.md
@@ -0,0 +1,433 @@
+Feature
+-------
+
+Small-file performance
+
+Summary
+-------
+
+This page describes a menu of optimizations that together can improve
+small-file performance, along with the cases where each optimization
+matters, the degree of improvement expected, and the degree of
+difficulty.
+
+Owners
+------
+
+Shyamsundar Ranganathan <srangana@redhat.com>
+
+Ben England <bengland@redhat.com>
+
+Current status
+--------------
+
+Some of these optimizations are proposed patches upstream, some are also
+features being planned, such as Darcy's Gluster V4 DHT and NSR changes,
+and some are not specified yet at all. Where they already exist in some
+form, links are provided.
+
+Some previous optimizations have been included already in the Gluster
+code base, such as quick-read and open-behind translators. While these
+were useful and do improve throughput, they do not solve the general
+problem.
+
+Detailed Description
+--------------------
+
+What is a small file? While this term seems ambiguous, it really is just
+a file where the metadata access time far exceeds the data access time.
+Another term used for this is "metadata-intensive workload". To be
+clear, it is possible to have a metadata-intensive workload running
+against large files, if it is not the file data that is being accessed
+(example: "ls -l", "rm"). But what we are really concerned with is
+throughput and response time of common operations on files where the
+data is being accessed but metadata access time is severely restricting
+throughput.
+
+Why do we have a performance problem? We would expect that Gluster
+small-file performance would be within some reasonable percentage of the
+bottleneck determined by network performance and storage performance,
+and that a user would be happy to pay a performance "tax" in order to
+achieve scalability and high-availability that Gluster offers, as well
+as a wealth of functionality. However, repeatedly we see cases where
+Gluster small-file perf is an order of magnitude off of these
+bottlenecks, indicating that there are flaws in the software. This
+interferes with the most common tasks that a system admin or user has to
+perform, such as copying files into or out of Gluster, migrating or
+rebalancing data, and self-heal.
+
+So why do we care? Many of us anticipated that many Gluster workloads
+would have increasingly large files; however, we are continuing to
+observe that Gluster workloads such as "unstructured data" are
+surprisingly metadata-intensive. As compute and storage power increase
+exponentially, we would expect that the average size of a storage object
+would also increase, but in fact it hasn't -- in several common cases we
+have files as small as 100 KB average size, or even 7 KB average size in
+one case. We can tell customers to rewrite their applications, or we can
+improve Gluster to be adequate for their needs, even if it isn't the
+design center for Gluster.
+
+The single-threadedness of many customer applications (examples include
+common Linux utilities such as rsync and tar) amplifies this problem,
+converting what was a throughput problem into a *latency* problem.
+
+Benefit to GlusterFS
+--------------------
+
+Improvement of small-file performance will remove a barrier to
+widespread adoption of this filesystem for mainstream use.
+
+Scope
+-----
+
+Although the scope of the individual changes is limited, the overall
+scope is very wide. Some changes can be done incrementally, and some
+cannot. That is why changes are presented as a menu rather than an
+all-or-nothing proposal.
+
+We know that scope of DHT+NSR V4 is large and changes will be discussed
+elsewhere, so we won't cover that here.
+
+##### multi-thread-epoll
+
+*Status*: DONE in glusterfs-3.6! [ <http://review.gluster.org/#/c/3842/>
+based on Anand Avati's patch ]
+
+*Why*: remove single-thread-per-brick barrier to higher CPU utilization
+by servers
+
+*Use case*: multi-client and multi-thread applications
+
+*Improvement*: measured 40% with 2 epoll threads and 100% with 4 epoll
+threads for small file creates to an SSD
+
+*Disadvantage*: might expose some race conditions in Gluster that might
+otherwise happen far less frequently, because receive message processing
+will be less sequential. These need to be fixed anyway.
+
+**Note**: this enhancement also helps high-IOPS applications such as
+databases and virtualization which are not metadata-intensive. This has
+been measured already using a Fusion I/O SSD performing random reads and
+writes -- it was necessary to define multiple bricks per SSD device to
+get Gluster to the same order of magnitude IOPS as a local filesystem.
+But this workaround is problematic for users, because storage space is
+not properly measured when there are multiple bricks on the same
+filesystem.
+
+##### remove io-threads translator
+
+*Status*: no patch yet, hopefully can be tested with volfile edit
+
+*Why*: don't need io-threads xlator now. Anand Avati suggested this
+optimization was possible. io-threads translator was created to allow a
+single "epoll" thread to launch multiple concurrent disk I/O requests,
+and this made sense back in the era of 1-GbE networking and rotational
+storage. However, thread context switching is getting more and more
+expensive as CPUs get faster. For example, switching between threads on
+different NUMA nodes is very costly. Switching to a powered-down core is
+also expensive. And context switch makes the CPUs forget whatever they
+learned about the application's memory and instructions. So this
+optimization could be vital as we try to make Gluster competitive in
+performance.
+
+*Use case*: lower latency for latency-sensitive workloads such as
+single-thread or single-client loads, and also improve efficiency of
+glusterfsd process.
+
+*Improvement*: no data yet
+
+*Disadvantage*: we need to have a much bigger epoll thread pool to keep
+a large set of disks busy. In principle this is no worse than having the
+io-threads pool, is it?
+
+##### glusterfsd stat and xattr cache
+
+Please see feature page
+[Features/stat-xattr-cache](../GlusterFS 4.0/stat-xattr-cache.md)
+
+*Why*: remove most system call latency from small-file read and create
+in brick process (glusterfsd)
+
+*Use case*: single-thread throughput, response time
+
+##### SSD and glusterfs tiering feature
+
+*Status*: [
+<http://www.gluster.org/community/documentation/index.php/Features/data-classification>
+feature page ]
+
+This is Jeff Darcy's proposal for re-using portions of DHT
+infrastructure to do storage tiering and other things. One possible use
+of this data classification feature is SSD caching of hot files, which
+Dan Lambright has begun to implement and demo.
+
+also see [
+<https://www.mail-archive.com/gluster-devel@gluster.org/msg00385.html>
+discussion in gluster-devel ]
+
+*Improvement*: results are particularly dramatic with erasure coding for
+small files, Dan's single-thread demo of 20-KB file reads showed a 100x
+reduction in latency with O\_DIRECT reads.
+
+*Disadvantages*: this will not help and may even slow down workloads
+with a "working set" (set of concurrently active files) much larger than
+the SSD tier, or with a rapidly changing working set that prevents the
+cache from "warming up". At present tiering works at the level of the
+entire file, which means it could be very expensive for some
+applications such as virtualization that do not read the entire file, as
+Ceph found out.
+
+##### migrate .glusterfs to SSD
+
+*Status*: [ <https://forge.gluster.org/gluster-meta-data-on-ssd> Dan
+Lambright's code for moving .glusterfs to SSD ]
+
+Also see [
+<http://blog.gluster.org/2014/03/experiments-using-ssds-with-gluster/> ]
+for background on other attempts to use SSD without changing Gluster to
+be SSD-aware.
+
+*Why*: lower latency of metadata access on disk
+
+*Improvement*: a small smoke test showed a 10x improvement for
+single-thread create; it is expected that this will help small-file
+workloads that are not cache-friendly.
+
+*Disadvantages*: This will not help large-file workloads. It will not
+help workloads where the Linux buffer cache is sufficient to get a high
+cache hit rate.
+
+*Costs*: Gluster bricks now have an external dependency on an SSD device
+- what if it fails?
+
+##### tiering at block device level
+
+*Status*: transparent to GlusterFS core. We mention it here because it
+is a design alternative to preceding item (.glusterfs in SSD).
+
+This option includes use of Linux features like dm-cache (Device Mapper
+caching module) to accelerate reads and writes to Gluster "brick"
+filesystems. Early experimentation with firmware-resident SSD caching
+algorithms suggests that this isn't as maintainable and flexible as a
+software-defined implementation, but these too are transparent to
+GlusterFS.
+
+*Use Case*: can provide acceleration for data ingest, as well as for
+cache-friendly read-intensive workloads where the total size of the hot
+data subset fits within SSD.
+
+*Improvement*: For create-intensive workloads, normal writeback caching
+in RAID controllers does provide some of the same benefits at lower
+cost. For very small files, read acceleration can be as much as 10x if
+SSD cache hits are obtained (and if the total size of hot files does NOT
+fit in buffer cache). BTW, this approach can also have as much as a 30x
+improvement in random read and write performance under these conditions.
+It could also provide lower response times for the Device Mapper thin-p
+metadata device.
+
+**NOTE**: we have to change our workload generation to use a non-uniform
+file access distribution, preferably with a *moving* mean, to
+acknowledge that in real-world workloads, not all files are equally
+accessed, and that the "hot" files change over time. Without these two
+workload features, we are not going to observe much benefit from cache
+tiering.
+
+*Disadvantage*: This does not help sequential workloads. It does not
+help workloads where Linux buffer cache can provide cache hits. Because
+this caching is done on the server and not the client, there are limits
+imposed by network round trips on response times that limit the
+improvement.
+
+*Costs*: This adds complexity to the already-complex Gluster brick
+configuration process.
+
+##### cluster.lookup-unhashed auto
+
+*Status*: DONE in glusterfs-3.7! [ <http://review.gluster.org/#/c/7702/>
+Jeff Darcy patch ]
+
+*Why*: When safe, don't look up the path on every brick before every
+file create, in order to make small-file creation scalable with brick
+and server count.
+
+**Note**: With JBOD bricks, we are going to hit this scalability wall a
+lot sooner for small-file creates!!!
+
+*Use case*: small-file creates of any sort with large brick counts
+
+*Improvement*: [
+<https://s3.amazonaws.com/ben.england/small-file-perf-feature-page.pdf>
+graphs ]
+
+*Costs*: Requires monitoring hooks, see below.
+
+*Disadvantage*: if DHT subvolumes are added/removed, how quickly do we
+recover to a state where we don't have to do the paranoid thing and look
+up on every DHT subvolume? As we scale, does DHT subvolume
+addition/removal become a significantly more frequent occurrence?
+
+##### lower RPC calls per file access
+
+Please see
+[Features/composite-operations](../GlusterFS 4.0/composite-operations.md)
+page for details.
+
+*Status*: no proposals exist for this, but NFS compound RPC and SMB ANDX
+are examples, and NSR and DHT for Gluster V4 are necessary for this.
+
+*Why*: reduce round-trip-induced latency between Gluster client and
+servers.
+
+*Use case*: small file creates -- example: [
+<https://bugzilla.redhat.com/show_bug.cgi?id=1086681> bz-1086681 ]
+
+*Improvement*: small-file operations can avoid pessimistic round-trip
+patterns, and small-file creates can potentially avoid round trips
+required because of AFR implementation. For clients with high round-trip
+time to server, this has a dramatic improvement in throughput.
+
+*Costs*: some of these code modifications are very non-trivial.
+
+*Disadvantage*: may not be backward-compatible?
+
+##### object-store API
+
+Some of details are covered in
+[Features/composite-operations](../GlusterFS 4.0/composite-operations.md)
+
+*Status*: Librados in Ceph and Swift in OpenStack are examples. The
+proposal would be to create an API that lets you do equivalent of Swift
+PUT or GET, including opening/creating a file, accessing metadata, and
+transferring data, in a single API call.
+
+*Why*: on creates, allow application to avoid many round trips to server
+to do lookups, create the file, then retrieve the data, then set
+attributes, then close the file. On reads, allow application to get all
+data in a single round trip (like Swift API).
+
+*Use case*: applications which do not have to use POSIX, such as
+OpenStack Swift.
+
+*Improvement*: for clients that have a long network round trip time to
+server, performance improvement could be 5x. Load on the server could be
+greatly reduced due to lower context-switching overhead.
+
+*Disadvantage*: Without preceding reduction in round trips, the changed
+API may not result in much performance gain if any.
+
+##### dentry injection
+
+*Why*: This is not about small files themselves, but applies to
+directories full of many small files. No matter how much we prefetch
+directory entries from the server to the client, directory-listing speed
+will still be limited by context switches from the application to the
+glusterfs client process. One way to ameliorate this would be to
+prefetch entries and *inject* them into FUSE, so that when the
+application asks they'll be available directly from the kernel.
+
+*Status*: Have discussed this with Brian Foster, not aware of subsequent
+attempts/measurements.
+
+*Use case*: All of those applications which insist on listing all files
+in a huge directory, plus users who do so from the command line. We can
+warn people and recommend against this all we like, but "ls" is often
+one of the first things users do on their new file system and it can be
+hard to recover from a bad first impression.
+
+*Improvement*: TBS, probably not much impact until we have optimized
+directory browsing round trips to server as discussed in
+composite-operations.
+
+*Disadvantage*: Some extra effort might be required to deal with
+consistency issues.
+
+### Implications on manageability
+
+lookup-unhashed=auto implies that the system can, by adding/removing DHT
+subvolumes, get itself into a state where it is not safe to do file
+lookup using consistent hashing, until a rebalance has completed. This
+needs to be visible at the management interface so people know why their
+file creates have slowed down and when they will speed up again.
+
+Use of SSDs implies greater complexity and inter-dependency in managing
+the system as a whole (not necessarily Gluster).
+
+### Implications on presentation layer
+
+No change is required for multi-thread epoll, xattr+stat cache, or
+lookup-unhashed=off. If Swift uses libgfapi then the Object-Store API
+proposal affects it. DHT and NSR changes will impact management of
+Gluster but should be transparent to translators farther up the stack,
+perhaps?
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+Massive changes to the on-disk format would be required for DHT and NSR V4.
+
+### Modification to GlusterFS metadata
+
+lookup-unhashed-auto change would require an additional xattr to track
+cases where it's not safe to trust consistent hashing for a directory?
+
+### Implications on 'glusterd'
+
+DHT+NSR V4 require big changes to glusterd, covered elsewhere.
+
+How To Test
+-----------
+
+Small-file performance testing methods are discussed in [Gluster
+performance test
+page](http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Performance%20Testing/)
+
+User Experience
+---------------
+
+We anticipate that user experience will become far more pleasant as the
+system performance matches the user expectations and the hardware
+capacity. Operations like loading data into Gluster and running
+traditional NFS or SMB apps will be completed in a reasonable amount of
+time without heroic effort from sysadmins.
+
+SSDs are becoming an increasingly important form of storage, possibly
+even replacing traditional spindles for some high-IOPS apps in the 2016
+timeframe. Multi-thread-epoll and xattr+stat caching are a requirement
+for Gluster to utilize more CPUs, and utilize them more efficiently, to
+keep up with SSDs.
+
+Dependencies
+------------
+
+None other than above.
+
+Documentation
+-------------
+
+lookup-unhashed-auto behavior and how to monitor it will have to be
+documented.
+
+Status
+------
+
+Design-ready
+
+Comments and Discussion
+-----------------------
+
+This work can be, and should be, done incrementally. However, if we
+order these investments by ratio of effort to perf improvement, it might
+look like this:
+
+- multi-thread-epoll (done)
+- lookup-unhashed-auto (done)
+- remove io-threads translator (from brick)
+- .glusterfs on SSD (prototyped)
+- cache tiering (in development)
+- glusterfsd stat+xattr cache
+- libgfapi Object-Store API
+- DHT in Gluster V4
+- NSR
+- reduction in RPCs/file-access
diff --git a/done/GlusterFS 3.7/Trash.md b/done/GlusterFS 3.7/Trash.md
new file mode 100644
index 0000000..cc03ccd
--- /dev/null
+++ b/done/GlusterFS 3.7/Trash.md
@@ -0,0 +1,182 @@
+Feature
+-------
+
+Trash translator for GlusterFS
+
+Summary
+-------
+
+This feature will enable user to temporarily store deleted files from
+GlusterFS for a specified time period.
+
+Owners
+------
+
+Anoop C S <achiraya@redhat.com>
+
+Jiffin Tony Thottan <jthottan@redhat.com>
+
+Current status
+--------------
+
+In the present scenario, deletion by a user results in permanent removal
+of a file from the storage pool. Translator code for trash exists in the
+codebase, but it is incompatible with the current version. In addition,
+the gluster CLI lacks a volume set option to load the trash translator
+into the volume graph.
+
+Detailed Description
+--------------------
+
+Trash is a desired feature for users who accidentally delete some files
+and may need to get them back in the near future. Currently, the
+GlusterFS codebase includes a translator for trash which is not
+compatible with the current version and so is not usable. The trash
+feature is planned to be implemented as a separate directory in every
+brick inside a volume. This would be achieved by a volume set option
+from the gluster CLI.
+
+A file can only be deleted when all hard links to it have been
+removed. This feature can be extended to operations like truncation,
+where we need to retain the original file.
+
+Benefit to GlusterFS
+--------------------
+
+With the implementation of trash, accidental deletion of files can be
+easily avoided.
+
+Scope
+-----
+
+### Nature of proposed change
+
+Proposed implementation mostly involves modifications to existing code
+for trash translator.
+
+### Implications on manageability
+
+Gluster cli will provide an option for creating trash directories on
+various bricks.
+
+### Implications on presentation layer
+
+None
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+The overall brick structure will include a separate section for trash in
+which regular files will not be stored, i.e. space occupied by the trash
+become unusable.
+
+### Modification to GlusterFS metadata
+
+The original path of files can be stored as an extended attribute.
+
+### Implications on 'glusterd'
+
+An alert can be triggered when trash exceeds a particular size limit.
+Purging of a file from trash depends on its size and age attributes or
+other policies.
+
+### Implications on Rebalancing
+
+Trash can act as an intermediate storage when a file is moved from one
+brick to another during rebalancing of volumes.
+
+### Implications on Self-healing
+
+Self-healing must avoid the chance of re-creating a file which was
+deleted from a brick while one among the other bricks were offline.
+Trash can be used to track the deleted file inside a brick.
+
+### Scope of Recovery
+
+This feature can enhance the restoring of files to previous locations
+through gluster cli with the help of extended attributes residing along
+with the file.
+
+How To Test
+-----------
+
+Functionality of the trash translator can be checked using the file
+operations deletion and truncation, or using gluster internal
+operations like self-heal and rebalance.
+
+Steps :
+
+1.) Create a glusterfs volume.
+
+2.) Start the volume.
+
+3.) Mount the volume.
+
+4.) Check whether the ".trashcan" directory is created on the mount or
+not. By default the trash directory is created when the volume is
+started, but files are not moved to the trash directory on deletion or
+truncation until the trash translator is turned on.
+
+5.) The name of the trash directory is a user-configurable option and
+its default value is ".trashcan". It can be configured only when the
+volume is started. We cannot remove or rename the trash directory from
+the mount (like the .glusterfs directory).
+
+6.) Set features.trash on
+
+7.) Create some files in the mount and perform deletion or truncation on
+those files. Check whether these files are recreated under the trash
+directory with a time stamp appended to the file name. For example,
+
+        [root@rh-host ~]#mount -t glusterfs rh-host:/test /mnt/test
+        [root@rh-host ~]#mkdir /mnt/test/abc
+        [root@rh-host ~]#touch /mnt/test/abc/file
+        [root@rh-host ~]#rm /mnt/test/abc/file
+        remove regular empty file ‘/mnt/test/abc/file’? y
+        [root@rh-host ~]#ls /mnt/test/abc
+        [root@rh-host ~]# 
+        [root@rh-host ~]#ls /mnt/test/.trashcan/abc/
+        file2014-08-21_123400
+
+8.) Check whether files deleted from trash directory are permanently
+removed
+
+9.) Perform internal operations such as rebalance and self-heal on the
+volume. Check whether files are created under the trash directory as a
+result of internal ops. (We can also make the trash translator work
+exclusively for internal operations by setting the option
+features.trash-internal-op on.)
+
+10.) Reconfigure the trash directory name and check whether files are
+retained in the new one.
+
+11.) Check whether other options for the trash translator, such as the
+eliminate pattern and maximum file size, are working or not.
+
+User Experience
+---------------
+
+Users can access files which were deleted accidentally or intentionally
+and can review the original file which was truncated.
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+None
+
+Status
+------
+
+Under review
+
+Comments and Discussion
+-----------------------
+
+Follow here
diff --git a/done/GlusterFS 3.7/Upcall Infrastructure.md b/done/GlusterFS 3.7/Upcall Infrastructure.md
new file mode 100644
index 0000000..47cc8d6
--- /dev/null
+++ b/done/GlusterFS 3.7/Upcall Infrastructure.md
@@ -0,0 +1,747 @@
+Feature
+-------
+
+Framework on the server-side, to handle certain state of the files
+accessed and send notifications to the clients connected.
+
+Summary
+-------
+
+A generic and extensible framework, used to maintain state in the
+glusterfsd process for each of the files accessed (including info about
+the clients performing the fops) and to send notifications to the
+respective glusterfs clients in case of any change in that state.
+
+Few of the use-cases (currently identified) of this infrastructure are:
+
+- Inode Update/Invalidation
+- Recall Delegations/lease locks
+- Maintain Share Reservations/Locks states.
+- Refresh attributes in md-cache
+
+One of the initial consumers of this feature is NFS-ganesha.
+
+Owners
+------
+
+Soumya Koduri <skoduri@redhat.com>
+
+Poornima Gurusiddaiah <pgurusid@redhat.com>
+
+Current status
+--------------
+
+- Currently there is no such infra available in GlusterFS which can
+  notify clients in case of any change in the file state.
+- There is no support for lease and shared locks.
+
+Drawbacks
+---------
+
+- NFS-ganesha cannot serve as Multi-Head and have Active-Active HA
+ support.
+- NFS-ganesha cannot support NFSv4 delegations and Open share
+ reservations.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+<http://www.gluster.org/community/documentation/index.php/Features/Gluster_CLI_for_ganesha>
+
+<http://www.gluster.org/community/documentation/index.php/Features/HA_for_ganesha>
+
+Detailed Description
+--------------------
+
+There are various scenarios which require server processes notify
+certain events/information to the clients connected to it (by means of
+callbacks). Few of such cases are
+
+Cache Invalidation:
+: Each of the GlusterFS clients/applications caches certain state of
+ the files (for example, inode or attributes). In a multi-node
+ environment these caches could lead to data-integrity issues, for a
+ certain time, if there are multiple clients accessing the same file
+ simultaneously.
+
+: To avoid such scenarios, we need the server to notify clients in
+ case of any change in the file state/attributes.
+
+Delegations/Lease-locks:
+: Currently there is no support for lease locks/delegations in
+ GlusterFS. We need an infra to maintain the state of those locks on
+ the server side and send notifications to recall those locks in case
+ of any conflicting access by a different client. This can be
+ achieved by using the Upcalls infra.
+
+Similar to above use-cases, this framework could easily be extended to
+handle any other event notifications required to be sent by server.
+
+### Design Considerations
+
+Upcall notifications are RPC calls sent from Gluster server process to
+the client.
+
+Note: A new rpc procedure has been added to the "GlusterFS Callback"
+program to send notifications. This rpc call support from the gluster
+server to the client was prototyped by Poornima Gurusiddaiah
+(multi-protocol team). We have taken that support and enhanced it to
+suit our requirements.
+
+"clients" referred below are GlusterFS clients. GlusterFS server just
+need to store the details of the clients accessing the file and these
+clients when notified can lookup the corresponding file entry based on
+the gfid, which it need to take action upon and intimate the application
+accordingly.
+
+A new upcall xlator is defined to maintain all the state required for
+upcall notifications. This xlator is below io-threads xlator
+(considering protocol/server xlator is on top). The reason for choosing
+this xlator to be below io-threads is to be able to spawn new threads to
+send upcall notifications, to detect conflicts or to do the cleanup etc.
+
+At present we store all the state related to the file entries accessed
+by the clients in the inode context. Each of these entries has 'gfid'
+as the key value and a list of client entries accessing that file.
+
+For each of the file accessed, we create or update an existing entry and
+append/update the clientinfo accessing that file.
+
+Sample structure of the upcall and client entries are -
+
+    struct _upcall_client_entry_t {
+            struct list_head   client_list;
+            char              *client_uid;   /* unique UID of the client */
+            rpc_transport_t   *trans;        /* RPC transport object of the client */
+            rpcsvc_t          *rpc;          /* RPC structure of the client */
+            time_t             access_time;  /* time last accessed */
+            time_t             recall_time;  /* time recall_deleg sent */
+            deleg_type         deleg;        /* Delegation granted to the client */
+    };
+
+    typedef struct _upcall_client_entry_t upcall_client_entry;
+
+    struct _upcall_entry_t {
+            struct list_head      list;
+            uuid_t                gfid;       /* GFID of the file */
+            upcall_client_entry   client;     /* list of clients */
+            int                   deleg_cnt;  /* no. of delegations granted for this file */
+    };
+
+    typedef struct _upcall_entry_t upcall_entry;
+
+As upcall notifications are rpc calls, the Gluster server needs to store
+client rpc details as well in the upcall xlator. These rpc details are
+passed from the protocol/server xlator to the upcall xlator via the
+"client\_t" structure stored as "frame-\>root-\>client".
+
+Below is a brief overview of how each of the above-defined use-cases is
+handled.
+
+#### Register for callback notifications
+
+We shall provide APIs in gfapi to register and unregister, for receiving
+specific callback events from the server. At present, we support below
+upcall events.
+
+- Cache Invalidation
+- Recall Lease-Lock
+
+#### Cache Invalidation
+
+: Whenever a client sends a fop, after processing it in the callback
+ path, the server will
+
+- get/add upcall entry based on gfid.
+- lookup/add the client entry to that upcall entry based on
+ client\_t-\>client\_uid, with timestamp updated
+- check if there are other clients which have accessed the same file
+ within cache invalidation time (default 60sec and tunable)
+- if present, send notifications to those clients with the attributes
+ info to be invalidated/refreshed on the client side.
+
+: For example, a WRITE fop would result in changes to the size, atime,
+ ctime and mtime attributes (a rough sketch of this callback-path
+ check is shown below).
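+
+The following is an illustrative sketch of that check, reusing the
+structures above and libglusterfs' kernel-style list macros; the helper
+function and the timeout constant are assumed names, not the actual
+implementation:
+
+    /* Illustrative sketch: walk the clients that recently accessed this
+     * gfid and notify everyone except the client that performed the fop.
+     * Assumes libglusterfs headers (list.h etc.) are available. */
+    #define CACHE_INVALIDATION_TIMEOUT 60   /* seconds, tunable (assumed name) */
+
+    static void
+    upcall_cache_invalidate_sketch (upcall_entry *up, const char *this_client_uid,
+                                    uint32_t flags /* which attributes changed */)
+    {
+            upcall_client_entry *client = NULL;
+            time_t               now    = time (NULL);
+
+            list_for_each_entry (client, &up->client.client_list, client_list) {
+                    if (strcmp (client->client_uid, this_client_uid) == 0)
+                            continue;   /* skip the client that made the change */
+
+                    /* only clients that accessed the file within the cache
+                     * invalidation window still care about it */
+                    if ((now - client->access_time) < CACHE_INVALIDATION_TIMEOUT)
+                            /* hypothetical helper wrapping the callback RPC */
+                            send_cache_invalidation_upcall (client->rpc, client->trans,
+                                                            up->gfid, flags);
+            }
+    }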
+
+###### Sequence diagram
+
+
+ ------------- ---------------------- ------------ ----------------------- -------------
+ |NFS-Client(C1)| |NFS-Ganesha server(GC1)| |Brick server| |NFS-ganesha server(GC2)| |NFS-Client(C2)|
+ ------------- ----------------------- ------------ ----------------------- -------------
+ | | | | |
+ | | | | |
+ ' ' ' ' '
+ ' ' ' ' '
+ ' I/O on file1 ' ' ' '
+ '------------------------>' ' ' '
+ ' ' Send fop via rpc request ' ' '
+ ' '-------------------------->' ' '
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' Make an upcall entry of ' '
+ ' ' 'GC1' for 'file1' in ' '
+ ' ' STACK_UNWIND path ' '
+ ' ' Send fop response ' ' '
+ ' '<------------------------- ' ' '
+ ' Response to I/O ' ' ' '
+ '<------------------------' ' ' '
+ ' ' ' ' Request an I/O on 'file1' '
+ ' ' ' '<------------------------------'
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' ' Send rpc request ' '
+ ' ' '<------------------------------' '
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' In STACK_UNWIND CBK path, ' '
+ ' ' add upcall entry 'GC2' for ' '
+ ' ' 'file1' ' '
+ ' ' ' ' '
+ ' ' Send 'CACHE_INVALIDATE' ' ' '
+ ' ' Upcall event ' ' '
+ ' '<--------------------------' ' '
+ ' ' ' Send rpc response ' '
+ ' ' '------------------------------>' '
+ ' ' ' ' Response to I/O '
+ ' ' ' '------------------------------>'
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' ' ' '
+
+Reaper thread
+: In case of cache\_invalidation, the upcall states maintained are
+ considered valid only if the corresponding client's last
+ access\_time hasn't exceeded 60sec (the default at present).
+: To clean up the expired state entries, a new reaper thread will be
+ spawned which will crawl through all the upcall states, detect and
+ clean up the expired entries.
+
+#### delegations/lease-locks
+
+A file lease provides a mechanism whereby the process holding the lease
+(the "lease holder") is notified when a process (the "lease breaker")
+tries to perform a fop with conflicting access on the same file.
+
+NFS Delegation (similar to lease\_locks) is a technique by which the
+server delegates the management of a file to a client which guarantees
+that no client can open the file in a conflicting mode.
+
+The advantage of these locks is that they greatly reduce the
+interactions between the server and the client for delegated files.
+
+This feature now also provides the support to grant or process these
+lease-locks/delegations for the files.
+
+##### API to request lease
+
+: A new API has been introduced in "gfapi" for the applications to
+ request or unlock the lease-locks.
+
+: This API will be an extension to the existing API "glfs\_posix\_lock
+ (int fd, int cmd, struct flock fl)" which is used to request for
+ posix locks, with below extra parameters -
+
+- lktype (byte-range or lease-lock or share-reservation)
+- lkowner (to differentiate between different application clients)
+
+: On receiving lease-lock request, the GlusterFS client uses existing
+ rpc program "GFS3\_OP\_LK" to send lock request to the brick process
+ but with lkflags denoting lease-lock set in the xdata of the
+ request.
+
+##### Lease-lock processing on the server-side
+
+Add Lease
+: On receiving the lock request, the server (in the upcall xlator)
+ first checks the lkflags to determine if it is a lease-lock request.
+ Once it identifies that it is, and provided there are no lease
+ conflicts for that file, it
+
+- fetches the inode\_ctx for that inode entry
+- lookup/add the client entry to that upcall entry based on
+ client\_t-\>client\_uid, with timestamp updated
+- checks whether there are any existing open-fds with conflicting
+ access requests on that file. If yes bail out and do not grant the
+ lease.
+- In addition, the server now also needs to keep track and verify that
+  there aren't any non-fd related fops (like SETATTR) being processed
+  in parallel before granting the lease. This is done by either
+
+<!-- -->
+
+ * not granting a lease irrespective of which client requested those fops, or
+ * providing a mechanism for the applications to set a clientid while doing each fop. The server can then match the client-ids before deciding to grant the lease.
+
+- Update the lease info in the client entry and mark it as lease
+ granted.
+- In case there is already a lease-lock granted to the same client
+  for the same fd, this request will be considered a duplicate and
+  success is returned to the client.
+
+Remove Lease
+: Similar to the "Add Lease" case above, on receiving an UNLOCK
+ request for a lease-lock, the server
+
+- fetches the inode\_ctx
+- lookup/add the client entry to that upcall entry based on
+ client\_t-\>client\_uid, with timestamp updated
+- remove the lease granted to that client from that list.
+- Even if the lease is not found, the server will return success (as done
+ for POSIX locks).
+- After removing the lease, the server starts processing the fops from
+ the blocked queue if there are any.
+
+Lease-conflict check/Recalling lease-lock
+: For each fop issued by a client, the server now first needs to check
+ if it conflicts with any existing lease-lock taken on that file. For
+ that it first
+
+- fetches its inode\_ctx
+- verify if there are lease-locks granted with conflicting access to
+ any other client for that file.
+
+(Note: in case of the same client, the assumption is that the
+application will handle all the conflict checks between its clients and
+block them if necessary. However, in future we plan to provide a
+framework/API for applications to set their client id, like lkowner in
+case of locks, before sending any fops, so that the server can identify
+and differentiate them.)
+
+- if yes, send upcall notifications to recall the lease-lock and
+  either
+
+ * send an EDELAY error in case the fop is 'NON-BLOCKING', or
+ * add the fop to the blocking queue
+
+- Trigger a timer event to notify if the recall doesn't happen within
+  a certain configured time (a rough sketch of the conflict check is
+  shown below).
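+
+A rough sketch of this conflict check, again reusing the structures
+above, follows. The delegation enum values and the recall helper are
+assumed names, not the actual implementation:
+
+    /* Illustrative sketch: called for every fop to decide whether it
+     * conflicts with a lease held by some other client. Returns 1 when a
+     * recall has been issued and the fop must be failed with EDELAY (if
+     * non-blocking) or queued. NONE/WRITE_DELEG and the helper below are
+     * assumed names. */
+    static int
+    upcall_lease_conflict_check_sketch (upcall_entry *up,
+                                        const char *this_client_uid,
+                                        int is_write_fop)
+    {
+            upcall_client_entry *client = NULL;
+
+            if (up->deleg_cnt == 0)
+                    return 0;   /* no leases granted on this file */
+
+            list_for_each_entry (client, &up->client.client_list, client_list) {
+                    if (strcmp (client->client_uid, this_client_uid) == 0)
+                            continue;   /* same client: application handles its own conflicts */
+
+                    if (client->deleg == NONE)
+                            continue;
+
+                    /* a write fop conflicts with any lease; a read fop
+                     * conflicts only with a write (exclusive) lease */
+                    if (is_write_fop || client->deleg == WRITE_DELEG) {
+                            send_recall_lease_upcall (client->rpc, client->trans, up->gfid);
+                            client->recall_time = time (NULL);  /* start the recall timer */
+                            return 1;
+                    }
+            }
+
+            return 0;
+    }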
+
+Purge Lease
+
+- In case the client doesn't unlock the lease within the recall
+  timeout period, the timer thread will trigger an event to purge that
+  lease forcefully.
+- Post that, fops (if any) in the blocked queue are processed.
+
+##### Sequence Diagram
+
+ ------------- ---------------------- ------------ ----------------------- -------------
+ |NFS-Client(C1)| |NFS-Ganesha server(GC1)| |Brick server| |NFS-ganesha server(GC2)| |NFS-Client(C2)|
+ ------------- ----------------------- ------------ ----------------------- -------------
+ | | | | |
+ ' Open on file1 ' ' ' '
+ '------------------------>' ' ' '
+ ' ' Send OPEN on 'file1' ' ' '
+ ' '-------------------------->' ' '
+ ' ' ' ' '
+ ' ' OPEN response ' ' '
+ ' '<--------------------------' ' '
+ ' ' ' ' '
+ ' ' LOCK on 'file1' with ' ' '
+ ' ' LEASE_LOCK type ' ' '
+ ' '-------------------------->' ' '
+ ' ' ' ' '
+ ' ' Take a lease_lock for ' '
+ ' ' entire file range. ' '
+ ' ' If it suceeds, add an upcall ' '
+ ' ' lease entry 'GC1' for 'file1' ' '
+ ' ' Send Success ' ' '
+ ' '<------------------------- ' ' '
+ ' Response to OPEN ' ' ' '
+ '<------------------------' ' ' '
+ ' ' ' ' Conflicting I/O on 'file1' '
+ ' ' ' '<------------------------------'
+ ' ' ' Send rpc request ' '
+ ' ' '<------------------------------' '
+ ' ' Send Upcall event ' ' '
+ ' ' 'RECALL_LEASE' ' ' '
+ ' '<--------------------------' ' '
+ ' RECALL_DELEGATION ' (a)Now either block I/O ' '
+ '<------------------------' or ' '
+ ' ' (b) ' Send EDELAY/ERETRY ' '
+ ' ' '------------------------------>' '
+ ' ' ' ' (b)SEND EDELAY/ERETRY '
+ ' Send I/O to flush data ' ' '------------------------------>'
+ '------------------------>' ' ' '
+ ' ' RPC reqeust for all fops' ' '
+ ' '-------------------------->' ' '
+ ' ' ' ' '
+ ' ' Send rpc response ' ' '
+ ' '<--------------------------' ' '
+ ' Send success ' ' ' '
+ '<------------------------' ' ' '
+ ' ' ' ' '
+ ' Return DELEGATION ' ' ' '
+ '------------------------>' ' ' '
+ ' ' UNLOCK request with type ' ' '
+ ' ' LEASE_LOCK ' ' '
+ ' '-------------------------->' ' '
+ ' ' ' ' '
+ ' ' Unlock the lease_lk. ' '
+ ' ' (a) Unblock the fop ' '
+ ' ' Send Success ' ' '
+ ' '<--------------------------'(a) Send response to I/O ' '
+ ' ' '------------------------------>' '
+ ' Return Success ' ' ' (a) SEND RESPONSE '
+ '<------------------------' ' '------------------------------>'
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' ' ' '
+ ' ' ' ' '
+
+
+#### Upcall notifications processing on the client side
+
+: The structure of the upcall data sent by the server is noted in the
+ "Documentation" section.
+: On receiving the upcall notifications, the protocol/client xlator
+ detects that it is a callback event, decodes the upcall data sent
+ ('gfs3\_upcall\_req', noted in the Documentation section) and passes
+ the same to the parent translators.
+: On receiving these notify calls from protocol/client, the parent
+ translators (planning to use this infra) have to first process the
+ event\_type of the upcall data received and take action accordingly.
+: Currently, as this infra is used only by nfs-ganesha, these notify
+ calls are sent directly to gfapi from the protocol/client xlator.
+: For each of such events received, gfapi creates an entry and queues
+ it to the list of upcall events received.
+
+: Sample entry structure -
+
+<!-- -->
+
+    struct _upcall_events_list {
+            struct list_head    upcall_entries;
+            uuid_t              gfid;
+            upcall_event_type   event_type;
+            uint32_t            flags;
+    };
+    typedef struct _upcall_events_list upcall_events_list;
+
+: Now either the application can choose to regularly poll for such
+ upcall events, or gfapi can notify the application via a signal or a
+ cond-variable.
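+
+For illustration, an application-side polling loop could look roughly
+like the sketch below. The polling function and the event-type enum
+values are hypothetical names; the eventual gfapi API may differ.
+
+    #include <unistd.h>
+
+    /* 'fs' is the glfs_t handle the application has already initialised
+     * and mounted. glfs_poll_upcall_event(), CACHE_INVALIDATION and
+     * RECALL_LEASE are assumed names, not a committed API. */
+    void
+    poll_upcalls (glfs_t *fs)
+    {
+            upcall_events_list event;
+
+            for (;;) {
+                    /* assumed: returns 0 and fills 'event' when an upcall
+                     * entry has been queued by gfapi */
+                    if (glfs_poll_upcall_event (fs, &event) != 0) {
+                            sleep (1);  /* nothing pending; poll again later */
+                            continue;
+                    }
+
+                    switch (event.event_type) {
+                    case CACHE_INVALIDATION:
+                            /* refresh cached attributes of the inode matching
+                             * event.gfid, honouring event.flags */
+                            break;
+                    case RECALL_LEASE:
+                            /* flush dirty data and unlock the lease on event.gfid */
+                            break;
+                    default:
+                            break;
+                    }
+            }
+    }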
+
+### Extensions
+
+: This framework could easily be extended to send any other event
+ notifications to the client process.
+: A new event has to be added to the list of upcall event types
+ (mentioned in the Documentation section) and any extra data which
+ needs to be sent has to be added to the gfs3\_upcall\_req structure.
+: On the client side, the interested translator should check for the
+ event type and the data passed, and take action accordingly.
+: FUSE can also make use of this feature to support lease-locks.
+: A new performance xlator can be added to take lease-locks and cache
+ I/O.
+
+### Limitations
+
+Rebalancing
+: At present, after rebalance, lock states are not migrated.
+ Similarly, the state maintained by this new xlator will also not be
+ migrated.
+: However, after migrating the file, since DHT deletes the file on
+ the source brick, in case of
+
+- cache-invalidation, we may falsely notify the client that the file
+ is deleted. (Note: to avoid this at present, we do not send any
+ "Destroy Flag")
+- delegations/lease locks present, the 'delete' will be blocked till
+ that delegation is recalled. This way, the clients holding those
+ locks can flush their data which will now be redirected to the new
+ brick.
+
+Self-Heal
+: If a brick process goes down, the replica brick (which maintains the
+ same state) will take over processing of all the fops.
+: But if later the first brick process comes back up, the healing of
+ the upcall/lease-lock states is not done on that process.
+
+Network Partitions
+: If there are any network partitions between the glusterfsd brick
+ process and glusterfs client process, similar to lock states, the
+ upcalls/lease-lock state maintained by this new xlator will also be
+ lost.
+: However if there is a replica brick present, clients will get
+ re-directed to that process (which still has the states maintained).
+ This brick process will take care of checking the conflicts and
+ sending notifications.
+: Maybe client could try reconnecting with the same client\_uid and
+ replay the locks. But if any of those operations fail, gfapi will
+ return 'EBADFD' to the applications. This enhancement will be
+ considered for future.
+
+Directory leases are not yet supported.
+: This feature at present mainly targets file-level
+ delegations/leases.
+
+Lease Upgrade
+: Read-to-write lease upgrade is not supported currently.
+
+Heuristics
+: Have to maintain heuristics in Gluster as well to determine when to
+ grant the lease/delegations.
+
+Benefit to GlusterFS
+--------------------
+
+This feature is definitely needed to support NFS-Ganesha Multi-head and
+Active-Active HA support.
+
+Along with it, this infra can potentially be used for
+
+- multi-protocol access
+- small-file performance improvements.
+- pNFS support.
+
+Scope
+-----
+
+### Nature of proposed change
+
+- A new xlator 'Upcalls' will be introduced in the server-side stack
+ to maintain the states and send callbacks.
+
+- This xlator will be ON only when Ganesha feature is enabled.
+
+- "client\_t" structure is modified to contain rpc connection details.
+
+- New procedures have been added to "Glusterfs Callback" rpc program
+ for each of the notify events.
+
+- There will be support added on gfapi side to handle these upcalls
+ sent and inform the applications accordingly.
+
+- md-cache may also add support to handle these upcalls.
+
+### Implications on manageability
+
+A new xlator 'Upcalls' is added to the server vol file.
+
+### Implications on presentation layer
+
+Applications planning to use Upcall Infra have to invoke new APIs
+provided by gfapi to receive these notifications.
+
+### Implications on persistence layer
+
+None.
+
+### Implications on 'GlusterFS' backend
+
+None
+
+### Modification to GlusterFS metadata
+
+None
+
+### Implications on 'glusterd'
+
+This infra is currently supported only when the new CLI option
+introduced to enable Ganesha is ON. This may need to be revisited if
+there are other consumers of this feature.
+
+How To Test
+-----------
+
+- Bring up Multi-head Ganesha servers.
+- Test if the I/Os performed using one head are reflected on the
+ other server.
+- Test if delegations are granted and successfully recalled when
+ needed.
+
+User Experience
+---------------
+
+- This infra will be controlled by a tunable (currently the
+ 'nfs-ganesha.enable' option, as it is the only consumer). If the
+ option is off, fops will just pass through without any additional
+ processing.
+- But if it is ON, the consumers of this infra may see some
+ performance hit due to the additional state maintained and
+ processed, and the extra RPCs sent over the wire for notifications.
+
+Dependencies
+------------
+
+Gluster CLI to enable ganesha
+: It depends on the new [Gluster CLI
+ option](http://www.gluster.org/community/documentation/index.php/Features/Gluster_CLI_for_ganesha)
+ which is to be added to enable Ganesha.
+
+Wireshark
+: In addition, the new RPC procedure introduced to send callbacks has
+ to be added to the list of Gluster RPC Procedures supported by
+ [Wireshark](https://forge.gluster.org/wireshark/pages/Todo).
+
+Rebalance/Self-Heal/Tiering
+: The upcall state maintained is analogous to the lock state. Hence,
+
+- During rebalance or tiering of files, along with the lock state,
+ the state maintained by this xlator also needs to be migrated to the
+ new subvolume.
+
+- When there is self-heal support for the locks state, this xlator
+ state also needs to be considered.
+
+Filter-out duplicate notifications
+: In case of replica bricks maintained by AFR/EC, the upcall state is
+ maintained and processed on all the replica bricks. This will result
+ in duplicate notifications being sent by all those bricks in case of
+ non-idempotent fops. Also, in case of distributed volumes,
+ cache-invalidation notifications on a directory entry will be sent
+ by all the bricks that are part of that volume. Hence we need
+ support to filter out such duplicate callback notifications.
+
+: The approach we plan to take to address this (sketched below) is:
+
+- add a new xlator on the client-side to track all the fops. Maybe
+ create a unique transaction id and send it to the server.
+- Server needs to store this transaction id in the client info as part
+ of upcall state.
+- While sending any notifications, add this transaction id too to the
+ request.
+- Client (the new xlator) has to filter out duplicate requests based
+ on the transaction ids received.
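+
+A minimal sketch of the client-side filtering, assuming a per-gfid record
+of recently seen transaction ids; all names here are hypothetical and not
+existing Gluster code, and a real implementation would also need locking
+and expiry of old records.
+
+    #include <stdlib.h>
+    #include <stdint.h>
+    #include <uuid/uuid.h>
+    #include "list.h"   /* glusterfs kernel-style list macros */
+
+    typedef struct seen_txn {
+            struct list_head list;
+            uuid_t           gfid;
+            uint64_t         txn_id;
+    } seen_txn_t;
+
+    /* initialized once with INIT_LIST_HEAD(&seen_txns) */
+    static struct list_head seen_txns;
+
+    /* returns 1 if this (gfid, txn_id) notification was already handled */
+    static int is_duplicate_notification (uuid_t gfid, uint64_t txn_id)
+    {
+            seen_txn_t *entry = NULL;
+
+            list_for_each_entry (entry, &seen_txns, list) {
+                    if (!uuid_compare (entry->gfid, gfid) &&
+                        entry->txn_id == txn_id)
+                            return 1;   /* duplicate: filter it out */
+            }
+
+            /* first time: remember it and let the notification through */
+            entry = calloc (1, sizeof (*entry));
+            uuid_copy (entry->gfid, gfid);
+            entry->txn_id = txn_id;
+            list_add_tail (&entry->list, &seen_txns);
+            return 0;
+    }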
+
+Special fops
+: During rebalance/self-heal, even though it is not the client
+ application that is doing the fops, the brick process may still send
+ notifications. To avoid that, we need a registration mechanism so
+ that only those clients which register receive upcall notifications.
+
+Cleanup during network disconnect - protocol/server
+: At present, in case of a network disconnect between the glusterfs
+ server and the client, protocol/server looks up the fd table
+ associated with that client and sends a 'flush' op for each of those
+ fds to clean up the locks associated with them.
+
+: We need similar support to flush the lease-locks taken. Hence, while
+ granting the lease-lock, we plan to associate that upcall\_entry
+ with the corresponding fd\_ctx or inode\_ctx so that it can be
+ easily tracked and cleaned up if needed (a sketch follows the note
+ below). It will also help in faster lookup of the upcall entry while
+ processing fops on the same fd/inode.
+
+Note: The above cleanup is done only for the upcall state associated
+with lease-locks. For the other entries maintained (e.g., for
+cache-invalidations), the reaper thread will anyway clean up those
+stale entries once they expire (i.e., access\_time \> 1 min).
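+
+A minimal sketch of that association, assuming the standard
+inode\_ctx\_put()/inode\_ctx\_get() helpers from libglusterfs; the
+upcall\_entry\_t type and the release helper are illustrative only.
+
+    #include <stdint.h>
+    #include "xlator.h"   /* xlator_t */
+    #include "inode.h"    /* inode_t, inode_ctx_put/inode_ctx_get */
+
+    typedef struct upcall_entry upcall_entry_t;         /* illustrative */
+    static void upcall_release_lease (upcall_entry_t *); /* hypothetical */
+
+    /* Sketch only: stash the upcall entry in the inode ctx while granting
+     * a lease, so it can be found and flushed on client disconnect. */
+    static int upcall_lease_grant (xlator_t *this, inode_t *inode,
+                                   upcall_entry_t *up_entry)
+    {
+            /* remember the entry against the inode for lookup/cleanup */
+            return inode_ctx_put (inode, this,
+                                  (uint64_t)(uintptr_t)up_entry);
+    }
+
+    static void upcall_cleanup_on_disconnect (xlator_t *this, inode_t *inode)
+    {
+            uint64_t        value    = 0;
+            upcall_entry_t *up_entry = NULL;
+
+            if (inode_ctx_get (inode, this, &value) == 0 && value) {
+                    up_entry = (upcall_entry_t *)(uintptr_t)value;
+                    /* release the lease-lock state held for this client */
+                    upcall_release_lease (up_entry);
+            }
+    }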
+
+Replay the lease-locks taken
+: At present, replay of locks by the client xlator seems to have been
+ disabled.
+: But when it is enabled, we need to add support to replay the
+ lease-locks taken as well.
+
+Documentation
+-------------
+
+Sample upcall request structure sent to the clients:
+
+ struct gfs3_upcall_req {
+ char gfid[16];
+ u_int event_type;
+ u_int flags;
+ };
+ typedef struct gfs3_upcall_req gfs3_upcall_req;
+
+ enum upcall_event_type_t {
+ CACHE_INVALIDATION,
+ RECALL_READ_DELEG,
+ RECALL_READ_WRITE_DELEG
+ };
+ typedef enum upcall_event_type_t upcall_event_type;
+
+ flags to be sent for inode update/invalidation-
+ #define UP_NLINK 0x00000001 /* update nlink */
+ #define UP_MODE 0x00000002 /* update mode and ctime */
+ #define UP_OWN 0x00000004 /* update mode,uid,gid and ctime */
+ #define UP_SIZE 0x00000008 /* update fsize */
+ #define UP_TIMES 0x00000010 /* update all times */
+ #define UP_ATIME 0x00000020 /* update atime only */
+ #define UP_PERM 0x00000040 /* update fields needed for
+ permission checking */
+ #define UP_RENAME 0x00000080 /* this is a rename op -
+ delete the cache entry */
+ #define UP_FORGET 0x00000100 /* inode_forget on server side -
+ invalidate the cache entry */
+ #define UP_PARENT_TIMES 0x00000200 /* update parent dir times */
+ #define UP_XATTR_FLAGS 0x00000400 /* update xattr */
+
+ /* for fops - open, read, lk, which do not trigger upcall notifications
+ * but need to update the client info in the upcall state */
+ #define UP_UPDATE_CLIENT (UP_ATIME)
+
+ /* for fop - write, truncate */
+ #define UP_WRITE_FLAGS (UP_SIZE | UP_TIMES)
+
+ /* for fop - setattr */
+ #define UP_ATTR_FLAGS (UP_SIZE | UP_TIMES | UP_OWN | \
+ UP_MODE | UP_PERM)
+ /* for fop - rename */
+ #define UP_RENAME_FLAGS (UP_RENAME)
+
+ /* to invalidate parent directory entries for fops -rename, unlink,
+ * rmdir, mkdir, create */
+ #define UP_PARENT_DENTRY_FLAGS (UP_PARENT_TIMES)
+
+ /* for fop - unlink, link, rmdir, mkdir */
+ #define UP_NLINK_FLAGS (UP_NLINK | UP_TIMES)
+
+List of fops currently identified which trigger inode update/invalidate
+notifications:
+
+ fop - flags to be sent - UPDATE/INVALIDATION - Entries affected
+ ----------------------------------------------------------------------------
+ writev - UP_WRITE_FLAGS - INODE_UPDATE - file
+ truncate - UP_WRITE_FLAGS - INODE_UPDATE - file
+ lk/lock - UP_UPDATE_CLIENT - INODE_UPDATE - file
+ setattr - UP_ATTR_FLAGS - INODE_UPDATE/INVALIDATE - file
+ rename - UP_RENAME_FLAGS, UP_PARENT_DENTRY_FLAGS - INODE_INVALIDATE - both file and parent dir
+ unlink - UP_NLINK_FLAGS, UP_PARENT_DENTRY_FLAGS - INODE_INVALIDATE - file & parent_dir
+ rmdir - UP_NLINK_FLAGS, UP_PARENT_DENTRY_FLAGS - INODE_INVALIDATE - file & parent_dir
+ link - UP_NLINK_FLAGS, UP_PARENT_DENTRY_FLAGS - INODE_UPDATE - file & parent_dir
+ create - UP_TIMES, UP_PARENT_DENTRY_FLAGS - INODE_UPDATE - parent_dir
+ mkdir - UP_TIMES, UP_PARENT_DENTRY_FLAGS - INODE_UPDATE - parent_dir
+ setxattr - UP_XATTR_FLAGS - INODE_UPDATE - file
+ removexattr - UP_UPDATE_CLIENT - INODE_UPDATE - file
+ mknod - UP_TIMES, UP_PARENT_DENTRY_FLAGS - INODE_UPDATE - parent_dir
+ symlink - UP_TIMES, UP_PARENT_DENTRY_FLAGS - INODE_UPDATE - file
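+
+For illustration, a minimal sketch of how the server side could fill the
+gfs3\_upcall\_req structure for a writev fop using the flags above; the
+send\_upcall\_rpc() helper is hypothetical and not an existing Gluster
+function.
+
+    #include <string.h>
+    #include <uuid/uuid.h>
+
+    void send_upcall_rpc (struct gfs3_upcall_req *req);   /* hypothetical */
+
+    /* Sketch only: build a cache-invalidation notification for writev. */
+    static void notify_write (uuid_t gfid)
+    {
+            struct gfs3_upcall_req req;
+
+            memset (&req, 0, sizeof (req));
+            memcpy (req.gfid, gfid, 16);
+            req.event_type = CACHE_INVALIDATION;
+            req.flags      = UP_WRITE_FLAGS;   /* UP_SIZE | UP_TIMES */
+
+            /* hand the request to the "Glusterfs Callback" rpc program so
+             * that protocol/server can send it to interested clients */
+            send_upcall_rpc (&req);
+    }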
+
+List of fops which result in delegation/lease-lock recall:
+
+ open
+ read
+ write
+ truncate
+ setattr
+ lock
+ link
+ remove
+ rename
+
+Comments and Discussion
+-----------------------
+
+### TODO
+
+- Lease-locks implementation is currently work in progress [BZ
+ 1200268](https://bugzilla.redhat.com/show_bug.cgi?id=1200268)
+- Clean up expired client entries (in case of cache-invalidation).
+ Refer to the section 'Cache Invalidation' [BZ
+ 1200267](https://bugzilla.redhat.com/show_bug.cgi?id=1200267)
+- At present, for cache-invalidation, callback notifications are sent
+ in the fop path. Instead to avoid brick latency, have a mechanism to
+ send it asynchronously. [BZ
+ 1200264](https://bugzilla.redhat.com/show_bug.cgi?id=1200264)
+- Filter out duplicate callback notifications [BZ
+ 1200266](https://bugzilla.redhat.com/show_bug.cgi?id=1200266)
+- Support for Directory leases.
+- Support for read-to-write lease upgrade.
+- Maintain heuristics in Gluster as well to determine when to grant
+ leases/delegations.
diff --git a/done/GlusterFS 3.7/arbiter.md b/done/GlusterFS 3.7/arbiter.md
new file mode 100644
index 0000000..797f005
--- /dev/null
+++ b/done/GlusterFS 3.7/arbiter.md
@@ -0,0 +1,100 @@
+Feature
+-------
+
+This feature provides a way of preventing split-brains in replica 3 gluster volumes both in time and space.
+
+Summary
+-------
+
+Please see <http://review.gluster.org/#/c/9656/> for the design discussions
+
+Owners
+------
+
+Pranith Kumar Karampuri
+Ravishankar N
+
+Current status
+--------------
+
+Feature complete.
+
+Code patches: <http://review.gluster.org/#/c/10257/> and
+<http://review.gluster.org/#/c/10258/>
+
+Detailed Description
+--------------------
+Arbiter volumes are replica 3 volumes where the 3rd brick of the replica is
+automatically configured as an arbiter node. This means that the 3rd brick
+stores only the file name and metadata, but no data. This configuration is
+helpful in avoiding split-brains while providing the same level of
+consistency as a normal replica 3 volume.
+
+Benefit to GlusterFS
+--------------------
+
+It prevents split-brains in replica 3 volumes and consumes less space than a normal replica 3 volume.
+
+Scope
+-----
+
+### Nature of proposed change
+
+### Implications on manageability
+
+None
+
+### Implications on presentation layer
+
+None
+
+### Implications on persistence layer
+
+None
+
+### Implications on 'GlusterFS' backend
+
+None
+
+### Modification to GlusterFS metadata
+
+None
+
+### Implications on 'glusterd'
+
+None
+
+How To Test
+-----------
+
+If we bring down bricks and perform writes in such a way that the
+arbiter brick is the only source online, reads and writes will fail
+with ENOTCONN. See 'tests/basic/afr/arbiter.t' in the glusterfs tree
+for examples.
+
+User Experience
+---------------
+
+Similar to a normal replica 3 volume. The only change is the syntax in
+volume creation. See
+<https://github.com/gluster/glusterfs-specs/blob/master/Features/afr-arbiter-volumes.md>
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+---
+
+Status
+------
+
+Feature completed. See 'Current status' section for the patches.
+
+Comments and Discussion
+-----------------------
+Some optimizations are under way.
+---
diff --git a/done/GlusterFS 3.7/index.md b/done/GlusterFS 3.7/index.md
new file mode 100644
index 0000000..99381cf
--- /dev/null
+++ b/done/GlusterFS 3.7/index.md
@@ -0,0 +1,90 @@
+GlusterFS 3.7 Release Planning
+------------------------------
+
+Tentative Dates:
+
+9th Mar, 2015 - Feature freeze & Branching
+
+28th Apr, 2015 - 3.7.0 Beta Release
+
+14th May, 2015 - 3.7.0 GA
+
+Features in GlusterFS 3.7
+-------------------------
+
+* [Features/Smallfile Performance](./Small File Performance.md):
+Small-file performance enhancement - multi-threaded epoll implemented.
+
+* [Features/Data-classification](./Data Classification.md):
+Tiering, rack-aware placement, and more. - Policy based tiering
+implemented
+
+* [Features/Trash](./Trash.md): Trash translator for
+GlusterFS
+
+* [Features/Object Count](./Object Count.md)
+
+* [Features/SELinux Integration](./SE Linux Integration.md)
+
+* [Features/Exports Netgroups Authentication](./Exports and Netgroups Authentication.md)
+
+* [Features/Policy based Split-brain resolution](./Policy based Split-brain Resolution.md): Policy Based
+Split-brain resolution
+
+* [Features/BitRot](./BitRot.md)
+
+* [Features/Gnotify](./Gnotify.md)
+
+* [Features/Improve Rebalance Performance](./Improve Rebalance Performance.md)
+
+* [Features/Upcall-infrastructure](./Upcall Infrastructure.md):
+Support for delegations/lease-locks, inode-invalidation, etc..
+
+* [Features/Gluster CLI for ganesha](./Gluster CLI for NFS Ganesha.md): Gluster CLI
+support to manage nfs-ganesha exports
+
+* [Features/Scheduling of Snapshot](./Scheduling of Snapshot.md): Schedule creation
+of gluster volume snapshots from command line, using cron.
+
+* [Features/sharding-xlator](./Sharding xlator.md)
+
+* [Features/HA for ganesha](./HA for Ganesha.md): HA
+support for NFS-Ganesha
+
+* [Features/Clone of Snapshot](./Clone of Snapshot.md)
+
+Other big changes
+-----------------
+
+* **GlusterD Daemon code
+refactoring**: GlusterD
+manages a lot of other daemons (bricks, NFS server, SHD, rebalance
+etc.), and there are several more on the way. This refactoring will
+introduce a common framework to manage all these daemons, which will
+make maintenance easier.
+
+* **RCU in GlusterD**: GlusterD has issues
+with thread synchronization and data access. This has been discussed on
+<http://www.gluster.org/pipermail/gluster-devel/2014-December/043382.html>.
+We will be using the RCU method to solve these issues, with
+[Userspace-RCU](http://urcu.so) helping with the implementation.
+
+Features beyond GlusterFS 3.7
+-----------------------------
+
+* [Features/Easy addition of custom translators](./Easy addition of Custom Translators.md)
+
+* [Features/outcast](./Outcast.md): Outcast
+
+* [Features/rest-api](./rest-api.md): REST API for
+Gluster Volume and Peer Management
+
+* [Features/Archipelago Integration](./Archipelago Integration.md):
+Improvements for integration with Archipelago
+
+Release Criterion
+-----------------
+
+- All new features to be documented in admin guide
+
+- Regression tests added
\ No newline at end of file
diff --git a/done/GlusterFS 3.7/rest-api.md b/done/GlusterFS 3.7/rest-api.md
new file mode 100644
index 0000000..e967d28
--- /dev/null
+++ b/done/GlusterFS 3.7/rest-api.md
@@ -0,0 +1,152 @@
+Feature
+-------
+
+REST API for GlusterFS
+
+Summary
+-------
+
+Provides REST API for Gluster Volume and Peer Management.
+
+Owners
+------
+
+Aravinda VK <mail@aravindavk.in> (http://aravindavk.in)
+
+Current status
+--------------
+
+REST API is not available in GlusterFS.
+
+Detailed Description
+--------------------
+
+The GlusterFS REST service can be started by running the following
+command on any node.
+
+ sudo glusterrest -p 8080
+
+Features:
+
+- No separate server required; the command can be run on any one node.
+- Provides Basic authentication (user groups can be added)
+- Any REST client can be used.
+- JSON output
+
+Benefit to GlusterFS
+--------------------
+
+Provides REST API for GlusterFS cluster.
+
+Scope
+-----
+
+### Nature of proposed change
+
+New code.
+
+### Implications on manageability
+
+### Implications on presentation layer
+
+### Implications on persistence layer
+
+### Implications on 'GlusterFS' backend
+
+### Modification to GlusterFS metadata
+
+### Implications on 'glusterd'
+
+How To Test
+-----------
+
+User Experience
+---------------
+
+Dependencies
+------------
+
+Documentation
+-------------
+
+### Usage:
+
+A new CLI command, \`glusterrest\`, will be available:
+
+ usage: glusterrest [-h] [-p PORT] [--users USERS]
+ [--no-password-hash]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -p PORT, --port PORT PORT Number
+ -u USERS, --users USERS
+ Users JSON file
+ --no-password-hash No Password Hash
+
+
+The following command will start the REST server on the given port
+(8080 in this example).
+
+ sudo glusterrest -p 8080 --users /root/secured_dir/gluster_users.json
+
+Format of the users JSON file (a list of objects with username and
+password):
+
+ [
+ {"username": "aravindavk", "password": "5ebe2294ecd0e0f08eab7690d2a6ee69"}
+ ]
+
+The password is an md5 hash; if no hashing is required, use
+--no-password-hash while running the glusterrest command.
+
+### API Documentation
+
+Getting list of peers
+
+ GET /api/1/peers
+
+Peer Probe
+
+ CREATE /api/1/peers/:hostname
+
+Peer Detach
+
+ DELETE /api/1/peers/:hostname
+
+Creating Gluster volumes
+
+ CREATE /api/1/volumes/:name
+ CREATE /api/1/volumes/:name/force
+
+Deleting Gluster Volume
+
+ DELETE /api/1/volumes/:name
+ DELETE /api/1/volumes/:name/stop
+
+Gluster volume actions
+
+ POST /api/1/volumes/:name/start
+ POST /api/1/volumes/:name/stop
+ POST /api/1/volumes/:name/start-force
+ POST /api/1/volumes/:name/stop-force
+ POST /api/1/volumes/:name/restart
+
+Gluster Volume modifications
+
+ PUT /api/1/volumes/:name/add-brick
+ PUT /api/1/volumes/:name/remove-brick
+ PUT /api/1/volumes/:name/set
+ PUT /api/1/volumes/:name/reset
+
+Getting volume information
+
+ GET /api/1/volumes
+ GET /api/1/volumes/:name
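+
+As an illustration only, a minimal C client using libcurl to call one of
+the endpoints above over Basic authentication; the URL, username and
+password are placeholders, and this client is not part of glusterrest.
+
+    #include <stdio.h>
+    #include <curl/curl.h>
+
+    int main (void)
+    {
+            CURL *curl = curl_easy_init ();
+            CURLcode res;
+
+            if (!curl)
+                    return 1;
+
+            /* placeholder endpoint and credentials */
+            curl_easy_setopt (curl, CURLOPT_URL,
+                              "http://localhost:8080/api/1/volumes");
+            curl_easy_setopt (curl, CURLOPT_HTTPAUTH, (long)CURLAUTH_BASIC);
+            curl_easy_setopt (curl, CURLOPT_USERPWD, "aravindavk:secret");
+
+            /* by default libcurl writes the JSON response to stdout */
+            res = curl_easy_perform (curl);
+            if (res != CURLE_OK)
+                    fprintf (stderr, "request failed: %s\n",
+                             curl_easy_strerror (res));
+
+            curl_easy_cleanup (curl);
+            return 0;
+    }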
+
+Status
+------
+
+Coding is 50% complete; documentation writing has started.
+
+Comments and Discussion
+-----------------------