Create basic directory structure

All new feature specs go into the in_progress directory. Once signed off, a spec should be moved to the done directory. For now, this change moves all the Gluster 4.0 feature specs to in_progress. All other specs are under done/release-version. Further cleanup will be done incrementally.

Change-Id: Id272d301ba8c434cbf7a9a966ceba05fe63b230d
BUG: 1206539
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/11969
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Tested-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
author: raghavendra talur <raghavendra.talur@gmail.com> 2015-08-20 15:09:31 +0530
committer: Humble Devassy Chirammal <humble.devassy@gmail.com> 2015-08-31 02:27:22 -0700
commit: 9e9e3c5620882d2f769694996ff4d7e0cf36cc2b (patch)
tree: 3a00cbd0cc24eb7df3de9b2eeeb8d42ee9175f88 /done/GlusterFS 3.5/AFR CLI enhancements.md
parent: f6055cdb4dedde576ed8ec55a13814a69dceefdc (diff)
1 files changed, 204 insertions, 0 deletions
diff --git a/done/GlusterFS 3.5/AFR CLI enhancements.md b/done/GlusterFS 3.5/AFR CLI enhancements.md
new file mode 100644
index 0000000..88f4980
--- /dev/null
+++ b/done/GlusterFS 3.5/AFR CLI enhancements.md
@@ -0,0 +1,204 @@
+Feature
+-------
+
+AFR CLI enhancements
+
+SUMMARY
+-------
+
+Presently, AFR reporting via the CLI has many problems in how it
+represents logs, because of which users may not be able to use the
+data effectively. This feature is to correct these problems and provide
+a coherent mechanism to present heal status, information, and the
+associated logs.
+
+Owners
+------
+
+Venkatesh Somayajulu  
+Raghavan
+
+Current status
+--------------
+
+Many bugs related to this area indicate the current status and explain
+why these requirements are needed.
+
+1) 924062 - gluster volume heal info shows only gfids in some cases and
+sometimes names. This is very confusing for the end user.
+
+2) 852294 - gluster volume heal info hangs/crashes when there is a
+large number of entries to be healed.
+
+3) 883698 - when self heal daemon is turned off, heal info does not
+show any output. But healing can happen because of lookups from IO path.
+Hence list of entries to be healed still needs to be shown.
+
+4) 921025 - directories are not reported when list of split brain
+entries needs to be displayed.
+
+5) 981185 - when self heal daemon process is offline, volume heal info
+gives error as "staging failure"
+
+6) 952084 - We need a command to resolve files in split brain state.
+
+7) 986309 - We need to report source information for files which got
+healed during a self heal session.
+
+8) 986317 - Sometimes the list of files to be healed also includes files
+to which I/O is being done, since the entries for these files could be in
+the xattrop directory. This could be confusing for the user.
+
+There is a master bug 926044 that sums up most of the above problems. It
+does give the QA perspective of the current representation out of the
+present reporting infrastructure.
+
+Detailed Description
+--------------------
+
+1) One common thread among all the above complaints is that the
+information presented to the user is **FUD** because of the following
+reasons:
+
+(a) Split brain itself is a scary scenario, especially with VMs.  
+(b) The data that we present to users cannot be used in a stable
+    manner for them to get to the list of these files. *For example:* we
+    need to give mechanisms by which they can automate the resolution
+    of split brain.  
+(c) The logs that are generated are all the more scary since we
+    see some error lines repeated over hundreds of lines.
+    Our mailing lists are filled with such emails from end users.  
+
+Any data is useless unless it is associated with an event. For self
+heal, the event that leads to self heal is the loss of connectivity to a
+brick from a client. So all healing info and especially split brain
+should be associated with such events.
+
+The following is hence the proposed mechanism:
+
+(a) Every loss of a brick from the client's perspective is logged and
+    available via some ID. The information provides the time from when
+    the brick went down to when it came up. It should also report
+    the number of IO transactions (modifies) that happened during this
+    event.  
+(b) The list of these events is available via some CLI command. The
+    actual command needs to be detailed as part of this feature.  
+(c) All volume info commands regarding the list of files to be healed,
+    files healed and split brain files should be associated with these
+    events.  
+
+2) Provide a mechanism to show statistics at a volume and replica group
+level. It should show the number of files to be healed and number of
+split brain files at both the volume and replica group level.
+
+3) Provide a mechanism to show a per-volume list of files to be
+healed, files healed, and files in split brain.
+
+Each entry should have the following information:
+
+(a) File name  
+(b) Bricks location  
+(c) Event association (brick going down)  
+(d) Source  
+(e) Sink
+
+4) Self heal crawl statistics - Introduce new CLI commands for showing
+more information on self heal crawl per volume.
+
+(a) Display why a self heal crawl ran (timeouts, brick coming up)  
+(b) Start time and end time  
+(c) Number of files it attempted to heal  
+(d) Location of the self heal daemon
+
+5) Scale the logging infrastructure to handle the huge lists of files
+that need to be displayed as part of the logging.
+
+(a) Right now the system crashes or hangs in case of a high number
+    of files.  
+(b) It causes CLI timeouts arbitrarily. The latencies involved in
+    the logging have to be studied (profiled) and mechanisms to
+    circumvent them have to be introduced.  
+(c) All files are displayed on the output. Have a better way of
+    representing them.
+
+Options are:
+
+(a) Maybe write to a glusterd log file or have a separate directory
+    for afr heal logs.  
+(b) Have a status kind of command. This will display the current
+    status of the log building and maybe have a batched way of
+    representing the output when there is a huge list.
+
+6) We should provide a mechanism where the user can heal split brain by
+some pre-established policies:
+
+(a) Let the system figure out the latest files (assuming all nodes
+    are in time sync) and choose the copies that have the latest time.  
+(b) Choose one particular brick as the source for split brain and
+    heal all split brains from this brick.  
+(c) Just remove the split brain information from the changelog. We leave
+    it to the user to repair split brain, wherein they would rewrite
+    the split-brained files. (Right now the user is forced to
+    remove xattrs manually for this step.)
+
+Benefits to GlusterFS
+--------------------
+
+Makes the end user more aware of healing status and provides statistics.
+
+Scope
+-----
+
+6.1. Nature of proposed change
+
+Modification to AFR and CLI and glusterd code
+
+6.2. Implications on manageability
+
+New CLI commands to be added. Existing commands to be improved.
+
+6.3. Implications on presentation layer
+
+N/A
+
+6.4. Implications on persistence layer
+
+N/A
+
+6.5. Implications on 'GlusterFS' backend
+
+N/A
+
+6.6. Modification to GlusterFS metadata
+
+N/A
+
+6.7. Implications on 'glusterd'
+
+Changes for healing specific commands will be introduced.
+
+How To Test
+-----------
+
+See documentation section
+
+User Experience
+---------------
+
+*Changes in CLI, effect on User experience...*
+
+Documentation
+-------------
+
+<http://review.gluster.org/#/c/7792/1/doc/features/afr-statistics.md>
+
+Status
+------
+
+Patches :
+
+<http://review.gluster.org/6044> <http://review.gluster.org/4790>
+
+Status:
+
+Merged
+\ No newline at end of file