From 8907f67ba215172b01a7018adcbb063fcc4570e9 Mon Sep 17 00:00:00 2001 From: Humble Devassy Chirammal Date: Mon, 30 Mar 2015 12:21:05 +0530 Subject: doc: restructure developer docs to new layout The developer oriented information is scattered in source and its very difficult to identify which are those. With this patch subdirs are created under developer-guide which will be the parent for developer notes. The changes suggested in http://review.gluster.org/#/c/8827/ are also included in this patch. Change-Id: I4c8510d52c49f4066225f72cac8f97f087d6c70c BUG: 1206539 Signed-off-by: Humble Devassy Chirammal Reviewed-on: http://review.gluster.org/10038 Tested-by: Gluster Build System Reviewed-by: Lalatendu Mohanty Reviewed-by: Kaleb KEITHLEY --- doc/developer-guide/write-behind.md | 56 +++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 doc/developer-guide/write-behind.md (limited to 'doc/developer-guide/write-behind.md') diff --git a/doc/developer-guide/write-behind.md b/doc/developer-guide/write-behind.md new file mode 100644 index 00000000000..0d78964fa20 --- /dev/null +++ b/doc/developer-guide/write-behind.md @@ -0,0 +1,56 @@ +performance/write-behind translator +=================================== + +Basic working +-------------- + +Write behind is basically a translator to lie to the application that the +write-requests are finished, even before it is actually finished. + +On a regular translator tree without write-behind, control flow is like this: + +1. application makes a `write()` system call. +2. VFS ==> FUSE ==> `/dev/fuse`. +3. fuse-bridge initiates a glusterfs `writev()` call. +4. `writev()` is `STACK_WIND()`ed up to client-protocol or storage translator. +5. client-protocol, on receiving reply from server, starts `STACK_UNWIND()` towards the fuse-bridge. + +On a translator tree with write-behind, control flow is like this: + +1. application makes a `write()` system call. +2. VFS ==> FUSE ==> `/dev/fuse`. +3. fuse-bridge initiates a glusterfs `writev()` call. +4. `writev()` is `STACK_WIND()`ed up to write-behind translator. +5. write-behind adds the write buffer to its internal queue and does a `STACK_UNWIND()` towards the fuse-bridge. + +write call is completed in application's percepective. after +`STACK_UNWIND()`ing towards the fuse-bridge, write-behind initiates a fresh +writev() call to its child translator, whose replies will be consumed by +write-behind itself. Write-behind _doesn't_ cache the write buffer, unless +`option flush-behind on` is specified in volume specification file. + +Windowing +--------- + +With respect to write-behind, each write-buffer has three flags: `stack_wound`, `write_behind` and `got_reply`. + +* `stack_wound`: if set, indicates that write-behind has initiated `STACK_WIND()` towards child translator. +* `write_behind`: if set, indicates that write-behind has done `STACK_UNWIND()` towards fuse-bridge. +* `got_reply`: if set, indicates that write-behind has received reply from child translator for a `writev()` `STACK_WIND()`. a request will be destroyed by write-behind only if this flag is set. + +Currently pending write requests = aggregate size of requests with write_behind = 1 and got_reply = 0. + +window size limits the aggregate size of currently pending write requests. once +the pending requests' size has reached the window size, write-behind blocks +writev() calls from fuse-bridge. Blocking is only from application's +perspective. Write-behind does `STACK_WIND()` to child translator +straight-away, but hold behind the `STACK_UNWIND()` towards fuse-bridge. +`STACK_UNWIND()` is done only once write-behind gets enough replies to +accommodate for currently blocked request. + +Flush behind +------------ + +If `option flush-behind on` is specified in volume specification file, then +write-behind sends aggregate write requests to child translator, instead of +regular per request `STACK_WIND()`s. -- cgit