From 9e9e3c5620882d2f769694996ff4d7e0cf36cc2b Mon Sep 17 00:00:00 2001
From: raghavendra talur
Date: Thu, 20 Aug 2015 15:09:31 +0530
Subject: Create basic directory structure

All new features specs go into in_progress directory.
Once signed off, it should be moved to done directory.

For now, This change moves all the Gluster 4.0 feature specs to
in_progress. All other specs are under done/release-version.

More cleanup required will be done incrementally.

Change-Id: Id272d301ba8c434cbf7a9a966ceba05fe63b230d
BUG: 1206539
Signed-off-by: Raghavendra Talur
Reviewed-on: http://review.gluster.org/11969
Reviewed-by: Humble Devassy Chirammal
Reviewed-by: Prashanth Pai
Tested-by: Humble Devassy Chirammal
---
 done/GlusterFS 3.6/Server-side Barrier feature.md | 213 ++++++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 done/GlusterFS 3.6/Server-side Barrier feature.md

(limited to 'done/GlusterFS 3.6/Server-side Barrier feature.md')

diff --git a/done/GlusterFS 3.6/Server-side Barrier feature.md b/done/GlusterFS 3.6/Server-side Barrier feature.md
new file mode 100644
index 0000000..c13e25a
--- /dev/null
+++ b/done/GlusterFS 3.6/Server-side Barrier feature.md
@@ -0,0 +1,213 @@
+Server-side barrier feature
+===========================
+
+- Author(s): Varun Shastry, Krishnan Parthasarathi
+- Date: Jan 28 2014
+- Bugzilla:
+- Document ID: BZ1060002
+- Document Version: 1
+- Obsoletes: NA
+
+Abstract
+--------
+
+The snapshot feature needs a mechanism in GlusterFS whereby acknowledgements
+to file operations (FOPs) are held back until snapshots of all the
+bricks of the volume have been taken.
+
+The barrier feature would stop holding back FOPs after a configurable
+'barrier-timeout' interval, in seconds. This prevents an accidental lockdown of
+the volume.
+
+This mechanism should have the following properties:
+
+- Should keep 'barriering' transparent to the applications.
+- Should not acknowledge FOPs that fall into the barrier class.
+  A FOP that, when acknowledged to the application, could lead to the
+  snapshot of the volume becoming inconsistent is a barrier class FOP.
+
+The 'unlink' example below explains how a FOP is classified as barrier
+class.
+
+Consider the following sequence of events, assuming the unlink FOP is not
+barriered, on a replicate volume with two bricks, namely b1 and b2.
+
+                        b1                  b2
+      time   ----------------------------------
+       |     t1      snapshot
+       |     t2      unlink /a           unlink /a
+       \/    t3      mkdir /a            mkdir /a
+             t4                          snapshot
+
+As a result of this sequence, /a is stored as a file in the snapshot of
+b1 but as a directory in the snapshot of b2. This leads to an AFR
+split-brain and, more generally, to an inconsistent volume.
+
+Copyright
+---------
+
+Copyright (c) 2014 Red Hat, Inc.
+
+This feature is licensed under your choice of the GNU Lesser General
+Public License, version 3 or any later version (LGPLv3 or later), or the
+GNU General Public License, version 2 (GPLv2), in all cases as published
+by the Free Software Foundation.
+
+Introduction
+------------
+
+The volume snapshot feature snapshots a volume by snapshotting
+the individual bricks that are available, using the lvm-snapshot
+technology. As part of using lvm-snapshot, the design requires bricks to
+be free from a certain set of modifications (FOPs in the barrier class) to
+avoid inconsistency. This is where the server-side barriering of FOPs
+comes into the picture.
+
+Terminology
+-----------
+
+- barrier(ing) - To make barrier-class FOPs temporarily inactive or
+  disabled.
+- available - A brick is said to be available when the corresponding
+  glusterfsd process is running and serving file operations.
+- FOP - File Operation
+
+High Level Design
+-----------------
+
+### Architecture/Design Overview
+
+- Server-side barriering, for Snapshot, must be enabled/disabled on
+  the bricks of a volume in a synchronous manner, i.e., any command
+  using this would be blocked until barriering is enabled/disabled.
+  The brick process would provide this mechanism via an RPC.
+- Barrier translator would be placed immediately above io-threads
+  translator in the server/brick stack.
+- Barrier translator would queue FOPs when enabled. On disable, the
+  translator dequeues all the FOPs, while serving new FOPs from the
+  application. By default, barriering is disabled.
+- The barrier feature would stop blocking the acknowledgements of FOPs
+  after a configurable 'barrier-timeout' interval, in seconds. This
+  prevents an accidental lockdown of the volume.
+- Operations that fall into the barrier class are listed below. Any
+  FOP not listed below does not fall into this category and hence is
+  not barriered.
+    - rmdir
+    - unlink
+    - rename
+    - [f]truncate
+    - fsync
+    - write with O\_SYNC flag
+    - [f]removexattr
+
+### Design Feature
+
+The following timeline diagram depicts the message exchanges between glusterd
+and a brick during enable and disable of barriering. This diagram assumes
+that the enable operation is synchronous and disable is asynchronous. See
+below for alternatives.
+
+    glusterd (snapshot)                     barrier @ brick
+    ------------------                      ---------------
+    t1      |                                     |
+    t2      |                             continue to pass through
+            |                             all the fops
+    t3  send 'enable'                             |
+    t4      |                             * starts barriering the fops
+            |                             * sends back the ack
+    t5  receive the ack                           |
+            |                                     |
+    t6      |  <take snap>                        |
+            |       .                             |
+            |       .                             |
+            |       .                             |
+            |  </take snap>                       |
+            |                                     |
+    t7  send disable                              |
+        (does not wait for the ack)               |
+    t8      |                             release all the held fops
+            |                             and no more barriering
+            |                                     |
+    t9      |                             continue in PASS_THROUGH mode
+
+Glusterd would send an RPC (described in the API section) to enable
+barriering on a brick, by setting option feature.barrier to 'ON' in the
+barrier translator. This would be performed on all the bricks present in
+that node, belonging to the set of volumes that are being snapshotted.
+
+Disabling of barriering can happen in synchronous or asynchronous mode.
+The choice is left to the consumer of this feature.
+
+On disable, all FOPs queued up will be dequeued.
+Simultaneously, the subsequent barrier request(s) will be served.
+
+The barrier option enable/disable is persisted into the volfile. This
+makes the feature available to consumers in asynchronous mode, like any
+other (configurable) feature.
+
+The barrier feature also has a timeout option, based on which dequeuing
+gets triggered if the consumer fails to send the disable request.
+
+Low-level details of Barrier translator working
+-----------------------------------------------
+
+The translator operates in one of two states, namely QUEUEING and
+PASS\_THROUGH.
+
+When barriering is enabled, the translator moves to the QUEUEING state. It
+queues outgoing FOPs thereafter in the callback path.
+
+When barriering is disabled, the translator moves to the PASS\_THROUGH state
+and does not queue while it is in that state. Additionally, the
+queued FOPs are 'released' when the translator moves from the QUEUEING to
+the PASS\_THROUGH state.
+
+It has a translator-global queue (a doubly linked list, see
+libglusterfs/src/list.h) where the FOPs are queued in the form of a call
+stub (see libglusterfs/src/call-stub.[ch]).
+
+When a FOP has succeeded but the barrier translator fails to queue it in
+the callback, the barrier translator would disable barriering and
+release any queued FOPs. The barrier would inform the consumer about this
+failure on the subsequent disable request.
+
+Interfaces
+----------
+
+### Application Programming Interface
+
+- An RPC procedure is added at the brick side, which allows any client
+  [sic] to set the feature.barrier option of the barrier translator
+  with a given value.
+- Glusterd would use this to turn server-side barriering on for a
+  brick.
+
+Performance Considerations
+--------------------------
+
+- The barriering of FOPs may be perceived as a performance degradation by
+  the applications. Since this is a hard requirement for snapshot, the
+  onus is on the snapshot feature to reduce the window for which
+  barriering is enabled.
+
+### Scalability
+
+- In glusterd, each brick operation is executed in a serial manner.
+  So, the latency of enabling barriering is a function of the number of
+  bricks present on the node for the set of volumes being snapshotted.
+  This is not a scalability limitation of the mechanism of enabling
+  barriering but a limitation in the brick operations mechanism in
+  glusterd.
+
+Migration Considerations
+------------------------
+
+The barrier translator is introduced with op-version 4. It is a
+server-side translator and does not impact older clients even when this
+feature is enabled.
+
+Installation and deployment
+---------------------------
+
+- Barrier xlator is not packaged with glusterfs-server rpm. With this
+  change, it has to be added to the rpm.
--
cgit
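
Reading aid (not part of the patch): the two-state behaviour that the spec describes for the barrier translator can be sketched roughly as below. This is a toy Python model under stated assumptions, not GlusterFS code. The class name `BarrierSketch`, its methods, the `sync` flag standing in for `O_SYNC`, and the explicit `now` timestamps are all hypothetical names chosen for illustration. The real translator queues call stubs in C and is driven by an RPC from glusterd.

```python
from collections import deque

# FOPs the spec lists as barrier class. "write with O_SYNC" is modelled
# separately through a hypothetical `sync` flag.
BARRIER_FOPS = {"rmdir", "unlink", "rename", "truncate", "ftruncate",
                "fsync", "removexattr", "fremovexattr"}


class BarrierSketch:
    """Toy model of the QUEUEING / PASS_THROUGH states (illustrative only)."""

    def __init__(self, timeout):
        self.state = "PASS_THROUGH"   # barriering is disabled by default
        self.timeout = timeout        # stands in for 'barrier-timeout' (seconds)
        self.enabled_at = None
        self.queue = deque()          # acknowledgements currently held back
        self.acked = []               # acknowledgements sent to the application

    def is_barrier_class(self, fop, sync=False):
        return fop in BARRIER_FOPS or (fop == "write" and sync)

    def enable(self, now):
        self.state = "QUEUEING"
        self.enabled_at = now

    def disable(self):
        # Move to PASS_THROUGH and release everything held, in arrival order.
        self.state = "PASS_THROUGH"
        self.enabled_at = None
        while self.queue:
            self.acked.append(self.queue.popleft())

    def tick(self, now):
        # Timeout guard: stop barriering if nobody sent 'disable' in time.
        if self.state == "QUEUEING" and now - self.enabled_at >= self.timeout:
            self.disable()

    def fop_done(self, fop, now, sync=False):
        """Callback path: the FOP has completed; decide whether to ack now."""
        self.tick(now)
        if self.state == "QUEUEING" and self.is_barrier_class(fop, sync):
            self.queue.append(fop)    # hold the acknowledgement back
        else:
            self.acked.append(fop)    # acknowledge immediately


if __name__ == "__main__":
    b = BarrierSketch(timeout=10.0)
    b.enable(now=0.0)
    b.fop_done("unlink", now=1.0)     # barrier class: held back
    b.fop_done("mkdir", now=2.0)      # not barrier class: acknowledged
    b.disable()                       # releases the held unlink
    print(b.acked)
```

The sketch holds back only the acknowledgement, after the operation itself has completed, which mirrors the spec's statement that queueing happens in the callback path. Passing `now` explicitly avoids real timers and keeps the timeout behaviour easy to exercise.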