summaryrefslogtreecommitdiffstats
path: root/done/GlusterFS 3.5/Zerofill.md
diff options
context:
space:
mode:
Diffstat (limited to 'done/GlusterFS 3.5/Zerofill.md')
-rw-r--r--done/GlusterFS 3.5/Zerofill.md192
1 files changed, 192 insertions, 0 deletions
diff --git a/done/GlusterFS 3.5/Zerofill.md b/done/GlusterFS 3.5/Zerofill.md
new file mode 100644
index 0000000..43b279d
--- /dev/null
+++ b/done/GlusterFS 3.5/Zerofill.md
@@ -0,0 +1,192 @@
+Feature
+-------
+
+zerofill API for GlusterFS
+
+Summary
+-------
+
+zerofill() API would allow creation of pre-allocated and zeroed-out
+files on GlusterFS volumes by offloading the zeroing part to server
+and/or storage (storage offloads use SCSI WRITESAME).
+
+Owners
+------
+
+Bharata B Rao
+M. Mohankumar
+
+Current status
+--------------
+
+Patch on gerrit: <http://review.gluster.org/5327>
+
+Detailed Description
+--------------------
+
+Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
+the specified range. This fop will be useful when a whole file needs to
+be initialized with zero (could be useful for zero filled VM disk image
+provisioning or during scrubbing of VM disk images).
+
+Client/application can issue this FOP for zeroing out. Gluster server
+will zero out required range of bytes ie server offloaded zeroing. In
+the absence of this fop, client/application has to repetitively issue
+write (zero) fop to the server, which is very inefficient method because
+of the overheads involved in RPC calls and acknowledgements.
+
+WRITESAME is a SCSI T10 command that takes a block of data as input and
+writes the same data to other blocks and this write is handled
+completely within the storage and hence is known as offload . Linux ,now
+has support for SCSI WRITESAME command which is exposed to the user in
+the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to
+implement this fop. Thus zeroing out operations can be completely
+offloaded to the storage device , making it highly efficient.
+
+The fop takes two arguments offset and size. It zeroes out 'size' number
+of bytes in an opened file starting from 'offset' position.
+
+Benefit to GlusterFS
+--------------------
+
+Benefits GlusterFS in virtualization by providing the ability to quickly
+create pre-allocated and zeroed-out VM disk image by using
+server/storage off-loads.
+
+### Scope
+
+Nature of proposed change
+-------------------------
+
+An FOP supported in libgfapi and FUSE.
+
+Implications on manageability
+-----------------------------
+
+None.
+
+Implications on presentation layer
+----------------------------------
+
+N/A
+
+Implications on persistence layer
+---------------------------------
+
+N/A
+
+Implications on 'GlusterFS' backend
+-----------------------------------
+
+N/A
+
+Modification to GlusterFS metadata
+----------------------------------
+
+N/A
+
+Implications on 'glusterd'
+--------------------------
+
+N/A
+
+How To Test
+-----------
+
+Test server offload by measuring the time taken for creating a fully
+allocated and zeroed file on Posix backend.
+
+Test storage offload by measuring the time taken for creating a fully
+allocated and zeroed file on BD backend.
+
+User Experience
+---------------
+
+Fast provisioning of VM images when GlusterFS is used as a file system
+backend for KVM virtualization.
+
+Dependencies
+------------
+
+zerofill() support in BD backend depends on the new BD translator -
+<http://review.gluster.org/#/c/4809/>
+
+Documentation
+-------------
+
+This feature add support for a new ZEROFILL fop. Zerofill writes zeroes
+to a file in the specified range. This fop will be useful when a whole
+file needs to be initialized with zero (could be useful for zero filled
+VM disk image provisioning or during scrubbing of VM disk images).
+
+Client/application can issue this FOP for zeroing out. Gluster server
+will zero out required range of bytes ie server offloaded zeroing. In
+the absence of this fop, client/application has to repetitively issue
+write (zero) fop to the server, which is very inefficient method because
+of the overheads involved in RPC calls and acknowledgements.
+
+WRITESAME is a SCSI T10 command that takes a block of data as input and
+writes the same data to other blocks and this write is handled
+completely within the storage and hence is known as offload . Linux ,now
+has support for SCSI WRITESAME command which is exposed to the user in
+the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to
+implement this fop. Thus zeroing out operations can be completely
+offloaded to the storage device , making it highly efficient.
+
+The fop takes two arguments offset and size. It zeroes out 'size' number
+of bytes in an opened file starting from 'offset' position.
+
+This feature adds zerofill support to the following areas:
+
+-  libglusterfs
+-  io-stats
+-  performance/md-cache,open-behind
+-  quota
+-  cluster/afr,dht,stripe
+-  rpc/xdr
+-  protocol/client,server
+-  io-threads
+-  marker
+-  storage/posix
+-  libgfapi
+
+Client applications can exploit this fop by using glfs\_zerofill
+introduced in libgfapi.FUSE support to this fop has not been added as
+there is no system call for this fop.
+
+Here is a performance comparison of server offloaded zeofill vs zeroing
+out using repeated writes.
+
+ [root@llmvm02 remote]# time ./offloaded aakash-test log 20
+
+ real    3m34.155s
+ user    0m0.018s
+ sys 0m0.040s
+
+
+  [root@llmvm02 remote]# time ./manually aakash-test log 20
+
+ real    4m23.043s
+ user    0m2.197s
+ sys 0m14.457s
+  [root@llmvm02 remote]# time ./offloaded aakash-test log 25;
+
+ real    4m28.363s
+ user    0m0.021s
+ sys 0m0.025s
+ [root@llmvm02 remote]# time ./manually aakash-test log 25
+
+ real    5m34.278s
+ user    0m2.957s
+ sys 0m18.808s
+
+The argument log is a file which we want to set for logging purpose and
+the third argument is size in GB .
+
+As we can see there is a performance improvement of around 20% with this
+fop.
+
+Status
+------
+
+Patch : <http://review.gluster.org/5327> Status : Merged \ No newline at end of file