From ee1c8b52721ce815bc98fd60a6b0e867848c8d79 Mon Sep 17 00:00:00 2001
From: Susant Palai
Date: Sun, 26 Nov 2017 11:49:48 +0530
Subject: cloudArchival: Added feature page and design document

Change-Id: Iff9025dc28ae1b12213b564903b03001251e8aff
Signed-off-by: Susant Palai
Reviewed-on: https://review.gluster.org/18854
Reviewed-by: Ashish Pandey
Reviewed-by: Amar Tumballi
Tested-by: Amar Tumballi
Reviewed-by: Niels de Vos
---
 accepted/CloudArchival.md             | 84 +++++++++++++++++++++++++++++++++++
 design/Cloud-Archival/CloudArchive.md | 60 +++++++++++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100644 accepted/CloudArchival.md
 create mode 100644 design/Cloud-Archival/CloudArchive.md

diff --git a/accepted/CloudArchival.md b/accepted/CloudArchival.md
new file mode 100644
index 0000000..ed25fea
--- /dev/null
+++ b/accepted/CloudArchival.md
@@ -0,0 +1,84 @@
+# CloudArchival
+
+### Goal
+
+A new cloud archival capability for GlusterFS.
+
+### Summary
+The feature archives cold data to cloud storage. For applications where the
+majority of the data is rarely accessed or modified, that data can be moved to
+low-cost cloud storage, freeing space on the local storage system (GlusterFS)
+for files that need high-performance operations.
+
+### Owners
+
+Aravinda Krishna Murthy
+
+Susant Kumar Palai
+
+### Current Status Feature under development
+
+### Detailed Description
+
+A scanner/uploader tool will run a policy-based (tunable) scan and upload
+files to the cloud storage. After the data has been migrated to the cloud, the
+downloader xlator will truncate the file and store its size as an xattr. Any
+metadata operation will be served locally from GlusterFS until the next data
+modification request. On a data modification, the request will be stubbed and
+the downloader will download the file from the cloud. Upon success, the stubbed
+request will be resumed.
+
+### Benefits to GlusterFS
+This archival feature will greatly benefit users for whom the majority of the
+data in the storage system is cold. With it, users can dedicate the in-house
+GlusterFS space to high-performance jobs.
+
+### Scope
+
+### Nature of proposed change
+
+- Uploader tool - Scans the file system and uploads files to the cloud based
+  on a user-defined policy.
+
+- Downloader xlator - Intercepts data modification requests on a file that
+  resides in the cloud, initiates a download of the file, and resumes the
+  data modification request once the download completes.
+
+### Implications on manageability
+At a high level, commands to enable and configure the downloader xlator.
+
+### Implications on presentation layer
+N/A
+
+### Implications on persistence layer
+N/A
+
+### Implications on 'GlusterFS' backend
+None
+
+### Modification to GlusterFS metadata
+After archival, a size xattr will be set on the file to serve metadata
+requests, since the local copy of the file will have been truncated.
+
+### Implications on 'glusterd'
+Volgen must be able to configure the downloader xlator and store information
+about the cloud provider and how to access it.
+
+### How to Test
+
+N/A
+
+### User Experience
+Minimal change, mostly related to new options. Some latency will be experienced
+while the file is being downloaded from the cloud during data modification.
+
+### Dependencies N/A
+
+### Documentation TBD.
+
+### Status
+
+Patches being worked on:
+
+- https://review.gluster.org/#/c/18532/ (Downloader Xlator)
diff --git a/design/Cloud-Archival/CloudArchive.md b/design/Cloud-Archival/CloudArchive.md
new file mode 100644
index 0000000..9506429
--- /dev/null
+++ b/design/Cloud-Archival/CloudArchive.md
@@ -0,0 +1,60 @@
+# CloudArchival-Design.md
+
+This document gives a high-level overview of CloudArchival. The design is being
+refined as we go along, and this document will be updated along the way.
+
+## Introduction
+
+This design addresses the use case where data that requires high-speed access is
+retained internally (i.e. in GlusterFS) while lower-priority data is moved to
+low-cost, cloud-based archive storage. This reduces storage cost for use cases
+where the majority of the data is cold and can be archived.
+
+## Architectural Overview
+
+CloudArchival has two components: a scanner/uploader tool and a downloader
+xlator in the GlusterFS stack.
+
+### 1. Scanner/uploader
+
+This tool scans the file system and, based on a policy, uploads the data
+to a predetermined cloud storage. The policy can be user-defined. A simple
+example would be: upload any file that has not been accessed for one month.
+
+### 2. Downloader
+
+This xlator downloads the file from cloud storage when a read/write request
+(basically any data modification) is made. This xlator will be placed on the
+client side, as the AFR and EC xlators are client-side xlators.
+
+## Work Flow
+
+- Phase I - After scanning, the uploader selects the files to be archived to
+  the cloud. Once the data migration to the cloud is complete, the uploader
+  performs a setxattr operation on the file to tell the downloader xlator to
+  truncate the data. As part of this step, the downloader stores the size
+  information as an xattr on the file (to serve lookup, stat, etc.) and then
+  truncates the data.
+
+
+- Phase II - While the data resides in the cloud, all metadata operations can
+  be performed locally on GlusterFS. The data will be downloaded only when a
+  data modification is requested. For a read/write request, the downloader
+  stubs the request and starts downloading the file from the cloud. Upon a
+  successful download, the stubbed request is resumed.
+
+## Cloud Information and Security
+
+Cloud information, such as the cloud provider and its access information, can
+be stored on a per-volume basis through glusterd. Only one cloud storage can be
+attached to a volume.
+
+Since the communication channel to the cloud needs to be secured, the access
+information for the cloud must reside on the trusted storage pool.
+GF-proxy fits this requirement nicely, as it runs on the trusted storage pool
+(as of now). Hence, the downloader will be part of the GF-proxy daemon on the
+trusted storage pool.
+
+#### Note: Initial implementation will integrate with Amazon Web Services (AWS).
+Integration with other cloud storage providers is left open for development by
+the community.
-- 
cgit
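The Scanner/uploader section of the design describes a policy-driven scan, with "upload any file that has not been accessed for one month" as its example policy. The Python sketch below covers only that selection step and omits the upload. It rests on several assumptions: the function name `cold_files` is hypothetical, "one month" is approximated as 30 days, and `st_atime` is used as the access signal. None of these reflect the actual tool's interface.

```python
import os
import stat
import time
from typing import Iterator, Optional

# "One month" from the design example, approximated as 30 days (assumption).
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60


def cold_files(root: str,
               max_idle_seconds: float = ONE_MONTH_SECONDS,
               now: Optional[float] = None) -> Iterator[str]:
    """Yield regular files under `root` whose last access time is at least
    `max_idle_seconds` in the past. Hypothetical helper, not the real scanner."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: do not follow symlinks, so links are never archived.
                st = os.stat(path, follow_symlinks=False)
            except FileNotFoundError:
                continue  # file was removed between listing and stat
            # Only regular files carry data worth archiving.
            if not stat.S_ISREG(st.st_mode):
                continue
            if now - st.st_atime >= max_idle_seconds:
                yield path
```

Access times are only as reliable as the mount options allow. With `relatime`, atime is refreshed on access only if it is older than mtime/ctime or more than 24 hours old. With `noatime`, it is not updated at all. A production policy may therefore need other signals.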
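Phases I and II of the design's Work Flow together describe a small state machine. After the upload, the file is truncated and its size is kept in an xattr. Metadata is then answered from that xattr, while data requests are queued ("stubbed") until the download finishes and then replayed. The in-memory Python model below sketches that behaviour. It is not the xlator, which would be C code inside the GlusterFS stack. The class names, the `complete_download` hook, and the xattr key `trusted.cloudarchival.size` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

# Hypothetical xattr name; the design does not specify the real key.
SIZE_XATTR = "trusted.cloudarchival.size"


@dataclass
class BrickFile:
    """In-memory stand-in for a file's local data and xattrs."""
    data: bytearray = field(default_factory=bytearray)
    xattrs: Dict[str, int] = field(default_factory=dict)


class DownloaderModel:
    """Toy model of the downloader's behaviour; not the real xlator."""

    def __init__(self, cloud: Dict[str, bytes]) -> None:
        self.cloud = cloud                    # stand-in object store: path -> bytes
        self.files: Dict[str, BrickFile] = {}
        self.stubs: Dict[str, Deque[Callable[[], None]]] = {}

    # Phase I: the uploader has copied `path` to the cloud and sent the
    # setxattr trigger; record the logical size, then truncate local data.
    def archive(self, path: str) -> None:
        f = self.files[path]
        f.xattrs[SIZE_XATTR] = len(f.data)
        del f.data[:]

    def is_archived(self, path: str) -> bool:
        return SIZE_XATTR in self.files[path].xattrs

    # Metadata (here just the size) is always answered locally.
    def stat_size(self, path: str) -> int:
        f = self.files[path]
        return f.xattrs.get(SIZE_XATTR, len(f.data))

    # Phase II: data operations on an archived file are stubbed (queued).
    def _submit(self, path: str, op: Callable[[], None]) -> None:
        if self.is_archived(path):
            # A real implementation would also start the download here.
            self.stubs.setdefault(path, deque()).append(op)
        else:
            op()

    def write(self, path: str, offset: int, buf: bytes) -> None:
        def op() -> None:
            f = self.files[path]
            end = offset + len(buf)
            if len(f.data) < end:
                f.data.extend(b"\0" * (end - len(f.data)))
            f.data[offset:end] = buf
        self._submit(path, op)

    def read(self, path: str, offset: int, size: int,
             reply: Callable[[bytes], None]) -> None:
        def op() -> None:
            reply(bytes(self.files[path].data[offset:offset + size]))
        self._submit(path, op)

    # Called when the download succeeds: restore data, resume stubs in order.
    def complete_download(self, path: str) -> None:
        f = self.files[path]
        f.data[:] = self.cloud[path]
        del f.xattrs[SIZE_XATTR]
        pending = self.stubs.pop(path, deque())
        while pending:
            pending.popleft()()
```

Replaying the queue in arrival order means a write stubbed before a read is applied before that read is answered. The design only describes the success path. A real implementation would also have to decide how stubbed requests fail when the download fails.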
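The Cloud Information and Security section sets two constraints: cloud settings are stored per volume, and only one cloud storage can be attached to a volume. The Python sketch below models just those two rules. The field names (`provider`, `bucket`, `credentials_ref`) and the class `VolumeCloudRegistry` are hypothetical and are not glusterd options or APIs. Storing only a reference to the credentials, rather than the secret itself, is one assumed way to honour the requirement that access information stays on the trusted storage pool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CloudTarget:
    provider: str         # e.g. "aws", the provider targeted initially
    bucket: str           # hypothetical: where archived objects are stored
    credentials_ref: str  # hypothetical: key of a secret kept on the trusted pool


class VolumeCloudRegistry:
    """Per-volume cloud settings with at most one target per volume (sketch)."""

    def __init__(self) -> None:
        self._targets: Dict[str, CloudTarget] = {}

    def attach(self, volume: str, target: CloudTarget) -> None:
        current = self._targets.get(volume)
        # Enforce the single-cloud-storage-per-volume rule.
        if current is not None and current != target:
            raise ValueError(
                f"volume {volume!r} already has a cloud storage attached")
        self._targets[volume] = target

    def detach(self, volume: str) -> None:
        self._targets.pop(volume, None)

    def target_for(self, volume: str) -> Optional[CloudTarget]:
        return self._targets.get(volume)
```

Re-attaching the same target is accepted, so the operation is idempotent. Switching to a different store requires an explicit detach first, which is one way (not necessarily the planned one) to enforce the single-store rule.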