diff options
Diffstat (limited to 'Feature Planning/GlusterFS 3.7/Data Classification.md')
-rw-r--r-- | Feature Planning/GlusterFS 3.7/Data Classification.md | 279 |
1 files changed, 279 insertions, 0 deletions
diff --git a/Feature Planning/GlusterFS 3.7/Data Classification.md b/Feature Planning/GlusterFS 3.7/Data Classification.md new file mode 100644 index 0000000..a3bb35c --- /dev/null +++ b/Feature Planning/GlusterFS 3.7/Data Classification.md @@ -0,0 +1,279 @@ +Goal +---- + +Support tiering and other policy-driven (as opposed to pseudo-random) +placement of files. + +Summary +------- + +"Data classification" is an umbrella term covering things: +locality-aware data placement, SSD/disk or +normal/deduplicated/erasure-coded data tiering, HSM, etc. They share +most of the same infrastructure, and so are proposed (for now) as a +single feature. + +NB this has also been referred to as "DHT on DHT" in various places, +though "unify on DHT" might be more accurate. + +Owners +------ + +Dan Lambright <dlambrig@redhat.com> + +Joseph Fernandes <josferna@redhat.com> + +Current status +-------------- + +Cache tiering under development upstream. Tiers may be added to existing +volumes. Tiers are made up of bricks. + +Volume-granularity tiering has been prototyped (bugzilla \#9387) and +merged in a branch (origin/fix\_9387) to the cache tiering forge +project. This will allow existing volumes to be combined into a single +one offering both functionality. + +Related Feature Requests and Bugs +--------------------------------- + +N/A + +Detailed Description +-------------------- + +The basic idea is to layer multiple instances of a modified DHT +translator on top of one another, each making placement/rebalancing +decisions based on different criteria. The current consistent-hashing +method is one possibility. Other possibilities involve matching +file/directory characteristics to subvolume characteristics. + +- File/directory characteristics: size, age, access rate, type + (extension), ... + +- Subvolume characteristics: physical location, storage type (e.g. + SSD/disk/PCM, cache), encoding method (e.g. erasure coded or + deduplicated). + +- Either (arbitrary tags assigned by user): owner, security level, + HIPPA category + +For example, a first level might redirect files based on security level, +a second level might match age or access rate vs. SSD-based or +disk-based subvolumes, and then a third level might use consistent +hashing across several similarly-equipped bricks. + +### Cache tier + +The cache tier will support data placement based on access frequency. +Frequently accessed files shall exist on a "hot" subvolume. Infrequently +accessed files shall reside on a "cold" subvolume. Files will migrate +between the hot and cold subvolumes according to observed usage. + +Read caching is a desired future enhancement. + +When the "cold" subvolume is expensive to use (e.g. erasure coded), this +feature will mitigate its overhead for many workloads. + +Some use cases: + +- fast subvolumes are SSDs, slow subvolumes are normal disks +- fast subvolumes are normal disks, slow subvolumes are erasure coded. +- fast subvolume is backed up more frequently than slow tier. +- read caching only , good in cases where migration overhead is + unacceptable + +Benefit to GlusterFS +-------------------- + +By itself, data classification can be used to improve performance (by +optimizing where "hot" files are placed) and security or regulatory +compliance (by placing sensitive data only on the most secure storage). +It also serves as an enabling technology for other enhancements by +allowing users to combine more cost-effective or archivally oriented +storage for the majority of their data with higher-performance storage +to absorb the majority of their I/O load. This enabling effect applies +e.g. to compression, deduplication, erasure coding, or bitrot detection. + +Scope +----- + +### Nature of proposed change + +The most basic set of changes involves making the data-placement part of +DHT more modular, and providing modules/plugins to do the various kinds +of intelligent placement discussed above. Other changes will be +explained in subsequent sections. + +### Implications on manageability + +Eventually, the CLI must provide users with a way to arrange bricks into +a hierarchy, and assign characteristics such as storage type or security +level at any level within that hierarchy. They must also be able to +express which policy (plugin), with which parameters, should apply to +any level. A data classification language has been proposed to help +express these concepts, see link above. + +The cache tier's graph is more rigid and can be expressed using the +"volume attach-cache" command described below. Both a "hot" tier and +"cold tier" are made up of dispersed / distributed / replicated bricks +in the same manner as a normal volume, and they are combined with the +tier translator. + +#### Cache Tier + +An "attach" command will declare an existing volume as "cold" and create +a new "hot" volume which is appended to it. Together, the combination is +a single "cache tiered" volume. For example: + +gluser volume attach-tier [name] [redundancy \#] brick1 brick2 .. brickN + +.. will attach a hot tier made up of brick[1..N] to existing volume +[name]. + +The tier can be detached. Data is first migrated off the hot volume, in +the same manner as brick removal, and then the hot volume is removed +from the volfile. + +gluster volume detach-tier brick1,...,brickN + +To start cache tiering: + +gluster volume rebalance [name] tier start + +Enable the change time recorder: + +gluster voiume set [name] features.ctr-enabled on + +Other cache parameters: + +tier-demote-frequency - how often thread wakes up to demote data + +tier-promote-frequency - as above , to promote data + +To stop it: + +gluster volume rebalance [name] tier stop + +To get status: + +gluster volume rebalance [name] tier status + +upcoming: + +A "pause-tier" command will allow users to stop using the hot tier. +While paused, data will be migrated off the hot tier to the cold tier, +and all I/Os will be forwarded to the cold tier. A status CLI will +indicate how much data remains to be "flushed" from the hot tier to the +cold tier. + +### Implications on presentation layer + +N/A + +### Implications on persistence layer + +N/A + +### Implications on 'GlusterFS' backend + +A tiered volume is a new volume type. + +Simple rules may be represented using volume "options" key-value in the +volfile. Eventually, for more elaborate graphs, some information about a +brick's characteristics and relationships (within the aforementioned +hierarchy) may be stored on the bricks themselves as well as in the +glusterd configuration. In addition, the volume's "info" file may +include an adjacency list to represent more elaborate graphs. + +### Modification to GlusterFS metadata + +There are no plans to change meta-data for the cache tier. However in +the future, categorizing files and directories (especially with +user-defined tags) may require additional xattrs. + +### Implications on 'glusterd' + +Finally, volgen must be able to convert these specifications into a +corresponding hierarchy of translators and options for those +translators. + +Adding and removing tiers dynamically closely resembles the add and +remove brick operations. + +How To Test +----------- + +Eventually, new tests will be needed to set up multi-layer hierarchies, +create files/directories, issue rebalance commands etc. and ensure that +files end up in the right place(s). Many of the tests are +policy-specific, e.g. to test an HSM policy one must effectively change +files' ages or access rates (perhaps artificially). + +Interoperability tests between the Snap, geo-rep, and quota features are +necessary. + +### Cache tier + +Tests should include: + +Automated tests are under development in the forge repository in file +tier.t + +- The performance of "cache friendly" workloads (e.g. repeated access to +a small set of files) is improved. + +- Performance is not substantially worse in "cache unfriendly" workloads +(e.g. sequential writes over large numbers of files.) + +- Performance should not become substantially worse when the hot tier's +bricks become full. + +User Experience +--------------- + +The hierarchical arrangement of bricks, with attributes and policies +potentially at many levels, represents a fundamental change to the +current "sea of identical bricks" model. Eventually, some commands that +currently apply to whole volumes will need to be modified to work on +sub-volume-level groups (or even individual bricks) as well. + +The cache tier must provide statistics on data migration. + +Dependencies +------------ + +Documentation +------------- + +See below. + +Status +------ + +Cache tiering implementation in progress for 3.7; some bits for more +general DC also done (fix 9387). + +- [Syntax + proposal](https://docs.google.com/presentation/d/1e8tuh9DKNi9eCMrdt5vetppn1D3BiJSmfR7lDW2wRvA/edit#slide=id.p) + (dormant). +- [Syntax prototype](https://forge.gluster.org/data-classification) + (dormant, not part of cache tiering). +- [Cache tier + design](https://docs.google.com/document/d/1cjFLzRQ4T1AomdDGk-yM7WkPNhAL345DwLJbK3ynk7I/edit) +- [Bug 763746](https://bugzilla.redhat.com/763746) - We need an easy + way to alter client configs without breaking DVM +- [Bug 905747](https://bugzilla.redhat.com/905747) - [FEAT] Tier + support for Volumes +- [Working tree for + tiering](https://forge.gluster.org/data-classification/data-classification) +- [Volgen changes for general DC](http://review.gluster.org/#/c/9387/) +- [d\_off changes to allow stacked + DHTs](https://www.mail-archive.com/gluster-devel%40gluster.org/msg03155.html) + (prototyped) +- [Video on the concept](https://www.youtube.com/watch?v=V4cvawIv1qA) + Efficient Data Maintenance in GlusterFS using DataBases : Data + Classification as the case study + +Comments and Discussion +----------------------- |