From 9e9e3c5620882d2f769694996ff4d7e0cf36cc2b Mon Sep 17 00:00:00 2001
From: raghavendra talur
Date: Thu, 20 Aug 2015 15:09:31 +0530
Subject: Create basic directory structure

All new feature specs go into the in_progress directory. Once signed
off, a spec should be moved to the done directory.

For now, this change moves all the Gluster 4.0 feature specs to
in_progress. All other specs are under done/release-version.
Further cleanup will be done incrementally.

Change-Id: Id272d301ba8c434cbf7a9a966ceba05fe63b230d
BUG: 1206539
Signed-off-by: Raghavendra Talur
Reviewed-on: http://review.gluster.org/11969
Reviewed-by: Humble Devassy Chirammal
Reviewed-by: Prashanth Pai
Tested-by: Humble Devassy Chirammal
---
 in_progress/Compression Dedup.md | 128 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 in_progress/Compression Dedup.md

diff --git a/in_progress/Compression Dedup.md b/in_progress/Compression Dedup.md
new file mode 100644
index 0000000..7829018
--- /dev/null
+++ b/in_progress/Compression Dedup.md
@@ -0,0 +1,128 @@
+Feature
+-------
+
+Compression / Deduplication
+
+Summary
+-------
+
+In the never-ending quest to increase storage efficiency (or conversely
+to decrease storage cost), we could compress and/or deduplicate data
+stored on bricks.
+
+Owners
+------
+
+Jeff Darcy
+
+Current status
+--------------
+
+Just a vague idea so far.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+TBD
+
+Detailed Description
+--------------------
+
+Compression and deduplication for GlusterFS have been discussed many
+times. Deduplication across machines/bricks is a recognized Hard
+Problem, with uncertain benefits, and is thus considered out of scope.
+Deduplication within a brick is potentially achievable by using
+something like
+[lessfs](http://sourceforge.net/projects/lessfs/files/),
+which is itself a FUSE filesystem, so one fairly simple approach would
+be to integrate lessfs as a translator. There's no similar option for
+compression.
+
+In both cases, it's generally preferable to work on fully expanded files
+while they're open, and then compress/dedup when they're closed. Some of
+the bitrot or tiering infrastructure might be useful for moving files
+between these states, or detecting when such a change is needed. There
+are also some interesting interactions with quota, since we need to
+count the uncompressed, undeduplicated size of the file against quota
+(or do we?) and that's not what the underlying local file system will
+report.
+
+Benefit to GlusterFS
+--------------------
+
+Less \$\$\$/GB for our users.
+
+Scope
+-----
+
+### Nature of proposed change
+
+New translators, hooks into bitrot/tiering/quota, probably new daemons.
+
+### Implications on manageability
+
+Besides turning these options on or off, or setting parameters, there
+will probably need to be some way of reporting the real vs.
+compressed/deduplicated size of files/bricks/volumes.
+
+### Implications on presentation layer
+
+Should be none.
+
+### Implications on persistence layer
+
+If the device-mapper (DM) folks ever get their act together on this
+front, we might be able to use some of their stuff instead of lessfs.
+That worked so well for thin provisioning and snapshots.
+
+### Implications on 'GlusterFS' backend
+
+What's on the brick will no longer match the data that the user stored
+(and might some day retrieve). In the case of compression,
+reconstituting the user-visible version of the data should be a simple
+matter of decompressing via a well-known algorithm. In the case of
+deduplication, the relevant data structures are much more complicated
+and reconstitution will be correspondingly more difficult.
+
+### Modification to GlusterFS metadata
+
+Some of the information tracking deduplicated blocks will probably be
+stored "privately" in .glusterfs or similar.
+
+### Implications on 'glusterd'
+
+TBD
+
+How To Test
+-----------
+
+TBD
+
+User Experience
+---------------
+
+Mostly unchanged, except for performance. As with erasure coding, a
+compressed/deduplicated slow tier will usually need to be paired with a
+simpler fast tier for overall performance to be acceptable.
+
+Dependencies
+------------
+
+External: lessfs, DM, whatever other technology we use to do the
+low-level work
+
+Internal: tiering/bitrot (perhaps changelog?) to track state and detect
+changes
+
+Documentation
+-------------
+
+TBD
+
+Status
+------
+
+Still just a vague idea.
+
+Comments and Discussion
+-----------------------
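The spec's backend section says compressed data can be reconstituted by plain decompression, while deduplicated data needs more complicated bookkeeping. Its manageability and quota notes also point out that the user-visible size differs from what the brick actually stores. The Python sketch below illustrates those points under loudly stated assumptions. It uses fixed 4 KiB blocks, SHA-256 content hashes, zlib compression of each unique block, and a "compress/dedup on close" step modeled as `write_file`. The `DedupStore` class and all of its methods are hypothetical. The sketch is not how lessfs, DM, or any GlusterFS translator works. It only shows why a per-file block map plus reference counts is needed to reconstitute and safely delete deduplicated data, and why logical size and stored size diverge.

```python
import hashlib
import zlib

# Assumption: fixed-size blocks; real dedup engines may use variable-size chunking.
BLOCK_SIZE = 4096


class DedupStore:
    """Hypothetical per-brick store: each unique block is kept once, compressed."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}     # digest -> zlib-compressed block
        self.refcounts: dict[str, int] = {}    # digest -> number of references to it
        self.files: dict[str, list[str]] = {}  # file name -> ordered list of digests
        self.sizes: dict[str, int] = {}        # file name -> logical (user-visible) size

    def write_file(self, name: str, data: bytes) -> None:
        """Models the on-close step: split into blocks, deduplicate, compress."""
        self.delete_file(name)  # rewriting a file releases its old block references
        digests = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # first copy of this content: store it
                self.blocks[digest] = zlib.compress(block)
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
            digests.append(digest)
        self.files[name] = digests
        self.sizes[name] = len(data)

    def read_file(self, name: str) -> bytes:
        """Reconstitutes the user-visible data by following the block map."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in self.files[name])

    def delete_file(self, name: str) -> None:
        """Drops a file's references; frees a block only when nothing uses it."""
        for digest in self.files.pop(name, []):
            self.refcounts[digest] -= 1
            if self.refcounts[digest] == 0:
                del self.refcounts[digest]
                del self.blocks[digest]
        self.sizes.pop(name, None)

    def logical_size(self) -> int:
        """Uncompressed, undeduplicated total: what quota would presumably charge."""
        return sum(self.sizes.values())

    def stored_size(self) -> int:
        """Bytes actually held on the brick by the compressed unique blocks."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    payload = b"A" * (BLOCK_SIZE * 4)
    store.write_file("a.img", payload)
    store.write_file("b.img", payload)  # identical content adds no new blocks
    assert store.read_file("b.img") == payload
    print("logical:", store.logical_size(), "stored:", store.stored_size())
    print("unique blocks:", len(store.blocks))
```

In this sketch compression is applied per unique block after deduplication. Compressing whole files first would generally make block-level matching much less effective, because a small change near the start of a file tends to alter the compressed bytes that follow it. The ordering is a design choice a real implementation would need to weigh.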