1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
|
Feature
-------
Compression / Deduplication
Summary
-------
In the never-ending quest to increase storage efficiency (or conversely
to decrease storage cost), we could compress and/or deduplicate data
stored on bricks.
Owners
------
Jeff Darcy <jdarcy@redhat.com>
Current status
--------------
Just a vague idea so far.
Related Feature Requests and Bugs
---------------------------------
TBD
Detailed Description
--------------------
Compression and deduplication for GlusterFS have been discussed many
times. Deduplication across machines/bricks is a recognized Hard
Problem, with uncertain benefits, and is thus considered out of scope.
Deduplication within a brick is potentially achievable by using
something like
[lessfs](http://sourceforge.net/projects/lessfs/files/ "wikilink"),
which is itself a FUSE filesystem, so one fairly simple approach would
be to integrate lessfs as a translator. There's no similar option for
compression.
In both cases, it's generally preferable to work on fully expanded files
while they're open, and then compress/dedup when they're closed. Some of
the bitrot or tiering infrastructure might be useful for moving files
between these states, or detecting when such a change is needed. There
are also some interesting interactions with quota, since we need to
count the un-compressed un-deduplicated size of the file against quota
(or do we?) and that's not what the underlying local file system will
report.
Benefit to GlusterFS
--------------------
Less \$\$\$/GB for our users.
Scope
-----
### Nature of proposed change
New translators, hooks into bitrot/tiering/quota, probably new daemons.
### Implications on manageability
Besides turning these options on or off, or setting parameters, there
will probably need to be some way of reporting the real vs.
compressed/deduplicated size of files/bricks/volumes.
### Implications on presentation layer
Should be none.
### Implications on persistence layer
If the DM folks ever get their <expletive deleted> together on this
front, we might be able to use some of their stuff instead of lessfs.
That worked so well for thin provisioning and snapshots.
### Implications on 'GlusterFS' backend
What's on the brick will no longer match the data that the user stored
(and might some day retrieve). In the case of compression,
reconstituting the user-visible version of the data should be a simple
matter of decompressing via a well known algorithm. In the case of
deduplication, the relevant data structures are much more complicated
and reconstitution will be correspondingly more difficult.
### Modification to GlusterFS metadata
Some of the information tracking deduplicated blocks will probably be
stored "privately" in .glusterfs or similar.
### Implications on 'glusterd'
TBD
How To Test
-----------
TBD
User Experience
---------------
Mostly unchanged, except for performance. As with erasure coding, a
compressed/deduplicated slow tier will usually need to be paired with a
simpler fast tier for overall performance to be acceptable.
Dependencies
------------
External: lessfs, DM, whatever other technology we use to do the
low-level work
Internal: tiering/bitrot (perhaps changelog?) to track state and detect
changes
Documentation
-------------
TBD
Status
------
Still just a vague idea.
Comments and Discussion
-----------------------
|