under_review/Compression Dedup.md



Feature
-------

Compression / Deduplication

Summary
-------

In the never-ending quest to increase storage efficiency (or conversely
to decrease storage cost), we could compress and/or deduplicate data
stored on bricks.

Owners
------

Jeff Darcy <jdarcy@redhat.com>

Current status
--------------

Just a vague idea so far.

Related Feature Requests and Bugs
---------------------------------

TBD

Detailed Description
--------------------

Compression and deduplication for GlusterFS have been discussed many
times. Deduplication across machines/bricks is a recognized Hard
Problem, with uncertain benefits, and is thus considered out of scope.
Deduplication within a brick is potentially achievable by using
something like
[lessfs](http://sourceforge.net/projects/lessfs/files/),
which is itself a FUSE filesystem, so one fairly simple approach would
be to integrate lessfs as a translator. There's no similar option for
compression.

In both cases, it's generally preferable to work on fully expanded files
while they're open, and then compress/dedup when they're closed. Some of
the bitrot or tiering infrastructure might be useful for moving files
between these states, or detecting when such a change is needed. There
are also some interesting interactions with quota, since we need to
count the uncompressed, un-deduplicated size of the file against quota
(or do we?) and that's not what the underlying local file system will
report.
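
To make the quota question concrete, here is a minimal Python sketch of
the two ways usage could be counted. Everything in it is an assumption
for illustration only: `StoredFile`, `compress_on_close` and
`quota_usage` are hypothetical names, zlib merely stands in for whatever
compressor might be chosen, and nothing here reflects existing GlusterFS
code.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredFile:
    payload: bytes      # bytes that would actually sit on the brick
    logical_size: int   # size of the data as the user wrote it


def compress_on_close(data: bytes) -> StoredFile:
    """Hypothetical close-time hook: compress, but remember the logical size."""
    return StoredFile(payload=zlib.compress(data), logical_size=len(data))


def quota_usage(files, count_logical=True):
    """Sum usage by logical size, or by the stored (compressed) size."""
    if count_logical:
        return sum(f.logical_size for f in files)
    return sum(len(f.payload) for f in files)


if __name__ == "__main__":
    f = compress_on_close(b"a" * 4096)
    print("logical:", quota_usage([f]))
    print("stored: ", quota_usage([f], count_logical=False))
```

The point of the sketch is only that the logical size has to be recorded
somewhere at compression time, because it can no longer be derived from
the size of what is stored.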

Benefit to GlusterFS
--------------------

Less \$\$\$/GB for our users.

Scope
-----

### Nature of proposed change

New translators, hooks into bitrot/tiering/quota, probably new daemons.

### Implications on manageability

Besides turning these options on or off, or setting parameters, there
will probably need to be some way of reporting the real vs.
compressed/deduplicated size of files/bricks/volumes.
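
As a rough illustration of what such a report could draw on, the sketch
below compares a file's apparent size with the space the local file
system says it has allocated. `size_report` is a hypothetical helper,
and whether a given compressing or deduplicating backend actually
reflects its savings in `st_blocks` is an assumption that would need to
be checked per backend.

```python
import os


def size_report(path):
    """Return the apparent and allocated size of a file, in bytes."""
    st = os.stat(path)
    return {
        "apparent": st.st_size,            # size the user sees
        "allocated": st.st_blocks * 512,   # st_blocks counts 512-byte units
    }


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, size_report(p))
```

This relies on `st_blocks`, which is available on Unix-like systems but
not on every platform.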

### Implications on presentation layer

Should be none.

### Implications on persistence layer

If the device-mapper (DM) folks ever get their <expletive deleted>
together on this front, we might be able to use some of their stuff
instead of lessfs.
That worked so well for thin provisioning and snapshots.

### Implications on 'GlusterFS' backend

What's on the brick will no longer match the data that the user stored
(and might some day retrieve). In the case of compression,
reconstituting the user-visible version of the data should be a simple
matter of decompressing via a well known algorithm. In the case of
deduplication, the relevant data structures are much more complicated
and reconstitution will be correspondingly more difficult.
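
To show why deduplicated data is harder to reconstitute, here is a toy
Python sketch of one common approach: fixed-size chunks addressed by
their SHA-256 digest, plus a per-file "recipe" listing the chunks in
order. This is not how lessfs or any proposed translator works;
`ChunkStore`, `CHUNK_SIZE` and the in-memory dictionaries are
assumptions made purely for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class ChunkStore:
    """Toy content-addressed store for deduplication within one brick."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes, stored once
        self.recipes = {}   # file name -> ordered list of digests

    def put(self, name, data):
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep first copy only
            recipe.append(digest)
        self.recipes[name] = recipe

    def get(self, name):
        # Reconstitution needs both the recipe and every chunk it names.
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore()
    data = b"x" * 8192 + b"y" * 4096
    store.put("a", data)
    store.put("b", data)
    print("logical:", 2 * len(data), "stored:", store.stored_bytes())
```

Unlike the compression case, a single file here cannot be recovered from
its own on-brick data alone: the recipe and the shared chunk table must
both be intact.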

### Modification to GlusterFS metadata

Some of the information tracking deduplicated blocks will probably be
stored "privately" in .glusterfs or similar.

### Implications on 'glusterd'

TBD

How To Test
-----------

TBD

User Experience
---------------

Mostly unchanged, except for performance. As with erasure coding, a
compressed/deduplicated slow tier will usually need to be paired with a
simpler fast tier for overall performance to be acceptable.

Dependencies
------------

External: lessfs, DM, whatever other technology we use to do the
low-level work

Internal: tiering/bitrot (perhaps changelog?) to track state and detect
changes

Documentation
-------------

TBD

Status
------

Still just a vague idea.

Comments and Discussion
-----------------------