# DHT lookup optimization
The distribute xlator (DHT) incurs a performance penalty when dealing with
negative lookups. This document explains the problem and the optimization
that GlusterFS provides to alleviate it.
## Negative lookups and issues surrounding them
Negative lookups are lookup operations for entries that are not present in the
volume. In other words, a lookup for a file or directory that does not exist
is a negative lookup.

DHT normally looks up an entry in the hashed subvolume first (based on the
layout). If the entry is not found in the hashed location, DHT fans out a
lookup across all of its subvolumes to ensure that the entry is not present in
another subvolume.

This behavior exists because, if a rebalance is in progress and the layout on
disk is temporarily out of alignment with the actual location of the file, the
entry is still found by the fan out lookup.

Such fan out lookups are costly and typically slow down file creates. This
especially impacts small file performance, where a large number of files are
being added/created in quick succession to the volume.
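
To get a feel for the cost, the following is a rough sketch of the lookup count for a burst of file creates. It is a toy model, not GlusterFS code: the function name `negative_lookup_count` is hypothetical, and it assumes that each create is preceded by exactly one negative lookup and that an unoptimized negative lookup costs one lookup on the hashed subvolume plus one on every subvolume during the fan out.

```shell
#!/bin/sh
# Toy cost model for negative lookups (illustrative only, not GlusterFS code).
# Assumption: unoptimized cost per negative lookup = 1 (hashed subvolume)
# + one lookup per subvolume for the fan out.
negative_lookup_count() {
    subvols=$1    # number of DHT subvolumes
    creates=$2    # number of file creates, one negative lookup each (assumed)
    optimize=$3   # "on" skips the fan out, anything else performs it
    if [ "$optimize" = "on" ]; then
        echo $(( creates ))
    else
        echo $(( creates * (1 + subvols) ))
    fi
}

negative_lookup_count 8 10000 off   # prints 90000
negative_lookup_count 8 10000 on    # prints 10000
```

Under these assumptions the overhead grows linearly with the number of subvolumes, which is why the effect is most visible on wide volumes with many small files.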
## Optimizing lookups in DHT
A balanced volume is either a newly created volume to which no bricks have
been added and from which none have been removed, or a volume that has
undergone expansion (or reduction) of bricks and has since had a full
rebalance run on it.

In such volumes, the fan out lookup behavior can be turned off to speed up
negative lookups, as files are in their respective hashed locations (or at
least their DHT link-to entries are present in the hashed location).

As of GlusterFS 3.7.2, negative lookups are optimized by not performing the
fan out in a balanced volume.

The optimization also detects a cluster that is out of balance (when a
fix-layout is done, or a brick is removed) and automatically turns **on** the
fan out negative lookup behavior, thereby preventing duplicate entry creation
in the volume until the volume is brought into balance again.
## Configuration options to enable optimized lookups
As of Gluster 3.7.2, the following option is provided to enable DHT lookup
optimization:

    Option: cluster.lookup-optimize
    Description: "This option if set to ON enables the optimization of -ve lookups,
    by not doing a lookup on non-hashed subvolumes for files, in case the hashed
    subvolume does not return any result. This option disregards the
    lookup-unhashed setting, when enabled."
    Default: OFF

CLI command to set this option:

    gluster volume set <volname> cluster.lookup-optimize <on/off>
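
As a concrete sketch, the option can be enabled and then read back with `gluster volume get`. The wrapper function `enable_lookup_optimize`, the `GLUSTER` variable, and the volume name `testvol` are hypothetical names introduced for illustration. The script runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export GLUSTER=gluster to run them against a real cluster.
GLUSTER="${GLUSTER:-echo gluster}"

enable_lookup_optimize() {
    volname=$1
    # Turn the optimization on (the command given in this document).
    $GLUSTER volume set "$volname" cluster.lookup-optimize on
    # Read the value back to confirm that the change was accepted.
    $GLUSTER volume get "$volname" cluster.lookup-optimize
}

enable_lookup_optimize testvol   # "testvol" is a placeholder volume name
```

Reading the value back is a cheap sanity check, since the set command is refused when an incompatible node or client is detected.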
### Client compatibility support
As the DHT xlator runs on the client stack of gluster (i.e. on the machine
where the FUSE/NFS Server/Samba Server are running), this configuration
requires that the cluster and the clients are upgraded to version 3.7.2 at
the minimum.

When setting this option, if any Gluster brick node or connected client is of
an older version, the command errors out, stating that an incompatible version
was detected in the cluster, and does not allow the configuration change.

Older clients that connect to the cluster after this option is set also error
out and are unable to mount the volume due to the version incompatibility.
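
Before setting the option, it may help to review which versions are in play. The sketch below is one possible way to gather that information, not a prescribed procedure. The function name `show_versions`, the `GLUSTER` variable, and the volume name `testvol` are hypothetical. It runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export GLUSTER=gluster to run them against a real cluster.
GLUSTER="${GLUSTER:-echo gluster}"

show_versions() {
    volname=$1
    # Installed version on this server node; repeat on every node.
    $GLUSTER --version
    # Clients currently connected to each brick of the volume.
    $GLUSTER volume status "$volname" clients
}

show_versions testvol   # "testvol" is a placeholder volume name
```

On each client machine listed, `glusterfs --version` reports the installed client version.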
### Compatibility with lookup-unhashed setting
In older DHT versions, the configuration option lookup-unhashed emulated a
similar behavior for a balanced cluster. The downside of this option is that
if the cluster grows or becomes unbalanced, there is a risk of losing entry
consistency. The current changes to gluster and specifically in DHT, prevent
this inconsistency from occurring when using the new option (lookup-optimize).
Additionally, if the lookup-optimize option is set, the older lookup-unhashed
setting is ignored by DHT.
## Requirements for the optimization to function
When the lookup-optimize option is enabled, a few prerequisites must be met
before DHT honors it. The following list provides some of these conditions
and ways to meet them.

1. New volume

   A new volume is a volume that has just been created and is unused or not
   started.

   - For a volume that is just created
     - Prerequisite: Before starting and accessing this volume, set the lookup
       optimization to ON
     - Gotchas: All directories that are created after the above setting will
       leverage the negative lookup optimization, except entries in the root
       of the volume.

   NOTE: The root of the volume, or the brick root on each brick of the
   volume, is already created prior to the start of the volume, or the
   ability to set this option. As a result, the root of the volume gains this
   optimization only after the first full rebalance, or is treated as
   equivalent to an existing directory (see (2)-(2) below).

2. Existing volume

   An existing volume is one which is under use, and may have had bricks added
   or removed in its lifetime. In this scenario there are 2 cases where the
   lookup optimization behavior changes.

   Prerequisite: Enable the lookup-optimize option

   1. New directory creation
      - All directories created beyond this point will gain the negative
        lookup optimization
   2. Existing directories
      - Existing directories will not gain the lookup optimization
      - To enable existing directories to also gain the lookup optimization, a
        full rebalance on the volume needs to be performed

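
For the existing-directories case, a full rebalance can be started and monitored roughly as sketched below. The function name `full_rebalance`, the `GLUSTER` variable, and the volume name `testvol` are hypothetical. The script runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export GLUSTER=gluster to run them against a real cluster.
GLUSTER="${GLUSTER:-echo gluster}"

full_rebalance() {
    volname=$1
    # Start a full rebalance (layout fix plus data migration).
    $GLUSTER volume rebalance "$volname" start
    # Check progress; re-run this until the rebalance reports completion.
    $GLUSTER volume rebalance "$volname" status
}

full_rebalance testvol   # "testvol" is a placeholder volume name
```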

The optimization is also bypassed by the code automatically in the following
conditions:

1. Brick removed
   - When a remove-brick is executed for a volume, it immediately triggers a
     rebalance to move data out of the removed bricks. In these circumstances
     the optimization is bypassed and a fan out lookup is performed for
     negative lookups.
   - After the brick is removed, the lookup optimization automatically kicks
     in again.
2. Brick added and only fix-layout is executed
   - When a brick is added and only a fix-layout is executed, the files are
     still not present in the correct hashed locations. As a result, under
     these conditions the lookup optimization is bypassed by DHT.
   - A full rebalance after the fix-layout gets the optimization enabled
     again.

   NOTE: Although fix-layout is deprecated, it is still present and honored,
   so this distinction is presented in this document. This is not an
   endorsement of fix-layout still being supported.
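
The rules in this section can be summarized as a small decision function. This is a toy model of the documented behavior, not DHT code: the function name `negative_lookup_mode` and the state labels are hypothetical. With the option off, the model assumes the default fan out described earlier and ignores the lookup-unhashed setting.

```shell
#!/bin/sh
# Toy model of when DHT skips the fan out (illustrative only).
# State labels are made up for this sketch:
#   balanced        - new volume, or full rebalance completed
#   remove-brick    - remove-brick (and its rebalance) in progress
#   fix-layout-only - brick added, only fix-layout executed
negative_lookup_mode() {
    optimize=$1   # value of cluster.lookup-optimize: "on" or "off"
    state=$2      # one of the state labels above
    if [ "$optimize" = "on" ] && [ "$state" = "balanced" ]; then
        echo "hashed-only"   # optimization honored, no fan out
    else
        echo "fan-out"       # optimization off or bypassed
    fi
}

negative_lookup_mode on balanced          # prints hashed-only
negative_lookup_mode on remove-brick      # prints fan-out
negative_lookup_mode on fix-layout-only   # prints fan-out
```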
## FAQ
<< TBD >>
1. How do I verify that I have a problem with negative lookups? OR
When should I use this option?
2. Can I roll back to an older client post enabling this optimization?
3. How do I verify this option is working for me?
4. What additional meta-data does this option add to the bricks?
5. I see duplicate entries after enabling this option, what should I do?
6. I see the following error in my client logs, help!
7. My create performance is still poor, help!
8. <Other suggestions welcome>