author | Shyam <srangana@redhat.com> | 2015-06-05 16:37:08 -0400 |
---|---|---|
committer | Raghavendra G <rgowdapp@redhat.com> | 2015-06-09 07:08:17 -0700 |
commit | 5e08bf5b351b13a4ced35a026c3db633e161b237 (patch) | |
tree | 2b3c1a2c024395e227870b39162405be1277bd1a /doc/admin-guide/en-US | |
parent | b77c8ddda8e68c1086227b19f9fb5db9bf89b0dc (diff) |
doc: Adding documentation for DHT lookup-optimize option
This is intended to be admin facing documentation, with some granular
details.
Change-Id: I379f8cea5375649fba915e907648e3a71344092d
BUG: 1228818
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/11109
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Diffstat (limited to 'doc/admin-guide/en-US')
-rw-r--r-- | doc/admin-guide/en-US/markdown/admin_lookup_optimization.md | 145 |
1 files changed, 145 insertions, 0 deletions
diff --git a/doc/admin-guide/en-US/markdown/admin_lookup_optimization.md b/doc/admin-guide/en-US/markdown/admin_lookup_optimization.md
new file mode 100644
index 00000000000..ccab44f87bb
--- /dev/null
+++ b/doc/admin-guide/en-US/markdown/admin_lookup_optimization.md
@@ -0,0 +1,145 @@
+# DHT lookup optimization
+
+Distribute xlator (or DHT) has a performance penalty when dealing with negative
+lookups. This document explains the problem and optimization provided for
+alleviating the same in GlusterFS.
+
+## Negative lookups and issues surrounding them
+
+Negative lookups are lookup operations for entries that are not present in the
+volume. IOW, a lookup for a file/directory that does not exist is a negative
+lookup.
+
+DHT normally looks up an entry in the hashed subvolume first (based on the
+layout); if not found in the hashed location, it fans out a lookup across all
+subvolumes to DHT, to ensure that the entry is not present in another subvolume.
+This behavior comes from the perspective that if a rebalance is in progress,
+and the layout on disk is temporarily out of alignment with the actual location
+of the file, the entry is still found by the fan out lookup.
+
+Such fan out lookups are costly and typically slow down file creates. This
+especially impacts small file performance, where a large number of files are
+being added/created in quick succession to the volume.
+
+## Optimizing lookups in DHT
+
+A balanced volume is either, a new volume that is created, and no bricks are
+added to, or removed from the same, or a volume that has undergone expansion
+(or reduction) of bricks and a full rebalance has been run on the volume.
+
+In such volumes, the fan out lookup behavior can be turned off to speed up
+negative lookups, as files are in their respective hashed locations (or at
+least their DHT link-to entries are present in the hashed location).
+
+With GlusterFS 3.7.2 negative lookup fan-out behavior is optimized, by not
+performing the same, in a balanced volume.
+
+The optimization provided, further detects a cluster out of balance (when a
+fix-layout is done, or a brick is removed) to automatically turn **on** the
+fan out negative lookup behavior, thereby preventing duplicate entry creation
+in the volume, till the volume is brought into balance again.
+
+## Configuration options to enable optimized lookups
+
+With Gluster 3.7.2 the following options are provided to enable DHT lookup
+optimization,
+
+Option: cluster.lookup-optimize
+Description: "This option if set to ON enables the optimization of -ve lookups,
+by not doing a lookup on non-hashed subvolumes for files, in case the hashed
+subvolume does not return any result. This option disregards the
+lookup-unhashed setting, when enabled."
+Default: OFF
+
+CLI command to enable this option:
+    gluster volume set <volname> cluster.lookup-optimize <on/off>
+
+### Client compatibility support
+
+As DHT xlator runs on the client stack of gluster (i.e. on the machine where the
+FUSE/NFS Server/SAMBA Server are running), this configuration requires that the
+cluster and the clients are upgraded to 3.7.2 version, at the minimum.
+
+When setting this option, if any Gluster brick node or connected clients are of
+an older version, the option will error out stating incompatible version
+detected in the cluster and not allow the configuration change.
+
+Older clients connecting to the cluster post this configuration option is set,
+would also error out and not be able to mount the volume due to the version
+incompatibility.
+
+### Compatibility with lookup-unhashed setting
+
+In older DHT versions, the configuration option lookup-unhashed emulated a
+similar behavior for a balanced cluster. The downside of this option is that
+if the cluster grows or becomes unbalanced, there is a risk of losing entry
+consistency. The current changes to gluster and specifically in DHT, prevent
+this inconsistency from occurring when using the new option (lookup-optimize).
+
+Additionally, if the lookup-optimize option is set, the older lookup-unhashed
+setting is ignored by DHT.
+
+## Requirements for the optimization to function
+
+When the lookup-optimize option is enabled, there are a few prerequisites
+before which the option is honored by DHT. The following list provides some of
+these conditions and ways to meet the same.
+
+1. New volume
+   A new volume is a volume that has just been created and is unused or not
+   started
+   - For a volume that is just created
+     - Prerequisite: Before starting and accessing this volume, set the lookup
+       optimization to ON
+     - Gotchas: All directories that are created post the above setting, will
+       leverage the negative lookup optimization, except entries in the root of
+       the volume.
+       NOTE: The root of the volume, or the brick root on each brick of the
+       volume, is already created prior to the start of the volume, or the
+       ability to set this option. As a result the root of the volume gains this
+       optimization only post the first full rebalance, or is treated equivalent
+       to an existing directory (see (2)-(1) below).
+
+2. Existing volume
+   An existing volume is one which is under use, and may have had bricks added
+   or removed in its lifetime. In this scenario there are 2 cases where the
+   lookup optimization behavior changes,
+   Prerequisite: Enable the lookup-optimize option
+   1. New directory creation
+      - All directories created beyond this point will gain the negative lookup
+        optimization
+   2. Existing directories
+      - Existing directories will not gain the lookup optimization
+      - To enable existing directories to also gain the lookup optimization a
+        full rebalance on the volume needs to be performed
+
+The optimization is also bypassed by the code automatically in the following
+conditions,
+
+1. Brick removed
+   - When a remove-brick is executed for a volume, it immediately triggers a
+     rebalance to move data out of the removed bricks. In these circumstances the
+     optimization is bypassed and a fan out lookup is performed for negative
+     lookups.
+   - Post removal of the brick, the lookup optimization would automatically kick
+     in
+2. Brick added and only fix-layout is executed
+   - When a brick is added and a fix-layout only is executed, the files are
+     still not present in the correct hashed locations. As a result under these
+     conditions the lookup optimization is bypassed by DHT.
+   - A full rebalance post fix-layout would get the optimization enabled
+     NOTE: Although fix-layout is deprecated, it is still present and honored,
+     as a result this distinction is presented in this document. This is not an
+     endorsement of fix-layout still being supported.
+
+## FAQ
+<< TBD >>
+1. How do I verify that I have a problem with negative lookups? OR
+   When should I use this option?
+2. Can I roll back to an older client post enabling this optimization?
+3. How do I verify this option is working for me?
+4. What additional meta-data does this option add to the bricks?
+5. I see duplicate entries after enabling this option, what should I do?
+6. I see the following error in my client logs, help!
+7. My create performance is still poor, help!
+8. <Other suggestions welcome>
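
The lookup behavior that the added document describes (hashed subvolume first, fan out on a miss, and no fan out when lookup-optimize is on and the volume is balanced) can be illustrated with a small Python model. This is only a sketch under stated assumptions and is not the DHT implementation: `Subvolume`, `hashed_subvolume`, `dht_lookup`, `count_lookups`, the CRC32 stand-in hash, and the boolean `balanced` flag are all hypothetical names and simplifications introduced here. In particular, how DHT actually decides that a directory is in balance is not modeled.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Subvolume:
    """Hypothetical stand-in for one DHT subvolume."""
    name: str
    entries: set = field(default_factory=set)
    lookups: int = 0  # how many lookups this subvolume has served

    def lookup(self, entry: str) -> bool:
        self.lookups += 1
        return entry in self.entries


def hashed_subvolume(entry, subvols):
    # Assumption: CRC32 modulo the subvolume count stands in for the real
    # name hash and layout ranges, which are not reproduced here.
    return subvols[zlib.crc32(entry.encode()) % len(subvols)]


def dht_lookup(entry, subvols, lookup_optimize, balanced):
    """Return the subvolume holding `entry`, or None for a negative lookup."""
    hashed = hashed_subvolume(entry, subvols)
    if hashed.lookup(entry):
        return hashed
    # The optimization applies only when it is enabled AND the volume is
    # in balance; in that case a miss on the hashed subvolume is final.
    if lookup_optimize and balanced:
        return None
    # Otherwise fan out to the remaining subvolumes (simplified: the
    # hashed subvolume is not queried a second time).
    for subvol in subvols:
        if subvol is not hashed and subvol.lookup(entry):
            return subvol
    return None


def count_lookups(lookup_optimize, balanced, n=4):
    """Total lookups issued for one missing entry on n empty subvolumes."""
    subvols = [Subvolume(f"subvol-{i}") for i in range(n)]
    dht_lookup("missing-file", subvols, lookup_optimize, balanced)
    return sum(s.lookups for s in subvols)


if __name__ == "__main__":
    print(count_lookups(lookup_optimize=False, balanced=True))
    print(count_lookups(lookup_optimize=True, balanced=True))
    print(count_lookups(lookup_optimize=True, balanced=False))
```

In this model a negative lookup costs one lookup per subvolume unless the optimization is both enabled and honored, in which case it costs a single lookup. The model also shows why the optimization has to be bypassed on an unbalanced volume: an entry that sits on a non-hashed subvolume is found only by the fan out.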