author     Shyam <srangana@redhat.com>       2017-02-27 13:25:14 -0500
committer  Jeff Darcy <jdarcy@redhat.com>    2017-03-06 07:35:12 -0500
commit     aa2f48dbd8f8ff1d10230fb9656f2ac7d99a48f8 (patch)
tree       55a38e973d906b8a979ae64bd0bf86c362234f5a /under_review
parent     8104eedeaa3ad75b712b1dfb8488e4609b140ac4 (diff)
doc: Moved feature pages that were delivered as a part of 3.10.0
Change-Id: I35a6b599eebbe42b5ef1244d2d72fa103bcf8acb
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: https://review.gluster.org/16775
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Diffstat (limited to 'under_review')

 -rw-r--r--  under_review/client-opversion.md     111
 -rw-r--r--  under_review/max-opversion.md        118
 -rw-r--r--  under_review/multiplexing.md         141
 -rw-r--r--  under_review/readdir-ahead.md        167
 -rw-r--r--  under_review/rebalance-estimates.md  128
 -rw-r--r--  under_review/tier_service.md         130

6 files changed, 0 insertions, 795 deletions
diff --git a/under_review/client-opversion.md b/under_review/client-opversion.md
deleted file mode 100644
index 8c9991e..0000000
--- a/under_review/client-opversion.md
+++ /dev/null
@@ -1,111 +0,0 @@

Feature
-------

Summary
-------

Support to get the op-version information for each client through the volume
status command.

Owners
------

Samikshan Bairagya <samikshan@gmail.com>

Current status
--------------

Currently the only way to get an idea of the version of each connected client
is to grep for "accepted client from" in /var/log/glusterfs/bricks. There is
no command that gives that information out to the users.

Related Feature Requests and Bugs
---------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=1409078

Detailed Description
--------------------

The op-version information for each client can be added to the already existing
volume status command. `volume status <VOLNAME|all> clients` currently gives the
following information for each client:

* Hostname:port
* Bytes Read
* Bytes Written

Benefit to GlusterFS
--------------------

This would improve the user experience, as it would make it easier for users
to learn the op-version of each client from a single command.

Scope
-----

#### Nature of proposed change

Adds more information to the `volume status <VOLNAME|all> clients` output.

#### Implications on manageability

None.

#### Implications on presentation layer

None.

#### Implications on persistence layer

None.

#### Implications on 'GlusterFS' backend

None.

#### Modification to GlusterFS metadata

None.

#### Implications on 'glusterd'

None.

How To Test
-----------

This can be tested by having clients with different glusterfs versions connected
to running volumes, and executing the `volume status <VOLNAME|all> clients`
command.
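Until the feature lands, the grep-based workaround described above can be scripted. A minimal sketch — the "accepted client from" phrase is taken from the page, but the surrounding log-line layout in `SAMPLE_LOG` is an assumption that varies between glusterfs releases:

```python
import re

# Illustrative brick-log excerpt; only the "accepted client from" phrase is
# guaranteed by the document, the rest of the field layout is assumed.
SAMPLE_LOG = """\
[2017-02-27 13:25:14.000000] I [server-handshake.c:690:server_setvolume] \
0-myvol-server: accepted client from host1-1234-2017/02/27-13:25:14:0-0 \
(version: 3.8.4)
[2017-02-27 13:26:02.000000] I [server-handshake.c:690:server_setvolume] \
0-myvol-server: accepted client from host2-5678-2017/02/27-13:26:02:0-0 \
(version: 3.10.0)
"""

def client_versions(log_text):
    """Map each accepted client id to the glusterfs version it advertised."""
    pattern = re.compile(r"accepted client from (\S+) \(version: ([\d.]+)\)")
    return {m.group(1): m.group(2) for m in pattern.finditer(log_text)}

versions = client_versions(SAMPLE_LOG)
```

In practice one would feed this every log file under /var/log/glusterfs/bricks; the proposed `volume status` extension would make this scraping unnecessary.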
User Experience
---------------

Users can use the `volume status <VOLNAME|all> clients` command to get the
op-version of each client, along with the information that was already
available (hostname, bytes read and bytes written).

Dependencies
------------

None.

Documentation
-------------

None.

Status
------

In development.

Comments and Discussion
-----------------------

 1. Discussion on gluster-devel ML:
    - [Thread 1](http://www.gluster.org/pipermail/gluster-users/2016-January/025064.html)
    - [Thread 2](http://www.gluster.org/pipermail/gluster-devel/2017-January/051820.html)
 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/79)

diff --git a/under_review/max-opversion.md b/under_review/max-opversion.md
deleted file mode 100644
index 16d4ee4..0000000
--- a/under_review/max-opversion.md
+++ /dev/null
@@ -1,118 +0,0 @@

Feature
-------

Summary
-------

Support to retrieve the maximum supported op-version (cluster.op-version) in a
heterogeneous cluster.

Owners
------

Samikshan Bairagya <samikshan@gmail.com>

Current status
--------------

Currently users can retrieve the op-version on which a cluster is operating by
using the gluster volume get command on the global option cluster.op-version as
follows:

    # gluster volume get <volname> cluster.op-version

There is, however, no way for a user to find out the maximum op-version to
which the cluster could be bumped up.

Related Feature Requests and Bugs
---------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=1365822

Detailed Description
--------------------

A heterogeneous cluster operates on a common op-version that can be supported
across all the nodes in the trusted storage pool. Upon upgrade of the nodes in
the cluster, the cluster might support a higher op-version. However, since it
is currently not possible for the user to get this op-version value, it is
difficult for them to bump up the op-version of the cluster to the supported
value.

The maximum supported op-version in a cluster would be the minimum of the
maximum op-versions of each of the nodes. To retrieve this, the volume get
functionality could be invoked as follows:

    # gluster volume get all cluster.max-op-version

Benefit to GlusterFS
--------------------

This would improve the user experience, as it would make it easier for users
to know the maximum op-version on which the cluster can operate.

Scope
-----

#### Nature of proposed change

This adds a new non-settable global option, cluster.max-op-version.

#### Implications on manageability

None.

#### Implications on presentation layer

None.

#### Implications on persistence layer

None.

#### Implications on 'GlusterFS' backend

None.

#### Modification to GlusterFS metadata

None.

#### Implications on 'glusterd'

None.

How To Test
-----------

This can be tested on a cluster with at least one node running on version 'n+1'
and the others on version 'n', where n = 3.10. The maximum supported op-version
(cluster.max-op-version) should be returned by `volume get` as n in this case.

User Experience
---------------

Upon upgrade of one or more nodes in a cluster, users can get the new maximum
op-version the cluster can support.

Dependencies
------------

None.

Documentation
-------------

None.

Status
------

In development.

Comments and Discussion
-----------------------

 1. [Discussion on gluster-devel ML](http://www.gluster.org/pipermail/gluster-devel/2016-December/051650.html)
 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/56)

diff --git a/under_review/multiplexing.md b/under_review/multiplexing.md
deleted file mode 100644
index fd06150..0000000
--- a/under_review/multiplexing.md
+++ /dev/null
@@ -1,141 +0,0 @@

Feature
-------
Brick Multiplexing

Summary
-------

Use one process (and port) to serve multiple bricks.

Owners
------

Jeff Darcy (jdarcy@redhat.com)

Current status
--------------

In development.

Related Feature Requests and Bugs
---------------------------------

Mostly N/A, except that this will make implementing real QoS easier at some
point in the future.

Detailed Description
--------------------

The basic idea is very simple: instead of spawning a new process for every
brick, we send an RPC to an existing brick process telling it to attach the new
brick (identified and described by a volfile) beneath its protocol/server
instance. Likewise, instead of killing a process to terminate a brick, we tell
it to detach one of its (possibly several) brick translator stacks.

Bricks can *not* share a process if they use incompatible transports (e.g. TLS
vs. non-TLS). Also, a brick process serving several bricks is a larger failure
domain than we have with a process per brick, so we might voluntarily decide to
spawn a new process anyway just to keep the failure domains smaller. Lastly,
there should always be a fallback to the current brick-per-process behavior, by
simply pretending that all bricks' transports are incompatible with each other.

Benefit to GlusterFS
--------------------

Multiplexing should significantly reduce resource consumption:

 * Each *process* will consume one TCP port, instead of each *brick* doing so.

 * The cost of global data structures and object pools will be reduced to 1/N
   of what it is now, where N is the average number of bricks per process.

 * Thread counts will also be reduced to 1/N.
   This avoids the exponentially bad thrashing effects seen when the total
   number of threads far exceeds the number of cores, made worse by multiple
   processes trying to auto-scale the number of network and disk I/O threads
   independently.

These resource issues are already limiting the number of bricks and volumes we
can support. By reducing all forms of resource consumption at once, we should
be able to raise these user-visible limits by a corresponding amount.

Scope
-----

#### Nature of proposed change

The largest changes are at the two places where we do brick and process
management - GlusterD at one end, generic glusterfsd code at the other. The
new messages require changes to rpc and client/server translator code. The
server translator needs further changes to look up one among several child
translators instead of assuming only one. Auth code must be changed to handle
separate permissions/credentials on each brick.

Beyond these "obvious" changes, many lesser changes will undoubtedly be needed
anywhere that we make assumptions about the relationships between bricks and
processes. Anything that involves a "helper" daemon - e.g. self-heal, quota -
is particularly suspect in this regard.

#### Implications on manageability

The fact that bricks can only share a process when they have compatible
transports might affect decisions about what transport options to use for
separate volumes.

#### Implications on presentation layer

N/A

#### Implications on persistence layer

N/A

#### Implications on 'GlusterFS' backend

N/A

#### Modification to GlusterFS metadata

N/A

#### Implications on 'glusterd'

GlusterD changes are integral to this feature, and are described above.

How To Test
-----------

For the most part, testing is of the "do no harm" sort; the most thorough test
of this feature is to run our current regression suite.
Only one additional test is needed - create/start a volume with multiple
bricks on one node, and check that only one glusterfsd process is running.

User Experience
---------------

Volume status can now include the possibly-surprising result of multiple bricks
on the same node having the same port number and PID. Anything that relies on
these values, such as monitoring or automatic firewall configuration (or our
regression tests), could get confused and/or end up doing the wrong thing.

Dependencies
------------

N/A

Documentation
-------------

TBD (very little)

Status
------

Very basic functionality - starting/stopping bricks along with volumes,
mounting, doing I/O - works. Some features, especially snapshots, probably do
not work. Currently running tests to identify the precise extent of needed
fixes.

Comments and Discussion
-----------------------

N/A

diff --git a/under_review/readdir-ahead.md b/under_review/readdir-ahead.md
deleted file mode 100644
index 71e5b62..0000000
--- a/under_review/readdir-ahead.md
+++ /dev/null
@@ -1,167 +0,0 @@

Feature
-------
Improve directory enumeration performance

Summary
-------
Improve directory enumeration performance by implementing parallel readdirp
at the dht layer.

Owners
------

Raghavendra G <rgowdapp@redhat.com>
Poornima G <pgurusid@redhat.com>
Rajesh Joseph <rjoseph@redhat.com>

Current status
--------------

In development.

Related Feature Requests and Bugs
---------------------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1401812

Detailed Description
--------------------

Currently readdirp is sequential at the dht layer. This makes find and
recursive listing of small directories very slow (a small directory being one
whose contents can be accommodated in one readdirp call, e.g. ~600 entries if
the buffer size is 128K).

The number of readdirp fops required to fetch `ls -l -R` for nested
directories is:

    no. of fops = (x + 1) * m * n

    n = number of bricks
    m = number of directories
    x = number of readdirp calls required to fetch the dentries completely
        (this depends on the size of the directory and the readdirp buffer size)
    1 = the readdirp fop that is sent just to detect the end of the directory

E.g., to list 800 directories with ~300 files each and a readdirp buffer size
of 128K, on distribute 6:

    (1 + 1) * 800 * 6 = 9600 fops

And all the readdirp fops are sent sequentially to all the bricks. With
parallel readdirp the number of fops may not decrease drastically, but since
they are issued in parallel, throughput increases.

Why it's not a straightforward problem to solve: one needs to briefly
understand how the directory offset is handled in dht. [1], [2] and [3] are
some links that hint at the same.

- The d_off is in the order of bricks identified by dht. Hence, the dentries
  should always be returned in the same order as the bricks, i.e. brick2
  entries shouldn't be returned before brick1 reaches EOD.
- We cannot store any info, such as the offset read so far, in inode_ctx or
  fd_ctx.
- In the case of very large directories, where the readdirp buffer is too
  small to hold all the dentries of any one brick, parallel readdirp is an
  overhead. Sequential readdirp best suits large directories. This demands
  that dht be aware of, or able to speculate about, the directory size.

There were two solutions that we evaluated:

1. Change dht_readdirp itself to wind readdirp in parallel
   - http://review.gluster.org/15160
   - http://review.gluster.org/15159
   - http://review.gluster.org/15169
2. Load readdir-ahead as a child of dht
   - http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1401812

For the reasons mentioned below, we go with the second approach, suggested by
Raghavendra G:

- It requires few or no changes in dht.
- Along with empty/small directories, it also benefits large directories.

The only slightly complicated part would be tuning the readdir-ahead buffer
size for each instance.

The perf gain observed is directly proportional to:

- the number of nodes in the cluster/volume;
- the latency between the client and each node in the volume.

Some references:

[1] http://review.gluster.org/#/c/4711
[2] https://www.mail-archive.com/gluster-devel@gluster.org/msg02834.html
[3] http://www.gluster.org/pipermail/gluster-devel/2015-January/043592.html

Benefit to GlusterFS
--------------------

Improves directory enumeration performance in large clusters.

Scope
-----

#### Nature of proposed change

- Changes in the readdir-ahead and dht xlators.
- Change glusterd to load readdir-ahead as a child of dht, without breaking
  upgrade and downgrade scenarios.

#### Implications on manageability

N/A

#### Implications on presentation layer

N/A

#### Implications on persistence layer

N/A

#### Implications on 'GlusterFS' backend

N/A

#### Modification to GlusterFS metadata

N/A

#### Implications on 'glusterd'

GlusterD changes are integral to this feature, and are described above.

How To Test
-----------

For the most part, testing is of the "do no harm" sort; the most thorough test
of this feature is to run our current regression suite.
Some specific test cases include readdirp on all kinds of volumes:

- distribute
- replicate
- shard
- disperse
- tier

Also, readdirp while:

- rebalance is in progress
- tiering migration is in progress
- self heal is in progress

And all the test cases should be run while the memory consumption of the
process is monitored.
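The fop-count arithmetic in the Detailed Description above is easy to sanity-check in code; a small sketch (the function name and parameters are invented for illustration):

```python
def readdirp_fops(num_bricks, num_dirs, calls_per_dir):
    """no. of fops = (x + 1) * m * n -- the '+ 1' is the extra readdirp
    sent per directory just to detect end-of-directory."""
    return (calls_per_dir + 1) * num_dirs * num_bricks

# The example from the text: 800 directories of ~300 files each, a 128K
# readdirp buffer (so one call per directory), on a 6-brick distribute volume.
total = readdirp_fops(num_bricks=6, num_dirs=800, calls_per_dir=1)  # 9600
```

Note that parallel readdirp leaves this count roughly unchanged; the win comes from issuing the calls concurrently rather than from issuing fewer of them.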
User Experience
---------------

Faster directory enumeration.

Dependencies
------------

N/A

Documentation
-------------

TBD (very little)

Status
------

Development in progress.

Comments and Discussion
-----------------------

N/A

diff --git a/under_review/rebalance-estimates.md b/under_review/rebalance-estimates.md
deleted file mode 100644
index 2a2c299..0000000
--- a/under_review/rebalance-estimates.md
+++ /dev/null
@@ -1,128 +0,0 @@

Feature
-------

Summary
-------

Provide a user interface to determine when the rebalance process will complete.

Owners
------
Nithya Balachandran <nbalacha@redhat.com>

Current status
--------------
Patch being worked on.

Related Feature Requests and Bugs
---------------------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1396004
Desc: RFE: An administrator friendly way to determine rebalance completion time

Detailed Description
--------------------
The rebalance operation starts a rebalance process on each node of the volume.
Each process scans the files and directories on the local subvols, fixes the
layout for each directory and migrates files to their new hashed subvolumes
based on the new layouts.

Currently we do not have any way to determine how long the rebalance process
will take to complete.

The proposed approach is as follows:

 1. Determine the total number of files and directories on the local subvol
 2. Calculate the rate at which files have been processed since the rebalance
    started
 3. Calculate the time required to process all the files based on the
    calculated rate
 4. Send these values in the rebalance status response
 5. Calculate the maximum time required among all the rebalance processes
 6. Display the time required along with the rebalance status output

The time taken is a factor of the number and size of the files and the number
of directories.
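The six estimation steps above reduce to simple rate arithmetic. A minimal sketch with invented names (the real implementation lives in the rebalance and cli code):

```python
def node_eta_seconds(total_files, processed_files, elapsed_seconds):
    """Steps 1-3: estimate the time needed for the remaining files at the
    rate observed since the rebalance started. None until any work is done."""
    if processed_files == 0:
        return None
    rate = processed_files / elapsed_seconds        # files per second
    return (total_files - processed_files) / rate

def cluster_eta_seconds(per_node_stats):
    """Steps 5-6: the slowest rebalance process determines the overall
    completion time."""
    etas = [node_eta_seconds(*stats) for stats in per_node_stats]
    if any(eta is None for eta in etas):
        return None
    return max(etas)

# Three rebalance processes, each ten minutes in:
# (total files on local subvol, files processed so far, elapsed seconds)
stats = [(10000, 2400, 600.0), (8000, 4800, 600.0), (12000, 3000, 600.0)]
eta = cluster_eta_seconds(stats)  # 1900.0 -- set by the slowest node
```

The estimate is only as good as step 1's file count, which is the weak point discussed next: the statfs-based inode count includes .glusterfs entries, so `total_files` is systematically overestimated.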
Determining the number of files and directories is difficult, as Glusterfs
currently does not keep track of the number of files on each brick.

The current approach uses the statfs call to determine the number of used
inodes and uses that number as a rough estimate of how many files/directories
are present on the brick. However, this number is not very accurate, because
the .glusterfs directory contributes heavily to it.

Benefit to GlusterFS
--------------------
Improves the usability of rebalance operations. Administrators can now
determine how long a rebalance operation will take to complete, allowing
better planning.

Scope
-----

#### Nature of proposed change

Modifications required to the rebalance and the cli code.

#### Implications on manageability

The gluster volume rebalance <volname> status output will be modified.

#### Implications on presentation layer

None

#### Implications on persistence layer

None

#### Implications on 'GlusterFS' backend

None

#### Modification to GlusterFS metadata

None

#### Implications on 'glusterd'

None

How To Test
-----------

Run a rebalance and compare the estimates with the time actually taken to
complete the rebalance.

The feature needs to be tested against large workloads to determine the
accuracy of the calculated times.

User Experience
---------------

`gluster volume rebalance <volname> status`
will display the expected time left for the rebalance process to complete.

Dependencies
------------

None

Documentation
-------------

Documents to be updated with the changes in the rebalance status output.

Status
------
In development.
Comments and Discussion
-----------------------

*Follow here*

diff --git a/under_review/tier_service.md b/under_review/tier_service.md
deleted file mode 100644
index 47640ee..0000000
--- a/under_review/tier_service.md
+++ /dev/null
@@ -1,130 +0,0 @@

Feature
-------

Tier as a daemon within the service framework of gluster.

Summary
-------

The current tier process uses the same code as DHT, so any change made to DHT
affects tier, and vice versa. To support add-brick on a tiered volume we need
a rebalance daemon, so the current tier daemon has to be separated from DHT.
The new daemon has therefore been split from DHT and comes under the service
framework.

Owners
------

Dan Lambright <dlambrig@redhat.com>

Hari Gowtham <hgowtham@redhat.com>

Current status
--------------

In the current code, tier doesn't fall under the service framework, which
makes it hard for gluster to manage the daemon. Moving it into gluster's
service framework makes it easier to manage.

Related Feature Requests and Bugs
---------------------------------

[BUG] https://bugzilla.redhat.com/show_bug.cgi?id=1313838

Detailed Description
--------------------

This change is similar to the other daemons that come under the service
framework. The service framework takes care of:

*) Spawning the daemon, killing it, and other such processes.
*) Volume set options.
*) Restarting the daemon at two points:
   1) when gluster goes down and comes up;
   2) to stop detach tier.
*) Reconfigure is used to make volfile changes. The reconfigure checks whether
the daemon needs a restart and then restarts it only as required. By doing
this, we don't restart the daemon every time.
*) Volume status lists the status of the tier daemon as a process instead of
a task.
*) remove-brick and detach tier are separated at the code level.

With this patch the log, pid, and volfile are separated and put into their
respective directories.
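The restart-only-when-needed behaviour of reconfigure described above can be modeled generically. This toy Python model is purely illustrative — the real service framework is C code inside glusterd, and every name here is invented:

```python
class TierSvc:
    """Toy model of a service-framework-managed daemon: the framework owns
    spawning/killing the process and decides, on reconfigure, whether a
    volfile change actually requires a restart."""

    def __init__(self, options):
        self.options = dict(options)
        self.running = False
        self.restarts = 0

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def reconfigure(self, new_options):
        # Restart only if the effective options changed; otherwise the
        # running daemon keeps going untouched.
        needs_restart = new_options != self.options
        self.options = dict(new_options)
        if needs_restart and self.running:
            self.stop()
            self.start()
            self.restarts += 1

svc = TierSvc({"tier-mode": "cache"})
svc.start()
svc.reconfigure({"tier-mode": "cache"})  # unchanged options: no restart
svc.reconfigure({"tier-mode": "test"})   # changed options: one restart
```

The point of the comparison step is exactly the one the text makes: without it, every volfile regeneration would bounce the daemon.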
Benefit to GlusterFS
--------------------

Improved stability; helps glusterd to manage the daemon during situations
like update, node down, and restart.

Scope
-----

#### Nature of proposed change

A new service will be made available. The existing code will be removed in a
while to make DHT rebalance easier to maintain, as the DHT and tier code are
separated.

#### Implications on manageability

The older gluster commands are designed to be compatible with this change.

#### Implications on presentation layer

None.

#### Implications on persistence layer

None.

#### Implications on 'GlusterFS' backend

Remains the same as for Tier.

#### Modification to GlusterFS metadata

None.

#### Implications on 'glusterd'

The data related to tier is made persistent (it will be available after a
reboot). The brick op phase, which is different for Tier (it was earlier used
to communicate with the daemon instead of the bricks), has been implemented in
the commit phase. The volfile changes for setting the options are also taken
care of by the service framework.

How To Test
-----------

The basic tier commands need to be tested, as not much changes from the
user's perspective. The same tests used for testing tier (like attaching a
tier, detaching it, status) have to be used.

User Experience
---------------

No changes.

Dependencies
------------

None.

Documentation
-------------

https://docs.google.com/document/d/1_iyjiwTLnBJlCiUgjAWnpnPD801h5LNxLhHmN7zmk1o/edit?usp=sharing

Status
------

Code being reviewed.

Comments and Discussion
-----------------------

*Follow here*