From 601bfa2719d8c9be40982b8a6526c21cd0ea4966 Mon Sep 17 00:00:00 2001
From: Kaushal M
Date: Wed, 20 Jan 2016 13:09:23 +0530
Subject: Rename in_progress to under_review

`in_progress` is a vague term, which could either mean that the feature
review is in progress, or that the feature implementation is in
progress. Renaming to `under_review` gives a much better indication
that the feature is under review and implementation hasn't begun yet.

Refer to https://review.gluster.org/13187 for the discussion which led
to this change.

Change-Id: I3f48e15deb4cf5486d7b8cac4a7915f9925f38f5
Signed-off-by: Kaushal M
Reviewed-on: http://review.gluster.org/13264
Reviewed-by: Raghavendra Talur
Tested-by: Raghavendra Talur
---
 under_review/Better Brick Mgmt.md | 180 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)
 create mode 100644 under_review/Better Brick Mgmt.md

diff --git a/under_review/Better Brick Mgmt.md b/under_review/Better Brick Mgmt.md
new file mode 100644
index 0000000..adfc781
--- /dev/null
+++ b/under_review/Better Brick Mgmt.md
@@ -0,0 +1,180 @@
+Goal
+----
+
+Easier (more autonomous) assignment of storage to specific roles
+
+Summary
+-------
+
+Managing bricks and arrangements of bricks (e.g. into replica sets)
+manually doesn't scale. Instead, we need more intuitive ways to group
+bricks together into pools, allocate space from those pools (creating
+new pools), and let users define volumes in terms of pools rather than
+individual bricks. We get to worry about how to arrange those bricks
+into an intelligent volume configuration, e.g. replicating between
+bricks that are the same size/speed/type but not on the same server.
+
+Because this smarter and/or finer-grained resource allocation (plus
+general technology evolution) is likely to result in many more bricks
+per server than we have now, we also need a brick-daemon infrastructure
+capable of handling that.
+
+Owners
+------
+
+Jeff Darcy
+
+Current status
+--------------
+
+Proposed, waiting until summit for approval.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+[Features/data-classification](../GlusterFS 3.7/Data Classification.md)
+will drive the heaviest and/or most sophisticated use of this feature,
+and some of the underlying mechanisms were originally proposed there.
+
+Detailed Description
+--------------------
+
+To start with, we need to distinguish between the raw brick that the
+user allocates to GlusterFS and the pieces of that brick that result
+from our complicated storage allocation. Some documents refer to these
+as u-brick and s-brick respectively, though perhaps it's better to keep
+calling the former bricks and come up with a new name for the latter -
+slice, tile, pebble, etc. For now, let's stick with the u-brick/s-brick
+terminology. We can manipulate these objects in several ways.
+
+- Group u-bricks together into an equivalent pool of s-bricks
+  (trivially 1:1).
+
+- Allocate space from a pool of s-bricks, creating a set of smaller
+  s-bricks. Note that the results of applying this repeatedly might be
+  s-bricks which are on the same u-brick but part of different
+  volumes.
+
+- Combine multiple s-bricks into one via some combination of
+  replication, erasure coding, distribution, tiering, etc.
+
+- Export an s-brick as a volume.
+
+These operations - especially combining - can be applied iteratively,
+creating successively more complex structures prior to the final
+export.
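+
+As a rough illustration of this object model, here is a minimal sketch
+in Python. Every name in it (`UBrick`, `SBrick`, `group`, `allocate`,
+`combine`, `export`) is hypothetical; it shows how the four operations
+above could compose, not an actual GlusterFS API.
+
+```python
+# Hypothetical sketch of the u-brick/s-brick model described above.
+# None of these names exist in GlusterFS; they only illustrate how
+# group/allocate/combine/export could fit together.
+
+from dataclasses import dataclass, field
+from typing import List, Optional
+
+
+@dataclass
+class UBrick:
+    """A raw brick that the user allocates to GlusterFS."""
+    host: str
+    path: str
+    size_gb: int
+
+
+@dataclass
+class SBrick:
+    """A slice of storage; a leaf s-brick lives on exactly one u-brick."""
+    size_gb: int
+    kind: str = "plain"                  # plain, replicate, disperse, ...
+    parent: Optional[UBrick] = None      # set for leaf s-bricks only
+    children: List["SBrick"] = field(default_factory=list)
+
+
+def group(ubricks):
+    """Group u-bricks into an equivalent pool of s-bricks (trivially 1:1)."""
+    return [SBrick(size_gb=u.size_gb, parent=u) for u in ubricks]
+
+
+def allocate(pool, size_gb):
+    """Carve a smaller s-brick out of each member of a pool.
+
+    Applied repeatedly, this can yield s-bricks that sit on the same
+    u-brick but end up in different volumes."""
+    carved = []
+    for s in pool:
+        assert s.size_gb >= size_gb, "not enough space left on s-brick"
+        s.size_gb -= size_gb
+        carved.append(SBrick(size_gb=size_gb, parent=s.parent))
+    return carved
+
+
+def combine(sbricks, kind):
+    """Combine s-bricks via replication, erasure coding, tiering, etc."""
+    return SBrick(size_gb=min(s.size_gb for s in sbricks),
+                  kind=kind, children=list(sbricks))
+
+
+def export(sbrick, name):
+    """Export a top-level s-brick as a volume."""
+    return {"volume": name, "top": sbrick}
+
+
+if __name__ == "__main__":
+    pool = group([UBrick("server1", "/bricks/b1", 100),
+                  UBrick("server2", "/bricks/b1", 100)])
+    slices = allocate(pool, 40)                  # 40 GB slice per u-brick
+    mirror = combine(slices, kind="replicate")   # replicate across servers
+    vol = export(mirror, "vol0")
+    print(vol["volume"], vol["top"].kind, vol["top"].size_gb, "GB")
+```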
+
+To support this, the code we currently use to generate volfiles needs to
+be changed to generate similar definitions for the various levels of
+s-bricks. Combined with the need to support versioning of these files
+(for snapshots), this probably means a rewrite of the volgen code.
+Another type of configuration file we need to create is for a brick
+daemon. We would still run one glusterfsd process per u-brick, for
+several reasons:
+
+- Maximize compatibility with our current infrastructure for starting
+  and monitoring server processes.
+
+- Align the boundaries between actual and detected device failures.
+
+- Reduce the number of ports assigned, both for administrative
+  convenience and to avoid exhaustion.
+
+- Reduce context-switch and virtual-memory thrashing between too many
+  uncoordinated processes. Some day we might even add custom resource
+  control/scheduling between s-bricks within a process, which would be
+  impossible in separate processes.
+
+These new glusterfsd processes are going to require more complex
+volfiles, and more complex translator-graph code to consume them. They
+also need to be more parallel internally, so this feature depends on
+eliminating single-threaded bottlenecks such as our socket transport.
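+
+As a very rough sketch of this process model - one daemon per u-brick,
+serving every s-brick carved from it through a shared pool of worker
+threads - consider the following. All of it is hypothetical: the real
+daemon would be glusterfsd consuming a (versioned) volfile per s-brick
+graph, and none of these class names exist in the codebase.
+
+```python
+# Hypothetical sketch: one brick daemon per u-brick hosts all of that
+# u-brick's s-bricks, sharing one port and one worker pool, instead of
+# running one process (and consuming one port) per s-brick.
+
+import queue
+import threading
+
+
+class SBrickGraph:
+    """Stand-in for a loaded translator graph serving one s-brick."""
+
+    def __init__(self, name, volfile_version):
+        self.name = name
+        self.volfile_version = volfile_version  # versioned for snapshots
+
+    def handle(self, request):
+        return f"{self.name} (volfile v{self.volfile_version}): {request}"
+
+
+class BrickDaemon:
+    """One process per u-brick; requests for any of its s-bricks are
+    dispatched to a shared pool of worker threads, so no single thread
+    becomes a bottleneck."""
+
+    def __init__(self, sbricks, workers=4):
+        self.graphs = {s.name: s for s in sbricks}
+        self.requests = queue.Queue()
+        for _ in range(workers):
+            threading.Thread(target=self._worker, daemon=True).start()
+
+    def _worker(self):
+        while True:
+            name, request, reply = self.requests.get()
+            reply.put(self.graphs[name].handle(request))
+
+    def submit(self, name, request):
+        reply = queue.Queue()
+        self.requests.put((name, request, reply))
+        return reply.get()
+
+
+if __name__ == "__main__":
+    # Two s-bricks, carved from the same u-brick but belonging to
+    # different volumes, served by a single daemon.
+    daemon = BrickDaemon([SBrickGraph("vol0-slice0", 1),
+                          SBrickGraph("vol1-slice3", 2)])
+    print(daemon.submit("vol0-slice0", "LOOKUP /"))
+    print(daemon.submit("vol1-slice3", "WRITE /f"))
+```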
+
+Benefit to GlusterFS
+--------------------
+
+- Reduced administrative overhead for large/complex volume
+  configurations.
+
+- More flexible/sophisticated volume configurations, especially with
+  respect to other features such as tiering or internal enhancements
+  such as overlapping replica/erasure sets.
+
+- Improved performance.
+
+Scope
+-----
+
+### Nature of proposed change
+
+- New object model, exposed via both glusterd-level and user-level
+  commands on those objects.
+
+- Rewritten volfile infrastructure.
+
+- Significantly enhanced translator-graph infrastructure.
+
+- Multi-threaded transport.
+
+### Implications on manageability
+
+New commands will be needed to group u-bricks into pools, allocate
+s-bricks from pools, etc. There will also be new commands to view the
+status of objects at various levels, and perhaps to set options on
+them. On the other hand, "volume create" will probably become simpler
+as the specifics of creating a volume are delegated downward to
+s-bricks.
+
+### Implications on presentation layer
+
+Surprisingly little.
+
+### Implications on persistence layer
+
+None.
+
+### Implications on 'GlusterFS' backend
+
+The on-disk structures (.glusterfs and so on) currently associated with
+a brick will become associated with an s-brick. The u-brick itself will
+contain little, probably just an enumeration of the s-bricks into which
+it has been divided.
+
+### Modification to GlusterFS metadata
+
+None.
+
+### Implications on 'glusterd'
+
+See the detailed description.
+
+How To Test
+-----------
+
+New tests will be needed for the grouping/allocation functions. In
+particular, negative tests for incorrect or impossible configurations
+will be needed. Once s-bricks have been aggregated back into volumes,
+most of the current volume-level tests will still apply. Related tests
+will also be developed as part of the data classification feature.
+
+User Experience
+---------------
+
+See "implications on manageability" etc.
+
+Dependencies
+------------
+
+This feature is so closely associated with data classification that the
+two can barely be considered separately.
+
+Documentation
+-------------
+
+Much of our "brick and volume management" documentation will require a
+thorough review, if not an actual rewrite.
+
+Status
+------
+
+Design still in progress.
+
+Comments and Discussion
+-----------------------