summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--doc/features/tier/tier.md (renamed from doc/features/tier/tier.txt)96
1 files changed, 73 insertions, 23 deletions
diff --git a/doc/features/tier/tier.txt b/doc/features/tier/tier.md
index 3b99fb82c32..13e7d971bdf 100644
--- a/doc/features/tier/tier.txt
+++ b/doc/features/tier/tier.md
@@ -1,11 +1,12 @@
-Tiering =======
+##Tiering
-* Feature page:
+* ####Feature page:
http://www.gluster.org/community/documentation/index.php/Features/data-classification
-* Design: goo.gl/bkU5qv
+* #####Design: goo.gl/bkU5qv
+
+###Theory of operation
-Theory of operation -------------------
The tiering feature enables different storage types to be used by the same
logical volume. In Gluster 3.7, the two types are classified as "cold" and
@@ -46,7 +47,7 @@ Performance enhancements being considered include:
* Write-back cache for database updates.
-Code organization -----------------
+###Code organization
The design endevors to be upward compatible with future migration policies,
such as scheduled file migration, data classification, etc. For example,
@@ -55,26 +56,66 @@ different set of migration policies could use the same underlying migration
engine. The I/O tracking and meta data store compontents are intended to be
reusable for things besides caching semantics.
-Meta data:
+####Libgfdb:
+
+Libgfdb provides abstract mechanism to record extra/rich metadata
+required for data maintenance, such as data tiering/classification.
+It provides consumer with API for recording and querying, keeping
+the consumer abstracted from the data store used beneath for storing data.
+It works in a plug-and-play model, where data stores can be plugged-in.
+Presently we have plugin for Sqlite3. In the future will provide recording
+and querying performance optimizer. In the current implementation the schema
+of metadata is fixed.
+
+####Schema:
+
+ GF_FILE_TB Table:
+ This table has one entry per file inode. It holds the metadata required to
+ make decisions in data maintenance.
+ GF_ID (Primary key) : File GFID (Universal Unique IDentifier in the namespace)
+ W_SEC, W_MSEC : Write wind time in sec & micro-sec
+ UW_SEC, UW_MSEC : Write un-wind time in sec & micro-sec
+ W_READ_SEC, W_READ_MSEC : Read wind time in sec & micro-sec
+ UW_READ_SEC, UW_READ_MSEC : Read un-wind time in sec & micro-sec
+ WRITE_FREQ_CNTR INTEGER : Write Frequency Counter
+ READ_FREQ_CNTR INTEGER : Read Frequency Counter
+
+ GF_FLINK_TABLE:
+ This table has all the hardlinks to a file inode.
+ GF_ID : File GFID (Composite Primary Key)``|
+ GF_PID : Parent Directory GFID (Composite Primary Key) |-> Primary Key
+ FNAME : File Base Name (Composite Primary Key)__|
+ FPATH : File Full Path (Its redundant for now, this will go)
+ W_DEL_FLAG : This Flag is used for crash consistancy, when a link is unlinked.
+ i.e Set to 1 during unlink wind and during unwind this record is deleted
+ LINK_UPDATE : This Flag is used when a link is changed i.e rename.
+ Set to 1 when rename wind and set to 0 in rename unwind
+
+Libgfdb API :
+Refer libglusterfs/src/gfdb/gfdb_data_store.h
+
+####ChangeTimeRecorder (CTR) Translator:
+
+ChangeTimeRecorder(CTR) is server side xlator(translator) which sits
+just above posix xlator. The main role of this xlator is to record the
+access/write patterns on a file residing the brick. It records the
+read(only data) and write(data and metadata) times and also count on
+how many times a file is read or written. This xlator also captures
+the hard links to a file(as its required by data tiering to move
+files).
-A database stores meta-data on the files. Entries within it are added or
-removed by the changetimerecorder translator. The database is queried by
-the migration daemon. The results of the queries drive which files are to
-be migrated.
+CTR Xlator is the consumer of libgfdb.
-The database resides withi the libgfdb subdirectory. There is one database
-for each brick. The database is currently sqlite. However, the libgfdb
-library API is not tied to sqlite, and a different database could be used.
+To Enable/Disable CTR Xlator:
-For more information on libgfdb see the doc file: libgfdb.txt.
+ **gluster volume set <volume-name> features.ctr-enabled {on/off}**
-I/O tracking:
+To Enable/Disable Frequency Counter Recording in CTR Xlator:
-The changetimerecorder server-side translator generates metadata about I/Os
-as they happen. Metadata is then entered into the database after the I/O
-completes. Internal I/Os are not included.
+ **gluster volume set <volume-name> features.record-counters {on/off}**
-Migration daemon:
+
+####Migration daemon:
When a tiered volume is created, a migration daemon starts. There is one daemon
for every tiered volume per node. The daemon sleeps and then periodically
@@ -86,7 +127,18 @@ Selected files are migrated between the tiers using existing DHT migration
logic. The tier translator will leverage DHT rebalance performance
enhancements.
-tier translator:
+Configurable for Migration daemon:
+
+ gluster volume set <volume-name> cluster.tier-demote-frequency <SECS>
+
+ gluster volume set <volume-name> cluster.tier-promote-frequency <SECS>
+
+ gluster volume set <volume-name> cluster.read-freq-threshold <SECS>
+
+ gluster volume set <volume-name> cluster.write-freq-threshold <SECS>
+
+
+####Tier Translator:
The tier translator is the root node in tiered volumes. The first subvolume
is the cold tier, and the second the hot tier. DHT logic for fowarding I/Os
@@ -108,11 +160,9 @@ Changes to DHT were made to allow "stacking", i.e. DHT over DHT:
Currently tiered volume expansion (adding and removing bricks) is unsupported.
-glusterd:
+####glusterd:
The tiered volume tree is a composition of two other volumes. The glusterd
daemon builds it. Existing logic for adding and removing bricks is heavily
leveraged to attach and detach tiers, and perform statistics collection.
-
-