From 2855dff243f20a78cd8cc4e7cd581a9c558b2e69 Mon Sep 17 00:00:00 2001
From: Raghavendra Bhat
Date: Tue, 23 Sep 2014 13:02:56 +0530
Subject: doc: documentation of inode and dentry management

Change-Id: Ica510752d011596e8ecff5ea13c4b2bbf76ba186
BUG: 1145475
Signed-off-by: Raghavendra Bhat
Reviewed-on: http://review.gluster.org/8815
Tested-by: Gluster Build System
Reviewed-by: Vijay Bellur
---
 doc/hacker-guide/en-US/markdown/inode.md | 226 +++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 doc/hacker-guide/en-US/markdown/inode.md

diff --git a/doc/hacker-guide/en-US/markdown/inode.md b/doc/hacker-guide/en-US/markdown/inode.md
new file mode 100644
index 00000000000..a340ab9ca8e
--- /dev/null
+++ b/doc/hacker-guide/en-US/markdown/inode.md
@@ -0,0 +1,226 @@
#Inode and dentry management in GlusterFS

##Background
Filesystems internally refer to files and directories via inodes. Inodes
are unique identifiers of the entities stored in a filesystem. Whenever an
application has to operate on a file/directory (read/modify), the filesystem
maps that file/directory to the right inode and starts referring to that
inode whenever an operation has to be performed on the file/directory.

In GlusterFS a new inode is created whenever a new file/directory is created
OR when a successful lookup is done on a file/directory for the first time.
Inodes in GlusterFS are maintained by the inode table, which is initialized
when the filesystem daemon is started (both for the brick process and the
mount process). Below are some important data structures for inode
management.

##Data structure (inode-table)
```
struct _inode_table {
        pthread_mutex_t    lock;
        size_t             hashsize;    /* bucket size of inode hash and dentry hash */
        char              *name;        /* name of the inode table, just for gf_log() */
        inode_t           *root;        /* root directory inode, with inode
                                           number and gfid 1 */
        xlator_t          *xl;          /* xlator to be called to do purge and
                                           the xlator which maintains the inode table */
        uint32_t           lru_limit;   /* maximum LRU cache size */
        struct list_head  *inode_hash;  /* buckets for inode hash table */
        struct list_head  *name_hash;   /* buckets for dentry hash table */
        struct list_head   active;      /* list of inodes currently active (in an fop) */
        uint32_t           active_size; /* count of inodes in active list */
        struct list_head   lru;         /* list of inodes recently used.
                                           lru.next most recent */
        uint32_t           lru_size;    /* count of inodes in lru list */
        struct list_head   purge;       /* list of inodes to be purged soon */
        uint32_t           purge_size;  /* count of inodes in purge list */

        struct mem_pool   *inode_pool;  /* memory pool for inodes */
        struct mem_pool   *dentry_pool; /* memory pool for dentries */
        struct mem_pool   *fd_mem_pool; /* memory pool for fd_t */
        int                ctxcount;    /* number of slots in inode->ctx */
};
```

##Life cycle of the inode table

A new inode table is allocated with:

```
inode_table_new (size_t lru_limit, xlator_t *xl)
```

Usually the top xlators in the graph, such as protocol/server (for bricks),
fuse and nfs (for fuse and nfs mounts), and libgfapi do inode management.
Hence they are the ones which allocate a new inode table by calling the
above function (a minimal sketch of this follows below).

Each xlator graph in glusterfs maintains an inode table. So in fuse clients,
whenever there is a graph change due to add brick/remove brick or the
addition/removal of some other xlators, a new graph is created, which in
turn creates a new inode table.
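To make the allocation step concrete, here is a minimal sketch of a
top-level xlator creating its inode table during its init path. The xlator
itself is hypothetical and the lru limit value is only illustrative; this is
not the actual protocol/server, fuse or gfapi code, although they follow the
same pattern and keep the table reachable from the xlator's itable field.

```
/*
 * Minimal sketch (hypothetical xlator, illustrative lru limit): a top-level
 * xlator allocates its inode table once, at init time, and keeps a pointer
 * to it so that later fops can be resolved against it.
 */
#include "xlator.h"
#include "inode.h"

int
init (xlator_t *this)
{
        inode_table_t *itable = NULL;

        /* 16384 matches the brick-side lru limit mentioned later in this
           document; clients effectively use an unlimited lru list */
        itable = inode_table_new (16384, this);
        if (!itable)
                return -1;

        /* keep the table reachable from the xlator for later resolution */
        this->itable = itable;

        return 0;
}
```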
An allocated inode table is destroyed only when the filesystem daemon is
killed or the filesystem is unmounted.

##What the inode table contains

The inode table mainly contains a hash table for maintaining inodes. In
general a file/directory is considered to exist if there is a corresponding
inode present in the inode table. If an inode for a file/directory cannot be
found in the inode table, glusterfs tries to resolve it by sending a lookup
on the entry for which the inode is needed. If the lookup is successful, a
new inode corresponding to the entry is added to the hash table present in
the inode table. Thus an inode present in the hash table means it is an
existing file/directory within the filesystem. The inode table also stores
the size of the hash table (as of now it is hard coded to 14057; the hash
value of an inode is calculated using its gfid).

Apart from the hash table, the inode table also maintains 3 important lists
of inodes:

1) Active list:
The active list contains all the active inodes (i.e. inodes which are
currently part of some fop).
2) Lru list:
The least recently used inodes list. A limit can be set for the size of the
lru list. For bricks it is 16384 and for clients it is infinity.
3) Purge list:
The list of all the inodes which have to be purged (i.e. inodes which have
to be deleted from the inode table due to unlink/rmdir/forget).

Finally, the inode table also contains the mem-pools used for allocating
inodes and dentries, so that frequent malloc/calloc and free of these data
structures can be avoided.

##Data structure (inode)
```
struct _inode {
        inode_table_t     *table;       /* the table this inode belongs to */
        uuid_t             gfid;        /* unique identifier of the inode */
        gf_lock_t          lock;
        uint64_t           nlookup;
        uint32_t           fd_count;    /* Open fd count */
        uint32_t           ref;         /* reference count on this inode */
        ia_type_t          ia_type;     /* what kind of file */
        struct list_head   fd_list;     /* list of open files on this inode */
        struct list_head   dentry_list; /* list of directory entries for this inode */
        struct list_head   hash;        /* hash table pointers */
        struct list_head   list;        /* active/lru/purge */

        struct _inode_ctx *_ctx;        /* place holder for keeping the
                                           information about the inode by
                                           different xlators */
};
```

As said above, inodes are the internal way of identifying files/directories.
An inode uniquely represents a file/directory. A new inode is created
whenever a create/mkdir/symlink/mknod operation is performed. Apart from
that, a new inode is created upon the successful fresh lookup of a
file/directory. Say the filesystem contained some file "a" within root and
the filesystem was unmounted. Now when glusterfs is mounted and some
operation is performed on "/a", glusterfs tries to get the inode for the
entry "a" with the parent inode as root. But, since glusterfs just came up,
it will not be able to find the inode for "a" and will send a lookup on
"/a". If the lookup operation succeeds (i.e. the root of glusterfs contains
an entry called "a"), then a new inode for "/a" is created and added to the
inode table.

Depending upon the situation, an inode can be in one of the 3 lists
maintained by the inode table. If some fop is happening on the inode, then
the inode will be present in the active inodes list maintained by the inode
table. Active inodes are those inodes whose refcount is greater than zero.
Whenever some operation comes on a file/directory and the resolver tries to
find the inode for it, it increments the refcount of the inode before
returning the inode.
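As a rough illustration of that resolution step, the sketch below resolves a
gfid against an inode table. The helper name is hypothetical and the real
resolvers in fuse and protocol/server are considerably more involved;
inode_find() and inode_unref() are the actual libglusterfs calls.

```
/*
 * Hypothetical helper, not the real resolver code: look up an inode by its
 * gfid in the inode table.  If the inode is already present, inode_find()
 * returns it with a reference held; if it is absent, a lookup fop has to be
 * wound down the graph and its callback has to inode_link() the new inode
 * (see the life cycle section below).
 */
#include "inode.h"

static inode_t *
resolve_gfid (inode_table_t *itable, uuid_t gfid)
{
        inode_t *inode = NULL;

        inode = inode_find (itable, gfid);
        if (!inode) {
                /* not in the table yet: the caller must trigger a lookup */
                return NULL;
        }

        /* the caller must inode_unref() this inode once the fop is done */
        return inode;
}
```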
The refcount of an inode can also be incremented explicitly by calling the
function below:

```
inode_ref (inode_t *inode)
```

Any xlator which wants to operate on an inode as part of some fop (or wants
the inode in the callback) should hold a ref on the inode. Once the fop is
completed, before sending the reply of the fop to the layers above, the
inode has to be unrefed. When the refcount of an inode becomes zero, it is
removed from the active inodes list and put into the LRU list maintained by
the inode table. In short, if some fop is happening on a file/directory, the
corresponding inode will be in the active list; otherwise it will be in the
LRU list.

##Life cycle of an inode

A new inode is created whenever a new file/directory/symlink is created OR a
successful lookup of an existing entry is done. The xlators which do inode
management (as of now protocol/server, fuse, nfs, gfapi) perform the
inode_link operation upon a successful lookup or the successful creation of
a new entry:

```
inode_link (inode_t *inode, inode_t *parent, const char *name,
            struct iatt *buf);
```

inode_link actually adds the inode to the inode table (to be precise, it
adds the inode to the hash table maintained by the inode table; the hash
value is calculated based on the gfid). It copies the gfid into the inode
(the gfid is present in the iatt structure) and creates a dentry with the
new name.

An inode is removed from the inode table and eventually destroyed when an
unlink or rmdir operation is performed on a file/directory, or when the lru
limit of the inode table has been exceeded.

##Data structure (dentry)
```
struct _dentry {
        struct list_head   inode_list;  /* list of dentries of inode */
        struct list_head   hash;        /* hash table pointers */
        inode_t           *inode;       /* inode of this directory entry */
        char              *name;        /* name of the directory entry */
        inode_t           *parent;      /* directory of the entry */
};
```

A dentry is the presence of an entry for a file/directory within its parent
directory. A dentry usually points to the inode to which it belongs. In
glusterfs a dentry contains the following fields:
1) A hook using which it can add itself to the list of dentries maintained
by the inode to which it points.
2) A hash table pointer.
3) A pointer to the inode to which it belongs.
4) The name of the directory entry.
5) A pointer to the inode of the parent directory in which the dentry is
present.

A new dentry is created when a new file/directory/symlink is created or a
hard link to an existing file is created:

```
__dentry_create (inode_t *inode, inode_t *parent, const char *name);
```

A dentry holds a refcount on the parent directory so that the parent inode
is never removed from the active inodes list and put onto the lru list (if
the lru limit of the lru list is exceeded, there is a chance of the parent
inode being destroyed; to avoid that, dentries hold a reference to the
parent inode). A dentry is removed whenever an unlink/rmdir is performed on
a file/directory, or when the lru limit has been exceeded and the oldest
inodes are purged out of the inode table, during which all the dentries of
those inodes are removed.

Whenever an unlink/rmdir comes on a file/directory, the corresponding inode
should be removed from the inode table. So upon unlink/rmdir, the inode is
moved to the purge list maintained by the inode table and destroyed from
there. To be more specific, for an inode to be destroyed, its refcount and
its nlookup count should both become 0, as summarised in the sketch below.
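The two conditions can be stated in code form. The helper below is only an
illustrative paraphrase of the rule, not a function from libglusterfs (the
real code inspects these counters with the inode table's lock held).

```
/*
 * Illustrative paraphrase of the destruction rule (not libglusterfs code):
 * an inode can be destroyed only when no xlator holds a reference on it and
 * every lookup done on it has been forgotten.
 */
#include "inode.h"

static int
inode_is_destroyable (inode_t *inode)
{
        return (inode->ref == 0) && (inode->nlookup == 0);
}
```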
For the refcount to become 0, the inode should not be part of any fop and
there should not be any open fds on it. If the inode belongs to a directory,
then there should not be any fop happening on the directory and it should
not contain any dentries within it. For the nlookup count to become zero, a
forget has to be sent on the inode with the nlookup count set to 0 as an
argument. For fuse clients, forget is sent by the kernel itself whenever an
unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the
protocol/server itself has to do inode_forget (a sketch of this brick-side
cleanup is given at the end of this document). Whenever the inode has to be
deleted due to file removal or the lru limit being exceeded, the inode is
retired (i.e. all the dentries of the inode are deleted and the inode is
moved to the purge list maintained by the inode table) and the nlookup count
is set to 0 via the inode_forget api. The inode table then prunes all the
inodes from the purge list, destroying the inode contexts maintained by each
xlator.

Unlinking of the dentry is done via inode_unlink:

```
void
inode_unlink (inode_t *inode, inode_t *parent, const char *name);
```

If the inode has multiple hard links, then the unlink operation performed by
the application results just in the removal of the dentry with the name
provided by the application. For the inode to be removed, all the dentries
of the inode should be unlinked.
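To tie the pieces together, here is a rough sketch of the brick-side cleanup
described above, loosely modelled on what protocol/server has to do once the
last dentry of a file has been unlinked. The helper name and the surrounding
error handling are illustrative; inode_unlink(), inode_forget() and
inode_unref() are the actual libglusterfs calls.

```
/*
 * Hypothetical brick-side cleanup after a successful unlink of the last
 * (or only) hard link.  There is no kernel to send a forget on a brick, so
 * the server has to drop the lookups itself; once the last reference also
 * goes away, the inode moves to the purge list and is pruned by the inode
 * table.
 */
#include "inode.h"

static void
unlink_cleanup (inode_t *inode, inode_t *parent, const char *name)
{
        /* remove the dentry <parent, name> pointing to this inode */
        inode_unlink (inode, parent, name);

        /* an nlookup argument of 0 means "forget all lookups" */
        inode_forget (inode, 0);

        /* drop our reference; with ref == 0 and nlookup == 0 the inode can
           now be retired and destroyed */
        inode_unref (inode);
}
```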