diff options
Diffstat (limited to 'doc/developer-guide/datastructure-inode.md')
| -rw-r--r-- | doc/developer-guide/datastructure-inode.md | 226 | 
1 files changed, 226 insertions, 0 deletions
| diff --git a/doc/developer-guide/datastructure-inode.md b/doc/developer-guide/datastructure-inode.md new file mode 100644 index 00000000000..a340ab9ca8e --- /dev/null +++ b/doc/developer-guide/datastructure-inode.md @@ -0,0 +1,226 @@ +#Inode and dentry management in GlusterFS: + +##Background +Filesystems internally refer to files and directories via inodes. Inodes +are unique identifiers of the entities stored in a filesystem. Whenever an +application has to operate on a file/directory (read/modify), the filesystem +maps that file/directory to the right inode and start referring to that inode +whenever an operation has to be performed on the file/directory. + +In GlusterFS a new inode gets created whenever a new file/directory is created +OR when a successful lookup is done on a file/directory for the first time. +Inodes in GlusterFS are maintained by the inode table which gets initiated when +the filesystem daemon is started (both for the brick process as well as the +mount process). Below are some important data structures for inode management. + +## Data-structure (inode-table) +``` +struct _inode_table { +        pthread_mutex_t    lock; +        size_t             hashsize;    /* bucket size of inode hash and dentry hash */ +        char              *name;        /* name of the inode table, just for gf_log() */ +        inode_t           *root;        /* root directory inode, with inode +        number and gfid 1 */ +        xlator_t          *xl;          /* xlator to be called to do purge and +        the xlator which maintains the inode table*/ +        uint32_t           lru_limit;   /* maximum LRU cache size */ +        struct list_head  *inode_hash;  /* buckets for inode hash table */ +        struct list_head  *name_hash;   /* buckets for dentry hash table */ +        struct list_head   active;      /* list of inodes currently active (in an fop) */ +        uint32_t           active_size; /* count of inodes in active list */ +        struct list_head   lru;         /* list of inodes recently used. +                                           lru.next most recent */ +        uint32_t           lru_size;    /* count of inodes in lru list  */ +        struct list_head   purge;       /* list of inodes to be purged soon */ +        uint32_t           purge_size;  /* count of inodes in purge list */ + +        struct mem_pool   *inode_pool;  /* memory pool for inodes */ +        struct mem_pool   *dentry_pool; /* memory pool for dentrys */ +        struct mem_pool   *fd_mem_pool; /* memory pool for fd_t */ +        int                ctxcount;    /* number of slots in inode->ctx */ +}; +``` + +#Life-cycle +``` + +inode_table_new (size_t lru_limit, xlator_t *xl) + +This is a function which allocates a new inode table. Usually the top xlators in +the graph such as protocol/server (for bricks), fuse and nfs (for fuse and nfs +mounts) and libgfapi do inode managements. Hence they are the ones which will +allocate a new inode table by calling the above function. + +Each xlator graph in glusterfs maintains an inode table. So in fuse clients, +whenever there is a graph change due to add brick/remove brick or +addition/removal of some other xlators, a new graph is created which creates a +new inode table. + +Thus an allocated inode table is destroyed only when the filesystem daemon is +killed or unmounted. + +``` + +#what it contains. +``` + +Inode table in glusterfs mainly contains a hash table for maintaining inodes. +In general a file/directory is considered to be existing if there is a +corresponding inode present in the inode table. If a inode for a file/directory +cannot be found in the inode table, glusterfs tries to resolve it by sending a +lookup on the entry for which the inode is needed. If lookup is successful, then +a new inode correponding to the entry is added to the hash table present in the +inode table. Thus an inode present in the hash-table means, its an existing +file/directory within the filesystem. The inode table also contains the hash +size of the hash table (as of now it is hard coded to 14057. The hash value of +a inode is calculated using its gfid). + +Apart from the hash table, inode table also maintains 3 important list of inodes +1) Active list: +Active list contains all the active inodes (i.e inodes which are currently part +of some fop). +2) Lru list: +Least recently used inodes list. A limit can be set for the size of the lru +list. For bricks it is 16384 and for clients it is infinity. +3) Purge list: +List of all the inodes which have to be purged (i.e inodes which have to be +deleted from the inode table due to unlink/rmdir/forget). + +And at last it also contains the mem-pool for allocating inodes, dentries so +that frequent malloc/calloc and free of the data structures can be avoided. +``` + +#Data structure (inode) +``` +struct _inode { +        inode_table_t       *table;         /* the table this inode belongs to */ +        uuid_t               gfid;          /* unique identifier of the inode */ +        gf_lock_t            lock; +        uint64_t             nlookup; +        uint32_t             fd_count;      /* Open fd count */ +        uint32_t             ref;           /* reference count on this inode */ +        ia_type_t            ia_type;       /* what kind of file */ +        struct list_head     fd_list;       /* list of open files on this inode */ +        struct list_head     dentry_list;   /* list of directory entries for this inode */ +        struct list_head     hash;          /* hash table pointers */ +        struct list_head     list;          /* active/lru/purge */ + +        struct _inode_ctx   *_ctx;          /* place holder for keeping the +        information about the inode by different xlators */ +}; + +As said above, inodes are internal way of identifying the files/directories. A +inode uniquely represents a file/directory. A new inode is created whenever a +create/mkdir/symlink/mknod operations are performed. Apart from that a new inode +is created upon the successful fresh lookup of a file/directory. Say the +filesystem contained some file "a" within root and the filesystem was +unmounted. Now when glusterfs is mounted and some operation is perfomed on "/a", +glusterfs tries to get the inode for the entry "a" with parent inode as +root. But, since glusterfs just came up, it will not be able to find the inode +for "a" and will send a lookup on "/a". If the lookup operation succeeds (i.e. +the root of glusterfs contains an entry called "a"), then a new inode for "/a" +is created and added to the inode table. + +Depending upon the situation, an inode can be in one of the 3 lists maintained +by the inode table. If some fop is happening on the inode, then the inode will +be present in the active inodes list maintained by the inode table. Active +inodes are those inodes whose refcount is greater than zero. Whenever some +operation comes on a file/directory, and the resolver tries to find the inode +for it, it increments the refcount of the inode before returning the inode. The +refcount of an inode can be incremented by calling the below function + +inode_ref (inode_t *inode) + +Any xlator which wants to operate on a inode as part of some fop (or wants the +inode in the callback), should hold a ref on the inode. +Once the fop is completed before sending the reply of the fop to the above +layers , the inode has to be unrefed. When the refcount of an inode becomes +zero, it is removed from the active inodes list and put into LRU list maintained +by the inode table. Thus in short if some fop is happening on a file/directory, +the corresponding inode will be in the active list or it will be in the LRU +list. +``` + +#Life Cycle + +A new inode is created whenever a new file/directory/symlink is created OR a +successful lookup of an existing entry is done. The xlators which does inode +management (as of now protocol/server, fuse, nfs, gfapi) will perform inode_link +operation upon successful lookup or successful creation of a new entry. + +inode_link (inode_t *inode, inode_t *parent, const char *name, +            struct iatt *buf); + +inode_link actually adds the inode to the inode table (to be precise it adds +the inode to the hash table maintained by the inode table. The hash value is +calculated based on the gfid). Copies the gfid to the inode (the gfid is +present in the iatt structure). Creates a dentry with the new name. + +A inode is removed from the inode table and eventually destroyed when unlink +or rmdir operation is performed on a file/directory, or the the lru limit of +the inode table has been exceeded. + +#Data structure (dentry) +``` + +struct _dentry { +        struct list_head   inode_list;   /* list of dentries of inode */ +        struct list_head   hash;         /* hash table pointers */ +        inode_t           *inode;        /* inode of this directory entry */ +        char              *name;         /* name of the directory entry */ +        inode_t           *parent;       /* directory of the entry */ +}; + +A dentry is the presence of an entry for a file/directory within its parent +directory. A dentry usually points to the inode to which it belongs to. In +glusterfs a dentry contains the following fields. +1) a hook using which it can add itself to the list of +the dentries maintained by the inode to which it points to. +2) A hash table pointer. +3) Pointer to the inode to which it belongs to. +4) Name of the dentry +5) Pointer to the inode of the parent directory in which the dentry is present + +A new dentry is created when a new file/directory/symlink is created or a hard +link to an existing file is created. + +__dentry_create (inode_t *inode, inode_t *parent, const char *name); + +A dentry holds a refcount on the parent +directory so that the parent inode is never removed from the active inode's list +and put to the lru list (If the lru limit of the lru list is exceeded, there is +a chance of parent inode being destroyed. To avoid it, the dentries hold a +reference to the parent inode). A dentry is removed whenevern a unlink/rmdir +is perfomed on a file/directory. Or when the lru limit has been exceeded, the +oldest inodes are purged out of the inode table, during which all the dentries +of the inode are removed. + +Whenever a unlink/rmdir comes on a file/directory, the corresponding inode +should be removed from the inode table. So upon unlink/rmdir, the inode will +be moved to the purge list maintained by the inode table and from there it is +destroyed. To be more specific, if a inode has to be destroyed, its refcount +and nlookup count both should become 0. For refcount to become 0, the inode +should not be part of any fop (there should not be any open fds). Or if the +inode belongs to a directory, then there should not be any fop happening on the +directory and it should not contain any dentries within it. For nlookup count to +become zero, a forget has to be sent on the inode with nlookup count set to 0 as +an argument. For fuse clients, forget is sent by the kernel itself whenever a +unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the +protocol/server itself has to do inode_forget. Whenever the inode has to be +deleted due to file removal or lru limit being exceeded  the inode is retired +(i.e. all the dentries of the inode are deleted and the inode is moved to the +purge list maintained by the inode table), the nlookup count is set to 0 via +inode_forget api. The inode table, then prunes all the inodes from the purge +list by destroying the inode contexts maintained by each xlator. + +unlinking of the dentry is done via inode_unlink; + +void +inode_unlink (inode_t *inode, inode_t *parent, const char *name); + +If the inode has multiple hard links, then the unlink operation performed by +the application results just in the removal of the dentry with the name provided +by the application. For the inode to be removed, all the dentries of the inode +should be unlinked. +``` + | 
