diff options
Diffstat (limited to 'doc/developer-guide/datastructure-iobuf.md')
| -rw-r--r-- | doc/developer-guide/datastructure-iobuf.md | 259 | 
1 files changed, 259 insertions, 0 deletions
diff --git a/doc/developer-guide/datastructure-iobuf.md b/doc/developer-guide/datastructure-iobuf.md new file mode 100644 index 00000000000..5f521f1485f --- /dev/null +++ b/doc/developer-guide/datastructure-iobuf.md @@ -0,0 +1,259 @@ +#Iobuf-pool +##Datastructures +###iobuf +Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF +API, each unit hosts @page_size(defined in arena structure) bytes of memory. As +initial step of processing a fop, the IO buffer passed onto GlusterFS by the +other applications (FUSE VFS/ Applications using gfapi) is copied into GlusterFS +space i.e. iobufs. Hence Iobufs are mostly allocated/deallocated in Fuse, gfapi, +protocol xlators, and also in performance xlators to cache the IO buffers etc. +``` +struct iobuf { +        union { +                struct list_head      list; +                struct { +                        struct iobuf *next; +                        struct iobuf *prev; +                }; +        }; +        struct iobuf_arena  *iobuf_arena; + +        gf_lock_t            lock; /* for ->ptr and ->ref */ +        int                  ref;  /* 0 == passive, >0 == active */ + +        void                *ptr;  /* usable memory region by the consumer */ + +        void                *free_ptr; /* in case of stdalloc, this is the +                                          one to be freed not the *ptr */ +}; +``` + +###iobref +There may be need of multiple iobufs for a single fop, like in vectored read/write. +Hence multiple iobufs(default 16) are encapsulated under one iobref. +``` +struct iobref { +        gf_lock_t          lock; +        int                ref; +        struct iobuf     **iobrefs; /* list of iobufs */ +        int                alloced; /* 16 by default, grows as required */ +        int                used;    /* number of iobufs added to this iobref */ +}; +``` +###iobuf_arenas +One region of memory MMAPed from the operating system. Each region MMAPs +@arena_size bytes of memory, and hosts @arena_size / @page_size IOBUFs. +The same sized iobufs are grouped into one arena, for sanity of access. + +``` +struct iobuf_arena { +        union { +                struct list_head            list; +                struct { +                        struct iobuf_arena *next; +                        struct iobuf_arena *prev; +                }; +        }; + +        size_t              page_size;  /* size of all iobufs in this arena */ +        size_t              arena_size; /* this is equal to +                                           (iobuf_pool->arena_size / page_size) +                                           * page_size */ +        size_t              page_count; + +        struct iobuf_pool  *iobuf_pool; + +        void               *mem_base; +        struct iobuf       *iobufs;     /* allocated iobufs list */ + +        int                 active_cnt; +        struct iobuf        active;     /* head node iobuf +                                           (unused by itself) */ +        int                 passive_cnt; +        struct iobuf        passive;    /* head node iobuf +                                           (unused by itself) */ +        uint64_t            alloc_cnt;  /* total allocs in this pool */ +        int                 max_active; /* max active buffers at a given time */ +}; + +``` +###iobuf_pool +Pool of Iobufs. As there may be many Io buffers required by the filesystem, +a pool of iobufs are preallocated and kept, if these preallocated ones are +exhausted only then the standard malloc/free is called, thus improving the +performance. Iobuf pool is generally one per process, allocated during +glusterfs_ctx_t init (glusterfs_ctx_defaults_init), currently the preallocated +iobuf pool memory is freed on process exit. Iobuf pool is globally accessible +across GlusterFs, hence iobufs allocated by any xlator can be accessed by any +other xlators(unless iobuf is not passed). +``` +struct iobuf_pool { +        pthread_mutex_t     mutex; +        size_t              arena_size; /* size of memory region in +                                           arena */ +        size_t              default_page_size; /* default size of iobuf */ + +        int                 arena_cnt; +        struct list_head    arenas[GF_VARIABLE_IOBUF_COUNT]; +        /* array of arenas. Each element of the array is a list of arenas +           holding iobufs of particular page_size */ + +        struct list_head    filled[GF_VARIABLE_IOBUF_COUNT]; +        /* array of arenas without free iobufs */ + +        struct list_head    purge[GF_VARIABLE_IOBUF_COUNT]; +        /* array of of arenas which can be purged */ + +        uint64_t            request_misses; /* mostly the requests for higher +                                               value of iobufs */ +}; +``` +~~~ +The default size of the iobuf_pool(as of yet): +1024 iobufs of 128Bytes = 128KB +512  iobufs of 512Bytes = 256KB +512  iobufs of 2KB      = 1MB +128  iobufs of 8KB      = 1MB +64   iobufs of 32KB     = 2MB +32   iobufs of 128KB    = 4MB +8    iobufs of 256KB    = 2MB +2    iobufs of 1MB      = 2MB +Total ~13MB +~~~ +As seen in the datastructure iobuf_pool has 3 arena lists. + +- arenas:   +The arenas allocated during iobuf_pool create, are part of this list. This list +also contains arenas that are partially filled i.e. contain few active and few +passive iobufs (passive_cnt !=0, active_cnt!=0 except for initially allocated +arenas). There will be by default 8 arenas of the sizes mentioned above. +- filled:   +If all the iobufs in the arena are filled(passive_cnt = 0), the arena is moved +to the filled list. If any of the iobufs from the filled arena is iobuf_put, +then the arena moves back to the 'arenas' list. +- purge:   +If there are no active iobufs in the arena(active_cnt = 0), the arena is moved +to purge list. iobuf_put() triggers destruction of the arenas in this list. The +arenas in the purge list are destroyed only if there is  atleast one arena in +'arenas' list, that way there won't be spurious mmap/unmap of buffers. +(e.g: If there is an arena (page_size=128KB, count=32) in purge list, this arena +is destroyed(munmap) only if there is an arena in 'arenas' list with page_size=128KB). + +##APIs +###iobuf_get + +``` +struct iobuf *iobuf_get (struct iobuf_pool *iobuf_pool); +``` +Creates a new iobuf of the default page size(128KB hard coded as of yet). +Also takes a reference(increments ref count), hence no need of doing it +explicitly after getting iobuf. + +###iobuf_get2 + +``` +struct iobuf * iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size); +``` +Creates a new iobuf of a specified page size, if page_size=0 default page size +is considered. +``` +if (requested iobuf size > Max iobuf size in the pool(1MB as of yet)) +  { +     Perform standard allocation(CALLOC) of the requested size and +     add it to the list iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX]. +  } +  else +  { +     -Round the page size to match the stndard sizes in iobuf pool. +     (eg: if 3KB is requested, it is rounded to 8KB). +     -Select the arena list corresponding to the rounded size +     (eg: select 8KB arena) +     If the selected arena has passive count > 0, then return the +     iobuf from this arena, set the counters(passive/active/etc.) +     appropriately. +     else the arena is full, allocate new arena with rounded size +     and standard page numbers and add to the arena list +     (eg: 128 iobufs of 8KB is allocated). +  } +``` +Also takes a reference(increments ref count), hence no need of doing it +explicitly after getting iobuf. + +###iobuf_ref + +``` +struct iobuf *iobuf_ref (struct iobuf *iobuf); +``` +  Take a reference on the iobuf. If using an iobuf allocated by some other +xlator/function/, its a good practice to take a reference so that iobuf is not +deleted by the allocator. + +###iobuf_unref +``` +void iobuf_unref (struct iobuf *iobuf); +``` +Unreference the iobuf, if the ref count is zero iobuf is considered free. + +``` +  -Delete the iobuf, if allocated from standard alloc and return.   +  -set the active/passive count appropriately.   +  -if passive count > 0 then add the arena to 'arena' list.   +  -if active count = 0 then add the arena to 'purge' list.   +``` +Every iobuf_ref should have a corresponding iobuf_unref, and also every +iobuf_get/2 should have a correspondning iobuf_unref. + +###iobref_new +``` +struct iobref *iobref_new (); +``` +Creates a new iobref structure and returns its pointer. + +###iobref_ref +``` +struct iobref *iobref_ref (struct iobref *iobref); +``` +Take a reference on the iobref. + +###iobref_unref +``` +void iobref_unref (struct iobref *iobref); +``` +Decrements the reference count of the iobref. If the ref count is 0, then unref +all the iobufs(iobuf_unref) in the iobref, and destroy the iobref. + +###iobref_add +``` +int iobref_add (struct iobref *iobref, struct iobuf *iobuf); +``` +Adds the given iobuf into the iobref, it takes a ref on the iobuf before adding +it, hence explicit iobuf_ref is not required if adding to the iobref. + +###iobref_merge +``` +int iobref_merge (struct iobref *to, struct iobref *from); +``` +Adds all the iobufs in the 'from' iobref to the 'to' iobref. Merge will not +cause the delete of the 'from' iobref, therefore it will result in another ref +on all the iobufs added to the 'to' iobref. Hence iobref_unref should be +performed both on 'from' and 'to' iobrefs (performing iobref_unref only on 'to' +will not free the iobufs and may result in leak). + +###iobref_clear +``` +void iobref_clear (struct iobref *iobref); +``` +Unreference all the iobufs in the iobref, and also unref the iobref. + +##Iobuf Leaks +If all iobuf_refs/iobuf_new do not have correspondning iobuf_unref, then the +iobufs are not freed and recurring execution of such code path may lead to huge +memory leaks. The easiest way to identify if a memory leak is caused by iobufs +is to take a statedump. If the statedump shows a lot of filled arenas then it is +a sure sign of leak. Refer doc/debugging/statedump.md for more details. + +If iobufs are leaking, the next step is to find where the iobuf_unref went +missing. There is no standard/easy way of debugging this, code reading and logs +are the only ways. If there is a liberty to reproduce the memory leak at will, +then logs(gf_callinginfo) in iobuf_ref/unref might help.   +TODO: A easier way to debug iobuf leaks.  | 
