diff options
Diffstat (limited to 'doc/legacy/stat-prefetch-design.txt')
-rw-r--r-- | doc/legacy/stat-prefetch-design.txt | 154 |
1 files changed, 154 insertions, 0 deletions
diff --git a/doc/legacy/stat-prefetch-design.txt b/doc/legacy/stat-prefetch-design.txt new file mode 100644 index 00000000000..68ed423d3dd --- /dev/null +++ b/doc/legacy/stat-prefetch-design.txt @@ -0,0 +1,154 @@ +what is stat-prefetch? +====================== +It is a translator which caches the dentries read in readdir. This dentry +list is stored in the context of fd. Later when lookup happens on +[parent-inode, basename (path)] combination, this list is searched for the +basename. The dentry thus searched is used to fill up the stat corresponding +to path being looked upon, thereby short-cutting lookup calls. This cache is +preserved till closedir is called on the fd. The purpose of this translator +is to optimize operations like 'ls -l', where a readdir is followed by +lookup (stat) calls on each directory entry. + +1. stat-prefetch harnesses the efficiency of short lookup calls + (saves network roundtrip time for lookup calls from being accounted to + the stat call). +2. To maintain the correctness, it does lookup-behind - lookup is winded to + underlying translators after it is unwound to upper translators. + lookup-behind is necessary as inode gets populated in server inode table + only in lookup-cbk and also because various translators store their + contexts in inode contexts during lookup calls. + +fops to be implemented: +======================= +* lookup + 1. check the dentry cache stored in context of fds opened by the same process + on parent inode for basename. If found unwind with cached stat, else wind + the lookup call to underlying translators. + 2. stat is stored in the context of inode if the path being looked upon + happens to be directory. This stat will be used to fill postparent stat + when lookup happens on any of the directory contents. + +* readdir + 1. cache the direntries returned in readdir_cbk in the context of fd. + 2. if the readdir is happening on non-expected offsets (means a seekdir/rewinddir + has happened), cache has to be flushed. + 3. delete the entry corresponding to basename of path on which fd is opened + from cache stored in parent. + +* chmod/fchmod + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since these calls change st_mode and st_ctime of + stat. + +* chown/fchown + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since these calls change st_uid/st_gid and + st_ctime of stat. + +* truncate/ftruncate + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since these calls change st_size/st_mtime of stat. + +* utimens + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since this call changes st_atime/st_mtime of stat. + +* readlink + delete the entry corresponding to basename from cache stored in context of fds + opened on parent inode, since this call changes st_atime of stat. + +* unlink + 1. delete the entry corresponding to basename from cache stored in context of + fds, opened on parent directory containing the file being unlinked. + 2. delete the entry corresponding to basename of parent directory from cache + of grand-parent. + +* rmdir + 1. delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode. + 2. remove the entire cache from all fds opened on inode corresponding to + directory being removed. + 3. delete the entry correspondig to basename of parent from cache stored in + grand-parent. + +* readv + delete the entry corresponding to basename from cache stored in context of fds + opened on parent inode, since readv changes st_atime of file. + +* writev + delete the entry corresponding to basename from cache stored in context of fds + opened on parent inode, since writev can possibly change st_size and definitely + changes st_mtime of file. + +* fsync + there is a confusion here as to whether fsync updates mtime/ctimes. Disk based + filesystems (atleast ext2) just writes the times stored in inode to disk + during fsync and not the time at which fsync is being done. But in glusterfs, + a translator like write-behind actually sends writes during fsync which will + change mtime/ctime. Hence stat-prefetch implements fsync to delete the entry + corresponding to basename from cache stored in context of fds opened on parent + inode. + +* rename + 1. remove entry corresponding to oldname from cache stored in fd contexts of + oldparent. + 2. remove entry corresponding to newname from cache stored in fd contexts of + newparent. + 3. remove entry corresponding to oldparent from cache stored in + old-grand-parent, since removing oldname changes st_mtime and st_ctime + of oldparent stat. + 4. remove entry corresponding to newparent from cache stored in + new-grand-parent, since adding newname changes st_mtime and st_ctime + of newparent stat. + 5. if oldname happens to be a directory, remove entire cache from all fds + opened on it. + +* create/mknod/mkdir/symlink/link + delete entry corresponding to basename of parent directory in which these + operations are happening, from cache stored in context of fds opened on + grand-parent, since adding a new entry to a directory changes st_mtime + and st_ctime of parent directory. + +* setxattr/removexattr + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since setxattr changes st_ctime of file. + +* setdents + 1. remove entry corresponding to basename of path on which fd is opened from + cache stored in context of fds opened on parent. + 2. for each of the entry in the direntry list, delete from cache stored in + context of fd, the entry corresponding to basename of path being passed. + +* getdents + 1. remove entry corresponding to basename of path on which fd is opened from + cache stored in parent, since getdents changes st_atime. + 2. remove entries corresponding to symbolic links from cache, since readlink + would've changed st_atime. + +* checksum + delete the entry corresponding to basename from cache stored in context of + fds opened on parent inode, since st_atime is changed during this call. + +* xattrop/fxattrop + delete the entry corresponding to basename from cache stored in context of fds + opened on parent inode, since these calls modify st_ctime of file. + +callbacks to be implemented: +============================ +* releasedir + free the context stored in fd. + +* forget + dree the stat if the inode corresponds to a directory. + +limitations: +============ +* since a readdir does not return extended attributes of file, if need_xattr is + set, short-cutting of lookup does not happen and lookup is passed to + underlying translators. + +* posix_readdir does not check whether the dentries are spanning across multiple + mount points. Hence it is not transforming inode numbers in stat buffers if + posix is configured to allow export directory spanning on multiple mountpoints. + This is a bug which needs to be fixed. posix_readdir should treat dentries the + same way as if lookup is happening on dentries. |