summaryrefslogtreecommitdiffstats
path: root/doc/legacy/stat-prefetch-design.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/legacy/stat-prefetch-design.txt')
-rw-r--r--doc/legacy/stat-prefetch-design.txt154
1 files changed, 154 insertions, 0 deletions
diff --git a/doc/legacy/stat-prefetch-design.txt b/doc/legacy/stat-prefetch-design.txt
new file mode 100644
index 000000000..68ed423d3
--- /dev/null
+++ b/doc/legacy/stat-prefetch-design.txt
@@ -0,0 +1,154 @@
+what is stat-prefetch?
+======================
+It is a translator which caches the dentries read in readdir. This dentry
+list is stored in the context of fd. Later when lookup happens on
+[parent-inode, basename (path)] combination, this list is searched for the
+basename. The dentry thus searched is used to fill up the stat corresponding
+to path being looked upon, thereby short-cutting lookup calls. This cache is
+preserved till closedir is called on the fd. The purpose of this translator
+is to optimize operations like 'ls -l', where a readdir is followed by
+lookup (stat) calls on each directory entry.
+
+1. stat-prefetch harnesses the efficiency of short lookup calls
+ (saves network roundtrip time for lookup calls from being accounted to
+ the stat call).
+2. To maintain the correctness, it does lookup-behind - lookup is winded to
+ underlying translators after it is unwound to upper translators.
+ lookup-behind is necessary as inode gets populated in server inode table
+ only in lookup-cbk and also because various translators store their
+ contexts in inode contexts during lookup calls.
+
+fops to be implemented:
+=======================
+* lookup
+ 1. check the dentry cache stored in context of fds opened by the same process
+ on parent inode for basename. If found unwind with cached stat, else wind
+ the lookup call to underlying translators.
+ 2. stat is stored in the context of inode if the path being looked upon
+ happens to be directory. This stat will be used to fill postparent stat
+ when lookup happens on any of the directory contents.
+
+* readdir
+ 1. cache the direntries returned in readdir_cbk in the context of fd.
+ 2. if the readdir is happening on non-expected offsets (means a seekdir/rewinddir
+ has happened), cache has to be flushed.
+ 3. delete the entry corresponding to basename of path on which fd is opened
+ from cache stored in parent.
+
+* chmod/fchmod
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_mode and st_ctime of
+ stat.
+
+* chown/fchown
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_uid/st_gid and
+ st_ctime of stat.
+
+* truncate/ftruncate
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_size/st_mtime of stat.
+
+* utimens
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since this call changes st_atime/st_mtime of stat.
+
+* readlink
+ delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since this call changes st_atime of stat.
+
+* unlink
+ 1. delete the entry corresponding to basename from cache stored in context of
+ fds, opened on parent directory containing the file being unlinked.
+ 2. delete the entry corresponding to basename of parent directory from cache
+ of grand-parent.
+
+* rmdir
+ 1. delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode.
+ 2. remove the entire cache from all fds opened on inode corresponding to
+ directory being removed.
+ 3. delete the entry correspondig to basename of parent from cache stored in
+ grand-parent.
+
+* readv
+ delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since readv changes st_atime of file.
+
+* writev
+ delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since writev can possibly change st_size and definitely
+ changes st_mtime of file.
+
+* fsync
+ there is a confusion here as to whether fsync updates mtime/ctimes. Disk based
+ filesystems (atleast ext2) just writes the times stored in inode to disk
+ during fsync and not the time at which fsync is being done. But in glusterfs,
+ a translator like write-behind actually sends writes during fsync which will
+ change mtime/ctime. Hence stat-prefetch implements fsync to delete the entry
+ corresponding to basename from cache stored in context of fds opened on parent
+ inode.
+
+* rename
+ 1. remove entry corresponding to oldname from cache stored in fd contexts of
+ oldparent.
+ 2. remove entry corresponding to newname from cache stored in fd contexts of
+ newparent.
+ 3. remove entry corresponding to oldparent from cache stored in
+ old-grand-parent, since removing oldname changes st_mtime and st_ctime
+ of oldparent stat.
+ 4. remove entry corresponding to newparent from cache stored in
+ new-grand-parent, since adding newname changes st_mtime and st_ctime
+ of newparent stat.
+ 5. if oldname happens to be a directory, remove entire cache from all fds
+ opened on it.
+
+* create/mknod/mkdir/symlink/link
+ delete entry corresponding to basename of parent directory in which these
+ operations are happening, from cache stored in context of fds opened on
+ grand-parent, since adding a new entry to a directory changes st_mtime
+ and st_ctime of parent directory.
+
+* setxattr/removexattr
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since setxattr changes st_ctime of file.
+
+* setdents
+ 1. remove entry corresponding to basename of path on which fd is opened from
+ cache stored in context of fds opened on parent.
+ 2. for each of the entry in the direntry list, delete from cache stored in
+ context of fd, the entry corresponding to basename of path being passed.
+
+* getdents
+ 1. remove entry corresponding to basename of path on which fd is opened from
+ cache stored in parent, since getdents changes st_atime.
+ 2. remove entries corresponding to symbolic links from cache, since readlink
+ would've changed st_atime.
+
+* checksum
+ delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since st_atime is changed during this call.
+
+* xattrop/fxattrop
+ delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since these calls modify st_ctime of file.
+
+callbacks to be implemented:
+============================
+* releasedir
+ free the context stored in fd.
+
+* forget
+ dree the stat if the inode corresponds to a directory.
+
+limitations:
+============
+* since a readdir does not return extended attributes of file, if need_xattr is
+ set, short-cutting of lookup does not happen and lookup is passed to
+ underlying translators.
+
+* posix_readdir does not check whether the dentries are spanning across multiple
+ mount points. Hence it is not transforming inode numbers in stat buffers if
+ posix is configured to allow export directory spanning on multiple mountpoints.
+ This is a bug which needs to be fixed. posix_readdir should treat dentries the
+ same way as if lookup is happening on dentries.