Diffstat (limited to 'doc/hacker-guide')
-rw-r--r--  doc/hacker-guide/Makefile.am          8
-rw-r--r--  doc/hacker-guide/adding-fops.txt     33
-rw-r--r--  doc/hacker-guide/bdb.txt             70
-rw-r--r--  doc/hacker-guide/call-stub.txt     1033
-rw-r--r--  doc/hacker-guide/hacker-guide.tex   312
-rw-r--r--  doc/hacker-guide/posix.txt           59
-rw-r--r--  doc/hacker-guide/replicate.txt      206
-rw-r--r--  doc/hacker-guide/write-behind.txt    45
8 files changed, 1766 insertions, 0 deletions
diff --git a/doc/hacker-guide/Makefile.am b/doc/hacker-guide/Makefile.am
new file mode 100644
index 000000000..65c92ac23
--- /dev/null
+++ b/doc/hacker-guide/Makefile.am
@@ -0,0 +1,8 @@
+EXTRA_DIST = replicate.txt bdb.txt posix.txt call-stub.txt write-behind.txt
+
+#EXTRA_DIST = hacker-guide.tex afr.txt bdb.txt posix.txt call-stub.txt write-behind.txt
+#hacker_guidedir = $(docdir)
+#hacker_guide_DATA = hacker-guide.pdf
+
+#hacker-guide.pdf: $(EXTRA_DIST)
+# pdflatex $(srcdir)/hacker-guide.tex
diff --git a/doc/hacker-guide/adding-fops.txt b/doc/hacker-guide/adding-fops.txt
new file mode 100644
index 000000000..293de2637
--- /dev/null
+++ b/doc/hacker-guide/adding-fops.txt
@@ -0,0 +1,33 @@
+ HOW TO ADD A NEW FOP TO GlusterFS
+ =================================
+
+Steps to be followed when adding a new FOP to GlusterFS:
+
+1. Edit glusterfs.h and add a GF_FOP_* constant.
+
+2. Edit xlator.[ch] and:
+ 2a. add the new prototype for fop and callback.
+ 2b. edit xlator_fops structure.
+
+3. Edit xlator.c and add to fill_defaults.
+
+4. Edit protocol.h and add struct necessary for the new FOP.
+
+5. Edit defaults.[ch] and provide default implementation.
+
+6. Edit call-stub.[ch] and provide stub implementation.
+
+7. Edit common-utils.c and add to gf_global_variable_init().
+
+8. Edit client-protocol and add your FOP.
+
+9. Edit server-protocol and add your FOP.
+
+10. Implement your FOP in any translator for which the default implementation
+ is not sufficient.
+
+==========================================
+Last updated: Mon Oct 27 21:35:49 IST 2008
+
+Author: Vikas Gorur <vikas@zresearch.com>
+==========================================
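The fill_defaults step (step 3 above) can be pictured with a small stand-alone sketch. Everything in it is hypothetical: struct sketch_fops, sketch_fill_defaults() and the int-based signatures are invented for illustration and are not the real xlator.c code. It only shows the idea that a fop slot left empty by a translator receives a default implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified fop signature; real fops take frames, locs, fds. */
typedef int (*sketch_fop_fn) (int arg);

/* Hypothetical stand-in for a table of fops, with one slot for a new fop. */
struct sketch_fops {
        sketch_fop_fn stat;
        sketch_fop_fn newfop;
};

/* Default implementations simply pass the argument through unchanged. */
static int
sketch_default_stat (int arg)
{
        return arg;
}

static int
sketch_default_newfop (int arg)
{
        return arg;
}

/* A translator-specific implementation of the new fop. */
static int
sketch_custom_newfop (int arg)
{
        return arg * 2;
}

/* Fill every slot the translator left NULL with the default. */
static void
sketch_fill_defaults (struct sketch_fops *fops)
{
        assert (fops != NULL);

        if (fops->stat == NULL)
                fops->stat = sketch_default_stat;
        if (fops->newfop == NULL)
                fops->newfop = sketch_default_newfop;
}
```

A translator that sets only newfop would still end up with a usable stat entry once sketch_fill_defaults() has run, which is why step 10 is needed only where the default is not sufficient.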
diff --git a/doc/hacker-guide/bdb.txt b/doc/hacker-guide/bdb.txt
new file mode 100644
index 000000000..fd0bd3652
--- /dev/null
+++ b/doc/hacker-guide/bdb.txt
@@ -0,0 +1,70 @@
+
+* How does a file translate to a key/value pair?
+------------------------------------------------
+
+ in bdb, a file is identified by a key (obtained by taking basename() of the path of
+the file), and the file contents are stored as the value corresponding to that key in a
+database file (by default glusterfs_storage.db, under the dirname() directory of the path).
+
+* symlinks, directories
+-----------------------
+
+ symlinks and directories are stored as is.
+
+* db (database) files
+---------------------
+
+ every directory, including root directory, contains a database file called
+glusterfs_storage.db. all the regular files contained in the directory are stored
+as key/value pair inside the glusterfs_storage.db.
+
+* internal data cache
+---------------------
+
+ db does not provide a way to find out the size of the value corresponding to a key.
+so, bdb makes DB->get() call for key and takes the length of the value returned.
+since DB->get() also returns file contents for key, bdb maintains an internal cache and
+stores the file contents in the cache.
+ every directory maintains a separate cache.
+
+* inode number transformation
+-----------------------------
+
+ bdb allocates an inode number to each file and directory on its own. bdb maintains a
+global counter and increments it after allocating inode number for each file
+(regular, symlink or directory). NOTE: bdb does not guarantee persistent inode numbers.
+
+* checkpoint thread
+-------------------
+
+ bdb creates a checkpoint thread at the time of init(). checkpoint thread does a
+periodic checkpoint on the DB_ENV. checkpoint is the mechanism, provided by db, to
+forcefully commit the logged transactions to the storage.
+
+NOTES ABOUT FOPS:
+-----------------
+
+lookup() -
+ 1> do lstat() on the path, if lstat fails, we assume that the file being looked up
+ is either a regular file or doesn't exist.
+ 2> lookup in the DB of parent directory for key corresponding to path. if key exists,
+ return key, with.
+ NOTE: 'struct stat' stat()ed from DB file is used as a container for 'struct stat'
+ of the regular file. st_ino, st_size, st_blocks are updated with file's values.
+
+readv() -
+ 1> do a lookup in bctx cache. if successful, return the requested data from cache.
+ 2> on a cache miss, do a DB->get() for the entire file content and insert it into the cache.
+
+writev() -
+ 1> flush any cached content of this file.
+ 2> do a DB->put(), with DB_DBT_PARTIAL flag.
+ NOTE: DB_DBT_PARTIAL is used to do partial update of a value in DB.
+
+readdir() -
+ 1> regular readdir() in a loop, omitting all DB_ENV log files and DB files that
+ we encounter.
+ 2> if the readdir() buffer still has space, open a DB cursor and do a sequential
+ DBC->get() to fill the readdir buffer.
+
+
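The path-to-key mapping described at the top of bdb.txt can be sketched as follows. This is an illustration only: sketch_path_to_db_key() and SKETCH_DB_NAME are hypothetical names, and the real bdb translator may build these strings differently. The sketch assumes POSIX dirname() and basename() from <libgen.h>, which may modify their argument, so it works on copies.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Default database file name, as stated in bdb.txt. */
#define SKETCH_DB_NAME "glusterfs_storage.db"

/*
 * Hypothetical helper: derive the database file path and the key for @path.
 * Returns 0 on success, -1 if @path or a result does not fit its buffer.
 */
static int
sketch_path_to_db_key (const char *path, char *db_file, size_t db_len,
                       char *key, size_t key_len)
{
        char dir_copy[4096];
        char base_copy[4096];
        int  n = 0;

        assert (path != NULL);

        if (strlen (path) >= sizeof (dir_copy))
                return -1;

        /* dirname() and basename() may modify their input: use copies. */
        strcpy (dir_copy, path);
        strcpy (base_copy, path);

        /* The database file lives in the parent directory of the file. */
        n = snprintf (db_file, db_len, "%s/%s",
                      dirname (dir_copy), SKETCH_DB_NAME);
        if (n < 0 || (size_t) n >= db_len)
                return -1;

        /* The key is the last path component. */
        n = snprintf (key, key_len, "%s", basename (base_copy));
        if (n < 0 || (size_t) n >= key_len)
                return -1;

        return 0;
}
```

For a path such as /export/dir/file.txt this yields the database file /export/dir/glusterfs_storage.db and the key file.txt; the file contents would then be the value stored under that key.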
diff --git a/doc/hacker-guide/call-stub.txt b/doc/hacker-guide/call-stub.txt
new file mode 100644
index 000000000..bca1579b2
--- /dev/null
+++ b/doc/hacker-guide/call-stub.txt
@@ -0,0 +1,1033 @@
+creating a call stub and pausing a call
+---------------------------------------
+libglusterfs provides a separate API to pause each fop. the parameters to each API are:
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+ NOTE: @fn should take exactly the same types and number of parameters as
+ the corresponding regular fop.
+the rest are the regular parameters to the corresponding fop.
+
+NOTE: @frame can never be NULL. fop_<operation>_stub() fails with errno
+ set to EINVAL, if @frame is NULL. also wherever @loc is applicable,
+ @loc cannot be NULL.
+
+refer to individual stub creation API to know about call-stub creation's behaviour with
+specific parameters.
+
+here is the list of stub creation APIs for xlator fops.
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@need_xattr - flag to specify if xattr should be returned or not.
+call_stub_t *
+fop_lookup_stub (call_frame_t *frame,
+ fop_lookup_t fn,
+ loc_t *loc,
+ int32_t need_xattr);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+call_stub_t *
+fop_stat_stub (call_frame_t *frame,
+ fop_stat_t fn,
+ loc_t *loc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to fstat fop.
+ NOTE: @fd is stored with a fd_ref().
+call_stub_t *
+fop_fstat_stub (call_frame_t *frame,
+ fop_fstat_t fn,
+ fd_t *fd);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
+ @loc->parent, if not NULL. also @loc->path will be copied to a different location.
+@mode - mode parameter to chmod.
+call_stub_t *
+fop_chmod_stub (call_frame_t *frame,
+ fop_chmod_t fn,
+ loc_t *loc,
+ mode_t mode);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to fchmod fop.
+ NOTE: @fd is stored with a fd_ref().
+@mode - mode parameter for fchmod fop.
+call_stub_t *
+fop_fchmod_stub (call_frame_t *frame,
+ fop_fchmod_t fn,
+ fd_t *fd,
+ mode_t mode);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
+ @loc->parent, if not NULL. also @loc->path will be copied to a different location.
+@uid - uid parameter to chown.
+@gid - gid parameter to chown.
+call_stub_t *
+fop_chown_stub (call_frame_t *frame,
+ fop_chown_t fn,
+ loc_t *loc,
+ uid_t uid,
+ gid_t gid);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to fchown fop.
+ NOTE: @fd is stored with a fd_ref().
+@uid - uid parameter to fchown.
+@gid - gid parameter to fchown.
+call_stub_t *
+fop_fchown_stub (call_frame_t *frame,
+ fop_fchown_t fn,
+ fd_t *fd,
+ uid_t uid,
+ gid_t gid);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location, if not NULL.
+@off - offset parameter to truncate fop.
+call_stub_t *
+fop_truncate_stub (call_frame_t *frame,
+ fop_truncate_t fn,
+ loc_t *loc,
+ off_t off);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to ftruncate fop.
+ NOTE: @fd is stored with a fd_ref().
+@off - offset parameter to ftruncate fop.
+call_stub_t *
+fop_ftruncate_stub (call_frame_t *frame,
+ fop_ftruncate_t fn,
+ fd_t *fd,
+ off_t off);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@tv - tv parameter to utimens fop.
+call_stub_t *
+fop_utimens_stub (call_frame_t *frame,
+ fop_utimens_t fn,
+ loc_t *loc,
+ struct timespec tv[2]);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@mask - mask parameter for access fop.
+call_stub_t *
+fop_access_stub (call_frame_t *frame,
+ fop_access_t fn,
+ loc_t *loc,
+ int32_t mask);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@size - size parameter to readlink fop.
+call_stub_t *
+fop_readlink_stub (call_frame_t *frame,
+ fop_readlink_t fn,
+ loc_t *loc,
+ size_t size);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@mode - mode parameter to mknod fop.
+@rdev - rdev parameter to mknod fop.
+call_stub_t *
+fop_mknod_stub (call_frame_t *frame,
+ fop_mknod_t fn,
+ loc_t *loc,
+ mode_t mode,
+ dev_t rdev);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@mode - mode parameter to mkdir fop.
+call_stub_t *
+fop_mkdir_stub (call_frame_t *frame,
+ fop_mkdir_t fn,
+ loc_t *loc,
+ mode_t mode);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+call_stub_t *
+fop_unlink_stub (call_frame_t *frame,
+ fop_unlink_t fn,
+ loc_t *loc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+call_stub_t *
+fop_rmdir_stub (call_frame_t *frame,
+ fop_rmdir_t fn,
+ loc_t *loc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@linkname - linkname parameter to symlink fop.
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+call_stub_t *
+fop_symlink_stub (call_frame_t *frame,
+ fop_symlink_t fn,
+ const char *linkname,
+ loc_t *loc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@oldloc - pointer to location structure.
+ NOTE: @oldloc will be copied to a different location, with inode_ref() to
+ @oldloc->inode and @oldloc->parent, if not NULL. also @oldloc->path will
+ be copied to a different location, if not NULL.
+@newloc - pointer to location structure.
+ NOTE: @newloc will be copied to a different location, with inode_ref() to
+ @newloc->inode and @newloc->parent, if not NULL. also @newloc->path will
+ be copied to a different location, if not NULL.
+call_stub_t *
+fop_rename_stub (call_frame_t *frame,
+ fop_rename_t fn,
+ loc_t *oldloc,
+ loc_t *newloc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@oldloc - pointer to location structure.
+ NOTE: @oldloc will be copied to a different location, with inode_ref() to
+ @oldloc->inode and @oldloc->parent, if not NULL. also @oldloc->path will be
+ copied to a different location.
+@newpath - newpath parameter to link fop.
+call_stub_t *
+fop_link_stub (call_frame_t *frame,
+ fop_link_t fn,
+ loc_t *oldloc,
+ const char *newpath);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@flags - flags parameter to create fop.
+@mode - mode parameter to create fop.
+@fd - file descriptor parameter to create fop.
+ NOTE: @fd is stored with a fd_ref().
+call_stub_t *
+fop_create_stub (call_frame_t *frame,
+ fop_create_t fn,
+ loc_t *loc,
+ int32_t flags,
+ mode_t mode, fd_t *fd);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@flags - flags parameter to open fop.
+@fd - file descriptor parameter to open fop.
+call_stub_t *
+fop_open_stub (call_frame_t *frame,
+ fop_open_t fn,
+ loc_t *loc,
+ int32_t flags,
+ fd_t *fd);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to readv fop.
+ NOTE: @fd is stored with a fd_ref().
+@size - size parameter to readv fop.
+@off - offset parameter to readv fop.
+call_stub_t *
+fop_readv_stub (call_frame_t *frame,
+ fop_readv_t fn,
+ fd_t *fd,
+ size_t size,
+ off_t off);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to writev fop.
+ NOTE: @fd is stored with a fd_ref().
+@vector - vector parameter to writev fop.
+ NOTE: @vector is iov_dup()ed while creating the stub, and the
+ frame->root->req_refs dictionary is dict_ref()ed.
+@count - count parameter to writev fop.
+@off - off parameter to writev fop.
+call_stub_t *
+fop_writev_stub (call_frame_t *frame,
+ fop_writev_t fn,
+ fd_t *fd,
+ struct iovec *vector,
+ int32_t count,
+ off_t off);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to flush fop.
+ NOTE: @fd is stored with a fd_ref().
+call_stub_t *
+fop_flush_stub (call_frame_t *frame,
+ fop_flush_t fn,
+ fd_t *fd);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to fsync fop.
+ NOTE: @fd is stored with a fd_ref().
+@datasync - datasync parameter to fsync fop.
+call_stub_t *
+fop_fsync_stub (call_frame_t *frame,
+ fop_fsync_t fn,
+ fd_t *fd,
+ int32_t datasync);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to @loc->inode and
+ @loc->parent, if not NULL. also @loc->path will be copied to a different location.
+@fd - file descriptor parameter to opendir fop.
+ NOTE: @fd is stored with a fd_ref().
+call_stub_t *
+fop_opendir_stub (call_frame_t *frame,
+ fop_opendir_t fn,
+ loc_t *loc,
+ fd_t *fd);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to getdents fop.
+ NOTE: @fd is stored with a fd_ref().
+@size - size parameter to getdents fop.
+@off - off parameter to getdents fop.
+@flag - flag parameter to getdents fop.
+call_stub_t *
+fop_getdents_stub (call_frame_t *frame,
+ fop_getdents_t fn,
+ fd_t *fd,
+ size_t size,
+ off_t off,
+ int32_t flag);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to setdents fop.
+ NOTE: @fd is stored with a fd_ref().
+@flags - flags parameter to setdents fop.
+@entries - entries parameter to setdents fop.
+@count - count parameter to setdents fop.
+call_stub_t *
+fop_setdents_stub (call_frame_t *frame,
+ fop_setdents_t fn,
+ fd_t *fd,
+ int32_t flags,
+ dir_entry_t *entries,
+ int32_t count);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to fsyncdir fop.
+ NOTE: @fd is stored with a fd_ref().
+@datasync - datasync parameter to fsyncdir fop.
+call_stub_t *
+fop_fsyncdir_stub (call_frame_t *frame,
+ fop_fsyncdir_t fn,
+ fd_t *fd,
+ int32_t datasync);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+call_stub_t *
+fop_statfs_stub (call_frame_t *frame,
+ fop_statfs_t fn,
+ loc_t *loc);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@dict - dict parameter to setxattr fop.
+ NOTE: stub creation procedure stores @dict pointer with dict_ref() to it.
+@flags - flags parameter to setxattr fop.
+call_stub_t *
+fop_setxattr_stub (call_frame_t *frame,
+ fop_setxattr_t fn,
+ loc_t *loc,
+ dict_t *dict,
+ int32_t flags);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@name - name parameter to getxattr fop.
+call_stub_t *
+fop_getxattr_stub (call_frame_t *frame,
+ fop_getxattr_t fn,
+ loc_t *loc,
+ const char *name);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@name - name parameter to removexattr fop.
+ NOTE: name string will be copied to a different location while creating stub.
+call_stub_t *
+fop_removexattr_stub (call_frame_t *frame,
+ fop_removexattr_t fn,
+ loc_t *loc,
+ const char *name);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to lk fop.
+ NOTE: @fd is stored with a fd_ref().
+@cmd - command parameter to lk fop.
+@lock - lock parameter to lk fop.
+ NOTE: lock will be copied to a different location while creating stub.
+call_stub_t *
+fop_lk_stub (call_frame_t *frame,
+ fop_lk_t fn,
+ fd_t *fd,
+ int32_t cmd,
+ struct flock *lock);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - fd parameter to gf_lk fop.
+ NOTE: @fd is fd_ref()ed while creating stub, if not NULL.
+@cmd - cmd parameter to gf_lk fop.
+@lock - lock parameter to gf_lk fop.
+ NOTE: @lock is copied to a different memory location while creating
+ stub.
+call_stub_t *
+fop_gf_lk_stub (call_frame_t *frame,
+ fop_gf_lk_t fn,
+ fd_t *fd,
+ int32_t cmd,
+ struct flock *lock);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@fd - file descriptor parameter to readdir fop.
+ NOTE: @fd is stored with a fd_ref().
+@size - size parameter to readdir fop.
+@off - offset parameter to readdir fop.
+call_stub_t *
+fop_readdir_stub (call_frame_t *frame,
+ fop_readdir_t fn,
+ fd_t *fd,
+ size_t size,
+ off_t off);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@loc - pointer to location structure.
+ NOTE: @loc will be copied to a different location, with inode_ref() to
+ @loc->inode and @loc->parent, if not NULL. also @loc->path will be
+ copied to a different location.
+@flags - flags parameter to checksum fop.
+call_stub_t *
+fop_checksum_stub (call_frame_t *frame,
+ fop_checksum_t fn,
+ loc_t *loc,
+ int32_t flags);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+@dict - dict parameter to @fn.
+ NOTE: @dict pointer is stored with dict_ref().
+call_stub_t *
+fop_lookup_cbk_stub (call_frame_t *frame,
+ fop_lookup_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ inode_t *inode,
+ struct stat *buf,
+ dict_t *dict);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_stat_cbk_stub (call_frame_t *frame,
+ fop_stat_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_fstat_cbk_stub (call_frame_t *frame,
+ fop_fstat_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_chmod_cbk_stub (call_frame_t *frame,
+ fop_chmod_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_fchmod_cbk_stub (call_frame_t *frame,
+ fop_fchmod_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_chown_cbk_stub (call_frame_t *frame,
+ fop_chown_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_fchown_cbk_stub (call_frame_t *frame,
+ fop_fchown_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_truncate_cbk_stub (call_frame_t *frame,
+ fop_truncate_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_ftruncate_cbk_stub (call_frame_t *frame,
+ fop_ftruncate_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_utimens_cbk_stub (call_frame_t *frame,
+ fop_utimens_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_access_cbk_stub (call_frame_t *frame,
+ fop_access_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@path - path parameter to @fn.
+ NOTE: @path is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_readlink_cbk_stub (call_frame_t *frame,
+ fop_readlink_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ const char *path);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_mknod_cbk_stub (call_frame_t *frame,
+ fop_mknod_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ inode_t *inode,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_mkdir_cbk_stub (call_frame_t *frame,
+ fop_mkdir_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ inode_t *inode,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_unlink_cbk_stub (call_frame_t *frame,
+ fop_unlink_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_rmdir_cbk_stub (call_frame_t *frame,
+ fop_rmdir_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_symlink_cbk_stub (call_frame_t *frame,
+ fop_symlink_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ inode_t *inode,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_rename_cbk_stub (call_frame_t *frame,
+ fop_rename_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_link_cbk_stub (call_frame_t *frame,
+ fop_link_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ inode_t *inode,
+ struct stat *buf);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@fd - fd parameter to @fn.
+ NOTE: @fd pointer is stored with a fd_ref().
+@inode - inode parameter to @fn.
+ NOTE: @inode pointer is stored with an inode_ref().
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_create_cbk_stub (call_frame_t *frame,
+ fop_create_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ fd_t *fd,
+ inode_t *inode,
+ struct stat *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@fd - fd parameter to @fn.
+ NOTE: @fd pointer is stored with a fd_ref().
+call_stub_t *
+fop_open_cbk_stub (call_frame_t *frame,
+ fop_open_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ fd_t *fd);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@vector - vector parameter to @fn.
+ NOTE: @vector is copied to a different memory location, if not NULL. also
+ frame->root->rsp_refs is dict_ref()ed.
+@count - count parameter to @fn.
+@stbuf - stbuf parameter to @fn.
+ NOTE: @stbuf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_readv_cbk_stub (call_frame_t *frame,
+ fop_readv_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct iovec *vector,
+ int32_t count,
+ struct stat *stbuf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@stbuf - stbuf parameter to @fn.
+ NOTE: @stbuf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_writev_cbk_stub (call_frame_t *frame,
+ fop_writev_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct stat *stbuf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_flush_cbk_stub (call_frame_t *frame,
+ fop_flush_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_fsync_cbk_stub (call_frame_t *frame,
+ fop_fsync_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@fd - fd parameter to @fn.
+ NOTE: @fd pointer is stored with a fd_ref().
+call_stub_t *
+fop_opendir_cbk_stub (call_frame_t *frame,
+ fop_opendir_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ fd_t *fd);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@entries - entries parameter to @fn.
+@count - count parameter to @fn.
+call_stub_t *
+fop_getdents_cbk_stub (call_frame_t *frame,
+ fop_getdents_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ dir_entry_t *entries,
+ int32_t count);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_setdents_cbk_stub (call_frame_t *frame,
+ fop_setdents_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_fsyncdir_cbk_stub (call_frame_t *frame,
+ fop_fsyncdir_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@buf - buf parameter to @fn.
+ NOTE: @buf is copied to a different memory location, if not NULL.
+call_stub_t *
+fop_statfs_cbk_stub (call_frame_t *frame,
+ fop_statfs_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct statvfs *buf);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_setxattr_cbk_stub (call_frame_t *frame,
+ fop_setxattr_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@value - value dictionary parameter to @fn.
+ NOTE: @value pointer is stored with a dict_ref().
+call_stub_t *
+fop_getxattr_cbk_stub (call_frame_t *frame,
+ fop_getxattr_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ dict_t *value);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+call_stub_t *
+fop_removexattr_cbk_stub (call_frame_t *frame,
+ fop_removexattr_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@lock - lock parameter to @fn.
+ NOTE: @lock is copied to a different memory location while creating
+ stub.
+call_stub_t *
+fop_lk_cbk_stub (call_frame_t *frame,
+ fop_lk_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct flock *lock);
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@lock - lock parameter to @fn.
+ NOTE: @lock is copied to a different memory location while creating
+ stub.
+call_stub_t *
+fop_gf_lk_cbk_stub (call_frame_t *frame,
+ fop_gf_lk_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct flock *lock);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@entries - entries parameter to @fn.
+call_stub_t *
+fop_readdir_cbk_stub (call_frame_t *frame,
+ fop_readdir_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ gf_dirent_t *entries);
+
+
+@frame - call frame which has to be used to resume the call at call_resume().
+@fn - procedure to call during call_resume().
+@op_ret - op_ret parameter to @fn.
+@op_errno - op_errno parameter to @fn.
+@file_checksum - file_checksum parameter to @fn.
+ NOTE: file_checksum will be copied to a different memory location
+ while creating stub.
+@dir_checksum - dir_checksum parameter to @fn.
+ NOTE: dir_checksum will be copied to a different memory location
+ while creating stub.
+call_stub_t *
+fop_checksum_cbk_stub (call_frame_t *frame,
+ fop_checksum_cbk_t fn,
+ int32_t op_ret,
+ int32_t op_errno,
+ uint8_t *file_checksum,
+ uint8_t *dir_checksum);
+
+resuming a call:
+---------------
+ a paused call is resumed by passing its call stub to the call_resume() API.
+
+ void call_resume (call_stub_t *stub);
+
+ stub - call stub created during pausing a call.
+
+ NOTE: call_resume() will decrease reference count of any fd_t, dict_t and inode_t that it finds
+ in stub->args.<operation>.<fd_t-or-inode_t-or-dict_t>. so, if any fd_t, dict_t or
+ inode_t pointers are assigned at stub->args.<operation>.<fd_t-or-inode_t-or-dict_t> after
+ fop_<operation>_stub() call, they must be <fd_t-or-inode_t-or-dict_t>_ref()ed.
+
+ call_resume does not STACK_DESTROY() for any fop.
+
+ if stub->fn is NULL, call_resume does STACK_WIND() or STACK_UNWIND() using the stub->frame.
+
+ return - call_resume() returns no value. it fails only if stub is NULL, in which case errno is set to EINVAL.
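+
+ the reference counting rule above can be illustrated with a toy model. this is only a
+ sketch: toy_fd_t, toy_stub_t, toy_open_cbk_stub() and toy_call_resume() are hypothetical
+ stand-ins, not the real call-stub types or functions, and the sketch assumes that resuming
+ also frees the stub. it shows only that pausing takes a reference and resuming drops it.
+
```c
#include <assert.h>
#include <stdlib.h>

/* toy stand-ins; none of these are the real GlusterFS types */
typedef struct toy_fd {
        int refcount;
} toy_fd_t;

typedef void (*toy_open_cbk_t) (int op_ret, int op_errno, toy_fd_t *fd);

typedef struct toy_stub {
        toy_open_cbk_t  fn;
        int             op_ret;
        int             op_errno;
        toy_fd_t       *fd;
} toy_stub_t;

int toy_last_op_ret = -1;   /* written by toy_record_cbk() */

/* example callback: records op_ret so a caller can see it was invoked */
void
toy_record_cbk (int op_ret, int op_errno, toy_fd_t *fd)
{
        (void) op_errno;
        assert (fd == NULL || fd->refcount > 0);   /* fd is still referenced */
        toy_last_op_ret = op_ret;
}

/* pausing: remember the arguments and take a reference on fd */
toy_stub_t *
toy_open_cbk_stub (toy_open_cbk_t fn, int op_ret, int op_errno, toy_fd_t *fd)
{
        toy_stub_t *stub = calloc (1, sizeof (*stub));

        if (stub == NULL)
                return NULL;
        stub->fn       = fn;
        stub->op_ret   = op_ret;
        stub->op_errno = op_errno;
        stub->fd       = fd;
        if (fd != NULL)
                fd->refcount++;
        return stub;
}

/* resuming: call fn, then drop the reference taken while pausing.
 * freeing the stub here is an assumption of this toy model. */
void
toy_call_resume (toy_stub_t *stub)
{
        if (stub == NULL)
                return;
        if (stub->fn != NULL)
                stub->fn (stub->op_ret, stub->op_errno, stub->fd);
        if (stub->fd != NULL)
                stub->fd->refcount--;
        free (stub);
}
```
+
+ in this model, a pointer stored into the stub after toy_open_cbk_stub() returns would be
+ unreferenced once too often at resume time unless the caller takes its own reference first,
+ which is the situation the NOTE above warns about.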
diff --git a/doc/hacker-guide/hacker-guide.tex b/doc/hacker-guide/hacker-guide.tex
new file mode 100644
index 000000000..72c44df1a
--- /dev/null
+++ b/doc/hacker-guide/hacker-guide.tex
@@ -0,0 +1,312 @@
+\documentclass[12pt]{book}
+\usepackage{graphicx}
+% \usepackage{fancyhdr}
+
+% \pagestyle{fancy}
+\begin{document}
+
+% \headheight 117pt
+% \rhead{\includegraphics{zr-logo.eps}}
+
+\author{Z Research}
+\title{GlusterFS 1.3 Hacker's Guide}
+\date{June 1, 2007}
+
+\maketitle
+\frontmatter
+\tableofcontents
+
+\mainmatter
+\chapter{Introduction}
+
+\section{Coding guidelines}
+GlusterFS uses GNU Arch for version control. To get the latest source do:
+\begin{verbatim}
+ $ tla register-archive http://arch.sv.gnu.org/archives/gluster
+ $ tla -A gluster@sv.gnu.org get glusterfs--mainline--2.4
+\end{verbatim}
+\noindent
+GlusterFS follows the GNU coding
+standards\footnote{http://www.gnu.org/prep/standards\_toc.html} for the
+most part.
+
+\chapter{Major components}
+\section{libglusterfs}
+\texttt{libglusterfs} contains supporting code used by all the other components.
+The important files here are:
+
+\texttt{dict.c}: This is an implementation of a serializable dictionary type. It is
+used by the protocol code to send requests and replies. It is also used to pass options
+to translators.
+
+\texttt{logging.c}: This is a thread-safe logging library. The log messages go to a
+file (default \texttt{/usr/local/var/log/glusterfs/*}).
+
+\texttt{protocol.c}: This file implements the GlusterFS on-the-wire
+protocol. The protocol itself is a simple ASCII protocol, designed to
+be easy to parse and be human readable.
+
+A sample GlusterFS protocol block looks like this:
+\begin{verbatim}
+ Block Start header
+ 0000000000000023 callid
+ 00000001 type
+ 00000016 op
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx human-readable name
+ 00000000000000000000000000000ac3 block size
+ <...> block
+ Block End
+\end{verbatim}
+
+\texttt{stack.h}: This file defines the \texttt{STACK\_WIND} and
+\texttt{STACK\_UNWIND} macros which are used to implement the parallel
+stack that is maintained for inter-xlator calls. See the \textsl{Taking control
+of the stack} section below for more details.
+
+\texttt{spec.y}: This contains the Yacc grammar for the GlusterFS
+specification file, and the parsing code.
+
+
+Draw diagrams of trees
+Two rules:
+(1) directory structure is same
+(2) file can exist only on one node
+
+\section{glusterfs-fuse}
+\section{glusterfsd}
+\section{transport}
+\section{scheduler}
+\section{xlator}
+
+\chapter{xlators}
+\section{Taking control of the stack}
+One can think of STACK\_WIND/UNWIND as a very specific RPC mechanism.
+
+% \includegraphics{stack.eps}
+
+\section{Overview of xlators}
+
+\flushleft{\LARGE\texttt{cluster/}}
+\vskip 2ex
+\flushleft{\Large\texttt{afr}}
+\vskip 2ex
+\flushleft{\Large\texttt{stripe}}
+\vskip 2ex
+\flushleft{\Large\texttt{unify}}
+
+\vskip 4ex
+\flushleft{\LARGE\texttt{debug/}}
+\vskip 2ex
+\flushleft{\Large\texttt{trace}}
+\vskip 2ex
+The trace xlator simply logs all fops and mops, and passes them through to its child.
+
+\vskip 4ex
+\flushleft{\LARGE\texttt{features/}}
+\flushleft{\Large\texttt{posix-locks}}
+\vskip 2ex
+This xlator implements \textsc{posix} record locking semantics over
+any kind of storage.
+
+\vskip 4ex
+\flushleft{\LARGE\texttt{performance/}}
+
+\flushleft{\Large\texttt{io-threads}}
+\vskip 2ex
+\flushleft{\Large\texttt{read-ahead}}
+\vskip 2ex
+\flushleft{\Large\texttt{stat-prefetch}}
+\vskip 2ex
+\flushleft{\Large\texttt{write-behind}}
+\vskip 2ex
+
+\vskip 4ex
+\flushleft{\LARGE\texttt{protocol/}}
+\vskip 2ex
+
+\flushleft{\Large\texttt{client}}
+\vskip 2ex
+
+\flushleft{\Large\texttt{server}}
+\vskip 2ex
+
+\vskip 4ex
+\flushleft{\LARGE\texttt{storage/}}
+\flushleft{\Large\texttt{posix}}
+\vskip 2ex
+The \texttt{posix} xlator is the one which actually makes calls to the
+on-disk filesystem. Currently this is the only storage xlator available. However,
+plans to develop other storage xlators, such as one for Amazon's S3 service, are
+on the roadmap.
+
+\chapter{Writing a simple xlator}
+\noindent
+In this chapter we're going to write a rot13 xlator. ``Rot13'' is a
+simple substitution cipher which obscures a text by replacing each
+letter with the letter thirteen places down the alphabet. So `a' (0)
+would become `n' (13), `b' (1) would become `o' (14), and so on. Rot13
+applied to a piece of ciphertext yields the plaintext again, because
+rot13 is its own inverse, since:
+
+\[
+x_c \equiv x + 13 \pmod{26}
+\]
+\[
+x_c + 13 \equiv x + 13 + 13 \equiv x \pmod{26}
+\]
+
+First we include the requisite headers.
+
+\begin{verbatim}
+#include <ctype.h>
+#include <sys/uio.h>
+
+#include "glusterfs.h"
+#include "xlator.h"
+#include "logging.h"
+
+/*
+ * This is a rot13 ``encryption'' xlator. It rot13's data when
+ * writing to disk and rot13's it back when reading it.
+ * This xlator is meant as an example, not for production
+ * use ;) (hence no error-checking)
+ */
+
+\end{verbatim}
+
+Then we write the rot13 function itself. For simplicity, we only transform lower case
+letters. Any other byte is passed through as it is.
+
+\begin{verbatim}
+/* We only handle lower case letters for simplicity */
+static void
+rot13 (char *buf, int len)
+{
+        int i;
+        for (i = 0; i < len; i++) {
+                /* rotate within 'a'..'z'; every other byte is left untouched */
+                if (buf[i] >= 'a' && buf[i] <= 'z')
+                        buf[i] = 'a' + (buf[i] - 'a' + 13) % 26;
+        }
+}
+\end{verbatim}
+
+Next comes a utility function that applies \texttt{rot13} to every buffer in an
+\texttt{iovec} array. Both the read and the write paths below use it.
+
+\begin{verbatim}
+static void
+rot13_iovec (struct iovec *vector, int count)
+{
+ int i;
+ for (i = 0; i < count; i++) {
+ rot13 (vector[i].iov_base, vector[i].iov_len);
+ }
+}
+\end{verbatim}
+
+\begin{verbatim}
+static int32_t
+rot13_readv_cbk (call_frame_t *frame,
+ call_frame_t *prev_frame,
+ xlator_t *this,
+ int32_t op_ret,
+ int32_t op_errno,
+ struct iovec *vector,
+ int32_t count)
+{
+ rot13_iovec (vector, count);
+
+ STACK_UNWIND (frame, op_ret, op_errno, vector, count);
+ return 0;
+}
+
+static int32_t
+rot13_readv (call_frame_t *frame,
+ xlator_t *this,
+ dict_t *ctx,
+ size_t size,
+ off_t offset)
+{
+ STACK_WIND (frame,
+ rot13_readv_cbk,
+ FIRST_CHILD (this),
+ FIRST_CHILD (this)->fops->readv,
+ ctx, size, offset);
+ return 0;
+}
+
+static int32_t
+rot13_writev_cbk (call_frame_t *frame,
+ call_frame_t *prev_frame,
+ xlator_t *this,
+ int32_t op_ret,
+ int32_t op_errno)
+{
+ STACK_UNWIND (frame, op_ret, op_errno);
+ return 0;
+}
+
+static int32_t
+rot13_writev (call_frame_t *frame,
+ xlator_t *this,
+ dict_t *ctx,
+ struct iovec *vector,
+ int32_t count,
+ off_t offset)
+{
+ rot13_iovec (vector, count);
+
+ STACK_WIND (frame,
+ rot13_writev_cbk,
+ FIRST_CHILD (this),
+ FIRST_CHILD (this)->fops->writev,
+ ctx, vector, count, offset);
+ return 0;
+}
+
+\end{verbatim}
+
+Every xlator must define two functions and two external symbols. The functions are
+\texttt{init} and \texttt{fini}, and the symbols are \texttt{fops} and \texttt{mops}.
+The \texttt{init} function is called when the xlator is loaded by GlusterFS, and
+contains code for the xlator to initialize itself. Note that if an xlator is present
+multiple times in the spec tree, the \texttt{init} function will be called each time
+the xlator is loaded.
+
+\begin{verbatim}
+int32_t
+init (xlator_t *this)
+{
+ if (!this->children || this->children->next) {
+ gf_log ("rot13", GF_LOG_ERROR,
+ "FATAL: rot13 should have exactly one child");
+ return -1;
+ }
+
+ gf_log ("rot13", GF_LOG_DEBUG, "rot13 xlator loaded");
+ return 0;
+}
+\end{verbatim}
+
+\begin{verbatim}
+
+void
+fini (xlator_t *this)
+{
+ return;
+}
+
+struct xlator_fops fops = {
+ .readv = rot13_readv,
+ .writev = rot13_writev
+};
+
+struct xlator_mops mops = {
+};
+
+\end{verbatim}
+
+\end{document}
+
diff --git a/doc/hacker-guide/posix.txt b/doc/hacker-guide/posix.txt
new file mode 100644
index 000000000..d0132abfe
--- /dev/null
+++ b/doc/hacker-guide/posix.txt
@@ -0,0 +1,59 @@
+---------------
+* storage/posix
+---------------
+
+- SET_FS_ID
+
+ This is so that all filesystem checks are done with the user's
+ uid/gid and not GlusterFS's uid/gid.
+
+- MAKE_REAL_PATH
+
+ This macro concatenates the base directory of the posix volume
+ ('option directory') with the given path.
+
+- need_xattr in lookup
+
+ If this flag is passed, lookup returns an xattr dictionary that contains
+ the file's create time, the file's contents, and the version number
+ of the file.
+
+ This is a hack to increase small file performance. If an application
+ wants to read a small file, it can finish its job with just a lookup
+ call instead of a lookup followed by read.
+
+- getdents/setdents
+
+ These are used by unify to set and get directory entries.
+
+- ALIGN_BUF
+
+ Macro to align an address to a page boundary (4K).
+
+- priv->export_statfs
+
+ In some cases, two exported volumes may reside on the same
+ partition on the server. Sending statvfs info for both
+ the volumes will lead to erroneous df output at the client,
+ since free space on the partition will be counted twice.
+
+ In such cases, user can disable exporting statvfs info
+ on one of the volumes by setting this option.
+
+- xattrop
+
+ This fop is used by replicate to set version numbers on files.
+
+- getxattr/setxattr hack to read/write files
+
+ A key, GLUSTERFS_FILE_CONTENT_STRING, is handled in a special way by
+ getxattr/setxattr. A getxattr with the key will return the entire
+ content of the file as the value. A setxattr with the key will write
+ the value as the entire content of the file.
+
+- posix_checksum
+
+ This calculates a simple XOR checksum on all entry names in a
+ directory that is used by unify to compare directory contents.
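+
+ The idea behind an XOR checksum over entry names can be sketched as below.
+ This is not the posix translator's code: xor_names_checksum() and the
+ CHECKSUM_LEN buffer size are assumptions made for illustration. Because XOR
+ is commutative, the result does not depend on the order in which entries
+ are read, which is what makes it usable for comparing directory contents.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHECKSUM_LEN 256   /* assumed buffer size, for illustration only */

/* Hypothetical helper: XOR byte j of every name into checksum[j]. */
void
xor_names_checksum (const char **names, size_t count,
                    uint8_t checksum[CHECKSUM_LEN])
{
        size_t i, j;

        assert (names != NULL || count == 0);
        memset (checksum, 0, CHECKSUM_LEN);

        for (i = 0; i < count; i++) {
                size_t len = strlen (names[i]);

                /* wrap around so that very long names stay inside the buffer */
                for (j = 0; j < len; j++)
                        checksum[j % CHECKSUM_LEN] ^= (uint8_t) names[i][j];
        }
}
```
+
+ Two directories with the same set of names yield the same buffer. Differing
+ buffers prove the name sets differ, but equal buffers do not prove they are
+ identical, since XOR can cancel out.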
+
+
diff --git a/doc/hacker-guide/replicate.txt b/doc/hacker-guide/replicate.txt
new file mode 100644
index 000000000..284f373fb
--- /dev/null
+++ b/doc/hacker-guide/replicate.txt
@@ -0,0 +1,206 @@
+-------------------
+* cluster/replicate
+-------------------
+
+Before understanding replicate, one must understand two internal FOPs:
+
+GF_FILE_LK:
+ This is exactly like fcntl(2) locking, except the locks are in a
+ separate domain from locks held by applications.
+
+GF_DIR_LK (loc_t *loc, char *basename):
+ This allows one to lock a name under a directory. For example,
+ to lock /mnt/glusterfs/foo, one would use the call:
+
+ GF_DIR_LK ({loc_t for "/mnt/glusterfs"}, "foo")
+
+ If one wishes to lock *all* the names under a particular directory,
+ supply the basename argument as NULL.
+
+ The locks can either be read locks or write locks; consult the
+ function prototype for more details.
+
+Both these operations are implemented by the features/locks (earlier
+known as posix-locks) translator.
+
+--------------
+* Basic design
+--------------
+
+All FOPs can be classified into four major groups, plus a catch-all group
+for everything else:
+
+ - inode-read
+ Operations that read an inode's data (file contents) or metadata (perms, etc.).
+
+ access, getxattr, fstat, readlink, readv, stat.
+
+ - inode-write
+ Operations that modify an inode's data or metadata.
+
+ chmod, chown, truncate, writev, utimens.
+
+ - dir-read
+ Operations that read a directory's contents or metadata.
+
+ readdir, getdents, checksum.
+
+ - dir-write
+ Operations that modify a directory's contents or metadata.
+
+ create, link, mkdir, mknod, rename, rmdir, symlink, unlink.
+
+ Some of these make a subgroup in that they modify *two* different entries:
+ link, rename, symlink.
+
+ - Others
+ Other operations.
+
+ flush, lookup, open, opendir, statfs.
+
+------------
+* Algorithms
+------------
+
+Each of the four major groups has its own algorithm:
+
+ ----------------------
+ - inode-read, dir-read
+ ----------------------
+
+ = Send a request to the first child that is up:
+ - if it fails:
+ try the next available child
+ - if we have exhausted all children:
+ return failure
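+
+ The read algorithm above can be sketched as a simple loop. This is only an
+ illustration: child_t, its fields and read_with_failover() are hypothetical
+ names, and the real translator winds the call asynchronously rather than
+ calling a function in a loop.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a child: `up` says whether it is reachable,
 * `read_op` performs the read and returns 0 on success, -1 on failure. */
typedef struct child {
        int   up;
        int (*read_op) (struct child *self);
} child_t;

/* Returns the index of the child that served the request,
 * or -1 once every child has been tried without success. */
int
read_with_failover (child_t *children, size_t count)
{
        size_t i;

        assert (children != NULL || count == 0);

        for (i = 0; i < count; i++) {
                if (!children[i].up)
                        continue;               /* skip children that are down */
                if (children[i].read_op (&children[i]) == 0)
                        return (int) i;         /* first success wins */
        }
        return -1;                              /* exhausted all children */
}

/* Two trivial operations used to exercise the loop. */
int read_ok (child_t *self)   { (void) self; return 0; }
int read_fail (child_t *self) { (void) self; return -1; }
```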
+
+ -------------
+ - inode-write
+ -------------
+
+ All operations are done in parallel unless specified otherwise.
+
+ (1) Send a GF_FILE_LK request on all children for a write lock on
+ the appropriate region
+ (for metadata operations: entire file (0, 0)
+ for writev: (offset, offset+size of buffer))
+
+ - If a lock request fails on a child:
+ unlock all children
+ try to acquire a blocking lock (F_SETLKW) on each child, serially.
+
+ If this fails (due to ENOTCONN or EINVAL):
+ Consider this child as dead for rest of transaction.
+
+ (2) Mark all children as "pending" on all (alive) children
+ (see below for meaning of "pending").
+
+ - If it fails on any child:
+ mark it as dead (in transaction local state).
+
+ (3) Perform operation on all (alive) children.
+
+ - If it fails on any child:
+ mark it as dead (in transaction local state).
+
+ (4) Clear the "pending" mark for all successful children on all nodes.
+
+ (5) Unlock region on all (alive) children.
+
+ -----------
+ - dir-write
+ -----------
+
+ The algorithm for dir-write is same as above except instead of holding
+ GF_FILE_LK locks we hold a GF_DIR_LK lock on the name being operated upon.
+ In case of link-type calls, we hold locks on both the operand names.
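+
+ The five steps of the write transaction can be modelled as below. This is a
+ sequential toy, not the replicate implementation: wchild_t, its fields and
+ replicate_write() are hypothetical, the steps run serially instead of in
+ parallel, and the blocking-lock fallback of step (1) is left out. Each child
+ keeps one "pending" counter per child, so a survivor ends up recording that
+ a failed child still needs the write.
+
```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4   /* arbitrary limit for this sketch */

typedef struct wchild {
        int alive;                   /* cleared when a step fails on this child */
        int locked;
        int fail_op;                 /* test knob: make step (3) fail here */
        int data;                    /* stands in for the file contents */
        int pending[MAX_CHILDREN];   /* pending[j]: child j still owes a write */
} wchild_t;

/* Returns 0 if the write reached at least one child, -1 otherwise. */
int
replicate_write (wchild_t *c, size_t n, int value)
{
        size_t i, j;
        int    succeeded = 0;

        assert (n <= MAX_CHILDREN);

        for (i = 0; i < n; i++)                 /* (1) lock */
                if (c[i].alive)
                        c[i].locked = 1;

        for (i = 0; i < n; i++)                 /* (2) mark all as pending */
                if (c[i].alive)
                        for (j = 0; j < n; j++)
                                c[i].pending[j]++;

        for (i = 0; i < n; i++) {               /* (3) perform the operation */
                if (!c[i].alive)
                        continue;
                if (c[i].fail_op) {
                        c[i].alive = 0;         /* dead for rest of transaction */
                } else {
                        c[i].data = value;
                        succeeded++;
                }
        }

        for (i = 0; i < n; i++)                 /* (4) clear pending for successes */
                if (c[i].alive)
                        for (j = 0; j < n; j++)
                                if (c[j].alive)
                                        c[i].pending[j]--;

        for (i = 0; i < n; i++)                 /* (5) unlock; the sketch simply */
                c[i].locked = 0;                /* releases every lock it took   */

        return succeeded > 0 ? 0 : -1;
}
```
+
+ After a run in which one child fails, the surviving children hold a non-zero
+ pending count for it, which is the information self heal later relies on.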
+
+-----------
+* "pending"
+-----------
+
+ The "pending" number is like a journal entry. A pending entry is an
+ array of 32-bit integers stored in network byte-order as the extended
+ attribute of an inode (which can be a directory as well).
+
+ There are three keys corresponding to three types of pending operations:
+
+ - AFR_METADATA_PENDING
+ There are some metadata operations pending on this inode (perms, ctime/mtime,
+ xattr, etc.).
+
+ - AFR_DATA_PENDING
+ There is some data pending on this inode (writev).
+
+ - AFR_ENTRY_PENDING
+ There are some directory operations pending on this directory
+ (create, unlink, etc.).
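+
+ Because the counters are stored in network byte order, they have to be
+ converted before and after any arithmetic. A minimal sketch, assuming one
+ 32-bit slot per child (pending_add() and pending_get() are hypothetical
+ helpers, and reading or writing the extended attribute itself is not shown):
+
```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add delta (which may be negative) to the counter of one child.
 * The array holds big-endian values, as it would inside the xattr. */
void
pending_add (uint32_t *pending, size_t child_index, int32_t delta)
{
        uint32_t host;

        assert (pending != NULL);
        host = ntohl (pending[child_index]);    /* network -> host order */
        host += (uint32_t) delta;               /* unsigned wrap-around is defined */
        pending[child_index] = htonl (host);    /* host -> network order */
}

/* Read the counter of one child in host byte order. */
uint32_t
pending_get (const uint32_t *pending, size_t child_index)
{
        assert (pending != NULL);
        return ntohl (pending[child_index]);
}
```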
+
+-----------
+* Self heal
+-----------
+
+ - On lookup, gather extended attribute data:
+ - If entry is a regular file:
+ - If an entry is present on one child and not on others:
+ - create entry on others.
+ - If entries exist but have different metadata (perms, etc.):
+ - consider the entry with the highest AFR_METADATA_PENDING number as
+ definitive and replicate its attributes on children.
+
+ - If entry is a directory:
+ - Consider the entry with the highest AFR_ENTRY_PENDING number as
+ definitive and replicate its contents on all children.
+
+ - If any two entries have non-matching types (i.e., one is file and
+ other is directory):
+ - Announce to the user via log that a split-brain situation has been
+ detected, and do nothing.
+
+ - On open, gather extended attribute data:
+ - Consider the file with the highest AFR_DATA_PENDING number as
+ the definitive one and replicate its contents on all other
+ children.
+
+ During all self heal operations, appropriate locks must be held on all
+ regions/entries being affected.
+
+---------------
+* Inode scaling
+---------------
+
+Inode scaling is necessary because if a situation arises where:
+ - An inode number is returned for a directory (by lookup) which was
+ previously the inode number of a file (as per FUSE's table), then
+ FUSE gets horribly confused (consult a FUSE expert for more details).
+
+To avoid such a situation, we distribute the 64-bit inode space equally
+among all children of replicate.
+
+To illustrate:
+
+If c1, c2, c3 are children of replicate, they each get 1/3 of the available
+inode space:
+
+Child: c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 ...
+Inode number: 1 2 3 4 5 6 7 8 9 10 11 ...
+
+Thus, if lookup on c1 returns an inode number "2", it is scaled to "4"
+(which is the second inode number in c1's space).
+
+This way we ensure that there is never a collision of inode numbers from
+two different children.
+
+This reduction of inode space doesn't really reduce the usability of
+replicate since even if we assume replicate has 1024 children (which would be a
+highly unusual scenario), each child still has a 54-bit inode space.
+
+2^54 ~ 1.8 * 10^16
+
+which is much larger than any real world requirement.
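+
+The interleaving shown in the table can be written as a small formula. The
+sketch below is an illustration only: the function names are hypothetical,
+child indices are assumed to be 0-based (c1 is 0, c2 is 1, and so on), and
+inode numbers are assumed to start at 1.
+
```c
#include <assert.h>
#include <stdint.h>

/* Map an inode number returned by one child into that child's share
 * of the inode space. */
uint64_t
scale_inode (uint64_t ino, unsigned child_index, unsigned child_count)
{
        assert (ino >= 1 && child_index < child_count);
        return (ino - 1) * child_count + child_index + 1;
}

/* Recover the child's own inode number from a scaled one. */
uint64_t
unscale_inode (uint64_t scaled, unsigned child_count)
{
        assert (scaled >= 1 && child_count > 0);
        return (scaled - 1) / child_count + 1;
}

/* Recover the 0-based index of the child that owns a scaled inode number. */
unsigned
child_of_inode (uint64_t scaled, unsigned child_count)
{
        assert (scaled >= 1 && child_count > 0);
        return (unsigned) ((scaled - 1) % child_count);
}
```
+
+With three children, inode 2 from c1 (index 0) maps to (2 - 1) * 3 + 0 + 1 = 4,
+matching the example above.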
+
+
+==============================================
+$ Last updated: Sun Oct 12 23:17:01 IST 2008 $
+$ Author: Vikas Gorur <vikas@zresearch.com> $
+==============================================
+
diff --git a/doc/hacker-guide/write-behind.txt b/doc/hacker-guide/write-behind.txt
new file mode 100644
index 000000000..498e95480
--- /dev/null
+++ b/doc/hacker-guide/write-behind.txt
@@ -0,0 +1,45 @@
+basic working
+--------------
+
+ write-behind is a translator that tells the application its write requests have finished before they actually have.
+
+ on a regular translator tree without write-behind, control flow is like this:
+
+ 1. application makes a write() system call.
+ 2. VFS ==> FUSE ==> /dev/fuse.
+ 3. fuse-bridge initiates a glusterfs writev() call.
+ 4. writev() is STACK_WIND()ed up to client-protocol or storage translator.
+ 5. client-protocol, on receiving reply from server, starts STACK_UNWIND() towards the fuse-bridge.
+
+ on a translator tree with write-behind, control flow is like this:
+
+ 1. application makes a write() system call.
+ 2. VFS ==> FUSE ==> /dev/fuse.
+ 3. fuse-bridge initiates a glusterfs writev() call.
+ 4. writev() is STACK_WIND()ed up to write-behind translator.
+ 5. write-behind adds the write buffer to its internal queue and does a STACK_UNWIND() towards the fuse-bridge.
+
+ the write call is now complete from the application's perspective. after STACK_UNWIND()ing towards the fuse-bridge, write-behind initiates a fresh writev() call to its child translator, whose replies are consumed by write-behind itself. write-behind _doesn't_ cache the write buffer, unless 'option flush-behind on' is specified in the volume specification file.
+
+windowing
+---------
+
+ with respect to write-behind, each write-buffer has three flags: 'stack_wound', 'write_behind' and 'got_reply'.
+
+ stack_wound: if set, indicates that write-behind has initiated STACK_WIND() towards child translator.
+
+ write_behind: if set, indicates that write-behind has done STACK_UNWIND() towards fuse-bridge.
+
+ got_reply: if set, indicates that write-behind has received reply from child translator for a writev() STACK_WIND(). a request will be destroyed by write-behind only if this flag is set.
+
+ currently pending write requests = aggregate size of requests with write_behind = 1 and got_reply = 0.
+
+ window size limits the aggregate size of currently pending write requests. once the pending requests' size has reached the window size, write-behind blocks writev() calls from fuse-bridge.
+ blocking is only from the application's perspective. write-behind does STACK_WIND() to the child translator straight away, but holds back the STACK_UNWIND() towards fuse-bridge. STACK_UNWIND() is done only once write-behind gets enough replies to accommodate the currently blocked request.
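+
+ the window accounting described above can be sketched as follows. wb_request_t and both
+ functions are hypothetical names, not the translator's own, and the exact comparison against
+ the window size (pending plus new request must fit) is an assumption of this sketch.
+
```c
#include <assert.h>
#include <stddef.h>

/* hypothetical per-request record carrying the three flags described above */
typedef struct wb_request {
        size_t size;           /* bytes in this write buffer */
        int    stack_wound;    /* STACK_WIND() to child has been issued */
        int    write_behind;   /* STACK_UNWIND() to fuse-bridge has been done */
        int    got_reply;      /* child has replied to the writev() */
} wb_request_t;

/* aggregate size of requests with write_behind = 1 and got_reply = 0 */
size_t
wb_pending_size (const wb_request_t *reqs, size_t count)
{
        size_t i, total = 0;

        assert (reqs != NULL || count == 0);
        for (i = 0; i < count; i++)
                if (reqs[i].write_behind && !reqs[i].got_reply)
                        total += reqs[i].size;
        return total;
}

/* non-zero if a new request of new_size bytes still fits in the window,
 * i.e. it may be unwound to the application right away */
int
wb_can_unwind (const wb_request_t *reqs, size_t count,
               size_t new_size, size_t window_size)
{
        return wb_pending_size (reqs, count) + new_size <= window_size;
}
```
+
+ in this model a request that does not fit is still wound to the child; only its unwind
+ waits until enough replies arrive for wb_can_unwind() to become true.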
+
+flush behind
+------------
+
+ if 'option flush-behind on' is specified in the volume specification file, then write-behind sends aggregated write requests to the child translator, instead of regular per-request STACK_WIND()s.
+
+