summaryrefslogtreecommitdiffstats
path: root/doc/legacy/replicate.lyx
diff options
context:
space:
mode:
Diffstat (limited to 'doc/legacy/replicate.lyx')
-rw-r--r--doc/legacy/replicate.lyx797
1 files changed, 0 insertions, 797 deletions
diff --git a/doc/legacy/replicate.lyx b/doc/legacy/replicate.lyx
deleted file mode 100644
index e3d081191e0..00000000000
--- a/doc/legacy/replicate.lyx
+++ /dev/null
@@ -1,797 +0,0 @@
-#LyX 1.4.2 created this file. For more info see http://www.lyx.org/
-\lyxformat 245
-\begin_document
-\begin_header
-\textclass article
-\language english
-\inputencoding auto
-\fontscheme default
-\graphics default
-\paperfontsize default
-\spacing single
-\papersize default
-\use_geometry false
-\use_amsmath 1
-\cite_engine basic
-\use_bibtopic false
-\paperorientation portrait
-\secnumdepth 3
-\tocdepth 3
-\paragraph_separation skip
-\defskip medskip
-\quotes_language english
-\papercolumns 1
-\papersides 1
-\paperpagestyle default
-\tracking_changes false
-\output_changes false
-\end_header
-
-\begin_body
-
-\begin_layout Title
-
-\size larger
-Automatic File Replication (replicate) in GlusterFS
-\end_layout
-
-\begin_layout Author
-Vikas Gorur
-\family typewriter
-\size larger
-<vikas@gluster.com>
-\end_layout
-
-\begin_layout Standard
-\begin_inset ERT
-status open
-
-\begin_layout Standard
-
-
-\backslash
-hrule
-\end_layout
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Section*
-Overview
-\end_layout
-
-\begin_layout Standard
-This document describes the design and usage of the replicate translator in GlusterFS.
- This document is valid for the 1.4.x releases, and not earlier ones.
-\end_layout
-
-\begin_layout Standard
-The replicate translator of GlusterFS aims to keep identical copies of a file
- on all its subvolumes, as far as possible.
- It tries to do this by performing all filesystem mutation operations (writing
- data, creating files, changing ownership, etc.) on all its subvolumes in
- such a way that if an operation succeeds on atleast one subvolume, all
- other subvolumes can later be brought up to date.
-\end_layout
-
-\begin_layout Standard
-In the rest of the document the terms
-\begin_inset Quotes eld
-\end_inset
-
-subvolume
-\begin_inset Quotes erd
-\end_inset
-
- and
-\begin_inset Quotes eld
-\end_inset
-
-server
-\begin_inset Quotes erd
-\end_inset
-
- are used interchangeably, trusting that it will cause no confusion to the
- reader.
-\end_layout
-
-\begin_layout Section*
-Usage
-\end_layout
-
-\begin_layout Standard
-A sample volume declaration for replicate looks like this:
-\end_layout
-
-\begin_layout Standard
-\begin_inset ERT
-status open
-
-\begin_layout Standard
-
-
-\backslash
-begin{verbatim}
-\end_layout
-
-\begin_layout Standard
-
-volume replicate
-\end_layout
-
-\begin_layout Standard
-
- type cluster/replicate
-\end_layout
-
-\begin_layout Standard
-
- # options, see below for description
-\end_layout
-
-\begin_layout Standard
-
- subvolumes brick1 brick2
-\end_layout
-
-\begin_layout Standard
-
-end-volume
-\end_layout
-
-\begin_layout Standard
-
-
-\backslash
-end{verbatim}
-\end_layout
-
-\begin_layout Standard
-
-\end_layout
-
-\begin_layout Standard
-
-\end_layout
-
-\begin_layout Standard
-
-\end_layout
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Standard
-This defines an replicate volume with two subvolumes, brick1, and brick2.
- For replicate to work properly, it is essential that its subvolumes support
-\series bold
-extended attributes
-\series default
-.
- This means that you should choose a backend filesystem that supports extended
- attributes, like XFS, ReiserFS, or Ext3.
-\end_layout
-
-\begin_layout Standard
-The storage volumes used as backend for replicate
-\emph on
-must
-\emph default
- have a posix-locks volume loaded above them.
-\end_layout
-
-\begin_layout Standard
-\begin_inset ERT
-status open
-
-\begin_layout Standard
-
-
-\backslash
-begin{verbatim}
-\end_layout
-
-\begin_layout Standard
-
-volume brick1
-\end_layout
-
-\begin_layout Standard
-
- type features/posix-locks
-\end_layout
-
-\begin_layout Standard
-
- subvolumes brick1-ds
-\end_layout
-
-\begin_layout Standard
-
-end-volume
-\end_layout
-
-\begin_layout Standard
-
-
-\backslash
-end{verbatim}
-\end_layout
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Section*
-Design
-\end_layout
-
-\begin_layout Subsection*
-Read algorithm
-\end_layout
-
-\begin_layout Standard
-All operations that do not modify the file or directory are sent to all
- the subvolumes and the first successful reply is returned to the application.
-\end_layout
-
-\begin_layout Standard
-The read() system call (reading data from a file) is an exception.
- For read() calls, replicate tries to do load balancing by sending all reads from
- a particular file to a particular server.
-\end_layout
-
-\begin_layout Standard
-The read algorithm is also affected by the option read-subvolume; see below
- for details.
-\end_layout
-
-\begin_layout Subsection*
-Classes of file operations
-\end_layout
-
-\begin_layout Standard
-replicate divides all filesystem write operations into three classes:
-\end_layout
-
-\begin_layout Itemize
-
-\series bold
-data:
-\series default
-Operations that modify the contents of a file (write, truncate).
-\end_layout
-
-\begin_layout Itemize
-
-\series bold
-metadata:
-\series default
-Operations that modify attributes of a file or directory (permissions, ownership
-, etc.).
-\end_layout
-
-\begin_layout Itemize
-
-\series bold
-entry:
-\series default
-Operations that create or delete directory entries (mkdir, create, rename,
- rmdir, unlink, etc.).
-\end_layout
-
-\begin_layout Subsection*
-Locking and Change Log
-\end_layout
-
-\begin_layout Standard
-To ensure consistency across subvolumes, replicate holds a lock whenever a modificatio
-n is being made to a file or directory.
- By default, replicate considers the first subvolume as the sole lock server.
- However, the number of lock servers can be increased up to the total number
- of subvolumes.
-\end_layout
-
-\begin_layout Standard
-The change log is a set of extended attributes associated with files and
- directories that replicate maintains.
- The change log keeps track of the changes made to files and directories
- (data, metadata, entry) so that the self-heal algorithm knows which copy
- of a file or directory is the most recent one.
-\end_layout
-
-\begin_layout Subsection*
-Write algorithm
-\end_layout
-
-\begin_layout Standard
-The algorithm for all write operations (data, metadata, entry) is:
-\end_layout
-
-\begin_layout Enumerate
-Lock the file (or directory) on all of the lock servers (see options below).
-\end_layout
-
-\begin_layout Enumerate
-Write change log entries on all servers.
-\end_layout
-
-\begin_layout Enumerate
-Perform the operation.
-\end_layout
-
-\begin_layout Enumerate
-Erase change log entries.
-\end_layout
-
-\begin_layout Enumerate
-Unlock the file (or directory) on all of the lock servers.
-\end_layout
-
-\begin_layout Standard
-The above algorithm is a simplified version intended for general users.
- Please refer to the source code for the full details.
-\end_layout
-
-\begin_layout Subsection*
-Self-Heal
-\end_layout
-
-\begin_layout Standard
-replicate automatically tries to fix any inconsistencies it detects among different
- copies of a file.
- It uses information in the change log to determine which copy is the
-\begin_inset Quotes eld
-\end_inset
-
-correct
-\begin_inset Quotes erd
-\end_inset
-
- version.
-\end_layout
-
-\begin_layout Standard
-Self-heal is triggered when a file or directory is first
-\begin_inset Quotes eld
-\end_inset
-
-accessed
-\begin_inset Quotes erd
-\end_inset
-
-, that is, the first time any operation is attempted on it.
- The self-heal algorithm does the following things:
-\end_layout
-
-\begin_layout Standard
-If the entry being accessed is a directory:
-\end_layout
-
-\begin_layout Itemize
-The contents of the
-\begin_inset Quotes eld
-\end_inset
-
-correct
-\begin_inset Quotes erd
-\end_inset
-
- version is replicated on all subvolumes, by deleting entries and creating
- entries as necessary.
-\end_layout
-
-\begin_layout Standard
-If the entry being accessed is a file:
-\end_layout
-
-\begin_layout Itemize
-If the file does not exist on some subvolumes, it is created.
-\end_layout
-
-\begin_layout Itemize
-If there is a mismatch in the size of the file, or ownership, or permission,
- it is fixed.
-\end_layout
-
-\begin_layout Itemize
-If the change log indicates that some copies need updating, they are updated.
-\end_layout
-
-\begin_layout Subsection*
-Split-brain
-\end_layout
-
-\begin_layout Standard
-It may happen that one replicate client can access only some of the servers in
- a cluster and another replicate client can access the remaining servers.
- Or it may happen that in a cluster of two servers, one server goes down
- and comes back up, but the other goes down immediately.
- Both these scenarios result in a
-\begin_inset Quotes eld
-\end_inset
-
-split-brain
-\begin_inset Quotes erd
-\end_inset
-
-.
-\end_layout
-
-\begin_layout Standard
-In a split-brain situation, there will be two or more copies of a file,
- all of which are
-\begin_inset Quotes eld
-\end_inset
-
-correct
-\begin_inset Quotes erd
-\end_inset
-
- in some sense.
- replicate without manual intervention has no way of knowing what to do, since
- it cannot consider any single copy as definitive, nor does it know of any
- meaningful way to merge the copies.
-\end_layout
-
-\begin_layout Standard
-If replicate detects that a split-brain has happened on a file, it disallows opening
- of that file.
- You will have to manually resolve the conflict by deleting all but one
- copy of the file.
- Alternatively you can set an automatic split-brain resolution policy by
- using the `favorite-child' option (see below).
-\end_layout
-
-\begin_layout Section*
-Translator Options
-\end_layout
-
-\begin_layout Standard
-replicate accepts the following options:
-\end_layout
-
-\begin_layout Subsection*
-read-subvolume (default: none)
-\end_layout
-
-\begin_layout Standard
-The value of this option must be the name of a subvolume.
- If given, all read operations are sent to only the specified subvolume,
- instead of being balanced across all subvolumes.
-\end_layout
-
-\begin_layout Subsection*
-favorite-child (default: none)
-\end_layout
-
-\begin_layout Standard
-The value of this option must be the name of a subvolume.
- If given, the specified subvolume will be preferentially used in resolving
- conflicts (
-\begin_inset Quotes eld
-\end_inset
-
-split-brain
-\begin_inset Quotes erd
-\end_inset
-
-).
- This means if a discrepancy is noticed in the attributes or content of
- a file, the copy on the `favorite-child' will be considered the definitive
- version and its contents will
-\emph on
-overwrite
-\emph default
-the contents of all other copies.
- Use this option with caution! It is possible to
-\emph on
-lose data
-\emph default
- with this option.
- If you are in doubt, do not specify this option.
-\end_layout
-
-\begin_layout Subsection*
-Self-heal options
-\end_layout
-
-\begin_layout Standard
-Setting any of these options to
-\begin_inset Quotes eld
-\end_inset
-
-off
-\begin_inset Quotes erd
-\end_inset
-
- prevents that kind of self-heal from being done on a file or directory.
- For example, if metadata self-heal is turned off, permissions and ownership
- are no longer fixed automatically.
-\end_layout
-
-\begin_layout Subsubsection*
-data-self-heal (default: on)
-\end_layout
-
-\begin_layout Standard
-Enable/disable self-healing of file contents.
-\end_layout
-
-\begin_layout Subsubsection*
-metadata-self-heal (default: off)
-\end_layout
-
-\begin_layout Standard
-Enable/disable self-healing of metadata (permissions, ownership, modification
- times).
-\end_layout
-
-\begin_layout Subsubsection*
-entry-self-heal (default: on)
-\end_layout
-
-\begin_layout Standard
-Enable/disable self-healing of directory entries.
-\end_layout
-
-\begin_layout Subsection*
-Change Log options
-\end_layout
-
-\begin_layout Standard
-If any of these options is turned off, it disables writing of change log
- entries for that class of file operations.
- That is, steps 2 and 4 of the write algorithm (see above) are not done.
- Note that if the change log is not written, the self-heal algorithm cannot
- determine the
-\begin_inset Quotes eld
-\end_inset
-
-correct
-\begin_inset Quotes erd
-\end_inset
-
- version of a file and hence self-heal will only be able to fix
-\begin_inset Quotes eld
-\end_inset
-
-obviously
-\begin_inset Quotes erd
-\end_inset
-
- wrong things (such as a file existing on only one node).
-\end_layout
-
-\begin_layout Subsubsection*
-data-change-log (default: on)
-\end_layout
-
-\begin_layout Standard
-Enable/disable writing of change log for data operations.
-\end_layout
-
-\begin_layout Subsubsection*
-metadata-change-log (default: on)
-\end_layout
-
-\begin_layout Standard
-Enable/disable writing of change log for metadata operations.
-\end_layout
-
-\begin_layout Subsubsection*
-entry-change-log (default: on)
-\end_layout
-
-\begin_layout Standard
-Enable/disable writing of change log for entry operations.
-\end_layout
-
-\begin_layout Subsection*
-Locking options
-\end_layout
-
-\begin_layout Standard
-These options let you specify the number of lock servers to use for each
- class of file operations.
- The default values are satisfactory in most cases.
- If you are extra paranoid, you may want to increase the values.
- However, be very cautious if you set the data- or entry- lock server counts
- to zero, since this can result in
-\emph on
-lost data.
-
-\emph default
- For example, if you set the data-lock-server-count to zero, and two application
-s write to the same region of a file, there is a possibility that none of
- your servers will have all the data.
- In other words, the copies will be
-\emph on
-inconsistent
-\emph default
-, and
-\emph on
-incomplete
-\emph default
-.
- Do not set data- and entry- lock server counts to zero unless you absolutely
- know what you are doing and agree to not hold GlusterFS responsible for
- any lost data.
-\end_layout
-
-\begin_layout Subsubsection*
-data-lock-server-count (default: 1)
-\end_layout
-
-\begin_layout Standard
-Number of lock servers to use for data operations.
-\end_layout
-
-\begin_layout Subsubsection*
-metadata-lock-server-count (default: 0)
-\end_layout
-
-\begin_layout Standard
-Number of lock servers to use for metadata operations.
-\end_layout
-
-\begin_layout Subsubsection*
-entry-lock-server-count (default: 1)
-\end_layout
-
-\begin_layout Standard
-Number of lock servers to use for entry operations.
-\end_layout
-
-\begin_layout Section*
-Known Issues
-\end_layout
-
-\begin_layout Subsection*
-Self-heal of file with more than one link (hard links):
-\end_layout
-
-\begin_layout Standard
-Consider two servers, A and B.
- Assume A is down, and the user creates a file `new' as a hard link to a
- file `old'.
- When A comes back up, replicate will see that the file `new' does not exist on
- A, and self-heal will create the file and copy the contents from B.
- However, now on server A the file `new' is not a link to the file `old'
- but an entirely different file.
-\end_layout
-
-\begin_layout Standard
-We know of no easy way to fix this problem, but we will try to fix it in
- forthcoming releases.
-\end_layout
-
-\begin_layout Subsection*
-File re-opening after a server comes back up:
-\end_layout
-
-\begin_layout Standard
-If a server A goes down and comes back up, any files which were opened while
- A was down and are still open will not have their writes replicated on
- A.
- In other words, data replication only happens on those servers which were
- alive when the file was opened.
-\end_layout
-
-\begin_layout Standard
-This is a rather tricky issue but we hope to fix it very soon.
-\end_layout
-
-\begin_layout Section*
-Frequently Asked Questions
-\end_layout
-
-\begin_layout Subsection*
-1.
- How can I force self-heal to happen?
-\end_layout
-
-\begin_layout Standard
-You can force self-heal to happen on your cluster by running a script or
- a command that accesses every file.
- A simple way to do it would be:
-\end_layout
-
-\begin_layout Standard
-\begin_inset ERT
-status open
-
-\begin_layout Standard
-
-\end_layout
-
-\begin_layout Standard
-
-
-\backslash
-begin{verbatim}
-\end_layout
-
-\begin_layout Standard
-
-$ ls -lR
-\end_layout
-
-\begin_layout Standard
-
-
-\backslash
-end{verbatim}
-\end_layout
-
-\begin_layout Standard
-
-\end_layout
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Standard
-Run the command in all directories which you want to forcibly self-heal.
-\end_layout
-
-\begin_layout Subsection*
-2.
- Which backend filesystem should I use for replicate?
-\end_layout
-
-\begin_layout Standard
-You can use any backend filesystem that supports extended attributes.
- We know of users successfully using XFR, ReiserFS, and Ext3.
-\end_layout
-
-\begin_layout Subsection*
-3.
- What can I do to improve replicate performance?
-\end_layout
-
-\begin_layout Standard
-Try loading performance translators such as io-threads, write-behind, io-cache,
- and read-ahead depending on your workload.
- If you are willing to sacrifice correctness in corner cases, you can experiment
- with the lock-server-count and the change-log options (see above).
- As warned earlier, be very careful!
-\end_layout
-
-\begin_layout Subsection*
-4.
- How can I selectively replicate files?
-\end_layout
-
-\begin_layout Standard
-There is no support for selective replication in replicate itself.
- You can achieve selective replication by loading the unify translator over
- replicate, and using the switch scheduler.
- Configure unify with two subvolumes, one of them being replicate.
- Using the switch scheduler, schedule all files for which you need replication
- to the replicate subvolume.
- Consult unify and switch documentation for more details.
-\end_layout
-
-\begin_layout Section*
-Contact
-\end_layout
-
-\begin_layout Standard
-If you need more assistance on replicate, contact us on the mailing list <gluster-user
-s@gluster.org> (visit gluster.org for details on how to subscribe).
-\end_layout
-
-\begin_layout Standard
-Send you comments and suggestions about this document to <vikas@gluster.com>.
-\end_layout
-
-\end_body
-\end_document