Diffstat (limited to 'doc/user-guide')
 doc/user-guide/Makefile.am         |    1 +
 doc/user-guide/advanced-stripe.odg |  bin 0 -> 12648 bytes
 doc/user-guide/advanced-stripe.pdf |  bin 0 -> 13382 bytes
 doc/user-guide/colonO-icon.jpg     |  bin 0 -> 779 bytes
 doc/user-guide/fdl.texi            |  454 +
 doc/user-guide/fuse.odg            |  bin 0 -> 13190 bytes
 doc/user-guide/fuse.pdf            |  bin 0 -> 14948 bytes
 doc/user-guide/ha.odg              |  bin 0 -> 37290 bytes
 doc/user-guide/ha.pdf              |  bin 0 -> 19403 bytes
 doc/user-guide/stripe.odg          |  bin 0 -> 10188 bytes
 doc/user-guide/stripe.pdf          |  bin 0 -> 11941 bytes
 doc/user-guide/unify.odg           |  bin 0 -> 12955 bytes
 doc/user-guide/unify.pdf           |  bin 0 -> 18969 bytes
 doc/user-guide/user-guide.info     | 2698 +
 doc/user-guide/user-guide.pdf      |  bin 0 -> 353986 bytes
 doc/user-guide/user-guide.texi     | 2226 +
 doc/user-guide/xlator.odg          |  bin 0 -> 12169 bytes
 doc/user-guide/xlator.pdf          |  bin 0 -> 14358 bytes
 18 files changed, 5379 insertions(+), 0 deletions(-)
diff --git a/doc/user-guide/Makefile.am b/doc/user-guide/Makefile.am new file mode 100644 index 00000000000..8d7068f14aa --- /dev/null +++ b/doc/user-guide/Makefile.am @@ -0,0 +1 @@ +info_TEXINFOS = user-guide.texi diff --git a/doc/user-guide/advanced-stripe.odg b/doc/user-guide/advanced-stripe.odg Binary files differnew file mode 100644 index 00000000000..7686d7091b2 --- /dev/null +++ b/doc/user-guide/advanced-stripe.odg diff --git a/doc/user-guide/advanced-stripe.pdf b/doc/user-guide/advanced-stripe.pdf Binary files differnew file mode 100644 index 00000000000..ec8b03dcfbb --- /dev/null +++ b/doc/user-guide/advanced-stripe.pdf diff --git a/doc/user-guide/colonO-icon.jpg b/doc/user-guide/colonO-icon.jpg Binary files differnew file mode 100644 index 00000000000..3e66f7a2775 --- /dev/null +++ b/doc/user-guide/colonO-icon.jpg diff --git a/doc/user-guide/fdl.texi b/doc/user-guide/fdl.texi new file mode 100644 index 00000000000..e33c687cdfb --- /dev/null +++ b/doc/user-guide/fdl.texi @@ -0,0 +1,454 @@ + +@c @node GNU Free Documentation License +@c @appendixsec GNU Free Documentation License + +@cindex FDL, GNU Free Documentation License +@center Version 1.2, November 2002 + +@display +Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc. +59 Temple Place, Suite 330, Boston, MA  02111-1307, USA + +Everyone is permitted to copy and distribute verbatim copies +of this license document, but changing it is not allowed. +@end display + +@enumerate 0 +@item +PREAMBLE + +The purpose of this License is to make a manual, textbook, or other +functional and useful document @dfn{free} in the sense of freedom: to +assure everyone the effective freedom to copy and redistribute it, +with or without modifying it, either commercially or noncommercially. +Secondarily, this License preserves for the author and publisher a way +to get credit for their work, while not being considered responsible +for modifications made by others. + +This License is a kind of ``copyleft'', which means that derivative +works of the document must themselves be free in the same sense.  It +complements the GNU General Public License, which is a copyleft +license designed for free software. + +We have designed this License in order to use it for manuals for free +software, because free software needs free documentation: a free +program should come with manuals providing the same freedoms that the +software does.  But this License is not limited to software manuals; +it can be used for any textual work, regardless of subject matter or +whether it is published as a printed book.  We recommend this License +principally for works whose purpose is instruction or reference. + +@item +APPLICABILITY AND DEFINITIONS + +This License applies to any manual or other work, in any medium, that +contains a notice placed by the copyright holder saying it can be +distributed under the terms of this License.  Such a notice grants a +world-wide, royalty-free license, unlimited in duration, to use that +work under the conditions stated herein.  The ``Document'', below, +refers to any such manual or work.  Any member of the public is a +licensee, and is addressed as ``you''.  You accept the license if you +copy, modify or distribute the work in a way requiring permission +under copyright law. + +A ``Modified Version'' of the Document means any work containing the +Document or a portion of it, either copied verbatim, or with +modifications and/or translated into another language. 
+ +A ``Secondary Section'' is a named appendix or a front-matter section +of the Document that deals exclusively with the relationship of the +publishers or authors of the Document to the Document's overall +subject (or to related matters) and contains nothing that could fall +directly within that overall subject.  (Thus, if the Document is in +part a textbook of mathematics, a Secondary Section may not explain +any mathematics.)  The relationship could be a matter of historical +connection with the subject or with related matters, or of legal, +commercial, philosophical, ethical or political position regarding +them. + +The ``Invariant Sections'' are certain Secondary Sections whose titles +are designated, as being those of Invariant Sections, in the notice +that says that the Document is released under this License.  If a +section does not fit the above definition of Secondary then it is not +allowed to be designated as Invariant.  The Document may contain zero +Invariant Sections.  If the Document does not identify any Invariant +Sections then there are none. + +The ``Cover Texts'' are certain short passages of text that are listed, +as Front-Cover Texts or Back-Cover Texts, in the notice that says that +the Document is released under this License.  A Front-Cover Text may +be at most 5 words, and a Back-Cover Text may be at most 25 words. + +A ``Transparent'' copy of the Document means a machine-readable copy, +represented in a format whose specification is available to the +general public, that is suitable for revising the document +straightforwardly with generic text editors or (for images composed of +pixels) generic paint programs or (for drawings) some widely available +drawing editor, and that is suitable for input to text formatters or +for automatic translation to a variety of formats suitable for input +to text formatters.  A copy made in an otherwise Transparent file +format whose markup, or absence of markup, has been arranged to thwart +or discourage subsequent modification by readers is not Transparent. +An image format is not Transparent if used for any substantial amount +of text.  A copy that is not ``Transparent'' is called ``Opaque''. + +Examples of suitable formats for Transparent copies include plain +@sc{ascii} without markup, Texinfo input format, La@TeX{} input +format, @acronym{SGML} or @acronym{XML} using a publicly available +@acronym{DTD}, and standard-conforming simple @acronym{HTML}, +PostScript or @acronym{PDF} designed for human modification.  Examples +of transparent image formats include @acronym{PNG}, @acronym{XCF} and +@acronym{JPG}.  Opaque formats include proprietary formats that can be +read and edited only by proprietary word processors, @acronym{SGML} or +@acronym{XML} for which the @acronym{DTD} and/or processing tools are +not generally available, and the machine-generated @acronym{HTML}, +PostScript or @acronym{PDF} produced by some word processors for +output purposes only. + +The ``Title Page'' means, for a printed book, the title page itself, +plus such following pages as are needed to hold, legibly, the material +this License requires to appear in the title page.  For works in +formats which do not have any title page as such, ``Title Page'' means +the text near the most prominent appearance of the work's title, +preceding the beginning of the body of the text. 
+ +A section ``Entitled XYZ'' means a named subunit of the Document whose +title either is precisely XYZ or contains XYZ in parentheses following +text that translates XYZ in another language.  (Here XYZ stands for a +specific section name mentioned below, such as ``Acknowledgements'', +``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title'' +of such a section when you modify the Document means that it remains a +section ``Entitled XYZ'' according to this definition. + +The Document may include Warranty Disclaimers next to the notice which +states that this License applies to the Document.  These Warranty +Disclaimers are considered to be included by reference in this +License, but only as regards disclaiming warranties: any other +implication that these Warranty Disclaimers may have is void and has +no effect on the meaning of this License. + +@item +VERBATIM COPYING + +You may copy and distribute the Document in any medium, either +commercially or noncommercially, provided that this License, the +copyright notices, and the license notice saying this License applies +to the Document are reproduced in all copies, and that you add no other +conditions whatsoever to those of this License.  You may not use +technical measures to obstruct or control the reading or further +copying of the copies you make or distribute.  However, you may accept +compensation in exchange for copies.  If you distribute a large enough +number of copies you must also follow the conditions in section 3. + +You may also lend copies, under the same conditions stated above, and +you may publicly display copies. + +@item +COPYING IN QUANTITY + +If you publish printed copies (or copies in media that commonly have +printed covers) of the Document, numbering more than 100, and the +Document's license notice requires Cover Texts, you must enclose the +copies in covers that carry, clearly and legibly, all these Cover +Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on +the back cover.  Both covers must also clearly and legibly identify +you as the publisher of these copies.  The front cover must present +the full title with all words of the title equally prominent and +visible.  You may add other material on the covers in addition. +Copying with changes limited to the covers, as long as they preserve +the title of the Document and satisfy these conditions, can be treated +as verbatim copying in other respects. + +If the required texts for either cover are too voluminous to fit +legibly, you should put the first ones listed (as many as fit +reasonably) on the actual cover, and continue the rest onto adjacent +pages. + +If you publish or distribute Opaque copies of the Document numbering +more than 100, you must either include a machine-readable Transparent +copy along with each Opaque copy, or state in or with each Opaque copy +a computer-network location from which the general network-using +public has access to download using public-standard network protocols +a complete Transparent copy of the Document, free of added material. +If you use the latter option, you must take reasonably prudent steps, +when you begin distribution of Opaque copies in quantity, to ensure +that this Transparent copy will remain thus accessible at the stated +location until at least one year after the last time you distribute an +Opaque copy (directly or through your agents or retailers) of that +edition to the public. 
+ +It is requested, but not required, that you contact the authors of the +Document well before redistributing any large number of copies, to give +them a chance to provide you with an updated version of the Document. + +@item +MODIFICATIONS + +You may copy and distribute a Modified Version of the Document under +the conditions of sections 2 and 3 above, provided that you release +the Modified Version under precisely this License, with the Modified +Version filling the role of the Document, thus licensing distribution +and modification of the Modified Version to whoever possesses a copy +of it.  In addition, you must do these things in the Modified Version: + +@enumerate A +@item +Use in the Title Page (and on the covers, if any) a title distinct +from that of the Document, and from those of previous versions +(which should, if there were any, be listed in the History section +of the Document).  You may use the same title as a previous version +if the original publisher of that version gives permission. + +@item +List on the Title Page, as authors, one or more persons or entities +responsible for authorship of the modifications in the Modified +Version, together with at least five of the principal authors of the +Document (all of its principal authors, if it has fewer than five), +unless they release you from this requirement. + +@item +State on the Title page the name of the publisher of the +Modified Version, as the publisher. + +@item +Preserve all the copyright notices of the Document. + +@item +Add an appropriate copyright notice for your modifications +adjacent to the other copyright notices. + +@item +Include, immediately after the copyright notices, a license notice +giving the public permission to use the Modified Version under the +terms of this License, in the form shown in the Addendum below. + +@item +Preserve in that license notice the full lists of Invariant Sections +and required Cover Texts given in the Document's license notice. + +@item +Include an unaltered copy of this License. + +@item +Preserve the section Entitled ``History'', Preserve its Title, and add +to it an item stating at least the title, year, new authors, and +publisher of the Modified Version as given on the Title Page.  If +there is no section Entitled ``History'' in the Document, create one +stating the title, year, authors, and publisher of the Document as +given on its Title Page, then add an item describing the Modified +Version as stated in the previous sentence. + +@item +Preserve the network location, if any, given in the Document for +public access to a Transparent copy of the Document, and likewise +the network locations given in the Document for previous versions +it was based on.  These may be placed in the ``History'' section. +You may omit a network location for a work that was published at +least four years before the Document itself, or if the original +publisher of the version it refers to gives permission. + +@item +For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve +the Title of the section, and preserve in the section all the +substance and tone of each of the contributor acknowledgements and/or +dedications given therein. + +@item +Preserve all the Invariant Sections of the Document, +unaltered in their text and in their titles.  Section numbers +or the equivalent are not considered part of the section titles. + +@item +Delete any section Entitled ``Endorsements''.  Such a section +may not be included in the Modified Version. 
+ +@item +Do not retitle any existing section to be Entitled ``Endorsements'' or +to conflict in title with any Invariant Section. + +@item +Preserve any Warranty Disclaimers. +@end enumerate + +If the Modified Version includes new front-matter sections or +appendices that qualify as Secondary Sections and contain no material +copied from the Document, you may at your option designate some or all +of these sections as invariant.  To do this, add their titles to the +list of Invariant Sections in the Modified Version's license notice. +These titles must be distinct from any other section titles. + +You may add a section Entitled ``Endorsements'', provided it contains +nothing but endorsements of your Modified Version by various +parties---for example, statements of peer review or that the text has +been approved by an organization as the authoritative definition of a +standard. + +You may add a passage of up to five words as a Front-Cover Text, and a +passage of up to 25 words as a Back-Cover Text, to the end of the list +of Cover Texts in the Modified Version.  Only one passage of +Front-Cover Text and one of Back-Cover Text may be added by (or +through arrangements made by) any one entity.  If the Document already +includes a cover text for the same cover, previously added by you or +by arrangement made by the same entity you are acting on behalf of, +you may not add another; but you may replace the old one, on explicit +permission from the previous publisher that added the old one. + +The author(s) and publisher(s) of the Document do not by this License +give permission to use their names for publicity for or to assert or +imply endorsement of any Modified Version. + +@item +COMBINING DOCUMENTS + +You may combine the Document with other documents released under this +License, under the terms defined in section 4 above for modified +versions, provided that you include in the combination all of the +Invariant Sections of all of the original documents, unmodified, and +list them all as Invariant Sections of your combined work in its +license notice, and that you preserve all their Warranty Disclaimers. + +The combined work need only contain one copy of this License, and +multiple identical Invariant Sections may be replaced with a single +copy.  If there are multiple Invariant Sections with the same name but +different contents, make the title of each such section unique by +adding at the end of it, in parentheses, the name of the original +author or publisher of that section if known, or else a unique number. +Make the same adjustment to the section titles in the list of +Invariant Sections in the license notice of the combined work. + +In the combination, you must combine any sections Entitled ``History'' +in the various original documents, forming one section Entitled +``History''; likewise combine any sections Entitled ``Acknowledgements'', +and any sections Entitled ``Dedications''.  You must delete all +sections Entitled ``Endorsements.'' + +@item +COLLECTIONS OF DOCUMENTS + +You may make a collection consisting of the Document and other documents +released under this License, and replace the individual copies of this +License in the various documents with a single copy that is included in +the collection, provided that you follow the rules of this License for +verbatim copying of each of the documents in all other respects. 
+ +You may extract a single document from such a collection, and distribute +it individually under this License, provided you insert a copy of this +License into the extracted document, and follow this License in all +other respects regarding verbatim copying of that document. + +@item +AGGREGATION WITH INDEPENDENT WORKS + +A compilation of the Document or its derivatives with other separate +and independent documents or works, in or on a volume of a storage or +distribution medium, is called an ``aggregate'' if the copyright +resulting from the compilation is not used to limit the legal rights +of the compilation's users beyond what the individual works permit. +When the Document is included in an aggregate, this License does not +apply to the other works in the aggregate which are not themselves +derivative works of the Document. + +If the Cover Text requirement of section 3 is applicable to these +copies of the Document, then if the Document is less than one half of +the entire aggregate, the Document's Cover Texts may be placed on +covers that bracket the Document within the aggregate, or the +electronic equivalent of covers if the Document is in electronic form. +Otherwise they must appear on printed covers that bracket the whole +aggregate. + +@item +TRANSLATION + +Translation is considered a kind of modification, so you may +distribute translations of the Document under the terms of section 4. +Replacing Invariant Sections with translations requires special +permission from their copyright holders, but you may include +translations of some or all Invariant Sections in addition to the +original versions of these Invariant Sections.  You may include a +translation of this License, and all the license notices in the +Document, and any Warranty Disclaimers, provided that you also include +the original English version of this License and the original versions +of those notices and disclaimers.  In case of a disagreement between +the translation and the original version of this License or a notice +or disclaimer, the original version will prevail. + +If a section in the Document is Entitled ``Acknowledgements'', +``Dedications'', or ``History'', the requirement (section 4) to Preserve +its Title (section 1) will typically require changing the actual +title. + +@item +TERMINATION + +You may not copy, modify, sublicense, or distribute the Document except +as expressly provided for under this License.  Any other attempt to +copy, modify, sublicense or distribute the Document is void, and will +automatically terminate your rights under this License.  However, +parties who have received copies, or rights, from you under this +License will not have their licenses terminated so long as such +parties remain in full compliance. + +@item +FUTURE REVISIONS OF THIS LICENSE + +The Free Software Foundation may publish new, revised versions +of the GNU Free Documentation License from time to time.  Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns.  See +@uref{http://www.gnu.org/copyleft/}. + +Each version of the License is given a distinguishing version number. +If the Document specifies that a particular numbered version of this +License ``or any later version'' applies to it, you have the option of +following the terms and conditions either of that specified version or +of any later version that has been published (not as a draft) by the +Free Software Foundation.  
If the Document does not specify a version +number of this License, you may choose any version ever published (not +as a draft) by the Free Software Foundation. +@end enumerate + +@page +@c @appendixsubsec ADDENDUM: How to use this License for your +@c documents +@subsection ADDENDUM: How to use this License for your documents + +To use this License in a document you have written, include a copy of +the License in the document and put the following copyright and +license notices just after the title page: + +@smallexample +@group +  Copyright (C)  @var{year}  @var{your name}. +  Permission is granted to copy, distribute and/or modify this document +  under the terms of the GNU Free Documentation License, Version 1.2 +  or any later version published by the Free Software Foundation; +  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover +  Texts.  A copy of the license is included in the section entitled ``GNU +  Free Documentation License''. +@end group +@end smallexample + +If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, +replace the ``with...Texts.'' line with this: + +@smallexample +@group +    with the Invariant Sections being @var{list their titles}, with +    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts +    being @var{list}. +@end group +@end smallexample + +If you have Invariant Sections without Cover Texts, or some other +combination of the three, merge those two alternatives to suit the +situation. + +If your document contains nontrivial examples of program code, we +recommend releasing these examples in parallel under your choice of +free software license, such as the GNU General Public License, +to permit their use in free software. + +@c Local Variables: +@c ispell-local-pdict: "ispell-dict" +@c End: + diff --git a/doc/user-guide/fuse.odg b/doc/user-guide/fuse.odg Binary files differnew file mode 100644 index 00000000000..61bd103c78b --- /dev/null +++ b/doc/user-guide/fuse.odg diff --git a/doc/user-guide/fuse.pdf b/doc/user-guide/fuse.pdf Binary files differnew file mode 100644 index 00000000000..a7d13faff56 --- /dev/null +++ b/doc/user-guide/fuse.pdf diff --git a/doc/user-guide/ha.odg b/doc/user-guide/ha.odg Binary files differnew file mode 100644 index 00000000000..e4b8b72d08b --- /dev/null +++ b/doc/user-guide/ha.odg diff --git a/doc/user-guide/ha.pdf b/doc/user-guide/ha.pdf Binary files differnew file mode 100644 index 00000000000..e372c0ab03e --- /dev/null +++ b/doc/user-guide/ha.pdf diff --git a/doc/user-guide/stripe.odg b/doc/user-guide/stripe.odg Binary files differnew file mode 100644 index 00000000000..79441bf1452 --- /dev/null +++ b/doc/user-guide/stripe.odg diff --git a/doc/user-guide/stripe.pdf b/doc/user-guide/stripe.pdf Binary files differnew file mode 100644 index 00000000000..b94446feb56 --- /dev/null +++ b/doc/user-guide/stripe.pdf diff --git a/doc/user-guide/unify.odg b/doc/user-guide/unify.odg Binary files differnew file mode 100644 index 00000000000..ccaa9bf16f9 --- /dev/null +++ b/doc/user-guide/unify.odg diff --git a/doc/user-guide/unify.pdf b/doc/user-guide/unify.pdf Binary files differnew file mode 100644 index 00000000000..c22027f66e7 --- /dev/null +++ b/doc/user-guide/unify.pdf diff --git a/doc/user-guide/user-guide.info b/doc/user-guide/user-guide.info new file mode 100644 index 00000000000..078d62adef1 --- /dev/null +++ b/doc/user-guide/user-guide.info @@ -0,0 +1,2698 @@ +This is ../../../doc/user-guide/user-guide.info, produced by makeinfo +version 4.9 from 
../../../doc/user-guide/user-guide.texi. + +START-INFO-DIR-ENTRY +* GlusterFS: (user-guide). GlusterFS distributed filesystem user guide +END-INFO-DIR-ENTRY + +   This is the user manual for GlusterFS 2.0. + +   Copyright (C) 2008,2007 <Z> Research, Inc. Permission is granted to +copy, distribute and/or modify this document under the terms of the GNU +Free Documentation License, Version 1.2 or any later version published +by the Free Software Foundation; with no Invariant Sections, no +Front-Cover Texts, and no Back-Cover Texts. A copy of the license is +included in the chapter entitled "GNU Free Documentation License". + + +File: user-guide.info,  Node: Top,  Next: Acknowledgements,  Up: (dir) + +GlusterFS 2.0 User Guide +************************ + +This is the user manual for GlusterFS 2.0. + +   Copyright (C) 2008,2007 <Z> Research, Inc. Permission is granted to +copy, distribute and/or modify this document under the terms of the GNU +Free Documentation License, Version 1.2 or any later version published +by the Free Software Foundation; with no Invariant Sections, no +Front-Cover Texts, and no Back-Cover Texts. A copy of the license is +included in the chapter entitled "GNU Free Documentation License". + +* Menu: + +* Acknowledgements:: +* Introduction:: +* Installation and Invocation:: +* Concepts:: +* Translators:: +* Usage Scenarios:: +* Troubleshooting:: +* GNU Free Documentation Licence:: +* Index:: + + --- The Detailed Node Listing --- + +Installation and Invocation + +* Pre requisites:: +* Getting GlusterFS:: +* Building:: +* Running GlusterFS:: +* A Tutorial Introduction:: + +Running GlusterFS + +* Server:: +* Client:: + +Concepts + +* Filesystems in Userspace:: +* Translator:: +* Volume specification file:: + +Translators + +* Storage Translators:: +* Client and Server Translators:: +* Clustering Translators:: +* Performance Translators:: +* Features Translators:: + +Storage Translators + +* POSIX:: + +Client and Server Translators + +* Transport modules:: +* Client protocol:: +* Server protocol:: + +Clustering Translators + +* Unify:: +* Replicate:: +* Stripe:: + +Performance Translators + +* Read Ahead:: +* Write Behind:: +* IO Threads:: +* IO Cache:: + +Features Translators + +* POSIX Locks:: +* Fixed ID:: + +Miscellaneous Translators + +* ROT-13:: +* Trace:: + + +File: user-guide.info,  Node: Acknowledgements,  Next: Introduction,  Prev: Top,  Up: Top + +Acknowledgements +**************** + +GlusterFS continues to be a wonderful and enriching experience for all +of us involved. + +   GlusterFS development would not have been possible at this pace if +not for our enthusiastic users. People from around the world have +helped us with bug reports, performance numbers, and feature +suggestions.  A huge thanks to them all. + +   Matthew Paine - for RPMs & general enthu + +   Leonardo Rodrigues de Mello - for DEBs + +   Julian Perez & Adam D'Auria - for multi-server tutorial + +   Paul England - for HA spec + +   Brent Nelson - for many bug reports + +   Jacques Mattheij - for Europe mirror. + +   Patrick Negri - for TCP non-blocking connect. +        http://gluster.org/core-team.php (<list-hacking@zresearch.com>) +                                                           <Z> Research + + +File: user-guide.info,  Node: Introduction,  Next: Installation and Invocation,  Prev: Acknowledgements,  Up: Top + +1 Introduction +************** + +GlusterFS is a distributed filesystem. It works at the file level, not +block level. 
+ +   A network filesystem is one which allows us to access remote files. A +distributed filesystem is one that stores data on multiple machines and +makes them all appear to be a part of the same filesystem. + +   Need for distributed filesystems + +   * Scalability: A distributed filesystem allows us to store more data +     than what can be stored on a single machine. + +   * Redundancy: We might want to replicate crucial data onto several +     machines. + +   * Uniform access: One can mount a remote volume (for example your +     home directory) from any machine and access the same data. + +1.1 Contacting us +================= + +You can reach us through the mailing list *gluster-devel* +(<gluster-devel@nongnu.org>). + +   You can also find many of the developers on IRC, on the `#gluster' +channel on Freenode (<irc.freenode.net>). + +   The GlusterFS documentation wiki is also useful: +<http://gluster.org/docs/index.php/GlusterFS> + +   For commercial support, you can contact <Z> Research at: + +     3194 Winding Vista Common +     Fremont, CA 94539 +     USA. + +     Phone: +1 (510) 354 6801 +     Toll free: +1 (888) 813 6309 +     Fax: +1 (510) 372 0604 + +   You can also email us at <support@zresearch.com>. + + +File: user-guide.info,  Node: Installation and Invocation,  Next: Concepts,  Prev: Introduction,  Up: Top + +2 Installation and Invocation +***************************** + +* Menu: + +* Pre requisites:: +* Getting GlusterFS:: +* Building:: +* Running GlusterFS:: +* A Tutorial Introduction:: + + +File: user-guide.info,  Node: Pre requisites,  Next: Getting GlusterFS,  Up: Installation and Invocation + +2.1 Pre requisites +================== + +Before installing GlusterFS, make sure you have the following components +installed. + +2.1.1 FUSE +---------- + +You'll need FUSE version 2.6.0 or higher to use GlusterFS. You can omit +installing FUSE if you want to build _only_ the server. Note that you +won't be able to mount a GlusterFS filesystem on a machine that does +not have FUSE installed. + +   FUSE can be downloaded from: <http://fuse.sourceforge.net/> + +   To get the best performance from GlusterFS, however, it is +recommended that you use our patched version of FUSE. See Patched FUSE +for details. + +2.1.2 Patched FUSE +------------------ + +The GlusterFS project maintains a patched version of FUSE meant to be +used with GlusterFS. The patches increase GlusterFS performance. It is +recommended that all users use the patched FUSE. + +   The patched FUSE tarball can be downloaded from: + +   <ftp://ftp.zresearch.com/pub/gluster/glusterfs/fuse/> + +   The specific changes made to FUSE are: + +   * The communication channel size between the FUSE kernel module and +     GlusterFS has been increased to 1MB, permitting large reads and +     writes to be sent in bigger chunks. + +   * The kernel's read-ahead boundary has been extended up to 1MB. + +   * The block size returned in the `stat()'/`fstat()' calls is tuned to 1MB, +     to make cp and similar commands perform I/O using that block size. + +   * `flock()' locking support has been added (although some rework in +     GlusterFS is needed for perfect compliance). + +2.1.3 libibverbs (optional) +--------------------------- + +This is only needed if you want GlusterFS to use InfiniBand as the +interconnect mechanism between server and client. You can get it from: + +   <http://www.openfabrics.org/downloads.htm>. + +2.1.4 Bison and Flex +-------------------- + +These should already be installed on most Linux systems.
If not, use +your distribution's normal software installation procedures to install +them. Make sure you also install the relevant developer packages. + + +File: user-guide.info,  Node: Getting GlusterFS,  Next: Building,  Prev: Pre requisites,  Up: Installation and Invocation + +2.2 Getting GlusterFS +===================== + +There are many ways to get hold of GlusterFS. For a production +deployment, the recommended method is to download the latest release +tarball.  Release tarballs are available at: +<http://gluster.org/download.php>. + +   If you want the bleeding-edge development source, you can get it +from the GNU Arch(1) repository. First you must install GNU Arch +itself. Then register the GlusterFS archive by doing: + +     $ tla register-archive http://arch.sv.gnu.org/archives/gluster + +   Now you can check out the source itself: + +     $ tla get -A gluster@sv.gnu.org glusterfs--mainline--3.0 + +   ---------- Footnotes ---------- + +   (1) <http://www.gnu.org/software/gnu-arch/> + + +File: user-guide.info,  Node: Building,  Next: Running GlusterFS,  Prev: Getting GlusterFS,  Up: Installation and Invocation + +2.3 Building +============ + +You can skip this section if you're installing from RPMs or DEBs. + +   GlusterFS uses the Autotools mechanism to build. As such, the +procedure is straightforward. First, change into the GlusterFS source +directory. + +     $ cd glusterfs-<version> + +   If you checked out the source from the Arch repository, you'll need +to run `./autogen.sh' first. Note that you'll need to have Autoconf and +Automake installed for this. + +   Run `configure'. + +     $ ./configure + +   The configure script accepts the following options: + +`--disable-ibverbs' +     Disable the InfiniBand transport mechanism. + +`--disable-fuse-client' +     Disable the FUSE client. + +`--disable-server' +     Disable building of the GlusterFS server. + +`--disable-bdb' +     Disable building of the Berkeley DB based storage translator. + +`--disable-mod_glusterfs' +     Disable building of the Apache/lighttpd glusterfs plugins. + +`--disable-epoll' +     Use poll instead of epoll. + +`--disable-libglusterfsclient' +     Disable building of libglusterfsclient. + + +   Build and install GlusterFS. + +     # make install + +   The binaries (`glusterfsd' and `glusterfs') will by default be +installed in `/usr/local/sbin/'. Translator, scheduler, and transport +shared libraries will be installed in +`/usr/local/lib/glusterfs/<version>/'. Sample volume specification +files will be in `/usr/local/etc/glusterfs/'.  This document itself can +be found in `/usr/local/share/doc/glusterfs/'. If you passed the +`--prefix' argument to the configure script, then replace `/usr/local' +in the preceding paths with the prefix. + + +File: user-guide.info,  Node: Running GlusterFS,  Next: A Tutorial Introduction,  Prev: Building,  Up: Installation and Invocation + +2.4 Running GlusterFS +===================== + +* Menu: + +* Server:: +* Client:: + + +File: user-guide.info,  Node: Server,  Next: Client,  Up: Running GlusterFS + +2.4.1 Server +------------ + +The GlusterFS server is necessary to export storage volumes to remote +clients (see *Note Server protocol:: for more info). This section +documents the invocation of the GlusterFS server program and all the +command-line options accepted by it. + +     Basic Options + +`-f, --volfile=<path>' +     Use the volume file as the volume specification. + +`-s, --volfile-server=<hostname>' +     Server to get volume file from.
This option overrides the --volfile +     option. + +`-l, --log-file=<path>' +     Specify the path for the log file. + +`-L, --log-level=<level>' +     Set the log level for the server. Log level should be one of DEBUG, +     WARNING, ERROR, CRITICAL, or NONE. + +     Advanced Options + +`--debug' +     Run in debug mode. This option sets --no-daemon, --log-level to +     DEBUG, and --log-file to console. + +`-N, --no-daemon' +     Run glusterfsd as a foreground process. + +`-p, --pid-file=<path>' +     Path for the PID file. + +`--volfile-id=<key>' +     'key' of the volfile to be fetched from the server. + +`--volfile-server-port=<port-number>' +     Listening port number of the volfile server. + +`--volfile-server-transport=[socket|ib-verbs]' +     Transport type to get the volfile from the server. [default: `socket'] + +`--xlator-options=<volume-name.option=value>' +     Add/override a translator option for a volume with the specified value. + +     Miscellaneous Options + +`-?, --help' +     Show this help text. + +`--usage' +     Display a short usage message. + +`-V, --version' +     Show version information. + + +File: user-guide.info,  Node: Client,  Prev: Server,  Up: Running GlusterFS + +2.4.2 Client +------------ + +The GlusterFS client process is necessary to access remote storage +volumes and mount them locally using FUSE. This section documents the +invocation of the client process and all its command-line arguments. + +       # glusterfs [options] <mountpoint> + +   The `mountpoint' is the directory where you want the GlusterFS +filesystem to appear. Example: + +       # glusterfs -f /usr/local/etc/glusterfs-client.vol /mnt + +   The command-line options are detailed below. + +     Basic Options + +`-f, --volfile=<path>' +     Use the volume file as the volume specification. + +`-s, --volfile-server=<hostname>' +     Server to get volume file from. This option overrides the --volfile +     option. + +`-l, --log-file=<path>' +     Specify the path for the log file. + +`-L, --log-level=<level>' +     Set the log level for the client. Log level should be one of DEBUG, +     WARNING, ERROR, CRITICAL, or NONE. + +     Advanced Options + +`--debug' +     Run in debug mode. This option sets --no-daemon, --log-level to +     DEBUG, and --log-file to console. + +`-N, --no-daemon' +     Run `glusterfs' as a foreground process. + +`-p, --pid-file=<path>' +     Path for the PID file. + +`--volfile-id=<key>' +     'key' of the volfile to be fetched from the server. + +`--volfile-server-port=<port-number>' +     Listening port number of the volfile server. + +`--volfile-server-transport=[socket|ib-verbs]' +     Transport type to get the volfile from the server. [default: `socket'] + +`--xlator-options=<volume-name.option=value>' +     Add/override a translator option for a volume with the specified value. + +`--volume-name=<volume name>' +     Volume name in the client spec to use. Defaults to the root volume. + +     FUSE Options + +`--attribute-timeout=<n>' +     Attribute timeout for inodes in the kernel, in seconds. Defaults +     to 1 second. + +`--disable-direct-io-mode' +     Disable direct I/O mode in the FUSE kernel module. + +`-e, --entry-timeout=<n>' +     Entry timeout for directory entries in the kernel, in seconds. +     Defaults to 1 second. + +     Miscellaneous Options + +`-?, --help' +     Show this help information. + +`-V, --version' +     Show version information.
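+
+   As an illustration of combining the options above, the following
+invocation mounts a remote volume in the foreground with debug
+logging, which is handy when a mount does not work as expected. The
+volfile path, log file location, and mount point here are examples,
+not defaults:
+
+       # glusterfs -f /usr/local/etc/glusterfs-client.vol \
+             -l /var/log/glusterfs-client.log -L DEBUG -N /mnt/glusterfs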
+ + +File: user-guide.info,  Node: A Tutorial Introduction,  Prev: Running GlusterFS,  Up: Installation and Invocation + +2.5 A Tutorial Introduction +=========================== + +This section will show you how to quickly get GlusterFS up and running. +We'll configure GlusterFS as a simple network filesystem, with one +server and one client.  In this mode of usage, GlusterFS can serve as a +replacement for NFS. + +   We'll make use of two machines; call them _server_ and _client_ (if +you don't want to set up two machines, just run everything that follows +on the same machine).  In the examples that follow, the shell prompts +will use these names to clarify the machine on which the command is +being run. For example, a command that should be run on the server will +be shown with the prompt: + +     [root@server]# + +   Our goal is to make a directory on the _server_ (say, `/export') +accessible to the _client_. + +   First of all, get GlusterFS installed on both machines, as +described in the previous sections. Make sure you have the FUSE kernel +module loaded. You can ensure this by running: + +     [root@server]# modprobe fuse + +   Before we can run the GlusterFS client or server programs, we need +to write two files called _volume specifications_ (equivalently referred +to as _volfiles_).  The volfile describes the _translator tree_ on a +node. The next chapter will explain the concepts of `translator' and +`volume specification' in detail. For now, just assume that the volfile +is like an NFS `/etc/exports' file. + +   On the server, create a text file somewhere (we'll assume the path +`/tmp/glusterfs-server.vol') with the following contents. + +     volume colon-o +       type storage/posix +       option directory /export +     end-volume + +     volume server +       type protocol/server +       subvolumes colon-o +       option transport-type tcp +       option auth.addr.colon-o.allow * +     end-volume + +   A brief explanation of the file's contents: the first section +defines a storage volume, named "colon-o" (the volume names are +arbitrary), which exports the `/export' directory. The second section +defines options for the translator which will make the storage volume +accessible remotely. It specifies `colon-o' as a subvolume. This +defines the _translator tree_, about which more will be said in the +next chapter. The two options specify that the TCP protocol is to be +used (as opposed to InfiniBand, for example), and that access to the +storage volume is to be provided to clients with any IP address at all. +If you wanted to restrict access to this server to only your subnet, +for example, you'd specify something like `192.168.1.*' in the second +option line. + +   On the client machine, create the following text file (again, we'll +assume the path to be `/tmp/glusterfs-client.vol'). Replace +_server-ip-address_ with the IP address of your server machine. If you +are doing all this on a single machine, use `127.0.0.1'. + +     volume client +       type protocol/client +       option transport-type tcp +       option remote-host _server-ip-address_ +       option remote-subvolume colon-o +     end-volume + +   Now we need to start both the server and client programs.
To start +the server: + +     [root@server]# glusterfsd -f /tmp/glusterfs-server.vol + +   To start the client: + +     [root@client]# glusterfs -f /tmp/glusterfs-client.vol /mnt/glusterfs + +   You should now be able to see the files under the server's `/export' +directory in the `/mnt/glusterfs' directory on the client. That's it; +GlusterFS is now working as a network file system. + + +File: user-guide.info,  Node: Concepts,  Next: Translators,  Prev: Installation and Invocation,  Up: Top + +3 Concepts +********** + +* Menu: + +* Filesystems in Userspace:: +* Translator:: +* Volume specification file:: + + +File: user-guide.info,  Node: Filesystems in Userspace,  Next: Translator,  Up: Concepts + +3.1 Filesystems in Userspace +============================ + +A filesystem is usually implemented in kernel space. Kernel space +development is much harder than userspace development. FUSE is a kernel +module/library that allows us to write a filesystem completely in +userspace. + +   FUSE consists of a kernel module which interacts with the userspace +implementation using a device file `/dev/fuse'. When a process makes a +syscall on a FUSE filesystem, VFS hands the request to the FUSE module, +which writes the request to `/dev/fuse'. The userspace implementation +polls `/dev/fuse', and when a request arrives, processes it and writes +the result back to `/dev/fuse'. The kernel then reads from the device +file and returns the result to the user process. + +   In case of GlusterFS, the userspace program is the GlusterFS client. +The control flow is shown in the diagram below. The GlusterFS client +services the request by sending it to the server, which in turn hands +it to the local POSIX filesystem. + + +                   Fig 1. Control flow in GlusterFS + + +File: user-guide.info,  Node: Translator,  Next: Volume specification file,  Prev: Filesystems in Userspace,  Up: Concepts + +3.2 Translator +============== + +The _translator_ is the most important concept in GlusterFS. In fact, +GlusterFS is nothing but a collection of translators working together, +forming a translator _tree_. + +   The idea of a translator is perhaps best understood using an +analogy. Consider the VFS in the Linux kernel. The VFS abstracts the +various filesystem implementations (such as EXT3, ReiserFS, XFS, etc.) +supported by the kernel. When an application calls the kernel to +perform an operation on a file, the kernel passes the request on to the +appropriate filesystem implementation. + +   For example, let's say there are two partitions on a Linux machine: +`/', which is an EXT3 partition, and `/usr', which is a ReiserFS +partition. Now if an application wants to open a file called, say, +`/etc/fstab', then the kernel will internally pass the request to the +EXT3 implementation.  If on the other hand, an application wants to +read a file called `/usr/src/linux/CREDITS', then the kernel will call +upon the ReiserFS implementation to do the job. + +   The "filesystem implementation" objects are analogous to GlusterFS +translators. A GlusterFS translator implements all the filesystem +operations.  Whereas in VFS there is a two-level tree (with the kernel +at the root and all the filesystem implementation as its children), in +GlusterFS there exists a more elaborate tree structure. + +   We can now define translators more precisely. A GlusterFS translator +is a shared object (`.so') that implements every filesystem call. 
+GlusterFS translators can be arranged in an arbitrary tree structure +(subject to constraints imposed by the translators). When GlusterFS +receives a filesystem call, it passes it on to the translator at the +root of the translator tree. The root translator may in turn pass it on +to any or all of its children, and so on, until the leaf nodes are +reached. The result of a filesystem call is communicated in the reverse +fashion, from the leaf nodes up to the root node, and then on to the +application. + +   So what might a translator tree look like? + + +                    Fig 2. A sample translator tree + +   The diagram depicts three servers and one GlusterFS client. It is +important to note that conceptually, the translator tree spans machine +boundaries.  Thus, the client machine in the diagram, `10.0.0.1', can +access the aggregated storage of the filesystems on the server machines +`10.0.0.2', `10.0.0.3', and `10.0.0.4'. The translator diagram will +make more sense once you've read the next chapter and understood the +functions of the various translators. + + +File: user-guide.info,  Node: Volume specification file,  Prev: Translator,  Up: Concepts + +3.3 Volume specification file +============================= + +The volume specification file describes the translator tree for both the +server and client programs. + +   A volume specification file is a sequence of volume definitions. +The syntax of a volume definition is explained below: + +     *volume* _volume-name_ +       *type* _translator-name_ +       *option* _option-name_ _option-value_ +       ... +       *subvolumes* _subvolume1_ _subvolume2_ ... +     *end-volume* + +   ... + +_volume-name_ +     An identifier for the volume. This is just a human-readable name, +     and can contain any alphanumeric character. For instance, +     "storage-1", "colon-o", or "forty-two". + +_translator-name_ +     Name of one of the available translators. Example: +     `protocol/client', `cluster/unify'. + +_option-name_ +     Name of a valid option for the translator. + +_option-value_ +     Value for the option. Everything following the "option" keyword to +     the end of the line is considered the value; it is up to the +     translator to parse it. + +_subvolume1_, _subvolume2_, ... +     Volume names of sub-volumes. The sub-volumes must already have +     been defined earlier in the file. + +   There are a few rules you must follow when writing a volume +specification file: + +   * Everything following a ``#'' is considered a comment and is +     ignored. Blank lines are also ignored. + +   * All names and keywords are case-sensitive. + +   * The order of options inside a volume definition does not matter. + +   * An option value may not span multiple lines. + +   * If an option is not specified, it will assume its default value. + +   * A sub-volume must have already been defined before it can be +     referenced. This means you have to write the specification file +     "bottom-up", starting from the leaf nodes of the translator tree +     and moving up to the root. 
+ +   A simple example volume specification file is shown below: + +     # This is a comment line +     volume client +      type protocol/client +      option transport-type tcp +      option remote-host localhost      # Also a comment +      option remote-subvolume brick +     # The subvolumes line may be absent +     end-volume + +     volume iot +      type performance/io-threads +      option thread-count 4 +      subvolumes client +     end-volume + +     volume wb +      type performance/write-behind +      subvolumes iot +     end-volume + + +File: user-guide.info,  Node: Translators,  Next: Usage Scenarios,  Prev: Concepts,  Up: Top + +4 Translators +************* + +* Menu: + +* Storage Translators:: +* Client and Server Translators:: +* Clustering Translators:: +* Performance Translators:: +* Features Translators:: +* Miscellaneous Translators:: + +   This chapter documents all the available GlusterFS translators in +detail.  Each translator section will show its name (for example, +`cluster/unify'), briefly describe its purpose and workings, and list +every option accepted by that translator, along with its meaning. + + +File: user-guide.info,  Node: Storage Translators,  Next: Client and Server Translators,  Up: Translators + +4.1 Storage Translators +======================= + +The storage translators form the "backend" for GlusterFS. Currently, +the only available storage translator is the POSIX translator, which +stores files on a normal POSIX filesystem. A pleasant consequence of +this is that your data will still be accessible if GlusterFS crashes or +cannot be started. + +   Other storage backends are planned for the future. One of the +possibilities is an Amazon S3 translator. Amazon S3 is an unlimited +online storage service accessible through a web services API. The S3 +translator will allow you to access the storage as a normal POSIX +filesystem.  (1) + +* Menu: + +* POSIX:: +* BDB:: + +   ---------- Footnotes ---------- + +   (1) Some more discussion about this can be found at: + +http://developer.amazonwebservices.com/connect/message.jspa?messageID=52873 + + +File: user-guide.info,  Node: POSIX,  Next: BDB,  Up: Storage Translators + +4.1.1 POSIX +----------- + +     type storage/posix + +   The `posix' translator uses a normal POSIX filesystem as its +"backend" to actually store files and directories. This can be any +filesystem that supports extended attributes (EXT3, ReiserFS, XFS, +...). Extended attributes are used by some translators to store +metadata, for example, by the replicate and stripe translators. See +*Note Replicate:: and *Note Stripe::, respectively, for details. + +`directory <path>' +     The directory on the local filesystem which is to be used for +     storage. + + +File: user-guide.info,  Node: BDB,  Prev: POSIX,  Up: Storage Translators + +4.1.2 BDB +--------- + +     type storage/bdb + +   The `BDB' translator uses a Berkeley DB database as its "backend" to +actually store files as key-value pairs in the database and directories +as regular POSIX directories. Note that BDB does not provide extended +attribute support for regular files. Do not use BDB as a storage +translator while using any translator that demands extended attributes +on the "backend". + +`directory <path>' +     The directory on the local filesystem which is to be used for +     storage. + +`mode [cache|persistent] (cache)' +     When BDB is run in `cache' mode, recovery of the back-end is not +     completely guaranteed.
`persistent' guarantees that BDB can +     recover the back-end from Berkeley DB even if GlusterFS crashes. + +`errfile <path>' +     The path of the file to be used as `errfile' for Berkeley DB to +     report detailed error messages, if any. Note that all the contents +     of this file will be written by Berkeley DB, not GlusterFS. + +`logdir <path>' + + +File: user-guide.info,  Node: Client and Server Translators,  Next: Clustering Translators,  Prev: Storage Translators,  Up: Translators + +4.2 Client and Server Translators +================================= + +The client and server translators enable GlusterFS to export a +translator tree over the network or access a remote GlusterFS server. +These two translators implement GlusterFS's network protocol. + +* Menu: + +* Transport modules:: +* Client protocol:: +* Server protocol:: + + +File: user-guide.info,  Node: Transport modules,  Next: Client protocol,  Up: Client and Server Translators + +4.2.1 Transport modules +----------------------- + +The client and server translators are capable of using any of the +pluggable transport modules. Currently available transport modules are +`tcp', which uses a TCP connection between client and server to +communicate; `ib-sdp', which uses a TCP connection over InfiniBand; and +`ib-verbs', which uses high-speed InfiniBand connections. + +   Each transport module comes in two different versions, one to be +used on the server side and the other on the client side. + +4.2.1.1 TCP +........... + +The TCP transport module uses a TCP/IP connection between the server +and the client. + +       option transport-type tcp + +   The TCP client module accepts the following options: + +`non-blocking-connect [no|off|on|yes] (on)' +     Whether to make the connection attempt asynchronous. + +`remote-port <n> (6996)' +     Server port to connect to. + +`remote-host <hostname> *' +     Hostname or IP address of the server. If the host name resolves to +     multiple IP addresses, all of them will be tried in a round-robin +     fashion. This feature can be used to implement fail-over. + +   The TCP server module accepts the following options: + +`bind-address <address> (0.0.0.0)' +     The local interface on which the server should listen to requests. +     Default is to listen on all interfaces. + +`listen-port <n> (6996)' +     The local port to listen on. + +4.2.1.2 IB-SDP +.............. + +       option transport-type ib-sdp + +   The kernel implements a socket interface for InfiniBand hardware; SDP +runs over ib-verbs.  This module accepts the same options as `tcp'. + +4.2.1.3 ib-verbs +................ + +       option transport-type ib-verbs + +   InfiniBand is a scalable switched fabric interconnect mechanism +primarily used in high-performance computing. InfiniBand can deliver +data throughput of the order of 10 Gbit/s, with latencies of 4-5 +microseconds. + +   The `ib-verbs' transport accesses the InfiniBand hardware through +the "verbs" API, which is the lowest level of software access possible +and which gives the highest performance. On InfiniBand hardware, it is +always best to use `ib-verbs'. Use `ib-sdp' only if you cannot get +`ib-verbs' working for some reason. + +   The `ib-verbs' client module accepts the following options: + +`non-blocking-connect [no|off|on|yes] (on)' +     Whether to make the connection attempt asynchronous. + +`remote-port <n> (6996)' +     Server port to connect to. + +`remote-host <hostname> *' +     Hostname or IP address of the server.
If the host name resolves to +     multiple IP addresses, all of them will be tried in a round-robin +     fashion. This feature can be used to implement fail-over. + +   The `ib-verbs' server module accepts the following options: + +`bind-address <address> (0.0.0.0)' +     The local interface on which the server should listen to requests. +     Default is to listen on all interfaces. + +`listen-port <n> (6996)' +     The local port to listen on. + +   The following options are common to both the client and server +modules: + +   If you are familiar with InfiniBand jargon, the mode used by +GlusterFS is "reliable connection-oriented channel transfer". + +`ib-verbs-work-request-send-count <n> (64)' +     Length of the send queue in datagrams. [Reason to +     increase/decrease?] + +`ib-verbs-work-request-recv-count <n> (64)' +     Length of the receive queue in datagrams. [Reason to +     increase/decrease?] + +`ib-verbs-work-request-send-size <size> (128KB)' +     Size of each datagram that is sent. [Reason to increase/decrease?] + +`ib-verbs-work-request-recv-size <size> (128KB)' +     Size of each datagram that is received. [Reason to +     increase/decrease?] + +`ib-verbs-port <n> (1)' +     Port number for ib-verbs. + +`ib-verbs-mtu [256|512|1024|2048|4096] (2048)' +     The Maximum Transmission Unit. [Reason to increase/decrease?] + +`ib-verbs-device-name <device-name> (first device in the list)' +     InfiniBand device to be used. + +   For maximum performance, you should ensure that the send/receive +counts on both the client and server are the same. + +   ib-verbs is preferred over ib-sdp. + + +File: user-guide.info,  Node: Client protocol,  Next: Server protocol,  Prev: Transport modules,  Up: Client and Server Translators + +4.2.2 Client +------------ + +     type protocol/client + +   The client translator enables the GlusterFS client to access a +remote server's translator tree. + +`transport-type [tcp,ib-sdp,ib-verbs] (tcp)' +     The transport type to use. You should use the client versions of +     all the transport modules (`tcp', `ib-sdp', `ib-verbs'). + +`remote-subvolume <volume_name> *' +     The name of the volume on the remote host to attach to. Note that +     this is _not_ the name of the `protocol/server' volume on the +     server. It can be any volume defined under the server. + +`transport-timeout <n> (120 seconds)' +     Inactivity timeout. If a reply is expected and no activity takes +     place on the connection within this time, the transport connection +     will be broken, and a new connection will be attempted. + + +File: user-guide.info,  Node: Server protocol,  Prev: Client protocol,  Up: Client and Server Translators + +4.2.3 Server +------------ + +     type protocol/server + +   The server translator exports a translator tree and makes it +accessible to remote GlusterFS clients. + +`client-volume-filename <path> (<CONFDIR>/glusterfs-client.vol)' +     The volume specification file to use for the client. This is the +     file the client will receive when it is invoked with the +     `--server' option (*Note Client::). + +`transport-type [tcp,ib-verbs,ib-sdp] (tcp)' +     The transport to use. You should use the server versions of all +     the transport modules (`tcp', `ib-sdp', `ib-verbs'). + +`auth.addr.<volume name>.allow <IP address wildcard pattern>' +     IP addresses of the clients that are allowed to attach to the +     specified volume.  This can be a wildcard.
+
+
+
+File: user-guide.info,  Node: Clustering Translators,  Next: Performance Translators,  Prev: Client and Server Translators,  Up: Translators
+
+4.3 Clustering Translators
+==========================
+
+The clustering translators are the most important GlusterFS
+translators, since it is these that make GlusterFS a cluster
+filesystem. These translators together enable GlusterFS to access an
+arbitrarily large amount of storage, and provide RAID-like redundancy
+and distribution over the entire cluster.
+
+   There are three clustering translators: *unify*, *replicate*, and
+*stripe*.  The unify translator aggregates storage from many server
+nodes. The replicate translator provides file replication. The stripe
+translator allows a file to be spread across many server nodes. The
+following sections look at each of these translators in detail.
+
+* Menu:
+
+* Unify::
+* Replicate::
+* Stripe::
+
+
+File: user-guide.info,  Node: Unify,  Next: Replicate,  Up: Clustering Translators
+
+4.3.1 Unify
+-----------
+
+     type cluster/unify
+
+   The unify translator presents a `unified' view of all its
+sub-volumes. That is, it makes the union of all its sub-volumes appear
+as a single volume. It is the unify translator that gives GlusterFS the
+ability to access an arbitrarily large amount of storage.
+
+   For unify to work correctly, certain invariants need to be
+maintained across the entire network. These are:
+
+   * The directory structure of all the sub-volumes must be identical.
+
+   * A particular file can exist on only one of the sub-volumes.
+     Put another way, a pathname such as
+     `/home/calvin/homework.txt' is unique across the entire cluster.
+
+
+Looking at the second requirement, you might wonder how one can
+accomplish storing redundant copies of a file, if no file can exist
+multiple times.  To answer, we must remember that these invariants are
+from _unify's perspective_.  A translator such as replicate at a lower
+level in the translator tree than unify may subvert this picture.
+
+   The first invariant might seem quite tedious to ensure. We shall see
+later that this is not so, since unify's _self-heal_ mechanism takes
+care of maintaining it.
+
+   The second invariant implies that unify needs some way to decide
+which file goes where.  Unify makes use of _scheduler_ modules for this
+purpose.
+
+   When a file needs to be created, unify's scheduler decides upon the
+sub-volume to be used to store the file. There are many schedulers
+available, each using a different algorithm and suitable for different
+purposes.
+
+   The various schedulers are described in detail in the sections that
+follow.
+
+4.3.1.1 ALU
+...........
+
+       option scheduler alu
+
+   ALU stands for "Adaptive Least Usage". It is the most advanced
+scheduler available in GlusterFS. It balances the load across volumes,
+taking several factors into account. It adapts itself to changing I/O
+patterns according to its configuration. When properly configured, it
+can eliminate the need for regular tuning of the filesystem to keep
+volume load nicely balanced.
+
+   The ALU scheduler is composed of multiple least-usage
+sub-schedulers. Each sub-scheduler keeps track of a certain type of
+load, for each of the sub-volumes, getting statistics from the
+sub-volumes themselves. The sub-schedulers are these:
+
+   * disk-usage: The used and free disk space on the volume.
+
+   * read-usage: The amount of reading done from this volume.
+
+   * write-usage: The amount of writing done to this volume.
+
+   * open-files-usage: The number of files currently open from this
+     volume.
+
+   * disk-speed-usage: The speed at which the disks are spinning. This
+     is a constant value and therefore not very useful.
+
+   The ALU scheduler needs to know which of these sub-schedulers to use,
+and in which order to evaluate them. This is done through the `option
+alu.order' configuration directive.
+
+   Each sub-scheduler needs to know two things: when to kick in (the
+entry-threshold), and how long to stay in control (the exit-threshold).
+For example: when unifying three disks of 100GB, keeping an exact
+balance of disk-usage is not necessary. Instead, there could be a 1GB
+margin, which can be used to nicely balance other factors, such as
+read-usage. The disk-usage scheduler can be told to kick in only when a
+certain threshold of discrepancy is passed, such as 1GB. When it
+assumes control under this condition, it will write all subsequent data
+to the least-used volume. Once it is doing so, it would be unwise to
+stop right after the values fall below the entry-threshold again, since
+that would make it very likely that the same situation recurs very
+soon. That would cause the ALU to spend most of its time disk-usage
+scheduling, which is unfair to the other sub-schedulers. The
+exit-threshold therefore defines the amount of data that needs to be
+written to the least-used disk before control is relinquished again.
+
+   In addition to the sub-schedulers, the ALU scheduler also has
+"limits" options. These can stop the creation of new files on a volume
+once a value crosses a certain threshold. For example, setting
+`option alu.limits.min-free-disk 5GB' will stop the scheduling of files
+to volumes that have less than 5GB of free disk space, leaving the
+files on that disk some room to grow.
+
+   The actual values you assign to the thresholds for sub-schedulers and
+limits depend on your situation. If you have fast-growing files, you'll
+want to stop file-creation on a disk much earlier than when hardly any
+of your files are growing. If you care less about disk-usage balance
+than about read-usage balance, you'll want a bigger disk-usage
+scheduler entry-threshold and a smaller read-usage scheduler
+entry-threshold.
+
+   For thresholds defining a size, values specifying "KB", "MB" and "GB"
+are allowed. For example: `option alu.limits.min-free-disk 5GB'.
+
+`alu.order <order> * ("disk-usage:write-usage:read-usage:open-files-usage:disk-speed")'
+
+`alu.disk-usage.entry-threshold <size> (1GB)'
+
+`alu.disk-usage.exit-threshold <size> (512MB)'
+
+`alu.write-usage.entry-threshold <%> (25)'
+
+`alu.write-usage.exit-threshold <%> (5)'
+
+`alu.read-usage.entry-threshold <%> (25)'
+
+`alu.read-usage.exit-threshold <%> (5)'
+
+`alu.open-files-usage.entry-threshold <n> (1000)'
+
+`alu.open-files-usage.exit-threshold <n> (100)'
+
+`alu.limits.min-free-disk <%>'
+
+`alu.limits.max-open-files <n>'
+
+4.3.1.2 Round Robin (RR)
+........................
+
+       option scheduler rr
+
+   The round-robin (RR) scheduler creates files in a round-robin
+fashion. Each client will have its own round-robin loop. When your
+files are mostly similar in size and I/O access pattern, this
+scheduler is a good choice.
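+
+   As an illustrative sketch (all volume names here are hypothetical;
+`ns' must name a namespace volume defined elsewhere in the same spec
+file, see the Namespace section below), a unify volume using the rr
+scheduler might be declared as:
+
+       volume unify0
+         type cluster/unify
+         option scheduler rr
+         option namespace ns
+         subvolumes brick1 brick2
+       end-volume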
+
+   The RR scheduler checks for free disk space on the server before
+scheduling, so you will know when to add another server node. The
+default value of min-free-disk is 5% and is checked on file creation
+calls, with at least 10 seconds (by default) elapsing between two
+checks.
+
+   Options:
+`rr.limits.min-free-disk <%> (5)'
+     Minimum free disk space a node must have for RR to schedule a file
+     to it.
+
+`rr.refresh-interval <t> (10 seconds)'
+     Time between two successive free disk space checks.
+
+4.3.1.3 Random
+..............
+
+       option scheduler random
+
+   The random scheduler schedules file creation randomly among its
+child nodes.  Like the round-robin scheduler, it also checks for a
+minimum amount of free disk space before scheduling a file to a node.
+
+`random.limits.min-free-disk <%> (5)'
+     Minimum free disk space a node must have for random to schedule a
+     file to it.
+
+`random.refresh-interval <t> (10 seconds)'
+     Time between two successive free disk space checks.
+
+4.3.1.4 NUFA
+............
+
+       option scheduler nufa
+
+   It is common in many GlusterFS computing environments for all
+deployed machines to act as both servers and clients. For example, a
+research lab may have 40 workstations each with its own storage. All of
+these workstations might act as servers exporting a volume as well as
+clients accessing the entire cluster's storage.  In such a situation,
+it makes sense to store locally created files on the local workstation
+itself (assuming files are accessed most by the workstation that
+created them). The Non-Uniform File Allocation (NUFA) scheduler
+accomplishes that.
+
+   NUFA gives the local system first priority for file creation over
+other nodes. If the local volume does not have more free disk space
+than a specified amount (5% by default), then NUFA schedules files
+among the other child volumes in a round-robin fashion.
+
+   NUFA is named after the similar strategy used for memory access,
+NUMA(1).
+
+`nufa.limits.min-free-disk <%> (5)'
+     Minimum disk space that must be free (local or remote) for NUFA to
+     schedule a file to it.
+
+`nufa.refresh-interval <t> (10 seconds)'
+     Time between two successive free disk space checks.
+
+`nufa.local-volume-name <volume>'
+     The name of the volume corresponding to the local system. This
+     volume must be one of the children of the unify volume. This
+     option is mandatory.
+
+4.3.1.5 Namespace
+.................
+
+A namespace volume is needed for two reasons: it provides persistent
+inode numbers, and it lets a file's entry exist even when the node
+holding the file is down.
+
+   Namespace files are simply touched (they hold no data); the
+namespace entry is checked on every lookup.
+
+`namespace <volume> *'
+     Name of the namespace volume (which should be one of the unify
+     volume's children).
+
+`self-heal [on|off] (on)'
+     Enable/disable self-heal. Unless you know what you are doing, do
+     not disable self-heal.
+
+4.3.1.6 Self Heal
+.................
+
+* When a `lookup()/stat()' call is made on a directory for the first
+time, a self-heal call is made, which checks the consistency of its
+child nodes. If an entry is present on a storage node but not in the
+namespace, that entry is created in the namespace, and vice versa. A
+writedir() API was introduced for this purpose. The self-heal also
+checks for permission and UID/GID consistency.
+
+   * This check is also done when a server goes down and comes back up.
+
+   * If one starts with an empty namespace export but has data in the
+storage nodes, a `find . >/dev/null' or `ls -lR >/dev/null' will build
+the namespace in one pass. Even otherwise, the namespace is built on
+demand when a file is looked up for the first time.
+
+   NOTE: Kernel `Oops' messages have been seen with fuse-2.6.3 when the
+namespace is deleted on the backend while GlusterFS is running. This
+issue does not occur with fuse-2.6.5.
+
+   ---------- Footnotes ----------
+
+   (1) Non-Uniform Memory Access:
+<http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access>
+
+
+File: user-guide.info,  Node: Replicate,  Next: Stripe,  Prev: Unify,  Up: Clustering Translators
+
+4.3.2 Replicate (formerly AFR)
+------------------------------
+
+     type cluster/replicate
+
+   Replicate provides RAID-1-like functionality for GlusterFS.
+Replicate replicates files and directories across its subvolumes. Hence
+if Replicate has four subvolumes, there will be four copies of all
+files and directories. Replicate provides high availability: if one of
+the subvolumes goes down (e.g., due to a server crash or network
+disconnection), Replicate will still service requests using the
+redundant copies.
+
+   Replicate also provides self-heal functionality: when a crashed
+server comes back up, outdated files and directories are updated with
+the latest versions. Replicate uses extended attributes of the backend
+file system to track the versioning of files and directories and
+provide the self-heal feature.
+
+     volume replicate-example
+      type cluster/replicate
+      subvolumes brick1 brick2 brick3
+     end-volume
+
+   This sample configuration will replicate all directories and files on
+brick1, brick2 and brick3.
+
+   All read operations happen from the first alive child. If all three
+sub-volumes are up, reads will be done from brick1; if brick1 is down,
+reads will be done from brick2. If a read() is in progress on brick1
+when it goes down, replicate transparently falls back to brick2.
+
+   The next release of GlusterFS will add the following features:
+   * Ability to specify the sub-volume from which read operations are
+     to be done (this will help users who have one of the sub-volumes
+     as a local storage volume).
+
+   * Allow scheduling of read operations amongst the sub-volumes in a
+     round-robin fashion.
+
+   The order of the subvolumes list should be the same across all the
+`replicate' volumes, as it is used for locking purposes.
+
+4.3.2.1 Self Heal
+.................
+
+Replicate has a self-heal feature, which updates outdated file and
+directory copies with the most recent versions. For example, consider
+the following config:
+
+     volume replicate-example
+      type cluster/replicate
+      subvolumes brick1 brick2
+     end-volume
+
+4.3.2.2 File self-heal
+......................
+
+Now if we create a file foo.txt on replicate-example, the file will be
+created on brick1 and brick2. The file will have two extended
+attributes associated with it in the backend filesystem. One is
+trusted.afr.createtime and the other is trusted.afr.version. The
+trusted.afr.createtime xattr holds the create time (in seconds since
+the epoch) and trusted.afr.version is a number that is incremented
+each time the file is modified. This increment happens during close
+(in case any write was done before the close).
+
+   If brick1 goes down and we edit foo.txt, the version gets
+incremented.
+Now when brick1 comes back up and we open() foo.txt, replicate will
+check whether the versions are the same. If they are not, the outdated
+copy is replaced by the latest copy and its version is updated. After
+the sync, the open() proceeds in the usual manner and the application
+calling open() can continue its access to the file.
+
+   Suppose brick1 goes down, we delete foo.txt, and then create a file
+with the same name, foo.txt, again. When brick1 comes back up, the
+version on brick1 may well be higher than the version on brick2; this
+is where the createtime extended attribute helps in deciding which copy
+is the outdated one. Hence we need to consider both createtime and
+version to decide on the latest copy.
+
+   The version attribute is incremented during the close() call. The
+version will not be incremented if no write() was done. If the fd that
+the close() receives was obtained from a create() call, we also create
+the createtime extended attribute.
+
+4.3.2.3 Directory self-heal
+...........................
+
+Suppose brick1 goes down, we delete foo.txt, and brick1 comes back up.
+Now we should not create foo.txt on brick2; rather, we should delete
+foo.txt on brick1. We handle this situation by keeping createtime and
+version attributes on the directory, just as for files. When lookup()
+is done on the directory, we compare the createtime/version attributes
+of the copies, see which files need to be deleted, delete those files,
+and update the extended attributes of the outdated directory copy.
+Each time a directory is modified (a file or a subdirectory is created
+or deleted inside the directory) while one of the subvols is down, we
+increment the directory's version.
+
+   lookup() is a call initiated by the kernel on a file or directory
+just before any access to that file or directory. In GlusterFS, by
+default, lookup() will not be called if it was already called within
+the past second on that particular file or directory.
+
+   The extended attributes can be seen in the backend filesystem using
+the `getfattr' command. (`getfattr -n trusted.afr.version <file>')
+
+`debug [on|off]  (off)'
+
+`self-heal [on|off] (on)'
+
+`replicate <pattern> (*:1)'
+
+`lock-node <child_volume> (first child is used by default)'
+
+
+File: user-guide.info,  Node: Stripe,  Prev: Replicate,  Up: Clustering Translators
+
+4.3.3 Stripe
+------------
+
+     type cluster/stripe
+
+   The stripe translator distributes the contents of a file over its
+sub-volumes.  It does this by creating a file equal in size to the
+total size of the file on each of its sub-volumes. It then writes only
+a part of the file to each sub-volume, leaving the rest of it empty.
+These empty regions are called `holes' in Unix terminology. The holes
+do not consume any disk space.
+
+   The stripe diagram in the PDF version of this guide illustrates
+this layout.
+
+You can configure stripe so that only filenames matching a pattern are
+striped. You can also configure the size of the data to be stored on
+each sub-volume.
+
+`block-size <pattern>:<size>  (*:0 no striping)'
+     Distribute files matching `<pattern>' over the sub-volumes,
+     storing at least `<size>' on each sub-volume. For example,
+
+            option block-size *.mpg:1M
+
+     distributes all files ending in `.mpg', storing at least 1 MB on
+     each sub-volume.
+
+     Any number of `block-size' option lines may be present, specifying
+     different sizes for different file name patterns.
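+
+   As a sketch, a stripe volume over four bricks (the volume names
+here are hypothetical) might be declared as:
+
+       volume stripe0
+         type cluster/stripe
+         option block-size *.mpg:1M
+         subvolumes brick1 brick2 brick3 brick4
+       end-volume
+
+   With this declaration, `.mpg' files are striped over the four
+bricks in 1 MB blocks, while files matching no pattern are not
+striped.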
+
+
+File: user-guide.info,  Node: Performance Translators,  Next: Features Translators,  Prev: Clustering Translators,  Up: Translators
+
+4.4 Performance Translators
+===========================
+
+* Menu:
+
+* Read Ahead::
+* Write Behind::
+* IO Threads::
+* IO Cache::
+* Booster::
+
+
+File: user-guide.info,  Node: Read Ahead,  Next: Write Behind,  Up: Performance Translators
+
+4.4.1 Read Ahead
+----------------
+
+     type performance/read-ahead
+
+   The read-ahead translator pre-fetches data in advance on every read.
+This benefits applications that mostly process files in sequential
+order, since the next block of data will already be available by the
+time the application is done with the current one.
+
+   Additionally, the read-ahead translator also behaves as a
+read-aggregator.  Many small read operations are combined and issued as
+fewer, larger read requests to the server.
+
+   Read-ahead deals in "pages" as the unit of data fetched. The page
+size is configurable, as is the "page count", which is the number of
+pages that are pre-fetched.
+
+   Read-ahead is best used with InfiniBand (using the ib-verbs
+transport).  On Fast Ethernet and Gigabit Ethernet networks, GlusterFS
+can achieve the link-maximum throughput even without read-ahead, making
+it quite superfluous.
+
+   Note that read-ahead only happens if the reads are perfectly
+sequential. If your application accesses data in a random fashion,
+using read-ahead might actually lead to a performance loss, since
+read-ahead will pointlessly fetch pages which won't be used by the
+application.
+
+   Options:
+`page-size <n> (256KB)'
+     The unit of data that is pre-fetched.
+
+`page-count <n> (2)'
+     The number of pages that are pre-fetched.
+
+`force-atime-update [on|off|yes|no] (off|no)'
+     Whether to force an access time (atime) update on the file on
+     every read. Without this, the atime will be slightly imprecise, as
+     it will reflect the time when the read-ahead translator read the
+     data, not when the application actually read it.
+
+
+File: user-guide.info,  Node: Write Behind,  Next: IO Threads,  Prev: Read Ahead,  Up: Performance Translators
+
+4.4.2 Write Behind
+------------------
+
+     type performance/write-behind
+
+   The write-behind translator improves the latency of a write
+operation.  It does this by relegating the write operation to the
+background and returning to the application even as the write is in
+progress. Using the write-behind translator, successive write requests
+can be pipelined.  This mode of write-behind operation is best used on
+the client side, to enable decreased write latency for the application.
+
+   The write-behind translator can also aggregate write requests. If the
+`aggregate-size' option is specified, then successive writes up to that
+size are accumulated and written in a single operation. This mode of
+operation is best used on the server side, as this will decrease the
+disk's head movement when multiple files are being written to in
+parallel.
+
+   The `aggregate-size' option has a default value of 128KB. Although
+this works well for most users, you should always experiment with
+different values to determine the one that will deliver maximum
+performance. This is because the performance of write-behind depends on
+your interconnect, the size of RAM, and the workload.
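+
+   As a minimal sketch (the subvolume name `client' is an assumption,
+standing in for the rest of your client-side translator tree),
+write-behind might be loaded as:
+
+       volume writebehind
+         type performance/write-behind
+         option aggregate-size 1MB
+         subvolumes client
+       end-volume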
+
+`aggregate-size <n> (128KB)'
+     Amount of data to accumulate before doing a write.
+
+`flush-behind [on|yes|off|no] (off|no)'
+
+
+File: user-guide.info,  Node: IO Threads,  Next: IO Cache,  Prev: Write Behind,  Up: Performance Translators
+
+4.4.3 IO Threads
+----------------
+
+     type performance/io-threads
+
+   The IO threads translator is intended to increase the responsiveness
+of the server to metadata operations by doing file I/O (read, write) in
+a background thread.  Since the GlusterFS server is single-threaded,
+using the IO threads translator can significantly improve performance.
+This translator is best used on the server side, loaded just below the
+server protocol translator.
+
+   IO threads operates by handing read and write requests over to a
+separate thread.  The total number of threads in existence at a time is
+constant, and configurable.
+
+`thread-count <n> (1)'
+     Number of threads to use.
+
+
+File: user-guide.info,  Node: IO Cache,  Next: Booster,  Prev: IO Threads,  Up: Performance Translators
+
+4.4.4 IO Cache
+--------------
+
+     type performance/io-cache
+
+   The IO cache translator caches data that has been read. This is
+useful if many applications read the same data multiple times, and if
+reads are much more frequent than writes (for example, IO caching may be
+useful in a web hosting environment, where most clients will simply
+read some files and only a few will write to them).
+
+   The IO cache translator reads data from its child in `page-size'
+chunks.  It caches data up to `cache-size' bytes. The cache is
+maintained as a prioritized least-recently-used (LRU) list, with
+priorities determined by user-specified patterns to match filenames.
+
+   When the IO cache translator detects a write operation, the cache
+for that file is flushed.
+
+   The IO cache translator periodically verifies the consistency of
+cached data, using the modification times on the files. The
+verification timeout is configurable.
+
+`page-size <n> (128KB)'
+     Size of a page.
+
+`cache-size <n> (32MB)'
+     Total amount of data to be cached.
+
+`force-revalidate-timeout <n> (1)'
+     Timeout to force a cache consistency verification, in seconds.
+
+`priority <pattern> (*:0)'
+     Filename patterns listed in order of priority.
+
+
+File: user-guide.info,  Node: Booster,  Prev: IO Cache,  Up: Performance Translators
+
+4.4.5 Booster
+-------------
+
+       type performance/booster
+
+   The booster translator gives applications a faster path to
+communicate read and write requests to GlusterFS. Normally, all
+requests to GlusterFS from applications go through FUSE, as indicated
+in *Note Filesystems in Userspace::.  Using the booster translator in
+conjunction with the GlusterFS booster shared library, an application
+can bypass the FUSE path and send read/write requests directly to the
+GlusterFS client process.
+
+   The booster mechanism consists of two parts: the booster translator,
+and the booster shared library. The booster translator is meant to be
+loaded on the client side, usually at the root of the translator tree.
+The booster shared library should be `LD_PRELOAD'ed with the
+application.
+
+   The booster translator, when loaded, opens a Unix domain socket and
+listens for read/write requests on it. The booster shared library
+intercepts read and write system calls and sends the requests to the
+GlusterFS process directly over the Unix domain socket, bypassing FUSE.
+This leads to superior performance.
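+
+   As a sketch, booster might sit at the root of the client-side
+specification like this (the subvolume name `iocache' is hypothetical,
+standing for the rest of the client translator tree):
+
+       volume booster
+         type performance/booster
+         subvolumes iocache
+       end-volume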
+
+   Once you've loaded the booster translator in your volume
+specification file, you can start your application as:
+
+       $ LD_PRELOAD=/usr/local/bin/glusterfs-booster.so your_app
+
+   The booster translator accepts no options.
+
+
+File: user-guide.info,  Node: Features Translators,  Next: Miscellaneous Translators,  Prev: Performance Translators,  Up: Translators
+
+4.5 Features Translators
+========================
+
+* Menu:
+
+* POSIX Locks::
+* Fixed ID::
+
+
+File: user-guide.info,  Node: POSIX Locks,  Next: Fixed ID,  Up: Features Translators
+
+4.5.1 POSIX Locks
+-----------------
+
+     type features/posix-locks
+
+   This translator provides storage-independent POSIX record locking
+support (`fcntl' locking). Typically you'll want to load this on the
+server side, just above the POSIX storage translator. Using this
+translator you can get both advisory locking and mandatory locking
+support.  It also handles `flock()' locks properly.
+
+   Caveat: Consider a file that does not have its mandatory locking bits
+(+setgid, -group execution) turned on. Assume that this file is now
+opened by a process on a client that has the write-behind xlator
+loaded. The write-behind xlator does not cache anything for files which
+have mandatory locking enabled, to avoid incoherence. Let's say that
+mandatory locking is now enabled on this file through another client.
+The former client will not know about this change, and write-behind may
+erroneously report a write as being successful when in fact it would
+fail due to the region it is writing to being locked.
+
+   There seems to be no easy way to fix this. To work around this
+problem, it is recommended that you never enable the mandatory bits on
+a file while it is open.
+
+`mandatory [on|off] (on)'
+     Turns mandatory locking on.
+
+
+File: user-guide.info,  Node: Fixed ID,  Prev: POSIX Locks,  Up: Features Translators
+
+4.5.2 Fixed ID
+--------------
+
+     type features/fixed-id
+
+   The fixed ID translator makes all filesystem requests from the client
+appear to come from a fixed, specified UID/GID, regardless of which
+user actually initiated the request.
+
+`fixed-uid <n> [if not set, not used]'
+     The UID to send to the server.
+
+`fixed-gid <n> [if not set, not used]'
+     The GID to send to the server.
+
+
+File: user-guide.info,  Node: Miscellaneous Translators,  Prev: Features Translators,  Up: Translators
+
+4.6 Miscellaneous Translators
+=============================
+
+* Menu:
+
+* ROT-13::
+* Trace::
+
+
+File: user-guide.info,  Node: ROT-13,  Next: Trace,  Up: Miscellaneous Translators
+
+4.6.1 ROT-13
+------------
+
+     type encryption/rot-13
+
+   ROT-13 is a toy translator that can "encrypt" and "decrypt" file
+contents using the ROT-13 algorithm. ROT-13 is a trivial algorithm that
+rotates each letter by thirteen places. Thus, 'A' becomes 'N', 'B'
+becomes 'O', and 'Z' becomes 'M'.
+
+   It goes without saying that you shouldn't use this translator if you
+need _real_ encryption (a future release of GlusterFS will have real
+encryption translators).
+
+`encrypt-write [on|off] (on)'
+     Whether to encrypt on write.
+
+`decrypt-read [on|off] (on)'
+     Whether to decrypt on read.
+
+
+File: user-guide.info,  Node: Trace,  Prev: ROT-13,  Up: Miscellaneous Translators
+
+4.6.2 Trace
+-----------
+
+     type debug/trace
+
+   The trace translator is intended for debugging purposes.
+When loaded, it logs all the system calls received by the server or
+client (wherever trace is loaded), their arguments, and the results.
+You must use a GlusterFS log level of DEBUG (See *Note Running
+GlusterFS::) for trace to work.
+
+   Sample trace output (lines have been wrapped for readability):
+     2007-10-30 00:08:58 D [trace.c:1579:trace_opendir] trace: callid: 68
+     (*this=0x8059e40, loc=0x8091984 {path=/iozone3_283, inode=0x8091f00},
+      fd=0x8091d50)
+
+     2007-10-30 00:08:58 D [trace.c:630:trace_opendir_cbk] trace:
+     (*this=0x8059e40, op_ret=4, op_errno=1, fd=0x8091d50)
+
+     2007-10-30 00:08:58 D [trace.c:1602:trace_readdir] trace: callid: 69
+     (*this=0x8059e40, size=4096, offset=0 fd=0x8091d50)
+
+     2007-10-30 00:08:58 D [trace.c:215:trace_readdir_cbk] trace:
+     (*this=0x8059e40, op_ret=0, op_errno=0, count=4)
+
+     2007-10-30 00:08:58 D [trace.c:1624:trace_closedir] trace: callid: 71
+     (*this=0x8059e40, *fd=0x8091d50)
+
+     2007-10-30 00:08:58 D [trace.c:809:trace_closedir_cbk] trace:
+     (*this=0x8059e40, op_ret=0, op_errno=1)
+
+
+File: user-guide.info,  Node: Usage Scenarios,  Next: Troubleshooting,  Prev: Translators,  Up: Top
+
+5 Usage Scenarios
+*****************
+
+5.1 Advanced Striping
+=====================
+
+This section is based on the Advanced Striping tutorial written by
+Anand Avati on the GlusterFS wiki (1).
+
+5.1.1 Mixed Storage Requirements
+--------------------------------
+
+There are two ways of scheduling the I/O: one at the file level (using
+the unify translator) and the other at the block level (using the
+stripe translator). Striped I/O is good for files that are potentially
+large and require high parallel throughput (for example, a single file
+of 400GB being accessed by hundreds or thousands of systems
+simultaneously and randomly). In most cases, file-level scheduling
+works best.
+
+   In the real world, it is desirable to mix file-level and block-level
+scheduling on a single storage volume. Alternatively, users can choose
+to have two separate volumes and hence two mount points, but the
+applications may demand a single storage system to host both.
+
+   This document explains how to mix file-level scheduling with stripe.
+
+5.1.2 Configuration Brief
+-------------------------
+
+This setup demonstrates how users can configure the unify translator
+with an appropriate I/O scheduler for file-level scheduling, and stripe
+only the matching file patterns. This way, GlusterFS chooses the
+appropriate I/O profile and knows how to handle both types of data
+efficiently.
+
+   A simple technique to achieve this effect is to create a stripe set
+of unify and stripe blocks, where unify is the first sub-volume. Files
+that do not match the stripe policy are passed on to the first (unify)
+sub-volume, and are in turn scheduled across the cluster using its
+file-level I/O scheduler.
+
+5.1.3 Preparing the GlusterFS Environment
+-----------------------------------------
+
+Create the directories /export/for-namespace, /export/for-unify and
+/export/for-stripe on all the storage bricks.
+
+   Place the following server and client volume spec files under
+/etc/glusterfs (or the appropriate installed path) and replace the IP
+addresses / access control fields to match your environment.
+
+       ## file: /etc/glusterfs/glusterfsd.vol
+        volume posix-unify
+                type storage/posix
+                option directory /export/for-unify
+        end-volume
+
+        volume posix-stripe
+                type storage/posix
+                option directory /export/for-stripe
+        end-volume
+
+        volume posix-namespace
+                type storage/posix
+                option directory /export/for-namespace
+        end-volume
+
+        volume server
+                type protocol/server
+                option transport-type tcp
+                option auth.addr.posix-unify.allow 192.168.1.*
+                option auth.addr.posix-stripe.allow 192.168.1.*
+                option auth.addr.posix-namespace.allow 192.168.1.*
+                subvolumes posix-unify posix-stripe posix-namespace
+        end-volume
+
+      ## file: /etc/glusterfs/glusterfs.vol
+        volume client-namespace
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.1
+          option remote-subvolume posix-namespace
+        end-volume
+
+        volume client-unify-1
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.1
+          option remote-subvolume posix-unify
+        end-volume
+
+        volume client-unify-2
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.2
+          option remote-subvolume posix-unify
+        end-volume
+
+        volume client-unify-3
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.3
+          option remote-subvolume posix-unify
+        end-volume
+
+        volume client-unify-4
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.4
+          option remote-subvolume posix-unify
+        end-volume
+
+        volume client-stripe-1
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.1
+          option remote-subvolume posix-stripe
+        end-volume
+
+        volume client-stripe-2
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.2
+          option remote-subvolume posix-stripe
+        end-volume
+
+        volume client-stripe-3
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.3
+          option remote-subvolume posix-stripe
+        end-volume
+
+        volume client-stripe-4
+          type protocol/client
+          option transport-type tcp
+          option remote-host 192.168.1.4
+          option remote-subvolume posix-stripe
+        end-volume
+
+        volume unify
+          type cluster/unify
+          option scheduler rr
+          option namespace client-namespace
+          subvolumes client-unify-1 client-unify-2 client-unify-3 client-unify-4
+        end-volume
+
+        volume stripe
+          type cluster/stripe
+          option block-size *.img:2MB # All files ending with .img are striped with a 2MB stripe block size.
+          subvolumes unify client-stripe-1 client-stripe-2 client-stripe-3 client-stripe-4
+        end-volume
+
+   Bring up the storage.
+
+   Starting the GlusterFS server: if you have installed from a binary
+package, you can start the service through the init.d startup script.
+If not:
+
+     [root@server]# glusterfsd
+
+   Mounting GlusterFS volumes:
+
+     [root@client]# glusterfs -s [BRICK-IP-ADDRESS] /mnt/cluster
+
+   Improving upon this setup:
+
+   The InfiniBand verbs RDMA transport is much faster than the TCP/IP
+GigE transport.
+
+   Use of performance translators such as read-ahead, write-behind,
+io-cache, io-threads and booster is recommended.
+
+   Replace the round-robin (rr) scheduler with ALU to handle more
+dynamic storage environments.
+
+   ---------- Footnotes ----------
+
+   (1)
+http://gluster.org/docs/index.php/Mixing_Striped_and_Regular_Files
+
+
+File: user-guide.info,  Node: Troubleshooting,  Next: GNU Free Documentation Licence,  Prev: Usage Scenarios,  Up: Top
+
+6 Troubleshooting
+*****************
+
+This chapter is a general troubleshooting guide to GlusterFS. It lists
+common GlusterFS server and client error messages and debugging hints,
+and concludes with the suggested procedure for reporting bugs in
+GlusterFS.
+
+6.1 GlusterFS error messages
+============================
+
+6.1.1 Server errors
+-------------------
+
+     glusterfsd: FATAL: could not open specfile:
+     '/etc/glusterfs/glusterfsd.vol'
+
+   The GlusterFS server expects the volume specification file to be at
+`/etc/glusterfs/glusterfsd.vol'. The example specification file will be
+installed as `/etc/glusterfs/glusterfsd.vol.sample'. You need to edit
+it and rename it, or provide a different specification file using the
+`--spec-file' command line option (See *Note Server::).
+
+     gf_log_init: failed to open logfile "/usr/var/log/glusterfs/glusterfsd.log"
+                  (Permission denied)
+
+   You don't have permission to create files in the
+`/usr/var/log/glusterfs' directory. Make sure you are running GlusterFS
+as root. Alternatively, specify a different path for the log file using
+the `--log-file' option (See *Note Server::).
+
+6.1.2 Client errors
+-------------------
+
+     fusermount: failed to access mountpoint /mnt:
+                 Transport endpoint is not connected
+
+   A previous failed (or hung) mount of GlusterFS is preventing it from
+being mounted again in the same location. The fix is to do:
+
+     # umount /mnt
+
+   and try mounting again.
+
+   *"Transport endpoint is not connected".*
+
+   If you get this error when you try a command such as `ls' or `cat',
+it means the GlusterFS mount did not succeed. Try running GlusterFS in
+`DEBUG' logging level and study the log messages to discover the cause.
+
+   *"Connect to server failed", "SERVER-ADDRESS: Connection refused".*
+
+   The GlusterFS server is not running or has died. Check your network
+connections and firewall settings. To check if the server is reachable,
+try:
+
+     telnet IP-ADDRESS 6996
+
+   If the server is accessible, your `telnet' command should connect and
+block. If not, you will see an error message such as `telnet: Unable to
+connect to remote host: Connection refused'. 6996 is the default
+GlusterFS port. If you have changed it, then use the corresponding port
+instead.
+
+     gf_log_init: failed to open logfile "/usr/var/log/glusterfs/glusterfs.log"
+                  (Permission denied)
+
+   You don't have permission to create files in the
+`/usr/var/log/glusterfs' directory. Make sure you are running GlusterFS
+as root. Alternatively, specify a different path for the log file using
+the `--log-file' option (See *Note Client::).
+
+6.2 FUSE error messages
+=======================
+
+`modprobe fuse' fails with: "Unknown symbol in module, or unknown
+parameter".
+
+   If you are using fuse-2.6.x on Red Hat Enterprise Linux Workstation 4
+or Advanced Server 4 with the 2.6.9-42.ELlargesmp, 2.6.9-42.ELsmp or
+2.6.9-42.EL kernels and get this error while loading the FUSE kernel
+module, you need to apply the following patch.
+
+   For fuse-2.6.2:
+
+<http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.6.2-rhel-build.patch>
+
+   For fuse-2.6.3:
+
+<http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.6.3-rhel-build.patch>
+
+6.3 AppArmour and GlusterFS
+===========================
+
+Under OpenSuSE GNU/Linux, the AppArmour security feature does not allow
+GlusterFS to create temporary files or network socket connections even
+while running as root. You will see error messages like `Unable to open
+log file: Operation not permitted' or `Connection refused'. Disabling
+AppArmour using YaST, or properly configuring AppArmour to recognize
+`glusterfsd' or `glusterfs'/`fusermount', should solve the problem.
+
+6.4 Reporting a bug
+===================
+
+If you encounter a bug in GlusterFS, please follow the guidelines below
+when you report it to the mailing list. Be sure to report it! User
+feedback is crucial to the health of the project and we value it highly.
+
+6.4.1 General instructions
+--------------------------
+
+When running GlusterFS in a non-production environment, be sure to
+build it with the following command:
+
+      $ make CFLAGS='-g -O0 -DDEBUG'
+
+   This includes debugging information, which will be helpful in getting
+backtraces (see below), and also disables optimization. Enabling
+optimization can result in incorrect line numbers being reported to gdb.
+
+6.4.2 Volume specification files
+--------------------------------
+
+Attach all relevant server and client spec files you were using when
+you encountered the bug. Also tell us the details of your setup, i.e.,
+how many clients and how many servers.
+
+6.4.3 Log files
+---------------
+
+Set the loglevel of your client and server programs to DEBUG (by
+passing the -L DEBUG option) and attach the log files with your bug
+report. Obviously, if only the client is failing (for example), you
+only need to send us the client log file.
+
+6.4.4 Backtrace
+---------------
+
+If GlusterFS has encountered a segmentation fault or has crashed for
+some other reason, include the backtrace with the bug report. You can
+get the backtrace using the following procedure.
+
+   Run the GlusterFS client or server inside gdb.
+
+      $ gdb ./glusterfs
+      (gdb) set args -f client.spec -N -l/path/to/log/file -LDEBUG /mnt/point
+      (gdb) run
+
+   Now when the process segfaults, you can get the backtrace by typing:
+
+      (gdb) bt
+
+   If the GlusterFS process has crashed and dumped a core file (you can
+find this in / if running as a daemon and in the current directory
+otherwise), you can do:
+
+      $ gdb /path/to/glusterfs /path/to/core.<pid>
+
+   and then get the backtrace.
+
+   If the GlusterFS server or client seems to be hung, then you can get
+the backtrace by attaching gdb to the process. First get the `PID' of
+the process (using ps), and then do:
+
+      $ gdb ./glusterfs <pid>
+
+   Press Ctrl-C to interrupt the process and then generate the
+backtrace.
+
+6.4.5 Reproducing the bug
+-------------------------
+
+If the bug is reproducible, please include the steps necessary to do
+so. If the bug is not reproducible, send us the bug report anyway.
+
+6.4.6 Other information
+-----------------------
+
+If you think it is relevant, also send us the version of FUSE you are
+using, your kernel version, and your platform.
+
+
+File: user-guide.info,  Node: GNU Free Documentation Licence,  Next: Index,  Prev: Troubleshooting,  Up: Top
+
+Appendix A GNU Free Documentation Licence
+*****************************************
+
+                      Version 1.2, November 2002
+
+     Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
+     59 Temple Place, Suite 330, Boston, MA  02111-1307, USA
+
+     Everyone is permitted to copy and distribute verbatim copies
+     of this license document, but changing it is not allowed.
+
+  0. PREAMBLE
+
+     The purpose of this License is to make a manual, textbook, or other
+     functional and useful document "free" in the sense of freedom: to
+     assure everyone the effective freedom to copy and redistribute it,
+     with or without modifying it, either commercially or
+     noncommercially.  Secondarily, this License preserves for the
+     author and publisher a way to get credit for their work, while not
+     being considered responsible for modifications made by others.
+
+     This License is a kind of "copyleft", which means that derivative
+     works of the document must themselves be free in the same sense.
+     It complements the GNU General Public License, which is a copyleft
+     license designed for free software.
+
+     We have designed this License in order to use it for manuals for
+     free software, because free software needs free documentation: a
+     free program should come with manuals providing the same freedoms
+     that the software does.  But this License is not limited to
+     software manuals; it can be used for any textual work, regardless
+     of subject matter or whether it is published as a printed book.
+     We recommend this License principally for works whose purpose is
+     instruction or reference.
+
+  1. APPLICABILITY AND DEFINITIONS
+
+     This License applies to any manual or other work, in any medium,
+     that contains a notice placed by the copyright holder saying it
+     can be distributed under the terms of this License.  Such a notice
+     grants a world-wide, royalty-free license, unlimited in duration,
+     to use that work under the conditions stated herein.  The
+     "Document", below, refers to any such manual or work.  Any member
+     of the public is a licensee, and is addressed as "you".  You
+     accept the license if you copy, modify or distribute the work in a
+     way requiring permission under copyright law.
+
+     A "Modified Version" of the Document means any work containing the
+     Document or a portion of it, either copied verbatim, or with
+     modifications and/or translated into another language.
+
+     A "Secondary Section" is a named appendix or a front-matter section
+     of the Document that deals exclusively with the relationship of the
+     publishers or authors of the Document to the Document's overall
+     subject (or to related matters) and contains nothing that could
+     fall directly within that overall subject.  (Thus, if the Document
+     is in part a textbook of mathematics, a Secondary Section may not
+     explain any mathematics.)  The relationship could be a matter of
+     historical connection with the subject or with related matters, or
+     of legal, commercial, philosophical, ethical or political position
+     regarding them.
+ +     The "Invariant Sections" are certain Secondary Sections whose +     titles are designated, as being those of Invariant Sections, in +     the notice that says that the Document is released under this +     License.  If a section does not fit the above definition of +     Secondary then it is not allowed to be designated as Invariant. +     The Document may contain zero Invariant Sections.  If the Document +     does not identify any Invariant Sections then there are none. + +     The "Cover Texts" are certain short passages of text that are +     listed, as Front-Cover Texts or Back-Cover Texts, in the notice +     that says that the Document is released under this License.  A +     Front-Cover Text may be at most 5 words, and a Back-Cover Text may +     be at most 25 words. + +     A "Transparent" copy of the Document means a machine-readable copy, +     represented in a format whose specification is available to the +     general public, that is suitable for revising the document +     straightforwardly with generic text editors or (for images +     composed of pixels) generic paint programs or (for drawings) some +     widely available drawing editor, and that is suitable for input to +     text formatters or for automatic translation to a variety of +     formats suitable for input to text formatters.  A copy made in an +     otherwise Transparent file format whose markup, or absence of +     markup, has been arranged to thwart or discourage subsequent +     modification by readers is not Transparent.  An image format is +     not Transparent if used for any substantial amount of text.  A +     copy that is not "Transparent" is called "Opaque". + +     Examples of suitable formats for Transparent copies include plain +     ASCII without markup, Texinfo input format, LaTeX input format, +     SGML or XML using a publicly available DTD, and +     standard-conforming simple HTML, PostScript or PDF designed for +     human modification.  Examples of transparent image formats include +     PNG, XCF and JPG.  Opaque formats include proprietary formats that +     can be read and edited only by proprietary word processors, SGML or +     XML for which the DTD and/or processing tools are not generally +     available, and the machine-generated HTML, PostScript or PDF +     produced by some word processors for output purposes only. + +     The "Title Page" means, for a printed book, the title page itself, +     plus such following pages as are needed to hold, legibly, the +     material this License requires to appear in the title page.  For +     works in formats which do not have any title page as such, "Title +     Page" means the text near the most prominent appearance of the +     work's title, preceding the beginning of the body of the text. + +     A section "Entitled XYZ" means a named subunit of the Document +     whose title either is precisely XYZ or contains XYZ in parentheses +     following text that translates XYZ in another language.  (Here XYZ +     stands for a specific section name mentioned below, such as +     "Acknowledgements", "Dedications", "Endorsements", or "History".) +     To "Preserve the Title" of such a section when you modify the +     Document means that it remains a section "Entitled XYZ" according +     to this definition. + +     The Document may include Warranty Disclaimers next to the notice +     which states that this License applies to the Document.  
These +     Warranty Disclaimers are considered to be included by reference in +     this License, but only as regards disclaiming warranties: any other +     implication that these Warranty Disclaimers may have is void and +     has no effect on the meaning of this License. + +  2. VERBATIM COPYING + +     You may copy and distribute the Document in any medium, either +     commercially or noncommercially, provided that this License, the +     copyright notices, and the license notice saying this License +     applies to the Document are reproduced in all copies, and that you +     add no other conditions whatsoever to those of this License.  You +     may not use technical measures to obstruct or control the reading +     or further copying of the copies you make or distribute.  However, +     you may accept compensation in exchange for copies.  If you +     distribute a large enough number of copies you must also follow +     the conditions in section 3. + +     You may also lend copies, under the same conditions stated above, +     and you may publicly display copies. + +  3. COPYING IN QUANTITY + +     If you publish printed copies (or copies in media that commonly +     have printed covers) of the Document, numbering more than 100, and +     the Document's license notice requires Cover Texts, you must +     enclose the copies in covers that carry, clearly and legibly, all +     these Cover Texts: Front-Cover Texts on the front cover, and +     Back-Cover Texts on the back cover.  Both covers must also clearly +     and legibly identify you as the publisher of these copies.  The +     front cover must present the full title with all words of the +     title equally prominent and visible.  You may add other material +     on the covers in addition.  Copying with changes limited to the +     covers, as long as they preserve the title of the Document and +     satisfy these conditions, can be treated as verbatim copying in +     other respects. + +     If the required texts for either cover are too voluminous to fit +     legibly, you should put the first ones listed (as many as fit +     reasonably) on the actual cover, and continue the rest onto +     adjacent pages. + +     If you publish or distribute Opaque copies of the Document +     numbering more than 100, you must either include a +     machine-readable Transparent copy along with each Opaque copy, or +     state in or with each Opaque copy a computer-network location from +     which the general network-using public has access to download +     using public-standard network protocols a complete Transparent +     copy of the Document, free of added material.  If you use the +     latter option, you must take reasonably prudent steps, when you +     begin distribution of Opaque copies in quantity, to ensure that +     this Transparent copy will remain thus accessible at the stated +     location until at least one year after the last time you +     distribute an Opaque copy (directly or through your agents or +     retailers) of that edition to the public. + +     It is requested, but not required, that you contact the authors of +     the Document well before redistributing any large number of +     copies, to give them a chance to provide you with an updated +     version of the Document. + +  4. 
MODIFICATIONS + +     You may copy and distribute a Modified Version of the Document +     under the conditions of sections 2 and 3 above, provided that you +     release the Modified Version under precisely this License, with +     the Modified Version filling the role of the Document, thus +     licensing distribution and modification of the Modified Version to +     whoever possesses a copy of it.  In addition, you must do these +     things in the Modified Version: + +       A. Use in the Title Page (and on the covers, if any) a title +          distinct from that of the Document, and from those of +          previous versions (which should, if there were any, be listed +          in the History section of the Document).  You may use the +          same title as a previous version if the original publisher of +          that version gives permission. + +       B. List on the Title Page, as authors, one or more persons or +          entities responsible for authorship of the modifications in +          the Modified Version, together with at least five of the +          principal authors of the Document (all of its principal +          authors, if it has fewer than five), unless they release you +          from this requirement. + +       C. State on the Title page the name of the publisher of the +          Modified Version, as the publisher. + +       D. Preserve all the copyright notices of the Document. + +       E. Add an appropriate copyright notice for your modifications +          adjacent to the other copyright notices. + +       F. Include, immediately after the copyright notices, a license +          notice giving the public permission to use the Modified +          Version under the terms of this License, in the form shown in +          the Addendum below. + +       G. Preserve in that license notice the full lists of Invariant +          Sections and required Cover Texts given in the Document's +          license notice. + +       H. Include an unaltered copy of this License. + +       I. Preserve the section Entitled "History", Preserve its Title, +          and add to it an item stating at least the title, year, new +          authors, and publisher of the Modified Version as given on +          the Title Page.  If there is no section Entitled "History" in +          the Document, create one stating the title, year, authors, +          and publisher of the Document as given on its Title Page, +          then add an item describing the Modified Version as stated in +          the previous sentence. + +       J. Preserve the network location, if any, given in the Document +          for public access to a Transparent copy of the Document, and +          likewise the network locations given in the Document for +          previous versions it was based on.  These may be placed in +          the "History" section.  You may omit a network location for a +          work that was published at least four years before the +          Document itself, or if the original publisher of the version +          it refers to gives permission. + +       K. For any section Entitled "Acknowledgements" or "Dedications", +          Preserve the Title of the section, and preserve in the +          section all the substance and tone of each of the contributor +          acknowledgements and/or dedications given therein. + +       L. Preserve all the Invariant Sections of the Document, +          unaltered in their text and in their titles.  
Section numbers +          or the equivalent are not considered part of the section +          titles. + +       M. Delete any section Entitled "Endorsements".  Such a section +          may not be included in the Modified Version. + +       N. Do not retitle any existing section to be Entitled +          "Endorsements" or to conflict in title with any Invariant +          Section. + +       O. Preserve any Warranty Disclaimers. + +     If the Modified Version includes new front-matter sections or +     appendices that qualify as Secondary Sections and contain no +     material copied from the Document, you may at your option +     designate some or all of these sections as invariant.  To do this, +     add their titles to the list of Invariant Sections in the Modified +     Version's license notice.  These titles must be distinct from any +     other section titles. + +     You may add a section Entitled "Endorsements", provided it contains +     nothing but endorsements of your Modified Version by various +     parties--for example, statements of peer review or that the text +     has been approved by an organization as the authoritative +     definition of a standard. + +     You may add a passage of up to five words as a Front-Cover Text, +     and a passage of up to 25 words as a Back-Cover Text, to the end +     of the list of Cover Texts in the Modified Version.  Only one +     passage of Front-Cover Text and one of Back-Cover Text may be +     added by (or through arrangements made by) any one entity.  If the +     Document already includes a cover text for the same cover, +     previously added by you or by arrangement made by the same entity +     you are acting on behalf of, you may not add another; but you may +     replace the old one, on explicit permission from the previous +     publisher that added the old one. + +     The author(s) and publisher(s) of the Document do not by this +     License give permission to use their names for publicity for or to +     assert or imply endorsement of any Modified Version. + +  5. COMBINING DOCUMENTS + +     You may combine the Document with other documents released under +     this License, under the terms defined in section 4 above for +     modified versions, provided that you include in the combination +     all of the Invariant Sections of all of the original documents, +     unmodified, and list them all as Invariant Sections of your +     combined work in its license notice, and that you preserve all +     their Warranty Disclaimers. + +     The combined work need only contain one copy of this License, and +     multiple identical Invariant Sections may be replaced with a single +     copy.  If there are multiple Invariant Sections with the same name +     but different contents, make the title of each such section unique +     by adding at the end of it, in parentheses, the name of the +     original author or publisher of that section if known, or else a +     unique number.  Make the same adjustment to the section titles in +     the list of Invariant Sections in the license notice of the +     combined work. + +     In the combination, you must combine any sections Entitled +     "History" in the various original documents, forming one section +     Entitled "History"; likewise combine any sections Entitled +     "Acknowledgements", and any sections Entitled "Dedications".  You +     must delete all sections Entitled "Endorsements." + +  6. 
COLLECTIONS OF DOCUMENTS + +     You may make a collection consisting of the Document and other +     documents released under this License, and replace the individual +     copies of this License in the various documents with a single copy +     that is included in the collection, provided that you follow the +     rules of this License for verbatim copying of each of the +     documents in all other respects. + +     You may extract a single document from such a collection, and +     distribute it individually under this License, provided you insert +     a copy of this License into the extracted document, and follow +     this License in all other respects regarding verbatim copying of +     that document. + +  7. AGGREGATION WITH INDEPENDENT WORKS + +     A compilation of the Document or its derivatives with other +     separate and independent documents or works, in or on a volume of +     a storage or distribution medium, is called an "aggregate" if the +     copyright resulting from the compilation is not used to limit the +     legal rights of the compilation's users beyond what the individual +     works permit.  When the Document is included in an aggregate, this +     License does not apply to the other works in the aggregate which +     are not themselves derivative works of the Document. + +     If the Cover Text requirement of section 3 is applicable to these +     copies of the Document, then if the Document is less than one half +     of the entire aggregate, the Document's Cover Texts may be placed +     on covers that bracket the Document within the aggregate, or the +     electronic equivalent of covers if the Document is in electronic +     form.  Otherwise they must appear on printed covers that bracket +     the whole aggregate. + +  8. TRANSLATION + +     Translation is considered a kind of modification, so you may +     distribute translations of the Document under the terms of section +     4.  Replacing Invariant Sections with translations requires special +     permission from their copyright holders, but you may include +     translations of some or all Invariant Sections in addition to the +     original versions of these Invariant Sections.  You may include a +     translation of this License, and all the license notices in the +     Document, and any Warranty Disclaimers, provided that you also +     include the original English version of this License and the +     original versions of those notices and disclaimers.  In case of a +     disagreement between the translation and the original version of +     this License or a notice or disclaimer, the original version will +     prevail. + +     If a section in the Document is Entitled "Acknowledgements", +     "Dedications", or "History", the requirement (section 4) to +     Preserve its Title (section 1) will typically require changing the +     actual title. + +  9. TERMINATION + +     You may not copy, modify, sublicense, or distribute the Document +     except as expressly provided for under this License.  Any other +     attempt to copy, modify, sublicense or distribute the Document is +     void, and will automatically terminate your rights under this +     License.  However, parties who have received copies, or rights, +     from you under this License will not have their licenses +     terminated so long as such parties remain in full compliance. + + 10. 
FUTURE REVISIONS OF THIS LICENSE + +     The Free Software Foundation may publish new, revised versions of +     the GNU Free Documentation License from time to time.  Such new +     versions will be similar in spirit to the present version, but may +     differ in detail to address new problems or concerns.  See +     `http://www.gnu.org/copyleft/'. + +     Each version of the License is given a distinguishing version +     number.  If the Document specifies that a particular numbered +     version of this License "or any later version" applies to it, you +     have the option of following the terms and conditions either of +     that specified version or of any later version that has been +     published (not as a draft) by the Free Software Foundation.  If +     the Document does not specify a version number of this License, +     you may choose any version ever published (not as a draft) by the +     Free Software Foundation. + +A.0.1 ADDENDUM: How to use this License for your documents +---------------------------------------------------------- + +To use this License in a document you have written, include a copy of +the License in the document and put the following copyright and license +notices just after the title page: + +       Copyright (C)  YEAR  YOUR NAME. +       Permission is granted to copy, distribute and/or modify this document +       under the terms of the GNU Free Documentation License, Version 1.2 +       or any later version published by the Free Software Foundation; +       with no Invariant Sections, no Front-Cover Texts, and no Back-Cover +       Texts.  A copy of the license is included in the section entitled ``GNU +       Free Documentation License''. + +   If you have Invariant Sections, Front-Cover Texts and Back-Cover +Texts, replace the "with...Texts." line with this: + +         with the Invariant Sections being LIST THEIR TITLES, with +         the Front-Cover Texts being LIST, and with the Back-Cover Texts +         being LIST. + +   If you have Invariant Sections without Cover Texts, or some other +combination of the three, merge those two alternatives to suit the +situation. + +   If your document contains nontrivial examples of program code, we +recommend releasing these examples in parallel under your choice of +free software license, such as the GNU General Public License, to +permit their use in free software. + + +File: user-guide.info,  Node: Index,  Prev: GNU Free Documentation Licence,  Up: Top + +Index +***** + + +* Menu: + +* alu (scheduler):                       Unify.               (line  49) +* AppArmour:                             Troubleshooting.     (line  96) +* arch:                                  Getting GlusterFS.   (line   6) +* booster:                               Booster.             (line   6) +* commercial support:                    Introduction.        (line  36) +* DNS round robin:                       Transport modules.   (line  29) +* fcntl:                                 POSIX Locks.         (line   6) +* FDL, GNU Free Documentation License:   GNU Free Documentation Licence. +                                                              (line   6) +* fixed-id (translator):                 Fixed ID.            (line   6) +* GlusterFS client:                      Client.              (line   6) +* GlusterFS mailing list:                Introduction.        (line  28) +* GlusterFS server:                      Server.              (line   6) +* infiniband transport:                  Transport modules.   
(line  58) +* InfiniBand, installation:              Pre requisites.      (line  51) +* io-cache (translator):                 IO Cache.            (line   6) +* io-threads (translator):               IO Threads.          (line   6) +* IRC channel, #gluster:                 Introduction.        (line  31) +* libibverbs:                            Pre requisites.      (line  51) +* namespace:                             Unify.               (line 207) +* nufa (scheduler):                      Unify.               (line 175) +* OpenSuSE:                              Troubleshooting.     (line  96) +* posix-locks (translator):              POSIX Locks.         (line   6) +* random (scheduler):                    Unify.               (line 159) +* read-ahead (translator):               Read Ahead.          (line   6) +* record locking:                        POSIX Locks.         (line   6) +* Redhat Enterprise Linux:               Troubleshooting.     (line  78) +* Replicate:                             Replicate.           (line   6) +* rot-13 (translator):                   ROT-13.              (line   6) +* rr (scheduler):                        Unify.               (line 138) +* scheduler (unify):                     Unify.               (line   6) +* self heal (replicate):                 Replicate.           (line  46) +* self heal (unify):                     Unify.               (line 223) +* stripe (translator):                   Stripe.              (line   6) +* trace (translator):                    Trace.               (line   6) +* unify (translator):                    Unify.               (line   6) +* unify invariants:                      Unify.               (line  16) +* write-behind (translator):             Write Behind.        (line   6) +* Z Research, Inc.:                      Introduction.        
(line  36) + + + +Tag Table: +Node: Top703 +Node: Acknowledgements2303 +Node: Introduction3213 +Node: Installation and Invocation4648 +Node: Pre requisites4932 +Node: Getting GlusterFS7022 +Ref: Getting GlusterFS-Footnote-17808 +Node: Building7856 +Node: Running GlusterFS9558 +Node: Server9769 +Node: Client11357 +Node: A Tutorial Introduction13563 +Node: Concepts17100 +Node: Filesystems in Userspace17315 +Node: Translator18456 +Node: Volume specification file21159 +Node: Translators23631 +Node: Storage Translators24200 +Ref: Storage Translators-Footnote-125007 +Node: POSIX25141 +Node: BDB25764 +Node: Client and Server Translators26821 +Node: Transport modules27297 +Node: Client protocol31444 +Node: Server protocol32383 +Node: Clustering Translators33372 +Node: Unify34259 +Ref: Unify-Footnote-143858 +Node: Replicate43950 +Node: Stripe49005 +Node: Performance Translators50163 +Node: Read Ahead50437 +Node: Write Behind52169 +Node: IO Threads53578 +Node: IO Cache54366 +Node: Booster55690 +Node: Features Translators57104 +Node: POSIX Locks57332 +Node: Fixed ID58649 +Node: Miscellaneous Translators59135 +Node: ROT-1359333 +Node: Trace60012 +Node: Usage Scenarios61281 +Ref: Usage Scenarios-Footnote-167214 +Node: Troubleshooting67289 +Node: GNU Free Documentation Licence73637 +Node: Index96086 + +End Tag Table diff --git a/doc/user-guide/user-guide.pdf b/doc/user-guide/user-guide.pdf Binary files differnew file mode 100644 index 00000000000..ed7bd2a9907 --- /dev/null +++ b/doc/user-guide/user-guide.pdf diff --git a/doc/user-guide/user-guide.texi b/doc/user-guide/user-guide.texi new file mode 100644 index 00000000000..8365419a611 --- /dev/null +++ b/doc/user-guide/user-guide.texi @@ -0,0 +1,2226 @@ +\input texinfo +@setfilename user-guide.info +@settitle GlusterFS 2.0 User Guide +@afourpaper + +@direntry +* GlusterFS: (user-guide). GlusterFS distributed filesystem user guide +@end direntry + +@copying +This is the user manual for GlusterFS 2.0. + +Copyright @copyright{} 2008,2007 @email{@b{Z}} Research, Inc. Permission is granted to +copy, distribute and/or modify this document under the terms of the +@acronym{GNU} Free Documentation License, Version 1.2 or any later +version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the +license is included in the chapter entitled ``@acronym{GNU} Free +Documentation License''. 
+@end copying + +@titlepage +@title GlusterFS 2.0 User Guide [DRAFT] +@subtitle January 15, 2008 +@author http://gluster.org/core-team.php +@author @email{@b{Z}} @b{Research} + +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@c Info stuff +@ifnottex +@node Top +@top GlusterFS 2.0 User Guide + +@insertcopying +@menu +* Acknowledgements::             +* Introduction::                 +* Installation and Invocation::   +* Concepts::                     +* Translators::                  +* Usage Scenarios::              +* Troubleshooting::              +* GNU Free Documentation Licence::   +* Index::                        + +@detailmenu + --- The Detailed Node Listing --- + +Installation and Invocation + +* Pre requisites::               +* Getting GlusterFS::            +* Building::                     +* Running GlusterFS::            +* A Tutorial Introduction::      + +Running GlusterFS + +* Server::                       +* Client::                       + +Concepts + +* Filesystems in Userspace::                 +* Translator::                   +* Volume specification file::    + +Translators + +* Storage Translators::          +* Client and Server Translators::   +* Clustering Translators::       +* Performance Translators::      +* Features Translators::         + +Storage Translators + +* POSIX::         + +Client and Server Translators + +* Transport modules::            +* Client protocol::              +* Server protocol::              + +Clustering Translators + +* Unify::                        +* Replicate::   +* Stripe::                       + +Performance Translators + +* Read Ahead::                   +* Write Behind::                 +* IO Threads::                   +* IO Cache::                     + +Features Translators  + +* POSIX Locks::                  +* Fixed ID::                     + +Miscellaneous Translators + +* ROT-13::                       +* Trace::                        + +@end detailmenu +@end menu + +@end ifnottex +@c Info stuff end + +@contents + +@node Acknowledgements +@unnumbered Acknowledgements +GlusterFS continues to be a wonderful and enriching experience for all +of us involved.  + +GlusterFS development would not have been possible at this pace if +not for our enthusiastic users. People from around the world have +helped us with bug reports, performance numbers, and feature suggestions. +A huge thanks to them all. + +Matthew Paine - for RPMs & general enthu + +Leonardo Rodrigues de Mello - for DEBs + +Julian Perez & Adam D'Auria - for multi-server tutorial + +Paul England - for HA spec + +Brent Nelson - for many bug reports + +Jacques Mattheij - for Europe mirror. + +Patrick Negri - for TCP non-blocking connect. +@flushright +http://gluster.org/core-team.php (@email{list-hacking@@zresearch.com}) +@email{@b{Z}} Research +@end flushright + +@node Introduction +@chapter Introduction + +GlusterFS is a distributed filesystem. It works at the file level, +not block level. + +A network filesystem is one which allows us to access remote files. A +distributed filesystem is one that stores data on multiple machines +and makes them all appear to be a part of the same filesystem. + +Need for distributed filesystems + +@itemize @bullet +@item Scalability: A distributed filesystem allows us to store more data than what can be stored on a single machine. + +@item Redundancy: We might want to replicate crucial data on to several machines. 
+
+@item Uniform access: One can mount a remote volume (for example your home directory) from any machine and access the same data.
+@end itemize
+
+@section Contacting us
+You can reach us through the mailing list @strong{gluster-devel}
+(@email{gluster-devel@@nongnu.org}).
+@cindex GlusterFS mailing list
+
+You can also find many of the developers on @acronym{IRC}, on the @code{#gluster}
+channel on Freenode (@indicateurl{irc.freenode.net}).
+@cindex IRC channel, #gluster
+
+The GlusterFS documentation wiki is also useful: @*
+@indicateurl{http://gluster.org/docs/index.php/GlusterFS}
+
+For commercial support, you can contact @email{@b{Z}} Research at:
+@cindex commercial support
+@cindex Z Research, Inc.
+
+@display
+3194 Winding Vista Common
+Fremont, CA 94539
+USA.
+
+Phone: +1 (510) 354 6801
+Toll free: +1 (888) 813 6309
+Fax: +1 (510) 372 0604
+@end display
+
+You can also email us at @email{support@@zresearch.com}.
+
+@node Installation and Invocation
+@chapter Installation and Invocation
+
+@menu
+* Pre requisites::
+* Getting GlusterFS::
+* Building::
+* Running GlusterFS::
+* A Tutorial Introduction::
+@end menu
+
+@node Pre requisites
+@section Pre requisites
+
+Before installing GlusterFS, make sure you have the
+following components installed.
+
+@subsection @acronym{FUSE}
+You'll need @acronym{FUSE} version 2.6.0 or higher to
+use GlusterFS. You can omit installing @acronym{FUSE} if you want to
+build @emph{only} the server. Note that you won't be able to mount
+a GlusterFS filesystem on a machine that does not have @acronym{FUSE}
+installed.
+
+@acronym{FUSE} can be downloaded from: @indicateurl{http://fuse.sourceforge.net/}
+
+To get the best performance from GlusterFS, however, it is recommended that you use
+our patched version of @acronym{FUSE}. See Patched FUSE for details.
+
+@subsection Patched FUSE
+
+The GlusterFS project maintains a patched version of @acronym{FUSE} meant to be used
+with GlusterFS. The patches increase GlusterFS performance. It is recommended that
+all users use the patched @acronym{FUSE}.
+
+The patched @acronym{FUSE} tarball can be downloaded from:
+
+@indicateurl{ftp://ftp.zresearch.com/pub/gluster/glusterfs/fuse/}
+
+The specific changes made to @acronym{FUSE} are:
+
+@itemize
+@item The communication channel size between the @acronym{FUSE} kernel module and GlusterFS has been increased to 1MB, permitting large reads and writes to be sent in bigger chunks.
+
+@item The kernel's read-ahead boundary has been extended up to 1MB.
+
+@item The block size returned by the @command{stat()}/@command{fstat()} calls has been tuned to 1MB, to make @command{cp} and similar commands perform I/O using that block size.
+
+@item @command{flock()} locking support has been added (although some rework in GlusterFS is needed for perfect compliance).
+@end itemize
+
+@subsection libibverbs (optional)
+@cindex InfiniBand, installation
+@cindex libibverbs
+This is only needed if you want GlusterFS to use InfiniBand as the
+interconnect mechanism between server and client. You can get it from:
+
+@indicateurl{http://www.openfabrics.org/downloads.htm}.
+
+@subsection Bison and Flex
+These should already be installed on most Linux systems. If not, use your distribution's
+normal software installation procedures to install them. Make sure you also install the
+relevant developer packages.
+
+@node Getting GlusterFS
+@section Getting GlusterFS
+@cindex arch
+There are many ways to get hold of GlusterFS.
For a production deployment, +the recommended method is to download the latest release tarball. +Release tarballs are available at: @indicateurl{http://gluster.org/download.php}. + +If you want the bleeding edge development source, you can get them +from the @acronym{GNU} +Arch@footnote{@indicateurl{http://www.gnu.org/software/gnu-arch/}} +repository. First you must install @acronym{GNU} Arch itself. Then +register the GlusterFS archive by doing: + +@example +$ tla register-archive http://arch.sv.gnu.org/archives/gluster +@end example + +Now you can check out the source itself: + +@example +$ tla get -A gluster@@sv.gnu.org glusterfs--mainline--3.0 +@end example + +@node Building +@section Building +You can skip this section if you're installing from @acronym{RPM}s +or @acronym{DEB}s. + +GlusterFS uses the Autotools mechanism to build. As such, the procedure +is straight-forward. First, change into the GlusterFS source directory. + +@example +$ cd glusterfs-<version> +@end example + +If you checked out the source from the Arch repository, you'll need +to run @command{./autogen.sh} first. Note that you'll need to have +Autoconf and Automake installed for this.  + +Run @command{configure}. + +@example +$ ./configure +@end example + +The configure script accepts the following options: + +@cartouche +@table @code + +@item --disable-ibverbs +Disable the InfiniBand transport mechanism. + +@item --disable-fuse-client +Disable the @acronym{FUSE} client. + +@item --disable-server +Disable building of the GlusterFS server. + +@item --disable-bdb +Disable building of Berkeley DB based storage translator. + +@item --disable-mod_glusterfs +Disable building of Apache/lighttpd glusterfs plugins. + +@item --disable-epoll +Use poll instead of epoll. + +@item --disable-libglusterfsclient +Disable building of libglusterfsclient + +@end table +@end cartouche + +Build and install GlusterFS. + +@example +# make install +@end example + +The binaries (@command{glusterfsd} and @command{glusterfs}) will be by +default installed in @command{/usr/local/sbin/}. Translator, +scheduler, and transport shared libraries will be installed in +@command{/usr/local/lib/glusterfs/<version>/}. Sample volume +specification files will be in @command{/usr/local/etc/glusterfs/}. +This document itself can be found in +@command{/usr/local/share/doc/glusterfs/}. If you passed the @command{--prefix} +argument to the configure script, then replace @command{/usr/local} in the preceding +paths with the prefix. + +@node Running GlusterFS +@section Running GlusterFS + +@menu +* Server::                       +* Client::                       +@end menu + +@node Server +@subsection Server +@cindex GlusterFS server + +The GlusterFS server is necessary to export storage volumes to remote clients +(See @ref{Server protocol} for more info). This section documents the invocation +of the GlusterFS server program and all the command-line options accepted by it. + +@cartouche +@table @code +Basic Options +@item -f, --volfile=<path>    +      Use the volume file as the volume specification. + +@item -s, --volfile-server=<hostname> +      Server to get volume file from. This option overrides --volfile option. + +@item -l, --log-file=<path> +      Specify the path for the log file. + +@item -L, --log-level=<level> +      Set the log level for the server. Log level should be one of @acronym{DEBUG},  +@acronym{WARNING}, @acronym{ERROR}, @acronym{CRITICAL}, or @acronym{NONE}. + +Advanced Options +@item --debug +      Run in debug mode. 
This option sets --no-daemon, --log-level to DEBUG and
+      --log-file to console.
+
+@item  -N, --no-daemon
+      Run glusterfsd as a foreground process.
+
+@item  -p, --pid-file=<path>
+      Path for the @acronym{PID} file.
+
+@item --volfile-id=<key>
+      'key' of the volfile to be fetched from server.
+
+@item --volfile-server-port=<port-number>
+      Listening port number of volfile server.
+
+@item --volfile-server-transport=[socket|ib-verbs]
+      Transport type to get volfile from server. [default: @command{socket}]
+
+@item --xlator-options=<volume-name.option=value>
+      Add/override a translator option for a volume with specified value.
+
+Miscellaneous Options
+@item  -?, --help
+       Show this help text.
+
+@item  --usage
+       Display a short usage message.
+
+@item  -V, --version
+       Show version information.
+@end table
+@end cartouche
+
+@node Client
+@subsection Client
+@cindex GlusterFS client
+
+The GlusterFS client process is necessary to access remote storage volumes and
+mount them locally using @acronym{FUSE}. This section documents the invocation of the
+client process and all its command-line arguments.
+
+@example
+  # glusterfs [options] <mountpoint>
+@end example
+
+The @command{mountpoint} is the directory where you want the GlusterFS
+filesystem to appear. Example:
+
+@example
+  # glusterfs -f /usr/local/etc/glusterfs-client.vol /mnt
+@end example
+
+The command-line options are detailed below.
+
+@tex
+\vfill
+@end tex
+@page
+
+@cartouche
+@table @code
+
+Basic Options
+@item -f, --volfile=<path>
+      Use the volume file as the volume specification.
+
+@item -s, --volfile-server=<hostname>
+      Server to get volume file from. This option overrides --volfile option.
+
+@item -l, --log-file=<path>
+      Specify the path for the log file.
+
+@item -L, --log-level=<level>
+      Set the log level for the client. Log level should be one of @acronym{DEBUG},
+@acronym{WARNING}, @acronym{ERROR}, @acronym{CRITICAL}, or @acronym{NONE}.
+
+Advanced Options
+@item --debug
+      Run in debug mode. This option sets --no-daemon, --log-level to DEBUG and
+      --log-file to console.
+
+@item  -N, --no-daemon
+      Run @command{glusterfs} as a foreground process.
+
+@item  -p, --pid-file=<path>
+      Path for the @acronym{PID} file.
+
+@item --volfile-id=<key>
+      'key' of the volfile to be fetched from server.
+
+@item --volfile-server-port=<port-number>
+      Listening port number of volfile server.
+
+@item --volfile-server-transport=[socket|ib-verbs]
+      Transport type to get volfile from server. [default: @command{socket}]
+
+@item --xlator-options=<volume-name.option=value>
+      Add/override a translator option for a volume with specified value.
+
+@item  --volume-name=<volume name>
+      Volume name in client spec to use. Defaults to the root volume.
+
+@acronym{FUSE} Options
+@item  --attribute-timeout=<n>
+       Attribute timeout for inodes in the kernel, in seconds. Defaults to 1 second.
+
+@item  --disable-direct-io-mode
+       Disable direct @acronym{I/O} mode in @acronym{FUSE} kernel module.
+
+@item  -e, --entry-timeout=<n>
+       Entry timeout for directory entries in the kernel, in seconds.
+       Defaults to 1 second.
+
+Miscellaneous Options
+@item  -?, --help
+       Show this help information.
+
+@item  -V, --version
+       Show version information.
+@end table
+@end cartouche
+
+@node A Tutorial Introduction
+@section A Tutorial Introduction
+
+This section will show you how to quickly get GlusterFS up and running. We'll
+configure GlusterFS as a simple network filesystem, with one server and one client.
+In this mode of usage, GlusterFS can serve as a replacement for NFS.
+
+We'll make use of two machines; call them @emph{server} and
+@emph{client} (if you don't want to set up two machines, just run
+everything that follows on the same machine).  In the examples that
+follow, the shell prompts will use these names to clarify the machine
+on which the command is being run. For example, a command that should
+be run on the server will be shown with the prompt:
+
+@example
+[root@@server]#
+@end example
+
+Our goal is to make a directory on the @emph{server} (say, @command{/export})
+accessible to the @emph{client}.
+
+First of all, get GlusterFS installed on both machines, as described in the
+previous sections. Make sure you have the @acronym{FUSE} kernel module loaded. You
+can ensure this by running:
+
+@example
+[root@@server]# modprobe fuse
+@end example
+
+Before we can run the GlusterFS client or server programs, we need to write
+two files called @emph{volume specifications} (equivalently referred to as @emph{volfiles}).
+The volfile describes the @emph{translator tree} on a node. The next chapter will
+explain the concepts of `translator' and `volume specification' in detail. For now,
+just assume that the volfile is like the NFS @command{/etc/exports} file.
+
+On the server, create a text file somewhere (we'll assume the path
+@command{/tmp/glusterfs-server.vol}) with the following contents.
+
+@cartouche
+@example
+volume colon-o
+  type storage/posix
+  option directory /export
+end-volume
+
+volume server
+  type protocol/server
+  subvolumes colon-o
+  option transport-type tcp
+  option auth.addr.colon-o.allow *
+end-volume
+@end example
+@end cartouche
+
+A brief explanation of the file's contents: the first section defines a storage
+volume, named ``colon-o'' (the volume names are arbitrary), which exports the
+@command{/export} directory. The second section defines options for the translator
+which will make the storage volume accessible remotely. It specifies @command{colon-o} as
+a subvolume. This defines the @emph{translator tree}, about which more will be said
+in the next chapter. The two options specify that the @acronym{TCP} protocol is to be
+used (as opposed to InfiniBand, for example), and that access to the storage volume
+is to be provided to clients with any @acronym{IP} address at all. If you wanted to
+restrict access to this server to your subnet only, for example, you'd specify
+something like @command{192.168.1.*} in the second option line.
+
+On the client machine, create the following text file (again, we'll assume
+the path to be @command{/tmp/glusterfs-client.vol}). Replace
+@emph{server-ip-address} with the @acronym{IP} address of your server machine. If you
+are doing all this on a single machine, use @command{127.0.0.1}.
+
+@cartouche
+@example
+volume client
+  type protocol/client
+  option transport-type tcp
+  option remote-host @emph{server-ip-address}
+  option remote-subvolume colon-o
+end-volume
+@end example
+@end cartouche
+
+Now we need to start both the server and client programs.
To start the server: + +@example +[root@@server]# glusterfsd -f /tmp/glusterfs-server.vol +@end example + +To start the client: + +@example +[root@@client]# glusterfs -f /tmp/glusterfs-client.vol /mnt/glusterfs +@end example + +You should now be able to see the files under the server's @command{/export} directory +in the @command{/mnt/glusterfs} directory on the client. That's it; GlusterFS is now +working as a network file system. + +@node Concepts +@chapter Concepts + +@menu +* Filesystems in Userspace::                 +* Translator::                   +* Volume specification file::    +@end menu + +@node Filesystems in Userspace +@section Filesystems in Userspace + +A filesystem is usually implemented in kernel space. Kernel space +development is much harder than userspace development. @acronym{FUSE} +is a kernel module/library that allows us to write a filesystem +completely in userspace. + +@acronym{FUSE} consists of a kernel module which interacts with the userspace +implementation using a device file @code{/dev/fuse}. When a process  +makes a syscall on a @acronym{FUSE} filesystem, @acronym{VFS} hands the request to the +@acronym{FUSE} module, which writes the request to @code{/dev/fuse}. The +userspace implementation polls @code{/dev/fuse}, and when a request arrives, +processes it and writes the result back to @code{/dev/fuse}. The kernel then +reads from the device file and returns the result to the user process.  + +In case of GlusterFS, the userspace program is the GlusterFS client. +The control flow is shown in the diagram below. The GlusterFS client +services the request by sending it to the server, which in turn  +hands it to the local @acronym{POSIX} filesystem. + +@center @image{fuse,44pc,,,.pdf} +@center Fig 1. Control flow in GlusterFS + +@node Translator +@section Translator + +The @emph{translator} is the most important concept in GlusterFS. In +fact, GlusterFS is nothing but a collection of translators working +together, forming a translator @emph{tree}. + +The idea of a translator is perhaps best understood using an +analogy. Consider the @acronym{VFS} in the Linux kernel. The +@acronym{VFS} abstracts the various filesystem implementations (such +as @acronym{EXT3}, ReiserFS, @acronym{XFS}, etc.) supported by the +kernel. When an application calls the kernel to perform an operation +on a file, the kernel passes the request on to the appropriate +filesystem implementation. + +For example, let's say there are two partitions on a Linux machine: +@command{/}, which is an @acronym{EXT3} partition, and @command{/usr}, +which is a ReiserFS partition. Now if an application wants to open a +file called, say, @command{/etc/fstab}, then the kernel will +internally pass the request to the @acronym{EXT3} implementation.  If +on the other hand, an application wants to read a file called +@command{/usr/src/linux/CREDITS}, then the kernel will call upon the +ReiserFS implementation to do the job. + +The ``filesystem implementation'' objects are analogous to GlusterFS +translators. A GlusterFS translator implements all the filesystem +operations.  Whereas in @acronym{VFS} there is a two-level tree (with +the kernel at the root and all the filesystem implementation as its +children), in GlusterFS there exists a more elaborate tree structure. + +We can now define translators more precisely. A GlusterFS translator +is a shared object (@command{.so}) that implements every filesystem +call. 
GlusterFS translators can be arranged in an arbitrary tree +structure (subject to constraints imposed by the translators). When +GlusterFS receives a filesystem call, it passes it on to the +translator at the root of the translator tree. The root translator may +in turn pass it on to any or all of its children, and so on, until the +leaf nodes are reached. The result of a filesystem call is +communicated in the reverse fashion, from the leaf nodes up to the +root node, and then on to the application. + +So what might a translator tree look like? + +@tex +\vfill +@end tex +@page + +@center @image{xlator,44pc,,,.pdf} +@center Fig 2. A sample translator tree + +The diagram depicts three servers and one GlusterFS client. It is important +to note that conceptually, the translator tree spans machine boundaries. +Thus, the client machine in the diagram, @command{10.0.0.1}, can access +the aggregated storage of the filesystems on the server machines @command{10.0.0.2}, +@command{10.0.0.3}, and @command{10.0.0.4}. The translator diagram will make more +sense once you've read the next chapter and understood the functions of the +various translators. + +@node Volume specification file +@section Volume specification file +The volume specification file describes the translator tree for both the +server and client programs. + +A volume specification file is a sequence of volume definitions. +The syntax of a volume definition is explained below: + +@cartouche +@example +@strong{volume} @emph{volume-name} +  @strong{type} @emph{translator-name} +  @strong{option} @emph{option-name} @emph{option-value} +  @dots{} +  @strong{subvolumes} @emph{subvolume1} @emph{subvolume2} @dots{} +@strong{end-volume} +@end example + +@dots{} +@end cartouche + +@table @asis +@item @emph{volume-name} +  An identifier for the volume. This is just a human-readable name, +and can contain any alphanumeric character. For instance, ``storage-1'', ``colon-o'', +or ``forty-two''. + +@item @emph{translator-name} +  Name of one of the available translators. Example: @command{protocol/client}, +@command{cluster/unify}. + +@item @emph{option-name} +  Name of a valid option for the translator. + +@item @emph{option-value} +  Value for the option. Everything following the ``option'' keyword to the end of the +line is considered the value; it is up to the translator to parse it. + +@item @emph{subvolume1}, @emph{subvolume2}, @dots{} +  Volume names of sub-volumes. The sub-volumes must already have been defined earlier  +in the file. +@end table + +There are a few rules you must follow when writing a volume specification file: + +@itemize +@item Everything following a `@command{#}' is considered a comment and is ignored. Blank lines are also ignored. +@item All names and keywords are case-sensitive. +@item The order of options inside a volume definition does not matter. +@item An option value may not span multiple lines. +@item If an option is not specified, it will assume its default value. +@item A sub-volume must have already been defined before it can be referenced. This means you have to write the specification file ``bottom-up'', starting from the leaf nodes of the translator tree and moving up to the root. 
+@end itemize
+
+A simple example volume specification file is shown below:
+
+@cartouche
+@example
+# This is a comment line
+volume client
+ type protocol/client
+ option transport-type tcp
+ option remote-host localhost      # Also a comment
+ option remote-subvolume brick
+# The subvolumes line may be absent
+end-volume
+
+volume iot
+ type performance/io-threads
+ option thread-count 4
+ subvolumes client
+end-volume
+
+volume wb
+ type performance/write-behind
+ subvolumes iot
+end-volume
+@end example
+@end cartouche
+
+@node Translators
+@chapter Translators
+
+@menu
+* Storage Translators::
+* Client and Server Translators::
+* Clustering Translators::
+* Performance Translators::
+* Features Translators::
+* Miscellaneous Translators::
+@end menu
+
+This chapter documents all the available GlusterFS translators in detail.
+Each translator section will show its name (for example, @command{cluster/unify}),
+briefly describe its purpose and workings, and list every option accepted by
+that translator, along with its meaning.
+
+@node Storage Translators
+@section Storage Translators
+
+The storage translators form the ``backend'' for GlusterFS. Currently,
+the only available storage translator is the @acronym{POSIX}
+translator, which stores files on a normal @acronym{POSIX}
+filesystem. A pleasant consequence of this is that your data will
+still be accessible if GlusterFS crashes or cannot be started.
+
+Other storage backends are planned for the future. One of the possibilities is an
+Amazon S3 translator. Amazon S3 is an unlimited online storage service accessible
+through a web services @acronym{API}. The S3 translator will allow you to access
+the storage as a normal @acronym{POSIX} filesystem.
+@footnote{Some more discussion about this can be found at: 
+
+http://developer.amazonwebservices.com/connect/message.jspa?messageID=52873}
+
+@menu
+* POSIX::
+* BDB::
+@end menu
+
+@node POSIX
+@subsection POSIX
+@example
+type storage/posix
+@end example
+
+The @command{posix} translator uses a normal @acronym{POSIX}
+filesystem as its ``backend'' to actually store files and
+directories. This can be any filesystem that supports extended
+attributes (@acronym{EXT3}, ReiserFS, @acronym{XFS}, ...). Extended
+attributes are used by some translators to store metadata, for
+example, by the replicate and stripe translators. See
+@ref{Replicate} and @ref{Stripe}, respectively, for details.
+
+@cartouche
+@table @code
+@item directory <path>
+The directory on the local filesystem which is to be used for storage.
+@end table
+@end cartouche
+
+@node BDB
+@subsection BDB
+@example
+type storage/bdb
+@end example
+
+The @command{BDB} translator uses a @acronym{Berkeley DB} database as its
+``backend'' to actually store files as key-value pairs in the database and
+directories as regular @acronym{POSIX} directories. Note that @acronym{BDB}
+does not provide extended attribute support for regular files. Do not use
+@acronym{BDB} as the storage translator if you are using any translator that
+requires extended attributes on the ``backend''.
+
+@cartouche
+@table @code
+@item directory <path>
+The directory on the local filesystem which is to be used for storage.
+@item mode [cache|persistent] (cache)
+When @acronym{BDB} is run in @command{cache} mode, recovery of the back-end is not completely
+guaranteed. @command{persistent} guarantees that @acronym{BDB} can recover the back-end from
+@acronym{Berkeley DB} even if GlusterFS crashes.
+@item errfile <path>
+The path of the file to be used as @command{errfile} for @acronym{Berkeley DB} to report
+detailed error messages, if any. Note that all the contents of this file will be written
+by @acronym{Berkeley DB}, not GlusterFS.
+@item logdir <path>
+The path of the directory in which @acronym{Berkeley DB} is to keep its
+transaction log files.
+@end table
+@end cartouche
+
+@node Client and Server Translators, Clustering Translators, Storage Translators, Translators
+@section Client and Server Translators
+
+The client and server translators enable GlusterFS to export a
+translator tree over the network or access a remote GlusterFS
+server. These two translators implement GlusterFS's network protocol.
+
+@menu
+* Transport modules::
+* Client protocol::
+* Server protocol::
+@end menu
+
+@node Transport modules
+@subsection Transport modules
+The client and server translators are capable of using any of the
+pluggable transport modules. Currently available transport modules are
+@command{tcp}, which uses a @acronym{TCP} connection between client
+and server to communicate; @command{ib-sdp}, which uses a
+@acronym{TCP} connection over InfiniBand; and @command{ib-verbs}, which
+uses high-speed InfiniBand connections.
+
+Each transport module comes in two different versions, one to be used on
+the server side and the other on the client side.
+
+@subsubsection TCP
+
+The @acronym{TCP} transport module uses a @acronym{TCP/IP} connection between
+the server and the client.
+
+@example
+  option transport-type tcp
+@end example
+
+The @acronym{TCP} client module accepts the following options:
+
+@cartouche
+@table @code
+@item non-blocking-connect [no|off|on|yes] (on)
+Whether to make the connection attempt asynchronous.
+@item remote-port <n> (6996)
+Server port to connect to.
+@cindex DNS round robin
+@item remote-host <hostname> *
+Hostname or @acronym{IP} address of the server. If the hostname resolves to
+multiple IP addresses, all of them will be tried in a round-robin fashion. This
+feature can be used to implement fail-over.
+@end table
+@end cartouche
+
+The @acronym{TCP} server module accepts the following options:
+
+@cartouche
+@table @code
+@item bind-address <address> (0.0.0.0)
+The local interface on which the server should listen to requests. Default is to
+listen on all interfaces.
+@item listen-port <n> (6996)
+The local port to listen on.
+@end table
+@end cartouche
+
+@subsubsection IB-SDP
+@example
+  option transport-type ib-sdp
+@end example
+
+The kernel implements a socket interface for InfiniBand hardware;
+@acronym{SDP} is implemented over ib-verbs. This module accepts the same
+options as @command{tcp}.
+
+@subsubsection ib-verbs
+
+@example
+  option transport-type ib-verbs
+@end example
+
+@cindex infiniband transport
+
+InfiniBand is a scalable switched fabric interconnect mechanism
+primarily used in high-performance computing. InfiniBand can deliver
+data throughput of the order of 10 Gbit/s, with latencies of 4-5
+microseconds.
+
+The @command{ib-verbs} transport accesses the InfiniBand hardware through
+the ``verbs'' @acronym{API}, which is the lowest level of software access possible
+and which gives the highest performance. On InfiniBand hardware, it is always
+best to use @command{ib-verbs}. Use @command{ib-sdp} only if you cannot get
+@command{ib-verbs} working for some reason.
+
+The @command{ib-verbs} client module accepts the following options:
+
+@cartouche
+@table @code
+@item non-blocking-connect [no|off|on|yes] (on)
+Whether to make the connection attempt asynchronous.
+@item remote-port <n> (6996)
+Server port to connect to.
+@cindex DNS round robin
+@item remote-host <hostname> *
+Hostname or @acronym{IP} address of the server. If the hostname resolves to
+multiple IP addresses, all of them will be tried in a round-robin fashion. This
+feature can be used to implement fail-over.
+@end table
+@end cartouche
+
+The @command{ib-verbs} server module accepts the following options:
+
+@cartouche
+@table @code
+@item bind-address <address> (0.0.0.0)
+The local interface on which the server should listen to requests. Default is to
+listen on all interfaces.
+@item listen-port <n> (6996)
+The local port to listen on.
+@end table
+@end cartouche
+
+If you are familiar with InfiniBand jargon, the mode used by GlusterFS
+is ``reliable connection-oriented channel transfer''.
+
+The following options are common to both the client and server modules:
+
+@cartouche
+@table @code
+@item ib-verbs-work-request-send-count <n> (64)
+Length of the send queue in datagrams. [Reason to increase/decrease?]
+
+@item ib-verbs-work-request-recv-count <n> (64)
+Length of the receive queue in datagrams. [Reason to increase/decrease?]
+
+@item ib-verbs-work-request-send-size <size> (128KB)
+Size of each datagram that is sent. [Reason to increase/decrease?]
+
+@item ib-verbs-work-request-recv-size <size> (128KB)
+Size of each datagram that is received. [Reason to increase/decrease?]
+
+@item ib-verbs-port <n> (1)
+Port number for ib-verbs.
+
+@item ib-verbs-mtu [256|512|1024|2048|4096] (2048)
+The Maximum Transmission Unit. [Reason to increase/decrease?]
+
+@item ib-verbs-device-name <device-name> (first device in the list)
+InfiniBand device to be used.
+@end table
+@end cartouche
+
+For maximum performance, you should ensure that the send/receive counts on both
+the client and server are the same.
+
+ib-verbs is preferred over ib-sdp.
+
+@node Client protocol
+@subsection Client
+@example
+type protocol/client
+@end example
+
+The client translator enables the GlusterFS client to access a remote server's
+translator tree.
+
+@cartouche
+@table @code
+
+@item transport-type [tcp,ib-sdp,ib-verbs] (tcp)
+The transport type to use. You should use the client versions of all the
+transport modules (@command{tcp}, @command{ib-sdp},
+@command{ib-verbs}).
+@item remote-subvolume <volume_name> *
+The name of the volume on the remote host to attach to. Note that
+this is @emph{not} the name of the @command{protocol/server} volume on the
+server; it can be any volume below it in the server's translator tree.
+@item transport-timeout <n> (120 seconds)
+Inactivity timeout. If a reply is expected and no activity takes place
+on the connection within this time, the transport connection will be
+broken, and a new connection will be attempted.
+@end table
+@end cartouche
+
+@node Server protocol
+@subsection Server
+@example
+type protocol/server
+@end example
+
+The server translator exports a translator tree and makes it accessible to
+remote GlusterFS clients.
+
+@cartouche
+@table @code
+@item client-volume-filename <path> (<CONFDIR>/glusterfs-client.vol)
+The volume specification file to use for the client. This is the file the
+client will receive when it is invoked with the @command{--server} option
+(@ref{Client}).
+
+@item transport-type [tcp,ib-verbs,ib-sdp] (tcp)
+The transport to use. You should use the server versions of all the transport
+modules (@command{tcp}, @command{ib-sdp}, @command{ib-verbs}).
+
+@item auth.addr.<volume name>.allow <IP address wildcard pattern>
+IP addresses of the clients that are allowed to attach to the specified volume.
This can be a wildcard. For example, a wildcard of the form @command{192.168.*.*}
+allows any host in the @command{192.168.x.x} subnet to connect to the server.
+
+@end table
+@end cartouche
+
+@node Clustering Translators
+@section Clustering Translators
+
+The clustering translators are the most important GlusterFS
+translators, since it is these that make GlusterFS a cluster
+filesystem. These translators together enable GlusterFS to access an
+arbitrarily large amount of storage, and provide @acronym{RAID}-like
+redundancy and distribution over the entire cluster.
+
+There are three clustering translators: @strong{unify}, @strong{replicate},
+and @strong{stripe}.  The unify translator aggregates storage from
+many server nodes. The replicate translator provides file replication. The stripe
+translator allows a file to be spread across many server nodes. The following sections
+look at each of these translators in detail.
+
+@menu
+* Unify::
+* Replicate::
+* Stripe::
+@end menu
+
+@node Unify
+@subsection Unify
+@cindex unify (translator)
+@cindex scheduler (unify)
+@example
+type cluster/unify
+@end example
+
+The unify translator presents a `unified' view of all its sub-volumes. That is,
+it makes the union of all its sub-volumes appear as a single volume. It is the
+unify translator that gives GlusterFS the ability to access an arbitrarily
+large amount of storage.
+
+For unify to work correctly, certain invariants need to be maintained across
+the entire network. These are:
+
+@cindex unify invariants
+@itemize
+@item The directory structure of all the sub-volumes must be identical.
+@item A particular file can exist on only one of the sub-volumes. Put another way, a pathname such as @command{/home/calvin/homework.txt} is unique across the entire cluster.
+@end itemize
+
+@tex
+\vfill
+@end tex
+@page
+
+@center @image{unify,44pc,,,.pdf}
+
+Looking at the second requirement, you might wonder how one can
+accomplish storing redundant copies of a file, if no file can exist
+multiple times.  To answer, we must remember that these invariants are
+from @emph{unify's perspective}.  A translator such as replicate at a lower
+level in the translator tree than unify may subvert this picture.
+
+The first invariant might seem quite tedious to ensure. We shall see
+later that this is not so, since unify's @emph{self-heal} mechanism
+takes care of maintaining it.
+
+The second invariant implies that unify needs some way to decide which file goes where.
+Unify makes use of @emph{scheduler} modules for this purpose.
+
+When a file needs to be created, unify's scheduler decides upon the
+sub-volume to be used to store the file. There are many schedulers
+available, each using a different algorithm and suitable for different
+purposes.
+
+The various schedulers are described in detail in the sections that follow.
+
+@subsubsection ALU
+@cindex alu (scheduler)
+
+@example
+  option scheduler alu
+@end example
+
+ALU stands for "Adaptive Least Usage". It is the most advanced
+scheduler available in GlusterFS. It balances the load across volumes
+taking several factors into account. It adapts itself to changing I/O
+patterns according to its configuration. When properly configured, it
+can eliminate the need for regular tuning of the filesystem to keep
+volume load nicely balanced.
+
+The ALU scheduler is composed of multiple least-usage
+sub-schedulers.
Each sub-scheduler keeps track of a certain type of
+load, for each of the sub-volumes, getting statistics from
+the sub-volumes themselves. The sub-schedulers are these:
+
+@itemize
+@item disk-usage: The used and free disk space on the volume.
+
+@item read-usage: The amount of reading done from this volume.
+
+@item write-usage: The amount of writing done to this volume.
+
+@item open-files-usage: The number of files currently open from this volume.
+
+@item disk-speed-usage: The speed at which the disks are spinning. This is a constant value and therefore not very useful.
+@end itemize
+
+The ALU scheduler needs to know which of these sub-schedulers to use,
+and in which order to evaluate them. This is done through the
+@command{option alu.order} configuration directive.
+
+Each sub-scheduler needs to know two things: when to kick in (the
+entry-threshold), and how long to stay in control (the
+exit-threshold). For example: when unifying three disks of 100GB,
+keeping an exact balance of disk-usage is not necessary. Instead, there
+could be a 1GB margin, which can be used to nicely balance other
+factors, such as read-usage. The disk-usage scheduler can be told to
+kick in only when a certain threshold of discrepancy is passed, such
+as 1GB. When it assumes control under this condition, it will write
+all subsequent data to the least-used volume. Once it is doing so, it is
+unwise to stop as soon as the values fall below the entry-threshold
+again, since that would make it very likely that the situation will
+recur very soon. Such a situation would cause the ALU to spend
+most of its time disk-usage scheduling, which is unfair to the other
+sub-schedulers. The exit-threshold therefore defines the amount of
+data that needs to be written to the least-used disk before control
+is relinquished again.
+
+In addition to the sub-schedulers, the ALU scheduler also has "limits"
+options. These can stop the creation of new files on a volume once
+values drop below a certain threshold. For example, setting
+@command{option alu.limits.min-free-disk 5GB} will stop the scheduling
+of files to volumes that have less than 5GB of free disk space,
+leaving the files on that disk some room to grow.
+
+The actual values you assign to the thresholds for sub-schedulers and
+limits depend on your situation. If you have fast-growing files,
+you'll want to stop file-creation on a disk much earlier than when
+hardly any of your files are growing. If you care less about
+disk-usage balance than about read-usage balance, you'll want a bigger
+disk-usage scheduler entry-threshold and a smaller read-usage
+scheduler entry-threshold.
+
+For thresholds defining a size, values specifying "KB", "MB" and "GB"
+are allowed. For example: @command{option alu.limits.min-free-disk 5GB}.
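+
+Putting this together, here is a sketch of a unify volume that uses the
+ALU scheduler. The volume names and threshold values are only
+illustrative; @command{brick1}, @command{brick2} and the namespace
+volume @command{brick-ns} (see the Namespace section below) are assumed
+to have been defined earlier in the volfile, and the thresholds shown
+are simply the defaults from the table that follows:
+
+@cartouche
+@example
+volume unify0
+  type cluster/unify
+  subvolumes brick1 brick2
+  option namespace brick-ns
+  option scheduler alu
+  option alu.order disk-usage:read-usage
+  option alu.disk-usage.entry-threshold 1GB
+  option alu.disk-usage.exit-threshold 512MB
+  option alu.limits.min-free-disk 5GB
+end-volume
+@end example
+@end cartouche
+
+The available ALU options are listed below.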
+
+@cartouche
+@table @code
+@item alu.order <order> * ("disk-usage:write-usage:read-usage:open-files-usage:disk-speed")
+@item alu.disk-usage.entry-threshold <size> (1GB)
+@item alu.disk-usage.exit-threshold <size> (512MB)
+@item alu.write-usage.entry-threshold <%> (25)
+@item alu.write-usage.exit-threshold <%> (5)
+@item alu.read-usage.entry-threshold <%> (25)
+@item alu.read-usage.exit-threshold <%> (5)
+@item alu.open-files-usage.entry-threshold <n> (1000)
+@item alu.open-files-usage.exit-threshold <n> (100)
+@item alu.limits.min-free-disk <%>
+@item alu.limits.max-open-files <n>
+@end table
+@end cartouche
+
+@subsubsection Round Robin (RR)
+@cindex rr (scheduler)
+
+@example
+  option scheduler rr
+@end example
+
+The Round-Robin (RR) scheduler creates files in a round-robin
+fashion. Each client will have its own round-robin loop. When your
+files are mostly similar in size and I/O access pattern, this
+scheduler is a good choice. The RR scheduler checks for free disk space
+on the server before scheduling, so you can know when to add
+another server node. The default value of min-free-disk is 5% and is
+checked on file creation calls, with at least 10 seconds (by default)
+elapsing between two checks.
+
+Options:
+@cartouche
+@table @code
+@item rr.limits.min-free-disk <%> (5)
+Minimum free disk space a node must have for RR to schedule a file to it.
+@item rr.refresh-interval <t> (10 seconds)
+Time between two successive free disk space checks.
+@end table
+@end cartouche
+
+@subsubsection Random
+@cindex random (scheduler)
+
+@example
+  option scheduler random
+@end example
+
+The random scheduler schedules file creation randomly among its child nodes.
+Like the round-robin scheduler, it also checks for a minimum amount of free disk
+space before scheduling a file to a node.
+
+@cartouche
+@table @code
+@item random.limits.min-free-disk <%> (5)
+Minimum free disk space a node must have for random to schedule a file to it.
+@item random.refresh-interval <t> (10 seconds)
+Time between two successive free disk space checks.
+@end table
+@end cartouche
+
+@subsubsection NUFA
+@cindex nufa (scheduler)
+
+@example
+  option scheduler nufa
+@end example
+
+It is common in many GlusterFS computing environments for all deployed
+machines to act as both servers and clients. For example, a
+research lab may have 40 workstations each with its own storage. All
+of these workstations might act as servers exporting a volume as well
+as clients accessing the entire cluster's storage.  In such a
+situation, it makes sense to store locally created files on the local
+workstation itself (assuming files are accessed most by the
+workstation that created them). The Non-Uniform File Allocation (@acronym{NUFA})
+scheduler accomplishes that.
+
+@acronym{NUFA} gives the local system first priority for file creation
+over other nodes. If the local volume does not have more free disk space
+than a specified amount (5% by default), then @acronym{NUFA} schedules files
+among the other child volumes in a round-robin fashion.
+
+@acronym{NUFA} is named after the similar strategy used for memory access,
+@acronym{NUMA}@footnote{Non-Uniform Memory Access:
+@indicateurl{http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access}}.
+
+@cartouche
+@table @code
+@item nufa.limits.min-free-disk <%> (5)
+Minimum disk space that must be free (local or remote) for @acronym{NUFA} to schedule a
+file to it.
+@item nufa.refresh-interval <t> (10 seconds)
+Time between two successive free disk space checks.
+@item nufa.local-volume-name <volume>
+The name of the volume corresponding to the local system. This volume must be
+one of the children of the unify volume. This option is mandatory.
+@end table
+@end cartouche
+
+@cindex namespace
+@subsubsection Namespace
+A namespace volume is needed for two reasons: it provides persistent
+inode numbers, and it ensures that a file's name remains visible even
+when the node storing that file is down.
+
+Files in the namespace volume are simply touched (created empty); the
+namespace is checked on every lookup.
+
+@cartouche
+@table @code
+@item namespace <volume> *
+Name of the namespace volume (which should be one of the unify volume's children).
+@item self-heal [on|off] (on)
+Enable/disable self-heal. Unless you know what you are doing, do not disable self-heal.
+@end table
+@end cartouche
+
+@cindex self heal (unify)
+@subsubsection Self Heal
+@itemize
+@item When a @command{lookup()}/@command{stat()} call is made on a directory for the
+first time, a self-heal call is made, which checks the consistency of
+its child nodes. If an entry is present on a storage node, but not in the
+namespace, that entry is created in the namespace, and vice versa. A
+@command{writedir()} API has been introduced for this purpose. The check also
+covers permissions and uid/gid consistency.
+
+@item This check is also done when a server goes down and comes back up.
+
+@item If one starts with an empty namespace export, but has data on the
+storage nodes, a @command{find . > /dev/null} or @command{ls -lR > /dev/null} will
+build the namespace in one shot. Even otherwise, the namespace is built on
+demand when a file is looked up for the first time.
+@end itemize
+
+NOTE: Some issues (kernel `Oops' messages) have been seen with fuse-2.6.3
+when the namespace is deleted on the backend while glusterfs is
+running. This issue is not present with fuse-2.6.5.
+
+@node Replicate
+@subsection Replicate (formerly AFR)
+@cindex Replicate
+@example
+type cluster/replicate
+@end example
+
+Replicate provides @acronym{RAID}-1-like functionality for
+GlusterFS. Replicate replicates files and directories across its
+subvolumes. Hence if Replicate has four subvolumes, there will be
+four copies of all files and directories. Replicate provides
+high availability; i.e., if one of the subvolumes goes down
+(e.g., due to a server crash or network disconnection), Replicate will still
+service the requests using the redundant copies.
+
+Replicate also provides self-heal functionality; i.e., when a
+crashed server comes back up, the outdated files and directories will be
+updated with the latest versions. Replicate uses extended
+attributes of the backend file system to track the versioning of files
+and directories and provide the self-heal feature.
+
+@example
+volume replicate-example
+ type cluster/replicate
+ subvolumes brick1 brick2 brick3
+end-volume
+@end example
+
+This sample configuration will replicate all directories and files on
+brick1, brick2 and brick3.
+
+All the read operations happen from the first alive child. If all
+three sub-volumes are up, reads will be done from brick1; if brick1 is
+down, reads will be done from brick2. If brick1 goes down while a read
+is in progress on it, replicate transparently falls back to
+brick2.
+
+The next release of GlusterFS will add the following features:
+@itemize
+@item Ability to specify the sub-volume from which read operations are to be done (this will help users who have one of the sub-volumes as a local storage volume).
+@item Allow scheduling of read operations amongst the sub-volumes in a round-robin fashion.
+@end itemize
+
+The order of the subvolumes list should be the same across all
+replicate volumes, as it is used for locking purposes.
+
+@cindex self heal (replicate)
+@subsubsection Self Heal
+Replicate has a self-heal feature, which updates outdated file and
+directory copies with the most recent versions. For example, consider
+the following configuration:
+
+@example
+volume replicate-example
+ type cluster/replicate
+ subvolumes brick1 brick2
+end-volume
+@end example
+
+@subsubsection File self-heal
+
+Now if we create a file foo.txt on replicate-example, the file will be created
+on brick1 and brick2. The file will have two extended attributes associated
+with it in the backend filesystem. One is trusted.afr.createtime and the
+other is trusted.afr.version. The trusted.afr.createtime xattr holds the
+create time (in seconds since the epoch) and trusted.afr.version
+is a number that is incremented each time the file is modified. This increment
+happens during close (in case any write was done before the close).
+
+If brick1 goes down and we edit foo.txt, the version gets incremented. When
+brick1 comes back up and we open() foo.txt, Replicate checks whether the
+versions of the copies are the same. If they are not, the outdated copy is
+replaced by the latest copy and its version is updated. After the sync,
+the open() proceeds in the usual manner and the application calling open()
+can continue its access to the file.
+
+Suppose brick1 goes down, and we delete foo.txt and create a file with the
+same name again. When brick1 comes back up, the version on brick1 may well
+be higher than the version on brick2; this is where the createtime extended
+attribute helps in deciding which copy is outdated. Hence we need to
+consider both createtime and version to decide on the latest copy.
+
+The version attribute is incremented during the close() call; it is
+not incremented if no write() was done. If the fd being closed was
+obtained from a create() call, the createtime extended attribute is
+also set.
+
+@subsubsection Directory self-heal
+
+Suppose brick1 goes down, we delete foo.txt, and brick1 comes back up.
+Now foo.txt should not be re-created on brick2; instead, it should be
+deleted from brick1. This situation is handled by keeping createtime
+and version attributes on directories, just as for files. When lookup()
+is done on a directory, we compare the createtime/version attributes of
+the copies, determine which files need to be deleted, delete those
+files, and update the extended attributes of the outdated directory
+copy. Each time a directory is modified (a file or a subdirectory is
+created or deleted inside it) while one of the subvolumes is down, the
+directory's version is incremented.
+
+lookup() is a call initiated by the kernel on a file or directory
+just before any access to that file or directory. In GlusterFS, by
+default, lookup() will not be called again if it was already called
+within the past second on that particular file or directory.
+
+The extended attributes can be seen in the backend filesystem using
+the @command{getfattr} command
(for example, @command{getfattr -n trusted.afr.version <file>}).
+
+@cartouche
+@table @code
+@item debug [on|off]  (off)
+@item self-heal [on|off] (on)
+@item replicate <pattern> (*:1)
+@item lock-node <child_volume> (first child is used by default)
+@end table
+@end cartouche
+
+@node Stripe
+@subsection Stripe
+@cindex stripe (translator)
+@example
+type cluster/stripe
+@end example
+
+The stripe translator distributes the contents of a file over its
+sub-volumes.  It does this by creating a file equal in size to the
+total size of the file on each of its sub-volumes. It then writes only
+a part of the file to each sub-volume, leaving the rest of it empty.
+These empty regions are called `holes' in Unix terminology. The holes
+do not consume any disk space.
+
+The diagram below makes this clear.
+
+@center @image{stripe,44pc,,,.pdf}
+
+You can configure stripe so that only filenames matching a pattern
+are striped. You can also configure the size of the data to be stored
+on each sub-volume.
+
+@cartouche
+@table @code
+@item block-size <pattern>:<size>  (*:0 no striping)
+Distribute files matching @command{<pattern>} over the sub-volumes,
+storing at least @command{<size>} on each sub-volume. For example,
+
+@example
+  option block-size *.mpg:1M
+@end example
+
+distributes all files ending in @command{.mpg}, storing at least 1 MB on
+each sub-volume.
+
+Any number of @command{block-size} option lines may be present, specifying
+different sizes for different file name patterns.
+@end table
+@end cartouche
+
+@node Performance Translators
+@section Performance Translators
+
+@menu
+* Read Ahead::
+* Write Behind::
+* IO Threads::
+* IO Cache::
+* Booster::
+@end menu
+
+@node Read Ahead
+@subsection Read Ahead
+@cindex read-ahead (translator)
+@example
+type performance/read-ahead
+@end example
+
+The read-ahead translator pre-fetches data before it is requested.
+This benefits applications that mostly process files in sequential order,
+since the next block of data will already be available by the time the
+application is done with the current one.
+
+Additionally, the read-ahead translator also behaves as a read-aggregator.
+Many small read operations are combined and issued as fewer, larger read
+requests to the server.
+
+Read-ahead deals in ``pages'' as the unit of data fetched. The page size
+is configurable, as is the ``page count'', which is the number of pages
+that are pre-fetched.
+
+Read-ahead is best used with InfiniBand (using the ib-verbs transport).
+On Fast Ethernet and Gigabit Ethernet networks,
+GlusterFS can achieve the link-maximum throughput even without
+read-ahead, making it quite superfluous.
+
+Note that read-ahead only happens if the reads are perfectly
+sequential. If your application accesses data in a random fashion,
+using read-ahead might actually lead to a performance loss, since
+read-ahead will pointlessly fetch pages which won't be used by the
+application.
+
+@cartouche
+Options:
+@table @code
+@item page-size <n> (256KB)
+The unit of data that is pre-fetched.
+@item page-count <n> (2)
+The number of pages that are pre-fetched.
+@item force-atime-update [on|off|yes|no] (off|no)
+Whether to force an access time (atime) update on the file on every read. Without
+this, the atime will be slightly imprecise, as it will reflect the time when
+the read-ahead translator read the data, not when the application actually read it.
+
+@end table
+@end cartouche
+
+@node Write Behind
+@subsection Write Behind
+@cindex write-behind (translator)
+@example
+type performance/write-behind
+@end example
+
+The write-behind translator improves the latency of a write operation.
+It does this by relegating the write operation to the background and
+returning to the application even as the write is in progress. Using the
+write-behind translator, successive write requests can be pipelined.
+This mode of write-behind operation is best used on the client side,
+to decrease write latency for the application.
+
+The write-behind translator can also aggregate write requests. If the
+@command{aggregate-size} option is specified, then successive writes up to that
+size are accumulated and written in a single operation. This mode of operation
+is best used on the server side, as this will decrease the disk's head movement
+when multiple files are being written to in parallel.
+
+The @command{aggregate-size} option has a default value of 128KB. Although
+this works well for most users, you should always experiment with different values
+to determine the one that will deliver maximum performance. This is because the
+performance of write-behind depends on your interconnect, the size of RAM, and the
+workload.
+
+@cartouche
+@table @code
+@item aggregate-size <n> (128KB)
+Amount of data to accumulate before doing a write.
+@item flush-behind [on|yes|off|no] (off|no)
+
+@end table
+@end cartouche
+
+@node IO Threads
+@subsection IO Threads
+@cindex io-threads (translator)
+@example
+type performance/io-threads
+@end example
+
+The IO threads translator is intended to increase the responsiveness
+of the server to metadata operations by doing file I/O (read, write)
+in a background thread.  Since the GlusterFS server is
+single-threaded, using the IO threads translator can significantly
+improve performance. This translator is best used on the server side,
+loaded just below the server protocol translator.
+
+IO threads operates by handing read and write requests off to a
+separate thread. The total number of threads in existence at a time
+is constant and configurable.
+
+@cartouche
+@table @code
+@item thread-count <n> (1)
+Number of threads to use.
+@end table
+@end cartouche
+
+@node IO Cache
+@subsection IO Cache
+@cindex io-cache (translator)
+@example
+type performance/io-cache
+@end example
+
+The IO cache translator caches data that has been read. This is useful
+if many applications read the same data multiple times, and if reads
+are much more frequent than writes (for example, IO caching may be
+useful in a web hosting environment, where most clients will simply
+read some files and only a few will write to them).
+
+The IO cache translator reads data from its child in @command{page-size} chunks.
+It caches data up to @command{cache-size} bytes. The cache is maintained as
+a prioritized least-recently-used (@acronym{LRU}) list, with priorities determined
+by user-specified patterns to match filenames.
+
+When the IO cache translator detects a write operation, the
+cache for that file is flushed.
+
+The IO cache translator periodically verifies the consistency of
+cached data, using the modification times on the files. The verification timeout
+is configurable.
+
+@cartouche
+@table @code
+@item page-size <n> (128KB)
+Size of a page.
+@item cache-size <n> (32MB)
+Total amount of data to be cached.
+@item force-revalidate-timeout <n> (1)
+Timeout to force a cache consistency verification, in seconds.
+@item priority <pattern> (*:0)
+Filename patterns listed in order of priority.
+@end table
+@end cartouche
+
+@node Booster
+@subsection Booster
+@cindex booster
+@example
+  type performance/booster
+@end example
+
+The booster translator gives applications a faster path to communicate
+read and write requests to GlusterFS. Normally, all requests to GlusterFS from
+applications go through FUSE, as indicated in @ref{Filesystems in Userspace}.
+Using the booster translator in conjunction with the GlusterFS booster shared
+library, an application can bypass the FUSE path and send read/write requests
+directly to the GlusterFS client process.
+
+The booster mechanism consists of two parts: the booster translator,
+and the booster shared library. The booster translator is meant to be
+loaded on the client side, usually at the root of the translator tree.
+The booster shared library should be @command{LD_PRELOAD}ed with the
+application.
+
+When loaded, the booster translator opens a Unix domain socket and
+listens for read/write requests on it. The booster shared library
+intercepts read and write system calls and sends the requests to the
+GlusterFS process directly using the Unix domain socket, bypassing FUSE.
+This leads to superior performance.
+
+Once you've loaded the booster translator in your volume specification file, you
+can start your application as:
+
+@example
+  $ LD_PRELOAD=/usr/local/bin/glusterfs-booster.so your_app
+@end example
+
+The booster translator accepts no options.
+
+@node Features Translators
+@section Features Translators
+
+@menu
+* POSIX Locks::
+* Fixed ID::
+@end menu
+
+@node POSIX Locks
+@subsection POSIX Locks
+@cindex record locking
+@cindex fcntl
+@cindex posix-locks (translator)
+@example
+type features/posix-locks
+@end example
+
+This translator provides storage-independent POSIX record locking
+support (@command{fcntl} locking). Typically you'll want to load this on the
+server side, just above the @acronym{POSIX} storage translator. Using this
+translator you can get both advisory locking and mandatory locking
+support.  It also handles @command{flock()} locks properly.
+
+Caveat: Consider a file that does not have its mandatory locking bits
+(+setgid, -group execution) turned on. Assume that this file is now
+opened by a process on a client that has the write-behind xlator
+loaded. The write-behind xlator does not cache anything for files
+which have mandatory locking enabled, to avoid incoherence. Let's say
+that mandatory locking is now enabled on this file through another
+client. The former client will not know about this change, and
+write-behind may erroneously report a write as successful when in
+fact it would fail because the region being written to is locked.
+
+There seems to be no easy way to fix this. To work around this
+problem, it is recommended that you never enable the mandatory bits on
+a file while it is open.
+
+@cartouche
+@table @code
+@item mandatory [on|off] (on)
+Turns mandatory locking on.
+@end table
+@end cartouche
+
+@node Fixed ID
+@subsection Fixed ID
+@cindex fixed-id (translator)
+@example
+type features/fixed-id
+@end example
+
+The fixed ID translator makes all filesystem requests from the client
+appear to be coming from a fixed, specified
+@acronym{UID}/@acronym{GID}, regardless of which user actually
+initiated the request.
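+
+As an illustration, a client-side declaration might look like the
+following sketch; the volume names and the @acronym{UID}/@acronym{GID}
+value 65534 (commonly the ``nobody'' user) are assumptions for the
+example, not defaults:
+
+@example
+volume fixed-id-example
+  type features/fixed-id
+  # assumption: present all requests as UID/GID 65534 ("nobody")
+  option fixed-uid 65534
+  option fixed-gid 65534
+  subvolumes client
+end-volume
+@end example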
+
+@cartouche
+@table @code
+@item fixed-uid <n> [if not set, not used]
+The @acronym{UID} to send to the server
+@item fixed-gid <n> [if not set, not used]
+The @acronym{GID} to send to the server
+@end table
+@end cartouche
+
+@node Miscellaneous Translators
+@section Miscellaneous Translators
+
+@menu
+* ROT-13::
+* Trace::
+@end menu
+
+@node ROT-13
+@subsection ROT-13
+@cindex rot-13 (translator)
+@example
+type encryption/rot-13
+@end example
+
+@acronym{ROT-13} is a toy translator that can ``encrypt'' and ``decrypt'' file
+contents using the @acronym{ROT-13} algorithm. @acronym{ROT-13} is a trivial
+algorithm that rotates each letter by thirteen places. Thus, 'A' becomes 'N',
+'B' becomes 'O', and 'Z' becomes 'M'.
+
+It goes without saying that you shouldn't use this translator if you need
+@emph{real} encryption (a future release of GlusterFS will have real encryption
+translators).
+
+@cartouche
+@table @code
+@item encrypt-write [on|off] (on)
+Whether to encrypt on write
+@item decrypt-read [on|off] (on)
+Whether to decrypt on read
+@end table
+@end cartouche
+
+@node Trace
+@subsection Trace
+@cindex trace (translator)
+@example
+type debug/trace
+@end example
+
+The trace translator is intended for debugging purposes. When loaded, it
+logs all the system calls received by the server or client (wherever
+trace is loaded), their arguments, and the results. You must use a GlusterFS log
+level of DEBUG (See @ref{Running GlusterFS}) for trace to work.
+
+Sample trace output (lines have been wrapped for readability):
+@cartouche
+@example
+2007-10-30 00:08:58 D [trace.c:1579:trace_opendir] trace: callid: 68
+(*this=0x8059e40, loc=0x8091984 @{path=/iozone3_283, inode=0x8091f00@},
+ fd=0x8091d50)
+
+2007-10-30 00:08:58 D [trace.c:630:trace_opendir_cbk] trace:
+(*this=0x8059e40, op_ret=4, op_errno=1, fd=0x8091d50)
+
+2007-10-30 00:08:58 D [trace.c:1602:trace_readdir] trace: callid: 69
+(*this=0x8059e40, size=4096, offset=0 fd=0x8091d50)
+
+2007-10-30 00:08:58 D [trace.c:215:trace_readdir_cbk] trace:
+(*this=0x8059e40, op_ret=0, op_errno=0, count=4)
+
+2007-10-30 00:08:58 D [trace.c:1624:trace_closedir] trace: callid: 71
+(*this=0x8059e40, *fd=0x8091d50)
+
+2007-10-30 00:08:58 D [trace.c:809:trace_closedir_cbk] trace:
+(*this=0x8059e40, op_ret=0, op_errno=1)
+@end example
+@end cartouche
+
+@node Usage Scenarios
+@chapter Usage Scenarios
+
+@section Advanced Striping
+
+This section is based on the Advanced Striping tutorial written by
+Anand Avati on the GlusterFS wiki
+@footnote{http://gluster.org/docs/index.php/Mixing_Striped_and_Regular_Files}.
+
+@subsection Mixed Storage Requirements
+
+There are two ways of scheduling I/O: at the file level (using the
+unify translator) and at the block level (using the stripe
+translator). Striped I/O is good for files that are potentially large
+and require high parallel throughput (for example, a single file of
+400GB being accessed by hundreds or thousands of systems simultaneously
+and randomly). In most cases, file-level scheduling works best.
+
+In the real world, it is desirable to mix file-level and block-level
+scheduling on a single storage volume. Alternatively, users can choose
+to have two separate volumes and hence two mount points, but the
+applications may demand a single storage system to host both.
+
+This section explains how to mix file-level scheduling with striping,
+as sketched below.
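+
+The key idea, shown in full in the next section, is to make a unify
+volume the first subvolume of a stripe volume. A minimal sketch
+follows; the volume and brick names are illustrative, not prescribed:
+
+@example
+volume unify
+  type cluster/unify
+  option scheduler rr
+  option namespace ns      # namespace volume, defined elsewhere
+  subvolumes brick-a brick-b
+end-volume
+
+volume stripe
+  type cluster/stripe
+  option block-size *.img:2MB
+  subvolumes unify brick-c brick-d
+end-volume
+@end example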
+
+@subsection Configuration Brief
+
+This setup demonstrates how users can configure the unify translator
+with an appropriate I/O scheduler for file-level scheduling, and
+stripe only the files matching certain patterns. This way, GlusterFS
+chooses the appropriate I/O profile and knows how to efficiently
+handle both types of data.
+
+A simple technique to achieve this effect is to create a stripe set of
+unify and stripe blocks, where unify is the first sub-volume. Files
+that do not match the stripe policy are passed on to the first (unify)
+sub-volume and in turn scheduled across the cluster using its
+file-level I/O scheduler.
+
+@image{advanced-stripe,44pc,,,.pdf}
+
+@subsection Preparing the GlusterFS Environment
+
+Create the directories /export/for-namespace, /export/for-unify and
+/export/for-stripe on all the storage bricks.
+
+Place the following server and client volume specification files under
+/etc/glusterfs (or the appropriate installation path) and replace the
+IP addresses / access control fields to match your environment.
+
+@cartouche
+@example
+  ## file: /etc/glusterfs/glusterfsd.vol
+   volume posix-unify
+           type storage/posix
+           option directory /export/for-unify
+   end-volume
+
+   volume posix-stripe
+           type storage/posix
+           option directory /export/for-stripe
+   end-volume
+
+   volume posix-namespace
+           type storage/posix
+           option directory /export/for-namespace
+   end-volume
+
+   volume server
+           type protocol/server
+           option transport-type tcp
+           option auth.addr.posix-unify.allow 192.168.1.*
+           option auth.addr.posix-stripe.allow 192.168.1.*
+           option auth.addr.posix-namespace.allow 192.168.1.*
+           subvolumes posix-unify posix-stripe posix-namespace
+   end-volume
+@end example
+@end cartouche
+
+@cartouche
+@example
+ ## file: /etc/glusterfs/glusterfs.vol
+   volume client-namespace
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.1
+     option remote-subvolume posix-namespace
+   end-volume
+
+   volume client-unify-1
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.1
+     option remote-subvolume posix-unify
+   end-volume
+
+   volume client-unify-2
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.2
+     option remote-subvolume posix-unify
+   end-volume
+
+   volume client-unify-3
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.3
+     option remote-subvolume posix-unify
+   end-volume
+
+   volume client-unify-4
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.4
+     option remote-subvolume posix-unify
+   end-volume
+
+   volume client-stripe-1
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.1
+     option remote-subvolume posix-stripe
+   end-volume
+
+   volume client-stripe-2
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.2
+     option remote-subvolume posix-stripe
+   end-volume
+
+   volume client-stripe-3
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.3
+     option remote-subvolume posix-stripe
+   end-volume
+
+   volume client-stripe-4
+     type protocol/client
+     option transport-type tcp
+     option remote-host 192.168.1.4
+     option remote-subvolume posix-stripe
+   end-volume
+
+   volume unify
+     type cluster/unify
+     option scheduler rr
+     option namespace client-namespace
+     subvolumes client-unify-1 client-unify-2 client-unify-3 client-unify-4
+   end-volume
+
+   volume stripe
+     type cluster/stripe
+     option block-size *.img:2MB # All files ending with .img are striped with 2MB stripe block size.
+     subvolumes unify client-stripe-1 client-stripe-2 client-stripe-3 client-stripe-4
+   end-volume
+@end example
+@end cartouche
+
+@subsection Bringing up the Storage
+
+Starting the GlusterFS server: if you have installed GlusterFS through
+a binary package, you can start the service through the init.d startup
+script. If not:
+
+@example
+[root@@server]# glusterfsd
+@end example
+
+Mounting GlusterFS volumes:
+
+@example
+[root@@client]# glusterfs -s [BRICK-IP-ADDRESS] /mnt/cluster
+@end example
+
+@subsection Improving upon this Setup
+
+The InfiniBand Verbs RDMA transport is much faster than the TCP/IP
+GigE transport.
+
+Use of performance translators such as read-ahead, write-behind,
+io-cache, io-threads, and booster is recommended.
+
+Replace the round-robin (rr) scheduler with ALU to handle more dynamic
+storage environments.
+
+@node Troubleshooting
+@chapter Troubleshooting
+
+This chapter is a general troubleshooting guide to GlusterFS. It lists
+common GlusterFS server and client error messages and debugging hints,
+and concludes with the suggested procedure for reporting bugs in
+GlusterFS.
+
+@section GlusterFS error messages
+
+@subsection Server errors
+
+@example
+glusterfsd: FATAL: could not open specfile:
+'/etc/glusterfs/glusterfsd.vol'
+@end example
+
+The GlusterFS server expects the volume specification file to be
+at @command{/etc/glusterfs/glusterfsd.vol}. The example
+specification file will be installed as
+@command{/etc/glusterfs/glusterfsd.vol.sample}. You need to edit
+it and rename it, or provide a different specification file using
+the @command{--spec-file} command line option (See @ref{Server}).
+
+@vskip 4ex
+
+@example
+gf_log_init: failed to open logfile "/usr/var/log/glusterfs/glusterfsd.log"
+             (Permission denied)
+@end example
+
+You don't have permission to create files in the
+@command{/usr/var/log/glusterfs} directory. Make sure you are running
+GlusterFS as root. Alternatively, specify a different path for the log
+file using the @command{--log-file} option (See @ref{Server}).
+
+@subsection Client errors
+
+@example
+fusermount: failed to access mountpoint /mnt:
+            Transport endpoint is not connected
+@end example
+
+A previous failed (or hung) mount of GlusterFS is preventing it from being
+mounted again in the same location. The fix is to do:
+
+@example
+# umount /mnt
+@end example
+
+and try mounting again.
+
+@vskip 4ex
+
+@strong{``Transport endpoint is not connected''.}
+
+If you get this error when you try a command such as @command{ls} or @command{cat},
+it means the GlusterFS mount did not succeed. Try running GlusterFS in @command{DEBUG}
+logging level and study the log messages to discover the cause.
+
+@vskip 4ex
+
+@strong{``Connect to server failed'', ``SERVER-ADDRESS: Connection refused''.}
+
+The GlusterFS server is not running or has died. Check your network
+connections and firewall settings. To check if the server is reachable,
+try:
+
+@example
+telnet IP-ADDRESS 6996
+@end example
+
+If the server is accessible, your `telnet' command should connect and
+block. If not, you will see an error message such as @command{telnet: Unable to
+connect to remote host: Connection refused}. 6996 is the default
+GlusterFS port. If you have changed it, then use the corresponding
+port instead.
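+
+You can also verify on the server that the @command{glusterfsd}
+process is running at all, for example:
+
+@example
+# ps ax | grep glusterfsd
+@end example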
+
+@vskip 4ex
+
+@example
+gf_log_init: failed to open logfile "/usr/var/log/glusterfs/glusterfs.log"
+             (Permission denied)
+@end example
+
+You don't have permission to create files in the
+@command{/usr/var/log/glusterfs} directory. Make sure you are running
+GlusterFS as root. Alternatively, specify a different path for the log
+file using the @command{--log-file} option (See @ref{Client}).
+
+@section FUSE error messages
+@command{modprobe fuse} fails with: ``Unknown symbol in module, or unknown parameter''.
+@cindex Red Hat Enterprise Linux
+
+If you are using fuse-2.6.x on Red Hat Enterprise Linux Workstation 4
+or Advanced Server 4 with the 2.6.9-42.ELlargesmp, 2.6.9-42.ELsmp, or
+2.6.9-42.EL kernels and get this error while loading the @acronym{FUSE}
+kernel module, you need to apply the following patch.
+
+For fuse-2.6.2:
+
+@indicateurl{http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.6.2-rhel-build.patch}
+
+For fuse-2.6.3:
+
+@indicateurl{http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.6.3-rhel-build.patch}
+
+@section AppArmor and GlusterFS
+@cindex AppArmor
+@cindex OpenSuSE
+Under OpenSuSE GNU/Linux, the AppArmor security feature does not
+allow GlusterFS to create temporary files or network socket
+connections even while running as root. You will see error messages
+like `Unable to open log file: Operation not permitted' or `Connection
+refused'. Disabling AppArmor using YaST, or properly configuring
+AppArmor to recognize @command{glusterfsd} or @command{glusterfs}/@command{fusermount},
+should solve the problem.
+
+@section Reporting a bug
+
+If you encounter a bug in GlusterFS, please follow the guidelines
+below when you report it to the mailing list. Be sure to report
+it! User feedback is crucial to the health of the project and we value
+it highly.
+
+@subsection General instructions
+
+When running GlusterFS in a non-production environment, be sure to
+build it with the following command:
+
+@example
+ $ make CFLAGS='-g -O0 -DDEBUG'
+@end example
+
+This includes debugging information, which will be helpful in getting
+backtraces (see below), and also disables optimization. Enabling
+optimization can result in incorrect line numbers being reported to
+gdb.
+
+@subsection Volume specification files
+
+Attach all relevant server and client spec files you were using when
+you encountered the bug. Also tell us the details of your setup, i.e.,
+how many clients and how many servers.
+
+@subsection Log files
+
+Set the log level of your client and server programs to @acronym{DEBUG} (by
+passing the -L @acronym{DEBUG} option) and attach the log files with your bug
+report. Obviously, if only the client is failing (for example), you
+only need to send us the client log file.
+
+@subsection Backtrace
+
+If GlusterFS has encountered a segmentation fault or has crashed for
+some other reason, include the backtrace with the bug report. You can
+get the backtrace using the following procedure.
+
+Run the GlusterFS client or server inside gdb.
+
+@example
+ $ gdb ./glusterfs
+ (gdb) set args -f client.spec -N -l/path/to/log/file -LDEBUG /mnt/point
+ (gdb) run
+@end example
+
+Now when the process segfaults, you can get the backtrace by typing:
+
+@example
+ (gdb) bt
+@end example
+
+If the GlusterFS process has crashed and dumped a core file (you can
+find this in / if running as a daemon and in the current directory
+otherwise), you can do:
+
+@example
+ $ gdb /path/to/glusterfs /path/to/core.<pid>
+@end example
+
+and then get the backtrace.
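+
+As before, once gdb has loaded the core file, the backtrace is
+printed by typing:
+
+@example
+ (gdb) bt
+@end example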
+
+If the GlusterFS server or client seems to be hung, you can get
+the backtrace by attaching gdb to the process. First get the @command{PID} of
+the process (using ps), and then do:
+
+@example
+ $ gdb ./glusterfs <pid>
+@end example
+
+Press Ctrl-C to interrupt the process and then generate the backtrace.
+
+@subsection Reproducing the bug
+
+If the bug is reproducible, please include the steps necessary to do
+so. If the bug is not reproducible, send us the bug report anyway.
+
+@subsection Other information
+
+If you think it is relevant, also send us the version of @acronym{FUSE}
+you're using, your kernel version, and your platform.
+
+@node GNU Free Documentation Licence
+@appendix GNU Free Documentation Licence
+@include fdl.texi
+
+@node Index
+@unnumbered Index
+@printindex cp
+
+@bye
diff --git a/doc/user-guide/xlator.odg b/doc/user-guide/xlator.odg
Binary files differnew file mode 100644
index 00000000000..179a65f6e26
--- /dev/null
+++ b/doc/user-guide/xlator.odg
diff --git a/doc/user-guide/xlator.pdf b/doc/user-guide/xlator.pdf
Binary files differnew file mode 100644
index 00000000000..a07e14d67d2
--- /dev/null
+++ b/doc/user-guide/xlator.pdf
