doc: Added admin guide

Change-Id: Ic60558dee0d20df0c2a1bf41e4bd072ae4774912 BUG: 811311 Signed-off-by: Vijay Bellur <vijay@gluster.com> Reviewed-on: http://review.gluster.com/3209 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: divya M N <divyasetty@gmail.com> Reviewed-by: divya M N <divyasetty@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
author: Vijay Bellur <vijay@gluster.com> 2012-04-23 12:51:50 +0530
committer: Anand Avati <avati@redhat.com> 2012-04-23 14:51:35 -0700
commit: 4c9e8fad23836d87b0c4327e990c789630fe5b97 (patch)
tree: e2402a2a0b8135add5cf0c6d810f141253877259 /doc/admin-guide/en-US/admin_Hadoop.xml
parent: 664daecef49d5e497bb5dd867fc1f51b046d4bf2 (diff)
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/admin-guide/en-US/admin_Hadoop.xml b/doc/admin-guide/en-US/admin_Hadoop.xml
new file mode 100644
index 00000000000..08bac89615b
--- /dev/null
+++ b/doc/admin-guide/en-US/admin_Hadoop.xml
@@ -0,0 +1,244 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<!-- This document was created with Syntext Serna Free. --><!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
+<!ENTITY % BOOK_ENTITIES SYSTEM "Administration_Guide.ent">
+%BOOK_ENTITIES;
+]>
+<chapter id="chap-Administration_Guide-Hadoop">
+  <title>Managing Hadoop Compatible Storage </title>
+  <para>GlusterFS  provides compatibility for Apache Hadoop and it uses the standard file system
+APIs available in Hadoop to provide a new storage option for Hadoop deployments. Existing
+MapReduce based applications can use GlusterFS   seamlessly. This new functionality opens up data
+within Hadoop deployments to any file-based or object-based application.
+
+ </para>
+  <section id="sect-Administration_Guide-Hadoop-Introduction-Architecture_Overview">
+    <title>Architecture Overview </title>
+    <para>The following diagram illustrates Hadoop integration with GlusterFS:
+<mediaobject>
+        <imageobject>
+          <imagedata fileref="images/Hadoop_Architecture.png"/>
+        </imageobject>
+      </mediaobject>
+  </para>
+  </section>
+  <section id="sect-Administration_Guide-Hadoop-Introduction-Advantages">
+    <title>Advantages </title>
+    <para>
+The following are the advantages of Hadoop Compatible Storage with GlusterFS:
+
+   
+  </para>
+    <itemizedlist>
+      <listitem>
+        <para>Provides simultaneous file-based and object-based access within Hadoop.
+</para>
+      </listitem>
+      <listitem>
+        <para>Eliminates the centralized metadata server.
+</para>
+      </listitem>
+      <listitem>
+        <para>Provides compatibility with MapReduce applications and rewrite is not required.
+</para>
+      </listitem>
+      <listitem>
+        <para>Provides a fault tolerant file system.
+</para>
+      </listitem>
+    </itemizedlist>
+  </section>
+  <section>
+    <title>Preparing to Install Hadoop Compatible Storage</title>
+    <para>This section provides information on pre-requisites and list of dependencies that will be installed
+during installation of Hadoop compatible storage.
+
+</para>
+    <section id="sect-Administration_Guide-Hadoop-Preparation">
+      <title>Pre-requisites </title>
+      <para>The following are the pre-requisites to install  Hadoop Compatible
+Storage :
+
+  </para>
+      <itemizedlist>
+        <listitem>
+          <para>Hadoop 0.20.2 is installed, configured, and is running on all the machines in the cluster.
+</para>
+        </listitem>
+        <listitem>
+          <para>Java Runtime Environment
+</para>
+        </listitem>
+        <listitem>
+          <para>Maven (mandatory only if you are building the plugin from the source)
+</para>
+        </listitem>
+        <listitem>
+          <para>JDK (mandatory only if you are building the plugin from the source)
+</para>
+        </listitem>
+        <listitem>
+          <para>getfattr
+- command line utility</para>
+        </listitem>
+      </itemizedlist>
+    </section>
+  </section>
+  <section>
+    <title>Installing, and Configuring Hadoop Compatible Storage</title>
+    <para>This section describes how to install and configure Hadoop Compatible Storage in your storage
+environment and verify that it is functioning correctly.
+
+</para>
+    <orderedlist>
+      <para>To install and configure Hadoop compatible storage:</para>
+      <listitem>
+        <para>Download  <filename>glusterfs-hadoop-0.20.2-0.1.x86_64.rpm</filename> file to each server on your cluster. You can download the file from <ulink url="http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm"/>.
+
+</para>
+      </listitem>
+      <listitem>
+        <para>To install Hadoop Compatible Storage on all servers in your cluster, run the following command:
+</para>
+        <para><command># rpm –ivh --nodeps glusterfs-hadoop-0.20.2-0.1.x86_64.rpm</command>
+</para>
+        <para>The following files will be extracted:
+ </para>
+        <itemizedlist>
+          <listitem>
+            <para>/usr/local/lib/glusterfs-<replaceable>Hadoop-version-gluster_plugin_version</replaceable>.jar </para>
+          </listitem>
+          <listitem>
+            <para> /usr/local/lib/conf/core-site.xml</para>
+          </listitem>
+        </itemizedlist>
+      </listitem>
+      <listitem>
+        <para>(Optional) To install Hadoop Compatible Storage in a different location, run the following
+command:
+</para>
+        <para><command># rpm –ivh --nodeps –prefix /usr/local/glusterfs/hadoop glusterfs-hadoop- 0.20.2-0.1.x86_64.rpm</command>
+</para>
+      </listitem>
+      <listitem>
+        <para>Edit the <filename>conf/core-site.xml</filename> file. The following is the sample <filename>conf/core-site.xml</filename> file:
+</para>
+        <para><programlisting>&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;fs.glusterfs.impl&lt;/name&gt;
+    &lt;value&gt;org.apache.hadoop.fs.glusterfs.Gluster FileSystem&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+   &lt;name&gt;fs.default.name&lt;/name&gt;
+   &lt;value&gt;glusterfs://fedora1:9000&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+   &lt;name&gt;fs.glusterfs.volname&lt;/name&gt;
+   &lt;value&gt;hadoopvol&lt;/value&gt;
+&lt;/property&gt;  
+ 
+&lt;property&gt;
+   &lt;name&gt;fs.glusterfs.mount&lt;/name&gt;
+   &lt;value&gt;/mnt/glusterfs&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+   &lt;name&gt;fs.glusterfs.server&lt;/name&gt;
+   &lt;value&gt;fedora2&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+   &lt;name&gt;quick.slave.io&lt;/name&gt;
+   &lt;value&gt;Off&lt;/value&gt;
+&lt;/property&gt;
+&lt;/configuration&gt;
+</programlisting></para>
+        <para>The following are the configurable fields:
+</para>
+        <para><informaltable frame="none">
+            <tgroup cols="3">
+              <colspec colnum="1" colname="c0" colsep="0"/>
+              <colspec colnum="2" colname="c1" colsep="0"/>
+              <colspec colnum="3" colname="c2" colsep="0"/>
+              <thead>
+                <row>
+                  <entry>Property Name </entry>
+                  <entry>Default Value </entry>
+                  <entry>Description </entry>
+                </row>
+              </thead>
+              <tbody>
+                <row>
+                  <entry>fs.default.name </entry>
+                  <entry>glusterfs://fedora1:9000</entry>
+                  <entry>Any hostname in the cluster as the server and any port number. </entry>
+                </row>
+                <row>
+                  <entry>fs.glusterfs.volname </entry>
+                  <entry>hadoopvol </entry>
+                  <entry>GlusterFS volume to mount. </entry>
+                </row>
+                <row>
+                  <entry>fs.glusterfs.mount </entry>
+                  <entry>/mnt/glusterfs</entry>
+                  <entry>The directory used to fuse mount the volume.</entry>
+                </row>
+                <row>
+                  <entry>fs.glusterfs.server </entry>
+                  <entry>fedora2</entry>
+                  <entry>Any hostname or IP address on the cluster except the client/master. </entry>
+                </row>
+                <row>
+                  <entry>quick.slave.io </entry>
+                  <entry>Off </entry>
+                  <entry>Performance tunable option. If this option is set to On, the plugin will try to perform I/O directly from the disk file system (like ext3 or ext4) the file resides on. Hence read performance will improve and job would run faster. <note>
+                      <para>This option is not tested widely</para>
+                    </note></entry>
+                </row>
+              </tbody>
+            </tgroup>
+          </informaltable></para>
+      </listitem>
+      <listitem>
+        <para>Create a soft link in Hadoop’s library and configuration directory for the downloaded files (in
+Step 3) using the following commands:
+</para>
+        <para><command># ln -s <replaceable>&lt;target location&gt; &lt;source location</replaceable>&gt;</command>
+</para>
+        <para>For example,
+</para>
+        <para><command># ln –s /usr/local/lib/glusterfs-0.20.2-0.1.jar <replaceable>$HADOOP_HOME</replaceable>/lib/glusterfs-0.20.2-0.1.jar</command>
+</para>
+        <para><command># ln –s /usr/local/lib/conf/core-site.xml <replaceable>$HADOOP_HOME</replaceable>/conf/core-site.xml </command></para>
+      </listitem>
+      <listitem>
+        <para> (Optional) You can run the following command on Hadoop master to build the plugin and deploy
+it along with core-site.xml file, instead of repeating the above steps:
+</para>
+        <para><command># build-deploy-jar.py -d <replaceable>$HADOOP_HOME</replaceable> -c </command></para>
+      </listitem>
+    </orderedlist>
+  </section>
+  <section>
+    <title>Starting and Stopping the Hadoop MapReduce Daemon</title>
+    <para>To start and stop MapReduce daemon</para>
+    <itemizedlist>
+      <listitem>
+        <para>To start MapReduce daemon manually, enter the following command:
+</para>
+        <para><command># <replaceable>$HADOOP_HOME</replaceable>/bin/start-mapred.sh</command>
+</para>
+      </listitem>
+      <listitem>
+        <para>To stop MapReduce daemon manually, enter the following command:
+</para>
+        <para><command># <replaceable>$HADOOP_HOME</replaceable>/bin/stop-mapred.sh </command></para>
+      </listitem>
+    </itemizedlist>
+    <para><note>
+        <para>You must start Hadoop MapReduce daemon on all servers.
+</para>
+      </note></para>
+  </section>
+</chapter>
author	Vijay Bellur <vijay@gluster.com>	2012-04-23 12:51:50 +0530
committer	Anand Avati <avati@redhat.com>	2012-04-23 14:51:35 -0700
commit	4c9e8fad23836d87b0c4327e990c789630fe5b97 (patch)
tree	e2402a2a0b8135add5cf0c6d810f141253877259 /doc/admin-guide/en-US/admin_Hadoop.xml
parent	664daecef49d5e497bb5dd867fc1f51b046d4bf2 (diff)