Diffstat (limited to 'doc/legacy/docbook/admin_Hadoop.xml')
-rw-r--r-- | doc/legacy/docbook/admin_Hadoop.xml | 244 |
1 files changed, 0 insertions, 244 deletions
diff --git a/doc/legacy/docbook/admin_Hadoop.xml b/doc/legacy/docbook/admin_Hadoop.xml
deleted file mode 100644
index 08bac89615b..00000000000
--- a/doc/legacy/docbook/admin_Hadoop.xml
+++ /dev/null
@@ -1,244 +0,0 @@
<?xml version='1.0' encoding='UTF-8'?>
<!-- This document was created with Syntext Serna Free. --><!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Administration_Guide.ent">
%BOOK_ENTITIES;
]>
<chapter id="chap-Administration_Guide-Hadoop">
  <title>Managing Hadoop Compatible Storage</title>
  <para>GlusterFS provides compatibility with Apache Hadoop. It uses the standard file system APIs available in Hadoop to provide a new storage option for Hadoop deployments, so existing MapReduce-based applications can use GlusterFS seamlessly. This functionality also opens up data within Hadoop deployments to any file-based or object-based application.</para>
  <section id="sect-Administration_Guide-Hadoop-Introduction-Architecture_Overview">
    <title>Architecture Overview</title>
    <para>The following diagram illustrates Hadoop integration with GlusterFS:
<mediaobject>
        <imageobject>
          <imagedata fileref="images/Hadoop_Architecture.png"/>
        </imageobject>
      </mediaobject>
    </para>
  </section>
  <section id="sect-Administration_Guide-Hadoop-Introduction-Advantages">
    <title>Advantages</title>
    <para>The following are the advantages of Hadoop Compatible Storage with GlusterFS:</para>
    <itemizedlist>
      <listitem>
        <para>Provides simultaneous file-based and object-based access within Hadoop.</para>
      </listitem>
      <listitem>
        <para>Eliminates the centralized metadata server.</para>
      </listitem>
      <listitem>
        <para>Provides compatibility with existing MapReduce applications without requiring a rewrite.</para>
      </listitem>
      <listitem>
        <para>Provides a fault-tolerant file system.</para>
      </listitem>
    </itemizedlist>
  </section>
  <section>
    <title>Preparing to Install Hadoop Compatible Storage</title>
    <para>This section lists the prerequisites and the dependencies that are installed during installation of Hadoop Compatible Storage.</para>
    <section id="sect-Administration_Guide-Hadoop-Preparation">
      <title>Prerequisites</title>
      <para>The following are the prerequisites for installing Hadoop Compatible Storage:</para>
      <itemizedlist>
        <listitem>
          <para>Hadoop 0.20.2 is installed, configured, and running on all machines in the cluster.</para>
        </listitem>
        <listitem>
          <para>Java Runtime Environment</para>
        </listitem>
        <listitem>
          <para>Maven (mandatory only if you are building the plugin from source)</para>
        </listitem>
        <listitem>
          <para>JDK (mandatory only if you are building the plugin from source)</para>
        </listitem>
        <listitem>
          <para>getfattr command-line utility</para>
        </listitem>
      </itemizedlist>
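      <para>Before proceeding, it can help to confirm the prerequisites on each machine. The following commands are a minimal sanity check, assuming the Hadoop, Java, Maven, and attr packages are already on the PATH; package names and versions may differ on your systems:</para>
      <para><programlisting># Confirm which Hadoop build is on the PATH (should report 0.20.2)
hadoop version

# Confirm a Java Runtime Environment (and JDK, if building from source) is available
java -version

# Confirm Maven is available (only needed when building the plugin from source)
mvn -version

# Confirm the getfattr utility from the attr package is installed
getfattr --version</programlisting></para>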
    </section>
  </section>
  <section>
    <title>Installing and Configuring Hadoop Compatible Storage</title>
    <para>This section describes how to install and configure Hadoop Compatible Storage in your storage environment and verify that it is functioning correctly.</para>
    <orderedlist>
      <para>To install and configure Hadoop Compatible Storage:</para>
      <listitem>
        <para>Download the <filename>glusterfs-hadoop-0.20.2-0.1.x86_64.rpm</filename> file to each server in your cluster. You can download the file from <ulink url="http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm"/>.</para>
      </listitem>
      <listitem>
        <para>To install Hadoop Compatible Storage on all servers in your cluster, run the following command:</para>
        <para><command># rpm -ivh --nodeps glusterfs-hadoop-0.20.2-0.1.x86_64.rpm</command></para>
        <para>The following files will be extracted:</para>
        <itemizedlist>
          <listitem>
            <para>/usr/local/lib/glusterfs-<replaceable>Hadoop-version-gluster_plugin_version</replaceable>.jar</para>
          </listitem>
          <listitem>
            <para>/usr/local/lib/conf/core-site.xml</para>
          </listitem>
        </itemizedlist>
      </listitem>
      <listitem>
        <para>(Optional) To install Hadoop Compatible Storage in a different location, run the following command:</para>
        <para><command># rpm -ivh --nodeps --prefix /usr/local/glusterfs/hadoop glusterfs-hadoop-0.20.2-0.1.x86_64.rpm</command></para>
      </listitem>
      <listitem>
        <para>Edit the <filename>conf/core-site.xml</filename> file. The following is a sample <filename>conf/core-site.xml</filename> file:</para>
        <para><programlisting><configuration>
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>glusterfs://fedora1:9000</value>
  </property>

  <property>
    <name>fs.glusterfs.volname</name>
    <value>hadoopvol</value>
  </property>

  <property>
    <name>fs.glusterfs.mount</name>
    <value>/mnt/glusterfs</value>
  </property>

  <property>
    <name>fs.glusterfs.server</name>
    <value>fedora2</value>
  </property>

  <property>
    <name>quick.slave.io</name>
    <value>Off</value>
  </property>
</configuration>
</programlisting></para>
        <para>The following are the configurable fields:</para>
        <para><informaltable frame="none">
            <tgroup cols="3">
              <colspec colnum="1" colname="c0" colsep="0"/>
              <colspec colnum="2" colname="c1" colsep="0"/>
              <colspec colnum="3" colname="c2" colsep="0"/>
              <thead>
                <row>
                  <entry>Property Name</entry>
                  <entry>Default Value</entry>
                  <entry>Description</entry>
                </row>
              </thead>
              <tbody>
                <row>
                  <entry>fs.default.name</entry>
                  <entry>glusterfs://fedora1:9000</entry>
                  <entry>Any hostname in the cluster can be used as the server, together with any free port number.</entry>
                </row>
                <row>
                  <entry>fs.glusterfs.volname</entry>
                  <entry>hadoopvol</entry>
                  <entry>The GlusterFS volume to mount.</entry>
                </row>
                <row>
                  <entry>fs.glusterfs.mount</entry>
                  <entry>/mnt/glusterfs</entry>
                  <entry>The directory used to FUSE-mount the volume.</entry>
                </row>
                <row>
                  <entry>fs.glusterfs.server</entry>
                  <entry>fedora2</entry>
                  <entry>Any hostname or IP address in the cluster except the client/master.</entry>
                </row>
                <row>
                  <entry>quick.slave.io</entry>
                  <entry>Off</entry>
                  <entry>Performance tuning option. If this option is set to On, the plugin tries to perform I/O directly on the disk file system (such as ext3 or ext4) where the file resides, so read performance improves and jobs run faster.
<note>
                      <para>This option has not been widely tested.</para>
                    </note></entry>
                </row>
              </tbody>
            </tgroup>
          </informaltable></para>
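        <para>The volume named in fs.glusterfs.volname must already exist and be mounted at the directory given in fs.glusterfs.mount on each server. The following is a minimal sketch of creating, starting, and mounting such a volume; the hostnames, brick paths, and replica count are examples only and must be adapted to your cluster:</para>
        <para><programlisting># Create a two-way replicated volume named hadoopvol (example bricks)
# gluster volume create hadoopvol replica 2 fedora2:/export/brick1 fedora3:/export/brick1

# Start the volume
# gluster volume start hadoopvol

# FUSE-mount the volume at the path configured in fs.glusterfs.mount
# mkdir -p /mnt/glusterfs
# mount -t glusterfs fedora2:/hadoopvol /mnt/glusterfs</programlisting></para>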
      </listitem>
      <listitem>
        <para>Create a soft link in Hadoop's library and configuration directory for the files installed in Step 2, using the following command:</para>
        <para><command># ln -s <replaceable><target location> <link location></replaceable></command></para>
        <para>For example:</para>
        <para><command># ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar <replaceable>$HADOOP_HOME</replaceable>/lib/glusterfs-0.20.2-0.1.jar</command></para>
        <para><command># ln -s /usr/local/lib/conf/core-site.xml <replaceable>$HADOOP_HOME</replaceable>/conf/core-site.xml</command></para>
      </listitem>
      <listitem>
        <para>(Optional) Instead of repeating the above steps, you can run the following command on the Hadoop master to build the plugin and deploy it along with the core-site.xml file:</para>
        <para><command># build-deploy-jar.py -d <replaceable>$HADOOP_HOME</replaceable> -c</command></para>
      </listitem>
    </orderedlist>
  </section>
  <section>
    <title>Starting and Stopping the Hadoop MapReduce Daemon</title>
    <para>To start and stop the MapReduce daemon:</para>
    <itemizedlist>
      <listitem>
        <para>To start the MapReduce daemon manually, enter the following command:</para>
        <para><command># <replaceable>$HADOOP_HOME</replaceable>/bin/start-mapred.sh</command></para>
      </listitem>
      <listitem>
        <para>To stop the MapReduce daemon manually, enter the following command:</para>
        <para><command># <replaceable>$HADOOP_HOME</replaceable>/bin/stop-mapred.sh</command></para>
      </listitem>
    </itemizedlist>
    <para><note>
        <para>You must start the Hadoop MapReduce daemon on all servers.</para>
      </note></para>
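    <para>Once the daemon is running, you can verify the setup end to end by listing the GlusterFS-backed file system and running one of the stock example jobs. This is a minimal sketch, assuming the examples jar bundled with Hadoop 0.20.2; the jar name and the /input and /output paths are examples and may need to be adjusted:</para>
    <para><programlisting># List the root of the file system defined in fs.default.name
$HADOOP_HOME/bin/hadoop fs -ls /

# Copy a local text file in and run the bundled wordcount example job
$HADOOP_HOME/bin/hadoop fs -mkdir /input
$HADOOP_HOME/bin/hadoop fs -put /etc/hosts /input/hosts
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-examples.jar wordcount /input /output

# Inspect the job output
$HADOOP_HOME/bin/hadoop fs -cat /output/part-*</programlisting></para>
  </section>
</chapter>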