diff --git a/doc/admin-guide/en-US/markdown/admin_Hadoop.md b/doc/admin-guide/en-US/markdown/admin_Hadoop.md
new file mode 100644
index 00000000000..2894fa71302
--- /dev/null
+++ b/doc/admin-guide/en-US/markdown/admin_Hadoop.md
@@ -0,0 +1,170 @@
+Managing Hadoop Compatible Storage
+==================================
+
+GlusterFS provides compatibility with Apache Hadoop by using the
+standard file system APIs available in Hadoop, offering a new storage
+option for Hadoop deployments. Existing MapReduce-based applications can
+use GlusterFS without modification. This also makes data within Hadoop
+deployments accessible to any file-based or object-based application.
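Hadoop selects a file system implementation from its configuration. The scheme of a path URI, or of `fs.default.name` when the path has no scheme, picks the property `fs.<scheme>.impl`, whose value names the implementing class. The following Python function is only a simplified model of that lookup for illustration. It is not Hadoop code, and unlike Hadoop it ignores the built-in defaults that `core-default.xml` supplies for schemes such as `hdfs`. The property values come from the sample `core-site.xml` in this guide.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def resolve_filesystem_class(conf: Dict[str, str], path: Optional[str] = None) -> str:
    """Simplified model of how Hadoop maps a path to a FileSystem class.

    The scheme comes from the path itself, or from fs.default.name when the
    path has none; it then selects the fs.<scheme>.impl property.
    """
    scheme = urlparse(path or "").scheme or urlparse(conf["fs.default.name"]).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise KeyError(f"no implementation configured for scheme {scheme!r} ({key})")
    return conf[key]


# Values from the sample core-site.xml in this guide.
sample_conf = {
    "fs.default.name": "glusterfs://fedora1:9000",
    "fs.glusterfs.impl": "org.apache.hadoop.fs.glusterfs.GlusterFileSystem",
}

# A scheme-less path falls back to fs.default.name, so it resolves to GlusterFS.
print(resolve_filesystem_class(sample_conf, "/user/hadoop/input"))
# prints org.apache.hadoop.fs.glusterfs.GlusterFileSystem
```

In this model, a job that uses plain paths never names the storage system directly; the configuration alone decides which class serves them.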
+
+Architecture Overview
+=====================
+
+The following diagram illustrates Hadoop integration with GlusterFS:
+
+Advantages
+==========
+
+The following are the advantages of Hadoop Compatible Storage with
+GlusterFS:
+
+- Provides simultaneous file-based and object-based access within
+ Hadoop.
+
+- Eliminates the centralized metadata server.
+
+- Provides compatibility with existing MapReduce applications; no
+ rewrite is required.
+
+- Provides a fault-tolerant file system.
+
+Preparing to Install Hadoop Compatible Storage
+==============================================
+
+This section lists the prerequisites and dependencies for installing
+Hadoop Compatible Storage.
+
+Prerequisites
+-------------
+
+The following are the prerequisites for installing Hadoop Compatible
+Storage:
+
+- Hadoop 0.20.2 installed, configured, and running on all the
+ machines in the cluster.
+
+- Java Runtime Environment
+
+- Maven (mandatory only if you are building the plugin from the
+ source)
+
+- JDK (mandatory only if you are building the plugin from the source)
+
+- `getfattr` command-line utility
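Before installing, you can check that the command-line tools from this list are available on each machine. The sketch below only looks for executables on `PATH` with Python's `shutil.which`. The executable names (`java` for the Java Runtime Environment, `hadoop` for the Hadoop installation, `mvn` and `javac` for building from source) are assumptions about a typical setup, and the check does not confirm the Hadoop version or that Hadoop is running.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the prerequisites above.
RUNTIME_TOOLS = ["java", "getfattr", "hadoop"]
# Needed only when building the plugin from source.
BUILD_TOOLS = ["mvn", "javac"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Report, but do not fail, so the script is safe to run anywhere.
    for tool in missing_tools(RUNTIME_TOOLS):
        print(f"not found on PATH: {tool}")
```

If you plan to build the plugin from source, check `RUNTIME_TOOLS + BUILD_TOOLS` instead.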
+
+Installing and Configuring Hadoop Compatible Storage
+====================================================
+
+This section describes how to install and configure Hadoop Compatible
+Storage in your storage environment and verify that it is functioning
+correctly.
+
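+1. Download the `glusterfs-hadoop-0.20.2-0.1.x86_64.rpm` file to each
+ server in your cluster. You can download the file from
+ <http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm>.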
+
+2. On each server in your cluster, run the following command to install
+ Hadoop Compatible Storage:
+
+ `# rpm -ivh --nodeps glusterfs-hadoop-0.20.2-0.1.x86_64.rpm`
+
+ The following files are installed:
+
+ - `/usr/local/lib/glusterfs-<Hadoop-version>-<gluster_plugin_version>.jar`
+
+ - `/usr/local/lib/conf/core-site.xml`
+
+3. (Optional) To install Hadoop Compatible Storage in a different
+ location, run the following command instead of the one in step 2:
+
+ `# rpm -ivh --nodeps --prefix /usr/local/glusterfs/hadoop glusterfs-hadoop-0.20.2-0.1.x86_64.rpm`
+
+4. Edit the `conf/core-site.xml` file. The following is a sample
+ `conf/core-site.xml` file:
+
+     <configuration>
+       <!-- Maps the glusterfs:// URI scheme to the plugin's FileSystem class -->
+       <property>
+         <name>fs.glusterfs.impl</name>
+         <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
+       </property>
+
+       <property>
+         <name>fs.default.name</name>
+         <value>glusterfs://fedora1:9000</value>
+       </property>
+
+       <property>
+         <name>fs.glusterfs.volname</name>
+         <value>hadoopvol</value>
+       </property>
+
+       <property>
+         <name>fs.glusterfs.mount</name>
+         <value>/mnt/glusterfs</value>
+       </property>
+
+       <property>
+         <name>fs.glusterfs.server</name>
+         <value>fedora2</value>
+       </property>
+
+       <property>
+         <name>quick.slave.io</name>
+         <value>Off</value>
+       </property>
+     </configuration>
+
+ The following are the configurable fields:
+
+ | Property Name | Sample Value | Description |
+ |---|---|---|
+ | `fs.default.name` | `glusterfs://fedora1:9000` | The default file system URI: `glusterfs://`, any hostname in the cluster as the server, and any port number. |
+ | `fs.glusterfs.volname` | `hadoopvol` | The GlusterFS volume to mount. |
+ | `fs.glusterfs.mount` | `/mnt/glusterfs` | The directory used to FUSE-mount the volume. |
+ | `fs.glusterfs.server` | `fedora2` | Any hostname or IP address in the cluster except the client/master. |
+ | `quick.slave.io` | `Off` | Performance tuning option. If set to `On`, the plugin tries to perform I/O directly on the disk file system (such as ext3 or ext4) on which the file resides, which improves read performance and can make jobs run faster. **Note:** This option has not been widely tested. |
+
+5. Create soft links in Hadoop's library and configuration directories
+ for the files installed in step 2 (or step 3) using the following
+ commands:
+
+ `# ln -s <source-file> <link-name>`
+
+ For example,
+
+ `# ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar /lib/glusterfs-0.20.2-0.1.jar`
+
+ `# ln -s /usr/local/lib/conf/core-site.xml /conf/core-site.xml`
+
+6. (Optional) Instead of repeating the steps above on every server, you
+ can run the following command on the Hadoop master to build the plugin
+ and deploy it along with the `core-site.xml` file:
+
+ `# build-deploy-jar.py -d -c `
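After editing `core-site.xml`, a quick automated check can catch mistakes such as a missing property or a typo in the implementation class name. The following Python sketch parses the Hadoop `<configuration>`/`<property>` layout with the standard `xml.etree.ElementTree` module and compares it with the sample in this guide. Treating every property from the sample as required is an assumption, so adjust `REQUIRED` to suit your deployment.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlparse

# Properties from the sample core-site.xml, all treated as required (an assumption).
REQUIRED = [
    "fs.glusterfs.impl",
    "fs.default.name",
    "fs.glusterfs.volname",
    "fs.glusterfs.mount",
    "fs.glusterfs.server",
]
EXPECTED_IMPL = "org.apache.hadoop.fs.glusterfs.GlusterFileSystem"


def read_properties(xml_text: str) -> Dict[str, str]:
    """Parse <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_core_site(props: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = [f"missing property {key}" for key in REQUIRED if key not in props]
    impl = props.get("fs.glusterfs.impl")
    if impl is not None and impl != EXPECTED_IMPL:
        # Catches typos such as a stray space inside the class name.
        problems.append(f"fs.glusterfs.impl is {impl!r}, expected {EXPECTED_IMPL!r}")
    default_fs = props.get("fs.default.name")
    if default_fs is not None and urlparse(default_fs).scheme != "glusterfs":
        problems.append(f"fs.default.name {default_fs!r} does not use the glusterfs:// scheme")
    return problems
```

For example, `check_core_site(read_properties(Path("/usr/local/lib/conf/core-site.xml").read_text()))`, with `Path` imported from `pathlib`, returns an empty list when every check passes.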
+
+Starting and Stopping the Hadoop MapReduce Daemon
+=================================================
+
+To start and stop the MapReduce daemon:
+
+- To start the MapReduce daemon manually, enter the following command:
+
+ `# /bin/start-mapred.sh`
+
+- To stop the MapReduce daemon manually, enter the following command:
+
+ `# /bin/stop-mapred.sh`
+
+> **Note**
+>
+> You must start the Hadoop MapReduce daemon on all servers.
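If you manage several servers, a small script can generate the start or stop command for each one. The sketch below only builds `ssh` command lines and runs nothing. The host names, the Hadoop installation directory (`hadoop_home`), and passwordless SSH access are assumptions; the scripts are taken to live in `bin/` under that directory, as in a standard Hadoop 0.20 layout.

```python
import shlex
from typing import List, Sequence


def daemon_commands(
    servers: Sequence[str], hadoop_home: str, action: str = "start"
) -> List[List[str]]:
    """Build one ssh command per server that runs start-mapred.sh or stop-mapred.sh.

    Nothing is executed; pass each command to subprocess.run() to run it.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    script = f"{hadoop_home.rstrip('/')}/bin/{action}-mapred.sh"
    # ssh hands its command string to the remote shell, so quote the path.
    return [["ssh", host, shlex.quote(script)] for host in servers]


# Hypothetical host names and installation directory.
for cmd in daemon_commands(["fedora1", "fedora2"], "/usr/local/hadoop"):
    print(" ".join(cmd))
```

To run the commands rather than print them, pass each list to `subprocess.run(cmd, check=True)`.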
+
+ []: http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm