Diffstat (limited to 'glusterfs-hadoop/README')
-rw-r--r--  glusterfs-hadoop/README  182
1 files changed, 182 insertions, 0 deletions
diff --git a/glusterfs-hadoop/README b/glusterfs-hadoop/README
new file mode 100644
index 000000000..3026f11c0
--- /dev/null
+++ b/glusterfs-hadoop/README
@@ -0,0 +1,182 @@
+GlusterFS Hadoop Plugin
+=======================
+
+INTRODUCTION
+------------
+
+This document describes how to use GlusterFS (http://www.gluster.org/) as a backing store with Hadoop.
+
+
+REQUIREMENTS
+------------
+
+* Supported OS is GNU/Linux
+* GlusterFS and Hadoop installed on all machines in the cluster
+* Java Runtime Environment (JRE)
+* Maven (needed if you are building the plugin from source)
+* JDK (needed if you are building the plugin from source)
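+
+A quick way to check the build prerequisites on a host (a minimal sketch; each command supports a version flag):
+
+  # java -version
+  # mvn -version
+  # glusterfs --version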
+
+NOTE: The plugin relies on two *nix command-line utilities to function properly. They are:
+
+* mount: used to mount GlusterFS volumes.
+* getfattr: used to fetch extended attributes of a file.
+
+Make sure both are installed on all hosts in the cluster and that their locations are in the $PATH
+environment variable.
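+
+A quick way to confirm both utilities are present and resolvable via $PATH:
+
+  # type mount getfattr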
+
+
+INSTALLATION
+------------
+
+** NOTE: The examples below are for Hadoop version 0.20.2 ($GLUSTER_HOME/hdfs/0.20.2) **
+
+* Building the plugin from source [Maven (http://maven.apache.org/) and a JDK are required to build the plugin]
+
+  Change to the glusterfs-hadoop directory in the GlusterFS source tree and build the plugin:
+
+ # cd $GLUSTER_HOME/hdfs/0.20.2
+ # mvn package
+
+  On a successful build, the plugin jar will be present in the `target` directory.
+  (NOTE: the version number is part of the plugin file name)
+
+ # ls target/
+ classes glusterfs-0.20.2-0.1.jar maven-archiver surefire-reports test-classes
+ ^^^^^^^^^^^^^^^^^^
+
+  Copy the plugin to the lib/ directory under $HADOOP_HOME:
+
+ # cp target/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib
+
+  Copy the sample configuration file that ships with this source (conf/core-site.xml) to the conf
+  directory under $HADOOP_HOME:
+
+ # cp conf/core-site.xml $HADOOP_HOME/conf
+
+* Installing the plugin from RPM
+
+ See the plugin documentation for installing from RPM.
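+
+  Installation is typically a single rpm invocation (the package file name below is illustrative;
+  use the RPM you actually obtained):
+
+  # rpm -ivh glusterfs-hadoop-0.20.2-0.1.x86_64.rpm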
+
+
+CLUSTER INSTALLATION
+--------------------
+
+  In case it is tedious to do the above step(s) on all hosts in the cluster, use the build-and-deploy.py script to
+  build the plugin in one place and deploy it (along with the configuration file) on all other hosts.
+
+  This script should be run on the host that is the Hadoop master [Job Tracker].
+
+* STEPS (you would have done steps 1 and 2 anyway while deploying Hadoop)
+
+  1. Edit the conf/slaves file in your Hadoop distribution, one line for each slave (see the example
+     after the option list below).
+  2. Set up password-less SSH between the Hadoop master and the slave(s).
+  3. Edit conf/core-site.xml with all GlusterFS-related configuration (see CONFIGURATION).
+  4. Run the following:
+ # cd $GLUSTER_HOME/hdfs/0.20.2/tools
+ # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c
+
+  This will build the plugin and copy it (and the config file) to all slaves listed in $HADOOP_HOME/conf/slaves.
+
+ Script options:
+ -b : build the plugin
+ -d : location of hadoop directory
+ -c : deploy core-site.xml
+ -m : deploy mapred-site.xml
+ -h : deploy hadoop-env.sh
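+
+  For example, the conf/slaves file and the password-less SSH setup from steps 1 and 2 might look
+  like this (hostnames are illustrative):
+
+  # cat conf/slaves
+  slave1.example.com
+  slave2.example.com
+
+  # ssh-keygen -t rsa
+  # ssh-copy-id slave1.example.com
+  # ssh-copy-id slave2.example.com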
+
+
+CONFIGURATION
+-------------
+
+ All plugin configuration is done in a single XML file (core-site.xml) with <name><value> tags in each <property>
+ block.
+
+  A brief explanation of the tunables and the values they accept (change them wherever needed) is given below.
+
+ name: fs.glusterfs.impl
+ value: org.apache.hadoop.fs.glusterfs.GlusterFileSystem
+
+ The default FileSystem API to use (there is little reason to modify this).
+
+ name: fs.default.name
+ value: glusterfs://server:port
+
+     The default name that Hadoop uses to represent files as URIs (typically a server:port tuple). Use any host
+     in the cluster as the server and any port number. This option has to be in server:port format for Hadoop
+     to create file URIs, but it is not otherwise used by the plugin.
+
+ name: fs.glusterfs.volname
+ value: volume-dist-rep
+
+ The volume to mount.
+
+
+ name: fs.glusterfs.mount
+ value: /mnt/glusterfs
+
+ This is the directory that the plugin will use to mount (FUSE mount) the volume.
+
+ name: fs.glusterfs.server
+ value: 192.168.1.36, hackme.zugzug.org
+
+     To mount a volume, the plugin needs to know the hostname or the IP of a GlusterFS server in the cluster.
+     Specify it here.
+
+ name: quick.slave.io
+ value: [On/Off], [Yes/No], [1/0]
+
+     NOTE: This option is currently untested.
+
+     This is a performance tunable. Hadoop schedules jobs on the hosts that hold the relevant part of the
+     file's data, and the job then does I/O on the file (via FUSE in the case of GlusterFS). When this option
+     is set, the plugin will try to do I/O directly against the backend filesystem (ext3, ext4, etc.) on which
+     the file resides, so read performance improves and jobs run faster.
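+
+  Putting the above together, a minimal core-site.xml sketch (the server address, port, and volume
+  name are illustrative; substitute your own):
+
+  <configuration>
+    <property>
+      <name>fs.glusterfs.impl</name>
+      <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
+    </property>
+    <property>
+      <name>fs.default.name</name>
+      <value>glusterfs://192.168.1.36:9000</value>
+    </property>
+    <property>
+      <name>fs.glusterfs.volname</name>
+      <value>volume-dist-rep</value>
+    </property>
+    <property>
+      <name>fs.glusterfs.mount</name>
+      <value>/mnt/glusterfs</value>
+    </property>
+    <property>
+      <name>fs.glusterfs.server</name>
+      <value>192.168.1.36</value>
+    </property>
+  </configuration>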
+
+
+USAGE
+-----
+
+  Once configured, start the Hadoop Map/Reduce daemons:
+
+ # cd $HADOOP_HOME
+ # ./bin/start-mapred.sh
+
+  Once the Map/Reduce job and task trackers are up, all I/O will go to GlusterFS.
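+
+  A quick sanity check that the trackers are running and that file operations go through the plugin
+  (jps ships with the JDK; the listing should reflect the contents of the GlusterFS volume):
+
+  # jps | grep -i tracker
+  # ./bin/hadoop fs -ls /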
+
+
+FOR HACKERS
+-----------
+
+* Source Layout
+
+** version specific: hdfs/<version> **
+./src
+./src/main
+./src/main/java
+./src/main/java/org
+./src/main/java/org/apache
+./src/main/java/org/apache/hadoop
+./src/main/java/org/apache/hadoop/fs
+./src/main/java/org/apache/hadoop/fs/glusterfs
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickClass.java
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSXattr.java <--- Fetch/Parse Extended Attributes of a file
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java <--- Input Stream (instantiated during open() calls; quick read from the backend FS)
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickRepl.java
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEOutputStream.java <--- Output Stream (instantiated during creat() calls)
+./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java <--- Entry Point for the plugin (extends Hadoop FileSystem class)
+./src/test
+./src/test/java
+./src/test/java/org
+./src/test/java/org/apache
+./src/test/java/org/apache/hadoop
+./src/test/java/org/apache/hadoop/fs
+./src/test/java/org/apache/hadoop/fs/glusterfs
+./src/test/java/org/apache/hadoop/fs/glusterfs/AppTest.java <--- Your test cases go here (if any :-))
+./tools/build-deploy-jar.py <--- Build and Deployment Script
+./conf
+./conf/core-site.xml <--- Sample configuration file
+./pom.xml <--- Build file (used by Maven)
+
+** toplevel: hdfs/ **
+./COPYING <--- License
+./README <--- This file