Managing Hadoop Compatible Storage

Preparing to Install Hadoop Compatible Storage

This section describes the prerequisites and lists the dependencies that are installed during the installation of Hadoop Compatible Storage.

Pre-requisites

The following are the prerequisites for installing Hadoop Compatible Storage:

- Hadoop 0.20.2 is installed, configured, and running on all the machines in the cluster.
- Java Runtime Environment
- Maven (mandatory only if you are building the plugin from the source)
- JDK (mandatory only if you are building the plugin from the source)
- getfattr, a command-line utility
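The tool prerequisites can be spot-checked from a shell on each node. The following is a minimal sketch rather than part of the plugin: the helper names have_cmd and check_prereqs are hypothetical, and it assumes the tools are installed under their usual command names (java, getfattr, mvn, javac). It does not check that Hadoop itself is installed or running.

```shell
#!/bin/sh
# Report which prerequisite tools are available on PATH.
# Helper names are illustrative only; they are not shipped with the plugin.

# have_cmd NAME: succeed if NAME resolves to a command on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_prereqs NAME...: print one "ok"/"missing" line per tool and
# return non-zero if at least one tool is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            printf '%-8s %s\n' ok "$tool"
        else
            printf '%-8s %s\n' missing "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Needed on every node.
check_prereqs java getfattr || echo "Install the missing tools before continuing."
# Needed only when building the plugin from the source.
check_prereqs mvn javac || echo "mvn and javac are required only for a source build."
```

Run the script on every machine in the cluster; a "missing" line for mvn or javac can be ignored on nodes where you install the prebuilt RPM.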

Installing and Configuring Hadoop Compatible Storage

This section describes how to install and configure Hadoop Compatible Storage in your storage environment and verify that it is functioning correctly.

To install and configure Hadoop Compatible Storage:

1. Download the glusterfs-hadoop-0.20.2-0.1.x86_64.rpm file to each server in your cluster. You can download the file from .

2. To install Hadoop Compatible Storage on all servers in your cluster, run the following command:

   # rpm -ivh --nodeps glusterfs-hadoop-0.20.2-0.1.x86_64.rpm

   The following files will be extracted:

   /usr/local/lib/glusterfs-Hadoop-version-gluster_plugin_version.jar
   /usr/local/lib/conf/core-site.xml

3. (Optional) To install Hadoop Compatible Storage in a different location, run the following command:

   # rpm -ivh --nodeps --prefix /usr/local/glusterfs/hadoop glusterfs-hadoop-0.20.2-0.1.x86_64.rpm

4. Edit the conf/core-site.xml file. The following is a sample conf/core-site.xml file:

   <configuration>
     <property>
       <name>fs.glusterfs.impl</name>
       <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
     </property>
     <property>
       <name>fs.default.name</name>
       <value>glusterfs://fedora1:9000</value>
     </property>
     <property>
       <name>fs.glusterfs.volname</name>
       <value>hadoopvol</value>
     </property>
     <property>
       <name>fs.glusterfs.mount</name>
       <value>/mnt/glusterfs</value>
     </property>
     <property>
       <name>fs.glusterfs.server</name>
       <value>fedora2</value>
     </property>
     <property>
       <name>quick.slave.io</name>
       <value>Off</value>
     </property>
   </configuration>

   The following are the configurable fields:

   Property Name         Default Value             Description
   fs.default.name       glusterfs://fedora1:9000  Any hostname in the cluster as the server and any port number.
   fs.glusterfs.volname  hadoopvol                 GlusterFS volume to mount.
   fs.glusterfs.mount    /mnt/glusterfs            The directory used to fuse mount the volume.
   fs.glusterfs.server   fedora2                   Any hostname or IP address on the cluster except the client/master.
   quick.slave.io        Off                       Performance tunable option. If this option is set to On, the
                                                   plugin tries to perform I/O directly from the disk file system
                                                   (such as ext3 or ext4) on which the file resides. This improves
                                                   read performance, so jobs run faster. This option has not been
                                                   widely tested.

5. Create a soft link in Hadoop's library and configuration directories for the files extracted by the RPM (Step 2, or Step 3 if you installed to a different location) using the following command:

   # ln -s <target file> <link name>

   For example,

   # ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib/glusterfs-0.20.2-0.1.jar
   # ln -s /usr/local/lib/conf/core-site.xml $HADOOP_HOME/conf/core-site.xml

6. (Optional) Instead of repeating the above steps, you can run the following command on the Hadoop master to build the plugin and deploy it along with the core-site.xml file:

   # build-deploy-jar.py -d $HADOOP_HOME -c
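The soft-link step can also be scripted so that it is repeatable on every node. The following is a sketch under stated assumptions, not the packaged procedure: link_plugin and JAR_NAME are hypothetical names, the jar file name is taken from the example above, and /usr/local/lib is assumed to be the install location. Unlike a plain ln -s, the sketch uses ln -sf, which replaces an existing link; a pre-existing regular core-site.xml is first moved aside to core-site.xml.orig.

```shell
#!/bin/sh
# Symlink the plugin jar and core-site.xml into a Hadoop tree.
# JAR_NAME follows the example in this section (assumption); adjust it
# to match the file that the RPM actually installed.
JAR_NAME="glusterfs-0.20.2-0.1.jar"

# link_plugin PREFIX HADOOP_DIR
#   PREFIX      directory holding the jar and conf/core-site.xml
#   HADOOP_DIR  Hadoop installation directory (normally $HADOOP_HOME)
link_plugin() {
    prefix=$1
    hadoop_dir=$2

    # Refuse to create dangling links.
    for src in "$prefix/$JAR_NAME" "$prefix/conf/core-site.xml"; do
        if [ ! -f "$src" ]; then
            echo "not found: $src" >&2
            return 1
        fi
    done

    mkdir -p "$hadoop_dir/lib" "$hadoop_dir/conf" || return 1

    # Keep a copy of a pre-existing regular core-site.xml before replacing it.
    dest="$hadoop_dir/conf/core-site.xml"
    if [ -f "$dest" ] && [ ! -L "$dest" ]; then
        mv "$dest" "$dest.orig" || return 1
    fi

    # -f replaces an existing link, so the function can be re-run safely.
    ln -sf "$prefix/$JAR_NAME" "$hadoop_dir/lib/$JAR_NAME" || return 1
    ln -sf "$prefix/conf/core-site.xml" "$dest" || return 1
}

# Run only when HADOOP_HOME is set in the environment.
if [ -n "${HADOOP_HOME:-}" ]; then
    link_plugin /usr/local/lib "$HADOOP_HOME" || echo "linking failed" >&2
fi
```

If you installed the RPM with --prefix, pass that directory as the first argument instead of /usr/local/lib.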