
GlusterFS Hadoop Plugin
=======================

INTRODUCTION
------------

This document describes how to use GlusterFS (http://www.gluster.org/) as a backing store with Hadoop.

This plugin replaces the default Hadoop file system (typically, the Hadoop Distributed File System) with the
GlusterFileSystem, which writes to a local directory that is a FUSE mount of a GlusterFS volume.

REQUIREMENTS
------------

* Supported OS is GNU/Linux
* GlusterFS installed on all machines in the cluster
* Java Runtime Environment (JRE)
* Maven 3.x (needed if you are building the plugin from source)
* JDK 6+ (needed if you are building the plugin from source)

NOTE: The plugin relies on two *nix command line utilities to function properly. They are:

* mount: Used to mount GlusterFS volumes.
* getfattr: Used to fetch extended attributes of a file.

Make sure they are installed on all hosts in the cluster and that their locations are in the $PATH
environment variable.
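
For example, to verify that both utilities are available on a host (getfattr is typically provided by the
"attr" package on RPM-based distributions):

  # which mount getfattr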


INSTALLATION
------------

** NOTE: Example below is for Hadoop version 0.20.2 ($GLUSTER_HOME/hdfs/0.20.2) **

* Building the plugin from source [Maven (http://maven.apache.org/) and a JDK are required to build the plugin]

  Change to glusterfs-hadoop directory in the GlusterFS source tree and build the plugin.

  # cd $GLUSTER_HOME/hdfs/0.20.2
  # mvn package

  On a successful build the plugin will be present in the `target` directory.
  (NOTE: the version number is part of the plugin jar's file name)

  # ls target/
  classes  glusterfs-0.20.2-0.1.jar  maven-archiver  surefire-reports  test-classes

  Copy the plugin to the lib/ directory in your $HADOOP_HOME dir.

  # cp target/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib

  Copy the sample configuration file that ships with this source (conf/core-site.xml) to the conf
  directory in your $HADOOP_HOME dir.

  # cp conf/core-site.xml $HADOOP_HOME/conf

* Installing the plugin from RPM

  See the plugin documentation for installing from RPM.


CLUSTER INSTALLATION
--------------------

  If it is tedious to do the above step(s) on all hosts in the cluster, use the build-and-deploy.py script to
  build the plugin in one place and deploy it (along with the configuration file) to all other hosts.

  The script should be run on the host that is the hadoop master [Job Tracker].

* STEPS (Steps 1 and 2 will already have been done while deploying Hadoop)

  1. Edit the conf/slaves file in your hadoop distribution; one line for each slave.
  2. Set up password-less ssh between the hadoop master and the slave(s).
  3. Edit conf/core-site.xml with all glusterfs related configurations (see CONFIGURATION).
  4. Run the following:
     # cd $GLUSTER_HOME/hdfs/0.20.2/tools
     # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c

     This will build the plugin and copy it (and the config file) to all slaves (mentioned in $HADOOP_HOME/conf/slaves).

   Script options:
     -b : build the plugin
     -d : location of hadoop directory
     -c : deploy core-site.xml
     -m : deploy mapred-site.xml
     -h : deploy hadoop-env.sh
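
     For example, to build the plugin and deploy core-site.xml, mapred-site.xml and hadoop-env.sh in a single
     run:

     # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c -m -h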


CONFIGURATION
-------------

  All plugin configuration is done in a single XML file (core-site.xml) with <name> and <value> tags inside each
  <property> block.

  A brief explanation of the tunables and the values they accept (change them wherever needed) is given below.

  name:  fs.glusterfs.impl
  value: org.apache.hadoop.fs.glusterfs.GlusterFileSystem

         The default FileSystem API to use (there is little reason to modify this).

  name:  fs.default.name
  value: glusterfs:///

         The default name that hadoop uses to represent files as a URI (typically a server:port tuple). Use any host
         in the cluster as the server and any port number. This option has to be in server:port format for hadoop
         to create file URIs, but it is not used by the plugin.

  name:  fs.glusterfs.volname
  value: volume-dist-rep

         The volume to mount.


  name:  fs.glusterfs.mount
  value: /mnt/glusterfs

         This is the directory where the gluster volume is mounted.

  name:  fs.glusterfs.server
  value: localhost

         To mount a volume, the plugin needs to know the hostname or the IP of a GlusterFS server in the cluster.
         Specify it here.
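
  For reference, a core-site.xml fragment that combines the properties above would look like the following
  (the volume name, mount point and server shown here are only examples; adjust them for your cluster):

  <configuration>
    <property>
      <name>fs.glusterfs.impl</name>
      <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>glusterfs:///</value>
    </property>
    <property>
      <name>fs.glusterfs.volname</name>
      <value>volume-dist-rep</value>
    </property>
    <property>
      <name>fs.glusterfs.mount</name>
      <value>/mnt/glusterfs</value>
    </property>
    <property>
      <name>fs.glusterfs.server</name>
      <value>localhost</value>
    </property>
  </configuration>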

USAGE
-----

  Once configured, start the Hadoop Map/Reduce daemons:

  # cd $HADOOP_HOME
  # ./bin/start-mapred.sh

  If the map/reduce job/task trackers are up, all I/O will be done to GlusterFS.
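
  To confirm that Hadoop is talking to GlusterFS, list the filesystem root through Hadoop and check that it
  matches the contents of the gluster mount, e.g.:

  # ./bin/hadoop fs -ls /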


FOR HACKERS
-----------

* Source Layout (./src/)

For the overall architecture, see https://forge.gluster.org/hadoop/pages/Architecture

Currently, we use the hadoop RawLocalFileSystem as the basis and wrap it with the GlusterVolume class. That
class is then used by the Hadoop 1x (GlusterFileSystem) and Hadoop 2x (GlusterFs) adapters.

./tools/build-deploy-jar.py                                                  <--- Build and Deployment Script
./conf/core-site.xml                                                         <--- Sample configuration file
./pom.xml                                                                    <--- build XML file (used by maven)

./COPYING                                                                    <--- License
./README                                                                     <--- This file



JENKINS
-------

  Two ways to make the Jenkins service run as root:

  #Method 1) Modify JENKINS_USER in /etc/sysconfig/jenkins
  JENKINS_USER=root

  #Method 2) Directly modify /etc/init.d/jenkins
  #daemon --user "$JENKINS_USER" --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null
  echo "WARNING: RUNNING AS ROOT"
  daemon --user root --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null


BUILDING 
--------

Building requires a working gluster mount for the unit tests.
The unit tests read test resources from glusterconfig.properties, a file which should be present.

1) Edit your .bashrc, or else run the following at your terminal:

export GLUSTER_MOUNT=/mnt/glusterfs
export HCFS_FILE_SYSTEM_CONNECTOR=org.apache.hadoop.fs.test.connector.glusterfs.GlusterFileSystemTestConnector 
export HCFS_CLASSNAME=org.apache.hadoop.fs.glusterfs.GlusterFileSystem

(In Eclipse (see below) you will add these in the "Run Configurations" menu,
under VM arguments, prefixed with -D, for example: "-DGLUSTER_MOUNT=x -DHCFS_FILE_SYSTEM_CONNECTOR=y ...")

2) run: 
   mvn clean package 
   
3) The jar artifact will be in target/
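
As an alternative to editing .bashrc, the same variables (example values as above) can be set just for a
one-off build:

   GLUSTER_MOUNT=/mnt/glusterfs \
   HCFS_FILE_SYSTEM_CONNECTOR=org.apache.hadoop.fs.test.connector.glusterfs.GlusterFileSystemTestConnector \
   HCFS_CLASSNAME=org.apache.hadoop.fs.glusterfs.GlusterFileSystem \
   mvn clean package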

DEVELOPING
----------

0) Create a mock gluster mount: 
 
 #Create a raw disk image and format it...
 sudo mkdir -p /export
 sudo truncate -s 1G /export/debugging_fun.brick
 sudo mkfs.xfs /export/debugging_fun.brick

 #Mount it as a loopback fs
 sudo mkdir -p /mnt/mybrick
 sudo mount -o loop /export/debugging_fun.brick /mnt/mybrick

 #Now make a directory for the brick, and a mount point for gluster itself
 sudo mkdir -p /mnt/mybrick/glusterbrick
 sudo mkdir -p /mnt/glusterfs
 MNT="/mnt/glusterfs"
 BRICK="/mnt/mybrick/glusterbrick"

 #Create a gluster volume that writes to the brick
 #(replace 10.10.61.230 with this host's hostname or IP)
 sudo gluster volume create HadoopVol 10.10.61.230:$BRICK

 #Start the volume, then mount it at the gluster mount point
 sudo gluster volume start HadoopVol
 sudo mount -t glusterfs $(hostname):HadoopVol $MNT
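
 To sanity-check that the mock volume is online and mounted before running the tests:

 #Confirm the volume started and the mount is visible
 sudo gluster volume info HadoopVol
 df -h $MNT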

1) Run "mvn eclipse:eclipse", and import into eclipse.

2) Add the exported env variables above via Run Configurations as described in the above section.

3) Develop and run unit tests as you would any other java app. 

glusterfs-hadoop's People

Contributors

avati, childsb, jayunit100, jeffvance, mattf, mbukatov, mikebonnet, msvbhat, rootfs, vbellur


glusterfs-hadoop's Issues

Performance impact of glusterFS vs HDFS

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we see some strange behaviour with respect to performance and functionality.

Using teragen on the same physical cluster of 8 nodes, we get comparable results with both HDFS and GlusterFS. However, with terasort there is a huge performance impact when using GlusterFS.

SW used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

HDFS Teragen results:

application_1393510237328_0004 root TeraGen default Thu, 27 Feb 2014 14:17:07 GMT Thu, 27 Feb 2014 14:18:16 GMT FINISHED SUCCEEDED

========== preparing terasort data==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
Deleted /tmp/HiBench/Terasort/Input
14/02/27 15:17:06 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:17:06 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:17:06 INFO terasort.TeraSort: Generating 1000000000 using 96
14/02/27 15:17:06 INFO mapreduce.JobSubmitter: number of splits:96
14/02/27 15:17:06 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:17:06 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:17:06 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:17:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393510237328_0004
14/02/27 15:17:07 INFO client.YarnClientImpl: Submitted application application_1393510237328_0004 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:17:07 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393510237328_0004/
14/02/27 15:17:07 INFO mapreduce.Job: Running job: job_1393510237328_0004
14/02/27 15:17:11 INFO mapreduce.Job: Job job_1393510237328_0004 running in uber mode : false
14/02/27 15:17:11 INFO mapreduce.Job: map 0% reduce 0%
14/02/27 15:17:21 INFO mapreduce.Job: map 1% reduce 0%
14/02/27 15:17:22 INFO mapreduce.Job: map 7% reduce 0%
14/02/27 15:17:23 INFO mapreduce.Job: map 11% reduce 0%
14/02/27 15:17:24 INFO mapreduce.Job: map 13% reduce 0%
14/02/27 15:17:25 INFO mapreduce.Job: map 17% reduce 0%
14/02/27 15:17:26 INFO mapreduce.Job: map 19% reduce 0%
14/02/27 15:17:27 INFO mapreduce.Job: map 21% reduce 0%
14/02/27 15:17:28 INFO mapreduce.Job: map 23% reduce 0%
14/02/27 15:17:29 INFO mapreduce.Job: map 26% reduce 0%
14/02/27 15:17:30 INFO mapreduce.Job: map 28% reduce 0%
14/02/27 15:17:31 INFO mapreduce.Job: map 31% reduce 0%
14/02/27 15:17:32 INFO mapreduce.Job: map 34% reduce 0%
14/02/27 15:17:33 INFO mapreduce.Job: map 36% reduce 0%
14/02/27 15:17:34 INFO mapreduce.Job: map 38% reduce 0%
14/02/27 15:17:35 INFO mapreduce.Job: map 40% reduce 0%
14/02/27 15:17:36 INFO mapreduce.Job: map 41% reduce 0%
14/02/27 15:17:37 INFO mapreduce.Job: map 43% reduce 0%
14/02/27 15:17:38 INFO mapreduce.Job: map 45% reduce 0%
14/02/27 15:17:39 INFO mapreduce.Job: map 47% reduce 0%
14/02/27 15:17:40 INFO mapreduce.Job: map 49% reduce 0%
14/02/27 15:17:41 INFO mapreduce.Job: map 51% reduce 0%
14/02/27 15:17:42 INFO mapreduce.Job: map 53% reduce 0%
14/02/27 15:17:43 INFO mapreduce.Job: map 55% reduce 0%
14/02/27 15:17:44 INFO mapreduce.Job: map 57% reduce 0%
14/02/27 15:17:45 INFO mapreduce.Job: map 59% reduce 0%
14/02/27 15:17:46 INFO mapreduce.Job: map 60% reduce 0%
14/02/27 15:17:47 INFO mapreduce.Job: map 63% reduce 0%
14/02/27 15:17:48 INFO mapreduce.Job: map 65% reduce 0%
14/02/27 15:17:49 INFO mapreduce.Job: map 66% reduce 0%
14/02/27 15:17:50 INFO mapreduce.Job: map 68% reduce 0%
14/02/27 15:17:51 INFO mapreduce.Job: map 70% reduce 0%
14/02/27 15:17:52 INFO mapreduce.Job: map 72% reduce 0%
14/02/27 15:17:53 INFO mapreduce.Job: map 74% reduce 0%
14/02/27 15:17:54 INFO mapreduce.Job: map 76% reduce 0%
14/02/27 15:17:55 INFO mapreduce.Job: map 77% reduce 0%
14/02/27 15:17:56 INFO mapreduce.Job: map 79% reduce 0%
14/02/27 15:17:57 INFO mapreduce.Job: map 80% reduce 0%
14/02/27 15:17:58 INFO mapreduce.Job: map 82% reduce 0%
14/02/27 15:17:59 INFO mapreduce.Job: map 84% reduce 0%
14/02/27 15:18:00 INFO mapreduce.Job: map 85% reduce 0%
14/02/27 15:18:01 INFO mapreduce.Job: map 87% reduce 0%
14/02/27 15:18:02 INFO mapreduce.Job: map 89% reduce 0%
14/02/27 15:18:03 INFO mapreduce.Job: map 90% reduce 0%
14/02/27 15:18:04 INFO mapreduce.Job: map 91% reduce 0%
14/02/27 15:18:05 INFO mapreduce.Job: map 93% reduce 0%
14/02/27 15:18:06 INFO mapreduce.Job: map 94% reduce 0%
14/02/27 15:18:07 INFO mapreduce.Job: map 96% reduce 0%
14/02/27 15:18:08 INFO mapreduce.Job: map 97% reduce 0%
14/02/27 15:18:09 INFO mapreduce.Job: map 99% reduce 0%
14/02/27 15:18:10 INFO mapreduce.Job: map 100% reduce 0%
14/02/27 15:18:11 INFO mapreduce.Job: Job job_1393510237328_0004 completed successfully
14/02/27 15:18:12 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=7449964
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=8251
HDFS: Number of bytes written=100000000000
HDFS: Number of read operations=384
HDFS: Number of large read operations=0
HDFS: Number of write operations=192
Job Counters
Killed map tasks=2
Launched map tasks=98
Other local map tasks=98
Total time spent by all maps in occupied slots (ms)=4080121
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Input split bytes=8251
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=10904
CPU time spent (ms)=1803670
Physical memory (bytes) snapshot=34591154176
Virtual memory (bytes) snapshot=159040536576
Total committed heap usage (bytes)=91137507328
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=2147523228284173905
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=100000000000
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS

GlusterFS teragen results:

application_1393512197149_0001 yarn TeraGen default Thu, 27 Feb 2014 14:44:05 GMT Thu, 27 Feb 2014 14:47:24 GMT FINISHED SUCCEEDED
History

========== preparing terasort data==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Write buffer size : 131072
rm: `/tmp/HiBench/Terasort/Input': No such file or directory
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:44:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:44:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:44:05 INFO terasort.TeraSort: Generating 1000000000 using 96
14/02/27 15:44:05 INFO mapreduce.JobSubmitter: number of splits:96
14/02/27 15:44:05 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:44:05 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:44:05 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:44:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393512197149_0001
14/02/27 15:44:05 INFO client.YarnClientImpl: Submitted application application_1393512197149_0001 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:44:05 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393512197149_0001/
14/02/27 15:44:05 INFO mapreduce.Job: Running job: job_1393512197149_0001
14/02/27 15:44:16 INFO mapreduce.Job: Job job_1393512197149_0001 running in uber mode : false
14/02/27 15:44:16 INFO mapreduce.Job: map 0% reduce 0%
14/02/27 15:44:33 INFO mapreduce.Job: map 1% reduce 0%
14/02/27 15:44:35 INFO mapreduce.Job: map 2% reduce 0%
14/02/27 15:44:36 INFO mapreduce.Job: map 6% reduce 0%
14/02/27 15:44:37 INFO mapreduce.Job: map 7% reduce 0%
14/02/27 15:44:38 INFO mapreduce.Job: map 8% reduce 0%
14/02/27 15:44:39 INFO mapreduce.Job: map 10% reduce 0%
14/02/27 15:44:40 INFO mapreduce.Job: map 11% reduce 0%
14/02/27 15:44:41 INFO mapreduce.Job: map 12% reduce 0%
14/02/27 15:44:42 INFO mapreduce.Job: map 14% reduce 0%
14/02/27 15:44:43 INFO mapreduce.Job: map 15% reduce 0%
14/02/27 15:44:45 INFO mapreduce.Job: map 16% reduce 0%
14/02/27 15:44:46 INFO mapreduce.Job: map 17% reduce 0%
14/02/27 15:44:48 INFO mapreduce.Job: map 18% reduce 0%
14/02/27 15:44:51 INFO mapreduce.Job: map 19% reduce 0%
14/02/27 15:44:52 INFO mapreduce.Job: map 20% reduce 0%
14/02/27 15:44:53 INFO mapreduce.Job: map 21% reduce 0%
14/02/27 15:44:55 INFO mapreduce.Job: map 22% reduce 0%
14/02/27 15:44:57 INFO mapreduce.Job: map 23% reduce 0%
14/02/27 15:44:59 INFO mapreduce.Job: map 24% reduce 0%
14/02/27 15:45:00 INFO mapreduce.Job: map 25% reduce 0%
14/02/27 15:45:02 INFO mapreduce.Job: map 26% reduce 0%
14/02/27 15:45:04 INFO mapreduce.Job: map 27% reduce 0%
14/02/27 15:45:05 INFO mapreduce.Job: map 28% reduce 0%
14/02/27 15:45:07 INFO mapreduce.Job: map 29% reduce 0%
14/02/27 15:45:09 INFO mapreduce.Job: map 30% reduce 0%
14/02/27 15:45:11 INFO mapreduce.Job: map 31% reduce 0%
14/02/27 15:45:13 INFO mapreduce.Job: map 32% reduce 0%
14/02/27 15:45:15 INFO mapreduce.Job: map 33% reduce 0%
14/02/27 15:45:17 INFO mapreduce.Job: map 34% reduce 0%
14/02/27 15:45:18 INFO mapreduce.Job: map 35% reduce 0%
14/02/27 15:45:19 INFO mapreduce.Job: map 36% reduce 0%
14/02/27 15:45:21 INFO mapreduce.Job: map 37% reduce 0%
14/02/27 15:45:22 INFO mapreduce.Job: map 38% reduce 0%
14/02/27 15:45:24 INFO mapreduce.Job: map 39% reduce 0%
14/02/27 15:45:25 INFO mapreduce.Job: map 40% reduce 0%
14/02/27 15:45:27 INFO mapreduce.Job: map 41% reduce 0%
14/02/27 15:45:28 INFO mapreduce.Job: map 42% reduce 0%
14/02/27 15:45:30 INFO mapreduce.Job: map 43% reduce 0%
14/02/27 15:45:32 INFO mapreduce.Job: map 44% reduce 0%
14/02/27 15:45:33 INFO mapreduce.Job: map 45% reduce 0%
14/02/27 15:45:34 INFO mapreduce.Job: map 46% reduce 0%
14/02/27 15:45:35 INFO mapreduce.Job: map 47% reduce 0%
14/02/27 15:45:37 INFO mapreduce.Job: map 48% reduce 0%
14/02/27 15:45:38 INFO mapreduce.Job: map 49% reduce 0%
14/02/27 15:45:40 INFO mapreduce.Job: map 50% reduce 0%
14/02/27 15:45:42 INFO mapreduce.Job: map 51% reduce 0%
14/02/27 15:45:43 INFO mapreduce.Job: map 52% reduce 0%
14/02/27 15:45:46 INFO mapreduce.Job: map 54% reduce 0%
14/02/27 15:45:48 INFO mapreduce.Job: map 55% reduce 0%
14/02/27 15:45:49 INFO mapreduce.Job: map 56% reduce 0%
14/02/27 15:45:51 INFO mapreduce.Job: map 57% reduce 0%
14/02/27 15:45:52 INFO mapreduce.Job: map 58% reduce 0%
14/02/27 15:45:54 INFO mapreduce.Job: map 59% reduce 0%
14/02/27 15:45:55 INFO mapreduce.Job: map 60% reduce 0%
14/02/27 15:45:57 INFO mapreduce.Job: map 61% reduce 0%
14/02/27 15:45:58 INFO mapreduce.Job: map 62% reduce 0%
14/02/27 15:46:00 INFO mapreduce.Job: map 63% reduce 0%
14/02/27 15:46:01 INFO mapreduce.Job: map 64% reduce 0%
14/02/27 15:46:03 INFO mapreduce.Job: map 65% reduce 0%
14/02/27 15:46:05 INFO mapreduce.Job: map 66% reduce 0%
14/02/27 15:46:07 INFO mapreduce.Job: map 67% reduce 0%
14/02/27 15:46:09 INFO mapreduce.Job: map 68% reduce 0%
14/02/27 15:46:10 INFO mapreduce.Job: map 69% reduce 0%
14/02/27 15:46:12 INFO mapreduce.Job: map 70% reduce 0%
14/02/27 15:46:14 INFO mapreduce.Job: map 71% reduce 0%
14/02/27 15:46:16 INFO mapreduce.Job: map 72% reduce 0%
14/02/27 15:46:18 INFO mapreduce.Job: map 73% reduce 0%
14/02/27 15:46:19 INFO mapreduce.Job: map 74% reduce 0%
14/02/27 15:46:21 INFO mapreduce.Job: map 75% reduce 0%
14/02/27 15:46:23 INFO mapreduce.Job: map 76% reduce 0%
14/02/27 15:46:26 INFO mapreduce.Job: map 77% reduce 0%
14/02/27 15:46:28 INFO mapreduce.Job: map 78% reduce 0%
14/02/27 15:46:30 INFO mapreduce.Job: map 79% reduce 0%
14/02/27 15:46:31 INFO mapreduce.Job: map 80% reduce 0%
14/02/27 15:46:33 INFO mapreduce.Job: map 81% reduce 0%
14/02/27 15:46:34 INFO mapreduce.Job: map 82% reduce 0%
14/02/27 15:46:37 INFO mapreduce.Job: map 83% reduce 0%
14/02/27 15:46:38 INFO mapreduce.Job: map 84% reduce 0%
14/02/27 15:46:40 INFO mapreduce.Job: map 85% reduce 0%
14/02/27 15:46:42 INFO mapreduce.Job: map 86% reduce 0%
14/02/27 15:46:44 INFO mapreduce.Job: map 87% reduce 0%
14/02/27 15:46:46 INFO mapreduce.Job: map 88% reduce 0%
14/02/27 15:46:47 INFO mapreduce.Job: map 89% reduce 0%
14/02/27 15:46:49 INFO mapreduce.Job: map 90% reduce 0%
14/02/27 15:46:50 INFO mapreduce.Job: map 91% reduce 0%
14/02/27 15:46:52 INFO mapreduce.Job: map 92% reduce 0%
14/02/27 15:46:54 INFO mapreduce.Job: map 93% reduce 0%
14/02/27 15:46:56 INFO mapreduce.Job: map 94% reduce 0%
14/02/27 15:47:01 INFO mapreduce.Job: map 96% reduce 0%
14/02/27 15:47:04 INFO mapreduce.Job: map 97% reduce 0%
14/02/27 15:47:10 INFO mapreduce.Job: map 98% reduce 0%
14/02/27 15:47:13 INFO mapreduce.Job: map 99% reduce 0%
14/02/27 15:47:16 INFO mapreduce.Job: map 100% reduce 0%
14/02/27 15:47:19 INFO mapreduce.Job: Job job_1393512197149_0001 completed successfully
14/02/27 15:47:19 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=7448524
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=400410
GLUSTERFS: Number of bytes written=100000000000
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
Job Counters
Killed map tasks=10
Launched map tasks=106
Other local map tasks=106
Total time spent by all maps in occupied slots (ms)=10068071
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Input split bytes=8251
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=8753
CPU time spent (ms)=1116410
Physical memory (bytes) snapshot=19736702976
Virtual memory (bytes) snapshot=105021358080
Total committed heap usage (bytes)=45473071104
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=2147523228284173905
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=100000000000
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:47:21 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS

Now HDFS Terasort results:

application_1393510237328_0006 root TeraSort default Thu, 27 Feb 2014 14:23:18 GMT Thu, 27 Feb 2014 14:26:11 GMT FINISHED SUCCEEDED
History

========== running terasort bench ==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
Deleted /tmp/HiBench/Terasort/Output
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort -D mapreduce.job.reduces=48 /tmp/HiBench/Terasort/Input /tmp/HiBench/Terasort/Output
14/02/27 15:23:16 INFO terasort.TeraSort: starting
14/02/27 15:23:17 INFO input.FileInputFormat: Total input paths to process : 96
Spent 274ms computing base-splits.
Spent 10ms computing TeraScheduler splits.
Computing input splits took 285ms
Sampling 10 splits of 768
Making 48 from 100000 sampled records
Computing parititions took 296ms
Spent 584ms computing partitions.
14/02/27 15:23:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:23:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:23:18 INFO mapreduce.JobSubmitter: number of splits:768
14/02/27 15:23:18 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:23:18 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:23:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393510237328_0006
14/02/27 15:23:18 INFO client.YarnClientImpl: Submitted application application_1393510237328_0006 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:23:18 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393510237328_0006/
14/02/27 15:23:18 INFO mapreduce.Job: Running job: job_1393510237328_0006
14/02/27 15:23:23 INFO mapreduce.Job: Job job_1393510237328_0006 running in uber mode : false
14/02/27 15:23:23 INFO mapreduce.Job: map 0% reduce 0%
[...]
14/02/27 15:25:57 INFO mapreduce.Job: map 100% reduce 100%
14/02/27 15:26:07 INFO mapreduce.Job: Job job_1393510237328_0006 completed successfully
14/02/27 15:26:07 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=208148757282
FILE: Number of bytes written=312066255570
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=100000124416
HDFS: Number of bytes written=100000000000
HDFS: Number of read operations=2448
HDFS: Number of large read operations=0
HDFS: Number of write operations=96
Job Counters
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=769
Launched reduce tasks=49
Data-local map tasks=769
Total time spent by all maps in occupied slots (ms)=24813386
Total time spent by all reduces in occupied slots (ms)=4937789
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Map output bytes=102000000000
Map output materialized bytes=104000221184
Input split bytes=124416
Combine input records=0
Combine output records=0
Reduce input groups=1000000000
Reduce shuffle bytes=104000221184
Reduce input records=1000000000
Reduce output records=1000000000
Spilled Records=3000000000
Shuffled Maps =36864
Failed Shuffles=0
Merged Map outputs=36864
GC time elapsed (ms)=929575
CPU time spent (ms)=16325620
Physical memory (bytes) snapshot=300925956096
Virtual memory (bytes) snapshot=895643398144
Total committed heap usage (bytes)=412231925760
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100000000000
File Output Format Counters
Bytes Written=100000000000
14/02/27 15:26:07 INFO terasort.TeraSort: done

Whereas, when using GlusterFS, terasort shows about three times as many launched map tasks and a much longer run time (on the same environment):

14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:29 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS
========== running terasort bench ==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Write buffer size : 131072
rm: `/tmp/HiBench/Terasort/Output': No such file or directory
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:32 INFO terasort.TeraSort: starting
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:32 INFO input.FileInputFormat: Total input paths to process : 96
Spent 1644ms computing base-splits.
Spent 30ms computing TeraScheduler splits.
Computing input splits took 1675ms
Sampling 10 splits of 2976
Making 48 from 100000 sampled records
Computing parititions took 1088ms
Spent 2766ms computing partitions.
14/02/26 10:46:35 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/26 10:46:35 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:38 INFO mapreduce.JobSubmitter: number of splits:2976
14/02/26 10:46:38 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/26 10:46:38 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/26 10:46:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393404749197_0036
14/02/26 10:46:39 INFO client.YarnClientImpl: Submitted application application_1393404749197_0036 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/26 10:46:39 INFO mapreduce.Job: The url to track the job:
http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393404749197_0036/
14/02/26 10:46:39 INFO mapreduce.Job: Running job: job_1393404749197_0036
14/02/26 10:46:54 INFO mapreduce.Job: Job job_1393404749197_0036 running in uber mode : false
14/02/26 10:46:54 INFO mapreduce.Job: map 0% reduce 0%
[...]
14/02/26 11:30:47 INFO mapreduce.Job: map 100% reduce 100%
14/02/26 11:31:03 INFO mapreduce.Job: Job job_1393404749197_0036 completed successfully
14/02/26 11:31:03 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=104001540032
FILE: Number of bytes written=208243540736
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=100451375097
GLUSTERFS: Number of bytes written=100000000000
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
Job Counters
Killed map tasks=1
Killed reduce tasks=13
Launched map tasks=2977
Launched reduce tasks=61
Rack-local map tasks=2977
Total time spent by all maps in occupied slots (ms)=161663015
Total time spent by all reduces in occupied slots (ms)=101269527
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Map output bytes=102000000000
Map output materialized bytes=104000857088
Input split bytes=395808
Combine input records=0
Combine output records=0
Reduce input groups=1000000000
Reduce shuffle bytes=104000857088
Reduce input records=1000000000
Reduce output records=1000000000
Spilled Records=2000000000
Shuffled Maps =142848
Failed Shuffles=0
Merged Map outputs=142848
GC time elapsed (ms)=580169
CPU time spent (ms)=23171180
Physical memory (bytes) snapshot=1613411700736
Virtual memory (bytes) snapshot=4991907897344
Total committed heap usage (bytes)=3094560636928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100125825280
File Output Format Counters
Bytes Written=100000000000
14/02/26 11:31:03 INFO terasort.TeraSort: done
14/02/26 11:31:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/26 11:31:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072

As you can see, this configuration generates 2977 launched map tasks whereas the HDFS one generates only 769.

Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=1071337 ?

Let me know if you need more details. The infra is still available for us to make more tests.

Apache Bigtop smoke tests

We will need to set up a cluster for Bigtop verification of glusterfs-hadoop. This will not run upstream, but we will make the results available to the upstream community. We have servers to do this. The internal tests should:

  • pull jars from upstream releases.
  • copy them to local (just like our slaves in EC2 do).
  • run mahout, pig, hive, mapreduce, and other similar tests using the new https://issues.apache.org/jira/browse/BIGTOP-1222 simplified bigtop smoke infrastructure, which is gradle-based (or maybe, to start, just use the maven-based smoke runner wrapped in a bash script).

GlusterFS-Hadoop plugin seems to no longer be under active development

As above. There have been relatively few commits since 2015 (in both the source repo and the wiki), and there are several outstanding PRs and issues.

  • Is this project still actively developed or worked on?
  • Is there a considerable amount of work that this project requires?
  • If there is any work required, is it something that a single individual might be able to help with? (i.e., is there anything I can do?)

I'd ideally like to use GlusterFS as a drop-in replacement for HDFS, but I just want to make sure that this project is still receiving development attention (otherwise I can't justify it to the IT/Ops people in my organisation).

pom changes

Hi,
does the pom file change for every project?
For example, in my case the glusterfs-hadoop RPM version is 2.0.1; must I change this version in the pom file or in other files?


2.3.13
glusterfs-hadoop
hadoop filesystem impl for glusterfs


Thanks for your reply.

YARN incompatibility with 2.3.GlusterFS stack and ambari 2.2.1.0

Hi,

We are trying to install a Hadoop cluster using the 2.3.GlusterFS stack through Ambari 2.2.1.0.
All provisioning is done with Ambari blueprints, like the following:
Blueprint
{ "configurations" : [ { "core-site": { "hadoop.proxyuser.hcat.groups" : "*", "hadoop.proxyuser.hcat.hosts" : "*", "hadoop.proxyuser.hue.groups" : "*", "hadoop.proxyuser.hue.hosts" : "*", "hadoop.security.authentication" : "simple", "fs.AbstractFileSystem.glusterfs.impl" : "org.apache.hadoop.fs.local.GlusterFs", "fs.trash.interval" : "360", "fs.glusterfs.impl" : "org.apache.hadoop.fs.glusterfs.GlusterFileSystem", "hadoop.security.authorization" : "false", "net.topology.script.file.name" : "/etc/hadoop/conf/topology_script.py" } }, { "ams-hbase-env" : { "regionserver_xmn_size" : "384m", "hbase_regionserver_heapsize" : "1024m", "hbase_master_heapsize" : "1024m", "hbase_master_xmn_size" : "384m" } }, { "ams-env" : { "metrics_collector_heapsize" : "1024m" } }, { "hadoop-env" : { "dtnode_heapsize" : "1024m", "namenode_heapsize" : "2048m", "namenode_opt_maxnewsize" : "384m", "namenode_opt_newsize" : "384m", "namenode_host" : "master01", "snamenode_host" : "master01", "glusterfs_user" : "root", "hdfs_user" : "hdfs", "hdfs_log_dir_prefix" : "/var/log/hadoop" } }, { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/hdfs/data", "dfs.datanode.balance.bandwidthPerSec" : "12500000", "dfs.datanode.max.transfer.threads": "4096", "dfs.datanode.failed.volumes.tolerated" : "1", "dfs.replication" : "true" } }, { "spark-defaults" : { "spark.executor.instances" : "2", "spark.executor.memory" : "7808m", "spark.driver.memory" : "3712m", "spark.yarn.am.memory" : "3712m", "spark.yarn.executor.memoryOverhead" : "384", "spark.yarn.driver.memoryOverhead" : "384", "spark.yarn.am.memoryOverhead" : "384" } }, { "mapred-site" : { "mapreduce.map.java.opts" : "-Xmx1638m", "mapreduce.map.memory.mb" : "2048", "mapreduce.reduce.java.opts" : "-Xmx1638m", "mapreduce.reduce.memory.mb" : "2048", "mapreduce.task.io.sort.mb" : "768", "yarn.app.mapreduce.am.command-opts" : "-Xmx1638m -Dhdp.version=${hdp.version}", "yarn.app.mapreduce.am.resource.mb" : "2048" } }, { "tez-site" : { "tez.am.resource.memory.mb" : "2048", "tez.task.resource.memory.mb" : "2048" } }, { "spark-defaults" : { "spark.executor.instances" : "1", "spark.executor.memory" : "3712m", "spark.driver.memory" : "1664m", "spark.yarn.am.memory" : "1664m", "spark.yarn.executor.memoryOverhead" : "384", "spark.yarn.driver.memoryOverhead" : "384", "spark.yarn.am.memoryOverhead" : "384" } }, { "storm-site" : { "logviewer.port" : "8005" } }, { "oozie-site" : { "oozie.service.ProxyUserService.proxyuser.hue.groups" : "*", "oozie.service.ProxyUserService.proxyuser.hue.hosts" : "*" } }, { "webhcat-site" : { "webhcat.proxyuser.hue.groups" : "*", "webhcat.proxyuser.hue.hosts" : "*" } }, { "hive-site" : { "hive.tez.container.size" : "-1", "hive.tez.java.opts": "-1", "fs.file.impl.disable.cache" : "true", "fs.hdfs.impl.disable.cache" : "true", "javax.jdo.option.ConnectionPassword" : "my-pw" } } ], "host_groups" : [ { "name" : "slavenode_simple", "configurations" : [ ], "components" : [ { "name" : "ZOOKEEPER_CLIENT" }, { "name" : "OOZIE_CLIENT" }, { "name" : "HIVE_CLIENT" }, { "name" : "GLUSTERFS_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "TEZ_CLIENT" }, { "name" : "SPARK_CLIENT" }, { "name" : "NODEMANAGER" }, { "name" : "METRICS_MONITOR" } ], "cardinality" : "2" }, { "name" : "masternode_1", "configurations" : [ ], "components" : [ { "name" : "NODEMANAGER" }, { "name" : "SPARK_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "GLUSTERFS_CLIENT" }, { "name" : "METRICS_MONITOR" }, { "name" : "TEZ_CLIENT" }, { "name" : "ZOOKEEPER_CLIENT" }, { "name" : "ZOOKEEPER_SERVER" }, { 
"name" : "AMBARI_SERVER" }, { "name" : "SPARK_JOBHISTORYSERVER" }, { "name" : "APP_TIMELINE_SERVER" }, { "name" : "METRICS_COLLECTOR" }, { "name" : "RESOURCEMANAGER" }, { "name" : "WEBHCAT_SERVER" }, { "name" : "OOZIE_SERVER" }, { "name" : "HIVE_METASTORE" }, { "name" : "HIVE_SERVER" } ], "cardinality" : "1" } ], "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.3.GlusterFS" } }

Template used with the blueprint (provided to Ambari at cluster creation):
{ "blueprint": "cluster_blueprint", "default_password": "my-pw", "host_groups": [ { "name" : "masternode_1", "hosts" : [ { "fqdn": "master01.domain.com" } ] } ] }

Here is the full stack trace:
23 Mar 2016 11:05:51,256 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type zookeeper-env 23 Mar 2016 11:05:51,280 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type zookeeper-log4j 23 Mar 2016 11:05:51,323 INFO [qtp-ambari-client-23] ClusterConfigurationRequest:358 - Sending cluster config update request for service = YARN 23 Mar 2016 11:05:51,324 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1353 - Received a updateCluster request, clusterId=null, clusterName=test-cluster2, securityType=null, request={ clusterName=test-cluster2, clusterId=null, provisioningState=null, securityType=null, stackVersion=null, desired_scv=null, hosts=[] } 23 Mar 2016 11:05:51,324 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type hdfs-site 23 Mar 2016 11:05:51,351 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type capacity-scheduler 23 Mar 2016 11:05:51,378 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-env 23 Mar 2016 11:05:51,411 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-site 23 Mar 2016 11:05:51,490 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-log4j 23 Mar 2016 11:05:51,567 ERROR [qtp-ambari-client-23] ClusterImpl:2635 - Updating configs for multiple services by a single API request isn't supported, config version not created 23 Mar 2016 11:05:51,572 ERROR [qtp-ambari-client-23] BaseManagementHandler:66 - Caught a runtime exception while attempting to create a resource: Failed to set configurations on cluster: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported java.lang.RuntimeException: Failed to set configurations on cluster: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.topology.AmbariContext.setConfigurationOnCluster(AmbariContext.java:390) at org.apache.ambari.server.topology.ClusterConfigurationRequest.setConfigurationsOnCluster(ClusterConfigurationRequest.java:359) at org.apache.ambari.server.topology.ClusterConfigurationRequest.setConfigurationsOnCluster(ClusterConfigurationRequest.java:279) at org.apache.ambari.server.topology.ClusterConfigurationRequest.<init>(ClusterConfigurationRequest.java:78) at org.apache.ambari.server.topology.ClusterConfigurationRequest.<init>(ClusterConfigurationRequest.java:83) at org.apache.ambari.server.topology.TopologyManager.provisionCluster(TopologyManager.java:191) at org.apache.ambari.server.controller.internal.ClusterResourceProvider.processBlueprintCreate(ClusterResourceProvider.java:517) at org.apache.ambari.server.controller.internal.ClusterResourceProvider.createResources(ClusterResourceProvider.java:174) at 
org.apache.ambari.server.controller.internal.ClusterControllerImpl.createResources(ClusterControllerImpl.java:289) at org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.create(PersistenceManagerImpl.java:76) at org.apache.ambari.server.api.handlers.CreateHandler.persist(CreateHandler.java:36) at org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72) at org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135) at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:106) at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:75) at org.apache.ambari.server.api.services.ClusterService.createCluster(ClusterService.java:131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715) at javax.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at 
org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.apache.ambari.server.security.authorization.AmbariAuthorizationFilter.doFilter(AmbariAuthorizationFilter.java:196) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:201) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192) at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.api.MethodOverrideFilter.doFilter(MethodOverrideFilter.java:72) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.security.AbstractSecurityHeaderFilter.doFilter(AbstractSecurityHeaderFilter.java:109) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:216) at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:205) at org.apache.ambari.server.controller.AmbariHandlerList.handle(AmbariHandlerList.java:139) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.utils.RetryHelper.executeWithRetry(RetryHelper.java:110) at org.apache.ambari.server.topology.AmbariContext.setConfigurationOnCluster(AmbariContext.java:381) ... 96 more Caused by: java.lang.IllegalArgumentException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.state.cluster.ClusterImpl.applyConfigs(ClusterImpl.java:2634) at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68) at org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2172) at org.apache.ambari.server.controller.AmbariManagementControllerImpl.updateCluster(AmbariManagementControllerImpl.java:1485) at org.apache.ambari.server.controller.AmbariManagementControllerImpl.updateClusters(AmbariManagementControllerImpl.java:1337) at org.apache.ambari.server.topology.AmbariContext$5.call(AmbariContext.java:384) at org.apache.ambari.server.utils.RetryHelper.executeWithRetry(RetryHelper.java:95) ... 97 more

When we change GLUSTERFS_CLIENT to HDFS_CLIENT and the stack to 2.4 (HDFS), everything is fine.

Do you have any idea, please? Is there any incompatibility?
Thanks

resolve GlusterFSXattr

Process p = Runtime.getRuntime()
.exec(new String[] { "sudo", "getfattr", "-m", ".", "-n", "trusted.glusterfs.pathinfo", filename });
You need to fix it as shown above.
The current code, Process p = Runtime.getRuntime().exec(shellCommand);, contains an error: the command passed as a single string cannot carry any keys (options) separated by spaces, if I am not wrong.
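
For illustration only, here is a minimal, self-contained sketch of calling getfattr through the String[] form of exec(), where every argument is passed as its own token instead of being embedded in a single command string (the class and method names below are made up for the example, not taken from the plugin):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class PathInfoReader {
    // Runs getfattr with explicit argument tokens, so spaces never have to be
    // parsed back out of a single command string.
    public static String readPathInfo(String filename) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(new String[] {
                "sudo", "getfattr", "-m", ".", "-n", "trusted.glusterfs.pathinfo", filename });
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }
}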

Bug with the YARN streaming interface

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we get errors with the YARN streaming interface.

SW used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

Run results:

-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

run wordcount error

when I use the command:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen -Dmapred.map.tasks=20 109951 terasort/100M-input

15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=46d9738, git.commit.user.email=[email protected], git.commit.message.full=Merge branch 'master' of https://github.com/gluster/glusterfs-hadoop
, git.commit.id=46d973834ae1db6eb6cf9ac025ded9a9ffa38c93, git.commit.message.short=Merge branch 'master' of https://github.com/gluster/glusterfs-hadoop, git.commit.user.name=childsb, git.build.user.name=childsb, git.commit.id.describe=2.3.13-6-g46d9738-dirty, git.build.user.email=[email protected], git.branch=master, git.commit.time=21.01.2015 @ 10:31:08 CST, git.build.time=21.01.2015 @ 12:01:55 CST}
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.3.13
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:32:59 INFO client.RMProxy: Connecting to ResourceManager at cn0/192.168.1.40:8032
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:33:05 INFO terasort.TeraSort: Generating 109951 using 20
15/09/03 09:33:05 INFO mapreduce.JobSubmitter: number of splits:20
15/09/03 09:33:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/03 09:33:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441272751391_0001
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:33:09 INFO impl.YarnClientImpl: Submitted application application_1441272751391_0001
15/09/03 09:33:09 INFO mapreduce.Job: The url to track the job: http://cn0:8088/proxy/application_1441272751391_0001/
15/09/03 09:33:09 INFO mapreduce.Job: Running job: job_1441272751391_0001
15/09/03 09:33:19 INFO mapreduce.Job: Job job_1441272751391_0001 running in uber mode : false
15/09/03 09:33:19 INFO mapreduce.Job: map 0% reduce 0%
15/09/03 09:33:19 INFO mapreduce.Job: Job job_1441272751391_0001 failed with state FAILED due to: Application application_1441272751391_0001 failed 2 times due to AM Container for appattempt_1441272751391_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://cn0:8088/proxy/application_1441272751391_0001/Then, click on links to logs of each attempt.
Diagnostics: File glusterfs:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1441272751391_0001/job.splitmetainfo does not exist.
java.io.FileNotFoundException: File glusterfs:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1441272751391_0001/job.splitmetainfo does not exist.
at org.apache.hadoop.fs.glusterfs.GlusterVolume.getFileStatus(GlusterVolume.java:368)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
15/09/03 09:33:19 INFO mapreduce.Job: Counters: 0

Why does this happen?

Building plugin without Git causes crash

On an Ubuntu server, I decided to download the plugin using wget rather than git clone. Building the plugin went fine, but when I tried to use it with my GlusterFS-Hadoop install, the following error occurred:

hadoop@gluster1:/usr/local/hadoop/bin$ ./hdfs dfs -ls
14/05/15 10:15:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/15 10:15:14 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/05/15 10:15:14 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
-ls: Fatal internal error
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2315)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:168)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:224)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:207)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:308)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
    ... 17 more
Caused by: java.lang.RuntimeException: Couldn't find git properties for version info null
    at org.apache.hadoop.fs.glusterfs.Version.<init>(Version.java:19)
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.<init>(GlusterFileSystem.java:50)
    ... 22 more

Installing git and cloning the repository solved this problem. It seems to depend on the .git info in the repository and not on Git itself, since uninstalling Git and rebuilding the plugin caused no problems either.
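
The stack trace suggests the version banner is read from git metadata baked in at build time. Purely as an illustrative sketch (the plugin's real Version class may work differently), this is the usual pattern of loading a git.properties resource, such as one generated by the maven git-commit-id plugin, from the classpath; when the source tree has no .git directory the resource is never generated and the lookup fails:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class GitVersionInfo {
    private final Properties props = new Properties();

    // Hypothetical example: load build-time git metadata from the classpath.
    public GitVersionInfo() throws IOException {
        try (InputStream in = GitVersionInfo.class.getResourceAsStream("/git.properties")) {
            if (in == null) {
                // This mirrors the failure above: no .git directory at build time,
                // so no git.properties ends up inside the jar.
                throw new IOException("Couldn't find git properties for version info");
            }
            props.load(in);
        }
    }

    public String tag() {
        return props.getProperty("git.commit.id.describe", "unknown");
    }
}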

support for major version incrementation

Let's decide how to do major version upgrades. Can this be automated via the commit message? Or should we even worry about it at all?

(just some brainstorming...)
Right now

  • we explicitly grep for "ci-skip" in the commit message.
  • we always assume only the minor version is being incremented.

Let's make these key-value pairs that get passed into the bash script, for example:

[ci-skip="false", version="auto"] translates to
--ci-skip=false --version=auto, which results in auto-incrementing the minor version along with auto-publishing the new version. That way, someone who wants to upgrade the major version can do

commit -m "Lets bump to 2.2 [version=\"2.2.0\"]"

and then the version, rather than being auto-incremented via the Python script, is explicitly set in the pom.
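
Purely as a sketch of the idea (the tag syntax and class names here are hypothetical, not what the release scripts currently do), the key-value pairs could be pulled out of the commit message and turned into flags like this:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitTagParser {
    // Matches key="value" pairs such as ci-skip="false" or version="2.2.0".
    private static final Pattern TAG = Pattern.compile("(\\w[\\w-]*)=\"([^\"]*)\"");

    public static Map<String, String> parse(String commitMessage) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = TAG.matcher(commitMessage);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Prints: --ci-skip=false --version=2.2.0
        StringBuilder flags = new StringBuilder();
        parse("Lets bump to 2.2 [ci-skip=\"false\", version=\"2.2.0\"]")
                .forEach((k, v) -> flags.append("--").append(k).append('=').append(v).append(' '));
        System.out.println(flags.toString().trim());
    }
}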

start-all.sh problem with glusterfs

Hi, I use glusterfs 3.5.3 and hadoop 2.2.0; I built the jar file and copied it to the Hadoop lib directory.
My core-site.xml is:

<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>glusterfs://fedora1.osslab.com:9010</value>
</property>
<property>
  <name>fs.AbstractFileSystem.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.local.GlusterFs</value>
</property>
<property>
  <name>fs.glusterfs.volumes</name>
  <value>test</value>
</property>
<property>
  <name>fs.glusterfs.volume.fuse.test</name>
  <value>/mnt/Hadoop</value>
</property>

But when I use the start-all script, I get this error:

Starting namenodes on []
fedora3.osslab.com: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-root-namenode-fedora3.osslab.com.out
fedora3.osslab.com: starting datanode, logging to /var/log/hadoop-hdfs/hadoop-root-datanode-fedora3.osslab.com.out
Starting secondary namenodes [fedora1.osslab.com]
fedora1.osslab.com: starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-root-secondarynamenode-fedora1.osslab.com.out
fedora1.osslab.com: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): glusterfs://fedora1.osslab.com:9010 is not of scheme 'hdfs'.
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:353)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:335)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:328)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:235)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:199)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:652)
starting yarn daemons
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-fedora1.osslab.com.out
fedora3.osslab.com: starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-fedora3.osslab.com.out

The secondary namenode has a problem with glusterfs.
Does anyone else have this problem?
Thanks for your reply.

Need to define policy for hedging bets on the slippery reliance on underlying RawLocalFileSystem

We now have many instances of issues which can arise because the underlying RawLocalFileSystem version has semantics which are subtly different from HDFS.

Should we package a "stable" replica of the RawLocalFileSystem methods, copying over from Hadoop the implementations that we know to be correct, and specifically support those? With the various bugs out there around RawLocalFileSystem, it is a shame that we are so heavily dependent on a classpath we have no control over.

Examples: mkdirs, createNonRecursive, rename, and many other methods in RawLocalFileSystem change from version to version, and it would be nice if we could have a stable library, updated and shipped by us, that has the latest and most reliable/correct semantics (a rough sketch of the idea follows below).
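
To make the proposal concrete, here is a minimal, hypothetical sketch (not existing code in this repository) of what pinning "known good" semantics on top of the classpath-provided RawLocalFileSystem could look like; which methods and behaviours we would actually pin is exactly what this issue asks us to decide:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;
import org.apache.hadoop.fs.permission.FsPermission;

public class StableRawLocalFileSystem extends RawLocalFileSystem {

    // Example of pinning one behaviour: refuse to rename onto an existing path,
    // instead of inheriting whatever the Hadoop version on the classpath does.
    @Override
    public boolean rename(Path src, Path dst) throws IOException {
        if (exists(dst)) {
            return false;
        }
        return super.rename(src, dst);
    }

    // Example of pinning another behaviour: always re-assert the requested
    // permission on the leaf directory after creation, so the result does not
    // drift with the RawLocalFileSystem version in use.
    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        boolean created = super.mkdirs(f, permission);
        if (created) {
            setPermission(f, permission);
        }
        return created;
    }
}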

This is an open-ended question. I'd like as many people as possible to comment on this and make an informed vote, and to ask more questions if I haven't fully explained the issue.

README file is old

The current README file of the project needs review, as it no longer properly describes most of the topics. The most important sections requiring a rewrite are:

  • INSTALLATION - uses quite an old Hadoop release as an example
  • CONFIGURATION - the current configuration of the plugin is very different; one could not configure the plugin using this information

GlusterFSXattr

Does not work with the file name §r§lAstra§4§lLex§r§l_(§4§lBSL§r§l_Edit)_By_LexBoosT_§4§lV37.0§r§l.zip

IllegalArgumentException: Wrong FS when running hadoop wordcount job example.

I don't know if this is an issue in the glusterfs-hadoop plugin or in my configuration, but every time I try to run the wordcount example job, it always returns an error like this:
java.lang.IllegalArgumentException: Wrong FS: glusterfs:/mapred/system, expected: file:///

Complete configuration below.

Hadoop version is 2.8.1
glusterfs version is 3.8.15-2.el7
running on CentOS 7

Here is the complete command and stdout

[hadoop@gluster1 hadoop]$ bin/hadoop jar /tmp/hadoop-0.20.2-examples.jar wordcount /hadoop/yarn-hadoop-resourcemanager-gluster1.out /hadoop/out/
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=f0fee73, [email protected], git.commit.message.full=Merge pull request #122 from childsb/getfattrparse

Refactor and cleanup the BlockLocation parsing code, git.commit.id=f0fee73c336ac19461d5b5bb91a77e05cff73361, git.commit.message.short=Merge pull request #122 from childsb/getfattrparse, git.commit.user.name=bradley childs, git.build.user.name=Unknown, git.commit.id.describe=GA-12-gf0fee73, git.build.user.email=Unknown, git.branch=master, git.commit.time=30.03.2015 @ 20:06:46 UTC, git.build.time=20.09.2017 @ 10:02:14 UTC}
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: GIT_TAG=GA
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Gluster volume: gv0 at : /mnt
17/09/27 10:50:03 WARN glusterfs.GlusterVolume: mapred.system.dir/mapreduce.jobtracker.system.dir does not exist: glusterfs:/mapred/system
17/09/27 10:50:03 WARN glusterfs.GlusterVolume: working directory does not exist: glusterfs:/user/hadoop
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Write buffer size : 131072
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Default block size : 67108864
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: File timestamp lease significant digits removed : 0
17/09/27 10:50:03 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/09/27 10:50:03 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
java.lang.IllegalArgumentException: Wrong FS: glusterfs:/mapred/system, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:484)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:120)
        at org.apache.hadoop.mapred.LocalJobRunner.getSystemDir(LocalJobRunner.java:864)
        at org.apache.hadoop.mapreduce.Cluster$1.run(Cluster.java:187)
        at org.apache.hadoop.mapreduce.Cluster$1.run(Cluster.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
        at org.apache.hadoop.mapreduce.Cluster.getFileSystem(Cluster.java:185)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1336)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>

<property>
  <name>fs.AbstractFileSystem.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.local.GlusterFs</value>
</property>

<property>
    <name>fs.default.name</name>
    <value>glusterfs:///</value>
</property>

<property>
    <name>fs.glusterfs.volumes</name>
    <value>gv0</value>
</property>

<property>
    <name>fs.glusterfs.volume.fuse.gv0</name>
    <value>/mnt</value>
</property>

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>glusterfs:///tmp/hadoop-yarn/staging/mapred/.staging</value>
  </property>
  <property>
    <name>mapred.healthChecker.script.path</name>
    <value>glusterfs:///mapred/jobstatus</value>
  </property>
  <property>
    <name>mapred.job.tracker.history.completed.location</name>
    <value>glusterfs:///mapred/history/done</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>glusterfs:///mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>glusterfs:///job-history/done</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>glusterfs:///job-history/intermediate-done</value>
  </property>

  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>glusterfs:///user</value>
  </property>
  <property>
        <name>mapred.job.tracker</name>
        <value>hadoop-master:9001</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>dfs.data.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name/data</value>
        <final>true</final>
</property>
<property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name</value>
        <final>true</final>
</property>
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
</configuration>

mvn package error

Hi,
when I execute mvn package, I get an error!
Does anyone else have a problem with mvn package?
Please help me.
Thanks for your reply.

Maven-metadata : Clean up repos

Minor issue, but let's wrap our heads around this.

We can see that

http://rhbd.s3.amazonaws.com/maven/repositories/internal/org/apache/hadoop/fs/glusterfs/glusterfs-hadoop/maven-metadata.xml

Says that there is a "current" and a "releases" tag.

  1. Current points to 2.1.4, releases points to 2.1.6. Let's manually update that if the next release doesn't fix it for us.

  2. After manually fixing the version number: let's figure out exactly how to maintain that xml file and keep it up to date. Probably best to do this on a private fork (jonska/...), do a test release, and see how the corresponding maven-metadata.xml file is updated.

after yarn starts

Hi, I started YARN on my servers, and when I use the jps command, the output shows that the ResourceManager and NodeManager are running. Now I have one question: how can I test that Hadoop is working correctly with GlusterFS?
Please help me.
Thanks for your reply.
