
GlusterFS Hadoop Plugin
=======================

INTRODUCTION
------------

This document describes how to use GlusterFS (http://www.gluster.org/) as a backing store with Hadoop.

This plugin replaces the default Hadoop file system (typically, the Hadoop Distributed File System) with the
GlusterFileSystem, which writes to a local directory that is a FUSE mount of a GlusterFS volume.

REQUIREMENTS
------------

* Supported OS is GNU/Linux
* GlusterFS installed on all machines in the cluster
* Java Runtime Environment (JRE)
* Maven 3.x (needed if you are building the plugin from source)
* JDK 6+ (needed if you are building the plugin from source)

NOTE: The plugin relies on two *nix command line utilities to function properly. They are:

* mount: Used to mount GlusterFS volumes.
* getfattr: Used to fetch extended attributes of a file.

Make sure they are installed on all hosts in the cluster and that their locations are in the $PATH
environment variable.
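
For example, to verify that both utilities are available on a host (getfattr is typically provided by the
"attr" package on RPM-based distributions):

  # which mount getfattr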


INSTALLATION
------------

** NOTE: Example below is for Hadoop version 0.20.2 ($GLUSTER_HOME/hdfs/0.20.2) **

* Building the plugin from source [Maven (http://maven.apache.org/) and a JDK are required to build the plugin]

  Change to glusterfs-hadoop directory in the GlusterFS source tree and build the plugin.

  # cd $GLUSTER_HOME/hdfs/0.20.2
  # mvn package

  On a successful build the plugin will be present in the `target` directory.
  (NOTE: the version number is part of the plugin jar's file name)

  # ls target/
  classes  glusterfs-0.20.2-0.1.jar  maven-archiver  surefire-reports  test-classes

  Copy the plugin to the lib/ directory in your $HADOOP_HOME dir.

  # cp target/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib

  Copy the sample configuration file that ships with this source (conf/core-site.xml) to the conf
  directory in your $HADOOP_HOME dir.

  # cp conf/core-site.xml $HADOOP_HOME/conf

* Installing the plugin from RPM

  See the plugin documentation for installing from RPM.


CLUSTER INSTALLATION
--------------------

  If it is tedious to do the above step(s) on all hosts in the cluster, use the build-and-deploy.py script to
  build the plugin in one place and deploy it (along with the configuration file) to all other hosts.

  The script should be run on the host that is the hadoop master [Job Tracker].

* STEPS (Steps 1 and 2 will already have been done while deploying Hadoop)

  1. Edit the conf/slaves file in your hadoop distribution; one line for each slave.
  2. Set up password-less ssh between the hadoop master and the slave(s).
  3. Edit conf/core-site.xml with all glusterfs related configurations (see CONFIGURATION).
  4. Run the following:
     # cd $GLUSTER_HOME/hdfs/0.20.2/tools
     # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c

     This will build the plugin and copy it (and the config file) to all slaves (mentioned in $HADOOP_HOME/conf/slaves).

   Script options:
     -b : build the plugin
     -d : location of hadoop directory
     -c : deploy core-site.xml
     -m : deploy mapred-site.xml
     -h : deploy hadoop-env.sh
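
     For example, to build the plugin and deploy core-site.xml, mapred-site.xml and hadoop-env.sh in a single
     run:

     # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c -m -h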


CONFIGURATION
-------------

  All plugin configuration is done in a single XML file (core-site.xml) with <name> and <value> tags inside each
  <property> block.

  A brief explanation of the tunables and the values they accept (change them wherever needed) is given below.

  name:  fs.glusterfs.impl
  value: org.apache.hadoop.fs.glusterfs.GlusterFileSystem

         The default FileSystem API to use (there is little reason to modify this).

  name:  fs.default.name
  value: glusterfs:///

         The default name that hadoop uses to represent files as a URI (typically a server:port tuple). Use any host
         in the cluster as the server and any port number. This option has to be in server:port format for hadoop
         to create file URIs, but it is not used by the plugin.

  name:  fs.glusterfs.volname
  value: volume-dist-rep

         The volume to mount.


  name:  fs.glusterfs.mount
  value: /mnt/glusterfs

         This is the directory where the gluster volume is mounted.

  name:  fs.glusterfs.server
  value: localhost

         To mount a volume, the plugin needs to know the hostname or the IP of a GlusterFS server in the cluster.
         Specify it here.
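
  For reference, a core-site.xml fragment that combines the properties above would look like the following
  (the volume name, mount point and server shown here are only examples; adjust them for your cluster):

  <configuration>
    <property>
      <name>fs.glusterfs.impl</name>
      <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>glusterfs:///</value>
    </property>
    <property>
      <name>fs.glusterfs.volname</name>
      <value>volume-dist-rep</value>
    </property>
    <property>
      <name>fs.glusterfs.mount</name>
      <value>/mnt/glusterfs</value>
    </property>
    <property>
      <name>fs.glusterfs.server</name>
      <value>localhost</value>
    </property>
  </configuration>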

USAGE
-----

  Once configured, start the Hadoop Map/Reduce daemons:

  # cd $HADOOP_HOME
  # ./bin/start-mapred.sh

  If the map/reduce job/task trackers are up, all I/O will be done to GlusterFS.
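
  To confirm that Hadoop is talking to GlusterFS, list the filesystem root through Hadoop and check that it
  matches the contents of the gluster mount, e.g.:

  # ./bin/hadoop fs -ls /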


FOR HACKERS
-----------

* Source Layout (./src/)

For the overall architecture, see https://forge.gluster.org/hadoop/pages/Architecture

Currently, we use the hadoop RawLocalFileSystem as the basis and wrap it with the GlusterVolume class. That
class is then used by the Hadoop 1x (GlusterFileSystem) and Hadoop 2x (GlusterFs) adapters.

./tools/build-deploy-jar.py                                                  <--- Build and Deployment Script
./conf/core-site.xml                                                         <--- Sample configuration file
./pom.xml                                                                    <--- build XML file (used by maven)

./COPYING                                                                    <--- License
./README                                                                     <--- This file



JENKINS
-------

  Two ways to make the Jenkins service run as root:

  #Method 1) Modify JENKINS_USER in /etc/sysconfig/jenkins
  JENKINS_USER=root

  #Method 2) Directly modify /etc/init.d/jenkins
  #daemon --user "$JENKINS_USER" --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null
  echo "WARNING: RUNNING AS ROOT"
  daemon --user root --pidfile "$JENKINS_PID_FILE" $JAVA_CMD $PARAMS > /dev/null


BUILDING 
--------

Building requires a working gluster mount for the unit tests.
The unit tests read test resources from glusterconfig.properties, a file which should be present.

1) Edit your .bashrc, or else run the following at your terminal:

export GLUSTER_MOUNT=/mnt/glusterfs
export HCFS_FILE_SYSTEM_CONNECTOR=org.apache.hadoop.fs.test.connector.glusterfs.GlusterFileSystemTestConnector 
export HCFS_CLASSNAME=org.apache.hadoop.fs.glusterfs.GlusterFileSystem

(In Eclipse (see below) you will add these in the "Run Configurations" menu,
under VM arguments, prefixed with -D, for example: "-DGLUSTER_MOUNT=x -DHCFS_FILE_SYSTEM_CONNECTOR=y ...")

2) run: 
   mvn clean package 
   
3) The jar artifact will be in target/
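
As an alternative to editing .bashrc, the same variables (example values as above) can be set just for a
one-off build:

   GLUSTER_MOUNT=/mnt/glusterfs \
   HCFS_FILE_SYSTEM_CONNECTOR=org.apache.hadoop.fs.test.connector.glusterfs.GlusterFileSystemTestConnector \
   HCFS_CLASSNAME=org.apache.hadoop.fs.glusterfs.GlusterFileSystem \
   mvn clean package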

DEVELOPING
----------

0) Create a mock gluster mount: 
 
 #Create a raw disk image and format it...
 sudo mkdir -p /export
 sudo truncate -s 1G /export/debugging_fun.brick
 sudo mkfs.xfs /export/debugging_fun.brick

 #Mount it as a loopback fs
 sudo mkdir -p /mnt/mybrick
 sudo mount -o loop /export/debugging_fun.brick /mnt/mybrick

 #Now make a directory for the brick, and a mount point for gluster itself
 sudo mkdir -p /mnt/mybrick/glusterbrick
 sudo mkdir -p /mnt/glusterfs
 MNT="/mnt/glusterfs"
 BRICK="/mnt/mybrick/glusterbrick"

 #Create a gluster volume that writes to the brick
 #(replace 10.10.61.230 with this host's hostname or IP)
 sudo gluster volume create HadoopVol 10.10.61.230:$BRICK

 #Start the volume, then mount it at the gluster mount point
 sudo gluster volume start HadoopVol
 sudo mount -t glusterfs $(hostname):HadoopVol $MNT
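
 To sanity-check that the mock volume is online and mounted before running the tests:

 #Confirm the volume started and the mount is visible
 sudo gluster volume info HadoopVol
 df -h $MNT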

1) Run "mvn eclipse:eclipse", and import into eclipse.

2) Add the exported env variables above via Run Configurations as described in the above section.

3) Develop and run unit tests as you would any other java app. 

glusterfs-hadoop's People

Contributors

avati, childsb, jayunit100, jeffvance, mattf, mbukatov, mikebonnet, msvbhat, rootfs, vbellur


glusterfs-hadoop's Issues

Performance impact of glusterFS vs HDFS

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we see some strange behaviour with respect to performance and functionality.

Using teragen on the same physical cluster of 8 nodes, we get comparable results with both HDFS and GlusterFS. However, with terasort there is a huge performance impact when using GlusterFS.

SW used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

HDFS Teragen results:

application_1393510237328_0004 root TeraGen default Thu, 27 Feb 2014 14:17:07 GMT Thu, 27 Feb 2014 14:18:16 GMT FINISHED SUCCEEDED

========== preparing terasort data==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
Deleted /tmp/HiBench/Terasort/Input
14/02/27 15:17:06 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:17:06 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:17:06 INFO terasort.TeraSort: Generating 1000000000 using 96
14/02/27 15:17:06 INFO mapreduce.JobSubmitter: number of splits:96
14/02/27 15:17:06 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:17:06 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:17:06 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:17:06 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:17:06 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:17:06 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:17:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393510237328_0004
14/02/27 15:17:07 INFO client.YarnClientImpl: Submitted application application_1393510237328_0004 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:17:07 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393510237328_0004/
14/02/27 15:17:07 INFO mapreduce.Job: Running job: job_1393510237328_0004
14/02/27 15:17:11 INFO mapreduce.Job: Job job_1393510237328_0004 running in uber mode : false
14/02/27 15:17:11 INFO mapreduce.Job: map 0% reduce 0%
14/02/27 15:17:21 INFO mapreduce.Job: map 1% reduce 0%
14/02/27 15:17:22 INFO mapreduce.Job: map 7% reduce 0%
14/02/27 15:17:23 INFO mapreduce.Job: map 11% reduce 0%
14/02/27 15:17:24 INFO mapreduce.Job: map 13% reduce 0%
14/02/27 15:17:25 INFO mapreduce.Job: map 17% reduce 0%
14/02/27 15:17:26 INFO mapreduce.Job: map 19% reduce 0%
14/02/27 15:17:27 INFO mapreduce.Job: map 21% reduce 0%
14/02/27 15:17:28 INFO mapreduce.Job: map 23% reduce 0%
14/02/27 15:17:29 INFO mapreduce.Job: map 26% reduce 0%
14/02/27 15:17:30 INFO mapreduce.Job: map 28% reduce 0%
14/02/27 15:17:31 INFO mapreduce.Job: map 31% reduce 0%
14/02/27 15:17:32 INFO mapreduce.Job: map 34% reduce 0%
14/02/27 15:17:33 INFO mapreduce.Job: map 36% reduce 0%
14/02/27 15:17:34 INFO mapreduce.Job: map 38% reduce 0%
14/02/27 15:17:35 INFO mapreduce.Job: map 40% reduce 0%
14/02/27 15:17:36 INFO mapreduce.Job: map 41% reduce 0%
14/02/27 15:17:37 INFO mapreduce.Job: map 43% reduce 0%
14/02/27 15:17:38 INFO mapreduce.Job: map 45% reduce 0%
14/02/27 15:17:39 INFO mapreduce.Job: map 47% reduce 0%
14/02/27 15:17:40 INFO mapreduce.Job: map 49% reduce 0%
14/02/27 15:17:41 INFO mapreduce.Job: map 51% reduce 0%
14/02/27 15:17:42 INFO mapreduce.Job: map 53% reduce 0%
14/02/27 15:17:43 INFO mapreduce.Job: map 55% reduce 0%
14/02/27 15:17:44 INFO mapreduce.Job: map 57% reduce 0%
14/02/27 15:17:45 INFO mapreduce.Job: map 59% reduce 0%
14/02/27 15:17:46 INFO mapreduce.Job: map 60% reduce 0%
14/02/27 15:17:47 INFO mapreduce.Job: map 63% reduce 0%
14/02/27 15:17:48 INFO mapreduce.Job: map 65% reduce 0%
14/02/27 15:17:49 INFO mapreduce.Job: map 66% reduce 0%
14/02/27 15:17:50 INFO mapreduce.Job: map 68% reduce 0%
14/02/27 15:17:51 INFO mapreduce.Job: map 70% reduce 0%
14/02/27 15:17:52 INFO mapreduce.Job: map 72% reduce 0%
14/02/27 15:17:53 INFO mapreduce.Job: map 74% reduce 0%
14/02/27 15:17:54 INFO mapreduce.Job: map 76% reduce 0%
14/02/27 15:17:55 INFO mapreduce.Job: map 77% reduce 0%
14/02/27 15:17:56 INFO mapreduce.Job: map 79% reduce 0%
14/02/27 15:17:57 INFO mapreduce.Job: map 80% reduce 0%
14/02/27 15:17:58 INFO mapreduce.Job: map 82% reduce 0%
14/02/27 15:17:59 INFO mapreduce.Job: map 84% reduce 0%
14/02/27 15:18:00 INFO mapreduce.Job: map 85% reduce 0%
14/02/27 15:18:01 INFO mapreduce.Job: map 87% reduce 0%
14/02/27 15:18:02 INFO mapreduce.Job: map 89% reduce 0%
14/02/27 15:18:03 INFO mapreduce.Job: map 90% reduce 0%
14/02/27 15:18:04 INFO mapreduce.Job: map 91% reduce 0%
14/02/27 15:18:05 INFO mapreduce.Job: map 93% reduce 0%
14/02/27 15:18:06 INFO mapreduce.Job: map 94% reduce 0%
14/02/27 15:18:07 INFO mapreduce.Job: map 96% reduce 0%
14/02/27 15:18:08 INFO mapreduce.Job: map 97% reduce 0%
14/02/27 15:18:09 INFO mapreduce.Job: map 99% reduce 0%
14/02/27 15:18:10 INFO mapreduce.Job: map 100% reduce 0%
14/02/27 15:18:11 INFO mapreduce.Job: Job job_1393510237328_0004 completed successfully
14/02/27 15:18:12 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=7449964
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=8251
HDFS: Number of bytes written=100000000000
HDFS: Number of read operations=384
HDFS: Number of large read operations=0
HDFS: Number of write operations=192
Job Counters
Killed map tasks=2
Launched map tasks=98
Other local map tasks=98
Total time spent by all maps in occupied slots (ms)=4080121
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Input split bytes=8251
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=10904
CPU time spent (ms)=1803670
Physical memory (bytes) snapshot=34591154176
Virtual memory (bytes) snapshot=159040536576
Total committed heap usage (bytes)=91137507328
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=2147523228284173905
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=100000000000
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS

GlusterFS teragen results:

application_1393512197149_0001 yarn TeraGen default Thu, 27 Feb 2014 14:44:05 GMT Thu, 27 Feb 2014 14:47:24 GMT FINISHED SUCCEEDED
History

========== preparing terasort data==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:44:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:03 INFO glusterfs.GlusterVolume: Write buffer size : 131072
rm: `/tmp/HiBench/Terasort/Input': No such file or directory
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:44:04 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:44:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:44:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:44:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:44:05 INFO terasort.TeraSort: Generating 1000000000 using 96
14/02/27 15:44:05 INFO mapreduce.JobSubmitter: number of splits:96
14/02/27 15:44:05 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:44:05 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:44:05 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:44:05 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:44:05 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:44:05 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:44:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393512197149_0001
14/02/27 15:44:05 INFO client.YarnClientImpl: Submitted application application_1393512197149_0001 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:44:05 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393512197149_0001/
14/02/27 15:44:05 INFO mapreduce.Job: Running job: job_1393512197149_0001
14/02/27 15:44:16 INFO mapreduce.Job: Job job_1393512197149_0001 running in uber mode : false
14/02/27 15:44:16 INFO mapreduce.Job: map 0% reduce 0%
14/02/27 15:44:33 INFO mapreduce.Job: map 1% reduce 0%
14/02/27 15:44:35 INFO mapreduce.Job: map 2% reduce 0%
14/02/27 15:44:36 INFO mapreduce.Job: map 6% reduce 0%
14/02/27 15:44:37 INFO mapreduce.Job: map 7% reduce 0%
14/02/27 15:44:38 INFO mapreduce.Job: map 8% reduce 0%
14/02/27 15:44:39 INFO mapreduce.Job: map 10% reduce 0%
14/02/27 15:44:40 INFO mapreduce.Job: map 11% reduce 0%
14/02/27 15:44:41 INFO mapreduce.Job: map 12% reduce 0%
14/02/27 15:44:42 INFO mapreduce.Job: map 14% reduce 0%
14/02/27 15:44:43 INFO mapreduce.Job: map 15% reduce 0%
14/02/27 15:44:45 INFO mapreduce.Job: map 16% reduce 0%
14/02/27 15:44:46 INFO mapreduce.Job: map 17% reduce 0%
14/02/27 15:44:48 INFO mapreduce.Job: map 18% reduce 0%
14/02/27 15:44:51 INFO mapreduce.Job: map 19% reduce 0%
14/02/27 15:44:52 INFO mapreduce.Job: map 20% reduce 0%
14/02/27 15:44:53 INFO mapreduce.Job: map 21% reduce 0%
14/02/27 15:44:55 INFO mapreduce.Job: map 22% reduce 0%
14/02/27 15:44:57 INFO mapreduce.Job: map 23% reduce 0%
14/02/27 15:44:59 INFO mapreduce.Job: map 24% reduce 0%
14/02/27 15:45:00 INFO mapreduce.Job: map 25% reduce 0%
14/02/27 15:45:02 INFO mapreduce.Job: map 26% reduce 0%
14/02/27 15:45:04 INFO mapreduce.Job: map 27% reduce 0%
14/02/27 15:45:05 INFO mapreduce.Job: map 28% reduce 0%
14/02/27 15:45:07 INFO mapreduce.Job: map 29% reduce 0%
14/02/27 15:45:09 INFO mapreduce.Job: map 30% reduce 0%
14/02/27 15:45:11 INFO mapreduce.Job: map 31% reduce 0%
14/02/27 15:45:13 INFO mapreduce.Job: map 32% reduce 0%
14/02/27 15:45:15 INFO mapreduce.Job: map 33% reduce 0%
14/02/27 15:45:17 INFO mapreduce.Job: map 34% reduce 0%
14/02/27 15:45:18 INFO mapreduce.Job: map 35% reduce 0%
14/02/27 15:45:19 INFO mapreduce.Job: map 36% reduce 0%
14/02/27 15:45:21 INFO mapreduce.Job: map 37% reduce 0%
14/02/27 15:45:22 INFO mapreduce.Job: map 38% reduce 0%
14/02/27 15:45:24 INFO mapreduce.Job: map 39% reduce 0%
14/02/27 15:45:25 INFO mapreduce.Job: map 40% reduce 0%
14/02/27 15:45:27 INFO mapreduce.Job: map 41% reduce 0%
14/02/27 15:45:28 INFO mapreduce.Job: map 42% reduce 0%
14/02/27 15:45:30 INFO mapreduce.Job: map 43% reduce 0%
14/02/27 15:45:32 INFO mapreduce.Job: map 44% reduce 0%
14/02/27 15:45:33 INFO mapreduce.Job: map 45% reduce 0%
14/02/27 15:45:34 INFO mapreduce.Job: map 46% reduce 0%
14/02/27 15:45:35 INFO mapreduce.Job: map 47% reduce 0%
14/02/27 15:45:37 INFO mapreduce.Job: map 48% reduce 0%
14/02/27 15:45:38 INFO mapreduce.Job: map 49% reduce 0%
14/02/27 15:45:40 INFO mapreduce.Job: map 50% reduce 0%
14/02/27 15:45:42 INFO mapreduce.Job: map 51% reduce 0%
14/02/27 15:45:43 INFO mapreduce.Job: map 52% reduce 0%
14/02/27 15:45:46 INFO mapreduce.Job: map 54% reduce 0%
14/02/27 15:45:48 INFO mapreduce.Job: map 55% reduce 0%
14/02/27 15:45:49 INFO mapreduce.Job: map 56% reduce 0%
14/02/27 15:45:51 INFO mapreduce.Job: map 57% reduce 0%
14/02/27 15:45:52 INFO mapreduce.Job: map 58% reduce 0%
14/02/27 15:45:54 INFO mapreduce.Job: map 59% reduce 0%
14/02/27 15:45:55 INFO mapreduce.Job: map 60% reduce 0%
14/02/27 15:45:57 INFO mapreduce.Job: map 61% reduce 0%
14/02/27 15:45:58 INFO mapreduce.Job: map 62% reduce 0%
14/02/27 15:46:00 INFO mapreduce.Job: map 63% reduce 0%
14/02/27 15:46:01 INFO mapreduce.Job: map 64% reduce 0%
14/02/27 15:46:03 INFO mapreduce.Job: map 65% reduce 0%
14/02/27 15:46:05 INFO mapreduce.Job: map 66% reduce 0%
14/02/27 15:46:07 INFO mapreduce.Job: map 67% reduce 0%
14/02/27 15:46:09 INFO mapreduce.Job: map 68% reduce 0%
14/02/27 15:46:10 INFO mapreduce.Job: map 69% reduce 0%
14/02/27 15:46:12 INFO mapreduce.Job: map 70% reduce 0%
14/02/27 15:46:14 INFO mapreduce.Job: map 71% reduce 0%
14/02/27 15:46:16 INFO mapreduce.Job: map 72% reduce 0%
14/02/27 15:46:18 INFO mapreduce.Job: map 73% reduce 0%
14/02/27 15:46:19 INFO mapreduce.Job: map 74% reduce 0%
14/02/27 15:46:21 INFO mapreduce.Job: map 75% reduce 0%
14/02/27 15:46:23 INFO mapreduce.Job: map 76% reduce 0%
14/02/27 15:46:26 INFO mapreduce.Job: map 77% reduce 0%
14/02/27 15:46:28 INFO mapreduce.Job: map 78% reduce 0%
14/02/27 15:46:30 INFO mapreduce.Job: map 79% reduce 0%
14/02/27 15:46:31 INFO mapreduce.Job: map 80% reduce 0%
14/02/27 15:46:33 INFO mapreduce.Job: map 81% reduce 0%
14/02/27 15:46:34 INFO mapreduce.Job: map 82% reduce 0%
14/02/27 15:46:37 INFO mapreduce.Job: map 83% reduce 0%
14/02/27 15:46:38 INFO mapreduce.Job: map 84% reduce 0%
14/02/27 15:46:40 INFO mapreduce.Job: map 85% reduce 0%
14/02/27 15:46:42 INFO mapreduce.Job: map 86% reduce 0%
14/02/27 15:46:44 INFO mapreduce.Job: map 87% reduce 0%
14/02/27 15:46:46 INFO mapreduce.Job: map 88% reduce 0%
14/02/27 15:46:47 INFO mapreduce.Job: map 89% reduce 0%
14/02/27 15:46:49 INFO mapreduce.Job: map 90% reduce 0%
14/02/27 15:46:50 INFO mapreduce.Job: map 91% reduce 0%
14/02/27 15:46:52 INFO mapreduce.Job: map 92% reduce 0%
14/02/27 15:46:54 INFO mapreduce.Job: map 93% reduce 0%
14/02/27 15:46:56 INFO mapreduce.Job: map 94% reduce 0%
14/02/27 15:47:01 INFO mapreduce.Job: map 96% reduce 0%
14/02/27 15:47:04 INFO mapreduce.Job: map 97% reduce 0%
14/02/27 15:47:10 INFO mapreduce.Job: map 98% reduce 0%
14/02/27 15:47:13 INFO mapreduce.Job: map 99% reduce 0%
14/02/27 15:47:16 INFO mapreduce.Job: map 100% reduce 0%
14/02/27 15:47:19 INFO mapreduce.Job: Job job_1393512197149_0001 completed successfully
14/02/27 15:47:19 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=7448524
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=400410
GLUSTERFS: Number of bytes written=100000000000
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
Job Counters
Killed map tasks=10
Launched map tasks=106
Other local map tasks=106
Total time spent by all maps in occupied slots (ms)=10068071
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Input split bytes=8251
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=8753
CPU time spent (ms)=1116410
Physical memory (bytes) snapshot=19736702976
Virtual memory (bytes) snapshot=105021358080
Total committed heap usage (bytes)=45473071104
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=2147523228284173905
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=100000000000
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected], git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/27 15:47:20 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/27 15:47:20 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/27 15:47:21 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS

Now HDFS Terasort results:

application_1393510237328_0006 root TeraSort default Thu, 27 Feb 2014 14:23:18 GMT Thu, 27 Feb 2014 14:26:11 GMT FINISHED SUCCEEDED
History

========== running terasort bench ==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
Deleted /tmp/HiBench/Terasort/Output
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort -D mapreduce.job.reduces=48 /tmp/HiBench/Terasort/Input /tmp/HiBench/Terasort/Output
14/02/27 15:23:16 INFO terasort.TeraSort: starting
14/02/27 15:23:17 INFO input.FileInputFormat: Total input paths to process : 96
Spent 274ms computing base-splits.
Spent 10ms computing TeraScheduler splits.
Computing input splits took 285ms
Sampling 10 splits of 768
Making 48 from 100000 sampled records
Computing parititions took 296ms
Spent 584ms computing partitions.
14/02/27 15:23:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/27 15:23:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/27 15:23:18 INFO mapreduce.JobSubmitter: number of splits:768
14/02/27 15:23:18 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/27 15:23:18 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/27 15:23:18 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/27 15:23:18 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/02/27 15:23:18 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/27 15:23:18 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/27 15:23:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393510237328_0006
14/02/27 15:23:18 INFO client.YarnClientImpl: Submitted application application_1393510237328_0006 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/27 15:23:18 INFO mapreduce.Job: The url to track the job: http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393510237328_0006/
14/02/27 15:23:18 INFO mapreduce.Job: Running job: job_1393510237328_0006
14/02/27 15:23:23 INFO mapreduce.Job: Job job_1393510237328_0006 running in uber mode : false
14/02/27 15:23:23 INFO mapreduce.Job: map 0% reduce 0%
[...]
14/02/27 15:25:57 INFO mapreduce.Job: map 100% reduce 100%
14/02/27 15:26:07 INFO mapreduce.Job: Job job_1393510237328_0006 completed successfully
14/02/27 15:26:07 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=208148757282
FILE: Number of bytes written=312066255570
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=100000124416
HDFS: Number of bytes written=100000000000
HDFS: Number of read operations=2448
HDFS: Number of large read operations=0
HDFS: Number of write operations=96
Job Counters
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=769
Launched reduce tasks=49
Data-local map tasks=769
Total time spent by all maps in occupied slots (ms)=24813386
Total time spent by all reduces in occupied slots (ms)=4937789
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Map output bytes=102000000000
Map output materialized bytes=104000221184
Input split bytes=124416
Combine input records=0
Combine output records=0
Reduce input groups=1000000000
Reduce shuffle bytes=104000221184
Reduce input records=1000000000
Reduce output records=1000000000
Spilled Records=3000000000
Shuffled Maps =36864
Failed Shuffles=0
Merged Map outputs=36864
GC time elapsed (ms)=929575
CPU time spent (ms)=16325620
Physical memory (bytes) snapshot=300925956096
Virtual memory (bytes) snapshot=895643398144
Total committed heap usage (bytes)=412231925760
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100000000000
File Output Format Counters
Bytes Written=100000000000
14/02/27 15:26:07 INFO terasort.TeraSort: done

Whereas, when using GlusterFS, terasort shows about three times as many launched map tasks and a much longer run time (on the same environment):

14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:28 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:28 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:29 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Deleted /tmp/HiBench/Terasort/Input/_SUCCESS
========== running terasort bench ==========
HADOOP_EXECUTABLE=/usr/lib/hadoop/bin/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
HADOOP_EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:30 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:30 INFO glusterfs.GlusterVolume: Write buffer size : 131072
rm: `/tmp/HiBench/Terasort/Output': No such file or directory
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:31 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:31 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:32 INFO terasort.TeraSort: starting
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/26 10:46:32 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:32 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:32 INFO input.FileInputFormat: Total input paths to process : 96
Spent 1644ms computing base-splits.
Spent 30ms computing TeraScheduler splits.
Computing input splits took 1675ms
Sampling 10 splits of 2976
Making 48 from 100000 sampled records
Computing parititions took 1088ms
Spent 2766ms computing partitions.
14/02/26 10:46:35 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/26 10:46:35 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 10:46:35 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/26 10:46:38 INFO mapreduce.JobSubmitter: number of splits:2976
14/02/26 10:46:38 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/26 10:46:38 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/26 10:46:38 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/26 10:46:38 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/02/26 10:46:38 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/26 10:46:38 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/26 10:46:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393404749197_0036
14/02/26 10:46:39 INFO client.YarnClientImpl: Submitted application application_1393404749197_0036 to ResourceManager at hp-jobtracker-1.hpintelco.org/10.3.222.41:8032
14/02/26 10:46:39 INFO mapreduce.Job: The url to track the job:
http://hp-jobtracker-1.hpintelco.org:8088/proxy/application_1393404749197_0036/
14/02/26 10:46:39 INFO mapreduce.Job: Running job: job_1393404749197_0036
14/02/26 10:46:54 INFO mapreduce.Job: Job job_1393404749197_0036 running in uber mode : false
14/02/26 10:46:54 INFO mapreduce.Job: map 0% reduce 0%
[...]
14/02/26 11:30:47 INFO mapreduce.Job: map 100% reduce 100%
14/02/26 11:31:03 INFO mapreduce.Job: Job job_1393404749197_0036 completed successfully
14/02/26 11:31:03 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=104001540032
FILE: Number of bytes written=208243540736
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=100451375097
GLUSTERFS: Number of bytes written=100000000000
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
Job Counters
Killed map tasks=1
Killed reduce tasks=13
Launched map tasks=2977
Launched reduce tasks=61
Rack-local map tasks=2977
Total time spent by all maps in occupied slots (ms)=161663015
Total time spent by all reduces in occupied slots (ms)=101269527
Map-Reduce Framework
Map input records=1000000000
Map output records=1000000000
Map output bytes=102000000000
Map output materialized bytes=104000857088
Input split bytes=395808
Combine input records=0
Combine output records=0
Reduce input groups=1000000000
Reduce shuffle bytes=104000857088
Reduce input records=1000000000
Reduce output records=1000000000
Spilled Records=2000000000
Shuffled Maps =142848
Failed Shuffles=0
Merged Map outputs=142848
GC time elapsed (ms)=580169
CPU time spent (ms)=23171180
Physical memory (bytes) snapshot=1613411700736
Virtual memory (bytes) snapshot=4991907897344
Total committed heap usage (bytes)=3094560636928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=100125825280
File Output Format Counters
Bytes Written=100000000000
14/02/26 11:31:03 INFO terasort.TeraSort: done
14/02/26 11:31:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/26 11:31:04 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/26 11:31:04 INFO glusterfs.GlusterVolume: Write buffer size : 131072

As you can see, this configuration generates 2977 launched map tasks whereas the HDFS one generates only 769.

Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=1071337 ?

Let me know if you need more details. The infra is still available for us to make more tests.

Apache Bigtop smoke tests

We will need to set up a cluster for Bigtop verification of glusterfs-hadoop. This will not run upstream, but we will make the results available to the upstream community. We have servers to do this. The internal tests should:

  • pull jars from upstream releases.
  • copy them to local (just like our slaves in EC2 do).
  • run mahout, pig, hive, mapreduce, and other similar tests using the new https://issues.apache.org/jira/browse/BIGTOP-1222 simplified bigtop smoke infrastructure, which is gradle-based (or maybe, to start, just use the maven-based smoke runner wrapped in a bash script).

GlusterFS-Hadoop plugin seems to no longer be under active development

As above. There have been relatively few commits since 2015 (in both the source repo and the wiki), and there are several outstanding PRs and issues.

  • Is this project still actively developed or worked on?
  • Is there a considerable amount of work that this project requires?
  • If there is any work required, is it something that a single individual might be able to help with? (i.e., is there anything I can do?)

I'd ideally like to use GlusterFS as a drop-in replacement for HDFS, but I just want to make sure that this project is still receiving development attention (otherwise I can't justify it to the IT/Ops people in my organisation).

pom changes

Hi,
does the pom file change for every project?
For example, in my case the glusterfs-hadoop RPM version is 2.0.1; must I change this version in the pom file or in other files?


2.3.13
glusterfs-hadoop
hadoop filesystem impl for glusterfs


Thanks for your reply.

YARN incompatibility with 2.3.GlusterFS stack and ambari 2.2.1.0

Hi,

We are trying to install a Hadoop cluster using the 2.3.GlusterFS stack through Ambari 2.2.1.0.
All provisioning is done with Ambari blueprints, like the following:
Blueprint
{ "configurations" : [ { "core-site": { "hadoop.proxyuser.hcat.groups" : "*", "hadoop.proxyuser.hcat.hosts" : "*", "hadoop.proxyuser.hue.groups" : "*", "hadoop.proxyuser.hue.hosts" : "*", "hadoop.security.authentication" : "simple", "fs.AbstractFileSystem.glusterfs.impl" : "org.apache.hadoop.fs.local.GlusterFs", "fs.trash.interval" : "360", "fs.glusterfs.impl" : "org.apache.hadoop.fs.glusterfs.GlusterFileSystem", "hadoop.security.authorization" : "false", "net.topology.script.file.name" : "/etc/hadoop/conf/topology_script.py" } }, { "ams-hbase-env" : { "regionserver_xmn_size" : "384m", "hbase_regionserver_heapsize" : "1024m", "hbase_master_heapsize" : "1024m", "hbase_master_xmn_size" : "384m" } }, { "ams-env" : { "metrics_collector_heapsize" : "1024m" } }, { "hadoop-env" : { "dtnode_heapsize" : "1024m", "namenode_heapsize" : "2048m", "namenode_opt_maxnewsize" : "384m", "namenode_opt_newsize" : "384m", "namenode_host" : "master01", "snamenode_host" : "master01", "glusterfs_user" : "root", "hdfs_user" : "hdfs", "hdfs_log_dir_prefix" : "/var/log/hadoop" } }, { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/hdfs/data", "dfs.datanode.balance.bandwidthPerSec" : "12500000", "dfs.datanode.max.transfer.threads": "4096", "dfs.datanode.failed.volumes.tolerated" : "1", "dfs.replication" : "true" } }, { "spark-defaults" : { "spark.executor.instances" : "2", "spark.executor.memory" : "7808m", "spark.driver.memory" : "3712m", "spark.yarn.am.memory" : "3712m", "spark.yarn.executor.memoryOverhead" : "384", "spark.yarn.driver.memoryOverhead" : "384", "spark.yarn.am.memoryOverhead" : "384" } }, { "mapred-site" : { "mapreduce.map.java.opts" : "-Xmx1638m", "mapreduce.map.memory.mb" : "2048", "mapreduce.reduce.java.opts" : "-Xmx1638m", "mapreduce.reduce.memory.mb" : "2048", "mapreduce.task.io.sort.mb" : "768", "yarn.app.mapreduce.am.command-opts" : "-Xmx1638m -Dhdp.version=${hdp.version}", "yarn.app.mapreduce.am.resource.mb" : "2048" } }, { "tez-site" : { "tez.am.resource.memory.mb" : "2048", "tez.task.resource.memory.mb" : "2048" } }, { "spark-defaults" : { "spark.executor.instances" : "1", "spark.executor.memory" : "3712m", "spark.driver.memory" : "1664m", "spark.yarn.am.memory" : "1664m", "spark.yarn.executor.memoryOverhead" : "384", "spark.yarn.driver.memoryOverhead" : "384", "spark.yarn.am.memoryOverhead" : "384" } }, { "storm-site" : { "logviewer.port" : "8005" } }, { "oozie-site" : { "oozie.service.ProxyUserService.proxyuser.hue.groups" : "*", "oozie.service.ProxyUserService.proxyuser.hue.hosts" : "*" } }, { "webhcat-site" : { "webhcat.proxyuser.hue.groups" : "*", "webhcat.proxyuser.hue.hosts" : "*" } }, { "hive-site" : { "hive.tez.container.size" : "-1", "hive.tez.java.opts": "-1", "fs.file.impl.disable.cache" : "true", "fs.hdfs.impl.disable.cache" : "true", "javax.jdo.option.ConnectionPassword" : "my-pw" } } ], "host_groups" : [ { "name" : "slavenode_simple", "configurations" : [ ], "components" : [ { "name" : "ZOOKEEPER_CLIENT" }, { "name" : "OOZIE_CLIENT" }, { "name" : "HIVE_CLIENT" }, { "name" : "GLUSTERFS_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "TEZ_CLIENT" }, { "name" : "SPARK_CLIENT" }, { "name" : "NODEMANAGER" }, { "name" : "METRICS_MONITOR" } ], "cardinality" : "2" }, { "name" : "masternode_1", "configurations" : [ ], "components" : [ { "name" : "NODEMANAGER" }, { "name" : "SPARK_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "GLUSTERFS_CLIENT" }, { "name" : "METRICS_MONITOR" }, { "name" : "TEZ_CLIENT" }, { "name" : "ZOOKEEPER_CLIENT" }, { "name" : "ZOOKEEPER_SERVER" }, { 
"name" : "AMBARI_SERVER" }, { "name" : "SPARK_JOBHISTORYSERVER" }, { "name" : "APP_TIMELINE_SERVER" }, { "name" : "METRICS_COLLECTOR" }, { "name" : "RESOURCEMANAGER" }, { "name" : "WEBHCAT_SERVER" }, { "name" : "OOZIE_SERVER" }, { "name" : "HIVE_METASTORE" }, { "name" : "HIVE_SERVER" } ], "cardinality" : "1" } ], "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.3.GlusterFS" } }

Template used with the blueprint (provided to Ambari at cluster creation):
{ "blueprint": "cluster_blueprint", "default_password": "my-pw", "host_groups": [ { "name" : "masternode_1", "hosts" : [ { "fqdn": "master01.domain.com" } ] } ] }

Here is the full stack trace:
23 Mar 2016 11:05:51,256 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type zookeeper-env 23 Mar 2016 11:05:51,280 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type zookeeper-log4j 23 Mar 2016 11:05:51,323 INFO [qtp-ambari-client-23] ClusterConfigurationRequest:358 - Sending cluster config update request for service = YARN 23 Mar 2016 11:05:51,324 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1353 - Received a updateCluster request, clusterId=null, clusterName=test-cluster2, securityType=null, request={ clusterName=test-cluster2, clusterId=null, provisioningState=null, securityType=null, stackVersion=null, desired_scv=null, hosts=[] } 23 Mar 2016 11:05:51,324 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type hdfs-site 23 Mar 2016 11:05:51,351 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type capacity-scheduler 23 Mar 2016 11:05:51,378 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-env 23 Mar 2016 11:05:51,411 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-site 23 Mar 2016 11:05:51,490 INFO [qtp-ambari-client-23] AmbariManagementControllerImpl:1471 - Applying configuration with tag 'INITIAL' to cluster 'test-cluster2' for configuration type yarn-log4j 23 Mar 2016 11:05:51,567 ERROR [qtp-ambari-client-23] ClusterImpl:2635 - Updating configs for multiple services by a single API request isn't supported, config version not created 23 Mar 2016 11:05:51,572 ERROR [qtp-ambari-client-23] BaseManagementHandler:66 - Caught a runtime exception while attempting to create a resource: Failed to set configurations on cluster: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported java.lang.RuntimeException: Failed to set configurations on cluster: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.topology.AmbariContext.setConfigurationOnCluster(AmbariContext.java:390) at org.apache.ambari.server.topology.ClusterConfigurationRequest.setConfigurationsOnCluster(ClusterConfigurationRequest.java:359) at org.apache.ambari.server.topology.ClusterConfigurationRequest.setConfigurationsOnCluster(ClusterConfigurationRequest.java:279) at org.apache.ambari.server.topology.ClusterConfigurationRequest.<init>(ClusterConfigurationRequest.java:78) at org.apache.ambari.server.topology.ClusterConfigurationRequest.<init>(ClusterConfigurationRequest.java:83) at org.apache.ambari.server.topology.TopologyManager.provisionCluster(TopologyManager.java:191) at org.apache.ambari.server.controller.internal.ClusterResourceProvider.processBlueprintCreate(ClusterResourceProvider.java:517) at org.apache.ambari.server.controller.internal.ClusterResourceProvider.createResources(ClusterResourceProvider.java:174) at 
org.apache.ambari.server.controller.internal.ClusterControllerImpl.createResources(ClusterControllerImpl.java:289) at org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.create(PersistenceManagerImpl.java:76) at org.apache.ambari.server.api.handlers.CreateHandler.persist(CreateHandler.java:36) at org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72) at org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135) at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:106) at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:75) at org.apache.ambari.server.api.services.ClusterService.createCluster(ClusterService.java:131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715) at javax.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at 
org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.apache.ambari.server.security.authorization.AmbariAuthorizationFilter.doFilter(AmbariAuthorizationFilter.java:196) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:201) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192) at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.api.MethodOverrideFilter.doFilter(MethodOverrideFilter.java:72) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.apache.ambari.server.security.AbstractSecurityHeaderFilter.doFilter(AbstractSecurityHeaderFilter.java:109) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:216) at org.apache.ambari.server.controller.AmbariHandlerList.processHandlers(AmbariHandlerList.java:205) at org.apache.ambari.server.controller.AmbariHandlerList.handle(AmbariHandlerList.java:139) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.ambari.server.AmbariException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.utils.RetryHelper.executeWithRetry(RetryHelper.java:110) at org.apache.ambari.server.topology.AmbariContext.setConfigurationOnCluster(AmbariContext.java:381) ... 96 more Caused by: java.lang.IllegalArgumentException: Updating configs for multiple services by a single API request isn't supported at org.apache.ambari.server.state.cluster.ClusterImpl.applyConfigs(ClusterImpl.java:2634) at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68) at org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2172) at org.apache.ambari.server.controller.AmbariManagementControllerImpl.updateCluster(AmbariManagementControllerImpl.java:1485) at org.apache.ambari.server.controller.AmbariManagementControllerImpl.updateClusters(AmbariManagementControllerImpl.java:1337) at org.apache.ambari.server.topology.AmbariContext$5.call(AmbariContext.java:384) at org.apache.ambari.server.utils.RetryHelper.executeWithRetry(RetryHelper.java:95) ... 97 more

When we change GLUSTERFS_CLIENT to HDFS_CLIENT and the stack to 2.4 (HDFS), everything is fine.

Do you have any idea, please? Is there any incompatibility?
Thanks

resolve GlusterFSXattr

Process p = Runtime.getRuntime()
.exec(new String[] { "sudo", "getfattr", "-m", ".", "-n", "trusted.glusterfs.pathinfo", filename });
You need to fix it as shown above.
The current code, Process p = Runtime.getRuntime().exec(shellCommand);, contains an error: the command passed as a single string cannot carry any keys (options) separated by spaces, if I am not wrong.
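
For illustration only, here is a minimal, self-contained sketch of calling getfattr through the String[] form of exec(), where every argument is passed as its own token instead of being embedded in a single command string (the class and method names below are made up for the example, not taken from the plugin):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class PathInfoReader {
    // Runs getfattr with explicit argument tokens, so spaces never have to be
    // parsed back out of a single command string.
    public static String readPathInfo(String filename) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(new String[] {
                "sudo", "getfattr", "-m", ".", "-n", "trusted.glusterfs.pathinfo", filename });
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }
}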

Bug with the YARN streaming interface

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we get errors with the YARN streaming interface.

SW used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

Run results:

-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

run wordcount error

when I use the command:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen -Dmapred.map.tasks=20 109951 terasort/100M-input

15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=46d9738, git.commit.user.email=[email protected], git.commit.message.full=Merge branch 'master' of https://github.com/gluster/glusterfs-hadoop
, git.commit.id=46d973834ae1db6eb6cf9ac025ded9a9ffa38c93, git.commit.message.short=Merge branch 'master' of https://github.com/gluster/glusterfs-hadoop, git.commit.user.name=childsb, git.build.user.name=childsb, git.commit.id.describe=2.3.13-6-g46d9738-dirty, git.build.user.email=[email protected], git.branch=master, git.commit.time=21.01.2015 @ 10:31:08 CST, git.build.time=21.01.2015 @ 12:01:55 CST}
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.3.13
15/09/03 09:32:58 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:32:58 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:32:59 INFO client.RMProxy: Connecting to ResourceManager at cn0/192.168.1.40:8032
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:33:02 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:33:05 INFO terasort.TeraSort: Generating 109951 using 20
15/09/03 09:33:05 INFO mapreduce.JobSubmitter: number of splits:20
15/09/03 09:33:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/03 09:33:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441272751391_0001
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol at : /mnt/glusterfs
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/09/03 09:33:07 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/09/03 09:33:09 INFO impl.YarnClientImpl: Submitted application application_1441272751391_0001
15/09/03 09:33:09 INFO mapreduce.Job: The url to track the job: http://cn0:8088/proxy/application_1441272751391_0001/
15/09/03 09:33:09 INFO mapreduce.Job: Running job: job_1441272751391_0001
15/09/03 09:33:19 INFO mapreduce.Job: Job job_1441272751391_0001 running in uber mode : false
15/09/03 09:33:19 INFO mapreduce.Job: map 0% reduce 0%
15/09/03 09:33:19 INFO mapreduce.Job: Job job_1441272751391_0001 failed with state FAILED due to: Application application_1441272751391_0001 failed 2 times due to AM Container for appattempt_1441272751391_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://cn0:8088/proxy/application_1441272751391_0001/Then, click on links to logs of each attempt.
Diagnostics: File glusterfs:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1441272751391_0001/job.splitmetainfo does not exist.
java.io.FileNotFoundException: File glusterfs:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1441272751391_0001/job.splitmetainfo does not exist.
at org.apache.hadoop.fs.glusterfs.GlusterVolume.getFileStatus(GlusterVolume.java:368)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
15/09/03 09:33:19 INFO mapreduce.Job: Counters: 0

Why does this happen?

Building plugin without Git causes crash

On an Ubuntu server, I decided to download the plugin using wget rather than git clone. Building the plugin went fine, but when I tried to use it with my GlusterFS-Hadoop install, the following error occurred:

hadoop@gluster1:/usr/local/hadoop/bin$ ./hdfs dfs -ls
14/05/15 10:15:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/15 10:15:14 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/05/15 10:15:14 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
-ls: Fatal internal error
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2315)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:168)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:224)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:207)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:308)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
    ... 17 more
Caused by: java.lang.RuntimeException: Couldn't find git properties for version info null
    at org.apache.hadoop.fs.glusterfs.Version.<init>(Version.java:19)
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.<init>(GlusterFileSystem.java:50)
    ... 22 more

Installing git and cloning the repository solved this problem. It seems to depend on the .git info in the repository and not on Git itself, since uninstalling Git and rebuilding the plugin caused no problems either.
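
The stack trace suggests the version banner is read from git metadata baked in at build time. Purely as an illustrative sketch (the plugin's real Version class may work differently), this is the usual pattern of loading a git.properties resource, such as one generated by the maven git-commit-id plugin, from the classpath; when the source tree has no .git directory the resource is never generated and the lookup fails:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class GitVersionInfo {
    private final Properties props = new Properties();

    // Hypothetical example: load build-time git metadata from the classpath.
    public GitVersionInfo() throws IOException {
        try (InputStream in = GitVersionInfo.class.getResourceAsStream("/git.properties")) {
            if (in == null) {
                // This mirrors the failure above: no .git directory at build time,
                // so no git.properties ends up inside the jar.
                throw new IOException("Couldn't find git properties for version info");
            }
            props.load(in);
        }
    }

    public String tag() {
        return props.getProperty("git.commit.id.describe", "unknown");
    }
}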

support for major version incrementation

Let's decide how to do major version upgrades. Can this be automated via the commit message? Or should we even worry about it at all?

(just some brainstorming...)
Right now

  • we explicitly grep for "ci-skip" in the commit message.
  • we always assume only the minor version is being incremented.

Let's make these key-value pairs that get passed into the bash script, for example:

[ci-skip="false", version="auto"] translates to
--ci-skip=false --version=auto, which results in auto-incrementing the minor version along with auto-publishing the new version. That way, someone who wants to upgrade the major version can do

commit -m "Lets bump to 2.2 [version=\"2.2.0\"]"

and then the version, rather than being auto-incremented via the Python script, is explicitly set in the pom.
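
Purely as a sketch of the idea (the tag syntax and class names here are hypothetical, not what the release scripts currently do), the key-value pairs could be pulled out of the commit message and turned into flags like this:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitTagParser {
    // Matches key="value" pairs such as ci-skip="false" or version="2.2.0".
    private static final Pattern TAG = Pattern.compile("(\\w[\\w-]*)=\"([^\"]*)\"");

    public static Map<String, String> parse(String commitMessage) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = TAG.matcher(commitMessage);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Prints: --ci-skip=false --version=2.2.0
        StringBuilder flags = new StringBuilder();
        parse("Lets bump to 2.2 [ci-skip=\"false\", version=\"2.2.0\"]")
                .forEach((k, v) -> flags.append("--").append(k).append('=').append(v).append(' '));
        System.out.println(flags.toString().trim());
    }
}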

start-all.sh problem with glusterfs

Hi, I use glusterfs 3.5.3 and hadoop 2.2.0; I built the jar file and copied it to the Hadoop lib directory.
My core-site.xml is:

<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>glusterfs://fedora1.osslab.com:9010</value>
</property>
<property>
  <name>fs.AbstractFileSystem.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.local.GlusterFs</value>
</property>
<property>
  <name>fs.glusterfs.volumes</name>
  <value>test</value>
</property>
<property>
  <name>fs.glusterfs.volume.fuse.test</name>
  <value>/mnt/Hadoop</value>
</property>

But when I use the start-all script, I get this error:

Starting namenodes on []
fedora3.osslab.com: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-root-namenode-fedora3.osslab.com.out
fedora3.osslab.com: starting datanode, logging to /var/log/hadoop-hdfs/hadoop-root-datanode-fedora3.osslab.com.out
Starting secondary namenodes [fedora1.osslab.com]
fedora1.osslab.com: starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-root-secondarynamenode-fedora1.osslab.com.out
fedora1.osslab.com: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): glusterfs://fedora1.osslab.com:9010 is not of scheme 'hdfs'.
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:353)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:335)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:328)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:235)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:199)
fedora1.osslab.com: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:652)
starting yarn daemons
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-fedora1.osslab.com.out
fedora3.osslab.com: starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-fedora3.osslab.com.out

The secondary namenode has a problem with glusterfs.
Does anyone else have this problem?
Thanks for your reply.

Need to define policy for hedging bets on the slippery reliance on underlying RawLocalFileSystem

We now have many instances of issues which can arise because the underlying RawLocalFileSystem version has semantics which are subtly different from HDFS.

Should we package a "stable" replica of the RawLocalFileSystem methods, copying over from Hadoop the implementations that we know to be correct, and specifically support those? With the various bugs out there around RawLocalFileSystem, it is a shame that we are so heavily dependent on a classpath we have no control over.

Examples: mkdirs, createNonRecursive, rename, and many other methods in RawLocalFileSystem change from version to version, and it would be nice if we could have a stable library, updated and shipped by us, that has the latest and most reliable/correct semantics (a rough sketch of the idea follows below).
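
To make the proposal concrete, here is a minimal, hypothetical sketch (not existing code in this repository) of what pinning "known good" semantics on top of the classpath-provided RawLocalFileSystem could look like; which methods and behaviours we would actually pin is exactly what this issue asks us to decide:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;
import org.apache.hadoop.fs.permission.FsPermission;

public class StableRawLocalFileSystem extends RawLocalFileSystem {

    // Example of pinning one behaviour: refuse to rename onto an existing path,
    // instead of inheriting whatever the Hadoop version on the classpath does.
    @Override
    public boolean rename(Path src, Path dst) throws IOException {
        if (exists(dst)) {
            return false;
        }
        return super.rename(src, dst);
    }

    // Example of pinning another behaviour: always re-assert the requested
    // permission on the leaf directory after creation, so the result does not
    // drift with the RawLocalFileSystem version in use.
    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        boolean created = super.mkdirs(f, permission);
        if (created) {
            setPermission(f, permission);
        }
        return created;
    }
}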

This is an open-ended question. I'd like as many people as possible to comment on this and make an informed vote, and to ask more questions if I haven't fully explained the issue.

README file is old

The current README file of the project needs review, as it no longer properly describes most of the topics. The most important sections requiring a rewrite are:

  • INSTALLATION - uses quite an old Hadoop release as an example
  • CONFIGURATION - the current configuration of the plugin is very different; one could not configure the plugin using this information

GlusterFSXattr

Does not work with the file name §r§lAstra§4§lLex§r§l_(§4§lBSL§r§l_Edit)_By_LexBoosT_§4§lV37.0§r§l.zip

IllegalArgumentException: Wrong FS when running hadoop wordcount job example.

I don't know if this is an issue in the glusterfs-hadoop plugin or in my configuration, but every time I try to run the wordcount example job, it always returns an error like this:
java.lang.IllegalArgumentException: Wrong FS: glusterfs:/mapred/system, expected: file:///

Complete configuration below.

Hadoop version is 2.8.1
glusterfs version is 3.8.15-2.el7
running on CentOS 7

Here is the complete command and stdout

[hadoop@gluster1 hadoop]$ bin/hadoop jar /tmp/hadoop-0.20.2-examples.jar wordcount /hadoop/yarn-hadoop-resourcemanager-gluster1.out /hadoop/out/
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=f0fee73, [email protected], git.commit.message.full=Merge pull request #122 from childsb/getfattrparse

Refactor and cleanup the BlockLocation parsing code, git.commit.id=f0fee73c336ac19461d5b5bb91a77e05cff73361, git.commit.message.short=Merge pull request #122 from childsb/getfattrparse, git.commit.user.name=bradley childs, git.build.user.name=Unknown, git.commit.id.describe=GA-12-gf0fee73, git.build.user.email=Unknown, git.branch=master, git.commit.time=30.03.2015 @ 20:06:46 UTC, git.build.time=20.09.2017 @ 10:02:14 UTC}
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: GIT_TAG=GA
17/09/27 10:50:03 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Initializing gluster volume..
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Gluster volume: gv0 at : /mnt
17/09/27 10:50:03 WARN glusterfs.GlusterVolume: mapred.system.dir/mapreduce.jobtracker.system.dir does not exist: glusterfs:/mapred/system
17/09/27 10:50:03 WARN glusterfs.GlusterVolume: working directory does not exist: glusterfs:/user/hadoop
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/hadoop
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Write buffer size : 131072
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Default block size : 67108864
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
17/09/27 10:50:03 INFO glusterfs.GlusterVolume: File timestamp lease significant digits removed : 0
17/09/27 10:50:03 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/09/27 10:50:03 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
java.lang.IllegalArgumentException: Wrong FS: glusterfs:/mapred/system, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:484)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:120)
        at org.apache.hadoop.mapred.LocalJobRunner.getSystemDir(LocalJobRunner.java:864)
        at org.apache.hadoop.mapreduce.Cluster$1.run(Cluster.java:187)
        at org.apache.hadoop.mapreduce.Cluster$1.run(Cluster.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
        at org.apache.hadoop.mapreduce.Cluster.getFileSystem(Cluster.java:185)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1336)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>

<property>
  <name>fs.AbstractFileSystem.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.local.GlusterFs</value>
</property>

<property>
    <name>fs.default.name</name>
    <value>glusterfs:///</value>
</property>

<property>
    <name>fs.glusterfs.volumes</name>
    <value>gv0</value>
</property>

<property>
    <name>fs.glusterfs.volume.fuse.gv0</name>
    <value>/mnt</value>
</property>

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>glusterfs:///tmp/hadoop-yarn/staging/mapred/.staging</value>
  </property>
  <property>
    <name>mapred.healthChecker.script.path</name>
    <value>glusterfs:///mapred/jobstatus</value>
  </property>
  <property>
    <name>mapred.job.tracker.history.completed.location</name>
    <value>glusterfs:///mapred/history/done</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>glusterfs:///mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>glusterfs:///job-history/done</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>glusterfs:///job-history/intermediate-done</value>
  </property>

  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>glusterfs:///user</value>
  </property>
  <property>
        <name>mapred.job.tracker</name>
        <value>hadoop-master:9001</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>dfs.data.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name/data</value>
        <final>true</final>
</property>
<property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop/hadoop/dfs/name</value>
        <final>true</final>
</property>
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
</configuration>

mvn package error

Hi,
when I execute mvn package, I get an error!
Does anyone else have a problem with mvn package?
Please help me.
Thanks for your reply.

Maven-metadata : Clean up repos

Minor issue, but let's wrap our heads around this.

We can see that

http://rhbd.s3.amazonaws.com/maven/repositories/internal/org/apache/hadoop/fs/glusterfs/glusterfs-hadoop/maven-metadata.xml

Says that there is a "current" and a "releases" tag.

  1. Current points to 2.1.4, releases points to 2.1.6. Let's manually update that if the next release doesn't fix it for us.

  2. After manually fixing the version number: let's figure out exactly how to maintain that xml file and keep it up to date. Probably best to do this on a private fork (jonska/...), do a test release, and see how the corresponding maven-metadata.xml file is updated.

after yarn starts

Hi, I started YARN on my servers, and when I use the jps command, the output shows that the ResourceManager and NodeManager are running. Now I have one question: how can I test that Hadoop is working correctly with GlusterFS?
Please help me.
Thanks for your reply.
