kiwenlau / hadoop-cluster-docker
Run Hadoop Cluster within Docker Containers
License: Apache License 2.0
How can I use Hadoop streaming with this cluster? I can't find the jar for it
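For what it's worth, Hadoop 2.x distributions usually ship the streaming jar under share/hadoop/tools/lib; a small sketch to locate it (the default path and layout are assumptions, not taken from this repo's image):

```shell
#!/bin/sh
# Locate the hadoop-streaming jar inside a Hadoop installation.
# Assumes the standard Hadoop 2.x layout; adjust the default for this image.
find_streaming_jar() {
    find "${1:-/usr/local/hadoop}" -name 'hadoop-streaming*.jar' 2>/dev/null | head -n 1
}

JAR="$(find_streaming_jar "${HADOOP_HOME:-}")"
echo "streaming jar: $JAR"
# Typical use once found:
#   hadoop jar "$JAR" -mapper cat -reducer wc -input input -output output
```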
I found a better way to solve this issue: Docker's official registry mirror for China.
sudo docker pull registry.docker-cn.com/kiwenlau/hadoop:1.0
I strongly recommend adding this to README.md.
The error is as follows:
[root@computer002 hadoop-cluster-docker]# ./start-container.sh
start master container...
start slave1 container...
start slave2 container...
FATA[0000] Error response from daemon: Container master is not running
docker info output:
[root@computer002 hadoop-cluster-docker]# docker info
Containers: 5
Images: 71
Storage Driver: devicemapper
Pool Name: docker-253:0-2494751-pool
Pool Blocksize: 65.54 kB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 4.494 GB
Data Space Total: 107.4 GB
Metadata Space Used: 5.648 MB
Metadata Space Total: 2.147 GB
Udev Sync Supported: true
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.89-RHEL6 (2014-09-01)
Execution Driver: native-0.2
Kernel Version: 2.6.32-504.23.4.el6.x86_64
Operating System: <unknown>
CPUs: 12
Total Memory: 23.39 GiB
Name: computer002
ID: 7N3H:JKBG:43WH:SOJM:PSJM:MJEP:V37S:ZHMP:2YPK:QL74:PT4C:FUB3
How do I configure YARN and start it up?
I typed the following:
root@master:~# serf members
master.kiwenlau.com 172.17.0.65:7946 alive
5 minutes later it is still:
master.kiwenlau.com 172.17.0.65:7946 alive
The slaves don't seem to come up.
Hi,
I'm keen on building a CentOS-based version. How can I reference your work and build it on CentOS rather than Ubuntu?
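One hedged starting point: swap the base image in the Dockerfile and translate the apt-get steps to yum by hand. A tiny sketch of the mechanical part (the file path and image tags below are illustrative assumptions):

```shell
#!/bin/sh
# Swap the FROM line of a Dockerfile to a CentOS base.
# Package-manager commands (apt-get -> yum) still need manual translation.
swap_base_image() {
    # $1: Dockerfile path, $2: new base image
    sed -i.bak "s|^FROM .*|FROM $2|" "$1"
}

# Example against a throwaway Dockerfile:
printf 'FROM ubuntu:14.04\nRUN apt-get update\n' > /tmp/Dockerfile.demo
swap_base_image /tmp/Dockerfile.demo centos:7
head -n 1 /tmp/Dockerfile.demo   # -> FROM centos:7
```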
Some users don't have sudo permissions, but they belong to the docker group.
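If your user is already in the docker group, one option (a sketch; it assumes the scripts prefix every Docker call with the literal text "sudo docker") is to strip the prefix from the repo's scripts:

```shell
#!/bin/sh
# Remove the hardcoded 'sudo' before docker invocations in a script.
# Members of the docker group can talk to the daemon without sudo.
strip_sudo() {
    sed -i.bak 's/sudo docker/docker/g' "$1"
}

# Demo against a throwaway script:
printf 'sudo docker rm -f hadoop-master\nsudo docker run -itd kiwenlau/hadoop:1.0\n' > /tmp/start-demo.sh
strip_sudo /tmp/start-demo.sh
cat /tmp/start-demo.sh
```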
root@hadoop-master:~# ./run-wordcount.sh
18/12/04 03:48:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.18.0.2:8032
18/12/04 03:48:58 INFO input.FileInputFormat: Total input paths to process : 2
18/12/04 03:48:58 INFO mapreduce.JobSubmitter: number of splits:2
18/12/04 03:48:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543895269322_0001
18/12/04 03:48:59 INFO impl.YarnClientImpl: Submitted application application_1543895269322_0001
18/12/04 03:48:59 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1543895269322_0001/
18/12/04 03:48:59 INFO mapreduce.Job: Running job: job_1543895269322_0001
18/12/04 03:49:10 INFO mapreduce.Job: Job job_1543895269322_0001 running in uber mode : false
18/12/04 03:49:10 INFO mapreduce.Job: map 0% reduce 0%
18/12/04 03:49:20 INFO mapreduce.Job: Task Id : attempt_1543895269322_0001_m_000001_0, Status : FAILED
Exception from container-launch.
Container id: container_1543895269322_0001_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I have read the issues, but that didn't solve this problem.
If I want to run only the slaves in Docker and have my local machine be the master, is that possible? How?
Example:
My notebook - master
Docker image - 3 slaves
I followed the instructions and successfully started Hadoop using ./start_hadoop.sh. But when I run the wordcount example, I get these messages and it just stops there:
root@hadoop-master:~# ./run-wordcount.sh
mkdir: cannot create directory 'input': File exists
16/12/05 15:03:01 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.18.0.2:8032
16/12/05 15:03:02 INFO input.FileInputFormat: Total input paths to process : 2
16/12/05 15:03:02 INFO mapreduce.JobSubmitter: number of splits:2
16/12/05 15:03:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480950171648_0001
16/12/05 15:03:02 INFO impl.YarnClientImpl: Submitted application application_1480950171648_0001
16/12/05 15:03:02 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1480950171648_0001/
16/12/05 15:03:02 INFO mapreduce.Job: Running job: job_1480950171648_0001
The code in the script:
# start hadoop master container
sudo docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
sudo docker run -itd \
                --net=hadoop \
                -p 50070:50070 \
                -p 8088:8088 \
                --name hadoop-master \
                --hostname hadoop-master \
                kiwenlau/hadoop:1.0 &> /dev/null
I've searched many articles about Hadoop running on Docker and, like Mathew said, "this is a unique one." Indeed, the strategy adopted is very good for building a big data setup at low cost and low complexity.
Thank you very much for sharing this!
But one doubt remains: is there any problem in using the Docker volume feature to "share" the hdfs folder of the hadoop-master with the host, in order to insert new files and make backups?
This doubt came up after reading the article "Understanding Volumes in Docker" by Adrian Mouat.
I ran 'hadoop-master' mounting a local folder with '-v'. Then I entered hadoop-master, cd'd to the mounted folder, and ran 'hdfs dfs -put ./data/* input/'. It works.
But my problem is that I cannot delete the data I copied to HDFS. I delete the containers with 'docker rm', but the data still exists. Right now I can only delete the data by resetting Docker.
Is there any other solution?
This is my docker info
➜ hadoop docker info
Containers: 5
Running: 5
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.12.3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 22
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.27-moby
Operating System: Alpine Linux v3.4
OSType: linux
Architecture: x86_64
CPUs: 5
Total Memory: 11.71 GiB
Name: moby
ID: NPR6:2ZTU:CREI:BHWE:4TQI:KFAC:TZ4P:S5GM:5XUZ:OKBH:NR5C:NI4T
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 56
Goroutines: 81
System Time: 2016-11-22T08:10:37.120826598Z
EventsListeners: 2
Username: chaaaa
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
Insecure Registries:
127.0.0.0/8
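A guess at what is happening: the HDFS blocks live in the container's writable layer or in anonymous volumes, which 'docker rm' alone does not reclaim. A hedged cleanup sketch (container names assumed from this repo's scripts; the commands are printed rather than executed since they are destructive):

```shell
#!/bin/sh
# Sketch: three escalating ways to get rid of leftover HDFS data.
cleanup_commands() {
    cat <<'EOF'
hdfs dfs -rm -r -skipTrash /user/root/input
docker rm -f -v hadoop-master hadoop-slave1 hadoop-slave2
docker volume prune -f
EOF
}
cleanup_commands
```

`docker rm -v` also removes the container's anonymous volumes; `docker volume prune` needs Docker 1.13+.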
Accessing the HBase cluster via the Java API:
Caused by: java.net.UnknownHostException: hadoop-slave2.hadoop
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1639)
at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:163)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
... 4 more
zeegin@zeegin-Virtual-Machine:~/hadoop-cluster-docker$ ./start-container.sh
start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
Error response from daemon: Container ebcfe89abd2ffa3c038b6960d279e95e3b3e2426cd002406422facfd8c09b04b is not running
zeegin@zeegin-Virtual-Machine:~/hadoop-cluster-docker$ docker info
Containers: 3
Running: 0
Paused: 0
Stopped: 3
Images: 1
Server Version: 1.12.6
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 18
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host overlay bridge null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.8.0-46-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: i686
CPUs: 1
Total Memory: 998.4 MiB
Name: zeegin-Virtual-Machine
ID: 4TFL:RFFY:DZIF:5ATO:OJRP:5RAD:HLXT:X6ZH:VIJN:426D:7WXK:V6JG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Hi,
I have your setup up and running; run-wordcount.sh ran smoothly without any issues. However, the problem I have is that the NameNode and DataNodes are "hidden" behind their Docker network. Ports 50070 and 8088 are open, but is that enough to access HDFS from my local machine (outside the Docker network that was created)? Probably not.
In other words, I'm looking for a way to reach HDFS from outside the Docker network.
Any hints?
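One direction to try (a sketch, not verified against this repo): publish the NameNode RPC port alongside the web UIs when starting the master, so a client outside the bridge network can reach hdfs://<host>:9000.

```shell
#!/bin/sh
# Build the docker run command with port 9000 (NameNode RPC) also published.
# Datanode ports (e.g. 50010) may additionally be needed for real block reads.
build_master_cmd() {
    echo "docker run -itd --net=hadoop -p 50070:50070 -p 8088:8088 -p 9000:9000 --name hadoop-master --hostname hadoop-master kiwenlau/hadoop:1.0"
}
build_master_cmd
```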
When I try to start-container I get this error. I am new to Docker and I don't know whether this error belongs here. It looks like docker-machine doesn't have bash.
When I run
hadoop jar autoComplete.jar src.autoComplete.Driver input output 5
It takes a long time for the run log to change, and then it gets stuck in this loop:
17/11/28 12:07:02 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:03 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:04 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/11/28 12:07:05 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What's wrong here? Thanks.
I opened ports 9000 and 8032 and used the Java API to upload a file. It reports: could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
I don't understand whether this is a Hadoop configuration problem or something else. The same thing currently happens with sequenceiq/hadoop-docker, and a self-built Hadoop 2.9.1 has the same problem.
Thank you.
All steps were successful, but when I try to run ./run-wordcount.sh on the master, I receive the following messages:
15/09/18 15:34:14 INFO client.RMProxy: Connecting to ResourceManager at master.kiwenlau.com/172.17.0.8:8040
15/09/18 15:34:16 INFO ipc.Client: Retrying connect to server: master.kiwenlau.com/172.17.0.8:8040. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
15/09/18 15:39:53 INFO ipc.Client: Retrying connect to server: master.kiwenlau.com/172.17.0.8:8040. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Then I send a SIGINT signal and receive the following messages:
input file1.txt:
Hello Hadoop
input file2.txt:
Hello Docker
wordcount output:
cat: `output/part-r-00000': No such file or directory
Are there other logs that can help me figure out what's going on? I could not use YARN.
Error response from daemon: Container 2e445deeec26bd4d44b95f9a982e8c840ae2bcd99fbeed842c4f39bfc9f986f5 is not running
This project holds good only for a single host. How can I use it on multiple hosts? I have been trying different implementations found on the web, but nothing is straightforward.
I want to operate HDFS from Java:
FileSystem fs = FileSystem.get(new URI("hdfs://172.18.0.2:9000/"), configuration, "root");
System.out.println("begin copy");
fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
System.out.println("done!");
Using the master container's IP, I cannot create files on HDFS.
Following the script, I added a 0.0.0.0:9000 -> 9000/tcp mapping from the host to port 9000 on hadoop-master. With hdfs://localhost:9000/ I find that I can create files, but their size is 0.
Any advice? Thanks!
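The size-0 symptom is consistent with the client reaching the NameNode through the published port but failing to reach the datanodes, whose addresses the NameNode reports as internal container IPs. One commonly suggested workaround (hedged; verify against your Hadoop version) is to make clients address datanodes by hostname, map those hostnames to the Docker host in /etc/hosts, and publish the datanode ports as well:

```xml
<!-- hdfs-site.xml on the client side (a sketch, not from this repo) -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```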
build docker hadoop image
Sending build context to Docker daemon 305.2 kB
Step 1/15 : FROM ubuntu:14.04
14.04: Pulling from library/ubuntu
c60055a51d74: Pull complete
755da0cdb7d2: Pull complete
969d017f67e6: Pull complete
37c9a9113595: Pull complete
a3d9f8479786: Pull complete
Digest: sha256:8f5f12335124c1b78e4cf2f8860d395f75ba279bae70a3c18dd470e910e38ec5
Status: Downloaded newer image for ubuntu:14.04
---> b969ab9f929b
Step 2/15 : MAINTAINER KiwenLau [email protected]
mkdir /var/lib/docker/overlay/0c7a9394e523f14a0f16f8f940b14702c7454ecd04b0aeb1dfe85a4373a7e111-init/merged/dev/shm: invalid argument
docker info:
Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 3
Server Version: 1.13.1
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51 GiB
Name: iZ2zegz864acs9ifmprfaqZ
ID: CC4X:3HWP:UKBS:4XQK:2FFT:YJ32:5WOS:JTN5:W6KX:7FXX:SAAE:YKC4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://zd8ozs9s.mirror.aliyuncs.com
Live Restore Enabled: false
Linux version: CentOS 7
When I pull your image, I am getting the following error:
Status: Downloaded newer image for kiwenlau/hadoop-master:0.1.0 docker.io/kiwenlau/hadoop-master: this image was pulled from a legacy registry. Important: This registry version will not be supported in future versions of docker.
I think this is gonna fix it: ansible/ansible-modules-core#2351
Pulling repository index.alauda.cn/kiwenlau/serf-dnsmasq
2015/06/09 05:29:32 Error: image kiwenlau/serf-dnsmasq not found
Executing:
$HADOOP_HOME/bin/hadoop jar
$HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming*.jar,
fails with the error: No such file or directory.
What is going wrong? I'm a newbie, please advise.
start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
Error response from daemon: Container 2bfaaa643f18a33c9bb018140b91af35b68c777b3f8c1c5d4081af84e0b8af5a is not running
root@master:~# serf members
Error connecting to Serf agent: dial tcp 127.0.0.1:7373: connection refused
./start-ssh-serf.sh: line 9: /etc/serf/start-serf-agent.sh: Permission denied
hbase-env.sh
# Set environment variables here.
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
# The java implementation to use. Java 1.7+ required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop/
# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching.
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"
# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=true
hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>hadoop-master:60000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-master,hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-slave4</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
hbase(main):001:0> status
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2293)
at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:777)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55652)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Here is some help for this command:
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
hbase> status 'replication'
hbase> status 'replication', 'source'
hbase> status 'replication', 'sink'
root@hadoop-master:~# jps
2116 Jps
1320 HQuorumPeer
222 NameNode
1430 HMaster
696 ResourceManager
1737 Main
467 SecondaryNameNode
17/04/03 18:03:31 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:32 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:33 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:34 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:35 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/04/03 18:03:36 INFO ipc.Client: Retrying connect to server: hadoop-master/172.18.0.2:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Could you provide a Docker image for Hadoop 3?
On OS X, the master cannot find its two default slaves, slave1.kiwenlau.com and slave2.kiwenlau.com. The two hostnames must be added to /etc/hosts:
172.17.0.2 slave1.kiwenlau.com slave1
I have seen your Hadoop install command in the hadoop-base Dockerfile; it's just an ln command instead of an install command:
RUN ln -s /usr/local/hadoop-2.3.0 /usr/local/hadoop
You must have compiled Hadoop yourself, and I found the compile steps in your blog: http://www.cnblogs.com/kiwenlau/p/4227204.html
Maybe the Dockerfile needs an ADD command:
ADD xxx /usr/local/hadoop-2.3.0
Hi,
This repo really helped me a lot; thank you for your kindness.
However, when I run a Python script with the hadoop-streaming jar, it stops at map 100% reduce 0%.
When I go to the dashboard, I find the reducer is stuck at STARTING, while the word count example works perfectly.
hadoop jar hadoop-streaming.jar \
-files mapper.py,reducer.py \
-mapper "python mapper.py" \
-reducer "python reducer.py" \
-input streaming-data \
-output streaming-output
And when I run it locally, it also works:
cat data/* | python mapper.py | sort | python reducer.py
http://www.sunlab.org/teaching/cse6250/fall2017/lab/hadoop-streaming/
Where can I find an official Python example, to clarify where the problem is?
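To isolate whether the mapper/reducer contract is the issue, the streaming pipeline can be simulated entirely in the shell. The functions below are stand-ins for mapper.py and reducer.py (which are assumed from the question) and show the tab-separated key/value convention streaming expects:

```shell
#!/bin/sh
# Word-count streaming simulated locally: map -> sort (the "shuffle") -> reduce.
map_words() {
    # Emit one "word<TAB>1" line per word.
    tr -s ' \t' '\n\n' | awk 'NF { print $1 "\t" 1 }'
}
reduce_counts() {
    # Sum counts per word from sorted mapper output.
    awk -F'\t' '{ c[$1] += $2 } END { for (w in c) print w "\t" c[w] }' | sort
}

printf 'Hello Hadoop\nHello Docker\n' | map_words | sort | reduce_counts
# -> Docker 1, Hadoop 1, Hello 2
```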
I am trying to run sequenceiq/hadoop-docker:2.4.1 in Docker as:
docker run -i -t sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash
and I get the following error:
17/01/09 00:17:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From be491b66d596/172.17.0.2 to be491b66d596:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy14.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:482)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
at org.apache.hadoop.examples.Grep.run(Grep.java:95)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.Grep.main(Grep.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 31 more
Any idea why I am getting this?
I am currently working on CentOS 7 with kernel version 3.10.0-514.2.2.el7.x86_64.
I tried this out on my local machine and it was fantastic: a 5-node cluster. However, I also want to set up a Spark cluster on the Docker images, as well as a multi-host cluster for high availability.
E.g., have a 5-node cluster per physical host across 3 physical hosts and have them communicate with each other.
I also want to know whether this image is production-ready.
Thank you
wordcount output:
cat: Call From hadoop-master/172.18.0.2 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
When pulling the repository, the following errors appear:
docker@ubuntu:~$ sudo docker pull kiwenlau/hadoop:1.0
[sudo] password for docker:
Pulling repository docker.io/kiwenlau/hadoop
Tag 1.0 not found in repository docker.io/kiwenlau/hadoop
: not foundiner.sh: 2: start-container.sh:
: not foundiner.sh: 5: start-container.sh:
: not foundiner.sh: 6: start-container.sh:
: Permission denied 8: start-container.sh: cannot create /dev/null
start hadoop-master container...
Error response from daemon: No such container: hadoop-master
docker: invalid reference format.
See 'docker run --help'.
start-container.sh: 11: start-container.sh: --net=hadoop: not found
start-container.sh: 12: start-container.sh: -p: not found
start-container.sh: 13: start-container.sh: -p: not found
start-container.sh: 14: start-container.sh: --name: not found
start-container.sh: 15: start-container.sh: --hostname: not found
: Permission denied 16: start-container.sh: cannot create /dev/null
: not foundiner.sh: 17: start-container.sh:
: not foundiner.sh: 18: start-container.sh:
start-container.sh: 31: start-container.sh: Syntax error: "done" unexpected (expecting "do")
start-container.sh: 16: start-container.sh: kiwenlau/hadoop:1.0: not found
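Error text like ": not foundiner.sh" is the classic signature of Windows CRLF line endings: the carriage return becomes part of each command, and echoing it rewrites the start of the line. A hedged fix is to strip the carriage returns (or re-clone with git's autocrlf conversion disabled):

```shell
#!/bin/sh
# Strip carriage returns from a script saved with Windows line endings.
strip_crlf() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo: a CRLF script fails oddly under sh until the CRs are removed.
printf 'echo hello\r\n' > /tmp/crlf-demo.sh
strip_crlf /tmp/crlf-demo.sh
sh /tmp/crlf-demo.sh   # -> hello
```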
Hi,
I'm confused about which files are inside the Docker image before cloning; how can I take a look at them? Can you guide me through how you built your Docker images? I'm very confused. Looking forward to hearing from you soon!
When running my own MapReduce job, I hit this problem. I have changed hadoop-env.sh from JDK 7 to JDK 8:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Below is the error message:
root@hadoop-master:~/src# hadoop com.sun.tools.javac.Main *.java
Error: Could not find or load main class com.sun.tools.javac.Main
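This particular error is independent of the JDK 7 to 8 switch: the compiler class `com.sun.tools.javac.Main` lives in the JDK's `tools.jar`, and the Hadoop MapReduce tutorial has you put that jar on `HADOOP_CLASSPATH` before invoking it (paths below assume the java-8-openjdk-amd64 layout shown above):

```shell
# tools.jar holds com.sun.tools.javac.Main in JDK 8 (it was removed in JDK 9+),
# so Hadoop needs it on its classpath to run the compiler.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
hadoop com.sun.tools.javac.Main *.java
```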
start-hadoop.sh doesn't start the NodeManager and DataNode; it only starts the ResourceManager.
root@hadoop-master:~# $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: Warning: Permanently added 'hadoop-master,172.18.0.2' (ECDSA) to the list of known hosts.
hadoop-master: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-master:
hadoop-master: * Documentation: https://help.ubuntu.com/
hadoop-master:
hadoop-master: The programs included with the Ubuntu system are free software;
hadoop-master: the exact distribution terms for each program are described in the
hadoop-master: individual files in /usr/share/doc/*/copyright.
hadoop-master:
hadoop-master: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-master: applicable law.
hadoop-master:
hadoop-slave2: Warning: Permanently added 'hadoop-slave2,172.18.0.4' (ECDSA) to the list of known hosts.
hadoop-slave1: Warning: Permanently added 'hadoop-slave1,172.18.0.3' (ECDSA) to the list of known hosts.
hadoop-slave1: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-slave1:
hadoop-slave1: * Documentation: https://help.ubuntu.com/
hadoop-slave1:
hadoop-slave1: The programs included with the Ubuntu system are free software;
hadoop-slave1: the exact distribution terms for each program are described in the
hadoop-slave1: individual files in /usr/share/doc/*/copyright.
hadoop-slave1:
hadoop-slave1: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-slave1: applicable law.
hadoop-slave1:
hadoop-slave2: Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-86-generic x86_64)
hadoop-slave2:
hadoop-slave2: * Documentation: https://help.ubuntu.com/
hadoop-slave2:
hadoop-slave2: The programs included with the Ubuntu system are free software;
hadoop-slave2: the exact distribution terms for each program are described in the
hadoop-slave2: individual files in /usr/share/doc/*/copyright.
hadoop-slave2:
hadoop-slave2: Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
hadoop-slave2: applicable law.
hadoop-slave2:
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 4.1.12-61.1.28.el6uek.x86_64 x86_64)
0.0.0.0:
0.0.0.0: * Documentation: https://help.ubuntu.com/
root@hadoop-master:~# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:11 ? 00:00:00 sh -c service ssh start; bash
root 33 1 0 21:11 ? 00:00:00 /usr/sbin/sshd
root 36 1 0 21:11 ? 00:00:00 bash
root 46 0 0 21:11 ? 00:00:00 bash
root 400 46 0 21:12 ? 00:00:00 ps -ef
@kiwenlau Hello. I see that every slave Hadoop container needs -e JOIN_IP=$FIRST_IP configured at startup, like this:
sudo docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.kiwenlau.com -e JOIN_IP=$FIRST_IP kiwenlau/hadoop-slave:0.1.0 &> /dev/null
Does serf require this environment variable to work? And does the FIRST_IP machine have to be started first every time?