flokkr / docker-hadoop


56.0 56.0 24.0 13.17 MB

Docker image for main Apache Hadoop components (Yarn/Hdfs)

Home Page: https://flokkr.github.io

License: Apache License 2.0

Shell 67.91% Dockerfile 28.16% Go 3.92%

docker flekszible hadoop hdfs yarn

docker-hadoop's Introduction

Run Apache Bigdata projects in Kubernetes

Flokkr is a project to run Apache Bigdata projects.

It provides:

Ready-to-use Docker containers to run Hadoop/Ozone/Flink/Spark.
Example cluster definitions and tests
A powerful, composition-based approach to generate Kubernetes resource files for different use cases.
A toolset to run Bigdata projects in the containers

Active repositories to check:

Containers are created from flokkr/docker-* repositories
Resource generation is based on the https://github.com/elek/flekszible tool (think of it as Kustomize on steroids)
Flekszible-based cluster definitions can be found in:
- https://github.com/flokkr/k8s: Main repository with all the K8s definitions
- https://github.com/flokkr/infra-flekszible: Independent monitoring/logging as an addition to the clusters
- https://github.com/flokkr/ozone-flekszible: Dedicated definitions for Apache Hadoop Ozone
The last three repositories can be used with flekszible to generate any cluster setup (and deploy it with existing k8s tools)
The https://github.com/flokkr/demo repository contains ready-to-use example cluster definitions (easy to customize with flekszible) with different components (Spark + Yarn, Flink + Kafka, Ozone + Spark, ...)
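The composition idea behind this kind of resource generation can be illustrated with a generic deep merge of a base resource and an overlay. This is a minimal sketch of the general technique only, not flekszible's actual algorithm or API; the resource names and the "monitoring" label are hypothetical.

```python
import copy


def deep_merge(base, overlay):
    """Return a new dict where overlay values win and nested dicts merge."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key.
            result[key] = deep_merge(result[key], value)
        else:
            # Scalars and lists from the overlay replace the base value.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base definition and overlay, for illustration only.
base = {
    "kind": "StatefulSet",
    "metadata": {"name": "hdfs-datanode"},
    "spec": {"replicas": 1},
}
overlay = {
    "metadata": {"labels": {"monitoring": "prometheus"}},
    "spec": {"replicas": 3},
}

merged = deep_merge(base, overlay)
print(merged)
```

Lists are replaced wholesale here. Kustomize's strategic merge, for example, merges container lists by name, so real tools are more selective than this sketch.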

For more information, check https://flokkr.github.io


docker-hadoop's Issues

Example config doesn't work

Hi!

I'm using the docker-compose.yml from the example.

But the namenode is failing. Here are the logs:

Attaching to hdfs_namenode
hdfs_namenode | Called launcher with command parameters: /opt/hadoop/bin/hdfs namenode
hdfs_namenode | Configuration type: simple
hdfs_namenode | hdfs-site.xml
hdfs_namenode | File hdfs-site.xml has been written out successfullly.
hdfs_namenode | log4j.properties
hdfs_namenode | File log4j.properties has been written out successfullly.
hdfs_namenode | Listening for transport dt_socket at address: 5005
hdfs_namenode | log4j:ERROR Could not find value for key log4j.appender.stdout
hdfs_namenode | log4j:ERROR Could not instantiate appender named "stdout".
hdfs_namenode | log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.server.namenode.NameNode).
hdfs_namenode | log4j:WARN Please initialize the log4j system properly.
hdfs_namenode | log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Can you help me make it work, please?

/cc @frol
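The log shows the launcher writing hdfs-site.xml and log4j.properties before starting the namenode, after which log4j 1.x complains that the "stdout" appender has no definition. The sketch below assumes a hypothetical environment-variable convention (FILE-NAME_property=value, e.g. LOG4J.PROPERTIES_log4j.rootLogger) for how such files could be generated; it is not the image's actual launcher. It renders a *-site.xml and lists the appenders named in log4j.rootLogger that lack a log4j.appender.<name> entry, which is the condition behind the "Could not find value for key" error.

```python
import xml.etree.ElementTree as ET


def group_by_file(env):
    """Group variables named <FILE-NAME>_<property> by lower-cased file name.

    Hypothetical convention, e.g. HDFS-SITE.XML_dfs.namenode.name.dir=/data/namenode.
    """
    files = {}
    for key, value in env.items():
        name, sep, prop = key.partition("_")
        if not sep or "." not in name:
            continue  # ordinary variables such as JAVA_HOME are skipped
        files.setdefault(name.lower(), {})[prop] = value
    return files


def render_site_xml(props):
    """Render a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def undefined_appenders(props):
    """Appenders listed in log4j.rootLogger without a log4j.appender.<name> key.

    log4j 1.x reports each of these as
    'Could not find value for key log4j.appender.<name>'.
    """
    # The value is "<level>, <appender>, <appender>..."; skip the level.
    parts = props.get("log4j.rootLogger", "").split(",")[1:]
    names = [p.strip() for p in parts if p.strip()]
    return [n for n in names if f"log4j.appender.{n}" not in props]


# Hypothetical container environment for illustration.
env = {
    "HDFS-SITE.XML_dfs.namenode.name.dir": "/data/namenode",
    "LOG4J.PROPERTIES_log4j.rootLogger": "INFO, stdout",
    "JAVA_HOME": "/usr/lib/jvm/java",
}
files = group_by_file(env)
print(render_site_xml(files["hdfs-site.xml"]))
print(undefined_appenders(files["log4j.properties"]))  # ['stdout']
```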

Upgrade to hadoop 3.3.4

Hadoop 3.3.4 was released on August 8, 2022. We would like to move to this version of Hadoop to possibly mitigate vulnerabilities found in Hadoop 3.3.1 and in the versions of the JARs it uses.

yarn logs error

Thank you very much for providing such good technology.
However, when I used it, the yarn logs command reported an error:
[hadoop@yarn-resourcemanager-1-0 bin]$ yarn logs -applicationId application_1640440677063_0006
2021-12-25 14:21:56 INFO RMProxy:134-Connecting to ResourceManager at yarn-resourcemanager-1-0.yarn-resourcemanager-1.default.svc.cluster.local/10.2.1.1:8032
Can not find any log file matching the pattern: [ALL] for the application: application_1640440677063_0006
Can not find the logs for the application: application_1640440677063_0006 with the appOwner: hadoop

Any good suggestions?
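One common cause of this message is that YARN log aggregation is disabled: yarn.log-aggregation-enable defaults to false, and for finished applications the yarn logs command reads the aggregated logs. Whether that applies to this cluster is an assumption. The sketch below parses a yarn-site.xml string and reports whether the property is enabled, falling back to the false default when it is absent.

```python
import xml.etree.ElementTree as ET

# Property name and default as documented in yarn-default.xml.
LOG_AGGREGATION_KEY = "yarn.log-aggregation-enable"
LOG_AGGREGATION_DEFAULT = "false"


def site_properties(xml_text):
    """Parse <property><name>/<value> pairs from a *-site.xml string."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def log_aggregation_enabled(xml_text):
    """True only if the property is explicitly set to "true"."""
    value = site_properties(xml_text).get(LOG_AGGREGATION_KEY, LOG_AGGREGATION_DEFAULT)
    return value.lower() == "true"


without_setting = "<configuration/>"
with_setting = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>"""

print(log_aggregation_enabled(without_setting))  # False
print(log_aggregation_enabled(with_setting))  # True
```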

Failed to start the KeyspaceManager

Hi,
I tried to start up the Hadoop/Ozone cluster as explained here:
https://cwiki.apache.org/confluence/display/HADOOP/Getting+Started+with+docker

But I got the following error when trying to start:

......
ksm_1       | STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 056a9783337f7e384f651cf86b30abf995d1ead8; compiled by 'elek' on 2017-09-27T19:10Z
ksm_1       | STARTUP_MSG:   java = 1.8.0_121
ksm_1       | ************************************************************/
ksm_1       | 2017-09-29 22:03:33 INFO  KeySpaceManager:51 - registered UNIX signal handlers for [TERM, HUP, INT]
ksm_1       | 2017-09-29 22:03:34 INFO  CallQueueManager:84 - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 20000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
ksm_1       | 2017-09-29 22:03:34 INFO  Server:1067 - Starting Socket Reader #1 for port 9862
ksm_1       | 2017-09-29 22:03:34 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ksm_1       | 2017-09-29 22:03:34 ERROR KeySpaceManager:191 - Failed to start the KeyspaceManager.
ksm_1       | java.lang.NullPointerException
ksm_1       | 	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
ksm_1       | 	at org.apache.hadoop.ozone.web.utils.OzoneUtils.getScmMetadirPath(OzoneUtils.java:256)
ksm_1       | 	at org.apache.hadoop.ozone.ksm.KSMMetadataManagerImpl.<init>(KSMMetadataManagerImpl.java:71)
ksm_1       | 	at org.apache.hadoop.ozone.ksm.KeySpaceManager.<init>(KeySpaceManager.java:104)
ksm_1       | 	at org.apache.hadoop.ozone.ksm.KeySpaceManager.main(KeySpaceManager.java:187)
ksm_1       | 2017-09-29 22:03:34 INFO  ExitUtil:210 - Exiting with status 1: java.lang.NullPointerException
ksm_1       | 2017-09-29 22:03:34 INFO  KeySpaceManager:51 - SHUTDOWN_MSG: 
ksm_1       | /************************************************************
ksm_1       | SHUTDOWN_MSG: Shutting down KeySpaceManager at 25ef3a8c8827/172.18.0.4
ksm_1       | ************************************************************/
ksm_1       | Process exited with exit code 1
ksm_1       | Process has been failed (exit code: 1), restarting after 60 seconds... (4/10)
......

Is there any solution to fix this? It would be great to use docker for testing the hadoop/ozone feature.

Thanks for any help

not localhost:8080 it is localhost:8088

How to persist data on the datanodes?

It seems that the example config persists only the namenode data to the host volume (at least, I see only the ./namenode/ folder in the mounted volume after I start namenode and datanode). What is the point of /tmp:/data volume mounting for the datanode container in this case? How can I specify the data root folder to be /data/datanode? (Given the logs I see, the default datanode folder is /tmp/hadoop-root/dfs/)

P.S. Thank you for building these nice BigData images!
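The default location mentioned in the question follows from Hadoop's defaults: dfs.datanode.data.dir is file://${hadoop.tmp.dir}/dfs/data and hadoop.tmp.dir is /tmp/hadoop-${user.name}, so a datanode running as root writes under /tmp/hadoop-root/dfs/data. The sketch below resolves that variable expansion and shows that overriding dfs.datanode.data.dir with /data/datanode places the blocks under the mounted /data volume. How the override reaches the container (for example an HDFS-SITE.XML_dfs.datanode.data.dir environment variable) is an assumption about the image's configuration convention.

```python
import re

# Defaults from core-default.xml and hdfs-default.xml.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data",
}
VAR = re.compile(r"\$\{([^}$]+)\}")


def expand(value, conf, system):
    """Substitute ${name}: system properties first, then configuration."""
    for _ in range(20):  # bounded, so self-referencing values cannot loop forever
        match = VAR.search(value)
        if match is None:
            return value
        name = match.group(1)
        replacement = system.get(name)
        if replacement is None:
            replacement = conf.get(name)
        if replacement is None:
            return value  # unresolved variables are left as written
        value = value[:match.start()] + replacement + value[match.end():]
    return value


def datanode_dirs(overrides, user="root"):
    """Effective datanode storage directories (comma-separated list)."""
    conf = {**DEFAULTS, **overrides}
    raw = conf["dfs.datanode.data.dir"]
    return [expand(d.strip(), conf, {"user.name": user}) for d in raw.split(",")]


print(datanode_dirs({}))  # ['file:///tmp/hadoop-root/dfs/data']
# Hypothetical override, e.g. passed to the container as
# HDFS-SITE.XML_dfs.datanode.data.dir=/data/datanode
print(datanode_dirs({"dfs.datanode.data.dir": "/data/datanode"}))  # ['/data/datanode']
```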


