A development big data infrastructure with docker-compose.
This platform wires together HDFS, Hive, Spark, Hue, Zeppelin, Kafka, ZooKeeper, and StreamSets.
Just run docker-compose up and enjoy!
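Before bringing the stack up, it can help to verify that none of the published host ports are already taken; several of the issues below are port-binding failures. A minimal pre-flight sketch, assuming bash (for the /dev/tcp redirection) and that the port list matches the compose file:

```shell
# Hypothetical pre-flight helper: a successful connect to 127.0.0.1:<port>
# means something is already listening there, so docker-compose will fail
# to publish it. Requires bash for the /dev/tcp pseudo-device.
port_status() {
  if (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null; then
    exec 3>&-
    echo "in use"
  else
    echo "free"
  fi
}

# Host ports published by the compose file (namenode, datanode, Hue, Spark, ...).
for p in 2181 7077 8080 8081 8888 9083 9092 10000 50070 50075; do
  echo "port $p: $(port_status "$p")"
done
```

Any port reported as "in use" must be freed (or remapped in docker-compose.yml) before starting the stack.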
bd-infra's Issues
zeppelin | Apache Zeppelin requires either Java 8 update 151 or newer
zeppelin | - Setting hadoop.proxyuser.hue.groups=*
zeppelin | Configuring hdfs
zeppelin | - Setting dfs.webhdfs.enabled=true
zeppelin | - Setting dfs.permissions.enabled=false
zeppelin | Configuring yarn
zeppelin | - Setting yarn.timeline-service.enabled=true
zeppelin | - Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
zeppelin | - Setting yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
zeppelin | - Setting yarn.log.server.url=http://historyserver:8188/applicationhistory/logs/
zeppelin | - Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
zeppelin | - Setting yarn.timeline-service.generic-application-history.enabled=true
zeppelin | - Setting yarn.log-aggregation-enable=true
zeppelin | - Setting yarn.resourcemanager.hostname=resourcemanager
zeppelin | - Setting yarn.resourcemanager.resource.tracker.address=resourcemanager:8031
zeppelin | - Setting yarn.timeline-service.hostname=historyserver
zeppelin | - Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
zeppelin | - Setting yarn.resourcemanager.address=resourcemanager:8032
zeppelin | - Setting yarn.nodemanager.remote-app-log-dir=/app-logs
zeppelin | - Setting yarn.resourcemanager.recovery.enabled=true
zeppelin | Configuring httpfs
zeppelin | Configuring kms
zeppelin | Configuring mapred
zeppelin | Configuring hive
zeppelin | - Setting datanucleus.autoCreateSchema=false
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | - Setting javax.jdo.option.ConnectionPassword=hive
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | - Setting hive.metastore.uris=thrift://hive-metastore:9083
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | - Setting javax.jdo.option.ConnectionURL=jdbc:postgresql://hive-metastore-postgresql/metastore
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | - Setting javax.jdo.option.ConnectionUserName=hive
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | - Setting javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
zeppelin | sed: can't read /opt/hive/conf/hive-site.xml: No such file or directory
zeppelin | Configuring for multihomed network
zeppelin | Apache Zeppelin requires either Java 8 update 151 or newer
zeppelin exited with code 1
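The failure above is Zeppelin's startup script rejecting the container's JRE: it requires Java 8 update 151 or newer. A hypothetical sketch of the version comparison involved, parsing a `1.8.0_NNN` version string as printed by `java -version` (the `check_java` helper and its inputs are illustrative, not part of the repo, and only handle the `1.8.0_NNN` format):

```shell
# Illustrative check: accept Java 8u151+ (or any major version above 8).
# Assumes a "1.<major>.0_<update>" version string, e.g. "1.8.0_141".
check_java() {
  ver="$1"
  update="${ver##*_}"            # digits after the underscore, e.g. 141
  major="${ver#1.}"              # strip the leading "1.", e.g. "8.0_141"
  major="${major%%.*}"           # keep the major number, e.g. 8
  if [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$update" -ge 151 ]; }; then
    echo "ok"
  else
    echo "too old"
  fi
}

check_java 1.8.0_141   # prints "too old"
check_java 1.8.0_252   # prints "ok"
```

In practice the fix is to base the Zeppelin image on a JDK at or above 8u151, or to point JAVA_HOME inside the container at such a JDK.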
Hue won't connect to HDFS
Hello, are there any other settings needed to make Hue connect to the HDFS service?
Thank you
Ports are not available
When I run docker compose up (as administrator), I get the following error:
Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:50070 -> 0.0.0.0:0: listen tcp 0.0.0.0:50070: bind: An attempt was made to access a socket in a way forbidden by its access permissions.
I am using Windows 11; IIS is not installed, and port 50070 is free and not assigned to any other service.
I have tested the same compose file on Linux, where it works.
Could you please tell me what the problem is and how I can solve it?
Fails to start on M1 Mac, missing ARM image
$ docker-compose up
Status: Downloaded newer image for openkbs/docker-spark-bde2020-zeppelin:latest
Pulling database (mysql:5.7)...
5.7: Pulling from library/mysql
ERROR: no matching manifest for linux/arm64/v8 in the manifest list entries
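The pull fails because mysql:5.7 publishes no linux/arm64 image. One possible workaround, assuming Docker Desktop's amd64 emulation (Rosetta/QEMU) is available and that the MySQL service is named `database` as in the repo's compose file, is a compose override forcing the platform for the affected service:

```yaml
# docker-compose.override.yml (sketch): force amd64 emulation for images
# that ship no arm64 manifest (mysql:5.7 here; add other services as needed).
services:
  database:
    platform: linux/amd64
```

Emulated amd64 containers run noticeably slower on Apple Silicon, but this is usually acceptable for a development stack.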
Kerberos integration?
Dear all,
I'm new to Hadoop and Spark. I'm trying to use your setup for writing integration tests for a Spark job that works with ClickHouse and HDFS.
One of the issues I faced: it looks like there is no Kerberos integration.
Is there any hint or example of how to integrate Kerberos auth with the current configuration?
hue crashes after a few seconds
Hello,
Thanks for sharing this great repo. Unfortunately, the hue service seems to be crashing and I cannot access any UIs (not even the namenode UI). I am running the following commands in Ubuntu 20.04 inside WSL2:
git clone https://github.com/m-semnani/bd-infra.git
cd bd-infra
docker-compose up -d
Then, after everything is installed and the services are up and running, I run a simple docker ps
and see the following results:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d1cac433a6ee bde2020/hive:2.3.2-postgresql-metastore "entrypoint.sh /bin/…" 6 seconds ago Up 4 seconds 0.0.0.0:10000->10000/tcp, 10002/tcp hive-server
603604a7ce37 bde2020/hive:2.3.2-postgresql-metastore "entrypoint.sh /opt/…" 7 seconds ago Up 5 seconds 10000/tcp, 0.0.0.0:9083->9083/tcp, 10002/tcp hive-metastore
9964d2f49f5b bde2020/hive-metastore-postgresql:2.3.0 "/docker-entrypoint.…" 8 seconds ago Up 6 seconds 5432/tcp hive-metastore-postgresql
494f5edaed7e bde2020/spark-worker:2.4.0-hadoop2.7 "/bin/bash /worker.sh" 8 seconds ago Up 6 seconds 0.0.0.0:8081->8081/tcp spark-worker
3d42461686fb gethue/hue:20191107-135001 "./startup.sh" 8 seconds ago Up 7 seconds 0.0.0.0:8888->8888/tcp hue
ef5213648cee bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8 "/entrypoint.sh /run…" 9 seconds ago Up 7 seconds (health: starting) 0.0.0.0:50075->50075/tcp datanode
82dc9e4ab2ab bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8 "/entrypoint.sh /run…" 10 seconds ago Up 8 seconds (health: starting) 0.0.0.0:50070->50070/tcp namenode
51271761043f wurstmeister/kafka:2.12-2.3.0 "start-kafka.sh" 10 seconds ago Up 8 seconds 0.0.0.0:9092->9092/tcp bd-infra_kafka_1
8c6f51379a48 bde2020/spark-master:2.4.0-hadoop2.7 "/bin/bash /master.sh" 10 seconds ago Up 8 seconds 0.0.0.0:7077->7077/tcp, 6066/tcp, 0.0.0.0:8080->8080/tcp spark-master
b8ca46f10106 wurstmeister/zookeeper:3.4.6 "/bin/sh -c '/usr/sb…" 10 seconds ago Up 7 seconds 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp bd-infra_zookeeper_1
As we can see, the hue service is up and running with container ID 3d42461686fb. However, when I run docker ps again a few seconds later,
I see that the hue container is no longer running:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d1cac433a6ee bde2020/hive:2.3.2-postgresql-metastore "entrypoint.sh /bin/…" 29 seconds ago Up 28 seconds 0.0.0.0:10000->10000/tcp, 10002/tcp hive-server
603604a7ce37 bde2020/hive:2.3.2-postgresql-metastore "entrypoint.sh /opt/…" 30 seconds ago Up 29 seconds 10000/tcp, 0.0.0.0:9083->9083/tcp, 10002/tcp hive-metastore
9964d2f49f5b bde2020/hive-metastore-postgresql:2.3.0 "/docker-entrypoint.…" 31 seconds ago Up 30 seconds 5432/tcp hive-metastore-postgresql
494f5edaed7e bde2020/spark-worker:2.4.0-hadoop2.7 "/bin/bash /worker.sh" 31 seconds ago Up 30 seconds 0.0.0.0:8081->8081/tcp spark-worker
ef5213648cee bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8 "/entrypoint.sh /run…" 32 seconds ago Up 31 seconds (healthy) 0.0.0.0:50075->50075/tcp datanode
82dc9e4ab2ab bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8 "/entrypoint.sh /run…" 33 seconds ago Up 32 seconds (healthy) 0.0.0.0:50070->50070/tcp namenode
51271761043f wurstmeister/kafka:2.12-2.3.0 "start-kafka.sh" 33 seconds ago Up 32 seconds 0.0.0.0:9092->9092/tcp bd-infra_kafka_1
8c6f51379a48 bde2020/spark-master:2.4.0-hadoop2.7 "/bin/bash /master.sh" 33 seconds ago Up 31 seconds 0.0.0.0:7077->7077/tcp, 6066/tcp, 0.0.0.0:8080->8080/tcp spark-master
b8ca46f10106 wurstmeister/zookeeper:3.4.6 "/bin/sh -c '/usr/sb…" 33 seconds ago Up 31 seconds 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp bd-infra_zookeeper_1
Also, when I try to access the namenode UI on localhost:50070 and the Hue UI on localhost:8888, I get the error "This site can't be reached"
(which is expected for the hue service, but surprisingly it also happens for the namenode).
The only thing I changed compared to your docker-compose file is the volume path, from /tmp
to ./tmp
, but I get the same problem in either case. Do you have any recommendations for fixing this issue?
Update 1: Below are the logs of the hue container, taken a few seconds after running docker-compose up --build -d
and then docker-compose logs hue
:
Attaching to hue
hue | [21/Mar/2021 10:38:00 ] settings INFO Welcome to Hue 4.5.0
hue | [21/Mar/2021 03:38:04 -0700] decorators INFO AXES: BEGIN LOG
hue | [21/Mar/2021 03:38:04 -0700] decorators INFO Using django-axes 2.2.0
hue | Traceback (most recent call last):
hue | File "./build/env/bin/hue", line 11, in <module>
hue | load_entry_point('desktop', 'console_scripts', 'hue')()
hue | File "/usr/share/hue/desktop/core/src/desktop/manage_entry.py", line 225, in entry
hue | execute_from_command_line(sys.argv)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/__init__.py", line 364, in execute_from_command_line
hue | utility.execute()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/__init__.py", line 356, in execute
hue | self.fetch_command(subcommand).run_from_argv(self.argv)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/base.py", line 283, in run_from_argv
hue | self.execute(*args, **cmd_options)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/base.py", line 327, in execute
hue | self.check()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/base.py", line 359, in check
hue | include_deployment_checks=include_deployment_checks,
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/management/base.py", line 346, in _run_checks
hue | return checks.run_checks(**kwargs)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/checks/registry.py", line 81, in run_checks
hue | new_errors = check(app_configs=app_configs)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/core/checks/model_checks.py", line 30, in check_all_models
hue | errors.extend(model.check(**kwargs))
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/base.py", line 1284, in check
hue | errors.extend(cls._check_fields(**kwargs))
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/base.py", line 1359, in _check_fields
hue | errors.extend(field.check(**kwargs))
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/fields/__init__.py", line 913, in check
hue | errors = super(AutoField, self).check(**kwargs)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/fields/__init__.py", line 219, in check
hue | errors.extend(self._check_backend_specific_checks(**kwargs))
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/fields/__init__.py", line 322, in _check_backend_specific_checks
hue | return connections[db].validation.check_field(self, **kwargs)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/mysql/validation.py", line 49, in check_field
hue | field_type = field.db_type(self.connection)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/models/fields/__init__.py", line 644, in db_type
hue | return connection.data_types[self.get_internal_type()] % data
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/utils/functional.py", line 35, in __get__
hue | res = instance.__dict__[self.name] = self.func(instance)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/mysql/base.py", line 174, in data_types
hue | if self.features.supports_microsecond_precision:
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/utils/functional.py", line 35, in __get__
hue | res = instance.__dict__[self.name] = self.func(instance)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/mysql/features.py", line 53, in supports_microsecond_precision
hue | return self.connection.mysql_version >= (5, 6, 4) and Database.version_info >= (1, 2, 5)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/utils/functional.py", line 35, in __get__
hue | res = instance.__dict__[self.name] = self.func(instance)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/mysql/base.py", line 385, in mysql_version
hue | with self.temporary_connection() as cursor:
hue | File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
hue | return self.gen.next()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 591, in temporary_connection
hue | cursor = self.cursor()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 254, in cursor
hue | return self._cursor()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 229, in _cursor
hue | self.ensure_connection()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 213, in ensure_connection
hue | self.connect()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/utils.py", line 94, in __exit__
hue | six.reraise(dj_exc_type, dj_exc_value, traceback)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 213, in ensure_connection
hue | self.connect()
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/base/base.py", line 189, in connect
hue | self.connection = self.get_new_connection(conn_params)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/Django-1.11.22-py2.7.egg/django/db/backends/mysql/base.py", line 274, in get_new_connection
hue | conn = Database.connect(**conn_params)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/MySQL_python-1.2.5-py2.7-linux-x86_64.egg/MySQLdb/__init__.py", line 81, in Connect
hue | return Connection(*args, **kwargs)
hue | File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/MySQL_python-1.2.5-py2.7-linux-x86_64.egg/MySQLdb/connections.py", line 193, in __init__
hue | super(Connection, self).__init__(*args, **kwargs2)
hue | django.db.utils.OperationalError: (2005, "Unknown MySQL server host 'database' (0)")
There are actually more logs, but they are just a repetition of the above. My initial thought was that maybe something is wrong in the database section of the hue-overrides.ini file, but the host and port seem to make sense to me. If anyone could share some more insight on this, I'd highly appreciate it.
Thanks,
Kevin
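The traceback above ends in "Unknown MySQL server host 'database'", i.e. Hue cannot reach or resolve the MySQL service by that name. A minimal sketch of the relevant hue-overrides.ini section, assuming the compose file names the MySQL service `database`; the credentials shown are placeholders, not the repo's actual values:

```ini
[desktop]
[[database]]
engine=mysql
host=database   # must match the MySQL service name in docker-compose.yml
port=3306
user=hue        # placeholder credentials; use the values from the compose file
password=secret
name=hue
```

Note that if the `database` container itself never came up (for example, the arm64 manifest problem reported above), Hue will crash with this same error no matter what the ini file says, so checking `docker ps` for the MySQL container is a good first step.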