fabiogjardim / bigdata_docker: Big Data Ecosystem Docker
Hi Fábio,
According to the ecosystem diagram, will each item run in its own container? For example, would MongoDB and Mongo Express be in separate containers or share the same one?
Thank you very much,
Daniel Adorno Gomes
In one part of the instructions you ask us to rename a file, but the command given is a move (`mv`). I believe the correct command would be `rename`.
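For what it's worth, on Unix-like systems `mv` is the standard way to rename a file: a rename is just a move within the same directory, so the `mv` in the instructions is not necessarily a mistake. A minimal illustration (the file names are mine):

```shell
# mv both moves and renames: a rename is simply a move within the same
# directory, so no separate "rename" command is needed.
touch /tmp/report_old.txt
mv /tmp/report_old.txt /tmp/report_new.txt   # renames the file in place
ls /tmp/report_new.txt                       # prints: /tmp/report_new.txt
```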
Hi Fábio, how are you?
I tried to run this command, following the instructions on your page at https://github.com/fabiogjardim/bigdata_docker, but it didn't work:
C:\docker>git clone http://github.com/fabiobjardim/bigdata_docker.git
Cloning into 'bigdata_docker'...
info: please complete authentication in your browser...
remote: Repository not found.
fatal: repository 'https://github.com/fabiobjardim/bigdata_docker.git/' not found
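Note that the pasted command clones `fabiobjardim` while the project page above uses `fabiogjardim`; the "Repository not found" is consistent with that one-letter typo in the username. A quick sanity check (pure string comparison, no network needed):

```shell
# Compare the URL that failed against the username from the project page.
attempted="http://github.com/fabiobjardim/bigdata_docker.git"
expected_user="fabiogjardim"
case "$attempted" in
  *"$expected_user"*) echo "username matches" ;;
  *)                  echo "username typo: does not contain $expected_user" ;;
esac
# prints: username typo: does not contain fabiogjardim
```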
Fábio,
Your distribution fit me like a glove; many thanks.
However, I get an error whenever I try any Spark connection to Hive. The message is
Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'. It happens both from Jupyter and directly in pyspark inside the Spark VM.
I have already reinstalled everything (deleted all Docker images and started only bigdata_docker), checked for port conflicts, and raised Docker's resources to 4 CPUs, 16 GB of memory, and 4 GB of swap, but nothing changed. I found nothing relevant searching the web.
I am running on an iMac (24 GB RAM) with macOS Catalina 10.15.4 and Docker 2.2.0.5.
Everything else works fine; Hue, Presto, and Metabase all access Hive normally.
I would appreciate any idea of what might be wrong. I have not changed any of your configurations or the images'.
root@jupyter-spark:/opt/spark/conf# pyspark
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/04/11 17:11:24 WARN spark.SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/04/11 17:11:25 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/
Using Python version 3.5.3 (default, Sep 27 2018 17:25:39)
SparkSession available as 'spark'.
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> sqlContext.sql("show databases").show()
Traceback (most recent call last):
File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.sql.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:192)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.org$apache$spark$sql$hive$HiveSessionStateBuilder$$externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:247)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand.run(databases.scala:44)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:189)
... 36 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:71)
... 41 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 42 more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/context.py", line 358, in sql
return self.sparkSession.sql(sqlQuery)
File "/opt/spark/python/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
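In a Py4J trace this long, the actionable line is the innermost `Caused by:`, here a `ClassNotFoundException` for `org.apache.hadoop.hive.ql.metadata.HiveException`, which means the Hive client jars are missing from Spark's classpath in this container. A small helper (the function name is mine) pulls that line out of any saved trace:

```shell
# extract_root_cause: print the innermost "Caused by:" line of a Java stack
# trace, which is usually the actionable error.
extract_root_cause() {
  grep '^Caused by:' "$1" | tail -n 1
}

# Reproduce the relevant lines of the trace above:
cat > /tmp/trace.txt <<'EOF'
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
EOF

extract_root_cause /tmp/trace.txt
# prints: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
```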
Hi,
Hope you are all well!
Is it possible to provide an example of ingesting a CSV file into this stack?
Thanks in advance for any insights or inputs on this.
Cheers,
X
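One minimal ingestion path, sketched under the assumption that the stack exposes an HDFS namenode container named `namenode` (the container name and HDFS paths here are assumptions, not taken from the compose file):

```shell
# Create a small sample CSV locally.
cat > /tmp/people.csv <<'EOF'
id,name
1,alice
2,bob
EOF
wc -l < /tmp/people.csv   # prints 3: header + 2 rows

# Then push it into HDFS through the namenode container (run on the host;
# the container name and paths are assumptions about this compose file):
#   docker cp /tmp/people.csv namenode:/tmp/people.csv
#   docker exec namenode hdfs dfs -mkdir -p /user/data
#   docker exec namenode hdfs dfs -put -f /tmp/people.csv /user/data/people.csv
```

From there the file can be read with Spark or mounted as a Hive external table; the exact service names depend on the stack's compose file.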
Hello... Unfortunately my OS, Windows 10 Enterprise 64-bit, does not have virtualization support enabled.
I currently use Docker Desktop for Windows, which depends on Hyper-V, and Hyper-V, when active, is incompatible with VirtualBox...
Unfortunately, since this is a corporate computer, I cannot change the BIOS to enable virtualization.
Given this scenario, any suggestions? Unfortunately I don't know much about Docker, but I imagine there must be some alternative.
Good morning, friends,
I am trying to start the mysql image, but it restarts right after starting. Looking at the log, I get the following error:
`2020-05-29 01:57:46+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29T01:57:48.720344Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-05-29T01:57:48.732126Z 0 [Note] mysqld (mysqld 5.7.29) starting as process 1 ...
2020-05-29T01:57:48.749124Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-05-29T01:57:48.749141Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-05-29T01:57:48.749144Z 0 [Note] InnoDB: Uses event mutexes
2020-05-29T01:57:48.749146Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-05-29T01:57:48.749148Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-05-29T01:57:48.749324Z 0 [Note] InnoDB: Number of pools: 1
2020-05-29T01:57:48.749400Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-05-29T01:57:48.750569Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-05-29T01:57:48.759106Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-05-29T01:57:48.761304Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-05-29T01:57:48.809724Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-05-29T01:57:48.822850Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 155984619
2020-05-29T01:57:48.822870Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 155984628
2020-05-29T01:57:48.822874Z 0 [Note] InnoDB: Database was not shutdown normally!
2020-05-29T01:57:48.822876Z 0 [Note] InnoDB: Starting crash recovery.
2020-05-29T01:57:49.364512Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.364549Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.364555Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.364559Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.364563Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-05-29T01:57:49.365233Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.365244Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.365247Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.365250Z 0 [ERROR] InnoDB: File ./ibtmp1: 'stat' returned OS error 101.
2020-05-29T01:57:49.365275Z 0 [ERROR] InnoDB: os_file_get_status() failed on './ibtmp1'. Can't determine file permissions
2020-05-29T01:57:49.365278Z 0 [ERROR] InnoDB: Could not create the shared innodb_temporary.
2020-05-29T01:57:49.365280Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2020-05-29T01:57:49.566478Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.566527Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.566563Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.566568Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.566573Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2020-05-29T01:57:49.566576Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2020-05-29T01:57:49.566581Z 0 [ERROR] Failed to initialize builtin plugins.
2020-05-29T01:57:49.566583Z 0 [ERROR] Aborting
2020-05-29T01:57:49.566587Z 0 [Note] Binlog end
2020-05-29T01:57:49.566638Z 0 [Note] Shutting down plugin 'CSV'
2020-05-29T01:57:49.569736Z 0 [Note] mysqld: Shutdown complete`
This is the image's configuration in my .yml:
database:
  image: fjardim/mysql
  container_name: database
  hostname: database
  ports:
    - "33061:3306"
  deploy:
    resources:
      limits:
        memory: 500m
  command: mysqld --innodb-flush-method=O_DSYNC --innodb-use-native-aio=OFF --init-file /data/application/init.sql
  volumes:
    - /c/docker/bigdata_docker/data/mysql/data:/var/lib/mysql
    - /c/docker/bigdata_docker/data/init.sql:/data/application/init.sql
  environment:
    MYSQL_ROOT_USER: root
    MYSQL_ROOT_PASSWORD: secret
    MYSQL_DATABASE: hue
    MYSQL_USER: root
    MYSQL_PASSWORD: secret
Any idea what could be causing the error? Remember that I am using Windows 10 and Docker Desktop to run everything.
Thank you!
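InnoDB's "Operation not permitted" on `./ibtmp1` is a common symptom of keeping MySQL's data directory on a Windows host bind mount, whose filesystem semantics InnoDB's file operations do not fully support. A hedged sketch of one workaround: keep the datadir in a Docker named volume instead of the `/c/docker/...` host path (the volume name `mysql_data` is mine, not from the compose file):

```yaml
database:
  image: fjardim/mysql
  # ... other settings unchanged ...
  volumes:
    - mysql_data:/var/lib/mysql               # named volume instead of a host path
    - /c/docker/bigdata_docker/data/init.sql:/data/application/init.sql

volumes:
  mysql_data:                                  # declared at the compose top level
```

The trade-off is that the data then lives inside Docker's storage rather than in a browsable Windows folder.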
java.lang.IllegalArgumentException: java.net.UnknownHostException: historyserver