fabiogjardim / bigdata_docker: Big Data Ecosystem Docker
Hi Fábio,
According to the ecosystem diagram, will each item run in its own container? For example, would MongoDB and Mongo Express be in separate containers or share the same one?
Thank you very much,
Daniel Adorno Gomes
In one part of the instructions you ask us to rename a file, but the command given is a move (`mv`). I believe the correct command would be `rename`.
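For what it's worth, on Unix-like systems `mv` is the standard way to rename a file: a rename is just a move within the same directory, so the `mv` in the instructions is not necessarily a mistake. A minimal illustration (the file names are mine):

```shell
# mv both moves and renames: a rename is simply a move within the same
# directory, so no separate "rename" command is needed.
touch /tmp/report_old.txt
mv /tmp/report_old.txt /tmp/report_new.txt   # renames the file in place
ls /tmp/report_new.txt                       # prints: /tmp/report_new.txt
```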
Hi Fábio, how are you?
I tried to run this command, following the instructions on your page at https://github.com/fabiogjardim/bigdata_docker, but it didn't work:
C:\docker>git clone http://github.com/fabiobjardim/bigdata_docker.git
Cloning into 'bigdata_docker'...
info: please complete authentication in your browser...
remote: Repository not found.
fatal: repository 'https://github.com/fabiobjardim/bigdata_docker.git/' not found
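Note that the pasted command clones `fabiobjardim` while the project page above uses `fabiogjardim`; the "Repository not found" is consistent with that one-letter typo in the username. A quick sanity check (pure string comparison, no network needed):

```shell
# Compare the URL that failed against the username from the project page.
attempted="http://github.com/fabiobjardim/bigdata_docker.git"
expected_user="fabiogjardim"
case "$attempted" in
  *"$expected_user"*) echo "username matches" ;;
  *)                  echo "username typo: does not contain $expected_user" ;;
esac
# prints: username typo: does not contain fabiogjardim
```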
Fábio,
Your distribution fit me like a glove; many thanks.
However, I get an error whenever I try any Spark connection to Hive. The message is
Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'. It happens both from Jupyter and directly in pyspark inside the Spark VM.
I have already reinstalled everything (deleted all Docker images and started only bigdata_docker), checked for port conflicts, and raised Docker's resources to 4 CPUs, 16 GB of memory, and 4 GB of swap, but nothing changed. I found nothing relevant searching the web.
I am running on an iMac (24 GB RAM) with macOS Catalina 10.15.4 and Docker 2.2.0.5.
Everything else works fine; Hue, Presto, and Metabase all access Hive normally.
I would appreciate any idea of what might be wrong. I have not changed any of your configurations or the images'.
root@jupyter-spark:/opt/spark/conf# pyspark
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/04/11 17:11:24 WARN spark.SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/04/11 17:11:25 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/
Using Python version 3.5.3 (default, Sep 27 2018 17:25:39)
SparkSession available as 'spark'.
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> sqlContext.sql("show databases").show()
Traceback (most recent call last):
File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.sql.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:192)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.org$apache$spark$sql$hive$HiveSessionStateBuilder$$externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:247)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand.run(databases.scala:44)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:189)
... 36 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:71)
... 41 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 42 more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/context.py", line 358, in sql
return self.sparkSession.sql(sqlQuery)
File "/opt/spark/python/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
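In a Py4J trace this long, the actionable line is the innermost `Caused by:`, here a `ClassNotFoundException` for `org.apache.hadoop.hive.ql.metadata.HiveException`, which means the Hive client jars are missing from Spark's classpath in this container. A small helper (the function name is mine) pulls that line out of any saved trace:

```shell
# extract_root_cause: print the innermost "Caused by:" line of a Java stack
# trace, which is usually the actionable error.
extract_root_cause() {
  grep '^Caused by:' "$1" | tail -n 1
}

# Reproduce the relevant lines of the trace above:
cat > /tmp/trace.txt <<'EOF'
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
EOF

extract_root_cause /tmp/trace.txt
# prints: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
```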
Hi,
Hope you are all well!
Is it possible to provide an example of ingesting a CSV file into this stack?
Thanks in advance for any insights or inputs on this.
Cheers,
X
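One minimal ingestion path, sketched under the assumption that the stack exposes an HDFS namenode container named `namenode` (the container name and HDFS paths here are assumptions, not taken from the compose file):

```shell
# Create a small sample CSV locally.
cat > /tmp/people.csv <<'EOF'
id,name
1,alice
2,bob
EOF
wc -l < /tmp/people.csv   # prints 3: header + 2 rows

# Then push it into HDFS through the namenode container (run on the host;
# the container name and paths are assumptions about this compose file):
#   docker cp /tmp/people.csv namenode:/tmp/people.csv
#   docker exec namenode hdfs dfs -mkdir -p /user/data
#   docker exec namenode hdfs dfs -put -f /tmp/people.csv /user/data/people.csv
```

From there the file can be read with Spark or mounted as a Hive external table; the exact service names depend on the stack's compose file.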
Hello... Unfortunately my OS, Windows 10 Enterprise 64-bit, does not have virtualization support enabled.
I currently use Docker Desktop for Windows, which depends on Hyper-V, and Hyper-V, when active, is incompatible with VirtualBox...
Unfortunately, since this is a corporate computer, I cannot change the BIOS to enable virtualization.
Given this scenario, any suggestions? Unfortunately I don't know much about Docker, but I imagine there must be some alternative.
Good morning, friends,
I am trying to start the mysql image, but it restarts right after starting. Looking at the log, I get the following error:
`2020-05-29 01:57:46+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29T01:57:48.720344Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-05-29T01:57:48.732126Z 0 [Note] mysqld (mysqld 5.7.29) starting as process 1 ...
2020-05-29T01:57:48.749124Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-05-29T01:57:48.749141Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-05-29T01:57:48.749144Z 0 [Note] InnoDB: Uses event mutexes
2020-05-29T01:57:48.749146Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-05-29T01:57:48.749148Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-05-29T01:57:48.749324Z 0 [Note] InnoDB: Number of pools: 1
2020-05-29T01:57:48.749400Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-05-29T01:57:48.750569Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-05-29T01:57:48.759106Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-05-29T01:57:48.761304Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-05-29T01:57:48.809724Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-05-29T01:57:48.822850Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 155984619
2020-05-29T01:57:48.822870Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 155984628
2020-05-29T01:57:48.822874Z 0 [Note] InnoDB: Database was not shutdown normally!
2020-05-29T01:57:48.822876Z 0 [Note] InnoDB: Starting crash recovery.
2020-05-29T01:57:49.364512Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.364549Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.364555Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.364559Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.364563Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-05-29T01:57:49.365233Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.365244Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.365247Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.365250Z 0 [ERROR] InnoDB: File ./ibtmp1: 'stat' returned OS error 101.
2020-05-29T01:57:49.365275Z 0 [ERROR] InnoDB: os_file_get_status() failed on './ibtmp1'. Can't determine file permissions
2020-05-29T01:57:49.365278Z 0 [ERROR] InnoDB: Could not create the shared innodb_temporary.
2020-05-29T01:57:49.365280Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2020-05-29T01:57:49.566478Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.566527Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.566563Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.566568Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.566573Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2020-05-29T01:57:49.566576Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2020-05-29T01:57:49.566581Z 0 [ERROR] Failed to initialize builtin plugins.
2020-05-29T01:57:49.566583Z 0 [ERROR] Aborting
2020-05-29T01:57:49.566587Z 0 [Note] Binlog end
2020-05-29T01:57:49.566638Z 0 [Note] Shutting down plugin 'CSV'
2020-05-29T01:57:49.569736Z 0 [Note] mysqld: Shutdown complete`
This is the image's configuration in my .yml:
database:
  image: fjardim/mysql
  container_name: database
  hostname: database
  ports:
    - "33061:3306"
  deploy:
    resources:
      limits:
        memory: 500m
  command: mysqld --innodb-flush-method=O_DSYNC --innodb-use-native-aio=OFF --init-file /data/application/init.sql
  volumes:
    - /c/docker/bigdata_docker/data/mysql/data:/var/lib/mysql
    - /c/docker/bigdata_docker/data/init.sql:/data/application/init.sql
  environment:
    MYSQL_ROOT_USER: root
    MYSQL_ROOT_PASSWORD: secret
    MYSQL_DATABASE: hue
    MYSQL_USER: root
    MYSQL_PASSWORD: secret
Any idea what could be causing the error? Remember that I am using Windows 10 and Docker Desktop to run everything.
Thank you!
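InnoDB's "Operation not permitted" on `./ibtmp1` is a common symptom of keeping MySQL's data directory on a Windows host bind mount, whose filesystem semantics InnoDB's file operations do not fully support. A hedged sketch of one workaround: keep the datadir in a Docker named volume instead of the `/c/docker/...` host path (the volume name `mysql_data` is mine, not from the compose file):

```yaml
database:
  image: fjardim/mysql
  # ... other settings unchanged ...
  volumes:
    - mysql_data:/var/lib/mysql               # named volume instead of a host path
    - /c/docker/bigdata_docker/data/init.sql:/data/application/init.sql

volumes:
  mysql_data:                                  # declared at the compose top level
```

The trade-off is that the data then lives inside Docker's storage rather than in a browsable Windows folder.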
java.lang.IllegalArgumentException: java.net.UnknownHostException: historyserver