
Comments (3)

leewyang commented on May 24, 2024

A couple of things off the top of my head:

  • Do you see any useful errors in the executor logs (since this looks like the driver logs)?
  • Are you able to run a simple Spark job in this environment (w/o TFoS), e.g. SparkPi example?
  • Sounds like you are using a "t3.medium with 2 cores and 4 GB of RAM" together with "--executor-memory 4G". Maybe try reducing executor memory a bit to make sure you're not hitting some limit?
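
To make the memory point concrete, here is a rough sketch of the arithmetic, assuming Spark's default executor memory overhead on YARN (10% of executor memory, with a 384 MiB floor). The function name `yarn_container_request_mib` is made up for illustration, and YARN's rounding to its minimum allocation size is not modeled:

```python
# Assumption: Spark defaults for executor memory overhead on YARN
# (factor 0.10, minimum 384 MiB). Clusters can override both.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def yarn_container_request_mib(executor_memory_mib):
    """Approximate memory requested from YARN for one executor container."""
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))
    return executor_memory_mib + overhead


if __name__ == "__main__":
    node_ram_mib = 4 * 1024                          # t3.medium: 4 GB of RAM
    request = yarn_container_request_mib(4 * 1024)   # --executor-memory 4G
    print(f"requested {request} MiB, node has {node_ram_mib} MiB")
    print("exceeds node RAM:", request > node_ram_mib)
```

Under these assumptions, a 4G executor asks for more memory than the whole node has, before the OS and the Hadoop daemons are accounted for.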

from tensorflowonspark.

GianmarcoSetzu1 commented on May 24, 2024

Hi leewyang, thanks for the reply.
I can run SparkPi.
I tried reducing the "executor-memory" parameter and worked out the best container size settings for my architecture, but nothing helped: I always get the same error.
Here is the stderr of one of the two executors:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop_data/hdfs/tmp/nm-local-dir/usercache/root/filecache/25/__spark_libs__2761952758319157227.zip/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for TERM
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for HUP
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ubuntu/hadoop_data/hdfs/tmp/nm-local-dir/usercache/root/filecache/25/__spark_libs__2761952758319157227.zip/spark-unsafe_2.12-3.0.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/03/28 14:36:40 INFO spark.SecurityManager: Changing view acls to: ubuntu,root
22/03/28 14:36:40 INFO spark.SecurityManager: Changing modify acls to: ubuntu,root
22/03/28 14:36:40 INFO spark.SecurityManager: Changing view acls groups to:
22/03/28 14:36:40 INFO spark.SecurityManager: Changing modify acls groups to:
22/03/28 14:36:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu, root); groups with view permissions: Set(); users with modify permissions: Set(ubuntu, root); groups with modify permissions: Set()
22/03/28 14:36:41 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1648473916737_0069_000002
22/03/28 14:36:41 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
22/03/28 14:36:41 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
22/03/28 14:36:42 ERROR yarn.ApplicationMaster: User application exited with status 1
22/03/28 14:36:42 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
22/03/28 14:36:42 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:500)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:264)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:892)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:891)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:891)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)
22/03/28 14:36:42 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://NameNode:9000/user/root/.sparkStaging/application_1648473916737_0069
22/03/28 14:36:42 ERROR util.Utils: Uncaught exception in thread Thread-1
java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
at org.apache.hadoop.tracing.SpanReceiverHost.get(SpanReceiverHost.java:79)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:634)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.ApplicationMaster.cleanupStagingDir(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$2(ApplicationMaster.scala:258)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
22/03/28 14:36:42 INFO util.ShutdownHookManager: Shutdown hook called

And also the stdout:

Traceback (most recent call last):
  File "mnist_data_setup.py", line 15, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
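
The traceback above comes from the Python interpreter that YARN launched for the driver script (via PythonRunner inside the ApplicationMaster), and that interpreter could not find tensorflow. A minimal sketch for checking which interpreter is in use and whether a module is importable from it; `probe_module` is a hypothetical helper name, not part of TensorFlowOnSpark:

```python
import importlib.util
import socket
import sys


def probe_module(name="tensorflow"):
    """Report whether top-level module `name` is importable by this interpreter."""
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "module": name,
        # find_spec returns None when a top-level module cannot be located.
        "importable": importlib.util.find_spec(name) is not None,
    }


if __name__ == "__main__":
    # Run as a plain script on a cluster node, or submit it with spark-submit
    # in place of the real job to see what the YARN-launched interpreter finds.
    print(probe_module())
```

With a live SparkContext `sc` (an assumption, it is not created here), something like `sc.parallelize(range(4), 4).map(lambda _: probe_module()).collect()` would run the same check inside the executors. If "importable" comes back False on the cluster but True locally, the nodes are probably using a different Python environment than the one where tensorflow was installed.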


GianmarcoSetzu1 commented on May 24, 2024

"Solved" by running the task in standalone mode.

