Comments (3)
A couple of things off the top of my head:
- Do you see any useful errors in the executor logs (since this looks like the driver logs)?
- Are you able to run a simple Spark job in this environment (w/o TFoS), e.g. SparkPi example?
- Sounds like you are using "t3.medium with 2 cores and 4 GB of RAM" and also "--executor-memory 4G". Maybe try reducing executor memory a bit to make sure you're not hitting some limit?
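To make the memory point concrete, here is a rough sketch of the arithmetic. It assumes Spark's documented default for `spark.executor.memoryOverhead` on YARN (10% of executor memory, with a 384 MiB floor) and ignores YARN's rounding to its minimum allocation size; the function name is only illustrative.

```python
def yarn_container_mb(executor_memory_mb, overhead_factor=0.10, overhead_min_mb=384):
    """Approximate the memory YARN is asked for per executor container.

    Assumes the default overhead rule: max(factor * executor memory, 384 MiB).
    Any explicit spark.executor.memoryOverhead setting would replace this.
    """
    overhead_mb = max(int(overhead_factor * executor_memory_mb), overhead_min_mb)
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    node_ram_mb = 4 * 1024  # t3.medium: 4 GB of RAM in total
    requested = yarn_container_mb(4 * 1024)  # --executor-memory 4G
    print(requested, "MiB requested vs", node_ram_mb, "MiB on the node")
```

Under these assumptions, `--executor-memory 4G` asks for more than the node physically has, before the OS and the Hadoop daemons take their share.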
from tensorflowonspark.
Hi leewyang, thanks for the reply.
I can run sparkPi.
I tried reducing the "executor-memory" parameter and worked out the best container size settings for my architecture, but it made no difference: I always get the same error.
I am attaching the stderr of one of the two executors:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/hadoop_data/hdfs/tmp/nm-local-dir/usercache/root/filecache/25/__spark_libs__2761952758319157227.zip/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for TERM
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for HUP
22/03/28 14:36:39 INFO util.SignalUtils: Registered signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ubuntu/hadoop_data/hdfs/tmp/nm-local-dir/usercache/root/filecache/25/__spark_libs__2761952758319157227.zip/spark-unsafe_2.12-3.0.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/03/28 14:36:40 INFO spark.SecurityManager: Changing view acls to: ubuntu,root
22/03/28 14:36:40 INFO spark.SecurityManager: Changing modify acls to: ubuntu,root
22/03/28 14:36:40 INFO spark.SecurityManager: Changing view acls groups to:
22/03/28 14:36:40 INFO spark.SecurityManager: Changing modify acls groups to:
22/03/28 14:36:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu, root); groups with view permissions: Set(); users with modify permissions: Set(ubuntu, root); groups with modify permissions: Set()
22/03/28 14:36:41 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1648473916737_0069_000002
22/03/28 14:36:41 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
22/03/28 14:36:41 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
22/03/28 14:36:42 ERROR yarn.ApplicationMaster: User application exited with status 1
22/03/28 14:36:42 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
22/03/28 14:36:42 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:500)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:264)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:892)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:891)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:891)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)
22/03/28 14:36:42 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://NameNode:9000/user/root/.sparkStaging/application_1648473916737_0069
22/03/28 14:36:42 ERROR util.Utils: Uncaught exception in thread Thread-1
java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
at org.apache.hadoop.tracing.SpanReceiverHost.get(SpanReceiverHost.java:79)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:634)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.ApplicationMaster.cleanupStagingDir(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$2(ApplicationMaster.scala:258)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
22/03/28 14:36:42 INFO util.ShutdownHookManager: Shutdown hook called
And also the stdout:
Traceback (most recent call last):
  File "mnist_data_setup.py", line 15, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
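The traceback above says that the Python interpreter which ran `mnist_data_setup.py` could not find `tensorflow`. A minimal way to check which interpreter a node uses, and whether it can see the package, might look like the sketch below. The helper name `probe_module` is hypothetical; it uses only the standard library.

```python
import importlib.util
import socket
import sys


def probe_module(name):
    """Report whether a top-level module is importable by this interpreter."""
    return {
        "host": socket.gethostname(),   # which machine ran the check
        "python": sys.executable,       # which interpreter ran the check
        "module": name,
        "found": importlib.util.find_spec(name) is not None,
    }


if __name__ == "__main__":
    print(probe_module("tensorflow"))
```

Running this script on each node with the same Python that YARN launches shows where the package is missing. Assuming a working SparkContext `sc`, the same function could also be called from inside tasks, for example `sc.parallelize(range(4), 4).map(lambda _: probe_module("tensorflow")).collect()`, to see what the executors report.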
"Solved" by running the task in standalone mode.