Comments (3)
I have the same problem. When I run in yarn-client mode, I get the error "File does not exist: hdfs://hdfs/tmp/hadoop-yarn/e.zamyatin/.staging/application_1661620604842_4444/libjars/kryo-shaded-4.0.0.jar".
Also, the command used to launch AngelApplicationMaster looks strange:
[2022-08-28 22:41:34.854+0300] INFO com.tencent.angel.client.yarn.AngelYarnClient: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=log/angel.properties -Dlog4j.logger.com.tencent.ml=DEBUG -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536M -Xms1536M -XX:PermSize=100M -XX:MaxPermSize=200M -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy -Xloggc:<LOG_DIR>/gc.log com.tencent.angel.master.AngelApplicationMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
It seems that <LOG_DIR> should have been replaced with a real path somewhere.
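A note on the placeholder: as far as I know, `<LOG_DIR>` is the literal token YARN uses for the container log directory (`ApplicationConstants.LOG_DIR_EXPANSION_VAR` in Hadoop), and it is substituted on the node when the container is launched, so seeing it unexpanded in the client-side log may be normal rather than a bug. The sketch below only mimics that substitution with a plain string replace. The function name and the log directory path are hypothetical and are not taken from Angel or Hadoop.

```python
# Illustrative only: mimics how a "<LOG_DIR>" placeholder in a launch command
# could be expanded when the container starts. This is not Hadoop or Angel code.
LOG_DIR_PLACEHOLDER = "<LOG_DIR>"


def expand_log_dir(command: str, container_log_dir: str) -> str:
    """Replace every <LOG_DIR> placeholder with the given log directory."""
    return command.replace(LOG_DIR_PLACEHOLDER, container_log_dir)


# Shortened version of the command from the log line above.
logged_command = (
    "$JAVA_HOME/bin/java -Dyarn.app.container.log.dir=<LOG_DIR> "
    "-Xloggc:<LOG_DIR>/gc.log com.tencent.angel.master.AngelApplicationMaster "
    "1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"
)

# Hypothetical container log directory, chosen only for the example.
expanded = expand_log_dir(logged_command, "/var/log/yarn/container_example")
print(expanded)
```

If that reading is right, the unexpanded placeholder is unrelated to the missing kryo-shaded jar in the staging directory, and the two symptoms should be debugged separately.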
from angel.
The problem was the Hadoop version. I found the answer here: #628
Environment:
- Java version: 1.8
- Scala version: 2.11.8
- Spark version: 2.4.5
- PyTorch and Python version:
- OS and version: CentOS 7.4
22/08/26 16:51:21 ERROR Client: Application diagnostics message: User class threw exception: com.tencent.angel.exception.AngelException: init AngelPSContext fail, please check logs of master of angel
    at com.tencent.angel.spark.context.PSContext$.liftedTree1$1(PSContext.scala:101)
    at com.tencent.angel.spark.context.PSContext$.instance(PSContext.scala:94)
    at com.tencent.angel.spark.context.PSContext$.getOrCreate(PSContext.scala:78)
    at com.tencent.angel.graph.rank.pagerank.edgecut.PageRank.transform(PageRank.scala:83)
    at com.tencent.angel.spark.examples.cluster.PageRankExample$.edgeCutPageRank(PageRankExample.scala:114)
    at com.tencent.angel.spark.examples.cluster.PageRankExample$.main(PageRankExample.scala:65)
    at com.tencent.angel.spark.examples.cluster.PageRankExample.main(PageRankExample.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Exception in thread "main" org.apache.spark.SparkException: Application application_1657344020931_1430915 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1535)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:852)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Could you check the PS master logs? From what is shown here, the PS did not start up.