
Comments (3)

ezamyatin commented on May 26, 2024

I have the same problem. When I run in yarn-client mode, I get the error "File does not exist: hdfs://hdfs/tmp/hadoop-yarn/e.zamyatin/.staging/application_1661620604842_4444/libjars/kryo-shaded-4.0.0.jar".
Also, the command used to launch AngelApplicationMaster looks strange:
[2022-08-28 22:41:34.854+0300] INFO com.tencent.angel.client.yarn.AngelYarnClient: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=log/angel.properties -Dlog4j.logger.com.tencent.ml=DEBUG -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536M -Xms1536M -XX:PermSize=100M -XX:MaxPermSize=200M -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy -Xloggc:<LOG_DIR>/gc.log com.tencent.angel.master.AngelApplicationMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
It seems that <LOG_DIR> should have been replaced somewhere.
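
A note on the placeholder: as far as I know, <LOG_DIR> is the literal that Hadoop YARN defines as ApplicationConstants.LOG_DIR_EXPANSION_VAR. The client submits the command with the placeholder intact, and the NodeManager substitutes the real container log directory when it launches the container, so seeing it unexpanded in the client-side log is probably expected. The sketch below only imitates that substitution in Python to show the idea. The function name and the log directory path are made up for illustration; this is not Angel or Hadoop code.

```python
# Illustration only: imitates the <LOG_DIR> substitution that the YARN
# NodeManager performs at container launch. Not actual Hadoop/Angel code.

# Same literal as ApplicationConstants.LOG_DIR_EXPANSION_VAR in Hadoop YARN.
LOG_DIR_EXPANSION_VAR = "<LOG_DIR>"


def expand_log_dir(command: str, container_log_dir: str) -> str:
    """Return the launch command with every <LOG_DIR> placeholder replaced."""
    return command.replace(LOG_DIR_EXPANSION_VAR, container_log_dir)


# Tail of the launch command taken from the client log above.
submitted = (
    "com.tencent.angel.master.AngelApplicationMaster "
    "1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"
)

# Hypothetical directory; the real one is chosen by the NodeManager.
hypothetical_log_dir = "/tmp/yarn-container-logs/container_example"

expanded = expand_log_dir(submitted, hypothetical_log_dir)
print(expanded)
```

If the placeholder really were left unexpanded on the node, the container's stdout/stderr redirection would fail, which would look different from the "File does not exist" error on the staging directory.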


ezamyatin commented on May 26, 2024

The problem was the Hadoop version. I found the answer here: #628


ouyangwen-it commented on May 26, 2024

Environment:

  • Java version: 1.8
  • Scala version: 2.11.8
  • Spark version: 2.4.5
  • PyTorch and Python version:
  • OS and version: CentOS 7.4

22/08/26 16:51:21 ERROR Client: Application diagnostics message: User class threw exception: com.tencent.angel.exception.AngelException: init AngelPSContext fail, please check logs of master of angel
        at com.tencent.angel.spark.context.PSContext$.liftedTree1$1(PSContext.scala:101)
        at com.tencent.angel.spark.context.PSContext$.instance(PSContext.scala:94)
        at com.tencent.angel.spark.context.PSContext$.getOrCreate(PSContext.scala:78)
        at com.tencent.angel.graph.rank.pagerank.edgecut.PageRank.transform(PageRank.scala:83)
        at com.tencent.angel.spark.examples.cluster.PageRankExample$.edgeCutPageRank(PageRankExample.scala:114)
        at com.tencent.angel.spark.examples.cluster.PageRankExample$.main(PageRankExample.scala:65)
        at com.tencent.angel.spark.examples.cluster.PageRankExample.main(PageRankExample.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)

Exception in thread "main" org.apache.spark.SparkException: Application application_1657344020931_1430915 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1535)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:852)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Could you take a look at the PS master's logs? From what is shown here, the PS failed to start.
