17/06/20 07:25:11 INFO utils.AngelRunJar: angelHomePath conf path=/host/root/angel/dist/target/angel-1.0.0-bin/bin/..//conf/angel-site.xml
17/06/20 07:25:11 INFO utils.AngelRunJar: load system config file success
17/06/20 07:25:11 INFO utils.UGITools: UGI_PROPERTY_NAME is null
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/host/root/angel/dist/target/angel-1.0.0-bin/lib/slf4j-log4j12-1.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/20 07:25:12 INFO model.PSModel: After training matrix lr_weight will be saved to hdfs://model-hdp00:8020/user/wuhaonan/test/model
17/06/20 07:25:12 INFO utils.UGITools: UGI_PROPERTY_NAME is null
17/06/20 07:25:12 INFO impl.TimelineClientImpl: Timeline service address: http://model-hdp00:8188/ws/v1/timeline/
17/06/20 07:25:13 INFO client.RMProxy: Connecting to ResourceManager at model-hdp00/172.27.2.60:8032
17/06/20 07:25:13 INFO client.AngelClient: running mode = ANGEL_PS_WORKER
17/06/20 07:25:13 INFO utils.HdfsUtil: tmp output dir is hdfs://model-hdp00:8020/tmp/root/application_1497010209719_0829_4960f24c-61b4-4cfb-8c8b-c70bfc143dc1
17/06/20 07:25:13 INFO utils.HdfsUtil: tmp output dir is hdfs://model-hdp00:8020/tmp/root/application_1497010209719_0829_e519601b-e9fb-4db4-bdf9-336d6f42feea
17/06/20 07:25:13 INFO client.AngelClient: angel.tmp.output.path=hdfs://model-hdp00:8020/tmp/root/application_1497010209719_0829_4960f24c-61b4-4cfb-8c8b-c70bfc143dc1
17/06/20 07:25:13 INFO client.AngelClient: internal state file is hdfs://model-hdp00:8020/tmp/root/application_1497010209719_0829_e519601b-e9fb-4db4-bdf9-336d6f42feea/state
17/06/20 07:25:13 INFO yarn.AngelYarnClient: default FileSystem: hdfs://model-hdp00:8020
17/06/20 07:25:13 INFO yarn.AngelYarnClient: libjarsDir=/tmp/hadoop-yarn/root/.staging/application_1497010209719_0829/libjars
17/06/20 07:25:13 INFO yarn.AngelYarnClient: libjars=file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/memory-0.8.1.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/sketches-core-0.8.1.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/commons-pool-1.6.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/kryo-shaded-3.0.3.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/scala-library-2.11.8.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/angel-ps-core-1.0.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/angel-ps-mllib-1.0.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/angel-ps-examples-1.0.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/angel-ps-psf-1.0.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/fastutil-7.1.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/sizeof-0.3.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/kryo-shaded-3.0.3.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/minlog-1.3.0.jar,file:///host/root/angel/dist/target/angel-1.0.0-bin/bin/..//lib/breeze_2.11-0.12.jar
AppMaster capability = <memory:2048, vCores:1>
17/06/20 07:25:15 INFO yarn.AngelYarnClient: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=log/angel.properties -Dlog4j.logger.com.tencent.ml=DEBUG -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m com.tencent.angel.master.AngelApplicationMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
17/06/20 07:25:15 INFO yarn.AngelYarnClient: ApplicationSubmissionContext Queuename : default
17/06/20 07:25:15 INFO impl.YarnClientImpl: Submitted application application_1497010209719_0829
17/06/20 07:25:26 INFO yarn.AngelYarnClient: appMaster getTrackingUrl = http://model-hdp00:8088/cluster/app/application_1497010209719_0829/
17/06/20 07:25:26 INFO yarn.AngelYarnClient: master host=172.27.2.62, port=21030
17/06/20 07:25:26 INFO yarn.AngelYarnClient: start to create rpc client to am
17/06/20 07:35:37 ERROR yarn.AngelYarnClient: submit application to yarn failed.
com.google.protobuf.ServiceException: java.util.concurrent.ExecutionException: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:312)
at com.sun.proxy.$Proxy24.getAllPSLocation(Unknown Source)
at com.tencent.angel.client.AngelClient.waitForAllPS(AngelClient.java:710)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:167)
at com.tencent.angel.ml.MLRunner$class.train(MLRunner.scala:46)
at com.tencent.angel.ml.classification.lr.LRRunner.train(LRRunner.scala:28)
at com.tencent.angel.ml.classification.lr.LRRunner.train(LRRunner.scala:40)
at com.tencent.angel.ml.MLRunner$class.submit(MLRunner.scala:90)
at com.tencent.angel.ml.classification.lr.LRRunner.submit(LRRunner.scala:28)
at com.tencent.angel.utils.AngelRunJar$1.run(AngelRunJar.java:124)
at com.tencent.angel.utils.AngelRunJar$1.run(AngelRunJar.java:110)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at com.tencent.angel.utils.AngelRunJar.main(AngelRunJar.java:110)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.CallFuture.get(CallFuture.java:126)
at com.tencent.angel.ipc.NettyTransceiver.call(NettyTransceiver.java:462)
at com.tencent.angel.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:290)
... 14 more
Caused by: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:283)
at com.tencent.angel.ipc.NettyTransceiver.writeDataPack(NettyTransceiver.java:535)
at com.tencent.angel.ipc.NettyTransceiver.transceive(NettyTransceiver.java:505)
at com.tencent.angel.ipc.NettyTransceiver.call(NettyTransceiver.java:458)
... 15 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:148)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:104)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
com.tencent.angel.exception.AngelException: com.google.protobuf.ServiceException: java.util.concurrent.ExecutionException: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:171)
at com.tencent.angel.ml.MLRunner$class.train(MLRunner.scala:46)
at com.tencent.angel.ml.classification.lr.LRRunner.train(LRRunner.scala:28)
at com.tencent.angel.ml.classification.lr.LRRunner.train(LRRunner.scala:40)
at com.tencent.angel.ml.MLRunner$class.submit(MLRunner.scala:90)
at com.tencent.angel.ml.classification.lr.LRRunner.submit(LRRunner.scala:28)
at com.tencent.angel.utils.AngelRunJar$1.run(AngelRunJar.java:124)
at com.tencent.angel.utils.AngelRunJar$1.run(AngelRunJar.java:110)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at com.tencent.angel.utils.AngelRunJar.main(AngelRunJar.java:110)
Caused by: com.google.protobuf.ServiceException: java.util.concurrent.ExecutionException: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:312)
at com.sun.proxy.$Proxy24.getAllPSLocation(Unknown Source)
at com.tencent.angel.client.AngelClient.waitForAllPS(AngelClient.java:710)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:167)
... 11 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.CallFuture.get(CallFuture.java:126)
at com.tencent.angel.ipc.NettyTransceiver.call(NettyTransceiver.java:462)
at com.tencent.angel.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:290)
... 14 more
Caused by: java.io.IOException: Error connecting to /172.27.2.62:21030
at com.tencent.angel.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:283)
at com.tencent.angel.ipc.NettyTransceiver.writeDataPack(NettyTransceiver.java:535)
at com.tencent.angel.ipc.NettyTransceiver.transceive(NettyTransceiver.java:505)
at com.tencent.angel.ipc.NettyTransceiver.call(NettyTransceiver.java:458)
... 15 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:148)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:104)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-06-20 15:26:40,812 INFO [1379994152@qtp-422522663-15] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:26:41,774 INFO [1865938769@qtp-422522663-8] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:26:42,289 INFO [1690374204@qtp-422522663-7] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:26:42,812 INFO [628989099@qtp-422522663-14] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:26:43,251 INFO [628989099@qtp-422522663-14] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:26:43,611 INFO [1883304309@qtp-422522663-11] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:30:09,534 INFO [707941836@qtp-422522663-4] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:30:18,249 INFO [1082928458@qtp-422522663-5] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:34:52,296 INFO [846664587@qtp-422522663-9] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:00,537 INFO [707941836@qtp-422522663-4] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:02,072 INFO [2093319848@qtp-422522663-0] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:03,185 INFO [971746334@qtp-422522663-17] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:03,912 INFO [1865938769@qtp-422522663-8] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:05,704 INFO [1082928458@qtp-422522663-5] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:07,095 INFO [1082928458@qtp-422522663-5] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:08,273 INFO [1865938769@qtp-422522663-8] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:10,411 INFO [1865938769@qtp-422522663-8] org.apache.hadoop.yarn.webapp.View: before compute worker state items
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,463 FATAL [app-state-monitor] com.tencent.angel.master.app.InternalErrorEvent: app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,464 INFO [AsyncDispatcher event handler] com.tencent.angel.master.app.App: some error happened, InternalErrorEvent [errorMsg=app in state NEW over 600000 milliseconds!, getType()=INTERNAL_ERROR]
2017-06-20 15:35:10,464 INFO [AsyncDispatcher event handler] com.tencent.angel.master.app.App: application_1497010209719_0829Job Transitioned from NEW to FAILED
2017-06-20 15:35:10,465 INFO [Thread-281] com.tencent.angel.master.AngelApplicationMaster: Calling stop for all the services
2017-06-20 15:35:10,485 INFO [Thread-281] com.tencent.angel.master.deploy.ContainerAllocator: to unregister from Yarn RM
2017-06-20 15:35:10,486 INFO [Thread-281] com.tencent.angel.master.deploy.ContainerAllocator: Setting job diagnostics to app in state NEW over 600000 milliseconds!
2017-06-20 15:35:10,494 INFO [Thread-281] com.tencent.angel.master.deploy.ContainerAllocator: Waiting for application to be successfully unregistered.
2017-06-20 15:35:15,496 INFO [Thread-281] com.tencent.angel.master.deploy.ContainerAllocator: ContainerAllocator service stop!
2017-06-20 15:35:15,497 INFO [Thread-281] com.tencent.angel.master.oplog.AppStateStorage: app-state-writter service stop!
2017-06-20 15:35:15,497 INFO [Thread-281] com.tencent.angel.master.AngelApplicationMaster: start to write app state to file and clear tmp directory
2017-06-20 15:35:15,497 INFO [Thread-281] com.tencent.angel.master.AngelApplicationMaster: start to write app state to file hdfs://model-hdp00:8020/tmp/root/application_1497010209719_0829_e519601b-e9fb-4db4-bdf9-336d6f42feea/state
2017-06-20 15:35:15,680 INFO [Thread-281] com.tencent.angel.master.app.App: write app report to file successfully jobReport {
jobState: J_FAILED
curIteration: 0
totalIteration: 100
diagnostics: "app in state NEW over 600000 milliseconds!"
}