CNevd / Difacto_DMLC
Distributed FM and LR based on Parameter Server with FTRL
2016-06-30 14:44:10,301 INFO start listen on 10.32.44.171:9102
Traceback (most recent call last):
File "../../dmlc-core/tracker/../yarn//run_hdfs_prog.py", line 47, in <module>
ret = subprocess.call(args = sys.argv[1:], env = env)
File "/usr/lib64/python2.6/subprocess.py", line 478, in call
p = Popen(*popenargs, **kwargs)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1238, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/formath/github/Difacto_DMLC/dmlc-core/tracker/tracker.py", line 345, in <lambda>
self.thread = Thread(target = (lambda : subprocess.check_call(self.cmd, env=env, shell=True)), args = ())
File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '../../dmlc-core/tracker/../yarn//run_hdfs_prog.py build/linear.dmlc guide/demo_hdfs.conf ' returned non-zero exit status 1
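For what it's worth, `OSError: [Errno 2]` raised from `subprocess` means the program to be executed (here run_hdfs_prog.py, or the interpreter named in its shebang line) could not be found on that node, not that a data file is missing. A minimal sketch reproducing the error class (the binary name is made up):

```python
import subprocess

# Errno 2 (ENOENT) from subprocess means the *program* could not be
# found -- either the file itself or the interpreter in its shebang
# line -- not that an input data file is missing.
try:
    subprocess.call(["./no_such_binary"])
except OSError as e:
    print(e.errno)  # 2
```

So the things to check are that run_hdfs_prog.py exists and is executable on every node, and that the interpreter its shebang names is installed at that path.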
Cool work! Do you have plans to update dmlc-core and ps-lite, and to put all the PS-based algorithms together?
Hi there, I was testing the basic functionality of difacto on some simple data. The data format is libsvm, e.g. "10 1 2 3 5" (10 is the label and the rest are feature ids).
But the dumped model always misses some feature ids: for the data example above, the model only has w and V for 1, 2 and 3, so 5 is missing.
Can someone give me some suggestions? Thanks!
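Two plausible causes, both assumptions rather than facts confirmed from the code: the reader may handle bare indices differently from standard "index:value" pairs, and difacto-style trainers often filter out features whose occurrence count falls below a threshold, which would also drop a rare feature like 5. A sketch of a parser that treats a bare index as value 1.0, to check what your lines should decode to:

```python
def parse_libsvm_line(line):
    """Parse 'label tok tok ...' where each tok is either 'index:value'
    or a bare index (treated as value 1.0)."""
    parts = line.split()
    label = float(parts[0])
    feats = {}
    for tok in parts[1:]:
        if ":" in tok:
            idx, val = tok.split(":")
            feats[int(idx)] = float(val)
        else:
            feats[int(tok)] = 1.0  # bare index -> implicit value 1
    return label, feats

print(parse_libsvm_line("10 1 2 3 5"))
# (10.0, {1: 1.0, 2: 1.0, 3: 1.0, 5: 1.0})
```

If the parsed features look right, the next thing to check is whether a minimum-count filter in the trainer's config is dropping the rare ids.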
For line 32 in Difacto_DMLC/dmlc-core/tracker/dmlc_yarn.py, should '--server-nodes' be '--server_nodes'?
When the model is saved in more than one part, prediction goes wrong.
I have tested L1 and L2 regularization parameters ranging from 0.00 to 1000 and found that they have no effect on the test AUC. Is something incomplete in the code of the original project, or am I using it the wrong way?
Can someone give an example of how to run linear on YARN?
root@hadoop-master:~/Difacto_DMLC/src/difacto# make
g++ -O3 -ggdb -Wall -std=c++11 -I./ -I../ -I../../ps-lite/src -I../../dmlc-core/include -I../../dmlc-core/src -I../../ps-lite/deps/include -fopenmp -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_GLOG=1 -DDMLC_USE_AZURE=0 build/config.pb.o build/difacto.o ../../dmlc-core/libdmlc.a ../../ps-lite/build/libps.a -fopenmp -lrt ../../ps-lite/deps/lib/libglog.a ../../ps-lite/deps/lib/libprotobuf.a ../../ps-lite/deps/lib/libgflags.a ../../ps-lite/deps/lib/libzmq.a ../../ps-lite/deps/lib/libcityhash.a ../../ps-lite/deps/lib/liblz4.a -lgssapi_krb5 -o build/difacto.dmlc
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSFileSystem::~HDFSFileSystem()':
hdfs_filesys.cc:(.text+0xba): undefined reference to `hdfsDisconnect'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSFileSystem::Open(dmlc::io::URI const&, char const*, bool)':
hdfs_filesys.cc:(.text+0x6a0): undefined reference to `hdfsOpenFile'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSFileSystem::GetPathInfo(dmlc::io::URI const&)':
hdfs_filesys.cc:(.text+0x16df): undefined reference to `hdfsGetPathInfo'
hdfs_filesys.cc:(.text+0x1cbd): undefined reference to `hdfsFreeFileInfo'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSFileSystem::HDFSFileSystem()':
hdfs_filesys.cc:(.text+0x27e1): undefined reference to `hdfsConnect'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSFileSystem::ListDirectory(dmlc::io::URI const&, std::vector<dmlc::io::FileInfo, std::allocator<dmlc::io::FileInfo> >*)':
hdfs_filesys.cc:(.text+0x2d5a): undefined reference to `hdfsListDirectory'
hdfs_filesys.cc:(.text+0x3419): undefined reference to `hdfsFreeFileInfo'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSStream::~HDFSStream()':
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStreamD2Ev[_ZN4dmlc2io10HDFSStreamD5Ev]+0x41): undefined reference to `hdfsCloseFile'
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStreamD2Ev[_ZN4dmlc2io10HDFSStreamD5Ev]+0xa2): undefined reference to `hdfsDisconnect'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSStream::Read(void*, unsigned long)':
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStream4ReadEPvm[_ZN4dmlc2io10HDFSStream4ReadEPvm]+0x55): undefined reference to `hdfsRead'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSStream::Write(void const*, unsigned long)':
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStream5WriteEPKvm[_ZN4dmlc2io10HDFSStream5WriteEPKvm]+0x5e): undefined reference to `hdfsWrite'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSStream::Seek(unsigned long)':
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStream4SeekEm[_ZN4dmlc2io10HDFSStream4SeekEm]+0x2e): undefined reference to `hdfsSeek'
../../dmlc-core/libdmlc.a(hdfs_filesys.o): In function `dmlc::io::HDFSStream::Tell()':
hdfs_filesys.cc:(.text._ZN4dmlc2io10HDFSStream4TellEv[_ZN4dmlc2io10HDFSStream4TellEv]+0x2b): undefined reference to `hdfsTell'
collect2: error: ld returned 1 exit status
make: *** [build/difacto.dmlc] Error 1
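A note on this class of error: the undefined `hdfs*` references mean libdmlc.a was compiled with HDFS support, while the final link line carries no libhdfs, so the `-DDMLC_USE_HDFS=0` compile flag does not help at link time. A sketch of two possible fixes, assuming a typical Hadoop layout (the HADOOP_HDFS_HOME and JAVA_HOME values below are placeholders, adjust to your installation):

```sh
# (a) Rebuild dmlc-core without HDFS, so hdfs_filesys.o is left out:
cd ../../dmlc-core && make clean && make USE_HDFS=0

# (b) Or keep HDFS support and add libhdfs/libjvm to the link flags:
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export LDFLAGS="-L$HADOOP_HDFS_HOME/lib/native -lhdfs \
  -L$JAVA_HOME/jre/lib/amd64/server -ljvm"
```

Option (a) is simpler when you only read local files; option (b) is needed if the job must read hdfs:// paths.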
@CNevd Hi, your Difacto_DMLC project gives two examples, one for running locally and one for running remotely on YARN. But before the simplification (parameter_server becoming ps-lite), the parameter server could be launched in distributed mode by supplying a list of IPs. Does Difacto_DMLC still have this feature? If not, what modifications would be needed to add it?
sh run_yarn.sh
The error information:
18/01/24 13:26:09 WARN util.NativeCodeLoader (NativeCodeLoader.java:(62)) : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/24 13:26:09 INFO client.RMProxy (RMProxy.java:createRMProxy(92)) : Connecting to ResourceManager at master4.osos.com/10.155.140.215:8032
18/01/24 13:26:16 INFO dmlc.Client (Client.java:run(289)) : jobname=DMLC[nworker=2,nsever=1]:difacto.dmlc,username=nlp_qrw
18/01/24 13:26:16 INFO dmlc.Client (Client.java:run(296)) : Submitting application application_1506450123858_250218
18/01/24 13:26:16 INFO impl.YarnClientImpl (YarnClientImpl.java:submitApplication(236)) : Submitted application application_1506450123858_250218
F0124 13:34:31.273530 7162 manager.cc:55] Timeout (500 sec) to wait all other nodes initialized. See commmets for more information
*** Check failure stack trace: ***
@ 0x482eaa google::LogMessage::Fail()
@ 0x484d72 google::LogMessage::SendToLog()
@ 0x482a8f google::LogMessage::Flush()
@ 0x48568e google::LogMessageFatal::~LogMessageFatal()
@ 0x4715e2 ps::Manager::Run()
@ 0x468e19 ps::Postoffice::Run()
@ 0x40813b main
@ 0x7f180e2dbb35 __libc_start_main
@ 0x40a221 (unknown)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/data1/peak/peakzeng/Difacto_DMLC-master/dmlc-core/tracker/tracker.py", line 345, in <lambda>
self.thread = Thread(target = (lambda : subprocess.check_call(self.cmd, env=env, shell=True)), args = ())
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '../../dmlc-core/tracker/../yarn//run_hdfs_prog.py build/difacto.dmlc guide/demo_hdfs.conf ' returned non-zero exit status 250
My hdfs data path is like this: /root/${date}/part-*
It causes an error when train_data is written as in the examples below:
1) /root/2016060*/part-*
2) /root/2016060[1-9]/part-*
3) /root/20160601/part-* /root/20160602/part-* ...
4) /root/20160601/part-*,/root/20160602/part-*, ...
So, how do I set multiple HDFS training data paths? Thanks.
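Since none of the separators above seem to be accepted, one workaround, under the assumption that train_data takes a single directory or glob pattern, is to stage every part file under one directory first and point train_data at that. A local-filesystem sketch of the staging step (on HDFS the same idea applies with `hadoop fs -cp`):

```python
import glob
import os
import shutil

def stage_parts(patterns, staging_dir):
    """Copy every file matching any of the glob patterns into one staging
    directory, so a single train_data path can cover all of them."""
    os.makedirs(staging_dir, exist_ok=True)
    staged = []
    for pat in patterns:
        for src in sorted(glob.glob(pat)):
            # prefix with the parent dir name so part-0 files don't collide
            dst = os.path.join(
                staging_dir,
                os.path.basename(os.path.dirname(src)) + "_" + os.path.basename(src))
            shutil.copy(src, dst)
            staged.append(dst)
    return staged
```

Then set train_data to the staging directory's part pattern. Copying duplicates data; for large inputs, having the upstream job write into a single directory is cheaper.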
When predicting, we only get the probabilities, but we don't get the original samples.
for example:
y1 x11 x12 (sample 1)
y2 x21 x22 (sample 2)
we then get
predict_y1
predict_y2
but we don't know whether predict_y1 is the prediction for sample 1 or for sample 2.
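One workaround, assuming the predictor writes one score per input line in the same order as the input part it consumed (an assumption worth verifying on a small file), is to join the score file back to the sample file by line position:

```python
def join_predictions(sample_lines, score_lines):
    """Join scores back to samples by line position (assumes the predictor
    preserves input order and emits exactly one score per sample)."""
    assert len(sample_lines) == len(score_lines), "row counts must match"
    joined = []
    for sample, score in zip(sample_lines, score_lines):
        label = sample.split()[0]  # first token of a sample line is y
        joined.append((label, float(score), sample))
    return joined

samples = ["y1 x11 x12", "y2 x21 x22"]
scores = ["0.81", "0.07"]
print(join_predictions(samples, scores))
```

With multiple workers each producing its own output part, the join has to be done per part against the slice of input that part's worker read.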
@CNevd Hi Nevd, can we upgrade to the newest dmlc-core version to get support for the HDFS viewfs schema? We would appreciate your contribution.
root@hadoop-master:~/Difacto_DMLC/src/linear# sh run_local.sh
2016-11-16 15:27:54,784 INFO start listen on 172.18.0.2:9091
build/linear.dmlc: error while loading shared libraries: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory
build/linear.dmlc: error while loading shared libraries: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory
build/linear.dmlc: error while loading shared libraries: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/root/Difacto_DMLC/dmlc-core/tracker/tracker.py", line 345, in <lambda>
self.thread = Thread(target = (lambda : subprocess.check_call(self.cmd, env=env, shell=True)), args = ())
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 'build/linear.dmlc guide/demo.conf ' returned non-zero exit status 127
build/dump.dmlc: error while loading shared libraries: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory
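For this runtime variant of the HDFS problem, the binaries were linked against libhdfs, so the dynamic loader needs to find libhdfs.so.0.0.0 even when the job only reads local files. A sketch, with a placeholder path:

```sh
# Placeholder path -- point it at the directory that actually contains
# libhdfs.so.0.0.0 on your machine (locate it with: find / -name 'libhdfs.so*'):
export LD_LIBRARY_PATH=/usr/lib/hadoop-hdfs/lib/native:$LD_LIBRARY_PATH
sh run_local.sh
```

Alternatively, rebuilding with USE_HDFS=0 removes the libhdfs dependency entirely.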
./difacto.dmlc: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./difacto.dmlc)
./difacto.dmlc: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./libstdc++.so.6)
I found that training or prediction is started at the end of
void StartDispatch()
namely,
/*ask all workers to start by sending an empty workload*/
Workload wl;
SendWorkload(ps::kWorkerGroup, wl);
But if the workload in the request sent to a worker node is empty, the worker just jumps out of its process method and responds with nothing. The scheduler node then can't receive a response, so it never assigns new training or prediction workloads to the workers. So how does training or prediction get started? I'm a little confused about this. Thanks for your response.
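My reading of the pattern (a sketch of the general scheduler/worker handshake, not the actual ps-lite API): the empty workload is only a kick-off message, and the worker is expected to reply even to an empty workload; it is the scheduler's reply handler that hands out the real workloads. A toy model of that loop:

```python
from collections import deque

def run_dispatch(workers, files):
    """Toy scheduler: the initial 'empty workload' only makes every worker
    reply once; each reply is answered with a real file to process, until
    no files remain."""
    pending = deque(files)
    assigned = []
    ready = deque(workers)  # workers that have replied and await work
    while ready:
        w = ready.popleft()
        if not pending:
            continue  # empty reply again -> worker is done
        f = pending.popleft()
        assigned.append((w, f))  # worker processes f, then replies
        ready.append(w)
    return assigned

print(run_dispatch(["w0", "w1"], ["part-0", "part-1", "part-2"]))
# [('w0', 'part-0'), ('w1', 'part-1'), ('w0', 'part-2')]
```

If the real worker truly sends no reply on an empty workload, the loop above would indeed never start, so the answer likely lies in how the worker's response handler treats the empty message.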
command is : ../../dmlc-core/tracker/dmlc_yarn.py --queue users.bigdata --hadoop_binary /opt/cloudera/parcels/CDH/bin/hadoop --vcores 1 -n 2 -s 1 build/linear.dmlc guide/demo_hdfs.conf
LogType:stderr
Log Upload Time:Fri Sep 15 12:23:24 +0800 2017
LogLength:606
Log Contents:
F0915 12:14:57.163543 2300 manager.cc:55] Timeout (500 sec) to wait all other nodes initialized. See commmets for more information
*** Check failure stack trace: ***
@ 0x496a2a google::LogMessage::Fail()
@ 0x4988f2 google::LogMessage::SendToLog()
@ 0x49660f google::LogMessage::Flush()
@ 0x49920e google::LogMessageFatal::~LogMessageFatal()
@ 0x47daf2 ps::Manager::Run()
@ 0x475339 ps::Postoffice::Run()
@ 0x408381 main
@ 0x7f3bdc936b15 __libc_start_main
@ 0x40a481 (unknown)
LogType:stdout
Log Upload Time:Fri Sep 15 12:23:24 +0800 2017
LogLength:75
Log Contents:
===============================argv: ['./linear.dmlc', './demo_hdfs.conf']
I used the agaricus data to test dump.cc and dumped a readable model.
But the model's keys are very large, e.g. 602879701896396800, while the train/test data's feature indices only go from 0 to about 120.
So I'm confused about what the keys in the readable model mean. How can I match them to feature indices?
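One possibility (an assumption, not confirmed from dump.cc): parameter-server style code often transforms the raw feature index into a 64-bit key, for example by bit reversal, to spread keys evenly across server nodes. If that is the scheme here, the mapping is invertible and the feature index can be recovered:

```python
def reverse_bits64(x):
    """Reverse the 64 bits of x; applying it twice returns x."""
    r = 0
    for _ in range(64):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

# a small feature index becomes a huge key, and the key maps back:
key = reverse_bits64(5)
print(key)                  # a very large number
print(reverse_bits64(key))  # 5
```

If the recovered values don't land in your 0-120 range, the keys are probably produced by a hash instead, which is not invertible.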
Hi, when I run the run_local.sh script, it stalls at "2017-07-27 16:17:08,738 INFO start listen on 127.0.0.1:9091". Does anyone know what the reason is?