liuyuisanai / caffemex_v2
Easily deploy multi-GPU caffe on Windows or Linux
License: Other
Thank you for providing this version of Caffe for multi-GPU training in MATLAB. Have you used this code for craftGBD? My own training code works well, but training did not converge when I used your code.
Which version of gcc is compatible with this Caffe?
Dear Yu,
I was about to run a new experiment using the updated Caffe (compilation passes), and yet it keeps failing. Server: 8 cards, Titan X. ssh -P 2170 [email protected]
Here is a snapshot of the terminal:
loss bbox weight is 10.00, must coincide with those in prototxts.
loss bbox weight is 10.00, must coincide with those in prototxts.
loss bbox weight is 5.00, must coincide with those in prototxts.
loss cls weight is 1.00, must coincide with those in prototxts.
loss cls weight is 1.00, must coincide with those in prototxts.
loss cls weight is 1.00, must coincide with those in prototxts.
|| Loading existant stats of RPN training data (/data/training_test_data/rpn/D03_s31/train.mat) ...
Done.
== iter 1 ==
iter 1, data, 3.698030
iter 1, train, 11.313035
iter: 1
TIME: oneIterTime: 15.92 s, est.LeftTime: 884.20 hours
LEVEL 1: accuracy: 0.2891, loss: 1.7593, (cls: 1.1324, bbox: 0.6268)
LEVEL 2: accuracy: 0.3247, loss: 2.5165, (cls: 1.1079, bbox: 1.4086)
LEVEL 3: accuracy: 0.3677, loss: 1.9367, (cls: 1.1204, bbox: 0.8163)
TOTAL: accuracy: 0.9815, loss: 6.2124, (cls: 3.3608, bbox: 2.8516)
== iter 2 ==
iter 2, data, 4.151567
F1006 13:48:31.281177 34061 math_functions.cu:121] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
F1006 13:48:31.281177 34058F10 6: 13:48:121] .Check failed: 281182 status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPP:121] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
F1006 13:48:31.281932 34062 math_functions.hpp:176] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
F1006 13:48:31.281960 34059 math_functions.hpp:176] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
F1006 13:48:31.282369 34060 math_functions.hpp:176] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
Segmentation fault (core dumped)
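Several GPU workers wrote to the terminal at once, so the fatal messages in this log run together on one line. A minimal sketch for splitting such output back into individual entries, assuming the usual glog prefix (severity letter and date, time, thread id, `file:line]`). The helper name `split_glog_entries` is hypothetical and not part of Caffe or glog:

```python
import re

# Assumed glog prefix: severity letter + MMDD, HH:MM:SS.uuuuuu, thread id, file:line]
GLOG_ENTRY = re.compile(
    r"([IWEF]\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([\w.]+):(\d+)\] "
)

def split_glog_entries(text):
    """Split fused glog output into (thread_id, file, line, message) tuples."""
    matches = list(GLOG_ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next recognised prefix (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((int(m.group(3)), m.group(4), int(m.group(5)), message))
    return entries

# Two entries copied from the log above, fused the way they appeared.
fused = (
    "F1006 13:48:31.281932 34062 math_functions.hpp:176] Check failed: "
    "error == cudaSuccess (77 vs. 0) an illegal memory access was encountered"
    "F1006 13:48:31.281960 34059 math_functions.hpp:176] Check failed: "
    "error == cudaSuccess (77 vs. 0) an illegal memory access was encountered"
)
for thread_id, source_file, line_no, message in split_glog_entries(fused):
    print(thread_id, source_file, line_no, message)
```

Entries whose prefix was itself interleaved character by character (such as the first CUBLAS line above) will not match the pattern and are left attached to the preceding message; the sketch only recovers entries whose prefix survived intact.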
error C2995: 'void caffe::Layer::Forward_gpu(const std::vector<caffe::Blob,std::allocator<_Ty>> &,const std::vector<caffe::Blob,std::allocator<_Ty>> &)': function template has already been defined
Is the function "my_solver.snapshot('snapshot_path_and_name')" meant to save the "XXX.caffemodel.solverstate" file?
We then tried to resume training with "caffe_solver.restore("XXX.caffemodel.solverstate")", but got an error: sgd_solver.cpp:316] Check failed: state.history_size() == history_.size() (0 vs. 450) Incorrect length of history blobs.
Note: apart from this, we have not made any changes.
Following the usage of "caffe_solver.net{1}.save", we tried "caffe_solver.net{1}.snapshot('snapshot_path_and_name')" to get the solverstate file, and another error occurred: "No appropriate method, property, or field 'snapshot' for class 'caffe.Net'."
Caffe doesn't seem to be the most popular ML library today.
Could you please release a Keras, TensorFlow, or PyTorch demo version of the COCO loss layer?
Can this version of Caffe reduce memory usage on a single card?
Hi Yu,
Chill, don't worry. I have successfully compiled the newest version of your Caffe on three machines, and yet when I tried on the fourth one, the following error occurred during make all -j.
I'm not sure how this happens. In the worst case, I will use the old version without memory optimization on this specific machine instead.
Hongyang
CXX/LD -o .build_release/examples/mnist/convert_mnist_data.bin
CXX/LD -o .build_release/examples/cpp_classification/classification.bin
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardFilter_v3'
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardData_v3'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/compute_image_mean.bin] Error 1
make: *** Waiting for unfinished jobs....
make: *** [.build_release/tools/convert_imageset.bin] Error 1
make: *** [.build_release/examples/cifar10/convert_cifar_data.bin] Error 1
make: *** [.build_release/tools/upgrade_solver_proto_text.bin] Error 1
make: *** [.build_release/tools/upgrade_net_proto_binary.bin] Error 1
make: *** [.build_release/tools/upgrade_net_proto_text.bin] Error 1
make: *** [.build_release/tools/extract_features.bin] Error 1
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardFilter_v3'
.build_release/lib/libcaffe.so: undefined reference to `cudnnConvolutionBackwardData_v3'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/caffe.bin] Error 1
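One possible cause of undefined references like these (an assumption; the thread does not confirm it) is that the fourth machine has a different cuDNN release installed than the other three, since the `_v3`-suffixed entry points are not exported by every cuDNN release. A minimal sketch for comparing versions across machines by reading the version macros from the cuDNN header text. `parse_cudnn_version` is a hypothetical helper, and the header location (commonly `cudnn.h`, or `cudnn_version.h` in later releases) is assumed rather than taken from the thread:

```python
import re

def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from cuDNN header text, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 5".
        m = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if m is None:
            return None
        parts.append(int(m.group(1)))
    return tuple(parts)

# Illustrative header fragment only; these numbers are not from the failing machine.
sample_header = """
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
"""
print(parse_cudnn_version(sample_header))
```

To use it, read the header found on each machine's include path and pass its contents to `parse_cudnn_version`; if the fourth machine reports a different major version from the other three, that mismatch is worth ruling out before falling back to the older code.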
I built this version of Caffe on a workstation with 4×1080 Ti GPUs, but I found that log.cpp, log.hpp and log.cu are missing.