Comments (7)

nqanh commented on July 29, 2024

I'm using CUDA 8 and cuDNN 5, and I can build with cuDNN without any problems. You may want to check on the Caffe site which cuDNN version is compatible with your CUDA version.

from affordance-net.

nqanh commented on July 29, 2024

Currently, some Python layers from Faster R-CNN do not support multi-GPU training. You may want to check this repo to see how to do this. The integration process may be very complicated. Unfortunately, I don't have access to a multi-GPU machine to test it. Good luck!

deokisys commented on July 29, 2024

@nqanh What are your cuDNN and CUDA versions?
I want to build Caffe with USE_CUDNN := 1, but it is not working. I think it is a cuDNN version problem.

deokisys commented on July 29, 2024

@nqanh Please forgive me for asking so many questions.
I tried cuDNN 5, 5.1, and above, but they didn't work. I then switched to cuDNN 4, which worked.
One training iteration takes a little under 1 second,
so the total training time would be almost 30 days.
When you trained this for 2,000,000 iterations, how long did it take on your computer?

nqanh commented on July 29, 2024

@deokisys No problem! Just tell us if you have any problems with the code.

The training time depends on your hardware and on how big your dataset is. For example, the IIT-AFF dataset has around 6K training images, and training for 200,000 iterations (not 2,000,000 as you said) takes around 2 days on a Titan X.
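The estimate above is simple arithmetic: number of iterations times seconds per iteration. A minimal sketch, assuming the roughly 1 second per iteration reported earlier in this thread (the function name is hypothetical, and real timings will vary with hardware and dataset):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_training_days(iterations: int, seconds_per_iteration: float) -> float:
    """Return the estimated wall-clock training time in days."""
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1.0 s per iteration is an assumption taken from the question above.
    for iterations in (200_000, 2_000_000):
        days = estimate_training_days(iterations, seconds_per_iteration=1.0)
        print(f"{iterations:>9,} iterations at 1.0 s each: about {days:.1f} days")
```

At that speed, 200,000 iterations come to roughly 2.3 days, which is in line with the "around 2 days" figure, while 2,000,000 iterations would take roughly 23 days.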

deokisys commented on July 29, 2024

@nqanh Thank you, I missed that.
I also have a problem with the loss.

I0125 21:00:01.081059 2920 solver.cpp:229] Iteration 160, loss = nan
I0125 21:00:01.081090 2920 solver.cpp:245] Train net output #0: loss_bbox = nan (* 2 = nan loss)
I0125 21:00:01.081097 2920 solver.cpp:245] Train net output #1: loss_cls = 87.3365 (* 3 = 262.01 loss)
I0125 21:00:01.081102 2920 solver.cpp:245] Train net output #2: loss_mask = 25.5312 (* 3 = 76.5937 loss)
I0125 21:00:01.081107 2920 solver.cpp:245] Train net output #3: rpn_cls_loss = 0.538947 (* 1 = 0.538947 loss)
I0125 21:00:01.081112 2920 solver.cpp:245] Train net output #4: rpn_loss_bbox = 0.0324526 (* 1 = 0.0324526 loss)
I0125 21:00:01.081117 2920 sgd_solver.cpp:106] Iteration 160, lr = 0.001

If I build Caffe without cuDNN, training is slower than with cuDNN (one iteration takes almost 2 s), but the loss goes down.
If I build Caffe with cuDNN, training is faster (one iteration takes almost 0.8 s), but the loss is 'nan'.
Is that bad? Does a loss of 'nan' mean the network is not training?

nqanh commented on July 29, 2024

Loss=nan is a serious problem. We sometimes see it because of numerical explosion in some Python layers (from Faster R-CNN). It's quite random and also depends on your GPU (and CUDA version). You should stop the training and try to avoid this problem, because the network is dead once loss=nan. Also, we do not use cuDNN during training.
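To stop a dead run early, one option is to scan the training log for the first non-finite loss. The sketch below is not part of affordance-net; it assumes the solver log format shown earlier in this thread, and the function and pattern names are hypothetical:

```python
import math
import re

# Patterns assume the solver.cpp log format shown in this thread.
ITERATION_LOSS = re.compile(r"Iteration (\d+), loss = (\S+)")
NET_OUTPUT = re.compile(r"Train net output #\d+: (\w+) = (\S+)")

SAMPLE_LOG = [
    "I0125 21:00:01.081059 2920 solver.cpp:229] Iteration 160, loss = nan",
    "I0125 21:00:01.081090 2920 solver.cpp:245] Train net output #0: loss_bbox = nan (* 2 = nan loss)",
    "I0125 21:00:01.081097 2920 solver.cpp:245] Train net output #1: loss_cls = 87.3365 (* 3 = 262.01 loss)",
    "I0125 21:00:01.081117 2920 sgd_solver.cpp:106] Iteration 160, lr = 0.001",
]


def first_nonfinite_loss(log_lines):
    """Return (iteration, non-finite output names) for the first bad iteration, or None."""
    bad_iteration = None
    bad_outputs = []
    for line in log_lines:
        loss_match = ITERATION_LOSS.search(line)
        if loss_match:
            if bad_iteration is not None:
                break  # the next iteration's record has started
            if not math.isfinite(float(loss_match.group(2))):
                bad_iteration = int(loss_match.group(1))
            continue
        output_match = NET_OUTPUT.search(line)
        if output_match and bad_iteration is not None:
            if not math.isfinite(float(output_match.group(2))):
                bad_outputs.append(output_match.group(1))
    if bad_iteration is None:
        return None
    return bad_iteration, bad_outputs


if __name__ == "__main__":
    print(first_nonfinite_loss(SAMPLE_LOG))
```

On the sample lines, this reports iteration 160 with loss_bbox as the non-finite output, which points at the bounding-box regression branch rather than the classification or mask losses.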
