
Comments (8)

jerrinbright avatar jerrinbright commented on July 19, 2024 1

Yeah, makes sense. I will get back to you if I find a solution. Thank you!

from 4d-humans.

DavidBoja avatar DavidBoja commented on July 19, 2024

The PyTorch Lightning estimate of the memory allocation of the whole model is:

637 M     Trainable params
0         Non-trainable params
637 M     Total params
2,549.510 Total estimated model params size (MB)

Which does make sense, and is around what actually gets allocated on the GPU (2,706 MB) when training starts.
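For reference, a quick back-of-the-envelope check of that summary, assuming float32 parameters (4 bytes each) and Lightning's convention of reporting MB as bytes / 1e6:

```python
# Rough size check for the Lightning summary above, assuming float32
# parameters (4 bytes each) and MB = bytes / 1e6 as Lightning reports it.
params = 637_000_000           # ~637 M trainable params
size_mb = params * 4 / 1e6     # ≈ 2548 MB, close to the reported 2,549.510
print(size_mb)
```

The small gap to 2,549.510 MB is just the rounding of "637 M" in the summary.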

However, once training actually starts (when training_step begins running), the memory usage just keeps increasing with each iteration. I am still at a loss as to what exactly is causing it. I checked whether there are any appends of non-detached tensors, disabled the logging, and reduced everything else I could, but the issue still persists.
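For anyone hitting the same thing, here is a minimal sketch of the non-detached-tensor trap I was checking for (the names are hypothetical, not from the 4D-Humans code):

```python
import torch

w = torch.randn(4, requires_grad=True)
losses = []

for step in range(3):
    loss = (w * (step + 1)).sum()
    # losses.append(loss)         # BAD: keeps each step's autograd graph
    #                             # alive, so GPU memory grows every step
    losses.append(loss.detach())  # GOOD: stores only the value, graph freed
```

Appending the raw `loss` retains every iteration's computation graph, which looks exactly like a slow per-iteration memory increase.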

Do you have any advice?


jerrinbright avatar jerrinbright commented on July 19, 2024

Hi @DavidBoja, any luck finding the reason behind the spike? I am facing the same issue.


DavidBoja avatar DavidBoja commented on July 19, 2024

No, unfortunately I did not.

I switched to other architectures, since the architecture from this paper needs a lot of compute power (GPU), and I believe it primarily achieves good results because of the huge amount of data it is trained on.

I wish you luck. Let me know if you manage to find a solution please :).


mlkorra avatar mlkorra commented on July 19, 2024

Hi @DavidBoja, are you working on the 3D human reconstruction problem? Also, which architecture are you currently using?


DavidBoja avatar DavidBoja commented on July 19, 2024

Hi @mlkorra
I'm more focused on 3D data rather than 2D data, but I'm interested in guided transformers like these, and non-learning NNs like these.


geopavlakos avatar geopavlakos commented on July 19, 2024

I'm not sure of the exact setting you are working with, but one thing we observed that can cause GPU memory issues is setting the number of workers higher than what is actually available on the machine we train on. In that case, decreasing that value avoided the GPU memory increase.
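For context, a sketch of that cap with a plain torch DataLoader (not the repo's actual config; the dataset here is a toy placeholder):

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(16).float())

# Cap the worker count so the DataLoader does not ask for more
# processes than the machine can actually provide.
safe_workers = min(4, os.cpu_count() or 1)

loader = DataLoader(dataset, batch_size=4, num_workers=safe_workers)
batches = list(loader)  # 16 samples / batch_size 4 -> 4 batches
```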


DavidBoja avatar DavidBoja commented on July 19, 2024

Hi @geopavlakos,

Thanks for the help. I am working with two 12 GB NVIDIA cards. I tried lowering the number of workers, but unfortunately this did not help.

I can run the demo successfully, but I face issues with training. Besides the number of workers, I lowered the batch size and even reduced the SMPL_HEAD depth and number of heads to only 2, but the issue still persists.
I don't think the number of workers is related to the issue I'm facing (GPU memory that keeps increasing), because, as I understand it, the workers prepare the dataset examples that will be batched in a training iteration, and those are only transferred to the GPU once the training loop starts.
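To narrow it down on my side, I've been using a small helper like this to log allocated GPU memory per iteration (a generic sketch, no-op on CPU-only machines):

```python
import torch

def log_gpu_memory(step: int) -> float:
    """Print and return MB currently allocated on the default GPU (0.0 on CPU)."""
    if not torch.cuda.is_available():
        return 0.0
    mb = torch.cuda.memory_allocated() / 1024 ** 2
    print(f"step {step}: {mb:.1f} MB allocated")
    return mb

# Call once per training iteration; if the reported number grows
# monotonically, something is retaining tensors across steps.
for step in range(3):
    log_gpu_memory(step)
```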

On the other hand, I have never used pytorch lightning before, so maybe issues are arising from there.

In the meantime I have switched to other work, so I'm not actively experimenting with the network. Maybe @jerrinbright can share more input on the issue he is facing if he is still working on it, or if he has found a solution :).

