
Comments (5)

Stonesjtu avatar Stonesjtu commented on August 16, 2024
  1. It depends on where the reporter collects GPU tensors. For example, if you report memory usage after backward, most intermediate tensors have probably been freed already.
  2. The 15 GB you see in nvtop / nvidia-smi is the peak memory usage over the whole computation (forward, backward, and the optimizer step). The CNN algorithm you chose may require a lot of workspace memory, or the feature maps may be very large while not being tracked by any Python object.
  3. PyTorch 1.6 introduces new memory-profiling utilities; I think they may help tackle the autograd-graph problem in this tool.

Can you post a snippet showing how the memory reporter is used in your training scripts?

from pytorch_memlab.

indigoviolet avatar indigoviolet commented on August 16, 2024

That's helpful, thanks for the quick response. I tried adding reporting at intermediate stages during training and identified that the losses = model(input) step causes the biggest jump in allocated memory. At the end of that step, MemReporter reports 7 GB allocated (peak is now 15 GB), of which only about 150 MB is accounted for by tensors in the report. I haven't been able to figure out how that 7 GB breaks down beyond the 150 MB, or what is driving the peak 15 GB usage.

I guess a lot of this could be the feature maps; how can I confirm that?
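One way to separate retained feature maps from transient workspace is to read torch.cuda's allocator counters around the forward pass (a sketch; `measure_forward_peak` is a hypothetical helper, not part of this thread's code):

```python
import torch

def measure_forward_peak(model, batch):
    """Report CUDA memory retained by a forward pass and its transient peak.

    Hypothetical helper: `model` and `batch` are placeholders for the
    real training objects, which must already live on the GPU.
    """
    if not torch.cuda.is_available():
        print("CUDA not available; nothing to measure")
        return None
    torch.cuda.reset_peak_memory_stats()
    before = torch.cuda.memory_allocated()
    out = model(batch)
    after = torch.cuda.memory_allocated()
    peak = torch.cuda.max_memory_allocated()
    # `after - before` is what stays allocated (mostly feature maps kept
    # for backward); `peak - after` is transient workspace, e.g. cuDNN.
    print(f"retained: {(after - before) / 2**30:.2f} GiB, "
          f"transient: {(peak - after) / 2**30:.2f} GiB")
    return out
```

If the "retained" number is close to the 7 GB gap, feature maps are the likely culprit; a large "transient" number points at convolution workspace instead.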

I did try memory profiling with torch.autograd.profiler from PyTorch 1.6, but so far I haven't been able to make much sense of the output (see below). I can see functions that allocate a lot of memory, but it's unclear how to trace where those functions are invoked in a complex model.

------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CPU Mem          Self CPU Mem     CUDA Mem         Self CUDA Mem    Number of Calls  
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
empty                                 1.84%            24.998ms         1.84%            24.998ms         23.538us         168 b            168 b            24.20 Gb         24.20 Gb         1062             
resize_                               0.36%            4.899ms          0.36%            4.899ms          12.155us         312 b            312 b            12.33 Gb         12.33 Gb         403              
nonzero                               9.59%            130.246ms        9.59%            130.246ms        3.831ms          0 b              0 b              8.07 Gb          8.07 Gb          34               
sub                                   0.17%            2.371ms          0.44%            5.974ms          49.787us         0 b              0 b              6.75 Gb          0 b              120            
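For reference, a table like the one above can be produced with something like the following (a sketch with a stand-in model; `profile_memory` was added in PyTorch 1.6):

```python
import torch
from torch import nn

model = nn.Conv2d(3, 8, 3)          # stand-in for the real model
x = torch.randn(1, 3, 32, 32)

# profile_memory tracks tensor allocations; use_cuda adds the CUDA
# memory columns shown above when a GPU is present.
with torch.autograd.profiler.profile(
        profile_memory=True,
        use_cuda=torch.cuda.is_available()) as prof:
    model(x)

table = prof.key_averages().table(sort_by="self_cpu_memory_usage",
                                  row_limit=5)
print(table)
```

Sorting by `self_cuda_memory_usage` instead surfaces the GPU-heavy operators (`empty`, `resize_`, `nonzero` above) first.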


Stonesjtu avatar Stonesjtu commented on August 16, 2024

If you report memory usage after losses = model(input), most intermediate tensors have gone from Python variables to C-level storages, which are not trackable from pure Python code. I believe this is the reason for the large gap between the reported results and the actual allocated memory.

Could you please try the memory profiler (https://github.com/Stonesjtu/pytorch_memlab#memory-profiler) to profile your model's forward function line by line?


indigoviolet avatar indigoviolet commented on August 16, 2024

I was able to get a better picture of why my peak memory usage was high after using the memory profiler. Thanks for your advice!


