
Comments (5)

Stonesjtu avatar Stonesjtu commented on August 16, 2024
  1. It depends on where the reporter collects GPU tensors. For example, if you report memory usage after backward, most intermediate tensors have probably been freed already.
  2. The 15 GB you see in nvtop / nvidia-smi is the peak memory usage over the whole computation (forward, backward, and the optimizer step). The CNN algorithm you chose may require a lot of workspace memory, or the feature maps may be very large while not being tracked by any Python object.
  3. PyTorch 1.6 introduces new memory-profiling utilities; I think they may help tackle the autograd-graph problem in this tool.

Can you post a snippet showing how the memory reporter is used in your training scripts?

from pytorch_memlab.

indigoviolet avatar indigoviolet commented on August 16, 2024

That's helpful, thanks for the quick response. I tried adding reporting at intermediate stages during training and identified that the losses = model(input) step causes the biggest jump in allocated memory. At the end of that step, MemReporter reports 7 GB allocated (peak is now 15 GB), of which only about 150 MB is accounted for by tensors in the report. I haven't been able to figure out how that 7 GB breaks down beyond the 150 MB, or what is driving the peak 15 GB usage.

I guess a lot of this could be the feature maps; how can I confirm that?
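One way to separate retained feature maps from transient workspace is to read torch.cuda's allocator counters around the forward pass (a sketch; `measure_forward_peak` is a hypothetical helper, not part of this thread's code):

```python
import torch

def measure_forward_peak(model, batch):
    """Report CUDA memory retained by a forward pass and its transient peak.

    Hypothetical helper: `model` and `batch` are placeholders for the
    real training objects, which must already live on the GPU.
    """
    if not torch.cuda.is_available():
        print("CUDA not available; nothing to measure")
        return None
    torch.cuda.reset_peak_memory_stats()
    before = torch.cuda.memory_allocated()
    out = model(batch)
    after = torch.cuda.memory_allocated()
    peak = torch.cuda.max_memory_allocated()
    # `after - before` is what stays allocated (mostly feature maps kept
    # for backward); `peak - after` is transient workspace, e.g. cuDNN.
    print(f"retained: {(after - before) / 2**30:.2f} GiB, "
          f"transient: {(peak - after) / 2**30:.2f} GiB")
    return out
```

If the "retained" number is close to the 7 GB gap, feature maps are the likely culprit; a large "transient" number points at convolution workspace instead.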

I did try memory profiling with torch.autograd.profiler from PyTorch 1.6, but so far I haven't been able to make much sense of the output (see below). I can see functions that allocate a lot of memory, but it's unclear how to trace where those functions are invoked in a complex model.

------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CPU Mem          Self CPU Mem     CUDA Mem         Self CUDA Mem    Number of Calls  
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
empty                                 1.84%            24.998ms         1.84%            24.998ms         23.538us         168 b            168 b            24.20 Gb         24.20 Gb         1062             
resize_                               0.36%            4.899ms          0.36%            4.899ms          12.155us         312 b            312 b            12.33 Gb         12.33 Gb         403              
nonzero                               9.59%            130.246ms        9.59%            130.246ms        3.831ms          0 b              0 b              8.07 Gb          8.07 Gb          34               
sub                                   0.17%            2.371ms          0.44%            5.974ms          49.787us         0 b              0 b              6.75 Gb          0 b              120            
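For reference, a table like the one above can be produced with something like the following (a sketch with a stand-in model; `profile_memory` was added in PyTorch 1.6):

```python
import torch
from torch import nn

model = nn.Conv2d(3, 8, 3)          # stand-in for the real model
x = torch.randn(1, 3, 32, 32)

# profile_memory tracks tensor allocations; use_cuda adds the CUDA
# memory columns shown above when a GPU is present.
with torch.autograd.profiler.profile(
        profile_memory=True,
        use_cuda=torch.cuda.is_available()) as prof:
    model(x)

table = prof.key_averages().table(sort_by="self_cpu_memory_usage",
                                  row_limit=5)
print(table)
```

Sorting by `self_cuda_memory_usage` instead surfaces the GPU-heavy operators (`empty`, `resize_`, `nonzero` above) first.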


Stonesjtu avatar Stonesjtu commented on August 16, 2024

If you report memory usage after losses = model(input), most intermediate tensors have gone from Python variables to C-level storages, which are not trackable from pure Python code. I believe this is the reason for the large gap between the reported results and the actual allocated memory.

Could you please try the memory profiler (https://github.com/Stonesjtu/pytorch_memlab#memory-profiler) to profile your model's forward function line by line?


indigoviolet avatar indigoviolet commented on August 16, 2024

I was able to get a better picture of why my peak memory usage was high after using the memory profiler. Thanks for your advice!


