Comments (5)
- It depends on where the reporter is collecting GPU tensors; e.g., if you report the memory usage after backward, then most intermediate tensors have probably been freed already.
- The 15 GB you get from `nvtop`/`nvidia-smi` is the peak memory usage over the whole computation (forward/backward/optimization included). The CNN algorithm you chose probably requires a lot of memory as its workspace, or the feature maps are too big while not tracked by any Python object.
- PyTorch 1.6 introduces new memory profiler utils; I think these may help tackle the autograd-graph problem in this tool.

Can you post a snippet of how the memory reporter is used in your training scripts?
from pytorch_memlab.
That's helpful, thanks for the quick response. I tried adding reporting at intermediate stages during training and identified that the `losses = model(input)` step leads to the biggest jump in allocated memory. At the end of that step, MemReporter reports 7 GB allocated (peak is now 15 GB), of which only about 150 MB is accounted for by tensors in the report. I haven't been able to figure out (1) how that 7 GB breaks down beyond the 150 MB, or (2) what is leading to the peak 15 GB usage.
I guess a lot of this could be the feature maps; how can I confirm it?
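One way to sanity-check the feature-map theory (a sketch not taken from this thread; the helper name and toy model are made up) is to sum the sizes of every leaf module's output with forward hooks:

```python
import torch

def count_activation_bytes(model, x):
    """Rough total bytes of the output tensors of each leaf module."""
    total = 0
    hooks = []

    def hook(_module, _inputs, output):
        nonlocal total
        outs = output if isinstance(output, (tuple, list)) else (output,)
        for t in outs:
            if torch.is_tensor(t):
                total += t.numel() * t.element_size()

    # Hook only leaf modules to avoid counting container outputs twice.
    for module in model.modules():
        if not list(module.children()):
            hooks.append(module.register_forward_hook(hook))
    try:
        model(x)
    finally:
        for h in hooks:
            h.remove()
    return total

# Tiny illustrative model: two leaf modules, each emitting a (2, 4)
# float32 tensor, i.e. 2 * 4 * 4 = 32 bytes apiece.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
bytes_used = count_activation_bytes(model, torch.randn(2, 8))  # 64 bytes
```

If the total is close to the unexplained 7 GB, the gap is mostly feature maps held alive by the autograd graph.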
I did try memory profiling with `torch.autograd.profiler` from PyTorch 1.6, but so far I haven't been able to make much sense of the output (see below). I can see which functions are allocating lots of memory, but it's unclear how to trace where those functions are being invoked in a complex model.
------------------------------------ --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls
------------------------------------ --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
empty 1.84% 24.998ms 1.84% 24.998ms 23.538us 168 b 168 b 24.20 Gb 24.20 Gb 1062
resize_ 0.36% 4.899ms 0.36% 4.899ms 12.155us 312 b 312 b 12.33 Gb 12.33 Gb 403
nonzero 9.59% 130.246ms 9.59% 130.246ms 3.831ms 0 b 0 b 8.07 Gb 8.07 Gb 34
sub 0.17% 2.371ms 0.44% 5.974ms 49.787us 0 b 0 b 6.75 Gb 0 b 120
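For context, a table like the one above is typically produced with something along these lines; `record_function` gives a named scope (the label "encoder" here is arbitrary) that can help attribute allocations to a region of the model:

```python
import torch
from torch.autograd import profiler

model = torch.nn.Linear(128, 64)
x = torch.randn(32, 128)

# profile_memory=True (added in PyTorch 1.6) records allocator events;
# record_function inserts a named scope into the trace so allocations
# can be tied back to a region of the model.
with profiler.profile(profile_memory=True) as prof:
    with profiler.record_function("encoder"):
        y = model(x)

# Sort by the memory columns shown above to surface the big allocators.
table = prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10)
print(table)
```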
from pytorch_memlab.
If you report the memory usage after `losses = model(input)`, then most intermediate tensors have gone from Python variables to C-level storages, which are not trackable from pure Python code. I believe this is the reason for such a large gap between the reported results and the actual allocated memory.
Can you please try the memory profiler (https://github.com/Stonesjtu/pytorch_memlab#memory-profiler) to profile your model's `forward` function line by line?
from pytorch_memlab.
I was able to get a better picture of why my peak memory usage was high after using the memory profiler. Thanks for your advice!
from pytorch_memlab.