Comments (3)
> Q1: Do you know how to explain this: if I keep the same batch size but change how I partition `self.features` internally (into checkpointed segments), the `active_bytes` of the next non-checkpointed line, `self.classifier(out)`, also changes.
The `active_bytes` all/peak column is the peak active bytes during the execution of that line; it is an accumulated value that depends on the active bytes held before the line executes.
For example, if you have 4 `Linear` layers in an `nn.Sequential`, checkpointing after `layer[3]` would consume less active bytes than checkpointing after `layer[0]`.
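A minimal sketch of this effect (not from the thread; layer sizes and segment counts are made up for illustration), using `torch.utils.checkpoint.checkpoint_sequential` to vary the partitioning and `torch.cuda.max_memory_allocated()` to observe the peak:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Four Linear layers in an nn.Sequential, as in the example above.
layers = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda()
x = torch.randn(512, 4096, device='cuda', requires_grad=True)

for segments in (1, 2, 4):
    torch.cuda.reset_peak_memory_stats()
    out = checkpoint_sequential(layers, segments, x)
    out.sum().backward()
    # The peak depends on how many segment boundaries are stored during
    # the forward pass and how large a segment is recomputed at once
    # during the backward pass, so it changes with the partitioning.
    print(segments, torch.cuda.max_memory_allocated())
```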
> Q2: So how to explain the `reserved_bytes`, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?
According to the PyTorch documentation:

> PyTorch uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without device synchronizations. However, the unused memory managed by the allocator will still show as if used in nvidia-smi.
Actually it does need the cached memory at certain points of the execution; it just no longer needs that much memory by the time you call `torch.cuda.max_memory_allocated()`. You can try `torch.cuda.empty_cache()` before reading `torch.cuda.max_memory_allocated()`.
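To make the distinction concrete, here is a small sketch (not from the thread) contrasting the memory PyTorch is actively using with the memory its caching allocator holds on to:

```python
import torch

t = torch.randn(256, 1024, 1024, device='cuda')  # ~1 GiB of float32
del t  # returned to the caching allocator, not to the driver

print(torch.cuda.memory_allocated())  # ~0 bytes: nothing is in use
print(torch.cuda.memory_reserved())   # ~1 GiB: still cached by the allocator

torch.cuda.empty_cache()              # release cached blocks to the driver
print(torch.cuda.memory_reserved())   # ~0 bytes; nvidia-smi drops as well
```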
PyTorch caches CUDA memory to avoid the cost of repeated allocations; you can get more information here:
https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
In your case, the reserved bytes should be the peak memory usage before checkpointing, while the active bytes should be the current memory usage after checkpointing.
```
## VGG.forward

active_bytes reserved_bytes line  code
         all            all
        peak           peak
       5.71G         10.80G    50      @profile
                               51      def forward(self, x):
       3.86G          8.77G    52          out = self.features(x)
       2.19G          8.77G    53          out = self.classifier(out)
       2.19G          8.77G    54          return out
```
@Stonesjtu Could you help me re-check the code above? I checkpointed `self.features` internally (it is itself an `nn.Module` with an `nn.Sequential` inside), but added the `@profile` decorator on the `forward` method (as above) of the outer class that uses the features (the conv2d layers).
Q1: Do you know how to explain this: if I keep the same batch size but change how I partition `self.features` internally (into checkpointed segments), the `active_bytes` of the next non-checkpointed line, `self.classifier(out)`, also changes.
I also get two additional lines, printed before the stats above, which are generated by the code appended below:

```
Max CUDA memory allocated on forward: 1.22G
Max CUDA memory allocated on backward: 5.71G
```
Q2: So how to explain the `reserved_bytes`, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?
```python
# compute output
if i < 1:
    torch.cuda.reset_peak_memory_stats()
output = model(images)
loss = criterion(output, target)
if i < 1:
    print('Max CUDA memory allocated on forward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))

# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.detach().item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))

# compute gradient and do SGD step
if i < 1:
    torch.cuda.reset_peak_memory_stats()
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i < 1:
    print('Max CUDA memory allocated on backward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))
```
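The model itself is not shown in the thread; for reference, here is a minimal hypothetical sketch of the setup described above, with `self.features` checkpointed internally via `torch.utils.checkpoint.checkpoint_sequential` and pytorch_memlab's `@profile` decorator on the outer `forward` (all layer shapes are made up):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential
from pytorch_memlab import profile

class VGGLike(nn.Module):
    def __init__(self, segments=2):
        super().__init__()
        self.segments = segments
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, 10)

    @profile
    def forward(self, x):
        # self.features is checkpointed internally, in self.segments chunks;
        # the classifier line below stays non-checkpointed.
        out = checkpoint_sequential(self.features, self.segments, x)
        out = self.classifier(out)
        return out

model = VGGLike(segments=2).cuda()
x = torch.randn(8, 3, 224, 224, device='cuda', requires_grad=True)
model(x).sum().backward()
# pytorch_memlab prints the line-by-line report when the profiler is
# flushed (by default, at program exit).
```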