Comments (3)
> Q1: Do you know how to explain this: if I keep the same batch size but change how I partition `self.features` internally (into checkpointed segments), the `active_bytes` of the next non-checkpointed line, `self.classifier(out)`, also changes.
The `active_bytes` all/peak column is the peak active bytes during the execution of that line; it is an accumulated value that depends on the active bytes held before the line executes.
For example, if you have 4 `Linear` layers in an `nn.Sequential`, checkpointing after `layer[3]` would consume less active bytes than checkpointing after `layer[0]`.
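A minimal sketch of this effect (not from the thread; layer sizes and segment counts are made up for illustration), using `torch.utils.checkpoint.checkpoint_sequential` to vary the partitioning and `torch.cuda.max_memory_allocated()` to observe the peak:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Four Linear layers in an nn.Sequential, as in the example above.
layers = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda()
x = torch.randn(512, 4096, device='cuda', requires_grad=True)

for segments in (1, 2, 4):
    torch.cuda.reset_peak_memory_stats()
    out = checkpoint_sequential(layers, segments, x)
    out.sum().backward()
    # The peak depends on how many segment boundaries are stored during
    # the forward pass and how large a segment is recomputed at once
    # during the backward pass, so it changes with the partitioning.
    print(segments, torch.cuda.max_memory_allocated())
```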
> Q2: So how to explain the `reserved_bytes`, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?
According to the PyTorch documentation:

> PyTorch uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without device synchronizations. However, the unused memory managed by the allocator will still show as if used in nvidia-smi.
Actually it does need the cached memory at certain points of the execution; it just no longer needs that much memory by the time you call `torch.cuda.max_memory_allocated()`. You can try `torch.cuda.empty_cache()` before reading `torch.cuda.max_memory_allocated()`.
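To make the distinction concrete, here is a small sketch (not from the thread) contrasting the memory PyTorch is actively using with the memory its caching allocator holds on to:

```python
import torch

t = torch.randn(256, 1024, 1024, device='cuda')  # ~1 GiB of float32
del t  # returned to the caching allocator, not to the driver

print(torch.cuda.memory_allocated())  # ~0 bytes: nothing is in use
print(torch.cuda.memory_reserved())   # ~1 GiB: still cached by the allocator

torch.cuda.empty_cache()              # release cached blocks to the driver
print(torch.cuda.memory_reserved())   # ~0 bytes; nvidia-smi drops as well
```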
PyTorch caches CUDA memory to avoid the cost of repeated allocations; you can get more information here:
https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
In your case, the reserved bytes should be the peak memory usage before checkpointing, while the active bytes should be the current memory usage after checkpointing.
```
## VGG.forward

active_bytes reserved_bytes line  code
         all            all
        peak           peak
       5.71G         10.80G    50      @profile
                               51      def forward(self, x):
       3.86G          8.77G    52          out = self.features(x)
       2.19G          8.77G    53          out = self.classifier(out)
       2.19G          8.77G    54          return out
```
@Stonesjtu Could you help me re-check the code above? I checkpointed `self.features` internally (it is itself an `nn.Module` with an `nn.Sequential` inside), but added the `@profile` decorator on the `forward` method (as above) of the outer class that uses the features (the conv2d layers).
Q1: Do you know how to explain this: if I keep the same batch size but change how I partition `self.features` internally (into checkpointed segments), the `active_bytes` of the next non-checkpointed line, `self.classifier(out)`, also changes.
I also get two additional lines, printed before the stats above, which are generated by the code appended below:

```
Max CUDA memory allocated on forward: 1.22G
Max CUDA memory allocated on backward: 5.71G
```
Q2: So how to explain the `reserved_bytes`, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?
```python
# compute output
if i < 1:
    torch.cuda.reset_peak_memory_stats()
output = model(images)
loss = criterion(output, target)
if i < 1:
    print('Max CUDA memory allocated on forward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))

# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.detach().item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))

# compute gradient and do SGD step
if i < 1:
    torch.cuda.reset_peak_memory_stats()
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i < 1:
    print('Max CUDA memory allocated on backward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))
```
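The model itself is not shown in the thread; for reference, here is a minimal hypothetical sketch of the setup described above, with `self.features` checkpointed internally via `torch.utils.checkpoint.checkpoint_sequential` and pytorch_memlab's `@profile` decorator on the outer `forward` (all layer shapes are made up):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential
from pytorch_memlab import profile

class VGGLike(nn.Module):
    def __init__(self, segments=2):
        super().__init__()
        self.segments = segments
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, 10)

    @profile
    def forward(self, x):
        # self.features is checkpointed internally, in self.segments chunks;
        # the classifier line below stays non-checkpointed.
        out = checkpoint_sequential(self.features, self.segments, x)
        out = self.classifier(out)
        return out

model = VGGLike(segments=2).cuda()
x = torch.randn(8, 3, 224, 224, device='cuda', requires_grad=True)
model(x).sum().backward()
# pytorch_memlab prints the line-by-line report when the profiler is
# flushed (by default, at program exit).
```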