
Comments (3)

Stonesjtu commented on August 16, 2024

Q1: Do you know how to explain this: if I keep the same batch size but change how I partition self.features internally (into checkpointed segments), the active_bytes of the next, non-checkpointed line self.classifier(out) also changes.

The column (or metric) active_bytes all peak is actually the peak active bytes during the execution of that line; it is a cumulative value, so it depends on the active bytes already held before the line executes.

E.g. if you have 4 Linear layers in an nn.Sequential, checkpointing after a later layer would consume fewer active bytes than checkpointing after layers[0].
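To make that concrete, here is a minimal sketch (layer sizes, batch size, and segment counts are made up for illustration, not taken from this issue) of how the same nn.Sequential, partitioned into a different number of checkpointed segments, changes the peak allocated bytes for the same batch size:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Four Linear layers, as in the example above (sizes are arbitrary).
layers = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda()
x = torch.randn(64, 4096, device='cuda', requires_grad=True)

for segments in (1, 2, 4):
    torch.cuda.reset_peak_memory_stats()
    # checkpoint_sequential splits `layers` into `segments` checkpointed
    # chunks; only each chunk's input stays live during the forward pass.
    out = checkpoint_sequential(layers, segments, x, use_reentrant=False)
    out.sum().backward()
    print(segments, 'segments ->', torch.cuda.max_memory_allocated(), 'bytes peak')
    x.grad = None
    layers.zero_grad(set_to_none=True)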


Q2: So how do you explain the reserved_bytes, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?

According to the PyTorch documentation:

PyTorch uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without device synchronizations. However, the unused memory managed by the allocator will still show as if used in nvidia-smi.

Actually it does need the cached memory at certain points of the execution, but at the time of your torch.cuda.max_memory_allocated call it doesn't need that much memory. You can try torch.cuda.empty_cache() before reading torch.cuda.max_memory_allocated.
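For reference, a small sketch of the calls involved (tensor sizes are illustrative): memory_allocated tracks live tensors, memory_reserved tracks the allocator's cache, and empty_cache() only shrinks the latter:

import torch

x = torch.randn(1024, 1024, device='cuda')   # ~4 MB held by a live tensor
y = torch.randn(1024, 1024, device='cuda')
del y  # y's block returns to the caching allocator, not to the driver

print(torch.cuda.memory_allocated())  # bytes held by live tensors (x only)
print(torch.cuda.memory_reserved())   # bytes cached by the allocator (>= allocated)

torch.cuda.empty_cache()              # release unused cached blocks to the driver
print(torch.cuda.memory_reserved())   # now much closer to memory_allocated()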


Stonesjtu commented on August 16, 2024

PyTorch caches CUDA memory to avoid the cost of repeated memory allocation; you can find more information here:

https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management

In your case, the reserved bytes should be the peak memory usage before checkpointing, while the active bytes should be the current memory usage after checkpointing.
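A self-contained way to check this claim (toy sizes, not the VGG from this issue) is to compare the two peak counters after a checkpointed forward/backward:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

net = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(6)]).cuda()
x = torch.randn(128, 2048, device='cuda', requires_grad=True)

torch.cuda.reset_peak_memory_stats()
out = checkpoint_sequential(net, 2, x, use_reentrant=False)
out.sum().backward()

# max_memory_allocated tracks the high-water mark of live tensors, which
# checkpointing lowers; max_memory_reserved tracks the high-water mark of
# the allocator's cache, which keeps the earlier peak.
print('max allocated:', torch.cuda.max_memory_allocated())
print('max reserved: ', torch.cuda.max_memory_reserved())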


nyngwang commented on August 16, 2024
## VGG.forward

active_bytes reserved_bytes line code
         all            all
        peak           peak
       5.71G         10.80G   50     @profile
                              51     def forward(self, x):
       3.86G          8.77G   52         out = self.features(x)
       2.19G          8.77G   53         out = self.classifier(out)
       2.19G          8.77G   54         return out

@Stonesjtu Could you help me re-check the code above? I checkpointed self.features internally (it is itself an nn.Module with an nn.Sequential inside), but added the @profile decorator to the forward method (as above) of the outer class that uses the features (conv2d layers).
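For context, a hedged reconstruction of that setup (layer sizes, input shape, and the segment count are guesses, not the actual VGG code, and the checkpointing is placed in the outer forward for brevity): the checkpointed segments run inside the profiled forward, while @profile from pytorch_memlab wraps the method line by line:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential
from pytorch_memlab import profile

class Net(nn.Module):
    def __init__(self, segments=2):
        super().__init__()
        self.segments = segments
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(64 * 32 * 32, 10)

    @profile
    def forward(self, x):
        # the checkpointed segments live inside this single profiled line,
        # so the profiler reports only the line-level peak, not the
        # per-segment recomputation cost
        out = checkpoint_sequential(self.features, self.segments, x,
                                    use_reentrant=False)
        out = self.classifier(out.flatten(1))
        return out

net = Net().cuda()
out = net(torch.randn(8, 3, 32, 32, device='cuda', requires_grad=True))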

Q1: Do you know how to explain this: if I keep the same batch size but change how I partition self.features internally (into checkpointed segments), the active_bytes of the next, non-checkpointed line self.classifier(out) also changes.

I also get two additional lines, printed before the stats above:

Max CUDA memory allocated on forward:  1.22G
Max CUDA memory allocated on backward:  5.71G

These are generated by the code appended below.

Q2: So how do you explain the reserved_bytes, i.e. 10.80G and 8.77G, in the stats generated by pytorch_memlab above? Does it mean that PyTorch internally allocates much more GPU memory than it really needs?

# compute output
if i < 1:
    torch.cuda.reset_peak_memory_stats()
output = model(images)
loss = criterion(output, target)
if i < 1:
    print('Max CUDA memory allocated on forward: ', utils.readable_size(torch.cuda.max_memory_allocated()))

# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.detach().item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))

# compute gradient and do SGD step
if i < 1:
    torch.cuda.reset_peak_memory_stats()
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i < 1:
    print('Max CUDA memory allocated on backward: ', utils.readable_size(torch.cuda.max_memory_allocated()))

