stonesjtu / pytorch_memlab
Profiling and inspecting memory in pytorch
License: MIT License
I tried to install using pip install pytorch_memlab
or pip install git+https://github.com/stonesjtu/pytorch_memlab
and in both cases got this error:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-sx0xf6jy/pandas/setup.py", line 333
f"{extension}-source file '{sourcefile}' not found.\n"
^
SyntaxError: invalid syntax
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-sx0xf6jy/pandas/
pytorch_memlab works excellently for gpu0; however, the memory of all tensors turns to 0 when I use gpu2, 3, and 4.
Thank you for these productive tools for the open source community!
Hi, thank you for a very useful Python library. I am just checking whether there is a way to redirect the report() output directly to a txt file, without redirecting stdout or anything similar. Or perhaps add a method that returns the report as a string, something like reporter.report() -> str.
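Until such a method exists, a stdlib workaround is to capture whatever report() prints; a minimal sketch, assuming report() writes to stdout (the MemReporter usage in the comment is illustrative, not confirmed API behavior):

```python
import io
from contextlib import redirect_stdout

def capture_report(reporter) -> str:
    """Return whatever reporter.report() prints as a string."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        reporter.report()
    return buf.getvalue()

# Hypothetical usage with pytorch_memlab:
# from pytorch_memlab import MemReporter
# with open("report.txt", "w") as f:
#     f.write(capture_report(MemReporter()))
```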
I'm developing a custom network layer, and it consequently has many unnamed tensors in the MemReporter output. Below is a snippet example:
pcconv5_1.weight_net.layers.4.weight (1, 64) 512.00B
pcconv5_1.weight_net.layers.4.bias (1,) 512.00B
pcconv5_1.feature_layer.weight (1, 32) 512.00B
Tensor0 (1, 10, 10, 10, 8) 31.50K
Tensor1 (1, 8, 3) 512.00B
Tensor2 (1, 10, 10, 10, 12) 47.00K
Tensor3 (1, 12, 3) 512.00B
Tensor4 (1, 8, 8) 512.00B
Tensor5 (1, 8, 1, 1, 1, 8, 5) 1.50K
Tensor6 (1, 8, 1, 1, 1, 8, 32) 8.00K
Tensor7 (1, 8, 1, 1, 1, 8, 32) 8.00K
Tensor8 (1, 8, 1, 1, 1, 8, 64) 16.00K
Tensor9 (1, 8, 1, 1, 1, 8, 64) 16.00K
Tensor10(->Tensor4) (1, 8, 1, 1, 1, 8, 1) 0.00B
Is there any way to name or label these tensors (Tensor0 through Tensor10, for example) so that I can more easily determine which operation creates them?
Hi, thanks for this awesome package; it has indeed helped me a lot.
However, when I try to run some basic code from the Jupyter demo you provide,
I get an error [TypeError: can only concatenate tuple (not "list") to tuple] and do not know how to solve the issue.
My environment settings are:
Do you have any idea how to solve this issue? Thanks for your attention; I hope to get your reply soon.
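For what it's worth, the TypeError itself is ordinary Python behavior: a tuple and a list cannot be concatenated directly. A minimal reproduction, independent of pytorch_memlab internals:

```python
t = (1, 2)
lst = [3, 4]
try:
    t + lst  # mixing sequence types is rejected
except TypeError as e:
    print(e)  # can only concatenate tuple (not "list") to tuple

# Converting one side fixes it:
combined = t + tuple(lst)
print(combined)  # (1, 2, 3, 4)
```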
Hi, I was just wondering what this message in the MemReporter output means:
Total Tensors: 266979334 Used Memory: 924.71M
The allocated memory on cuda:0: 1.31G
Memory differs due to the matrix alignment or invisible gradient buffer tensors
Also, what is the difference between Used Memory and allocated memory?
Many thanks
Like https://pypi.org/project/memory-profiler/
do you support profiling and printing the memory usage of each line while training?
Hi, when I try to use the LineProfiler example, the following error message pops up:
>>> import torch
>>> from pytorch_memlab import LineProfiler
>>> def inner():
...     torch.nn.Linear(100, 100).cuda()
...
>>> def outer():
...     linear = torch.nn.Linear(100, 100).cuda()
...     linear2 = torch.nn.Linear(100, 100).cuda()
...     inner()
...
>>> with LineProfiler(outer, inner) as prof:
...     outer()
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/wyuancs/miniconda3/envs/response-selection/lib/python3.8/site-packages/pytorch_memlab/line_profiler/line_profiler.py", line 45, in __init__
self.add_function(func)
File "/home/wyuancs/miniconda3/envs/response-selection/lib/python3.8/site-packages/pytorch_memlab/line_profiler/line_profiler.py", line 59, in add_function
first_line = inspect.getsourcelines(func)[1]
File "/home/wyuancs/miniconda3/envs/response-selection/lib/python3.8/inspect.py", line 979, in getsourcelines
lines, lnum = findsource(object)
File "/home/wyuancs/miniconda3/envs/response-selection/lib/python3.8/inspect.py", line 798, in findsource
raise OSError('could not get source code')
OSError: could not get source code
How could I solve this problem? I tried to search online but could not find any solution.
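For context, inspect.getsourcelines needs a source file behind the function, which code typed into the interactive interpreter does not have; a stdlib reproduction of the same OSError (assuming that is what is happening here):

```python
import inspect

# A function compiled from a string, like one defined at the >>> prompt,
# has no source file for inspect to read back.
ns = {}
exec("def f():\n    pass", ns)
try:
    inspect.getsourcelines(ns["f"])
except OSError as e:
    print(e)  # could not get source code
```

Putting the demo in a .py file and running it with `python demo.py` avoids this failure mode.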
What do the negative values mean? For example below is part of what I captured when running some code:
Line # Max usage Peak usage diff max diff peak Line Contents
===============================================================
56 @profile
58 def main():
59 0.00B 0.00B -4.14G -4.64G dtype, multidevice, backward_device = setup_gpu()
Hello,
Firstly, congratulations on memlab. I have been trying to use it in Google Colab, but sometimes this error happens:
ReferenceError Traceback (most recent call last)
in <module>()
33 print('Reporter!!!!!!!')
34 reporter = MemReporter()
---> 35 reporter.report()
2 frames
/usr/local/lib/python3.7/dist-packages/pytorch_memlab/mem_reporter.py in <listcomp>(.0)
62 #FIXME: make the grad tensor collected by gc
63 objects = gc.get_objects()
---> 64 tensors = [obj for obj in objects if isinstance(obj, torch.Tensor)]
65 for t in tensors:
66 self.device_mapping[t.device].append(t)
ReferenceError: weakly-referenced object no longer exists
In my code, I use MemReporter() right after the training phase, i.e.:
for epoch in range(num_epochs):
    net.train()
    ...
# end of training
print('Reporter!!!!!!!')
reporter = MemReporter()
reporter.report()
Do you know what the problem is?
Thank you and regards.
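The error can be reproduced without torch at all: touching a weakref.proxy whose referent is gone raises ReferenceError, which is presumably what the isinstance check inside MemReporter hits on a stale object returned by gc.get_objects:

```python
import weakref

class Thing:
    pass

obj = Thing()
proxy = weakref.proxy(obj)
del obj  # the referent is now gone

try:
    isinstance(proxy, Thing)  # looking up __class__ dereferences the dead proxy
except ReferenceError as e:
    print(e)  # weakly-referenced object no longer exists
```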
Hi,
Thanks for the super useful package. Currently, it seems it is not possible to use it in a Jupyter notebook. If I create a new notebook and add a cell with the contents
import torch
from pytorch_memlab import profile

@profile
def work():
    linear = torch.nn.Linear(100, 100).cuda()
    linear2 = torch.nn.Linear(100, 100).cuda()
    linear3 = torch.nn.Linear(100, 100).cuda()

work()
I get no results printed. However, when I use MemReporter, I do get results:
import torch
from pytorch_memlab import MemReporter
linear = torch.nn.Linear(1024, 1024).cuda()
reporter = MemReporter()
reporter.report()
Element type Size Used MEM
-------------------------------------------------------------------------------
Storage on cuda:0
Parameter0 (1024, 1024) 4.00M
Parameter1 (1024,) 4.00K
Parameter2 (1024, 1024) 4.00M
Parameter3 (1024,) 4.00K
-------------------------------------------------------------------------------
Total Tensors: 2099200 Used Memory: 8.01M
The allocated memory on cuda:0: 8.01M
-------------------------------------------------------------------------------
Additionally, I wonder if it is possible to add a line magic, similar to %lprun, for profiling cells.
The variable tensor_names is referenced before assignment if no model is passed into the MemReporter. It needs to be moved into the if statement above.
When I attempt to import anything from pytorch_memlab on a Google Colab CPU instance, I get the following error:
Could not reset CUDA stats and cache: 'NoneType' object has no attribute 'lower'
I have a pl.LightningModule (pytorch-lightning) that includes many nn.Modules.
It's not obvious from the documentation how I can profile all the LightningModule tensors and the subordinate Module tensors. Could you please provide an example?
Hello,
This is probably a naive question. I would like to turn the decorator on and off without commenting things out. What would be an elegant way to achieve this? I have something like the following in mind, but I don't know how to implement it. Would you like to share some thoughts on this? Many thanks!
profile_flag = True  # False

@profile(profile_flag)
def func1():
    ...

@profile(profile_flag)
def func2():
    ...
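One stdlib pattern for this is a wrapper that returns the real decorator only when the flag is set; `profile` in the comment stands for pytorch_memlab's decorator, but any decorator works (this is a sketch, not library API):

```python
def toggle(decorator, enabled: bool):
    """Return `decorator` when enabled, else a no-op decorator."""
    if enabled:
        return decorator
    return lambda func: func

# Hypothetical usage:
# from pytorch_memlab import profile
# PROFILE = True
# @toggle(profile, PROFILE)
# def func1():
#     ...
```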
This is a very interesting library and will hopefully help with my constant OOM errors. Beyond profiling memory usage, does memlab offer a way to clear GPU memory (if we find the GPU holding onto tensors after execution)? Thanks!
The demo code does not work with this version of PyTorch; no output is printed:
import torch
from pytorch_memlab import LineProfiler

def inner():
    torch.nn.Linear(100, 100).cuda()

def outer():
    linear = torch.nn.Linear(100, 100).cuda()
    linear2 = torch.nn.Linear(100, 100).cuda()
    inner()

with LineProfiler(outer, inner) as prof:
    outer()
prof.display()
The tensor_name and tensor_device mapping dicts maintain references to their key/value tensors, which prevents Python from collecting unused tensors.
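A common fix for this pattern is a weak mapping, which does not keep its values alive; a minimal stdlib illustration (not the library's actual code):

```python
import gc
import weakref

class FakeTensor:  # stand-in for torch.Tensor
    pass

strong = {}
weak = weakref.WeakValueDictionary()

a, b = FakeTensor(), FakeTensor()
strong["a"] = a
weak["b"] = b

del a, b
gc.collect()

print("a" in strong)  # True: the plain dict pins its object in memory
print("b" in weak)    # False: the weak mapping let it be collected
```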
Great repository! Thanks for creating this :)
Just a quick question: does pytorch_memlab track memory usage via torch.autograd.profiler? Thanks a lot! Just want to clarify this to see if I need to use both :)
I need to show that a technique called gradient checkpointing can really save GPU memory during backward propagation. In the results there are two columns on the left showing active_bytes and reserved_bytes. In my testing, while active bytes read 3.83G, reserved bytes read 9.35G. So why does PyTorch still reserve that much GPU memory?
I added the @profile decorator to a custom module method (after from pytorch_memlab import LineProfiler, profile), and the output is a truncated display with the error KeyError: "['XXX'] not in index" for my method XXX. What does this error mean?
I'm trying to understand why there is a large discrepancy between the 'Used Memory' / 'allocated memory on cuda:0' reported by MemReporter and the memory usage reported by nvtop (or nvidia-smi). For example, while training a model (RetinaNet from detectron2, for context), I'm seeing ~285M from MemReporter and ~15G from nvtop/nvidia-smi.
Is this all due to the autograd graph? I've been trying to read more about this but haven't found good references.
Thanks for your work on this library, and any pointers you can share about this!
This is a great tool for finding where the memory has gone - thank you!
I have a request:
Problem:
Memory is being reported incorrectly for any loop or function: since those run multiple times, the output doesn't show how the peak/active memory counters progressed in, say, the first iteration; it shows the data for the whole loop/function after it has run more than once. It's correct for the final iteration, but not the first one, and it's crucial to see the first moment memory usage went up and where it peaked.
This functionality is typically not needed in a normal memory profiler, since all we want to know there is the frequency of calls and the total usage for a given line; but here, as an investigation tool, we need to see the first few uses. I hope I was able to convey the issue clearly.
I tried to work around this manually by unrolling the loop in the code I was profiling and replicating the iteration code multiple times, which is not very sustainable.
It's also typical that the memory footprint changes from iteration 1 to 2 and then stabilizes at iteration 3 and onward (if there is no leak, that is). So there could probably be an option to record and print 3 different stats:
The same applies to functions.
I'm thinking the low-hanging fruit is perhaps to give users an option to record a loop iteration or function only the first time it runs and report that. That alone would already be very useful and perhaps not too difficult to implement.
Thank you!
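The first-iteration-only option could plausibly be approximated in user code today with a wrapper that routes only the first call through the profiling decorator (a sketch; `profile` from pytorch_memlab is the assumed decorator, and any decorator works):

```python
import functools

def profile_first_call(decorator):
    """Send only the first invocation through `decorator`;
    later calls run the plain function."""
    def wrap(func):
        decorated = decorator(func)
        state = {"seen": False}
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if not state["seen"]:
                state["seen"] = True
                return decorated(*args, **kwargs)
            return func(*args, **kwargs)
        return inner
    return wrap

# Hypothetical usage:
# from pytorch_memlab import profile
# @profile_first_call(profile)
# def train_step(batch):
#     ...
```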
I want to measure the max memory used during the execution of a script.
I think doing this naively will count max reserved memory, while I want the max memory actually used.
I do not need any further details.
Will this add overhead? I want to do this on scripts that take hours to finish.
I'm using MemReporter. After running reporter.report(), I can't tell which parts changed, since there are a lot of layers and tensors in the output. I'm wondering if there is a way to find the diff easily.
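Until the library offers this, two captured report strings can be compared with the stdlib; a sketch, assuming the report text has already been captured as strings (e.g. via contextlib.redirect_stdout):

```python
import difflib

def report_diff(before: str, after: str) -> str:
    """Unified diff between two report outputs captured as strings."""
    return "\n".join(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm=""))

print(report_diff("Tensor0 1.0M\nTensor1 2.0M",
                  "Tensor0 1.0M\nTensor1 3.0M"))
```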
From the readme:
You can also filter the device to report on by passing extra arguments:
report(device=torch.device(0))
The device parameter seems to work with neither the report decorator nor the MemReporter. If this feature is still available, the readme should be clarified.
Hi,
Thanks a lot for providing this very helpful library.
I have a question about Used Memory and GPU memory.
I followed your code to get the Used Memory of my model for one batch (size: (16, 3, 224, 224)): it is 928.02M. But the same code for the same model could not run on a 2070 Super GPU (8 GiB capacity).
928.02M vs 8 GiB: what is the difference between the Used Memory in your code and GPU memory? Thanks.
I installed pytorch_memlab with pip3 install pytorch_memlab, and I get this error when trying to set the target GPU:
from pytorch_memlab import profile, set_target_gpu
ImportError: cannot import name 'set_target_gpu'