torchdynamo's Issues

Support formatted literal strings (f-strings)

f-strings produce the FORMAT_VALUE and BUILD_STRING bytecodes, which are not yet supported.

These bytecodes could be supported by rewriting them to call the related functions (str.format, str, repr, ascii, etc).

Likely the most useful case of this would be constants (or things TorchDynamo specializes on like cls.__name__). Something like f"foo {self.__class__.__name__} bar {x.shape}" should not need to cause a graph break.

Another useful case would be deferring the string formatting calls to the end of the graph. If there are PyTorch ops in the f-string, they currently won't be included in the graph.
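
As a quick illustration of the bytecodes involved (a standalone sketch, not TorchDynamo code), disassembling a small f-string function on Python 3.8 shows FORMAT_VALUE and BUILD_STRING, and the same string can be built with plain function calls of the kind suggested above:

import dis

def example(prefix, x):
    return f"foo {prefix} bar {x!r}"

dis.dis(example)  # the bytecode contains FORMAT_VALUE and BUILD_STRING

# A semantically equivalent rewrite using ordinary calls; format/repr stand
# in for what FORMAT_VALUE does with the corresponding conversion flags.
def example_rewritten(prefix, x):
    return "".join(["foo ", format(prefix), " bar ", repr(x)])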

Build issues on macOS, build script complains about gcc not supporting C++14

I followed the instructions on https://github.com/facebookresearch/torchdynamo to build on macOS and got the following error: https://gist.github.com/vkuzo/a9b316590d0eb043f347ae2c0e8c209f . Note: pytorch/pytorch builds without issues in my setup.

Relevant line:

/Users/vasiliy/pytorch/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: C++14 or later compatible compiler is required to use PyTorch.

Note: this can be fixed locally by adding extra_compile_args=["-std=c++14"] to the torchdynamo._guards portion of setup.py.
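
For reference, a minimal sketch of what that workaround might look like; the actual extension definition and source path in torchdynamo's setup.py may differ, this is illustrative only:

from setuptools import setup, Extension

setup(
    name="torchdynamo",
    ext_modules=[
        Extension(
            "torchdynamo._guards",
            sources=["torchdynamo/_guards.cpp"],  # assumed source path
            extra_compile_args=["-std=c++14"],    # force a C++14-capable mode on macOS clang
        ),
    ],
)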

[fx2trt] TRT issue

hf_Reformer: CUDA error: device-side assert triggered
fastNLP_Bert: [TRT] [E] 3: [layers.h::setAxis::624] Error Code 3: API Usage Error

Issue with demucs model

The demucs model throws the following error:

cuda eval demucs Traceback (most recent call last):
  File "torchbench.py", line 957, in <module>
    main()
  File "torchbench.py", line 824, in main
    run_one_model(
  File "torchbench.py", line 904, in run_one_model
    assert not torchdynamo.utils.is_jit_model(submod)
AssertionError

Here is the command to repro: python torchbench.py -dcuda --speedup-fx2trt-fp16 --only demucs

Support writing to closures while inlining

TorchDynamo supports most cases of closures; however, this one is not supported:

import torch

import torchdynamo
import torchdynamo.testing


def make_counter():
    x = torch.randn(10)

    def counter():
        nonlocal x
        x = x + 1
        return x

    return counter

@torchdynamo.optimize(torchdynamo.testing.CompileCounter(), nopython=True)
def fn(counter):
    return counter() + counter()


fn(make_counter())

The error (when in nopython=True mode) is:

...
torchdynamo.exc.Unsupported: write to __closure__ while inlining
Processing original code:
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 959, in fn
    return counter() + counter()
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 953, in counter
    x = x + 1

This will work if the closure is in the top-level frame. It will also work if the closure is defined within the captured scope. But in this case we can't actually emit a STORE_DEREF bytecode because "x" is not in our function's freevars.

To support this case we need to rewrite the STORE_DEREF to do something like:

def fn(counter):
    v0 = counter.__closure__[0].cell_contents
    v1 = v0 + 1
    v2 = v1 + 1
    counter.__closure__[0].cell_contents = v2
    return v1 + v2
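
As a standalone sanity check (not TorchDynamo code), cell contents are directly readable and, since Python 3.7, writable, which is what the rewritten code above relies on:

def make_counter_cell():
    x = 0

    def counter():
        nonlocal x
        x += 1
        return x

    return counter


c = make_counter_cell()
print(c.__closure__[0].cell_contents)  # 0
c.__closure__[0].cell_contents = 41    # cell_contents is writable (Python 3.7+)
print(c())                             # 42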

We already do the first part, reading the value of the closure, here:
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/variables/functions.py#L142

We need to make it writable though. For that we need to register the cell using AttributeMutationExisting and then we can use side_effects.store_cell() on it.

Other types of cells are handled in
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1152
and
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1163

Note that this explicit cell handling is specific to inlining. When we aren't inlining we treat closures like normal variables with different load/store bytecodes.

Support partially dynamic shapes

In update 5 we wrote:

Unfortunately, the problem of dynamic shapes is more complex than one might think. Enabling torchdynamo.config.dynamic_shapes will cause new graph breaks. Many models have code like assert x.shape == (1,2,3), if x.size(1) == 10, math.sqrt(x.shape[-1]), etc. This Python code operating on integer shapes is the de facto way to express many things in PyTorch. With static shapes, TorchDynamo can constant-propagate this stuff away; however, with dynamic shapes it will break the graph.

My current thinking is a “partially specialized shapes” mode in TorchDynamo. The basic idea would be that all shapes start as fully dynamic, but then TorchDynamo would convert a tensor’s shapes to be static when the user calls Tensor.size() and passes the result to a non-PyTorch operation. This would allow dynamic shapes most of the time, but still allow bigger graphs when users operate directly on shapes as integers; an example of this kind of code is sketched below.
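
For illustration, here is the kind of code at issue (a hedged example, not taken from any specific model): the integer shape feeds plain Python math, which under fully dynamic shapes forces a graph break, but under the proposed mode would only specialize the shape of q:

import math

import torch


def scaled_scores(q, k):
    scale = 1.0 / math.sqrt(q.shape[-1])      # Python math on an integer shape
    return (q @ k.transpose(-1, -2)) * scale  # PyTorch ops stay in the graph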

To implement an initial version of this:

First, build the analysis to add a TensorVariable().input_sources: Set[Source] field.

def foo(a, b):
  c = a + b

In this example:

  • a.input_sources = {a.source}
  • b.input_sources = {b.source}
  • c.input_sources = {a.source, b.source}

This is just a straightforward data flow analysis where sources are combined. It looks similar to the shape propagation currently implemented in TensorVariable.create.
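
A minimal sketch of the combining rule, assuming a hypothetical input_sources set on each variable (illustrative names only, not the actual TorchDynamo API):

def combined_input_sources(*operands):
    sources = set()
    for var in operands:
        sources |= getattr(var, "input_sources", set())  # union the inputs' sources
    return sources

# For the example above: c.input_sources == combined_input_sources(a, b)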

Next, split GuardBuilder.TENSOR_MATCH into TENSOR_MATCH_STATIC and TENSOR_MATCH_DYNAMIC. The underlying TensorGuards object implemented in C++ already has these two modes, so it just requires having the generated code have two instances of that object.

Finally, modify how TensorVariable handles shape specialization. Defer setting TensorVariable().size and TensorVariable().stride until the user calls Tensor.size(). Note there are a few different ways to get the size, so search for usages of TensorVariable.size.

When .size is called, add a new guard for TENSOR_MATCH_STATIC on all the input_sources. (You can remove the now redundant TENSOR_MATCH_DYNAMIC guard in guard codegen.)

This should give you something that works and passes tests.

Improvements to the initial prototype:

  • We need to handle dynamic shape ops like nonzero, where, repeat, etc. Modify the analysis to mark tensors flowing from these ops, and break the graph if the user calls size on them. You can search for config.dynamic_shapes to find where we currently conditionally break the graph on those ops.
  • If a user passes the size directly to another PyTorch op, for example torch.empty(x.size()) we don't need to shape specialize and can just put the call to .size() in the graph. Similarly, simple math ops on sizes can be included in the graph. To handle this we will need a SizeVariable() to track and decide what can go in the graph and what requires specialization.
  • We don't need to specialize every dimension if the user code only uses some dimensions. We need better shape analysis to make this happen though. @eellison might be able to provide pointers for better shape analysis.

cc @ezyang

Investigate issue with python key tracing and LSTM

Repro:

env PYTHONKEY_VERBOSE=1 ./torchbench.py --no-skip --python-key -n 1 -k demucs

Python key tracing produces the following warning:

WARNING:torchdynamo.optimizations.python_key:returning real tensor? call_function _operator.getitem <built-in function getitem> (mod_model_lstm_lstm, 0) {}

which is coming from this line:
https://github.com/facebookresearch/torchdynamo/blob/e84f9fee18ae5ab7bfca5504e200503de174efb5/torchdynamo/optimizations/python_key.py#L97

This makes me worry that we are missing some operators. During python key tracing, everything should be a functorch._src.python_key.PythonTensor, yet somehow an unwrapped tensor is leaking through.

I suspect our pytree walk of the module hierarchy might be missing some LSTM-related wrapper class. Though I haven't confirmed this.

cc @Chillee

Debug python key tracing error with hf_Reformer

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_Reformer

Error:

Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/output_graph.py", line 317, in call_user_compiler
    compiled_fn = self.compiler_fn(gm, self.example_inputs())
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 137, in python_key
    gm, make_wrapper = python_key_normalize(gm, example_inputs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 120, in python_key_normalize
    graph = tracer.trace(fake_signature(fn_for_tracing, nargs))
  File "/home/jansel/pytorch/torch/fx/_symbolic_trace.py", line 577, in trace
    self.create_node('output', 'output', (self.create_arg(fn(*args)),), {},
  File "<string>", line 1, in <lambda>
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 114, in fn_for_tracing
    out = PatchingInterpreter(gm).run(*args[params_len:])
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 120, in run
    self.env[node] = self.run_node(node)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 241, in call_method
    return getattr(self_obj, target)(*args_tail, **kwargs)
RuntimeError: DispatchKey PythonTLSSnapshot doesn't correspond to a device

This seems to be coming from a call to Tensor.new.

cc @Chillee @anijain2305

Fix issues in detectron2_maskrcnn

./torchbench.py --no-skip -k detectron2_maskrcnn
ERROR FROM offset=6 filename /home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py 522 KeyError
========== TorchDynamo Stack Trace ==========
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 158, in _convert_frame_assert
    code = transform_code_object(frame.f_code, transform)
  File "/home/jansel/torchdynamo/torchdynamo/bytecode_transformation.py", line 284, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 134, in transform
    tracer.run()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 274, in run
    and self.step()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 252, in step
    getattr(self, inst.opname)(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 384, in IMPORT_FROM
    self.LOAD_ATTR(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 608, in LOAD_ATTR
    result = BuiltinVariable(getattr).call_function(
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 212, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 461, in call_getattr
    member = obj.value.__dict__[name]
KeyError: 'paste_masks_in_image'
========== Exception (above) while processing ==========
  File "./torchbench.py", line 1019, in <module>
    main()
  File "./torchbench.py", line 913, in main
    run_one_model(
  File "./torchbench.py", line 981, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 469, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 122, in forward
    def forward(self, batched_inputs: List[Dict[str, torch.Tensor]]):
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  [Previous line repeated 1 more time]
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 229, in _postprocess
    @staticmethod
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 67, in detector_postprocess
    results.pred_masks = roi_masks.to_bitmasks(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py", line 517, in to_bitmasks
    @torch.jit.unused
========== End debug info ==========

Debug issue with tacotron2

tacotron2 was recently added to torchbench with get_module() support, and it seems it does not work properly.

@anijain2305 reported an error running

python torchbench.py --devices=cuda --only=tacotron2

I haven't had a chance to look into this one yet, but creating an issue so that it does not get lost. Feel free to add more details @anijain2305.

[fx2trt] op support

hf_T5:
torch.rsqrt, pow, acc_ops.to, torch.isinf, any, float, type_as

hf_GPT2:
acc_ops.split, torch.where, type

soft_actor_critic:
exp, torch.functional.broadcast_tensors

TensorRT virtualMemoryBuffer internal error

Observed the error message below after a run. Maybe it is related to how torchdynamo releases resources?

python torchbench.py -dcuda --speedup-fx2trt-fp16 --only mobilenet_v2
cuda eval mobilenet_v2 [04/05/2022-15:43:11] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:14] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5

Supported node types in the model:
acc_ops.conv2d: ((), {'input': torch.float16, 'weight': torch.float16})
acc_ops.batch_norm: ((), {'input': torch.float16, 'running_mean': torch.float16, 'running_var': torch.float16, 'weight': torch.float16, 'bias': torch.float16})
acc_ops.hardtanh: ((), {'input': torch.float16})
acc_ops.add: ((), {'input': torch.float16, 'other': torch.float16})
acc_ops.adaptive_avg_pool2d: ((), {'input': torch.float16})
acc_ops.flatten: ((), {'input': torch.float16})
acc_ops.linear: ((), {'input': torch.float16, 'weight': torch.float16, 'bias': torch.float16})

Unsupported node types in the model:

graph is split into _run_on_acc_0
Similarity score=0.9999595284461975
7.243x p=0.00
Unexpected Internal Error: [virtualMemoryBuffer.cpp::~StdVirtualMemoryBufferImpl::121] Error Code 1: Cuda Runtime (driver shutting down)

Debug issue with AOTAutograd for speech_transformer/hf_GPT2/hf_T5

The three models speech_transformer, hf_GPT2, and hf_T5 fail with a similar error signature.

TorchDynamo finds static subgraphs and sends them to AOT Autograd. AOT Autograd generates the forward and backward graphs. The output of AOT Autograd is an autograd.Function (code). During the forward pass, AOT Autograd saves some tensors for the backward pass gradient computation.

The issue arises in the backward pass. When we read saved_tensors, one of the items is no longer of Tensor type. This causes cryptic error messages like the one below, and the offending type changes from run to run; I have seen immutable_dict, tuple, and even weakref and builtin.
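
For context, a generic illustration of the save_for_backward / saved_tensors round trip described above; this is not AOT Autograd's actual generated code:

import torch


class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)        # stash tensors for the backward pass
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors           # in the bug above, one of these items
        return grad_out * w, grad_out * x  # comes back as a non-Tensor type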

ERROR:root:unhandled error
Traceback (most recent call last):
  File "torchbench.py", line 1006, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  [Previous line repeated 2 more times]
  File "/fsx/users/anijain/functorch/functorch/_src/monkey_patching.py", line 97, in _backward
    return _old_backward(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/_tensor.py", line 395, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/fsx/users/anijain/functorch/functorch/_src/aot_autograd.py", line 188, in backward
    out = normalize_as_list(compiled_bw(*ctx.saved_tensors, *contiguous_args))
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: forward() Expected a value of type 'Tensor (inferred)' for argument 'primals_14' but instead found type 'tuple'.
Inferred 'primals_14' to be of type 'Tensor' because it was not annotated with an explicit type.
Position: 19
Value: ('___check_obj_id', '___check_tensors', '___check_type_id', '___guarded_code')

I looked further into the C++ and started printing the types of the objects while saving the tensors at the end of the forward pass and reading them back in the backward pass. I observed the weird behavior at this line, which is called in the backward pass when we access ctx.saved_tensors: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L834

When I print the unpacked_var, it is a tensor: it has a dim, and I can print its shape and everything.
But Py_TYPE(value)->tp_name equals immutable_dict here.
The unpack_fn is basically THPVariable_Wrap (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L849).

For completeness, here are repro commands and images of the failures:

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_GPT2
image

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=speech_transformer
image

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_T5
image

Add support for Python 3.10

Python 3.9 (#33) should be added before 3.10.

Known issues for Python 3.10 support:

Python 3.10 and later have a new method for mapping the bytecode index to line number, deprecating co_lnotab. See PEP 626. lnotab_writer in TorchDynamo needs to be rewritten to support the new format. See #36 for more on line numbers.
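
A quick standalone check, unrelated to TorchDynamo's internals, to see which line-number table a given interpreter exposes: 3.10 provides the new co_linetable field where older versions provide co_lnotab:

import sys


def f():
    return 1


if sys.version_info >= (3, 10):
    print(f.__code__.co_linetable)  # new PEP 626 line table
else:
    print(f.__code__.co_lnotab)     # legacy lnotab format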

There are 9 new/changed bytecodes in 3.10:

  • COPY_DICT_WITHOUT_KEYS, GET_LEN, MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS: these look like easy-to-add aliases for things that are already supported.
  • MAKE_FUNCTION: handling of annotations changed. We should add a test case to make sure annotations work for nested functions.
  • ROT_N: seems easy to add. We should update usage of rot_n_helper to use this new bytecode instead.
  • GEN_START: this could break handling of inline generators. Need to look into this in more detail.

Handle -inf for TorchScript of FX graphs

Repro

import torch
import torch.fx

x = torch.randn(4, 5)
mask = torch.randn(4, 5) > 0.5

def f(x, mask):
    # return x.masked_fill_(mask, 1.0) # PASSES
    return x.masked_fill_(mask, float("-inf"))

print(f(x, mask))

# Only fails when symbolic_trace
fx_mod = torch.fx.symbolic_trace(f)
scripted_f = torch.jit.script(fx_mod)
print(scripted_f(x, mask))

@eellison

[fx2trt] unclear issues

vision_maskrcnn: Error
detectron2_maskrcnn: various tracing issues
Super_SloMo: various tracing issues
opacus_cifar10: tracing issues
hf_BigBird: op support, tracing issues, shape '[1, 12, 62, 192]' is invalid for input of size 11904
pyhpc_equation_of_state: nan output

Skip non-Tensor/Module frames

People may have non-PyTorch code run under TorchDynamo. We should make sure TorchDynamo does nothing in this case.

If TorchDynamo reaches the end of the frame without finding PyTorch ops, it will just run the frame normally. However, if there are unsupported things that prevent capturing a whole graph, TorchDynamo could generate specialized frames for non-PyTorch code. This should be correct, but could add extra overhead.

To improve this we should expand the logic in this function:
https://github.com/facebookresearch/torchdynamo/blob/44971ffd9a7e6798b7868a592c337acb75bd1d2d/torchdynamo/symbolic_convert.py#L1008
That function controls whether TorchDynamo should break the graph and generate a resume_at_xx function to pick up after an unsupported thing.

The logic I would propose is: examine the stack, locals, and globals referenced by co_names; if there is a tensor/nn.Module/torch.* anywhere, keep doing what we do now; if there is not, just bail out and switch to normal execution.
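
A rough sketch of that heuristic; the helper names and call signature are hypothetical, not existing TorchDynamo functions:

import torch


def _is_torch_related(value):
    return (
        isinstance(value, (torch.Tensor, torch.nn.Module))
        or (getattr(value, "__module__", None) or "").startswith("torch")
    )


def should_keep_tracing(stack_values, frame_locals, frame_globals, co_names):
    # Gather everything the frame can currently reference.
    candidates = list(stack_values) + list(frame_locals.values())
    candidates += [frame_globals[name] for name in co_names if name in frame_globals]
    # Bail out to normal execution if nothing is PyTorch-related.
    return any(_is_torch_related(v) for v in candidates)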

Usage Tutorial

Create a tutorial on how to use it:

GPU: Forward, Backwards
CPU: ??

Add support for Python 3.9

The process for adding support for Python 3.9 is as follows.

First, examine the 10 new bytecodes in 3.9:

  • RERAISE: raising exceptions is currently not supported and breaks the graph, so this can just call unimplemented() for now. See issue pytorch/pytorch#93720 for more on exceptions.
  • WITH_EXCEPT_START: might affect support for with no_grad(): and related ops.
  • LOAD_ASSERTION_ERROR, LIST_TO_TUPLE, LIST_EXTEND, SET_UPDATE, DICT_UPDATE, DICT_MERGE, IS_OP, CONTAINS_OP: these are all simple aliases for things TorchDynamo already supports and should be easy to handle.

Next, update the versions supported in setup.py so the build works.

Next, iteratively add support for ops and fix issues until all tests pass (run pytest tests to run the test suite).

Next, iteratively fix issues in ./torchbench.py until all models pass and match the coverage of Python 3.8.

Fix python key tracing errors with quantized models

Repro

./torchbench.py --no-skip --python-key -n 1 -k mobilenet_v2_quantized

Partial output:

ERROR:torchdynamo.optimizations.python_key:exception running call_function torch.quantize_per_tensor (inputs_0_, mod_features_0_0_input_scale_0, mod_features_0_0_input_zero_point_0, torch.quint8) {}
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 219, in call_function
    return target(*args, **kwargs)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 112, in __torch_dispatch__
    return wrap_with_proxy(real_out, proxy_out)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 104, in wrap_with_proxy
    return PythonTensor(e, proxy)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 60, in __new__
    proxy.node.meta['tensor_meta'] = _extract_tensor_metadata(r)
  File "/home/jansel/pytorch/torch/fx/passes/shape_prop.py", line 48, in _extract_tensor_metadata
    qscheme = result.qscheme()
RuntimeError: toIValue() cannot handle converting to type: QScheme

This affects two quantized models with the same error.

cc @Chillee @anijain2305

Pip Installs

pip install dynamo
pip install trt ? (or other TRT install?)

CI/CD setup to build pip/conda packages

Someone should be able to pip install torchdynamo and not need to install from source.

If we do binary releases we may need to pin to specific PyTorch versions, so it might be better to ship only source packages to pypi.

Test TorchDynamo on a wider variety of models

Now that TorchDynamo is working on most TorchBench models, we should start looking for additional models to test on to continue improving coverage and robustness.

All models are welcome here, so if you have specific models in mind from a use case you are familiar with please test them and report your experiences.

For testing on Meta models, @dzhulgakov suggested that @houseroad and @divchenko would be able to provide pointers to increasingly complex models to test on.

Debug python key tracing error with hf_BigBird

edited by @ezyang

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_BigBird --devices cuda --float32

Old output (this is no longer what master produces):

Output

cpu  eval  hf_BigBird                         ERROR:root:unhandled error
Traceback (most recent call last):
  File "./torchbench.py", line 911, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 456, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2321, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1920, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1615, in forward
    layer_outputs = layer_module(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1451, in forward
    def forward(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1381, in forward
    self_outputs = self.self(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 435, in forward
    def forward(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  [Previous line repeated 2 more times]
  File "/home/jansel/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 129, in call_fn
    return inner(*params_flat, *args)
  File "<eval_with_key>.20", line 105, in forward
    unsqueeze__1 = torch.ops.aten.unsqueeze_(detach_36, 2);  detach_36 = None
  File "/home/jansel/pytorch/torch/_ops.py", line 142, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: set_storage_offset is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)
ERROR

Somehow the code produced by python key tracing triggers an error.

cc @Chillee @anijain2305

Debug TorchScript error for Slomo

cc @eellison. Repro for the bug while running TorchDynamo + AOTAutograd with TorchScript.

It seems that TorchScript expects the default values for torch.ops.aten.avg_pool2d_backward to be present.

The error can be reproduced with: python torchbench.py --training --devices=cuda --accuracy-aot-ts --only=Super_SloMo

RuntimeError:
Arguments for call are not valid.
The following variants are available:

  aten::avg_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

  aten::avg_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

The original call is:
  File "<eval_with_key>.12", line 362
    getitem_95 = convolution_backward_22[1]
    getitem_96 = convolution_backward_22[2];  convolution_backward_22 = None
    avg_pool2d_backward = torch.ops.aten.avg_pool2d_backward(getitem_94, leaky_relu_32, [2, 2], [], [0, 0], False, True, None);  getitem_94 = leaky_relu_32 = None
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    add_46 = torch.ops.aten.add(detach_195, avg_pool2d_backward);  detach_195 = avg_pool2d_backward = None
    leaky_relu_backward_13 = torch.ops.aten.leaky_relu_backward(add_46, convolution_32, 0.1, False);  add_46 = convolution_32 = None

ERROR

Initial support - AOTAutograd - Test accuracy for TorchBench models

List of bugs
Eager

TorchScript bugs

TorchDynamo bugs

  • hf_GPT2/speech transformer/hf_t5 - #85
  • tacotron2 - #82

Torchbench issues

AOTAutograd issues

NVFuser issues

Python 3.11 support

Python 3.11 won't be released until the end of the year, but I wanted to put down a few notes as we see changes in the development version.

The main one so far is that PEP 523 is being moved to the internal Python API, and to use it we will need to do:

#ifndef Py_BUILD_CORE_MODULE
#  define Py_BUILD_CORE_MODULE
#endif
#include <Python.h>
#include <internal/pycore_interp.h> // _PyInterpreterState_SetEvalFrameFunc()
#include <internal/pycore_ceval.h>  // _PyEval_EvalFrameDefault

There is some discussion about a different #define being needed, so that may change before release.
For more details, see this thread.

Support distributed training

This is a placeholder task to make distributed training work with TorchDynamo + AOT Autograd. The main work seems to be making sure the relevant ops can be traced with AOT Autograd and are properly added to the FX graph by TorchDynamo.

I expect most of the issues will be at the AOT Autograd level, because TorchDynamo treats most torch.* ops as black boxes. We should test and verify this though.

@alanwaketan can fill in details.

Debug TorchScript error from moco

Repro - python torchbench.py --training --devices=cuda --accuracy-ts --only=moco

This one has a DistributedDataParallel module, so it might be something we can table for now.

The error is pretty long; the important section is as follows:

	First diverging operator:
	Node diff:
		- %mod : __torch__.torch.nn.parallel.distributed.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		+ %mod : __torch__.torch.nn.parallel.distributed.___torch_mangle_596.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		?                                                ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.

Improve line number tracking and exceptions/error messages

Currently, TorchDynamo does not preserve line numbers in generated code in most cases; it does so only in a few cases and when the error happens at compile time. It already supports emitting line numbers in output code, so fixing this is just a matter of populating the line numbers on the Instruction() objects.

We should improve this and carry line numbers through our transformations. We should also test TorchDynamo on deliberately buggy programs and make sure it produces good error messages.
