torchdynamo's Issues

Support formatted literal strings (f-strings)

f-strings produce the FORMAT_VALUE and BUILD_STRING bytecodes, which are not yet supported.

These bytecodes could be supported by rewriting them to call the related functions (str.format, str, repr, ascii, etc).

Likely the most useful case of this would be constants (or things TorchDynamo specializes on like cls.__name__). Something like f"foo {self.__class__.__name__} bar {x.shape}" should not need to cause a graph break.

Another useful case would be deferring the string formatting calls to the end of the graph. If there are PyTorch ops in the f-string, they currently won't be included in the graph.
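
As a quick illustration of the bytecodes involved (a standalone sketch, not TorchDynamo code), disassembling a small f-string function on Python 3.8 shows FORMAT_VALUE and BUILD_STRING, and the same string can be built with plain function calls of the kind suggested above:

import dis

def example(prefix, x):
    return f"foo {prefix} bar {x!r}"

dis.dis(example)  # the bytecode contains FORMAT_VALUE and BUILD_STRING

# A semantically equivalent rewrite using ordinary calls; format/repr stand
# in for what FORMAT_VALUE does with the corresponding conversion flags.
def example_rewritten(prefix, x):
    return "".join(["foo ", format(prefix), " bar ", repr(x)])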

Build issues on macOS, build script complains about gcc not supporting C++14

I followed the instructions on https://github.com/facebookresearch/torchdynamo to build on macOS and got the following error: https://gist.github.com/vkuzo/a9b316590d0eb043f347ae2c0e8c209f . Note: pytorch/pytorch builds without issues in my setup.

Relevant line:

/Users/vasiliy/pytorch/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: C++14 or later compatible compiler is required to use PyTorch.

Note: this can be fixed locally by adding extra_compile_args=["-std=c++14"] to the torchdynamo._guards portion of setup.py.
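
For reference, a minimal sketch of what that workaround might look like; the actual extension definition and source path in torchdynamo's setup.py may differ, this is illustrative only:

from setuptools import setup, Extension

setup(
    name="torchdynamo",
    ext_modules=[
        Extension(
            "torchdynamo._guards",
            sources=["torchdynamo/_guards.cpp"],  # assumed source path
            extra_compile_args=["-std=c++14"],    # force a C++14-capable mode on macOS clang
        ),
    ],
)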

[fx2trt] TRT issue

hf_Reformer: CUDA error: device-side assert triggered
fastNLP_Bert: [TRT] [E] 3: [layers.h::setAxis::624] Error Code 3: API Usage Error

Issue with demucs model

The demucs model throws the following error:

cuda eval demucs Traceback (most recent call last):
  File "torchbench.py", line 957, in <module>
    main()
  File "torchbench.py", line 824, in main
    run_one_model(
  File "torchbench.py", line 904, in run_one_model
    assert not torchdynamo.utils.is_jit_model(submod)
AssertionError

Here is the command to repro: python torchbench.py -dcuda --speedup-fx2trt-fp16 --only demucs

Support writing to closures while inlining

TorchDynamo supports most cases of closures; however, this one is not supported:

import torch

import torchdynamo
import torchdynamo.testing


def make_counter():
    x = torch.randn(10)

    def counter():
        nonlocal x
        x = x + 1
        return x

    return counter

@torchdynamo.optimize(torchdynamo.testing.CompileCounter(), nopython=True)
def fn(counter):
    return counter() + counter()


fn(make_counter())

The error (when in nopython=True mode) is:

...
torchdynamo.exc.Unsupported: write to __closure__ while inlining
Processing original code:
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 959, in fn
    return counter() + counter()
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 953, in counter
    x = x + 1

This will work if the closure is in the top-level frame. It will also work if the closure is defined within the captured scope. But in this case we can't actually emit a STORE_DEREF bytecode because "x" is not in our function's freevars.

To support this case we need to rewrite the STORE_DEREF to do something like:

def fn(counter):
    v0 = counter.__closure__[0].cell_contents
    v1 = v0 + 1
    v2 = v1 + 1
    counter.__closure__[0].cell_contents = v2
    return v1 + v2
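
As a standalone sanity check (not TorchDynamo code), cell contents are directly readable and, since Python 3.7, writable, which is what the rewritten code above relies on:

def make_counter_cell():
    x = 0

    def counter():
        nonlocal x
        x += 1
        return x

    return counter


c = make_counter_cell()
print(c.__closure__[0].cell_contents)  # 0
c.__closure__[0].cell_contents = 41    # cell_contents is writable (Python 3.7+)
print(c())                             # 42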

We already do the first part, reading the value of the closure, here:
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/variables/functions.py#L142

We need to make it writable though. For that we need to register the cell using AttributeMutationExisting and then we can use side_effects.store_cell() on it.

Other types of cells are handled in
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1152
and
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1163

Note that this explicit cell handling is specific to inlining. When we aren't inlining we treat closures like normal variables with different load/store bytecodes.

Support partially dynamic shapes

In update 5 we wrote:

Unfortunately, the problem of dynamic shapes is more complex than one might think. Enabling torchdynamo.config.dynamic_shapes will cause new graph breaks. Many models have code like assert x.shape == (1,2,3), if x.size(1) == 10, math.sqrt(x.shape[-1]), etc. This Python code operating on integer shapes is the de facto way to express many things in PyTorch. With static shapes, TorchDynamo can constant-propagate this stuff away; however, with dynamic shapes it will break the graph.

My current thinking is a “partially specialized shapes” mode in TorchDynamo. The basic idea would be that all shapes start as fully dynamic, but then TorchDynamo would convert a tensor’s shapes to be static when the user calls Tensor.size() and passes the result to a non-PyTorch operation. This would allow dynamic shapes most of the time, but still allow bigger graphs when users operate directly on shapes as integers; an example of this kind of code is sketched below.
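
For illustration, here is the kind of code at issue (a hedged example, not taken from any specific model): the integer shape feeds plain Python math, which under fully dynamic shapes forces a graph break, but under the proposed mode would only specialize the shape of q:

import math

import torch


def scaled_scores(q, k):
    scale = 1.0 / math.sqrt(q.shape[-1])      # Python math on an integer shape
    return (q @ k.transpose(-1, -2)) * scale  # PyTorch ops stay in the graph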

To implement an initial version of this:

First, build the analysis to add a TensorVariable().input_sources: Set[Source] field.

def foo(a, b):
  c = a + b

In this example:

  • a.input_sources = {a.source}
  • b.input_sources = {b.source}
  • c.input_sources = {a.source, b.source}

This is just a straightforward data flow analysis where sources are combined. It looks similar to the shape propagation currently implemented in TensorVariable.create.
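
A minimal sketch of the combining rule, assuming a hypothetical input_sources set on each variable (illustrative names only, not the actual TorchDynamo API):

def combined_input_sources(*operands):
    sources = set()
    for var in operands:
        sources |= getattr(var, "input_sources", set())  # union the inputs' sources
    return sources

# For the example above: c.input_sources == combined_input_sources(a, b)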

Next, split GuardBuilder.TENSOR_MATCH into TENSOR_MATCH_STATIC and TENSOR_MATCH_DYNAMIC. The underlying TensorGuards object implemented in C++ already has these two modes, so it just requires having the generated code have two instances of that object.

Finally, modify how TensorVariable handles shape specialization. Defer setting TensorVariable().size and TensorVariable().stride until the user calls Tensor.size(). Note there are a few different ways to get the size, so search for usages of TensorVariable.size.

When .size is called, add a new guard for TENSOR_MATCH_STATIC on all the input_sources. (You can remove the now redundant TENSOR_MATCH_DYNAMIC guard in guard codegen.)

This should give you something that works and passes tests.

Improvements to the initial prototype:

  • We need to handle dynamic shape ops like nonzero, where, repeat, etc. Modify the analysis to mark tensors flowing from these ops, and break the graph if the user calls size on them. You can search for config.dynamic_shapes to find where we currently conditionally break the graph on those ops.
  • If a user passes the size directly to another PyTorch op, for example torch.empty(x.size()) we don't need to shape specialize and can just put the call to .size() in the graph. Similarly, simple math ops on sizes can be included in the graph. To handle this we will need a SizeVariable() to track and decide what can go in the graph and what requires specialization.
  • We don't need to specialize every dimension if the user code only uses some dimensions. We need better shape analysis to make this happen though. @eellison might be able to provide pointers for better shape analysis.

cc @ezyang

Investigate issue with python key tracing and LSTM

Repro:

env PYTHONKEY_VERBOSE=1 ./torchbench.py --no-skip --python-key -n 1 -k demucs

Python key tracing produces the following warning:

WARNING:torchdynamo.optimizations.python_key:returning real tensor? call_function _operator.getitem <built-in function getitem> (mod_model_lstm_lstm, 0) {}

which is coming from this line:
https://github.com/facebookresearch/torchdynamo/blob/e84f9fee18ae5ab7bfca5504e200503de174efb5/torchdynamo/optimizations/python_key.py#L97

This makes me worry that we are missing some operators. During python key tracing, everything should be a functorch._src.python_key.PythonTensor, yet somehow an unwrapped tensor is leaking through.

I suspect our pytree walk of the module hierarchy might be missing some LSTM-related wrapper class. Though I haven't confirmed this.

cc @Chillee

Debug python key tracing error with hf_Reformer

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_Reformer

Error:

Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/output_graph.py", line 317, in call_user_compiler
    compiled_fn = self.compiler_fn(gm, self.example_inputs())
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 137, in python_key
    gm, make_wrapper = python_key_normalize(gm, example_inputs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 120, in python_key_normalize
    graph = tracer.trace(fake_signature(fn_for_tracing, nargs))
  File "/home/jansel/pytorch/torch/fx/_symbolic_trace.py", line 577, in trace
    self.create_node('output', 'output', (self.create_arg(fn(*args)),), {},
  File "<string>", line 1, in <lambda>
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 114, in fn_for_tracing
    out = PatchingInterpreter(gm).run(*args[params_len:])
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 120, in run
    self.env[node] = self.run_node(node)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 241, in call_method
    return getattr(self_obj, target)(*args_tail, **kwargs)
RuntimeError: DispatchKey PythonTLSSnapshot doesn't correspond to a device

This seems to be coming from a call to Tensor.new.

cc @Chillee @anijain2305

Fix issues in detectron2_maskrcnn

./torchbench.py --no-skip -k detectron2_maskrcnn
ERROR FROM offset=6 filename /home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py 522 KeyError
========== TorchDynamo Stack Trace ==========
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 158, in _convert_frame_assert
    code = transform_code_object(frame.f_code, transform)
  File "/home/jansel/torchdynamo/torchdynamo/bytecode_transformation.py", line 284, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 134, in transform
    tracer.run()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 274, in run
    and self.step()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 252, in step
    getattr(self, inst.opname)(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 384, in IMPORT_FROM
    self.LOAD_ATTR(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 608, in LOAD_ATTR
    result = BuiltinVariable(getattr).call_function(
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 212, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 461, in call_getattr
    member = obj.value.__dict__[name]
KeyError: 'paste_masks_in_image'
========== Exception (above) while processing ==========
  File "./torchbench.py", line 1019, in <module>
    main()
  File "./torchbench.py", line 913, in main
    run_one_model(
  File "./torchbench.py", line 981, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 469, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 122, in forward
    def forward(self, batched_inputs: List[Dict[str, torch.Tensor]]):
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  [Previous line repeated 1 more time]
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 229, in _postprocess
    @staticmethod
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 67, in detector_postprocess
    results.pred_masks = roi_masks.to_bitmasks(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py", line 517, in to_bitmasks
    @torch.jit.unused
========== End debug info ==========

Debug issue with tacotron2

tacotron2 was recently added to torchbench with get_module() support, and it seems it does not work properly.

@anijain2305 reported an error running

python torchbench.py --devices=cuda --only=tacotron2

I haven't had a chance to look into this one yet, but creating an issue so that it does not get lost. Feel free to add more details @anijain2305.

[fx2trt] op support

hf_T5:
torch.rsqrt, pow, acc_ops.to, torch.isinf, any, float, type_as

hf_GPT2:
acc_ops.split, torch.where, type

soft_actor_critic:
exp, torch.functional.broadcast_tensors

TensorRT virtualMemoryBuffer internal error

Observed the error message below after a run. Maybe it is related to how torchdynamo releases resources?

python torchbench.py -dcuda --speedup-fx2trt-fp16 --only mobilenet_v2
cuda eval mobilenet_v2 [04/05/2022-15:43:11] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:14] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5

Supported node types in the model:
acc_ops.conv2d: ((), {'input': torch.float16, 'weight': torch.float16})
acc_ops.batch_norm: ((), {'input': torch.float16, 'running_mean': torch.float16, 'running_var': torch.float16, 'weight': torch.float16, 'bias': torch.float16})
acc_ops.hardtanh: ((), {'input': torch.float16})
acc_ops.add: ((), {'input': torch.float16, 'other': torch.float16})
acc_ops.adaptive_avg_pool2d: ((), {'input': torch.float16})
acc_ops.flatten: ((), {'input': torch.float16})
acc_ops.linear: ((), {'input': torch.float16, 'weight': torch.float16, 'bias': torch.float16})

Unsupported node types in the model:

graph is split into _run_on_acc_0
Similarity score=0.9999595284461975
7.243x p=0.00
Unexpected Internal Error: [virtualMemoryBuffer.cpp::~StdVirtualMemoryBufferImpl::121] Error Code 1: Cuda Runtime (driver shutting down)

Debug issue with AOTAutograd for speech_transformer/hf_GPT2/hf_T5

The three models speech_transformer, hf_GPT2, and hf_T5 fail with a similar error signature.

TorchDynamo finds static subgraphs and sends them to AOT Autograd. AOT Autograd generates the forward and backward graphs. The output of AOT Autograd is an autograd.Function (code). During the forward pass, AOT Autograd saves some tensors for the backward pass gradient computation.

The issue arises in the backward pass. When we read saved_tensors, one of the items is no longer of Tensor type. This causes cryptic error messages like the one below, and the offending type changes from run to run; I have seen immutable_dict, tuple, and even weakref and builtin.
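
For context, a generic illustration of the save_for_backward / saved_tensors round trip described above; this is not AOT Autograd's actual generated code:

import torch


class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)        # stash tensors for the backward pass
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors           # in the bug above, one of these items
        return grad_out * w, grad_out * x  # comes back as a non-Tensor type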

ERROR:root:unhandled error
Traceback (most recent call last):
  File "torchbench.py", line 1006, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  [Previous line repeated 2 more times]
  File "/fsx/users/anijain/functorch/functorch/_src/monkey_patching.py", line 97, in _backward
    return _old_backward(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/_tensor.py", line 395, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/fsx/users/anijain/functorch/functorch/_src/aot_autograd.py", line 188, in backward
    out = normalize_as_list(compiled_bw(*ctx.saved_tensors, *contiguous_args))
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: forward() Expected a value of type 'Tensor (inferred)' for argument 'primals_14' but instead found type 'tuple'.
Inferred 'primals_14' to be of type 'Tensor' because it was not annotated with an explicit type.
Position: 19
Value: ('___check_obj_id', '___check_tensors', '___check_type_id', '___guarded_code')

I looked further into the C++ and started printing the types of the objects while saving the tensors at the end of the forward pass and reading them back in the backward pass. I observed the weird behavior at this line, which is called in the backward pass when we access ctx.saved_tensors: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L834

When I print the unpacked_var, it is a tensor: it has a dim, and I can print its shape and everything.
But Py_TYPE(value)->tp_name equals immutable_dict here.
The unpack_fn is basically THPVariable_Wrap (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L849).

For completeness, here are repro commands and images of the failures:

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_GPT2
image

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=speech_transformer
image

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_T5
image

Add support for Python 3.10

Python 3.9 (#33) should be added before 3.10.

Known issues for Python 3.10 support:

Python 3.10 and later have a new method for mapping the bytecode index to line number, deprecating co_lnotab. See PEP 626. lnotab_writer in TorchDynamo needs to be rewritten to support the new format. See #36 for more on line numbers.
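
A quick standalone check, unrelated to TorchDynamo's internals, to see which line-number table a given interpreter exposes: 3.10 provides the new co_linetable field where older versions provide co_lnotab:

import sys


def f():
    return 1


if sys.version_info >= (3, 10):
    print(f.__code__.co_linetable)  # new PEP 626 line table
else:
    print(f.__code__.co_lnotab)     # legacy lnotab format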

There are 9 new/changed bytecodes in 3.10:

  • COPY_DICT_WITHOUT_KEYS, GET_LEN, MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, MATCH_CLASS: these look like easy-to-add aliases for things that are already supported.
  • MAKE_FUNCTION: handling of annotations changed. We should add a test case to make sure annotations work for nested functions.
  • ROT_N: seems easy to add. We should update usage of rot_n_helper to use this new bytecode instead.
  • GEN_START: this could break handling of inline generators. Need to look into this in more detail.

Handle -inf for TorchScript of FX graphs

Repro

import torch
import torch.fx

x = torch.randn(4, 5)
mask = torch.randn(4, 5) > 0.5

def f(x, mask):
    # return x.masked_fill_(mask, 1.0) # PASSES
    return x.masked_fill_(mask, float("-inf"))

print(f(x, mask))

# Only fails when symbolic_trace
fx_mod = torch.fx.symbolic_trace(f)
scripted_f = torch.jit.script(fx_mod)
print(scripted_f(x, mask))

@eellison

[fx2trt] unclear issues

vision_maskrcnn: Error
detectron2_maskrcnn: various tracing issues
Super_SloMo: various tracing issues
opacus_cifar10: tracing issues
hf_BigBird: op support, tracing issues, shape '[1, 12, 62, 192]' is invalid for input of size 11904
pyhpc_equation_of_state: nan output

Skip non-Tensor/Module frames

People may have non-PyTorch code run under TorchDynamo. We should make sure TorchDynamo does nothing in this case.

If TorchDynamo reaches the end of the frame without finding PyTorch ops, it will just run the frame normally. However, if there are unsupported things that prevent capturing a whole graph, TorchDynamo could generate specialized frames for non-PyTorch code. This should be correct, but could add extra overhead.

To improve this we should expand the logic in this function:
https://github.com/facebookresearch/torchdynamo/blob/44971ffd9a7e6798b7868a592c337acb75bd1d2d/torchdynamo/symbolic_convert.py#L1008
That function controls whether TorchDynamo should break the graph and generate a resume_at_xx function to pick up after an unsupported thing.

The logic I would propose is: examine the stack, locals, and globals referenced by co_names; if there is a tensor/nn.Module/torch.* anywhere, keep doing what we do now; if there is not, just bail out and switch to normal execution.
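
A rough sketch of that heuristic; the helper names and call signature are hypothetical, not existing TorchDynamo functions:

import torch


def _is_torch_related(value):
    return (
        isinstance(value, (torch.Tensor, torch.nn.Module))
        or (getattr(value, "__module__", None) or "").startswith("torch")
    )


def should_keep_tracing(stack_values, frame_locals, frame_globals, co_names):
    # Gather everything the frame can currently reference.
    candidates = list(stack_values) + list(frame_locals.values())
    candidates += [frame_globals[name] for name in co_names if name in frame_globals]
    # Bail out to normal execution if nothing is PyTorch-related.
    return any(_is_torch_related(v) for v in candidates)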

Usage Tutorial

Create a tutorial on how to use it:

GPU: Forward, Backwards
CPU: ??

Add support for Python 3.9

The process for adding support for Python 3.9 is as follows.

First, examine the 10 new bytecodes in 3.9:

  • RERAISE: raising exceptions is currently not supported and breaks the graph, so this can just call unimplemented() for now. See issue pytorch/pytorch#93720 for more on exceptions.
  • WITH_EXCEPT_START: might affect support for with no_grad(): and related ops.
  • LOAD_ASSERTION_ERROR, LIST_TO_TUPLE, LIST_EXTEND, SET_UPDATE, DICT_UPDATE, DICT_MERGE, IS_OP, CONTAINS_OP: these are all simple aliases for things TorchDynamo already supports and should be easy to handle.

Next, update the versions supported in setup.py so the build works.

Next, iteratively add support for ops and fix issues until all tests pass (run pytest tests to run the test suite).

Next, iteratively fix issues in ./torchbench.py until all models pass and match the coverage of Python 3.8.

Fix python key tracing errors with quantized models

Repro

./torchbench.py --no-skip --python-key -n 1 -k mobilenet_v2_quantized

Partial output:

ERROR:torchdynamo.optimizations.python_key:exception running call_function torch.quantize_per_tensor (inputs_0_, mod_features_0_0_input_scale_0, mod_features_0_0_input_zero_point_0, torch.quint8) {}
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 219, in call_function
    return target(*args, **kwargs)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 112, in __torch_dispatch__
    return wrap_with_proxy(real_out, proxy_out)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 104, in wrap_with_proxy
    return PythonTensor(e, proxy)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 60, in __new__
    proxy.node.meta['tensor_meta'] = _extract_tensor_metadata(r)
  File "/home/jansel/pytorch/torch/fx/passes/shape_prop.py", line 48, in _extract_tensor_metadata
    qscheme = result.qscheme()
RuntimeError: toIValue() cannot handle converting to type: QScheme

This affects two quantized models with the same error.

cc @Chillee @anijain2305

Pip Installs

pip install dynamo
pip install trt ? (or other TRT install?)

CI/CD setup to build pip/conda packages

Someone should be able to pip install torchdynamo and not need to install from source.

If we do binary releases we may need to pin to specific PyTorch versions, so it might be better to ship only source packages to pypi.

Test TorchDynamo on a wider variety of models

Now that TorchDynamo is working on most TorchBench models, we should start looking for additional models to test on to continue improving coverage and robustness.

All models are welcome here, so if you have specific models in mind from a use case you are familiar with please test them and report your experiences.

For testing on Meta models, @dzhulgakov suggested that @houseroad and @divchenko would be able to provide pointers to increasingly complex models to test on.

Debug python key tracing error with hf_BigBird

edited by @ezyang

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_BigBird --devices cuda --float32

Old output (this is no longer what master produces):

Output

cpu  eval  hf_BigBird                         ERROR:root:unhandled error
Traceback (most recent call last):
  File "./torchbench.py", line 911, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 456, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2321, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1920, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1615, in forward
    layer_outputs = layer_module(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1451, in forward
    def forward(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1381, in forward
    self_outputs = self.self(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 435, in forward
    def forward(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  [Previous line repeated 2 more times]
  File "/home/jansel/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 129, in call_fn
    return inner(*params_flat, *args)
  File "<eval_with_key>.20", line 105, in forward
    unsqueeze__1 = torch.ops.aten.unsqueeze_(detach_36, 2);  detach_36 = None
  File "/home/jansel/pytorch/torch/_ops.py", line 142, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: set_storage_offset is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)
ERROR

Somehow the code produced by python key tracing triggers an error.

cc @Chillee @anijain2305

Debug TorchScript error for Slomo

cc @eellison. Repro for the bug while running TorchDynamo + AOTAutograd with TorchScript.

It seems that TorchScript expects the default values for torch.ops.aten.avg_pool2d_backward to be present.

The error can be reproduced with: python torchbench.py --training --devices=cuda --accuracy-aot-ts --only=Super_SloMo

RuntimeError:
Arguments for call are not valid.
The following variants are available:

  aten::avg_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

  aten::avg_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

The original call is:
  File "<eval_with_key>.12", line 362
    getitem_95 = convolution_backward_22[1]
    getitem_96 = convolution_backward_22[2];  convolution_backward_22 = None
    avg_pool2d_backward = torch.ops.aten.avg_pool2d_backward(getitem_94, leaky_relu_32, [2, 2], [], [0, 0], False, True, None);  getitem_94 = leaky_relu_32 = None
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    add_46 = torch.ops.aten.add(detach_195, avg_pool2d_backward);  detach_195 = avg_pool2d_backward = None
    leaky_relu_backward_13 = torch.ops.aten.leaky_relu_backward(add_46, convolution_32, 0.1, False);  add_46 = convolution_32 = None

ERROR

Initial support - AOTAutograd - Test accuracy for TorchBench models

List of bugs
Eager

TorchScript bugs

TorchDynamo bugs

  • hf_GPT2/speech transformer/hf_t5 - #85
  • tacotron2 - #82

Torchbench issues

AOTAutograd issues

NVFuser issues

Python 3.11 support

Python 3.11 won't be released until the end of the year, but I wanted to put down a few notes as we see changes in the development version.

The main one so far is that PEP 523 is being moved to the internal Python API, and to use it we will need to do:

#ifndef Py_BUILD_CORE_MODULE
#  define Py_BUILD_CORE_MODULE
#endif
#include <Python.h>
#include <internal/pycore_interp.h> // _PyInterpreterState_SetEvalFrameFunc()
#include <internal/pycore_ceval.h>  // _PyEval_EvalFrameDefault

There is some discussion about a different #define being needed, so that may change before release.
For more details, see this thread.

Support distributed training

This is a placeholder task to make distributed training work with TorchDynamo + AOT Autograd. The main work seems to be making sure the relevant ops can be traced with AOT Autograd and are properly added to the FX graph by TorchDynamo.

I expect most of the issues will be at the AOT Autograd level, because TorchDynamo treats most torch.* ops as black boxes. We should test and verify this though.

@alanwaketan can fill in details.

Debug TorchScript error from moco

Repro - python torchbench.py --training --devices=cuda --accuracy-ts --only=moco

This one has a DistributedDataParallel module, so it might be something we can table for now.

The error is pretty long; the important section is as follows:

	First diverging operator:
	Node diff:
		- %mod : __torch__.torch.nn.parallel.distributed.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		+ %mod : __torch__.torch.nn.parallel.distributed.___torch_mangle_596.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		?                                                ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.

Improve line number tracking and exceptions/error messages

Currently, TorchDynamo does not preserve line numbers in generated code in most cases; it does so only in a few cases and when the error happens at compile time. It already supports emitting line numbers in output code, so fixing this is just a matter of populating the line numbers on the Instruction() objects.

We should improve this and carry line numbers through our transformations. We should also test TorchDynamo on deliberately buggy programs and make sure it produces good error messages.
