
torchdynamo's Introduction

PyTorch Logo


PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

  • torch: A Tensor library like NumPy, with strong GPU support
  • torch.autograd: A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
  • torch.jit: A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
  • torch.nn: A neural networks library deeply integrated with autograd, designed for maximum flexibility
  • torch.multiprocessing: Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
  • torch.utils: DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

Tensor illustration

PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs, such as slicing, indexing, mathematical operations, linear algebra, and reductions. And they are fast!
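For example, here is a minimal sketch (not from the original README) of running the same tensor routine on either the CPU or the GPU:

import torch

x = torch.rand(1000, 1000)          # tensor on the CPU
y = torch.rand(1000, 1000)

if torch.cuda.is_available():
    x = x.cuda()                    # move the data to the GPU
    y = y.cuda()

z = x @ y                           # the matmul runs on whichever device x and y live on
print(z[:2, :3])                    # slicing and indexing work the same on CPU and GPU tensors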

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.

Dynamic graph
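As a small illustration (not from the original README), a computation whose control flow depends on runtime values still differentiates cleanly, because the tape is rebuilt on every call:

import torch

def f(x):
    y = x * 2
    # Ordinary Python control flow; the recorded graph can differ between calls.
    while y.norm() < 10:
        y = y * 2
    return y.sum()

x = torch.randn(5, requires_grad=True)
f(x).backward()
print(x.grad)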

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
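A tiny example of what this means in practice (an illustration, not from the original README): errors surface immediately, at the exact line that caused them.

import torch

a = torch.randn(3, 4)
b = torch.randn(5, 6)
# This line raises a RuntimeError right away, and the stack trace points
# directly here; there is no deferred or asynchronous graph execution to unwind.
c = a @ b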

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast — whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API, is designed to be straightforward and to involve minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.
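As an illustration (a hypothetical layer, not part of PyTorch), a custom layer written purely in Python with the torch API needs only a forward method; autograd supplies the backward pass:

import torch
import torch.nn as nn

class ScaledTanh(nn.Module):
    """A tiny custom layer: a learnable per-feature scale followed by tanh."""

    def __init__(self, num_features):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_features))

    def forward(self, x):
        # autograd records these ops automatically; no backward method is needed
        return torch.tanh(self.scale * x)

layer = ScaledTanh(4)
out = layer(torch.randn(2, 4))
out.sum().backward()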

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.

Installation

Binaries

Commands to install binaries via Conda or pip wheels are on our website: https://pytorch.org/get-started/locally/

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX1/TX2, Jetson Xavier NX/AGX, and Jetson AGX Orin are provided here, and the L4T container is published here.

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source

Prerequisites

If you are installing from source, you will need:

  • Python 3.8 or later (for Linux, Python 3.8.1+ is needed)
  • A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required)

We highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

NVIDIA CUDA Support

If you want to compile with CUDA support, select a supported version of CUDA from our support matrix, then install the following:

Note: you can refer to the cuDNN Support Matrix for the cuDNN versions compatible with the various supported CUDA versions, CUDA drivers, and NVIDIA hardware.

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here.

AMD ROCm Support

If you want to compile with ROCm support, install AMD ROCm 4.0 or above. Note that ROCm is currently supported only for Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.

Intel GPU Support

If you want to compile with Intel GPU support, follow these instructions.

If you want to disable Intel GPU support, export the environment variable USE_XPU=0. Other potentially useful environment variables may be found in setup.py.

Install Dependencies

Common

conda install cmake ninja
# Run this command from the PyTorch directory after cloning the source code using the "Get the PyTorch Source" section below
pip install -r requirements.txt

On Linux

conda install intel::mkl-static intel::mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda121  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

# (optional) If using torch.compile with inductor/triton, install the matching version of triton
# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running the command.
make triton

On macOS

# Add this package on intel x86 processor machines only
conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install PyTorch

On Linux

If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:

export _GLIBCXX_USE_CXX11_ABI=1

If you're compiling for AMD ROCm then first run this command:

# Only run this if you're compiling for ROCm
python tools/amd_build/build_amd.py

Install PyTorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

Aside: If you are using Anaconda, you may experience an error caused by the linker:

build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.8.1+.

On macOS

python3 setup.py develop

On Windows

Choose Correct Visual Studio Version.

PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/. The build tools do not come with Visual Studio Code by default.

If you want to build legacy Python code, please refer to Building on legacy code and CUDA.

CPU-only builds

In this mode PyTorch computations will run on your CPU, not your GPU.

conda activate
python setup.py develop

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the building environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instruction here is an example for setting up both MKL and Intel OpenMP. Without these configurations for CMake, Microsoft Visual C OpenMP runtime (vcomp) will be used.

CUDA based build

In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching.

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019 and Ninja are supported as CMake generators. If ninja.exe is detected in PATH, then Ninja will be used as the default generator; otherwise, VS 2017 / 2019 will be used.
If Ninja is selected as the generator, the latest MSVC will be selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN (a.k.a. MKLDNN or DNNL), and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for some other environment variable configurations.

cmd

:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py develop

Adjust Build Options (Optional)

Optionally, you can adjust the configuration of CMake variables (without building first) by doing the following. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done with such a step.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run it with docker v19.03+:

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders), the default shared memory segment size that the container runs with may not be enough. You should increase the shared memory size either with the --ipc=host or --shm-size command line options to nvidia-docker run.

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass the PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch

You can also pass the CMAKE_VARS="..." environment variable to specify additional CMake variables to be passed to CMake during the build. See setup.py for the list of available variables.

CMAKE_VARS="..." make -f docker.Makefile

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error, run npm install katex. If it persists, try npm install -g katex.

Note: if you installed nodejs with a different package manager (e.g., conda), then npm will probably install a version of katex that is not compatible with your version of nodejs, and doc builds will fail. A specific pairing of nodejs and katex versions is known to work; to install a particular katex version with npm you can run npm install -g katex@<version>.

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three pointers to get you started:

Resources

Communication

Releases and Contributing

Typically, PyTorch has three minor releases a year. Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page. For more information about PyTorch releases, see the Release page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Soumith Chintala, Gregory Chanan, Dmytro Dzhulgakov, Edward Yang, and Nikita Shulga with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.

torchdynamo's People

Contributors

anijain2305, ansley, bertmaher, chillee, davidberard98, desertfire, eellison, ezyang, fdrocha, jansel, jgong5, lezcano, mlazos, msaroufim, ngimel, pyjhzwh, sangongs, sherlocknomad, shunting314, suo, tugsbayasgalan, vesuppi, vkuzo, voznesenskym, wconstab, williamwen42, wschin, xuzhao9, yanboliang, yushangdi

torchdynamo's Issues

[fx2trt] unclear issues

vision_maskrcnn: Error
detectron2_maskrcnn: various tracing issues
Super_SloMo: various tracing issues
opacus_cifar10: tracing issues
hf_BigBird: op support, tracing issues, shape '[1, 12, 62, 192]' is invalid for input of size 11904
pyhpc_equation_of_state: nan output

Initial support - AOTAutograd - Test accuracy for TorchBench models

List of bugs
Eager

Torchscript bugs

TorchDynamo bugs

  • hf_GPT2/speech transformer/hf_t5 - #85
  • tacotron2 - #82

Torchbench issues

AOTAutograd issues

NVFuser issues

Debug TorchScript error for Slomo

cc @eellison Repro for the bug while running TorchDynamo + AOTAutograd with Torchscript

It seems that TorchScript expects the default values for torch.ops.aten.avg_pool2d_backward to be present.

The error can be reproduced with: python torchbench.py --training --devices=cuda --accuracy-aot-ts --only=Super_SloMo

RuntimeError:
Arguments for call are not valid.
The following variants are available:

  aten::avg_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

  aten::avg_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'stride' but instead found type 'List[Tensor]'.
  Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)

The original call is:
  File "<eval_with_key>.12", line 362
    getitem_95 = convolution_backward_22[1]
    getitem_96 = convolution_backward_22[2];  convolution_backward_22 = None
    avg_pool2d_backward = torch.ops.aten.avg_pool2d_backward(getitem_94, leaky_relu_32, [2, 2], [], [0, 0], False, True, None);  getitem_94 = leaky_relu_32 = None
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    add_46 = torch.ops.aten.add(detach_195, avg_pool2d_backward);  detach_195 = avg_pool2d_backward = None
    leaky_relu_backward_13 = torch.ops.aten.leaky_relu_backward(add_46, convolution_32, 0.1, False);  add_46 = convolution_32 = None

ERROR

Test TorchDynamo on a wider variety of models

Now that TorchDynamo is working on most TorchBench models, we should start looking for additional models to test on to continue improving coverage and robustness.

All models are welcome here, so if you have specific models in mind from a use case you are familiar with please test them and report your experiences.

For testing on Meta models, @dzhulgakov suggested that @houseroad and @divchenko would be able to provide pointers to increasingly complex models to test on.

Add support for Python 3.10

Python 3.9 (#33) should be added before 3.10.

Known issues for Python 3.10 support:

Python 3.10 and later have a new method for mapping the bytecode index to line number, deprecating co_lnotab. See PEP 626. lnotab_writer in TorchDynamo needs to be rewritten to support the new format. See #36 for more on line numbers.
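As a rough illustration of the mapping that needs to be produced (standard library usage, not TorchDynamo code), dis.findlinestarts abstracts over both the old co_lnotab format and the new PEP 626 line table:

import dis

def example(x):
    y = x + 1
    return y * 2

# Yields (bytecode offset, source line number) pairs.  On Python <= 3.9 these
# come from co_lnotab; on 3.10+ they come from the new PEP 626 line table,
# which is the format lnotab_writer would need to learn to emit.
for offset, lineno in dis.findlinestarts(example.__code__):
    print(offset, lineno)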

There are 9 new/changed bytecodes in 3.10

COPY_DICT_WITHOUT_KEYS
GET_LEN
MATCH_MAPPING
MATCH_SEQUENCE
MATCH_KEYS
MATCH_CLASS
These look like easy-to-add aliases for things that are already supported.

MAKE_FUNCTION
Handling of annotations changed. We should add a test case to make sure annotations work for nested functions.

ROT_N
Seems easy to add. We should update usage of rot_n_helper to use this new bytecode instead.

GEN_START
This could break handling of inline generators. Need to look into this in more detail.

Improve line number tracking and exceptions/error messages

Currently, TorchDynamo does not preserve line numbers in generated code in most cases, though it does in a few cases and when the error happens at compile time. It already supports generating line numbers in output code, so fixing this is just a matter of populating the line numbers on the Instruction() objects.

We should improve this and carry line numbers through our transformations. We should also test TorchDynamo on deliberately buggy programs and make sure it produces good error messages.

Debug issue with AOTAutograd for speech_transformer/hf_GPT2/hf_T5

The three models - speech_transformer, hf_GPT2, and hf_T5 - fail with a similar type of error signature.

TorchDynamo finds static subgraphs and sends them to AOT Autograd. AOT Autograd generates the forward and backward graphs. The output of AOT Autograd is an autograd.Function (code). AOT Autograd saves some tensors in the forward pass for the gradient computation in the backward pass.

The issue arises in the backward pass. When we read saved_tensors, one of the items in saved_tensors is no longer of Tensor type. This causes cryptic error messages like the one below, and the offending type changes from run to run. I have seen immutable_dict, tuple, and even weakref and builtin.

ERROR:root:unhandled error
Traceback (most recent call last):
  File "torchbench.py", line 1006, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  File "torchbench.py", line 482, in forward_and_backward_pass
    def forward_and_backward_pass(mod, inputs, collect_outputs=True):
  [Previous line repeated 2 more times]
  File "/fsx/users/anijain/functorch/functorch/_src/monkey_patching.py", line 97, in _backward
    return _old_backward(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/_tensor.py", line 395, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/fsx/users/anijain/functorch/functorch/_src/aot_autograd.py", line 188, in backward
    out = normalize_as_list(compiled_bw(*ctx.saved_tensors, *contiguous_args))
  File "/fsx/users/anijain/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/data/home/anijain/miniconda/envs/pytorch_dev/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: forward() Expected a value of type 'Tensor (inferred)' for argument 'primals_14' but instead found type 'tuple'.
Inferred 'primals_14' to be of type 'Tensor' because it was not annotated with an explicit type.
Position: 19
Value: ('___check_obj_id', '___check_tensors', '___check_type_id', '___guarded_code')

I looked further into the C++ and started printing the type of objects while saving the tensors at the end of the forward pass and reading them back in the backward pass. I observed the weird behavior in this line - (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L834). This is called in the backward pass, when we call ctx.saved_tensors.

When I print the unpacked_var, it is a tensor. It has its dim, and I can print its shape and everything.
But Py_TYPE(value)->tp_name equals immutable_dict here.
The unpack_fn is basically THPVariable_Wrap - (https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_function.cpp#L849).

For completeness, here are the repro commands for each failure:

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_GPT2

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=speech_transformer

Repro - python torchbench.py --training --devices=cuda --accuracy-aot-nop --only=hf_T5

Support partially dynamic shapes

In update 5 we wrote:

Unfortunately, the problem of dynamic shapes is more complex than one might think. Enabling torchdynamo.config.dynamic_shapes will cause new graph breaks. Many models have code like assert x.shape == (1,2,3), if x.size(1) == 10, math.sqrt(x.shape[-1]), etc. This Python code operating on integer shapes is the de facto way to express many things in PyTorch. With static shapes, TorchDynamo can constant-propagate this stuff away; however, with dynamic shapes it will break the graph.

My current thinking is a "partially specialized shapes" mode in TorchDynamo. The basic idea would be that all shapes start as fully dynamic, but TorchDynamo would convert a tensor's shapes to static when the user calls Tensor.size() and passes the result to a non-PyTorch operation. This would allow dynamic shapes most of the time, but still allow bigger graphs when users operate directly on shapes as integers.

To implement an initial version of this:

First build the analysis to add a TensorVariable().input_sources: Set[Source].

def foo(a, b):
  c = a + b

In this example:

  • a.input_sources = {a.source}
  • b.input_sources = {b.source}
  • c.input_sources = {a.source, b.source}

This is just a straightforward data flow analysis where sources are combined. It looks similar to the shape propagation currently implemented in TensorVariable.create.
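A minimal sketch of that propagation step (hypothetical names; the real logic would live alongside TensorVariable.create):

# Hypothetical sketch: union the input_sources of all operands into the result.
# TensorVariable and Source stand in for the real TorchDynamo classes.
def propagate_input_sources(result, *operands):
    sources = set()
    for var in operands:
        sources |= var.input_sources
    result.input_sources = sources
    return result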

Next, split GuardBuilder.TENSOR_MATCH into TENSOR_MATCH_STATIC and TENSOR_MATCH_DYNAMIC. The underlying TensorGuards object implemented in C++ already has these two modes, so it just requires having the generated code have two instances of that object.

Finally, modify how TensorVariable handles shape specialization. Defer setting TensorVariable().size and TensorVariable().stride until the user calls Tensor.size(). Note there are a few different ways to get the size, so search for usages of TensorVariable.size.

When .size is called, add a new guard for TENSOR_MATCH_STATIC on all the input_sources. (You can remove the now redundant TENSOR_MATCH_DYNAMIC guard in guard codegen.)

This should give you something that works and passes tests.

Improvements initial prototype:

  • We need to handle dynamic shape ops like nonzero, where, repeat, etc. Modify the analysis to mark tensors flowing from these ops, and break the graph if the user calls size on them. You can search for config.dynamic_shapes to find where we currently conditionally break the graph on those ops.
  • If a user passes the size directly to another PyTorch op, for example torch.empty(x.size()) we don't need to shape specialize and can just put the call to .size() in the graph. Similarly, simple math ops on sizes can be included in the graph. To handle this we will need a SizeVariable() to track and decide what can go in the graph and what requires specialization.
  • We don't need to specialize every dimension if the user code only uses some dimensions. We need better shape analysis to make this happen though. @eellison might be able to provide pointers for better shape analysis.

cc @ezyang

Support writing to closures while inlining

TorchDynamo supports most cases of closures; however, this one is not supported:

def make_counter():
    x = torch.randn(10)

    def counter():
        nonlocal x
        x = x + 1
        return x

    return counter

@torchdynamo.optimize(torchdynamo.testing.CompileCounter(), nopython=True)
def fn(counter):
    return counter() + counter()


fn(make_counter())

The error (when in nopython=True mode) is:

...
torchdynamo.exc.Unsupported: write to __closure__ while inlining
Processing original code:
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 959, in fn
    return counter() + counter()
  File "/home/jansel/torchdynamo/tests/test_misc.py", line 953, in counter
    x = x + 1

This will work if the closure is in the top-level frame. It will also work if the closure is defined within the captured scope. But in this case we can't actually emit a STORE_DEREF bytecode because "x" is not in our function's freevars.

To support this case we need to rewrite the STORE_DEREF to do something like:

def fn(counter):
   v0 = counter.__closure__[0].cell_contents
   v1 = v0 + 1
   v2 = v1 + 1
   counter.__closure__[0].cell_contents = v2
   return v1 + v2

We already do the first part here:
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/variables/functions.py#L142
When we read the value of the closure.

We need to make it writable though. For that we need to register the cell using AttributeMutationExisting and then we can use side_effects.store_cell() on it.

Other types of cells are handled in
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1152
and
https://github.com/facebookresearch/torchdynamo/blob/1d1ef71d111d3df894f535ebf8ff6088c0c2e1e1/torchdynamo/symbolic_convert.py#L1163

Note that this explicit cell handling is specific to inlining. When we aren't inlining we treat closures like normal variables with different load/store bytecodes.

Build issues on Mac OS, build script complains about gcc not supporting c++14

Followed the instructions on https://github.com/facebookresearch/torchdynamo to build on Mac OS, got the following error: https://gist.github.com/vkuzo/a9b316590d0eb043f347ae2c0e8c209f . Note: pytorch/pytorch builds without issues in my setup.

Relevant line:

/Users/vasiliy/pytorch/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: C++14 or later compatible compiler is required to use PyTorch.

Note: this can be fixed locally by adding extra_compile_args=["-std=c++14"] to the torchdynamo._guards portion in setup.py.
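The workaround would look roughly like this in setup.py (a sketch only; the extension name matches the note above, but the source path and other arguments are illustrative):

from setuptools import Extension, setup

setup(
    name="torchdynamo",
    ext_modules=[
        Extension(
            "torchdynamo._guards",
            sources=["torchdynamo/_guards.cpp"],  # illustrative path
            # Work around the macOS build error by requesting C++14 explicitly.
            extra_compile_args=["-std=c++14"],
        )
    ],
)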

Python 3.11 support

Python 3.11 won't be released until the end of the year, but I wanted to put down a few notes as we see changes in the development version.

The main one so far is that PEP 523 is being moved to the internal Python API, and to use it we will need to do:

#ifndef Py_BUILD_CORE_MODULE
#  define Py_BUILD_CORE_MODULE
#endif
#include <Python.h>
#include <internal/pycore_interp.h> // _PyInterpreterState_SetEvalFrameFunc()
#include <internal/pycore_ceval.h>  // _PyEval_EvalFrameDefault

There is some discussion about a different #define being needed, so that may change before release.
For more details, see this thread.

Support distributed training

This is a placeholder task to make distributed training work with TorchDynamo + AOT Autograd. The main work seems to be making sure the relevant ops can be traced with AOT Autograd and are properly added to the FX graph by TorchDynamo.

I expect most of the issues will be at the AOT Autograd level, because TorchDynamo treats most torch.* ops as black boxes. We should test and verify this though.

@alanwaketan can fill in details.

Pip Installs

pip install dynamo
pip install trt ? (or other TRT install?)

Debug issue with tacotron2

tacotron2 was recently added to torchbench with get_module() support, and it seems it does not work properly.

@anijain2305 reported an error running

python torchbench.py --devices=cuda --only=tacotron2

I haven't had a chance to look into this one yet, but creating an issue so that it does not get lost. Feel free to add more details @anijain2305.

Debug python key tracing error with hf_Reformer

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_Reformer

Error:

Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/output_graph.py", line 317, in call_user_compiler
    compiled_fn = self.compiler_fn(gm, self.example_inputs())
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 137, in python_key
    gm, make_wrapper = python_key_normalize(gm, example_inputs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 120, in python_key_normalize
    graph = tracer.trace(fake_signature(fn_for_tracing, nargs))
  File "/home/jansel/pytorch/torch/fx/_symbolic_trace.py", line 577, in trace
    self.create_node('output', 'output', (self.create_arg(fn(*args)),), {},
  File "<string>", line 1, in <lambda>
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 114, in fn_for_tracing
    out = PatchingInterpreter(gm).run(*args[params_len:])
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 120, in run
    self.env[node] = self.run_node(node)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 241, in call_method
    return getattr(self_obj, target)(*args_tail, **kwargs)
RuntimeError: DispatchKey PythonTLSSnapshot doesn't correspond to a device

This seems to be coming from a call to Tensor.new.

cc @Chillee @anijain2305

Support formatted literal strings (f-strings)

f-strings produce the FORMAT_VALUE and BUILD_STRING bytecodes, which are not yet supported.

These bytecodes could be supported by rewriting them to call the related functions (str.format, str, repr, ascii, etc).

Likely the most useful case of this would be constants (or things TorchDynamo specializes on like cls.__name__). Something like f"foo {self.__class__.__name__} bar {x.shape}" should not need to cause a graph break.

Another useful case would be deferring the string formatting calls to the end of the graph. If there are PyTorch ops in the f-string, they currently won't be included in the graph.
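For reference, the bytecodes in question are easy to see with dis (on CPython 3.8-3.11; later versions renamed them):

import dis

def render(self, x):
    return f"foo {self.__class__.__name__} bar {x.shape}"

# Prints FORMAT_VALUE and BUILD_STRING instructions, which TorchDynamo would
# need to rewrite into calls to format()/str.join() (or defer to the end of the graph).
dis.dis(render)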

Debug TorchScript error from moco

Repro - python torchbench.py --training --devices=cuda --accuracy-ts --only=moco

This one has a DistributedDataParallel module, so it might be something we can table for now.

The error is pretty long; the important section is as follows:

	First diverging operator:
	Node diff:
		- %mod : __torch__.torch.nn.parallel.distributed.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		+ %mod : __torch__.torch.nn.parallel.distributed.___torch_mangle_596.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		?                                                ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.

Investigate issue with python key tracing and LSTM

Repro:

env PYTHONKEY_VERBOSE=1 ./torchbench.py --no-skip --python-key -n 1 -k demucs

Python key tracing produces the following warning:

WARNING:torchdynamo.optimizations.python_key:returning real tensor? call_function _operator.getitem <built-in function getitem> (mod_model_lstm_lstm, 0) {}

which is coming from this line:
https://github.com/facebookresearch/torchdynamo/blob/e84f9fee18ae5ab7bfca5504e200503de174efb5/torchdynamo/optimizations/python_key.py#L97

This makes me worry that we are missing some operators. During python key tracing everything should be a functorch._src.python_key.PythonTensor, yet somehow an unwrapped tensor is leaking through.

I suspect our pytree walk of the module hierarchy might be missing some LSTM-related wrapper class. Though I haven't confirmed this.

cc @Chillee

TensorRT virtualMemoryBuffer internal error

Observed this error message after the run. It may be related to how torchdynamo releases resources.

python torchbench.py -dcuda --speedup-fx2trt-fp16 --only mobilenet_v2
cuda eval mobilenet_v2 [04/05/2022-15:43:11] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:14] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[04/05/2022-15:43:48] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5

Supported node types in the model:
acc_ops.conv2d: ((), {'input': torch.float16, 'weight': torch.float16})
acc_ops.batch_norm: ((), {'input': torch.float16, 'running_mean': torch.float16, 'running_var': torch.float16, 'weight': torch.float16, 'bias': torch.float16})
acc_ops.hardtanh: ((), {'input': torch.float16})
acc_ops.add: ((), {'input': torch.float16, 'other': torch.float16})
acc_ops.adaptive_avg_pool2d: ((), {'input': torch.float16})
acc_ops.flatten: ((), {'input': torch.float16})
acc_ops.linear: ((), {'input': torch.float16, 'weight': torch.float16, 'bias': torch.float16})

Unsupported node types in the model:

graph is split into _run_on_acc_0
Similarity score=0.9999595284461975
7.243x p=0.00
Unexpected Internal Error: [virtualMemoryBuffer.cpp::~StdVirtualMemoryBufferImpl::121] Error Code 1: Cuda Runtime (driver shutting down)

Handle -inf for Torchscript of FX graphs

Repro

import torch
import torch.fx

x = torch.randn(4, 5)
mask = torch.randn(4, 5) > 0.5

def f(x, mask):
    # return x.masked_fill_(mask, 1.0) # PASSES
    return x.masked_fill_(mask, float("-inf"))

print(f(x, mask))

# Only fails when symbolic_trace
fx_mod = torch.fx.symbolic_trace(f)
scripted_f = torch.jit.script(fx_mod)
print(scripted_f(x, mask))

@eellison

Add support for Python 3.9

The process for adding support for Python 3.9 is as follows.

First, examine the 10 new bytecodes in 3.9:

RERAISE
raising exceptions is currently not supported and breaks the graph. So this can just call unimplemented() for now. See issue pytorch/pytorch#93720 for more on exceptions.

WITH_EXCEPT_START
might affect support for with no_grad(): and related ops

LOAD_ASSERTION_ERROR
LIST_TO_TUPLE
LIST_EXTEND
SET_UPDATE
DICT_UPDATE
DICT_MERGE
IS_OP
CONTAINS_OP
these are all simple aliases for things TorchDynamo already supports and should be easy to handle
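For reference, here are the plain CPython 3.9 semantics of a few of these opcodes, written against a simple list used as the value stack (a sketch only; TorchDynamo's real handlers are methods on the instruction translator, dispatched by opcode name, and operate on VariableTracker objects rather than raw values):

def LIST_TO_TUPLE(stack, inst):
    # Pop a list and push a tuple containing the same elements.
    stack.append(tuple(stack.pop()))

def LIST_EXTEND(stack, inst):
    # Pop an iterable and extend the list located inst.arg entries down the stack.
    seq = stack.pop()
    stack[-inst.arg].extend(seq)

def CONTAINS_OP(stack, inst):
    # `in` / `not in`: inst.arg selects the negated form.
    right = stack.pop()
    left = stack.pop()
    result = left in right
    stack.append(not result if inst.arg else result)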

Next, update the versions supported in setup.py so the build works.

Next, iteratively add support for ops and fix issues until all tests pass. Run pytest tests to run the test suite.

Next, iteratively fix issues in ./torchbench.py until all models pass and match the coverage of Python 3.8.

CICD setup to build pip/conda packages

Someone should be able to pip install torchdynamo and not need to install from source.

If we do binary releases we may need to pin to specific PyTorch versions, so it might be better to ship only source packages to pypi.

Debug python key tracing error with hf_BigBird

edited by @ezyang

Repro

./torchbench.py --no-skip --python-key -n 1 -k hf_BigBird --devices cuda --float32

old stuff (this output is no longer on master)

Output

cpu  eval  hf_BigBird                         ERROR:root:unhandled error
Traceback (most recent call last):
  File "./torchbench.py", line 911, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 456, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2321, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1920, in forward
    @add_start_docstrings_to_model_forward(BIG_BIRD_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1615, in forward
    layer_outputs = layer_module(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1451, in forward
    def forward(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 1381, in forward
    self_outputs = self.self(
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 435, in forward
    def forward(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 502, in bigbird_block_sparse_attention
    def bigbird_block_sparse_attention(
  [Previous line repeated 2 more times]
  File "/home/jansel/torchdynamo/torchdynamo/eval_frame.py", line 58, in _fn
    return fn(*args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 129, in call_fn
    return inner(*params_flat, *args)
  File "<eval_with_key>.20", line 105, in forward
    unsqueeze__1 = torch.ops.aten.unsqueeze_(detach_36, 2);  detach_36 = None
  File "/home/jansel/pytorch/torch/_ops.py", line 142, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: set_storage_offset is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)
ERROR

Somehow the code produced by python key tracing triggers an error.

cc @Chillee @anijain2305

Fix issues in detectron2_maskrcnn

./torchbench.py --no-skip -k detectron2_maskrcnn
ERROR FROM offset=6 filename /home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py 522 KeyError
========== TorchDynamo Stack Trace ==========
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 158, in _convert_frame_assert
    code = transform_code_object(frame.f_code, transform)
  File "/home/jansel/torchdynamo/torchdynamo/bytecode_transformation.py", line 284, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/torchdynamo/torchdynamo/convert_frame.py", line 134, in transform
    tracer.run()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 274, in run
    and self.step()
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 252, in step
    getattr(self, inst.opname)(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 384, in IMPORT_FROM
    self.LOAD_ATTR(inst)
  File "/home/jansel/torchdynamo/torchdynamo/symbolic_convert.py", line 608, in LOAD_ATTR
    result = BuiltinVariable(getattr).call_function(
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 212, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/jansel/torchdynamo/torchdynamo/variables/builtin.py", line 461, in call_getattr
    member = obj.value.__dict__[name]
KeyError: 'paste_masks_in_image'
========== Exception (above) while processing ==========
  File "./torchbench.py", line 1019, in <module>
    main()
  File "./torchbench.py", line 913, in main
    run_one_model(
  File "./torchbench.py", line 981, in run_one_model
    new_result = model_iter_fn(model, example_inputs)
  File "./torchbench.py", line 469, in forward_pass
    def forward_pass(mod, inputs, collect_outputs=True):
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 122, in forward
    def forward(self, batched_inputs: List[Dict[str, torch.Tensor]]):
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 174, in inference
    def inference(
  [Previous line repeated 1 more time]
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 229, in _postprocess
    @staticmethod
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 9, in detector_postprocess
    def detector_postprocess(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/modeling/postprocessing.py", line 67, in detector_postprocess
    results.pred_masks = roi_masks.to_bitmasks(
  File "/home/jansel/conda/envs/torchdynamo/lib/python3.8/site-packages/detectron2/structures/masks.py", line 517, in to_bitmasks
    @torch.jit.unused
========== End debug info ==========

Issue with demucs model

The demucs model throws the following error:

cuda eval demucs Traceback (most recent call last):
  File "torchbench.py", line 957, in <module>
    main()
  File "torchbench.py", line 824, in main
    run_one_model(
  File "torchbench.py", line 904, in run_one_model
    assert not torchdynamo.utils.is_jit_model(submod)
AssertionError

Here is the command to repro: python torchbench.py -dcuda --speedup-fx2trt-fp16 --only demucs

Usage Tutorial

Create a tutorial on how to use it:

GPU: Forward, Backwards
CPU: ??
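Until a proper tutorial exists, a minimal usage sketch based on the decorator pattern used in the test suite (my_compiler is a placeholder backend; a real backend would return an optimized callable):

import torch
import torchdynamo

def my_compiler(gm: torch.fx.GraphModule, example_inputs):
    # A backend receives the captured FX graph plus example inputs
    # and must return a callable; here we just fall back to eager.
    gm.graph.print_tabular()
    return gm.forward

@torchdynamo.optimize(my_compiler)
def train_step(model, inputs):
    out = model(inputs)
    out.sum().backward()
    return out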

Skip non-Tensor/Module frames

People may have non-PyTorch code run under TorchDynamo. We should make sure TorchDynamo does nothing in this case.

If TorchDynamo reaches the end of the frame without finding PyTorch ops, it will just run the frame normally. However, if there are unsupported things that prevent capturing a whole graph, TorchDynamo could generate specialized frames for non-PyTorch code. This should be correct, but could add extra overhead.

To improve this we should expand the logic in this function:
https://github.com/facebookresearch/torchdynamo/blob/44971ffd9a7e6798b7868a592c337acb75bd1d2d/torchdynamo/symbolic_convert.py#L1008
That function controls if TorchDynamo should break the graph and generate a resume_at_xx function to pick up after an unsupported thing.

The logic I would propose is: examine the stack, locals, and globals referenced by co_names; if there is a tensor/nn.Module/torch.* anywhere, keep doing what we do now; if there is not, just bail out and switch to normal execution.
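A rough sketch of that check (hypothetical helper name; the real logic would live in symbolic_convert.py and would also need to inspect the stack):

import types

import torch

def frame_mentions_torch(frame: types.FrameType) -> bool:
    # Heuristic: does this frame reference any tensor, nn.Module, or torch.* object?
    values = list(frame.f_locals.values())
    values += [frame.f_globals.get(name) for name in frame.f_code.co_names]
    for value in values:
        if isinstance(value, (torch.Tensor, torch.nn.Module)):
            return True
        if isinstance(value, types.ModuleType) and value.__name__.startswith("torch"):
            return True
        if (getattr(value, "__module__", None) or "").startswith("torch"):
            return True
    return False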

Fix python key tracing errors with quantized models

Repro

./torchbench.py --no-skip --python-key -n 1 -k mobilenet_v2_quantized

Partial output:

ERROR:torchdynamo.optimizations.python_key:exception running call_function torch.quantize_per_tensor (inputs_0_, mod_features_0_0_input_scale_0, mod_features_0_0_input_zero_point_0, torch.quint8) {}
Traceback (most recent call last):
  File "/home/jansel/torchdynamo/torchdynamo/optimizations/python_key.py", line 77, in run_node
    result = super().run_node(n)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 147, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/interpreter.py", line 219, in call_function
    return target(*args, **kwargs)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 112, in __torch_dispatch__
    return wrap_with_proxy(real_out, proxy_out)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 104, in wrap_with_proxy
    return PythonTensor(e, proxy)
  File "/home/jansel/functorch/functorch/_src/python_key.py", line 60, in __new__
    proxy.node.meta['tensor_meta'] = _extract_tensor_metadata(r)
  File "/home/jansel/pytorch/torch/fx/passes/shape_prop.py", line 48, in _extract_tensor_metadata
    qscheme = result.qscheme()
RuntimeError: toIValue() cannot handle converting to type: QScheme

This affects 2 quantized modules with the same error.

cc @Chillee @anijain2305

[fx2trt] TRT issue

hf_Reformer: CUDA error: device-side assert triggered
fastNLP_Bert: [TRT] [E] 3: [layers.h::setAxis::624] Error Code 3: API Usage Error

[fx2trt] op support

hf_T5:
torch.rsqrt, pow, acc_ops.to, torch.isinf, any, float, type_as

hf_GPT2:
acc_ops.split, torch.where, type

soft_actor_critic:
exp, torch.functional.broadcast_tensors
