Comments (1)
My proposed solution would be to fall back from the fast path if there are pre-/forward hooks on any submodules of the layer. I have started working on it: #128415
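
For illustration only, here is a minimal sketch of what such a hook check could look like. The helper name `any_submodule_hooks` is hypothetical and not taken from the linked PR; it also peeks at `nn.Module`'s internal hook dictionaries (`_forward_pre_hooks`, `_forward_hooks`), which are not public API and may change between releases.

```python
import torch.nn as nn

def any_submodule_hooks(module: nn.Module) -> bool:
    # Hypothetical helper: report whether any submodule (including the
    # module itself) has forward pre-hooks or forward hooks registered.
    return any(
        len(m._forward_pre_hooks) > 0 or len(m._forward_hooks) > 0
        for m in module.modules()
    )

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
layer.self_attn.register_forward_hook(lambda mod, inp, out: print("attn hook fired"))

# Under the proposal, a check like this would make the layer skip its
# fused fast path and take the regular Python forward, so the hook runs.
print(any_submodule_hooks(layer))  # True -> fall back to the slow path
```

Falling back only when hooks are actually registered would keep the fused fast path for the common case while preserving hook semantics for users who rely on them.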
Related Issues (20)
- xpu: set of aten ops are missing for Huggingface Transformers
- DOT/SVG Generation of Quantization Annotated FX Graphs Broken on 2.5.0.dev20240901
- Setting wrong type values to `stride`, `padding`, `output_padding` and `dilation` argument of `nn.ConvTranspose1d()` gets the wrong error messages saying only `tuple of ints`
- `padding_mode` parameter of `nn.ConvTranspose1d()`, `nn.ConvTranspose2d()` and `nn.ConvTranspose3d()` is not explained in the docs
- The real AttributeError information
- torch._dynamo.exc.Unsupported: builtin: bool [<class 'torch._dynamo.variables.tensor.SymNodeVariable'>] False
- ValueError: Pointer argument (at 3) cannot be accessed from Triton
- ONNX Export Fails with Dynamic Slicing on Data-Dependent Value
- gesvda driver of svd returns nan for zero matrix
- Same token different output from `Conv1d`
- [ONNX] `dynamo_export` `Unknown call_function target: <function sym_float at 0x7a47c206c860>`
- Tensor `isin` and `unique` missing bfloat16 support and half support on CPU
- Have a way to mark that particular buffers can be reused for Inductor
- Setting wrong type values to `kernel_size`, `stride`, `padding` and `dilation` argument of `nn.MaxPool1d()` gets the wrong error messages saying only `tuple of ints`
- Quantized model is way slower than regular model.
- DISABLED test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 (__main__.TestSDPACudaOnlyCUDA)
- Each parameter of `nn.MaxPool1d()`, `nn.MaxPool2d()` and `nn.MaxPool3d()` should have `required` or `optional`
- `int`, `float` and `complex` type with `return_indices` of `nn.MaxPool1d()` also work
- Inductor doesn't inplace normalization operations
- Flex Attention: Calculates Gradients Even if Input Has requires_grad=False