Comments (3)
I'm not convinced this adds that much clarity; it's usually contextually clear whether something is differentiable...
from pytorch.
Do you have specific examples of functions that are type-hinted with dtype that would benefit from this?
I suppose everything for which a gradient is calculated during backprop, such as `torch.nn.Linear`.
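One way the idea above could be sketched is with `typing.Annotated`, which lets a "differentiable" marker ride on an existing `Tensor` type hint without changing runtime behavior. This is purely illustrative: `Differentiable` is a hypothetical marker (not a real torch API), and a stand-in `Tensor` class is used so the sketch runs without PyTorch installed.

```python
from typing import Annotated, get_type_hints

class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

class Differentiable:
    """Hypothetical marker: a gradient flows through this argument."""

def linear(x: Annotated[Tensor, Differentiable],
           weight: Annotated[Tensor, Differentiable]) -> Tensor:
    """Hypothetical signature mirroring the forward of torch.nn.Linear."""
    ...

# Tooling or readers can recover the marker via the annotation metadata,
# while the annotation stays invisible to ordinary callers:
hints = get_type_hints(linear, include_extras=True)
print(Differentiable in hints["x"].__metadata__)
```

Whether this beats contextual convention, as questioned above, is exactly the open point of the thread; the sketch only shows that the annotation machinery already exists in the standard library.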