Comments (3)
I'm not convinced this adds that much clarity; it's usually contextually clear whether something is differentiable...
from pytorch.
Do you have specific examples of functions that are type-hinted with dtype that would benefit from this?
I suppose everything for which a gradient is calculated during backprop, such as `torch.nn.Linear`.
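One way the idea above could be sketched is with `typing.Annotated`, which lets a "differentiable" marker ride on an existing `Tensor` type hint without changing runtime behavior. This is purely illustrative: `Differentiable` is a hypothetical marker (not a real torch API), and a stand-in `Tensor` class is used so the sketch runs without PyTorch installed.

```python
from typing import Annotated, get_type_hints

class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

class Differentiable:
    """Hypothetical marker: a gradient flows through this argument."""

def linear(x: Annotated[Tensor, Differentiable],
           weight: Annotated[Tensor, Differentiable]) -> Tensor:
    """Hypothetical signature mirroring the forward of torch.nn.Linear."""
    ...

# Tooling or readers can recover the marker via the annotation metadata,
# while the annotation stays invisible to ordinary callers:
hints = get_type_hints(linear, include_extras=True)
print(Differentiable in hints["x"].__metadata__)
```

Whether this beats contextual convention, as questioned above, is exactly the open point of the thread; the sketch only shows that the annotation machinery already exists in the standard library.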