Comments (2)
Given those two points, it seems likely that when autograd spawns threads, we aren't properly copying the thread-local AMP state over to the spawned threads?
Yup - today the engine automatically replicates the TLS state managed by ThreadLocalState() onto the spawned threads. The autocast dtype state should also be managed this way.
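To make the snapshot-and-restore idea concrete, here is a toy Python sketch of the pattern. It is not PyTorch's implementation: `_tls`, `get_state`, `set_state`, and `run_in_worker` are hypothetical names, the `"float16"` default is an assumption, and which fields count as "managed" is chosen purely for illustration. It only shows that a value left out of the captured snapshot falls back to its default on the spawned thread.

```python
import threading

# Hypothetical stand-in for per-thread autocast settings (not PyTorch's real storage).
_tls = threading.local()


def get_state():
    # A fresh thread sees the defaults, because thread-local values are per thread.
    return {
        "enabled": getattr(_tls, "enabled", False),
        "dtype": getattr(_tls, "dtype", "float16"),  # assumed default for this toy
    }


def set_state(enabled, dtype):
    _tls.enabled = enabled
    _tls.dtype = dtype


def run_in_worker(fn, snapshot, managed_keys):
    """Run fn on a new thread after restoring only the managed keys of snapshot."""
    result = {}

    def target():
        state = get_state()            # worker starts from the defaults
        for key in managed_keys:       # copy over only what the snapshot manages
            state[key] = snapshot[key]
        set_state(**state)
        result["value"] = fn()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["value"]


# The "user" thread asks for bfloat16 autocast.
set_state(enabled=True, dtype="bfloat16")
snapshot = get_state()

# Only the enabled flag is managed: the worker silently uses the default dtype.
partial = run_in_worker(get_state, snapshot, managed_keys=("enabled",))

# The dtype is managed too: the worker matches what the user asked for.
full = run_in_worker(get_state, snapshot, managed_keys=("enabled", "dtype"))

print(partial)
print(full)
```

In this sketch, `partial` ends up with autocast enabled but at the default dtype, which mirrors the symptom discussed here, while `full` carries the requested dtype across to the worker.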
from pytorch.
Tentatively marking hi-pri, since running the backward pass at a different precision than the user asked for seems bad.
from pytorch.