Repro below. It looks like even though dtype=torch.bfloat16 [...]

multithreaded autograd backward doesn't respect autocast dtype context manager (pytorch issue, open)

bdhirsh commented on June 23, 2024

multithreaded autograd backward doesn't respect autocast dtype context manager


Comments (2)

soulitzer commented on June 23, 2024

Given those two points, it seems likely that when autograd spawns threads, we aren't properly copying the thread-local AMP state over to the spawned threads?

Yup - today the engine automatically replicates the TLS state managed by ThreadLocalState() onto the spawned threads. The autocast dtype state should be managed the same way, so that it gets replicated too.
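For anyone following along, here is a rough pure-Python sketch of the mechanism being described. It is only an analogy, not the engine's actual code: `_tls`, `get_autocast_dtype`, `set_autocast_dtype` and `run_in_worker` are hypothetical names, the dtypes are plain strings, and the `"float16"` fallback is just a placeholder default. The point is that thread-local state set on the calling thread is invisible to a freshly spawned thread unless someone snapshots it and restores it there.

```python
import threading

# Hypothetical stand-in for thread-local autocast state.
# This is NOT PyTorch's real data structure, only an illustration.
_tls = threading.local()


def get_autocast_dtype():
    # A thread that never set a dtype falls back to a placeholder default.
    return getattr(_tls, "autocast_dtype", "float16")


def set_autocast_dtype(dtype):
    _tls.autocast_dtype = dtype


def run_in_worker(fn, snapshot=None):
    """Run fn on a new thread, restoring a captured snapshot first if given."""
    result = {}

    def target():
        if snapshot is not None:
            # Replicate the caller's state onto the spawned thread.
            set_autocast_dtype(snapshot)
        result["value"] = fn()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["value"]


# The "user" asks for bfloat16 on the calling thread.
set_autocast_dtype("bfloat16")

# State is not copied: the worker only sees the placeholder default.
without_copy = run_in_worker(get_autocast_dtype)

# State is snapshotted on the calling thread and restored on the worker.
with_copy = run_in_worker(get_autocast_dtype, snapshot=get_autocast_dtype())

print(without_copy, with_copy)
```

In this toy version the first worker reports the fallback dtype while the second reports the one the caller requested, which is the difference between state that is replicated onto spawned threads and state that is not.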


bdhirsh commented on June 23, 2024

tentatively marking hi-pri since running the backward at a different precision than the user asked for seems bad

