We could add it to torch logs. I don't want to add to warning spam, especially for something that is not actionable to users.
But in either case, we should get to full meta coverage of foreach kernels. As issue #105105 shows, a number of foreach ops still lack meta registrations.
from pytorch.
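A minimal sketch of the "log, don't warn" option. It assumes Python's standard `logging` module, not the actual torch logs machinery. Each fallback op is reported at most once, at DEBUG level, so nothing is printed unless someone opts in. The logger name `fake_fallback` and the helper `log_fallback_once` are hypothetical.

```python
import logging

# Hypothetical logger name; this is not a registered torch logs artifact.
logger = logging.getLogger("fake_fallback")

# Op names already reported, so each op is logged at most once per process.
_reported: set[str] = set()


def log_fallback_once(op_name: str, device: str) -> bool:
    """Record that `op_name` fell back to eager on `device`.

    Returns True the first time an op is reported and False on repeats.
    """
    if op_name in _reported:
        return False
    _reported.add(op_name)
    # DEBUG keeps this silent unless the user enables this logger (or the root logger).
    logger.debug("no meta kernel for %s; falling back to eager on %s", op_name, device)
    return True
```

Running with `logging.basicConfig(level=logging.DEBUG)` would surface the first fallback for each op, while repeated calls for the same op stay silent. This is one way to keep the signal available without adding to warning spam.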
cc @mlazos ?
The fix should be to add a meta registration (#123463). I have confirmed that this fixes the issue locally. I am not familiar enough with the test infra to adjust the expected failing tests, since some change seems to be needed for the sample inputs.
Something for discussion, though, is whether we can warn when the fallback behavior happens. It looks like the op was silently being run on CPU despite being under `FakeTensorMode`.
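To illustrate why a meta registration avoids the problem, note that a meta kernel only infers output metadata (shape, dtype, device) and never allocates storage. Below is a pure-Python sketch for a list-wise add. `TensorMeta` and `foreach_add_meta` are hypothetical stand-ins, not PyTorch's `_foreach_add` meta implementation. The sketch also assumes paired inputs already match, with no broadcasting or type promotion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for a fake tensor: metadata only, no storage."""
    shape: tuple[int, ...]
    dtype: str
    device: str


def foreach_add_meta(xs: list[TensorMeta], ys: list[TensorMeta]) -> list[TensorMeta]:
    """Meta-style kernel for a list-wise add: infer output metadata, allocate nothing.

    Simplifying assumption: paired tensors must already match in shape, dtype
    and device (no broadcasting or type promotion).
    """
    if len(xs) != len(ys):
        raise ValueError(f"list lengths differ: {len(xs)} vs {len(ys)}")
    out = []
    for x, y in zip(xs, ys):
        if (x.shape, x.dtype, x.device) != (y.shape, y.dtype, y.device):
            raise ValueError(f"mismatched inputs: {x} vs {y}")
        # The output mirrors the input metadata; no data is ever touched.
        out.append(TensorMeta(x.shape, x.dtype, x.device))
    return out
```

Because only metadata flows through, the outputs stay on the original device, so nothing is materialized on CPU regardless of parameter size.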
From offline discussion: @janeyx99 will figure out how to land the PR with the appropriate testing changes.
While PR #123486 "fixes" this issue for this specific use case, it would be good to confirm why this happened in the first place and whether the behavior is expected.
@janeyx99 Yes, it is expected that if an op has no meta registration, we fall back to the eager implementation. This was originally for bootstrapping, before we had broad meta kernel coverage. @zou3519 changed the default so that non-aten kernels do not fall back, but could not switch the default for aten kernels because of long-tail ops.
For foreach kernels that run over parameters, IMO it makes sense to prioritize meta coverage, since the fallback can be memory intensive.
cc @mlazos
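The fallback policy described above can be modeled in a few lines. This is a toy, not FakeTensorMode's actual dispatch code. `ToyOp`, `run_under_fake`, and the byte-counting kernels are hypothetical. "Bytes allocated" stands in for the memory that the eager fallback materializes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical kernel signature: takes the element count of each input tensor and
# returns the number of bytes the kernel allocates for its outputs.
Kernel = Callable[[List[int]], int]

FLOAT32_BYTES = 4


def eager_foreach_add(numels: List[int]) -> int:
    """Stand-in for a real eager kernel: allocates one float32 output per input."""
    return sum(numels) * FLOAT32_BYTES


def meta_foreach_add(numels: List[int]) -> int:
    """Stand-in for a meta kernel: infers output metadata only, allocating nothing."""
    return 0


@dataclass(frozen=True)
class ToyOp:
    name: str                       # qualified name, e.g. "aten::_foreach_add.List"
    eager: Kernel
    meta: Optional[Kernel] = None   # None models a missing meta registration


def run_under_fake(op: ToyOp, numels: List[int]) -> Tuple[str, int]:
    """Toy model of the fallback policy; returns (path taken, bytes allocated)."""
    if op.meta is not None:
        return "meta", op.meta(numels)
    if op.name.startswith("aten::"):
        # aten ops still fall back by default because of long-tail coverage gaps,
        # so the real kernel runs and materializes real output tensors.
        return "eager_fallback", op.eager(numels)
    # Non-aten ops no longer fall back by default.
    raise NotImplementedError(f"{op.name} has no meta kernel and does not fall back")
```

For two inputs totalling 1,024 float32 elements, the op without a meta kernel takes the eager path and reports 4,096 bytes. The same op with a meta kernel reports 0. For foreach ops that run over every parameter, that gap grows with total parameter size.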
@eellison What is your opinion on adding a warning when falling back to the eager implementation? Would it be too noisy?
For this particular case, debugging the CPU OOM was non-trivial. It was first seen on MAST, and we had to iteratively narrow down the culprit from the training loop to DTensor, and then to the `FakeTensorMode` part of DTensor sharding propagation. It would have saved a lot of time if we had known that the fallback was materializing CPU tensors.