Comments (7)
Anomaly mode is supposed to always work for errors that are triggered from Variable._execution_engine.run_backward. But actually, it seems like the problem here is warnings, rather than errors, and it seems likely to me that we are not annotating warnings with the traceback of the forward call that caused them. This is compounded by the fact that we typically do not set stacklevel correctly when we warn, so the single-line warning printout doesn't even say what the relevant user code is.
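To illustrate the stacklevel point with a minimal pure-Python sketch (the function names here are hypothetical, not PyTorch code): with the default stacklevel=1, a warning is attributed to the library line that calls warn(), while stacklevel=2 attributes it to the caller, which is usually the relevant user code.

```python
import warnings

def library_op(x, stacklevel=1):
    # Stand-in for a library function that warns. With the default
    # stacklevel=1 the warning is attributed to the warn() line below,
    # not to the user code that called library_op.
    if x < 0:
        warnings.warn("negative input", UserWarning, stacklevel=stacklevel)
    return abs(x)

def reported_location(stacklevel):
    # Capture the warning and return where Python says it came from.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        library_op(-1, stacklevel=stacklevel)  # the "user" call site
    return caught[0].filename, caught[0].lineno

# stacklevel=1 points inside library_op; stacklevel=2 points at the caller.
print(reported_location(1) != reported_location(2))  # → True
```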
It feels like it should be possible to install a temporary warning handler when we run backward which augments warnings with user stacks as well. But we... probably don't want to print the full stacks? So we need some way of abbreviating it to one filename:lineno by default?? Not trivial. If anyone wants to try their hand at it I'd be happy to review.
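A minimal sketch of that idea in pure Python (all names hypothetical, and this is not the PyTorch implementation): record warnings while running backward, then re-emit each one with a single filename:lineno hint taken from a saved forward-pass stack entry, rather than the full stack.

```python
import contextlib
import traceback
import warnings

@contextlib.contextmanager
def augmented_warnings(forward_summary):
    # Record warnings raised inside the with-block (e.g. during backward),
    # then re-emit each one with a single filename:lineno hint appended
    # instead of a full user stack.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    for w in caught:
        warnings.warn(f"{w.message} (forward op recorded at {forward_summary})",
                      w.category, stacklevel=2)

def fake_backward():
    # Stand-in for a warning raised deep inside the autograd engine.
    warnings.warn("grad layout may be suboptimal", RuntimeWarning)

# Pretend the forward pass saved one abbreviated stack entry per op.
frame = traceback.extract_stack()[-1]
summary = f"{frame.filename}:{frame.lineno}"

with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")
    with augmented_warnings(summary):
        fake_backward()

print(str(shown[0].message))  # original text plus the forward location hint
```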
@albanD wdyt
from pytorch.
It might also be a good time, in the age of PT2 supremacy, to consider turning anomaly mode error tracking on by default.
from pytorch.
Anomaly mode is supposed to always work for errors that are triggered from Variable._execution_engine.run_backward. But actually, it seems like the problem here is warnings, rather than errors
To be clear, I am not talking about anomaly mode, but warnings (that are printed from C++) in general. I actually meant that anomaly mode is an example of relatively GOOD warning messages. What I'd like to see is for all other C++ warnings in pytorch to have the same debuggability.
But we... probably don't want to print the full stacks? So we need some way of abbreviating it to one filename:lineno by default?? Not trivial.
Actually, I think that full tracebacks are required for debugging. Rather than trying to guess the correct stack level, I'd prefer to have a special "verbose warnings" mode where the full tracebacks are printed.
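Such a "verbose warnings" mode could look roughly like this in pure Python (hypothetical names, not PyTorch's API): temporarily replace warnings.showwarning so that every warning prints the full Python stack of the raise site instead of a single file:line summary.

```python
import contextlib
import io
import traceback
import warnings

@contextlib.contextmanager
def verbose_warnings(stream):
    # Temporarily make every warning print the full Python call stack of
    # the raise site instead of a single file:line summary.
    def show(message, category, filename, lineno, file=None, line=None):
        stream.write(warnings.formatwarning(message, category, filename, lineno, line))
        stream.write("".join(traceback.format_stack()[:-1]))  # drop this frame
    with warnings.catch_warnings():  # also saves/restores showwarning
        warnings.simplefilter("always")
        warnings.showwarning = show
        yield

def deep():
    warnings.warn("something odd during backward", RuntimeWarning)

def middle():
    deep()

buf = io.StringIO()
with verbose_warnings(buf):
    middle()

report = buf.getvalue()
# The report contains the warning plus stack frames for middle() and deep().
```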
I suspect that the hard part would be to record/identify the correct tracebacks during the forward pass. Without this, it doesn't matter whether you guess the stack level correctly: all of the warnings will just point to the my_loss.backward() line in the user's code (which isn't that much of an improvement compared to return Variable._execution_engine.run_backward).
from pytorch.
Oh, I am reminded of #72948 which we eventually decided not to do because passing C++ log messages to Python was just... not a great idea. @kurtamohler did we ever take a closer look at the warning only piece of the puzzle?
from pytorch.
I don't think I ever looked into improving the traceability of warnings.
from pytorch.
Actionable to augment warnings during backward with user stacks.
Prior to executing each node during backward, we already enter a warning recording context; when anomaly mode is enabled we should be able to include more information there. See: https://github.com//pull/66235.
from pytorch.
Reserving this as an internal onboarding task
from pytorch.
Related Issues (20)
- Torch C++ extension build failed with `fatal error: nlohmann/json.hpp: No such file or directory`
- TorchScript: `Return value was annotated as having type float but is actually of type int` violates PEP 484
- Pytorch 2.4 RC cu118 wheels do not work on old drivers
- The doc of `stack()` should say `tuple` or `list` of tensors for `tensors` argument
- Compilation Fails with torch.sparse and "fullgraph=True"
- The doc of `cat()` should say `tuple` or `list` of tensors for `tensors` argument
- The doc of `hstack()` should say `tuple` or `list` of tensors for `tensors` argument
- The doc of `vstack()` should say `tuple` or `list` of tensors for `tensors` argument
- The doc of `column_stack()` should say `tuple` or `list` of tensors for `tensors` argument
- The doc of `dstack()` should say `tuple` or `list` of tensors for `tensors` argument
- fx.wrap() doesn't really work for things in torch/*
- Converting a numpy array of size larger than 32,768 to a tensor causes a segmentation fault
- tree_map_only_ doesn't seem work as expected
- [TensorDict - compile] dynamo unaware of `set().union()`
- [TensorDict - compile] dynamo doesn't know about `callable(smth)`
- [dynamo] Add support for torch.cuda.FloatTensor()
- Module _dynamo init failed with trition folder in workdir using torch 2.4
- Mask not materialized for embedding backward
- TorchDispatchMode fails on jit.trace
- Encountering HIP OOM error for the baddbmm operator when auto-tune is enabled