Comments (29)
Link to landed trunk PR:
Link to release branch PR:
Criteria Category:
- Low risk critical fix for checkpointing error introduced in 2.3. Fixes #122792
@atalman wrote: Hi @antoinebrl the PR: #126567 was merged, please correct posted cherry-pick
@antoinebrl: Thanks @atalman! I isolated the cherry picked commit now that the previous refactoring is merged.
@huydhn merged
from pytorch.
Please note, we are in the phase:
Phase 2 (after 5/27): Perform extended integration/stability/performance testing based on Release Candidate builds.
We do not accept any new cherry-picks
Link to main PR (if applicable):
Link to 2.3.1 release branch PR:
Criteria Category:
Fixes performance regression issues pytorch/builder#1774 and #124922
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Documentation improvements
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- low risk critical fix
@atalman merged
Link to landed trunk PR (if applicable):
- This is not a PR cherrypick but a revert commit: 51cf57c
Link to release branch PR:
Criteria Category:
- Low risk critical fix
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical Fix - torchdata compatibility
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix for #125109
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix for silent correctness issue: #125135
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix for: #125094
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Fixes doc failure on Release branch
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical Fixes #124850
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Fixes Crash #124868
@huydhn I updated the cherry-pick manually to remove the changes from #124479; let's see if that works.
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix #124335
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix #119607
@huydhn This cherry-pick is complex: from what I see, cherry-picking it blindly won't work. It depends on #122146, which enables dynamo for 3.12, and that stack is not a small one. So I think we need to rework the cherry-pick #126107 manually if we want to go ahead with this (cc @williamwen42 @atalman @malfet @albanD)
@atalman Closed - #126107. Merged - #126235
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Fixes new feature
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Required to fix Lint errors on release
@huydhn merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Fixes to regressions against the 2.3.0 release
@atalman merged
Link to landed trunk PR (if applicable):
*
Link to release branch PR:
Criteria Category:
- Release only - use triton 2.3.1 version rather than current 2.3.0. Revert temp change after triton 2.3.1 release
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Documentation
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Critical fix #119607 torch.compile refleak
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Low risk addition of hipify mappings to enable DeepSpeed transformer extensions on ROCm
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- Fixes to regressions against the 2.3.0 release
@atalman merged
Link to landed trunk PR (if applicable):
- NA
Link to release branch PR:
Criteria Category:
- Release only changes - pin docker image for ROCm CI. Temporary PR. ROCm test jobs were failing with the MIOpen error because a subtle difference crept into the MIOpen kdb files when the docker images were rebuilt. Hence the pin, to make CI jobs green. Will unpin once the kdb issue is resolved.
Link to main PR (if applicable):
Link to 2.3.1 release branch PR:
Criteria Category:
- Low risk critical fix for checkpointing. This PR removes an additional check introduced in torch 2.3 which is actually incorrect and causes checkpointing to fail if at least 1 forward/backward pass has not been run.
@fegin Cherry-pick the test PR #127130. cc.,
@atalman merged
Link to main PR (if applicable):
Link to 2.3.1 release branch PR:
Criteria Category:
- Low risk critical fix for checkpointing. When checkpointing with activation checkpointing, the names of the variables (FQNs) are changed with the tag `_checkpoint_wrapped_module`. This breaks checkpointing if you resume without activation checkpointing.
@wz337 Added unit test for this cherry-pick.
@atalman merged
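To illustrate the FQN mismatch described in this cherry-pick, here is a minimal pure-Python sketch. It is not the code from the PR; the helper `strip_checkpoint_prefix` and the example key names are hypothetical, and the only detail taken from the description above is the `_checkpoint_wrapped_module` tag. It shows why keys saved with activation checkpointing do not match the keys of an unwrapped model, and how normalizing them makes the two sets line up.

```python
# Hypothetical illustration, not PyTorch's implementation or API.
# The tag below is the one named in the cherry-pick description; in a
# fully qualified name (FQN) it appears as one dotted path component.
CHECKPOINT_TAG = "_checkpoint_wrapped_module."


def strip_checkpoint_prefix(state_dict):
    """Return a copy of state_dict with the wrapper tag removed from every key."""
    return {key.replace(CHECKPOINT_TAG, ""): value for key, value in state_dict.items()}


# Keys as they might look when saved with activation checkpointing enabled
# (made-up names and placeholder values).
wrapped = {
    "layers.0._checkpoint_wrapped_module.weight": 1,
    "layers.0._checkpoint_wrapped_module.bias": 2,
    "head.weight": 3,
}

# Keys a model without activation checkpointing would expect.
expected_keys = {"layers.0.weight", "layers.0.bias", "head.weight"}

# Without normalization the key sets differ, so resuming would fail.
mismatch = set(wrapped) != expected_keys

# After stripping the tag the keys match again.
plain = strip_checkpoint_prefix(wrapped)
```

Normalizing names on the save/load path, rather than asking users to rename keys by hand, keeps checkpoints portable between runs with and without activation checkpointing.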
Link to main PR:
Link to 2.3.1 release branch PR:
Criteria Category:
- Low risk critical fix for checkpointing. Torch 2.3 ignores `_extra_state` (whereas prior PyTorch versions correctly handle it), breaking checkpoint loading from prior checkpoints. This also breaks integration with NVIDIA's TransformerEngine.
@atalman merged
Link to landed trunk PR (if applicable):
Link to release branch PR:
Criteria Category:
- CI fix to keep inductor job and pull jobs green in release branch
Closing release tracker. Release published.