
Comments (29)

antoinebrl commented on September 24, 2024

⚠️ #126567 must be merged before

Link to landed trunk PR:

Link to release branch PR:

Criteria Category:

  • Low risk critical fix for checkpointing error introduced in 2.3. Fixes #122792

@atalman wrote: Hi @antoinebrl the PR: #126567 was merged, please correct posted cherry-pick
@antoinebrl: Thanks @atalman! I isolated the cherry picked commit now that the previous refactoring is merged.
@huydhn merged

from pytorch.

atalman commented on September 24, 2024

Please note, we are in the phase:
Phase 2 (after 5/27): Perform extended integration/stability/performance testing based on Release Candidate builds.
We do not accept any new cherry-picks


snadampal commented on September 24, 2024

Link to main PR (if applicable):

Link to 2.3.1 release branch PR:

Criteria Category:
Fixes performance regression issues pytorch/builder#1774 and #124922


@atalman merged


saitcakmak commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Documentation improvements

@atalman merged


soulitzer commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • low risk critical fix

@atalman merged


eqy commented on September 24, 2024

Link to landed trunk PR (if applicable):

  • This is not a PR cherrypick but a revert commit: 51cf57c

Link to release branch PR:

Criteria Category:

  • Low risk critical fix

@atalman merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical Fix - torchdata compatibility

@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for silent correctness issue: #125135

@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes doc failure on Release branch

@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn I updated the cherry-pick manually to remove changes from #124479; let's see if that works.
@atalman merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@huydhn This cherry-pick is complex from what I see, because cherry-picking it blindly won't work. It depends on #122146, which enables dynamo for 3.12, and that stack is not a small one. So I think we need to rework the cherry-pick #126107 manually if we want to go ahead with this (cc @williamwen42 @atalman @malfet @albanD)

@atalman Closed - #126107 . Merged - #126235


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes new feature

@huydhn merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Required to fix Lint errors on release

@huydhn merged


wanchaol commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes to regressions against the 2.3.0 release

@atalman merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):
*

Link to release branch PR:

Criteria Category:

  • Release only - use triton 2.3.1 version rather than the current 2.3.0. Revert the temporary change after the triton 2.3.1 release


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Documentation

@atalman merged


williamwen42 commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix #119607 torch.compile refleak

@atalman merged


jithunnair-amd commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Low risk addition of hipify mappings to enable DeepSpeed transformer extensions on ROCm

@atalman merged


weifengpy commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes to regressions against the 2.3.0 release

@atalman merged


atalman commented on September 24, 2024

Link to landed trunk PR (if applicable):

  • NA

Link to release branch PR:

Criteria Category:

  • Release only changes - pin docker image for ROCm CI. Temporary PR. ROCm test jobs were failing with the MIOpen error because a subtle difference crept into the MIOpen kdb files when the docker images were rebuilt. Hence the pin, to make CI jobs green. Will unpin once the kdb issue is resolved

@atalman merged #126452


mvpatel2000 commented on September 24, 2024

Link to main PR (if applicable):

Link to 2.3.1 release branch PR:

Criteria Category:

  • Low risk critical fix for checkpointing. This PR removes an additional check introduced in torch 2.3 which is actually incorrect and causes checkpointing to fail if at least 1 forward/backward pass has not been run.

@fegin Cherry-pick the test PR #127130. cc.,
@atalman merged


mvpatel2000 commented on September 24, 2024

Link to main PR (if applicable):

Link to 2.3.1 release branch PR:

Criteria Category:

  • Low risk critical fix for checkpointing. When checkpointing with activation checkpointing, the names of the variables (FQNs) are changed with the tag _checkpoint_wrapped_module. This breaks checkpointing if you resume without activation checkpointing.
  • @wz337 Added unit test for this cherry-pick.
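
To illustrate the FQN mismatch described above, here is a minimal sketch of normalizing state-dict keys so that a checkpoint saved with activation checkpointing lines up with a model that does not use it. Only the tag `_checkpoint_wrapped_module` comes from this comment; the helper names `strip_wrapper_tag` and `normalize_state_dict_keys` are hypothetical, and this is not claimed to be how the cherry-picked PR implements the fix.

```python
# Tag taken from the comment above; everything else is an illustrative assumption.
CHECKPOINT_WRAPPER_TAG = "_checkpoint_wrapped_module"


def strip_wrapper_tag(fqn: str, tag: str = CHECKPOINT_WRAPPER_TAG) -> str:
    """Drop every dotted path component equal to `tag` from a fully qualified name."""
    return ".".join(part for part in fqn.split(".") if part != tag)


def normalize_state_dict_keys(state_dict: dict) -> dict:
    """Return a new dict whose keys no longer contain the wrapper tag."""
    normalized = {}
    for fqn, value in state_dict.items():
        clean = strip_wrapper_tag(fqn)
        if clean in normalized:
            # Two different source keys collapsed to the same name; refuse to guess.
            raise KeyError(f"duplicate key after normalization: {clean}")
        normalized[clean] = value
    return normalized


if __name__ == "__main__":
    saved = {"layers.0._checkpoint_wrapped_module.weight": 1, "head.bias": 2}
    print(normalize_state_dict_keys(saved))
```

With keys normalized this way, the saved and resumed models would agree on names whether or not the wrapper is present at load time.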

@atalman merged


mvpatel2000 commented on September 24, 2024

Link to main PR:

Link to 2.3.1 release branch PR:

Criteria Category:

  • Low risk critical fix for checkpointing. Torch 2.3 ignores _extra_state (whereas prior PyTorch versions correctly handle it), breaking checkpoint loading from prior checkpoints. This also breaks integration with Nvidia's TransformerEngine.
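
One way to spot this kind of regression is to compare the keys of a saved checkpoint with what a load actually restored. The sketch below is a hypothetical check over plain dictionaries, assuming extra state is stored under keys ending in `_extra_state`, as the name in this comment suggests; `missing_extra_state_keys` is an illustrative helper, not part of PyTorch.

```python
EXTRA_STATE_SUFFIX = "_extra_state"


def missing_extra_state_keys(saved: dict, loaded: dict) -> list:
    """List `_extra_state` keys present in `saved` but absent from `loaded`."""
    return sorted(
        key
        for key in saved
        if (key == EXTRA_STATE_SUFFIX or key.endswith("." + EXTRA_STATE_SUFFIX))
        and key not in loaded
    )


if __name__ == "__main__":
    saved = {"fc.weight": 1, "fc._extra_state": {"scale": 2}}
    loaded = {"fc.weight": 1}  # extra state silently dropped on load
    print(missing_extra_state_keys(saved, loaded))
```

A non-empty result would indicate that extra state was dropped somewhere between saving and loading.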

@atalman merged


huydhn commented on September 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • CI fix to keep inductor job and pull jobs green in release branch


atalman commented on September 24, 2024

closing release tracker. release published
