
Comments (9)

eracah avatar eracah commented on August 18, 2024 1

Today I observed that some arguments of the CheckpointSaver are not exposed through the Trainer (remote_file_name and latest_remote_file_name). This might be another argument towards decorrelating these two entities by forcing the user to instantiate the callback.

It's a good point, but I don't think we should force users to make a callback for a very basic, non-custom thing.
remote_file_name is passed in indirectly: it is derived by parsing save_folder when that value includes an object store prefix. We don't allow users to customize latest_remote_file_name because exposing such a fine-grained knob was a mistake to begin with.
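To illustrate what parsing an object store prefix out of save_folder can look like, here is a minimal sketch built on the standard library's urlparse. The function name split_save_folder and the example bucket are hypothetical, and this is not Composer's actual parsing code, only an approximation of the idea.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_save_folder(save_folder: str) -> Tuple[str, str, str]:
    """Split a save folder into (backend, bucket, path).

    Hypothetical helper for illustration; not part of Composer.
    A plain local path yields empty backend and bucket strings.
    """
    parsed = urlparse(save_folder)
    # Require both a scheme and a bucket so that local paths
    # (including Windows drive letters) are treated as local.
    if parsed.scheme and parsed.netloc:
        return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
    return "", "", save_folder


remote = split_save_folder("s3://my-bucket/run1/checkpoints")
local = split_save_folder("checkpoints/run1")
```

With a remote prefix, the path component is what a remote file name would be built from; with a local path, nothing is uploaded.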

from composer.

mvpatel2000 avatar mvpatel2000 commented on August 18, 2024

Seems reasonable to me. I would emit a log.info that it's being skipped for auto-creation if the requisite args are passed but an existing callback is present. We'd love a community PR for this!

Also tagging @eracah in case you have any objections
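A rough sketch of the behavior discussed here, assuming the Trainer receives both a list of callbacks and an optional save_folder. The CheckpointSaver class below is a stand-in with a single field, and resolve_checkpoint_savers is a hypothetical helper name; neither reflects the real Composer signatures.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

log = logging.getLogger(__name__)


@dataclass
class CheckpointSaver:
    """Stand-in for the real callback; only the folder is modeled."""
    folder: str


def resolve_checkpoint_savers(callbacks: List[object],
                              save_folder: Optional[str] = None) -> List[object]:
    """Return the callbacks to use, auto-creating a saver only if none exists."""
    existing = [cb for cb in callbacks if isinstance(cb, CheckpointSaver)]
    if existing:
        if save_folder is not None:
            # The user supplied both a callback and Trainer-level arguments:
            # keep the user's callback and report that auto-creation is skipped.
            log.info("Found an existing CheckpointSaver; skipping auto-creation "
                     "from save_folder=%s", save_folder)
        return list(callbacks)
    if save_folder is not None:
        return list(callbacks) + [CheckpointSaver(folder=save_folder)]
    return list(callbacks)
```

The user's own callback always wins in this sketch, so passing both never produces two savers by accident.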


antoinebrl avatar antoinebrl commented on August 18, 2024

Cool! I can start working on it tomorrow.

Out of curiosity, when is the next release expected? Maybe I can get this merged before it's out.


mvpatel2000 avatar mvpatel2000 commented on August 18, 2024

Cool! I can start working on it tomorrow.

Out of curiosity, when is the next release expected? Maybe I can get this merged before it's out.

We are planning one in 1-2 weeks. We aim for a release at least once a month, plus an optional mid-month release if enough PRs have accumulated. We can try to make the release process and cadence more public if that would be helpful.


eracah avatar eracah commented on August 18, 2024

@antoinebrl, this is a good idea! One other thing to keep in mind is that we may have a use case where someone wants two checkpoint savers, e.g. they save some checkpoints locally at a different frequency than they save others remotely.
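The two-saver scenario might look roughly like the sketch below. SaverSketch is a hypothetical stand-in that models only a folder and an interval in batches, and the bucket name is made up; the real CheckpointSaver takes different arguments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaverSketch:
    """Hypothetical stand-in: a destination plus a save interval in batches."""
    folder: str
    every_n_batches: int

    def should_save(self, batch: int) -> bool:
        return batch > 0 and batch % self.every_n_batches == 0


# Frequent local checkpoints, infrequent remote ones.
local_saver = SaverSketch(folder="checkpoints/local", every_n_batches=100)
remote_saver = SaverSketch(folder="s3://my-bucket/checkpoints", every_n_batches=1000)


def destinations(batch: int, savers: List[SaverSketch]) -> List[str]:
    """List the folders that would receive a checkpoint at this batch."""
    return [s.folder for s in savers if s.should_save(batch)]
```

Any logic that detects "an existing CheckpointSaver" therefore has to cope with more than one instance in the callback list.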


eracah avatar eracah commented on August 18, 2024

Another thing to note is that in the next few months we will be moving to a config-based setup for checkpointing, where you specify one or more configs (usually one) for checkpoint saving and each config is used to create a CheckpointSaver. This obviates the need for users to create their own CheckpointSaver (unless they are doing something very custom), so I fear this PR may only be useful for a short time.


antoinebrl avatar antoinebrl commented on August 18, 2024

@mvpatel2000, depending on the criticality of the feature we contribute, we either wait for the release or develop a quick hack. Having an ETA for the next release would help us plan better, but I also understand that committing to a specific release cycle might increase the maintenance burden on your side. In the absence of a fixed schedule, may I suggest pinning an issue once you know when the next release will be? That would give us a few days' notice.

@eracah, I will definitely take into account scenarios involving multiple CheckpointSavers, and I appreciate you sharing the roadmap for this. Are the roadmaps and design documents available publicly? I would be happy to share ideas and feedback or consider implementing some of them.

How will flexibility be affected once the config object is introduced? For instance, I created a custom CheckpointSaver that saves only after an evaluation. This synchronization ensures each intermediate evaluation is reproducible and that the intermediate checkpoints can be compared performance-wise. This could be extended to create checkpoints only when a certain metric is improving. As you can see, using different checkpointing strategies involves not just different configurations but also implementing custom rules.
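As a rough illustration of such a custom rule, the sketch below saves only when an evaluation metric improves. The class name BestMetricSaver and the on_eval_end hook are hypothetical and do not mirror Composer's callback interface; the actual checkpoint write is replaced by recording the step.

```python
from typing import Dict, List, Optional


class BestMetricSaver:
    """Hypothetical saver that checkpoints only when a metric improves."""

    def __init__(self, metric_name: str, higher_is_better: bool = True) -> None:
        self.metric_name = metric_name
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None
        self.saved_steps: List[int] = []

    def on_eval_end(self, step: int, metrics: Dict[str, float]) -> bool:
        """Called after each evaluation; returns True if a checkpoint was taken."""
        value = metrics[self.metric_name]
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
            # A real callback would write the checkpoint here.
            self.saved_steps.append(step)
        return improved


saver = BestMetricSaver("accuracy")
results = [
    saver.on_eval_end(100, {"accuracy": 0.50}),
    saver.on_eval_end(200, {"accuracy": 0.40}),
    saver.on_eval_end(300, {"accuracy": 0.60}),
]
```

The decision depends on state accumulated across evaluations, which is the kind of rule a purely declarative config cannot express on its own.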

From the user's perspective, given the simplicity of the CheckpointSaver API, there's not much difference between passing all values to the Trainer, assembling them inside a config structure, or directly instantiating the corresponding object. My preference is for the latter option: letting the user instantiate the callback(s) themselves. Introducing a config object expands the API surface without much benefit (and can even reduce flexibility as pointed out above). Once the CheckpointSaver instantiation is made optional within the Trainer, it raises the question of whether the Trainer should still accept checkpoint-related values. Potentially, nine pass-through arguments of the Trainer (save_filename, save_weights_only, etc.) could be removed, which would enhance the separation of responsibilities and simplify the API. wdyt?


antoinebrl avatar antoinebrl commented on August 18, 2024

PR is ready for review: #3334


antoinebrl avatar antoinebrl commented on August 18, 2024

Today I observed that some arguments of the CheckpointSaver are not exposed through the Trainer (remote_file_name and latest_remote_file_name). This might be another argument towards decorrelating these two entities by forcing the user to instantiate the callback.

