
hydra-torch's Introduction

hydra-torch

Configuration classes enabling type-safe PyTorch configuration for Hydra apps.
This repo is a work in progress.

The config dataclasses are generated using configen. Check it out if you want to generate config dataclasses for your own project.

Install:

# For now, please obtain through github. Soon, versioned (per-project) dists will be on PyPI.
pip install git+https://github.com/pytorch/hydra-torch

Example config:

Here is one of many configs available. Notice it uses the defaults defined in the torch function signatures:

from dataclasses import dataclass
from typing import Any

@dataclass
class TripletMarginLossConf:
    _target_: str = "torch.nn.modules.loss.TripletMarginLoss"
    margin: float = 1.0
    p: float = 2.0
    eps: float = 1e-06
    swap: bool = False
    size_average: Any = None
    reduce: Any = None
    reduction: str = "mean"

Importing Convention:

from hydra_configs.<package_name>.path.to.module import <ClassName>Conf

where <package_name> is the package being configured and path.to.module is the path in the original package.

Finding the config for a class is as simple as prepending hydra_configs. to the module path and appending Conf to the class name of the original import, e.g.

#module to be configured
from torch.optim.adam import Adam

#config for the module
from hydra_configs.torch.optim.adam import AdamConf
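
The naming rule above is purely mechanical, so it can be sketched as a small string helper. The function name conf_import_for is hypothetical and not part of hydra-torch, and the helper only builds the import line; it does not check that a config for the given class actually exists.

```python
def conf_import_for(class_path: str) -> str:
    """Build the hydra_configs import line for a fully qualified class path.

    Illustrative helper only (not part of hydra-torch): it applies the
    "prepend hydra_configs., append Conf" rule and nothing more.
    """
    # Split "torch.optim.adam.Adam" into "torch.optim.adam" and "Adam".
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"expected a dotted path like 'pkg.module.Class', got {class_path!r}")
    return f"from hydra_configs.{module_path} import {class_name}Conf"


print(conf_import_for("torch.optim.adam.Adam"))
```

For the Adam example above, this prints the same import line shown in the README.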

Getting Started:

Take a look at our tutorial series:

  1. Basic Tutorial
  2. Intermediate Tutorial (coming soon)
  3. Advanced Tutorial (coming soon)

Other Config Projects:

A list of projects following the hydra_configs convention (please notify us if you have one!):

Pytorch Lightning

License

hydra-torch is licensed under MIT License.

hydra-torch's People

Contributors

jieru-hu, omry, pixelb, romesco

hydra-torch's Issues

Is this project still actively developed?

Is there any plan to keep this project active? I have not seen any active development for quite a long time.
Would be a shame. I think it's very useful to have configs readily available for PyTorch.

HYDRA Arithmetic operations within configs

Consider the configuration files:

# dataset.yaml
# @package _group_
  class_name: dataloaders.datasets.COCODataset
  data_dir: ${storage.data_dir}/Docs
  class_labels: [1, 2]
  num_classes: 2

and

# model.yaml
# @package _group_

  module:
    class_name: segmentation.maskrcnn.models.DateOutliner

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes}

Is there a way to perform arithmetic operations on a parameter, so as to obtain config.model.top.num_classes as 3 (2 + 1)?

I have tried many variations, but I have not managed to get it to work:

Below are examples that won't work:

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes + 1}

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes++} 

[dev] Overview.md

Since this repository contains N packages, each a collection of configs for its corresponding library, we should write up a version-controlled overview of the design decisions and methodology moving forward. Currently this info is scattered across PR reviews, the Zulip channel, and a Google doc.

We will also include info on how we handle version compatibility for package releases.

[dev] Metapackaging

An outstanding concern is how to support installation of all projects as a 'metapackage'. I am not sure of the correct way to do this.

One possibility is a setup.py at the root of the repo which installs the latest of each of the current packages? Thoughts?

[hydra_configs] ModuleNotFoundError: No module named 'hydra_configs'

Thanks for sharing your great work!
I am following the mnist_00.md (basic) tutorial, but I get the error "ModuleNotFoundError: No module named 'hydra_configs'" after importing:

from hydra_configs.torch.optim import AdadeltaConf
from hydra_configs.torch.optim.lr_scheduler import StepLRConf

I installed Hydra using the commands:
pip install hydra
pip install hydra-core

Btw, hopefully you will release next tutorial soon =)))

Thanks!
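
Going by the Install section above, hydra_configs comes from installing hydra-torch itself, so a likely cause is that only Hydra is present in the active environment. A quick way to check which top-level packages the current interpreter can see is sketched below; is_importable is a hypothetical helper name, and it is meant for top-level package names only.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be found by the current interpreter."""
    # find_spec returns None for a top-level name that cannot be located.
    return importlib.util.find_spec(package_name) is not None


for name in ("hydra", "hydra_configs"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If hydra_configs is reported as missing, installing with the pip command from the Install section into the same environment would be the next thing to try.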

Plans for release?

This is an awesome project and is just what I'm looking for to support feiertag. I'd like to start kicking the tires on this repo and was wondering if you have plans to cut a dev release on pypi? In the interim I can add it as a directly-installable lib from git.

Renaming `master` branch to `main`

As a part of a broad effort to avoid insensitive terminology in our software, we are renaming our default branch from master to main. We recognize that this is only a small step, but it is an opportunity to make our project and community more welcoming to historically marginalized communities.

How does this impact my development process?

There should be very little impact. GitHub will surface the branch name change in your fork, if you have one. For new forks, you will automatically have main as the default branch.

We encourage the use of feature branches for local development. The only change in practice is changing which branch your feature branch is started from. When sending Pull Requests on GitHub, the target will default to our main branch, so there are no changes to make there.

I have a lot of tools that depend on master being the upstream branch name. How can I fix that?

master has always been only a default value and a number of projects have used other names for their primary development branch for years. We encourage updating your tooling to instead dynamically determine the branch to use. This article provides insight into how you can do that. Additionally, you can always set up a branch locally of any name to track our main branch.

I'd like to do this for my own projects, do you have any documentation on how this works?

GitHub has published a guide documenting their tooling. We recommend reading that and the accompanying documentation.

If you're a Facebook employee looking to do this for a project you maintain, please reach out to the Open Source Team.

How Best to Use This Library

I have done something similar in a recent project using Hydra/PyTorch and I'm evaluating if it makes sense for me to switch to this (I'm trying to simplify the code by replacing as much as I can with 3rd party libraries), but I'm not entirely sure if it works for my use case.

One question I had immediately after reading the tutorial is the section on instantiating from the configs:

optimizer = Adadelta(lr=cfg.adadelta.lr,
                     rho=cfg.adadelta.rho,
                     eps=cfg.adadelta.eps,
                     weight_decay=cfg.adadelta.weight_decay,
                     params=model.parameters())

Shouldn't this be something like optimizer = hydra.utils.instantiate(cfg.adadelta, params=model.parameters()), so that the user could plug whatever optimizer they wanted into the config? (I would love @omry's feedback on this too, because maybe I'm misunderstanding it.) Or, more flexibly:

@dataclass
class MNISTConf:
    ...
    optimizer: Any = AdadeltaConf()
    scheduler: Any = StepLRConf(step_size=1)
...
optimizer = hydra.utils.instantiate(cfg.optimizer, params=model.parameters())
scheduler = hydra.utils.instantiate(cfg.scheduler, optimizer=optimizer)

When I did this, I had a set of YAML files defining the various options, laid out like this (folder structure):

configs:
  optimizer
      adadelta.yaml
      adam.yaml
      sgd.yaml
  scheduler
      steplr.yaml
      cosineannealing.yaml
experiment.yaml

Then the user can put in their experiment config:

defaults:
    ...
    - optimizer: sgd
    - scheduler: cosineannealing
    ...

and then my code uses hydra.utils.instantiate to make whatever the user wants.
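
To make the pattern concrete without requiring Hydra or PyTorch, here is a minimal sketch of the idea: a config dataclass names its target class in _target_, and a generic function builds whatever the selected config points at. Everything in it is an assumption for illustration: instantiate_sketch is a simplified stand-in and not Hydra's implementation of hydra.utils.instantiate, and FractionConf and TimedeltaConf are hypothetical configs that target standard-library classes in place of optimizers.

```python
import importlib
from dataclasses import dataclass, fields


def instantiate_sketch(conf, **extra_kwargs):
    """Simplified stand-in for hydra.utils.instantiate (illustration only)."""
    # Split "pkg.module.ClassName" into the module path and the class name.
    module_path, _, class_name = conf._target_.rpartition(".")
    target = getattr(importlib.import_module(module_path), class_name)
    # Every field except _target_ becomes a keyword argument of the target.
    kwargs = {f.name: getattr(conf, f.name) for f in fields(conf) if f.name != "_target_"}
    # Call-site arguments (like params=... for an optimizer) override config values.
    kwargs.update(extra_kwargs)
    return target(**kwargs)


@dataclass
class FractionConf:  # hypothetical config, plays the role of e.g. an optimizer config
    _target_: str = "fractions.Fraction"
    numerator: int = 1
    denominator: int = 2


@dataclass
class TimedeltaConf:  # hypothetical config for a second, swappable choice
    _target_: str = "datetime.timedelta"
    days: int = 0
    seconds: int = 30


# The name-to-config mapping plays the role of a config group.
CHOICES = {"fraction": FractionConf, "timedelta": TimedeltaConf}

selected = CHOICES["fraction"](denominator=4)
print(instantiate_sketch(selected))
print(instantiate_sketch(CHOICES["timedelta"](), seconds=90))
```

The calling code never names the class being built, which is what lets the user swap one choice for another purely through configuration.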

So what I would really love to do is replace all the yaml files I wrote with the configs from this repo and keep only the experiment configs. Is this possible to do?

One issue I see with that is that I would need to register, potentially, all of the possible configs this project provides.

Install with Poetry Fails

Currently

poetry add git+https://github.com/pytorch/hydra-torch

fails with

  RuntimeError

  The dependency name for hydra-configs-torch does not match the actual package's name: hydra-torch

  at ~/.local/lib/python3.6/site-packages/poetry/puzzle/provider.py:293 in get_package_from_directory
      289│         if name and name != package.name:
      290│             # For now, the dependency's name must match the actual package's name
      291│             raise RuntimeError(
      292│                 "The dependency name for {} does not match the actual package's name: {}".format(
    → 293│                     name, package.name
      294│                 )
      295│             )
      296│ 
      297│         return package

It may be worth looking into whether this is a Poetry problem or a packaging problem.

pytorch/mmdetection distributed training with multi-machines with hydra

Hi all,

I'm new to Hydra, and I have run into a problem while developing my own project.

My project is based on mmdetection, which has its own YAML configuration system, but I am working on integrating Hydra into the project. To train the model, distributed training is necessary (not only data parallelism). Is there any tutorial or documentation on how to do distributed training across multiple machines with Hydra?

Thanks all ;-)

[tutorial] Intermediate MNIST

Pick up where we left off in the Basic Tutorial.

To address:

  • Configuring the model
  • Configuring the dataset
  • Swapping in and out different Optimizers/Schedulers

Another thing to think about diving further into:
Quoting @omry:

Complexity here has multiple dimensions:
Config style:
* File based
* Dataclass based
* Dataclass as schema for files
Config modeling:
* Single config
* Config groups

Single-node distributed processing with Hydra

Distributed processing with Hydra in single-node multi-GPU setting, as mentioned here.

  • Explain PyTorch's distributed processing/training.
  • Simple demonstration of various distributed communication primitives.
  • Incorporate Hydra into PyTorch's distributed processing.
  • Using multirun to run multiple processes.

This will serve as an introductory example for #38.

[hydra-configs-torch][tests] Optimizers and LR schedulers

https://github.com/facebookresearch/hydra-torch/blob/691a390abd2edf764f9431a56b8058ff2c12eb0c/tests/test_instantiate.py#L40

Check minimal tests. Is this the right way to confirm our configs instantiate the correct object?

From previous PR discussion:

Ideally I wanted these tests to do 3 things:

  1. Ensure the config exists.
  2. Be a valid input to instantiate (and subsequently get an object back).
  3. Show that this object works as expected.

For an optimizer, taking a step seems to prove that it is functioning correctly. Comparing the output of your cfg optimizer and a directly called optimizer is kind of a bonus in that if it didn't work, it doesn't mean we didn't instantiate an optimizer correctly given the config. It just means the optimizer has nondeterministic behavior.

[dev] Structure branches for release.

We intend to release versions of singular project packages - think hydra-configs-torch or hydra-configs-torchvision via release branches that get tagged for upload to PyPI.

This enables users to get the exact version they need by specifying:
pip install hydra-configs-torch==1.6 or pip install hydra-configs-torch==1.7.

Master (the top level) will remain the metapackage hydra-torch and will include the most up-to-date configs for each package. For example, at present, if the user wrote:
pip install hydra-torch,
they would end up with the packages:

hydra-configs-torch==1.7
hydra-configs-torchvision==0.8
