pytorch / hydra-torch

Configuration classes enabling type-safe PyTorch configuration for Hydra apps

License: MIT License

Python 100.00%

hydra-torch's Introduction

hydra-torch

Configuration classes enabling type-safe PyTorch configuration for Hydra apps.
This repo is work in progress.

The config dataclasses are generated using configen. Check it out if you want to generate config dataclasses for your own project.

Install:

# For now, please obtain through github. Soon, versioned (per-project) dists will be on PyPI.
pip install git+https://github.com/pytorch/hydra-torch

Example config:

Here is one of many configs available. Notice it uses the defaults defined in the torch function signatures:

from dataclasses import dataclass
from typing import Any


@dataclass
class TripletMarginLossConf:
    _target_: str = "torch.nn.modules.loss.TripletMarginLoss"
    margin: float = 1.0
    p: float = 2.0
    eps: float = 1e-06
    swap: bool = False
    size_average: Any = None
    reduce: Any = None
    reduction: str = "mean"
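The `_target_` field is what allows a config like this to be turned back into an object. The sketch below illustrates the idea using only the standard library. `TimedeltaConf` and `instantiate_from_conf` are hypothetical names made up for this illustration, and the function is a simplified stand-in for `hydra.utils.instantiate`, not its actual implementation.

```python
import importlib
from dataclasses import asdict, dataclass
from datetime import timedelta


@dataclass
class TimedeltaConf:
    # Hypothetical config shaped like the generated ones: a dotted path to
    # the target plus one field per constructor argument.
    _target_: str = "datetime.timedelta"
    days: float = 0
    seconds: float = 0


def instantiate_from_conf(conf, **overrides):
    """Import the class named by `_target_` and call it with the other fields."""
    kwargs = asdict(conf)
    module_path, _, class_name = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), class_name)
    kwargs.update(overrides)  # call-site arguments win over config values
    return target(**kwargs)


delta = instantiate_from_conf(TimedeltaConf(days=1), seconds=30)
```

Passing extra keyword arguments at the call site mirrors how runtime-only values (such as model parameters for an optimizer) are usually supplied.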

Importing Convention:

from hydra_configs.<package_name>.path.to.module import <ClassName>Conf

where <package_name> is the package being configured and path.to.module is the path in the original package.

Inferring where a config is located is as simple as prepending hydra_configs. to the original module path and appending Conf to the class name, e.g.:

#module to be configured
from torch.optim.adam import Adam

#config for the module
from hydra_configs.torch.optim.adam import AdamConf
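The naming rule is mechanical enough to express as a one-line helper. This is only an illustration of the convention; `conf_import_for` is a hypothetical name and is not part of hydra-torch.

```python
def conf_import_for(module_path: str, class_name: str) -> str:
    """Build the import statement for the config of `module_path.class_name`."""
    return f"from hydra_configs.{module_path} import {class_name}Conf"


statement = conf_import_for("torch.optim.adam", "Adam")
print(statement)  # from hydra_configs.torch.optim.adam import AdamConf
```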

Getting Started:

Take a look at our tutorial series:

Basic Tutorial
Intermediate Tutorial (coming soon)
Advanced Tutorial (coming soon)

Other Config Projects:

A list of projects following the hydra_configs convention (please notify us if you have one!):

Pytorch Lightning

License

hydra-torch is licensed under MIT License.

hydra-torch's Issues

Is this project still actively developed?

Is there any plan to keep this project active? I have not seen any active development for quite a long time.
Would be a shame. I think it's very useful to have configs readily available for PyTorch.

HYDRA Arithmetic operations within configs

Consider the configuration files:

# dataset.yaml
# @package _group_
  class_name: dataloaders.datasets.COCODataset
  data_dir: ${storage.data_dir}/Docs
  class_labels: [1, 2]
  num_classes: 2

and

# model.yaml
# @package _group_

  module:
    class_name: segmentation.maskrcnn.models.DateOutliner

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes}

Is there a way to perform arithmetic operations on a parameter, so as to obtain config.model.top.num_classes as 3 (2+1)?

I have tried many variations, but I have not managed to get it to work:

Below are examples that won't work:

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes + 1}

  top:
    class_name: torchvision.models.detection.mask_rcnn.MaskRCNNPredictor
    params:
      num_classes: ${dataset.num_classes++}

[dev] Auto register Configs upon import of hydra_configs.module.name

Write functions of the form:
hydra_configs.torch.register(), hydra_configs.torch.optim.register(), etc.

Within these functions, call config store API:

from hydra.core.config_store import ConfigStore

cs = ConfigStore.instance()
cs.store(name="adamconf", node=AdamConf)

Call these in __init__.py for the module.

[dev] Move specific library requirements into corresponding package folders

e.g. hydra-configs-torchvision for the 0.7 release should require torchvision==0.7, torch==1.6, but other packages like hydra-configs-torch should not depend on torchvision at all.

Make sure noxfile.py is updated to use these new requirements in testing environments.

Fix str defaults in lr_scheduler configen output.

Configen does not correctly output defaults of str type when there are no type annotations in signature. Manually fix this in configs.

Also fixed this upstream in configen and filed PR:
facebookresearch/hydra#1072

[dev] Overview.md

Since this repository contains N packages each corresponding to a collection of configs for their corresponding libraries, we should write up a version controlled overview of the design decisions and methodology moving forward. Currently this info is scattered within PR reviews, the zulip channel, and a google doc.

We will also include info on how we handle version compatibility for package releases.

[hydra-configs-torchvision] Configs for MNIST datasets

Source: https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py

Using configen, create a subset of the torchvision datasets for MNIST. Pair with tests.

These should be along the same lines as @tkornuta-nvidia's prototype in NeMo:
https://github.com/NVIDIA/NeMo/blob/main-vis-res/nemo/collections/cv/datasets/configs.py

[hydra-configs-torch] Configs for Linear / Conv Modules

Generate these module confs, brainstorm robust testing method.

[dev] Document outlining how to create external `hydra-<library-name>` project repos.

This is low priority since much of this can be achieved by reading the documentation in this larger repo, but eventually it might be nice to make it stand alone.

I am also considering the idea of creating a 'template', but only after this repo is past release and healthy.

[dev] Metapackaging

A hanging concern is how to support installation of all projects as a 'metapackage'. Not sure of the correct way to do this.

One possibility is a setup.py at the root of the repo which installs the latest of each of the current packages? Thoughts?

[examples][tests] MNIST

Tests for examples from tutorial: examples/mnist_00.py and examples/mnist_01.py.

Something along these lines:
https://github.com/facebookresearch/hydra/tree/master/tests/test_examples

[hydra_configs] ModuleNotFoundError: No module named 'hydra_configs'

Thanks for sharing your great work!
I am following the mnist_00.md (basic) tutorial, but I get the error "ModuleNotFoundError: No module named 'hydra_configs'" after these imports:

from hydra_configs.torch.optim import AdadeltaConf
from hydra_configs.torch.optim.lr_scheduler import StepLRConf

I installed Hydra using the commands:
pip install hydra
pip install hydra-core

Btw, hopefully you will release next tutorial soon =)))

Thanks!

Plans for release?

This is an awesome project and is just what I'm looking for to support feiertag. I'd like to start kicking the tires on this repo and was wondering if you have plans to cut a dev release on pypi? In the interim I can add it as a directly-installable lib from git.

configure CI to run nox

Renaming `master` branch to `main`

As a part of a broad effort to avoid insensitive terminology in our software, we are renaming our default branch from master to main. We recognize that this is only a small step, but it is an opportunity to make our project and community more welcoming to historically marginalized communities.

How does this impact my development process?

There should be very little impact. GitHub will surface the branch name change in your fork, if you have one. For new forks, you will automatically have main as the default branch.

We encourage the use of feature branches for local development. The only change in practice is changing which branch your feature branch is started from. When sending Pull Requests on GitHub, the target will default to our main branch, so there are no changes to make there.

I have a lot of tools that depend on `master` being the upstream branch name. How can I fix that?

master has always been only a default value and a number of projects have used other names for their primary development branch for years. We encourage updating your tooling to instead dynamically determine the branch to use. This article provides insight into how you can do that. Additionally, you can always set up a branch locally of any name to track our main branch.
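As one way of determining the branch dynamically, tooling can ask git which branch `origin/HEAD` points to. The sketch below assumes the remote is named `origin` and that `origin/HEAD` is set in the local clone (it can be refreshed with `git remote set-head origin --auto`). The function names are hypothetical.

```python
import subprocess


def parse_default_branch(symbolic_ref: str) -> str:
    """Extract the branch name from e.g. 'refs/remotes/origin/main'."""
    prefix = "refs/remotes/origin/"
    ref = symbolic_ref.strip()
    if not ref.startswith(prefix):
        raise ValueError(f"unexpected ref: {ref!r}")
    return ref[len(prefix):]


def default_branch(repo_dir: str = ".") -> str:
    """Ask git which branch origin/HEAD points to in the clone at `repo_dir`."""
    result = subprocess.run(
        ["git", "symbolic-ref", "refs/remotes/origin/HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_default_branch(result.stdout)
```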

I'd like to do this for my own projects, do you have any documentation on how this works?

GitHub has published a guide documenting their tooling. We recommend reading that and the accompanying documentation.

If you're a Facebook employee looking to do this for a project you maintain, please reach out to the Open Source Team.

[hydra-configs-torch] Restructure configen directory for configen>=0.9.0dev8

Will do this after outstanding config PRs are merged to limit possible conflicts.

Implement example: mnist/main.py

Implement: https://github.com/pytorch/examples/blob/master/mnist/main.py
using the generated structured configs as schema.

Consider what the most basic form of the example should be. Then incrementally show the power of additional features hydra enables.

[dev][configen] Investigate proper subclassing during generation.

e.g. if MNIST inherits from Dataset, MNISTConf should probably be a subclass of DatasetConf.
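A sketch of the shape such generated output could take, written by hand rather than produced by configen. The field lists are abbreviated and illustrative, and `"???"` is OmegaConf's marker for a mandatory value.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DatasetConf:
    _target_: str = "torch.utils.data.Dataset"


@dataclass
class MNISTConf(DatasetConf):
    # Overrides the inherited _target_ and adds MNIST-specific arguments
    # (abbreviated for illustration).
    _target_: str = "torchvision.datasets.MNIST"
    root: str = "???"
    train: bool = True
    transform: Any = None
    download: bool = False


conf = MNISTConf(root="data")
```

One practical benefit of mirroring the hierarchy is that a field typed as `DatasetConf` in a larger schema would accept any dataset config.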

Fix black within nox to lint instead of format

The nox session should run black . --check instead of just black.

How Best to Use This Library

I have done something similar in a recent project using Hydra/PyTorch and I'm evaluating if it makes sense for me to switch to this (I'm trying to simplify the code by replacing as much as I can with 3rd party libraries), but I'm not entirely sure if it works for my use case.

One question I had immediately after reading the tutorial is the section on instantiating from the configs:

optimizer = Adadelta(lr=cfg.adadelta.lr,
                     rho=cfg.adadelta.rho,
                     eps=cfg.adadelta.eps,
                     weight_decay=cfg.adadelta.weight_decay,
                     params=model.parameters())

Shouldn't this be something like optimizer = hydra.utils.instantiate(cfg.adadelta, params=model.parameters()), so that the user can plug whatever optimizer they want into the config? (I would love @omry's feedback on this too, because maybe I'm misunderstanding it.) Or, more flexibly:

@dataclass
class MNISTConf:
    ...
    optimizer: Any = AdadeltaConf()
    scheduler: Any = StepLRConf(step_size=1)
...
optimizer = hydra.utils.instantiate(cfg.optimizer, params=model.parameters())
scheduler = hydra.utils.instantiate(cfg.scheduler, optimizer=optimizer)

When I did this, I had a set of YAML files defining the various options, in a folder structure like this:

configs:
  optimizer
      adadelta.yaml
      adam.yaml
      sgd.yaml
  scheduler
      steplr.yaml
      cosineannealing.yaml
experiment.yaml

Then the user can put in their experiment config:

defaults:
    ...
    - optimizer: sgd
    - scheduler: cosineannealing
    ...

and then my code uses hydra.utils.instantiate to make whatever the user wants.

So what I would really love to do is replace all the yaml files I wrote with the configs from this repo and keep only the experiment configs. Is this possible to do?

One issue I see with that is that I would need to register, potentially, all of the possible configs this project provides.

Single-node ImageNet DDP

An example of DDP for ImageNet using multirun, as discussed with @omry in the Hydra repo.

[hydra-configs-torch] Configs for Losses

Generate these confs. Write tests.

[hydra-configs-torch][tests] TensorDatasetConf

This test requires improvements to hydra's instantiate, namely the ability to instantiate with non-keyword passthrough arguments.

[hydra-configs-torch][tests] DistributedSamplerConf

In general, we need to discuss how to test instantiation of 'Distributed' classes. This is one class we can configure, but have not experimented with tests for yet.

[hydra-configs-torchvision][tests] MNIST datasets

Note, testing instantiation for these requires passing _check_exists: https://github.com/pytorch/vision/blob/v0.7.0/torchvision/datasets/mnist.py#L119

A solution might be to include a 'dummy' MNIST data folder in tests, and symlink the rest of the datasets paths to this folder. This eliminates holding data in the repo.
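The dummy-folder idea could be sketched as follows, using only the standard library. It assumes the layout checked by torchvision 0.7's `_check_exists`, namely `<root>/<ClassName>/processed/training.pt` and `test.pt`; the function name and the list of variants are hypothetical. The empty placeholder files only satisfy the existence check, so a real test would likely need to write small valid tensor files instead.

```python
import os
import tempfile


def make_dummy_mnist_root(root: str, variants=("FashionMNIST", "KMNIST")) -> str:
    """Create <root>/MNIST/processed/{training,test}.pt and symlink variants to it."""
    base = os.path.abspath(os.path.join(root, "MNIST"))
    processed = os.path.join(base, "processed")
    os.makedirs(processed, exist_ok=True)
    for name in ("training.pt", "test.pt"):
        # Empty placeholders: enough for an existence check, not for loading.
        open(os.path.join(processed, name), "wb").close()
    for variant in variants:
        link = os.path.join(root, variant)
        if not os.path.lexists(link):
            # Each variant folder points at the single dummy MNIST folder.
            os.symlink(base, link, target_is_directory=True)
    return root


with tempfile.TemporaryDirectory() as tmp:
    make_dummy_mnist_root(tmp)
    found = os.path.exists(os.path.join(tmp, "KMNIST", "processed", "test.pt"))
```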

[hydra-configs-torch][tests] Losses

Write tests to verify instantiation of loss classes works.

Install with Poetry Fails

Currently

poetry add git+https://github.com/pytorch/hydra-torch

fails with

  RuntimeError

  The dependency name for hydra-configs-torch does not match the actual package's name: hydra-torch

  at ~/.local/lib/python3.6/site-packages/poetry/puzzle/provider.py:293 in get_package_from_directory
      289│         if name and name != package.name:
      290│             # For now, the dependency's name must match the actual package's name
      291│             raise RuntimeError(
      292│                 "The dependency name for {} does not match the actual package's name: {}".format(
    → 293│                     name, package.name
      294│                 )
      295│             )
      296│ 
      297│         return package

may be worth looking into if this is a poetry problem or a packaging problem

pytorch/mmdetection distributed training with multi-machines with hydra

Hi all,

I'm new to Hydra and have run into a problem while developing my own project.

My project is based on mmdetection, which has its own YAML configuration system, and I am working on integrating Hydra into it. Training the model requires distributed training (not only data parallelism). Is there any tutorial or documentation on how to do distributed training across multiple machines with Hydra?

Thanks all ;-)

[tutorial] Intermediate MNIST

Pick up where we left off in the Basic Tutorial.

To address:

Configuring the model
Configuring the dataset
Swapping in and out different Optimizers/Schedulers

Another thing to think about diving further into:
Quoting @omry:

Complexity here has multiple dimensions:
Config style:
* File based
* Dataclass based
* Dataclass as schema for files
Config modeling:
* Single config
* Config groups

Single-node distributed processing with Hydra

Distributed processing with Hydra in single-node multi-GPU setting, as mentioned here.

Explain PyTorch's distributed processing/training.
Simple demonstration of various distributed communication primitives.
Incorporate Hydra into PyTorch's distributed processing.
Using multirun to run multiple processes.

This will serve as an introductory example for #38.

[hydra-configs-torchvision] Configs for Transforms

Blockers on MNIST Intermediate+ Tutorial

Add pytest to nox sessions

torch/optim - restructure to mirror torch imports structure

Reconfigure the generation to output the configs to the same import hierarchy as torch.

Allow for 'group imports' like in: https://github.com/pytorch/pytorch/blob/master/torch/optim/__init__.py

[hydra-configs-torch] Configs for Activations

Generate these confs. Write tests.

torch/optim - LR Schedulers

Generate scheduler configs (needed for MNIST example).

[hydra-configs-torchvision] Add ConfigStore registration function to torchvision configs.

See #72 for discussion and template on how to do this.

Missing copyright header

https://github.com/pytorch/hydra-torch/blob/master/examples/mnist_00.py is missing a copyright header.
@romesco, can you fix this and check why it passed lint?

[hydra-configs-torchvision] Configs for ImageNet models

Adding these as they might be useful for configuring 'backbones' for larger compositional models.

[hydra-configs-torch] Configs for Datasets and Samplers

https://github.com/pytorch/pytorch/tree/master/torch/utils/data

Add Dataset, DataLoader, Sampler, and DistributedSampler configs.

[hydra-configs-torchvision][tests] Configs for Transforms

Tests accompanying #57

[hydra-configs-torch][tests] Optimizers and LR schedulers

https://github.com/facebookresearch/hydra-torch/blob/691a390abd2edf764f9431a56b8058ff2c12eb0c/tests/test_instantiate.py#L40

Check minimal tests. Is this the right way to confirm our configs instantiate the correct object?

From previous PR discussion:

Ideally I wanted these tests to do 3 things:

Ensure the config exists.

Be a valid input to instantiate (and subsequently get an object back).

Show that this object works as expected.

For an optimizer, taking a step seems to prove that it is functioning correctly. Comparing the output of your cfg optimizer and a directly called optimizer is kind of a bonus in that if it didn't work, it doesn't mean we didn't instantiate an optimizer correctly given the config. It just means the optimizer has nondeterministic behavior.

[hydra-configs-torch][tests] Dataset and Sampler

Add instantiation tests in pytest for these configs.

[hydra-configs-torch] Add version mismatch warning in hydra_configs/torch/__init__.py

Assuming this approach is agreed upon in the ongoing torchvision datasets PR, do this for hydra-configs-torch as well.
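A minimal sketch of such a check, under the assumption that a mismatch means a differing major.minor version. The function name and the warning text are hypothetical; in `__init__.py` the first argument would presumably be `torch.__version__`.

```python
import warnings


def warn_on_version_mismatch(installed: str, configured: str, library: str = "torch") -> bool:
    """Warn when the installed major.minor differs from the one the configs target."""

    def major_minor(version: str):
        # Drop local build tags such as "+cu101", then keep major and minor.
        return tuple(version.split("+")[0].split(".")[:2])

    mismatch = major_minor(installed) != major_minor(configured)
    if mismatch:
        warnings.warn(
            f"Configs were generated for {library} {configured}, "
            f"but {library} {installed} is installed.",
            UserWarning,
        )
    return mismatch
```

Emitting a warning rather than raising keeps the configs usable when the signatures happen to be unchanged between versions.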

[dev] Determine if namespace packaging will work for imports alongside pytorch.

Quoting @omry:

In an ideal world, the configs would be packaged with PyTorch itself, so probably:
torch.optim.AdamConf

I agree with this.

An alternative is torch_config.optim.AdamConf. Using config.* is probably too generic a module name and I imagine it could lead to collisions.

[dev] Structure branches for release.

We intend to release versions of singular project packages - think hydra-configs-torch or hydra-configs-torchvision via release branches that get tagged for upload to PyPI.

This enables users to get the exact version they need by specifying:
pip install hydra-configs-torch==1.6 or pip install hydra-configs-torch==1.7.

Master/the top level will remain the metapackage hydra-torch and will include the most up-to-date configs for each package. For example, at present, if the user wrote:
pip install hydra-torch,
they would end up with the packages:

hydra-configs-torch==1.7
hydra-configs-torchvision==0.8

[dev] freeze flake version, add pre-commit for dev consistency

Freezing flake version to 3.8.4

Create a pre-commit file so other contributors can use the correct linting/formatting before committing.