
pytorch_tempest's Introduction

tempest


This repository contains my pipeline for training neural nets.

Main frameworks used: PyTorch, PyTorch Lightning, and Hydra.

The main ideas of the pipeline:

  • all parameters and modules are defined in configs;
  • prepare configs beforehand for different optimizers/schedulers and so on, so it is easy to switch between them;
  • have templates for different deep learning tasks. Currently, image classification and named entity recognition are supported;

Examples of running the pipeline. This command runs training on MNIST (the data will be downloaded automatically):

>>> python train.py --config-name mnist_config model.encoder.params.to_one_channel=True

Running on MPS (M1 MacBook):

python train.py --config-name mnist_config model.encoder.params.to_one_channel=True trainer.accelerator=mps +trainer.devices=1 optimizer=adan training.lr=0.001

The default run:

>>> python train.py

The default version of the pipeline runs on the imagenette dataset. To use it, download the data from this repository: https://github.com/fastai/imagenette, unzip it, and define the path to it in the path key of conf/datamodule/image_classification.yaml.
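For illustration, the relevant part of that config might look like the following (a sketch: only the path key is confirmed by the description above, and the value is a placeholder for wherever you unzipped the data):

# conf/datamodule/image_classification.yaml (fragment, sketch)
path: /home/user/data/imagenette2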

pytorch_tempest's People

Contributors

erlemar, shrey-b, utisetur


pytorch_tempest's Issues

Switch to Train/EvalResult

Train/EvalResult seems promising, but has some bugs. When everything is fixed, I should switch to it.

First run does not work

If I run the command

python train.py --config-name mnist_config model.encoder.params.to_one_channel=True

I get an error:

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | model | Net | 23.6 M
1 | loss | CrossEntropyLoss | 0
2 | metric | Accuracy | 0

23.6 M Trainable params
0 Non-trainable params
23.6 M Total params
94.253 Total estimated model params size (MB)
Epoch 0: 0%| | 0/16 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 92, in
run_model()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
_run_hydra(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
run_and_report(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
raise ex
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
return func()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in
lambda: hydra.run(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
return run_job(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/hydra/core/utils.py", line 125, in run_job
ret.return_value = task_function(task_cfg)
File "train.py", line 88, in run_model
run(cfg)
File "train.py", line 61, in run
trainer.fit(model, dm)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
self._run(model)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
self.dispatch()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
self.accelerator.start_training(self)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
return self.run_train()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 871, in run_train
self.train_loop.run_training_epoch()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
model_ref.optimizer_step(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
optimizer.step(closure=lambda_closure, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/optim/adamw.py", line 65, in step
loss = closure()
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
result = self.training_step_and_backward(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
training_step_output = self.trainer.accelerator.training_step(args)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
return self.training_type_plugin.training_step(*args)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 98, in training_step
return self.model(*args, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
output = super().forward(*inputs, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
output = self.module.training_step(*inputs, **kwargs)
File "/home/joefox/data/nextcloud/projects/pytorch_tempest/src/lightning_classes/lightning_image_classification.py", line 54, in training_step
score = self.metric(logits.argmax(1), target)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/metric.py", line 190, in forward
self.update(*args, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/metric.py", line 249, in wrapped_func
return update(*args, **kwargs)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/classification/accuracy.py", line 231, in update
mode = _mode(preds, target, self.threshold, self.top_k, self.num_classes, self.multiclass)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/functional/classification/accuracy.py", line 36, in _mode
mode = _check_classification_inputs(
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 288, in _check_classification_inputs
_check_num_classes_mc(preds, target, num_classes, multiclass, implied_classes)
File "/home/joefox/.pyenv/versions/hydra/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 164, in _check_num_classes_mc
raise ValueError("The highest label in target should be smaller than num_classes.")
ValueError: The highest label in target should be smaller than num_classes.
Epoch 0: 0%| | 0/16 [00:00<?, ?it/s]

What have I done wrong?
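The final ValueError comes from torchmetrics' input checks: the Accuracy metric was built with a num_classes smaller than the largest label it saw. A minimal reproduction, assuming an older torchmetrics release like the one in the traceback (where Accuracy took num_classes directly):

import torch
from torchmetrics import Accuracy

# the metric is told to expect 5 classes...
metric = Accuracy(num_classes=5)
preds = torch.tensor([0, 1, 2])
target = torch.tensor([0, 1, 9])  # ...but one target label (9) is >= num_classes
metric(preds, target)  # ValueError: The highest label in target should be smaller than num_classes.

So the usual cause is a num_classes value in the config that does not match the labels actually produced by the dataset.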

mnist_config.yaml is broken (private, to_one_channel)

Hi,

I tried to run the MNIST example, but it looks like there are some minor problems with mnist_config.

issue 7.1 (private/custom):

>>python train.py --config-name mnist_config 
Could not load private/custom.
Available options:
        default

I tried to fix it by overriding it on the command line, but ran into the next problem.

issue 7.2:

>>python train.py --config-name mnist_config private=default

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[128, 1, 28, 28] to have 3 channels, but got 1 channels instead

The solution is to set the parameter "model.encoder.params.to_one_channel" to True:

way 1: on the command line

python train.py --config-name mnist_config private=default model.encoder.params.to_one_channel=True

way 2: update 1-2 files (fix the "private" node in the root config, and create a standalone simple_model_mnist.yaml with the correct value of encoder.params.to_one_channel).

way 3 (?): override something in mnist_config.yaml

Which way is preferable?

I am unsure about way 2, because having several simple_model_xxx.yaml files is not very convenient.
P.S. Perhaps a better way is to take the number of input channels from the datamodule.
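A minimal sketch of that idea, with hypothetical names (neither MNISTDataModule.num_channels nor build_encoder exists in the repository as-is):

import torch.nn as nn

# sketch: the datamodule reports the channel count of its dataset
class MNISTDataModule:
    num_channels = 1  # MNIST images are single-channel

def build_encoder(num_channels: int) -> nn.Module:
    # the encoder stem adapts to whatever the datamodule reports,
    # instead of hard-coding to_one_channel in a per-dataset yaml
    return nn.Conv2d(num_channels, 64, kernel_size=7, stride=2, padding=3)

dm = MNISTDataModule()
stem = build_encoder(dm.num_channels)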

Thanks

Use of hydra.utils.instantiate

Hi, I really like your template! Thanks for putting it out there.

I was wondering what your thoughts are on using Hydra's upcoming instantiate feature vs. load_obj. Have you had a look at it or played around with it? It seems to work for me in some cases but can also cause weird effects. I would be curious to hear your thoughts.
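For reference, the two approaches look roughly like this (a sketch: the load_obj shown here is a simplified stand-in for the repository's helper, not its actual implementation):

import torch.nn as nn
from hydra.utils import instantiate
from omegaconf import OmegaConf

model = nn.Linear(2, 2)

# Hydra's way: _target_ names the class, the remaining keys become kwargs,
# and extra kwargs can be supplied at call time
cfg = OmegaConf.create({"_target_": "torch.optim.AdamW", "lr": 1e-3})
optimizer = instantiate(cfg, params=model.parameters())

# load_obj-style: import the class by its dotted path, then call it yourself
def load_obj(obj_path: str):
    module_path, obj_name = obj_path.rsplit(".", 1)
    module = __import__(module_path, fromlist=[obj_name])
    return getattr(module, obj_name)

optimizer_cls = load_obj("torch.optim.AdamW")
optimizer = optimizer_cls(model.parameters(), lr=1e-3)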

Add documentation

Add documentation with sphinx

Basic docs are done; now it would be great if they were automatically generated for all the code.
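One way to do that, assuming a standard Sphinx layout with sources under docs/source and the package under src, is sphinx-apidoc:

# regenerate .rst stubs for every module under src, then rebuild the HTML docs
sphinx-apidoc -f -o docs/source src
sphinx-build -b html docs/source docs/build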

Add new schedulers

It is always good to have more options to choose from, so it would be a good idea to add more schedulers. The steps are the following:

  • in conf/scheduler add a config for a new scheduler
  • if this scheduler requires some other library, update requirements
  • run tests to check that everything works

Example: https://github.com/Erlemar/pytorch_tempest/blob/master/conf/scheduler/cyclic.yaml

# @package _group_
class_name: torch.optim.lr_scheduler.CyclicLR
step: step
params:
  base_lr: ${training.lr}
  max_lr: 0.1
  • # @package _group_ - a required header line for Hydra
  • class_name - the full import path of the object
  • params - parameters that are overridden; if the scheduler has more parameters than are defined in the config, their default values will be used

There are 3 possible cases when adding a scheduler:

  • a default PyTorch scheduler: simply add a config for it (see the sketch below);
  • a scheduler from another library: add the library to the requirements and define a config with class_name based on the library, for example cyclicLR.CyclicCosAnnealingLR;
  • a scheduler from a custom class: add the class to src/scheduler and add a config with the full path to the class, starting with src.
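For the first case, a config for the built-in torch.optim.lr_scheduler.StepLR could look like this (a sketch: the step value and the params shown are illustrative, not taken from the repository):

# @package _group_
class_name: torch.optim.lr_scheduler.StepLR
step: epoch
params:
  step_size: 10
  gamma: 0.1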

protobuf crash

Hi, thanks for putting this out there. I would love to see your refinements.
Currently I'm trying to run the MNIST example from the README, and I'm getting the following crash when it tries to load the data. I tried changing the protobuf version a few times (3.8, 3.14, 3.11), but it didn't seem to matter:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:393] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1367] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
libc++abi.dylib: terminating with uncaught exception of type google::protobuf::FatalException: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):

Add new optimizers

It is always good to have more options to choose from, so it would be a good idea to add more optimizers. The steps are the following:

  • in conf/optimizer add a config for a new optimizer
  • if this optimizer requires some other library, update requirements
  • run tests with the pytest command to check that everything works

Example: https://github.com/Erlemar/pytorch_tempest/blob/master/conf/optimizer/adamw.yaml

# @package _group_
class_name: torch.optim.AdamW
params:
  lr: ${training.lr}
  weight_decay: 0.001
  • # @package _group_ - a required header line for Hydra
  • class_name - the full import path of the object
  • params - parameters that are overridden; if the optimizer has more parameters than are defined in the config, their default values will be used

There are 3 possible cases when adding an optimizer:

  • a default PyTorch optimizer: simply add a config for it;
  • an optimizer from another library: add the library to the requirements and define a config with class_name based on the library, for example adamp.AdamP (see the sketch below);
  • an optimizer from a custom class: add the class to src/optimizers and add a config with the full path to the class.
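For the second case, a config for AdamP from the adamp package could look like this (a sketch: the params shown are a common subset, so check the library's actual signature before relying on them):

# @package _group_
class_name: adamp.AdamP
params:
  lr: ${training.lr}
  weight_decay: 0.001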
