
generic-vision-trainer's Introduction

GenViT: Generic Vision Trainer

This repository is a generic, extensible, and robust vision model trainer. It is designed to be a good starting point for any vision project and to be easily extensible to new models, datasets, and tasks, while also offering robust support for multi-GPU training and automatic mixed-precision training. It's an amalgamation of all the nice parts that I like about the repos I've worked with over the years.

While there exist many repositories which are lighter-weight and hackable, I usually desire something more robust for experimentation which offers reproducibility while still being reasonably hackable. As a result, I've chosen to adopt a yacs-based configuration system, which works reasonably well for most of my image recognition related tasks.

I generally subscribe to the philosophy that everything should be written in the config: a config plus the same version of the code should be all that is needed to reproduce a result. However, I'm not strict about there being only one way to do this, since command-line args are very useful for hacking. I also factor the codebase into a few key, separate components, which makes each one easier to hack on independently. Every component is built in its corresponding build.py file, and configs should be handled entirely within that file so that the components themselves can be used in isolation or in other applications.
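As a rough illustration of the build.py idea, here is a minimal sketch in plain Python. It is not code from this repository: `TinyClassifier`, `build_model`, and the `MODEL.NUM_CLASSES` / `MODEL.HIDDEN_DIM` keys are hypothetical names, and a `SimpleNamespace` stands in for a yacs config node so the snippet runs without extra dependencies.

```python
from types import SimpleNamespace


class TinyClassifier:
    """Hypothetical component: takes plain arguments and never sees the config."""

    def __init__(self, num_classes: int, hidden_dim: int):
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim


def build_model(cfg):
    """Hypothetical build.py entry point: the only place that reads cfg."""
    return TinyClassifier(
        num_classes=cfg.MODEL.NUM_CLASSES,
        hidden_dim=cfg.MODEL.HIDDEN_DIM,
    )


# Stand-in for a yacs config node; the key names are illustrative only.
cfg = SimpleNamespace(MODEL=SimpleNamespace(NUM_CLASSES=10, HIDDEN_DIM=64))

# Built from the config inside the trainer...
model_from_cfg = build_model(cfg)

# ...or constructed directly in another application, with no config at all.
model_standalone = TinyClassifier(num_classes=100, hidden_dim=32)
```

Because only `build_model` touches the config, the component can be imported and reused elsewhere without bringing the configuration system along.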

The optimizer is also step-based rather than epoch-based, which is a bit more flexible for my use cases.
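One way to read "step-based" is that training runs for a fixed number of update steps, re-iterating the dataset as often as needed, and periodic work such as evaluation is triggered by step count. The sketch below assumes that reading; `step_based_loop`, `max_steps`, and `eval_every` are hypothetical names rather than this repository's API, and the actual model update is left as a comment.

```python
def step_based_loop(batches, max_steps, eval_every):
    """Run exactly max_steps updates, looping over the data as often as needed."""
    if not batches:
        raise ValueError("batches must be non-empty")

    step = 0
    eval_steps = []  # steps at which an evaluation would have been triggered
    while step < max_steps:
        for _batch in batches:  # one full pass over batches is one epoch
            # A real trainer would run forward, backward, and the optimizer
            # update on _batch here.
            step += 1
            if step % eval_every == 0:
                eval_steps.append(step)
            if step >= max_steps:
                break  # stop mid-epoch once the step budget is spent
    return step, eval_steps


# Three batches per epoch, seven steps in total: training stops partway
# through the third epoch instead of rounding to a whole number of epochs.
final_step, eval_steps = step_based_loop(batches=[0, 1, 2], max_steps=7, eval_every=3)
```

Counting steps rather than epochs keeps the schedule independent of dataset size, so the same step budget behaves the same way when the dataset or batch size changes.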

Quickstart

To run an example, try running the train_resnet18.sh script. This will train a resnet18 on the CIFAR10 dataset.

Todos:

  • Multiple slurm submit options (PySlurm and submitit).
  • Proper wandb support with loggers (D2 like?).

Nice repository references and inspirations

Tiny, hackable codebases:

  • Karpathy's minGPT
  • Bootstrap Your Own Latent (BYOL)
  • Kaiming's Masked Autoencoders (absolutely superb for submitit, helpful for slurm).
  • Karpathy's nanoGPT (slightly more structure but enough to train large models)
  • CleanRL, nice for RL... obviously

Medium sized repositories with good abstractions (actual folder breakdowns):

  • Berkeley's nerfstudio (for graphics however).
  • Microsoft Swin Transformer. Most of the code is taken from here because I enjoyed my time working with this repository.

Larger codebases that are decent:

  • Facebook's PySlowFast, but too bloated to hack with (much like detectron2).
