tsap's People

Contributors

bytesnake

tsap's Issues

Outline of tsap idea and prototype

Over the last half year I have used Hydra extensively for machine learning in my university job. It provides configuration for complex applications, remote execution (e.g. on a SLURM cluster) and multiruns (e.g. for hyperparameter search).

This issue proposes a similar crate for Rust and Linfa to build and configure machine learning applications. Like structopt and clap, it consists of two elements: a macro system to derive parameter structs, and an argument parser to override the default configuration hierarchy. Unlike established argument parsers, we do not follow Unix argument conventions; instead we combine two steps: loading defaults from a file and overriding them with arguments. The crate is therefore geared towards ML applications rather than general command-line tools.

To keep dependencies low, most of the API surface is optional. In the minimal setting, the parameter macro generates only a builder pattern, allowing programmatic control of the algorithm. This is aimed at consumer crates that use simple parameter sets and do not need to make their configuration accessible. In more complex settings, files and command-line arguments are also taken into account, extending the API surface to that of Hydra.
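As an illustration of the minimal setting, here is a hand-written sketch of what the parameter macro might generate for RandomForest. The method names builder, ntrees and build, and the String error type, are assumptions for illustration, not tsap's actual output. The TryFrom impl mirrors the try_into() call used in the motivating example.

```rust
use std::convert::TryFrom;

/// Hypothetical output of the parameter macro for RandomForest in the
/// minimal setting: the plain parameter struct itself.
#[derive(Debug, Clone, PartialEq)]
pub struct RandomForest {
    pub ntrees: usize,
}

/// Hypothetical builder: every field stays optional until `build` checks it.
#[derive(Debug, Default, Clone)]
pub struct RandomForestBuilder {
    ntrees: Option<usize>,
}

impl RandomForest {
    /// Start building a parameter set programmatically.
    pub fn builder() -> RandomForestBuilder {
        RandomForestBuilder::default()
    }
}

impl RandomForestBuilder {
    /// One setter per field, named after the field.
    pub fn ntrees(mut self, ntrees: usize) -> Self {
        self.ntrees = Some(ntrees);
        self
    }

    /// Fail with a message if a required field was never set.
    pub fn build(self) -> Result<RandomForest, String> {
        let ntrees = self
            .ntrees
            .ok_or_else(|| "missing field `ntrees`".to_string())?;
        Ok(RandomForest { ntrees })
    }
}

/// Lets `builder.try_into()?` work, as in the motivating example.
impl TryFrom<RandomForestBuilder> for RandomForest {
    type Error = String;

    fn try_from(builder: RandomForestBuilder) -> Result<Self, Self::Error> {
        builder.build()
    }
}
```

A builder like this needs no file or argument parsing, which is why the minimal setting can stay free of extra dependencies.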

Because most of the infrastructure already exists, we use serde for serialization, TOML as the configuration format, and add a small template engine to rewrite TOML's configuration tree. Extending a configuration from other files is a typical use case for such a resolver. Overall, tsap exists to provide a unified macro for generating highly configurable parameter sets for the algorithms in Linfa.
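Rewriting the configuration tree and extending it from other files both come down to merging one tree into another. The sketch below uses a small hypothetical Value enum as a stand-in for a parsed TOML document (a real implementation would presumably deserialize with serde and the toml crate) and assumes a simple precedence rule: tables merge recursively, and any other value from the overlay replaces the base value.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed TOML value tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Table(BTreeMap<String, Value>),
}

/// Convenience constructor for a table from key/value pairs.
pub fn table(pairs: Vec<(&str, Value)>) -> Value {
    Value::Table(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Merge `overlay` into `base` (assumed precedence rule):
/// tables are merged key by key, any other overlay value replaces the base.
pub fn merge(base: &mut Value, overlay: Value) {
    match (base, overlay) {
        (Value::Table(b), Value::Table(o)) => {
            for (key, val) in o {
                if let Some(existing) = b.get_mut(&key) {
                    // key present in both: recurse so nested tables merge
                    merge(existing, val);
                } else {
                    // key only in the overlay: insert it as-is
                    b.insert(key, val);
                }
            }
        }
        // scalar vs. anything, or table vs. scalar: overlay wins
        (slot, other) => *slot = other,
    }
}
```

Overriding model.ntrees this way leaves seed and model.variant untouched.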

Motivation

We start with a simple example to illustrate a use case:

use tsap::param;

#[param]
struct RandomForest {
    ntrees: usize,
}

#[param]
struct SVClassifier {
    regul_term: f32,
}

#[param]
enum Model {
    RandomForest(RandomForest),
    SVClassifier(SVClassifier),
}

#[param]
struct Main {
    seed: usize,
    #[check]
    model: Model,
}

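fn main() -> Result<(), Box<dyn std::error::Error>> {
    // load the default configuration from the entry file
    let conf: Main = Main::from_file("config/main.toml")
        // override defaults with command-line arguments (e.g. model=svclassifier)
        .amend_args()
        // set the seed programmatically
        .seed(40)
        // validate and convert the builder into the final parameter struct
        .try_into()?;

    // start the experiment, do other stuff...
    conf.run_experiment();

    Ok(())
}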

The following invocations are then valid:

  1. with the default parameters: cargo run
  2. overriding the default model: cargo run -- model=svclassifier
  3. a multirun over several seeds: cargo run -- -m model=randomforest seed=10,20,30,40
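One possible reading of these calls: without -m, every key=value pair is a single override, while with -m the comma-separated values are expanded into the cartesian product of runs, so the third call yields four runs. This is a hypothetical std-only sketch (plan_runs, parse_override and expand are illustrative names), not tsap's parser, and it assumes it receives the arguments that follow cargo's -- separator.

```rust
/// Parse one `key=value` argument. In multirun mode the value is split on
/// commas into several candidate values; otherwise it is kept whole.
pub fn parse_override(arg: &str, multirun: bool) -> Result<(String, Vec<String>), String> {
    let (key, value) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got `{}`", arg))?;
    let values = if multirun {
        value.split(',').map(str::to_string).collect()
    } else {
        vec![value.to_string()]
    };
    Ok((key.to_string(), values))
}

/// Expand overrides into the cartesian product of all value lists.
/// With no overrides this yields exactly one run with no changes.
pub fn expand(overrides: &[(String, Vec<String>)]) -> Vec<Vec<(String, String)>> {
    let mut runs: Vec<Vec<(String, String)>> = vec![Vec::new()];
    for (key, values) in overrides {
        let mut next = Vec::with_capacity(runs.len() * values.len());
        for run in &runs {
            for v in values {
                let mut r = run.clone();
                r.push((key.clone(), v.clone()));
                next.push(r);
            }
        }
        runs = next;
    }
    runs
}

/// Turn raw binary arguments into one override list per run.
pub fn plan_runs(args: &[&str]) -> Result<Vec<Vec<(String, String)>>, String> {
    let multirun = args.first() == Some(&"-m");
    let rest = if multirun { &args[1..] } else { args };
    let overrides = rest
        .iter()
        .map(|a| parse_override(a, multirun))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(expand(&overrides))
}
```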

Default values are defined in a hierarchy of TOML files, so typical parametrizations can be captured as default values. For example, the example above could have one file for the main entry point and one for the model:

seed = 42

[model]
from_file = { base_path="models/", default="randomforest" }
variant = "randomforest"
ntrees = 128

The keys variant and from_file are reserved: variant selects the enum variant, and from_file indicates where to look for its configuration. When the model is overridden with a different variant, the application automatically looks in that folder for a TOML file with the same name (e.g. models/svclassifier.toml for model=svclassifier).
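The lookup described above can be sketched as follows. FromFile mirrors the from_file table, and ModelKind is a hypothetical fieldless stand-in for the Model enum; matching variant names case-insensitively is an assumption suggested by the lowercase names used in the TOML file and on the command line.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the reserved table
/// `from_file = { base_path="models/", default="randomforest" }`.
pub struct FromFile {
    pub base_path: String,
    pub default: String,
}

impl FromFile {
    /// Path of the TOML file for the selected variant: the command-line
    /// override if one was given, otherwise the default.
    pub fn resolve(&self, variant_override: Option<&str>) -> PathBuf {
        let name = variant_override.unwrap_or(self.default.as_str());
        PathBuf::from(&self.base_path).join(format!("{}.toml", name))
    }
}

/// Hypothetical fieldless stand-in for the Model enum's variants.
#[derive(Debug, PartialEq)]
pub enum ModelKind {
    RandomForest,
    SVClassifier,
}

impl ModelKind {
    /// Map a variant name to an enum variant, ignoring ASCII case (assumption).
    pub fn from_variant(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "randomforest" => Some(ModelKind::RandomForest),
            "svclassifier" => Some(ModelKind::SVClassifier),
            _ => None,
        }
    }
}
```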

Guide-level explanation

WIP

Alternatives

  • use YAML instead of TOML (or make the crate agnostic over the configuration file format)
    • complex hierarchies of YAML files: if your application can't be fitted into a simple structure, you should refactor it
    • being agnostic over the configuration backend seems sensible, but requires more work
  • support traits instead of enums for key variants
  • add inline templates, a system more similar to OmegaConf's (e.g. "cmd:data" expanding to the current date)

Unresolved Questions

  • is the current prototype possible in a statically typed language?
  • should we add a builder pattern generator to help simplify Linfa's parameter structs?
  • should we implement the Try trait (nightly only) for the builder? This would allow ? to be used directly when building
