morty's Introduction

Morty

Morty is a lightweight experiment and configuration manager for small ML/DL projects and Kaggling.

Main Features:

  • Configuration Management. Morty includes a config loading system based on Python files that lets you configure a wide variety of moving parts quickly and without overhead.
  • Experiment Management. Morty provides a flexible, simple, local experiment management system that tracks a lot of context about your project state so that experiments can be reproduced.

Installation

pip install morty
# or
poetry add morty

Example of Usage

The following script trains a Keras model on MNIST:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

from examples.configs.basic_config import Config
from morty import Experiment, ExperimentManager
from morty.cli import Option, run
from morty.trainers import TensorflowTrainingTracker


def train(
    config: Config = Option(default=Config, help="Experiment Configurations")
) -> None:
    experiment: Experiment = ExperimentManager(configs=config).create()

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255

    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)

    y_train = keras.utils.to_categorical(y_train, config.num_classes)
    y_test = keras.utils.to_categorical(y_test, config.num_classes)

    model = keras.Sequential(
        [
            keras.Input(shape=config.image_shape),
            layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dropout(0.5),
            layers.Dense(config.num_classes, activation="softmax"),
        ]
    )

    model.compile(
        loss="categorical_crossentropy",
        optimizer=config.optimizer,
        metrics=("accuracy",),
    )

    model.summary()

    training_history = model.fit(
        x_train,
        y_train,
        epochs=config.epochs,
        batch_size=config.batch_size,
        validation_split=config.val_dataset_fraction,
        callbacks=(TensorflowTrainingTracker(experiment),),
    )

    experiment.log_artifact("training_history.pkl", training_history)

    test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)

    print(f"Test loss: {test_loss}")
    print(f"Test accuracy: {test_accuracy}")


if __name__ == "__main__":
    run(train)
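
The `Config` class imported from `examples.configs.basic_config` is not shown above. As a rough, hypothetical sketch, assuming it is a plain object exposing the attributes the script reads, it could look like the dataclass below. How Morty itself expects config files to be written is not covered here. `num_classes` and `image_shape` follow from MNIST and the `np.expand_dims` call, while the remaining values are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Config:
    """Hypothetical stand-in for examples.configs.basic_config.Config."""

    # Fixed by the dataset: 10 digit classes, 28x28 grayscale images
    # with a trailing channel axis added by np.expand_dims(..., -1).
    num_classes: int = 10
    image_shape: Tuple[int, int, int] = (28, 28, 1)

    # Placeholder hyperparameters (assumptions, not the project's values).
    optimizer: str = "adam"
    epochs: int = 15
    batch_size: int = 128
    val_dataset_fraction: float = 0.1


config = Config()
print(config)
```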

Citation

If Morty helped you to streamline your research, be sure to mention it via the following BibTeX entry:

@Misc{Glushko2021Morty,
  author =       {Roman Glushko},
  title =        {Morty - a lightweight experiment and configuration tracking library for small ML/DL projects and Kaggling},
  howpublished = {Github},
  year =         {2021},
  url =          {https://github.com/roma-glushko/morty}
}

Acknowledgment

Credentials

Made with ❤️ by Roman Glushko (c)

morty's Issues

Create experiment indexing system

The experiment indexer should collect the information needed to build the leaderboard. It should also detect index inconsistencies and recent changes so that it can automatically run a partial reindex of the experiment information.
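
One way such a partial reindex could work is sketched below. This is not Morty's implementation. It assumes each experiment lives in its own directory with a `meta.json` file (a hypothetical layout), and it uses a content hash of that file to decide which index entries are stale. The names `fingerprint`, `reindex`, and the `score` field are made up for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(meta_path):
    """Hash the metadata file so content changes can be detected."""
    return hashlib.sha256(Path(meta_path).read_bytes()).hexdigest()


def reindex(root, index):
    """Update `index` in place, touching only new, changed, or removed experiments."""
    on_disk = {
        p.name: fingerprint(p / "meta.json")
        for p in Path(root).iterdir()
        if (p / "meta.json").exists()
    }
    changed = sorted(
        name for name, fp in on_disk.items()
        if index.get(name, {}).get("fingerprint") != fp
    )
    removed = sorted(set(index) - set(on_disk))

    for name in removed:
        del index[name]
    for name in changed:
        meta = json.loads((Path(root) / name / "meta.json").read_text())
        index[name] = {"fingerprint": on_disk[name], "score": meta.get("score")}
    return changed, removed


with tempfile.TemporaryDirectory() as root:
    for name, score in [("exp_a", 0.91), ("exp_b", 0.87)]:
        exp_dir = Path(root) / name
        exp_dir.mkdir()
        (exp_dir / "meta.json").write_text(json.dumps({"score": score}))

    index = {}
    first_pass = reindex(root, index)   # everything is new
    (Path(root) / "exp_b" / "meta.json").write_text(json.dumps({"score": 0.93}))
    second_pass = reindex(root, index)  # only exp_b is re-read

# Leaderboard: experiment names ordered by score, best first.
leaderboard = sorted(index, key=lambda name: index[name]["score"], reverse=True)
print(first_pass, second_pass, leaderboard)
```

Hashing the metadata is slower than comparing modification times, but it does not depend on filesystem timestamp resolution.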

[Story] Config and Experiment Management - MVP

Description

Config Management

For small to medium-sized (side) projects and experiments, it's useful to have a simple, straightforward and flexible configuration system. Copy-pasting even simple dictionary- or class-based configuration loading code from project to project is inconvenient.

Also, when using configs, we need to be able to:

  • access nested values in an easy and readable way
  • change different moving parts of the experiment (e.g. loss functions, optimisers, LR schedulers, feature extractors, architectures or parts of them)
  • load secrets from a place that is not under version control, so all env variables can be safely stored separately and loaded along with other configs.
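
The first and last requirements can be illustrated with a minimal sketch. It is not Morty's API: `to_namespace` and `load_secrets` are hypothetical helpers, and a fake environment mapping stands in for real environment variables.

```python
import os
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts into attribute-accessible namespaces."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


def load_secrets(names, environ=None):
    """Read the named secrets from the environment; missing ones become None."""
    environ = os.environ if environ is None else environ
    return {name.lower(): environ.get(name) for name in names}


raw_config = {
    "model": {"backbone": "resnet18", "dropout": 0.5},
    "optimizer": {"name": "adam", "lr": 1e-3},
}

# Secrets stay out of version control; here a fake mapping replaces os.environ.
fake_environ = {"KAGGLE_KEY": "not-a-real-key"}
raw_config["secrets"] = load_secrets(["KAGGLE_KEY"], environ=fake_environ)

config = to_namespace(raw_config)
print(config.optimizer.lr)
print(config.model.backbone)
```

Attribute access (`config.optimizer.lr`) reads more easily than chained dictionary lookups, and keeping secrets in a separate mapping makes it simple to exclude them from any hyperparameter backup.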

Experiment Management

At the same time, it may be helpful to have a simple and straightforward experiment management system. We can't go too deep on this part, as we will probably never replace the functionality of Neptune.ai or W&B.

However, in some cases, installing them may be overhead, and all a scientist may need is a plain, quick way to see which experiment has scored best so far and how it was produced.

In other cases, they may not be compatible with the ML/DL framework version you need (for example, the Ampere-compatible TF 2.5 RC may not be compatible with Neptune's dependency list).

Solution

Add a straightforward and simple way to manage configs/hyperparams and track experiment outcomes.

Configurations

We may base our configs on Python dictionaries specified in separate files/modules. This should be a flexible approach that makes it possible to configure pieces of code such as augmentation pipelines. It would be annoying to add another layer of abstraction just to be able to experiment with augmentation.
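
As a sketch of this idea (not Morty's actual loader), the snippet below writes a config module to a temporary file and loads it with the standard `importlib` machinery. The file name, the `config` variable, and the toy `flip` and `scale` augmentations are assumptions made for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

# A config file is ordinary Python, so it can hold callables directly.
CONFIG_SOURCE = '''
def flip(values):
    return list(reversed(values))


def scale(values):
    return [v * 2 for v in values]


config = {
    "batch_size": 32,
    "augmentations": [flip, scale],
}
'''


def load_config(path):
    """Execute a Python file as a module and return its `config` dictionary."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.config


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "baseline_config.py"
    config_path.write_text(CONFIG_SOURCE)
    config = load_config(config_path)

# Apply the configured augmentation pipeline to a toy sample.
sample = [1, 2, 3]
for augment in config["augmentations"]:
    sample = augment(sample)
print(sample)
```

Because the pipeline is plain Python, swapping an augmentation means editing the config file rather than adding a registry or a serialization format.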

Another useful thing would be reusable factories that allow registering different "moving" components (e.g. losses). That way, we could take a class/type value from the config and create an instance of it via the factories.
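
A minimal sketch of such a factory, assuming a decorator-based registry, is shown below. The `Factory` class and the toy `MeanSquaredError` loss are hypothetical and written only to show the pattern.

```python
from typing import Callable, Dict


class Factory:
    """Maps string names from a config to registered component classes."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._registry: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable:
        def decorator(component: Callable) -> Callable:
            if name in self._registry:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._registry[name] = component
            return component

        return decorator

    def create(self, name: str, **kwargs):
        if name not in self._registry:
            raise KeyError(f"Unknown {self.kind}: {name!r}")
        return self._registry[name](**kwargs)


losses = Factory("loss")


@losses.register("mse")
class MeanSquaredError:
    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    def __call__(self, y_true, y_pred) -> float:
        squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
        return self.scale * sum(squared) / len(squared)


# The config only names the component and its parameters.
loss_config = {"name": "mse", "params": {"scale": 2.0}}
loss = losses.create(loss_config["name"], **loss_config["params"])
loss_value = loss([1.0, 2.0], [1.0, 4.0])
print(loss_value)
```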

Experiment Tracking

It's a bit harder to imagine an experiment management system that stays straightforward and is still useful. The system would be local and file-based. Each experiment's details would be logged in a separate directory.

We would like to see the following information logged:

  • entire script output
  • backup of hyperparams
  • Git commit hash of the project
  • Git diff patch (many changes are done on the fly, so this may be helpful)
  • ability to back up specific files (in case it's not feasible to make them configurable)
  • ability to dump and save any artifacts (like training history or embeddings)
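
Part of this list can be sketched with the standard library alone. The example below is an illustration under assumptions, not Morty's implementation: the directory layout, the file names, and the helpers `git_output`, `create_experiment_dir`, and `dump_artifact` are all hypothetical. It covers the hyperparameter backup, the Git commit hash and diff, and artifact dumping, and it leaves out capturing script output and backing up specific files.

```python
import json
import pickle
import subprocess
import tempfile
from datetime import datetime
from pathlib import Path


def git_output(*args, cwd=None):
    """Return stdout of a git command, or None if git or a repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


def create_experiment_dir(root, hyperparams, project_dir="."):
    """Create a timestamped directory and record the project state in it."""
    exp_dir = Path(root) / datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    exp_dir.mkdir(parents=True)

    (exp_dir / "hyperparams.json").write_text(json.dumps(hyperparams, indent=2))

    commit = git_output("rev-parse", "HEAD", cwd=project_dir)
    meta = {"git_commit": commit.strip() if commit else None}
    (exp_dir / "meta.json").write_text(json.dumps(meta, indent=2))

    # Uncommitted changes relative to HEAD, saved as a patch if there are any.
    diff = git_output("diff", "HEAD", cwd=project_dir)
    if diff:
        (exp_dir / "uncommitted.patch").write_text(diff)
    return exp_dir


def dump_artifact(exp_dir, name, obj):
    """Pickle an arbitrary Python object into the experiment directory."""
    with open(Path(exp_dir) / name, "wb") as f:
        pickle.dump(obj, f)


with tempfile.TemporaryDirectory() as root:
    exp_dir = create_experiment_dir(root, {"lr": 1e-3, "epochs": 5}, project_dir=root)
    dump_artifact(exp_dir, "history.pkl", {"loss": [0.9, 0.5]})
    saved_files = sorted(p.name for p in exp_dir.iterdir())
    saved_hyperparams = json.loads((exp_dir / "hyperparams.json").read_text())

print(saved_files)
```

The Git calls fail softly, so the sketch also runs outside a repository, in which case the commit is recorded as `null` and no patch file is written.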

We are not going to focus on more advanced ways of reusing logged information, like plotting learning curves or embeddings.
However, DVC provides a simple way to plot some kinds of information, and we could log in a format compatible with its plotting functionality.
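
For instance, DVC's plotting commands can read tabular files such as CSV, so the training history could be stored as one row per epoch. The sketch below assumes the history is a dictionary of per-epoch metric lists (the shape of Keras's `History.history`). The helper name and the file name are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_history_csv(path, history):
    """Write per-epoch metrics as a CSV table with one row per epoch."""
    metric_names = sorted(history)
    n_epochs = len(history[metric_names[0]])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", *metric_names])
        for epoch in range(n_epochs):
            writer.writerow([epoch, *(history[name][epoch] for name in metric_names)])


history = {"loss": [0.9, 0.5, 0.3], "val_loss": [1.0, 0.6, 0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "training_history.csv"
    write_history_csv(csv_path, history)
    # Read the file back to check the layout.
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))

print(rows[0])
```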

Let's call it an MVP.

References

The main sources of inspiration were Kaggle masters' pipelines and the Hydra config system:
