
incense's Introduction

Badges: Travis CI build status · LGTM code quality grade

Incense

Although automated logging of machine learning experiment results is crucial, it does not replace manual interpretation. Incense is a toolbox that facilitates the manual interpretation of experiments logged with sacred. It lets you find and evaluate experiments directly in Jupyter notebooks: you can query the database for experiments by id, name, or any hyperparameter value, and for each experiment found, display its configuration, artifacts, and metrics. Artifacts are rendered according to their type, e.g. a PNG image is displayed as an image, while a CSV file is transformed into a pandas DataFrame. Metrics are by default transformed into pandas Series, which allows for flexible plotting. Together with sacred and incense, Jupyter notebooks offer an excellent environment for interpreting experiments, as they combine code that reproducibly displays an experiment's results with text that contains the interpretation.
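
To give a flavor of the API, here is a minimal sketch; the connection URI, database name, experiment id, metric name, and artifact name are placeholders for your own setup:

from incense import ExperimentLoader

# Connect to the database that sacred's MongoObserver writes to.
loader = ExperimentLoader(
    mongo_uri="mongodb://localhost:27017",  # placeholder URI
    db_name="sacred",                       # placeholder database name
)

exp = loader.find_by_id(1)       # experiments can also be found by name or config value
print(exp.config)                # the experiment's configuration

exp.metrics["loss"].plot()       # metrics are pandas Series
exp.artifacts["confusion_matrix"].render()  # artifacts are rendered according to their type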

Installation

Install the latest release

pip install incense

Or install the latest development version

pip install git+https://github.com/JarnoRFB/incense.git

Documentation

demo.ipynb demonstrates the basic functionality of incense. You can also try it out interactively on Binder.

Contributing

We recommend using the VS Code devcontainer for development. It will automatically install all dependencies and start the necessary services, such as MongoDB and JupyterLab. See .devcontainer/docker-compose.yml for details. If the output of id -u is anything other than 1000 on your system, please add

export UID

to your .bashrc or .zshrc.

Building the container for the first time may take a while. Once inside the container, run

$ pre-commit install
$ python tests/example_experiment/conduct.py

to set up the pre-commit hooks and populate the example database.

Alternatively, you can use conda to set up your local development environment.

$ conda create -n incense-dev python=3.7
$ conda activate incense-dev
# virtualenv is required for the pre-commit environments.
$ conda install virtualenv
# tox-conda is required for using tox with conda.
$ pip install tox-conda
$ pip install -r requirements-dev.txt
$ pre-commit install

incense's People

Contributors

christian-steinmeyer · dependabot[bot] · jarnorfb · vnmabus


incense's Issues

Release in PyPI

The last release is not on PyPI. I suggest using a GitHub Action to upload to PyPI automatically on release.

Decoding numpy arrays requires importing sacred

Sacred creates its own encoder to encode numpy arrays in experiments. Thus, if sacred is not imported before obtaining an experiment with numpy arrays in the info dict, those arrays are not unpickled. I propose to import sacred, at least when unpickle is True.
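
To illustrate the workaround the report implies, here is a minimal sketch; the connection details and the info key are placeholders:

import sacred  # noqa: F401  importing sacred first registers its custom (un)pickling support

from incense import ExperimentLoader

loader = ExperimentLoader(mongo_uri="mongodb://localhost:27017", db_name="sacred")
exp = loader.find_by_id(1)
arr = exp.info["my_array"]  # without the sacred import above, this may not be decoded back to a numpy array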

Slow when loading many experiments over network

Hi,
thanks for maintaining this cool library!

One problem I have encountered is that loading hundreds of experiments, each with several metrics, takes a very long time when done over a network.

E.g. loading 300 runs takes around a minute for me.

I think the problem is that the library sends a separate request to retrieve the metrics of each run.
With 300 runs, loading the runs themselves takes only around 5 seconds (one request), while the remaining 55 seconds (300 requests) are spent loading the metrics.

Would it be possible to load the runs and the metrics jointly with one request?
MongoDB should be able to fill in the metrics by their object IDs automatically I believe.

Thanks,
Ondrej
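
As an illustration of the suggestion above, a single aggregation with $lookup could fetch runs and their metrics in one round trip. This is a sketch against sacred's default collection layout, not incense's actual implementation; the connection details and the filter are placeholders:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["sacred"]                                      # placeholder database name

# One request: fetch the runs and pull in their metric documents via $lookup.
pipeline = [
    {"$match": {"experiment.name": "my_experiment"}},  # placeholder filter
    {"$lookup": {
        "from": "metrics",          # sacred's metrics collection
        "localField": "_id",        # run id
        "foreignField": "run_id",   # metric documents reference their run
        "as": "metric_entries",
    }},
]
runs_with_metrics = list(db.runs.aggregate(pipeline))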

Deletion of experiments and caching

Hi, first of all thanks for making incense.

It seems like there is some caching issue (maybe related to #16 ?). In a notebook:

exps = loader.find_by_key("status", "FAILED")
print(len(exps))
for e in exps:
    e.delete(confirmed=True)
exps = loader.find_by_key("status", "FAILED")
print(len(exps))

outputs

714
714

Only after restarting the kernel does the query return 0 results.
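
Incense caches loader queries, so a likely workaround is to invalidate that cache before re-querying. A sketch, assuming the installed version exposes a cache_clear method on the loader:

exps = loader.find_by_key("status", "FAILED")
for e in exps:
    e.delete(confirmed=True)

loader.cache_clear()  # assumed helper that invalidates the loader's query cache
exps = loader.find_by_key("status", "FAILED")
print(len(exps))  # should now print 0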

Unify filesystem and mongo experiments and loaders

Experiments and loaders for the filesystem and mongo backends should have a similar interface. In particular, a method such as find_by_ids should also be available for the filesystem loader, even if in that case it is just a loop. This would allow a user to deal with both in the same way, using the optimized implementation for mongo when it is available.
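
A minimal sketch of the fallback loop described above; the free-standing helper is illustrative and not part of incense's API:

def find_by_ids(loader, experiment_ids):
    # Emulates the mongo loader's batched lookup on a loader that only offers find_by_id.
    return [loader.find_by_id(exp_id) for exp_id in experiment_ids]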

add artifact to experiment with Incense

Hello,
Thanks for developing this library. I use Sacred + MongoDB to store movie acquisitions from chemistry experiments, and I use Incense for post-processing of the movies. Would there be an easy way to store the post-processing results as new artifacts? Something like exp.add_artifact("path_to_temp_file")?
Thanks!
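
Until such a method exists, one way to attach new artifacts is to write to the database directly with pymongo and GridFS. The sketch below follows sacred's MongoObserver layout; the run id, file path, artifact name, and connection details are placeholders:

import gridfs
import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")
db = client["sacred"]
fs = gridfs.GridFS(db)

# Store the post-processed file in GridFS and reference it from the run document.
with open("path_to_temp_file", "rb") as f:
    file_id = fs.put(f, filename="postprocessing_result.png")
db.runs.update_one(
    {"_id": 42},  # placeholder run id
    {"$push": {"artifacts": {"name": "postprocessing_result.png", "file_id": file_id}}},
)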

Add filter method to QuerySet

Great package! Would it be difficult to add a filter method to the QuerySet object? I imagine it would work similarly to exps.project(on=["experiment.name", "config.optimizer", "config.epochs"]), except that it would take a list of key:value pairs and return a filtered list of experiment objects. I often find myself loading all experiments with a particular name and then writing ugly for loops to filter the list further based on some criteria.
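
Until a filter method exists, the same effect can be achieved with a plain comprehension over the query result; the config keys are the examples from the request, and config is assumed to behave like a dict:

exps = loader.find_by_name("my_experiment")
filtered = [
    e for e in exps
    if e.config["optimizer"] == "adam" and e.config["epochs"] >= 50  # placeholder criteria
]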

Multiple reducer in QuerySet.project

Hi, thanks for the library! Very useful for working with sacred experiments.

Is there a way to specify multiple reducers for a projection? Say I want both the min and the argmin of a metric. Right now I can get one of them with query_set.project(on=[{'metrics.loss': np.min}]) (or np.argmin), but not both, because dicts only allow unique keys. Looking at the code, it doesn't seem to be supported yet. Would be nice if it were!
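
A current workaround sketch is to skip project and reduce the metric Series directly; the metric name is an example:

import pandas as pd

rows = []
for exp in exps:
    loss = exp.metrics["loss"]  # a pandas Series indexed by step
    rows.append({"exp_id": exp.id, "min_loss": loss.min(), "argmin_loss": loss.idxmin()})
summary = pd.DataFrame(rows)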

Add interface to load experiments lazily from a query

A common pattern when loading experiments is to pick some data from each experiment (such as the accuracy) and process it (for example, by storing it in a dataframe).

With the current code, doing that requires first loading all experiments into memory and then iterating over them. This is memory-expensive, and my machine did not have enough memory to do it for a (rather large) collection of experiments.

However, a lazy way to iterate over the experiments in the query could be provided, so that only one experiment needs to be created at a time.
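
A sketch of the lazy pattern the issue asks for, emulated with find_by_id so that only one experiment is materialized per iteration; the metric name is an example, and any loader-side caching would still apply:

def iter_final_accuracy(loader, experiment_ids):
    for exp_id in experiment_ids:
        exp = loader.find_by_id(exp_id)
        yield exp_id, exp.metrics["accuracy"].iloc[-1]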
