
dvclive's Introduction

DVCLive

PyPI Status Python Version License

Tests Codecov pre-commit Black

DVCLive is a Python library for logging machine learning metrics and other metadata in simple file formats, which is fully compatible with DVC.


Quickstart

Python API Overview PyTorch Lightning Scikit-learn Ultralytics YOLO v8

Install dvclive

$ pip install dvclive

Initialize DVC Repository

$ git init
$ dvc init
$ git commit -m "DVC init"

Example code

Copy the snippet below into train.py for a basic API usage example:

import time
import random

from dvclive import Live

params = {"learning_rate": 0.002, "optimizer": "Adam", "epochs": 20}

with Live() as live:

    # log parameters
    for param in params:
        live.log_param(param, params[param])

    # simulate training
    offset = random.uniform(0.2, 0.1)
    for epoch in range(1, params["epochs"]):
        fuzz = random.uniform(0.01, 0.1)
        accuracy = 1 - (2 ** - epoch) - fuzz - offset
        loss = (2 ** - epoch) + fuzz + offset

        # log metrics to studio
        live.log_metric("accuracy", accuracy)
        live.log_metric("loss", loss)
        live.next_step()
        time.sleep(0.2)

See Integrations for examples using DVCLive alongside different ML Frameworks.
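For instance, the Keras integration is typically used as a callback. The sketch below assumes dvclive's Keras integration exposes DVCLiveCallback (check the Integrations docs for the exact import path in your version) and that TensorFlow is installed; the model and data are synthetic just to keep the example self-contained:

import numpy as np
from tensorflow import keras

from dvclive.keras import DVCLiveCallback  # import path may vary by version

# Tiny synthetic dataset so the example runs on its own.
x = np.random.rand(64, 4)
y = np.random.randint(0, 2, size=(64,))

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The callback logs the Keras metrics through DVCLive after every epoch.
model.fit(x, y, epochs=5, callbacks=[DVCLiveCallback()])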

Running

Run this a couple of times to simulate multiple experiments:

$ python train.py
$ python train.py
$ python train.py
...

Comparing

DVCLive outputs can be rendered in different ways:

DVC CLI

You can use dvc exp show and dvc plots to compare and visualize metrics, parameters and plots across experiments:

$ dvc exp show
───────────────────────────────────────────────────────────────────────────────────────────────────────────
Experiment                 Created    train.accuracy   train.loss   val.accuracy   val.loss   step   epochs
───────────────────────────────────────────────────────────────────────────────────────────────────────────
workspace                  -                  6.0109      0.23311          6.062    0.24321      6   7
master                     08:50 PM                -            -              -          -      -   -
├── 4475845 [aulic-chiv]   08:56 PM           6.0109      0.23311          6.062    0.24321      6   7
├── 7d4cef7 [yarer-tods]   08:56 PM           4.8551      0.82012         4.5555   0.033533      4   5
└── d503f8e [curst-chad]   08:56 PM           4.9768     0.070585         4.0773    0.46639      4   5
───────────────────────────────────────────────────────────────────────────────────────────────────────────
$ dvc plots diff $(dvc exp list --names-only) --open

dvc plots diff

DVC Extension for VS Code

Inside the DVC Extension for VS Code, you can compare and visualize results using the Experiments and Plots views:

VSCode Experiments

VSCode Plots

While experiments are running, live updates will be displayed in both views.

DVC Studio

If you push the results to DVC Studio, you can compare experiments against the entire repo history:

Studio Compare

You can enable Studio Live Experiments to see live updates while experiments are running.


Comparison to related technologies

DVCLive is an ML logger, similar to tools like MLflow and Weights & Biases (W&B).

The main differences from those ML loggers are:

  • DVCLive does not require any additional services or servers to run.
  • DVCLive metrics, parameters, and plots are stored as plain text files that can be versioned by tools like Git or tracked as pointers to files in DVC storage.
  • DVCLive can save experiments or runs as hidden Git commits.

You can then use different options to visualize the metrics, parameters, and plots across experiments.


Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, dvclive is free and open source software.

dvclive's People

Contributors

aazuspan, alexandrekempf, blacku13, daavoo, dberenbaum, dependabot[bot], dmpetrov, dtrifiro, efiop, floer32, francisquintallauzon, github-actions[bot], kwon-young, mattseddon, mnrozhkov, mvshmakov, naibatsuteki, natikgadzhi, ocraft, omesser, pacifikus, pared, pmrowla, pre-commit-ci[bot], raphcoterta, shcheklein, sirily, sisp, skshetry, swarajpande5

dvclive's Issues

logging: configure logs directory

The ability to control the logging directory in dvc.yaml files is excellent. However, it does not appear to support subdirectories: logs must be written to the current running directory. It would be helpful to be able to specify a directory to place generated logs into, as a configuration option under the live tag in dvc.yaml.

integrations: sklearn

It seems we should support at least a few popular frameworks.

Considering their popularity, we should probably start with:

  • keras - we have an initial implementation
  • sklearn
  • xgboost

Worth considering:

  • FastAi - #136
  • pytorch lightning

TF and PyTorch - it seems to me that their pure form is used when users need highly custom models, and in those cases they will probably be able to handle dvclive by hand.
@dmpetrov did I miss some popular framework?

EDIT:
crossing out FastAi as it has its own issue now

api: make dvclive thread safe

Currently the only safe way to use dvclive in threads is to manually create a MetricLogger instance.

Otherwise, for example:

import dvclive
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def log_some(index):
    dvclive.init(str(index))
    metrics = [index * i for i in range(10)]
    for m in metrics:
        dvclive.log(f"metric-{str(index)}",m)
        sleep(0.1)
        dvclive.next_step()


with ThreadPoolExecutor(max_workers=10) as e:
    e.map(log_some, [1,2,7,13,17])

we will get mixed-up results. Research whether it is possible to keep the current API functionality and be thread safe at the same time.
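With newer dvclive versions that expose the instance-based Live API, one workaround sketch is to give each thread its own logger (and its own output directory) instead of sharing module-level state; whether this fully addresses the original request is an open question:

from concurrent.futures import ThreadPoolExecutor
from time import sleep

from dvclive import Live

def log_some(index):
    # One independent Live instance per thread, writing to its own directory.
    with Live(f"dvclive-{index}") as live:
        for i in range(10):
            live.log_metric(f"metric-{index}", index * i)
            sleep(0.1)
            live.next_step()

with ThreadPoolExecutor(max_workers=10) as e:
    e.map(log_some, [1, 2, 7, 13, 17])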

logger: Add notifier to `next_step`?

Depending on the type of model to be trained, the time in between calls to next_step may vary significantly. In common deep learning scenarios, i.e. the keras callback, next_step is being called at the end of an epoch which could result in long times (maybe hours) in between calls.

It could be useful to have built-in support for optionally sending a notification each time next_step is being called.

Without changing dvclive, the user could just call a custom library (e.g. https://github.com/liiight/notifiers) after next_step:

class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()
        notify('pushover', user='foo', token='bar', message=f'epoch: {epoch}')

But having the notification step built inside MetricLogger would have some benefits, like access to internals (e.g. _metrics) and configuration options, in addition to hiding complexity from the end user.

However, I'm not sure if it is worth implementing this feature inside dvclive or if it would be better to keep dvclive as lightweight as possible.

Installing dvclive breaks dvc's version command

I wanted to test dvclive in a project where I'm already using dvc.
My current version of dvc is 2.0.5.
After installing dvclive, the command dvc --version returns 0.0.1 (which is thus also wrongly shown when running dvc repro ...).

Steps to reproduce with python 3.8.7 on a Mac:

pip install dvc==2.0.5
pip install dvclive
dvc --version

Some information (after installing dvc and dvclive):

$ pip list
Package           Version
----------------- ---------
appdirs           1.4.4
atpublic          2.1.3
cached-property   1.5.2
certifi           2020.12.5
cffi              1.14.5
chardet           4.0.0
colorama          0.4.4
commonmark        0.9.1
configobj         5.0.6
decorator         4.4.2
dictdiffer        0.8.1
diskcache         5.2.1
distro            1.5.0
dpath             2.0.1
dulwich           0.20.20
dvc               2.0.5
dvclive           0.0.1
flatten-dict      0.3.0
flufl.lock        3.2
fsspec            0.8.7
ftfy              5.9
funcy             1.15
future            0.18.2
gitdb             4.0.5
GitPython         3.1.14
grandalf          0.6
idna              2.10
jsonpath-ng       1.5.2
mailchecker       4.0.3
nanotime          0.5.2
networkx          2.5
packaging         20.9
pathlib2          2.3.5
pathspec          0.8.1
phonenumbers      8.12.19
pip               21.0.1
ply               3.11
psutil            5.8.0
pyasn1            0.4.8
pycparser         2.20
pydot             1.4.2
pygit2            1.5.0
Pygments          2.8.1
pygtrie           2.3.2
pyparsing         2.4.7
python-benedict   0.23.2
python-dateutil   2.8.1
python-fsutil     0.4.0
python-slugify    4.0.1
PyYAML            5.4.1
requests          2.25.1
rich              9.13.0
ruamel.yaml       0.16.13
ruamel.yaml.clib  0.2.2
setuptools        49.2.1
shortuuid         1.0.1
shtab             1.3.5
six               1.15.0
smmap             3.0.5
tabulate          0.8.9
text-unidecode    1.3
toml              0.10.2
tqdm              4.59.0
typing-extensions 3.7.4.3
urllib3           1.26.4
voluptuous        0.12.1
wcwidth           0.2.5
xmltodict         0.12.0
zc.lockfile       2.0

It was tested in a project and repeated in a fresh virtual environment with the same output (i.e. dvc --version returning 0.0.1).

But surprisingly, if dvclive is installed first and then dvc, dvc --version returns the expected 2.0.5.

summary: add option to suppress

In some cases a user might not need the summary JSON, just the metrics history.
Add a way to prevent dvclive from dumping the latest step metrics.

integrations: other ML tracking tools

Now that dvclive is decoupled from dvc, it should be pretty trivial to log data in formats expected by other ML tracking tools like mlflow. It would be great to establish a pattern for adding integrations to any other ML tracking tools, making dvclive an agnostic lightweight wrapper for any other ML tracking tools.

This approach has a few benefits:

  • Users who don't have dvc or don't want to use dvc integration still have options for visualizing their model progress.
  • Users can switch between different ML tracking tools by simply switching the integration they use. For example, specifying the integration might be as simple as dvclive.init(style="mlflow").
  • Less development resources spent on visualization in dvclive itself.

Tools to support for integration:

logging: infer when the step should be finalized

Currently we require the user to call next_step to append metrics to the history and dump a new summary JSON.

We could probably omit that with some logic, e.g.:

  • assume a new step when trying to insert a duplicate key into the metrics dict.

Worth checking if and how other tools deal with that.
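A rough sketch of that heuristic, purely for illustration (this is not the actual dvclive implementation):

class AutoStepLogger:
    """Advance the step automatically when a metric name repeats."""

    def __init__(self):
        self.step = 0
        self._seen = set()  # metric names logged during the current step
        self.history = []   # (step, name, value) records

    def log(self, name, value):
        if name in self._seen:
            # The same metric was logged again without an explicit next_step(),
            # so assume the user has moved on to the next step.
            self.step += 1
            self._seen.clear()
        self._seen.add(name)
        self.history.append((self.step, name, value))

logger = AutoStepLogger()
for epoch in range(3):
    logger.log("loss", 1.0 / (epoch + 1))   # the repeated "loss" key starts a new step
    logger.log("accuracy", epoch / 3)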

dvc integration: summarizing already running experiment

Scenario:

  1. We run experiment, with logger turned on
  2. We want to take a peek into our training
  3. We run a command dvc logs show --continuous {target} (I presume dvc logs show {target} will be reserved for ad-hoc checks)
  4. Command keeps generating new HTML until we are done with training

We need to detach summary generation from dvclive (as it is done currently).

EDIT:
probably we do not need to implement --continuous in the beginning, as it is easily replaceable by a simple watch command.

dvclive: reconsider initialization

Currently, when doing from dvclive import dvclive
we are initializing an empty logger. Only calling dvclive.init actually "inits" the object.
We also have an init function that calls dvclive.init.
As a result, in one of the tests we need to initialize DvcLive by hand, because the imported logger depends on the previous test run. That should not happen; we need to come up with an initialization that does not depend on previous imports.

dvc: random order in html

If I regenerate the HTML periodically during training, I get a random order of the plots.

The order should be stable and, ideally, the same as the reporting order and the metrics diff output order.


integrations: HuggingFace transformers

It would be nice to have a dvclive integration with the popular transformers repository from Hugging Face.

Existing integrations are maintained in their repository, implemented as callbacks:

https://github.com/huggingface/transformers/blob/master/src/transformers/integrations.py
https://huggingface.co/transformers/main_classes/callback.html#available-callbacks

We would maintain the implementation in the DVCLive repository.

  • Add new callback to DVCLive
  • Update docs in dvc.org
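A rough sketch of what such a callback might look like, built on the public TrainerCallback hooks from transformers and the instance-based Live API (this is not the final dvclive implementation):

from transformers import TrainerCallback

from dvclive import Live

class DVCLiveTrainerCallback(TrainerCallback):
    def __init__(self):
        self.live = Live()

    def on_log(self, args, state, control, logs=None, **kwargs):
        # `logs` holds whatever the Trainer just reported (loss, eval metrics, ...).
        for name, value in (logs or {}).items():
            if isinstance(value, (int, float)):
                self.live.log_metric(name, value)
        self.live.next_step()

    def on_train_end(self, args, state, control, **kwargs):
        self.live.end()

It would then be passed to the Trainer via its callbacks argument, e.g. Trainer(..., callbacks=[DVCLiveTrainerCallback()]).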

summary: Add option to store **best** value?

By default the values for each metric at the latest step are being saved in the summary. It might be interesting to add an option to additionally save the best value for each metric.

An example use case would be the usual deep learning training loop where the model is being validated at the end of each epoch and the last epoch is not necessarily the one with the best performance. Having the best value saved in the summary could be more useful for comparing experiments (i.e. dvc metrics diff --targets dvclive.json)

A potential problem would be how to expose to the user (#75 (comment)) some options, like whether to use save_best or not and what to consider better (i.e. higher or lower).
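Until such an option exists, a user-side workaround (a minimal sketch using the instance-based Live API; metric names are illustrative) is to log the running best as its own metric, so the latest-step summary also carries the best-so-far value:

import random

from dvclive import Live

best_accuracy = float("-inf")

with Live() as live:
    for epoch in range(10):
        accuracy = random.random()  # stand-in for real validation accuracy
        best_accuracy = max(best_accuracy, accuracy)
        live.log_metric("accuracy", accuracy)
        live.log_metric("best_accuracy", best_accuracy)
        live.next_step()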

Proposal for more visual output options for standalone dvclive

Currently, dvclive without dvc produces just a tsv output tracking metrics progress. This doesn't seem like enough to make it worthwhile for users. There are at least a few drawbacks to the current output:

  • It's not visual - some kind of plot, progress bar, etc. would be easier to read
  • It's in a separate file that needs to be opened and refreshed as training progresses

Ideally, it would be great to have a nice dynamic plot that shows up in stdout during training and also optionally gets saved to a file at the end. More realistically, we might have different output options that users could choose or that dvclive could default to depending on what's available:

  • Use dvc plots to visualize if installed (although this is still a static separate file)
  • Use matplotlib or other python plotting libraries to render and update as training progresses
  • Use stdout to make a very simplistic plot or to print out the metrics being written to tsv

Hopefully, there are better suggestions than these! Interested in thoughts on the general concept, and individual output types can be opened in separate issues if needed. Keeping dvclive as lightweight as possible should also remain a priority.
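As a toy illustration of the "very simplistic stdout plot" option above (plain Python, no extra dependencies; not a proposal for the final design):

def print_metric(name, value, width=40):
    # Render a value in [0, 1] as an ASCII bar so progress is visible in stdout.
    filled = int(max(0.0, min(1.0, value)) * width)
    bar = "#" * filled + "-" * (width - filled)
    print(f"{name:>10} |{bar}| {value:.4f}")

for accuracy in [0.42, 0.61, 0.78, 0.85]:
    print_metric("accuracy", accuracy)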

errors: get rid of DvcLiveError calls.

DvcLiveError should be an abstract base exception that lets the end user know where an error originated.
We should replace the current occurrences of raise DvcLiveError with specific errors named after the cause.
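An illustrative sketch of the kind of hierarchy proposed here, with a base class plus specific subclasses named after the cause (the subclass names below are hypothetical, except InitializationError, which already appears elsewhere in this document):

class DvcLiveError(Exception):
    """Base class for all dvclive errors."""

class InitializationError(DvcLiveError):
    def __init__(self):
        super().__init__(
            "dvclive has not been initialized; call `dvclive.init()` or `dvclive.log()` first"
        )

class InvalidMetricTypeError(DvcLiveError):
    def __init__(self, name, value):
        super().__init__(
            f"Metric '{name}' has unsupported type '{type(value).__name__}'"
        )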

integrations: CML

From a high level user perspective, I think it would be interesting to have some kind of integration between dvclive and CML.

Just like there is already a custom dvclive behavior when used along with DVC (i.e. generating checkpoints) maybe a similar approach could be used when dvclive is also used along with CML.

I was thinking of a use case like dvclive taking care of using cml-send-comment inside next_step if CML is detected.

Conceptually, I think it would be something similar to the existing make_checkpoint:

https://github.com/iterative/dvclive/blob/master/dvclive/metrics.py#L105

This might be just a specific use case of #90

Don't start Usage with Keras example

The current dvclive Usage Guide directly jumps into the keras integration, omitting the possibility of its most basic usage (as a Python library). Likewise, the Dvclive with DVC page is also "tied" to this keras example.

I think that it might be better to start the Usage Guide with a standalone approach (something similar to this existing section of the README) and a separate page explaining the usage along with DVC and link to a new page called Integrations.

The current content of the Usage Guide could be moved to a new Integrations/keras section. And we might create new pages for other undocumented integrations like xgboost or mmcv. Further integrations could be simply appended to Integrations.

Something like:

dvclive/
├── Integrations
│   ├── keras
│   ├── mmcv
│   └── xgboost
└── Usage
    ├── StandAlone
    └── WithDVC

logger: env var configuration

Detected in #74 (review)

Apparently DVCLive can be configured via env vars 🙂

@staticmethod
def from_env():
    from . import env

    if env.DVCLIVE_PATH in os.environ:
        directory = os.environ[env.DVCLIVE_PATH]
        dump_latest = bool(int(os.environ.get(env.DVCLIVE_SUMMARY, "0")))
        html = bool(int(os.environ.get(env.DVCLIVE_HTML, "0")))
        checkpoint = bool(int(os.environ.get(env.DVC_CHECKPOINT, "0")))
        resume = bool(int(os.environ.get(env.DVCLIVE_RESUME, "0")))
        return MetricLogger(
            directory,
            summary=dump_latest,
            html=html,
            checkpoint=checkpoint,
            resume=resume,
        )
    return None

I don't see this info in the README or in any page of https://dvc.org/doc/dvclive. Should we add it?

Esp. if #74 gets merged.
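For context, a minimal usage sketch for the snippet above, assuming the env module constants resolve to variable names with the same spelling (the behavior is inferred from the code, not from documented guarantees):

import os

os.environ["DVCLIVE_PATH"] = "training_logs"  # directory for the metric files
os.environ["DVCLIVE_SUMMARY"] = "1"           # also dump the latest-step summary
os.environ["DVCLIVE_HTML"] = "0"              # skip HTML report generation

# With these set, MetricLogger.from_env() above would return a configured logger.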

logger: refactor initialization error message

After #67 we initialize the logger in the default dir dvclive.
In next_step, we raise InitializationError, whose message suggests that init has not been called.
This situation actually happens only if we

  1. never call dvclive.init
    AND
  2. never call dvclive.log

So the error message is misleading. It should be refactored in a way that tells the user they never called init nor log.

tests: integrations with DVC

Those should probably live in the dvc repo, but I'm creating the issue here to avoid cluttering the main repo.

  • "normal" use case - dvc --logs produces plots and metrics
  • checkpoints - we should be able to run plots show/diff and metrics diff for experiments using checkpoints

More granular control on caching of logs with dvclive

The cache flag under live in dvc.yaml seems to be an all-or-nothing type flag (i.e., all log files, summaries, and HTML must be cached or none of them). The logging directories, which contain each iteration of data, are less likely to be checked into git than the summaries are. This is due to the large number of differences that will always be present in the logged iteration data. Summaries are generally smaller and thus a prime candidate for being tracked with git.

Currently, this can be done by setting cache to true and removing summary files from .gitignore. This seems counter to the intentions of DVC providing the cache option. I'm also unsure of the implications of doing this. Does DVC still track that item? Is it now duplicated in git and DVC tracking?

Adding options for individually caching the outputs of the live tag would allow for easier workflows when only the summaries from logging are to be tracked.

fix naming among the project

During development I made some arbitrary decisions that need to be reconsidered:

  • next_step - shouldn't it be next_epoch or even just next?
  • init args - report, dump_latest, step - it's hard to grasp what does what

While at it, it would be good to take care of the dvc side too.

Feature Request: Support different formats of summaries

Summaries are currently printed out as a single JSON line. It would be handy to be able to change this format to a more readable format (such as YAML or multiline JSON).
These summaries are more likely to be checked into git. As such, having them in a more human-readable format would be beneficial.
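Until a built-in option exists, a tiny post-processing script can already produce a more readable file (a sketch; assumes PyYAML is installed and the summary lives at the default dvclive.json path):

import json

import yaml

with open("dvclive.json") as f:
    summary = json.load(f)

# Write the same summary as indented, multiline YAML.
with open("dvclive.yaml", "w") as f:
    yaml.safe_dump(summary, f, default_flow_style=False)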

dvc integration: handling dvclive- produced metrics and plots

I would like to start a discussion on how we let our users specify that a particular directory is a logs dir.

  1. What does dvclive produce?
    Let's assume we initialized dvclive in code:
    dvclive.init("live_logs")
    As of today, when we use dvclive it dumps the history of each logged metric into a respective .tsv file under "live_logs":
├── live_metrics
│   ├── accuracy.tsv
│   └── loss.tsv
├── live_metrics.json
...

accuracy.tsv - stores the history of the accuracy metric across all registered steps
loss.tsv - does the same for loss
live_metrics.json - stores a JSON containing all the metrics logged during the latest step of training.

So, we can translate this structure to dvc "language" by saying that live_metrics stores dvc plots and live_metrics.json is a dvc metrics file.

I was wondering how to integrate this with dvc, and my first idea was adding --logs/--logs-no-cache options for dvc run.
Essentially, those flags converted
dvc run ... --logs live_logs ... into dvc run ... --metrics live_logs.json --plots live_logs ....

For now I removed it from my dvc PR because it can be solved by just using --metrics and --plots.

Now, I would like to discuss how we should handle it in DVC, since the changes there can be significant, depending on what we choose to do.

Option 1:

  • We go with my initial idea and create a new output type on the dvc side - this requires more work, adds additional options to dvc run, and requires us to adjust dvc to dvclive development - which does not sound too good to me.

Option 2:

  • We just tell users "When working with dvclive, you need to specify live_logs.json as a metric and live_logs as plots" - that is also not ideal, as right at the beginning of working with both tools we require the user to understand the dvclive data structure and dvc options.

Option 3:

  • create --logs / --logs-no-cache as a convenience method: they could be converted to metrics/plots right in the dvc run command definition. In that use case there is not too much integration with DVC, we don't have to entrench dvclive concepts in dvc core, yet we have something for first-time users. In this case I would make sure to mention that this is only a convenience method and that when, for example, writing your own dvc.yaml, one needs to specify the logs properly.

I would go with option number 3, it makes the most sense to me.

@dmpetrov I would love your opinion on that.

tensorflow integration: fix installation

Currently we set up tensorflow by trying to check if it's already installed (by importing it in setup.py). If it does not exist, we install the CPU version.

I think even installing tf for users is suboptimal. We should probably just check if tf is installed and print help on how to install it, so that the user installs a version that matches their needs.

But the problem with this approach is that it would effectively prevent any user from a successful installation, even when they do not want to use the Keras integration at all.

We need to introduce "targeted" installs, pip install dvclive[(keras|other_lib|all)], before implementing the previous point.
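A sketch of the suggested approach: don't install TensorFlow on behalf of the user, just fail with a helpful message when the Keras integration is actually used (the extra name in the message is illustrative):

def _check_tensorflow():
    try:
        import tensorflow  # noqa: F401
    except ImportError as exc:
        raise ImportError(
            "The Keras integration requires TensorFlow. Install a build that "
            "matches your hardware, e.g. `pip install tensorflow`, or use a "
            "targeted install such as `pip install dvclive[keras]`."
        ) from exc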

Questions about integrations and scope of `dvclive`

Hi there!

First of all, thank you for your open source efforts.

I have been using dvc for a while and, as we are migrating our workflows to the new version 2.X, we are considering integrating dvclive and are willing to contribute. After reading the documentation along with #5 and #66 I am a little confused about the scope of the project and how to proceed with the integration.

We mainly use mmcv and (to a lesser extent) pytorch-lightning as training frameworks. In both of those frameworks, the usual procedure for adding support for a new metric tracking tool would be to send a P.R. directly to those repositories adding a new Logger (pytorch-lightning / mmcv). This leads me to the next question:

  1. Would it be better to send a P.R. to those frameworks or to this repository (as I understand from reading #5)?

In addition, as my understanding of the official documentation and release post does not match the direction described in #66:

  2. Should we consider dvclive, in the short term, as an alternative, companion, or wrapper to other metric tracking tools we might be currently using (i.e. MLflow Tracking)?

Thanks in advance

checkpoints: num(ber)/epoch awareness

Solves two problems:

  • dvc exp run && dvc exp run shouldn't re-run everything a second time
    • Yes, the user code can check this but so what? Same argument for non-checkpointed dvc repro && dvc repro. DVC should not re-run.
  • interrupting dvc exp run (e.g. due to a runner timeout) and resuming shouldn't re-start from checkpoint zero.

readme: migrate to rst

Using a markdown README results in an ugly description on the PyPI page:

# dvclive dvclive is an open-source library for monitoring machine learning model performance.

dvclive aims to provide the user with simple python interface what will allow the user to log the model metrics as the training progresses.

The interface consists of three main methods: 1. dvclive.init(path) - initializes dvclive logger. The metrics will be saved under path. 2. dvclive.log(metric, value, step) - logs the metric value. The value and step will be appended to path/{metric}.tsv file. The step value is optional. 3. dvclive.next_step() - signals dvclive that current step has ended. Executed automatically if metric is logged again.

Should the `dvc stage add --live <path>` arg override `init(path)`?

It's more of an idea for dvc stage add but I thought I'd ask here first (we can transfer the issue if needed).

If you call dvclive.init() manually, path is always required. So in those cases, combined with using DVC, you have to match path with the arg. of dvc stage add --live. Seems error-prone.

The arg to --live could be optional. If there's no init() call in the code, then DVC would need to give an error.

UPDATE: Calling init() is no longer required though, as DVCLive has a default path for its outputs now.

Thoughts?

Introduce default directory

dvclive.error.InitializationError: Initialization error - call `dvclive.init()` before `dvclive.log()`

Why don't we use a default dir if it is not set up explicitly?

logger: track system metrics automatically

Right now dvclive doesn't track system metrics such as CPU, GPU, and RAM utilisation. This is useful when comparing experiment results (e.g., I can get +1% accuracy, but what is the price in terms of time/resources?) and analysing how long training takes on different GPUs (e.g., can I rent another GPU model and get a 2x speedup?).

The usual practice is to log this somehow manually with https://github.com/giampaolo/psutil and analyse the results later, but because heavy experiments in ML require this quite often, IMO it makes sense to have this functionality out-of-the-box.

Also it would be great to have a summary of these metrics in the .json file produced by the experiment, to make quick decisions instead of diving too deep (e.g., the average CPU utilisation was 4 cores, so my script doesn't utilise all 32 cores I have; the peak RAM utilisation was 8GB, which means I can rent a smaller server on AWS to run this training; etc.).

To name one example among ml tools, this is already tracked with W&B, see the bottom of dashboard: https://wandb.ai/stacey/estuary?workspace=user-lavanyashukla

This page states main metrics logged in W&B https://docs.wandb.ai/ref/app/features/system-metrics

If this would be useful, I could gather a list of metrics with notes describing the cases where these metrics are helpful to a user in ML tasks.
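A manual workaround sketch along those lines, using psutil as mentioned above and the instance-based Live API (metric names are illustrative):

import psutil

from dvclive import Live

with Live() as live:
    for step in range(100):
        # ... one training step would go here ...

        # Log system utilisation alongside the model metrics.
        live.log_metric("system/cpu_percent", psutil.cpu_percent())
        live.log_metric("system/ram_percent", psutil.virtual_memory().percent)
        live.next_step()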

`dvclive.log`: log array, tensors and similar objects

AFAIK, some experiment management libraries (e.g., w&b) can log not only simple python objects, but also more complex ones, a few of them being numpy arrays, torch tensors, tensorflow tensors, etc. It seems like general functionality which could help a lot of users and make the logging process more convenient for them.

Also, surprisingly to me, w&b client library doesn't have numpy/torch/tensorflow dependencies: https://github.com/wandb/client/blob/master/requirements.txt

Could we implement something similar for dvclive? If the answer is positive, I'll do a little research and post more examples of which custom types would be great to support.
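Today's workaround sketch is to reduce arrays or tensors to plain Python numbers before logging (assumes numpy; the same idea applies to torch/tensorflow tensors via .item() or float()):

import numpy as np

from dvclive import Live

losses = np.array([0.9, 0.7, 0.55])

with Live() as live:
    for loss in losses:
        live.log_metric("loss", float(loss))  # np.float64 -> builtin float
        live.next_step()
    live.log_metric("mean_loss", float(losses.mean()))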

Clean up summary and html files

It seems like all the metrics files are cleaned up when init() is called, but the summary and HTML files stay in place.
Should we clean them up as well?

make build run dvc live tests

As a build step, we could clone the dvc repo and run the live integration tests.
It might be worth considering moving all integration tests to dvclive.
Related to #28
