This project is growing in complexity, and it contains some loosely related modules that might be better off in separate repositories. I'm unsure what the best path forward is, but here are my thoughts.
what do we want this package to be
Moving forward, I think ploomber-engine should become a toolbox for executing notebooks (a papermill replacement). I imagine ploomber-engine providing both a Python API and a CLI for running notebooks.
Example:
ploomber-engine input.ipynb output.ipynb
With options to enable features such as debuglater:
ploomber-engine input.ipynb output.ipynb --debuglater
or produce profiling plots:
ploomber-engine input.ipynb output.ipynb --profile
or track experiments:
ploomber-engine input.ipynb output.ipynb --track
or combine multiple things:
ploomber-engine input.ipynb output.ipynb --track --profile
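To make the proposed interface concrete, here is a minimal sketch of how the positional arguments and optional flags above could be parsed with the standard library's argparse. This is hypothetical: `build_parser` and `enabled_features` are illustrative names, not ploomber-engine's actual CLI code, and the help strings are my own wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the proposed interface; not ploomber-engine's real CLI
    parser = argparse.ArgumentParser(prog="ploomber-engine")
    parser.add_argument("input", help="path to the notebook to execute")
    parser.add_argument("output", help="path where the executed notebook is saved")
    parser.add_argument("--debuglater", action="store_true",
                        help="store the failing state for post-mortem debugging")
    parser.add_argument("--profile", action="store_true",
                        help="produce profiling plots")
    parser.add_argument("--track", action="store_true",
                        help="log results to the experiment tracker")
    return parser


def enabled_features(argv: list[str]) -> list[str]:
    # Collect the optional features switched on for this run, in a fixed order
    args = build_parser().parse_args(argv)
    return [name for name in ("debuglater", "profile", "track") if getattr(args, name)]


if __name__ == "__main__":
    print(enabled_features(["input.ipynb", "output.ipynb", "--track", "--profile"]))
```

Since every feature is an independent boolean flag, combining them (as in the last example) needs no extra code; the executor would just enable each requested hook.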
Note: my only reservation about ploomber-engine becoming a notebook toolbox is that it will host the experiment tracker (at least for now), so at some point it might not be the right place for it if we implement more advanced experiment tracking features (see the last section of this comment).
things we have here
Currently, we have a few things here:
papermill integration
When installing ploomber-engine, papermill users get a few custom engines (debug, debuglater, and profiling), which they can use with:
papermill in.ipynb out.ipynb --engine {name}
I'm unsure to what extent papermill users rely on ploomber-engine for this; however, ploomber does use it, and its documentation explains how to switch engines by passing a different engine name (ploomber still uses papermill as its execution engine).
I think we should split the core functionality from the papermill integration. For example, we could keep ploomber-engine as the package that provides the papermill engines, create a new package (with a new name) that contains the core functionality for running notebooks, make ploomber-engine a dependency of that new package, and keep ploomber depending on ploomber-engine.
To maintain backward compatibility, we can keep our tracker here for some time and show a deprecation warning so users move to the new package.
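A minimal sketch of that deprecation path, using only the standard library's `warnings` module: the old entry point keeps working but warns on every call. The decorator, `track_notebook`, and the message text are hypothetical placeholders, since the new package doesn't have a name yet.

```python
import functools
import warnings


def deprecated(new_location: str):
    """Keep an old entry point working while pointing users to its replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # FutureWarning is shown by default; DeprecationWarning is hidden
            # unless triggered from __main__, so CLI users might never see it
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in a future "
                f"release; use {new_location} instead",
                FutureWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("the tracker in the new package")
def track_notebook(path: str) -> str:
    # Hypothetical placeholder for the existing tracker logic kept here during the transition
    return f"tracked {path}"
```

Using `FutureWarning` rather than `DeprecationWarning` is a judgment call: the tracker's users are mostly end users running a CLI, and they need to actually see the message.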
notebook executor
We have a custom notebook executor (i.e., a papermill replacement) that runs notebooks in the same process. This executor is what enabled the debugging, profiling, and experiment tracking features.
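To illustrate why running in the same process matters, here is a minimal sketch (not ploomber-engine's actual implementation) that executes the code cells of a notebook, represented as a dict following the nbformat v4 JSON layout, with `exec` in one shared namespace and captures each cell's stdout. `execute_in_process` is a hypothetical name, and the sketch ignores rich outputs, errors, and parameters.

```python
import contextlib
import io


def execute_in_process(nb: dict) -> dict:
    """Run every code cell of an nbformat-v4-style dict in the current process.

    All cells share one namespace, like a kernel session, but no kernel is started.
    Returns that namespace so callers can inspect variables after the run.
    """
    namespace: dict = {}
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        # nbformat allows "source" to be a single string or a list of lines
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(compile(source, "<cell>", "exec"), namespace)
        text = buffer.getvalue()
        # Store captured stdout in the same shape nbformat uses for stream outputs
        cell["outputs"] = (
            [{"output_type": "stream", "name": "stdout", "text": text}] if text else []
        )
    return namespace


if __name__ == "__main__":
    notebook = {
        "cells": [
            {"cell_type": "markdown", "source": "# Demo"},
            {"cell_type": "code", "source": ["x = 21\n"]},
            {"cell_type": "code", "source": "print(x * 2)"},
        ]
    }
    ns = execute_in_process(notebook)
    print(ns["x"], notebook["cells"][2]["outputs"])
```

Because the cells run in the current interpreter, a failing cell raises an ordinary Python exception (which `pdb.post_mortem` can inspect), a profiler can wrap each `exec` call, and the namespace is available afterwards for logging. That is roughly what a kernel-based executor like papermill's makes harder.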
experiment tracker
We have a command-line interface to track ML experiments, which is what we launched in this blog post. This tracker is an extra layer on top of sklearn-evaluation's SQLiteTracker: it runs the notebooks and logs the results to a SQLite database without the user having to write custom code.
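The "run the notebook, then log what it produced" idea can be sketched with the standard library alone. This deliberately uses `sqlite3` directly instead of sklearn-evaluation's SQLiteTracker (whose API I'm not reproducing here); the `experiments` table, its columns, and `run_and_log` are hypothetical.

```python
import json
import sqlite3
import uuid


def run_and_log(cells: list[str], conn: sqlite3.Connection) -> str:
    """Run code cells in-process and log their scalar variables as one experiment.

    Hypothetical schema: experiments(uuid TEXT PRIMARY KEY, parameters TEXT).
    """
    namespace: dict = {}
    for source in cells:
        exec(source, namespace)
    # Keep only public, JSON-friendly scalars; modules, models, etc. are skipped
    logged = {
        key: value
        for key, value in namespace.items()
        if not key.startswith("_") and isinstance(value, (int, float, str))
    }
    experiment_id = uuid.uuid4().hex
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiments (uuid TEXT PRIMARY KEY, parameters TEXT)"
    )
    conn.execute(
        "INSERT INTO experiments (uuid, parameters) VALUES (?, ?)",
        (experiment_id, json.dumps(logged)),
    )
    conn.commit()
    return experiment_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    exp_id = run_and_log(["n_estimators = 100", "accuracy = 0.9"], conn)
    row = conn.execute(
        "SELECT parameters FROM experiments WHERE uuid = ?", (exp_id,)
    ).fetchone()
    print(json.loads(row[0]))
```

The point is that the user writes no logging code: whatever the notebook leaves in its namespace becomes the experiment record. Deciding which variables to keep is the part a real tracker needs to get right.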
The experiment tracker has received a great response from the community, so we'll likely keep investing in it. This raises the question of where more advanced tracking features should live: the SQLiteTracker lives in sklearn-evaluation and the notebook tracker lives here, but if we build a tracking UI or other advanced features, it's unclear where those should go.