Cellout is both a command-line utility and a Haskell library for
operating on Jupyter notebooks. You can think of cellout as a translation layer
between different kinds of notebook formats. An additional goal of cellout
is to provide nbconvert functionality without a Python runtime dependency.
Cellout was written to provide a solid foundation for experimenting with notebook file formats. Integrating well with pandoc by providing readers has been one of the motivating factors.
The Haskell library contains a well-typed Jupyter notebook document representation, as well as facilities for operating on notebooks (such as filtering output, removing metadata, and concatenating multiple notebooks).
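To make the idea of a typed notebook representation concrete, here is a minimal sketch. The types and function names (Cell, Notebook, clearOutputs, concatNotebooks) are assumptions made up for illustration and are not cellout's actual API; outputs are simplified to plain strings.

```haskell
-- Hypothetical sketch: these types are illustrative, not cellout's real API.

-- A cell is either markdown text or code together with its captured outputs.
data Cell
  = Markdown String
  | Code String [String]
  deriving (Show, Eq)

-- A notebook is an ordered list of cells (metadata omitted for brevity).
newtype Notebook = Notebook [Cell]
  deriving (Show, Eq)

-- Drop the outputs of every code cell; markdown cells pass through unchanged.
clearOutputs :: Notebook -> Notebook
clearOutputs (Notebook cs) = Notebook (map clear cs)
  where
    clear (Code src _) = Code src []
    clear cell = cell

-- Concatenate several notebooks into one, preserving cell order.
concatNotebooks :: [Notebook] -> Notebook
concatNotebooks nbs = Notebook (concat [cs | Notebook cs <- nbs])
```

Loaded into GHCi, clearOutputs applied to a notebook keeps every cell's source and only empties the output lists, which is the kind of guarantee a typed representation makes easy to state.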
Note: cellout is under active development. Some of the behaviors outlined in the
rest of this document describe how it should work and do not
necessarily reflect what is implemented right now.
$ cellout --clear-output Untitled314.ipynb
$ █
When there are no errors to report, cellout stays quiet, though you can pass the
--verbose flag to get informational output.
$ cellout --clear-output --verbose Untitled314.ipynb
Wrote Untitled314.ipynb
$ █
A shortcut for the --verbose flag is -v, and you can repeat it to get
higher levels of verbosity (more and more detail).
$ cellout --clear-output -vvv Untitled314.ipynb
Options {input = "Untitled314.ipynb", clearOutput = False, clearPrompt = False, outputFilename = "", outputType = "notebook", verbosity = 3}
Reading Untitled314.ipynb
Notebook contains 7 cells (3 code and 4 markdown)
Wrote Untitled314.ipynb
$ █
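The way repeated -v flags add up can be sketched as follows. This is an assumption about one reasonable way to count the flags, not a description of how cellout actually parses its arguments; the function name verbosity is hypothetical.

```haskell
-- Hypothetical sketch: one way to count verbosity flags, not cellout's parser.

-- "--verbose" counts once; "-v", "-vv", "-vvv", ... count once per 'v'.
-- Every other argument contributes nothing.
verbosity :: [String] -> Int
verbosity = sum . map level
  where
    level "--verbose" = 1
    level ('-' : flags@(_ : _))
      | all (== 'v') flags = length flags
    level _ = 0
```

Under this sketch, the arguments from the session above (--clear-output, -vvv, Untitled314.ipynb) yield a verbosity of 3, matching the value shown in the Options line.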
Personal context: I am writing cellout in part to have a mechanism for exploring
future notebook formats that is not tied down to the JSON .ipynb format. Since
the beginning of nbconvert back in 2012, I've expressed the opinion
that we should integrate with pandoc as much as we can and not have nbconvert do
so much of what it does. All that was missing was someone sitting down and
learning more Haskell, which is what I've now started doing myself.
Jupyter's JSON-based .ipynb notebook format has been hugely successful, on
the one hand, because everything is in one file and it can act as a unit of
sharing, but it has also been a hindrance in other contexts. For example, the
JSON format impedes effective version control, and being a single-file monolith
means you can't load large notebooks quickly, since the entire .ipynb file needs
to be transferred before the JavaScript can render it. That's not a big deal for
short little notebooks, but in practice it has meant that no one writes long
notebooks, and it's a bummer that there's effectively a cap on how many plots
you can pack into one.
At the last Project Jupyter all-hands week-long meeting back in May 2018, we
discussed different requirements, trade-offs, features, limitations and pain
points for possible future versions of the notebook document format. So there
are several ideas that people have for addressing particular kinds of use cases
where .ipynb file limitations end up being a bottleneck.
As a Haskell library, cellout provides a well-defined in-memory representation of a notebook and can be used to convert and translate between these future formats. It will also allow us to integrate better with pandoc, and to provide nbconvert functionality in a standalone executable (no Python dependencies).
notebook* (* - or some notebook-compatible file format)
|
V
cellout (filter/transform)
|
V
notebook*
In a more complex scenario, you can also use cellout as a reader for pandoc:
notebook*
|
V
cellout (filter/transform)
|
V
pandoc
|
V
one of the many output formats pandoc supports
Pandoc filters operate on the pandoc abstract syntax tree (AST), which I think would be "lossy" without some hacks. It's quite reasonable to convert a notebook to a document, but having that document preserve notions of cells, outputs, etc. would make operations like "clear the HTML-based outputs" too much work to write against the pandoc AST.
- [p] - partial
- [x] - fully supported
- [ ] - planned
[p] .ipynb (nbformat 4.2)
-
comment percent-percent format used by spyder
-
jupytext's extensions to the percent format
-
folder ("manila folder")
-
zip-file ("manila envelope")
Similarities
- --clear-output flag
- --no-prompt flag
- Invoking cellout and nbconvert without an input file will print the usage information, instead of attempting to read from stdin as pandoc does.
- Invoking cellout without an output file and without an output format, or with an output format that matches the input file will make changes in-place. (This is provided by the --inplace flag of nbconvert)
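The in-place rule above can be sketched as a small decision function. The names Target and chooseTarget are hypothetical, and the behavior when a different output format is requested without an output file is not specified in this document, so the sketch only records that a new file is needed and leaves its naming open.

```haskell
-- Hypothetical sketch of the output-target rule; not cellout's actual code.

data Target
  = InPlace            -- overwrite the input file
  | Explicit FilePath  -- the user named an output file
  | NewFileFor String  -- a different format was requested; file naming left open
  deriving (Show, Eq)

-- Arguments: the input file's format, the optional output file,
-- and the optional output format.
chooseTarget :: String -> Maybe FilePath -> Maybe String -> Target
chooseTarget inputFormat outFile outFormat =
  case (outFile, outFormat) of
    (Just path, _) -> Explicit path
    (Nothing, Nothing) -> InPlace
    (Nothing, Just fmt)
      | fmt == inputFormat -> InPlace
      | otherwise -> NewFileFor fmt
```

Modeling the result as a data type rather than a Boolean keeps the unspecified case visible instead of silently folding it into one of the other two.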
Differences
- Whereas nbconvert will default to converting to html, cellout keeps the same file format, which means that at most, invoking cellout filename without any other flags will read in the filename and write it back out, possibly with slightly different formatting, but without changing any data.
- Whereas nbconvert reads jupyter configuration files at startup, there are no configuration files for cellout - all options are specified at the command line.
- --execute flag is not supported at this time.
Similarities
- Like pandoc, there are no configuration files for cellout - all options are specified at the command line.
Differences
- When invoked without an input file, pandoc, like the venerable UNIX cat command, will await input from stdin. While this is a completely legitimate default behavior, at this time, I think it would be best to just print the usage information, which is what cellout does.
- Similarly, unlike pandoc, which writes to stdout when the output file is not specified, the result for cellout is written to a file. If the output format does not differ from the input format, the change is made to the file in-place.
nbconvert - a Python implementation of converting .ipynb to other formats.
Primarily, the nbconvert tool allows you to convert a Jupyter .ipynb notebook document file into another static format including HTML, LaTeX, PDF, Markdown, reStructuredText, and more. nbconvert can also add productivity to your workflow when used to execute notebooks programmatically.
pandoc - a universal document converter that can read and write a bunch of different formats including rst, html, LaTeX, docx, epub, and pdf.
ipyaml - IPython Notebooks as YAML.
jupytext - provides text-editor friendly format conversion to/from notebooks.
ipymd - markdown format for notebooks (excludes outputs).
nbconvert-vc - The experimental nbconvert plugin (to a YAML format) mentioned by Mike Droettboom in his Jupyter Notebooks and Version Control post.
nbdime - diffing and merging in Jupyter Notebooks.
sphinx-gallery - has a format for converting Python scripts to notebooks and to reStructuredText (.rst) with execution results included.