cellout

Rich notebook conversion

Cellout is both a command-line utility and a Haskell library for operating on Jupyter notebooks. You can think of cellout as a translation layer between different kinds of notebook formats. An additional goal of cellout is to provide nbconvert functionality without a Python runtime dependency.

Cellout was written to provide a solid foundation for experimenting with notebook file formats. Integrating well with pandoc, by providing readers for it, has been one of the motivating factors.

The Haskell library contains a well-typed Jupyter notebook document representation, as well as facilities for operating on notebooks (such as filtering output, removing metadata, and concatenating multiple notebooks).
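To make the idea of a well-typed notebook representation concrete, here is a minimal sketch of what such types and the three operations listed above could look like. The names (Notebook, Cell, Output, clearOutputs, removeMetadata, concatNotebooks) are hypothetical and are not cellout's actual API; metadata is simplified to key/value string pairs, and keeping the first notebook's metadata when concatenating is an assumption of this sketch.

```haskell
-- Hypothetical, simplified notebook model for illustration only;
-- cellout's actual types and function names may differ.
type Metadata = [(String, String)]   -- key/value pairs (a simplification)

data Output
  = StreamOutput String              -- e.g. text written to stdout
  | DisplayData [(String, String)]   -- MIME type -> payload
  deriving (Eq, Show)

data Cell
  = MarkdownCell Metadata String
  | CodeCell Metadata (Maybe Int) String [Output]  -- execution count, source, outputs
  deriving (Eq, Show)

data Notebook = Notebook
  { nbMetadata :: Metadata
  , nbCells    :: [Cell]
  } deriving (Eq, Show)

-- Filtering output: drop all outputs and execution counts, keep the sources.
clearOutputs :: Notebook -> Notebook
clearOutputs nb = nb { nbCells = map clear (nbCells nb) }
  where
    clear (CodeCell md _ src _) = CodeCell md Nothing src []
    clear cell                  = cell

-- Removing metadata at both the notebook and the cell level.
removeMetadata :: Notebook -> Notebook
removeMetadata (Notebook _ cells) = Notebook [] (map strip cells)
  where
    strip (MarkdownCell _ src)    = MarkdownCell [] src
    strip (CodeCell _ n src outs) = CodeCell [] n src outs

-- Concatenating notebooks: cells in order, metadata from the first notebook
-- (an assumption; other merge policies are possible).
concatNotebooks :: [Notebook] -> Notebook
concatNotebooks []          = Notebook [] []
concatNotebooks nbs@(n : _) = Notebook (nbMetadata n) (concatMap nbCells nbs)
```

Because every operation is a pure Notebook -> Notebook function, operations compose directly, e.g. removeMetadata . clearOutputs.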

Note: cellout is under active development; some of the behaviors outlined in the rest of this document describe how it should work and do not necessarily reflect what is implemented right now.

Examples

$ cellout --clear-output Untitled314.ipynb
$ █

When there are no errors to report, cellout stays quiet, though you can pass the --verbose flag to get informational output.

$ cellout --clear-output --verbose Untitled314.ipynb
Wrote Untitled314.ipynb
$ █

The short form of the --verbose flag is -v, and you can repeat it to get higher levels of verbosity (more and more detail).

$ cellout --clear-output -vvv  Untitled314.ipynb
Options {input = "Untitled314.ipynb", clearOutput = False, clearPrompt = False, outputFilename = "", outputType = "notebook", verbosity = 3}
Reading Untitled314.ipynb
Notebook contains 7 cells (3 code and 4 markdown)
Wrote Untitled314.ipynb
$ █
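Repeating -v amounts to counting flags. The following sketch assumes a hand-rolled scan over the argument list using only base; verbosityFromArgs and visibleMessages are hypothetical names, and cellout's real option parsing may work differently.

```haskell
-- Hypothetical sketch; cellout's real option parser may work differently.
-- "--verbose" counts as 1; a short-flag cluster made only of v's
-- ("-v", "-vv", "-vvv") counts one per 'v'. Everything else counts 0.
verbosityFromArgs :: [String] -> Int
verbosityFromArgs = sum . map level
  where
    level "--verbose" = 1
    level ('-' : flags@(_ : _))
      | all (== 'v') flags = length flags
    level _ = 0

-- Keep only the messages whose required level is at most the current one.
visibleMessages :: Int -> [(Int, String)] -> [String]
visibleMessages v msgs = [msg | (required, msg) <- msgs, required <= v]
```

For the -vvv invocation above, verbosityFromArgs ["--clear-output", "-vvv", "Untitled314.ipynb"] evaluates to 3, matching verbosity = 3 in the logged Options record. Which message belongs at which level is left to the caller in this sketch.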

Why cellout?

Personal context: I am writing cellout in part to have a mechanism for exploring future notebook formats that is not tied to the JSON .ipynb format. Since the beginning of nbconvert back in 2012, I've held the opinion that we should integrate with pandoc as much as we can instead of having nbconvert do so much of what it does; all that was missing was someone sitting down and learning more Haskell, which is what I've now started doing myself.

Jupyter's JSON-based .ipynb notebook format has been hugely successful, on the one hand, because everything is in one file that can act as a unit of sharing, but that has also been a hindrance in other contexts. For example, the JSON format impedes effective version control, and being a single-file monolith means you can't load large notebooks quickly, since the entire .ipynb file needs to be transferred before the JavaScript can render it. That's not a big deal for short little notebooks, but in practice it has meant that no one writes long notebooks, and it's a bummer that there's effectively a cap on how many plots you can pack into one.

At the last week-long Project Jupyter all-hands meeting, back in May 2018, we discussed requirements, trade-offs, features, limitations, and pain points for possible future versions of the notebook document format. So people have several ideas for addressing particular kinds of use cases where the limitations of the .ipynb format end up being a bottleneck.

As a Haskell library, cellout provides a well-defined in-memory representation of a notebook and can be used to convert and translate between these future formats. It will also allow us to integrate better with pandoc and bring nbconvert functionality to a standalone executable (with no Python dependencies).

Visual summary

notebook* (* - or some notebook-compatible file format)
    |
    V
 cellout (filter/transform)
    |
    V
notebook*

In a more complex scenario, you can also use cellout as a reader for pandoc:

notebook*
    |
    V
 cellout (filter/transform)
    |
    V
 pandoc
    |
    V
one of the many output formats pandoc supports
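Both diagrams boil down to function composition: a reader turns input text into the shared in-memory notebook, a transform filters it, and a writer serializes it. This sketch assumes a simplified Cell type and a toy line-based format invented purely for illustration (convert, readToy, writeToy, writeMarkdown and isCode are hypothetical names). The Markdown writer stands in for one possible hand-off to pandoc, not for how cellout actually integrates with it.

```haskell
import Data.List (intercalate)

-- Simplified, hypothetical notebook model shared by every format.
data Cell = Markdown String | Code String
  deriving (Eq, Show)

type Notebook = [Cell]

-- notebook* -> cellout (filter/transform) -> output
convert :: (String -> Either String Notebook)  -- reader for the input format
        -> (Notebook -> Notebook)              -- filter/transform step
        -> (Notebook -> String)                -- writer for the output format
        -> String -> Either String String
convert readNb transform writeNb input = writeNb . transform <$> readNb input

-- Toy line format invented for this sketch: "md <text>" or "code <source>".
readToy :: String -> Either String Notebook
readToy = traverse parseLine . lines
  where
    parseLine l = case break (== ' ') l of
      ("md", ' ' : text)  -> Right (Markdown text)
      ("code", ' ' : src) -> Right (Code src)
      _                   -> Left ("cannot parse line: " ++ l)

writeToy :: Notebook -> String
writeToy = unlines . map render
  where
    render (Markdown text) = "md " ++ text
    render (Code src)      = "code " ++ src

-- One possible hand-off to pandoc: emit Markdown (code as indented blocks),
-- which pandoc can read. Cell boundaries are not preserved in this output.
writeMarkdown :: Notebook -> String
writeMarkdown nb = intercalate "\n\n" (map block nb) ++ "\n"
  where
    block (Markdown text) = text
    block (Code src)      = intercalate "\n" (map ("    " ++) (lines src))

isCode :: Cell -> Bool
isCode (Code _) = True
isCode _        = False
```

With one shared in-memory representation, supporting N formats takes N readers and N writers rather than a separate converter for every pair of formats. For example, convert readToy (filter isCode) writeToy drops Markdown cells while staying in the same format.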

Why is cellout not a pandoc filter?

Pandoc filters operate on the pandoc abstract syntax tree (AST), which I think would be "lossy" without some hacks. It's quite reasonable to convert a notebook to a document, but making that document preserve notions of cells, outputs, etc. would make it too much work to write operations like "clear the HTML-based outputs" against the pandoc AST.
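When cells and outputs are first-class values, "clear the HTML-based outputs" becomes a small filter over MIME bundles. In a pandoc AST, by contrast, an HTML output would most likely end up as a raw HTML block, indistinguishable from raw HTML typed into a Markdown cell unless extra markers are added. The types and the clearHtmlOutputs function below are a hypothetical simplification, not cellout's actual representation.

```haskell
-- Hypothetical, simplified types (not cellout's actual representation).
type MimeBundle = [(String, String)]   -- MIME type -> payload

data Cell
  = Markdown String
  | Code String [MimeBundle]           -- source and its outputs
  deriving (Eq, Show)

-- Remove text/html from every code-cell output; outputs left with no
-- representation at all are dropped. Markdown cells are never touched,
-- even when they contain raw HTML written by the author.
clearHtmlOutputs :: [Cell] -> [Cell]
clearHtmlOutputs = map go
  where
    go (Code src outs) = Code src (filter (not . null) (map dropHtml outs))
    go cell            = cell
    dropHtml = filter ((/= "text/html") . fst)
```

The distinction between author-written HTML and generated HTML output is carried by the types here, which is exactly what a flattened document AST would have to encode by convention.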

Formats

[p] - partial
- fully supported
- planned
[p] .ipynb (nbformat 4.2)
ipyaml
sphinx-gallery
comment percent-percent (# %%) format used by Spyder
jupytext's extensions to the percent format
ipymd
folder ("manila folder")
zip-file ("manila envelope")
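As an illustration of what a reader for the percent-percent format involves, here is a minimal sketch that splits a script on lines starting with # %% and treats # %% [markdown] cells (the cell-type marker from jupytext's extensions) as commented Markdown. parsePercent and the Cell type are hypothetical; cell titles and metadata after the marker are not interpreted in this sketch.

```haskell
import Data.List (intercalate, isInfixOf, isPrefixOf, stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified cell type for this sketch.
data Cell = Code String | Markdown String
  deriving (Eq, Show)

-- Split a percent-format script into cells.
parsePercent :: String -> [Cell]
parsePercent src = preambleCell ++ go rest
  where
    (preamble, rest) = break isMarker (lines src)
    -- Text before the first marker becomes a code cell, if non-blank.
    preambleCell = [Code (joinLines p) | let p = trim preamble, not (null p)]

    isMarker :: String -> Bool
    isMarker = ("# %%" `isPrefixOf`)

    -- Each marker line starts a cell that runs until the next marker.
    go [] = []
    go (marker : ls) =
      let (body, next) = break isMarker ls
      in mkCell marker (trim body) : go next

    mkCell marker body
      | "[markdown]" `isInfixOf` marker = Markdown (joinLines (map uncomment body))
      | otherwise                       = Code (joinLines body)

    -- Markdown lines are stored as comments: strip "# " (or a bare "#").
    uncomment l = fromMaybe (fromMaybe l (stripPrefix "#" l)) (stripPrefix "# " l)

    -- Drop blank lines at both ends of a cell body.
    trim :: [String] -> [String]
    trim = reverse . dropWhile null . reverse . dropWhile null

    joinLines = intercalate "\n"
```

A writer would do the reverse (emit a marker, then the cell body, commenting out Markdown lines), so both directions can share the same Cell type.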

Comparing to `nbconvert`

Similarities

--clear-output flag
--no-prompt flag
Invoking cellout and nbconvert without an input file will print the usage information, instead of attempting to read from stdin as pandoc does.
Invoking cellout without an output file and without an output format, or with an output format that matches the input file's format, will make changes in-place. (nbconvert provides this behavior through its --inplace flag.)
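The invocation rules above can be summarized as a small decision function. This sketch is a hypothetical model, not cellout's code: Action and decideAction are invented names, format names are plain strings (using "notebook", as in the logged outputType), and because this document does not say what happens when a different output format is requested without an output file, the sketch returns Unspecified rather than guessing.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical model of the invocation rules; not cellout's actual code.
data Action
  = PrintUsage                 -- no input file: print usage, don't read stdin
  | WriteInPlace FilePath      -- same format, no output file: rewrite input
  | WriteTo FilePath String    -- explicit output file, with target format
  | Unspecified                -- new format but no output file (not covered)
  deriving (Eq, Show)

-- Arguments: input file, the input's format, output file, output format.
decideAction :: Maybe FilePath -> String -> Maybe FilePath -> Maybe String -> Action
decideAction Nothing _ _ _ = PrintUsage
decideAction (Just input) inFmt outFile outFmt =
  case outFile of
    Just path -> WriteTo path target
    Nothing
      | target == inFmt -> WriteInPlace input
      | otherwise       -> Unspecified
  where
    -- With no output format given, keep the input's format.
    target = fromMaybe inFmt outFmt
```

For example, decideAction (Just "Untitled314.ipynb") "notebook" Nothing Nothing yields WriteInPlace "Untitled314.ipynb", which corresponds to the in-place rewrite shown in the examples.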

Differences

Whereas nbconvert defaults to converting to HTML, cellout keeps the same file format. This means that invoking cellout filename without any other flags will, at most, read in the file and write it back out, possibly with slightly different formatting, but without changing any data.
Whereas nbconvert reads Jupyter configuration files at startup, there are no configuration files for cellout - all options are specified at the command line.
The --execute flag is not supported at this time.
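The promise that a bare cellout filename run changes formatting at most, but never data, can be stated as a round-trip property: reading the rewritten text yields the same in-memory notebook as reading the original. The toy format below is invented for illustration (readToy, writeToy and preservesData are hypothetical names); its reader tolerates extra whitespace while its writer normalizes it, so the text may change while the data does not.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

data Cell = Markdown String | Code String
  deriving (Eq, Show)

-- Toy format invented for this sketch: one "md: ..." or "code: ..." line per
-- cell. The reader tolerates extra spaces and blank lines.
readToy :: String -> Either String [Cell]
readToy = traverse parseLine . filter (not . all isSpace) . lines
  where
    parseLine l = case break (== ':') (dropWhile isSpace l) of
      (tag, ':' : body)
        | strip tag == "md"   -> Right (Markdown (strip body))
        | strip tag == "code" -> Right (Code (strip body))
      _ -> Left ("cannot parse line: " ++ l)
    strip = dropWhileEnd isSpace . dropWhile isSpace

-- The writer always emits one normalized spelling.
writeToy :: [Cell] -> String
writeToy = unlines . map render
  where
    render (Markdown text) = "md: " ++ text
    render (Code src)      = "code: " ++ src

-- Re-reading the rewritten text must give back the same notebook data.
preservesData :: String -> Bool
preservesData src = case readToy src of
  Left _   -> False
  Right nb -> readToy (writeToy nb) == Right nb
```

Here "  md :  Title" is rewritten as "md: Title": the bytes differ, but both spellings read back as the same Markdown cell.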

Comparing to `pandoc`

Similarities

Like pandoc, there are no configuration files for cellout - all options are specified at the command line.

Differences

When invoked without an input file, pandoc, like the venerable UNIX cat command, will await input from stdin. While this is a completely legitimate default behavior, at this time I think it is best to just print the usage information, which is what cellout does.
Similarly, unlike pandoc, which writes to stdout when no output file is specified, cellout writes its result to a file. If the output format does not differ from the input format, the change is made to the file in-place.

Related projects

nbconvert - a Python implementation for converting .ipynb to other formats.

Primarily, the nbconvert tool allows you to convert a Jupyter .ipynb notebook document file into another static format including HTML, LaTeX, PDF, Markdown, reStructuredText, and more. nbconvert can also add productivity to your workflow when used to execute notebooks programmatically.

pandoc - a universal document converter; pandoc can read and write many different formats, including rst, HTML, LaTeX, docx, and EPUB, and can also produce PDF.

ipyaml - IPython Notebooks as YAML.

jupytext - provides text-editor friendly format conversion to/from notebooks.

ipymd - markdown format for notebooks (excludes outputs).

nbconvert-vc - the experimental nbconvert plugin (to a YAML format) mentioned by Mike Droettboom in his Jupyter Notebooks and Version Control post.

nbdime - diffing and merging in Jupyter Notebooks.

sphinx-gallery - defines a Python script format that it converts to notebooks and to reStructuredText (.rst), with execution results included.

ivanov / cellout