cellout

Rich notebook conversion

Cellout is both a command-line utility and a Haskell library for operating on Jupyter notebooks. You can think of cellout as a translation layer between different kinds of notebook formats. An additional goal of cellout is to provide nbconvert functionality without a Python runtime dependency.

Cellout was written to provide a solid foundation for experimenting with notebook file formats. Integrating well with pandoc by providing readers has been one of the motivating factors.

The Haskell library contains a well-typed Jupyter notebook document representation, as well as facilities for operating on notebooks (such as filtering output, removing metadata, and concatenating multiple notebooks).
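To give a feel for what a typed notebook representation makes possible, here is a minimal sketch. The types and function names (Output, Cell, Notebook, filterOutputs, clearOutputs, concatNotebooks) are hypothetical and much simpler than whatever cellout actually defines; metadata is left out entirely.

```haskell
module Main where

-- Hypothetical, simplified notebook model; cellout's real types may differ.
data Output = Output
  { outputMime :: String   -- e.g. "text/plain" or "text/html"
  , outputData :: String
  } deriving (Show, Eq)

data Cell
  = MarkdownCell String
  | CodeCell String [Output]   -- source text and the outputs it produced
  deriving (Show, Eq)

newtype Notebook = Notebook { cells :: [Cell] }
  deriving (Show, Eq)

-- Keep only the outputs that satisfy the predicate; markdown cells are untouched.
filterOutputs :: (Output -> Bool) -> Notebook -> Notebook
filterOutputs keep (Notebook cs) = Notebook (map go cs)
  where
    go (CodeCell src outs) = CodeCell src (filter keep outs)
    go cell                = cell

-- Clearing output is the special case where no output is kept.
clearOutputs :: Notebook -> Notebook
clearOutputs = filterOutputs (const False)

-- Concatenate notebooks by appending their cells in order.
concatNotebooks :: [Notebook] -> Notebook
concatNotebooks = Notebook . concatMap cells

demo :: Notebook
demo = Notebook
  [ MarkdownCell "# Title"
  , CodeCell "1 + 1" [Output "text/plain" "2", Output "text/html" "<b>2</b>"]
  ]

main :: IO ()
main = do
  print (clearOutputs demo)
  -- Drop only the HTML-based outputs.
  print (filterOutputs ((/= "text/html") . outputMime) demo)
  print (length (cells (concatNotebooks [demo, demo])))
```

Because cells and outputs are ordinary values, an operation like "drop the HTML outputs" is a one-line predicate rather than a traversal of untyped JSON.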

Note: cellout is under active development. Some of the behaviors outlined in the rest of this document describe how it should work and do not necessarily reflect what is implemented right now.

Examples

$ cellout --clear-output Untitled314.ipynb
$ █

When there are no errors to report, cellout stays quiet, though you can pass the --verbose flag to get informational output.

$ cellout --clear-output --verbose Untitled314.ipynb
Wrote Untitled314.ipynb
$ █

A shortcut for the --verbose flag is -v, and you can repeat it to get higher levels of verbosity (more and more detail).

$ cellout --clear-output -vvv  Untitled314.ipynb
Options {input = "Untitled314.ipynb", clearOutput = False, clearPrompt = False, outputFilename = "", outputType = "notebook", verbosity = 3}
Reading Untitled314.ipynb
Notebook contains 7 cells (3 code and 4 markdown)
Wrote Untitled314.ipynb
$ █
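The way repeated -v flags add up can be sketched as a small pure function over the argument list. This is only an illustration: verbosityLevel is a hypothetical name, the rule that --verbose and -v contributions are summed is an assumption, and cellout's actual option parsing may work differently.

```haskell
module Main where

-- Hypothetical sketch: each --verbose counts once, and a short flag made
-- up only of v's (-v, -vv, -vvv, ...) counts once per v.
verbosityLevel :: [String] -> Int
verbosityLevel = sum . map level
  where
    level "--verbose" = 1
    level ('-' : rest@(_ : _))
      | all (== 'v') rest = length rest
    level _ = 0

main :: IO ()
main = do
  print (verbosityLevel ["--clear-output", "--verbose", "Untitled314.ipynb"])  -- prints 1
  print (verbosityLevel ["--clear-output", "-vvv", "Untitled314.ipynb"])       -- prints 3
```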

Why cellout?

Personal context: I am writing cellout in part to have a mechanism for exploring future notebook formats that is not tied down to the JSON .ipynb format. Since the beginning of nbconvert back in 2012, I've held the opinion that we should integrate with pandoc as much as we can and not have nbconvert do so much of what it does. All that was missing was someone sitting down and learning more Haskell, which is what I've now started doing myself.

Jupyter's JSON-based .ipynb notebook format has been hugely successful because everything is in one file and it can act as a unit of sharing, but it has also been a hindrance in other contexts. For example, the JSON format impedes effective version control, and being a single-file monolith means you can't load large notebooks quickly, since the entire .ipynb file needs to be transferred before the JavaScript can render it. That's not a big deal for short little notebooks, but in practice it has meant that no one writes long notebooks, and it's a bummer that there's effectively a cap on how many plots you can pack into one.

At the last week-long Project Jupyter all-hands meeting back in May 2018, we discussed different requirements, trade-offs, features, limitations and pain points for possible future versions of the notebook document format. So there are several ideas that people have for addressing particular kinds of use cases where .ipynb file limitations end up being a bottleneck.

As a Haskell library, cellout provides a well-defined in-memory representation of a notebook and can be used to convert and translate between these future formats. It will also allow us to integrate better with pandoc, and bring nbconvert functionality to a standalone executable (no Python dependencies).

Visual summary

notebook* (* - or some notebook-compatible file format)
    |
    V
 cellout (filter/transform)
    |
    V
notebook*

In a more complex scenario, you can also use cellout as a reader for pandoc:

notebook*
    |
    V
 cellout (filter/transform)
    |
    V
 pandoc
    |
    V
one of the many output formats pandoc supports

Why is cellout not a pandoc filter?

Pandoc filters operate on the pandoc abstract syntax tree (AST), which I think would be "lossy" for notebooks without some hacks. It's quite reasonable to convert a notebook to a document, but having that document preserve notions of cells, outputs, etc. would make it too much work to write things like "clear the HTML-based outputs" while operating on the pandoc AST.

Formats

  • [p] - partial

  • [x] - fully supported

  • [ ] - planned

  • [p] .ipynb (nbformat 4.2)

  • [ ] ipyaml

  • [ ] sphinx-gallery

  • [ ] comment percent-percent format used by spyder

  • [ ] jupytext's extensions to the percent format

  • [ ] ipymd

  • [ ] folder ("manila folder")

  • [ ] zip-file ("manila envelope")

Comparing to nbconvert

Similarities

  • --clear-output flag

  • --no-prompt flag

  • Invoking cellout and nbconvert without an input file will print the usage information, instead of attempting to read from stdin as pandoc does.

  • Invoking cellout without an output file and without an output format, or with an output format that matches the input file, will make changes in-place. (In nbconvert, this is provided by the --inplace flag.)

Differences

  • Whereas nbconvert defaults to converting to HTML, cellout keeps the same file format. Invoking cellout filename without any other flags will, at most, read in the file and write it back out, possibly with slightly different formatting, but without changing any data.

  • Whereas nbconvert reads jupyter configuration files at startup, there are no configuration files for cellout - all options are specified at the command line.

  • --execute flag is not supported at this time.

Comparing to pandoc

Similarities

  • Like pandoc, there are no configuration files for cellout - all options are specified at the command line.

Differences

  • When invoked without an input file, pandoc, like the venerable UNIX cat command, will await input from stdin. While this is a completely legitimate default behavior, at this time I think it would be best to just print the usage information, which is what cellout does.

  • Similarly, unlike pandoc which writes to stdout when the output file is not specified, the result for cellout is written to a file. If the output format does not differ from the input format, the change is made to the file in-place.
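The input and output defaults described in these comparisons can be summarized as a small decision function. This is a sketch under stated assumptions, not cellout's implementation: the Format and Action types and the decide function are hypothetical, an explicit output file is assumed to always win, and how a new file is named when only a different output format is given is deliberately left open.

```haskell
module Main where

-- Hypothetical formats, for illustration only.
data Format = Ipynb | Ipyaml
  deriving (Show, Eq)

-- What the tool ends up doing for a given invocation.
data Action
  = PrintUsage               -- no input file: show usage, do not wait on stdin
  | RewriteInPlace FilePath  -- same format, no output file: overwrite the input
  | WriteTo FilePath         -- explicit output file (assumed to take priority)
  | WriteNewFile Format      -- different format, no output file; naming unspecified
  deriving (Show, Eq)

-- Arguments: input file with its format, optional output file, optional output format.
decide :: Maybe (FilePath, Format) -> Maybe FilePath -> Maybe Format -> Action
decide Nothing _ _ = PrintUsage
decide (Just _) (Just out) _ = WriteTo out
decide (Just (inp, inFmt)) Nothing outFmt =
  case outFmt of
    Just fmt | fmt /= inFmt -> WriteNewFile fmt
    _                       -> RewriteInPlace inp

main :: IO ()
main = do
  print (decide Nothing Nothing Nothing)
  print (decide (Just ("Untitled314.ipynb", Ipynb)) Nothing Nothing)
  print (decide (Just ("Untitled314.ipynb", Ipynb)) Nothing (Just Ipynb))
  print (decide (Just ("Untitled314.ipynb", Ipynb)) Nothing (Just Ipyaml))
```

Nothing in this sketch ever falls back to stdin or stdout, which is the main behavioral difference from pandoc.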

Related projects

nbconvert - a python implementation of converting .ipynb to other formats.

Primarily, the nbconvert tool allows you to convert a Jupyter .ipynb notebook document file into another static format including HTML, LaTeX, PDF, Markdown, reStructuredText, and more. nbconvert can also add productivity to your workflow when used to execute notebooks programmatically.

pandoc - a universal document converter, pandoc can read and write a bunch of different formats including rst, html, LaTeX, docx, epub, and pdf.

ipyaml - IPython Notebooks as YAML.

jupytext - provides text-editor friendly format conversion to/from notebooks.

ipymd - markdown format for notebooks (excludes outputs).

nbconvert-vc - The experimental nbconvert plugin (to a YAML format) mentioned by Mike Droettboom in his Jupyter Notebooks and Version Control post.

nbdime - diffing and merging in Jupyter Notebooks.

sphinx-gallery - has a format for converting Python scripts to notebooks and to reStructuredText (.rst) with execution results included.
