
sos-notebook's Introduction


SoS Notebook

SoS Notebook is a Jupyter kernel that allows the use of multiple kernels in one Jupyter notebook. Using language modules that understand datatypes of underlying languages (modules sos-bash, sos-r, sos-matlab, etc), SoS Notebook allows data exchange among live kernels of supported languages.

SoS Notebook also extends the Jupyter frontend and adds a console panel for the execution of scratch commands and display of intermediate results and progress information, and a number of shortcuts and magics to facilitate interactive data analysis. All these features have been ported to JupyterLab, either in the sos extension jupyterlab-sos or contributed to JupyterLab as core features.

SoS Notebook also serves as the IDE for the SoS Workflow that allows the development and execution of workflows from Jupyter notebooks. This not only allows easy translation of scripts developed for interactive data analysis to workflows running in containers and remote systems, but also allows the creation of scientific workflows in a format with narratives, sample input and output.

SoS Notebook is part of the SoS suite of tools. Please refer to the SoS Homepage for details about SoS, and this page for documentation and examples on SoS Notebook. If a language that you are using is not yet supported by SoS, please submit a ticket, or consider adding a language module yourself by following the guideline here.

sos-notebook's People

Contributors

andrewcrook, bopeng, carinaup, daniel-mietchen, fortierq, gaow, junma80, mathieuboudreau


sos-notebook's Issues

Setting up environments before loading subkernels.

This is an extension to vatlab/sos#695 and is related to vatlab/sos#688

In a multi-user environment, there can be multiple versions of the same interpreter that are loaded on demand with commands such as module load. Although users can add these commands to shell configuration files such as .bashrc, these files might not be executed when the Jupyter server is started remotely from JupyterHub. Also, using .bashrc limits users to a specific version of the interpreter.

It therefore makes sense to define customized language modules (e.g. R3.3, R3.4) and associate them with certain commands (e.g. module load R-3.3 or PATH="/path/to/R3.3:$PATH") so that the correct interpreter can be located before the kernel is started.

Implementation wise, instead of the default

subkernel name: R
language module: sos_R.kernel.sos_R
interpreter: whatever R is in the path
kernel: ir

we could add in user or site configuration files multiple subkernels like

subkernel name: R-3.3
language module: R (or sos_R.kernel.sos_R)
environ:  path and variables, or commands to execute
kernel: ir
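One way to picture the proposed environ entry is as a small transformation applied to the environment before the kernel process is launched. The following is a minimal sketch only: the subkernel_env helper and the "path" and "variables" keys are hypothetical names chosen for illustration, not part of SoS.

```python
import os

def subkernel_env(config, base_env=None):
    """Return a copy of the environment adjusted for one subkernel.

    `config` is a hypothetical dict with optional keys:
      path      -- list of directories to put in front of PATH
      variables -- dict of extra environment variables
    """
    env = dict(os.environ if base_env is None else base_env)
    extra = list(config.get("path", []))
    if extra:
        current = env.get("PATH", "")
        env["PATH"] = os.pathsep.join(extra + ([current] if current else []))
    env.update(config.get("variables", {}))
    return env

# Hypothetical definition of an "R-3.3" subkernel
r33 = {
    "path": ["/path/to/R3.3"],
    "variables": {"R_LIBS_USER": "/path/to/R3.3/library"},
}
env = subkernel_env(r33, base_env={"PATH": "/usr/bin"})
print(env["PATH"])
```

A dict like this could be passed as the env argument of subprocess.Popen when the kernel is started. Commands such as module load modify a shell session rather than set a single variable, so they would likely need a separate mechanism.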

A stand alone utility for data format exchanges

I'm not sure how difficult it is, but I'd very much like it if it were possible to isolate the SoS data model into some SoS utilities for converting data between different languages for SoS workflow, just like what SoS Notebook does. I'd like to contribute to the coding since I need the feature myself :) I just want to discuss the feasibility and a proper interface here.

sos install kernel or python -m sos_notebook.install?

python -m sos_notebook.install

is the standard command to install a Jupyter kernel (e.g. Bash_kernel), but this command is difficult to type or remember, and it is somewhat awkward to use

pip install sos-notebook
python -m sos_notebook.install

because it is not clear what the second command does (this is exactly why I customized python setup.py install to install kernel #39).

Given that we have some other minor items such as vim syntax highlighter, perhaps even codemirror stuff, perhaps we can change this command to something like

sos install kernel

or

sos install jupyter-kernel

with other things such as

sos install vim-syntax

Of course it is possible to allow both commands.
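The proposed command layout maps naturally onto argparse subcommands. This is a sketch of the idea only, not the actual sos command line; the component names are simply the ones suggested above.

```python
import argparse

def build_parser():
    """Build a toy `sos install <component>` parser (illustration only)."""
    parser = argparse.ArgumentParser(prog="sos")
    commands = parser.add_subparsers(dest="command")
    install = commands.add_parser("install", help="install an optional component")
    install.add_argument(
        "component",
        choices=["kernel", "jupyter-kernel", "vim-syntax"],
        help="what to install",
    )
    return parser

args = build_parser().parse_args(["install", "vim-syntax"])
print(args.command, args.component)
```

Accepting both "kernel" and "jupyter-kernel" as choices is one way to allow both spellings at once.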

Porting SoS Notebook to JupyterLab.

As JupyterLab is turning beta, we should gather some speed to port SoS Notebook to JupyterLab. JupyterLab plans to release its 1.0 version in late 2018, which should be our goal as well.

This ticket will track the status of JupyterLab port and provide key information on technical issues.

SoS Kernel + SoS Workflow engine

This part seems to be usable directly in JupyterLab.

Side panel -> SoS Console

Our single-cell side panel should be replaced by a SoS Console. Contents from SoS Notebook will be sent to and be displayed at this console. The console and notebook will be backed by the SoS kernel managing multiple subkernels.

UI improvements such as language selection boxes

It is unclear how to do this before checking the developer guide. It is likely that it will be provided as an extension.

Magics and shortcuts

These have to be examined one by one, but most likely they will work just fine.

Improve %clear

  1. Magic %clear -s completed --all does not work. It should clear completed jobs from all cells in the notebook.
  2. There are some unneeded and long error messages; we should allow something like %clear --class stderr to clear all elements with specified classes.

Potential focus problem with scratch cell.

On a few occasions when I enter a command in the scratch cell and press Ctrl-Enter, the scratch cell is not executed and it appears that the main notebook creates a new cell and executes it. This looks like a focus problem (shortcut not sent to scratch cell) but I am not sure why it happens and how to fix it. Actually, I cannot reliably reproduce it either.

Paste figures into sos notebook

We have been able to paste Excel tables in md format to SoS Notebook (#2). On GitHub, you can paste figures from the clipboard into the ticket tracker. GitHub basically saves the figure somewhere and inserts md code into the ticket.

I am not sure if there are already jupyter extensions for this but we can potentially incorporate this feature into SoS notebook to make it easier to incorporate pieces of information into the notebook.

bdist_wheel fails.

When installing sos-notebook using pip install sos-notebook, there is an error message

installing to build/bdist.macosx-10.9-x86_64/wheel
running install
Checking .pth file support in build/bdist.macosx-10.9-x86_64/wheel/
/Users/bpeng1/anaconda3/bin/python -E -c pass
TEST FAILED: build/bdist.macosx-10.9-x86_64/wheel/ does NOT support .pth files
error: bad install directory or PYTHONPATH

You are attempting to install a package to a directory that is not
on PYTHONPATH and which Python does not read ".pth" files from.  The
installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

    build/bdist.macosx-10.9-x86_64/wheel/

and your PYTHONPATH environment variable currently contains:

    ''

Here are some of your options for correcting the problem:

* You can choose a different installation directory, i.e., one that is
  on PYTHONPATH or supports .pth files

* You can add the installation directory to the PYTHONPATH environment
  variable.  (It must then also be on PYTHONPATH whenever you run
  Python and want to use the package(s) you are installing.)

* You can set up the installation directory to support ".pth" files by
  using one of the approaches described here:

  https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations


Please make the appropriate changes for your system and try again.

Checking the documentation, it appears that bdist_wheel is the newer standard. The problem with our setup.py is the use of customized install script that installs the sos kernel to jupyter, which is discouraged by wheel. Actually, as this post suggests, and as other kernels do, we should separate the installation and deployment of the sos kernel.

Syntax to separate sos and subkernel magic?

Right now sos handles sos magics and sends the rest of the cell content to subkernels. There is however a slight chance that a subkernel defines the same magic as SoS, so

%get
blah

will cause some problem if %get is supposed to be sent to the subkernel.

I suspect that we can add some syntax to separate sos magics and subkernel magics, something like

%get var --from R
%
%get

Note that we have a similar problem with # and !, so in case the subkernel actually handles # or !, we should perhaps do something like

%
! something for subkernel

The separator could be %, %stop, or %--, or two blank lines...
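To make the proposal concrete, here is a simplified sketch of how a cell could be split with a separator line. The split_cell function is hypothetical and ignores the # and ! cases; it only shows that a lone % line is enough to decide where SoS magics end.

```python
def split_cell(cell, separator="%"):
    """Split cell text into (sos_magics, subkernel_content).

    Leading lines starting with % are treated as SoS magics until a line
    consisting solely of `separator` is found; everything after it is passed
    to the subkernel untouched. Without a separator, leading % lines are
    SoS magics and the rest is subkernel content.
    """
    lines = cell.split("\n")
    magics = []
    for i, line in enumerate(lines):
        if line.strip() == separator:
            return magics, "\n".join(lines[i + 1:])
        if not line.startswith("%"):
            return magics, "\n".join(lines[i:])
        magics.append(line)
    return magics, ""

magics, content = split_cell("%get var --from R\n%\n%get\nblah")
print(magics)
print(content)
```

With the separator, the second %get reaches the subkernel; without it, the same line would be consumed by SoS.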

Notebook saves unused kernels.

Opening a notebook in a machine will modify the meta information of the notebook with kernels of the current machine, leading to unexpected changes of the notebook. We should only save kernels that are actually used in the notebook.
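A possible filter is sketched below. It assumes, for illustration, that the kernel list lives under metadata["sos"]["kernels"] as a list whose entries start with the subkernel name, and that each code cell records its subkernel under metadata["kernel"]; the function name is hypothetical.

```python
def keep_used_kernels(notebook):
    """Drop kernel entries that no code cell in the notebook refers to."""
    used = {
        cell.get("metadata", {}).get("kernel")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    }
    sos_meta = notebook.get("metadata", {}).get("sos", {})
    sos_meta["kernels"] = [
        entry for entry in sos_meta.get("kernels", []) if entry[0] in used
    ]
    return notebook

nb = {
    "metadata": {"sos": {"kernels": [
        ["Bash", "bash", "Bash", ""],
        ["R", "ir", "R", ""],
    ]}},
    "cells": [
        {"cell_type": "code", "metadata": {"kernel": "R"}, "source": "x <- 1"},
        {"cell_type": "markdown", "metadata": {}, "source": "notes"},
    ],
}
keep_used_kernels(nb)
print(nb["metadata"]["sos"]["kernels"])
```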

Multi-processing using scratch cell?

This is another not-sure-what-can-be-done issue. Currently Jupyter notebook, including the scratch cell, uses only one process. All execution requests are queued and executed one by one. However, it would be more flexible if the scratch cell could be executed separately, at least for non-notebook related requests such as execution of shell commands and magics to check task status.

initial task status

Right now if we submit a bunch of jobs that have been running or have failed, they all appear to have no status until the tasks are executed. Whereas for a completely new task the status would be non-exist when I click it, it is useful to let users know that the tasks have failed (although pending re-execution) so that they can click the task to check why it failed. In short, it might be helpful to indicate some sort of "fail while pending" status in contrast to "non-exist while pending".

C-S-Enter triggers when cell does not have focus

Right now when we use the shortcut C-S-Enter, sos will send the text in the current cell to the side panel, even if the notebook does not have focus. The consequence is that when the side panel has focus, C-S-Enter will grab text from the notebook to the side panel, leading to a somewhat strange experience.

Use `feather` for large arrays

Currently we use the feather format to transfer dataframes and matrices. However, if an array of any datatype gets large (e.g. > 1M in length), it would be difficult and inefficient to transfer it as a string in memory. It would then make sense to use disk files to transfer such arrays, and feather can be a good candidate for this task. Of course, for these simple data types we could use a simpler format, but `feather` might have done extra work to improve efficiency and accuracy.

Because it would be wasteful to transfer simple and small arrays in this manner, we would need to do something like

if len(array) > 10000:
    # use disk file
else:
    # use memory

in every case... and it has to be done one by one.
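The size-based switch can be sketched in a self-contained way as follows. Note that this sketch writes raw doubles with the standard library array module as a stand-in for a feather file (which would need pyarrow), and the function names are hypothetical.

```python
import ast
import os
import tempfile
from array import array

THRESHOLD = 10000  # same cutoff as in the pseudocode above

def export_floats(values, threshold=THRESHOLD):
    """Return ('memory', text) for small arrays, ('file', path) for large ones."""
    if len(values) > threshold:
        # Large array: write raw doubles to a temporary file
        # (stand-in for a feather file).
        fd, path = tempfile.mkstemp(suffix=".bin")
        with os.fdopen(fd, "wb") as out:
            array("d", values).tofile(out)
        return "file", path
    # Small array: a string expression is cheap enough.
    return "memory", repr(list(values))

def import_floats(kind, payload):
    """Inverse of export_floats; removes the temporary file after reading."""
    if kind == "file":
        data = array("d")
        with open(payload, "rb") as src:
            data.frombytes(src.read())
        os.remove(payload)
        return list(data)
    return ast.literal_eval(payload)

kind, payload = export_floats([1.5, 2.5, 3.5])
print(kind, import_floats(kind, payload))
```

Each language module would need its own pair of functions like these, which is the "one by one" cost mentioned above.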

Displaying only the last "screen"

Not at all sure if it can be done but one problem with the notebook interface is that the outputs are appended to the output cell and you have to constantly scroll to check the last few lines. Perhaps we can do something like auto-scroll, or show only the last 50 lines of results to give users a familiar "terminal" feeling.
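On the data side, keeping only the last "screen" is straightforward with a bounded buffer; the open question is how the frontend would redraw it. A minimal sketch, with a hypothetical last_screen helper:

```python
from collections import deque

def last_screen(lines, size=50):
    """Keep only the final `size` lines of an output stream."""
    return list(deque(lines, maxlen=size))

output = [f"line {i}" for i in range(1, 201)]
screen = last_screen(output)
print(len(screen), screen[0], screen[-1])
```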

f-string syntax highlight

This in fact has been bothering me for a while now:

[screenshot: 2017-11-27-19-17-45_scrot]

Basically any text after an f-string syntax will be highlighted in red as raw string in the SoS notebook cell (python sub-kernel). Do you observe the same? A pure Python notebook would not have this issue.

%capture --append var

%capture --to var currently captures output from a cell, optionally parses it as json or csv, and saves the results to var. I just had a case where I would like to capture output from multiple cells to a variable so something like

%capture --append var

would be useful.

Here --append var would be equivalent to --to var if var does not exist. Otherwise --append var will append the captured text to var, with the type matching the type of the existing var. That is to say, text to text, DataFrame appended to DataFrame ...
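The intended semantics can be sketched with plain Python types. The capture_append function and the namespace dict are hypothetical; only str, list and dict are covered here, and a DataFrame case would presumably concatenate frames instead.

```python
def capture_append(namespace, name, captured):
    """Mimic the proposed --append: create `name` or extend it in kind."""
    if name not in namespace:
        # Same as --to var when the variable does not exist yet.
        namespace[name] = captured
        return
    existing = namespace[name]
    if type(existing) is not type(captured):
        raise TypeError(
            f"cannot append {type(captured).__name__} "
            f"to {type(existing).__name__}"
        )
    if isinstance(existing, dict):
        namespace[name] = {**existing, **captured}
    else:
        # str + str and list + list both concatenate.
        namespace[name] = existing + captured

ns = {}
capture_append(ns, "var", "output of cell 1\n")
capture_append(ns, "var", "output of cell 2\n")
print(ns["var"])
```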

Stable anchor

Our header anchors are generated automatically and will change with addition of new headers. We will need to use stable anchors to keep our document consistent.

nbviewer for sos notebooks

nbviewer renders Jupyter notebooks and displays them in HTML format. It is a great service but it does not handle SoS notebooks properly (missing kernel indications and syntax highlighting). We should get it running locally and submit a PR to nbviewer.

Jupyter Lab support

This is something we have avoided for a while but since the whole community is moving away from Jupyter to JupyterLab, we have to address it one way or another. Here are my observations:

Pros of JupyterLab:

  1. Tabs are nice, both notebook tabs and side panel tabs.
  2. A terminal is really nice.
  3. Much better GUI compared to Jupyter.
  4. Has something like image viewer and inspector, although not sure how it works.

Cons of JupyterLab: (Correct me if I am wrong)

  1. Does not have the scratchpad style side panel
  2. Does not support multiple languages in one notebook.
  3. Does not support inline expression (our %render magic)
  4. Does not support line-by-line execution

Other points:

  1. Jupyter is still being maintained and will likely exist for a long time, so SoS will keep up at least with Jupyter development.
  2. JupyterLab is new development of Jupyter with better GUI but does not address any of the problems that SoS is trying to address (multi-kernel notebook).

It might be relatively easy to have a stripped down version of SoS notebook to work with JupyterLab, that is essentially a Python3.6 kernel that supports SoS workflow engine. Even then, it is awkward to be able to open SoS notebooks in JupyterLab but not able to work with it.

An option to clean notebook

It is probably a good idea to have a way to clean up all output of a notebook so that it can be rerun or be archived without results. From command line, it would be something like

sos convert notebook.ipynb code.sos --all
sos convert code.sos notebook_cleaned.ipynb

From the notebook, it would be good to have a magic to remove all such outputs.
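For reference, cleaning a notebook amounts to emptying the outputs of every code cell in the notebook JSON. The sketch below works on an already loaded notebook dict in the version 4 notebook format; the clear_outputs name is hypothetical and this is not how sos convert is implemented.

```python
import json

def clear_outputs(notebook):
    """Remove outputs and execution counts from all code cells in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook

nb = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(1)",
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
        },
        {"cell_type": "markdown", "metadata": {}, "source": "Some notes"},
    ],
}
print(json.dumps(clear_outputs(nb)["cells"][0], indent=1))
```

For a file on disk, the dict would come from json.load and be written back with json.dump.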

Error message when running sos-notebook docker image

A reviewer reported error

$ docker run -d -p 8888:8888 -v $HOME:/Users/xyz/data-analysis/  mdabioinfo/sos-notebook
5394a73ddfa1d349b6fd7455e06ca97ed774f08450c18f7807de98a3418a9319
docker: Error response from daemon: driver failed programming external connectivity on endpoint frosty_easley (f4dc9b57a4b0080d4b64529a572c491c5db866ec845c748291da6e85f625fbd6): Bind for 0.0.0.0:8888 failed: port is already allocated.

It is true that the port is taken, so he should map another host port, e.g. `-p 8889:8888`?

Shortcut for copy tables into Jupyter/SoS

I often find a need to copy various information in tabular format to SoS as description of data. Right now, the easiest method to do so is to use Pandas' read_clipboard function

import pandas as pd
pd.read_clipboard()

and get a table as output. The problems are

  1. The table is in a separate cell as output
  2. The table can not be modified. In case that there are some problems with the pasted table (e.g. empty cells got shifted), there is no way to modify it.

I am wondering if we can implement a shortcut or magic to

  1. use read_clipboard to read the table in.
  2. use

from tabulate import tabulate
print(tabulate(df, headers='keys', tablefmt='pipe'))

     to get markdown
  3. Insert the markdown to current markdown cell.

Note that we currently do not have magic for markdown cells because markdown cells are handled by Jupyter frontend so the SoS kernel cannot touch it. So if we are going to implement this, we will have to implement it in frontend, and the frontend has to communicate with backend in private (not as cell execution) to get the markdown representation of the table.
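The conversion step the backend would perform can be illustrated without pandas or tabulate. The sketch below assumes the clipboard text is tab-separated, as is typical when copying from a spreadsheet; tsv_to_markdown is a hypothetical helper.

```python
def tsv_to_markdown(text):
    """Convert tab-separated clipboard text to a markdown pipe table."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so that every row has the same number of cells.
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(row):
        return "| " + " | ".join(cell.strip() for cell in row) + " |"

    separator = "| " + " | ".join("---" for _ in range(width)) + " |"
    return "\n".join([fmt(rows[0]), separator] + [fmt(row) for row in rows[1:]])

print(tsv_to_markdown("sample\tsize\nA\t10\nB\t12"))
```

Because the result is plain markdown text, a pasted table with shifted cells can be corrected by hand after insertion.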

A capture magic?

Motivated by vatlab/sos#869

We have a render magic that renders output from subkernels as markdown, html etc. We could provide a capture magic to capture output from subkernels. Something like,

  1. %capture --to var will capture the output (stdout) of subkernel to a variable var as string.
  2. %capture --to var --as json would use json.loads() to parse the output and return a dictionary. %capture --to var --as csv would use pandas.read_csv() to parse the output as pandas.DataFrame.
  3. %capture --to var --as json --from outfile.txt would get the input from a file instead of standard output.
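The parsing step of such a magic could look roughly like the sketch below. The parse_captured function is hypothetical, and the csv branch uses the standard library csv module and returns a list of dicts as a stand-in for a pandas.DataFrame.

```python
import csv
import io
import json

def parse_captured(text, as_type=None):
    """Parse captured stdout according to the proposed --as option."""
    if as_type is None:
        return text  # plain string, as with %capture --to var
    if as_type == "json":
        return json.loads(text)
    if as_type == "csv":
        # Stand-in for a DataFrame: one dict per row, keyed by the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    raise ValueError(f"unsupported type: {as_type}")

print(parse_captured('{"status": "ok", "n": 3}', "json"))
print(parse_captured("gene,count\nTP53,12\nBRCA1,7\n", "csv"))
```

Reading from a file instead of stdout (--from) would only change where text comes from, not how it is parsed.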

However, because capture and render are similar in functionality, we could add these features to the render magic. That is to say, we could add the following features to the render magic

  1. --from outfile.txt get input from a file instead of stdout, assuming the file would be generated by the subkernel.
  2. %render magic already supports JSON but we could add %render csv
  3. --to var %render displays output as cell output but we could save the results to a file.

No `__max_procs__` attribute

Is SoS Notebook broken? Here is how the error is reproduced:

%run -v4
[1]
data = open('/tmp/1.txt').readlines()

I get:

DEBUG: Workflow default created with 1 sections: default_1
  File "/opt/miniconda3/lib/python3.6/site-packages/sos_notebook-0.9.11.1-py3.6.egg/sos_notebook/workflow_executor.py", line 365, in runfile
    'max_procs': args.__max_procs__,
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.6/site-packages/sos_notebook-0.9.11.1-py3.6.egg/sos_notebook/workflow_executor.py", line 365, in runfile
    'max_procs': args.__max_procs__,
AttributeError: 'Namespace' object has no attribute '__max_procs__'
'Namespace' object has no attribute '__max_procs__'

A "download" button in SoS generated HTML report.

It would be helpful to add a download button (perhaps hidden in the control panel) to download the notebook version of the report from the HTML report. This is usually not needed but can be very useful for online tutorials where an HTML version is displayed with a notebook version to be downloaded and executed.

Notebook services like try Jupyter and Azure Notebooks can go further and allow the execution of notebooks, but we do not have the resources for it and in most cases notebooks are not executable without needed tools and source data.

Multi-column preview of multiple images in the main notebook

I'm wondering if it is possible to make preview in main notebook configurable. Currently it is taking the entire width of the cell which does not look very nice. Even better would be to allow for having a multi-column view when previewing multiple figures, if I'm not asking too much here :)

sos lock/unlock

There can be value in "locking" a notebook so that it cannot be modified or copied. A locked notebook can be viewed with a SoS kernel but will be shown as garbage with other kernels. We might even prevent the copying of notebooks by disabling keyboard copy/paste.

IPython notebook has some security feature, not sure if Jupyter notebook inherits it.

Use Apache Arrow to transfer float numbers.

We sometimes use strings to transfer float numbers between kernels. For example,

a = 1.3454443459

in Python can be transferred to another kernel as string but the precision of the string representation of float numbers is not guaranteed. That is to say, if not executed carefully, we might pass

a = 1.345444

or

a = 1.345444334589999999

to another kernel, which might or might not matter depending on the application. For this reason, we should try to pass the binary representation of float numbers between kernels whenever possible. In particular, we should use Apache Arrow for languages that support the Arrow on-disk file format. That is to say, we can use pyarrow to save arrays and load them from another kernel.

Note that we have already done this for dataframe because the feather format uses Apache Arrow, but we will need to expand the usage to support more data types.
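The precision concern is easy to demonstrate in Python. The sketch below uses the standard library struct module purely to illustrate text versus binary transfer; it is not Arrow and says nothing about how other kernels format floats.

```python
import struct

a = 1.3454443459

as_fixed = "%f" % a            # fixed six decimals: '1.345444'
as_repr = repr(a)              # shortest string that round-trips in Python 3
packed = struct.pack("<d", a)  # 8-byte IEEE 754 double, little endian
restored = struct.unpack("<d", packed)[0]

print(as_fixed, float(as_fixed) == a)
print(as_repr, float(as_repr) == a)
print(len(packed), restored == a)
```

The fixed-width string loses digits, while the binary form restores the exact value without depending on how each language prints or parses floats.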

Support for scala

Scala should in my opinion be the next language to support (after Ruby) because it has Java roots, which allows us to work with other Java-flavored languages/kernels such as Kotlin later. It is also one of the front-end languages to Spark, so supporting Scala will lead to support for a number of kernels such as iScala and iSpark.

Fix side panel output resize

The side panel has an output area with a prompt. The prompt is removed after the outputs are generated, so the output is first cramped to the right and then shifts to the left.

It is better to use css to set the width of side panel prompt to zero instead of removing it each time.

sos_comm closed after long waiting time.

After a notebook has been running for a long time (e.g. waiting for long tasks to complete), the frontend sos_comm can be disconnected for some reason, and the notebook becomes non-operable (e.g. %task would not get a response). There should be some way for the frontend to detect a lost sos_comm and re-create one.

Magic %expand

With the shift to { }, it is more troublesome to have scripts in subkernels automatically interpolated. It makes sense to introduce a magic to do this.
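What such a magic would do to the cell content can be sketched as below. The expand function is hypothetical, handles only simple { } expressions without nesting, and uses eval, so it is an illustration rather than a safe or complete implementation.

```python
import re

def expand(script, namespace):
    """Replace each {expression} in `script` with its value in `namespace`."""
    def replace(match):
        # Evaluate the expression with the given variables only.
        return str(eval(match.group(1), {}, dict(namespace)))
    return re.sub(r"\{([^{}]+)\}", replace, script)

variables = {"filename": "data.txt", "n": 5}
print(expand("head -n {n * 2} {filename}", variables))
```

Since braces are ordinary syntax in many languages (R, JavaScript, shell), making the expansion opt-in through a magic avoids rewriting code that merely happens to contain { }.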

Separate colors for languages sharing the same language module

Currently the prompt color is defined for each language module, so languages sharing the same language module (e.g. matlab and octave, python2 and python3, javascript and typescript, ...) are marked by the same color in the notebook. Perhaps we should define colors at the language level, not at the module level.

Improve "kernel does not support %get" message.

This happens when we have a "bare" kernel, namely a kernel without a working language module. The source of the problem can be either

  1. Users are using an unsupported language or kernel
  2. Users do not have a working language module, which, especially in case of matlab and SAS, can be tricky to set up.

Since users are unlikely to complain about 1, we should improve the error message, which currently sounds like a problem with SoS.
