drivendataorg / nbautoexport
Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save.
Home Page: https://nbautoexport.drivendata.org/
License: MIT License
Set up TravisCI to run tests and publish new packages to PyPI. And get that sweet sweet badge. Check out Deon's .travis.yml for how to do it.
Currently if I rename a notebook, the exported file with old name is not deleted.
Currently, export_format and organize_by are not checked when a user runs nbautoexport. This could lead to hard-to-trace bugs down the line. Better to fail on nbautoexport installation if the inputs are not among a set of supported values.
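That validation could look roughly like this sketch; the supported value sets below are assumptions based on formats mentioned elsewhere in these issues, and validate_config is a hypothetical helper, not the library's actual code:

```python
# Sketch: validate inputs against known-good values before writing anything.
# The allowed sets below are illustrative assumptions, not the real lists.
SUPPORTED_EXPORT_FORMATS = {"script", "html", "markdown", "pdf"}
SUPPORTED_ORGANIZE_BY = {"extension", "notebook"}

def validate_config(export_formats, organize_by):
    """Raise ValueError on unsupported values instead of failing later."""
    bad_formats = set(export_formats) - SUPPORTED_EXPORT_FORMATS
    if bad_formats:
        raise ValueError(f"Unsupported export format(s): {sorted(bad_formats)}")
    if organize_by not in SUPPORTED_ORGANIZE_BY:
        raise ValueError(f"Unsupported organize_by value: {organize_by!r}")
```

Failing fast here turns a hard-to-trace bug into an immediate, readable error.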
Workflow failed: tests #244
But how do we generate a save?
Just get the simplest automatic docs going. Try using mkdocs. See how Deon does it:
Recently, typer released 0.4.0, which also upped the version ceiling on click to include v8. There are some changes to either typer or click that cause some of our tests to fail.
https://github.com/drivendataorg/nbautoexport/actions/runs/1178182529
nbconvert requires a LaTeX engine to convert to pdf. We need to figure out the best way to install this in our CI pipeline so we can test the pdf export format in our unit tests.
Some strategies and notes about them:
- texlive-core from conda-forge: nbconvert uses xelatex, but this installation only has pdflatex.
  - Symlink xelatex to pdflatex
  - Change the nbconvert configuration to use pdflatex instead of xelatex
- tectonic, a different TeX distribution that's available cross-platform from conda-forge:
  - Symlink xelatex to tectonic: does not work. nbconvert passes a --quiet flag that tectonic does not support.
  - Change the nbconvert configuration to use tectonic instead of xelatex (and with appropriate CLI flags)
- tinytex: has install scripts for different OSes. Will need different conditional pipeline steps, which adds complexity to the pipeline, but the scripts hopefully work out of the box.
Currently, if a .nb_autoexport sentinel file already exists and you run autoexport again, it silently refuses to overwrite. Add a logging.warning call that will inform the user that no changes were made.
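A rough sketch of that warning path (write_sentinel and the exact message are hypothetical, not the real implementation):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("nbautoexport")

def write_sentinel(directory, config):
    """Write the sentinel file; warn and make no changes if one already exists.

    Illustrative helper showing the proposed logging.warning call.
    """
    sentinel = Path(directory) / ".nbautoexport"
    if sentinel.exists():
        logger.warning(
            "Detected existing configuration at %s. No changes were made.", sentinel
        )
        return False
    sentinel.write_text(json.dumps(config, indent=2))
    return True
```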
There are three names being used for the .nbautoexport config file: SAVE_PROGRESS_INDICATOR_FILE. We should use one name to make the code easier to read; config seems like the best choice.
Separated this from issue #34, because this will be a noisy change for the git diff.
Workflow failed: tests #219
Setting clean=True in the configuration is dangerous, because it will delete things without user input and can lead to data loss.
It might be a good idea to add a prompt to the configure command where, if a user sets clean=True, they need to read a warning and confirm that they want to configure things that way.
For now, we should just remove clean from the configuration and from the post-save hook. This isn't mature enough to let users run automatically.
Workflow failed: tests #199
If the user has an existing post_save hook installed, it will be overwritten (maybe?). We should better understand how these interact and document it.
Currently, we use subprocess.check_output to run a shell command that gets the Jupyter config directory.
nbautoexport/nbautoexport/nbautoexport.py
Lines 88 to 91 in cb4ccdf
I found the function that the Jupyter CLI calls when you use that flag:
Works like a charm:
>>> from jupyter_core.paths import jupyter_config_dir
>>> jupyter_config_dir()
'/Users/Jay/.jupyter'
Workflow failed: tests #202
Workflow failed: tests #245
Workflow failed: tests #208
Currently there's not a way to remove the nbautoexport block from your Jupyter configuration. We should add this.
I have a jupyter notebook and an image together in the same folder. I then inserted the image into the notebook using markdown, i.e. ![png](image.png). This displays fine within the notebook.
When running nbautoexport to export the notebook to markdown the result is placed in a separate folder called markdown. However the path to the image stays the same which causes it to not be displayed. Is there a way to update the path such that it points to the original file?
There is a workaround which involves displaying the image using code. The image then does get copied over to the markdown folder.
What is your preferred use? Should we not use markdown to insert images?
Could be useful to have a command to generate all exports. This would be useful when creating or changing the sentinel file in a directory where notebooks already exist.
It would be helpful to have explicit logging for nbautoexport on startup and runtime to help users confirm what is and is not working in their setup. This would also help add details for #71.
From @cccntu, originally mentioned here
I also tried the export command, but found it always exports every file. Maybe we can check the modified time and only export the necessary files (like in Make)? Then users can add this command in a Makefile without regenerating all the files each time. And add a flag to force regenerating everything.
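The mtime check could work roughly the way Make decides whether a target is stale (needs_export is a hypothetical helper, not the library's actual code):

```python
from pathlib import Path

def needs_export(notebook, export):
    """Return True if the export is missing or older than the notebook,
    mirroring how Make decides whether a target is out of date."""
    notebook, export = Path(notebook), Path(export)
    if not export.exists():
        return True
    return notebook.stat().st_mtime > export.stat().st_mtime
```

A --force flag would then simply skip this check and export everything.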
I newly installed and configured nbautoexport. New/modified jupyter notebooks in the configured directory are not getting exported as scripts on save when using Jupyter Lab.
I opened the same notebooks using Jupyter Notebook and they are exported as scripts on save. Interestingly, when I then go back to using Jupyter Lab, things are working as they should (new/modified notebooks are exported on save).
As of now, nbautoexport does not work with jupyterlab when the version of nbclassic is 4.0 or greater. To get both jupyterlab and jupyter notebook to work, run pip install nbclassic==0.3.7 (earlier 0.3 versions also work).
Write out to ~/.jupyter/jupyter_server_config.py in addition to ~/.jupyter/jupyter_notebook_config.py. After running cp ~/.jupyter/jupyter_notebook_config.py ~/.jupyter/jupyter_server_config.py, both jupyter lab and jupyter notebook export scripts correctly, regardless of the version of nbclassic.
There's some cleanup we should probably do when ready to submit to PyPI. Off the top of my head:
In the clean doc (https://nbautoexport.drivendata.org/cleaning/), I tried to add a folder to exclude, and it overwrote my old non-default export_formats config. I think it would be more natural to not have everything overwritten.
❯ cat notebooks/.nbautoexport
{
"export_formats": [
"script",
"html"
],
"organize_by": "extension"
}
❯ nbautoexport configure notebooks/ \
--clean-exclude 'report/*'
Detected existing autoexport configuration at notebooks/.nbautoexport. If you wish to overwrite, use the --overwrite flag.
❯ nbautoexport configure notebooks/ --overwrite \
--clean-exclude 'report/*'
❯ cat notebooks/.nbautoexport
{
"export_formats": [
"script"
],
"organize_by": "extension",
"clean": {
"exclude": [
"report/*"
]
}
}
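One direction for making configure merge rather than clobber: read the existing sentinel and only replace keys the user explicitly passed. A sketch (update_config is a hypothetical helper; the real CLI plumbing would differ):

```python
import json
from pathlib import Path

def update_config(sentinel_path, **overrides):
    """Merge CLI-provided options into an existing .nbautoexport config.

    Only keys explicitly passed (not None) replace existing values, so
    adding --clean-exclude would no longer reset export_formats to the default.
    """
    path = Path(sentinel_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    for key, value in overrides.items():
        if value is not None:
            config[key] = value
    path.write_text(json.dumps(config, indent=2))
    return config
```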
Workflow failed: tests #220
It may be useful to have a page in our docs with common troubleshooting solutions:
- Run nbautoexport install and restart Jupyter.
nbconvert has built-in templates that allow alternative versions of their formats.
https://nbconvert.readthedocs.io/en/latest/usage.html#supported-output-formats
It may be useful to support specifying them.
When running a source install from PyPI, the installation will error with FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'. This is because we forgot to include requirements.txt in MANIFEST.in.
This is not a problem for binary installs (using wheels).
This blocks #51.
pip install nbautoexport --no-binary :all:
Collecting nbautoexport
Downloading nbautoexport-0.1.0.tar.gz (54 kB)
|████████████████████████████████| 54 kB 4.0 MB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /Users/jqi/miniconda3/envs/temp/bin/python3.7 /Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/tmpi69keqqq
cwd: /private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-install-yloh0brm/nbautoexport
Complete output (24 lines):
Traceback (most recent call last):
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 147, in get_requires_for_build_wheel
config_settings, requirements=['wheel'])
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 127, in _get_build_requires
self.run_setup()
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 249, in run_setup
self).run_setup(setup_script=setup_script)
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 142, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 27, in <module>
requirements = load_requirements(Path(__file__).parent / "requirements.txt")
File "setup.py", line 13, in load_requirements
with path.open("r") as fp:
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/pathlib.py", line 1203, in open
opener=self._opener)
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/pathlib.py", line 1058, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/jqi/miniconda3/envs/temp/bin/python3.7 /Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/tmpi69keqqq Check the logs for full command output.
On a fresh machine, running nbautoexport will fail with:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.jupyter/jupyter_notebook_config.py'
We should catch this case and issue a nicer error message with a suggestion to run jupyter notebook --generate-config.
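The catch could look something like this sketch (read_jupyter_config and the exact message are hypothetical):

```python
from pathlib import Path

def read_jupyter_config(config_path):
    """Read the Jupyter notebook config, giving an actionable error if missing."""
    path = Path(config_path)
    try:
        return path.read_text()
    except FileNotFoundError:
        raise SystemExit(
            f"No Jupyter config found at {path}. "
            "Run `jupyter notebook --generate-config` first, "
            "then rerun `nbautoexport install`."
        )
```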
nbautoexport install does not log which .py file it edited, and users may be curious.
The .nb_autoexport sentinel file is saved to the tracked directory and looks something like this:
{
"export_format": ["script"],
"organize_by": "notebook",
"directory": "notebooks"
}
It is redundant to have "directory" as part of the config file, since the location of the sentinel file conveys that info. We can remove that key.
It would be nice if docstrings were included in the mkdocs documentation.
https://pawamoy.github.io/mkdocstrings/ looks like a nice way to achieve this.
Sometimes a user may have other files in their notebooks directory that are intentional. There isn't a good way for clean to anticipate arbitrary files, so we'll need a way for users to specify files to exclude/ignore.
Here is a potential interface that lets users specify globs.
Passing globs in ad hoc to a clean command:
nbautoexport clean notebooks/ --exclude images/* --exclude README.md
Configuring in a way to be reusable:
nbautoexport configure notebooks/ -f script -b extension --clean-exclude images/* --clean-exclude README.md
{
"export_formats": ["script"],
"organize_by": "extension",
"clean": {
"exclude": [
"images/*",
"README.md"
]
}
}
Then for a file tree that looks like:
notebooks
├── 0.1-ejm-data-exploration.ipynb
├── script
│   ├── 0.1-ejm-data-exploration.py
│   └── 0.2-ejm-features-creation.py
├── html
│   └── 0.1-ejm-data-exploration.html
├── README.md
└── images
    └── plot.jpg
notebooks/script/0.2-ejm-features-creation.py and notebooks/html/0.1-ejm-data-exploration.html will be marked for deletion; notebooks/README.md and notebooks/images/plot.jpg will be safe.
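Under the hood, the exclude check could be a simple glob match against the configured patterns (is_excluded is a hypothetical helper; the real implementation might use Path.glob instead):

```python
from fnmatch import fnmatch

def is_excluded(relative_path, exclude_patterns):
    """Return True if a path (relative to the notebooks directory) matches
    any configured exclude glob and should be protected from clean."""
    return any(fnmatch(str(relative_path), pattern) for pattern in exclude_patterns)
```

With exclude set to ["images/*", "README.md"], clean would skip those paths and delete only unmatched stale exports.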
The package generates a .nbautoexport file in the intended folder, but no 'script' folder or scripts are generated.
I installed the package following the installation guide, using the Anaconda 3 Prompt and the conda install nbautoexport command. Then I ran nbautoexport install and specified the folder directory via nbautoexport configure "path".
The .nbautoexport file got generated and looks like this:
{
"export_formats": [
"script"
],
"organize_by": "extension",
"clean": {
"exclude": []
}
}
After running Jupyter Notebook (Versions: jupyter core: 4.7.1, jupyter-notebook : 6.3.0) and saving some of my .ipynb files, no 'script' folder was generated.
Since I am by no means an expert, let me know if I made an obvious mistake.
Some of the export formats depend on pandoc or LaTeX, which are not trivial dependencies.
One way we can control whether to skip tests is to use custom command-line options to pytest: https://docs.pytest.org/en/latest/example/simple.html#pass-different-values-to-a-test-function-depending-on-command-line-options
This would allow someone without pandoc or LaTeX to run all of the other tests, and we could also skip tests on certain OS configurations in CI.
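A conftest.py sketch of that pattern, with a hypothetical --run-latex flag and latex marker (names are illustrative):

```python
# conftest.py sketch: skip LaTeX-dependent tests unless --run-latex is passed.
import pytest

def pytest_addoption(parser):
    # Register a custom command-line option with pytest.
    parser.addoption(
        "--run-latex", action="store_true", default=False,
        help="run tests that require a LaTeX installation",
    )

def pytest_collection_modifyitems(config, items):
    # Without --run-latex, mark every test tagged @pytest.mark.latex as skipped.
    if config.getoption("--run-latex"):
        return
    skip_latex = pytest.mark.skip(reason="needs --run-latex option to run")
    for item in items:
        if "latex" in item.keywords:
            item.add_marker(skip_latex)
```

The same mechanism could gate pandoc-dependent tests behind a second flag.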
Thanks to the conda-forge autotick bot, it looks like our dependencies may need some cleaning up: conda-forge/nbautoexport-feedstock#1
Some of these dependencies are implicit through ones we do have, but it's probably better to list them explicitly.
Not sure what jupyter_contrib_nbextensions is for. It looks like it got inherited from the original implementation, according to the git blame. @r-b-g-b, do you know what this might be?
Packages found by inspection but not in the meta.yaml:
- traitlets
- notebook
- jupyter_core
- nbformat
Packages found in the meta.yaml but not found by inspection:
- jupyter_contrib_nbextensions
nbconvert v6.0 introduced some kind of file encoding change that breaks exporting in nbautoexport.
Here is a stack trace from our scheduled test build.
nbautoexport\export.py:110: in export_notebook
converter.convert_notebooks()
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:524: in convert_notebooks
self.convert_single_notebook(notebook_filename)
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:491: in convert_single_notebook
self.postprocess_single_notebook(write_results)
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:463: in postprocess_single_notebook
self.postprocessor(write_results)
c:\miniconda\envs\test\lib\site-packages\nbconvert\postprocessors\base.py:28: in __call__
self.postprocess(input)
nbautoexport\export.py:41: in postprocess
text = f.read()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <encodings.cp1252.IncrementalDecoder object at 0x00000186A042E190>
input = b'<!DOCTYPE html>\r\n<html>\r\n<head><meta charset="utf-8" />\r\n<meta name="viewport" content="width=device-width, in...>\r\n\r\n </div>\r\n</div>\r\n</div>\r\n</div>\r\n\r\n</div>\r\n</body>\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n</html>\r\n'
final = True
def decode(self, input, final=False):
> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
E UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 226813: character maps to <undefined>
c:\miniconda\envs\test\lib\encodings\cp1252.py:23: UnicodeDecodeError
---------------------------- Captured stderr call -----------------------------
[NbConvertApp] Converting notebook C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_notebook_exports_generato2\the_notebook_0.ipynb to html
[NbConvertApp] Writing 588470 bytes to C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_notebook_exports_generato2\the_notebook_0.html
https://github.com/drivendataorg/nbautoexport/runs/1131815930?check_suite_focus=true#step:6:137
It looks like the converted files are now using a Windows encoding, but when we read back in to remove cell numbers, we're expecting unicode?
When fixing this, we may want to consider doing it in a way that ensures backwards compatibility with nbconvert v5.6.
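One likely fix is to pass an explicit encoding when reading the exported file back in, since nbconvert writes UTF-8 while Windows defaults to cp1252. A sketch (strip_cell_numbers is an illustrative stand-in for the read in our postprocess, not the real function):

```python
def strip_cell_numbers(path):
    """Read an exported file back in with an explicit encoding, so that
    Windows' default cp1252 codec never sees the UTF-8 bytes nbconvert wrote.
    (Illustrative helper; the real postprocess logic does more than read.)"""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Passing encoding="utf-8" on both the read and the subsequent write should behave identically under nbconvert v5.6 and v6.0.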
Currently the type hint for export_format is string, but it's actually expecting a list of strings. You can use the typing module to import the special annotation objects for containers.
https://docs.python.org/3/library/typing.html
from typing import List
export_format: List[str]
Though maybe you want to use the abstract base class Sequence instead of List so it could also accept tuples, etc.
install: Installs the post-save hook in the jupyter configuration.
configure: Creates the .nbautoexport configuration file in the specified directory. Could warn if install has not been run.
Other potential related fixes:
May be useful to have an nbautoexport clean command to remove exports that don't match the existing configuration and notebooks. This would clean up exports after changing the configuration or renaming notebooks.
nbconvert v6.0 introduced a new webpdf format that generates a PDF via HTML rather than via LaTeX.
https://nbconvert.readthedocs.io/en/latest/usage.html#convert-webpdf
This may need some work on dependencies for CI, since it requires a headless Chromium browser. The documentation suggests that simply installing pyppeteer might be enough.
If we change the command, you could end up with multiple versions in your config file.
https://github.com/drivendataorg/nbautoexport/blob/master/nbautoexport/nbautoexport.py#L97-L103
Let's see if there's a good way to store this block as Python code so it gets highlighting in source control and could be easier to test directly if we need to eventually. I think we could either define the function and use inspect.getsource to get this code after the function is imported, or maybe just have it in a .py file that we read instead of importing.
I personally like the idea of doing it in a function. Then we can pass in c as an argument, which would satisfy linting, and also make it possible to do some basic compatibility testing against Jupyter in the unit tests.
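The inspect.getsource idea might look like this sketch (the hook body and helper name are illustrative, not the real nbautoexport code):

```python
import inspect

def post_save(model, os_path, contents_manager):
    """Hypothetical post-save hook body, kept as real Python so it gets
    syntax highlighting in source control and can be imported in tests."""
    if model["type"] != "notebook":
        return
    # ... run the export for os_path here ...

def generate_config_block():
    """Return the hook's source text, ready to append to the Jupyter config."""
    return inspect.getsource(post_save)
```

Because post_save is a real importable function, unit tests can call it directly with fake model dicts instead of parsing a string of config code.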