drivendataorg / nbautoexport
Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save.
Home Page: https://nbautoexport.drivendata.org/
License: MIT License
Set up TravisCI to run tests and publish new packages to PyPI. And get that sweet sweet badge. Check out Deon's .travis.yml for how to do it.
Currently if I rename a notebook, the exported file with old name is not deleted.
Currently, export_format and organize_by are not checked when a user runs nbautoexport. This could lead to hard-to-trace bugs down the line. Better to fail on nbautoexport installation if the inputs are not among a set of supported values.
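That validation could look roughly like this sketch; the supported value sets below are assumptions based on formats mentioned elsewhere in these issues, and validate_config is a hypothetical helper, not the library's actual code:

```python
# Sketch: validate inputs against known-good values before writing anything.
# The allowed sets below are illustrative assumptions, not the real lists.
SUPPORTED_EXPORT_FORMATS = {"script", "html", "markdown", "pdf"}
SUPPORTED_ORGANIZE_BY = {"extension", "notebook"}

def validate_config(export_formats, organize_by):
    """Raise ValueError on unsupported values instead of failing later."""
    bad_formats = set(export_formats) - SUPPORTED_EXPORT_FORMATS
    if bad_formats:
        raise ValueError(f"Unsupported export format(s): {sorted(bad_formats)}")
    if organize_by not in SUPPORTED_ORGANIZE_BY:
        raise ValueError(f"Unsupported organize_by value: {organize_by!r}")
```

Failing fast here turns a hard-to-trace bug into an immediate, readable error.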
Workflow failed: tests #244
But how do we generate a save?
Just get the simplest automatic docs going. Try using mkdocs. See how Deon does it:
Recently, typer released 0.4.0, which also upped the version ceiling on click to include v8. There are some changes to either typer or click that cause some of our tests to fail.
https://github.com/drivendataorg/nbautoexport/actions/runs/1178182529
nbconvert requires a LaTeX engine to convert to pdf. We need to figure out the best way to install this in our CI pipeline so we can test the pdf export format in our unit tests.
Some strategies and notes about them:
- texlive-core from conda-forge: nbconvert uses xelatex, but this installation only has pdflatex.
  - Symlink xelatex to pdflatex
  - Change the nbconvert configuration to use pdflatex instead of xelatex
- tectonic, a different TeX distribution that's available cross-platform from conda-forge:
  - Symlink xelatex to tectonic: does not work. nbconvert passes a --quiet flag that tectonic does not support.
  - Change the nbconvert configuration to use tectonic instead of xelatex (and with appropriate CLI flags)
- tinytex: has install scripts for different OSes. Will need different conditional pipeline steps, which adds complexity to the pipeline, but the scripts hopefully work out of the box.
Currently, if a .nb_autoexport sentinel file already exists and you run autoexport again, it silently refuses to overwrite. Add a logging.warning call that will inform the user that no changes were made.
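A rough sketch of that warning path (write_sentinel and the exact message are hypothetical, not the real implementation):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("nbautoexport")

def write_sentinel(directory, config):
    """Write the sentinel file; warn and make no changes if one already exists.

    Illustrative helper showing the proposed logging.warning call.
    """
    sentinel = Path(directory) / ".nbautoexport"
    if sentinel.exists():
        logger.warning(
            "Detected existing configuration at %s. No changes were made.", sentinel
        )
        return False
    sentinel.write_text(json.dumps(config, indent=2))
    return True
```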
There are three names being used for the .nbautoexport config file: SAVE_PROGRESS_INDICATOR_FILE. We should use one name to make the code easier to read; config seems like the best choice.
Separated this from issue #34, because this will be a noisy change for the git diff.
Workflow failed: tests #219
Setting clean=True in the configuration is dangerous, because it will delete things without user input and can lead to data loss.
It might be a good idea to add a prompt to the configure command where, if a user sets clean=True, they need to read a warning and confirm that they want to configure things that way.
For now, we should just remove clean from the configuration and from the post-save hook. This isn't mature enough to let users run automatically.
Workflow failed: tests #199
If the user has an existing post_save hook installed, it will be overwritten (maybe?). We should better understand how these interact and document it.
Currently, we use subprocess.check_output to run a shell command that gets the Jupyter config directory.
nbautoexport/nbautoexport/nbautoexport.py
Lines 88 to 91 in cb4ccdf
I found the function that the Jupyter CLI calls when you use that flag:
Works like a charm:
>>> from jupyter_core.paths import jupyter_config_dir
>>> jupyter_config_dir()
'/Users/Jay/.jupyter'
Workflow failed: tests #202
Workflow failed: tests #245
Workflow failed: tests #208
Currently there's not a way to remove the nbautoexport block from your Jupyter configuration. We should add this.
I have a jupyter notebook and an image together in the same folder. I then inserted the image into the notebook using markdown, i.e. ![png](image.png). This displays fine within the notebook.
When running nbautoexport to export the notebook to markdown the result is placed in a separate folder called markdown. However the path to the image stays the same which causes it to not be displayed. Is there a way to update the path such that it points to the original file?
There is a workaround which involves displaying the image using code. The image then does get copied over to the markdown folder.
What is your preferred use? Should we not use markdown to insert images?
Could be useful to have a command to generate all exports. This would be useful when creating or changing the sentinel file in a directory where notebooks already exist.
It would be helpful to have explicit logging for nbautoexport on startup and runtime to help users confirm what is and is not working in their setup. This would also help add details for #71.
From @cccntu, originally mentioned here
I also tried the export command, but found it always exports every file. Maybe we can check the modified time and only export the necessary files (like in Make)? Then users can add this command in a Makefile without regenerating all the files each time. And add a flag to force regenerating everything.
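The mtime check could work roughly the way Make decides whether a target is stale (needs_export is a hypothetical helper, not the library's actual code):

```python
from pathlib import Path

def needs_export(notebook, export):
    """Return True if the export is missing or older than the notebook,
    mirroring how Make decides whether a target is out of date."""
    notebook, export = Path(notebook), Path(export)
    if not export.exists():
        return True
    return notebook.stat().st_mtime > export.stat().st_mtime
```

A --force flag would then simply skip this check and export everything.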
I newly installed and configured nbautoexport. New/modified jupyter notebooks in the configured directory are not getting exported as scripts on save when using Jupyter Lab.
I opened the same notebooks using Jupyter Notebook and they are exported as scripts on save. Interestingly, when I then go back to using Jupyter Lab, things are working as they should (new/modified notebooks are exported on save).
As of now, nbautoexport does not work with jupyterlab when the version of nbclassic is 4.0 or greater. To get both jupyterlab and jupyter notebook to work, run pip install nbclassic==0.3.7 (earlier 0.3 versions also work).
Write out to ~/.jupyter/jupyter_server_config.py in addition to ~/.jupyter/jupyter_notebook_config.py. After running cp ~/.jupyter/jupyter_notebook_config.py ~/.jupyter/jupyter_server_config.py, both jupyter lab and jupyter notebook export scripts correctly, regardless of the version of nbclassic.
There's some cleanup we should probably do when ready to submit to PyPI. Off the top of my head:
In the clean doc (https://nbautoexport.drivendata.org/cleaning/), I tried to add a folder to exclude, and it overwrote my old non-default export_formats config. I think it would be more natural to not have everything overwritten.
❯ cat notebooks/.nbautoexport
{
"export_formats": [
"script",
"html"
],
"organize_by": "extension"
}
❯ nbautoexport configure notebooks/ \
--clean-exclude 'report/*'
Detected existing autoexport configuration at notebooks/.nbautoexport. If you wish to overwrite, use the --overwrite flag.
❯ nbautoexport configure notebooks/ --overwrite \
--clean-exclude 'report/*'
❯ cat notebooks/.nbautoexport
{
"export_formats": [
"script"
],
"organize_by": "extension",
"clean": {
"exclude": [
"report/*"
]
}
}
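One direction for making configure merge rather than clobber: read the existing sentinel and only replace keys the user explicitly passed. A sketch (update_config is a hypothetical helper; the real CLI plumbing would differ):

```python
import json
from pathlib import Path

def update_config(sentinel_path, **overrides):
    """Merge CLI-provided options into an existing .nbautoexport config.

    Only keys explicitly passed (not None) replace existing values, so
    adding --clean-exclude would no longer reset export_formats to the default.
    """
    path = Path(sentinel_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    for key, value in overrides.items():
        if value is not None:
            config[key] = value
    path.write_text(json.dumps(config, indent=2))
    return config
```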
Workflow failed: tests #220
It may be useful to have a page in our docs with common troubleshooting solutions:
- Run nbautoexport install and restart Jupyter.
nbconvert has built-in templates that allow alternative versions of their formats.
https://nbconvert.readthedocs.io/en/latest/usage.html#supported-output-formats
It may be useful to support specifying them.
When running a source install from PyPI, the installation will error with FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'. This is because we forgot to include requirements.txt in MANIFEST.in.
This is not a problem for binary installs (using wheels).
This blocks #51.
pip install nbautoexport --no-binary :all:
Collecting nbautoexport
Downloading nbautoexport-0.1.0.tar.gz (54 kB)
|████████████████████████████████| 54 kB 4.0 MB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /Users/jqi/miniconda3/envs/temp/bin/python3.7 /Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/tmpi69keqqq
cwd: /private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-install-yloh0brm/nbautoexport
Complete output (24 lines):
Traceback (most recent call last):
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 147, in get_requires_for_build_wheel
config_settings, requirements=['wheel'])
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 127, in _get_build_requires
self.run_setup()
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 249, in run_setup
self).run_setup(setup_script=setup_script)
File "/private/var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/pip-build-env-qv2a9o6t/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 142, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 27, in <module>
requirements = load_requirements(Path(__file__).parent / "requirements.txt")
File "setup.py", line 13, in load_requirements
with path.open("r") as fp:
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/pathlib.py", line 1203, in open
opener=self._opener)
File "/Users/jqi/miniconda3/envs/temp/lib/python3.7/pathlib.py", line 1058, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/jqi/miniconda3/envs/temp/bin/python3.7 /Users/jqi/miniconda3/envs/temp/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/nq/vp3dgt812jgb0q09rh5l706c0000gn/T/tmpi69keqqq Check the logs for full command output.
On a fresh machine, running nbautoexport will fail with:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.jupyter/jupyter_notebook_config.py'
We should catch this case and issue a nicer error message with a suggestion to run jupyter notebook --generate-config.
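The catch could look something like this sketch (read_jupyter_config and the exact message are hypothetical):

```python
from pathlib import Path

def read_jupyter_config(config_path):
    """Read the Jupyter notebook config, giving an actionable error if missing."""
    path = Path(config_path)
    try:
        return path.read_text()
    except FileNotFoundError:
        raise SystemExit(
            f"No Jupyter config found at {path}. "
            "Run `jupyter notebook --generate-config` first, "
            "then rerun `nbautoexport install`."
        )
```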
nbautoexport install does not log which .py file it edited, and users may be curious.
The .nb_autoexport sentinel file is saved to the tracked directory and looks something like this:
{
"export_format": ["script"],
"organize_by": "notebook",
"directory": "notebooks"
}
It is redundant to have "directory" as part of the config file, since the location of the sentinel file conveys that info. We can remove that key.
It would be nice if docstrings were included in the mkdocs documentation.
https://pawamoy.github.io/mkdocstrings/ looks like a nice way to achieve this.
Sometimes a user may have other files in their notebooks directory that are intentional. There isn't a good way for clean to anticipate arbitrary files, so we'll need a way for users to specify files to exclude/ignore.
Here is a potential interface that lets users specify globs.
Passing globs in ad hoc to a clean command:
nbautoexport clean notebooks/ --exclude images/* --exclude README.md
Configuring in a way to be reusable:
nbautoexport configure notebooks/ -f script -b extension --clean-exclude images/* --clean-exclude README.md
{
"export_formats": ["script"],
"organize_by": "extension",
"clean": {
"exclude": [
"images/*",
"README.md"
]
}
}
Then for a file tree that looks like:
notebooks
├── 0.1-ejm-data-exploration.ipynb
├── script
│   ├── 0.1-ejm-data-exploration.py
│   └── 0.2-ejm-features-creation.py
├── html
│   └── 0.1-ejm-data-exploration.html
├── README.md
└── images
    └── plot.jpg
notebooks/script/0.2-ejm-features-creation.py and notebooks/html/0.1-ejm-data-exploration.html will be marked for deletion; notebooks/README.md and notebooks/images/plot.jpg will be safe.
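Under the hood, the exclude check could be a simple glob match against the configured patterns (is_excluded is a hypothetical helper; the real implementation might use Path.glob instead):

```python
from fnmatch import fnmatch

def is_excluded(relative_path, exclude_patterns):
    """Return True if a path (relative to the notebooks directory) matches
    any configured exclude glob and should be protected from clean."""
    return any(fnmatch(str(relative_path), pattern) for pattern in exclude_patterns)
```

With exclude set to ["images/*", "README.md"], clean would skip those paths and delete only unmatched stale exports.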
The package generates a .nbautoexport file in the intended folder, but no 'script' folder or scripts are generated.
I installed the package following the installation guide, using the Anaconda 3 Prompt and the conda install nbautoexport command. Then I ran nbautoexport install and specified the folder directory via nbautoexport configure "path".
The .nbautoexport file got generated and looks like this:
{
"export_formats": [
"script"
],
"organize_by": "extension",
"clean": {
"exclude": []
}
}
After running Jupyter Notebook (Versions: jupyter core: 4.7.1, jupyter-notebook : 6.3.0) and saving some of my .ipynb files, no 'script' folder was generated.
Since I am by no means an expert, let me know if I made an obvious mistake.
Some of the export formats depend on pandoc or LaTeX, which are not trivial dependencies.
One way we can control whether to skip tests is to use custom command-line options to pytest: https://docs.pytest.org/en/latest/example/simple.html#pass-different-values-to-a-test-function-depending-on-command-line-options
This would allow someone without pandoc or LaTeX to run all of the other tests, and we could also skip tests on certain OS configurations in CI.
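A conftest.py sketch of that pattern, with a hypothetical --run-latex flag and latex marker (names are illustrative):

```python
# conftest.py sketch: skip LaTeX-dependent tests unless --run-latex is passed.
import pytest

def pytest_addoption(parser):
    # Register a custom command-line option with pytest.
    parser.addoption(
        "--run-latex", action="store_true", default=False,
        help="run tests that require a LaTeX installation",
    )

def pytest_collection_modifyitems(config, items):
    # Without --run-latex, mark every test tagged @pytest.mark.latex as skipped.
    if config.getoption("--run-latex"):
        return
    skip_latex = pytest.mark.skip(reason="needs --run-latex option to run")
    for item in items:
        if "latex" in item.keywords:
            item.add_marker(skip_latex)
```

The same mechanism could gate pandoc-dependent tests behind a second flag.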
Thanks to the conda-forge autotick bot, it looks like our dependencies may need some cleaning up: conda-forge/nbautoexport-feedstock#1
Some of these dependencies are implicit through ones we do have, but it's probably better to list them explicitly.
Not sure what jupyter_contrib_nbextensions is for. It looks like it got inherited from the original implementation, according to the git blame. @r-b-g-b, do you know what this might be?
Packages found by inspection but not in the meta.yaml:
- traitlets
- notebook
- jupyter_core
- nbformat
Packages found in the meta.yaml but not found by inspection:
- jupyter_contrib_nbextensions
nbconvert v6.0 introduced some kind of file encoding change that breaks exporting in nbautoexport.
Here is a stack trace from our scheduled test build.
nbautoexport\export.py:110: in export_notebook
converter.convert_notebooks()
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:524: in convert_notebooks
self.convert_single_notebook(notebook_filename)
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:491: in convert_single_notebook
self.postprocess_single_notebook(write_results)
c:\miniconda\envs\test\lib\site-packages\nbconvert\nbconvertapp.py:463: in postprocess_single_notebook
self.postprocessor(write_results)
c:\miniconda\envs\test\lib\site-packages\nbconvert\postprocessors\base.py:28: in __call__
self.postprocess(input)
nbautoexport\export.py:41: in postprocess
text = f.read()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <encodings.cp1252.IncrementalDecoder object at 0x00000186A042E190>
input = b'<!DOCTYPE html>\r\n<html>\r\n<head><meta charset="utf-8" />\r\n<meta name="viewport" content="width=device-width, in...>\r\n\r\n </div>\r\n</div>\r\n</div>\r\n</div>\r\n\r\n</div>\r\n</body>\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n</html>\r\n'
final = True
def decode(self, input, final=False):
> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
E UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 226813: character maps to <undefined>
c:\miniconda\envs\test\lib\encodings\cp1252.py:23: UnicodeDecodeError
---------------------------- Captured stderr call -----------------------------
[NbConvertApp] Converting notebook C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_notebook_exports_generato2\the_notebook_0.ipynb to html
[NbConvertApp] Writing 588470 bytes to C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_notebook_exports_generato2\the_notebook_0.html
https://github.com/drivendataorg/nbautoexport/runs/1131815930?check_suite_focus=true#step:6:137
It looks like the converted files are now using a Windows encoding, but when we read back in to remove cell numbers, we're expecting unicode?
When fixing this, we may want to consider doing it in a way that ensures backwards compatibility with nbconvert v5.6.
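One likely fix is to pass an explicit encoding when reading the exported file back in, since nbconvert writes UTF-8 while Windows defaults to cp1252. A sketch (strip_cell_numbers is an illustrative stand-in for the read in our postprocess, not the real function):

```python
def strip_cell_numbers(path):
    """Read an exported file back in with an explicit encoding, so that
    Windows' default cp1252 codec never sees the UTF-8 bytes nbconvert wrote.
    (Illustrative helper; the real postprocess logic does more than read.)"""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Passing encoding="utf-8" on both the read and the subsequent write should behave identically under nbconvert v5.6 and v6.0.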
Currently the type hint for export_format is string, but it's actually expecting a list of strings. You can use the typing module to import the special annotation objects for containers.
https://docs.python.org/3/library/typing.html
from typing import List
export_format: List[str]
Though maybe you want to use the abstract base class Sequence instead of List so it could also accept tuples, etc.
install: Installs the post-save hook in the jupyter configuration.
configure: Creates the .nbautoexport configuration file in the specified directory. Could warn if install has not been run.
Other potential related fixes:
May be useful to have an nbautoexport clean command to remove exports that don't match the existing configuration and notebooks. This would clean up exports after changing the configuration or renaming notebooks.
nbconvert v6.0 introduced a new webpdf format that generates a PDF via HTML rather than via LaTeX.
https://nbconvert.readthedocs.io/en/latest/usage.html#convert-webpdf
This may need some work on dependencies for CI, since it requires a headless Chromium browser. The documentation suggests that simply installing pyppeteer might be enough.
If we change the command, you could end up with multiple versions in your config file.
https://github.com/drivendataorg/nbautoexport/blob/master/nbautoexport/nbautoexport.py#L97-L103
Let's see if there's a good way to store this block as Python code so it gets highlighting in source control and could be easier to test directly if we need to eventually. I think we could either define the function and use inspect.getsource to get this code after the function is imported, or maybe just have it in a .py file that we read instead of importing.
I personally like the idea of doing it in a function. Then we can pass in c as an argument, which would satisfy linting, and also make it possible to do some basic compatibility testing against Jupyter in the unit tests.
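The inspect.getsource idea might look like this sketch (the hook body and helper name are illustrative, not the real nbautoexport code):

```python
import inspect

def post_save(model, os_path, contents_manager):
    """Hypothetical post-save hook body, kept as real Python so it gets
    syntax highlighting in source control and can be imported in tests."""
    if model["type"] != "notebook":
        return
    # ... run the export for os_path here ...

def generate_config_block():
    """Return the hook's source text, ready to append to the Jupyter config."""
    return inspect.getsource(post_save)
```

Because post_save is a real importable function, unit tests can call it directly with fake model dicts instead of parsing a string of config code.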