
papyri's Introduction

Papyri

Papyri is a set of tools to build, publish (future functionality - to be done), install and render documentation within IPython and Jupyter.



Papyri allows:

  • bidirectional crosslinking across libraries,
  • navigation,
  • proper reflow of user docstrings text,
  • proper reflow of inline images (when rendered to html),
  • proper math rendering (both in terminal and html),
  • and more.

Motivation

See some of the reasons behind the project on this blog post.

The key motivation is building a set of tools to create better documentation for Python projects.

  • Use an opinionated implementation to enable a better understanding of the structure of your project.
  • Allow automatic cross-links (back and forth) between documentation across Python packages.
  • Use a documentation IR (intermediate representation) to separate building the docs from rendering the docs in many contexts.

This approach should hopefully allow a conda-forge-like model, where projects upload their IR to a given repo, a single website that contains documentation for multiple projects (without sub domains). The documentation pages can then be built with better cross-links between projects, and efficient page rebuild.

This should also allow displaying user-facing documentation on non html backends (think terminal), or provide documentation in an IDE (Spyder/Jupyterlab), without having to iframe it.

Overview Presentation

See also this short presentation at the CZI EOSS4 meeting in early November 2021.

Screenshots

Navigating astropy's documentation from within IPython. Note that this includes forward references but also backward references (i.e. which pages link to the current page).

Type inference and keyboard navigation in the terminal: directives are properly rendered, examples are type-inferred, and clicking (or pressing enter) on highlighted tokens opens the corresponding page (backspace navigates back).

Since Jupyter Notebook and Lab pages can render HTML, it should be possible to have inline graphs and images when using Jupyter inline help (to be implemented). In terminals, we replace inline images with a button/link to open images in an external viewer (quicklook, evince, paint...)

Papyri has complete information about which pages link to other pages; this allows us to create a local graph of which pages mention each other to find related topics.

Below, you can see the local connectivity graph for numpy.zeros (d3js, draggable, clickable). numpy.zeros links to (or is linked from) all dots present there. In green, we show other numpy functions; in blue, skimage functions; in orange, scipy functions; in red, xarray functions. Arrows between dots indicate pages which link to each other (for example, ndarray is linked from xarray.cos), and dot size represents the popularity of a page.

Math expressions are properly rendered even in the terminal: here, polyfit is shown in IPython with papyri enabled (left) and disabled (right).

Installation (not fully functional):

Some functionality is not yet available when installing from PyPI. For now you need a dev-install (see next section) to access all features.

You'll need Python 3.8 or newer, otherwise pip will tell you it can't find any matching distribution.

Pip install from PyPI:

$ pip install papyri

Install given package documentation:

$ papyri install package_name [package_name [package_name [...]]]

Only numpy 1.20.0, scipy 1.5.0, and xarray 0.17.0 are currently installable and published. For other packages you will need to build locally, which is a much more involved process.

Run IPython terminal with Papyri as an extension:

$ ipython --ext papyri.ipython

This will augment the ? operator to show better documentation (when installed with papyri install ...).
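
If you are already inside a running IPython session, you should also be able to load the extension with IPython's standard %load_ext mechanism, which is what the --ext startup flag uses under the hood (assuming papyri.ipython registers itself like a regular IPython extension):

In [1]: %load_ext papyri.ipython

In [2]: import numpy

In [3]: numpy.linspace?   # now rendered by papyri, for packages installed with `papyri install`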

Papyri does not completely build its own docs yet, but you might be able to view a static rendering of them here. It is not yet automatically built, so it might be out of date.

Development install

You may need to get a modified version of numpydoc depending on the stage of development. You will need pip > 21.3 if you want to make editable installs.

# clone this repo
# cd this repo
pip install -e .

Some functionality requires tree_sitter_rst. To build the TreeSitter rst parser:

$ git submodule update --init
$ papyri build-parser

Look at the CI file if these instructions are out of date.

Note that papyri still uses a custom parser, which will be removed in the future in favor of TreeSitter.

Testing

Install extra development dependencies by running:

$ pip install -r requirements-dev.txt

Run tests using

$ pytest

Usage

In the end there should be roughly three steps:

  • IR generation (package maintainers)
  • IR installation (end user or via pip/conda)
  • IR rendering (usually IDE, CLI/webserver)

IR Generation

This is the step you want to trigger if you are building documentation with Papyri for a library you maintain. Most likely, as an end user, you will not have to run this step and can install pre-published documentation bundles. This step is likely to occur only once per new release of a project.

Look at the TOML files in examples; they provide example configurations for some existing libraries.

$ ls -1 examples/*.toml
examples/IPython.toml
examples/astropy.toml
examples/dask.toml
examples/matplotlib.toml
examples/numpy.toml
examples/papyri.toml
examples/scipy.toml
examples/skimage.toml

Right now these files live in the papyri repository, but they would likely move to the relevant projects' repositories (e.g. docs/papyri.toml) later on.

It is slow on full numpy/scipy; use --no-infer (see below) for a subpar but faster experience.

Use papyri gen <path to example file>

for example:

$ papyri gen examples/numpy.toml
$ papyri gen examples/scipy.toml

This will create intermediate docs files in ~/.papyri/data/<library_name>_<library_version>.

Installation/ingestion

The installation/ingestion of documentation bundles is the step that makes all bundles "aware" of each other, and allows crosslinking/indexing to work.

We'll reserve the terms "install" and "installation" for when you download a pre-built documentation bundle from an external source and give only the package name; this is not completely implemented yet.

You can ingest local folders with the following command:

$ papyri ingest ~/.papyri/data/<path to folder generated at previous step>

This will crosslink the newly generated folder with the existing ones. Ingested data can be found in ~/.papyri/ingest/, but you are not supposed to interact with this folder with tools external to papyri.

There are currently a couple of pre-built documentation bundles that can be pre-installed, but they are likely to break with each new version of papyri. We suggest you use the developer installation and ingestion procedure for now.

Rendering

The last step of the papyri pipeline is to render the docs, or the subset that is of interest to you. This will likely be done by your favorite IDE, probably just in time when you explore documentation. Nonetheless, we've implemented a couple of external renderers to help debug issues.

WARNING:

Many rendering methods currently require papyri's own docs to be built and ingested first.

$ papyri gen examples/papyri.toml
$ papyri ingest ~/.papyri/data/papyri_0.0.7  # or any current version

Or you can try to pre-install an old papyri doc bundle

$ papyri install papyri

Standalone HTML rendering

$ papyri render  # render all the html pages statically in ~/.papyri/html
$ papyri serve-static # start an http.server with the proper root to serve the above files.
$ papyri serve  # start a server that will render the pages on the fly (nice to debug or iterate on theme, rendering)

Ascii terminal rendering (experimental)

$ papyri ascii <fully qualified names> # try to render in the terminal.

For example,

$ papyri ascii numpy.linspace

The next command uses urwid to provide a browsable interface in the terminal.

$ papyri browse <fully qualified name> # urwid documentation browser.

Hacking on scraping libraries

papyri gen --no-infer [...] will skip type inference of examples. The --exec option needs to be passed to try to execute examples.

Papyri - Name's meaning

See the legendary Villa of the Papyri, which gets its name from its collection of many papyrus scrolls.

Legacy (MISC/OLD) documentation (Inaccurate):

Generation (papyri gen)

Collects the documentation of a project into a DocBundle -- a number of DocBlobs (currently json files), with a defined semantic structure, and some metadata (version of the project this documentation refers to, and potentially some other blobs).

During generation, a number of normalisation and inference steps can and should happen, for example:

  • running type inference on the Examples sections of docstrings and storing the results as (token, reference) pairs, so that you can later decide that clicking on np.array in an example brings you to the numpy array documentation, whether or not you are currently in the numpy docs.
  • Parsing "See Also" into a well defined structure
  • running Examples to generate images for docs that include them (not implemented)
  • resolving package-local references: for example, when building the numpy docs, "zeros_like" is unambiguous and should be normalized to "numpy.zeros_like"; ~.pyplot.histogram is normalized to matplotlib.pyplot.histogram as the target and histogram as the text, etc.

The generation step is likely project-specific, as there might be import conventions that are per-project and should not need to be repeated (import pandas as pd, for example).
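
As an illustration of the normalization described above, here is a hypothetical sketch (not papyri's actual code; the alias table and function names are invented):

# Hypothetical sketch of docstring reference normalization; not papyri's code.
from typing import Tuple

ALIASES = {"np": "numpy", "pd": "pandas", "plt": "matplotlib.pyplot"}

def normalize_ref(ref: str, current_module: str) -> Tuple[str, str]:
    """Return (fully qualified target, display text) for a docstring reference."""
    if ref.startswith("~."):
        # "~.pyplot.histogram" inside matplotlib -> target
        # "matplotlib.pyplot.histogram", displayed as just "histogram".
        return current_module + ref[1:], ref.rsplit(".", 1)[-1]
    head = ref.split(".", 1)[0]
    if head in ALIASES:
        # Per-project import conventions: "np.array" -> "numpy.array".
        return ALIASES[head] + ref[len(head):], ref
    if "." not in ref:
        # A bare "zeros_like" in the numpy docs is unambiguous.
        return current_module + "." + ref, ref
    return ref, ref

assert normalize_ref("~.pyplot.histogram", "matplotlib") == ("matplotlib.pyplot.histogram", "histogram")
assert normalize_ref("zeros_like", "numpy") == ("numpy.zeros_like", "zeros_like")
assert normalize_ref("np.array", "skimage") == ("numpy.array", "np.array")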

Ingestion (papyri ingest)

The ingestion step takes a DocBundle and/or DocBlobs and adds them into a graph of known items; the ingestion is critical to efficiently build the collection graph metadata and understand which items refer to which. This allows the following:

  • Update the list of backreferences to a DocBundle
  • Update forward references metadata to know whether links are valid.

Currently the ingestion loads everything in memory and updates all the bundles in place, but this can likely be done more efficiently.

A lot more can likely be done at larger scale, like detecting whether documentation has changed between versions to infer for which versions of a library a given page is valid.

There is also likely some curation that might need to be done at that point; for example, numpy.array has an extremely large number of back-references.

Tree-sitter info

https://tree-sitter.github.io/tree-sitter/creating-parsers

When things don't work!

SqlOperationalError:

  • The DB schema has likely changed; try: rm -rf ~/.papyri/ingest/.

Can't build tree-sitter:

If an error occurred trying to build tree-sitter with clang, you are likely in a conda environment. Install all the compilers in the current conda env:

conda install compilers


papyri's Issues

Work better with --dry-run/don't write to disk.

Currently papyri gen has a --dry-run option (gen.py:gen_main) that tries not to write to disk.
It is a bit of an ad-hoc solution.

We should figure out a more generic way to either:

  • Make it a no-op, maybe via a mocked-up pathlib,
  • Or actually use a temporary directory as the target (and clean up on exit)

I tend to prefer the second solution, as this would also help us ensure that the paths are not hardcoded.
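
A minimal sketch of the second option (hypothetical; do_gen stands in for the actual generation code):

import shutil
import tempfile
from pathlib import Path

def gen(target: Path, dry_run: bool, do_gen) -> None:
    # All writes go into a temporary directory, which also catches
    # any accidentally hardcoded paths.
    with tempfile.TemporaryDirectory() as tmp:
        do_gen(Path(tmp))
        if not dry_run:
            # Only promote the results to the real target on a non-dry run.
            shutil.copytree(tmp, target, dirs_exist_ok=True)
    # The temporary directory is cleaned up on exit either way.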

misc thoughts

how about:

  • translations of docs
  • allowing dynamic parameters when running in kernel
  • docstrings for datasets or languages, for example, could be domain specific or business specific.
  • w3m like browsing.
  • Validated example style (like rust)
  • doc coverage
  • gallery-like for plots.

check rendering of literal blocks

There seem to be multiple types of literal blocks:

::

   lit

some text::

    lit

some text ::

   lit

some text: ::

    lit

The rendering for some of them seems to be off sometimes. To check.

Warn on space in directive.

See https://github.com/stsewd/tree-sitter-rst/issues/18: while it's supported by docutils/tree-sitter, it is non-standard and we should discourage it.

Extract `paragraph()` out of templates.

It does the link parsing and a lot of other stuff; with this one could resolve the reference earlier.

I've split it into render_lines and render_paragraph; at each location, the usage of paragraph() can be moved one level up the stack until it is set into the doc blobs.

CZI EOSS timeline, planning and report.

This is a trimmed down version of the timeline and goal we planned in the original grant for better public tracking.


We propose overhauling the Jupyter and IPython interactive documentation framework with many features (inline graphs, navigation, indexing) while providing access to content (tutorials, how-tos, examples, gallery) currently only accessible through hosted documentation websites. Building a better understanding of the Python Ecosystem's documentation conventions into IPython and Jupyter will also augment those capabilities with many desirable features, like local search, indexing, cross-references, and many others for a best-in-class documentation experience.

Python has multiple stories for documentation: docstrings, narrative documentation built via Sphinx and hosted online, and dynamic tutorials via downloadable notebooks. However, despite this diversity, none of these offer a complete documentation experience.

When interactively exploring data in IPython/Jupyter-powered tools, the question-mark operator is the typical entry point to access documentation. This has many advantages: it shows the documentation without the user having to search for the right page or know the types of objects. However, it lacks the richness of hosted documentation: it is limited to text, with no images, navigation, links, or indexing. It is limited to docstrings and cannot expose the tutorials and narrative sections critical for discovery and understanding. Users are exposed to raw source code containing LaTeX equations and reStructuredText directives, making for a poor user experience and a lack of accessibility.

While hosted documentation is better in some of these aspects, it is often scattered across the web, does not reflect versions of libraries installed by users, and is often shadowed in search engine results by poorly maintained click-bait articles, leading to confusion and poor coding habits among practitioners.

Library authors are constrained in their technical writing to decide whether to prioritize interactive session documentation or hosted versions, leading to long-standing debates (Sympy's syntax for equations), or complex and costly workarounds (Matplotlib, Napari, and Pandas dynamic docstring generation at runtime).

Via a reusable framework we call Papyri, included in IPython/Jupyter, we can offer a state-of-the-art documentation experience to end-users. Our current proof of concept allows library authors to publish a semantic Intermediate Representation Documentation format (IRD). On users' machines, tools can leverage IRD to provide access to the Python Ecosystem documentation's full richness. Our prototype shows that the following features are in reach:

  • Documentation from within IPython/Jupyter with rich text, images, and rendered mathematics.
  • Access to narrative sections, tutorials, examples, and image galleries.
  • Seamless integration and navigation across libraries.
  • Better built-in accessibility features, and the ability to customise users' preferences.
  • Ensure documentation matches the versions of the user's installed libraries.
  • Avoid dynamic docstring generation and its performance impact on libraries.

We believe the above is a first step to enhance the documentation experience for both consumers and authors. This project provides a key foundation for development, quality, ease of use, and discoverability in a growing Python ecosystem.

Additionally, this framework will open the door to several other valuable features, such as allowing docstrings to be written in the widely-used markdown format, better configuration of end-user appearance and preferences, translations and domain-specific alternatives, indexing, and others.


There are three technical components that need to be addressed: 1) providing the tools to generate IRD from library source code, 2) installing and rendering IRD on users' machines, and 3) uploading and distributing IRD files. For this proposal we request funding for the first two. The last one can be achieved by reusing other infrastructure like GitHub Pages, GitHub Actions, or a conda-forge-like model.

The key user-facing components of this project require either extensions or changes within IPython and JupyterLab. Developing these as extensions allows large flexibility in release timelines and allows integration with already-released versions, widening the pool of users who can access early prototypes. Once the extensions are well developed and stabilized, those features can be migrated to core IPython and JupyterLab. IPython's monthly minor releases make it easy to regularly deliver these improvements to users. We expect one major release of IPython mid-2022, which would be the opportunity to make large changes if necessary. Major versions of JupyterLab are published on a cycle of about 6 months, which gives us several opportunities to make the Papyri extension part of the default set of shipped extensions.

Building and publishing of IRD files by libraries can be done after a library's release; therefore the roadmaps of other projects we would build documentation for do not affect this project's schedule.

A significant community investment is also necessary to provide the right models and get adoption across the scientific community. A number of projects are already using Sphinx with various configuration options and specialized extensions for each library. It will be critical to engage with those libraries to make sure the features they currently use and their documentation build processes can be accommodated by Papyri. As this will rely on developing a standard for IRD files to publish and ship documentation to users, agreement across the core Scientific Python ecosystem will need to be reached for the format of IRD files.


Year One:

The first six months will be targeted toward publishing a usable prototype to quickly gather feedback and drive user contribution.

  • Review the core supported features and critical needs from existing Python libraries for a usable prototype
  • Implement Parsing of Numpydoc formatted Docstrings
  • Implement prototype JupyterLab and IPython extensions to render IRD files

Months 6 to 12 will revolve around presenting progress at SciPy to expand adoption.

  • Publish an initial draft of IRD files for the development versions of at least 5 core scientific Python projects (e.g. SciPy, NumPy, Skimage, Matplotlib, Pandas)
  • Provide alpha release for early user feedback and adoption
  • Parse and crosslink narrative documentation and examples
  • Prepare in-person events during SciPy 2022
    • Presentation at SciPy (conditional on talk acceptance)
    • In-person workshop.
    • In-person user study.

Year Two:

The second year focuses on growth, and extending functionality, which is critical for a self-sustaining project and seeking future sources of funding.

  • Review of UX and design feedback collected during Scipy
  • Beta release of extensions, and IRD, most features design API and configuration options considered stable enough for end-users
  • Publish draft specification of stabilized IRD format

The last six months will be marked by a second presentation at SciPy, stabilisation, and a first stable release as part of IPython and Jupyter.

  • Presentation at SciPy 2023
  • Second In-person meeting
  • Automatic building and publication of IRD for multiple libraries of the Scientific Python Ecosystem
  • JupyterLab uses IRD when available, and may suggest install/updates of missing IRD files.

Deliverables consist of both an implementation and a specification of the IRD format, in order to allow and encourage competing implementations and tooling. This includes:

  • Specification of an intermediate representation documentation format (IRD)
  • Extensions or core components for IPython and JupyterLab to render IRD files.
  • CLI and Library to generate IRD for most library authors.
  • Registry to publish/install IRD files.
  • CLI and Python library to install from the above registry.
  • Automatic building of IRD files for core libraries of the Python Ecosystem.

As for many open source projects, it can be relatively difficult to get metrics of success, especially since download numbers can be heavily biased by Continuous Integration installations. While IRD download counts would be better, they require infrastructure investment which is not included in this proposal. We will thus try to infer user and library adoption using different proxy metrics.

  • Number of libraries in the Python ecosystem that publish IRD as part of their release process is a metric of adoption by libraries and maintainers.
  • Number of third-party users that publish IRD for their preferred libraries (without library author involvement) is a metric of how much user interest there is in IRD.
  • Qualitative user engagement on social media, blogs, tutorials and talks about this project.
  • Number of issues/PRs opened by unique users.

Update Jinja2 dependency >= 3.1.0

Jinja2 <3.1 has reached end of support with the release of 3.1.0, so it may be a good time to update the dependency for new features and bugfixes.

Misc cleanup task.

These are misc tasks that don't deserve an issue of their own.
They don't need to be done all at once, but can be done in small groups:

  • Gen.__init__ takes a dummy_progress parameter; it should get it from the config object.
  • Gen.collect_package_metadata should likely be folded into __init__, as it is required for many other steps to run.
  • Gen.collect_api_docs takes a root parameter, but should likely get this value from self.config

Upstream edge cases.

This lists a number of edge cases in upstream libraries;
it would be great if upstream would accept fixing them.

They are usually handled by sphinx but add complexity to papyri.

Numpy:

Links that use `<...>` syntax with no text. I think they can be replaced by just `...`

$ rg '[^`]`<.+>`'
doc/RELEASE_WALKTHROUGH.rst.txt
158:provided at `<https://github.com/MacPython/numpy-wheels>`_ to check the
245:Go to `<https://github.com/numpy/numpy/releases>`_, there should be a ``v1.21.0
311:This assumes that you have forked `<https://github.com/numpy/numpy.org>`_::

numpy/_typing/_scalars.py
11:# coerced into `<X>` (with the casting rule `same_kind`)

doc/HOWTO_RELEASE.rst.txt
112:- numpy-wheels `<https://github.com/MacPython/numpy-wheels>`_ (clone)
128:- terryfy `<https://github.com/MacPython/terryfy>`_ (clone).

doc/source/release/1.5.0-notes.rst
19:at `<https://web.archive.org/web/20100817112505/http://bitbucket.org/jpellerin/nose3/>`_ however.

Scipy

Math directives: some functions use a multiline math directive where it's difficult to distinguish the arguments from the body:

The Bartlett window is defined as

.. math:: w(n) = \frac{2}{M-1} \left(
          \frac{M-1}{2} - \left|n - \frac{M-1}{2}\right|
          \right)

would be better as

The Bartlett window is defined as

.. math::

   w(n) = \frac{2}{M-1} \left(
          \frac{M-1}{2} - \left|n - \frac{M-1}{2}\right|
          \right)

Here are some functions that do that:

 scipy.signal.bartlett
 scipy.signal.chebwin
 scipy.signal.chebwin
 scipy.signal.hamming
 scipy.signal.hann
 scipy.signal.kaiser
 scipy.signal.windows._windows.bartlett
 scipy.signal.windows._windows.chebwin
 scipy.signal.windows._windows.chebwin
 scipy.signal.windows._windows.general_cosine
 scipy.signal.windows._windows.general_hamming
 scipy.signal.windows._windows.hamming
 scipy.signal.windows._windows.hann
 scipy.signal.windows._windows.kaiser

Configuration option for global example setup

I was interested in trying papyri out on some projects, so I copied an example toml file and tried running on networkx.

Running nearly all the examples fails with NameError: name 'nx' is not defined, because the NX Sphinx documentation uses the doctest_global_setup option to run import networkx as nx for every doctest example.

Is there a similar configuration option for papyri yet when running the examples?

Update Graphstore to make proper use of SQLite.

My apologies, this is going to be a bit of a brain dump; it is not completely finalized in my head, and my SQL vocabulary is also limited.

The goal of the graphstore is to keep track of references between "pages", though we want to keep in mind that a page may have multiple versions, so many of the SQL recipes you find online won't work.

My thoughts are the following:

  • A source document can be represented by a 4-tuple key: (package, version, category, identifier)
  • A destination document can be represented by a 3-tuple (package, category, identifier). No version, as generally when, say, scipy references numpy, it does not link to a particular version...
  • A link is a 2-tuple (source, destination), but let's make it a 3-tuple (source, destination, metadata).

I'd like to have a way to insert a document into the graphstore/SQL that makes it easy to keep things coherent, so I'm thinking:

3 tables:
SOURCE, DEST, LINKS

  • SOURCE with a primary id key, and package, version, category, identifier
  • DEST with a primary id key, and package, category, identifier
  • And LINK with a foreign key to source, a foreign key to dest, and a 3rd metadata field.

Now I'm not super familiar with SQL, but: is there a way to do garbage collection on DEST? That is to say, if nothing in the LINKS table refers to a row, it's culled?

And what would the API to insert objects into it look like? Do I have to do three operations on Source/Dest/Link when inserting a list of references? Can I "insert if it does not exist, or otherwise do nothing and return the primary id"?
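
For what it's worth, both questions have reasonable answers in plain SQLite; here is a minimal sqlite3 sketch (table and column names are illustrative, not papyri's actual schema):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    package TEXT, version TEXT, category TEXT, identifier TEXT,
    UNIQUE (package, version, category, identifier)
);
CREATE TABLE dest (
    id INTEGER PRIMARY KEY,
    package TEXT, category TEXT, identifier TEXT,
    UNIQUE (package, category, identifier)
);
CREATE TABLE link (
    source_id INTEGER REFERENCES source(id),
    dest_id INTEGER REFERENCES dest(id),
    metadata TEXT
);
""")

def get_or_create_dest(package, category, identifier):
    # "Insert if it does not exist, otherwise do nothing and return the
    # primary id": INSERT OR IGNORE skips duplicates thanks to the UNIQUE
    # constraint, and the SELECT returns the id in both cases.
    row = (package, category, identifier)
    db.execute(
        "INSERT OR IGNORE INTO dest (package, category, identifier) VALUES (?, ?, ?)",
        row,
    )
    return db.execute(
        "SELECT id FROM dest WHERE package=? AND category=? AND identifier=?",
        row,
    ).fetchone()[0]

# SQLite has no automatic garbage collection of unreferenced rows,
# but orphaned destinations can be culled explicitly:
db.execute("DELETE FROM dest WHERE id NOT IN (SELECT dest_id FROM link)")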

Figure out better naming.

Everything has generic names, and it makes it difficult to navigate.
We should have distinctive names.

If we pull from the linguistic/librarian context, we could use things like Corpus, Volumes, Monograph, Article...

Things like "collection" are too ambiguous with Python's collections module and objects; other terms like incunable or compendium may not be well known enough.

Long term planning.

As a first pass, I'd like to get the following relatively stable / working before making a wide announcement / adoption.

  1. The gen part should be relatively stable and working well on a defined subset of the scipy stack. We are not trying to replace sphinx, and the focus, for now, should be on better rendering and crosslinking of docstrings, grouped by project (and versions of some projects). (Re)build multiple versions of Numpy, Scipy... to see how it looks. The gen part is the most important, as once done, projects will rarely regen their docs.

  2. The configuration should be fairly minimal, if not unnecessary, and should try to work out of the box. If it helps, it would be great to standardize it across projects. Some information is critical to collect now even if not yet understood by the rest of the stack; using it can still be done later. For example:

  • GitHub repo slug,
  • Release commit hash
  • function file and line.
  • Aliasing fully-qualified name, reference name.
  • link to narrative docs.
  • logo
  • authors,
  3. Support of images, math, and other simple directives and rst features

  4. Something like the IPython directive that executes code and embeds the result, storing whether the code has been executed/checked by this directive in an attribute. I'm thinking of something like rustdoc, which marks in red/yellow examples that might be invalid.

  5. Proper inference working across examples.

  6. Segregate backreferences by the section they come from (examples generate way too many references, and if something is explicitly referenced it should have higher priority).

  7. Draft integration into IPython (ascii rendering), spyder, jupyterlab, xonsh.

Rename top level configuration key.

I don't like the fact that in each configuration file the top-level key has the name of the package.

for example scipy.toml starts with

[scipy]
...

I'd like to modify it so that it's always something that can't conflict with a package import name. I'm thinking:

[global]
module=scipy
...

I'm happy with other names.

@steff456 is that something you could help with?

Option to disable rich logging.

We currently use rich to display logs and progress.
If one wants to debug with pdb, progress bars especially are an issue as they hijack stdout.

It should be possible to pass a flag to disable rich progress bars.

  • #28
  • for the ingest/install step
  • for the render step.

Configuration Expected Errors.

Right now there are many places that just keep going on errors, like here:

papyri/gen.py (lines 1669 to 1680, at commit de17755):

try:
    with with_context(qa=qa):
        item_docstring, arbitrary, api_object = self.helper_1(
            qa=qa,
            target_item=target_item,
            # mutable, not great.
            failure_collection=failure_collection,
        )
except Exception as e:
    failure_collection["ErrorHelper1-" + str(type(e))].append(qa)
    # raise
    continue
and configuration entries that skip problematic examples:

exclude = [ "scipy.signal.spectral._spectral_helper",
"scipy.optimize._lsq.least_squares.least_squares",
"scipy.signal.signaltools.correlation_lags",
"scipy.linalg",
"scipy.special.orthogonal.roots_chebyc",
"scipy.special.orthogonal.roots_gegenbauer",
# not implemented substitution reference
"scipy.optimize.nonlin.TerminationCondition",
"scipy.sparse.linalg._expm_multiply.LazyOperatorNormInfo.d",
# directive with space
"scipy.spatial.transform.rotation.Rotation",
# citation not implemented
"scipy.optimize.zeros.toms748",
"scipy.optimize.nonlin.nonlin_solve",
"scipy.special.orthogonal._roots_hermite_asy",
"scipy.spatial.qhull.ConvexHull",
"scipy.spatial.qhull.HalfspaceIntersection",
# not implemented citation_reference
"scipy.optimize.nonlin.anderson",
"scipy.optimize.zeros.brentq",
"scipy.special.orthogonal.roots_chebys",
"scipy.special.orthogonal.roots_sh_chebyu",
"scipy.special.orthogonal.roots_sh_chebyt",
"scipy.special.orthogonal.roots_chebyu",
"scipy.special.orthogonal.roots_chebyt",
"scipy.optimize._lsq.lsq_linear.lsq_linear",
"scipy.optimize._lsq.trf",
"scipy.optimize._lsq.dogbox",
"scipy.special._basic.zeta",
"scipy.spatial._spherical_voronoi.calculate_solid_angles",
'scipy.optimize.zeros.ridder',
'scipy.special.orthogonal.roots_hermite',
'scipy.special.orthogonal.roots_hermitenorm',
'scipy.special.orthogonal.roots_jacobi',
'scipy.special.orthogonal.roots_sh_jacobi',
'scipy.special.orthogonal.roots_laguerre',
'scipy.special.orthogonal.roots_genlaguerre',
'scipy.special.orthogonal',
'scipy.special.orthogonal.roots_legendre',
'scipy.signal.lti_conversion.tf2ss',
'scipy.special._basic.polygamma',
'scipy.special.orthogonal.roots_sh_legendre',
'scipy.optimize.nonlin.Anderson',
'scipy.special.orthogonal._pbcf',
'scipy.spatial._spherical_voronoi.SphericalVoronoi',
# not implemented visit target
"scipy.interpolate.fitpack2.SmoothSphereBivariateSpline",
"scipy.interpolate.fitpack2.RectSphereBivariateSpline",
"scipy.interpolate.interpnd.CloughTocher2DInterpolator",
"scipy.interpolate.fitpack2.LSQSphereBivariateSpline",
# not implemented .. comment
"scipy.linalg.interpolative.interp_decomp",
"scipy.linalg.interpolative.estimate_rank",
"scipy.linalg.interpolative.estimate_spectral_norm",
"scipy.linalg.interpolative.estimate_spectral_norm_diff",
"scipy.linalg.interpolative.id_to_svd",
"scipy.linalg.interpolative.reconstruct_interp_matrix",
"scipy.linalg.interpolative.reconstruct_matrix_from_id",
"scipy.linalg.interpolative.reconstruct_skel_matrix",
"scipy.linalg.interpolative.svd",
#serialisation issue
"scipy.signal.wavelets.cwt",
"scipy.linalg.basic.pinv",
"scipy.integrate.odepack.odeint",
"scipy.spatial.distance.pdist",
"scipy.spatial.kdtree.KDTree.query_ball_point",
"scipy.interpolate.interpolate._PPolyBase.extend",
"scipy.spatial.distance.cdist",
"scipy.optimize._lsq.common.CL_scaling_vector",
"scipy.optimize._trustregion.BaseQuadraticSubproblem.get_boundaries_intersections",
"scipy.optimize.nonlin.newton_krylov",
"scipy.linalg.basic.pinvh",
"scipy.linalg._decomp_update.qr_insert",
"scipy.optimize._linprog_ip._ip_hsd",
"scipy.optimize.nonlin.KrylovJacobian",
"scipy.special.orthogonal._compute_tauk",
"scipy.interpolate.interpolate.BPoly.extend",
"scipy.spatial.qhull.Delaunay",
"scipy.spatial.qhull.Voronoi",
# issue in parsing examples section that gets directive as output w/o
# body.
"scipy.signal.lti_conversion.ss2tf",
"scipy.interpolate.interpolate.interp1d",
"scipy.optimize._linprog.linprog",
"scipy.interpolate.interpolate.BPoly",
"scipy.interpolate.interpolate.lagrange",
"scipy.sparse.csgraph._traversal.breadth_first_tree",
"scipy.sparse.csgraph._traversal.depth_first_tree",
"scipy.sparse.csgraph._min_spanning_tree.minimum_spanning_tree",
# ts parse error
'scipy.optimize.optimize.fmin',
'scipy.optimize.optimize.fmin_powell',
'scipy.integrate.quadpack.quad',
"scipy.optimize.optimize._minimize_scalar_bounded",
"scipy.optimize.optimize._line_for_search",
"scipy.optimize.optimize.fmin_bfgs",
"scipy.optimize.optimize.fminbound",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.c2c",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.c2r",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.dct",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.dst",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.genuine_hartley",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.r2c",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.r2r_fftpack",
"scipy.fft._pocketfft.pypocketfft.PyCapsule.separable_hartley",
"scipy.sparse.linalg.isolve.iterative.bicg",
"scipy.sparse.linalg.isolve.iterative.bicgstab",
"scipy.sparse.linalg.isolve.iterative.cg",
"scipy.sparse.linalg.isolve.iterative.cgs",
'scipy.sparse.linalg.isolve.minres.minres',
'scipy.sparse.linalg.isolve.iterative.qmr',
# ts difference parse
'scipy.sparse.linalg.isolve.iterative.gmres',
# not numpydoc format
# See https://github.com/scipy/scipy/pull/15034
'scipy.sparse.linalg.dsolve._superlu.gstrf',
# numpydoc parse error
"scipy.optimize._linprog_doc._linprog_highs_doc",
"scipy.optimize._linprog_doc._linprog_highs_ds_doc",
"scipy.optimize._linprog_doc._linprog_highs_ipm_doc",
"scipy.optimize._linprog_ip._linprog_ip",
"scipy.optimize._linprog_doc._linprog_ip_doc",
"scipy.optimize._linprog_rs._linprog_rs",
"scipy.optimize._linprog_doc._linprog_rs_doc",
"scipy.optimize._linprog_simplex._linprog_simplex",
"scipy.optimize._linprog_doc._linprog_simplex_doc",
"scipy.optimize._qap._quadratic_assignment_2opt",
"scipy.optimize._qap._quadratic_assignment_faq",
"scipy.optimize._linprog_highs._linprog_highs",
#"ExecError-<class 'ValueError'>": [
"scipy.signal.signaltools.resample_poly",
"scipy.optimize.optimize.bracket",
"scipy.spatial.transform._rotation_spline.RotationSpline",
"scipy.signal.filter_design.freqz",
"scipy.signal.waveforms.sweep_poly",
"scipy.optimize.cobyla.fmin_cobyla",
"scipy.optimize._shgo.shgo",
"scipy.linalg.basic.matmul_toeplitz",
"scipy.linalg.basic.solve_banded",
"scipy.linalg.basic.solve_toeplitz",
"scipy.linalg.basic.solve_triangular",
"scipy.linalg.basic.solveh_banded",
"scipy.integrate._bvp.solve_bvp",
"scipy._lib._util.DeprecatedImport",
"scipy._lib._util.scipy.linalg.lapack",
"scipy.sparse.csgraph._flow.maximum_flow",
# `....`th is not liked by tree-sitter.
"scipy.signal._peak_finding.find_peaks_cwt",
"scipy.signal._peak_finding._filter_ridge_lines",
# stray backtick https://github.com/scipy/scipy/pull/15564
"scipy.integrate._quadrature.cumtrapz",
"scipy.integrate._quadrature.simps",
"scipy.integrate._quadrature.trapz",
# citation not impl
"scipy.optimize._nonlin.anderson",
"scipy.optimize._zeros_py.brenth",
"scipy.optimize._zeros_py.brentq",
"scipy.optimize._zeros_py.ridder",
"scipy.special._orthogonal",
"scipy.special._orthogonal.roots_chebyc",
"scipy.special._orthogonal.roots_gegenbauer",
"scipy.special._orthogonal.roots_hermite",
"scipy.special._orthogonal.roots_hermitenorm",
"scipy.special._orthogonal.roots_jacobi",
"scipy.special._orthogonal.roots_sh_jacobi",
"scipy.special._orthogonal.roots_laguerre",
"scipy.special._orthogonal.roots_genlaguerre",
"scipy.special._orthogonal.roots_legendre",
"scipy.special._orthogonal.roots_sh_legendre",
"scipy.special._orthogonal.roots_chebys",
"scipy.special._orthogonal.roots_chebyt",
"scipy.special._orthogonal.roots_chebyu",
"scipy.special._orthogonal.roots_sh_chebyt",
"scipy.special._orthogonal.roots_sh_chebyu",
"scipy.optimize._nonlin.Anderson",
"scipy.special._orthogonal._pbcf",
# unmatched backticks
"scipy.integrate.quadpack.dblquad",
"scipy.signal.spline.symiirorder1",
"scipy.signal.spline.symiirorder2",
"scipy.signal.filter_design._ellipdeg",
"scipy.special._logsumexp.log_softmax",
"scipy.linalg.decomp.cdf2rdf",
"scipy.linalg.special_matrices.fiedler",
"scipy.interpolate._fitpack_impl.bisplev",
"scipy.interpolate.fitpack.spalde",
"scipy.interpolate._fitpack_impl.spalde",
"scipy.optimize._lsq.common.regularized_lsq_operator",
"scipy.optimize._trustregion_constr.tr_interior_point.tr_interior_point",
"scipy.sparse.bsr.bsr_matrix.check_format",
"scipy.optimize._trustregion_constr.tr_interior_point.BarrierSubproblem.gradient_and_jacobian",
# other
"scipy.signal._spline.symiirorder1",
"scipy.signal._spline.symiirorder2",
"scipy.signal._filter_design._ellipdeg",
# misc assert
"scipy.linalg._decomp.cdf2rdf",
"scipy.linalg._special_matrices.fiedler",
"scipy.integrate._quadpack_py.dblquad",
"scipy.integrate._odepack_py.odeint",
"scipy.interpolate._fitpack_py.spalde",
"scipy.special._orthogonal._compute_tauk",
"scipy.sparse._bsr.bsr_matrix.check_format"
]

It would be good to have something like the following in configuration:

[expected_errors]
VisitSubstitutionDefinitionNotImplementedError = ['networkx.algorithms.approximation.vertex_cover', 'networkx.algorithms.approximation.something_else']

And verify that the errors we get are the ones we expect.

Part of that has been done in #94.

Currently early_error can only be set in the config file.
We should:

  • add a --fail-early/--no-fail-early CLI flag that overrides the option.
  • Add a flag to fail if there are expected errors that have not been encountered.

Make dev install work for everyone.

There are many places in the code that assume it's running on my machine, or might make assumptions that folders/files already exist.

It would be nice to have the basic workflow work for everyone.

install papyri

$ pip install flit
$ git clone https://github.com/Carreau/papyri
$ cd papyri
$ flit install --symlink
$ git clone https://github.com/stsewd/tree-sitter-rst   # yes, inside papyri source tree
$ papyri build-parser

Create own docs:

$ papyri gen examples/papyri.toml
$ papyri ingest ~/.papyri/data/papyri_0.0.8/

View own docs

$ papyri serve
Running on http://127.0.0.1:5000 (CTRL + C to quit)

Directives to implement

The API for directives is not clear (yet), and I think that to get it right we should implement a few more inline and block directives.

I think the two biggest ones currently are .. code:: to display/execute code (and similar), and .. toctree::, which is used to implicitly build the narrative doc hierarchy.

Rendering of rst tables seems to be missing

This may be a known issue, but it's just something I noticed, so I thought I'd mention it here: it looks like rst table syntax is not being properly rendered in the HTML output. I noticed this while looking at the papyri-generated API docs for numpy.digitize, which has a table in the extended summary of its docstring.

Packaging docs / documentation serving

Hi @Carreau, this is a really cool project!

I was thinking about this today when considering docstrings in Literary.

Background

TL;DR Literary enables you to develop a package with Jupyter Notebooks. Notebooks form the source+docs+tests, whilst there is a build step that produces specialised views (e.g. pure Python source for PyPI, clean rendered docs, etc). During development, notebooks can be imported via an import hook.

I'm planning on treating specially tagged Markdown cells as docstrings, so that there can be rich markup instead of plain-old-text. The best approach for this (I think) is using MyST markdown (which jupyterlab-myst can render in the source notebook), which can then be used with Papyri and Sphinx to generate the intermediate Papyri IR.

Proposed Developer Features

There are a couple of features that I think might be useful for Papyri to support. To be clear, these are not "can you add this" ideas; rather, I am curious as to whether these sorts of features are aligned with your view of the direction of the project.

  • Support finding documentation by data_files.
    It might be useful for package authors to leverage the existing packaging system (PyPI / conda-forge) to distribute documentation. I'm not an expert here, so there might be unforeseen issues with this, but I wonder whether having numpy-docs installed as an extras package for numpy would be a good way of bundling documentation.
  • Add a protocol hook __rich_doc__
    I would like to be able to annotate generated modules with metadata that would provide Papyri with the IR for each object. I imagined something like __papyri_ir__ or something analogous to __doc__. I did see your comment that suggests this might already be a feature of sorts: if Literary places the markdown in __doc__, then it would seem that Papyri would implement a system to determine how to parse it. Maybe this is what I want? (A sketch of what such a hook could look like follows this list.)
  • Add a documentation server entrypoint?
    Whilst the above point enables IR generation for Python objects, if this IR references other parts of the API, I would like to be able to generate that IR for Papyri. For my use case, notebooks will be the main document format, and during development I don't want users to have to run an update-docs-like command in order to be able to follow documentation links. It would be nice if there were some way besides data_files to provide documentation, i.e. a documentation server. The idea here is that Literary would provide a plugin to generate documentation modules on-the-fly for notebook-backed Python modules. This is a very use-case-specific request, so I realise that it might not align with the scope of the tool.
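
For concreteness, here is a sketch of what such a protocol hook could look like (entirely hypothetical; papyri does not define __papyri_ir__ today, and the IR structure shown is invented):

class NotebookBackedModule:
    """A module-like object whose docs come from notebook Markdown cells."""

    __doc__ = "Plain-text fallback for tools that only know about __doc__."

    def __papyri_ir__(self):
        # A documentation tool could call this to obtain pre-built IR
        # instead of parsing __doc__; the dict below is purely illustrative.
        return {
            "kind": "module",
            "sections": [{"title": "Summary", "children": []}],
        }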

One thing that might be confusing here is that I feel as though Papyri is attempting to do many different things, including:

  1. Better help rendering (e.g. object?, or help(object))
  2. Full documentation navigation.
    I'm not totally clear yet on how these two different things interact; __doc__ is usually a subset of the documentation, but in the existing IPython space there is no navigation, just __doc__ rendering.

I hope I've distinguished between the two carefully.

Regardless of my use case, I think this is a really exciting idea for the Jupyter ecosystem. Documentation rendering is the one thing that hasn't really moved forward with the rest of the Jupyter tools, and this project would make a big difference to the every-day experience of developers.

Resolve configuration path with respect to configuration file.

Right now, paths in the configuration file are most of the time resolved with respect to papyri's current working directory; I think they should be resolved with respect to the path of the configuration file, at least when relative. We need, of course, to take into consideration that the configuration may not come from a file.

I'm thinking that we should have a configuration object in general, and the loading step would resolve the paths.
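
A minimal sketch of the proposed resolution rule (function and parameter names are hypothetical):

from pathlib import Path
from typing import Optional

def resolve_config_path(value: str, config_file: Optional[Path]) -> Path:
    path = Path(value).expanduser()
    if path.is_absolute() or config_file is None:
        # Absolute paths, or configuration that does not come from a file,
        # keep the current behaviour (resolution against the cwd).
        return path.resolve()
    # Relative paths are resolved next to the configuration file.
    return (config_file.parent / path).resolve()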

Add strikethrough ?

I'm wondering if bold/underline/emph should be applied as some kind of style modifier that only accepts leaf nodes.

Upgrade all examples/*.toml to #94 (expected_errors)

#94 introduces <module_name>.expected_errors, which maps error names to the objects that fail with them.

In particular this will allow us to track when some errors get fixed (in papyri or upstream).

Other files use the exclude = [list of str] option, which is impractical as it does not tell us whether the failures are still happening or not.

For all files that have a lot of values in exclude, we should migrate to <module_name>.expected_errors:

  • Make exclude an empty list.
  • Set early_error to false.
  • Run papyri gen --dry-run: it will print a list of encountered errors at the end of the run.
  • Update the config file with the values from the previous step.

It's ok to keep a few items in exclude if there are issues, but please add comments in the toml file with the reasons, and/or open issues if necessary.

Use on-insert/remove triggers to track changes.

So I've been poking at SQL:

  • first I added indexes to speed up queries. I don't know what I'm doing, so @steff456 can you look at what I did and whether it makes sense or can be made better?
  • I don't know if we can make categorical columns to save space / speed up queries; from what I can tell we can't in sqlite. If not, that's ok.
  • Am I also correct that there is a single type of index in sqlite?

And finally: a long-term project would be fast/efficient re-rendering of modified pages; for that I need a way to track all rows that have been updated/created/removed since a specific time. For our purpose, "time" is not wall-clock time but a sort of "version" integer that is explicitly bumped.

  • Can we use triggers to have extra tables that track which rows have been touched? I'm thinking of something like 2 tables, documents_changes(foreign key: int, version: int) and destinations_changes(foreign key: int, version: int). @steff456 is that something you could help me create?
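
For reference, SQLite does support this via CREATE TRIGGER; a minimal sketch with hypothetical table names, where a meta table holds the explicitly bumped version integer:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE meta (version INTEGER);
INSERT INTO meta VALUES (0);
CREATE TABLE documents_changes (doc_id INTEGER, version INTEGER);

-- Record the id of any updated row together with the current version.
CREATE TRIGGER documents_updated AFTER UPDATE ON documents
BEGIN
    INSERT INTO documents_changes SELECT new.id, version FROM meta;
END;
""")
# Similar AFTER INSERT and AFTER DELETE triggers (using old.id for deletes)
# cover the other cases. Rows touched since version N are then:
#   SELECT DISTINCT doc_id FROM documents_changes WHERE version >= N;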

No such file .papyri/papyri.toml

I experienced this on the latest release as well as on master:

  File "/home/ryan/Repositories/idom-team/idom/venv/lib/python3.9/site-packages/papyri/__init__.py", line 288, in gen
    gen_main(names, infer=infer, exec_=exec)
  File "/home/ryan/Repositories/idom-team/idom/venv/lib/python3.9/site-packages/papyri/gen.py", line 418, in gen_main
    conf = toml.loads(conffile.read_text())
  File "/home/ryan/.pyenv/versions/3.9.0/lib/python3.9/pathlib.py", line 1255, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/home/ryan/.pyenv/versions/3.9.0/lib/python3.9/pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/home/ryan/.pyenv/versions/3.9.0/lib/python3.9/pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ryan/.papyri/papyri.toml'

I tried touching this file, got another error about missing .papyri/config.toml, but finally got this traceback and gave up:

  File "/home/ryan/Repositories/idom-team/idom/venv/lib/python3.9/site-packages/papyri/__init__.py", line 288, in gen
    gen_main(names, infer=infer, exec_=exec)
  File "/home/ryan/Repositories/idom-team/idom/venv/lib/python3.9/site-packages/papyri/gen.py", line 432, in gen_main
    g.do_one_mod(names, infer, exec_, conf)
  File "/home/ryan/Repositories/idom-team/idom/venv/lib/python3.9/site-packages/papyri/gen.py", line 1015, in do_one_mod
    raise ValueError(f"from {qa}") from e
ValueError: from idom
