executablebooks / myst-parser
An extended commonmark compliant parser, with bridges to docutils/sphinx
Home Page: https://myst-parser.readthedocs.io
License: MIT License
I'm loving all of the documentation improvements that we've made on this. It helps me to wrap my head around the tool and think about how it extends commonmark markdown.
One thing I'm concerned about is that over time we will forget to clearly document the departures that we've made from CommonMark. Perhaps there is a way that we can programmatically generate a "reference spec" that can be inserted into the docs, so that there is always a canonical source of truth for what should be in the documentation?
This is an issue to discuss whether / how to implement a CommonMark + directives parser for Sphinx, as @chrisjsewell and I had discussed earlier.
The `recommonmark` project piggy-backs on the `commonmark-py` project to parse markdown. It then defines a Sphinx parser that sub-classes the `docutils` parser and defines methods that convert the commonmark-py AST into docutils AST (https://github.com/readthedocs/recommonmark/blob/master/recommonmark/parser.py#L21).
Under the hood it's still using docutils methods since they sub-class the docutils parser, and as a result there is some weird behavior (like `nested_parse` expecting rST in the content blocks).
@chrisjsewell proposed writing our own CommonMark -> docutils AST parser, and then adding on the syntax for roles and directives. This would be two things:
The hope is that this parser would be easier to maintain, understand, and grow as we wished to support new syntax. It would be a collection of "markdown -> docutils AST" rules, rather than relying on an intermediate AST as the `commonmark-py` project does.
commonmark-py?
As I was looking through documentation, I am wondering whether we could still use the `commonmark-py` machinery to parse basic commonmark syntax, and then use our own state machine parser to handle the "extra" grammar elements like roles and directives.
Basically, I'm wondering whether we could do the same thing that `recommonmark` does, but instead of sub-classing a docutils Parser, we sub-class a parser that knows how to parse only the subset of markdown that commonmark doesn't cover.
If this were possible, I feel like we wouldn't need to worry about re-writing the test suite of `commonmark-py`, and we could then focus only on the extra syntax needed for things like roles and directives. We could then also have a markdown parser under the hood for the `nested_parse` sections.
Note - it may also be illustrative to look at how the commonmark-py parser does its parsing - I believe that code starts here: https://github.com/readthedocs/commonmark.py/blob/c4c5b0df72961663060c65ed0858840b5e031b10/commonmark/blocks.py#L881
And the blocks.py module in general defines how they parse markdown...maybe we could re-use (or explicitly use) some of it...
I'm curious what @chrisjsewell thinks about this - mostly I am trying to find ways that we don't have to write our own from-scratch markdown parser as I'm a bit worried about all the edge-cases we'll have to consider :-)
Find it here: https://github.com/ExecutableBookProject/meta/wiki/Resources:-Markdown-(MD)#markdown-parsers-in-python
What do people think about converting much of the README into links to places in the documentation? This could help us cut back on duplicated content that may be hard to remember to update.
I think basic "install and usage" stuff is fine in the README, but other stuff like supported syntax would be better to have in a single canonical place.
Another option would be to use `include` statements in our documentation and pull from the README.
Thought I'd try building the docs but the pandas theme is still causing me grief. Other instructions are also generating errors.
Following install instructions here in a fresh conda environment (`conda create --name test1 sphinx`) I get
(This follows `conda install -c conda-forge myst-parser`)
Running `pip install git+https://github.com/pandas-dev/pandas-sphinx-theme.git@master` and then repeating leads to exactly the same error.
With `pip install myst-parser[sphinx]` I get
With the last set of developer instructions I get
Curious why we are calling it "develop" and not just using "master" since this is our own repository and not a fork.
I think the name itself is kind of trivial, but remembering to `git checkout develop` instead of `master` feels like unnecessary cognitive overhead. I'm just curious if there's a specific reason for this, or plans to change it to `master` at some point?
We should use `sphinx-apidoc` in the `conf.py` to auto generate the API documentation, similar to what I do in: https://github.com/chrisjsewell/ase-notebook/blob/fdf5abb403c17f544453c1afd9e9bdaae4826a6d/docs/source/conf.py#L285
Originally posted by @chrisjsewell in #70 (comment)
Describe the bug
Attempting to use a single-line `caption:` argument within the `code-block` results in the following error during sphinx build:
home/ross/repos/elegant-scipy/sphinx/content/ch1.md:22: WARNING: Directive options:
while scanning for the next token
found character '`' that cannot start any token
in "<unicode string>", line 2, column 18:
caption: {numref}`tab:counts` as a `numpy` array
Note that this error is avoided by quoting the caption:
caption: 'this will not fail {ref}`my:ref`'
or by using the | block-scalar indicator after the colon:
caption: |
    This won't fail either {ref}`my:ref`
To Reproduce
Use a backtick in an un-quoted, single-line caption within a code block, e.g.:
```{code-block} python
---
name: code:label
caption: Caption with reference to outside figure {ref}`fig`
---
a = 1
```
Expected behavior
As noted above, there are already (at least) two ways around this problem, so maybe it just needs to be documented. In a perfect world, it would be possible to use backticks in a single-line caption without any additional escaping.
Screenshots
n/a
Environment (please complete the following information):
Additional context
This was the real cause of #56
As noted in executablebooks/MyST-NB#17:
One thing that should be checked/documented at the `myst_parser` level, is that the `docutils.RstParser` specifically prohibits transitions from 'non-consecutive' header levels, e.g.
# Title h1
### Title h3
I think with myst, this will currently end up being the same as:
# Title h1
## Title h2
I should check this, and decide how to handle it, e.g. would sphinx be happy with 'virtual' sections being added that don't have a `<title>` child.
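To make the collapsing behaviour described above concrete, here is a small sketch of section nesting. It is not the `myst_parser` implementation; `section_depths` is a hypothetical helper that only assumes each heading closes the open sections of the same or a deeper level and then opens exactly one new section.

```python
def section_depths(heading_levels):
    """Map Markdown heading levels to nested section depths.

    A heading closes every open section of the same or a deeper level and
    then opens exactly one new section, so the depth grows by at most one.
    """
    open_levels = []  # heading levels of the currently open sections
    depths = []
    for level in heading_levels:
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
        open_levels.append(level)
        depths.append(len(open_levels))
    return depths


print(section_depths([1, 3]))  # [1, 2]
print(section_depths([1, 2]))  # [1, 2]
```

Under this model an h3 directly after an h1 lands at the same depth as an h2 would, which is the equivalence noted above.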
Currently, target syntax has something like this:
(targetname)=
# Some header
What do folks think about this?
Another idea that might be worth exploring is to look into using an in-line syntax for targets. If I recall, they only work for headers anyway, so what if we defined targets with something like:
# My header name ={mytargetname}
Though that might break down if we ever supported things like extra attributes syntax
This happens when there are user errors in the `{cite}` roles in the document (specifically, reference to something not in the .bib)
To Reproduce
`{cite}` something that is not in reference.bib
`{bibliography}`
Edit: Sorry, this is wrong. Another instance here with different cause.
Screenshots
Here is a citation with no corresponding entry in references.bib
The `bibliography` outputs a different reference:
(This doesn't happen with rST and seems to be introduced by myst_parser. The following is in a separate Sphinx project with only sphinxcontrib.bibtex as an extension.) See input:
Environment:
A list of available parsers for Markdown for Python
| project | AST | Notes |
|---|---|---|
| mistletoe | ? | |
| mistune | ? | |
| commonmark-py | ? | |
| marko | ? | |
paka.cmark excluded as it is a wrapper for commonmark c library
As noted in #66, in RST the whole header text is turned into a hyperlink (1st and 2nd level headers only), whereas in Myst it is only the trailing anchor.
RST:
Myst:
I'm just playing around with mistletoe, and since your fork doesn't allow issues (forks generally don't) I'll put code-specific comments here.
First thing I found: sometimes you want to have colons in arguments. For example the default sphinx toctree directive has `:caption: Contents:`. Doing this with myst results in this error:
yaml.scanner.ScannerError: mapping values are not allowed here
There are some inconsistencies between RTD and Circle builds. For example:
In particular, note blocks are classed differently. The RTD docs class things as `admonition` while the CircleCI docs class things as `alert`.
I think this is related to the pydata bootstrap theme re-classing (https://github.com/pandas-dev/pydata-bootstrap-sphinx-theme/blob/master/pandas_sphinx_theme/bootstrap_html_translator.py#L44). This is code that replaces common Sphinx classes with the respective bootstrap classes so that the CSS shows up.
For some reason, this replacement happens in CircleCI, but not in readthedocs...
With the standard docutils syntax/parser, there is no way to pass options to role functions such as :name:`content`. However, the actual role function signature does accept an `options` keyword:
def __call__(self, role, rawtext, text, lineno, inliner, options={}, content=[]):
    pass
Therefore, it would be conceivable to write myst specific roles that actually did something with these options. No idea what a good syntax would be though.
Describe the bug
{ref}`*vectorization* <sec:vectorization>` should result in "vectorization" as a link and italicized. The current result is:
<a class="reference internal" href="ch1.html#sec-vectorization"><span class="std std-ref">*vectorization*</span></a>.</p>
To Reproduce
To use an example already at hand:
docs/examples/wealth_dynamics_md.md
(sec:lorenz)=
## Lorenz Curves and the Gini Coefficient
Here is a reference to {ref}`*Lorenz* <sec:lorenz>`, with emphasis!
`docs/` and build: `make html`
<browser> _build/html/examples/wealth_dynamics_md.html
Expected behavior
The emphasis syntax within a role should result in a functioning link that is also italicized.
Screenshots
N/A
Environment (please complete the following information):
Additional context
I noticed that `tests/test_syntax/test_ast.py` has a `test_role` which appears to test internal emphasis. I'm not as familiar with `pytest` as I need to be, so I had trouble parsing the test suite to figure out exactly what was happening. Either way, it doesn't appear the test covers the full conversion from MyST syntax->HTML.
Describe the bug
Running the install for this package conflicts with a current install of `awscli` due to the requirement for `docutils>=0.16`
ERROR: botocore 1.15.12 has requirement docutils<0.16,>=0.10, but you'll have docutils 0.16 which is incompatible.
ERROR: awscli 1.18.12 has requirement docutils<0.16,>=0.10, but you'll have docutils 0.16 which is incompatible.
Installing collected packages: docutils, myst-parser
Found existing installation: docutils 0.14
ERROR: Cannot uninstall 'docutils'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
@chrisjsewell indicated we only required `docutils>=0.15` -- should this be specified in `setup.py`?
I tend to think we should develop to the latest version though so perhaps environments are better in this case.
As noted by @choldgraf in #50
We use `pytest-regressions` in the developer documentation - we should document it!
Originally written by @chrisjsewell:
Basically some code needs to be written to (a) check the link is local (e.g. doesn't start with https, etc), (b) get hold of the file extensions that sphinx deems 'linkable', then (c) if both these checks pass, remove the extension.
Thinking about it now though, maybe if these checks pass, then we should also actually check if the file exists and, if it does, then set it as equivalent to writing `doc`{name.txt} here: https://github.com/ExecutableBookProject/MyST-Parser/blob/cf3352c8a224af219cb062cfe467d7da9289284a/myst_parser/docutils_renderer.py#L583
See `sphinx.roles.AnyXRefRole` for how the ref roles actually work.
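The checks (a) to (c) above, plus the file-existence check, could be sketched roughly as follows. This uses only the standard library; `resolve_local_link` and the hard-coded `LINKABLE_SUFFIXES` are hypothetical names for illustration, and a real implementation would presumably take the suffixes from the Sphinx configuration.

```python
import os
from urllib.parse import urlparse

# Assumed for illustration; in Sphinx this would come from the configured
# source suffixes rather than a hard-coded tuple.
LINKABLE_SUFFIXES = (".md", ".rst")


def resolve_local_link(destination, source_dir, suffixes=LINKABLE_SUFFIXES):
    """Return a document name for a local link, or None to leave it alone."""
    parsed = urlparse(destination)
    # (a) anything with a scheme (https:, mailto:, ...) or a host is external
    if parsed.scheme or parsed.netloc:
        return None
    # (b) only extensions that are treated as linkable are rewritten
    root, ext = os.path.splitext(parsed.path)
    if ext not in suffixes:
        return None
    # extra check suggested above: the target file must actually exist
    if not os.path.isfile(os.path.join(source_dir, parsed.path)):
        return None
    # (c) drop the extension so the result can be used like a doc reference
    return root
```

Returning `None` for anything that fails a check leaves the decision about what to do with the link (plain URL, warning, and so on) to the caller.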
We have effectively invented a new flavor of markdown in this repository, one that explicitly tries to mirror the functionality of Sphinx/rST using markdownic (is that a word?) syntax.
Thus far, the two main names we have given it are:
This is an issue to see if one of these is acceptable, or if we should keep brainstorming. I actually quite like myst, with the exception that it is a fairly popular old school video game so we might run into confusion with that in searches etc.
Other ideas
(I don't have any other ideas I just wanted an excuse to write "spharkdown" but you all should chime in)
I've had a few people ask me if / how MyST markdown is different from RMarkdown. Particularly because they use similar patterns for configuration etc.
We should add a page that compares how to do the same thing (more or less) in both flavors of markdown, to make it easier for people to see the similarities and differences. This is also an opportunity to decide for ourselves if this is a more generic and well-structured improvement to RMarkdown :-)
I wonder if that's something that one of the integration developers could work on? create a page that does "this is how you do XXX in MyST markdown, this is how you do it in RMarkdown". What do you think @jstac ?
As discussed in executablebooks/MyST-NB#17, a few things could be improved to be more intuitive:
`DocutilsRenderer.set_current_node` to maybe `DocutilsRenderer.current_node_context`
`__enter__` methods re-instate myst specific tokens (`DocutilsRenderer`, `HTMLRenderer`, `AstRenderer`).
Lastly there is a bug in mistletoe that I should fix here: https://github.com/ExecutableBookProject/mistletoe/blob/2cfe7446b975685f98837f9e40aaabcc0e270a79/mistletoe/core_tokens.py#L301 (I think it should return False or raise an error?)
Originally posted by @chrisjsewell in executablebooks/myst-nb.example-project#1 (comment)
My feeling is that this repository is still evolving quite rapidly (eg the documentation). Can we hold off on requiring an approving review until we think the repo is in a more steady state, and then turn it on?
Also, we should discuss as a group whether we want to impose these kinds of restrictions on the repositories in the EBP org (eg a requirement that a person reviews the PR before it may be merged). It may be the right decision, but either way it should be a group decision.
Describe the bug
From limited testing, it appears that the `name:` argument requires a string to work within the `{code-block}` directive. This is inconsistent with how `name:` appears to work within other directives such as `{figure}` or `{table}`, for instance.
To Reproduce
The following works for creating a label called `fig:my` that can then be used in the text via the `{ref}` or `{numref}` roles. This works with the table directive as well.
```{figure} my_figure.png
---
name: fig:my
---
```
However, the following fails to define a label (a sphinx warning about an undefined label is raised if you try to reference code:my in the document):
```{code-block} python
---
name: code:my
---
def my_fun(): pass
```
The following works however (note the quotes around the label name):
```{code-block} python
---
name: 'code:my'
---
def my_fun(): pass
```
Expected behavior
I am not a sphinx/docutils expert, so maybe this is the expected behavior. However I would expect the `name:` argument to result in consistent behavior within all directives for which it is defined.
Screenshots
n/a
Environment (please complete the following information):
Last week we had a few nice conversations around "how to extend Markdown to support roles and directives from Sphinx".
This is a quick issue to try and keep track of our thinking there.
After a few conversations, we arrived at a syntax that uses triple-backticks, followed by directive name, followed by configuration with two options (either using `{key=val}` or YAML front-matter inside the code block).
So something like (ignore the slashes, just for rendering purposes):
\```mydirective {key=val}
\```
And for in-line text, using single-backticks followed by an identifier in the traits associated w/ it:
This is `my role`{myrolename key=val}
This effectively treats everything as "raw text", with the idea that this would degrade gracefully by just rendering as a raw blob if the directive didn't exist.
Something like this:
`{}` becomes configuration for the directive. Anything inside the backticks becomes content that is processed by the directive.
Something similar could be done with in-line blocks
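As a rough illustration of how the in-line form proposed above (backticked content followed by `{name key=val}`) could be recognised, here is a regex-based sketch. `ROLE_PATTERN` and `parse_inline_roles` are hypothetical names, and the pattern deliberately ignores edge cases such as nested backticks or quoted values.

```python
import re

# Matches `content`{name key=val ...}; the first word in braces is the role name.
ROLE_PATTERN = re.compile(r"`(?P<content>[^`]+)`\{(?P<traits>[^}\s][^}]*)\}")


def parse_inline_roles(text):
    """Yield (role_name, options, content) for each in-line role in ``text``."""
    for match in ROLE_PATTERN.finditer(text):
        name, *pairs = match.group("traits").split()
        # A bare word without "=" becomes an option with an empty value.
        options = {
            key: value
            for key, _, value in (pair.partition("=") for pair in pairs)
        }
        yield name, options, match.group("content")


print(list(parse_inline_roles("This is `my role`{myrolename key=val}")))
# [('myrolename', {'key': 'val'}, 'my role')]
```

Text that does not match is left untouched, which fits the "degrade gracefully" idea: an unknown role would still render as an ordinary code span.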
Hey all - after our recent conversations about RMarkdown / JMarkdown / rST / etc, it reminded me that a few years back we had kicked around the idea for an improved version of rST, called Myst.
We created a repository for it, and iterated a bit in issues and PRs, but we never really solidified it into something. It's been sitting in @fperez's private repos for a while now.
In case it's helpful in informing our thinking, @fperez just transferred it over to this GitHub org:
https://github.com/ExecutableBookProject/myst
It's a slightly different take over the "which markdown flavor do we use?" question - myst is more like "can we make rST more friendly" rather than "can we extend CommonMark to support publishing". The hope is to make minor modifications to rST (likely with a re-implementation of a parser in python) so that we keep the flexibility of rST but enjoy some more human-friendly syntax where it's possible. That said, it would definitely be an "N+1th standard" which is why I suspect it never took off for us
I'd be curious to hear what you all think about it!
Is your feature request related to a problem? Please describe.
Not so much a problem as a limitation of sphinx for scientific publication. Setting `numfig=True` in the `conf.py` enables automatic numbering of things like figures, tables, code-blocks, etc. Unfortunately, for `code-blocks` this only works if the `:caption:` has been defined.
Describe the solution you'd like
For scientific publication, it would be very nice if it were possible to reference a code block via `numref` without explicitly having to define a `caption:` within the `code-block` directive. For example:
I want to reference the following code-block with a numref:
```{code-block} python
---
name: my-code
---
print("Hello World")
```
But when I try to reference it via {numref}`my-code`, it fails unless there is a caption.
Note that the desired behavior (numref without caption) works fine for tables and figures.
Describe alternatives you've considered
Pre-labeling code-blocks doesn't work either:
(my-code)=
```{code-block} python
def foo():
    pass
```
Referencing the code-block like {numref}`my-code` doesn't work either
Additional context
Note that this is a limitation of sphinx, not the `myst-parser`. The `myst-parser` output is entirely consistent with rST output with sphinx (i.e. warnings and failed xrefs). Perhaps this is something that needs to be fixed upstream in sphinx, not in MyST (let me know if that's the case).
Thus far our documentation has focused on the Sphinx parsing and authoring side, but we should also document how to import MyST, run it on markdown, understand the AST, etc.
The $-delimited math on line 352 of syntax.md doesn't render properly at build.
Other instances of $-delimited inline math in the file seem to work though, not sure why this particular instance is not working.
N.B. switching to the myst-role syntax {math}... works fine.
I ran into a weird bug and I'm not sure if it's a bug or not :-)
If I run the MyST parser on a single line with no newline, then the result is always a paragraph, even if the line is a heading. E.g.:
from myst_parser.block_tokens import tokenize
tokenize("# Header\n".splitlines(keepends=True))
results in a `Heading` token, but:
from myst_parser.block_tokens import tokenize
tokenize("# Header".splitlines(keepends=True))
results in a `Paragraph` token.
This is semi-common in notebooks, if a user puts a single header line in a markdown cell.
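The only difference between the two inputs above is the trailing newline on the last line. One possible workaround, sketched here with a hypothetical helper `ensure_trailing_newlines`, is to normalise the lines before they are handed to the tokenizer. Whether that is the right place for the fix, rather than in the tokenizer itself, is an open question.

```python
def ensure_trailing_newlines(text):
    """Split text into lines, each guaranteed to end with a newline."""
    lines = text.splitlines(keepends=True)
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    return lines


print("# Header".splitlines(keepends=True))    # ['# Header']
print(ensure_trailing_newlines("# Header"))    # ['# Header\n']
print(ensure_trailing_newlines("# Header\n"))  # ['# Header\n']
```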
A list of markdown items that we should review in addition to discussions on `Jmarkdown`, `RST` support etc.
fall 2015 but may have some useful ideas for consideration.
I'm teaching a course right now, and I observed again that some students need a much simpler material presentation than others. I think one way to accommodate this is having interchangeable segments of text that the readers can switch to the presentation they prefer.
Is myst markup sufficiently flexible to support this? Is this an interesting feature to implement?
I keep getting warnings when building the docs, they look like this:
Pygments lexer name '' is not known
I'm pretty sure this is because of
```
some raw text
```
blocks with no language specified. Should we default code blocks w/o a language to a default "raw" language with Pygments, so a warning isn't raised?
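A sketch of the suggested default, assuming the renderer receives the raw info string of the fence. `code_block_language` is a hypothetical helper, and `"none"` is used on the assumption that Sphinx treats it as "no highlighting"; another plain-text lexer name could be substituted.

```python
DEFAULT_LANGUAGE = "none"  # assumption: Sphinx treats "none" as "do not highlight"


def code_block_language(info_string, default=DEFAULT_LANGUAGE):
    """Return the language named in a fence's info string, or a default."""
    words = info_string.split()
    return words[0] if words else default
```

With this in place a bare fence would be passed to the highlighter as `none` instead of an empty lexer name.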
Since we have directives and roles that could make HTML at a block or in-line level explicit (rather than detected with regexes and such), what do folks think about explicitly not allowing raw HTML in the myst content, and instead asking people to use a directive or role if they want to embed HTML in their page?
I came across a weird bug today:
Links that don't resolve properly seem to break Sphinx. Basically, any link that isn't either a correct target to a filename, or a URL, will result in this error:
Exception occurred:
File "/home/choldgraf/anaconda/envs/dev/lib/python3.7/site-packages/sphinx/writers/html5.py", line 779, in unknown_visit
raise NotImplementedError('Unknown node: ' + node.__class__.__name__)
NotImplementedError: Unknown node: TextElement
I reproduced this by adding any of these lines to the docs of myst_parser:
[test](test.html)
[test](test.md)
[test](install.md) (page actually exists but wrong target name)
[test](blahblah) (target doesn't exist)
but the error is not triggered with these links:
[test](install)
[test](https://test.html)
Especially for quick demonstration purposes, without having to hook in to the whole sphinx infrastructure. For example `myst-to-ast path/to/doc.md` would convert:
# header
Some **bold** text
to
<document>
    <section>
        header
        <paragraph>
            Some
            <strong>
                bold
            text
With #38 (thanks @rossbar) the documentation build is closer to being able to run with no warnings.
At this point @choldgraf we can add the `sphinx-build` nitpick options `-nW` here: https://github.com/ExecutableBookProject/myst_parser/blob/8c7c7a7cd856836e73d0d9de7e6827ac4da6c29d/docs/Makefile#L20
For local debugging, we can also add a `make debug` option to the Makefile, as I do here.
As discussed in #31 it appears that people are for a footnote syntax, as implemented in some flavours of Markdown (see Extended Syntax):
Here's a simple footnote,[^1] and here's a longer one.[^bignote]
[^1]: This is the first footnote.
[^bignote]: Here's one with multiple paragraphs and code.
    Indent paragraphs to include them in the footnote.
Can we come to a decision on this, bearing in mind
Note these footnotes would be inherently auto-numbered, as can be tried in the VS Code standard preview (GitHub markdown does not support footnotes)
paging @choldgraf @mmcky @jstac @akhmerov @najuzilu
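To illustrate what "inherently auto-numbered" means for the footnote syntax above, here is a small sketch that numbers labels in order of first reference. `FOOTNOTE_REF` and `number_footnotes` are hypothetical names; this is a regex illustration only, not a proposal for how the parser should tokenize footnotes.

```python
import re

# A reference is [^label] that is NOT followed by ":" (that would be a definition).
FOOTNOTE_REF = re.compile(r"\[\^([^\]\s]+)\](?!:)")


def number_footnotes(text):
    """Assign footnote numbers to labels in order of first reference."""
    numbers = {}
    for label in FOOTNOTE_REF.findall(text):
        numbers.setdefault(label, len(numbers) + 1)
    return numbers


example = (
    "Here's a simple footnote,[^1] and here's a longer one.[^bignote]\n"
    "[^1]: This is the first footnote.\n"
    "[^bignote]: Here's one with multiple paragraphs and code.\n"
)
print(number_footnotes(example))  # {'1': 1, 'bignote': 2}
```

The label text itself plays no part in the numbering, so `[^bignote]` is rendered as footnote 2 simply because it is referenced second.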
Currently, link definitions cannot be used inside directives, e.g.
```{note}
[reference]
```
[reference]: https://github.com/ExecutableBookProject
will not work, but this will:
```{note}
[reference]
[reference]: https://github.com/ExecutableBookProject
```
This will also be the case for upcoming footnotes
In our review of existing Markdown parsers (ExecutableBookProject/meta#18 ExecutableBookProject/meta#19) `mistletoe` came out as the favourite for its mix of parsing speed, extensibility and well constructed API. However, the one con was that it is not really being actively maintained.
To this extent, for current development, a fork is being used as a dependency (ExecutableBookProject/mistletoe#1), to allow for required/desired changes to the core code. Eventually though these need to be up-streamed, or the fork needs to become its own package (with pypi and conda distributions).
What is the best way to contact/work with the current maintainer to achieve this?
For example if a role/directive name is not found.
There are some 'TODO's in the source code for this, and some unit tests should be added.
I was looking into using this sphinx extension to handle the MyST vs. rST comparison:
https://github.com/djungelorm/sphinx-tabs
It uses a top-level directive `.. tabs::` that should take no arguments. However, I found that when I use this directive in MyST with:
```{tabs}
```
I am getting an error that arguments are passed to the `tabs` directive. The error shows because the directive says it takes zero arguments.
I am wondering if we are passing an empty argument, or something like this?
Here's a PR that shows off this behavior: #60
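One general way an "empty argument" like this can arise is from how the text after the directive name is split. Whether this is what actually happens in MyST is not confirmed here. The helper `split_arguments` is hypothetical and only shows the difference between the two forms of `str.split`.

```python
def split_arguments(first_line):
    """Split a directive's first line into arguments, dropping empty ones."""
    return first_line.split()


print("".split(" "))        # [''] -> would count as one (empty) argument
print(split_arguments(""))  # []   -> zero arguments, as `tabs` expects
```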
It would be good to include in the documentation some guidance for users on how to get the most out of MyST; particularly those that haven't used rST before.
Most of the directives are documented at:
So at a minimum these links should be provided. It would be ideal though to (a) show what these directives look like in MyST format, and (b) also provide some more 'user-friendly' tutorials on some specific aspects (the `include` directive being one of these, as discussed in #80).
A good example of some user-friendly documentation is what the Overleaf knowledge base has for LaTeX.
For a full list of available (core) roles and directives (with tested input/output), see `tests/test_renderers/sphinx_roles.json` and `tests/test_renderers/sphinx_directives.json`.
@jstac & @mmcky this could be something that you guys help with
Within the source code I have added numerous `# TODO details...` comments to record anything known to require further work/consideration.
It should be possible (probably with some more patches to mistletoe), to have a renderer that just walks through the AST and builds up a database that a Language Server Protocol server can use, containing e.g.
Describe the bug
Adding a line comment (starting with `%`) introduces a newline in the html output.
To Reproduce
Using docs at hand:
docs/examples/wealth_dynamics_md.md
A value of 0 indicates perfect equality (corresponding the case where
%TODO: Add more here
the Lorenz curve matches the 45 degree line) and a value of 1 indicates
complete inequality (all wealth held by the richest household).
cd docs && make html
Expected behavior
Line comments would have no effect at all on the output text.
Screenshots
N/A
Environment (please complete the following information):
Additional context
N/A
I would find it helpful to have a specific syntax for page titles so that you don't end up wasting your first markdown hash on it and have to start using `##` from then on.
A few ways we could do it:
`setext` header for it (e.g. a single `==========` line)
What do folks think about this idea?