Comments (23)

chrisjsewell avatar chrisjsewell commented on May 24, 2024 4

Behold the first Markdown directive parser!

See the bottom of https://github.com/chrisjsewell/mistletoe/blob/myst/test.ipynb

Current format is:

````{note}
abcd *abc* [a](link)

```{warning}
xyz
```

````

which is transformed to docutils AST:

<document source="">
    <note>
        <paragraph>
            abcd 
            <emphasis>
                abc
             
            <pending_xref refdomain="True" refexplicit="True" reftarget="link" reftype="any" refwarn="True">
                <reference refuri="link">
                    a
        <warning>
            <paragraph>
                xyz

FYI for all the tests (which are extensive) see: https://travis-ci.org/chrisjsewell/mistletoe

from myst-parser.

choldgraf avatar choldgraf commented on May 24, 2024 4

jstac avatar jstac commented on May 24, 2024 2

Very cool. I see you're picking up line numbers corresponding to cells in the AST. So ticking all the boxes already in terms of what was needed...

akhmerov avatar akhmerov commented on May 24, 2024 1

This looks totally awesome!

A quick question: should it be possible to configure a default directive/role, akin to Sphinx? These could use blank {} for example.

jstac avatar jstac commented on May 24, 2024 1

You've worked hard!!!

I personally find the YAML syntax far more readable than {name key=value} when there are multiple options. But opinion will be split on that point.

Regarding the YAML syntax, could you do

```{image} path/to/image
---
height: 20
width: 40
---
Here is a *caption*.
```

That seems a bit more symmetric --- and hence easy to remember.

chrisjsewell avatar chrisjsewell commented on May 24, 2024 1

Yeh that could also work ta.

Implemented roles and math as well now (no option key/val parsing yet). It actually could end up being more powerful than RST in some respects, because you can nest inline elements, which isn't possible in RST:

````{note}
abcd *abc* [a](link)

```{warning}
xyz
```

````

```{figure}+ path/to/image
height: 40
---
Caption
```

**{code}`` a=1{`} ``**

**$a=1$**

$$b=2$$

`` a=1{`} ``

goes to:

<document source="">
    <note>
        <paragraph>
            abcd 
            <emphasis>
                abc
             
            <pending_xref refdomain="True" refexplicit="True" reftarget="link" reftype="any" refwarn="True">
                <reference refuri="link">
                    a
        <warning>
            <paragraph>
                xyz
    <figure>
        <image height="40" uri="path/to/image">
        <caption>
            Caption
    <paragraph>
        <strong>
            <literal classes="code">
                a=1{`}
    <paragraph>
        <strong>
            <math>
                a=1
    <paragraph>
        <math_block xml:space="preserve">
            b=2
    <paragraph>
        <literal>
            a=1{`}

chrisjsewell avatar chrisjsewell commented on May 24, 2024 1

@choldgraf @mmcky @jstac @AakashGfude I've added the Sphinx Parser 😃

You just install my fork of mistletoe (pip install -e .[sphinx,testing], on the myst branch), and add extensions = ["mistletoe"] to your conf.py and it will pick up all the .md files.

Note if you look in myst/test/test_sphinx/test_sphinx_builds.py, I have set up automated testing of sphinx builds, for folders in myst/test/test_sphinx/sourcedirs. So if you run that with pytest it will actually generate the _build folders (comment out the remove_sphinx_builds fixture, so that they are not removed at the end of the test).

choldgraf avatar choldgraf commented on May 24, 2024

A suggestion from John MacFarlane:

Since we've been talking about dedicated syntax that would map on to a directive, but wouldn't be confusable with code blocks: use what RMarkdown and Pandoc do and use {} for "special" inline or block literals, something like:

```{mydirective}
This is
my special section
literal
```

We could assume that any code blocks that had curly brackets were block-level directives, and reference the first element in the {} against our list of directives. If it doesn't exist, fall back to assuming it is just an attribute.

This would also be fairly parsable in other markdown parsers, since the {} pattern is quite common, and we wouldn't introduce any extra syntax. Also we could then still use

```language
This is
my language syntax
```
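
A minimal sketch of that lookup, assuming a hypothetical `KNOWN_DIRECTIVES` registry and a hypothetical `classify_fence` helper (neither is taken from mistletoe or any existing parser). It only looks at the info string after the opening fence:

```python
import re

# Hypothetical registry for illustration; a real parser would consult
# the directives registered with docutils/Sphinx instead.
KNOWN_DIRECTIVES = {"note", "warning", "image", "figure"}

# Matches an info string such as "{note}" or "{image} path/to/image".
BRACED_INFO = re.compile(r"^\{(?P<name>[^\s{}]+)\}\s*(?P<argument>.*)$")


def classify_fence(info):
    """Classify the info string that follows the opening fence."""
    info = info.strip()
    match = BRACED_INFO.match(info)
    if match is None:
        # No braces: an ordinary code block such as ```language
        return ("code", info, "")
    name = match.group("name")
    if name in KNOWN_DIRECTIVES:
        return ("directive", name, match.group("argument"))
    # Braces around an unknown name: fall back to a plain attribute.
    return ("attribute", name, match.group("argument"))
```

With this registry, `{note}` is treated as a directive, `{mydirective}` falls back to an attribute, and `language` stays a normal code block.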

choldgraf avatar choldgraf commented on May 24, 2024

also - just a note for @rowanc1 here, I feel like if we end up using Sphinx and have a directive / role syntax for markdown, then maybe that's a place where components.ink pieces could be inserted into content at build time by writing a role/directive that injects the proper JS and HTML into the page (maybe as a separate sphinx extension?) curious what you think about that...

rowanc1 avatar rowanc1 commented on May 24, 2024

My interpretation of how this would apply is:

Some happy text.
```{ink-scope name=scope1}
``{ink-var name=x value=2}
My variable $x=$``{ink-display name=x}.

```

I am putting the scope in there as an example. It gets a bit messy, especially if you have multiple block directives. For example, styling an input as a callout box.

A couple of questions:

  • Would indentation or raw html input be allowed?
  • Any thoughts on empty content for inline elements?

For example: indentation

{ink-scope name=scope1}:
    ``{ink-var name=x value=2}
    {ink-callout kind=info}:
        Variable $x=$``{ink-display name=x}.

For example: html

<ink-scope name="scope1">
    ``{ink-var name=x value=2}
    Variable $x=$``{ink-display name=x}.
</ink-scope>

And that would either be ignored in other representations - or perhaps if you have an intermediate AST then it could last until there? I liked the comment you posted about the C markdown parser coming to a common xml representation that can be acted upon.

chrisjsewell avatar chrisjsewell commented on May 24, 2024

So I've added testing against most of the docutils directives (see here), and added parsing of arguments, e.g.

```{image} path/to/image
```

The last part is to parse options. Parsing them inline, like ```{name key=value}, has been mentioned, but a major problem is that it would break the current code fence regex, which looks for a string with no spaces for the language component (I also don't think it looks very nice).

I think the YAML block is the best way and I was thinking, for efficient parsing, it would be good to signify in the first line if the block contains options. Something like this (note the +):

```{image}+ path/to/image
height: 20
width: 40
---
Here is a *caption*.
```

Then it would read everything as YAML until either a --- is found or the end of the block is reached.

chrisjsewell avatar chrisjsewell commented on May 24, 2024

@choldgraf FYI front-matter does start with --- (see here), so it makes sense in the directives to also do this, which I've now changed to:

```{name} argument text
---
option: 1
---
content with *markdown* **syntax**
```
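
A rough sketch of how the body of such a block could be split, assuming a hypothetical `split_directive_body` helper that receives only the text between the fences. It returns the raw YAML text and leaves the actual YAML parsing to a real YAML library; treating an unclosed options block as "all options" is an assumption, not settled behaviour:

```python
def split_directive_body(body):
    """Return (options_text, content) for the text between the fences."""
    lines = body.splitlines()
    if not lines or lines[0].strip() != "---":
        # No opening marker: the whole body is content, so any later
        # "---" remains an ordinary Markdown thematic break.
        return "", body
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            options = "\n".join(lines[1:index])
            content = "\n".join(lines[index + 1:])
            return options, content
    # Opening marker but no closing one: read everything as options.
    return "\n".join(lines[1:]), ""
```

Because only the first line decides whether options are present, the check is a single comparison rather than a search through the whole block.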

jstac avatar jstac commented on May 24, 2024

Love your work @chrisjsewell. Outstanding.

choldgraf avatar choldgraf commented on May 24, 2024

Duuude - it works! So cool! Tonight I'll try making a little sphinx documentation site in your myst branch using the content that @AakashGfude put together...I am curious how it'll look!

jorisvandenbossche avatar jorisvandenbossche commented on May 24, 2024

(Chris pointed me to those discussions; I am an extensive sphinx user due to being one of the maintainers of the pandas docs, which is a quite big sphinx site. And I am excited about the issues you are tackling here: I love sphinx, but I also love to see improvements to it ;))

One thing I am wondering: to what extent are you already set on the syntax for roles and directives?

It seems you are now taking the syntax for code (both inline and blocks) and adding a role/directive name in the {}.

This is closer to existing markdown syntax, so I can imagine it is easier to extend an existing parser for this? (and it's also closer to things in the existing standard / pandoc, which are very good reasons)

But thinking about some use cases for roles in the documentation projects I am working with, I think something along the lines of the generic directives syntax proposal might be easier to work with (as an end user):

Small example rst snippet:

We can link to :meth:`pandas.DataFrame` in the API reference
or to another section :ref:`here <label>` (:issue:`1234`).

How it might look based on the role examples above (the details might not be correct):

We can link to `pandas.DataFrame`{meth} in the API reference
or to another section `here`{ref, id=label} (`1234`{issue}).

And how it might look with the linked proposal:

We can link to :meth[pandas.DataFrame] in the API reference
or to another section :ref[here]{label} (:issue[1234]).

Personally, I think the third snippet "looks" better than the second (but that's very subjective of course. Maybe that's because I am so used to having colons in rst .. ;-))
But maybe a slightly more objective argument: I think having the role name come first, instead of in the end, improves readability. And it also gives more contrast with actual code snippets.

chrisjsewell avatar chrisjsewell commented on May 24, 2024

> I think having the role name come first, instead of in the end, improves readability.

Yep, that's how it has now been implemented, as {name}`content`. I guess the issue with using square brackets is that they do not degrade gracefully in a standard Markdown parser: with backticks the content will remain raw text, whereas in brackets it will be treated as Markdown.

Also with colons, this might clash with the potential syntax extension of field lists. For example, if you want to be able to use the :orphan: metadata token.
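
To illustrate the {name}`content` form, here is a small regex sketch. The `ROLE` pattern and `find_roles` helper are made up for this example and are not the actual implementation; it only handles single-backtick spans, so double-backtick content is out of scope:

```python
import re

# {name}`content`: a role name in braces immediately followed by a
# single-backtick code span. Illustrative pattern only.
ROLE = re.compile(r"\{(?P<name>[a-zA-Z0-9_:.+-]+)\}`(?P<content>[^`]+)`")


def find_roles(text):
    """Return (role_name, content) pairs found in a line of text."""
    return [(m.group("name"), m.group("content")) for m in ROLE.finditer(text)]
```

A standard Markdown parser that knows nothing about roles would still render the backtick part as literal text, which is the degradation behaviour described above.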

stefanv avatar stefanv commented on May 24, 2024

This is great stuff, thanks @chrisjsewell!

Wondering about that yaml header: if you use two --- lines, that takes up the majority of space in the fenced block. Can you think of any risk in removing the first instance? I couldn't immediately see a downside.

```{name} argument text
---
--- and any such arbitrary text
```

Quickly surveying the landscape: in pandoc, the yaml blocks are surrounded by --- and ... respectively (no idea why); Hugo uses matching ---; org-mode uses #+VARIABLE_NAME: value.

choldgraf avatar choldgraf commented on May 24, 2024

@stefanv I believe the main reason for this is because otherwise the regex search can become really expensive.

Imagine that you have lots of code blocks with parameters inside. Because --- is also valid markdown, you need to figure out if the --- is there because it is the break between YAML config and the content, or if it is just regular markdown ---. So you have to do some more complex search to figure it out.

If you know there's a character that defines "this is the start of config" then it becomes much easier, so adding a starting --- makes this trivial to figure out, at the cost of extra verbosity.

After using it a bit, I think a way we could get around this issue is to also support some kind of arguments in the first line, and suggest that people use this only if they have a very small number of arguments. Then if the number of args is non-trivial (maybe > 2 or so) they can use the YAML, and if the number of args is small they can keep it close to a one-liner.

choldgraf avatar choldgraf commented on May 24, 2024

Another option would be to denote that arguments section with a special character on each line. For example, parameters can be provided by starting a line with : at the beginning of the content block. E.g.:

```{directive}
:key: val
:key2: val2
:arg3:
:key4: |
  Val 4
Content
```

That would have the benefit of even more parity w/ rST. For a very short paragraph then you'd have something like:

```{code-block} python
:linenos:
My content
```

chrisjsewell avatar chrisjsewell commented on May 24, 2024

Yeh as I've noted in #24, I think I will add in a block token for docutils field list syntax, which I didn't actually realise before was part of the RST spec. Then you should be able to use:

```{name} arguments
:option: a
:non-kwarg:

Content
```

choldgraf avatar choldgraf commented on May 24, 2024

@chrisjsewell is the idea that this would replace the YAML parsing? Or just be an option? I quite like the YAML syntax. Instead of allowing full rST syntax could we just say that if the block starts with lines that begin with : then those will be parsed as YAML lines? (AKA it is just a shorthand to avoid requiring the --- fences?)

chrisjsewell avatar chrisjsewell commented on May 24, 2024

Yeh I don't think I'm going to add actual parsing for these field lists any more, in favour of just using YAML. But yeh for directives you could maybe include that alternative approach.

choldgraf avatar choldgraf commented on May 24, 2024

I think it'd be helpful to include the : short-hand for metadata. That way there are basically two options for YAML metadata, depending on whether you care about conciseness. As an example we could recommend:

If there are <= 2 configuration lines:

```{directivename}
:key: true
:key2: config2
```

If there are > 2 configuration lines:

```{directivename}
---
key: true
key2: config2
key3: config3
key4: |
  Multi line
  config
---
```

Either would be valid, but for cases where the directive just needs one or two config options (which is common) I think supporting : could keep things tighter. It would help avoid the case where there are more "configuration fence" lines than actual configuration options.
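
As a sketch of how the : shorthand could reduce to the same YAML path, the hypothetical `split_colon_options` helper below strips the leading colon from each option line and returns YAML text plus the remaining content. Treating indented lines directly after an option as continuation lines is an assumption made here to cover the multi-line `|` case:

```python
import re

# A shorthand option line looks like ":key: value" or ":flag:".
OPTION_LINE = re.compile(r"^:(?P<key>[^:\s]+):(?P<rest>.*)$")


def split_colon_options(body):
    """Return (yaml_text, content) for a directive body using ':key:' lines."""
    lines = body.splitlines()
    yaml_lines = []
    index = 0
    while index < len(lines):
        line = lines[index]
        match = OPTION_LINE.match(line)
        if match is not None:
            # ":key: value" becomes the YAML line "key: value".
            yaml_lines.append(match.group("key") + ":" + match.group("rest"))
        elif yaml_lines and line[:1] in (" ", "\t"):
            # Assumed: an indented line continues the previous option.
            yaml_lines.append(line)
        else:
            break
        index += 1
    return "\n".join(yaml_lines), "\n".join(lines[index:])
```

The returned YAML text can then go through whatever YAML parser handles the fenced form, so both spellings share one code path.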
