executablebooks / myst-spec

MyST is designed to create publication-quality, computational documents written entirely in Markdown.

Home Page: https://mystmd.org/spec

License: MIT License

JavaScript 0.87% TypeScript 99.13%

markdown sphinx specification github-flavored-markdown commonmark

myst-spec's Issues

Image Node should have height.

Introduce leaf directives

background

reStructuredText allows the syntax:

.. note:: This is a short note

It is a nice, terse, one-line syntax for an admonition.

This comes at a price though; the parsing of a directive's "structure" is dependent on the type of directive.
Because the body content can possibly be on the first line, this does not work:

.. note:: I want a bespoke title

    Then some content

It is simply treated as a note with two body paragraphs.
This means you then have to use a different directive 😒:

.. admonition:: I want a bespoke title
	:class: note

    Then some content

https://github.com/executablebooks/MyST-Parse currently follows this logic with colon_fence, e.g.

:::{note} This is a body paragraph
this is a continuation of the same paragraph
:::

However, https://myst-tools.org/docs/mystjs/admonitions#admonition-titles does not

:::{note} This is the title
This is the body paragraph
:::

proposal

I propose "disambiguating" this difference by introducing a "leaf div" to complement the block "div"

::{note} This is a leaf div. It is interpreted as a single body paragraph,
that can continue on to multiple lines if really necessary

:::{note} this is a block div title
This is a block div content
:::

In executablebooks/mdit-py-plugins#72, I have sketched out what this would look like for a markdown-it plugin

Note that with block attributes, you could also provide options to leaf divs, e.g. this

{#name .class key=value}
::{note} A note

would be equivalent to

:::{note}
:name: name
:class: class
:key: value

A note
:::
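
The equivalence above can be sketched as a small function that turns a block-attribute line into directive options. This is only an illustration: `parseBlockAttributes` and `DirectiveOptions` are hypothetical names, and the sketch assumes unquoted, whitespace-separated tokens, which is narrower than any real attribute grammar.

```typescript
// Hypothetical helper (not part of myst-spec or any MyST parser).
type DirectiveOptions = { [key: string]: string };

// Turn "{#name .class key=value}" into { name, class, key } options.
// Assumption: tokens are whitespace separated and values are unquoted.
function parseBlockAttributes(line: string): DirectiveOptions | null {
  const trimmed = line.trim();
  if (
    trimmed.length < 2 ||
    trimmed.charAt(0) !== '{' ||
    trimmed.charAt(trimmed.length - 1) !== '}'
  ) {
    return null; // not wrapped in braces, so not an attribute line
  }
  const options: DirectiveOptions = {};
  const classes: string[] = [];
  const tokens = trimmed
    .slice(1, -1)
    .split(/\s+/)
    .filter((t) => t.length > 0);
  for (const token of tokens) {
    if (token.charAt(0) === '#') {
      options['name'] = token.slice(1); // "#name" maps to the :name: option
    } else if (token.charAt(0) === '.') {
      classes.push(token.slice(1)); // ".class" tokens are collected
    } else {
      const eq = token.indexOf('=');
      if (eq <= 0) return null; // unrecognised token
      options[token.slice(0, eq)] = token.slice(eq + 1);
    }
  }
  if (classes.length > 0) options['class'] = classes.join(' ');
  return options;
}
```

Calling `parseBlockAttributes('{#name .class key=value}')` yields the same three options as the `:name:`, `:class:` and `:key:` lines in the block form.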

Code (and Code-Block) vs CodeCell

There are directives that define codeBlocks that have the same properties as code, and we have reduced that down to a single node in the spec. I think we need to carry another property to differentiate that!

Not quite sure what it is, maybe executable? Right now this allows thebe to target as well as changes the view of the UI (see below).

(Screenshots of the CodeBlock and Code views.)

Recommend that definition parsing is top level only.

This does not change the AST definition, but it does have implications for the efficiency of parsing.

See:
https://github.com/micromark/micromark#content-types

Myst spec should define blockBreak in addition to block

Base "myst-spec" should just parse the breaks. These can be transformed into blocks.
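
A minimal sketch of such a transform, assuming a flat list of sibling nodes in which `blockBreak` nodes mark the boundaries. The `AstNode` shape and the `breaksToBlocks` name are illustrative assumptions, not definitions from the spec.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface AstNode {
  type: string;
  children?: AstNode[];
}

// Group flat children into `block` nodes, starting a new block at every
// `blockBreak`. Content before the first break forms the first block.
function breaksToBlocks(children: AstNode[]): AstNode[] {
  const blocks: AstNode[] = [];
  let current: AstNode[] = [];
  let started = false;
  for (const child of children) {
    if (child.type === 'blockBreak') {
      // Close the block collected so far (if any) and start a fresh one.
      if (started) blocks.push({ type: 'block', children: current });
      current = [];
    } else {
      current.push(child);
    }
    started = true;
  }
  if (started) blocks.push({ type: 'block', children: current });
  return blocks;
}
```

Keeping the parser output flat and doing the grouping in a transform means implementations that do not care about blocks can simply ignore the break nodes.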

Implement MEP0002

We have recently accepted MEP0002 as a team:
https://mep.myst-tools.org/en/latest/meps/mep-0002/

This discusses project cross-references using markdown links. The final part of the process is to update the myst spec to include those changes, which should include at least the following:

bring the content from the MEP0002 and adapt it to the spec for this page: https://myst-tools.org/docs/spec/references
- Adaptations probably include removing some of the preamble and forward looking discussion in the MEP, and pulling in the relevant core pieces (various schemes, ways to cross reference things, etc.)
Updating the tests and examples (testing in at least mystjs, which uses these tests today)
Documenting the warnings (hasn't yet been done in myst-spec)
Update the AST: https://mep.myst-tools.org/en/latest/meps/mep-0002/#specification-ast
- Including scheme name, urlSource, internal, and updating the kind
Removing the %s from the template/spec and only using {number}
Deprioritizing the {eq}, {ref}, etc. roles in the spec documentation (e.g. moving to the bottom of the page)

I am happy to take a first crack at this (probably late next week), I think a lot of the pieces should be very fast to do as all the content is already there!

Meeting Notes Week of April 11, 2022

Renaming

are these now mystTargets #40
these are now mystComments #40
#36
- enumerated, enumerator
- Then good to go!

mystTargets should not be parsed with identifiers

These can come into the picture on a propagateTargets transformation.

These no longer have identifiers, targets must be propagated to define what the identifier actually is.
mystTarget removal is an option of the transform
Regarding Link Referencing: https://github.com/executablebooks/myst-spec/pull/28/files#r842494008
- Find definitions, delete them, get identifiers, turns them into link etc.
- This is an early transform
Add a stage enum to the test? There are a few levels of testing. (parse, transform, post, html)
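
The propagateTargets idea in the notes above might look roughly like the following. This is a sketch under assumptions: the `TargetNode` shape, the `label` field on `mystTarget`, and the lowercase normalisation are illustrative choices rather than agreed spec behaviour, and only direct siblings are handled.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface TargetNode {
  type: string;
  label?: string;
  identifier?: string;
}

// Assumption: identifiers are the trimmed, lowercased label.
function normalizeLabel(label: string): string {
  return label.trim().toLowerCase();
}

// Give the node that follows each `mystTarget` an identifier derived from
// the target's label. Removing the target nodes is an option.
function propagateTargets(
  children: TargetNode[],
  removeTargets: boolean
): TargetNode[] {
  const result: TargetNode[] = [];
  let pending: string | undefined;
  for (const child of children) {
    if (child.type === 'mystTarget') {
      pending = child.label;
      if (!removeTargets) result.push(child);
      continue;
    }
    if (pending !== undefined && child.identifier === undefined) {
      result.push({ ...child, identifier: normalizeLabel(pending) });
    } else {
      result.push(child);
    }
    pending = undefined; // a target only applies to the next sibling
  }
  return result;
}
```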

@fwkoch merges:

References

From here

Target (this matches the identifier) of the target
Explicit (has children, and possibly an enumerator)
domain (std, math, prf, or any string)
#24 has other thoughts on this as well!

Apr. 12 Decisions:

rename crossReference identifier --> target (this is clearer!); this is explicitly different than MDAST
- The thinking is that the linkReference in MDAST and other MDAST pieces get removed early in the parsing, and the crossReference will be long-lived all the way to render time.
Add domain to the crossReference (std, math, prf, or any string) (allow for arbitrary strings!)
For kind: expand to strings

For example, math: eq role becomes: domain: math, kind: numref

TODO:

Write down some of the expected behaviours around references, and how we could possibly simplify ref/numref/eq choices for users.

April 14:

References

Links []() syntax
- HTML, mailto, ref, -> any
References that are for a page
- Then there are references that can be cross page.
- Sphinx refs, every key needs to be unique across documents.
- Sphinx doesn't tell you which reference; there aren't warnings on this.

Syntax discussions / Thinking:

[](http(s?) or mailto) - these are external links, done
[](#target) - this is any reference on this page (ref role, i.e. no eq (at the moment))
[](doc.md) - this is a document link, it must have the extension in it.
[](doc.md#target) - this is a target specifically on another page (ref role, i.e. no eq (at the moment))
- possible extension to this is recognize targets that start with certain characters, use this domain.
[](any) - this is a fallback to look up anything in the project
{py:func}`myFunction` -- use roles to reference into specific domains
Chris to move on this in the python implementation!
- Write up a small thing on target resolution / order for a kick start at MEP.
Franklin to improve myst-spec (small, decided upon issues!)
- Table: rowspan/colspan
- Check out: https://docutils.sourceforge.io/docs/ref/doctree.html#table
executablebooks/unified-myst#17
- Move to title!
- @rowanc1 to do this!
On implementing an extension:
- executablebooks/unified-myst#16
- ListTable
  - Would look to markdown-it-docutils
  - nestedParse, then look at the AST and do some testing
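
As a rough illustration of the link forms listed under "Syntax discussions / Thinking" above, the following sketch classifies a link destination into those cases. The `LinkKind` names and the `classifyLink` function are made up for this example, and the rule "a path ending in an extension is a document" is one assumed reading of "it must have the extension in it".

```typescript
// Hypothetical classification names for illustration only.
type LinkKind = 'external' | 'target' | 'document' | 'documentTarget' | 'any';

interface ClassifiedLink {
  kind: LinkKind;
  document?: string;
  target?: string;
}

function classifyLink(url: string): ClassifiedLink {
  // [](http(s) or mailto): external links
  if (/^(https?:|mailto:)/i.test(url)) return { kind: 'external' };
  // [](#target): a reference on this page
  if (url.charAt(0) === '#') return { kind: 'target', target: url.slice(1) };
  const hash = url.indexOf('#');
  const path = hash === -1 ? url : url.slice(0, hash);
  // [](doc.md) and [](doc.md#target): document links carry an extension
  if (/\.[A-Za-z0-9]+$/.test(path)) {
    if (hash === -1) return { kind: 'document', document: path };
    return {
      kind: 'documentTarget',
      document: path,
      target: url.slice(hash + 1),
    };
  }
  // [](any): fall back to looking the name up anywhere in the project
  return { kind: 'any', target: url };
}
```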

Remove YAML dependency from directive option parsing

Currently there are two ways of specifying options within a directive:

(option 1) enclosing in ---

```{name}
---
option1: value
option2: value
---
```

(option 2) prepending all lines by :

```{name}
:option1: value
:option2: value
```

Firstly, there should be one clear way of doing things, and so it would be ideal to remove one of these.

Secondly, the following logic proceeds for converting them to the "final" input options for the directive:

identifying the full block of text
parse it with YAML (and abort if the result is not a dictionary)
convert all the values back to strings
convert the values back to specific value types (and validate) by "converters" specified by the directive implementation

Clearly here the YAML value parsing is unnecessary, and worse can lead to discrepancies, such as a: becomes {"a": null} as opposed to {"a": ""}.
YAML is also quite complex (see e.g. here) and not really necessary for the more simple requirements of option parsing.

If we accept that it is the directive implementation's responsibility to do any conversions from strings,
then we simply need a syntax/format that maps string keys to string values.
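
To make the string-to-string idea concrete, here is a minimal sketch of option parsing without YAML. `parseDirectiveOptions` is a hypothetical name and the accepted line shape (`:key: value`) is an assumption based on the option 2 syntax above; note that `:a:` yields an empty string rather than null.

```typescript
// Hypothetical helper: parse leading ":key: value" lines into string options.
// Values are never converted; that is left to the directive implementation.
function parseDirectiveOptions(lines: string[]): { [key: string]: string } {
  const options: { [key: string]: string } = {};
  const pattern = /^:([^:\s][^:]*):(?:\s+(.*))?$/;
  for (const line of lines) {
    const match = pattern.exec(line.trim());
    if (match === null) break; // the first non-option line ends the options
    const key = match[1];
    if (key === undefined) break;
    const value = match[2];
    options[key] = value === undefined ? '' : value;
  }
  return options;
}
```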

There are two ways to do this that come to mind:

Something like field lists (see the rST spec, and mdit-py implementation), i.e. very similar to the current (option 2)
```{name}
:name: x
:class: y
:other: z
```
(It is of note that in the field list spec, keys are parsed as Markdown, that is not what we want here though)
Block attributes before the directive (see here)
{#x .y other=z}
```{name}
```

It is of note, that the only place where we definitely need a direct mapping of options <-> JSON is in code-cell, whereby the options actually map to the metadata of a Jupyter Notebook code cell.
In this case though, code-cell can be viewed as a "pseudo-directive" and perhaps should have a different syntax, so as not to be confusing.

mystRole/mystDirective: use `name` instead of `kind`

In unified-myst I currently use name, e.g. for {name}`content`, rather than kind

This also syncs with: https://github.com/syntax-tree/mdast-util-directive/blob/4c494b18ac31f27f67b95b917aacc03207d9584f/complex-types.d.ts#L7

Any major objections to changing that here @rowanc1 @fwkoch?

Directives should be declarative

Arguments are always arguments.

The parser should be able to read directives, and that should never be bumped down to the body.

Don't care what the directive is, can always know that the options are the options.

Readme should indicate that this is in development!

Avoiding English keywords (where possible)

If we wish MyST to be a "global" spec, then I think it should strive to meet:

avoiding syntax keywords that are hard-coded English

Granted, this is very difficult, especially with roles and directives (and their options),
but when thinking about new features, syntax extensions etc., I think we should always bear this in mind.

One thing in particular to note is that currently the myst-spec, and by extension mystjs, hard-codes directive names to be English.
This is not the case for docutils directives, which have a translation module https://github.com/docutils/docutils/tree/master/docutils/docutils/parsers/rst/languages, so for example you can do this:

$ echo ":::{tip}\nhi\n:::" | myst-docutils-demo --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Tip</p>
<p>hi</p>
</aside>

$ echo ":::{astuce}\nhi\n:::" | myst-docutils-demo --language=fr --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Astuce</p>
<p>hi</p>
</aside>
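
One way a parser could support this is an alias table consulted before directive lookup. The sketch below is illustrative only: `directiveAliases` and `resolveDirectiveName` are hypothetical, and the table contains just the single French mapping shown in the output above.

```typescript
// Hypothetical alias table: language -> localized name -> canonical name.
const directiveAliases: {
  [language: string]: { [localName: string]: string };
} = {
  fr: { astuce: 'tip' },
};

// Map a localized directive name to its canonical (English) name.
// Unknown names and languages are returned unchanged.
function resolveDirectiveName(name: string, language: string): string {
  const table = directiveAliases[language];
  if (table === undefined) return name;
  const key = name.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(table, key)) return name;
  const canonical = table[key];
  return canonical === undefined ? name : canonical;
}
```

With a table like this the AST can keep a single canonical directive name regardless of the language the author wrote in.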

I wanted to bring this up, especially as we were recently talking about admonitions: #49

Define post transform ASTs

What does the AST look like after targets propagate down?
... after cross references are resolved?
... after adding default admonition titles?
... after directive/role nodes are removed?
... after block breaks are resolved to blocks with children?
...

Collection of pointers to reference improvements

This issue aims at keeping links to places in various repositories where people have issues or suggestions with cross-references and citations.

This thread is not aimed at proposing a solution.

Distribute examples as JSON

Then you can import them directly for testing (with https://www.typescriptlang.org/tsconfig#resolveJsonModule)

Working towards a specification for MyST

Based on the team meeting yesterday, and follow up conversations with @chrisjsewell, @mmcky, @choldgraf and @fwkoch, we are starting to put together a repository that will house the technical documentation and test cases for a MyST Spec. This will be housed at https://spec.myst.tools in the future (see executablebooks/meta#538).

To build on the MyST markup language and to make the ecosystem as rich and interoperable as possible, we need to formalize three formats:

the MyST markup syntax, to ensure MyST works as expected across languages and implementations;
the MyST abstract syntax tree (AST), to promote an ecosystem of transformations and exports to diverse formats (e.g. latex/word/html/docutils/etc.); and
suggested semantic HTML output and CSS class structure, to promote web-accessibility and interoperability of themes.

There is additional standardization on optional extensions (e.g. dollarMath, and configuration), that is likely outside the scope of this repository and more coordination between packages. Additionally, MyST as a community standard requires ways to improve and enhance these formalizations over time in our multi-stakeholder community. We will aim to start to introduce more formalization on this process over the coming months (e.g. MEPs in an extension proposals repo)!

There are a number of places where test-cases and documentation already live. @fwkoch @rowanc1 and @chrisjsewell will be doing some initial work to pull these together to get a first pass spec (CommonMark + GFM + Base directives/roles + Admonitions). Existing work is in:

MDAST exports of CommonMark: https://github.com/chrisjsewell/myst-spec
Test-cases for MDAST/html: https://github.com/executablebooks/mystjs
Documentation (mostly narrative) in JupyterBook/Myst-Parser
Test cases for latex export: https://github.com/curvenote/schema

Goals:

Documentation of the spec, choices and properties of both the AST and MyST syntax
- These are based on extensions to MDAST, GFM, and CommonMark
Single built json file of all test cases with links back to the docs
JSON Schema
Typescript types
Limited duplication on the source of truth for examples, properties and test-cases.

Next steps

For the next few weeks @chrisjsewell will be working on the CommonMark/GFM side and documentation, as well as familiarizing with Mdast work done in mystjs, and @rowanc1 and @fwkoch will be working on the MyST side bringing these over to json schema and formalizing the mdast naming and properties.

🚀🚀🚀

Whiteboard from our call today:

Container node `kinds` split into separate node types?

Currently, containers are a node type, with kind, which specifies the type of container (e.g. figure, table, etc), and typed children corresponding to the kind (e.g. figure -> image + figure caption).

This doesn't follow the pattern of other nodes: we shouldn't need to look at kind to resolve/validate the children. If we want to be strict about children, each container should probably be its own node type, e.g. ContainerFigure. However, if we want it to be easy to add new kinds, we could keep the Container node, allow kind to be more flexible, and make children simply FlowContent. This would then allow authors to add new numbered kinds in their papers without extending the spec, e.g. Lemma 1.

crossReference kind

This is quite limited in the spec to: eq, numref, ref, doc https://github.com/executablebooks/myst-spec/blob/main/schema/references.schema.json#L37 (even doc is missing in this schema file) - this matches the jupyterbook myst documentation https://jupyterbook.org/content/references.html#reference-figures

This feels a little mixed up: ref and numref don't specify the target type, it may be a header, figure, table, etc. On the other hand eq and doc specify the target type (but don't differentiate between reference or numbered reference).

Should these kinds be specific in their target type? math table figure etc? Or should they simply be references to be resolved later? There are pros/cons each way - e.g. the former requires more validation but allows knowledge of the target type without resolving the reference every time.

We can also introduce kind + domain, like sphinx does here: https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#the-math-domain

Container Extend or Change "kind"

There is some flexibility to extend the container kind, there are also meaningful

Currently: kind*: string (“figure” | “table”) - kind of container contents

Other things to consider:

code
Should also be: string (allows anything to be a container kind)
How does this work with domains? @chrisjsewell to do some thinking/investigation on this.

Incorporate global/page level configuration into the spec

Currently, there are numerous configuration formats/schemas for how MyST is parsed/rendered across implementations:

global:

myst-parser (+ sphinx) conf.py: https://myst-parser.readthedocs.io/en/latest/configuration.html#global-configuration
myst-nb conf.py: https://myst-nb.readthedocs.io/en/latest/configuration.html#global-configuration
jupyter-book _conf.yml: https://jupyterbook.org/en/stable/customize/config.html
mystjs myst.yml: https://js.myst.tools/guide/frontmatter#in-a-myst-yml-file
vscode myst plugin setting.json (to set markdown-it syntax plugins): https://github.com/executablebooks/myst-vs-code/blob/c5b209c77196a46152f545ff366d8b729f7b9c84/package.json#L125
jupyterlab-myst: setting of markdown-it syntax plugins via jupyterlab-markup config

page level:

For consistency across tools, it would be ideal to have a defined "core" JSON schema for such configuration, then potentially also a standard file name/format for global configuration (although this may not be possible for all implementations).

Particularly for page-level metadata, namespacing should definitely be considered, and potentially also a defined way for extensions to add configuration.

Standardisation of common attributes (classes, names)

As specified here: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes, there are some common attributes associated with all docutils nodes, and this should essentially be the same here.

As an example, here:

myst-spec/docs/examples/directives.admonitions.yml

Line 122 in 35f8097

class: tip

This should be classes: ['tip']

mystDirective Improvements

args should be list of strings, they are currently a single string!
capture bodyOffset, this is the number of lines into the body for the directive value

Collection of pointers for Admonitions improvements

This issue is for collecting pointers to challenges that users have in interacting with admonitions, especially around setting a custom title. Some similar themes are cropping up, as well as pointers to other tools in the community:

There has also been some discussion about making arguments to directives more deterministic to simplify parsing, and reduce exceptions to the directive rules.

Add attributes syntax

Relevant links:

https://johnmacfarlane.net/beyond-markdown.html#attributes
jgm/pandoc#684
https://www.npmjs.com/package/markdown-it-attrs
For micromark, there is essentially already code for parsing them here: https://github.com/micromark/micromark-extension-directive/blob/main/dev/lib/factory-attributes.js

Attributes are a common Markdown syntax extension, e.g. {#id .class value=key}, which I think might be useful in MyST.

At a block-level, I think this would be relatively simple to implement in https://github.com/executablebooks/MyST-Parser, since they would essentially follow the same rules/logic as targets, e.g.

(target)=
# header

would be equivalent to

{#target}
# header

But then extra logic could also be added, to propagate classes as well as identifiers (for now we would ignore key/values), e.g.

{#target .class1 .class2}
# header

Would relate to HTML like

<h1 id="target" class="class1 class2">header</h1>

Another place they could be added is for defining role options, e.g. {name}`content`{#id .class key=value}

The problem here, though, is that the docutils role syntax does not support options.
It is of note, though, that the role functions themselves do accept an options keyword argument: https://github.com/live-clones/docutils/blob/48bb76093b4ba83654b2f2c86e7c52c4bb39c63b/docutils/docutils/parsers/rst/roles.py#L197-L211

Collection of pointers to directives without content

Some other markup languages allow directives without content. The MyST syntax for this is currently a bit awkward and confuses some folks:

Other implementations:

https://www.npmjs.com/package/remark-directive (leafDirective)

This issue is just for collecting pointers to various places where we might want to think about improvements to the spec in the future!

More flexible code blocks.

Context

As you may be aware, I'm trying to adopt the MyST AST in papyri, and I'm missing some flexibility in code blocks.

In particular, before moving to the MyST AST my code blocks were more structured, and I was able to have a link for each token, so that, say, clicking on array in a code block that contains np.array opens the relevant docs.

Proposal

Do you believe it would make sense for code-block to be able to contain a list of children that are inline items instead of a value? This would also make it possible to highlight individual tokens differently (for example, making some bold) and to help syntax highlighting by precomputing it for each token.


Rethink folder structure

Move non-directive content out of directives/
Move non-block content out of blocks/

`table` node spec changes / enhancements

The mdast table specification is very simplistic and too limited for publication-quality tables. While the goal need not be to support every edge case supported by html tables, there are a few small improvements that could help a lot.

mdast spec

See https://github.com/syntax-tree/mdast#table and https://github.com/syntax-tree/mdast-util-gfm-table

This specification assumes (1) first row of the table is header, everything else is standard, (2) rows and columns all have equal numbers of cells (no spans), and (3) cell alignment is consistent for the column

current myst-spec

See https://executablebooks.github.io/myst-spec/features/tables.html#tables

There are already a few changes to tables in myst-spec to allow more flexibility, with some similarities to HTML:

header is a boolean field on each cell - header cells can be anywhere, not just the first row
align is a field on each cell as well - left/right/center alignment is not necessarily constant on the entire column
align on the table is a single value which refers to alignment of the table itself (I think this should be removed, see below)

Proposed changes

Remove align from table node: this is confusing since it conflicts with mdast table alignment. We should only have align on cells (for the cell content) and the parent container (for alignment of the table itself)
Add colspan and rowspan to cells to allow them to span multiple rows / columns
Add a way to specify column width (and row height?)
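
To illustrate what colspan and rowspan on cells would mean for consumers of the AST, here is a small sketch that computes how many grid columns each row occupies. The `SketchTableCell` shape and `rowWidths` are hypothetical and assume a missing span defaults to 1; a renderer or validator could use such a function to detect ragged tables.

```typescript
// Hypothetical cell shape: the existing header/align fields plus the
// proposed colspan/rowspan (assumed to default to 1 when absent).
interface SketchTableCell {
  header?: boolean;
  align?: 'left' | 'right' | 'center';
  colspan?: number;
  rowspan?: number;
}

// Number of grid columns occupied by each row, including columns taken up
// by cells that span down from earlier rows.
function rowWidths(rows: SketchTableCell[][]): number[] {
  const carried: number[] = rows.map(() => 0);
  const widths: number[] = [];
  rows.forEach((row, r) => {
    let width = carried[r] || 0;
    for (const cell of row) {
      const colspan = cell.colspan === undefined ? 1 : cell.colspan;
      const rowspan = cell.rowspan === undefined ? 1 : cell.rowspan;
      width += colspan;
      // Reserve this cell's columns in the rows it spans into.
      for (let k = 1; k < rowspan && r + k < rows.length; k++) {
        carried[r + k] = (carried[r + k] || 0) + colspan;
      }
    }
    widths.push(width);
  });
  return widths;
}
```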

Whitespace in nested directives?

This shouldn't be in the spec tests at least?

Register MYST as an official markdown variant with IANA

IANA accepted an official registration for the text/markdown mimetype here: https://www.iana.org/assignments/media-types/text/markdown

Section 6.1 of the RFC specifies an IANA-based registry for markdown variants, which is established here: https://www.iana.org/assignments/markdown-variants/markdown-variants.xhtml

The list of registered variants is woefully lacking, but the registration mechanism formalizes which mimetype variant parameter should be used to identify a flavor of markdown.

i.e. text/markdown;variant=MYST or text/markdown;variant=GFM
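
For tools that consume such a content type, reading the variant could be as simple as the sketch below. `markdownVariant` is a hypothetical helper, and `MYST` is only the value proposed in this issue, not a registered variant.

```typescript
// Hypothetical helper: extract the `variant` parameter from a
// text/markdown content type, or null when it is absent.
function markdownVariant(contentType: string): string | null {
  const parts = contentType.split(';').map((p) => p.trim());
  const mediaType = parts[0];
  if (mediaType === undefined || mediaType.toLowerCase() !== 'text/markdown') {
    return null;
  }
  for (const part of parts.slice(1)) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    if (part.slice(0, eq).trim().toLowerCase() === 'variant') {
      // Strip optional surrounding double quotes from the value.
      return part
        .slice(eq + 1)
        .trim()
        .replace(/^"(.*)"$/, '$1');
    }
  }
  return null;
}
```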

Even having an open draft RFC (probably for years) might allow us to establish and register the MYST parameter at least somewhat formally.

It is also worth noting that pandoc additionally provides for more parameters, that can specify which extensions/directives are loaded.

add log node

docutils has the concept of a system_message node, which can be inserted into the AST at points of failure.

As an example, in unified-myst, this is what is currently output for an error in the table directive:

```{table} This is a caption
:name: test
:class: a

Hallo
```

type: root
children:
  - type: mystDirective
    name: table
    args:
      - This is a caption
    options:
      name: test
      class:
        - a
    value: Hallo
    bodyOffset: 4
    children:
      - type: log
        message: >-
          Error parsing content block for the "table" directive: exactly one
          table expected.
        level: error

Defining mdast for citations

Currently doing some investigation on citations and thought I would post it here as it would be great to get on the same page for the data-structures for citations in mdast (I think there is more thought probably on the myst-syntax, do we adopt [@key] pandoc style citations, etc.). I would love to be aiming for the same place for the mdast data structures as the other syntax conversations evolve.

For a piece of technical content, the best practices for in-text citations are probably latex/natbib and pandoc citations which are defined here:

I think the following mdast data-structures might capture everything:

type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  children: Cite[];
};

type Cite = {
  type: 'cite';
  identifier: string;
  label: string;
  expand: boolean; // this is the * in natbib, expands authors, false by default
  partial: 'author' | 'year';
  prefix: string; // e.g. "see" or "e.g."
  suffix: string; // e.g. "99 years later" or something
  locator: string; // e.g. "chap. 2", joined with a comma -- defined by CSL locale (pp. fig. etc.)
  // alias: string // use "Paper 1", maybe do this later?
};

I think this works pretty well and can fit with the {cite:t}`jon22` syntax we already have defined, but maybe in the future there is some way to give roles more data:
For example: {cite:p}[prefix="see", locator="chap. 2"]`jon22`
would yield: (see Jones et al., 2022, chap. 2)
Or maybe there is a specialized way to do this with [see @jon22, chap. 2] (see pandoc)

For multiple citations, the citeGroup would never be a directive or appear in the markup (i.e. [@key1; @key2] or {cite:p}`key1; key2`). However, I think the AST data structure is better represented by multiple nodes, with one holding the group (parenthetical) information. This also means UIs can open groups of citations in a list (see distill/elife for good examples of this UI).

Both cite and citeGroup would be flow content, so the equivalent of a "citet" in latex is just a cite node in a paragraph (@key1 in pandoc style).
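
As a sketch of how the existing role syntax could map onto these nodes, the function below converts a `cite:t` or `cite:p` role into a group of cite nodes. The `CiteSketch` and `CiteGroupSketch` types are trimmed-down assumptions (only `identifier` and `label` are filled in, both set to the raw key), and always wrapping in a group is a simplification of the proposal.

```typescript
// Hypothetical, reduced versions of the Cite / CiteGroup types above.
interface CiteSketch {
  type: 'cite';
  identifier: string;
  label: string;
}

interface CiteGroupSketch {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical';
  children: CiteSketch[];
}

// Convert {cite:t}`key` / {cite:p}`key1; key2` into a citeGroup node.
// Returns null for role names this sketch does not know about.
function citeRoleToGroup(
  roleName: string,
  content: string
): CiteGroupSketch | null {
  let kind: 'narrative' | 'parenthetical';
  if (roleName === 'cite:t') kind = 'narrative';
  else if (roleName === 'cite:p') kind = 'parenthetical';
  else return null;
  const children = content
    .split(';')
    .map((key) => key.trim())
    .filter((key) => key.length > 0)
    .map((key): CiteSketch => ({ type: 'cite', identifier: key, label: key }));
  return { type: 'citeGroup', kind, children };
}
```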

Some questions:

what is the best name for citeGroup?
~~should we follow kind or have some different flags like parenthetical? I suggested kind because that seemed easier to expand in the future if we add num or alt etc.~~ (previously suggested a single cite node, splitting into group solves this).
narrative and parenthetical nomenclature comes from here

Existing implementations:

similar data structure here: https://github.com/timlrx/rehype-citation/blob/main/src/parse-citation.js#L139

Would be curious on your thoughts @chrisjsewell and @fwkoch (maybe @mmcky as well?)!

Wrapping Directives & Roles

Add processed flag to directive.

Directives wrap other FlowContent.

Define a transform --> This should lift all children out of the directives.

Drop indented code blocks

As discussed in https://johnmacfarlane.net/beyond-markdown.html (by John MacFarlane, the creator of pandoc and a lead author of the CommonMark spec):

There should be one clear way of doing things; fenced code blocks are already the primary way of adding a code block
explicit is better than implicit; fenced code blocks provide a much more explicit indication of the code block
indented code blocks impose a lot of technical limitations on the syntax/parser

As an example, you can look at https://markdown-it.github.io/ compared to https://djot.net/playground/ (a proposed successor to commonmark).
In djot it is perfectly fine to indent syntax blocks with any number of spaces, without having it become a code block.

::::admonition

     :::admonition

     s

     :::

::::

This would be a breaking change from CommonMark, but one that should be possible

Standardised identifier for myst documents

It may be useful to have a standard way of signalling that a document is myst, and even perhaps the myst-spec version it is compliant with, particularly if using the (non-specific) .md file extension.

For example, in the front matter

---
format: myst
format-version: "0.1.0"
---
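
A tool could detect such a marker with something like the sketch below. `readFormatInfo` and `FormatInfo` are hypothetical names, the `format` and `format-version` keys are only the ones suggested above, and the sketch reads simple `key: value` lines instead of doing full YAML parsing.

```typescript
// Hypothetical result shape for illustration only.
interface FormatInfo {
  format: string;
  version?: string;
}

// Read `format` and `format-version` from a leading "---" delimited block.
// Returns null when there is no such block or no `format` key in it.
function readFormatInfo(source: string): FormatInfo | null {
  const lines = source.split(/\r?\n/);
  const first = lines[0];
  if (first === undefined || first.trim() !== '---') return null;
  let format: string | undefined;
  let version: string | undefined;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    if (line === undefined) break;
    if (line.trim() === '---') {
      if (format === undefined) return null;
      return version === undefined ? { format } : { format, version };
    }
    const match = /^(format|format-version):\s*(.*)$/.exec(line);
    if (match === null) continue;
    // Strip optional surrounding quotes from the value.
    const value = (match[2] || '').trim().replace(/^["'](.*)["']$/, '$1');
    if (match[1] === 'format') format = value;
    else version = value;
  }
  return null; // the block was never closed
}
```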
