myst-spec's Issues

Introduce leaf directives

background

reStructuredText allows the syntax:

.. note:: This is a short note

It is a nice, terse, one-line syntax for an admonition.

This comes at a price though; the parsing of a directive's "structure" is dependent on the type of directive.
Because the body content can possibly be on the first line, this does not work:

.. note:: I want a bespoke title

    Then some content

It is simply treated as a note with two body paragraphs.
This means you then have to use a different directive 😒:

.. admonition:: I want a bespoke title
	:class: note

    Then some content

https://github.com/executablebooks/MyST-Parse currently follows this logic with colon_fence, e.g.

:::{note} This is a body paragraph
this is a continuation of the same paragraph
:::

However, https://myst-tools.org/docs/mystjs/admonitions#admonition-titles does not

:::{note} This is the title
This is the body paragraph
:::

proposal

I propose "disambiguating" this difference by introducing a "leaf div" to complement the block "div":

::{note} This is a leaf div. It is interpreted as a single body paragraph,
that can continue on to multiple lines if really necessary

:::{note} this is a block div title
This is a block div content
:::
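
As a rough illustration of the proposed distinction (not the plugin sketched in the linked PR), the opening line alone could decide how trailing text is interpreted. The regular expression, the function name `classify_opener`, and the rule "exactly two colons means leaf" are assumptions made for this sketch.

```python
import re

# Assumed grammar for this sketch: exactly two colons open a leaf div,
# three or more open a block div.
OPENER = re.compile(r"^(?P<colons>:{2,})\{(?P<name>[^}\s]+)\}[ \t]*(?P<rest>.*)$")


def classify_opener(line):
    """Return how a div opening line would be read, or None if it is not one."""
    match = OPENER.match(line)
    if match is None:
        return None
    kind = "leaf" if len(match["colons"]) == 2 else "block"
    # Leaf div: trailing text starts the body. Block div: trailing text is the title.
    role = "body" if kind == "leaf" else "title"
    return {"kind": kind, "name": match["name"], role: match["rest"]}
```

With this rule the parser never needs to know what `note` means in order to decide whether the first line is a title or body text.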

In executablebooks/mdit-py-plugins#72, I have sketched out what this would look like for a markdown-it plugin

Note with block attributes, you could also provide options to leaf divs, e.g. this

{#name .class key=value}
::{note} A note

would be equivalent to

:::{note}
:name: name
:class: class
:key: value

A note
:::
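
The equivalence above could be sketched as a small function that turns a block attribute line into directive options. This is only an illustration: the token grammar, the name `parse_block_attrs`, and the choice to map `#id` to the `name` option and to join classes with spaces are assumptions.

```python
import re

# One token is "#id", ".class", or "key=value" (value optionally double-quoted).
TOKEN = re.compile(
    r'#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+)|(?P<key>[\w-]+)=(?P<value>"[^"]*"|\S+)'
)


def parse_block_attrs(text):
    """Convert '{#name .class key=value}' into a dict of string options."""
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("expected attributes wrapped in braces")
    options = {}
    classes = []
    for match in TOKEN.finditer(inner[1:-1]):
        if match["id"]:
            options["name"] = match["id"]  # assumption: #id maps to :name:
        elif match["cls"]:
            classes.append(match["cls"])
        else:
            options[match["key"]] = match["value"].strip('"')
    if classes:
        options["class"] = " ".join(classes)
    return options
```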

Code (and Code-Block) vs CodeCell

There are directives that define codeBlocks that have the same properties as code, and we have reduced that down to a single node in the spec. I think we need to carry another property to differentiate that!

Not quite sure what it is, maybe executable? Right now this allows thebe to target as well as changes the view of the UI (see below).

CodeBlock: [image]

Code: [image]

Implement MEP0002

We have recently accepted MEP0002 as a team:
https://mep.myst-tools.org/en/latest/meps/mep-0002/

This discusses project cross-references using markdown links. The final part of the process is to update the MyST spec to include those changes, which should cover at least the following:

  • bring the content from the MEP0002 and adapt it to the spec for this page: https://myst-tools.org/docs/spec/references
    • Adaptations probably include removing some of the preamble and forward-looking discussion in the MEP, and pulling in the relevant core pieces (various schemes, ways to cross reference things, etc.)
  • Updating the tests and examples (testing in at least mystjs, which uses these tests today)
  • Documenting the warnings (hasn't yet been done in myst-spec)
  • Update the AST: https://mep.myst-tools.org/en/latest/meps/mep-0002/#specification-ast
    • Including scheme name, urlSource, internal, and updating the kind
  • Removing the %s from the template/spec and only using {number}
  • Deprioritizing the {eq}, {ref}, etc. roles in the spec documentation (e.g. moving to the bottom of the page)

I am happy to take a first crack at this (probably late next week), I think a lot of the pieces should be very fast to do as all the content is already there!

Meeting Notes Week of April 11, 2022

Renaming

  • are these now mystTargets #40
  • these are now mystComments #40
  • #36
    • enumerated, enumerator
    • Then good to go!

mystTargets should not be parsed with identifiers

These can come into the picture on a propagateTargets transformation.

  • These no longer have identifiers, targets must be propagated to define what the identifier actually is.

  • mystTarget removal is an option of the transform

  • Regarding Link Referencing: https://github.com/executablebooks/myst-spec/pull/28/files#r842494008

    • Find definitions, delete them, get identifiers, turns them into link etc.
    • This is an early transform
  • Add a stage enum to the test? There are a few levels of testing. (parse, transform, post, html)
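
A minimal sketch of what a propagateTargets transform might look like on a dict-based AST follows. The node shapes, the `label` field on `mystTarget`, the lowercasing used to derive the identifier, and the `remove_targets` option name are assumptions for illustration, not the spec's definition.

```python
def propagate_targets(node, remove_targets=True):
    """Move each mystTarget label onto the next sibling node, in place."""
    children = node.get("children")
    if not children:
        return node
    pending = None  # label of the most recent target still waiting for a sibling
    kept = []
    for child in children:
        if child.get("type") == "mystTarget":
            pending = child["label"]
            if not remove_targets:
                kept.append(child)
            continue
        if pending is not None:
            child["label"] = pending
            # Assumption: the identifier is the lowercased label.
            child["identifier"] = pending.lower()
            pending = None
        propagate_targets(child, remove_targets)
        kept.append(child)
    node["children"] = kept
    return node
```

Because identifiers are only assigned here, the parse stage can emit targets without deciding what they attach to.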

@fwkoch merges:

References

From here

  • Target (this matches the identifier of the target)
  • Explicit (has children, and possibly an enumerator)
  • domain (std, math, prf, or any string)
  • #24 has other thoughts on this as well!

Apr. 12 Decisions:

  • rename crossReference identifier --> target (this is clearer!); this is explicitly different than MDAST
    • The thinking is that the linkReference in MDAST and other MDAST pieces get removed early in the parsing, and the crossReference will be long-lived all the way to render time.
  • Add domain to the crossReference (std, math, prf, or any string) (allow for arbitrary strings!)
  • For kind: expand to strings

For example, math: eq role becomes: domain: math, kind: numref

TODO:

  • Write down some of the expected behaviours around references, and how we could possibly simplify ref/numref/eq choices for users.

April 14:

References

  • Links []() syntax
    • HTML, mailto, ref, -> any
  • References that are for a page
    • Then there are references that can be cross page.
    • Sphinx refs, every key needs to be unique across documents.
    • Sphinx doesn't tell you which reference; there are no warnings on this.

Syntax discussions / Thinking:

  • [](http(s?) or mailto) - these are external links, done

  • [](#target) - this is any reference on this page (ref role, i.e. no eq (at the moment))

  • [](doc.md) - this is a document link, it must have the extension in it.

  • [](doc.md#target) - this is a target specifically on another page (ref role, i.e. no eq (at the moment))

    • possible extension to this is recognize targets that start with certain characters, use this domain.
  • [](any) - this is a fallback to look up anything in the project

  • {py:func}`myFunciton` -- use roles to reference into specific domains

  • Chris to move on this in the python implementation!

    • Write up a small thing on target resolution / order for a kick start at MEP.
  • Franklin to improve myst-spec (small, decided upon issues!)

  • executablebooks/unified-myst#17

  • On implementing an extension:
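
Returning to the link syntax discussion in these notes, the five URL shapes could be told apart roughly as follows. This is a sketch of the discussion, not a decided resolution order; the function name, the returned keys, and the assumption that document links end in `.md` are illustrative.

```python
EXTERNAL_PREFIXES = ("http://", "https://", "mailto:")


def classify_link(url):
    """Classify the destination of a [](url) link per the shapes discussed above."""
    if url.lower().startswith(EXTERNAL_PREFIXES):
        return {"kind": "external", "url": url}
    path, _, fragment = url.partition("#")
    if not path and fragment:
        # [](#target): a reference on the current page
        return {"kind": "target", "target": fragment}
    if path.endswith(".md"):  # assumption: only the .md extension is recognised
        result = {"kind": "document", "document": path}
        if fragment:
            # [](doc.md#target): a target on another page
            result["target"] = fragment
        return result
    # [](any): fall back to looking up anything in the project
    return {"kind": "any", "target": url}
```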

Remove YAML dependency from directive option parsing

Currently there are two ways of specifying options within a directive:

(option 1) enclosing in ----

```{name}
---
option1: value
option2: value
---
```

(option 2) prepending all lines by :

```{name}
:option1: value
:option2: value
```

Firstly, there should be one clear way of doing things, and so it would be ideal to remove one of these.

Secondly, the following logic proceeds for converting them to the "final" input options for the directive:

  1. identifying the full block of text
  2. parse it with YAML (and abort if the result is not a dictionary)
  3. convert all the values back to strings
  4. convert the values back to specific value types (and validate) by "converters" specified by the directive implementation

Clearly here the YAML value parsing is unnecessary, and worse can lead to discrepancies, such as a: becomes {"a": null} as opposed to {"a": ""}.
YAML is also quite complex (see e.g. here) and not really necessary for the more simple requirements of option parsing.

If we accept that it is the directive implementation's responsibility to do any conversions from strings,
then we simply need a syntax/format that maps string keys to string values.
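
As a sketch of how little machinery a string-to-string mapping needs, the colon-prefixed form (option 2 above) can be read without any YAML parser. The function name and the rule that the option block ends at the first line not starting with `:` are assumptions for this example.

```python
def parse_options(lines):
    """Read leading ':key: value' lines into a dict of strings.

    Returns the options and the number of lines consumed.
    """
    options = {}
    consumed = 0
    for line in lines:
        if not line.startswith(":"):
            break
        key, sep, value = line[1:].partition(":")
        if not sep or not key.strip():
            break
        # Values stay as strings; an empty value is "" rather than None.
        options[key.strip()] = value.strip()
        consumed += 1
    return options, consumed
```

Note that `:a:` yields `{"a": ""}` here, which avoids the `null` discrepancy mentioned above; any type conversion is left to the directive implementation.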

There are two ways to do this that come to mind:

  1. Something like field lists (see the rST spec, and mdit-py implementation), i.e. very similar to the current (option2)

    ```{name}
    :name: x
    :class: y
    :other: z
    ```
    

    (It is of note that in the field list spec, keys are parsed as Markdown, that is not what we want here though)

  2. Block attributes before the directive (see here)

    {#x .y other=z}
    ```{name}
    ```
    

It is of note, that the only place where we definitely need a direct mapping of options <-> JSON is in code-cell, whereby the options actually map to the metadata of a Jupyter Notebook code cell.
In this case though, code-cell can be viewed as a "pseudo-directive" and perhaps should have a different syntax, so as not to be confusing.

Directives should be declarative

Arguments are always arguments.

The parser should be able to read directives, and that should never be bumped down to the body.

Don't care what the directive is, can always know that the options are the options.

Avoiding English keywords (where possible)

If we wish MyST to be a "global" spec, then I think it should strive to meet:

avoiding syntax keywords that are hard-coded English

Granted, this is very difficult, especially with roles and directives (and their options),
but when thinking about new features, syntax extension etc, I think we should always bear this in mind

One thing in particular to note is that the myst-spec, and by extension mystjs, currently hard-codes directive names to be English.
This is not the case for docutils directives, which have a translation module https://github.com/docutils/docutils/tree/master/docutils/docutils/parsers/rst/languages, so for example you can do this:

$ echo ":::{tip}\nhi\n:::" | myst-docutils-demo --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Tip</p>
<p>hi</p>
</aside>

$ echo ":::{astuce}\nhi\n:::" | myst-docutils-demo --language=fr --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Astuce</p>
<p>hi</p>
</aside>
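
One way an implementation could support this is a per-language lookup from localized names to canonical directive names. The sketch below is hypothetical: the table contents are limited to the `tip`/`astuce` pair shown above, and the fallback to English names is an assumption.

```python
# Hypothetical translation table: language -> localized name -> canonical name.
CANONICAL_NAMES = {
    "en": {"tip": "tip"},
    "fr": {"astuce": "tip"},
}


def resolve_directive_name(name, language="en"):
    """Return the canonical directive name, or None if it is unknown."""
    key = name.lower()
    localized = CANONICAL_NAMES.get(language, {})
    # Assumption: English names remain valid in every language.
    return localized.get(key) or CANONICAL_NAMES["en"].get(key)
```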

I wanted to bring this up, especially as we were recently talking about admonitions: #49

Define post transform ASTs

  • What does the AST look like after targets propagate down?
  • ... after cross references are resolved?
  • ... after adding default admonition titles?
  • ... after directive/role nodes are removed?
  • ... after block breaks are resolved to blocks with children?
  • ...

Working towards a specification for MyST

Based on the team meeting yesterday, and follow up conversations with @chrisjsewell, @mmcky, @choldgraf and @fwkoch, we are starting to put together a repository that will house the technical documentation and test cases for a MyST Spec. This will be housed at https://spec.myst.tools in the future (see executablebooks/meta#538).

To build on the MyST markup language and to make the ecosystem as rich and interoperable as possible, we need to formalize three formats:

  1. the MyST markup syntax, to ensure MyST works as expected across languages and implementations;
  2. the MyST abstract syntax tree (AST), to promote an ecosystem of transformations and exports to diverse formats (e.g. latex/word/html/docutils/etc.); and
  3. suggested semantic HTML output and CSS class structure, to promote web-accessibility and interoperability of themes.

There is additional standardization on optional extensions (e.g. dollarMath, and configuration), that is likely outside the scope of this repository and more coordination between packages. Additionally, MyST as a community standard requires ways to improve and enhance these formalizations over time in our multi-stakeholder community. We will aim to start to introduce more formalization on this process over the coming months (e.g. MEPs in an extension proposals repo)!


There are a number of places where test-cases and documentation already live. @fwkoch @rowanc1 and @chrisjsewell will be doing some initial work to pull these together to get a first pass spec (CommonMark + GFM + Base directives/roles + Admonitions). Existing work is in:


Goals:

  • Documentation of the spec, choices and properties of both the AST and MyST syntax
    • These are based on extensions to MDAST, GFM, and CommonMark
  • Single built json file of all test cases with links back to the docs
  • JSON Schema
  • Typescript types
  • Limited duplication on the source of truth for examples, properties and test-cases.

Next steps

For the next few weeks @chrisjsewell will be working on the CommonMark/GFM side and documentation, as well as familiarizing with Mdast work done in mystjs, and @rowanc1 and @fwkoch will be working on the MyST side bringing these over to json schema and formalizing the mdast naming and properties.

🚀🚀🚀


Whiteboard from our call today: [image]

Container node `kinds` split into separate node types?

Currently, containers are a node type, with kind, which specifies the type of container (e.g. figure, table, etc), and typed children corresponding to the kind (e.g. figure -> image + figure caption).

This doesn't follow the pattern of other nodes: we shouldn't need to look at kind to resolve/validate the children. If we want to be strict about children, each container should probably be its own node type, e.g. ContainerFigure. However, if we want it to be easy to add new kinds, we could keep the Container node, allow kind to be more flexible, and make children simply FlowContent. This would then allow authors to add new numbered kinds in their papers without extending the spec, e.g. Lemma 1.

crossReference kind

This is quite limited in the spec to: eq, numref, ref, doc https://github.com/executablebooks/myst-spec/blob/main/schema/references.schema.json#L37 (even doc is missing in this schema file) - this matches the jupyterbook myst documentation https://jupyterbook.org/content/references.html#reference-figures

This feels a little mixed up: ref and numref don't specify the target type, it may be a header, figure, table, etc. On the other hand eq and doc specify the target type (but don't differentiate between reference or numbered reference).

Should these kinds be specific in their target type? math table figure etc? Or should they simply be references to be resolved later? There are pros/cons each way - e.g. the former requires more validation but allows knowledge of the target type without resolving the reference every time.

We can also introduce kind + domain, like sphinx does here: https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#the-math-domain

Container Extend or Change "kind"

There is some flexibility to extend the container kind, there are also meaningful

Currently: kind*: string (“figure” | “table”) - kind of container contents

Other things to consider:

  • code
  • Should also be: string (allows anything to be a container kind)
  • How does this work with domains? @chrisjsewell to do some thinking/investigation on this.

Incorporate global/page level configuration into the spec

Currently, there are numerous configuration formats/schemas for how MyST is parsed/rendered across implementations:

global:

page level:

For consistency across tools, it would be ideal to have a defined "core" JSON schema for such configuration, then potentially also a standard file name/format for global configuration (although this may not be possible for all implementations).

Particularly for page-level metadata, namespacing should definitely be considered, and also potentially defined way for extensions to add configuration.

mystDirective Improvements

  • args should be list of strings, they are currently a single string!
  • capture bodyOffset, this is the number of lines into the body for the directive value
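
A sketch of a directive parse that produces both fields might look like this. It assumes a backtick-fenced directive whose last line is the closing fence, treats the whole first-line remainder as a single argument, and counts `bodyOffset` from the opening fence line; all of these are assumptions made for illustration.

```python
import re

OPEN = re.compile(r"^`{3,}\{(?P<name>[^}]+)\}\s*(?P<arg>.*)$")


def parse_directive(lines):
    """Parse fenced directive lines (opening fence first, closing fence last)."""
    match = OPEN.match(lines[0])
    if match is None:
        raise ValueError("not a directive opening fence")
    args = [match["arg"]] if match["arg"] else []  # a list of strings
    options = {}
    index = 1
    while index < len(lines) - 1 and lines[index].startswith(":"):
        key, _, value = lines[index][1:].partition(":")
        options[key.strip()] = value.strip()
        index += 1
    # Skip blank lines separating the options from the body.
    while index < len(lines) - 1 and not lines[index].strip():
        index += 1
    return {
        "type": "mystDirective",
        "name": match["name"],
        "args": args,
        "options": options,
        "value": "\n".join(lines[index:-1]),
        "bodyOffset": index,  # lines from the opening fence to the body
    }
```

Keeping `bodyOffset` lets later stages map positions inside the parsed body back to lines in the source document.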

Collection of pointers for Admonitions improvements

This issue is for collecting pointers to challenges that users have in interacting with admonitions, especially around setting a custom title. Some similar themes are cropping up, as well as pointers to other tools in the community:

There has also been some discussion about making arguments to directives more deterministic to simplify parsing, and reduce exceptions to the directive rules.

Add attributes syntax

Relevant links:

Attributes are a common Markdown syntax extension, e.g. {#id .class key=value}, which I think might be useful in MyST.

At a block-level, I think this would be relatively simple to implement in https://github.com/executablebooks/MyST-Parser, since they would essentially follow the same rules/logic as targets, e.g.

(target)=
# header

would be equivalent to

{#target}
# header

But then extra logic could also be added, to propagate classes as well as identifiers (for now we would ignore key/values), e.g.

{#target .class1 .class2}
# header

Would relate to HTML like

<h1 id="target" class="class1 class2">header</h1>

Another place they could be added is for defining role options, e.g. {name}`content`{#id .class key=value}

The problem here, though, is that the docutils role syntax does not support options.
It is worth noting, however, that the role functions themselves do accept an options keyword argument: https://github.com/live-clones/docutils/blob/48bb76093b4ba83654b2f2c86e7c52c4bb39c63b/docutils/docutils/parsers/rst/roles.py#L197-L211

Collection of pointers to directives without content

Other markup languages have some ability to supply directives without content; MyST's handling is currently a bit awkward and confuses some folks:

Other implementations:

This issue is just for collecting pointers to various places where we might want to think about improvements to the spec in the future!

More flexible code blocks.

Context

As you may be aware, I'm trying to adopt the MyST AST in papyri, and I'm missing some flexibility in code blocks.

In particular, before moving to the MyST AST my code blocks were more structured, and I was able to attach a link to each token, so that, say, clicking on array in a code block that contains np.array opens the relevant docs.

Proposal

Do you believe it would make sense for code-block to be able to contain a list of children that are inline items instead of a value? This would also make it possible to highlight individual tokens differently (e.g. make some bold) and would help syntax highlighting by precomputing it for each token.

`table` node spec changes / enhancements

The mdast table specification is very simplistic and too limited for publication-quality tables. While the goal need not be to support every edge case supported by html tables, there are a few small improvements that could help a lot.

mdast spec

See https://github.com/syntax-tree/mdast#table and https://github.com/syntax-tree/mdast-util-gfm-table

This specification assumes (1) first row of the table is header, everything else is standard, (2) rows and columns all have equal numbers of cells (no spans), and (3) cell alignment is consistent for the column

current myst-spec

See https://executablebooks.github.io/myst-spec/features/tables.html#tables

There are already a few changes to tables in myst-spec to allow more flexibility, with some similarities to HTML:

  • header is a boolean field on each cell - header cells can be anywhere, not just the first row
  • align is a field on each cell as well - left/right/center alignment is not necessarily constant on the entire column
  • align on the table is a single value which refers to alignment of the table itself (I think this should be removed, see below)

Proposed changes

  • Remove align from table node: this is confusing since it conflicts with mdast table alignment. We should only have align on cells (for the cell content) and the parent container (for alignment of the table itself)
  • Add colspan and rowspan to cells to allow them to span multiple rows / columns
  • Add a way to specify column width (and row height?)

Register MYST as an official markdown variant with IANA

IANA accepted an official registration for the text/markdown mimetype here: https://www.iana.org/assignments/media-types/text/markdown

Section 6.1 of the RFC (RFC 7763) specified an IANA-based registry for Markdown variants, which is established here: https://www.iana.org/assignments/markdown-variants/markdown-variants.xhtml

The list of registered variants is woefully lacking, but the registration mechanism formalizes which mimetype variant parameter should be used to identify a flavor of Markdown,

i.e. text/markdown;variant=MYST or text/markdown;variant=GFM
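
For illustration, a consumer could check the variant parameter along these lines. The `MYST` value is the one proposed in this issue, not a confirmed registration, and the function names are hypothetical.

```python
def parse_media_type(value):
    """Split 'type/subtype;key=value' into the media type and its parameters."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for item in raw_params:
        key, sep, val = item.partition("=")
        if sep:
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def is_myst(value):
    """True if the media type is Markdown with the (proposed) MYST variant."""
    media_type, params = parse_media_type(value)
    return media_type == "text/markdown" and params.get("variant", "").lower() == "myst"
```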

Even having an open draft RFC (probably for years) might allow us to establish and register the MYST parameter at least somewhat formally.

It is also worth noting that pandoc additionally provides for more parameters, that can specify which extensions/directives are loaded.

add log node

docutils has the concept of a system_message node, which can be inserted into the AST at points of failure.

As an example, in unified-myst, this is what is currently output for an error in the table directive:

```{table} This is a caption
:name: test
:class: a

Hallo
```
type: root
children:
  - type: mystDirective
    name: table
    args:
      - This is a caption
    options:
      name: test
      class:
        - a
    value: Hallo
    bodyOffset: 4
    children:
      - type: log
        message: >-
          Error parsing content block for the "table" directive: exactly one
          table expected.
        level: error
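
The pattern behind this output could be sketched as follows: rather than raising, the directive handler records the failure as a `log` child. The helper names and the dict-based node shapes are assumptions; only the `log` fields (`message`, `level`) come from the example above.

```python
def log_node(message, level="error"):
    """Build a log node that can be inserted at the point of failure."""
    return {"type": "log", "message": message, "level": level}


def run_table_directive(directive, parsed_body):
    """Attach the parsed table, or a log node if the body is not exactly one table."""
    tables = [node for node in parsed_body if node.get("type") == "table"]
    if len(tables) != 1:
        directive["children"] = [
            log_node(
                'Error parsing content block for the "table" directive: '
                "exactly one table expected."
            )
        ]
    else:
        directive["children"] = tables
    return directive
```

Keeping the error in the tree means renderers and tooling can surface it next to the offending directive instead of aborting the whole document.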

Defining mdast for citations

Currently doing some investigation on citations and thought I would post it here as it would be great to get on the same page for the data-structures for citations in mdast (I think there is more thought probably on the myst-syntax, do we adopt [@key] pandoc style citations, etc.). I would love to be aiming for the same place for the mdast data structures as the other syntax conversations evolve.

For a piece of technical content, the best practices for in-text citations are probably latex/natbib and pandoc citations which are defined here:

I think the following mdast data structures might capture everything:

type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  children: Cite[];
};

type Cite = {
  type: 'cite';
  identifier: string;
  label: string;
  expand: boolean; // this is the * in natbib, expands authors, false by default
  partial: 'author' | 'year';
  prefix: string; // e.g. "see" or "e.g."
  suffix: string; // e.g. "99 years later" or something
  locator: string; // e.g. "chap. 2", joined with a comma -- defined by CSL locale (pp. fig. etc.)
  // alias: string; // use "Paper 1", maybe do this later?
};

I think this works pretty well and can fit with the {cite:t}`jon22` syntax we already have defined, but maybe in the future there is some way to give roles more data:
For example: {cite:p}[prefix="see", locator="chap. 2"]`jon22`
would yield: (see Jones et al., 2022, chap. 2)
Or maybe there is a specialized way to do this with [see @jon22, chap. 2] (see pandoc)

For multiple citations (i.e. [@key1; @key2] or {cite:p}`key1; key2`), the citeGroup would never be a directive or appear in the markup, but I think the AST data structure is better represented by multiple nodes, with one holding the group (parenthetical) information. This also means UIs can open groups of citations in a list (see distill/elife for good examples of this UI).

Both cite and citeGroup would be flow content, so the equivalent of a "citet" in latex is just a cite node in a paragraph (@key1 in pandoc style).
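
To make the proposed structure concrete, here is a rough sketch of rendering a citeGroup to text. The bibliography lookup (`bibliography` with `author` and `year` strings), the function names, and the placement of `locator` and `suffix` after the year are assumptions; a real renderer would defer to CSL.

```python
def render_cite(cite, bibliography, narrative):
    """Render one cite node using a hypothetical {key: {author, year}} lookup."""
    entry = bibliography[cite["identifier"]]
    tail = ", ".join(
        part for part in (entry["year"], cite.get("locator"), cite.get("suffix")) if part
    )
    if narrative:
        text = f'{entry["author"]} ({tail})'
    else:
        text = f'{entry["author"]}, {tail}'
    prefix = cite.get("prefix")
    return f"{prefix} {text}" if prefix else text


def render_cite_group(group, bibliography):
    """Render a citeGroup; parenthetical groups are wrapped in one pair of brackets."""
    narrative = group["kind"] == "narrative"
    joined = "; ".join(render_cite(c, bibliography, narrative) for c in group["children"])
    return joined if narrative else f"({joined})"
```

Holding the parenthetical information on the group is what lets several cites share a single pair of brackets.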

Some questions:

  • what is the best name for citeGroup?
  • should we follow kind or have some different flags like parenthetical? I suggested kind because that seemed easier to expand in the future if we add num or alt etc. (previously suggested a single cite node, splitting into group solves this).
  • narrative and parenthetical nomenclature comes from here

Existing implementations:

Would be curious on your thoughts @chrisjsewell and @fwkoch (maybe @mmcky as well?)!

Wrapping Directives & Roles

Add processed flag to directive.

Directives wrap other FlowContent.

Define a transform --> This should lift all children out of the directives.
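
A minimal sketch of such a lifting transform on a dict-based AST is shown below. The node type names checked (`mystDirective`, `mystRole`) and the in-place mutation are assumptions, and the proposed `processed` flag is not consulted because its semantics are not defined here.

```python
WRAPPER_TYPES = ("mystDirective", "mystRole")


def lift_directives(node):
    """Replace every directive/role wrapper with its children, in place."""
    children = node.get("children")
    if children is None:
        return node
    lifted = []
    for child in children:
        lift_directives(child)  # handle nested wrappers first
        if child.get("type") in WRAPPER_TYPES:
            lifted.extend(child.get("children", []))
        else:
            lifted.append(child)
    node["children"] = lifted
    return node
```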

Drop indented code blocks

As discussed in https://johnmacfarlane.net/beyond-markdown.html (by John MacFarlane, the creator of pandoc and a co-author of the CommonMark spec)

  1. There should be one clear way of doing things; fenced code blocks are already the primary way of adding a code block
  2. Explicit is better than implicit; fenced code blocks provide a much more explicit indication of the code block
  3. Indented code blocks impose a lot of technical limitations on the syntax/parser

As an example, you can look at https://markdown-it.github.io/ compared to https://djot.net/playground/ (a proposed successor to commonmark).
In djot it is perfectly fine to indent syntax blocks with any number of spaces, without having it become a code block

::::admonition

     :::admonition

     s

     :::

::::

This would be a breaking change from CommonMark, but one that should be possible

Standardised identifier for myst documents

It may be useful to have a standard way of signalling that a document is myst, and even perhaps the myst-spec version it is compliant with, particularly if using the (non-specific) .md file extension.

For example, in the top-matter

---
format: myst
format-version: "0.1.0"
---
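
A tool could then detect the marker with something like the sketch below. It only handles flat `key: value` lines and is not a YAML parser; the function name and return shape are assumptions.

```python
def read_format_header(text):
    """Return (format, format_version) from the top-matter, or None if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields.get("format"), fields.get("format-version")
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return None  # the top-matter block was never closed
```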

Rename `mystComment` --> `comment`

There isn't anything special about a myst comment % comment, and from an AST perspective I think it would be simpler to name it just comment, to be closer to other AST implementations.
