executablebooks / myst-spec
MyST is designed to create publication-quality, computational documents written entirely in Markdown.
Home Page: https://mystmd.org/spec
License: MIT License
reStructuredText allows for the syntax:
.. note:: This is a short note
It is a nice, terse, one-line syntax for an admonition.
This comes at a price though; the parsing of a directive's "structure" is dependent on the type of directive.
Because the body content can possibly be on the first line, this does not work:
.. note:: I want a bespoke title
Then some content
It is simply treated as a note with two body paragraphs.
This means you then have to use a different directive:
.. admonition:: I want a bespoke title
:class: note
Then some content
https://github.com/executablebooks/MyST-Parse currently follows this logic with `colon_fence`, e.g.
:::{note} This is a body paragraph
this is a continuation of the same paragraph
:::
However, https://myst-tools.org/docs/mystjs/admonitions#admonition-titles does not
:::{note} This is the title
This is the body paragraph
:::
I propose "disambiguating" this difference by introducing a "leaf div" to complement the block "div":
::{note} This is a leaf div. It is interpreted as a single body paragraph,
that can continue on to multiple lines if really necessary
:::{note} this is a block div title
This is a block div content
:::
In executablebooks/mdit-py-plugins#72, I have sketched out what this would look like for a markdown-it plugin
Note with block attributes, you could also provide options to leaf divs, e.g. this
{#name .class key=value}
::{note} A note
would be equivalent to
:::{note}
:name: name
:class: class
:key: value
A note
:::
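The equivalence above can be sketched in a few lines. This is only an illustration, assuming that `#x` maps to the `name` option and `.x` to the `class` option as in the example; the function name `parse_block_attributes` is hypothetical and not part of any MyST implementation.

```python
import shlex


def parse_block_attributes(text: str) -> dict:
    """Turn ``{#name .class key=value}`` into a string-to-string options dict.

    Assumption (taken from the example above): ``#x`` becomes the ``name``
    option and ``.x`` becomes the ``class`` option.
    """
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError(f"not an attribute block: {text!r}")
    options = {}
    classes = []
    # shlex keeps quoted values such as key="two words" together
    for token in shlex.split(inner[1:-1]):
        if token.startswith("#"):
            options["name"] = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
        else:
            raise ValueError(f"unrecognised attribute token: {token!r}")
    if classes:
        # several classes are joined with spaces, like an HTML class attribute
        options["class"] = " ".join(classes)
    return options


print(parse_block_attributes("{#name .class key=value}"))
```

All values stay strings here, so any conversion would remain the responsibility of the directive itself.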
There are directives that define `codeBlocks` that have the same properties as `code`, and we have reduced that down to a single node in the spec. I think we need to carry another property to differentiate that!
Not quite sure what it is, maybe `executable`? Right now this allows thebe to target as well as changes the view of the UI (see below).
This does not change the AST def. But does have implications for efficiency of parsing.
Base "myst-spec" should just parse the breaks. These can be transformed into blocks.
We have recently accepted MEP0002 as a team:
https://mep.myst-tools.org/en/latest/meps/mep-0002/
This discusses project cross-references using Markdown links. The final part of the process is to update the myst spec to include those changes. This should include at least the following changes:
- `urlSource`, `internal`, and updating the `kind`
- `%s` from the template/spec and only using `{number}`

I am happy to take a first crack at this (probably late next week); I think a lot of the pieces should be very fast to do, as all the content is already there!
- `mystTargets` #40
- `mystComments` #40

These can come into the picture on a `propagateTargets` transformation.
These no longer have `identifiers`; targets must be propagated to define what the identifier actually is.
`mystTarget` removal is an option of the transform.
Regarding Link Referencing: https://github.com/executablebooks/myst-spec/pull/28/files#r842494008
`link` etc.
Add a stage enum to the test? There are a few levels of testing (parse, transform, post, html).
@fwkoch merges:
From here
target
Apr. 12 Decisions:
- `identifier` --> `target` (this is clearer!); this is explicitly different than MDAST
- `linkReference` in MDAST and other MDAST pieces get removed early in the parsing, and the `crossReference` will be long-lived all the way to render time.
- `domain` to the `crossReference` (std, math, prf, or any string) (allow for arbitrary strings!)
- `kind`: expand to strings

For example, `math`: `eq` role becomes: domain: math, kind: numref
TODO:
`[]()` syntax
any
- `[](http(s?) or mailto)` - these are external links, done
- `[](#target)` - this is any reference on this page (ref role, i.e. no eq (at the moment))
- `[](doc.md)` - this is a document link, it must have the extension in it.
- `[](doc.md#target)` - this is a target specifically on another page (ref role, i.e. no eq (at the moment))
- `[](any)` - this is a fallback to look up anything in the project
- {py:func}`myFunciton` -- use roles to reference into specific domains
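The link cases listed above could be told apart roughly as follows. This is a minimal sketch, not the agreed implementation: the function name `classify_link` and the returned `kind` strings are made up for illustration, and only the `.md` extension is treated as a document link.

```python
import re


def classify_link(url: str) -> dict:
    """Classify the destination of a Markdown link ``[](url)``.

    The returned ``kind`` values are hypothetical labels for the cases in
    the list above; they are not names from the spec.
    """
    if re.match(r"^(https?:|mailto:)", url):
        return {"kind": "external", "url": url}
    if url.startswith("#"):
        # a target on the current page
        return {"kind": "page-target", "target": url[1:]}
    path, _, target = url.partition("#")
    if path.endswith(".md"):  # assumption: only .md counts as a document
        if target:
            return {"kind": "doc-target", "doc": path, "target": target}
        return {"kind": "doc", "doc": path}
    # fallback: look the name up anywhere in the project
    return {"kind": "any", "target": url}


for example in ["https://mystmd.org", "#target", "doc.md", "doc.md#target", "any"]:
    print(example, "->", classify_link(example)["kind"])
```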
Chris to move on this in the python implementation!
Franklin to improve myst-spec (small, decided upon issues!)
executablebooks/unified-myst#17
On implementing an extension:
markdown-it-docutils
Currently there are two ways of specifying options within a directive:
(option 1) enclosing in `---`
```{name}
---
option1: value
option2: value
---
```
(option 2) prepending all lines by :
```{name}
:option1: value
:option2: value
```
Firstly, there should be one clear way of doing things, and so it would be ideal to remove one of these.
Secondly, the following logic proceeds for converting them to the "final" input options for the directive:
Clearly here the YAML value parsing is unnecessary, and worse can lead to discrepancies, such as `a:` becomes `{"a": null}` as opposed to `{"a": ""}`.
YAML is also quite complex (see e.g. here) and not really necessary for the more simple requirements of option parsing.
If we accept that it is the directive implementation's responsibility to do any conversions from strings,
then we simply need a syntax/format that maps string keys to string values.
There are two ways to do this that come to mind:
Something like field lists (see the rST spec, and mdit-py implementation), i.e. very similar to the current (option2)
```{name}
:name: x
:class: y
:other: z
```
(It is of note that in the field list spec, keys are parsed as Markdown; that is not what we want here, though.)
Block attributes before the directive (see here)
{#x .y other=z}
```{name}
```
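To illustrate the first of the two formats (field-list style options), here is a small sketch of a parser that maps string keys to string values with no YAML involved. The function name `parse_options` is hypothetical; the point is only that an empty value stays `""` rather than becoming `null`.

```python
def parse_options(lines: list) -> dict:
    """Read leading ``:key: value`` lines into a dict of strings.

    Parsing stops at the first line that does not start with ``:``, which
    is assumed to be the start of the directive body.
    """
    options = {}
    for line in lines:
        if not line.startswith(":"):
            break
        key, sep, value = line[1:].partition(":")
        if not sep or not key.strip():
            raise ValueError(f"malformed option line: {line!r}")
        # no type conversion: the directive decides how to interpret values
        options[key.strip()] = value.strip()
    return options


print(parse_options([":name: x", ":class: y", ":other: z", "body text"]))
print(parse_options([":a:"]))
```

Here `:a:` yields an empty string for `a`, which avoids the `null` discrepancy described earlier.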
It is of note that the only place where we definitely need a direct mapping of options <-> JSON is in `code-cell`, whereby the options actually map to the metadata of a Jupyter Notebook code cell.
In this case though, `code-cell` can be viewed as a "pseudo-directive" and perhaps should have a different syntax, so as not to be confusing.
In `unified-myst` I currently use `name`, e.g. for {name}`content`, rather than `kind`.
This also syncs with: https://github.com/syntax-tree/mdast-util-directive/blob/4c494b18ac31f27f67b95b917aacc03207d9584f/complex-types.d.ts#L7
Any major objections to changing that here @rowanc1 @fwkoch?
Arguments are always arguments.
The parser should be able to read directives, and that should never be bumped down to the body.
Don't care what the directive is, can always know that the options are the options.
If we wish MyST to be a "global" spec, then I think it should strive to meet:
avoiding syntax keywords that are hard-coded English
Granted, this is very difficult, especially with roles and directives (and their options),
but when thinking about new features, syntax extension etc, I think we should always bear this in mind
One thing in particular to note is that currently the myst-spec, and by extension mystjs, hard-codes directive names to be English.
This is not the case for docutils directives, which have a translation module https://github.com/docutils/docutils/tree/master/docutils/docutils/parsers/rst/languages, so for example you can do this:
$ echo ":::{tip}\nhi\n:::" | myst-docutils-demo --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Tip</p>
<p>hi</p>
</aside>
$ echo ":::{astuce}\nhi\n:::" | myst-docutils-demo --language=fr --myst-enable-extensions=colon_fence
<aside class="admonition tip">
<p class="admonition-title">Astuce</p>
<p>hi</p>
</aside>
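A language-aware lookup of this kind could be sketched as below. The table contains only the single French entry taken from the demo output above; the names `TRANSLATIONS` and `canonical_directive_name` are hypothetical and do not mirror the docutils API.

```python
# Hypothetical translation table: language code -> localized name -> canonical name.
# Only the entry shown in the example above is included.
TRANSLATIONS = {
    "fr": {"astuce": "tip"},
}


def canonical_directive_name(name: str, language: str = "en") -> str:
    """Map a localized directive name to its canonical (English) name.

    Unknown names are returned unchanged, so canonical names keep working
    in every language.
    """
    return TRANSLATIONS.get(language, {}).get(name.lower(), name)


print(canonical_directive_name("astuce", "fr"))
print(canonical_directive_name("tip", "fr"))
```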
I wanted to bring this up, especially as we were recently talking about admonitions: #49
This issue aims at keeping links to places in various repositories where people have issues or suggestions with cross-references and citations.
This thread is not aimed at proposing a solution.
Then you can import them directly for testing (with https://www.typescriptlang.org/tsconfig#resolveJsonModule)
Based on the team meeting yesterday, and follow up conversations with @chrisjsewell, @mmcky, @choldgraf and @fwkoch, we are starting to put together a repository that will house the technical documentation and test cases for a MyST Spec. This will be housed at https://spec.myst.tools in the future (see executablebooks/meta#538).
To build on the MyST markup language and to make the ecosystem as rich and interoperable as possible, we need to formalize three formats:
There is additional standardization on optional extensions (e.g. dollarMath, and configuration), that is likely outside the scope of this repository and more coordination between packages. Additionally, MyST as a community standard requires ways to improve and enhance these formalizations over time in our multi-stakeholder community. We will aim to start to introduce more formalization on this process over the coming months (e.g. MEPs in an extension proposals repo)!
There are a number of places where test-cases and documentation already live. @fwkoch @rowanc1 and @chrisjsewell will be doing some initial work to pull these together to get a first pass spec (CommonMark + GFM + Base directives/roles + Admonitions). Existing work is in:
For the next few weeks @chrisjsewell will be working on the CommonMark/GFM side and documentation, as well as familiarizing with Mdast work done in mystjs, and @rowanc1 and @fwkoch will be working on the MyST side bringing these over to json schema and formalizing the mdast naming and properties.
Currently, containers are a node type, with `kind`, which specifies the type of container (e.g. figure, table, etc), and typed `children` corresponding to the kind (e.g. figure -> image + figure caption).
This doesn't follow the pattern of other nodes: we shouldn't need to look at kind to resolve/validate the children. If we want to be strict about `children`, each container should probably be its own node type, e.g. `ContainerFigure`. However, if we want it to be easy to add new kinds, we could keep the `Container` node, allow `kind` to be more flexible, and make `children` simply `FlowContent`. This would then allow authors to add new numbered `kinds` in their papers without extending the spec, e.g. `Lemma 1`.
This is quite limited in the spec to: `eq`, `numref`, `ref`, `doc`
https://github.com/executablebooks/myst-spec/blob/main/schema/references.schema.json#L37 (even doc is missing in this schema file) - this matches the jupyterbook myst documentation https://jupyterbook.org/content/references.html#reference-figures
This feels a little mixed up: `ref` and `numref` don't specify the target type; it may be a header, figure, table, etc. On the other hand, `eq` and `doc` specify the target type (but don't differentiate between reference or numbered reference).
Should these kinds be specific in their target type? `math`, `table`, `figure`, etc? Or should they simply be `references` to be resolved later? There are pros/cons each way - e.g. the former requires more validation but allows knowledge of the target type without resolving the reference every time.
We can also introduce kind + domain, like sphinx does here: https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#the-math-domain
There is some flexibility to extend the container `kind`, there are also meaningful
Currently: kind*: string ("figure" | "table") - kind of container contents
Other things to consider:
Currently, there are numerous configuration formats/schemas for how MyST is parsed/rendered across implementations:
global:
- `conf.py`: https://myst-parser.readthedocs.io/en/latest/configuration.html#global-configuration
- `conf.py`: https://myst-nb.readthedocs.io/en/latest/configuration.html#global-configuration
- `_conf.yml`: https://jupyterbook.org/en/stable/customize/config.html
- `myst.yml`: https://js.myst.tools/guide/frontmatter#in-a-myst-yml-file
- `setting.json` (to set markdown-it syntax plugins): https://github.com/executablebooks/myst-vs-code/blob/c5b209c77196a46152f545ff366d8b729f7b9c84/package.json#L125

page level:
For consistency across tools, it would be ideal to have a defined "core" JSON schema for such configuration, then potentially also a standard file name/format for global configuration (although this may not be possible for all implementations).
Particularly for page-level metadata, namespacing should definitely be considered, and also potentially a defined way for extensions to add configuration.
As specified here: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes, there are some common attributes associated with all docutils nodes, and this should essentially be the same here.
As an example, here:
This should be classes: ['tip']
`bodyOffset`: this is the number of lines into the body for the directive value
This issue is for collecting pointers to challenges that users have in interacting with admonitions, especially around setting a custom title. Some similar themes are cropping up, as well as pointers to other tools in the community:
There has also been some discussion about making arguments to directives more deterministic to simplify parsing, and reduce exceptions to the directive rules.
Relevant links:
Attributes are a common Markdown syntax extension, e.g. `{#id .class key=value}`, which I think might be useful in MyST.
At a block-level, I think this would be relatively simple to implement in https://github.com/executablebooks/MyST-Parser, since they would essentially follow the same rules/logic as targets, e.g.
(target)=
# header
would be equivalent to
{#target}
# header
But then extra logic could also be added, to propagate classes as well as identifiers (for now we would ignore key/values), e.g.
{#target .class1 .class2}
# header
Would relate to HTML like
<h1 id="target" class="class1 class2">header</h1>
Another place they could be added is for defining role options, e.g. {name}`content`{#id .class key=value}
The problem here, though, is that docutils role syntax does not support options.
Although it is of note that the role functions themselves do actually accept an `options` keyword: https://github.com/live-clones/docutils/blob/48bb76093b4ba83654b2f2c86e7c52c4bb39c63b/docutils/docutils/parsers/rst/roles.py#L197-L211
There is some ability in other markup languages to supply directives without content; MyST right now is a bit awkward and confuses some folks:
Other implementations:
This issue is just for collecting pointers to various places where we might want to think about improvements to the spec in the future!
As you may be aware, I'm trying to adopt the MyST AST in papyri, and I'm missing some flexibility in code blocks.
In particular, before moving to the MyST AST my code blocks were more structured, and I was able to have links for each token, so that, say, clicking on `array` in a code block that contains `np.array` opens the relevant docs.
Do you believe it would make sense to have code-block be able to contain a list of children that are inline items instead of a value? The other thing this would allow is to potentially highlight individual tokens differently, like making some bold, and/or to help the syntax highlighting by precomputing it for each token.
directives/
blocks/
The `mdast` table specification is very simplistic and too limited for publication-quality tables. While the goal need not be to support every edge case supported by HTML tables, there are a few small improvements that could help a lot.
See https://github.com/syntax-tree/mdast#table and https://github.com/syntax-tree/mdast-util-gfm-table
This specification assumes (1) first row of the table is header, everything else is standard, (2) rows and columns all have equal numbers of cells (no spans), and (3) cell alignment is consistent for the column
See https://executablebooks.github.io/myst-spec/features/tables.html#tables
There are already a few changes to tables in myst-spec to allow more flexibility, with some similarities to HTML:
- `header` is a boolean field on each cell - header cells can be anywhere, not just the first row
- `align` is a field on each cell as well - left/right/center alignment is not necessarily constant on the entire column
- `align` on the table is a single value which refers to alignment of the table itself (I think this should be removed, see below)

- `align` from table node: this is confusing since it conflicts with mdast table alignment. We should only have `align` on cells (for the cell content) and the parent container (for alignment of the table itself)
- `colspan` and `rowspan` to cells to allow them to span multiple rows / columns
- `width` (and row `height`?)

This shouldn't be in the spec tests at least?
IANA accepted an official registration for the `text/markdown` mimetype here: https://www.iana.org/assignments/media-types/text/markdown
Section 6.1 of the RFC specified an IANA-based registry for Markdown variants, which is established here: https://www.iana.org/assignments/markdown-variants/markdown-variants.xhtml
The list of registered variants is woefully lacking, but the registration mechanism allows us to formalize which mimetype `variant` parameter should be used to identify a flavor of Markdown,
i.e. `text/markdown;variant=MYST` or `text/markdown;variant=GFM`
Even having an open draft RFC (probably for years) might allow us to establish and register the MYST parameter at least somewhat formally.
It is also worth noting that `pandoc` additionally provides for more parameters, which can specify which extensions/directives are loaded.
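As a rough illustration of how a consumer might read the `variant` parameter, here is a deliberately naive sketch. The function name `parse_media_type` is hypothetical, and the parser does not handle quoted values that contain semicolons.

```python
def parse_media_type(value: str) -> tuple:
    """Split a media type such as ``text/markdown;variant=MYST``.

    Returns ``(media_type, params)``. This is a simplified reading of the
    header syntax: parameter names are lower-cased, values keep their case.
    """
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


media_type, params = parse_media_type("text/markdown;variant=MYST")
print(media_type, params.get("variant"))
```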
docutils has the concept of a `system_message` node, which can be inserted into the AST at points of failure.
As an example, in unified-myst, this is what is currently output for an error in the `table` directive:
```{table} This is a caption
:name: test
:class: a
Hallo
```
type: root
children:
  - type: mystDirective
    name: table
    args:
      - This is a caption
    options:
      name: test
      class:
        - a
    value: Hallo
    bodyOffset: 4
    children:
      - type: log
        message: >-
          Error parsing content block for the "table" directive: exactly one
          table expected.
        level: error
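With errors stored as nodes like this, a tool can gather them by walking the tree. The following is a minimal sketch over plain dicts; `collect_logs` is a hypothetical helper, and the tree is a trimmed copy of the output above.

```python
from typing import Optional


def collect_logs(node: dict, level: Optional[str] = None) -> list:
    """Return every ``log`` node under ``node``, optionally filtered by level."""
    found = []
    if node.get("type") == "log" and (level is None or node.get("level") == level):
        found.append(node)
    for child in node.get("children", []):
        found.extend(collect_logs(child, level))
    return found


# Trimmed version of the AST shown above
tree = {
    "type": "root",
    "children": [
        {
            "type": "mystDirective",
            "name": "table",
            "children": [
                {
                    "type": "log",
                    "message": 'Error parsing content block for the "table" '
                    "directive: exactly one table expected.",
                    "level": "error",
                }
            ],
        }
    ],
}

for log in collect_logs(tree, level="error"):
    print(f"{log['level']}: {log['message']}")
```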
Currently doing some investigation on citations and thought I would post it here, as it would be great to get on the same page for the data structures for citations in mdast (I think there is more thought probably on the myst-syntax, do we adopt `[@key]` pandoc style citations, etc.). I would love to be aiming for the same place for the mdast data structures as the other syntax conversations evolve.
For a piece of technical content, the best practices for in-text citations are probably latex/natbib and pandoc citations, which are defined here:
I think the following mdast data structures might capture everything:
type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  children: Cite[];
};
type Cite = {
  type: 'cite';
  identifier: string;
  label: string;
  expand: boolean; // this is the * in natbib, expands authors, false by default
  partial: 'author' | 'year';
  prefix: string; // e.g. "see" or "e.g."
  suffix: string; // e.g. "99 years later" or something
  locator: string; // e.g. "chap. 2", joined with a comma -- defined by CSL locale (pp. fig. etc.)
  // alias: string // use "Paper 1", maybe do this later?
};
I think this works pretty well and can fit with the {cite:t}`jon22` syntax we already have defined, but maybe in the future there is some way to give roles more data:
For example: {cite:p}[prefix="see", locator="chap. 2"]`jon22` would yield: (see Jones et al., 2022, chap. 2)
Or maybe there is a specialized way to do this with `[see @jon22, chap. 2]` (see pandoc)
For multiple citations, the `citeGroup` would never be a directive or be in the markup (i.e. `[@key1; @key2]` or {cite:p}`key1; key2`), but I think that the AST data structure is better represented by multiple nodes, one holding the group (parenthetical) information. This also means UIs can open groups of citations in a list (e.g. see distill/elife as good examples of this UI).
Both `cite` and `citeGroup` would be flow content, so the equivalent of a "citet" in latex is just a cite node in a paragraph (`@key1` in pandoc style).
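To make the grouping concrete, here is a small sketch that turns the content of a role such as {cite:p}`key1; key2` into a `citeGroup` node with `cite` children. It is only an illustration of the proposed shape: `build_cite_group` is a hypothetical helper, and setting `identifier` equal to `label` is an assumption.

```python
def build_cite_group(kind: str, content: str) -> dict:
    """Build a ``citeGroup`` node from semicolon-separated citation keys."""
    if kind not in ("narrative", "parenthetical"):
        raise ValueError(f"unknown citeGroup kind: {kind!r}")
    keys = [key.strip() for key in content.split(";") if key.strip()]
    children = [
        # assumption: identifier and label are both the raw key
        {"type": "cite", "identifier": key, "label": key}
        for key in keys
    ]
    return {"type": "citeGroup", "kind": kind, "children": children}


group = build_cite_group("parenthetical", "key1; key2")
print([child["identifier"] for child in group["children"]])
```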
- `citeGroup`?
- `kind` or have some different flags like `parenthetical`? I suggested `kind` because that seemed easier to expand in the future if we add `num` or `alt` etc.
- `narrative` and `parenthetical` nomenclature comes from here

Would be curious on your thoughts @chrisjsewell and @fwkoch (maybe @mmcky as well?)!
Add `processed` flag to directive.
Directives wrap other FlowContent.
Define a transform --> This should lift all children out of the directives.
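One possible reading of that transform, sketched over plain dict nodes: every `mystDirective` is replaced by its own children, recursively. The function name `lift_directive_children` is hypothetical, and dropping directives that have no children is an assumption rather than a decision recorded here.

```python
def lift_directive_children(node: dict) -> dict:
    """Return a copy of ``node`` in which directives are replaced by their children."""
    if "children" not in node:
        return dict(node)
    lifted = []
    for child in node["children"]:
        child = lift_directive_children(child)
        if child.get("type") == "mystDirective":
            # the wrapper disappears, its (already processed) children stay
            lifted.extend(child.get("children", []))
        else:
            lifted.append(child)
    return {**node, "children": lifted}


tree = {
    "type": "root",
    "children": [
        {
            "type": "mystDirective",
            "name": "note",
            "children": [
                {"type": "admonition", "kind": "note", "children": [{"type": "paragraph"}]}
            ],
        }
    ],
}

print([child["type"] for child in lift_directive_children(tree)["children"]])
```

The input tree is left untouched, which makes it easy to compare the AST before and after the transform in tests.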
As discussed in https://johnmacfarlane.net/beyond-markdown.html (by John MacFarlane, the author of pandoc and a co-author of the CommonMark spec)
As an example, you can look at https://markdown-it.github.io/ compared to https://djot.net/playground/ (a proposed successor to commonmark).
In djot it is perfectly fine to indent syntax blocks with any number of spaces, without having it become a code block
::::admonition
:::admonition
s
:::
::::
This would be a breaking change from CommonMark, but one that should be possible
It may be useful to have a standard way of signalling that a document is myst, and even perhaps the myst-spec version it is compliant with, particularly if using the (non-specific) `.md` file extension.
For example, in the top-matter:
---
format: myst
format-version: "0.1.0"
---
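A tool could check for such a marker before choosing a parser. The sketch below reads only the two keys from the example and is not a YAML parser; `read_format_marker` is a hypothetical name, and the `format` / `format-version` keys are just the ones proposed above.

```python
def read_format_marker(text: str) -> dict:
    """Extract ``format`` and ``format-version`` from leading front matter.

    Returns an empty dict when the document has no (terminated) front
    matter block. Only flat ``key: value`` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    marker = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return marker
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("format", "format-version"):
            # drop surrounding quotes such as "0.1.0"
            marker[key.strip()] = value.strip().strip("\"'")
    return {}  # front matter was never closed


document = '---\nformat: myst\nformat-version: "0.1.0"\n---\n# Title\n'
print(read_format_marker(document))
```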
This change would make naming conflicts with other libraries less likely. Similarly, if we make this change we should do `Role` -> `MystRole`.
There isn't anything special about a myst comment `% comment`, and from an AST perspective I think this would be simpler as just being named `comment` to be closer to other AST implementations.
This should be a simple text styling node like strong, underline, etc.
see -
https://github.com/syntax-tree/mdast#delete
https://github.com/syntax-tree/mdast-util-gfm-strikethrough