Markdown Abstract Syntax Tree.
mdast is a specification for representing Markdown in a syntax tree. It implements the unist spec. It can represent several flavours of Markdown, such as CommonMark and GitHub Flavored Markdown.
This document may not be released.
See releases for released documents.
The latest released version is 2.2.0.
- Introduction
- Nodes
- Mixin
- Enumeration
- Content
- Glossary
- List of Utilities
- References
- Contribute
- Acknowledgments
- License
This document defines a format for representing Markdown as an abstract syntax tree. Development of mdast started in July 2014, in remark, before unist existed. This specification is written in a Web IDL-like grammar.
mdast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.
mdast relates to JavaScript in that it has a rich ecosystem of utilities for working with compliant syntax trees in JavaScript. However, mdast is not limited to JavaScript and can be used in other programming languages.
mdast relates to the unified and remark projects in that mdast syntax trees are used throughout their ecosystems.
interface Parent <: UnistParent {
children: [Content];
}
Parent (UnistParent) represents a node in mdast containing other nodes (said to be children).
Its content is limited to only other mdast content.
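Because every Parent exposes its content through children, a tree can be traversed with a short recursive function. The following is a minimal sketch of a depth-first, pre-order walk; the name `visit` and its callback signature are assumptions made for illustration and are not defined by this specification.

```javascript
// Depth-first, pre-order walk over an mdast tree.
// `visit` is a hypothetical helper name, not part of the specification.
function visit(node, callback, parent = null) {
  callback(node, parent)
  // Only Parent nodes carry a `children` array; Literal and void nodes do not.
  if (Array.isArray(node.children)) {
    for (const child of node.children) {
      visit(child, callback, node)
    }
  }
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [{type: 'text', value: 'Alpha bravo charlie.'}]
    }
  ]
}

const types = []
visit(tree, (node) => types.push(node.type))
console.log(types) // → ['root', 'paragraph', 'text']
```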
interface Literal <: UnistLiteral {
value: string;
}
Literal (UnistLiteral) represents a node in mdast containing a value.
Its value field is a string.
interface Root <: Parent {
type: "root";
}
Root (Parent) represents a document.
Root can be used as the root of a tree, never as a child. Its content model is not limited to top-level content, but can contain any content with the restriction that all content must be of the same category.
interface Paragraph <: Parent {
type: "paragraph";
children: [PhrasingContent];
}
Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.
Paragraph can be used where block content is expected. Its content model is phrasing content.
For example, the following markdown:
Alpha bravo charlie.
Yields:
{
type: 'paragraph',
children: [{type: 'text', value: 'Alpha bravo charlie.'}]
}
interface Heading <: Parent {
type: "heading";
depth: 1 <= number <= 6;
children: [PhrasingContent];
}
Heading (Parent) represents a heading of a section.
Heading can be used where block content is expected. Its content model is phrasing content.
A depth field must be present. A value of 1 is said to be the highest rank and 6 the lowest.
For example, the following markdown:
# Alpha
Yields:
{
type: 'heading',
depth: 1,
children: [{type: 'text', value: 'Alpha'}]
}
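The constraints on Heading can be checked mechanically. The sketch below is one possible validator; `isValidHeading` is a hypothetical name, and requiring depth to be an integer is an assumption that goes slightly beyond the `1 <= number <= 6` grammar above.

```javascript
// Check that a node satisfies the Heading interface.
// `isValidHeading` is a hypothetical helper used for illustration only.
function isValidHeading(node) {
  return (
    node.type === 'heading' &&
    // Assumption: depth is a whole number, as Markdown headings have no fractional ranks.
    Number.isInteger(node.depth) &&
    node.depth >= 1 &&
    node.depth <= 6 &&
    Array.isArray(node.children)
  )
}

const text = {type: 'text', value: 'Alpha'}
console.log(isValidHeading({type: 'heading', depth: 1, children: [text]})) // → true
console.log(isValidHeading({type: 'heading', depth: 7, children: [text]})) // → false
console.log(isValidHeading({type: 'heading', children: [text]})) // → false
```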
interface ThematicBreak <: Node {
type: "thematicBreak";
}
ThematicBreak (Node) represents a thematic break, such as a scene change in a story, a transition to another topic, or a new document.
ThematicBreak can be used where block content is expected. It has no content model.
For example, the following markdown:
***
Yields:
{type: 'thematicBreak'}
interface Blockquote <: Parent {
type: "blockquote";
children: [BlockContent];
}
Blockquote (Parent) represents a section quoted from somewhere else.
Blockquote can be used where block content is expected. Its content model is also block content.
For example, the following markdown:
> Alpha bravo charlie.
Yields:
{
type: 'blockquote',
children: [{
type: 'paragraph',
children: [{type: 'text', value: 'Alpha bravo charlie.'}]
}]
}
interface List <: Parent {
type: "list";
ordered: boolean?;
start: number?;
loose: boolean?;
children: [ListContent];
}
List (Parent) represents a list of items.
List can be used where block content is expected. Its content model is list content.
An ordered field can be present. It represents that the items have been intentionally ordered (when true), or that the order of items is not important (when false or not present).
If the ordered field is true, a start field can be present. It represents the starting number of the node.
A loose field can be present. It represents that any of its items is separated by a blank line from its siblings or contains two or more children (when true), or not (when false or not present).
For example, the following markdown:
1. [x] foo
Yields:
{
type: 'list',
ordered: true,
start: 1,
loose: false,
children: [{
type: 'listItem',
checked: true,
children: [{
type: 'paragraph',
children: [{type: 'text', value: 'foo'}]
}]
}]
}
interface ListItem <: Parent {
type: "listItem";
checked: boolean?;
children: [BlockContent];
}
ListItem (Parent) represents an item in a List.
ListItem can be used where list content is expected. Its content model is block content.
A checked field can be present. It represents whether the item is done (when true), not done (when false), or indeterminate or not applicable (when null or not present).
For example, the following markdown:
* [x] bar
Yields:
{
type: 'listItem',
checked: true,
children: [{
type: 'paragraph',
children: [{type: 'text', value: 'bar'}]
}]
}
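The checked field has three meaningful states, and a missing field must be treated like null. A minimal sketch of how a consumer might interpret it follows; `describeChecked` is a hypothetical name, not something this specification defines.

```javascript
// Map the `checked` field of a listItem to the state it represents.
// `describeChecked` is a hypothetical helper used for illustration only.
function describeChecked(listItem) {
  if (listItem.checked === true) return 'done'
  if (listItem.checked === false) return 'not done'
  // `null` and a missing field both mean indeterminate or not applicable.
  return 'indeterminate or not applicable'
}

const doneState = describeChecked({type: 'listItem', checked: true, children: []})
const plainState = describeChecked({type: 'listItem', children: []})
console.log(doneState) // → 'done'
console.log(plainState) // → 'indeterminate or not applicable'
```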
interface Table <: Parent {
type: "table";
align: [alignType]?;
children: [TableContent];
}
Table (Parent) represents two-dimensional data.
Table can be used where block content is expected. Its content model is table content.
The head of the node represents the labels of the columns.
An align field can be present. If present, it must be a list of alignTypes. It represents how cells in columns are aligned.
For example, the following markdown:
| foo | bar |
| :-- | :-: |
| baz | qux |
Yields:
{
type: 'table',
align: ['left', 'center'],
children: [
{
type: 'tableRow',
children: [
{
type: 'tableCell',
children: [{type: 'text', value: 'foo'}]
},
{
type: 'tableCell',
children: [{type: 'text', value: 'bar'}]
}
]
},
{
type: 'tableRow',
children: [
{
type: 'tableCell',
children: [{type: 'text', value: 'baz'}]
},
{
type: 'tableCell',
children: [{type: 'text', value: 'qux'}]
}
]
}
]
}
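Entries in align correspond to columns by position. The sketch below pairs each cell of the head row with the alignment of its column; `columnAlignments` is a hypothetical name, and the code assumes the head cells contain only Text nodes, as in the example above.

```javascript
// Pair the label of each head cell with the alignment of its column.
// `columnAlignments` is a hypothetical helper used for illustration only.
function columnAlignments(table) {
  const align = table.align || []
  const head = table.children[0]
  if (!head) return []
  return head.children.map((cell, index) => ({
    // Assumption: cells hold plain text nodes only.
    label: cell.children.map((child) => child.value || '').join(''),
    // A missing entry is treated like `null`: alignment is left to the host environment.
    align: align[index] ?? null
  }))
}

const exampleTable = {
  type: 'table',
  align: ['left', 'center'],
  children: [
    {
      type: 'tableRow',
      children: [
        {type: 'tableCell', children: [{type: 'text', value: 'foo'}]},
        {type: 'tableCell', children: [{type: 'text', value: 'bar'}]}
      ]
    }
  ]
}

console.log(columnAlignments(exampleTable))
// → [{label: 'foo', align: 'left'}, {label: 'bar', align: 'center'}]
```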
interface TableRow <: Parent {
type: "tableRow";
children: [RowContent];
}
TableRow (Parent) represents a row of cells in a table.
TableRow can be used where table content is expected. Its content model is row content.
If the node is a head, it represents the labels of the columns for its parent Table.
For an example, see Table.
interface TableCell <: Parent {
type: "tableCell";
children: [PhrasingContent];
}
TableCell (Parent) represents a header cell in a Table, if its parent is a head, or a data cell otherwise.
TableCell can be used where row content is expected. Its content model is phrasing content.
For an example, see Table.
interface HTML <: Literal {
type: "html";
}
HTML (Literal) represents a fragment of raw HTML.
HTML can be used where block or phrasing content is expected. Its content is represented by its value field.
For example, the following markdown:
<div>
Yields:
{type: 'html', value: '<div>'}
interface Code <: Literal {
type: "code";
lang: string?;
meta: string?;
}
Code (Literal) represents a block of preformatted text, such as ASCII art or computer code.
Code can be used where block content is expected.
Its content is represented by its value field.
This node relates to the phrasing content concept InlineCode.
A lang field can be present. It represents the language of computer code being marked up.
If the lang field is present, a meta field can be present. It represents custom information relating to the node.
For example, the following markdown:
    foo()
Yields:
{
type: 'code',
lang: null,
meta: null,
value: 'foo()'
}
And the following markdown:
```javascript highlight-line="2"
foo()
bar()
baz()
```
Yields:
{
type: 'code',
lang: 'javascript',
meta: 'highlight-line="2"',
value: 'foo()\nbar()\nbaz()'
}
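The fenced example suggests how the text after the opening fence maps onto lang and meta. The following is a minimal sketch, assuming lang is the first whitespace-delimited word and meta is whatever remains; `splitInfo` is a hypothetical name, and real parsers may treat escapes and other edge cases differently.

```javascript
// Split the info string of a fenced code block into `lang` and `meta`.
// `splitInfo` is a hypothetical helper; the splitting rule is an assumption.
function splitInfo(info) {
  const trimmed = info.trim()
  // No info string: both fields are null, as in the first example.
  if (trimmed === '') return {lang: null, meta: null}
  // First run of non-whitespace is the language, the rest is meta.
  const match = /^(\S+)\s*([\s\S]*)$/.exec(trimmed)
  return {lang: match[1], meta: match[2] === '' ? null : match[2]}
}

console.log(splitInfo('javascript highlight-line="2"'))
// → {lang: 'javascript', meta: 'highlight-line="2"'}
console.log(splitInfo('javascript')) // → {lang: 'javascript', meta: null}
console.log(splitInfo('')) // → {lang: null, meta: null}
```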
interface YAML <: Literal {
type: "yaml";
}
YAML (Literal) represents a collection of metadata for the document in the YAML data serialisation language.
YAML can be used where frontmatter content is expected. Its content is represented by its value field.
For example, the following markdown:
---
foo: bar
---
Yields:
{type: 'yaml', value: 'foo: bar'}
interface Definition <: Node {
type: "definition";
}
Definition includes Association;
Definition includes Resource;
Definition (Node) represents a resource.
Definition can be used where definition content is expected. It has no content model.
Definition includes the mixins Association and Resource.
Definition should be associated with LinkReferences and ImageReferences.
For example, the following markdown:
[Alpha]: http://example.com
Yields:
{
type: 'definition',
identifier: 'alpha',
label: 'Alpha',
url: 'http://example.com',
title: null
}
interface FootnoteDefinition <: Parent {
type: "footnoteDefinition";
children: [BlockContent];
}
FootnoteDefinition includes Association;
FootnoteDefinition (Parent) represents content relating to the document that is outside its flow.
FootnoteDefinition can be used where definition content is expected. Its content model is block content.
FootnoteDefinition includes the mixin Association.
FootnoteDefinition should be associated with FootnoteReferences.
For example, the following markdown:
[^alpha]: bravo and charlie.
Yields:
{
type: 'footnoteDefinition',
identifier: 'alpha',
label: 'alpha',
children: [{
type: 'paragraph',
children: [{type: 'text', value: 'bravo and charlie.'}]
}]
}
interface Text <: Literal {
type: "text";
}
Text (Literal) represents everything that is just text.
Text can be used where phrasing content is expected. Its content is represented by its value field.
For example, the following markdown:
Alpha bravo charlie.
Yields:
{type: 'text', value: 'Alpha bravo charlie.'}
interface Emphasis <: Parent {
type: "emphasis";
children: [PhrasingContent];
}
Emphasis (Parent) represents stress emphasis of its contents.
Emphasis can be used where phrasing content is expected. Its content model is also phrasing content.
For example, the following markdown:
*alpha* _bravo_
Yields:
{
type: 'paragraph',
children: [
{
type: 'emphasis',
children: [{type: 'text', value: 'alpha'}]
},
{type: 'text', value: ' '},
{
type: 'emphasis',
children: [{type: 'text', value: 'bravo'}]
}
]
}
interface Strong <: Parent {
type: "strong";
children: [PhrasingContent];
}
Strong (Parent) represents strong importance, seriousness, or urgency for its contents.
Strong can be used where phrasing content is expected. Its content model is also phrasing content.
For example, the following markdown:
**alpha** __bravo__
Yields:
{
type: 'paragraph',
children: [
{
type: 'strong',
children: [{type: 'text', value: 'alpha'}]
},
{type: 'text', value: ' '},
{
type: 'strong',
children: [{type: 'text', value: 'bravo'}]
}
]
}
interface Delete <: Parent {
type: "delete";
children: [PhrasingContent];
}
Delete (Parent) represents contents that are no longer accurate or no longer relevant.
Delete can be used where phrasing content is expected. Its content model is also phrasing content.
For example, the following markdown:
~~alpha~~
Yields:
{
type: 'delete',
children: [{type: 'text', value: 'alpha'}]
}
interface InlineCode <: Literal {
type: "inlineCode";
}
InlineCode (Literal) represents a fragment of computer code, such as a file name, computer program, or anything a computer could parse.
InlineCode can be used where phrasing content is expected. Its content is represented by its value field.
This node relates to the block content concept Code.
For example, the following markdown:
`foo()`
Yields:
{type: 'inlineCode', value: 'foo()'}
interface Break <: Node {
type: "break";
}
Break (Node) represents a line break, such as in poems or addresses.
Break can be used where phrasing content is expected. It has no content model.
For example, the following markdown:
foo··
bar
Yields:
{
type: 'paragraph',
children: [
{type: 'text', value: 'foo'},
{type: 'break'},
{type: 'text', value: 'bar'}
]
}
interface Link <: Parent {
type: "link";
children: [StaticPhrasingContent];
}
Link includes Resource;
Link (Parent) represents a hyperlink.
Link can be used where phrasing content is expected. Its content model is static phrasing content.
Link includes the mixin Resource.
For example, the following markdown:
[alpha](http://example.com "bravo")
Yields:
{
type: 'link',
url: 'http://example.com',
title: 'bravo',
children: [{type: 'text', value: 'alpha'}]
}
interface Image <: Node {
type: "image";
}
Image includes Resource;
Image includes Alternative;
Image (Node) represents an image.
Image can be used where phrasing content is expected. It has no content model, but is described by its alt field.
Image includes the mixins Resource and Alternative.
For example, the following markdown:
![alpha](http://example.com/favicon.ico "bravo")
Yields:
{
type: 'image',
url: 'http://example.com/favicon.ico',
title: 'bravo',
alt: 'alpha'
}
interface LinkReference <: Parent {
type: "linkReference";
children: [StaticPhrasingContent];
}
LinkReference includes Reference;
LinkReference (Parent) represents a hyperlink through association, or its original source if there is no association.
LinkReference can be used where phrasing content is expected. Its content model is static phrasing content.
LinkReference includes the mixin Reference.
LinkReferences should be associated with a Definition.
For example, the following markdown:
[alpha][Bravo]
Yields:
{
type: 'linkReference',
identifier: 'bravo',
label: 'Bravo',
referenceType: 'full',
children: [{type: 'text', value: 'alpha'}]
}
interface ImageReference <: Node {
type: "imageReference";
}
ImageReference includes Reference;
ImageReference includes Alternative;
ImageReference (Node) represents an image through association, or its original source if there is no association.
ImageReference can be used where phrasing content is expected. It has no content model, but is described by its alt field.
ImageReference includes the mixins Reference and Alternative.
ImageReference should be associated with a Definition.
For example, the following markdown:
![alpha][bravo]
Yields:
{
type: 'imageReference',
identifier: 'bravo',
label: 'bravo',
referenceType: 'full',
alt: 'alpha'
}
interface Footnote <: Parent {
type: "footnote";
children: [PhrasingContent];
}
Footnote (Parent) represents content relating to the document that is outside its flow.
Footnote can be used where phrasing content is expected. Its content model is also phrasing content.
For example, the following markdown:
[^alpha bravo]
Yields:
{
type: 'footnote',
children: [{type: 'text', value: 'alpha bravo'}]
}
interface FootnoteReference <: Node {
type: "footnoteReference";
}
FootnoteReference includes Association;
FootnoteReference (Node) represents a marker through association.
FootnoteReference can be used where phrasing content is expected. It has no content model.
FootnoteReference includes the mixin Association.
FootnoteReference should be associated with a FootnoteDefinition.
For example, the following markdown:
[^alpha]
Yields:
{
type: 'footnoteReference',
identifier: 'alpha',
label: 'alpha'
}
interface mixin Resource {
url: string;
title: string?;
}
Resource represents a reference to a resource.
A url field must be present. It represents a URL to the referenced resource.
A title field can be present. It represents advisory information for the resource, such as would be appropriate for a tooltip.
interface mixin Association {
identifier: string;
label: string?;
}
Association represents an internal relation from one node to another.
An identifier field must be present. It can match an identifier field on another node.
A label field can be present. It represents the original value of the normalised identifier field.
Whether the value of identifier is expected to be a unique identifier or not depends on the type of node including the Association. An example of this is that identifier on Definition should be a unique identifier, whereas multiple LinkReferences can have the same identifier and be associated with one definition.
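Matching on identifier is what ties a reference to its definition. The sketch below collects Definition nodes into a map and resolves references against it; `collectDefinitions` and `resolve` are hypothetical names, and letting the first definition win when identifiers collide is an assumption borrowed from CommonMark rather than something this document states.

```javascript
// Index all Definition nodes in a tree by their identifier.
// `collectDefinitions` and `resolve` are hypothetical helpers.
function collectDefinitions(node, definitions = new Map()) {
  if (node.type === 'definition' && !definitions.has(node.identifier)) {
    // Assumption: the first definition for an identifier takes precedence.
    definitions.set(node.identifier, node)
  }
  for (const child of node.children || []) {
    collectDefinitions(child, definitions)
  }
  return definitions
}

// Find the definition a reference is associated with, or null if there is none.
function resolve(reference, definitions) {
  return definitions.get(reference.identifier) || null
}

const first = {
  type: 'linkReference', identifier: 'alpha', label: 'Alpha',
  referenceType: 'full', children: [{type: 'text', value: 'one'}]
}
const second = {
  type: 'linkReference', identifier: 'alpha', label: 'alpha',
  referenceType: 'shortcut', children: [{type: 'text', value: 'alpha'}]
}
const referenceTree = {
  type: 'root',
  children: [
    {type: 'paragraph', children: [first, {type: 'text', value: ' '}, second]},
    {type: 'definition', identifier: 'alpha', label: 'Alpha', url: 'http://example.com', title: null}
  ]
}

const definitions = collectDefinitions(referenceTree)
// Both references share one identifier and resolve to the same definition.
console.log(resolve(first, definitions).url) // → 'http://example.com'
console.log(resolve(second, definitions).url) // → 'http://example.com'
```

For production use, the mdast-util-definitions package in the list of utilities is described as finding definition nodes.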
interface mixin Reference {
referenceType: string;
}
Reference includes Association;
Reference represents a marker that is associated to another node.
A referenceType field must be present. Its value must be a referenceType. It represents the explicitness of the reference.
interface mixin Alternative {
alt: string?;
}
Alternative represents a node with a fallback.
An alt field should be present. It represents equivalent content for environments that cannot represent the node as intended.
enum alignType {
"left" | "right" | "center" | null;
}
alignType represents how phrasing content is aligned.
- left: See the left value of the text-align CSS property
- right: See the right value of the text-align CSS property
- center: See the center value of the text-align CSS property
- null: phrasing content is aligned as defined by the host environment
enum referenceType {
"shortcut" | "collapsed" | "full";
}
referenceType represents the explicitness of a reference.
- shortcut: the reference is implicit, its identifier inferred from its content
- collapsed: the reference is explicit, its identifier inferred from its content
- full: the reference is explicit, its identifier explicitly set
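The three values correspond to the three ways a reference can be written in Markdown. The following sketch turns a reference back into source text; `referenceToMarkdown` is a hypothetical name, the text content is passed in separately for simplicity, and no escaping is attempted.

```javascript
// Serialize a link reference according to its referenceType.
// `referenceToMarkdown` is a hypothetical helper; it does not escape special characters.
function referenceToMarkdown(node, text) {
  // Prefer the original label; fall back to the normalised identifier.
  const label = node.label || node.identifier
  switch (node.referenceType) {
    case 'shortcut':
      return `[${text}]`
    case 'collapsed':
      return `[${text}][]`
    case 'full':
      return `[${text}][${label}]`
    default:
      throw new Error('Unknown referenceType: ' + node.referenceType)
  }
}

const fullReference = {
  type: 'linkReference',
  identifier: 'bravo',
  label: 'Bravo',
  referenceType: 'full',
  children: [{type: 'text', value: 'alpha'}]
}
console.log(referenceToMarkdown(fullReference, 'alpha')) // → '[alpha][Bravo]'
```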
type Content =
TopLevelContent | ListContent | TableContent | RowContent | PhrasingContent;
Each node in mdast falls into one or more categories of Content that group nodes with similar characteristics together.
type TopLevelContent = BlockContent | FrontmatterContent | DefinitionContent;
Top-level content represents the sections of a document (block content), and metadata such as frontmatter and definitions.
type BlockContent =
Paragraph | Heading | ThematicBreak | Blockquote | List | Table | HTML | Code;
Block content represents the sections of a document.
type FrontmatterContent = YAML;
Frontmatter content represents out-of-band information about the document.
If frontmatter is present, it must be limited to one node in the tree, and can only exist as a head.
type DefinitionContent = Definition | FootnoteDefinition;
Definition content represents out-of-band information that typically affects the document through Association.
type ListContent = ListItem;
List content represents the items in a list.
type TableContent = TableRow;
Table content represents the rows in a table.
type RowContent = TableCell;
Row content represents the cells in a row.
type PhrasingContent = StaticPhrasingContent | Link | LinkReference;
Phrasing content represents the text in a document, and its markup.
type StaticPhrasingContent =
Text | Emphasis | Strong | Delete | HTML | InlineCode | Break | Image |
ImageReference | Footnote | FootnoteReference;
Static phrasing content represents the text in a document, and its markup, that is not intended for user interaction.
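Since a node can fall into more than one category, a lookup table is a convenient way to answer which content models accept a given node type. This is a minimal sketch transcribed from the type definitions above; the `categories` object and the `categoriesOf` function are hypothetical names.

```javascript
// Node types per content category, transcribed from the grammar above.
// `categories` and `categoriesOf` are hypothetical names used for illustration.
const staticPhrasing = [
  'text', 'emphasis', 'strong', 'delete', 'html', 'inlineCode', 'break',
  'image', 'imageReference', 'footnote', 'footnoteReference'
]

const categories = {
  block: ['paragraph', 'heading', 'thematicBreak', 'blockquote', 'list', 'table', 'html', 'code'],
  frontmatter: ['yaml'],
  definition: ['definition', 'footnoteDefinition'],
  list: ['listItem'],
  table: ['tableRow'],
  row: ['tableCell'],
  staticPhrasing,
  // Phrasing content is static phrasing content plus the interactive nodes.
  phrasing: [...staticPhrasing, 'link', 'linkReference']
}

// List every category that a node type belongs to.
function categoriesOf(type) {
  return Object.keys(categories).filter((name) => categories[name].includes(type))
}

console.log(categoriesOf('html')) // → ['block', 'staticPhrasing', 'phrasing']
console.log(categoriesOf('link')) // → ['phrasing']
console.log(categoriesOf('root')) // → []
```

Root appears in no category, which matches the rule that it can never be used as a child.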
See the unist glossary.
See the unist list of utilities for more utilities.
- mdast-util-assert: Assert nodes
- mdast-add-list-metadata: Enhances the metadata of list and listItem nodes
- mdast-comment-marker: Parse a comment marker
- mdast-util-compact: Make a tree compact
- mdast-util-definitions: Find definition nodes
- mdast-flatten-listitem-paragraphs: Flatten listItem and (nested) paragraph into one listItem node
- mdast-flatten-nested-lists: Transforms a tree to avoid lists inside lists
- mdast-util-heading-range: Markdown heading as ranges
- mdast-util-heading-style: Get the style of a heading node
- mdast-util-inject: Inject a tree into another at a given heading
- mdast-util-to-string: Get the plain text content of a node
- mdast-flatten-image-paragraphs: Flatten paragraph and image into one image node
- mdast-move-images-to-root: Moves image nodes up the tree until they are strict children of the root
- mdast-normalize-headings: Ensure at most one top-level heading is in the document
- mdast-squeeze-paragraphs: Remove empty paragraphs
- mdast-util-toc: Generate a Table of Contents from a tree
- mdast-util-to-hast: Transform to HAST
- mdast-util-to-nlcst: Transform to NLCST
- mdast-zone: HTML comments as ranges or markers
- unist: Universal Syntax Tree. T. Wormer; et al.
- Markdown: Markdown. J. Gruber.
- CommonMark: CommonMark. J. MacFarlane; et al.
- GFM: GitHub Flavored Markdown. GitHub.
- HTML: HTML Standard. A. van Kesteren; et al. WHATWG.
- CSSTEXT: CSS Text. E. Etemad, K. Ishii. W3C.
- JavaScript: ECMAScript Language Specification. Ecma International.
- YAML: YAML Ain’t Markup Language. O. Ben-Kiki, C. Evans, I. döt Net.
- Web IDL: Web IDL. C. McCormack. W3C.
mdast is built by people just like you!
Check out contributing.md for ways to get started.
This project has a Code of Conduct. By interacting with this repository, organisation, or community you agree to abide by its terms.
Want to chat with the community and contributors? Join us in Gitter!
Have an idea for a cool new utility or tool?
That’s great!
If you want feedback, help, or just to share it with the world you can do so by creating an issue in the syntax-tree/ideas repository!
The initial release of this project was authored by @wooorm.
Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!
Thanks to @anandthakker, @BarryThePenguin, @izumin5210, @jasonLaster, @justjake, @KyleAMathews, @Rokt33r, @rhysd, @Sarah-Seo, @sethvincent, and @simov for contributing to mdast and related projects!