A list of markdown items that we should review in addition to discussions on ...

Review available Markdown Syntax about myst-parser HOT 13 CLOSED

executablebooks commented on May 24, 2024

Review available Markdown Syntax

from myst-parser.

Comments (13)

choldgraf commented on May 24, 2024 4

I've spent a little bit of time looking into Markdown flavors that might be of interest and wanted to update this issue with some more perspective.

First off, I think this is the main takeaway (it is a suggestion, not a directive): our build system should support a strict subset of Pandoc markdown or RMarkdown. It could do this in addition to another language like rST.

Why support Pandoc Markdown?

I spoke with a few folks at RStudio, which is the main driver behind RMarkdown. This is the flavor of markdown supported by bookdown. It has proven resilient and quite popular in the R community, and supports many of the features that we'd need for publishing. RMarkdown is a subset of Pandoc Markdown (here is a summary of Pandoc Markdown and here is the announcement that RMarkdown is a subset of Pandoc Markdown). This means that if we were to support the same subset of Pandoc Markdown, then we'd be supporting a language that is already utilized by a huge community of people.

How would we support this?

Here is a useful post with information on what we'd need to do to support this.

I think there are two options for supporting Pandoc Markdown.

The first is to write a direct parser that goes from Pandoc Markdown to a docutils AST. This could be done, for example, by building on top of the recommonmark project, which does this for the base "CommonMark" flavor of markdown.
The second is to build a bridge between markdown and rST, such as the m2r package. This is also what nbsphinx does to support reading Jupyter notebooks.

In either case, we'd want to define a subset of Pandoc markdown syntax that we wish to support, and then create a mapping from that subset onto either docutils objects or rST. This could be a standalone Sphinx extension that would be really useful for the community outside of just this project.
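To make the second option (a markdown-to-rST bridge) concrete, here is a minimal sketch in Python. It is not how m2r or nbsphinx actually work; the function name `md_to_rst`, the three-level heading limit, and the choice of underline characters are all assumptions made for illustration. It covers only headings, inline code, and links. Emphasis (`*text*`, `**text**`) is written the same way in both languages, so it passes through unchanged.

```python
import re

# Underline characters for section levels 1-3. This is only a common
# convention: rST itself does not fix which character means which level.
UNDERLINES = {1: "=", 2: "-", 3: "~"}


def md_to_rst(markdown: str) -> str:
    """Translate a tiny, illustrative Markdown subset into rST."""
    out = []
    for line in markdown.splitlines():
        heading = re.match(r"^(#{1,3})\s+(.*)$", line)
        if heading:
            title = heading.group(2).strip()
            out.append(title)
            out.append(UNDERLINES[len(heading.group(1))] * len(title))
            continue
        # Inline code: `x` in Markdown becomes ``x`` in rST.
        line = re.sub(r"(?<!`)`([^`]+)`(?!`)", r"``\1``", line)
        # Links: [text](url) becomes `text <url>`_ in rST.
        line = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"`\1 <\2>`_", line)
        out.append(line)
    return "\n".join(out)


print(md_to_rst("# Title\n\nUse `x` and [docs](https://example.org)."))
```

A real bridge would need a proper parser rather than line-by-line regular expressions, which is part of why defining the supported subset up front matters.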

As a side note, here's an interesting post about the differences between Pandoc Markdown and rST.

@chrisjsewell has already made a really interesting implementation of this approach here: executablebooks/meta#12

A few caveats

There are a few potential pitfalls to this. Here are some that I can think of:

This could get edge-casey, depending on how much of the Pandoc or RMarkdown world we want to support.
Perhaps we should think purely in terms of the within-cell markdown flavor we wish to support, and then farm out the representation of a notebook in that markdown flavor to something like jupytext.
We should do some profiling to see if this would add significant overhead to the build times.
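On the profiling point, a rough timing harness could look like the sketch below. `time_parser` and `toy_parse` are hypothetical names, and `toy_parse` is only a stand-in for whichever parser ends up being evaluated. Taking the best of several runs is one common way to reduce noise from other processes.

```python
import time


def time_parser(parse, text: str, repeats: int = 5) -> float:
    """Return the best wall-clock time (in seconds) of parse(text) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


def toy_parse(text: str) -> list:
    # Stand-in for a real parser: split into blank-line-separated blocks.
    return [block for block in text.split("\n\n") if block.strip()]


# A synthetic document with 1000 headings and 1000 paragraphs.
sample = "# Title\n\nSome paragraph text.\n\n" * 1000
elapsed = time_parser(toy_parse, sample)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Running the same harness against two candidate parsers on a realistic book chapter would give a first estimate of the overhead.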

What about directives and roles?

The biggest question to my mind is what to do about directives and roles (if we are using Sphinx under the hood). These are one of the most powerful features in Sphinx, and something we could take advantage of to extend new features for books. But, there are no native "directives and roles" features in markdown.

One idea would be to piggy-back on Pandoc markdown syntax for these.

Directives

In rST directives look like this:

.. mydirective::
   :myparam1: myval1
   :myparam2: myval2

In markdown, this might be expressed with Pandoc syntax.
For example, Pandoc allows you to delimit <div> elements with fences like so:

::: mydiv
# Some markdown
inside the div
:::

and optionally:

::: myotherdiv {.myattribute}
:::

Perhaps the pattern of ::: something could be mapped onto directives in rST. For example, something like:

::: toctree {maxdepth=1}
* page1
* page2
:::
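As a rough illustration of how the `::: something` pattern could be read into a directive-like structure, here is a small Python sketch. It is not Pandoc's parser and not an existing Sphinx extension. The names `parse_directives` and `parse_attrs` are hypothetical, nesting is not handled, and the `{key=value}` attribute form follows the examples above rather than any specification.

```python
import re
import shlex

# Opening fence: ":::", a name, and an optional {attribute} block.
OPEN_FENCE = re.compile(r"^:::\s*(?P<name>[\w-]+)\s*(?:\{(?P<attrs>[^}]*)\})?\s*$")


def parse_attrs(raw) -> dict:
    """Parse key=value pairs; the .name shorthand is collected under "class"."""
    options = {}
    for token in shlex.split(raw or ""):
        if token.startswith("."):
            options.setdefault("class", []).append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
    return options


def parse_directives(text: str) -> list:
    """Collect non-nested '::: name {attrs}' ... ':::' blocks."""
    directives, current = [], None
    for line in text.splitlines():
        if current is None:
            match = OPEN_FENCE.match(line)
            if match:
                current = {
                    "name": match.group("name"),
                    "options": parse_attrs(match.group("attrs")),
                    "content": [],
                }
        elif line.strip() == ":::":
            directives.append(current)
            current = None
        else:
            current["content"].append(line)
    return directives


print(parse_directives("::: toctree {maxdepth=1}\n* page1\n* page2\n:::"))
```

The resulting name, options, and content correspond to the three parts of an rST directive, so a later step could hand them to whatever directive implementation is registered under that name.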

Roles

For in-line markup, we could use the "bracketed spans" syntax from Pandoc Markdown. This is intended for making custom "span" elements in your text like so:

[This is *some text*]{.class key="val"}

However, we could piggy-back on this by defining some specific roles, e.g.:

I'm now linking to a [different document]{doc=anotherPage} which contains [this equation]{eq=myeqID}. And also for [references]{ref=mybibtexref}.
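To sketch how such spans might be translated, the Python snippet below rewrites `[label]{role=target}` into the rST role form `:role:`label <target>``. The function name `spans_to_roles` is hypothetical, and the snippet assumes a single unquoted `key=value` pair per span. Whether a given role accepts the `label <target>` form depends on the role, so treat the output format as an assumption too.

```python
import re

# Matches [label]{role=target} with exactly one unquoted key=value pair.
SPAN = re.compile(r"\[(?P<text>[^\]]*)\]\{(?P<role>\w+)=(?P<target>[^}\s]+)\}")


def spans_to_roles(text: str) -> str:
    """Rewrite [label]{role=target} as the rST role :role:`label <target>`."""

    def replace(match):
        return ":{role}:`{text} <{target}>`".format(**match.groupdict())

    return SPAN.sub(replace, text)


print(spans_to_roles("See a [different document]{doc=anotherPage}."))
```

Ordinary Pandoc spans such as `[text]{.class key="val"}` do not match the pattern and are left untouched, which keeps the two uses distinguishable.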

Curious what folks think about that...

I'll update this issue if I can think of some other things to consider...

choldgraf commented on May 24, 2024 1

@rowanc1 good point about needing a language to do the rendering. Another possibility is to piggy-back on Jupyter for some of this. E.g., there are some interesting JS tools that use a Binder kernel under the hood to add interactivity backed by a Python/R/whatever kernel:

I also wanted to ping @stefanv and @rossbar, who might have thoughts on markdown and its use in a publishing system. I believe that Elegant Scipy is written entirely in markdown, and they're hoping to keep that content in (more or less) the same markup language.

choldgraf commented on May 24, 2024 1

Was going through the CommonMark forums and found an interesting comment from JGM re: extension syntax in markdown: https://talk.commonmark.org/t/support-for-extension-token/2771/7

jlperla commented on May 24, 2024

This means that if we were to support the same subset of Pandoc Markdown, then we'd be supporting a language that is already utilized by a huge community of people.

This is a serious advantage, and could lead to complementarities in building tools and training.

This is the flavor of markdown supported by bookdown. It has proven resilient and quite popular in the R community, and supports many of the features that we'd need for publishing

Yes. The bookdown extensions should also be seriously considered. When I went through the executablebooks/meta#11 material, bookdown seemed to have a solution for everything I had done through jupinx.

rowanc1 commented on May 24, 2024

Have people here heard of Idyll? It is an interactive markdown syntax, which might be relevant here? I have been working on an editor/renderer for this sort of content which is quite similar (https://components.ink/). Ink is written in html directly, so less relevant, but perhaps some of the ideas/schema of bringing interactivity into the markdown might be?

I find this style of interactivity very cool in having the text documents themselves react directly to interaction. This allows for "scalable documents" as well as text that can update directly. This also allows for embedding variables directly in the prose. A few images below to show what I mean.

I have been giving this style of interactive document quite a bit of thought over the last few months (albeit outside of the Jupyter ecosystem) and can expand if people are curious. Having components of this be backed by/interoperable with Jupyter would be quite exciting.

choldgraf commented on May 24, 2024

@rowanc1 that kind of functionality would be awesome to have. I've always loved the documents that connect the text with the outputs in an interactive fashion.

Thinking through how to support more complex features (like the neat linking stuff mentioned above), I did a little thought experiment about how to include "directives" and "roles" in markdown. If we supported this, it would let us extend the language with interesting features like the ones that @rowanc1 describes. I added a section with some brainstorms for "directives and roles in markdown" above, and would love to hear what people think.

As an example of how the Pandoc syntax I suggested above might work, you could accomplish the basic idyll example here:

# Hello World

[var name:"x" value:5 /]

The value of x is [Display value:x format:"d" /].

[Range value:x min:0 max:10 /]

With something like this:

# Hello world
::: var {name="x" value=5} :::

The value of x is []{display_value=x format="d"}.

::: range {value=x min=0 max=10} :::

rowanc1 commented on May 24, 2024

The directives and roles look pretty promising! A few other thoughts if you go down this route.

Scopes

One of the important things that I have seen is the introduction of variable scopes so that you can maintain state in a section of the document. That is, not everything lives in the global document namespace, you can section them off ([Display value:scope1.x /] or the Pandoc equivalent). This is really important in larger documents or in referencing into a scope that you are reusing/importing. I think when connecting this with other computational kernels that also becomes quite important. You may have some client-side presentational calculations (format etc.) - and that should be able to execute without necessarily talking to a computation server.
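A minimal sketch of what scoped lookup could look like, written in Python purely for illustration. The `Scope` class and its method names are hypothetical and not taken from Idyll or Ink. A dotted path such as `scope1.x` walks through child scopes before reading the variable, so the same variable name can live in several scopes without clashing.

```python
class Scope:
    """A named bag of variables that can contain child scopes."""

    def __init__(self, **variables):
        self.variables = dict(variables)
        self.children = {}

    def child(self, name: str, **variables) -> "Scope":
        """Create (or replace) a child scope and return it."""
        self.children[name] = Scope(**variables)
        return self.children[name]

    def resolve(self, path: str):
        """Look up a dotted path such as "scope1.x"."""
        *scopes, variable = path.split(".")
        scope = self
        for name in scopes:
            scope = scope.children[name]
        return scope.variables[variable]


document = Scope(x=5)           # global document namespace
document.child("scope1", x=42)  # a section-level scope reusing the name x

print(document.resolve("x"), document.resolve("scope1.x"))
```

An imported or reused section would then simply be attached as another child scope under its own name.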

Transformations

Another issue going down this path is the language of small calculations/transformations. For example, one of the examples I have used is to have the text say "free" when price is equal to zero. This requires you to determine the language that the transformation is written in. In my case (i.e. in Ink), I have chosen JavaScript, as I believe this is probably the main (if not the only?) presentational environment that will be dynamic. This may present some (small) complications for the rendering pipeline (i.e. you need a Node environment to evaluate variables).

From Ink.

Web components

I went down a path in 2017 of creating a parser for what I called .xmd extensible markdown. I got a parser and a bit of a spec going, but it was brittle and I basically gave up on extending markdown (I think the larger community involvement here changes that calculation). The next approach I took was going into web components, which allows you to define XML components for a browser to parse and display. For example, the variable declaration [var name:"x" value:5 /] becomes <ink-var name="x" value="5" />. I have a full comparison here that might jog some other thinking if you decide to go down this path.

Using web components means the markup output (e.g. from this project) is completely declarative and there should be a 1:1 mapping between the properties in markdown and the attributes in XML (which is important for any round-trip considerations). I think this is quite exciting as the toolchain developed here can be completely separate from the rendering side - and the project is about the standards of what the properties, etc. are called, and less about the rendering implementation (e.g. the js library you choose to import).
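The 1:1 mapping can be sketched with a small Python function that rewrites the bracket syntax into web-component markup. This is an illustration only, not Ink's or Idyll's implementation. `to_web_component` is a hypothetical name, and lower-casing the component name (so that `Display` would become `ink-display`) is an assumption that the thread does not confirm. Only the `var` example comes from the text above.

```python
import re
from xml.sax.saxutils import quoteattr

# [name key:value key:"value" /]
TAG = re.compile(r"\[(?P<name>\w+)\s+(?P<props>[^\]]*?)\s*/\]")
PROP = re.compile(r'(?P<key>\w+):(?P<value>"[^"]*"|\S+)')


def to_web_component(markup: str, prefix: str = "ink") -> str:
    """Rewrite [name key:value /] as <prefix-name key="value" />."""

    def replace(match):
        attrs = []
        for prop in PROP.finditer(match.group("props")):
            value = prop.group("value").strip('"')
            # quoteattr escapes the value and wraps it in quotes.
            attrs.append(prop.group("key") + "=" + quoteattr(value))
        name = match.group("name").lower()
        return "<{}-{} {} />".format(prefix, name, " ".join(attrs))

    return TAG.sub(replace, markup)


print(to_web_component('[var name:"x" value:5 /]'))
```

Because every property becomes exactly one attribute with the same name, the reverse direction (XML back to markup) could be written the same way, which is what makes round-tripping plausible.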

Let me know if you want me to expand on any of this, or put these thoughts somewhere else! Excited to see where this project goes.

jlperla commented on May 24, 2024

I've always loved the documents that connect the text with the outputs in an interactive fashion.

For sure. But can't we already do that with packages and extensions?

I think it is pretty hard to get that working in a language-neutral way (especially if we want a bijective transformation to ipynb). I have had a lot of success with https://github.com/JuliaGizmos/Interact.jl in Julia, for example, but those sorts of features are tied to the particular language and package.

rossbar commented on May 24, 2024

Correct, Elegant Scipy was written in markdown, though not all of the desired features (labels/cross-referencing, etc.) were supported by the particular build system (comprising notedown and nbconvert).

From my perspective, markdown makes a lot of sense for a publishing system that aims to support "non-expert" users; i.e. someone like Jane from the user personas. Jupyter, GitHub, GitLab, etc. are very popular and people who use these tools have been exposed to markdown already, so a limited superset of new syntax that provides the necessary features for scientific publishing seems like a natural way to appeal to a lot of potential users --- learning a few new things in a language you have some familiarity with is a lot less daunting than learning a whole new language.

I am certainly no expert when it comes to tooling for scientific publishing, and have not thought about it as deeply as many others have (cf. the many interesting ideas and informative issues/PRs in this repo). I aim to convert/add elements to Elegant Scipy with pandoc/rmarkdown to have a concrete test case that is relevant for other upcoming textbook projects. I expect this process, and exploring various conversion/build tools with the resulting document, will be enlightening as to what features are truly important for our publication needs moving forward.

choldgraf commented on May 24, 2024

Another potentially relevant point: if we run into performance issues when parsing markdown (or similar) documents, then we could look into some other parsers for this.

For example, here are CommonMark parsers in several languages:

Haskell (in-progress): https://github.com/jgm/commonmark-hs/tree/master/commonmark
Javascript: https://github.com/commonmark/commonmark.js/
C: https://github.com/commonmark/cmark and this related Python wrapper: https://github.com/PavloKapyshin/paka.cmark
Python: https://github.com/miyuchina/mistletoe
Python: https://github.com/readthedocs/commonmark.py (what recommonmark uses)

I think all of them convert to an AST rather than doing a direct-to-HTML conversion, which might mean we could piggy-back on them?

mmcky commented on May 24, 2024

With the additional Rendering and Execution layer that Jupyter provides, it will be important to keep front of mind the difference between the underlying Text representation and the Rendering representation, and where each of those elements is produced (i.e. through a build parser or a supporting extension for the notebook etc.).

The way we have been thinking about this recently is:

Text Syntax (i.e. Markdown) <-> IPYNB(as JSON)

The uniqueness of the notebook is that it has both representations: IPYNB (as JSON) is a machine-readable text representation, and IPYNB (as Rendered HTML) is a finished product. Hopefully we can also make a human-readable version of IPYNB (as JSON) for direct text representations.
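The two-way arrow between a text syntax and IPYNB (as JSON) can be illustrated with a small Python sketch. The `%% markdown` / `%% code` cell markers are a hypothetical convention invented for this example, not jupytext's format or any existing standard, and the function names are assumptions as well. The dictionary layout follows the version 4 notebook format as far as this sketch needs: `cells`, `metadata`, `nbformat`, `nbformat_minor`, plus `execution_count` and `outputs` on code cells.

```python
import json

MARKER = "%% "  # hypothetical cell delimiter, not an existing standard


def text_to_notebook(text: str) -> dict:
    """Build a notebook-shaped dict from marker-delimited text."""
    cells = []
    for line in text.splitlines():
        if line.startswith(MARKER):
            kind = line[len(MARKER):].strip()  # "markdown" or "code"
            cell = {"cell_type": kind, "metadata": {}, "source": []}
            if kind == "code":
                cell.update(execution_count=None, outputs=[])
            cells.append(cell)
        elif cells:
            cells[-1]["source"].append(line)
    for cell in cells:
        cell["source"] = "\n".join(cell["source"])
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def notebook_to_text(notebook: dict) -> str:
    """Inverse of text_to_notebook for cells with non-empty source."""
    parts = []
    for cell in notebook["cells"]:
        parts.append(MARKER + cell["cell_type"])
        parts.append(cell["source"])
    return "\n".join(parts)


text = "%% markdown\n# Title\n%% code\nprint(1 + 1)"
notebook = text_to_notebook(text)
print(json.dumps(notebook, indent=1))
print(notebook_to_text(notebook) == text)
```

The round trip only holds for simple cases here (outputs and cell metadata are dropped, and empty cells are not preserved exactly), which hints at how much a real bijective mapping would have to specify.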

mmcky commented on May 24, 2024

This is an interesting discussion on the CommonMark forum:

https://talk.commonmark.org/t/generic-directives-plugins-syntax/444

choldgraf commented on May 24, 2024

Note - I opened up a thread in the Jupyter community forum to see if people have thoughts about text-based standards: https://discourse.jupyter.org/t/should-jupyter-recommend-a-text-based-representation-of-the-notebook/3273/9 (that thread is focused on a text-based representation of a notebook structure, not changing the flavor of markdown that notebooks support)
