commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

License: BSD 2-Clause "Simplified" License

commonmark-java's Introduction

commonmark-java

Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions).

Introduction

Provides classes for parsing input to an abstract syntax tree (AST), visiting and manipulating nodes, and rendering to HTML or back to Markdown. It started out as a port of commonmark.js, but has since evolved into an extensible library with the following features:

Small (core has no dependencies, extensions in separate artifacts)
Fast (10-20 times faster than pegdown which used to be a popular Markdown library, see benchmarks in repo)
Flexible (manipulate the AST after parsing, customize HTML rendering)
Extensible (tables, strikethrough, autolinking and more, see below)

The library is supported on Java 11 and later. It also works on Android, but only on a best-effort basis; please report any problems. For Android, the minimum API level is 19; see the commonmark-android-test directory.

Coordinates for core library (see all on Maven Central):

<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark</artifactId>
    <version>0.21.0</version>
</dependency>

The module names to use in Java 9 are org.commonmark, org.commonmark.ext.autolink, etc, corresponding to package names.

Note that for 0.x releases of this library, the API is not considered stable yet and may break between minor releases. After 1.0, Semantic Versioning will be followed. A package whose name contains beta is not yet subject to stable API guarantees, but normal usage should not require it.

See the spec.txt file if you're wondering which version of the spec is currently implemented. Also check out the CommonMark dingus for getting familiar with the syntax or trying out edge cases. If you clone the repository, you can also use the DingusApp class to try out things interactively.

Usage

Parse and render to HTML

import org.commonmark.node.*;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

Parser parser = Parser.builder().build();
Node document = parser.parse("This is *Markdown*");
HtmlRenderer renderer = HtmlRenderer.builder().build();
renderer.render(document);  // "<p>This is <em>Markdown</em></p>\n"

This uses the parser and renderer with default options. Both builders have methods for configuring their behavior:

escapeHtml(true) on HtmlRenderer will escape raw HTML tags and blocks.
sanitizeUrls(true) on HtmlRenderer will strip potentially unsafe URLs from <a> and <img> tags.
For all available options, see methods on the builders.

Note that this library doesn't try to sanitize the resulting HTML with regards to which tags are allowed, etc. That is the responsibility of the caller, and if you expose the resulting HTML, you probably want to run a sanitizer on it after this.

Render to Markdown

import org.commonmark.node.*;
import org.commonmark.renderer.markdown.MarkdownRenderer;

MarkdownRenderer renderer = MarkdownRenderer.builder().build();
Node document = new Document();
Heading heading = new Heading();
heading.setLevel(2);
heading.appendChild(new Text("My title"));
document.appendChild(heading);

renderer.render(document);  // "## My title\n"

For rendering to plain text with minimal markup, there's also TextContentRenderer.

Use a visitor to process parsed nodes

After the source text has been parsed, the result is a tree of nodes. That tree can be modified before rendering, or just inspected without rendering:

Node node = parser.parse("Example\n=======\n\nSome more text");
WordCountVisitor visitor = new WordCountVisitor();
node.accept(visitor);
visitor.wordCount;  // 4

class WordCountVisitor extends AbstractVisitor {
    int wordCount = 0;

    @Override
    public void visit(Text text) {
        // This is called for all Text nodes. Override other visit methods for other node types.

        // Count words (this is just an example, don't actually do it this way for various reasons).
        wordCount += text.getLiteral().split("\\W+").length;

        // Descend into children (could be omitted in this case because Text nodes don't have children).
        visitChildren(text);
    }
}
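
The comment in the visitor warns against counting words this way. A minimal sketch of one reason, using only the JDK: split returns a leading empty string when the text starts with a non-word character, and a one-element array for empty text, so the count comes out too high. The class and method names below are hypothetical and not part of the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordCountSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Same approach as the visitor above: count the pieces produced by split.
    static int countBySplit(String literal) {
        return literal.split("\\W+").length;
    }

    // Alternative: count the \w+ matches themselves instead of the gaps between them.
    static int countByMatch(String literal) {
        Matcher matcher = WORD.matcher(literal);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBySplit("(hello) world")); // 3 (leading empty string is counted)
        System.out.println(countByMatch("(hello) world")); // 2
        System.out.println(countBySplit(""));              // 1
        System.out.println(countByMatch(""));              // 0
    }
}
```

Note that \w only covers ASCII letters, digits and underscore by default, so neither variant is a real word counter for non-English text.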

Add or change attributes of HTML elements

Sometimes you might want to customize how HTML is rendered. If all you want to do is add or change attributes on some elements, there's a simple way to do that.

In this example, we register a factory for an AttributeProvider on the renderer to set a class="border" attribute on img elements.

Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .attributeProviderFactory(new AttributeProviderFactory() {
            public AttributeProvider create(AttributeProviderContext context) {
                return new ImageAttributeProvider();
            }
        })
        .build();

Node document = parser.parse("![text](/url.png)");
renderer.render(document);
// "<p><img src=\"/url.png\" alt=\"text\" class=\"border\" /></p>\n"

class ImageAttributeProvider implements AttributeProvider {
    @Override
    public void setAttributes(Node node, String tagName, Map<String, String> attributes) {
        if (node instanceof Image) {
            attributes.put("class", "border");
        }
    }
}

Customize HTML rendering

If you want to do more than just change attributes, there's also a way to take complete control over how HTML is rendered.

In this example, we're changing the rendering of indented code blocks to only wrap them in pre instead of pre and code:

Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .nodeRendererFactory(new HtmlNodeRendererFactory() {
            public NodeRenderer create(HtmlNodeRendererContext context) {
                return new IndentedCodeBlockNodeRenderer(context);
            }
        })
        .build();

Node document = parser.parse("Example:\n\n    code");
renderer.render(document);
// "<p>Example:</p>\n<pre>code\n</pre>\n"

class IndentedCodeBlockNodeRenderer implements NodeRenderer {

    private final HtmlWriter html;

    IndentedCodeBlockNodeRenderer(HtmlNodeRendererContext context) {
        this.html = context.getWriter();
    }

    @Override
    public Set<Class<? extends Node>> getNodeTypes() {
        // Return the node types we want to use this renderer for.
        return Set.of(IndentedCodeBlock.class);
    }

    @Override
    public void render(Node node) {
        // We only handle one type as per getNodeTypes, so we can just cast it here.
        IndentedCodeBlock codeBlock = (IndentedCodeBlock) node;
        html.line();
        html.tag("pre");
        html.text(codeBlock.getLiteral());
        html.tag("/pre");
        html.line();
    }
}

Add your own node types

In case you want to store additional data in the document or have custom elements in the resulting HTML, you can create your own subclass of CustomNode and add instances as child nodes to existing nodes.

To define the HTML rendering for them, you can use a NodeRenderer as explained above.

Customize parsing

There are a few ways to extend parsing or even override built-in parsing, all of them via methods on Parser.Builder (see Blocks and inlines in the spec for an overview of blocks/inlines):

Parsing of specific block types (e.g. headings, code blocks, etc) can be enabled/disabled with enabledBlockTypes
Parsing of blocks can be extended/overridden with customBlockParserFactory
Parsing of inline content can be extended/overridden with customInlineContentParserFactory
Parsing of delimiters in inline content can be extended with customDelimiterProcessor

Thread-safety

Both the Parser and HtmlRenderer are designed so that you can configure them once using the builders and then use them multiple times/from multiple threads. This is done by separating the state for parsing/rendering from the configuration.

Having said that, there might be bugs of course. If you find one, please report an issue.
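
To illustrate the general idea of separating configuration from per-call state, here is a rough sketch that does not use the library at all. SharedRendererSketch is a hypothetical stand-in, not the actual Parser or HtmlRenderer: its fields are final and set once, and everything that changes during a call lives in local variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only; not part of commonmark-java.
final class SharedRendererSketch {
    // Configuration: set once in the constructor, never modified afterwards.
    private final String open;
    private final String close;

    SharedRendererSketch(String tag) {
        this.open = "<" + tag + ">";
        this.close = "</" + tag + ">";
    }

    String render(String text) {
        // Per-call state is a local variable, so concurrent calls never share it.
        StringBuilder out = new StringBuilder();
        out.append(open).append(text).append(close);
        return out.toString();
    }

    // Renders several inputs on a small thread pool using one shared instance.
    static List<String> renderConcurrently(SharedRendererSketch renderer, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String text = "doc " + i;
                Callable<String> task = () -> renderer.render(text);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(renderConcurrently(new SharedRendererSketch("p"), 3));
    }
}
```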

API documentation

Javadocs are available online on javadoc.io.

Extensions

Extensions need to extend the parser, or the HTML renderer, or both. To use an extension, the builder objects can be configured with a list of extensions. Because extensions are optional, they live in separate artifacts, so additional dependencies need to be added as well.

Let's look at how to enable tables from GitHub Flavored Markdown. First, add an additional dependency (see Maven Central for others):

<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark-ext-gfm-tables</artifactId>
    <version>0.21.0</version>
</dependency>

Then, configure the extension on the builders:

import org.commonmark.ext.gfm.tables.TablesExtension;

List<Extension> extensions = List.of(TablesExtension.create());
Parser parser = Parser.builder()
        .extensions(extensions)
        .build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .extensions(extensions)
        .build();

To configure another extension in the above example, just add it to the list.

The following extensions are developed with this library, each in their own artifact.

Autolink

Turns plain links such as URLs and email addresses into links (based on autolink-java).

Use class AutolinkExtension from artifact commonmark-ext-autolink.

Strikethrough

Enables strikethrough of text by enclosing it in ~~. For example, in hey ~~you~~, you will be rendered as strikethrough text.

Use class StrikethroughExtension in artifact commonmark-ext-gfm-strikethrough.

Tables

Enables tables using pipes as in GitHub Flavored Markdown.

Use class TablesExtension in artifact commonmark-ext-gfm-tables.

Heading anchor

Enables adding auto generated "id" attributes to heading tags. The "id" is based on the text of the heading.

# Heading will be rendered as:

<h1 id="heading">Heading</h1>

Use class HeadingAnchorExtension in artifact commonmark-ext-heading-anchor.

In case you want custom rendering of the heading instead, you can use the IdGenerator class directly together with a HtmlNodeRendererFactory (see example above).

Ins

Enables underlining of text by enclosing it in ++. For example, in hey ++you++, you will be rendered as underline text. Uses the <ins> tag.

Use class InsExtension in artifact commonmark-ext-ins.

YAML front matter

Adds support for metadata through a YAML front matter block. This extension only supports a subset of YAML syntax. Here's an example of what's supported:

---
key: value
list:
  - value 1
  - value 2
literal: |
  this is literal value.

  literal values 2
---

document start here

Use class YamlFrontMatterExtension in artifact commonmark-ext-yaml-front-matter. To fetch metadata, use YamlFrontMatterVisitor.

Image Attributes

Adds support for specifying attributes (specifically height and width) for images.

The attribute elements are given as key=value pairs inside curly braces { } after the image node to which they apply, for example:

![text](/url.png){width=640 height=480}

will be rendered as:

<img src="/url.png" alt="text" width="640" height="480" />

Use class ImageAttributesExtension in artifact commonmark-ext-image-attributes.

Note: since this extension uses curly braces { } as its delimiters (in StylesDelimiterProcessor), this means that other delimiter processors cannot use curly braces for delimiting.

Task List Items

Adds support for tasks as list items.

A task can be represented as a list item where the first non-whitespace character is a left bracket [, then a single whitespace character or the letter x in lowercase or uppercase, then a right bracket ] followed by at least one whitespace before any other content.

For example:

- [ ] task #1
- [x] task #2

will be rendered as:

<ul>
<li><input type="checkbox" disabled=""> task #1</li>
<li><input type="checkbox" disabled="" checked=""> task #2</li>
</ul>

Use class TaskListItemsExtension in artifact commonmark-ext-task-list-items.
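
The task marker rule described above can be sketched as a regular expression. This is only an illustration of the rule as worded here, with hypothetical names; the extension's own implementation may differ in detail.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskMarkerSketch {
    // Optional leading whitespace, "[", one whitespace character or x/X, "]",
    // then at least one whitespace character before any other content.
    private static final Pattern TASK = Pattern.compile("^\\s*\\[([xX\\s])\\]\\s+(.*)$");

    /** Returns the checked state if the list item text is a task, or empty if it is not. */
    static Optional<Boolean> checkedState(String listItemText) {
        Matcher matcher = TASK.matcher(listItemText);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(matcher.group(1).equalsIgnoreCase("x"));
    }

    public static void main(String[] args) {
        System.out.println(checkedState("[ ] task #1")); // Optional[false]
        System.out.println(checkedState("[x] task #2")); // Optional[true]
        System.out.println(checkedState("[x]task #3"));  // Optional.empty
    }
}
```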

Third-party extensions

You can also find other extensions in the wild:

commonmark-ext-notifications: this extension makes it easy to create notification/admonition paragraphs such as INFO, SUCCESS, WARNING or ERROR

Contributing

See CONTRIBUTING.md file.

License

BSD (2-clause) licensed, see LICENSE.txt file.

commonmark-java's Issues

Parse backtick quotes (`) failed.

I want to start a code block containing tags like <div>, <table>, <pre>, <p>, etc. It begins and ends with a backtick (`).
But it looks like the parser doesn't understand this. The parser shows a backtick character and then starts a large div block, and the div block doesn't end until it reaches the end of the page.

Header anchor extension fails on android

When running the renderer on android, I get an IllegalArgumentException:

java.lang.IllegalArgumentException: Unsupported flags: 256
    at java.util.regex.Pattern.<init>(Pattern.java:1320)
    at java.util.regex.Pattern.compile(Pattern.java:971)
    at org.commonmark.ext.heading.anchor.IdGenerator.<init>(IdGenerator.java:14)
    at org.commonmark.ext.heading.anchor.IdGenerator.<init>(IdGenerator.java:13)
    at org.commonmark.ext.heading.anchor.IdGenerator$Builder.build(IdGenerator.java:106)
    at org.commonmark.ext.heading.anchor.internal.HeadingIdAttributeProvider.<init>(HeadingIdAttributeProvider.java:20)
    at org.commonmark.ext.heading.anchor.internal.HeadingIdAttributeProvider.create(HeadingIdAttributeProvider.java:24)
    at org.commonmark.ext.heading.anchor.HeadingAnchorExtension$1.create(HeadingAnchorExtension.java:61)
    at org.commonmark.renderer.html.HtmlRenderer$RendererContext.<init>(HtmlRenderer.java:199)
    at org.commonmark.renderer.html.HtmlRenderer$RendererContext.<init>(HtmlRenderer.java:188)
    at org.commonmark.renderer.html.HtmlRenderer.render(HtmlRenderer.java:62)
    at org.commonmark.renderer.html.HtmlRenderer.render(HtmlRenderer.java:69)

Here is the code I am using to run the renderer:

String md = "# markdown";
List<Extension> extensions = Arrays.asList(
        TablesExtension.create(),
        StrikethroughExtension.create(),
        AutolinkExtension.create(),
        HeadingAnchorExtension.create(),
        InsExtension.create());
Parser parser = Parser.builder().extensions(extensions).build();
HtmlRenderer renderer = HtmlRenderer.builder().extensions(extensions).build();
Node document = parser.parse(md);
renderer.render(document);

And here is the relevant section of my gradle config

apply plugin: 'com.android.application'

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.0"

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 25
        jackOptions {
            enabled true
        }
        ...
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    ...
}

dependencies {
    ...
    ext.commonmark = "0.7.1"
    compile "com.atlassian.commonmark:commonmark:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-gfm-tables:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-autolink:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-gfm-strikethrough:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-heading-anchor:$commonmark"
    compile "com.atlassian.commonmark:commonmark-ext-ins:$commonmark"
}

This looks very similar to this issue. Any help would be great, thanks!
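
For context on the exception: on a standard JDK, the flag value 256 is Pattern.UNICODE_CHARACTER_CLASS, which the Android runtime in the trace evidently rejects. One possible workaround, sketched below with hypothetical names, is to spell out Unicode properties in the pattern instead of passing the flag. The set of characters kept here is an assumption for illustration and not necessarily what IdGenerator uses.

```java
import java.util.regex.Pattern;

final class UnicodeFlagSketch {
    // Explicit Unicode properties (\p{L} letters, \p{N} numbers) instead of the
    // UNICODE_CHARACTER_CLASS flag. The allowed character set is an assumption.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}\\p{N}_\\- ]");

    static String stripDisallowed(String text) {
        return NOT_ALLOWED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The flag value reported in the stack trace.
        System.out.println(Pattern.UNICODE_CHARACTER_CLASS); // 256
        // Non-ASCII letters survive without the flag; punctuation is removed.
        System.out.println(stripDisallowed("H\u00e9llo, w\u00f6rld!"));
    }
}
```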

AST visitor pattern?

Great that you are implementing the commonmark spec in java.

I like the way pegdown implements the Visitor Pattern to be able to modify or interpret the AST once constructed. Is there a similar or planned facility for commonmark-java? For example, in pegdown:

import org.pegdown.PegDownProcessor;
import org.pegdown.Extensions;
import org.pegdown.ast.RootNode;
import org.pegdown.ast.Visitor;

...

int exts
    = Extensions.DEFINITIONS
    | Extensions.AUTOLINKS
    | Extensions.HARDWRAPS
    | Extensions.TABLES
    | Extensions.STRIKETHROUGH
    | Extensions.SUPPRESS_ALL_HTML
    ;

PegDownProcessor processor = new PegDownProcessor(exts);
RootNode root = processor.parseMarkdown(s.toCharArray());

MyVisitor visitor = new MyVisitor();
root.accept(visitor);

...

class MyVisitor implements Visitor {

    @Override
    public void visit(org.pegdown.ast.Node node) {
        log.debug("visit {}", node);
    }

   ...
}

Thanks,
Paul

CommonMark/Markdown renderer (round-trip parsing/rendering)

It's useful to be able to render a parsed document back to CommonMark markdown. This is a tracker issue for issues relating to that, so far:

Unable to disambiguate emphasis delimiter #10
Unable to disambiguate list-item type #11

Escape Raw HTML does not work.

I have tried to render some text that includes raw HTML. Whether I pass true or false to escapeHtml(), it doesn't work. The given text renders as shown below:

< test > |&| < /test >

<test&gt |&amp| &lt/test&gt

There may be a bug in the detection part. Correct me if I am doing something wrong.

Adding ability to exclude some parsers

Hi! Is it possible to add an ability (via Builder) to disable some parsers? There could be cases when people want to parse everything except lists for example. Or for some cases people need to parse only bold and italic. It would be useful to have only one lib for all these cases :)

Override rendering functions

Hi. Is there a way to override the rendering functions? For example, I'd like to render a link differently depending on whether it points to a page on our domain or to one off-domain.

I was thinking to override the RenderVisitor inner class in HtmlRenderer but it has private access. Is there another way to do this?
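
Independent of how the rendering hook ends up looking, the on-domain versus off-domain decision itself can be sketched with the JDK alone. The helper below is hypothetical; it treats relative links as internal and anything it cannot parse as external. Its result could then drive whatever customization point is used, for example an AttributeProvider like the one in the README that checks a Link node's destination.

```java
import java.net.URI;

final class LinkKindSketch {
    /** Hypothetical helper: true for relative links and for links whose host equals ownHost. */
    static boolean isInternal(String destination, String ownHost) {
        try {
            URI uri = URI.create(destination);
            String host = uri.getHost();
            if (host != null) {
                return host.equalsIgnoreCase(ownHost);
            }
            // No host: relative paths count as internal, schemes like mailto: do not.
            return !uri.isAbsolute();
        } catch (IllegalArgumentException e) {
            // Unparseable destinations are treated as external to be on the safe side.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInternal("/docs/page", "example.com"));            // true
        System.out.println(isInternal("https://example.com/a", "example.com")); // true
        System.out.println(isInternal("https://other.org/a", "example.com"));   // false
    }
}
```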

Can't compile from fresh clone

Not sure if this is some interaction with something on my machine, but it prevents me from compiling the source. I even deleted my ~/.m2 directory and pulled down dependencies from scratch. I had previously been able to build this. Any ideas?

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce 
  (ban-milestones-and-release-candidates) on project commonmark-ext-autolink: 
   Execution ban-milestones-and-release-candidates of goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce failed: 
org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: Could not resolve dependencies for project com.atlassian.commonmark:commonmark-ext-autolink:jar:0.2.1-SNAPSHOT: 
Could not find artifact com.atlassian.commonmark:commonmark:jar:tests:0.2.1-SNAPSHOT -> [Help 1]

mvn compile
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 6 modules...
[INFO] Installing Nexus Staging features:
[INFO]   ... total of 6 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] commonmark-java parent
[INFO] commonmark-java core
[INFO] commonmark-java extension for autolinking
[INFO] commonmark-java extension for strikethrough
[INFO] commonmark-java extension for tables
[INFO] commonmark-java integration tests
[INFO] 
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java parent 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark-parent ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark-parent ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:copy-resources (copy-license) @ commonmark-parent ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java core 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:copy-resources (copy-license) @ commonmark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ commonmark ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ commonmark ---
[INFO] Compiling 5 source files to /Users/pcj/pow/opt/commonmark-java/commonmark/target/classes
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building commonmark-java extension for autolinking 0.2.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-build-environment) @ commonmark-ext-autolink ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) @ commonmark-ext-autolink ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] commonmark-java parent ............................ SUCCESS [  0.663 s]
[INFO] commonmark-java core .............................. SUCCESS [  0.588 s]
[INFO] commonmark-java extension for autolinking ......... FAILURE [  0.026 s]
[INFO] commonmark-java extension for strikethrough ....... SKIPPED
[INFO] commonmark-java extension for tables .............. SKIPPED
[INFO] commonmark-java integration tests ................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.321 s
[INFO] Finished at: 2015-09-21T09:09:55-07:00
[INFO] Final Memory: 18M/222M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (ban-milestones-and-release-candidates) on project commonmark-ext-autolink: Execution ban-milestones-and-release-candidates of goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce failed: org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: Could not resolve dependencies for project com.atlassian.commonmark:commonmark-ext-autolink:jar:0.2.1-SNAPSHOT: Could not find artifact com.atlassian.commonmark:commonmark:jar:tests:0.2.1-SNAPSHOT -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :commonmark-ext-autolink

Unexpected success in commonmark-android-test with android-10

In commit 4cfc2e0, I changed

diff --git a/commonmark-android-test/app/build.gradle b/commonmark-android-test/app/build.gradle
index 3ca56fb..5d940d1 100644
--- a/commonmark-android-test/app/build.gradle
+++ b/commonmark-android-test/app/build.gradle
@@ -10,13 +10,13 @@ if (testPropertiesFile.canRead()) {
 }
 
 android {
-    compileSdkVersion 16
-    buildToolsVersion "21.1.1"
+    compileSdkVersion 10
+    buildToolsVersion "21.1.2"
 
     defaultConfig {
         applicationId "com.atlassian.commonmark.android.test"
-        minSdkVersion 16
-        targetSdkVersion 16
+        minSdkVersion 10
+        targetSdkVersion 10
         versionCode 1
         versionName "1.0"

Even though I would expect it to fail, gradle reports

...
:app:packageSnapshotDebugAndroidTest
:app:assembleSnapshotDebugAndroidTest
:app:connectedSnapshotDebugAndroidTest
BUILD SUCCESSFUL
Total time: 1 mins 12.542 secs

and the report contains

<?xml version='1.0' encoding='UTF-8' ?>
<testsuite name="com.atlassian.commonmark.android.test.AndroidSupportTest" tests="8" failures="0" errors="0" skipped="0" time="50.773" timestamp="2017-03-02T17:26:42" hostname="localhost">
  <properties>
    <property name="device" value="android-10(AVD) - 2.3.3" />
    <property name="flavor" value="SNAPSHOT" />
    <property name="project" value="app" />
  </properties>
  <testcase name="parseTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.03" />
  <testcase name="headingAnchorExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="7.379" />
  <testcase name="insExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.549" />
  <testcase name="strikethroughExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.069" />
  <testcase name="autolinkExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.423" />
  <testcase name="tablesExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.019" />
  <testcase name="yamlFrontMatterExtensionTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.07" />
  <testcase name="htmlRendererTest" classname="com.atlassian.commonmark.android.test.AndroidSupportTest" time="6.059" />
</testsuite>

When I copy commonmark-core into a test project (android-10) and build it as part of this project, the build fails as expected.

TextContent extension for GFM tables

There is an extension for parsing Markdown with GFM tables and rendering it to HTML, but there is no support for text content rendering.

In my use case, I want to be able to parse Markdown content with GFM tables and render it to both plain text and HTML. Especially for inline-style GFM tables and empty headers, the formatting could be improved by an extension, in my opinion.

This is a possible text output:

Markdown | Less | Pretty
--- | --- | ---
*Still* | `renders` | **nicely**
1 | 2 | 3

Is it possible to alter laziness in block quotes?

I just found the library and tried to write an extension with it to support non-standard syntax.
In short, I want to use the commonmark-java infrastructure to parse and/or render non-standard syntax too.

Now I'm trying to implement the blockquote rule, but without laziness.
For example:

>Quote line1
>Quote line2
This should not be part of quote

Using the above syntax, I expected a result like the one below (but without the extra line between 'Quote line2' and 'This should not be part of quote'):

Quote line1
Quote line2

This should not be part of quote

Is it possible to alter the laziness for my blockquote rule?
Which file should I read to learn more?

Order of Custom Extension Factories

Hi, I'm creating an extension to parse YAML-style metadata in Markdown. Hyphens are generally used to delimit the metadata section. Here is an example:

---
key1: value1
key2: value2

---

markdown document start!

This format is widely used; many examples can be found on the web.

But I cannot create the extension to parse metadata because the core factories precede the custom extension factories. In the above example, --- is treated as a horizontal rule by HorizontalRuleParser.

I think that the custom extension factories should precede the core factories. That would make it easier to extend the parser.
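
The effect of the ordering can be sketched without the library. In the hypothetical model below, a factory is just a name plus a check on the line, and the first factory that accepts the line wins. None of these types exist in commonmark-java; they only show why the order decides who gets to handle ---.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FactoryOrderSketch {
    // Hypothetical stand-in for a block parser factory.
    static final class Factory {
        final String name;
        final Predicate<String> canStart;

        Factory(String name, Predicate<String> canStart) {
            this.name = name;
            this.canStart = canStart;
        }
    }

    /** Returns the name of the first factory that accepts the line, or "paragraph". */
    static String firstMatch(List<Factory> factories, String line) {
        for (Factory factory : factories) {
            if (factory.canStart.test(line)) {
                return factory.name;
            }
        }
        return "paragraph";
    }

    /** Builds the factory list with the custom factory either before or after the core one. */
    static List<Factory> ordered(boolean customFirst) {
        Factory core = new Factory("horizontal-rule", line -> line.matches("-{3,}\\s*"));
        Factory custom = new Factory("front-matter", line -> line.equals("---"));
        List<Factory> factories = new ArrayList<>();
        if (customFirst) {
            factories.add(custom);
            factories.add(core);
        } else {
            factories.add(core);
            factories.add(custom);
        }
        return factories;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ordered(false), "---")); // horizontal-rule
        System.out.println(firstMatch(ordered(true), "---"));  // front-matter
    }
}
```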

Publish Javadoc somewhere

The API documentation should be published somewhere, probably on gh-pages.

Should use a nice theme like doclava.

Editing enclosing <pre> tag attributes for an IndentedCodeBlock with AttributeProviderFactory

I'm new to commonmark-java, and am trying to figure out how to apply a CSS class to all <pre> tags surrounding an IndentedCodeBlock. I was successfully able to apply attributes to Header and BlockQuote nodes, but if I match a Node with IndentedCodeBlock and add a class to the attribute map, then it applies it to the <code> tag in the HTML it renders.

I think what's happening here is that <pre> isn't a Node, just an artefact of the way the HtmlRenderer renders the IndentedCodeBlock (and FencedCodeBlock) Nodes, and so I'm never able to override the attributes for the <pre> tag because it never goes through the AttributeProviderContext as a Node.

If I'm understanding this correctly, the only way to do this would be to override the rendering itself for FencedCodeBlocks and IndentedCodeBlocks.

Do you think it makes sense to allow for overriding the attributes of tags where more than 1 set of tags is generated per node? Perhaps the AttributesProvider could provide attributes for the outer tags first with an override for providing attributes inside of the nested structure?

Unable to disambiguate list-item type

Similar to #10, list-item types are lost.

* Item 1
* Item 2
- Dash 1
- Dash 2

The AST contains no information about what the list item type was. This prevents round-trip parsing/emitting of Markdown documents.

Could be solved by annotating AST nodes with start/end source positions. Not clear to me how to go about doing that.

Document if the classes are threadsafe.

Can a Parser or HtmlRenderer be used in a threaded environment - shared across threads?

Indent after numbers is not consistent with reference commonmark implementation

The following text:

1. foo


     this should not be code

Renders the last line as code, rather than text. I think this is a bug.

This does not match the reference commonmark implementation: http://spec.commonmark.org/dingus/

The majority of other implementations do not show this behaviour: http://johnmacfarlane.net/babelmark2/?normalize=1&text=1.+foo%0A%0A%0A++++bar

I believe the relevant part of the commonmark spec is 5.3 Lists, which says:

A list is a sequence of one or more list items of the same type. The list items may be separated by any number of blank lines.

Note that this was confusingly worded in earlier versions of the spec: https://talk.commonmark.org/t/multiple-blank-lines-inside-a-list/2289
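
The indentation arithmetic behind the expected result can be sketched as follows, assuming the blank lines do not end the list. Under the spec's rules, a later line belongs to the list item if it is indented at least as far as the item's content, and it is indented code only if it has four or more additional spaces. The helper names are hypothetical, and tabs and nested containers are ignored.

```java
final class ListIndentSketch {
    /** Number of leading spaces in a line (tabs are not handled in this sketch). */
    static int leadingSpaces(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    /**
     * Column where the content of an ordered list item starts: the digits, one
     * delimiter character, then the spaces after the marker. Assumes the marker
     * starts at column 0 and is followed by 1 to 4 spaces.
     */
    static int contentOffset(String firstLine) {
        int i = 0;
        while (i < firstLine.length() && Character.isDigit(firstLine.charAt(i))) {
            i++;
        }
        i++; // the '.' or ')' delimiter
        while (i < firstLine.length() && firstLine.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    static String classify(String firstLine, String laterLine) {
        int offset = contentOffset(firstLine);
        int indent = leadingSpaces(laterLine);
        if (indent < offset) {
            return "outside the list item";
        }
        return indent - offset >= 4 ? "indented code inside the item" : "paragraph inside the item";
    }

    public static void main(String[] args) {
        // "1. foo" has content offset 3; five spaces leave a relative indent of 2.
        System.out.println(classify("1. foo", "     this should not be code")); // paragraph inside the item
    }
}
```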

Table borders

Is there a way to inject a css style into generated tables or to turn on borders? I'd love a bootstrap styled .table kind of thing

Update to CommonMark spec 0.27

Changelog: http://spec.commonmark.org/changelog.txt

Diff: http://spec.commonmark.org/0.27/changes.html

TODO:

Add h2..h6 to block tag list
Check link precedence (see commonmark/commonmark-spec#427)

Allow InputStream or Reader as Parser input

Currently, the only input type is String. In case the input is read from a stream, this means it has to be read into a String first, and then the parser has another copy of the data as block content.

Accepting an InputStream (or Reader?) instead would make it possible to drop one of the copies. DocumentParser already processes the input line by line, so this should be trivial.
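As a rough illustration of line-by-line consumption from a Reader, here is a minimal standalone sketch. `LineFeeder` and `feedLines` are hypothetical names and this is not DocumentParser's code; it only shows that a line-oriented consumer needs to hold one line at a time rather than the whole input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class LineFeeder {
    /**
     * Passes each line of the input to lineHandler and returns the number of lines.
     * BufferedReader.readLine treats "\n", "\r" and "\r\n" as line terminators.
     */
    static int feedLines(Reader input, Consumer<String> lineHandler) {
        BufferedReader reader = input instanceof BufferedReader
                ? (BufferedReader) input
                : new BufferedReader(input);
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineHandler.accept(line);
                count++;
            }
        } catch (IOException e) {
            // Sketch only: surface I/O problems as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

An InputStream could be supported by wrapping it in an InputStreamReader with an explicit charset before calling such a method.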

Allow custom implementations of InlineParsers

Hi there!

Continuing with the theme of extensibility I'd like to be able to specify the inline parser used to parse markdown.

In my particular use case I'd like to disable a few features of markdown (e.g., inline images) and add a few of my own (e.g., @name will be recognized and parsed as something specific).

Proposal:
Add a method to Builder, public Builder inlineParser(InlineParser parser) that allows passing an implementation of InlineParser to Parser.

e.g.,

Parser.builder().inlineParser(new AtMentionParser()).build()

Table rendering is failing

I'm using commonmark-ext-gfm-tables:0.7.0 with the following input, and it produces nothing.


| Module    |Javadocs   |
| ------    |--------   |
| gradle-fury-validation    | [Javadocs](javadocs/gradle-fury-validation/index.html)    |
| hello-gradhell    | [Javadocs](javadocs/hello-gradhell/index.html)    |
| hello-universe-lib    | [Javadocs](javadocs/hello-universe-lib/index.html)    |
| hello-world-aar   | [debug](javadocs/hello-world-aar/debug/index.html)     |
| hello-world-aar   | [release](javadocs/hello-world-aar/release/index.html)     |
| hello-world-lib   | [Javadocs](javadocs/hello-world-lib/index.html)   |
| hello-world-war   | [Javadocs](javadocs/hello-world-war/index.html)   |

I've tried adding more whitespace, removing the links, removing the leading and trailing pipes. What am I doing wrong?

            List<org.commonmark.Extension> extensions = new ArrayList<>();
            extensions.add(org.commonmark.ext.gfm.tables.TablesExtension.create());
            extensions.add(org.commonmark.ext.gfm.strikethrough.StrikethroughExtension.create());
            extensions.add(org.commonmark.ext.autolink.AutolinkExtension.create());
            Parser parser = Parser.builder().extensions(extensions).build();

            Node document = parser.parse(contents);
            HtmlRenderer renderer = HtmlRenderer.builder().build();
            String contents = renderer.render(document);  // document is the content above

Add source position/maps to AST

Having source positions is a useful feature for editors, as it allows linking blocks between the source and the rendered output. commonmark.js supports it, see highlighted blocks in preview of dingus.

There is some code for adding source positions to blocks, but it's untested and not currently exposed.

Disable autoconverting HTML entity references

As the title says, is there a way to disable autoconverting HTML entity references, e.g. &alpha; to α? I am saving the converted HTML in a DB which is Windows-1252 encoded (can't be changed right now), so I'm having some problems.
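One possible workaround outside the library is to post-process the rendered HTML and replace every character the target charset cannot encode with a numeric character reference. The sketch below is an assumption about what would help here, not a library feature; `NumericEntityEscaper` and `escapeUnencodable` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class NumericEntityEscaper {
    /** Replaces code points that the target charset cannot encode with &#NNN; references. */
    static String escapeUnencodable(String html, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(html.length());
        int i = 0;
        while (i < html.length()) {
            int codePoint = html.codePointAt(i);
            int length = Character.charCount(codePoint);
            CharSequence piece = html.subSequence(i, i + length);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                // Decimal numeric character reference, e.g. U+03B1 becomes &#945;
                out.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return out.toString();
    }
}
```

For the database described above, the charset would be obtained with `Charset.forName("windows-1252")`, assuming the runtime provides it.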

Add support for asymmetric delimiters

org.commonmark.parser.DelimiterProcessor only supports inline elements with symmetric delimiters, like * and _ (or ~~ for strikethrough). I'd like to write an extension for inline elements with asymmetric delimiters (in my case, {}), but this is not possible with the current design.

I have considered using a org.commonmark.parser.PostProcessor, but I'd like my extension to be parsed with the same precedence as other inline elements.

Allow rendering to OutputStream/Writer

Similar to #2, rendering to a stream should be possible. Also fairly simple to implement.
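A minimal sketch of what rendering to a stream could look like, using `java.lang.Appendable` as the output abstraction. The class and method names are hypothetical, and a list of paragraph strings stands in for the document tree; this is not the library's renderer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

final class AppendableRendering {
    /** Writes one <p> element per paragraph directly to the output, without buffering the result. */
    static void render(List<String> paragraphs, Appendable output) {
        try {
            for (String paragraph : paragraphs) {
                // Sketch only: the text is assumed to be HTML-escaped already.
                output.append("<p>").append(paragraph).append("</p>\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience overload for callers that still want a String. */
    static String render(List<String> paragraphs) {
        StringBuilder builder = new StringBuilder();
        render(paragraphs, builder);
        return builder.toString();
    }
}
```

Since Writer implements Appendable, an OutputStream can be targeted by wrapping it in an OutputStreamWriter with an explicit charset and flushing afterwards.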

generating a pdf

Hi,

I'm trying to generate a PDF with this project by creating a custom org.commonmark.renderer.Renderer. The problem is that the render method returns a String, which in my case is a design flaw and not needed. IMHO it would be better if it wrote to an OutputStream or returned a byte[]. Do you have any suggestions?

Update to CommonMark spec 0.22

Make the following succeed:

./etc/update-spec.sh 0.22 && mvn clean test

See changes here: http://spec.commonmark.org/0.22/changes.html

I don't think we handle CR (without following LF) correctly ATM
Note changed conditions of HTML blocks

Unable to disambiguate emphasis delimiter

AST nodes do not provide start or end indices into the source, and Emphasis.java and StrongEmphasis.java do not provide delimiter information. Therefore, for the input:

Hello *Italic* **Bold** _Emph_ __Strong__ ~~Strike~~!

It does not currently seem possible to parse this, build an AST, and emit the same markdown back out, as the metadata about the delimiter character is lost.

Inline delimiter parser can not be registered more than once, delimiter character: ~

I'm trying to implement a separate extension for subscript like H~2~0, but I can't use tilde in conjunction with the existing ext-gfm-strikethrough extension, as they clash on parser registration. A workaround seems to be to create a new extension that bundles both, something like ext-subscript-and-gfm-strikethrough, that works similarly to the EmphasisDelimiterProcessor. This is somewhat ugly, but perhaps reasonable since this should be a fairly rare occasion.

Thoughts?

Raw inline HTML incorrectly processed

Raw HTML is not handled according to the spec. Example markdown:

test raw html:
<a><bab><c2c>

Actual output from commonmark-java:

<p>test raw html:</p>
<p>&lt;a&gt;&lt;bab&gt;&lt;c2c&gt;</p>

Expected output:

<p>test raw html:</p>
<a><bab><c2c>

See http://spec.commonmark.org/0.22/#example-559

An option to add automatic IDs to headings

GitHub flavoured markdown adds automatic IDs to headings, and it would be great to see this as an option here (disclaimer: I'm hoping to see the feature in Stash/Bitbucket server).

This is not in commonmark (and won't be added) though some implementations even do this by default.

In the case of GitHub, a heading #foo gets an ID user-generated-foo to avoid clobbering IDs used elsewhere. There's then some JS that means that you can use #foo in your URL instead of #user-generated-foo.

This would allow us to use tools like markdown-toc or doctoc on Bitbucket server.
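To make the request concrete, here is a small sketch of generating prefixed, unique heading IDs. The slug rules (lowercase, drop punctuation, hyphenate whitespace, numeric suffix for duplicates) are assumptions chosen for illustration and are not claimed to match GitHub's algorithm; `HeadingIdGenerator` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class HeadingIdGenerator {
    private final String prefix;
    private final Map<String, Integer> seen = new HashMap<>();

    HeadingIdGenerator(String prefix) {
        this.prefix = prefix;
    }

    /** Returns a prefixed ID for the heading text; repeated headings get -1, -2, ... appended. */
    String idFor(String headingText) {
        String slug = headingText.trim().toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s-]", "") // drop punctuation and symbols
                .replaceAll("\\s+", "-");               // whitespace runs become hyphens
        if (slug.isEmpty()) {
            slug = "id";
        }
        int occurrences = seen.merge(slug, 1, Integer::sum);
        String unique = occurrences == 1 ? slug : slug + "-" + (occurrences - 1);
        return prefix + unique;
    }
}
```

One generator instance would be used per document so that duplicate headings within the same page receive distinct IDs.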

Update to CommonMark spec 0.24

From the diff between 0.22 and 0.23:

header -> heading
horizontal rule -> thematic break
HtmlTag -> HtmlInline (to follow commonmark.js)
ATX heading: must be space (check regex)
Entity or numeric character references in raw HTML
No more optional whitespace after link label

From the diff between 0.23 and 0.24:

Update spec example parsing
Headings with multiple lines
Parentheses inside the link destination may be escaped
No spaces in link destination (even with <>)
Link scheme whitelist removed (link)

Update to CommonMark spec 0.26

Changes (from here):

empty list items can no longer interrupt a paragraph; this resolves an ambiguity with setext headers
ordered lists can interrupt a paragraph only when beginning with 1
the two-blank-lines-breaks-out-of-lists rule has been removed
the spec for emphasis and strong emphasis has been refined to give more intuitive results in some cases
tabs can be used after the # in an ATX header and between the markers in a thematic break

Spec changelog: http://spec.commonmark.org/changelog.txt
Spec diff: http://spec.commonmark.org/0.26/changes.html

Check GFM extensions against new spec

GitHub just posted on their blog that GitHub-flavored Markdown is now CommonMark + extensions and has a spec: https://githubengineering.com/a-formal-spec-for-github-markdown/

Spec lives here: https://github.github.com/gfm/

Check our implementation of tables, strikethrough (and maybe autolinking) against the spec.

Document NodeRenderer to customize HTML rendering in README

It's too hidden in the Javadoc. We should have a section in the README showing how to use it, and maybe link to the Javadocs for HtmlRenderer.Builder.

Can we have a 0.4.1 release?

I'm currently working on an app that deals with small texts and I figured it would be nice if it had basic CommonMark support.
I added compile 'com.atlassian.commonmark:commonmark:0.4.0' to my dependencies and added the Parser, but building and running the code would throw an exception:

java.util.regex.PatternSyntaxException: U_ILLEGAL_ARGUMENT_ERROR
^\p{IsWhite_Space}

I saw that this issue was fixed (b954f82) right after the 0.4.0 release, so a new version soon would be great! 👍
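For readers hitting the same exception on a runtime that rejects `\p{IsWhite_Space}`, one regex-free alternative is to test code points against the Unicode White_Space list directly. This is a sketch of that idea and not necessarily what the referenced commit does; `UnicodeWhitespace` is a hypothetical helper.

```java
final class UnicodeWhitespace {
    /** True for code points that have the Unicode White_Space property. */
    static boolean isWhiteSpace(int codePoint) {
        switch (codePoint) {
            case 0x0020: case 0x0085: case 0x00A0: case 0x1680:
            case 0x2028: case 0x2029: case 0x202F: case 0x205F: case 0x3000:
                return true;
            default:
                // U+0009..U+000D (tab, LF, VT, FF, CR) and U+2000..U+200A (typographic spaces)
                return (codePoint >= 0x0009 && codePoint <= 0x000D)
                        || (codePoint >= 0x2000 && codePoint <= 0x200A);
        }
    }

    /** Stand-in for matching ^\p{IsWhite_Space} at the start of a sequence. */
    static boolean startsWithWhiteSpace(CharSequence text) {
        return text.length() > 0 && isWhiteSpace(Character.codePointAt(text, 0));
    }
}
```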

StringIndexOutOfBoundsException parsing list followed by unfenced code block

This Markdown fragment throws StringIndexOutOfBoundsException from DocumentParser:

    String markdown = "## Do this\n" +
        "- cd to foo\n" +
        "\n" +
        "\tgit clone https://...\n" +
        "\n" +
        "## Next\n";

java.lang.StringIndexOutOfBoundsException: String index out of range: 8
    at org.commonmark.internal.util.Substring.subSequence(Substring.java:50)
    at org.commonmark.internal.DocumentParser.addLine(DocumentParser.java:330)
    at org.commonmark.internal.DocumentParser.incorporateLine(DocumentParser.java:245)
    at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:74)
    at org.commonmark.parser.Parser.parse(Parser.java:61)

I think the parser is confused exiting the unfenced code block and leaving columnIsInTab=true, resulting in afterTab > line.length() and the parser exploding:

            // Our column is in a partially consumed tab. Expand the remaining columns (to the next tab stop) to spaces.
            int afterTab = index + 1;
            CharSequence rest = line.subSequence(afterTab, line.length());

Support for conditional delimiter processing

Thinking about superscript and subscript: in pandoc, the following is valid 2^10^, but ^a cat^ is not interpreted as superscript, unless one escapes the space (^a\ cat^).

How would the current delimiter parsing scheme handle this?
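The condition described above can be stated as a small predicate: the delimited content qualifies only if it contains no unescaped whitespace. The sketch below shows just that check in isolation; `hasNoUnescapedWhitespace` is a hypothetical helper, and how it would plug into delimiter processing is left open.

```java
final class SuperscriptRule {
    /** True if every whitespace character in the content is preceded by a backslash. */
    static boolean hasNoUnescapedWhitespace(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '\\' && i + 1 < content.length()) {
                i++; // skip the escaped character, whatever it is
                continue;
            }
            if (Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }
}
```

With this rule, the content of 2^10^ and ^a\ cat^ would be accepted, while ^a cat^ would be left as literal text.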

commonmark-ext-gfm-tables single column table

I would expect to be able to render a single-column table. The following code does not render a table.

    final String input = "| First Header |\n" +
            "| ------------- |\n" +
            "| Content Cell |\n" +
            "| Content Cell |\n";
    List<Extension> extensions = Arrays.asList(TablesExtension.create());
    Parser parser = Parser.builder().extensions(extensions).build();
    Node document = parser.parse(input);
    HtmlRenderer renderer = HtmlRenderer.builder().extensions(extensions).build();
    System.out.println(renderer.render(document));

Escaping of pipe symbol in gfm-tables-0.4.1 is not supported

GFM tables allow escaping of the pipe symbol in table cells. For example:

AAA	BBB
a	b

This is not supported in gfm-tables-0.4.1.

Strike Literal value inconsistent

I'm just exploring this library a bit, will post issues as I see stuff. Sorry if too trivial. For the string:

Hello *Italic* **Bold** _Emph_ __Strong__ ~~Strike~~!

The printed AST is:

Document{}
.Paragraph{}
..Text{literal=Hello }
..Emphasis{}
...Text{literal=Italic}
..Text{literal= }
..StrongEmphasis{}
...Text{literal=Bold}
..Text{literal= }
..Emphasis{}
...Text{literal=Emph}
..Text{literal= }
..StrongEmphasis{}
...Text{literal=Strong}
..Text{literal= ~~Strike~~!}

Preservation of the double-tilde in the literal attribute appears inconsistent with the other node implementations. Not sure if this has any side-effect, but I thought I'd point it out.

Using A Visitor

in your visitor example:

Node node = parser.parse("...");
MyVisitor visitor = new MyVisitor();
node.accept(visitor);

class MyVisitor extends AbstractVisitor {
    @Override
    public void visit(Paragraph paragraph) {
        // Do something with paragraph (override other methods for other nodes):
        System.out.println(paragraph);
        // Descend into children:
        visitChildren(paragraph);
    }
}

What should be printed to System.out? I tried a few things, and when visiting both headers and paragraphs all it prints is the type followed by curly braces, like this:

INFO: Visit:(p)- Paragraph{}

I thought maybe it would print out the contents of the node? How do I get to the content of the node?

AttributeProvider not working for TablesExtension and images

As documented in the source code (TableHtmlRenderer#renderBlock()):

// TODO: What about attributes? If we got the renderer instead of the visitor, we could call getAttributes.

To be able to set custom attributes would be very handy especially when working with Tables!

More generally: IMO AttributeProvider should work for all tags, including images.

Github wiki style markdown issue

As an experiment, I took the wiki from http://github.com/osmdroid/osmdroid, cloned it, ran it through commonmark, and ran into a few issues.

1. It appears that relative links to another wiki page aren't resolved:

2. Walk through the [tutorial](How-to-use-the-osmdroid-library)

The target page is generated but the extension .html is missing. Not sure if there's anything that can be done to work around this.
2. The double bracket syntax doesn't appear to be handled at all:

[[osmdroid thirdparty|osmdroid thirdparty]]

If necessary, I can go back and alter all of the wiki pages to use full URLs, but as always, I'll look for a less painful solution.
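For the first point, a possible approach is to rewrite link destinations before or during rendering so that relative wiki page names gain an .html suffix. The helper below sketches only the rewriting rule; `WikiLinks` and `toHtmlPage` are hypothetical names, and the heuristics (skip absolute URLs, anchors and paths that already have an extension) are assumptions.

```java
final class WikiLinks {
    /** Appends ".html" to relative destinations whose last path segment has no extension. */
    static String toHtmlPage(String destination) {
        if (destination.isEmpty() || destination.startsWith("#") || destination.startsWith("/")) {
            return destination;
        }
        if (destination.contains("://") || destination.startsWith("mailto:")) {
            return destination; // absolute URL, leave untouched
        }
        int hash = destination.indexOf('#');
        String path = hash >= 0 ? destination.substring(0, hash) : destination;
        String fragment = hash >= 0 ? destination.substring(hash) : "";
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        if (lastSegment.isEmpty() || lastSegment.contains(".")) {
            return destination; // directory-style link or already has an extension
        }
        return path + ".html" + fragment;
    }
}
```

Applied to the example above, How-to-use-the-osmdroid-library would become How-to-use-the-osmdroid-library.html.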

StringIndexOutOfBoundsException on empty ordered list input.

Using com.atlassian.commonmark:commonmark:0.1.0:

import static org.junit.Assert.assertEquals;

import org.commonmark.html.HtmlRenderer;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class MarkdownTest {
  @Test
  public void shouldWork() {
    Parser parser = Parser.builder().build();
    Node root = parser.parse("2.");

    HtmlRenderer renderer = HtmlRenderer.builder().escapeHtml(false).build();
    String html = renderer.render(root);

    assertEquals(
        "<ol start=\"2\">\n" +
            "<li></li>\n" +
            "</ol>",
        html);
  }
}

The test above fails with:

java.lang.StringIndexOutOfBoundsException: String index out of range: 2
  at java.lang.String.charAt(String.java:658)
  at org.commonmark.internal.util.Substring.charAt(Substring.java:32)
  at java.lang.Character.codePointAt(Character.java:4668)
  at org.commonmark.internal.util.Parsing.isLetter(Parsing.java:55)
  at org.commonmark.internal.DocumentParser.incorporateLine(DocumentParser.java:194)
  at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:83)
  at org.commonmark.parser.Parser.parse(Parser.java:45)
  at MarkdownTest.shouldWork(MarkdownTest.java:32)

For reference, here's the output for commonmark.js: http://spec.commonmark.org/dingus/?text=2.

Edit: Fixed rendering of dingus link.

Android support

There's currently a fork with changes for making commonmark-java work on Android. This issue is about merging those changes back, and making sure we don't break things for Android in the future.

Initial discussion here: https://github.com/Doist/commonmark-android/commit/9ff69424f603c6b8c0ddb6419419d651c0de7380#commitcomment-15439453

Add underline support

Hello,

Could we add an UnderlineExtension in a new artifact, commonmark-ext-gfm-underline? It would be a sibling of the existing StrikethroughExtension in the commonmark-ext-gfm-strikethrough artifact, with a few small changes to the new class and the classes related to it.

If you agree with this proposal, I could submit a pull request. I just need to know which delimiter would be the most appropriate.
I think we should use a character that is easy to type on both QWERTY and AZERTY keyboards (and ideally others), doubled. Here are some proposals: -- or && or %%.

Thank you in advance for your feedback.

Html blocks not wrapped in <p/> when escapeHtml set to true

I'm using the Parser/HtmlRenderer combo with escapeHtml(true), but this leaves html blocks without being wrapped in a paragraph tag. Please see the following example. It seems like incorrect behaviour to me.

Parsing code

public String render(String text) {
    Parser parser = Parser.builder().build();
    HtmlRenderer renderer = HtmlRenderer.builder().escapeHtml(true).build();
    Node document = parser.parse(text);
    return renderer.render(document);
}

Input

This is a paragraph.

This is a paragraph.

<div>html here</div>

This is a paragraph.

This is a paragraph.

<strong>html here</strong>

This is a paragraph.

This is a paragraph.

<p>html here</p>

This is a paragraph.

Expected output

<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;div&gt;html here&lt;/div&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;strong&gt;html here&lt;/strong&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;p&gt;html here&lt;/p&gt;</p>
<p>This is a paragraph.</p>

Actual output

<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
&lt;div&gt;html here&lt;/div&gt;
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>&lt;strong&gt;html here&lt;/strong&gt;</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
&lt;p&gt;html here&lt;/p&gt;
<p>This is a paragraph.</p>

Note that the line <strong>html here</strong> actually is wrapped, but the other two aren't.

StringIndexOutOfBoundsException in InlineParserImpl

Attempting to parse a malformed string such as [example.com](http:\\example.com leads to the following exception:

java.lang.StringIndexOutOfBoundsException: length=32; index=32
            at java.lang.String.charAt(Native Method)
            at org.commonmark.internal.InlineParserImpl.parseCloseBracket(InlineParserImpl.java:579)
            at org.commonmark.internal.InlineParserImpl.parseInline(InlineParserImpl.java:303)
            at org.commonmark.internal.InlineParserImpl.parse(InlineParserImpl.java:157)
            at org.commonmark.internal.ParagraphParser.parseInlines(ParagraphParser.java:61)
            at org.commonmark.internal.DocumentParser.processInlines(DocumentParser.java:349)
            at org.commonmark.internal.DocumentParser.finalizeAndProcess(DocumentParser.java:495)
            at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:84)
            at org.commonmark.parser.Parser.parse(Parser.java:61)
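The trace (length=32; index=32) suggests a charAt call one position past the end of the input. A common way to avoid this class of error in hand-written scanners is a bounds-checked peek that returns a sentinel at end of input. The sketch below illustrates the pattern with hypothetical names; it is not the code of InlineParserImpl.

```java
final class SafeScanner {
    private final String input;
    private int index;

    SafeScanner(String input) {
        this.input = input;
    }

    /** Returns the current character, or '\0' once the end of the input is reached. */
    char peek() {
        return index < input.length() ? input.charAt(index) : '\0';
    }

    /** Moves forward by one character, never past the end. */
    void advance() {
        if (index < input.length()) {
            index++;
        }
    }

    /** Skips to just after the next ')' and reports whether one was found. */
    boolean skipPastClosingParen() {
        // Sketch only: a literal NUL character in the input would also stop the scan.
        while (peek() != '\0') {
            char c = peek();
            advance();
            if (c == ')') {
                return true;
            }
        }
        return false;
    }
}
```

With an unterminated destination like the one above, such a scan reports false instead of throwing, so the caller can fall back to treating the bracket as literal text.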