
markdown-parser's Introduction

About

This is a DIY Markdown parser written from scratch in JavaScript!

Here's what it's able to parse:

  • Headings and heading IDs
  • Paragraphs
  • Italic
  • Bold
  • Inline code
  • Links
  • Images
  • Code blocks
  • Blockquotes
  • Horizontal breaks
  • Unordered, ordered, and task lists (but not nested ones, currently)

Here's what hasn't been tackled so far:

  • Nested lists
  • Tables
  • Footnotes
  • Definition lists
  • Highlight

In addition, a basic frontend with localStorage has been implemented, allowing you to create, (auto) save, rename, and delete Markdown files. For a full-stack version, check out Markright - a tool I built with this custom Markdown parser that allows me to write notes and organize them by folder on the go.

Getting started

Clone this repository with git clone, then run npm i to install Prettier and http-server (the only dependencies other than TypeScript). After that, run npm start to start a local HTTP server (which uses http-server).

You can also see the demo at https://jianmin-chen.github.io/markdown-parser.

Thoughts

Disclaimer: This isn't perfect. I wrote it with little outside help, for the sake of learning, which isn't how most programmers work. We've probably all built on some sort of foundation before, maybe by reading a book or watching a video to understand the core underlying concepts. Before creating this, I did only a bit of that, skimming a few resources for the core terminology and nothing else.

I built this as part of a ten-day challenge. Read my blog post to see my thoughts and process on building this.

How this works

We have what's known as a parser. It transforms the Markdown into (mostly safe) HTML. The main function responsible for this is parseMarkdown, which takes the Markdown and splits it into "blocks" (read: at each newline). It also groups related lines together. For example, it collects the list items of an unordered list into a single block.
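To make the grouping step concrete, here is a minimal sketch of how lines could be collected into blocks. It is not the project's actual parseMarkdown: the names groupBlocks and isUnorderedItem and the { type, lines } block shape are assumptions made for illustration, and only unordered lists are grouped.

```javascript
// Hypothetical sketch, not the repository's parseMarkdown.
// A line counts as an unordered list item if it starts with "-", "*", or "+"
// followed by a space (a simplification of real Markdown rules).
const isUnorderedItem = line => /^[-*+] /.test(line);

const groupBlocks = markdown => {
    const blocks = [];
    for (const line of markdown.split("\n")) {
        if (line.trim() === "") continue; // blank lines carry no content here
        const previous = blocks[blocks.length - 1];
        if (isUnorderedItem(line) && previous && previous.type === "ul") {
            // Consecutive list items join the block that is already open.
            previous.lines.push(line);
        } else if (isUnorderedItem(line)) {
            blocks.push({ type: "ul", lines: [line] });
        } else {
            // Everything else stays a single-line block in this sketch.
            blocks.push({ type: "line", lines: [line] });
        }
    }
    return blocks;
};
```

Calling groupBlocks("# Title\n- a\n- b\ntext") gives three blocks: the heading line, one "ul" block holding both list items, and the trailing text line.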

After this organization is done, we transform the resulting blocks into "tokens". Tokens are just an intermediate representation of the final HTML. For example, here's what an unordered list might look like as a token:

{
    type: "ul",
    content: [
        { type: "li", content: "Item #1" },
        { type: "li", content: "Item #2" }
    ]
}

This is super easy to transform into HTML, as we'll see later. For now, let's continue with token generation. Basically, we loop through the blocks we've just organized, passing each one into a function called processBlock. This function takes the block and does some initial token generation. For example, if it receives a heading (e.g., # This is a heading *one*), it strips off the leading # and marks the block as a heading token. Then, it passes the rest of the content into splitBlock, which is responsible for recursive token generation. Staying with the heading example, you'll notice that "one" is in italics. In this case, splitBlock generates a token for it. The final token would look like:

{
    type: "heading",
    content: [
        "This is a heading ",
        { type: "i", content: "one" }
    ]
}
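A rough sketch of how a heading line could end up as the token above. The names processHeading and splitInline are hypothetical stand-ins for processBlock and splitBlock, and this version only recognizes *italic* spans and does not recurse, so treat it as an illustration of the idea rather than the project's code.

```javascript
// Hypothetical stand-in for splitBlock: splits text into plain strings
// and italic tokens. Only *italic* is recognized in this sketch.
const splitInline = text => {
    const tokens = [];
    const italic = /\*([^*]+)\*/g;
    let last = 0;
    for (const match of text.matchAll(italic)) {
        // Keep any plain text that sits before the italic span.
        if (match.index > last) tokens.push(text.slice(last, match.index));
        tokens.push({ type: "i", content: match[1] });
        last = match.index + match[0].length;
    }
    // Keep any plain text left after the final italic span.
    if (last < text.length) tokens.push(text.slice(last));
    return tokens;
};

// Hypothetical stand-in for the heading branch of processBlock:
// strip the leading #s, then tokenize the remaining content.
const processHeading = line => {
    const match = /^(#{1,6}) +(.*)$/.exec(line);
    if (!match) return null; // not a heading
    return { type: "heading", content: splitInline(match[2]) };
};
```

With these definitions, processHeading("# This is a heading *one*") produces the same shape as the heading token shown above.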

This provides a couple of benefits:

  • It's readable. You can actually see how the structure of the generated HTML will look, which is super useful when debugging.
  • When you transform it into HTML, you can generate escaped HTML, which ensures that the generated HTML is safe.
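The escaping mentioned in the second point can be sketched as below. escapeHtml is a hypothetical helper name, not necessarily what the project uses; the idea is that every plain string inside a token goes through it before being written into the output, so user-typed markup shows up as text instead of being interpreted by the browser.

```javascript
// Hypothetical helper: replace the characters that have special meaning in HTML.
// "&" must be replaced first so the entities added afterwards are not escaped again.
const escapeHtml = text =>
    String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");

// A list item whose content contains raw HTML typed by the user.
const token = { type: "li", content: "<b>Item #1</b>" };
const safeContent = escapeHtml(token.content); // "&lt;b&gt;Item #1&lt;/b&gt;"
```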

Let's get into transforming these tokens into HTML. Now that the program has an array of these tokens, it passes them into a function called parse (confusing name, I'll admit), which creates an HTML tag based on the token. The creation of tags is done by a very simple function, tag:

// Build a plain-object representation of an HTML tag. For example,
// tag("a", ["Play Go!"], { href: "http://www.gokgs.com" }) returns
// { name: "a", content: ["Play Go!"], attributes: { href: "http://www.gokgs.com" } }.
const tag = (name, content = [], attributes = {}) => {
    // content can itself contain tags, so the structure nests recursively.
    return { name, content, attributes };
};

This only provides a representation of HTML tags, but it is so, so useful. We can describe recursive HTML structures with it very easily. For example, task lists can be described by the following representation:

tag("li", [
    tag("input", -1, {
        disabled: "",
        type: "checkbox",
        ...(token.attributes.checked && { checked: "" })
    }),
    tag("span", parse(token.content))
]);

Here, a task item is represented by a list item. The list item has two tags inside. The first is an input tag, which has no content (hence the -1 to represent no content) but a couple of attributes: it is disabled, is a checkbox, and is marked as checked if the Markdown has it marked as checked (i.e., - [X]). The second is a span tag, which does have content - the task item - but doesn't have any attributes. The content inside the span tag also gets passed to parse, which forms the basis of a recursive HTML structure.
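For the simpler tokens shown earlier, the token-to-tag step might look like the sketch below. toTags is a hypothetical stand-in for parse, and the TAG_NAMES table is an assumption (for instance, every heading becomes an h1 here because the example token carries no level).

```javascript
const tag = (name, content = [], attributes = {}) => ({ name, content, attributes });

// Assumed mapping from token types to HTML tag names.
const TAG_NAMES = { heading: "h1", i: "em", ul: "ul", li: "li" };

// Hypothetical stand-in for parse: turns tokens into tag objects, recursing
// into nested content. Accepts a single token, a string, or an array of either.
const toTags = tokens => {
    const list = Array.isArray(tokens) ? tokens : [tokens];
    return list.map(token =>
        typeof token === "string"
            ? token // plain text passes through unchanged
            : tag(TAG_NAMES[token.type] || "span", toTags(token.content))
    );
};
```

Feeding it the unordered-list token from earlier yields a ul tag whose content is two li tags, each wrapping its text in a one-element array.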

From here, it's pretty straightforward to transform this representation of HTML into actual HTML.
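As an illustration of that last step, here is a minimal renderer for the tag representation. render and escapeHtml are hypothetical names, and the sketch assumes content is either -1 (no content, no closing tag) or an array of strings and tags, and that an attribute with an empty string value is a boolean attribute.

```javascript
const tag = (name, content = [], attributes = {}) => ({ name, content, attributes });

// Escape characters with special meaning in HTML (hypothetical helper).
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
const escapeHtml = text => String(text).replace(/[&<>"']/g, ch => ENTITIES[ch]);

// Recursively turn a tag object (or plain string) into an HTML string.
const render = node => {
    if (typeof node === "string") return escapeHtml(node);
    const attributes = Object.entries(node.attributes)
        .map(([key, value]) =>
            // Empty values are written as bare boolean attributes, e.g. "disabled".
            value === "" ? ` ${key}` : ` ${key}="${escapeHtml(value)}"`
        )
        .join("");
    // -1 marks a tag with no content, so no closing tag is emitted.
    if (node.content === -1) return `<${node.name}${attributes}>`;
    return `<${node.name}${attributes}>${node.content.map(render).join("")}</${node.name}>`;
};

const html = render(
    tag("li", [
        tag("input", -1, { disabled: "", type: "checkbox", checked: "" }),
        tag("span", ["Buy <milk>"])
    ])
);
// html is '<li><input disabled type="checkbox" checked><span>Buy &lt;milk&gt;</span></li>'
```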

And there you have it! A very basic, but functional Markdown parser.

markdown-parser's People

Contributors

jianmin-chen

markdown-parser's Issues

Rewrite using TypeScript

Rewrite using TypeScript - keep the original JavaScript in its original location, but write the TypeScript in a src folder, possibly compiled to JavaScript in a dist folder.
