Dropbox Paper to JSON

A Node module that imports the content of a Dropbox Paper document and converts it into a JSON data structure.

Setup

1. Get a Dropbox access token

Create a Dropbox app and generate an access token for it.

2. Get the Dropbox Paper document id

For example, if the URL of your Dropbox Paper document looks like this:

https://paper.dropbox.com/doc/Main-Title-vJdrjMJAHdgfHz0rl83Z

then the document id is everything that follows the last - in the URL.

In this fictitious example it would be: vJdrjMJAHdgfHz0rl83Z.
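The rule above can be sketched in a few lines of JavaScript. The helper name extractDocId is hypothetical and not part of this module. It assumes the id itself never contains a -, which holds for the example URL but is not guaranteed here.

```javascript
// Hypothetical helper (not part of dropbox-paper-to-json): returns the text
// after the last "-" in the final path segment of a Dropbox Paper URL.
function extractDocId(paperUrl) {
  const { pathname } = new URL(paperUrl);
  // Last non-empty path segment, e.g. "Main-Title-vJdrjMJAHdgfHz0rl83Z"
  const lastSegment = pathname.split('/').filter(Boolean).pop() || '';
  const dashIndex = lastSegment.lastIndexOf('-');
  // If there is no "-", fall back to the whole segment.
  return dashIndex === -1 ? lastSegment : lastSegment.slice(dashIndex + 1);
}

console.log(extractDocId('https://paper.dropbox.com/doc/Main-Title-vJdrjMJAHdgfHz0rl83Z'));
// prints "vJdrjMJAHdgfHz0rl83Z"
```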

3. Add DROPBOX_ACCESS_TOKEN to .env

The project uses dotenv to handle credentials and environment variables.

In the root of the repo, create a .env file. It is excluded from the GitHub repo by .gitignore to avoid leaking credentials.

Here's an example of the .env file format, with some fictitious credentials:

# Dropbox credentials
DROPBOX_ACCESS_TOKEN=vJdrjMJAHdgfHz0rl83ZvJdrjMJAHdgfHz0rl83Z
DROPBOX_DOC_ID=vJdrjMJAHdgfHz0rl83Z
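Since a missing or misspelled variable in .env only shows up later as a failed request, it can help to check for both values up front. The following is a minimal sketch. The function name missingCredentials is hypothetical, and it assumes require('dotenv').config() has already been called when it is used with process.env.

```javascript
// Hypothetical helper: lists which of the two variables from the .env
// example above are absent or empty in the given environment object.
function missingCredentials(env) {
  const required = ['DROPBOX_ACCESS_TOKEN', 'DROPBOX_DOC_ID'];
  return required.filter((name) => !env[name]);
}

// Usage: call after require('dotenv').config() has populated process.env.
const missing = missingCredentials(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```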

Usage

In development

Clone this repo

git clone git@github.com:bbc/dropbox-paper-to-json.git

cd into the folder

cd dropbox-paper-to-json

npm install

npm start

This will save a data.json file in the root of the project.

In production

npm install

npm install dropbox-paper-to-json@git+ssh://git@github.com/bbc/dropbox-paper-to-json.git#master --save

Add to your code base

// If using dotenv to load the Dropbox Paper credentials from environment variables
require('dotenv').config();
// Optional: only needed if you want to write the resulting JSON to disk
const fs = require('fs');
// Require the module
const dbpMdToJson = require('dropbox-paper-to-json');

dbpMdToJson({
    accessToken: process.env.DROPBOX_ACCESS_TOKEN,
    dbp_doc_id: process.env.DROPBOX_DOC_ID,
    // nested defaults to true
    nested: true
}).then((data) => {
    console.log('done Dropbox Paper to JSON conversion');
    // Optional: now do something with the data
    fs.writeFileSync('./data.json', JSON.stringify(data, null, 2));
}).catch((error) => {
    // Report failures instead of leaving the promise rejection unhandled
    console.error('Dropbox Paper to JSON conversion failed:', error);
});

System Architecture

High level overview of system architecture

Downloading a Dropbox paper

The module uses the dpb-download-md node module to get a Dropbox Paper document as markdown, given a document id and an access token.

It is used because the official SDK didn't seem to offer a straightforward way to get the content of a Dropbox Paper document.

Converting markdown dropbox paper to "linear" json

The submodule md-to-json/linear.js takes the content of a markdown file as a string and converts it into an array of objects, each representing a markdown element.

It's a flat data structure with no nesting, which is why it is sometimes referred to as linear.

Example "linear json"

[
    {
      "text": "Chapter 1",
      "type": "h1"
    },
    {
      "text": "Text",
      "type": "h2"
    },
    {
      "text": "vitae elementum velit urna id mi. Sed sodales arcu mi, eu condimentum tellus ornare non. Aliquam non mauris purus. Cras a dignissim tellus. Cras pharetra, felis et convallis tristique, sapien augue interdum ipsum, aliquet rhoncus enim diam vitae eros. Cras ullamcorper, lectus id commodo volutpat, odio urna venenatis tellus, vitae vehicula sapien velit eu purus. Pellentesque a feugiat ex. Proin volutpat congue libero vitae malesuada.",
      "type": "p"
    },
    {
      "text": "Video ",
      "type": "h2"
    },
...
]
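To illustrate the shape of this conversion, here is a minimal sketch that turns a markdown string into the flat array shown above. It is not the code in md-to-json/linear.js. The function name mdToLinear is hypothetical, and the sketch only recognises h1, h2 and paragraphs, one element per non-empty line.

```javascript
// Sketch only: the real md-to-json/linear.js may work differently.
function mdToLinear(markdown) {
  return markdown
    .split('\n')
    .map((line) => line.trim())
    // Blank lines only separate elements, so they are dropped.
    .filter((line) => line.length > 0)
    .map((line) => {
      // Check "## " before "# " so an h2 is never mistaken for an h1.
      if (line.startsWith('## ')) return { text: line.slice(3), type: 'h2' };
      if (line.startsWith('# ')) return { text: line.slice(2), type: 'h1' };
      return { text: line, type: 'p' };
    });
}

console.log(mdToLinear('# Chapter 1\n\n## Text\n\nSome paragraph'));
```

Every element keeps its position in the document, so the array can be rendered in order or passed on to a nesting step.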

Converting linear markdown json to nested json

For some use cases it might be helpful to nest all the elements between one h1 tag and the next h1 tag as children (elements) of that first tag.

For example, an h1 tag could contain h2 tags, p tags, links etc.

Likewise, an h2 tag could contain all other elements up to the next h2 or h1 tag.

NOTE: the Dropbox Paper flavour of markdown only properly represents H1 and H2 tags, which is why the nesting stops at two levels for this use case. It could be nested further should there be a use case for it.

This is done in md-to-json/index.js

Example "nested json"

{
  "title": "TEST CMS",
  "elements": [
    {
      "text": "Chapter 1",
      "type": "h1",
      "elements": [
        {
          "text": "some text element between h1 and h2 tags",
          "type": "p"
        },
        {
          "text": "text",
          "type": "h2",
          "elements": [
            {
              "text": "vitae elementum velit urna id mi. Sed sodales arcu mi, eu condimentum tell.",
              "type": "p"
            }
          ]
        },
       ...
}

For full example see md-to-json/examples/example_output.json.
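The two-level nesting described in this section can be sketched as a single pass over the linear array. This is an illustration under stated assumptions, not the code in md-to-json/index.js: the function name nestLinear is hypothetical, and the sketch does not produce the top-level title field shown in the example.

```javascript
// Sketch only: nests a linear array under h1 and h2 headings.
function nestLinear(linear) {
  const root = { elements: [] };
  let currentH1 = null;
  let currentH2 = null;

  for (const item of linear) {
    const node = { ...item }; // copy so the input array is not mutated
    if (node.type === 'h1') {
      node.elements = [];
      root.elements.push(node);
      currentH1 = node;
      currentH2 = null; // a new h1 closes any open h2
    } else if (node.type === 'h2') {
      node.elements = [];
      // An h2 before any h1 is attached to the root.
      (currentH1 || root).elements.push(node);
      currentH2 = node;
    } else {
      // Other elements go under the innermost open heading.
      (currentH2 || currentH1 || root).elements.push(node);
    }
  }
  return root;
}

const nested = nestLinear([
  { text: 'Chapter 1', type: 'h1' },
  { text: 'some text element between h1 and h2 tags', type: 'p' },
  { text: 'text', type: 'h2' },
  { text: 'a paragraph under the h2', type: 'p' }
]);
console.log(JSON.stringify(nested, null, 2));
```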

Development env

How to run the development environment

Coding style convention ref optional, eg which linter to use

Linting, github pre-push hook - optional

Build

How to run build

NA ?

Tests

How to carry out tests

There is minimal test coverage, using jest. To run the tests:

npm test

Deployment

How to deploy the code/app into test/staging/production

NA, it's a node module.

Contributing

  • Pull requests are welcome.
  • For questions, bugs or ideas, feel free to raise a GitHub issue.

Notes: Dropbox "flavoured" markdown

Unfortunately, Dropbox Paper has its own flavour of markdown. Some of the most relevant and notable differences are:

  • The title of the doc and the first heading 1 element are both marked as h1 / #.
  • Heading 3 is represented as bold ** instead of h3/###.
  • There's no Heading 4, 5 or 6.
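The first two differences can be illustrated with a short sketch. Both helper names are hypothetical, and treating the first h1 as the document title is an assumption made for illustration, not a description of how the module decides it. The bold check is only a heuristic, since a paragraph that is entirely bold looks the same as a heading 3 in the exported markdown.

```javascript
// Assumption: the first h1 in the linear array is the document title.
function splitTitle(linear) {
  const [first, ...rest] = linear;
  if (first && first.type === 'h1') {
    return { title: first.text, elements: rest };
  }
  return { title: null, elements: linear };
}

// Heuristic: a line wrapped entirely in ** may be a Dropbox Paper heading 3.
// Returns the inner text, or null if the line is not fully bold.
function boldHeadingText(line) {
  const match = /^\*\*(.+)\*\*$/.exec(line.trim());
  return match ? match[1] : null;
}

console.log(splitTitle([
  { text: 'TEST CMS', type: 'h1' },
  { text: 'Chapter 1', type: 'h1' }
]));
console.log(boldHeadingText('**Section**'));
```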

Example of dropbox flavour markdown

See md-to-json/examples/test.md for an example of a Dropbox flavoured markdown file.

Markdown elements not included in module

  • The H3 tag, since Dropbox Paper markdown represents it as bold **.
  • Parsing of GitHub flavoured markdown tags h3 to h6, as they are not generated by Dropbox Paper markdown.

Markdown elements that could be included in module

  • Parsing GitHub flavoured markdown tags for images, eg ![alt text](link url). These appear on their own line.

    • NOTE: luckily, even when images are displayed on the same line in Dropbox Paper, they are still represented on individual lines when exported as markdown. This makes them easier to identify as separate from other elements and to parse.
  • Parsing GitHub flavoured markdown tags for links, eg [text](link url). These generally appear as part of a paragraph, but could also appear on their own line, as part of a heading etc.
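As a rough idea of what parsing these two elements could look like, here is a regular-expression sketch. None of it exists in the module. The function names, the img type name and the returned object shapes are all hypothetical, and the patterns ignore edge cases such as link titles or nested brackets.

```javascript
// An image that occupies a whole line: ![alt text](url)
const IMAGE_LINE = /^!\[([^\]]*)\]\(([^)\s]+)\)$/;
// A link anywhere in a line: [text](url), skipping images via the (?<!!) lookbehind
const INLINE_LINK = /(?<!!)\[([^\]]+)\]\(([^)\s]+)\)/g;

// Returns an image element, or null if the line is not a standalone image.
function parseImageLine(line) {
  const match = IMAGE_LINE.exec(line.trim());
  return match ? { type: 'img', alt: match[1], url: match[2] } : null;
}

// Returns every link found in a paragraph or heading text.
function extractLinks(text) {
  return Array.from(text.matchAll(INLINE_LINK), (m) => ({ text: m[1], url: m[2] }));
}

console.log(parseImageLine('![alt text](https://example.com/a.png)'));
console.log(extractLinks('See [the docs](https://example.com/docs) and [more](https://example.com/more).'));
```

Because images sit on their own line, they can be matched against the whole line, while links need a global search inside the text of a paragraph or heading.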

Contributors

alvinsight, pietrop
