RESTful HTTP API for Pandoc

Pandoc is an amazing universal document converter. Unfortunately, it only offers a command-line interface. This project makes Pandoc usable via a RESTful HTTP API, provides a mapping of Pandoc type identifiers to common media types, and wraps everything in a Docker container so that it can be easily used and deployed.

API

The API only allows POST requests. The data to be converted must be passed in the request body. The header field Content-Type specifies the input type and the header field Accept specifies the output type.
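As a minimal sketch of the rules above, the following Node.js snippet assembles the parts of a conversion request. The helper name buildConversionRequest and the URL http://localhost:8080/ are assumptions made for illustration; they are not part of pandoc-http itself.

```javascript
// Hypothetical helper (not part of pandoc-http): collects the pieces of a
// conversion request according to the rules described above.
function buildConversionRequest(baseUrl, body, inputType, outputType) {
  return {
    url: baseUrl,
    options: {
      method: "POST", // the API only allows POST requests
      headers: {
        "Content-Type": inputType, // format of the data in the request body
        "Accept": outputType, // format the response should have
      },
      body, // the data to be converted
    },
  };
}

// Assumed URL: adjust host and port to wherever the API is running.
const request = buildConversionRequest(
  "http://localhost:8080/",
  "<h1>My Headline</h1>",
  "text/html",
  "text/markdown"
);
console.log(request.options.method, request.options.headers);
```

With Node.js 18 or later, such a request could be sent with the built-in fetch, for example fetch(request.url, request.options), provided the service is reachable at the assumed URL.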

Media Type Mapping

Since Pandoc uses its own type identifiers for input and output formats, we created a mapping between the Pandoc identifiers and the corresponding media types. For instance, the Pandoc identifier html maps to the media type text/html.

The mapping is incomplete because not every format supported by Pandoc has a registered media type. You can therefore also use Pandoc identifiers directly in Content-Type and Accept, although this does not comply with the HTTP specification. To stay compliant, prefix the Pandoc identifier with application/x., the media type tree reserved for unregistered types.
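The lookup described above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the function name toPandocFormat is hypothetical, and the mapping table contains only the few pairs mentioned in this README, so the real table may differ.

```javascript
// Hypothetical excerpt of the mapping: only pairs that appear in this README.
const MEDIA_TYPE_TO_PANDOC = new Map([
  ["text/html", "html"],
  ["text/markdown", "markdown"],
  ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", "docx"],
]);

// Prefix for Pandoc identifiers that have no registered media type.
const UNREGISTERED_PREFIX = "application/x.";

// Hypothetical helper: turns a Content-Type or Accept value into a Pandoc identifier.
function toPandocFormat(headerValue) {
  // Drop parameters such as "; charset=utf-8" and normalize the case.
  const type = headerValue.split(";")[0].trim().toLowerCase();
  // 1. A known media type maps to its Pandoc identifier.
  if (MEDIA_TYPE_TO_PANDOC.has(type)) {
    return MEDIA_TYPE_TO_PANDOC.get(type);
  }
  // 2. application/x.<identifier> carries the Pandoc identifier after the prefix.
  if (type.startsWith(UNREGISTERED_PREFIX)) {
    return type.slice(UNREGISTERED_PREFIX.length);
  }
  // 3. Otherwise treat the value as a bare Pandoc identifier.
  return type;
}

console.log(toPandocFormat("text/html; charset=utf-8")); // html
console.log(toPandocFormat("application/x.docx")); // docx
console.log(toPandocFormat("rst")); // rst
```

Checking the registered media types first means the prefix rule and the bare identifiers only act as fallbacks, and a well-known type never has to be parsed.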

Installation with Docker

To simplify the usage of this project, we wrapped everything into a Docker container that can easily be deployed on any machine.

Pandoc uses LaTeX to create PDFs. Since the LaTeX dependencies add roughly 2 GB to the Docker image, we decided to provide two images:

  • dwolters/pandoc-http:latest does not include LaTeX and is therefore unable to create PDFs (~700 MB uncompressed, ~280 MB compressed). The :latest tag is used by default if no tag is specified.
  • dwolters/pandoc-http:latex includes LaTeX and can be used to create PDFs (~2.7 GB uncompressed, ~2 GB compressed). Building or pulling this image takes a while.

You can build the image yourself:

docker build -t dwolters/pandoc-http .

Or pull it from Docker Hub:

docker pull dwolters/pandoc-http

Afterwards, you can start the container:

docker run -d -p 8080:80 --name my-pandoc-http dwolters/pandoc-http

Within the container, the HTTP API is reachable on port 80. The command above binds the HTTP API to port 8080 of the Docker host.

You can stop and remove the container when it is no longer needed:

docker stop my-pandoc-http
docker rm my-pandoc-http

Installation without Docker

To use this project without the Docker container, you must first install Pandoc and add it to your PATH. Alternatively, you can set the PANDOC environment variable to the location of your pandoc executable.

Afterwards, clone the repository and switch to the proper directory:

git clone https://github.com/dwolters/pandoc-http
cd pandoc-http

Install the dependencies:

npm install

And finally, you can start the HTTP API for Pandoc:

node server.js

The API can run on a different port by setting the PORT environment variable, e.g., on port 8080:

PORT=8080 node server.js

Example API Call

Assuming the API listens on port 8080, you can test it with curl. The following commands convert HTML to Markdown, convert HTML to a docx file, and convert that docx file back to Markdown:

curl -s -H "Content-Type: text/html" -H "Accept: text/markdown" --data "<h1>My Headline</h1>" http://localhost:8080/
curl -s -H "Content-Type: text/html" -H "Accept: docx" --data "<h1>My Headline</h1>" http://localhost:8080/ > file.docx
curl -s -H "Content-Type: docx" -H "Accept: text/markdown" --data-binary "@file.docx" http://localhost:8080/

Please note that the last two commands use the Pandoc identifier for docx files. The correct media type would be application/vnd.openxmlformats-officedocument.wordprocessingml.document.

Swagger Description

The script generate-swagger-spec.js automatically generates the Swagger description for this service based on the supported input and output formats (as listed by pandoc --list-input-formats and pandoc --list-output-formats). The description can be generated in either YAML or JSON. The npm scripts generate-swagger-json and generate-swagger-yaml write it to a file with a fixed name (pandoc.swagger.json or pandoc.swagger.yaml, respectively). To save the description to a file with a custom name, run:

node generate-swagger-spec.js [--json|--yaml] > your-filename-here.ext

Acknowledgements

The Dockerfile is partially based on the Dockerfile of vpetersson's pandoc container.

Contributors

dwolters, jonaskir, pscheit
