This module is no longer maintained

You'll like gebo-text-converter better.

gebo-docs

A gebo-server module for document-to-text conversions

Third-party requirements

This package has a butt-load of dependencies. It's tested on Ubuntu 12.04. It'll probably work on other distributions, but the unit tests may break.

poppler-utils 0.24.5

This enables you to convert PDFs to plain text (with pdftotext).

Remove the currently installed poppler-utils package, if present:

sudo apt-get remove poppler-utils

Download a recent stable release from [http://poppler.freedesktop.org/] (the preferred version is poppler-0.24.5.tar.xz) and extract it:

tar xvf poppler-0.24.5.tar.xz

Prep poppler-utils for compilation and installation:

cd poppler-0.24.5
./configure

If configure stops because fontconfig is missing, install the fontconfig development package and run configure again:

sudo apt-get install libfontconfig1-dev
./configure

Compile the package:

make

Install the programs, data files, and documentation:

sudo make install

Reboot the system:

sudo reboot

See if the programs were installed:

pdftohtml

If you see the following message:

error while loading shared libraries: libpoppler.so.44: cannot open shared object file: No such file or directory

Run:

sudo cp /usr/local/lib/libpoppler.so.44 /usr/lib/
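
With pdftotext installed, a PDF can be converted from Node by running the command as a child process. The following is only a minimal sketch of that idea, not gebo-docs's own implementation; the helper names pdftotextArgs and pdfToText are hypothetical. It relies on pdftotext writing to standard output when the output file is given as "-".

```javascript
const { execFile } = require('child_process');

// Build the pdftotext argument list: the input PDF, then '-' so the
// extracted text goes to stdout instead of a .txt file.
function pdftotextArgs(pdfPath) {
  return [pdfPath, '-'];
}

// Run pdftotext and resolve with the extracted text.
// Rejects if the command is missing or exits with an error.
function pdfToText(pdfPath) {
  return new Promise(function (resolve, reject) {
    execFile('pdftotext', pdftotextArgs(pdfPath), function (err, stdout) {
      if (err) {
        return reject(err);
      }
      resolve(stdout);
    });
  });
}

// Example call (hypothetical file name):
// pdfToText('report.pdf').then(console.log).catch(console.error);
```

execFile is used rather than exec so the file name is passed as an argument and never interpreted by a shell.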

docx2txt

Download the source from [http://sourceforge.net/projects/docx2txt/] and install it manually:

sudo apt-get install unzip
tar xvfz docx2txt-1.2.tgz
cd docx2txt-1.2/
sudo make
cd /usr/local/bin
sudo cp docx2txt.pl docx2txt

Other converters

sudo apt-get install unrtf
sudo apt-get install odt2txt
sudo apt-get install catdoc
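
Before using the module, it can help to confirm that every converter ended up on the PATH. The sketch below is one way to check from Node; it is an illustration, not part of gebo-docs. The command names come from the installation steps above, while REQUIRED_TOOLS, isOnPath, and missingTools are hypothetical names.

```javascript
const fs = require('fs');
const path = require('path');

// Command names taken from the installation steps above.
const REQUIRED_TOOLS = ['pdftotext', 'docx2txt', 'unrtf', 'odt2txt', 'catdoc'];

// Return true if an executable entry called `name` exists in one of `dirs`.
function isOnPath(name, dirs) {
  return dirs.some(function (dir) {
    try {
      fs.accessSync(path.join(dir, name), fs.constants.X_OK);
      return true;
    } catch (err) {
      return false;
    }
  });
}

// List the required tools that cannot be found in the given PATH string.
function missingTools(pathVar) {
  const dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  return REQUIRED_TOOLS.filter(function (tool) {
    return !isOnPath(tool, dirs);
  });
}

console.log('Missing tools:', missingTools(process.env.PATH));
```

An empty list means all five commands were found.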

Install

npm install gebo-docs

Usage

Do this if you're happy with the default configuration:

var doc = require('gebo-docs')();

Do this if you set your own third-party dependencies in gebo-docs.json (copy the provided file into the desired directory and modify it there):

var doc = require('gebo-docs')('/directory/in/which/config/file/is/contained');

Once required, convert a file to text:

doc.convertToText('filename').
    then(function(text) {
        console.log(text);
      }).
    catch(function(err) {
        // Something went wrong
        console.error(err);
      });
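
To convert several files in one go, the promise returned by convertToText can be combined with Promise.all. This is a sketch under the assumption that convertToText returns a promise, as the usage above suggests; convertAll and fakeDoc are hypothetical names, and fakeDoc is a stand-in so the example runs without the third-party tools installed.

```javascript
// Convert several files and collect one result per file.
// `doc` is any object exposing convertToText(filename) that returns a promise.
// A failed conversion is recorded instead of aborting the whole batch.
function convertAll(doc, filenames) {
  return Promise.all(filenames.map(function (filename) {
    return doc.convertToText(filename).then(
      function (text) { return { filename: filename, text: text }; },
      function (err) { return { filename: filename, error: err }; }
    );
  }));
}

// Stand-in converter, used only to demonstrate the call pattern.
const fakeDoc = {
  convertToText: function (filename) {
    return filename.endsWith('.pdf')
      ? Promise.resolve('text of ' + filename)
      : Promise.reject(new Error('unsupported: ' + filename));
  }
};

convertAll(fakeDoc, ['a.pdf', 'b.xyz']).then(function (results) {
  results.forEach(function (r) {
    console.log(r.filename, r.error ? 'failed: ' + r.error.message : r.text);
  });
});
```

In real use, pass the object returned by require('gebo-docs')() in place of fakeDoc.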

Contributing

Hit me with it

Licence

MIT
