

PDFBox text extraction

This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the Apache PDFBox Java library, so it requires JRuby.

Installation

Add this line to your application's Gemfile:

gem 'pdfbox_text_extraction'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pdfbox_text_extraction
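Because the gem wraps a Java library, it can only be loaded under JRuby. If the same application may also run on MRI, a guard like the following sketch avoids a hard failure at load time. The method name `pdfbox_available?` is hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): returns true only when the
# pdfbox_text_extraction gem can actually be loaded in this process.
def pdfbox_available?
  # The gem wraps Apache PDFBox, a Java library, so it cannot be
  # loaded on Ruby engines other than JRuby.
  return false unless RUBY_ENGINE == "jruby"

  require "pdfbox_text_extraction"
  true
rescue LoadError
  # Running on JRuby, but the gem is not installed.
  false
end
```

For example, `extracted_text = PdfboxTextExtraction.run(path_to_pdf) if pdfbox_available?` skips extraction instead of crashing when the gem cannot be loaded.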

Usage

To extract all text on every page:

extracted_text = PdfboxTextExtraction.run(path_to_pdf)
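`run` takes the path to a PDF file and returns the extracted text. A minimal sketch, assuming you want to fail early with a clear error when the path is wrong, could wrap the call like this. `extract_pdf_text` and its `extractor:` keyword are hypothetical names; the keyword exists only so the wrapper can be tried without JRuby.

```ruby
# Hypothetical wrapper around PdfboxTextExtraction.run (not part of the gem).
# By default it calls the gem, which must already be required under JRuby;
# the extractor can be swapped out, e.g. for a stub in tests.
def extract_pdf_text(path, extractor: ->(p) { PdfboxTextExtraction.run(p) })
  # Report a missing file clearly before handing the path to PDFBox.
  raise ArgumentError, "PDF not found: #{path}" unless File.file?(path)

  extractor.call(path)
end
```

With the gem loaded, `extracted_text = extract_pdf_text(path_to_pdf)` behaves like the plain `run` call, plus the file check.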

To extract only the text inside a crop area, pass the crop options as a second argument:

extracted_text = PdfboxTextExtraction.run(
  path_to_pdf,
  {
    crop_x: 0, # crop area top left corner x-coordinate
    crop_y: 1.0, # crop area top left corner y-coordinate
    crop_width: 8.5, # crop area width
    crop_height: 9.4, # crop area height
  }
)
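The crop values in the example above look like inches on a US Letter page (8.5 x 11): the area starts 1.0 below the top edge and ends 0.6 above the bottom edge (1.0 + 9.4 + 0.6 = 11). That reading of the units is an assumption; check it against your own documents. Under that assumption, a hypothetical helper (not part of the gem) can derive the four crop options from the page size and the margins to skip:

```ruby
# Hypothetical helper (not part of the gem): builds the crop options hash
# for PdfboxTextExtraction.run from the page size and the margins to skip.
# All values must use the same unit as the crop options themselves
# (assumed here to be inches, matching the 8.5-wide example).
def crop_options_for_margins(page_width:, page_height:, top: 0.0, right: 0.0, bottom: 0.0, left: 0.0)
  width  = page_width - left - right
  height = page_height - top - bottom
  raise ArgumentError, "margins leave no crop area" unless width.positive? && height.positive?

  {
    crop_x: left,        # crop area top left corner x-coordinate
    crop_y: top,         # crop area top left corner y-coordinate
    crop_width: width,   # crop area width
    crop_height: height, # crop area height
  }
end

# Reproduces the example crop area: skip a 1.0 header and a 0.6 footer.
options = crop_options_for_margins(page_width: 8.5, page_height: 11.0, top: 1.0, bottom: 0.6)
```

The resulting hash can then be passed as `PdfboxTextExtraction.run(path_to_pdf, options)`.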

Contributing

  1. Fork it (https://github.com/jhund/pdfbox_text_extraction/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

MIT licensed.

Copyright

Copyright (c) 2016 Jo Hund. See the LICENSE file (MIT) for details.
