
PDF Cite Analyzer

Replaces the internal citation links in a paper PDF with NASA ADS links. The script will usually need some adjustments from paper to paper, so familiarity with Python is recommended.

Requirements

Usage

python pdf_cite_analyzer.py <input file> <output file>

Notes

One step requires a file named replace_rules.txt in the top-level directory. This file cleans up the text scraped from the PDF so that the first author's name and the year attached to each inline citation can be parsed even when the scraped text is garbled. The format of this file is:

<input> -> <replace>
<input 2> -> <replace 2>
;;;
<input> -> <replace>
<input 2> -> <replace 2>
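The rules file can be read into two lists, one for each side of the ;;; separator. The sketch below is an illustration only, not the project's own loader: the function names parse_replace_rules and load_replace_rules are hypothetical, and it assumes each rule is split on the first " -> " and that blank lines are ignored.

```python
from pathlib import Path


def parse_replace_rules(text: str) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split replace_rules.txt content into (pre_rules, post_rules).

    Hypothetical helper. Assumptions: each rule line has the form
    '<input> -> <replace>' and is split on the first ' -> ';
    blank lines are skipped; a line containing only ';;;' separates
    the two sections.
    """
    pre_rules: list[tuple[str, str]] = []
    post_rules: list[tuple[str, str]] = []
    current = pre_rules  # rules go into the first section until ';;;' is seen
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == ";;;":
            current = post_rules  # switch to the second section
            continue
        if " -> " not in line:
            raise ValueError(f"malformed rule line: {line!r}")
        src, dst = line.split(" -> ", 1)  # an empty <replace> deletes <input>
        current.append((src, dst))
    return pre_rules, post_rules


def load_replace_rules(path: str = "replace_rules.txt"):
    """Read and parse the rules file (hypothetical convenience wrapper)."""
    return parse_replace_rules(Path(path).read_text(encoding="utf-8"))
```

Splitting on the first " -> " only means a replacement may itself contain "->" without breaking the parse.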

Rules before the ;;; line are applied to the scraped text before the regex attempts to parse the inline citation; rules after the ;;; line are applied to each group the regex captures. The regex is ([A-Za-z0-9.\- ,&]+)(?:[.+ ]+)\(?(\d{4}[ab]?), where group 1 is the first author's name and group 2 is the year.
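The two-phase cleanup described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the names apply_rules and parse_citation are hypothetical, and it assumes rules are literal substring replacements applied in file order (the notes do not say whether they are plain strings or patterns). The regex is the one quoted above.

```python
import re
from typing import Optional

# Regex from the notes: group 1 = first author's name, group 2 = year.
CITE_RE = re.compile(r"([A-Za-z0-9.\- ,&]+)(?:[.+ ]+)\(?(\d{4}[ab]?)")


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply (input, replace) pairs in order.

    Assumption: each rule is a literal substring replacement.
    """
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


def parse_citation(
    scraped: str,
    pre_rules: list[tuple[str, str]],
    post_rules: list[tuple[str, str]],
) -> Optional[tuple[str, str]]:
    """Return (first_author, year) for one inline citation, or None."""
    cleaned = apply_rules(scraped, pre_rules)  # rules before ';;;'
    match = CITE_RE.search(cleaned)
    if match is None:
        return None
    # Rules after ';;;' are applied to each captured group.
    author, year = [apply_rules(g, post_rules) for g in match.groups()]
    return author, year


print(parse_citation("Jones (2004a)", [], []))  # ('Jones', '2004a')
```

For example, with a pre-rule fixing a scraping error ("Smlth" -> "Smith") and a post-rule stripping " et al." from the author group, the text "Smlth et al. 2019b" yields ("Smith", "2019b").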

Contributors

juniperfdel