Git Product home page Git Product logo

bibcleaner's Introduction

Purpose

Papers have page limits. Bibtex files are ridiculously messy, contain a ton of extraneous attributes that make the references ugly, inconsistent, and push you over the page limit. Consider this example:

  @article{Kalashnikov06,
  author = {Kalashnikov, Dmitri V. and Mehrotra, Sharad},
  title = {Domain-independent Data Cleaning via Analysis of Entity-relationship Graph},
  journal = {ACM Transactions on Database Systems},
  issue_date = {June 2006},
  volume = {31},
  number = {2},
  month = jun,
  year = {2006},
  issn = {0362-5915},
  pages = {716--767},
  numpages = {52},
  doi = {10.1145/1138394.1138401},
  acmid = {1138401},
  publisher = {ACM},
  address = {New York, NY, USA},
  }

We only actually care about author, title, journal, and year. The rest waste space.

There are also multiple entry types, each with their own format (I realize there are subtle differences that I choose to ignore):

  • inproceedings
  • journal
  • article

Booktitles/journals have pretty much no standard. Consider the following versions of "SIGMOD":

  Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data
  Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
  Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
  Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
  Proceedings of the 2014 ACM SIGMOD international conference on Management of data
  Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
  SIGMOD
  SIGMOD Conference
  Proceedings of the 2003 ACM SIGMOD international conference on Management of data
  Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data
  In Proceedings of ACM SIGMOD

Why? WHY?

BibCleaner

bibcleaner is a utility that:

  • parses (using biblib) and loads a bibtex file and throws out the unnecessary entry attributes. It keeps sane attributes for non-inproceedings/journa/article entries.
  • the entries are saved in a sqlite database that can be queried
  • runs a small webserver with a simple GUI to clean the booktitles in your entries
  • once you've cleaned enough booktitles, export a clean, succinct version of your bibtex file, optionally sorted by year, author (lexigraphically), entry key, or booktitle.

Interface

Screenshot

The above is a screenshot of the bibcleaner interface. The middle panel shows the list of unique booktitles in your bibtex file.
Use the input box to type in a new value for the highlighted booktitle, or an empty string to accept the existing value. Up and down arrow keys work and we try really hard to make the interface keyboard only, and efficient.

The left panel shows the mappings you have already created. The blue text is the short normalized booktitle, and the red text are the original variations that map to the blue text. Click on a red text to remove that mapping.

You can upload bibtex files, list the files you've uploaded, and export a clean bibtex file

Installation and Usage

This library requires:

Do the following

  # Setup a clean virtual env
  virtualenv new --python=<path to python3> <venv name>

  # Install custom biblib
  git clone https://github.com/sirrice/biblib.git
  cd biblib
  python setup.py install

  # Install bibcleaner
  pip install bibcleaner

Running the GUI

The easiest is to just run the server and use the gui (click help in the upper right for instructions)

  ./bibcleaner --server --port 8000
  # go to localhost:8000

Command Line Usage

Help

  ./bibcleaner --help

Add entries

  ./bibcleaner --bibpath <path to bib file>

Clean up an existing bibtex file and print to stdout

  ./bibcleaner --bibpath <path> --printout

After normalizing the book titles, output the cleaned entries into a new file

 ./bibcleaner --out <path to output bibtex file> 

Or output them sorted by booktitle, author name or another attribute

./bibcleaner --out <path> --sort booktitle

Now you're under the page limit!

Kudos

We're using aclement's biblib with a tiny hack to ignore parsing errors.

bibcleaner's People

Contributors

sirrice avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.