Research Lei

This page contains code for Research Lei: http://cs.stanford.edu/people/karpathy/researchlei/

Google Group: https://groups.google.com/forum/?fromgroups#!forum/research-lei

Feel free to contact me with questions or suggestions on Twitter: https://twitter.com/karpathy or via email: [email protected]

Installation

  1. Clone this repository: git clone https://github.com/karpathy/researchlei.git

  2. (Optional) Make sure you have ImageMagick installed on your system if you'd like to extract image thumbnails from downloaded papers. On Ubuntu, it can be installed with sudo apt-get install imagemagick

  3. (Optional) Install pdftotext. This is included by default in many Linux distributions. This tool is used to extract all the words from a paper and find the top 100. Later, this can be used for other fancy processing, such as topic models, TF-IDF similarity rankings, etc.

  4. You will need Python, preferably Python 2.7.

  5. Obtain a Microsoft Academic Search API App ID key and place it into a file appid.txt. Since App IDs are rate limited to 200 queries per minute, I would strongly encourage you to obtain your own key from Microsoft (the request involves a single email and they reply fast). However, if you'd only like to check it out first for a bit, fill out this form.
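The word extraction mentioned in step 3 can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names extract_text and top_words, and the min_length filter that drops very short words, are assumptions made for this example. The only external piece is the pdftotext command line tool, which writes to standard output when the output file is given as "-".

```python
import re
import subprocess
from collections import Counter


def extract_text(pdf_path):
    """Run the pdftotext CLI on a PDF and return its text ("-" means stdout)."""
    raw = subprocess.check_output(["pdftotext", pdf_path, "-"])
    return raw.decode("utf-8", errors="ignore")


def top_words(text, n=100, min_length=3):
    """Return the n most frequent lowercase words as (word, count) pairs.

    min_length is a hypothetical filter that drops short words such as
    "a" or "in"; the real script may filter differently or not at all.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)
```

With pdftotext installed, top_words(extract_text("paper.pdf")) would give the 100 most frequent words of a downloaded paper (paper.pdf is a placeholder file name).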

Usage

  1. You start with an empty database. To add a paper, run, for example: python addpaper.py name building rome in a day. This makes the addpaper.py script search Microsoft Academic Search by name for a paper whose title contains the query words building, rome, in, a, day. The script will then guide you through downloading its citations, references, and the actual .pdf of the paper. (Sadly, you may find that Microsoft's Academic Search is sparser than Google Scholar, especially with more recent work :( I contacted them about this and they said they are working on an update to their index. Unfortunately, Google Scholar does not provide a convenient API, makes scraping difficult, and does not provide information that is as complete.)

  2. The main Python script addpaper.py creates a JSON file that client/index.html renders for the UI. Open it to see your library (remember to refresh it every time you run addpaper.py)! Some browsers, like Safari and Chrome, will not (by default) allow an AJAX call to read the local JSON file. This can be fixed by starting Chrome with a special flag (--allow-file-access-from-files). In Ubuntu, you can drag the Chrome icon to the desktop, right click -> properties, and append the flag to Command. Alternatively, just use Firefox!
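Another way around the local-file AJAX restriction, not part of the original instructions, is to serve the repository over HTTP on your own machine so the browser loads the JSON through a normal request. The sketch below uses only the Python 3 standard library (3.7 or newer, unlike the Python 2.7 recommended above). The names make_server and serve are made up for this example, and it assumes the JSON file written by addpaper.py lives somewhere under the served directory.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build an HTTP server bound to localhost that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer(("127.0.0.1", port), handler)


def serve(directory=".", port=8000):
    """Serve `directory` until interrupted with Ctrl+C."""
    server = make_server(directory, port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve() from the repository root and then browsing to http://127.0.0.1:8000/client/index.html should let any browser read the JSON file without special flags, provided the assumption about its location holds.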

Usage Example TLDR

  1. add a paper to library: python addpaper.py name building rome in a day
  2. view library: open client/index.html
  3. follow instructions in interface to download any specific paper of interest, or go back to step 1 to add a custom new paper

FAQ

Q: Microsoft Academic Search is sparser than Google Scholar, especially on very recent work. Could you use Scholar instead?

A: No. Google Scholar does not provide an API that allows you to easily download their structured data. It is possible to scrape the HTML manually, but they aggressively throttle your requests. Even if you could, they don't provide as much data. For example, they don't provide references, abstracts, etc. And if there are more than a few authors on a paper, they simply write "..." and refer you to a publisher's page for the paper. I wish Google Scholar was as cool as Microsoft Academic Search. However, I've written to MAS about this issue and they told me that they are actively working to index more papers and that we should all stay tuned. Let's hope for the best.

Licence

BSD licence
