scholarly's Introduction

scripts

For now, there are three scripts:

  1. profile_basic.py: basic statistics about an author with an existing scholar profile
  2. crawl.py: saves all pubs of an author, with bibliometrics
  3. citedBy.py: saves all papers citing a paper; needs a proxy API key (currently only ScraperAPI supported)

The scripts take as input a name in quotes (e.g., "Emiliano De Cristofaro") and create a CSV file.
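
As a rough illustration of the CSV step, here is a minimal sketch that writes a list of publication records to CSV. It is not the code of the scripts listed above. The helper name write_pubs_csv, the column names, and the sample record are hypothetical, and the record shape (a 'bib' dict plus a 'num_citations' count) is an assumption about what a publication entry looks like.

```python
import csv
import io


def write_pubs_csv(pubs, out):
    """Write one CSV row per publication dict to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["title", "year", "citations"])
    for pub in pubs:
        bib = pub.get("bib", {})
        writer.writerow([
            bib.get("title", ""),          # missing fields become empty cells
            bib.get("pub_year", ""),
            pub.get("num_citations", 0),
        ])


# Hypothetical sample record, used only to exercise the helper offline.
sample = [{"bib": {"title": "A paper", "pub_year": "2020"}, "num_citations": 3}]
buf = io.StringIO()
write_pubs_csv(sample, buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.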

scholarly library

scholarly is a module that allows you to retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to solve CAPTCHAs.

Installation

Use pip to install the latest release from PyPI:

pip3 install scholarly

or use pip to install from GitHub:

pip3 install -U git+https://github.com/scholarly-python-package/scholarly.git

scholarly follows Semantic Versioning.

Optional dependencies

  • geckodriver provides the browser capabilities that may be needed to fully utilize the library. Currently, if a Scholar profile has more than 20 co-authors, geckodriver is needed to fetch the complete list. If not installed, scholarly will fetch only up to 20 co-authors.

    To install geckodriver, download the latest version from its GitHub repository and make sure the executable is on the system path. Follow the installation instructions for your platform: macOS | Ubuntu | Windows

  • Tor:

    scholarly comes with a handful of APIs to set up proxies to circumvent anti-bot measures. Tor methods have been deprecated since v1.5 and are not actively tested or supported. If you wish to use Tor, install scholarly with the tor extra:

    pip3 install scholarly[tor]

Tests

To check that your installation is successful, run the tests by executing the test_module.py file:

python3 test_module.py

or

python3 -m unittest -v test_module.py

Documentation

Check the documentation for a complete API reference and a quickstart guide.

Examples

from scholarly import scholarly

# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)

# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)

# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)

# Which papers cited that publication?
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)
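
The example above calls next() on the search iterator, which raises StopIteration when the search returns no results. A minimal sketch of a guard follows. The helper name first_or_none is hypothetical and not part of scholarly; it only relies on Python's built-in next() with a default value.

```python
def first_or_none(results):
    """Return the first item from an iterator, or None if it is empty."""
    return next(results, None)


# Stand-in iterators so the sketch runs without contacting Google Scholar.
print(first_or_none(iter(["first hit", "second hit"])))
print(first_or_none(iter([])))
```

With scholarly, the same helper could wrap scholarly.search_author(...) so that the calling code checks for None before passing the result to scholarly.fill.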

IMPORTANT: Making certain types of queries, such as scholarly.citedby or scholarly.search_pubs, will lead Google Scholar to block your requests and, eventually, possibly your IP address. Use a proxy service to avoid this. See the "Using proxies" section in the documentation for more details. Here's a short example:

from scholarly import scholarly, ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))

scholarly also has APIs that work with several premium (paid) proxy services. scholarly is smart enough to know which queries need proxies and which do not. It is therefore recommended to always set up a proxy at the beginning of your application.
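
One way to follow that recommendation is to do the proxy setup in a single function called at startup. The sketch below is an assumption-laden illustration rather than the library's prescribed pattern. The helper names choose_proxy_method and setup_proxy and the environment variable SCRAPER_API_KEY are hypothetical. It uses ScraperAPI, the service named in this README, when a key is available and falls back to free proxies otherwise.

```python
import os


def choose_proxy_method(api_key):
    """Pick which ProxyGenerator method to call, based on whether a key is set."""
    return "ScraperAPI" if api_key else "FreeProxies"


def setup_proxy(api_key=None):
    """Configure scholarly to send its requests through a proxy."""
    # Imported here so the helper above can be used without scholarly installed.
    from scholarly import scholarly, ProxyGenerator

    pg = ProxyGenerator()
    if choose_proxy_method(api_key) == "ScraperAPI":
        pg.ScraperAPI(api_key)  # paid service; needs a valid key
    else:
        pg.FreeProxies()        # free proxies, as in the example above
    scholarly.use_proxy(pg)
    return pg


# SCRAPER_API_KEY is a hypothetical variable name chosen for this sketch.
API_KEY = os.environ.get("SCRAPER_API_KEY")
print(choose_proxy_method(API_KEY))
# setup_proxy(API_KEY)  # uncomment once scholarly is installed
```

Reading the key from the environment keeps it out of source control; any other configuration mechanism would work equally well.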

Citing

If you have used this codebase in a scientific publication and wish to cite it, please use the following:

@software{cholewiak2021scholarly,
  author  = {Cholewiak, Steven A. and Ipeirotis, Panos and Silva, Victor and Kannawadi, Arun},
  title   = {{SCHOLARLY: Simple access to Google Scholar authors and citation using Python}},
  year    = {2021},
  doi     = {10.5281/zenodo.5764802},
  license = {Unlicense},
  url = {https://github.com/scholarly-python-package/scholarly},
  version = {1.5.0}
}

Disclaimer

The developers use ScraperAPI to run the tests in GitHub Actions. The developers of scholarly are not affiliated with any of the proxy services and do not profit from them. If your favorite service is not supported, please submit an issue or, even better, follow it up with a pull request.

License

The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with this mentality, all code is released under the Unlicense.

scholarly's People

Contributors

organicirradiation, 1ucian0, emidec, firefly-cpp, utf, harrison97, louiskirsch, percolator, spferical, bryant1410, cako
