scholardaemon's Introduction

🔎 🐥 📃 Google Scholar Alerts Twitter bot 📃 🐥 🔎

Google Scholar lacks an API but, unlike PubMed, links directly to papers. The stream of a PubMed-sourced bot is often filled with papers deposited without direct links. Occasionally they have a DOI, but MEDLINE's indexing of these is inconsistent (the XML for the articles themselves can be pretty inconsistent, as I found out on a previous excursion under PubMed's bonnet).

Even when a paper is deposited with a DOI, the minting process means the link is not guaranteed to work straight away. I have felt the frustration of having a basic line of scientific enquiry rudely interrupted by technical issues, and regularly see other scientists online expressing the same. Preprints are another consideration.

Preprints are undeniably coming into the fold of bioscience research, a practice originating in the physics/mathematical sciences that crept in through common ground at arXiv's q-bio section. There are various dedicated sites/accounts monitoring particular subfields (e.g. Haldane's sieve/@haldanessieve for population/evolutionary genetics).

Google Scholar indexes all fields, and in my own experience this leads to casual interdisciplinary reading in a way not possible with PubMed's purely biomedical library. Interdisciplinarity is a facet of research which the BBSRC, MRC and the Society of Biology feel is lacking amongst bioscientists.

Creating a feed of interest through Google Scholar

  • Google Scholar Alerts can provide up to 20 results in an e-mail, and posting/archiving these somewhere other than a busy inbox makes new research more accessible
  • Gmail for instance has various APIs and libraries, including an official Python 2.6-2.7 package and gmailr for R
  • Twitter likewise has python-twitter and twitteR

This script checks a Gmail account for Google Scholar Alerts, parses each message for paper titles and links, and sends the list of new articles to Twitter.

  • this could perhaps be automated with a cron job like Lynn Root used for her IfMeetThenTweet IFTTT alternative
  • it could also perhaps be hosted on a free micro instance of Amazon Web Services EC2 (I've not tried this yet)
  • sending the papers to Buffer doesn't make much sense since it seems to be at most 1 email a day, though perhaps other queries may vary
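
Posting only new articles implies remembering which alert messages have already been handled. A minimal shell sketch of that idea follows. The function name filter_new_ids and the plain-text history file (one message ID per line) are assumptions for illustration, not how scholaRdaemon itself stores its history.

```shell
# Hypothetical helper: read message IDs on stdin, print only those not
# already listed in the history file, and append the new ones to it.
filter_new_ids() {
  local history_file="$1"
  touch "$history_file"   # make sure the file exists before reading it
  awk -v hist="$history_file" '
    # Load every previously seen ID before looking at stdin.
    BEGIN { while ((getline line < hist) > 0) seen[line] = 1 }
    # Print each unseen ID once, and remember it for the rest of the input.
    !($0 in seen) { print; seen[$0] = 1 }
  ' | tee -a "$history_file"
}
```

For example, piping a list of IDs through filter_new_ids seen_ids.txt prints only the IDs that have not been seen on an earlier run.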

Installation and usage

For a walkthrough on installation see the Wiki homepage. Briefly:

  • Install gmailr and twitteR, and set up apps on the Google Developers Console and on Twitter's equivalent
  • Authorise gmailr (gmail_auth) with the JSON obtained by setting up an app
  • Run Rscript run_daemon --help to show the available flags and bots.
    • Bots can be passed as arguments to run_daemon to indicate which of the available account configurations to use. If none is specified, the default behaviour is to check and tweet for all of them sequentially.
    • The bot names are registered in config/bot_registry.json, alongside the sub-directories from which to retrieve the corresponding authentication information. See the Wiki for more info.

Automation

Dave Tang seems to have beaten me to the idea of using R for a paper bot by just a couple of weeks. He has a working example of a cron script, timed for PubMed's release, as he worked with eUtils (i.e. PubMed, like all the other existing bots in Casey Bergman's list, with the exception of eQTLpapers, which has Scholar Alerts added manually by Sarah Brown).

crontab -l
#minute hour dom month dow user cmd
0 15-23 * * * cd /Users/davetang/Dropbox/transcriptomes && ./feed.R &> /dev/null

Cron automation makes sense for daily MEDLINE (PubMed) updates, but not for emails, which arrive at no fixed time. IFTTT-like 'triggering' would be ideal, and can be achieved with custom 'events' through AWS Lambda [free tier], reacting to changes in AWS S3 file storage, which may be modified with dat pull --live.

For now I'm using cron (hourly entry added with crontab -e) to:

  • source my .bashrc, which
    • exports the location of the scholaRdaemon directory to an eponymous variable
    • sets an alias runsdaemon as Rscript "$scholaRdaemon/run_daemon"
  • record the date/time in the sd.log file there
  • run the daemon for all bots listed in config/bot_registry.json (the default behaviour)

0 * * * * source /home/louis/.bashrc; date >> "$scholaRdaemon"sd.log; runsdaemon >> "$scholaRdaemon"sd.log
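
Aliases are normally not expanded in non-interactive shells, and cron commonly runs entries with /bin/sh rather than bash, so an entry like this can depend on local setup. A hedged alternative is to move the same steps into a small script and call that from cron. In this sketch run_hourly and its directory argument are hypothetical names, not part of scholaRdaemon.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the hourly steps: log the date, then run the
# daemon for all bots, without relying on .bashrc or an alias.
run_hourly() {
  local daemon_dir="${1%/}"            # accept the path with or without a trailing slash
  local log_file="$daemon_dir/sd.log"
  date >> "$log_file"
  Rscript "$daemon_dir/run_daemon" >> "$log_file"
}

# Usage (the path is an assumption; substitute the real checkout location):
#   run_hourly /path/to/scholaRdaemon
```

The cron entry then only needs to invoke the script, which keeps the environment-dependent parts in one place.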

scholardaemon's People

Contributors

lmmx

scholardaemon's Issues

Intermittent communication error

Automating ScholarDaemon.R with Rscript stopped working because the mail IDs aren't provided by GetMail() in a timely manner; it needs a wait or callback. The source of the problem is not clear, nor why this behaviour disappeared again...

recent.papers <- GetMail()
recent.paper.mail.ids <- names(recent.papers) # not actually paper names, just the message IDs
CheckMessageHistory(recent.paper.mail.ids)

To reproduce, run Rscript check_mail.Rscript:

Attaching package: ‘gmailr’

The following object is masked from ‘package:utils’:

    history

The following objects are masked from ‘package:base’:

    body, date, labels, message


Attaching package: ‘twitteR’

The following object is masked from ‘package:gmailr’:

    id

Loading required package: methods

Authorising Twitter
[1] "Using direct authentication"
Error in gsub("\"", "\\\\\"", string) : 
  object 'recent.paper.mail.ids' not found
Calls: source ... WriteNewIDs -> paste0 -> paste -> shQuote -> paste0 -> gsub
Execution halted
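
Until the cause is found, one possible workaround (a sketch, not a fix inside GetMail()) is to retry the whole Rscript call with a pause between attempts. retry_with_wait is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to max_tries times, sleeping
# wait_secs between failed attempts. Returns 0 on the first success,
# 1 if every attempt fails.
retry_with_wait() {
  local max_tries="$1" wait_secs="$2"
  shift 2
  local attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$wait_secs"
  done
}

# Usage: up to 3 attempts, 30 seconds apart
#   retry_with_wait 3 30 Rscript check_mail.Rscript
```

This only helps if the failure is transient. If GetMail() consistently returns before the IDs are available, the wait would need to go inside the R code instead.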

Errors not logged!

When an error occurs, it aborts the process of writing to the log file, which rather defeats the object of having a log file.
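
R prints error messages to stderr, so a cron entry that redirects only stdout will not record them. Assuming that is at least part of the problem here, a wrapper along these lines captures both streams and the exit status. run_logged is a hypothetical name.

```shell
# Hypothetical helper: append a timestamp, then the command's stdout AND
# stderr, to the log file, and note any non-zero exit status.
run_logged() {
  local log_file="$1"
  shift
  local status=0
  date >> "$log_file"
  # 2>&1 sends error messages to the same log as normal output.
  "$@" >> "$log_file" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "exit status $status" >> "$log_file"
  fi
  return "$status"
}

# Usage:
#   run_logged "$scholaRdaemon"sd.log Rscript "$scholaRdaemon/run_daemon"
```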
