Turbocharge a PubMed literature search rather than clicking and clicking and clicking on Google Scholar

License: GNU Affero General Public License v3.0

Makefile 0.03% Python 99.27% Jupyter Notebook 0.70% Shell 0.01%
literature-search pubmed command-line-tool ncbi pmid nih-citation-data citation-downloader snowballing citation-analysis literature-review


PubMed ID (PMID) Cite


Turbocharge a PubMed literature search with the command, icite, rather than clicking and clicking and clicking on Google Scholar "Cited by N" links.

This open-source project is part of a peer-reviewed commentary that was invited by the editors of Research Synthesis Methods. Please cite it (see How to Cite) if you use pmidcite in your research or literature search.

Contact: [email protected]

PubMed and NIH Citation data

PubMed contains peer-reviewed research papers in biomedicine, biochemistry, chemistry, behavioral science, and other life sciences. Each time icite is run, citation data is downloaded from the National Institutes of Health (NIH). The data includes:

  • Citation counts of all papers and clinical papers
  • Performance of a paper among its peer papers
  • Existence of MeSH terms for the human, animal, and molecular/cellular categories
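As a rough illustration of these three kinds of information, the sketch below summarizes one citation record. The JSON layout and the field names (`citation_count`, `cited_by_clin`, `references`, `nih_percentile`, `human`, `animal`, `molecular_cellular`) are assumptions about the NIH iCite format, and the PMIDs and values are placeholders rather than real data.

```python
import json

# Illustrative payload only: PMIDs and values are placeholders, and the field
# names are assumptions about the NIH iCite JSON format.
SAMPLE_JSON = """
{"data": [{"pmid": 12345678, "year": 2015, "citation_count": 25,
           "cited_by_clin": [11111111, 22222222],
           "references": [33333333],
           "nih_percentile": 74.0,
           "human": 0.5, "animal": 0.0, "molecular_cellular": 0.5}]}
"""


def summarize_record(rec):
    """Collect citation counts, peer performance, and MeSH categories."""
    categories = ("human", "animal", "molecular_cellular")
    return {
        "pmid": rec["pmid"],
        "citations": rec.get("citation_count", 0),
        "clinical_citations": len(rec.get("cited_by_clin") or []),
        "references": len(rec.get("references") or []),
        # The percentile may be missing for very new papers
        "nih_percentile": rec.get("nih_percentile"),
        # Keep only the categories with a non-zero value
        "mesh_categories": [k for k in categories if rec.get(k)],
    }


for record in json.loads(SAMPLE_JSON)["data"]:
    print(summarize_record(record))
```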


1) Download citation counts and data for a research paper

$ icite -H 26032263

  • This paper (PMID 26032263) has 25 citations, 10 references, and 4 authors.
  • This paper is performing well (74th percentile in column %) compared to its peers.

Starting usage

NIH percentile

This paper is performing well (74th percentile) compared to its peers (column %).

The NIH percentile grouping (column G) helps to highlight the better performing papers in groups 2, 3, and 4 by sorting the citing papers by group first, then publication year.

The sort places the lower performing papers in groups 0 or 1 at the back.

New papers appear at the beginning of a sorted list, no matter how many citations they have, so that researchers can find the latest discoveries more easily.

The grouping of papers by NIH percentile grouping is a novel feature created by dvklopfenstein for this project.
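The sort described above can be sketched in a few lines. This is a minimal illustration, not pmidcite's implementation: the percentile cut-offs are hypothetical (the README does not state the real boundaries), and treating a missing percentile as the "new paper" group is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs, chosen here at -2, -1, +1, and +2 standard deviations
# of a normal distribution. The real group boundaries are not given above.
CUTOFFS = (2.1, 15.9, 84.1, 97.9)


@dataclass
class Paper:
    pmid: int
    year: int
    nih_percentile: Optional[float]  # None: assumed to mean "too new to rate"


def nih_group(percentile):
    """Return 'i' for unrated new papers, else a group '0' through '4'."""
    if percentile is None:
        return "i"
    # Count how many cut-offs the percentile reaches
    return str(sum(percentile >= cut for cut in CUTOFFS))


def sort_papers(papers):
    """New papers first, then groups 4 down to 0, newest year first in each."""
    def key(paper):
        group = nih_group(paper.nih_percentile)
        rank = 5 if group == "i" else int(group)
        return (rank, paper.year)
    return sorted(papers, key=key, reverse=True)
```

Under these assumed cut-offs, a paper in the 74th percentile lands in group 2.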

2) Forward citation search

Also known as following a paper's Cited by links or Forward snowballing

$ icite -H; icite 26032263 --load_citations | sort -k6 -r
or
$ icite -H; icite 26032263 -c | sort -k6 -r

3) Backward citation search

Also known as following links to a paper's references or Backward snowballing

$ icite -H; icite 26032263 --load_references | sort -k6 -r
or
$ icite -H; icite 26032263 -r | sort -k6 -r
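Forward and backward snowballing are two views of the same citation edges. The sketch below shows the idea on a toy edge list; the paper IDs are made up, and the code does not call pmidcite or the NIH.

```python
from collections import defaultdict

# Toy citation edges as (citing_paper, cited_paper); the IDs are made up.
EDGES = [(3, 1), (4, 1), (1, 10), (1, 11), (4, 3)]


def build_index(edges):
    """Index the edges in both directions."""
    cited_by = defaultdict(set)    # forward search: who cites this paper?
    references = defaultdict(set)  # backward search: whom does it cite?
    for citing, cited in edges:
        cited_by[cited].add(citing)
        references[citing].add(cited)
    return cited_by, references


cited_by, references = build_index(EDGES)
print(sorted(cited_by[1]))    # forward snowballing from paper 1
print(sorted(references[1]))  # backward snowballing from paper 1
```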

4) Summarize a group of citations

Create a file containing numerous PMIDs annotated with icite info

$ icite 30022098 -c -o goatools_cites.txt
  WROTE: goatools_cites.txt

Count the number of lines in the file

$ wc -l goatools_cites.txt
468 goatools_cites.txt

Summarize the papers in "goatools_cites.txt"

$ sumpaps goatools_cites.txt
i=026.9% 4=003.0% 3=018.9% 2=028.8% 1=015.9% 0=006.5%   6 years:2018-2024   465 papers goatools_cites.txt
  • The output is on one line so many files containing sets of PMIDs may be compared
  • The groups are from newest(i) to top-performing(4), great(3), very good(2), and overlooked(1 and 0)
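Because the summary is a single line, it is easy to parse when comparing many files. A minimal sketch, assuming every summary line follows the layout of the example above:

```python
import re

LINE = ("i=026.9% 4=003.0% 3=018.9% 2=028.8% 1=015.9% 0=006.5%   "
        "6 years:2018-2024   465 papers goatools_cites.txt")


def parse_sumpaps(line):
    """Split one summary line into group percentages, years, count, and file."""
    # Pairs such as 'i=026.9%' or '4=003.0%'
    groups = {key: float(val) for key, val in re.findall(r"(\w)=(\d+\.\d+)%", line)}
    match = re.search(r"years:(\d{4})-(\d{4})\s+(\d+) papers\s+(\S+)", line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    first, last, count, filename = match.groups()
    return {
        "groups": groups,
        "years": (int(first), int(last)),
        "papers": int(count),
        "file": filename,
    }


summary = parse_sumpaps(LINE)
print(summary["groups"], summary["papers"])
```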

5) Download citations for all papers returned from a PubMed search

  1. Do a search in PubMed
  2. Save all results into a file containing all PMIDs found by the search
  3. Download the list of PMIDs
  4. Run icite to analyze all the PMIDs

1. Do a search in PubMed


2. Save all results into a list of PMIDs


3. Download the list of PMIDs


4. Run icite to analyze all the PMIDs

$ icite -i pmid-HIVANDDNAm-set.txt -o pmid-HIVANDDNAm-icite.txt
$ grep TOP pmid-HIVANDDNAm-icite.txt | sort -k6
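Before passing a downloaded PMID list to icite, it can help to check the file. The sketch below assumes the file holds one PMID per line; skipping `#` comments and duplicates is an addition made here for illustration, not documented pmidcite behavior.

```python
def read_pmids(lines):
    """Return unique PMIDs in first-seen order from an iterable of text lines."""
    seen = set()
    pmids = []
    for line in lines:
        # Drop anything after '#' and surrounding whitespace
        token = line.split("#", 1)[0].strip()
        if not token.isdigit():
            continue  # blank line, comment, or non-numeric text
        pmid = int(token)
        if pmid not in seen:
            seen.add(pmid)
            pmids.append(pmid)
    return pmids


example = ["26032263\n", "\n", "# a note\n", "30022098\n", "26032263\n"]
print(read_pmids(example))
```

With a real file, the same function can be called as `read_pmids(open("pmid-HIVANDDNAm-set.txt"))`.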

Command Line Interface (CLI)

A Command-Line Interface (CLI) can be preferable to a Graphical User Interface (GUI) because:

  • processing can be automated from a script
  • time-consuming mouse clicking is reduced
  • more data can be seen at once on a text screen than in a browser, giving the researcher a better overall impression of the full set of information [1]

Researchers who use Linux or Mac already work from the command line. Researchers who use Windows can get that Linux-like command line feeling while still running native Windows programs by downloading Cygwin from https://www.cygwin.com/ [1].
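As an example of the first point, a short script can run the same icite command for several papers. This sketch only uses the `-c` and `-o` options shown earlier; the output file names are hypothetical, and the commands are printed rather than executed unless `dry_run` is turned off.

```python
import shlex
import subprocess


def icite_command(pmid, outfile, citations=True):
    """Build the argument list for one icite call."""
    cmd = ["icite", str(pmid)]
    if citations:
        cmd.append("-c")  # also load the papers that cite this PMID
    cmd += ["-o", outfile]
    return cmd


def run_all(pmids, dry_run=True):
    """Print each command, or run it when dry_run is False."""
    for pmid in pmids:
        cmd = icite_command(pmid, f"cites_{pmid}.txt")  # hypothetical file name
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # requires pmidcite to be installed


run_all([26032263, 30022098])
```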

PubMed vs Google Scholar


In 2013, Boeker et al. [6] recommended that a scientific search interface contain five integrated search criteria. PubMed implements all five, while Google Scholar did not in 2013 and does not today.

Google's highly popular implementation of the forward citation search, through its ubiquitous "Cited by N" links, is a better experience than PubMed's forward citation search.

But if your research is in the health sciences and you are amenable to working from the command line, you can use PubMed in your browser together with citation data that pmidcite downloads from the NIH on the command line. The NIH's citation data includes a paper's ranking within its co-citation network.

What is in PubMed? Take a quick tour

PubMed Contents

PubMed is a search interface and toolset used to access over 30.5 million article records from databases such as:

  • MEDLINE: a highly selective database started in the 1960s
  • PubMed Central (PMC): an open-access database for full-text papers that are free of cost
  • Additional content such as books and articles published before the 1960s

Installation

To install from PyPI
$ pip3 install pmidcite

To install locally

$ git clone https://github.com/dvklopfenstein/pmidcite.git
$ cd ./pmidcite
$ pip3 install .

Setup

Save your literature search in a GitHub repo.

Add a .pmidciterc init file to a non-git managed directory, such as home (~)

$ icite --generate-rcfile | tee ~/.pmidciterc
[pmidcite]
email = [email protected]
# To download PubMed search results, get an NCBI API key here:
# https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities
apikey = MY_LONG_HEX_NCBI_API_KEY
tool = my_scripts
$ export PMIDCITECONF=~/.pmidciterc

Do not put .pmidciterc under version control or push it to a hosting service such as GitHub, because it contains your personal email and your private NCBI API key.
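The init file shown above uses INI syntax, so it can be inspected with Python's standard `configparser`. This is a sketch for checking your own file, not how pmidcite itself loads its configuration; the fallback to `~/.pmidciterc` when `PMIDCITECONF` is unset is an assumption, and the email is a placeholder.

```python
import configparser
import os


def load_pmidcite_cfg(text=None):
    """Return the [pmidcite] section as a dict, or {} if it is missing."""
    parser = configparser.ConfigParser()
    if text is not None:
        parser.read_string(text)
    else:
        # Assumed lookup order: PMIDCITECONF first, then the home directory
        path = os.environ.get("PMIDCITECONF", os.path.expanduser("~/.pmidciterc"))
        parser.read(path)
    if not parser.has_section("pmidcite"):
        return {}
    return dict(parser["pmidcite"])


EXAMPLE = """\
[pmidcite]
email = name@example.com
# Lines starting with '#' are comments
apikey = MY_LONG_HEX_NCBI_API_KEY
tool = my_scripts
"""
print(load_pmidcite_cfg(EXAMPLE))
```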

NCBI E-Utils API key

To download PubMed abstracts and PubMed search results using NCBI's E-Utils, get an NCBI API key using these instructions:
https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities

Set the apikey value in the config file: ~/.pmidciterc
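For context, the sketch below shows where the three config values end up in an NCBI E-Utilities search request. It builds the URL only and sends nothing. The parameter names (`db`, `term`, `retmode`, `retmax`, `email`, `tool`, `api_key`) follow NCBI's public E-Utilities conventions as understood here; whether pmidcite builds its requests this way is an assumption.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, email, tool, apikey=None, retmax=100):
    """Build a PubMed ESearch URL from a query and the config values."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "email": email,
        "tool": tool,
    }
    if apikey:
        params["api_key"] = apikey  # omitted when no key is configured
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url("HIV AND methylation", "name@example.com", "my_scripts",
                  apikey="MY_LONG_HEX_NCBI_API_KEY"))
```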

Contributing

See the contributing guide for detailed instructions on how to get started contributing to the pmidcite project.

Contact

email: [email protected]
https://orcid.org/0000-0003-0161-7603

How to Cite

If you use pmidcite in your research or literature search, please cite paper 1 (pmidcite) and paper 3 (NIH citation data).

Please also consider reading and citing Gusenbauer's response (paper 2) about improving search for all during the information avalanche of these times:

  1. The pmidcite paper:
    Commentary to Gusenbauer and Haddaway 2020: Evaluating Retrieval Qualities of PubMed and Google Scholar
    Klopfenstein DV and Dampier W
    2020 | Research Synthesis Methods | PMID: 33031632 | DOI: 10.1002/jrsm.1456 | pdf

  2. Gusenbauer's response to the pmidcite paper:
    What every Researcher should know about Searching – Clarified Concepts, Search Advice, and an Agenda to improve Finding in Academia
    Gusenbauer M and Haddaway N
    2020 | Research Synthesis Methods | PMID: 33031639 | DOI: 10.1002/jrsm.1457 | pdf

  3. The NIH citation data used by pmidcite -- Scientific Influence, Translation, and Citation counts:
    The NIH Open Citation Collection: A public access, broad coverage resource
    Hutchins BI ... Santangelo GM
    2019 | PLoS Biology | PMID: 31600197 | DOI: 10.1371/journal.pbio.3000385

References

Please consider reading and citing the paper [4] which inspired the creation of pmidcite [1] and the authors' response to our paper [2]:

  4. Which Academic Search Systems are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed and 26 other Resources
    Gusenbauer M and Haddaway N
    2019 | Research Synthesis Methods | PMID: 31614060 | DOI: 10.1002/jrsm.1378

Mentioned in this README are also these outstanding contributions:

  5. Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level
    Hutchins BI, Yuan X, Anderson JM, and Santangelo GM
    2016 | PLoS Biology | PMID: 27599104 | DOI: 10.1371/journal.pbio.1002541

  6. Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough
    Boeker M et al.
    2013 | BMC Medical Research Methodology | PMID: 24160679 | DOI: 10.1186/1471-2288-13-131

  7. Best Match: New relevance search for PubMed
    Fiorini N ... Lu Z
    2018 | PLoS Biology | PMID: 30153250 | DOI: 10.1371/journal.pbio.2005343

Copyright (C) 2019-present pmidcite, DV Klopfenstein, PhD. All rights reserved.

pmidcite's People

Contributors

dvklopfenstein, manodeep

pmidcite's Issues

Annotate icite results to identify reviews

Hello,

Thank you so much for creating this literature search tool! It is very helpful for my work.

Can you add annotation to the icite results that identifies whether this paper is a review?

Thank you!

bulk download of the citation network?

Great package, thank you. Rather than making API requests for a list of PMIDs, is there a dump of the PubMed citation network somewhere? I would just need the list of edges in two columns (source, destination). I figured the examples provided would max out API requests fairly quickly. I know this is a big and maybe impossible ask, but I would like to obtain as much of the network as possible for a recommendation engine I'm working on. Thank you.

Alternate delimiter on output file

icite -H $PMID -c > $PMID.txt

I would love to use this tool in a program, but the only issue I have is that it is meant to be human-readable, rather than machine-readable. I mean this in the sense that it is space-delimited to keep the columns in line, but the number of spaces is naturally inconsistent.

Is there an option for the output of the above command to be tab-delimited instead? Comma delimiting seems dangerous due to the paper names potentially containing commas. Standard commands to convert spaces to tabs fail due to the paper names and limiting to only consecutive spaces causes issues with columns 5, 6, and 7 where they are often only separated by a single space.

Thanks!

ImportError: cannot import name 'summarize_papers' from 'pmidcite.scripts.icite'

Installed via pip3. icite works normally however when running this as a test:

summarize_papers goatools_cites.txt -p TOP CIT CLI

I get

Traceback (most recent call last):
  File "/cluster/path/to/my/directory/env/bin/summarize_papers", line 5, in <module>
    from pmidcite.scripts.icite import summarize_papers
ImportError: cannot import name 'summarize_papers' from 'pmidcite.scripts.icite' (/cluster/path/to/my/directory/env/lib/pypy3.9/site-packages/pmidcite/scripts/icite.py)

icite command not working

I installed the GitHub repo and ran the make command, but the icite command does not work. Can someone please help me with this?

Error: Pip install not working

Hey & thx for your tool!

My issue:
After installing pmidcite via pip install pmidcite, calling icite does not work.

Error:
-bash: icite: command not found

update icite -k

Hello,
Thank you for creating such a lovely literature search tool!

When using icite -k or icite --print_keys the result is:

 YEAR/citations/references section:
     ----------------------------------
      YEAR: The year the article was published
         x: Total of all unique articles that have cited the paper, including clinical articles
         y: Number of unique clinical articles that have cited the paper
         z: Number of references

But the header while using icite -H is as follows:
YEAR cit cli ref

I would like to kindly request the following changes in the --print_keys display:

YEAR/citations/references section:
    ----------------------------------
     YEAR: The year the article was published
        cit: Total of all unique articles that have cited the paper, including clinical articles
        cli: Number of unique clinical articles that have cited the paper
        ref: Number of references

Thank you for your time.

urllib3 warning seen

Thank you so much for writing this project and showing me how to use it!

I am seeing the following warning. I know it's not from your package, but I thought I would let you know.

 % icite 32976797
/Users/user1/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(

Thank you very much!

AttributeError: 'NoneType' object has no attribute 'pmid'

If NIH citation data is not available for one or more requested PMIDs in a list of PMIDs, this error appears:

**WARNING: 1 NIH CITATION DATA NOT DOWNLOADED FOR PMIDs: 32809475
Traceback (most recent call last):
  File "src/bin/dnld_pmids.py", line 40, in <module>
    main()
  File "src/bin/dnld_pmids.py", line 36, in main
    obj.run(queries, dnld_idx)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 40, in run
    self.querypubmed_runicite(ntd.filename, ntd.pubmed_query)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 61, in querypubmed_runicite
    self.wr_icite(fout_icite, pmids)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 78, in wr_icite
    pmid2paper = dnldr.get_pmid2paper(pmids, self.pmid2note)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/icite/pmid_dnlder.py", line 144, in get_pmid2paper
    pmid2icite = {o.pmid:o for o in self.get_icites(pmids_top)}
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/icite/pmid_dnlder.py", line 144, in <dictcomp>
    pmid2icite = {o.pmid:o for o in self.get_icites(pmids_top)}
AttributeError: 'NoneType' object has no attribute 'pmid'
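The traceback points at a dict comprehension that reads `o.pmid` on an entry that is `None`. One possible guard is to skip missing entries, as sketched below. `Entry` is a stand-in for pmidcite's real entry object and the function name is hypothetical; this is not necessarily the fix applied in the project.

```python
from collections import namedtuple

# Stand-in for pmidcite's entry object; only the pmid attribute matters here
Entry = namedtuple("Entry", "pmid")


def build_pmid2icite(icites):
    """Map PMID to entry, skipping PMIDs for which no NIH data came back."""
    return {o.pmid: o for o in icites if o is not None}


downloaded = [Entry(26032263), None, Entry(30022098)]
print(sorted(build_pmid2icite(downloaded)))
```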

error: package directory 'src/pmidcite/eutils/pubmed/mesh' does not exist

Hey all,

I made a fresh Python 3.8 environment and ran pip install pmidcite and got the following error:

Collecting pmidcite
  Using cached pmidcite-0.0.36.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      running egg_info
      creating /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info
      writing /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/PKG-INFO
      writing dependency_links to /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/dependency_links.txt
      writing top-level names to /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/top_level.txt
      writing manifest file '/private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/SOURCES.txt'
      error: package directory 'src/pmidcite/eutils/pubmed/mesh' does not exist
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I'm using Mac m1 Monterey 12.3 and python 3.8. Any ideas why?

Add option to always download citations from the NIH (no temporary working files)

Add an option to always download citations from the NIH; this will now be the default setting.

The previous default mode let researchers combine working online with long periods of working offline: citation data was downloaded from the NIH (JSON format), converted to a Python dict, and written to a temporary working file (p.py). Under the former default, citation data already downloaded from the NIH was loaded from these temporary Python working files rather than re-downloaded, unless the force_download argument was True.

With the new option of always downloading citations from the NIH, no temporary working files are written to disk. The researcher always sees the latest citation data, at the cost of not being able to use previously downloaded data when working offline. Always downloading is also faster because no citation working data is written to disk, and it simplifies the user interface of the Python library for researchers who use the library in their code. The ability to save NIH citation data to disk as Python dicts remains, but is now optional. No code changes are necessary for researchers who use this library in their code.

Add new functionality: PubMed search from command line

I love your tool! It is so amazing. Thank you so much for this.

I see you can do PubMed searches from the script but I need to do it from the command line. Would it be possible to do this from the command line?

icite -s 'HIV AND methylation AND (2017:2023[pdat])' -o HIV_meth_gt2017.txt

Makefile does not execute completely

Hi,

I'm trying to set up your tool manually by cloning this repo and running your Makefile, but the Makefile doesn't seem to create the necessary files; it stops after executing just two find commands. Could you please add a few lines explaining how to set up your tool manually?

Thank you so much for your efforts! Looking forward to testing your tool :)

Here is my Makefile print-out:

make -f makefile
find src -regextype posix-extended -regex ".*[a-z]+.py"
src/bin/dnld_pmids.py
src/bin/rpt_dates_top.py
src/bin/plot_pubmed_contents.py
src/bin/plt_guassian_nihperc.py
src/bin/scatter.py
src/bin/query_pubmed.py
src/bin/dnld_pubmed.py
src/bin/icite.py
src/bin/read_pmids.py
src/tests/args_dflt.py
src/tests/pmids_many.py
src/tests/test_cfg_icite.py
src/tests/test_nb_print_paper_sort_cites.py
src/tests/test_paper_sorts.py
src/tests/test_nb_nihocc_data_download_always.py
src/tests/test_speed_api_dnld.py
src/tests/dnld_pmids_100k.py
src/tests/test_cli_icite.py
src/tests/test_nb_print_paper_all_refs_cites.py
src/tests/prt_hms.py
src/tests/test_nb_query_pubmed.py
src/tests/test_nb_nihocc_data_download_or_import.py
src/tests/icite.py
src/tests/pmids.py
src/tests/test_dnld_cites_refs.py
src/tests/test_speed_dnld_load.py
src/tests/test_database_list.py
src/tests/test_icite_longreq.py
src/tests/test_print_paper.py
src/tests/test_dnld_pmids.py
src/pmidcite/eutils/cmds/pubmed.py
src/pmidcite/eutils/cmds/efetch.py
src/pmidcite/eutils/cmds/elink.py
src/pmidcite/eutils/cmds/cmdbase.py
src/pmidcite/eutils/cmds/esearch.py
src/pmidcite/eutils/cmds/base.py
src/pmidcite/eutils/cmds/query_ids.py
src/pmidcite/eutils/pubmed/terms.py
src/pmidcite/eutils/pubmed/query.py
src/pmidcite/eutils/pubmed/author.py
src/pmidcite/eutils/pubmed/qualifiers.py
src/pmidcite/eutils/pubmed/descriptors.py
src/pmidcite/eutils/pubmed/rdwr.py
src/pmidcite/eutils/pubmed/record.py
src/pmidcite/eutils/pubmed/authors.py
src/pmidcite/eutils/pubmed/counts/dnlded_data.py
src/pmidcite/eutils/pubmed/counts/dnld.py
src/pmidcite/eutils/pubmed/counts/plt.py
src/pmidcite/eutils/pubmed/counts/data.py
src/pmidcite/cfg.py
src/pmidcite/pubmedqueryicite.py
src/pmidcite/plot/nih_perc.py
src/pmidcite/plot/scatter.py
src/pmidcite/utils_module.py
src/pmidcite/_version.py
src/pmidcite/icite/pmid_dnlder.py
src/pmidcite/icite/downloader.py
src/pmidcite/icite/papers.py
src/pmidcite/icite/paper.py
src/pmidcite/icite/api.py
src/pmidcite/icite/utils.py
src/pmidcite/icite/entry.py
src/pmidcite/icite/dnldr/pmid_dnlder.py
src/pmidcite/icite/dnldr/pmid_loader.py
src/pmidcite/icite/dnldr/pmid_dnlder_base.py
src/pmidcite/icite/dnldr/pmid_dnlder_only.py
src/pmidcite/icite/nih_grouper.py
src/pmidcite/cli/readpmids.py
src/pmidcite/cli/rptdatestop.py
src/pmidcite/cli/querypubmed.py
src/pmidcite/cli/entry_keyset.py
src/pmidcite/cli/utils.py
src/pmidcite/cli/icite.py
src/pmidcite/cli/dnldpubmed.py
src/pmidcite/cfgini.py
find src -regextype posix-extended -regex "[a-z./]*" -type d
src
src/bin
src/tests
src/tests/data
src/pmidcite
src/pmidcite/eutils
src/pmidcite/eutils/cmds
src/pmidcite/eutils/pubmed
src/pmidcite/eutils/pubmed/counts
src/pmidcite/plot
src/pmidcite/icite
src/pmidcite/icite/dnldr
src/pmidcite/cli

Tested on MacOS-System.

Request to add 'fl' parameter to icite API

fl: only return publications with the given fields. Separate multiple fields with commas (no space). Field names are very specific and listed in Response example below. No fl param will return all fields.
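A request using `fl` might be built as sketched below. The endpoint URL and the `pmids` parameter are assumptions about the NIH iCite API, the field names are examples only, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NIH iCite API
ICITE_API = "https://icite.od.nih.gov/api/pubs"


def icite_url(pmids, fields=None):
    """Build an iCite request URL, optionally limited to the given fields."""
    params = {"pmids": ",".join(str(p) for p in pmids)}
    if fields:
        # Comma-separated with no spaces, as the description above requires
        params["fl"] = ",".join(fields)
    # Keep commas unescaped so the lists stay readable in the URL
    return f"{ICITE_API}?{urlencode(params, safe=',')}"


print(icite_url([26032263, 30022098], fields=["pmid", "year"]))
```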

High memory usage by pmidcite

@dvklopfenstein pmidcite uses a huge amount of memory while accessing PubMed IDs. I have a text file that contains 90,000 PMIDs. When I run pmidcite, the cluster job is aborted because of its high memory footprint (about 16 GB). Can you please help me with this? I only need the headers and do not want the citation information.
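A generic workaround for large PMID lists is to process them in fixed-size batches so that only one batch of results is held in memory at a time. The helper below is plain Python and not a pmidcite feature; the batch size is an arbitrary choice, and whether batching resolves this particular report is not confirmed.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Placeholder IDs standing in for a long PMID list
pmids = range(1, 8)
for batch in chunked(pmids, 3):
    print(batch)  # each batch could be written to its own file and analyzed separately
```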

