broadinstitute / gene-hints

Discoverability for gene search

Home Page: https://broad.io/gene-hints

License: BSD 3-Clause "New" or "Revised" License

HTML 25.94% Python 74.06%

discoverability search-ui genomics-visualization pubmed

gene-hints's Issues

Fix GitHub Actions failure

Gene Hints updates popularity data each day via a GitHub Actions workflow. The workflow fails like so:

100%|██████████| 180/180 [05:35<00:00,  1.87s/it]
100%|██████████| 180/180 [05:35<00:00,  1.87s/it]
/home/runner/work/_temp/3996971a-eb39-4aaf-91b7-d4ed401873fb.sh: line 1:  1619 Killed                  python gene_hints/gene_hints.py
Output to: ./pubmed_citations/data/tmp/prev_timeframe/2021_01_20.tsv
Error: Process completed with exit code 137.

Another discussion of this error indicates the cause:

Exit code 137 = Out of memory. Therefore I guess your implementation makes the runner (or Docker container) run out of memory.

Refine the Gene Hints pipeline so GitHub Actions workflows reliably succeed.
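
One possible first step is to find which pipeline stage allocates the most memory. The sketch below uses Python's standard tracemalloc module to report peak memory for a single function call. The helper name peak_memory_mib and the stand-in workload are hypothetical and not part of gene-hints; tracemalloc only counts allocations made through Python's allocator, so memory used inside native extensions may not show up.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Call func and return (result, peak traced memory in MiB).

    Hypothetical helper for illustration; not part of gene-hints.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Stand-in workload: build a large list in memory
    _, peak = peak_memory_mib(lambda: [0] * 1_000_000)
    print(f"Peak traced memory: {peak:.1f} MiB")
```

Wrapping each stage of the pipeline this way would show whether a particular step needs to stream its input rather than hold it all in memory.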

Automate daily updates for PubMed citation count trends

Wikipedia page view trends for human genes are updated each day, and written to data/homo-sapiens-wikipedia-trends.tsv. However, the UI ultimately fetches data/homo-sapiens-gene-details.tsv, which contains both those trends and PubMed citation count trends.

To get daily updates for data used by the UI, we first need to automate daily updates for PubMed trends.

The code to produce PubMed trends is in /creating_citation_counts_tsv. Use GitHub Actions to run that code to produce daily updates for PubMed trends data. It might help to create a script that wraps those various shell and Python files, then call the wrapper from GitHub Actions.
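
A wrapper along these lines might look like the sketch below. It runs a list of commands in order and stops at the first failure, so the GitHub Actions step fails visibly. The function name run_steps and the placeholder commands are assumptions for illustration; the actual shell and Python files in /creating_citation_counts_tsv would take their place.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first non-zero exit.

    Returns the number of steps that ran successfully.
    """
    for cmd in steps:
        print("Running:", " ".join(cmd), flush=True)
        # check=True raises subprocess.CalledProcessError on failure
        subprocess.run(cmd, check=True)
    return len(steps)


if __name__ == "__main__":
    # Placeholder commands only; substitute the real pipeline scripts here.
    placeholder_steps = [
        [sys.executable, "-c", "print('step 1 placeholder')"],
        [sys.executable, "-c", "print('step 2 placeholder')"],
    ]
    run_steps(placeholder_steps)
```

The workflow would then need only one step that invokes the wrapper, which keeps the ordering of the pipeline in one place.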

Sort human genes by rank change in Wikipedia page views

The number of page views for a given gene's article on Wikipedia can change due to factors that affect all genes, e.g. weekends.

To normalize for such factors, let's sort genes in https://raw.githubusercontent.com/broadinstitute/gene-hints/main/data/homo-sapiens-gene-details.tsv by view_rank_delta.
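
As a rough sketch of the sort, assuming the TSV has a plain header row and an integer view_rank_delta column, and assuming the largest rank gains should come first: the symbol column name and the sample values below are made up for illustration, and the real file may have additional columns or header lines that need handling.

```python
import csv
import io


def sort_by_rank_delta(tsv_text, column="view_rank_delta"):
    """Return (fieldnames, rows) with rows sorted by column, largest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return reader.fieldnames, rows


if __name__ == "__main__":
    # Made-up sample data; not taken from the real file
    sample = "symbol\tview_rank_delta\nTP53\t-3\nBRCA1\t12\nACE2\t0\n"
    fieldnames, rows = sort_by_rank_delta(sample)
    for row in rows:
        print(row["symbol"], row["view_rank_delta"])
```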

Use SPARQL to get Wikipedia article title given gene ID

Wikidata, a project related to Wikipedia, has a service that would allow us to get the Wikipedia page title given the gene symbol. It uses a powerful but little-known query language called SPARQL. This would allow us to robustly and quickly determine Wikipedia article names from our known gene data, which is important for getting Wikipedia page view trends for those genes.

Tasks:

Run code in https://github.com/broadinstitute/gene-hints/tree/main/creating_citation_counts_tsv
Investigate the files that code operates on to determine whether Ensembl gene ID, Entrez gene ID, or both are available for all genes
Note that Ensembl gene ID is P594 on Wikidata, and Entrez gene ID is P351. For a real example, see here and here.
Extend the approach described here to query Wikipedia article name from our known values of P594 or P351
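
The last task could be sketched as follows. This is not the approach linked above, only one plausible query shape: it matches Wikidata items by P594 or P351 and follows the English Wikipedia sitelink to its title. The function names, the User-Agent string, and the example IDs are assumptions for illustration; the endpoint is the public Wikidata Query Service, where the wdt: and schema: prefixes are predefined.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Wikidata properties named in the tasks above
PROPERTIES = {"ensembl": "P594", "entrez": "P351"}


def build_query(id_type, gene_ids):
    """Build a SPARQL query mapping gene IDs to English Wikipedia titles."""
    prop = PROPERTIES[id_type]
    # json.dumps yields double-quoted literals, valid for plain ASCII IDs
    values = " ".join(json.dumps(gene_id) for gene_id in gene_ids)
    return f"""
SELECT ?id ?title WHERE {{
  VALUES ?id {{ {values} }}
  ?gene wdt:{prop} ?id .
  ?article schema:about ?gene ;
           schema:isPartOf <https://en.wikipedia.org/> ;
           schema:name ?title .
}}
"""


def fetch_titles(id_type, gene_ids):
    """Query the Wikidata Query Service; return {gene_id: article_title}."""
    params = urllib.parse.urlencode(
        {"query": build_query(id_type, gene_ids), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "gene-hints-example/0.1"},  # hypothetical value
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {
        b["id"]["value"]: b["title"]["value"]
        for b in data["results"]["bindings"]
    }


if __name__ == "__main__":
    # 7157 is the Entrez gene ID for human TP53; prints the query only
    print(build_query("entrez", ["7157"]))
```

Batching many IDs into one VALUES clause keeps the number of requests low, though very large batches may need to be split to stay within the service's query limits.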

Sort non-human genes by rank change in citation count

Like #11, but sort all non-homo-sapiens files by rank_delta (which we might want to call citation_rank_delta).
