monarch-initiative / biolink-api



API for linked biological knowledge

Home Page: https://api.monarchinitiative.org/api/

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 82.53%, Shell 0.40%, Perl 1.44%, Makefile 0.24%, Gherkin 14.89%, CSS 0.05%, HTML 0.22%, Dockerfile 0.22%, Procfile 0.02%

Topics: bioinformatics, ontologies, phenotypes, model-organisms, python, swagger, gene, api


biolink-api's Issues

exclude_automatic_assertions fails

This is the error:

    HTTP ERROR 400

    Problem accessing /solr/select/. Reason:

        undefined field evidence_object_closure

    Powered by Jetty://

This is the Python test script:

from biogolr.golr_associations import search_associations, search_associations_compact, GolrFields, select_distinct_subjects, get_objects_for_subject, get_subjects_for_object

M = GolrFields()

HUMAN_SHH = 'NCBIGene:6469'
HOLOPROSENCEPHALY = 'HP:0001360'
TWIST_ZFIN = 'ZFIN:ZDB-GENE-050417-357'
DVPF = 'GO:0009953'

def test_go_assocs():
    # Fetch associations for TWIST_ZFIN, excluding automatic assertions,
    # and map each result onto the three slim terms given.
    results = search_associations(subject=TWIST_ZFIN,
                                  exclude_automatic_assertions=True,
                                  slim=['GO:0001525', 'GO:0048731', 'GO:0005634'],
                                  object_category='function')

    assocs = results['associations']
    assert len(assocs) > 0
    n_found = 0
    for r in assocs:
        print("Direct: {} Slimmed: {}".format(r['object'], r['slim']))
        # Expect exactly one direct annotation to GO:0002040 that rolls up to GO:0048731.
        if 'GO:0002040' == r['object']['id']:
            if 'GO:0048731' in r['slim']:
                n_found = n_found + 1
    assert n_found == 1

Explore ways to add smartapi annotations

smartapi provides an editor for adding new annotations. Ideally we would be able to do this by adding annotations at the Flask level, rather than exporting the JSON, editing it, and having it get out of sync.

As a first-pass test, we could do a one-off export and extend it in the editor, just as a proof of concept.

https://websmartapi.github.io/smartapi/
smart-api.info/editor/#/

MacOSX Install issues with matplotlib. Easy workaround

Mac users may encounter this error when running python biolink/app.py:

    from matplotlib.backends import _macosx
    RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are Working with Matplotlib in a virtual enviroment see 'Working with Matplotlib in Virtual environments' in the Matplotlib FAQ

Quick fix (create the ~/.matplotlib directory first if it does not exist):

echo "backend: TkAgg" >> ~/.matplotlib/matplotlibrc

Details

http://stackoverflow.com/questions/4130355/python-matplotlib-framework-under-macosx
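The same quick fix can be scripted in Python so that it creates ~/.matplotlib when it is missing and does not append the backend line a second time on repeated runs. This is a minimal sketch; the function name `ensure_tkagg_backend` is hypothetical, and it only edits the file that the shell command above writes to.

```python
from pathlib import Path

BACKEND_LINE = "backend: TkAgg"


def ensure_tkagg_backend(rc_path: Path = Path.home() / ".matplotlib" / "matplotlibrc") -> bool:
    """Append 'backend: TkAgg' to matplotlibrc unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    """
    # The shell 'echo ... >>' fails if the directory is missing, so create it.
    rc_path.parent.mkdir(parents=True, exist_ok=True)
    text = rc_path.read_text() if rc_path.exists() else ""
    if BACKEND_LINE in (line.strip() for line in text.splitlines()):
        return False
    # Avoid gluing our line onto an existing last line that lacks a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with rc_path.open("a") as fh:
        fh.write(prefix + BACKEND_LINE + "\n")
    return True
```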

Ensure ID rewriting is applied universally

The GO golr uses the doubled prefix MGI:MGI:, whereas the Monarch golr uses MGI:.

We standardize on the latter. Currently we rewrite IDs to the doubled version behind the scenes, but this only happens for some fields.
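One way to make the rewriting universal is to apply it recursively to every string in a query or result document, instead of to a hand-picked list of fields. A minimal sketch, assuming CURIEs appear as plain strings inside nested dicts and lists; the function names are hypothetical and not part of biolink-api.

```python
from typing import Any, Callable

MONARCH_PREFIX = "MGI:"   # the form we standardize on
GO_PREFIX = "MGI:MGI:"    # the doubled form used by the GO golr


def to_go_id(curie: str) -> str:
    """Rewrite a standard MGI CURIE into the doubled GO golr form."""
    if curie.startswith(MONARCH_PREFIX) and not curie.startswith(GO_PREFIX):
        return GO_PREFIX + curie[len(MONARCH_PREFIX):]
    return curie


def to_monarch_id(curie: str) -> str:
    """Collapse the doubled GO golr form back to the standard MGI: prefix."""
    if curie.startswith(GO_PREFIX):
        return MONARCH_PREFIX + curie[len(GO_PREFIX):]
    return curie


def rewrite_ids(value: Any, rewrite: Callable[[str], str]) -> Any:
    """Apply `rewrite` to every string in a nested structure of dicts and lists."""
    if isinstance(value, str):
        return rewrite(value)
    if isinstance(value, list):
        return [rewrite_ids(v, rewrite) for v in value]
    if isinstance(value, dict):
        return {k: rewrite_ids(v, rewrite) for k, v in value.items()}
    return value  # numbers, booleans, None pass through unchanged
```

Because every string is visited, a free-text field (such as a label) that happens to start with "MGI:" would also be rewritten; restricting the walk to known ID-bearing keys is the obvious refinement if that matters.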

Check issues with cachier

The cachier version needs to be pinned to 0.1.26.

Note that, due to #49, this may need to be done in ontobio.

ortholog identifiers

This may turn into 2-3 separate issues, but let's start. The objective is to determine whether any of the orthologs of a given gene have been functionally described using GO.

This query:
https://api.monarchinitiative.org/api/bioentity/gene/MGI%3A97490/homologs/?homology_type=O&rows=20&fetch_objects=true

Returns these results:
"FlyBase:FBgn0005558",
"FlyBase:FBgn0019650",
"NCBIGene:100022926",
"NCBIGene:100514152",
"NCBIGene:181251",
"NCBIGene:25509",
"NCBIGene:286857",
"NCBIGene:5080",
"NCBIGene:695746",
"NCBIGene:737387",
"ZFIN:ZDB-GENE-041210-244",
"ZFIN:ZDB-GENE-081022-10",
"ZFIN:ZDB-GENE-990415-200"

Respectively: fly, fly, possum, pig, WORM, RAT, bovine, HUMAN, macaque, chimp, zfish, zfish, zfish

The first problem is that my objective was to use these results to issue a subsequent query to determine whether these homologs have experimental GO annotations associated with them. However, AmiGO/GOlr doesn't recognize NCBIGene identifiers, so I'll never see results that are actually available. This is especially harmful for human genes: the UniProt identifier is needed to find results in AmiGO.
There is also a worm ortholog that is not shown with its WB ID (NCBIGene:181251 is worm). The same is true for rat (NCBIGene:25509), which needs the RGD identifier.
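As a first step, the IDs returned by the homolog query can be split into those GOlr is likely to recognize and those that need mapping (e.g. NCBIGene to UniProt, WB or RGD) before the follow-up query. A minimal sketch; the set of prefixes assumed to be GOlr-friendly and both function names are hypothetical, and the actual ID mapping is left to a separate service.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote, urlencode

API_BASE = "https://api.monarchinitiative.org/api"
# Assumption: prefixes GOlr is expected to recognize directly. NCBIGene is
# deliberately absent, per this issue; adjust after checking GOlr itself.
GOLR_PREFIXES = {"UniProtKB", "ZFIN", "WB", "RGD", "MGI", "FB"}


def homolog_query_url(gene_curie: str, homology_type: str = "O", rows: int = 20) -> str:
    """Build the homolog query URL used in this issue for a given gene CURIE."""
    path = f"{API_BASE}/bioentity/gene/{quote(gene_curie, safe='')}/homologs/"
    params = urlencode({"homology_type": homology_type, "rows": rows, "fetch_objects": "true"})
    return f"{path}?{params}"


def split_by_golr_support(curies: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (usable, needs_mapping) based on each CURIE's prefix."""
    usable: List[str] = []
    needs_mapping: List[str] = []
    for curie in curies:
        prefix = curie.split(":", 1)[0]
        (usable if prefix in GOLR_PREFIXES else needs_mapping).append(curie)
    return usable, needs_mapping
```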

Switch to marshmallow for object schemas

This may require a change from flask-restplus to flask-api.

Implement SPARQL wrapping

cc @balhoff

Implement causalmodels API

The causalmodels (cam) routes in the API provide operations over LEGO models (encompassing causal phenotype models). These are implemented via SPARQL queries over a pre-reasoned triplestore.

Implement some methods for handling:

model querying
instance querying
'annoton' querying
cross-model analytic queries

Split core obographs/bioontology package into its own top level package

At the top level we have (in order of dependency):

biolink - the flaskrest code
obographs - general purpose ontology library
biogolr - wrapper for golr
prefixcommons - curie util

The last 3 can be combined into a general purpose python library for working with ontologies and ontology associations.

Fix gene/<id>/homologs to use homology type

The API allows the provision of any relationship type for any route that returns association objects (the precomputed relation closure in solr is used for this).

However, for specific routes like /gene/<id>/homologs, we want to provide a convenience enum, e.g. corresponding to O/P/LDO, with each value mapping trivially to an RO ID.

We will follow this design pattern in other places where a fixed set of relations is used to connect two categories, e.g. drug-disease and disease-model.
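The convenience enum could look like the sketch below: short codes as member names, RO relation IDs as values, so a route parameter can be validated and translated in one lookup. The RO IDs shown are assumptions to check against the Relation Ontology before use, and `relation_for` is a hypothetical helper.

```python
from enum import Enum


class HomologyType(Enum):
    """Short homology codes for /gene/<id>/homologs mapped to RO relation IDs.

    The IDs below are assumed, not confirmed; verify them against RO.
    """
    O = "RO:HOM0000017"    # orthologous (assumed ID)
    P = "RO:HOM0000011"    # paralogous (assumed ID)
    LDO = "RO:HOM0000020"  # least diverged orthologous (assumed ID)


def relation_for(homology_type: str) -> str:
    """Map a code such as 'O' to its RO ID; raises KeyError for unknown codes."""
    return HomologyType[homology_type].value
```

The same shape would carry over to the drug-disease and disease-model routes: one enum per pair of categories, with the closure-based relation filter applied to the resulting ID.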

search/entity/{term} doesn't return anything

The examples are
https://api.monarchinitiative.org/api/search/entity/parkinson
https://api.monarchinitiative.org/api/search/entity/shh
https://api.monarchinitiative.org/api/search/entity/femur

For each, the response is empty.

The swagger page also appears to be acting up for this query, which might be relevant.

Deploy a test instance

Primarily for browsing swagger docs

FB vs. FlyBase

For example: is it FlyBase:FBgn0005558 or FB:FBgn0005558?
There is a discordancy in Biolink: the method that retrieves the orthologs for a gene uses 'FlyBase' as the prefix of the matching ortholog identifiers, but if you subsequently try to use such an ID to retrieve GO annotations, nothing comes back, because GOlr (and PANTHER and the GPCR) use FB as the resource prefix.
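Until the prefixes are aligned at source, a client could normalize CURIEs before querying GOlr. A minimal sketch; the prefix map contains only the FlyBase-to-FB case described here, and both names are hypothetical.

```python
from typing import Dict

# Assumption: prefixes returned by the ortholog route that differ from GOlr's.
GOLR_PREFIX_MAP: Dict[str, str] = {"FlyBase": "FB"}


def to_golr_curie(curie: str, prefix_map: Dict[str, str] = GOLR_PREFIX_MAP) -> str:
    """Rewrite a CURIE's prefix for GOlr, leaving unmapped prefixes unchanged."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # not a CURIE; pass it through untouched
    return f"{prefix_map.get(prefix, prefix)}:{local_id}"
```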

Add Data Paging

At the moment there is only an option to choose how many rows to return (at least for GET /search/entity/{term}). There should also be an option to choose at what index to start from. It would be nice to be able to load a thousand items in batches of tens or hundreds, or something like that, without having to load all thousand at once.
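With a `start` offset alongside `rows`, a client can walk the results in batches. A minimal client-side sketch, assuming a hypothetical `fetch_page(start, rows)` callable that wraps such a route and returns fewer than `rows` items on the last page.

```python
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]], rows: int = 100,
               limit: Optional[int] = None) -> Iterator[List[T]]:
    """Yield successive batches, advancing `start` by the batch size each time."""
    start = 0
    while limit is None or start < limit:
        batch_size = rows if limit is None else min(rows, limit - start)
        batch = fetch_page(start, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short page means the end of the results
        start += batch_size
```

For example, loading a thousand items in batches of 300 makes four requests (300, 300, 300, 100) instead of one large one.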

Allow configuration to point to other services

The external URLs are hard-coded. It would be useful to make them configurable so that biolink can point to other endpoints.

For Monarch we'll use that to point to the dev vs prod services.
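One simple scheme is to keep defaults in code and let environment variables override them per deployment (dev vs prod). A minimal sketch; the service names, the `BIOLINK_<NAME>_URL` variable convention and the example.org defaults are all hypothetical placeholders, not biolink's real configuration.

```python
import os
from typing import Dict, Mapping, Optional

# Placeholder defaults; real deployments would supply their own URLs.
DEFAULT_URLS: Dict[str, str] = {
    "scigraph": "https://scigraph.example.org/scigraph/",
    "golr": "https://solr.example.org/solr/golr/",
}


def load_service_urls(env: Optional[Mapping[str, str]] = None,
                      defaults: Mapping[str, str] = DEFAULT_URLS) -> Dict[str, str]:
    """Return service URLs, letting BIOLINK_<NAME>_URL variables override defaults."""
    env = os.environ if env is None else env
    return {name: env.get(f"BIOLINK_{name.upper()}_URL", url) for name, url in defaults.items()}
```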

enable CORS

http://flask-cors.readthedocs.io/en/latest/

Implement genomic feature queries

Required for https://github.com/monarch-initiative/monarch-app/issues/1431

We have a (stub) route:

/within/<build>/<reference>/<begin>/<end>

We may also want one for fetching features N bp up/down of a gene boundary (include the gene features in that region).

These would wrap calls, yet to be implemented, in scigraph/scigraph_util.py (or this could be organized in some other way). @kshefchek has code from CKBD that can be adapted.
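A request for features N bp up/down of a gene could be reduced to the existing /within/ route by widening the gene's boundaries. A minimal sketch, assuming 1-based inclusive coordinates; both function names are hypothetical, and clamping the end to the chromosome length is left to the caller.

```python
from typing import Tuple


def flanking_region(gene_begin: int, gene_end: int, flank_bp: int) -> Tuple[int, int]:
    """Return (begin, end) covering the gene plus `flank_bp` on each side.

    Begin is clamped at 1; the end is not clamped here.
    """
    if gene_begin > gene_end:
        gene_begin, gene_end = gene_end, gene_begin  # tolerate reversed coordinates
    return max(1, gene_begin - flank_bp), gene_end + flank_bp


def within_route(build: str, reference: str, begin: int, end: int) -> str:
    """Format the stub route /within/<build>/<reference>/<begin>/<end>."""
    return f"/within/{build}/{reference}/{begin}/{end}"
```

Because the widened window still contains the gene, the gene features themselves come back along with their neighbours, as the issue asks.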

Add a functional travis test

See for example http://flask.pocoo.org/docs/0.11/testing/

short-hand function to support 'ribbon' display

Would be convenient for the 'ribbon' display if there were a method to return T/F for each (GO) class in a slim given i) a gene ID (or entity) and ii) the slim to use.

It should default to setting "exclude_automatic_assertions" to True.

This could be accomplished via GET gene/function (once exclude_automatic_assertions is working) and parsing the results, but that means transferring a lot of data that is then ignored.
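Server side, the ribbon helper could reduce the slimmer's output to one boolean per slim class before anything is sent to the client. A minimal sketch, assuming each association carries a `slim` list as in the exclude_automatic_assertions test script, and that automatic assertions were already filtered out upstream; `ribbon` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def ribbon(slim: List[str], associations: Iterable[dict]) -> Dict[str, bool]:
    """Map each slim class to True if any association rolls up to it."""
    hit = set()
    for assoc in associations:
        hit.update(assoc.get("slim", []))
    return {cls: cls in hit for cls in slim}
```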

Bind to solr search for autocomplete

Abstract over querying different golr instances

GO and Monarch have similar patterns in their golrs, but different fields. Because the GO schema is less generic, we have (somewhat) specific field names such as bioentity (typically a gene or gene product) and annotation_class (typically GO terms).

The Monarch schema is more generic: we don't assume the LHS is an 'entity' or that the RHS is a class (consider gene-gene interactions), so we use RDF terminology like subject/object for associations.

We should explore patterns for abstracting over these.
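One candidate pattern is to write queries against the generic subject/object vocabulary and translate field names per instance just before building the Solr request. A minimal sketch; only the bioentity and annotation_class names come from the description above, and the map and function names are hypothetical.

```python
from typing import Dict, List

# Generic (Monarch-style) field names mapped to each golr instance's schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "monarch": {"subject": "subject", "object": "object"},
    "go": {"subject": "bioentity", "object": "annotation_class"},
}


def to_instance_query(instance: str, filters: Dict[str, str]) -> Dict[str, str]:
    """Translate generic field names into the instance's own names (unknown ones pass through)."""
    field_map = FIELD_MAPS[instance]
    return {field_map.get(field, field): value for field, value in filters.items()}


def to_fq(query: Dict[str, str]) -> List[str]:
    """Render filters as Solr fq strings, quoting values because CURIEs contain ':'."""
    return [f'{field}:"{value}"' for field, value in query.items()]
```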

Add API tests

Required for #11

Behave seems the best candidate.

Demo shopping cart functionality

This is intended to be very quick and proof of concept.

Create a new route in api/cart/endpoints/cart.py

cart/<id>/

GET - show entities in cart (just IDs). Defer fetching labels for now.
PUT - creates a new cart, returns uuid
POST - add an object to a cart. It can be any entity - just needs a CURIE

See variantset for example code for wrapping a SQLite database. I'd rather avoid SQL dependencies, even SQLite. For a first pass, carts could just be held in memory and destroyed at the end of the server's lifecycle.
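The in-memory option could be as small as a dict keyed by UUID, held for the lifetime of the server process. A minimal sketch of the storage layer only (the Flask route wiring is omitted); the class and method names are hypothetical.

```python
import uuid
from typing import Dict, List


class CartStore:
    """In-memory carts; contents are lost when the server process exits."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def create(self) -> str:
        """PUT: create an empty cart and return its UUID."""
        cart_id = str(uuid.uuid4())
        self._carts[cart_id] = []
        return cart_id

    def add(self, cart_id: str, curie: str) -> None:
        """POST: add any entity, identified only by its CURIE; duplicates are ignored."""
        items = self._carts[cart_id]  # KeyError for unknown carts; a route would map this to 404
        if curie not in items:
            items.append(curie)

    def get(self, cart_id: str) -> List[str]:
        """GET: list the IDs in a cart (labels are not fetched)."""
        return list(self._carts[cart_id])
```

This is not shared across worker processes, so it only behaves as expected when the app runs as a single process, which is acceptable for a proof of concept.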

Next we can explore actions on those items:

fetching all associations
comparing two carts
run owlsim (assume all entities are phenotypes, or something like genes that can be expanded to phenotypes)
enrichment

Improve biolink API calls wrapping wikidata for disease-substance

We currently have a path /disease/{id}/substance. Currently this returns a substance associated with a disease, but the association model doesn't follow the associations that come from golr.

Like most biolink routes, these are simple facades over more powerful services. In this case, this wraps a SPARQL query provided by @stuppie in NCATS-Tangerine/ncats-ingest#19 (comment)

See:

https://github.com/biolink/biolink-api/blob/a5195d0a2400bd53dbfce190d253f11f48943fbe/biolink/api/bio/endpoints/bioentity.py#L273-L286

In fact this is a facade over a facade; we have a small general purpose python library that wraps WD SPARQL calls: https://github.com/biolink/biolink-api/blob/6a339ef0fa71fafdb1cf0a1a14037124a7aedcbf/biowikidata/wd_sparql.py#L85-L87

The Scripps team has a superior API, but this one fits our purposes for now.

(dipper team, @kshefchek @DoctorBud @mbrush @TomConlin take note. We can reuse something like this rather than duplicating ingest work. In fact, we can even simply federate our own graph queries and start the WD ingest at the Solr stage. This ties in with the Translator kbio plan.)

we need to

make sure we grab the qualifiers, see the ticket above
map to the association model
think about the strategy for inference. Can we use the ontologies loaded into WD? What about synchrony, etc

implement /phenotype/<id>/anatomy/

this is just a stub so far.

this can be implemented by the paths:

(phenotype)-[has part]->(_)-[inheres_in|inheres_in_part_of]->(anatomy)
(phenotype)-[inheres_in|inheres_in_part_of]->(anatomy)

we may also want an option for traversing up the subclass hierarchy until a match is found.
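Independent of the backend chosen (SciGraph, BOLT), the two paths above amount to a one-hop expansion through has part followed by an inheres_in/inheres_in_part_of hop. A minimal sketch over an in-memory list of (subject, predicate, object) edges, swapped in for a real graph query; the predicate strings are the labels used in this issue rather than real graph IDs, and the subclass fallback is not included.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

# Predicate labels as written in the issue; a real graph will use IDs.
HAS_PART = "has part"
INHERES = {"inheres_in", "inheres_in_part_of"}


def anatomy_for_phenotype(phenotype: str, edges: Iterable[Edge]) -> Set[str]:
    """Anatomy reached directly, or via one has-part hop, then an inheres_in* edge."""
    edges = list(edges)
    starts = {phenotype} | {o for s, p, o in edges if s == phenotype and p == HAS_PART}
    return {o for s, p, o in edges if s in starts and p in INHERES}
```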

How to do this?

SciGraph graph calls?
SciGraph dynamic?
BOLT?

Include neo4j wrapping example

TBD: via BOLT or SciGraph API?

Make compatible with smart-api

Enter

http://api.monarchinitiative.org/api/swagger.json

(or the equivalent on localhost:5000 if testing)

Into http://smart-api.info/editor/#/

And fix errors.

Once the errors are fixed, hit 'Save' - this registers the API.

It may get tedious to do this with every release, maybe there is a way to automate via github releases, can look into this later.

Note: the JSON is generated by the flask-restplus framework, so the errors must be fixed at source. However, for debugging purposes we can edit the JSON directly.

GET /evidence/graph/{id} = 500

https://api.monarchinitiative.org/api/evidence/graph/cfef92b7-bfa3-44c2-a537-579078d2de37

(id provided in the API doc)

response


Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

A particular slim query with ZFIN:ZDB-GENE-990415-200 gives a 500 error

I thought this was the gene_product issue, but it appears not. Here's the query:
http://127.0.0.1:8888/api/bioentityset/slimmer/function?slim=GO:0003824&slim=GO:0004872&slim=GO:0005102&slim=GO:0005215&slim=GO:0005198&slim=GO:0008092&slim=GO:0003677&slim=GO:0003723&slim=GO:0001071&slim=GO:0036094&slim=GO:0046872&slim=GO:0030246&slim=GO:0008283&slim=GO:0071840&slim=GO:0051179&slim=GO:0032502&slim=GO:0000003&slim=GO:0002376&slim=GO:0050877&slim=GO:0050896&slim=GO:0023052&slim=GO:0010467&slim=GO:0019538&slim=GO:0006259&slim=GO:0044281&slim=GO:0050789&slim=GO:0005576&slim=GO:0005829&slim=GO:0005856&slim=GO:0005739&slim=GO:0005634&slim=GO:0005694&slim=GO:0016020&slim=GO:0071944&slim=GO:0030054&slim=GO:0042995&slim=GO:0032991&subject=ZFIN:ZDB-GENE-990415-200

Ensure pagination arguments are used throughout

Add mart route for genotype-phenotype

Need to adequately test. What species has most of these?

Are there any odd bnode issues?

Upgrade flask-restplus version when it's available

It should fix the forward slash issue.

noirbizarre/flask-restplus#251

Add a graphql route

Consider https://github.com/graphql-python as framework

Galaxy tool wrapper for biolink API?

See https://docs.galaxyproject.org/en/master/dev/schema.html
for instructions on getting tools into the Galaxy project toolshed.

Here is an example for MouseMine: http://www.mousemine.org/mousemine/begin.do?GALAXY_URL=https%3A//usegalaxy.org/tool_runner%3Ftool_id%3Dmousemine

Alternatively, perhaps more appropriate to be on the sidebar here: https://usegalaxy.org/

Ideas welcome.

Add counts for distinct subjects and distinct objects in association queries

Association counts are trivially returned for any association query, even if pagination is used.

Options are:

precompute ahead of time and populate solr with precomputed results
utilize some function of solr I am not aware of
iterate over all results and count

1 is not good as we would have to precompute for the cross-product of all query options

2 may require further investigation

3 will work fine for any subject query for which the subject is loosely-speaking entity level, with a predictably bounded number of associations. E.g. genes, specific diseases.
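Option 3 can be sketched as a loop over pages that accumulates two sets. A minimal sketch, assuming a hypothetical `fetch_page(start, rows)` callable returning association dicts with `subject.id` and `object.id`, and a bounded result size as described above.

```python
from typing import Callable, Dict, List


def count_distinct(fetch_page: Callable[[int, int], List[dict]], rows: int = 1000) -> Dict[str, int]:
    """Page through every association and count distinct subject and object IDs."""
    subjects, objects = set(), set()
    start = 0
    while True:
        page = fetch_page(start, rows)
        for assoc in page:
            subjects.add(assoc["subject"]["id"])
            objects.add(assoc["object"]["id"])
        if len(page) < rows:
            break  # short (or empty) page: no more results
        start += rows
    return {"distinct_subjects": len(subjects), "distinct_objects": len(objects)}
```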

gene/<id>/function does not return results

For example, with ID MGI:1342287. There is also some leakage of params specific to the homology call. I am using api.parser.copy(), so I suspect there might be a weird gotcha in Python inheritance? cc @mbrush

Switch api_marshall_list_with to api_marshall_with

https://api.monarchinitiative.org/api/#!/bioentity/get_gene_interactions

shows the return type as

[ 
 {} 
]

Looks like an issue here:

https://github.com/biolink/biolink-api/blob/7cadd9a7655b610fab672d8999812cf3e4567001/biolink/api/bio/endpoints/bioentity.py#L99

The return should be api_marshall_with

Slimmer should support either a list of IDs, or the tag that indicates the set of IDs (e.g. goslim_agr)

Would rather not need to hard code the list of IDs (will do for the nonce), but have this embedded in the ontology itself.

Leakage of parameters across routes

From #56

/gene/<id>/function

has homology params.

External URLs should be configurable

Useful to point to dev vs prod services for Monarch.

Add service for ID equivalency queries

It would be great to have the ability to take any identifier in the system and return all equivalent IDs (and labels). Some users may wish to dump this data as well as having a service.

@kshefchek

Test /mart/ routes

See mart.py for how this works.

for owlsim we'll need another route for /disease/phenotype/taxon/

Determine mechanism to access biolink-api from Galaxy

Should this be a separate tool in the toolshed? Data access on the left sidebar?
I think this is an important dissemination/use case for our API and would be very useful for Galaxy users, who suffer a lack of phenotype data in particular. @tnabtaf can assist in helping us make decisions.

Reorganize sub-packages

Better separate API-independent business logic from API

Eventually general purpose modules should live in their own repo and be distributed using normal python mechanisms.

For now, keep these at the top level, e.g.

scigraph
obographs
golr

Slimmer function should exclude 'NOT' annotations by default

It may already be doing this, but this should be verified. One example is the ZFIN:ZDB-GENE-990415-200 annotation to GO:0043524 (negative regulation of neuron apoptotic process).
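A default exclusion could be a single filter applied before slimming. A minimal sketch, assuming each association exposes a boolean `negated` field (an assumption; the golr field carrying NOT may be named differently) and treating a missing flag as a positive annotation; `drop_negated` is a hypothetical name.

```python
from typing import Iterable, List


def drop_negated(associations: Iterable[dict]) -> List[dict]:
    """Keep only positive associations; a missing 'negated' flag counts as False."""
    return [a for a in associations if not a.get("negated", False)]
```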

Add example URLs

This should leverage the tests (#20)

use_compact_associations parameter results in 500

In https://api.monarchinitiative.org/api/#!/bioentity/get_disease_gene_associations if the parameter use_compact_associations is set (either true or false), the server returns a 500 error. Unset, the service works fine.

Add parameter to dynamically map ID spaces

In the golr schema we always choose a clique leader from the equivalence set for setting fields like subject and object. In Monarch this is done by @jnguyenx's clique leader code. In GO the choice of whose ID to show is hardwired, partly on political grounds.

Clients often want associations mapped to their ID space of choice. While it is always possible to do this as a post-processing step using an ID map service, better to give the option to do this on the fly.

In Monarch we can do this because we have closures on subjects as well as objects. (this also gives us superclass closures which is unwanted, but in general these are ids in a different id space; longer term we should have one closure for equivalence, and one for reflexive closure over subclass).

@kltm - as part of general alignment, can we have subject_closures in amigo-golr too? No action is required on your part now; I can make an experimental PR. This is just for context.

GET /graph/edges/from/{id} = 500

request
https://api.monarchinitiative.org/api/graph/edges/from/HP%3A0000465

response
{
"code": 500,
"message": "There was an error processing your request. It has been logged (ID e97fa0aef407dbfa)."
}