monarch-initiative / biolink-api
API for linked biological knowledge
Home Page: https://api.monarchinitiative.org/api/
License: BSD 3-Clause "New" or "Revised" License
This is the error: <title>Error 400 undefined field evidence_object_closure</title>
Problem accessing /solr/select/. Reason:
undefined field evidence_object_closure
This is the python test script
from biogolr.golr_associations import search_associations, search_associations_compact, GolrFields, select_distinct_subjects, get_objects_for_subject, get_subjects_for_object

M = GolrFields()
HUMAN_SHH = 'NCBIGene:6469'
HOLOPROSENCEPHALY = 'HP:0001360'
TWIST_ZFIN = 'ZFIN:ZDB-GENE-050417-357'
DVPF = 'GO:0009953'

def test_go_assocs():
    results = search_associations(subject=TWIST_ZFIN,
                                  exclude_automatic_assertions=True,
                                  slim=['GO:0001525', 'GO:0048731', 'GO:0005634'],
                                  object_category='function')
    assocs = results['associations']
    assert len(assocs) > 0
    n_found = 0
    for r in assocs:
        print("Direct: {} Slimmed: {}".format(r['object'], r['slim']))
        if 'GO:0002040' == r['object']['id']:
            if 'GO:0048731' in r['slim']:
                n_found = n_found + 1
    assert n_found == 1
smartapi provides an editor to add new annotations. Ideally we would be able to do this by adding annotations at the Flask level, rather than exporting the JSON, editing it, and having it get out of sync.
As a first-pass test we could do a one-off export and extend it in the editor, just as a proof of concept.
https://websmartapi.github.io/smartapi/
smart-api.info/editor/#/
Mac users may encounter this error when running python biolink/app.py:
from matplotlib.backends import _macosx
RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are Working with Matplotlib in a virtual enviroment see 'Working with Matplotlib in Virtual environments' in the Matplotlib FAQ
A workaround is to select a different Matplotlib backend:
echo "backend: TkAgg" >> ~/.matplotlib/matplotlibrc
http://stackoverflow.com/questions/4130355/python-matplotlib-framework-under-macosx
The GO golr uses the prefix MGI:MGI:, whereas the Monarch golr uses MGI:
We standardize on the latter. Currently we rewrite behind the scenes to the doubled version, but this only happens for some fields.
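A minimal sketch of the rewrite in both directions, assuming it is applied uniformly to every ID-bearing field. The function names are hypothetical and are not taken from the biolink-api code base.

```python
DOUBLED_PREFIX = "MGI:MGI:"
SINGLE_PREFIX = "MGI:"


def to_monarch_id(curie):
    """Collapse a GO-style 'MGI:MGI:nnn' ID to the Monarch-style 'MGI:nnn'."""
    if curie.startswith(DOUBLED_PREFIX):
        return SINGLE_PREFIX + curie[len(DOUBLED_PREFIX):]
    return curie


def to_go_golr_id(curie):
    """Expand a Monarch-style 'MGI:nnn' ID to the GO-style 'MGI:MGI:nnn'."""
    if curie.startswith(SINGLE_PREFIX) and not curie.startswith(DOUBLED_PREFIX):
        return DOUBLED_PREFIX + curie[len(SINGLE_PREFIX):]
    return curie
```

Both functions leave IDs from other resources untouched and are idempotent, so they can be applied to every field without tracking which ones were already rewritten.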
Needs to be forced to 0.1.26
Note that due to #49 this may need to be done in ontobio
This may turn into 2-3 separate issues, but let's start. The objective is to determine whether any of the orthologs of a given gene have been functionally described using GO.
Returns these results:
"FlyBase:FBgn0005558",
"FlyBase:FBgn0019650",
"NCBIGene:100022926",
"NCBIGene:100514152",
"NCBIGene:181251",
"NCBIGene:25509",
"NCBIGene:286857",
"NCBIGene:5080",
"NCBIGene:695746",
"NCBIGene:737387",
"ZFIN:ZDB-GENE-041210-244",
"ZFIN:ZDB-GENE-081022-10",
"ZFIN:ZDB-GENE-990415-200"
Respectively: fly, fly, possum, pig, WORM, RAT, bovine, HUMAN, macaque, chimp, zfish, zfish, zfish
The first problem is that my objective was to use these results to issue a subsequent query to determine whether these homologs have experimental GO annotations associated with them. However, AmiGO/GOlr doesn't recognize NCBIGene identifiers, so I'll never see results that are actually available. This is especially harmful for human genes. We need the UniProt identifier to find results in AmiGO.
There is also a worm ortholog that is not showing the WB id (NCBIGene:181251 is worm). Same for rat (NCBIGene:25509), need the RGD identifier.
May require a change from flaskrestplus to flask-api
cc @balhoff
The causalmodels (cam) routes in the API provide operations over LEGO models (encompassing causal phenotype models). These are implemented via SPARQL queries over a pre-reasoned triplestore.
Implement some methods for handling
In the top level we have (in order of dependency)
The last 3 can be combined into a general purpose python library for working with ontologies and ontology associations.
The API allows the provision of any relationship type for any route that returns association objects (the precomputed relation closure in solr is used for this).
However, for specific routes like /gene//homologs, we want to provide a convenience enum, e.g. corresponding to O/P/LDO. These would trivially map to the RO ID.
We will follow this design pattern in other places where a fixed set of relations is used to connect two categories, e.g. drug-disease, disease-model.
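One possible shape for such a convenience enum is sketched below. The enum name, the lookup function, and the relation IDs are all assumptions: the RO values are deliberately left as placeholders because the text above does not state them.

```python
from enum import Enum


class HomologType(Enum):
    """Short codes a client could pass instead of a full relation ID."""
    O = "ortholog"
    P = "paralog"
    LDO = "least diverged ortholog"


# Placeholder values only: substitute the actual RO identifiers here.
RELATION_FOR_HOMOLOG_TYPE = {
    HomologType.O: "RO:<orthology-relation-id>",
    HomologType.P: "RO:<paralogy-relation-id>",
    HomologType.LDO: "RO:<least-diverged-orthology-relation-id>",
}


def relation_for(code):
    """Translate a short code such as 'LDO' to its relation ID.

    Raises KeyError for codes that are not part of the enum.
    """
    return RELATION_FOR_HOMOLOG_TYPE[HomologType[code]]
```

The same pattern (a small enum plus a lookup table) would carry over to other category pairs with a fixed relation set.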
The examples are
https://api.monarchinitiative.org/api/search/entity/parkinson
https://api.monarchinitiative.org/api/search/entity/shh
https://api.monarchinitiative.org/api/search/entity/femur
For each the response is empty.
The swagger page also appears to be acting up for this query, which might be relevant.
Primarily for browsing swagger docs
For example: is it FlyBase:FBgn0005558 or FB:FBgn0005558?
There is a discrepancy in Biolink: the method that retrieves the orthologs for a gene uses 'FlyBase' as the prefix for the matching ortholog identifiers, but if you subsequently try to use this ID to retrieve GO annotations, nothing comes back, because GOlr (and PANTHER and the GPCR) use FB as the resource prefix.
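Until the prefixes are aligned, a client could rewrite the prefix before querying GOlr. The sketch below is an illustration only: the mapping table and function name are hypothetical, and FlyBase to FB is the only pair stated in the text.

```python
# Assumed mapping from Monarch-style prefixes to GOlr-style prefixes.
# Only the FlyBase -> FB pair comes from the issue text.
MONARCH_TO_GOLR_PREFIX = {"FlyBase": "FB"}


def rewrite_prefix(curie, prefix_map):
    """Swap the prefix of a CURIE if it appears in prefix_map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # Not a CURIE; return it unchanged.
        return curie
    return prefix_map.get(prefix, prefix) + ":" + local_id


golr_id = rewrite_prefix("FlyBase:FBgn0005558", MONARCH_TO_GOLR_PREFIX)
```

IDs whose prefix is not in the table, such as ZFIN identifiers, pass through unchanged.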
At the moment there is only an option to choose how many rows to return (at least for GET /search/entity/{term}). There should also be an option to choose at what index to start from. It would be nice to be able to load a thousand items in batches of tens or hundreds, or something like that, without having to load all thousand at once.
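From the client side, batched loading with a start index might look like the sketch below. The fetch_page callable and its (start, rows) signature are assumptions standing in for a request to the search route once a start parameter exists.

```python
def fetch_in_batches(fetch_page, total, batch_size):
    """Yield result pages until `total` items have been requested.

    fetch_page(start, rows) is a hypothetical callable that returns a list
    of at most `rows` items beginning at index `start`.
    """
    start = 0
    while start < total:
        rows = min(batch_size, total - start)
        page = fetch_page(start, rows)
        if not page:
            # The service has fewer results than requested.
            break
        yield page
        start += len(page)


# Stand-in for the remote service: 25 fake results held in memory.
data = list(range(25))


def fake_fetch(start, rows):
    return data[start:start + rows]


batches = list(fetch_in_batches(fake_fetch, total=len(data), batch_size=10))
```

Each batch is requested only when the caller asks for it, so a thousand items can be consumed in tens or hundreds without loading them all at once.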
The external URLs are hard-coded. It'll be useful to be able to have that configurable so biolink can point to other endpoints.
For Monarch we'll use that to point to the dev vs prod services.
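One lightweight way to do this is to read each URL from an environment variable and fall back to a default. Everything named below (the BIOLINK_..._URL variable scheme, the service keys, the example.org URLs) is a placeholder for illustration, not the project's actual configuration.

```python
import os

# Placeholder defaults; the real service URLs are not given in the text.
DEFAULT_ENDPOINTS = {
    "solr": "https://prod.example.org/solr",
    "scigraph": "https://prod.example.org/scigraph",
}


def get_endpoint(name, environ=None):
    """Return the URL for a service, preferring an environment override.

    The override variable is BIOLINK_<NAME>_URL (a hypothetical naming scheme).
    """
    env = os.environ if environ is None else environ
    return env.get("BIOLINK_{}_URL".format(name.upper()), DEFAULT_ENDPOINTS[name])
```

A dev deployment would then set, for example, BIOLINK_SOLR_URL before starting the server, while prod runs with the defaults.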
Required for https://github.com/monarch-initiative/monarch-app/issues/1431
We have a (stub) route:
/within/<build>/<reference>/<begin>/<end>
We may also want one for fetching features N bp up/down of a gene boundary (include the gene features in that region).
These would wrap to-be-implemented calls in scigraph/scigraph_util.py (or this could be organized in some other way). @kshefchek has code from CKBD that can be adapted.
See for example http://flask.pocoo.org/docs/0.11/testing/
Would be convenient for the 'ribbon' display if there were a method to return T/F for each (GO) class in a slim given i) a gene ID (or entity) and ii) the slim to use.
It should default to setting "exclude_automatic_assertions" to True.
This can be accomplished via get gene/function (once exclude_automatic_assertions is working) and parsing the results, but that transfers a lot of unnecessary data that is then ignored.
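The result-parsing approach could be sketched as follows. It assumes each association carries a 'slim' list of matched slim classes, as the test script elsewhere on this page suggests; the function name is hypothetical.

```python
def slim_presence(associations, slim):
    """Return {slim_class: True/False} for one entity's associations.

    Each association is assumed to be a dict with an optional 'slim' key
    listing the slim classes it maps to.
    """
    hit = set()
    for assoc in associations:
        hit.update(assoc.get("slim", []))
    return {term: term in hit for term in slim}


# Toy input shaped like the association objects in the test script.
example_assocs = [
    {"object": {"id": "GO:0002040"}, "slim": ["GO:0048731", "GO:0001525"]},
    {"object": {"id": "GO:0009953"}, "slim": ["GO:0048731"]},
]
ribbon = slim_presence(example_assocs, ["GO:0001525", "GO:0048731", "GO:0005634"])
```

A dedicated route would return only this small dictionary rather than the full association payload.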
GO and Monarch have similar patterns in their golrs, but different fields. Because the GO schema is less generic we have (somewhat) specific field names such as bioentity (typically a gene or gene product) or annotation_class (typically GO terms).
The Monarch schema is more generic, and we don't assume the LHS is an 'entity', or that the RHS is a class (consider gene-gene interactions), so we use rdf terminology like subject/object for associations.
We should explore patterns for abstracting over these
Required for #11
Behave seems the best candidate.
This is intended to be very quick and proof of concept.
Create a new route in api/cart/endpoints/cart.py
cart/<id>/
See variantset for example code for wrapping a SQLite database. I'd rather avoid SQL dependencies, even SQLite. It could just be in-memory and destroyed at the end of the server's lifecycle for a first pass.
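A first-pass in-memory store might be as small as the sketch below. The class and method names are hypothetical, and nothing is persisted: all carts disappear when the server process exits.

```python
class InMemoryCart:
    """Keeps carts in a plain dict keyed by cart ID."""

    def __init__(self):
        self._carts = {}

    def add(self, cart_id, item_id):
        """Add an item to a cart, creating the cart if needed."""
        items = self._carts.setdefault(cart_id, [])
        if item_id not in items:
            items.append(item_id)

    def remove(self, cart_id, item_id):
        """Remove an item if present; unknown carts or items are ignored."""
        items = self._carts.get(cart_id, [])
        if item_id in items:
            items.remove(item_id)

    def items(self, cart_id):
        """Return a copy of the cart contents (empty list if no such cart)."""
        return list(self._carts.get(cart_id, []))


carts = InMemoryCart()
carts.add("demo", "NCBIGene:6469")
carts.add("demo", "HP:0001360")
carts.add("demo", "NCBIGene:6469")  # duplicate, ignored
```

The cart/<id>/ route would then delegate to a single module-level instance of this class.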
Next we can explore actions on those items:
We currently have a path /disease/{id}/substance. Currently this returns a substance associated with a disease, but the association model doesn't follow the associations that come from golr.
Like most biolink routes, these are simple facades over more powerful services. In this case, this wraps a SPARQL query provided by @stuppie in NCATS-Tangerine/ncats-ingest#19 (comment)
See:
In fact this is a facade over a facade; we have a small general purpose python library that wraps WD SPARQL calls: https://github.com/biolink/biolink-api/blob/6a339ef0fa71fafdb1cf0a1a14037124a7aedcbf/biowikidata/wd_sparql.py#L85-L87
The Scripps team have a superior API, but this one fits our purposes just now.
(dipper team, @kshefchek @DoctorBud @mbrush @TomConlin take note. We can reuse something like this rather than duplicating ingest work.. in fact we can even simply federate our own graph queries, and start the wd ingest at the solr stage.. this ties in with the translator kbio plan).
we need to
this is just a stub so far.
this can be implemented by the paths:
(phenotype)-[has part]->(_)-[inheres_in|inheres_in_part_of]->(anatomy)
(phenotype)-[inheres_in|inheres_in_part_of]->(anatomy)
we may also want an option for traversing up the subclass hierarchy until a match is found.
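The two paths can be illustrated over a toy in-memory edge list. This is a sketch only: the triples and IDs are made up, the real implementation would query the graph store, and the optional subclass traversal is not covered.

```python
from collections import defaultdict

INHERES_RELATIONS = {"inheres_in", "inheres_in_part_of"}
HAS_PART = "has part"


def anatomy_for_phenotype(edges, phenotype):
    """Follow both path patterns from a phenotype to anatomy nodes.

    edges is an iterable of (subject, relation, object) triples.
    """
    outgoing = defaultdict(list)
    for subj, rel, obj in edges:
        outgoing[subj].append((rel, obj))

    def inheres_targets(node):
        return [obj for rel, obj in outgoing[node] if rel in INHERES_RELATIONS]

    # Path 2: (phenotype)-[inheres_in|inheres_in_part_of]->(anatomy)
    found = set(inheres_targets(phenotype))
    # Path 1: (phenotype)-[has part]->(_)-[inheres_in|inheres_in_part_of]->(anatomy)
    for rel, part in outgoing[phenotype]:
        if rel == HAS_PART:
            found.update(inheres_targets(part))
    return sorted(found)


# Toy graph with made-up node IDs.
toy_edges = [
    ("P:1", "has part", "Q:1"),
    ("Q:1", "inheres_in", "A:1"),
    ("P:2", "inheres_in_part_of", "A:2"),
]
```

Here P:1 reaches A:1 through the intermediate node, while P:2 reaches A:2 directly.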
How to do this? Graph calls? TBD: via BOLT or SciGraph API?
Enter http://api.monarchinitiative.org/api/swagger.json (or the equivalent on localhost:5000 if testing) into http://smart-api.info/editor/#/ and fix errors.
Once the errors are fixed, hit 'Save' - this registers the API.
It may get tedious to do this with every release; maybe there is a way to automate it via GitHub releases. We can look into this later.
Note: the json is generated by the flaskrestplus framework, so the errors must be fixed at source. However, for debugging purposes we can edit the json
https://api.monarchinitiative.org/api/evidence/graph/cfef92b7-bfa3-44c2-a537-579078d2de37
(id provided in the API doc)
response
<title>500 Internal Server Error</title>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Thought this was the gene_product issue, but appears not. Here's the query:
http://127.0.0.1:8888/api/bioentityset/slimmer/function?slim=GO:0003824&slim=GO:0004872&slim=GO:0005102&slim=GO:0005215&slim=GO:0005198&slim=GO:0008092&slim=GO:0003677&slim=GO:0003723&slim=GO:0001071&slim=GO:0036094&slim=GO:0046872&slim=GO:0030246&slim=GO:0008283&slim=GO:0071840&slim=GO:0051179&slim=GO:0032502&slim=GO:0000003&slim=GO:0002376&slim=GO:0050877&slim=GO:0050896&slim=GO:0023052&slim=GO:0010467&slim=GO:0019538&slim=GO:0006259&slim=GO:0044281&slim=GO:0050789&slim=GO:0005576&slim=GO:0005829&slim=GO:0005856&slim=GO:0005739&slim=GO:0005634&slim=GO:0005694&slim=GO:0016020&slim=GO:0071944&slim=GO:0030054&slim=GO:0042995&slim=GO:0032991&subject=ZFIN:ZDB-GENE-990415-200
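For reproducing this while debugging, a query of this shape can be assembled with the standard library instead of being written by hand. The helper name is hypothetical; the base URL and parameter names are taken from the query above, shortened to two slim terms.

```python
from urllib.parse import urlencode


def slimmer_url(base_url, subject, slim_terms):
    """Build a slimmer query with one repeated 'slim' parameter per term."""
    query = urlencode(
        {"slim": list(slim_terms), "subject": subject},
        doseq=True,   # expand the list into repeated slim=... pairs
        safe=":",     # keep CURIE colons readable instead of %3A
    )
    return base_url + "?" + query


url = slimmer_url(
    "http://127.0.0.1:8888/api/bioentityset/slimmer/function",
    "ZFIN:ZDB-GENE-990415-200",
    ["GO:0003824", "GO:0004872"],
)
```

Trimming the slim list this way makes it easier to find which term, if any, triggers the failure.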
Need to adequately test. What species has most of these?
Are there any odd bnode issues?
It should fix the forward slash issue.
Consider https://github.com/graphql-python as framework
See https://docs.galaxyproject.org/en/master/dev/schema.html for instructions on getting tools into the Galaxy project toolshed.
Here is an example for MouseMine: http://www.mousemine.org/mousemine/begin.do?GALAXY_URL=https%3A//usegalaxy.org/tool_runner%3Ftool_id%3Dmousemine
Alternatively, perhaps more appropriate to be on the sidebar here: https://usegalaxy.org/
Ideas welcome.
Association counts are trivially returned for any association query, even if pagination is used.
For counts of distinct subjects and objects, we can't assume that |Subject|=1 (consider intermediate disease as subject) or |Object|=|Association|
Options are:
1 is not good as we would have to precompute for the cross-product of all query options
2 may require further investigation
3 will work fine for any subject query for which the subject is loosely-speaking entity level, with a predictably bounded number of associations. E.g. genes, specific diseases.
with example ID MGI:1342287. There is also some leakage of params specific to the homology call. I am using api.parser.copy() so I suspect there might be a weird gotcha in python inheritance? cc @mbrush
https://api.monarchinitiative.org/api/#!/bioentity/get_gene_interactions
shows the return type as
[
{}
]
Looks like an issue here:
The return should be declared with api.marshal_with
Would rather not need to hard code the list of IDs (will do for the nonce), but have this embedded in the ontology itself.
Useful to point to dev vs prod services for Monarch.
It would be great to have the ability to take any identifier in the system and return all equivalent IDs (and labels). Some users may wish to dump this data as well as having a service.
See mart.py for how this works.
for owlsim we'll need another route for /disease/phenotype/taxon/
Should this be a separate tool in the toolshed? Data access on the left sidebar?
I think this is an important dissemination/use case for our API and would be very useful for Galaxy users, who suffer a lack of phenotype data in particular. @tnabtaf can assist in helping us make decisions.
Better separate API-independent business logic from API
Eventually general purpose modules should live in their own repo and be distributed using normal python mechanisms.
For now, keep these at the top level, e.g.
It may already be doing this, but should be checked to verify. One example is ZFIN:ZDB-GENE-990415-200 annotation to GO:0043524 (negative regulation of neuron apoptotic process)
This should leverage the tests (#20)
In https://api.monarchinitiative.org/api/#!/bioentity/get_disease_gene_associations if the parameter use_compact_associations is set (either true or false), the server returns a 500 error. Unset, the service works fine.
In the golr schema we always choose a clique leader from the equivalence set for setting fields like subject, object. In Monarch this is done by @jnguyenx's clique leader code. In GO this is hardwired, on partly political grounds, as to whose ID to show.
Clients often want associations mapped to their ID space of choice. While it is always possible to do this as a post-processing step using an ID map service, better to give the option to do this on the fly.
In Monarch we can do this because we have closures on subjects as well as objects. (this also gives us superclass closures which is unwanted, but in general these are ids in a different id space; longer term we should have one closure for equivalence, and one for reflexive closure over subclass).
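As an illustration of on-the-fly mapping, the sketch below picks the first closure member in the client's preferred ID space and falls back to the clique leader. The function name and the EX: identifiers are made up, and it relies on the observation above that superclass entries in the closure generally sit in a different ID space.

```python
def map_to_prefix(primary_id, closure, preferred_prefix):
    """Return an ID from `closure` with the preferred prefix, if any.

    Falls back to primary_id (the clique leader) when the closure holds
    no ID in the requested ID space.
    """
    wanted = preferred_prefix + ":"
    for candidate in closure:
        if candidate.startswith(wanted):
            return candidate
    return primary_id


# Toy closure: the leader itself plus a made-up equivalent in the EX space.
example_closure = ["NCBIGene:6469", "EX:0001"]
mapped = map_to_prefix("NCBIGene:6469", example_closure, "EX")
```

With separate closures for equivalence and for subclass, the prefix check would no longer have to double as a filter against superclasses.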
@kltm - as part of general alignment, can we have subject_closures in amigo-golr too? No action is required on your part now; I can make an experimental PR. This is just for context.
request
https://api.monarchinitiative.org/api/graph/edges/from/HP%3A0000465
response
{
"code": 500,
"message": "There was an error processing your request. It has been logged (ID e97fa0aef407dbfa)."
}