monarch-initiative / biolink-api

API for linked biological knowledge

Home Page: https://api.monarchinitiative.org/api/

License: BSD 3-Clause "New" or "Revised" License

Python 82.53% Shell 0.40% Perl 1.44% Makefile 0.24% Gherkin 14.89% CSS 0.05% HTML 0.22% Dockerfile 0.22% Procfile 0.02%
bioinformatics ontologies phenotypes model-organisms python swagger gene api

biolink-api's People

Contributors

balhoff, cmungall, deepakunni3, falquaddoomi, gaurav, iimpulse, jmcmurry, jnguyenx, kltm, kshefchek, lpalbou, nathandunn, putmantime, selewis

biolink-api's Issues

exclude_automatic_assertions fails

This is the error:

    HTTP ERROR 400

    Problem accessing /solr/select/. Reason:

        undefined field evidence_object_closure

    Powered by Jetty://

This is the Python test script:
from biogolr.golr_associations import search_associations, search_associations_compact, GolrFields, select_distinct_subjects, get_objects_for_subject, get_subjects_for_object

M=GolrFields()

HUMAN_SHH = 'NCBIGene:6469'
HOLOPROSENCEPHALY = 'HP:0001360'
TWIST_ZFIN = 'ZFIN:ZDB-GENE-050417-357'
DVPF = 'GO:0009953'

def test_go_assocs():
    # Query GO function associations for the zebrafish gene, mapped to a slim,
    # excluding automatic (non-curated) assertions.
    results = search_associations(subject=TWIST_ZFIN,
                                  exclude_automatic_assertions=True,
                                  slim=['GO:0001525','GO:0048731','GO:0005634'],
                                  object_category='function')

    assocs = results['associations']
    assert len(assocs) > 0
    n_found = 0
    for r in assocs:
        print("Direct: {} Slimmed: {}".format(r['object'],r['slim']))
        # Expect exactly one direct annotation to GO:0002040 that slims to GO:0048731.
        if 'GO:0002040' == r['object']['id']:
            if 'GO:0048731' in r['slim']:
                n_found = n_found+1
    assert n_found == 1

Explore ways to add smartapi annotations

smartapi provides an editor to add new annotations. Ideally we would be able to do this by adding annotations at the flask level, rather than exporting the json, editing it, and having it get out of sync.

As a first-pass test we could do a one-off export and extend it in the editor, just as a POC.

https://websmartapi.github.io/smartapi/
smart-api.info/editor/#/

MacOSX Install issues with matplotlib. Easy workaround

Mac users may encounter this error when running python biolink/app.py:

    from matplotlib.backends import _macosx
RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are Working with Matplotlib in a virtual enviroment see 'Working with Matplotlib in Virtual environments' in the Matplotlib FAQ

Quick fix

echo "backend: TkAgg" >> ~/.matplotlib/matplotlibrc

Details

http://stackoverflow.com/questions/4130355/python-matplotlib-framework-under-macosx

Ensure ID rewriting is applied universally

The GO golr uses the doubled prefix MGI:MGI:, while the Monarch golr uses MGI:

We standardize on the latter. Currently we rewrite behind the scenes to the doubled version, but this only happens for some fields.
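
One way to make the rewrite universal is to route every ID-bearing field through a single pair of helpers. The following is only a minimal sketch: the helper names, the list of doubled prefixes, and the assumption that a golr document is a plain dict are all hypothetical and not taken from the biolink-api code.

```python
# Hypothetical helpers; the prefix list and field names are assumptions.
DOUBLED_PREFIXES = ("MGI",)


def to_go_golr_id(curie):
    """Expand 'MGI:97490' to the doubled form 'MGI:MGI:97490' used by the GO golr."""
    for prefix in DOUBLED_PREFIXES:
        single = prefix + ":"
        doubled = single + single
        if curie.startswith(single) and not curie.startswith(doubled):
            return single + curie
    return curie


def from_go_golr_id(curie):
    """Collapse 'MGI:MGI:97490' back to the standard form 'MGI:97490'."""
    for prefix in DOUBLED_PREFIXES:
        doubled = prefix + ":" + prefix + ":"
        if curie.startswith(doubled):
            return curie[len(prefix) + 1:]
    return curie


def rewrite_fields(doc, fields, rewrite):
    """Return a copy of doc with rewrite applied to each listed field.

    Handles both single-valued (str) and multi-valued (list of str) fields,
    so closure fields get the same treatment as subject/object.
    """
    out = dict(doc)
    for field in fields:
        value = out.get(field)
        if isinstance(value, str):
            out[field] = rewrite(value)
        elif isinstance(value, list):
            out[field] = [rewrite(v) if isinstance(v, str) else v for v in value]
    return out
```

Applying `rewrite_fields` to every outgoing query and every incoming document, with one agreed list of fields, would avoid the current situation where only some fields are rewritten.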

ortholog identifiers

This may turn into 2-3 separate issues, but let's start. The objective is to determine whether any of the orthologs of a given gene have been functionally described using GO.

This query:
https://api.monarchinitiative.org/api/bioentity/gene/MGI%3A97490/homologs/?homology_type=O&rows=20&fetch_objects=true

Returns these results:
"FlyBase:FBgn0005558",
"FlyBase:FBgn0019650",
"NCBIGene:100022926",
"NCBIGene:100514152",
"NCBIGene:181251",
"NCBIGene:25509",
"NCBIGene:286857",
"NCBIGene:5080",
"NCBIGene:695746",
"NCBIGene:737387",
"ZFIN:ZDB-GENE-041210-244",
"ZFIN:ZDB-GENE-081022-10",
"ZFIN:ZDB-GENE-990415-200"

Respectively: fly, fly, possum, pig, WORM, RAT, bovine, HUMAN, macaque, chimp, zfish, zfish, zfish

  1. The first problem is that my objective was to use these results to issue a subsequent query to determine whether these homologs have experimental GO annotations associated with them. However, AmiGO/GOlr doesn't recognize NCBIGene identifiers, so I'll never see results that are actually available. This is especially harmful for human genes. The UniProt identifier is needed to find results in AmiGO.

  2. There is also a worm ortholog that is not showing the WB id (NCBIGene:181251 is worm). Same for rat (NCBIGene:25509), need the RGD identifier.

Implement causalmodels API

The causalmodels (cam) routes in the API provide operations over LEGO models (encompassing causal phenotype models). These are implemented via SPARQL queries over a pre-reasoned triplestore.

Implement some methods for handling

  • model querying
  • instance querying
  • 'annoton' querying
  • cross-model analytic queries

Split core obographs/bioontology package into its own top level package

In the top level we have (in order of dependency)

  • biolink - the flaskrest code
  • obographs - general purpose ontology library
  • biogolr - wrapper for golr
  • prefixcommons - curie util

The last 3 can be combined into a general purpose python library for working with ontologies and ontology associations.

Fix gene/<id>/homologs to use homology type

The API allows the provision of any relationship type for any route that returns association objects (the precomputed relation closure in solr is used for this).

However, for specific routes like /gene/<id>/homologs, we want to provide a convenience enum, e.g. corresponding to O/P/LDO. These would trivially map to the RO ID.

We will follow this design pattern in other places where a fixed set of relations is used to connect two categories, e.g. drug-disease, disease-model.
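
A rough sketch of what such a convenience enum could look like. The class and function names are hypothetical, and the RO identifiers are my best understanding of the Relations Ontology homology terms; they should be verified against RO before use.

```python
from enum import Enum


class HomologyType(Enum):
    """Short codes accepted by the route, mapped to relation IDs.

    The RO IDs below are assumptions to be checked against the ontology.
    """
    O = "RO:HOM0000017"    # in orthology relationship with
    P = "RO:HOM0000011"    # in paralogy relationship with
    LDO = "RO:HOM0000020"  # in 1 to 1 orthology relationship with


def homology_relation(code):
    """Translate a homology_type code into a relation ID.

    None means "no filter", so the caller can fall back to all homologs.
    """
    if code is None:
        return None
    try:
        return HomologyType[code].value
    except KeyError:
        allowed = ", ".join(t.name for t in HomologyType)
        raise ValueError(
            "unknown homology_type {!r}; expected one of: {}".format(code, allowed)
        ) from None
```

The same shape (a small enum plus a lookup that rejects unknown codes) could be reused for the other fixed relation sets mentioned above.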

FB vs. FlyBase

For example: is it FlyBase:FBgn0005558 or FB:FBgn0005558?
There is a discrepancy in Biolink: the method that retrieves the orthologs for a gene uses 'FlyBase' as the prefix for the matching ortholog identifiers, but if you subsequently use this ID to retrieve GO annotations, nothing comes back, because GOlr (and PANTHER and the GPCR) use FB as the resource prefix.
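
Until the prefixes are aligned, a client-side workaround could look like the sketch below. The alias table and function name are hypothetical; only the FlyBase/FB pair comes from this issue.

```python
# Assumed alias table: prefix returned by the homolog route -> prefix expected by GOlr.
GOLR_PREFIX_ALIASES = {"FlyBase": "FB"}


def to_golr_prefix(curie, aliases=GOLR_PREFIX_ALIASES):
    """Swap the CURIE prefix for its GOlr alias, if one is registered."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # Not a CURIE; leave it alone.
        return curie
    return aliases.get(prefix, prefix) + ":" + local_id
```

Only the text before the first colon is treated as the prefix, so identifiers whose local part contains further punctuation pass through unchanged.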

Add Data Paging

At the moment there is only an option to choose how many rows to return (at least for GET /search/entity/{term}). There should also be an option to choose at what index to start from. It would be nice to be able to load a thousand items in batches of tens or hundreds, or something like that, without having to load all thousand at once.
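
To illustrate the kind of client loop this would enable, here is a sketch that assumes the endpoint gains a `start` offset alongside the existing `rows` parameter. `iter_pages` and `fake_search` are hypothetical names; `fake_search` merely stands in for the real search call.

```python
def iter_pages(fetch, total, rows=100):
    """Yield successive batches by advancing a start offset.

    fetch is any callable accepting start= and rows= keyword arguments.
    """
    start = 0
    while start < total:
        batch = fetch(start=start, rows=rows)
        if not batch:
            # Guard against a backend that returns fewer results than advertised.
            break
        yield batch
        start += len(batch)


# Stand-in for the real search call, backed by an in-memory list.
ITEMS = list(range(1000))


def fake_search(start, rows):
    return ITEMS[start:start + rows]


pages = list(iter_pages(fake_search, total=len(ITEMS), rows=100))
```

With this shape, a thousand items arrive as ten batches of one hundred rather than in one large response.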

Allow configuration to point to other services

The external URLs are hard-coded. It would be useful to make them configurable so that biolink can point to other endpoints.

For Monarch we'll use that to point to the dev vs prod services.
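
One simple option is to read the endpoints from environment variables with fallbacks. This is only a sketch: the variable names are hypothetical and the default URLs are placeholders, not the real service addresses.

```python
import os

# Placeholder defaults; real endpoints would be filled in per deployment.
DEFAULT_SERVICES = {
    "BIOLINK_SOLR_URL": "https://solr.example.org/solr/golr",
    "BIOLINK_SCIGRAPH_URL": "https://scigraph.example.org/scigraph",
}


def load_service_urls(environ=None):
    """Return service URLs, letting environment variables override the defaults."""
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name, default)
        for name, default in DEFAULT_SERVICES.items()
    }
```

A dev deployment would then export, for example, `BIOLINK_SOLR_URL` before starting the server, while prod keeps the defaults. A config file would work equally well; the point is that the lookup happens in one place.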

short-hand function to support 'ribbon' display

Would be convenient for the 'ribbon' display if there were a method to return T/F for each (GO) class in a slim given i) a gene ID (or entity) and ii) the slim to use.

It should default to setting "exclude_automatic_assertions" to True.

This can be accomplished via GET gene/function (once exclude_automatic_assertions is working) and parsing the results, but that transfers a lot of data that is then ignored.
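
For reference, the client-side version of this is short. The sketch assumes each association carries a `slim` list, as in the result structure used by the search_associations test script; `slim_ribbon` is a hypothetical name and the sample data is illustrative, not real annotation data.

```python
def slim_ribbon(associations, slim):
    """Return {slim class ID: True/False} for the given associations.

    Assumes the associations were already fetched with
    exclude_automatic_assertions=True and mapped to the same slim.
    """
    seen = set()
    for assoc in associations:
        seen.update(assoc.get("slim", []))
    return {term: term in seen for term in slim}


# Illustrative input only.
sample = [
    {"object": {"id": "GO:0002040"}, "slim": ["GO:0048731"]},
    {"object": {"id": "GO:0002040"}, "slim": []},
]
ribbon = slim_ribbon(sample, ["GO:0001525", "GO:0048731", "GO:0005634"])
```

A dedicated server-side method would return just this dict and skip transferring the full association objects.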

Abstract over querying different golr instances

GO and Monarch have similar patterns in their golrs, but different fields. Because the GO schema is less generic we have (somewhat) specific field names such as bioentity (typically a gene or gene product) or annotation_class (typically GO terms)

The Monarch schema is more generic, and we don't assume the LHS is an 'entity', or that the RHS is a class (consider gene-gene interactions), so we use rdf terminology like subject/object for associations.

We should explore patterns for abstracting over these
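
As a starting point for discussion, one pattern is a per-instance field map, with callers always using the generic subject/object vocabulary. The names `FIELD_MAPS` and `translate_query` are hypothetical; only the field names bioentity and annotation_class come from the GO schema described above.

```python
# Generic (Monarch-style) field name -> field name in each golr instance.
FIELD_MAPS = {
    "monarch": {"subject": "subject", "object": "object"},
    "go": {"subject": "bioentity", "object": "annotation_class"},
}


def translate_query(generic_query, instance):
    """Rename generic query keys to the field names of the target golr.

    Keys with no entry in the map (e.g. rows) are passed through unchanged.
    """
    field_map = FIELD_MAPS[instance]
    return {field_map.get(key, key): value for key, value in generic_query.items()}


go_query = translate_query({"subject": "ZFIN:ZDB-GENE-050417-357", "rows": 10}, "go")
```

This only covers renaming. Fields that exist in one schema but not the other would still need explicit handling.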

Demo shopping cart functionality

This is intended to be very quick and proof of concept.

Create a new route in api/cart/endpoints/cart.py

cart/<id>/

  • GET - show entities in cart (just IDs). Defer fetching labels for now.
  • PUT - creates a new cart, returns uuid
  • POST - add an object to a cart. It can be any entity - just needs a CURIE

See variantset for example code for wrapping a sqlite database. I'd rather avoid sql dependencies, even sqlite. It could just be in-memory and destroyed at the end of the server lifecycle for a first pass.
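
A sketch of the in-memory variant, independent of the web framework. `CartStore` and its method names are hypothetical; the route handlers would simply delegate to it.

```python
import uuid


class CartStore:
    """In-memory carts keyed by uuid; contents are lost when the server stops."""

    def __init__(self):
        self._carts = {}

    def create(self):
        """Backs PUT: create an empty cart and return its uuid."""
        cart_id = str(uuid.uuid4())
        self._carts[cart_id] = []
        return cart_id

    def add(self, cart_id, curie):
        """Backs POST: add any entity by CURIE, ignoring duplicates."""
        items = self._carts[cart_id]
        if curie not in items:
            items.append(curie)

    def entities(self, cart_id):
        """Backs GET: return the entity IDs in the cart (no labels yet)."""
        return list(self._carts[cart_id])


store = CartStore()
cart = store.create()
store.add(cart, "HP:0001360")
store.add(cart, "NCBIGene:6469")
```

An unknown cart ID raises `KeyError` here, which a route handler would presumably translate into a 404.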

Next we can explore actions on those items:

  • fetching all associations
  • comparing two carts
  • run owlsim (assume all entities are phenotypes, or something like genes that can be expanded to phenotypes)
  • enrichment

Improve biolink API calls wrapping wikidata for disease-substance

We currently have a path /disease/{id}/substance. Currently this returns a substance associated with a disease, but the association model doesn't follow the associations that come from golr.

Like most biolink routes, these are simple facades over more powerful services. In this case, this wraps a SPARQL query provided by @stuppie in NCATS-Tangerine/ncats-ingest#19 (comment)

See:

https://github.com/biolink/biolink-api/blob/a5195d0a2400bd53dbfce190d253f11f48943fbe/biolink/api/bio/endpoints/bioentity.py#L273-L286

In fact this is a facade over a facade; we have a small general purpose python library that wraps WD SPARQL calls: https://github.com/biolink/biolink-api/blob/6a339ef0fa71fafdb1cf0a1a14037124a7aedcbf/biowikidata/wd_sparql.py#L85-L87

The scripps team have a superior API but this one fits our purposes just now

(dipper team, @kshefchek @DoctorBud @mbrush @TomConlin take note. We can reuse something like this rather than duplicating ingest work.. in fact we can even simply federate our own graph queries, and start the wd ingest at the solr stage.. this ties in with the translator kbio plan).

we need to

  • make sure we grab the qualifiers, see the ticket above
  • map to the association model
  • think about the strategy for inference. Can we use the ontologies loaded into WD? What about synchrony, etc

implement /phenotype/<id>/anatomy/

this is just a stub so far.

this can be implemented by the paths:

  • (phenotype)-[has part]->(_)-[inheres_in|inheres_in_part_of]->(anatomy)
  • (phenotype)-[inheres_in|inheres_in_part_of]->(anatomy)

we may also want an option for traversing up the subclass hierarchy until a match is found.
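
To make the two paths concrete, here is a toy traversal over an in-memory edge list. It says nothing about which backend to use; the triple representation, the predicate spelling `has_part`, and the IDs in the example are all assumptions for illustration.

```python
INHERES = {"inheres_in", "inheres_in_part_of"}


def anatomy_for_phenotype(edges, phenotype):
    """Follow the two paths above over (subject, predicate, object) triples.

    Path 1: (phenotype)-[has_part]->(_)-[inheres_in*]->(anatomy)
    Path 2: (phenotype)-[inheres_in*]->(anatomy)
    """
    outgoing = {}
    for subj, pred, obj in edges:
        outgoing.setdefault(subj, []).append((pred, obj))

    # Start from the phenotype itself plus anything it has as a part.
    starts = [phenotype]
    starts += [obj for pred, obj in outgoing.get(phenotype, []) if pred == "has_part"]

    found = []
    for node in starts:
        for pred, obj in outgoing.get(node, []):
            if pred in INHERES and obj not in found:
                found.append(obj)
    return found


# Made-up graph, not real ontology content.
EDGES = [
    ("HP:X", "has_part", "_:q"),
    ("_:q", "inheres_in", "UBERON:A"),
    ("HP:X", "inheres_in_part_of", "UBERON:B"),
    ("HP:Y", "has_part", "_:z"),
]
result = anatomy_for_phenotype(EDGES, "HP:X")
```

The optional walk up the subclass hierarchy is not covered; it would wrap this function in a loop over superclasses until a non-empty result is found.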

How to do this?

  • SciGraph graph calls?
  • SciGraph dynamic?
  • BOLT?

Make compatible with smart-api

Enter

http://api.monarchinitiative.org/api/swagger.json

(or the equivalent on localhost:5000 if testing)

Into http://smart-api.info/editor/#/

And fix errors.

Once the errors are fixed, hit 'Save' - this registers the API.

It may get tedious to do this with every release; maybe there is a way to automate it via github releases. We can look into this later.

Note: the json is generated by the flaskrestplus framework, so the errors must be fixed at source. However, for debugging purposes we can edit the json

A particular slim query with ZFIN:ZDB-GENE-990415-200 gives a 500 error

Add counts for distinct subjects and distinct objects in association queries

Association counts are trivially returned for any association query, even if pagination is used.

For counts of distinct subjects and objects, we can't assume that |Subject|=1 (consider intermediate disease as subject) or |Object|=|Association|

Options are:

  1. precompute ahead of time and populate solr with precomputed results
  2. utilize some function of solr I am not aware of
  3. iterate over all results and count

1 is not good as we would have to precompute for the cross-product of all query options

2 may require further investigation

3 will work fine for any subject query for which the subject is loosely-speaking entity level, with a predictably bounded number of associations. E.g. genes, specific diseases.
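
Option 3 amounts to the following sketch. It assumes each association is a dict whose `subject` and `object` carry an `id`, as in the results returned by search_associations; `count_distinct` is a hypothetical name and the sample data is illustrative.

```python
def count_distinct(associations):
    """Iterate once over all associations and count distinct subjects and objects."""
    subjects = set()
    objects = set()
    total = 0
    for assoc in associations:
        total += 1
        subjects.add(assoc["subject"]["id"])
        objects.add(assoc["object"]["id"])
    return {
        "associations": total,
        "distinct_subjects": len(subjects),
        "distinct_objects": len(objects),
    }


# Illustrative input only.
sample = [
    {"subject": {"id": "NCBIGene:6469"}, "object": {"id": "HP:0001360"}},
    {"subject": {"id": "NCBIGene:6469"}, "object": {"id": "HP:0001360"}},
    {"subject": {"id": "NCBIGene:6469"}, "object": {"id": "GO:0009953"}},
]
counts = count_distinct(sample)
```

Because it accepts any iterable, it can be fed from a paged fetch, so memory use is bounded by the number of distinct IDs rather than the number of associations.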

Test /mart/ routes

See mart.py for how this works.

for owlsim we'll need another route for /disease/phenotype/taxon/

Reorganize sub-packages

Better separate API-independent business logic from API

Eventually general purpose modules should live in their own repo and be distributed using normal python mechanisms.

For now, keep these at the top level, e.g.

  • scigraph
  • obographs
  • golr

Add parameter to dynamically map ID spaces

In the golr schema we always choose a clique leader from the equivalence set for setting fields like subject and object. In Monarch this is done by @jnguyenx's clique leader code. In GO this is hardwired, on partly political grounds, as to whose ID to show.

Clients often want associations mapped to their ID space of choice. While it is always possible to do this as a post-processing step using an ID map service, better to give the option to do this on the fly.

In Monarch we can do this because we have closures on subjects as well as objects. (this also gives us superclass closures which is unwanted, but in general these are ids in a different id space; longer term we should have one closure for equivalence, and one for reflexive closure over subclass).
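
Given a closure field, the on-the-fly mapping could be as simple as the sketch below. It assumes a flat document with `subject` and `subject_closure` fields (and likewise for object); the function name and the example IDs are hypothetical, with `UniProtKB:EXAMPLE` being a placeholder rather than a real accession.

```python
def map_to_id_space(doc, prefix, field="subject"):
    """Return the first ID in the field's closure that has the requested prefix.

    Falls back to the clique leader stored in the field itself when the
    closure contains no ID in that ID space.
    """
    wanted = prefix + ":"
    for curie in doc.get(field + "_closure", []):
        if curie.startswith(wanted):
            return curie
    return doc[field]


# Illustrative document; the closure contents are made up.
doc = {
    "subject": "NCBIGene:6469",
    "subject_closure": ["NCBIGene:6469", "UniProtKB:EXAMPLE"],
    "object": "HP:0001360",
}
mapped = map_to_id_space(doc, "UniProtKB")
```

Because the closure currently also contains superclasses, taking the first match by prefix relies on those being in a different ID space, as noted above. A dedicated equivalence closure would remove that caveat.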

@kltm - as part of general alignment, can we have subject_closures in amigo-golr too? No action required on your part now; I can make an experimental PR. This is just for context.
