anyvar's People

Contributors

andreasprlic, ehclark, jarbesfeld, jsstevenson, korikuzma, larrybabb, naomifox-invitae, reece, theferrit32

anyvar's Issues

Deploy demo version

It could be useful for us to deploy a demo instance (e.g., one whose data is reset every week).

Ensure basic exposure of SeqRepo functions

Previously these functions were available via the translator layer, but some of them became unavailable when we swapped in the Variation Normalizer. Maybe the Normalizer could provide those endpoints so that we can restore them here.

Investigate different NoSQL storage backends

ClinGen team found that Redis wasn't cost-effective for caching at scale. They moved to RocksDB -- we may want to consider moving our NoSQL support efforts in that direction.

Be able to connect different Allele representations

What are your thoughts on a counterpart of the hgvs g->c mapping in the context of this project? It would be nice if one could link, e.g., an Allele that is represented in genomic coordinates to its counterparts that are mapped to transcript sequences. To take things to the next level, perhaps even link to a lifted-over representation on a different assembly?

Catch exceptions for unrecognized sequences in /sequence/

E.g., a request for /sequence/GRch38%3A1?start=2&end=10 (which is merely a capitalized 'c' away from being recognized; case-insensitive matching might be another issue worth looking into) produces an unhandled exception:

Traceback (most recent call last):
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 68, in wrapper
    response = function(request)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 149, in wrapper
    response = function(request)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/validation.py", line 399, in wrapper
    return function(request)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/produces.py", line 41, in wrapper
    response = function(request)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/response.py", line 112, in wrapper
    response = function(request)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 120, in wrapper
    return function(**kwargs)
  File "/Users/jss009/code/anyvar/src/anyvar/restapi/routes/sequence.py", line 14, in get
    return dp.get_sequence(alias, start, end), 200
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/ga4gh/vrs/dataproxy.py", line 103, in get_sequence
    return self._get_sequence(identifier, start=start, end=end)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/ga4gh/vrs/dataproxy.py", line 123, in _get_sequence
    return self.sr.fetch_uri(coerce_namespace(identifier), start, end)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 175, in fetch_uri
    return self.fetch(alias=alias, namespace=namespace, start=start, end=end)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 164, in fetch
    seq_id = self._get_unique_seqid(alias=alias, namespace=namespace)
  File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 285, in _get_unique_seqid
    raise KeyError("Alias {} (namespace: {})".format(alias, namespace))
KeyError: 'Alias 1 (namespace: GRch38)'
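
One possible shape for the fix, as a minimal sketch: it assumes the data proxy keeps raising KeyError for unknown aliases, as in the traceback above. FakeDataProxy and get_sequence_response are hypothetical names used only for illustration; they are not AnyVar or vrs-python code.

```python
from typing import Optional, Tuple


class FakeDataProxy:
    """Hypothetical stand-in for the SeqRepo-backed data proxy."""

    def __init__(self, sequences: dict):
        self._sequences = sequences

    def get_sequence(self, alias: str, start: Optional[int] = None,
                     end: Optional[int] = None) -> str:
        if alias not in self._sequences:
            # Mirrors the KeyError that SeqRepo raises in the traceback above.
            raise KeyError(f"Alias {alias}")
        return self._sequences[alias][start:end]


def get_sequence_response(dp, alias: str, start: Optional[int] = None,
                          end: Optional[int] = None) -> Tuple[object, int]:
    """Return (body, status), mapping an unknown alias to 404 instead of a 500."""
    try:
        return dp.get_sequence(alias, start, end), 200
    except KeyError:
        return {"error": f"Sequence not found: {alias}"}, 404


dp = FakeDataProxy({"GRCh38:1": "NNNNACGTACGT"})
ok_body, ok_status = get_sequence_response(dp, "GRCh38:1", 2, 10)
bad_body, bad_status = get_sequence_response(dp, "GRch38:1", 2, 10)
```

The misspelled namespace now yields a 404 with a readable message rather than a stack trace. Whether aliases should also be matched case-insensitively is a separate decision that this sketch does not make.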

Modularize translator/data proxy layer

Pending PR implements a REST-based translator proxy that routes through the variation normalizer's /normalize endpoint before storing. Unfortunately, this can't easily replace the necessary data sources for the anyvar sequence/sequence metadata endpoints, so we have to provide SeqRepo separately (this is a little inelegant).

  • It'd be great to get all data from one place somehow. Maybe that means tweaking the normalizer endpoints a bit to expose more SeqRepo stuff. Alternatively, it's not clear to me if the sequence endpoints are a must-have in the long term. Kori has been thinking about this already: GenomicMedLab/cool-seq-tool#96
  • We should be able to inject our own Variation Normalizer instances directly within the process rather than hitting a REST endpoint.
  • We should be able to use VRS Python (REST or internally) in place of the variation normalizer. In #33 I undid some of the modularization work Reece had built; it'd be good to get that back.

Fix postgres connection issue

Not sure what happened, but Postgres in Docker is now complaining about the connection not having a password. Maybe the security config shipped with the Postgres images no longer allows passwordless connections by default. I don't want to get into tweaking files inside the Docker container, so I'm just going to add a password, since that is easy enough.

Exception message encountered by @larrybabb and me:

$ uvicorn anyvar.restapi.main:app --reload
INFO:     Will watch for changes in these directories: ['/Users/kferrite/dev/anyvar']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [89724] using StatReload
/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
* 'schema_extra' has been renamed to 'json_schema_extra'
  warnings.warn(message, UserWarning)
INFO:     Started server process [89726]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/starlette/routing.py", line 734, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kferrite/dev/anyvar/src/anyvar/restapi/main.py", line 39, in app_lifespan
    storage = anyvar.anyvar.create_storage()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kferrite/dev/anyvar/src/anyvar/anyvar.py", line 41, in create_storage
    storage = PostgresObjectStore(uri)  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kferrite/dev/anyvar/src/anyvar/storage/postgres.py", line 30, in __init__
    self.conn = psycopg.connect(db_url, autocommit=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/psycopg/connection.py", line 748, in connect
    raise last_ex.with_traceback(None)
psycopg.OperationalError: connection failed: :1", port 5437 failed: fe_sendauth: no password supplied
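
A small pre-flight check could make this failure easier to spot before psycopg is ever called. The sketch below only inspects the connection URI; has_password is a hypothetical helper, and the host, port, database name, and password are placeholders rather than AnyVar defaults.

```python
from urllib.parse import urlsplit


def has_password(db_url: str) -> bool:
    """Return True if the connection URI carries a non-empty password."""
    return bool(urlsplit(db_url).password)


# Placeholder URIs for illustration only.
without_pw = "postgresql://postgres@localhost:5437/anyvar"
with_pw = "postgresql://postgres:example-password@localhost:5437/anyvar"

for url in (without_pw, with_pw):
    status = "password present" if has_password(url) else "no password supplied"
    print(f"{urlsplit(url).hostname}:{urlsplit(url).port} -> {status}")
```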

postgres setup command typo

Describe the bug
The README-pg.md file (for Postgres setup commands) has the following command for the third step:
cat src/anyvar/storage/postres_init.sql | psql -h localhost -U postgres -p 5432
The filename postres_init.sql is misspelled; it should be postgres_init.sql.

To Reproduce
Steps to reproduce the behavior:

  1. Go to src/anyvar/storage/README-pg.md
  2. See error

Expected behavior
The command should be edited to show the correct filename:
cat src/anyvar/storage/postgres_init.sql | psql -h localhost -U postgres -p 5432

Refactors in ga4gh.vr after release 0.2.0 break some symbol imports

The problem is the dependency ga4gh.vr[extras]>=0.2.0. There has been a significant amount of refactoring after version 0.2.0.

Attempted resolution by changing to ga4gh.vr[extras]==0.2.0. However, this resulted in another breaking change when attempting to require bioutils>=1.0.0a4 from the ga4gh.vr dependencies in ga4gh.vr==0.2.0. It seems that the bioutils version was either a typo, or the bioutils releases did not strictly increase or had a break in the version sequence, as the latest bioutils release is 0.5.2.post3.

Will attempt another fix by refactoring references to moved or removed symbols.

Summary statistics miss deletions where state isn't empty

NC_000003.12:g.10146527_10146528delCT returns

{
  "_id": "ga4gh:VA.hvwBZON5KzQGQazIMpeUu_dmyJ-xN8EV",
  "type": "Allele",
  "location": {
    "_id": null,
    "type": "SequenceLocation",
    "sequence_id": "ga4gh:SQ.Zu7h9AggXxhTaGVsy7h_EZSChSZGcmgX",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 10146524
      },
      "end": {
        "type": "Number",
        "value": 10146528
      }
    }
  },
  "state": {
    "type": "LiteralSequenceExpression",
    "sequence": "CT"
  }
}

This means we might want to look at what we have for the other summary methods as well, since fully-justified normalization may affect them too.

Originally posted by @korikuzma in #43 (comment)
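
To make the problem concrete: in the object above, the interval spans 4 residues while the state holds 2, so the allele is a net deletion even though its state is not empty. A length comparison catches this case. The sketch below is a hypothetical heuristic, not AnyVar's summary-statistics code, and it assumes the VRS 1.x layout shown above with a literal sequence state.

```python
def classify_allele(allele: dict) -> str:
    """Classify an allele by comparing the reference span with the state length."""
    interval = allele["location"]["interval"]
    ref_len = interval["end"]["value"] - interval["start"]["value"]
    alt_len = len(allele["state"]["sequence"])
    if alt_len < ref_len:
        return "deletion"
    if alt_len > ref_len:
        return "insertion"
    return "substitution"


# Abbreviated copy of the allele returned for
# NC_000003.12:g.10146527_10146528delCT in this issue.
fully_justified_deletion = {
    "type": "Allele",
    "location": {
        "type": "SequenceLocation",
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": 10146524},
            "end": {"type": "Number", "value": 10146528},
        },
    },
    "state": {"type": "LiteralSequenceExpression", "sequence": "CT"},
}

kind = classify_allele(fully_justified_deletion)
```

A check that only looks for an empty state would miss this allele, whereas the length comparison labels it a deletion.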

self._db references raise AttributeError in Postgres module

Exception ignored in: <function PostgresObjectStore.__del__ at 0x10edbd550>
Traceback (most recent call last):
  File "/Users/jss009/code/anyvar/src/anyvar/storage/postgres.py", line 54, in __del__
    self._db.close()
AttributeError: 'PostgresObjectStore' object has no attribute '_db'

It looks like some of these methods may have been copied from the shelf module and may need to be reimplemented.
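
A defensive pattern for the cleanup path, as a rough sketch: ExampleObjectStore and FakeConnection are hypothetical, and the attribute name conn is an assumption about what the Postgres store actually holds. The point is only that __del__ should never reference an attribute that __init__ may not have set.

```python
class FakeConnection:
    """Hypothetical connection object that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ExampleObjectStore:
    """Hypothetical store; `connect` is any callable returning an object with close()."""

    def __init__(self, connect):
        self.conn = None  # define the attribute before anything can fail
        self.conn = connect()

    def close(self):
        # Swap the attribute out first so repeated calls are harmless.
        conn, self.conn = self.conn, None
        if conn is not None:
            conn.close()

    def __del__(self):
        # getattr guards against __init__ having failed before self.conn existed.
        if getattr(self, "conn", None) is not None:
            self.close()


store = ExampleObjectStore(FakeConnection)
conn = store.conn
store.close()
store.close()  # second call is a no-op
```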

Set up CI

  • Lint/format (is there a "Biocommons standard"? Borrow from VRS-Python?)
  • Test runner

Update ubuntu version

Ubuntu 18.10 reached end-of-life and repos are no longer available for docker build.

I am bumping to 20.04 (LTS) and will update here with any problems I encounter. The default python3 will go from 3.6.5 to 3.8.2.

Demonstrate loading ClinVar into AnyVar

As a proof-of-principle, demonstrate registering all ClinVar variants into AnyVar as a way to stress-test the full software stack. To be clear, it is expected that many issues will be found, including valid variants that cannot be parsed, unsupported transcripts, reference data omissions, and other bugs.

Create a vrs-python translator

Currently there is only a variation-normalizer translator, which we will continue to support, but we would like users to be able to use a native vrs-python translator on its own.

Speed up Postgres writes

  • Use batch writes (ie transactions) where possible -- optionally share a cursor?
  • Goal is 1000 alleles per second
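
The batching idea can be sketched with the standard library. The example uses sqlite3 purely as a stand-in so that it runs anywhere; the table name vrs_objects and its columns are assumptions, and a Postgres version would use the driver's own batch facilities instead.

```python
import json
import sqlite3


def write_batch(conn: sqlite3.Connection, objects: dict) -> int:
    """Insert all objects in one transaction with a single executemany call."""
    rows = [(vrs_id, json.dumps(obj)) for vrs_id, obj in objects.items()]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO vrs_objects (vrs_id, vrs_object) VALUES (?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vrs_objects (vrs_id TEXT PRIMARY KEY, vrs_object TEXT)")

# Placeholder identifiers, not real VRS digests.
batch = {f"ga4gh:VA.example{i}": {"type": "Allele", "n": i} for i in range(1000)}
written = write_batch(conn, batch)
stored = conn.execute("SELECT COUNT(*) FROM vrs_objects").fetchone()[0]
```

Committing once per batch rather than once per allele is usually where most of the gain comes from; whether that alone reaches the 1000 alleles per second goal would need measuring.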

Development setup fails with build error

Running make devready results in a failure:

$ make devready
make venv/3.11 && source venv/3.11/bin/activate && make develop
make[1]: `venv/3.11' is up to date.
pip install -e .[dev,test]
Obtaining file:///Users/ehc6/workspaces/gdh/temp/anyvar
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [80 lines of output]
      /private/var/folders/6p/7_6lw86168703nzwq8knl_m00000gp/T/pip-build-env-lzh7c1r8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `dependencies` defined outside of `pyproject.toml` is ignored.
      !!
      
              ********************************************************************************
              The following seems to be defined outside of `pyproject.toml`:
      
              `dependencies = ['canonicaljson', 'fastapi >= 0.95.0', 'python-multipart', 'uvicorn', 'ga4gh.vrs[extras] ~= 2.0.0a1', 'psycopg[binary]']`
      
              According to the spec (see the link below), however, setuptools CANNOT
              consider this value unless `dependencies` is listed as `dynamic`.
      
              https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
      
              To prevent this problem, you can list `dependencies` under `dynamic` or alternatively
              remove the `[project]` table from your file and rely entirely on other means of
              configuration.
              ********************************************************************************
      
      !!
        _handle_missing_dynamic(dist, project_table)
      /private/var/folders/6p/7_6lw86168703nzwq8knl_m00000gp/T/pip-build-env-lzh7c1r8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `optional-dependencies` defined outside of `pyproject.toml` is ignored.
      !!
      
              ********************************************************************************
              The following seems to be defined outside of `pyproject.toml`:
      
              `optional-dependencies = {'dev': ['black', 'ruff', 'pre-commit', 'bandit~=1.7'], 'test': ['pytest', 'pytest-cov', 'pytest-mock', 'httpx']}`
      
              According to the spec (see the link below), however, setuptools CANNOT
              consider this value unless `optional-dependencies` is listed as `dynamic`.
      
              https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
      
              To prevent this problem, you can list `optional-dependencies` under `dynamic` or alternatively
              remove the `[project]` table from your file and rely entirely on other means of
              configuration.
              ********************************************************************************

Updating the pyproject.toml to specify dependencies and optional-dependencies as dynamic fields resolves the issue.

Clean up storage directory

  • Clean up postgres files
  • Move postgres README into #21 docs
  • Move any non-storage logic (eg validation) into a more central/shared location
  • Make DB initialization smoother

Query alleles in genomic region

We'd like to be able to query all registered alleles that fall within a single genomic region. For this we need three things:

  1. A form in the UI that allows a user to specify an accession, start, stop.
  2. A corresponding API call, "find_alleles".
  3. A backend that translates the input into the corresponding postgres query.
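
The matching logic behind such a call could look roughly like the in-memory sketch below. It assumes the VRS 1.x object layout with numeric start and end values, and it assumes the accession has already been translated to the stored sequence_id; make_allele and the identifiers are placeholders. A real backend would express the same overlap condition as a Postgres query.

```python
from typing import Iterable, List


def find_alleles(alleles: Iterable[dict], sequence_id: str,
                 start: int, stop: int) -> List[dict]:
    """Return alleles on `sequence_id` whose interval overlaps or touches [start, stop]."""
    hits = []
    for allele in alleles:
        loc = allele["location"]
        if loc["sequence_id"] != sequence_id:
            continue
        a_start = loc["interval"]["start"]["value"]
        a_end = loc["interval"]["end"]["value"]
        # Inclusive comparison, so zero-length insertions at the region edges match.
        if a_start <= stop and a_end >= start:
            hits.append(allele)
    return hits


def make_allele(allele_id: str, sequence_id: str, start: int, end: int) -> dict:
    """Build a minimal allele-shaped dict for the example (hypothetical helper)."""
    return {
        "_id": allele_id,
        "location": {
            "sequence_id": sequence_id,
            "interval": {"start": {"value": start}, "end": {"value": end}},
        },
    }


registered = [
    make_allele("A", "ga4gh:SQ.exampleX", 100, 105),
    make_allele("B", "ga4gh:SQ.exampleX", 200, 201),
    make_allele("C", "ga4gh:SQ.exampleY", 100, 105),
]
found_ids = [a["_id"] for a in find_alleles(registered, "ga4gh:SQ.exampleX", 90, 150)]
```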

Register VRS objects

Currently, AnyVar accepts variation descriptions like simple HGVS strings, and converts them to VRS objects before storing them. Particularly intrepid users might already have done that work themselves, though, so we need a way to permit that (and we need to think about whether any further normalization should be performed).

AttributeError when AnyVar.__name__ == '__main__'

I know this section is just intended to be a brief demo, but it looks like the method call wasn't updated when the translator method was swapped out.

% python3 src/anyvar/anyvar.py
# ... 
Traceback (most recent call last):
  File "src/anyvar/anyvar.py", line 61, in <module>
    v = av.translate_allele("NM_000551.3:c.1A>T", fmt="hgvs")
AttributeError: 'AnyVar' object has no attribute 'translate_allele'

Migrate class dependencies to VRS 2-alpha1

Currently, AnyVar is coupled to the VRS 1.x classes, and we want a new version of AnyVar to be current with 2-alpha1. In the future we will investigate a design to decouple the class references from AnyVar so that it can potentially support multiple versions of VRS.

Support Snowflake database as a storage backend

Add an implementation of anyvar.storage._Storage that stores/queries a Snowflake database for VRS objects.

Snowflake storage implementation would be selected at runtime by specifying a snowflake://... storage URI in the ANYVAR_STORAGE_URI environment variable. Format of the Snowflake storage URI is snowflake://[account_identifier].snowflakecomputing.com/?[param=value]&[param=value]... with the account_identifier and parameters as defined by the Snowflake Python connector: https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-api
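
A sketch of how the URI format described above might be taken apart when the backend is selected. parse_snowflake_uri is a hypothetical helper, and the account name and query parameters in the example are placeholders; which parameters are actually accepted is determined by the Snowflake Python connector, not by this code.

```python
from urllib.parse import parse_qsl, urlsplit

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def parse_snowflake_uri(uri: str) -> dict:
    """Split a snowflake:// storage URI into an account identifier and parameters."""
    parts = urlsplit(uri)
    if parts.scheme != "snowflake":
        raise ValueError(f"Not a Snowflake storage URI: {uri}")
    host = parts.hostname or ""
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"Unexpected Snowflake host: {host}")
    return {
        "account": host[: -len(SNOWFLAKE_SUFFIX)],
        "params": dict(parse_qsl(parts.query)),
    }


cfg = parse_snowflake_uri(
    "snowflake://myorg-myaccount.snowflakecomputing.com/?database=anyvar&schema=public"
)
```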

The Snowflake storage implementation should write VRS object batches asynchronously to avoid network waits before responding (since the Snowflake database will not be local). Query operations operate solely on the stored VRS objects, which means a query issued immediately after a batch-based VRS generation operation may not yet reflect the complete batch.
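
The asynchronous write behaviour could be approached with a queue and a worker thread, sketched below using only the standard library. BackgroundBatchWriter is a hypothetical helper and write_fn stands in for whatever actually sends a batch to Snowflake; error handling, retries, and shutdown are deliberately left minimal.

```python
import queue
import threading


class BackgroundBatchWriter:
    """Hypothetical helper: queue batches and write them on a worker thread."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._queue = queue.Queue()
        self.errors = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, batch):
        """Return immediately; the batch is written later by the worker."""
        self._queue.put(batch)

    def wait_until_written(self):
        """Block until every submitted batch has been processed."""
        self._queue.join()

    def _run(self):
        while True:
            batch = self._queue.get()
            try:
                self._write_fn(batch)
            except Exception as exc:  # keep the worker alive and record the failure
                self.errors.append(exc)
            finally:
                self._queue.task_done()


written = []
writer = BackgroundBatchWriter(written.extend)
writer.submit(["ga4gh:VA.example1", "ga4gh:VA.example2"])
writer.submit(["ga4gh:VA.example3"])
writer.wait_until_written()
```

Between submit and wait_until_written the store may lag behind what callers have been told was accepted, which is the query-visibility caveat described above.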

Parameterize creating different translator

In #54, we automatically choose the VrsPythonTranslator, but we should allow different translators to be chosen. Example of what we could do:

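from enum import Enum
from os import environ


class TranslatorType(str, Enum):
    """Supported translator implementations."""

    VRS_PYTHON = "vrs_python"
    VARIATION_NORM = "variation_normalizer"
    ...


# Convert the environment value to the enum so the default matches the annotation
TRANSLATOR_TYPE = TranslatorType(environ.get("TRANSLATOR_TYPE", TranslatorType.VRS_PYTHON.value))


def create_translator(translator_type: TranslatorType = TRANSLATOR_TYPE) -> "_Translator":
    ...  # body not given in the original proposal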
