biocommons / anyvar
[in development] Proof-of-Concept variation translation, validation, and registration service
Home Page: https://github.com/biocommons/anyvar
License: Apache License 2.0
The serialization model for VRS 2.0 has been updated in the 2.0.0a5 release. Unit tests need to be updated to account for the changes.
i.e., the newer VRS object types: copy number, genotype, categorical variation, etc.
It could be useful for us to deploy some kind of demo instance (e.g., one that resets its data every week).
If users want to test out the notebook, they currently can't: the input file is not in the repo or in a public S3 bucket. We should provide them with the input file.
Previously this was available via the translator layer, but when we swapped in the Variation Normalizer, some of that functionality became unavailable. Maybe we can have the Normalizer provide those endpoints so that we can restore them here.
Currently, AnyVar is tightly coupled to one version of the vrs-python classes. We'd like to be able to run AnyVar with newer versions.
The Snowflake storage backend writes VRS object batches asynchronously. Untimely termination could therefore result in pending batches being discarded. Providing an API for graceful shutdown would allow for all pending batches to be written prior to termination. This API should be callable as a PreStop hook in Kubernetes: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
See also #64
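A graceful-shutdown hook along these lines could drain the write queue before the process exits. This is a stdlib-only sketch with illustrative names (SnowflakeBatchWriter, submit, and shutdown are not AnyVar's actual API); a real implementation would perform Snowflake INSERTs in the worker and expose shutdown() via an HTTP endpoint callable from a Kubernetes preStop hook.

```python
import queue
import threading

class SnowflakeBatchWriter:
    """Sketch: background batch writer with a graceful-shutdown hook."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self):
        self._batches = queue.Queue()
        self.written = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Drain batches in FIFO order until the sentinel arrives.
        while True:
            batch = self._batches.get()
            if batch is self._STOP:
                return
            self.written.append(batch)  # stand-in for the Snowflake INSERT

    def submit(self, batch):
        self._batches.put(batch)

    def shutdown(self):
        # Enqueue the sentinel behind all pending batches, then wait for
        # the worker, so every pending batch is written before exit.
        self._batches.put(self._STOP)
        self._worker.join()
```

Because the sentinel is enqueued behind all pending work, shutdown() cannot return until every batch submitted before it has been written.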
The changes for ga4gh/vrs-python#305 include a new parameter for vrs_enref that returns both the object and its identifier. This should be called from AnyVar.put_object to avoid the subsequent call to ga4gh_identify.
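The intended shape of the change might look like the following sketch. Note that vrs_enref here is a toy stand-in mimicking the new return-both behavior, not the real ga4gh.vrs function, and put_object is heavily simplified.

```python
def vrs_enref(obj, object_store, return_id=True):
    # Toy stand-in for the ga4gh/vrs-python#305 behavior: enref the
    # object and return (object, computed_id) in a single call.
    vrs_id = "ga4gh:VA.stub-" + obj["state"]["sequence"]
    object_store[vrs_id] = obj
    return obj, vrs_id

def put_object(object_store, obj):
    # With the identifier returned directly from vrs_enref, no
    # follow-up ga4gh_identify call is needed.
    _enreffed, vrs_id = vrs_enref(obj, object_store, return_id=True)
    return vrs_id
```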
Maybe worth restructuring in the spirit of #53 to be less bound to the internal details of the VRS VCF annotator.
ClinGen team found that Redis wasn't cost-effective for caching at scale. They moved to RocksDB -- we may want to consider moving our NoSQL support efforts in that direction.
What are your thoughts on the counterpart of hgvs g->c in the context of this project? It would be nice if one could link e.g. an Allele represented in genomic coordinates to its counterparts mapped to transcript sequences. To take things to the next level, perhaps even link to a lifted-over representation on a different assembly?
E.g., /sequence/GRch38%3A1?start=2&end=10 (which is merely a capitalized 'c' away from being recognized -- case neutrality might be another issue worth looking into) yields:
Traceback (most recent call last):
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 68, in wrapper
response = function(request)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 149, in wrapper
response = function(request)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/validation.py", line 399, in wrapper
return function(request)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/produces.py", line 41, in wrapper
response = function(request)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/response.py", line 112, in wrapper
response = function(request)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 120, in wrapper
return function(**kwargs)
File "/Users/jss009/code/anyvar/src/anyvar/restapi/routes/sequence.py", line 14, in get
return dp.get_sequence(alias, start, end), 200
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/ga4gh/vrs/dataproxy.py", line 103, in get_sequence
return self._get_sequence(identifier, start=start, end=end)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/ga4gh/vrs/dataproxy.py", line 123, in _get_sequence
return self.sr.fetch_uri(coerce_namespace(identifier), start, end)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 175, in fetch_uri
return self.fetch(alias=alias, namespace=namespace, start=start, end=end)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 164, in fetch
seq_id = self._get_unique_seqid(alias=alias, namespace=namespace)
File "/Users/jss009/code/anyvar/venv/3.8/lib/python3.8/site-packages/biocommons/seqrepo/seqrepo.py", line 285, in _get_unique_seqid
raise KeyError("Alias {} (namespace: {})".format(alias, namespace))
KeyError: 'Alias 1 (namespace: GRch38)'
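One possible mitigation, sketched below with an illustrative (not exhaustive) namespace table: normalize the namespace's case before the SeqRepo lookup, so GRch38:1 resolves like GRCh38:1.

```python
# Hypothetical helper: canonical casings for namespaces AnyVar expects.
# This table is illustrative only.
CANONICAL_NAMESPACES = {
    "grch37": "GRCh37",
    "grch38": "GRCh38",
    "refseq": "refseq",
}

def normalize_identifier(identifier: str) -> str:
    """Rewrite 'GRch38:1' as 'GRCh38:1'; pass unknown forms through."""
    if ":" not in identifier:
        return identifier
    namespace, alias = identifier.split(":", 1)
    canonical = CANONICAL_NAMESPACES.get(namespace.lower(), namespace)
    return f"{canonical}:{alias}"
```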
A pending PR implements a REST-based translator proxy that routes through the Variation Normalizer's /normalize endpoint before storing. Unfortunately, this can't easily replace the data sources needed by the AnyVar sequence/sequence-metadata endpoints, so we have to provide SeqRepo separately (this is a little inelegant).
Not sure what happened, but Postgres in Docker is now complaining about the connection not having a password. Maybe the security config shipped with the Postgres images no longer allows passwordless connections by default. I don't want to tweak files inside the Docker container, so I'm just going to add a password, since that is easy enough.
Exception message encountered by me and @larrybabb:
$ uvicorn anyvar.restapi.main:app --reload
INFO: Will watch for changes in these directories: ['/Users/kferrite/dev/anyvar']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [89724] using StatReload
/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
* 'schema_extra' has been renamed to 'json_schema_extra'
warnings.warn(message, UserWarning)
INFO: Started server process [89726]
INFO: Waiting for application startup.
ERROR: Traceback (most recent call last):
File "/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/starlette/routing.py", line 734, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/local/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/kferrite/dev/anyvar/src/anyvar/restapi/main.py", line 39, in app_lifespan
storage = anyvar.anyvar.create_storage()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kferrite/dev/anyvar/src/anyvar/anyvar.py", line 41, in create_storage
storage = PostgresObjectStore(uri) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kferrite/dev/anyvar/src/anyvar/storage/postgres.py", line 30, in __init__
self.conn = psycopg.connect(db_url, autocommit=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kferrite/dev/anyvar/venv/3.11/lib/python3.11/site-packages/psycopg/connection.py", line 748, in connect
raise last_ex.with_traceback(None)
psycopg.OperationalError: connection failed: :1", port 5437 failed: fe_sendauth: no password supplied
Describe the bug
The README-pg.md file (for Postgres setup commands) has the following command for the 3rd step:
cat src/anyvar/storage/postres_init.sql | psql -h localhost -U postgres -p 5432
The postres_init.sql filename is misspelled; it should be postgres_init.sql.
Expected behavior
The command should be edited to show the correct filename:
cat src/anyvar/storage/postgres_init.sql | psql -h localhost -U postgres -p 5432
For more query options, it would be nice to have a Postgres-based storage plugin for AnyVar.
The problem is the dependency ga4gh.vr[extras]>=0.2.0. There has been a significant amount of refactoring after version 0.2.0.
Attempted resolution by changing to ga4gh.vr[extras]>=0.2.0. However, this resulted in another breaking change when attempting to require bioutils>=1.0.0a4 from the ga4gh.vr dependencies in ga4gh.vr==0.2.0. It seems that bioutils version was either a typo or the bioutils release versions did not strictly increase, as the latest bioutils release is 0.5.2.post3.
Will attempt another fix by refactoring references to moved or removed symbols.
NC_000003.12:g.10146527_10146528delCT
returns
{
"_id": "ga4gh:VA.hvwBZON5KzQGQazIMpeUu_dmyJ-xN8EV",
"type": "Allele",
"location": {
"_id": null,
"type": "SequenceLocation",
"sequence_id": "ga4gh:SQ.Zu7h9AggXxhTaGVsy7h_EZSChSZGcmgX",
"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 10146524
},
"end": {
"type": "Number",
"value": 10146528
}
}
},
"state": {
"type": "LiteralSequenceExpression",
"sequence": "CT"
}
}
This means we might want to look at our other summary methods too, since fully-justified normalization may affect them as well.
Originally posted by @korikuzma in #43 (comment)
Exception ignored in: <function PostgresObjectStore.__del__ at 0x10edbd550>
Traceback (most recent call last):
File "/Users/jss009/code/anyvar/src/anyvar/storage/postgres.py", line 54, in __del__
self._db.close()
AttributeError: 'PostgresObjectStore' object has no attribute '_db'
It looks like some of these methods may have been copied from the shelf module and may need to be reimplemented.
The Snowflake storage backend writes VRS object batches asynchronously. Read operations may not reflect unwritten batches. Add a flush() API endpoint that returns when pending batches are written. Specifically, the flush call would wait for any batches pending at the time the call was made to be completed.
See also #64
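The "pending at call time" semantics could be captured by snapshotting per-batch completion events, as in this stdlib-only sketch. All names are illustrative, and the background writer loop is reduced to an explicit write_next() step so the behavior is deterministic.

```python
import threading
from collections import deque

class AsyncBatchWriter:
    """Sketch: flush() waits only for batches pending when it was called."""

    def __init__(self):
        self._pending = deque()      # (batch, done_event) pairs
        self._lock = threading.Lock()
        self.written = []

    def submit(self, batch):
        with self._lock:
            self._pending.append((batch, threading.Event()))

    def write_next(self):
        # Stand-in for one iteration of the background writer loop.
        with self._lock:
            if not self._pending:
                return False
            batch, done = self._pending.popleft()
        self.written.append(batch)   # real impl: Snowflake INSERT
        done.set()
        return True

    def flush(self):
        # Snapshot the batches pending *now*; submissions made after
        # this point are not waited on.
        with self._lock:
            snapshot = [done for _, done in self._pending]
        while self.write_next():     # stand-in for the worker making progress
            pass
        for done in snapshot:
            done.wait()
```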
Ubuntu 18.10 reached end-of-life and its repos are no longer available for docker build. I am bumping to 20.04 (LTS) and will update here with any problems I encounter. The default python3 will go from 3.6.5 to 3.8.2.
As a proof-of-principle, demonstrate registering all ClinVar variants into AnyVar as a way to stress-test the full software stack. To be clear, it is expected that many issues will be found, including valid variants that cannot be parsed, unsupported transcripts, reference data omissions, and other bugs.
Currently there is only a variation-normalizer translator, which we will continue to support, but we would like users to have the option of a native vrs-python-only translator.
(paginated, presumably)
(larry adding here)
Installation instructions need to be updated and improved to work with the most recent version of anyvar translators
The VcfRegistrar._get_vrs_object() method currently supports only GRCh38. It should also support GRCh37.
ga4gh/vrs-python#295 added a compute_for_ref parameter for VCF annotation. This parameter should be available as an option on the /vcf REST endpoint.
Computes VRS IDs for all REF and ALT alleles:
PUT /vcf
Computes VRS IDs for all REF and ALT alleles:
PUT /vcf?for_ref=True
Computes VRS IDs for all ALT alleles:
PUT /vcf?for_ref=False
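A framework-agnostic sketch of the query-parameter plumbing, with stand-in names; the real endpoint would forward the flag to the VRS VCF annotator's compute_for_ref parameter.

```python
def parse_for_ref(query_value, default=True):
    # Interpret the ?for_ref= query value; an absent parameter means the
    # default (True), matching compute_for_ref in ga4gh/vrs-python#295.
    if query_value is None:
        return default
    return query_value.strip().lower() in ("1", "true", "yes")

def annotate_vcf(vcf_path, for_ref=True):
    # Stand-in handler: pass the flag through to the annotator call.
    return {"vcf": vcf_path, "compute_for_ref": for_ref}
```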
We'd like to have an endpoint that provides summary statistics about the types of alleles that have been registered in AnyVar.
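At its simplest, such an endpoint could tally stored objects by their "type" field; in the Postgres backend this would more likely be a single GROUP BY over the stored JSON. A sketch with an illustrative function name:

```python
from collections import Counter

def summarize_registered_variation(objects):
    """Sketch: tally registered VRS objects by their 'type' field."""
    return dict(Counter(obj.get("type", "Unknown") for obj in objects))
```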
Running make devready
results in a failure:
$ make devready
make venv/3.11 && source venv/3.11/bin/activate && make develop
make[1]: `venv/3.11' is up to date.
pip install -e .[dev,test]
Obtaining file:///Users/ehc6/workspaces/gdh/temp/anyvar
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... error
error: subprocess-exited-with-error
× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [80 lines of output]
/private/var/folders/6p/7_6lw86168703nzwq8knl_m00000gp/T/pip-build-env-lzh7c1r8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `dependencies` defined outside of `pyproject.toml` is ignored.
!!
********************************************************************************
The following seems to be defined outside of `pyproject.toml`:
`dependencies = ['canonicaljson', 'fastapi >= 0.95.0', 'python-multipart', 'uvicorn', 'ga4gh.vrs[extras] ~= 2.0.0a1', 'psycopg[binary]']`
According to the spec (see the link below), however, setuptools CANNOT
consider this value unless `dependencies` is listed as `dynamic`.
https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
To prevent this problem, you can list `dependencies` under `dynamic` or alternatively
remove the `[project]` table from your file and rely entirely on other means of
configuration.
********************************************************************************
!!
_handle_missing_dynamic(dist, project_table)
/private/var/folders/6p/7_6lw86168703nzwq8knl_m00000gp/T/pip-build-env-lzh7c1r8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `optional-dependencies` defined outside of `pyproject.toml` is ignored.
!!
********************************************************************************
The following seems to be defined outside of `pyproject.toml`:
`optional-dependencies = {'dev': ['black', 'ruff', 'pre-commit', 'bandit~=1.7'], 'test': ['pytest', 'pytest-cov', 'pytest-mock', 'httpx']}`
According to the spec (see the link below), however, setuptools CANNOT
consider this value unless `optional-dependencies` is listed as `dynamic`.
https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
To prevent this problem, you can list `optional-dependencies` under `dynamic` or alternatively
remove the `[project]` table from your file and rely entirely on other means of
configuration.
********************************************************************************
Updating the pyproject.toml to specify dependencies and optional-dependencies as dynamic fields resolves the issue.
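The shape of that fix, sketched below; the exact field list depends on where anyvar keeps its metadata (e.g. setup.cfg), so treat this fragment as illustrative.

```toml
# pyproject.toml (sketch): declare fields defined outside pyproject.toml
# as dynamic so setuptools will accept them.
[project]
name = "anyvar"
dynamic = ["dependencies", "optional-dependencies"]
```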
ga4gh/vrs-python#345 added a new parameter to the VCFAnnotator._get_vrs_object() method. In AnyVar, VcfRegistrar overrides this method and so must also accept the new parameter.
VCF annotation currently fails with the following error:
VcfRegistrar._get_vrs_object() got an unexpected keyword argument 'require_validation'
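One way to make the override tolerant of new upstream keyword arguments is to accept and forward **kwargs. The sketch below uses a stand-in base class in place of vrs-python's VCFAnnotator; only the override pattern is the point.

```python
class VCFAnnotator:
    """Stand-in for the vrs-python base class."""
    def _get_vrs_object(self, vcf_coords, *, require_validation=True, **kwargs):
        return {"coords": vcf_coords, "validated": require_validation}

class VcfRegistrar(VCFAnnotator):
    def _get_vrs_object(self, vcf_coords, **kwargs):
        # Accept and forward any keyword arguments the parent grows, so
        # new parameters like require_validation no longer raise
        # "got an unexpected keyword argument".
        vrs_object = super()._get_vrs_object(vcf_coords, **kwargs)
        # ... AnyVar-specific registration would happen here ...
        return vrs_object
```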
We'd like to be able to query all registered alleles that fall within a single genomic region. For this we need 3 things.
Otherwise, the /locations/ endpoint is a little hard to use.
Currently, AnyVar accepts variation descriptions like simple HGVS strings and converts them to VRS objects before storing them. Particularly intrepid users might already have done that work themselves, though, so we need a way to permit that (and we need to think about whether any further normalization should be performed).
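A sketch of what the dual-input registration path might look like. The register function, the translate callable, and the dict store are all hypothetical stand-ins, and the re-normalization question is deliberately left open.

```python
def register(payload, translate, store):
    """Accept either a variation description (e.g. HGVS string) or an
    already-built VRS object (here recognized by its 'type' field)."""
    if isinstance(payload, dict) and payload.get("type") == "Allele":
        # User supplied a VRS object directly.
        # Open question from the issue: re-normalize here, or trust it?
        vrs_object = payload
    else:
        vrs_object = translate(payload)
    store[vrs_object["_id"]] = vrs_object
    return vrs_object["_id"]
```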
Pending ga4gh/vrs#418, and updates to VRS-Python and then the Variation Normalizer (cancervariants/variation-normalization#394).
We'll probably need to add extra registration parameters, e.g., for controlling how HGVS dup/del expressions are handled.
I know this section is just intended to be a brief demo, but it looks like the method call wasn't updated when the translator method was swapped out.
% python3 src/anyvar/anyvar.py
# ...
Traceback (most recent call last):
File "src/anyvar/anyvar.py", line 61, in <module>
v = av.translate_allele("NM_000551.3:c.1A>T", fmt="hgvs")
AttributeError: 'AnyVar' object has no attribute 'translate_allele'
Currently, AnyVar is coupled to the VRS 1.x classes, and we want a new version of AnyVar to be current with 2.0a1. In the future we will investigate a ticket/design to decouple the class references from AnyVar so that it can (potentially) support multiple versions of VRS.
Add an implementation of anyvar.storage._Storage that stores/queries a Snowflake database for VRS objects.
The Snowflake storage implementation would be selected at runtime by specifying a snowflake://... storage URI in the ANYVAR_STORAGE_URI environment variable. The format of the Snowflake storage URI is snowflake://[account_identifier].snowflakecomputing.com/?[param=value]&[param=value]... with the account_identifier and parameters as defined by the Snowflake Python connector: https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-api
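Parsing that URI into connector keyword arguments could look like the following stdlib sketch; parameter validation and credential handling are omitted, and the function name is illustrative.

```python
from urllib.parse import parse_qsl, urlparse

def parse_snowflake_uri(uri):
    """Sketch: split an ANYVAR_STORAGE_URI of the form
    snowflake://<account_identifier>.snowflakecomputing.com/?param=value...
    into keyword arguments for the Snowflake Python connector's connect().
    """
    parts = urlparse(uri)
    # urlparse lowercases the hostname; strip the fixed domain suffix
    # to recover the account identifier.
    account = parts.hostname.removesuffix(".snowflakecomputing.com")
    params = dict(parse_qsl(parts.query))
    return {"account": account, **params}
```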
The Snowflake storage implementation should write VRS object batches asynchronously to avoid network waits before responding (since the Snowflake database will not be local). Query operations operate solely on the stored VRS objects which means a query immediately after a batch-based VRS generation operation may not reflect the batch completely.
In #54, we automatically choose the VrsPythonTranslator, but we should allow different translators to be chosen. Example of what we could do (the create_translator body below is a sketch; dispatch for other translator types would go where the original elision marker sits):

from os import environ
from enum import Enum

class TranslatorType(str, Enum):
    VRS_PYTHON = "vrs_python"
    VARIATION_NORM = "variation_normalizer"

...

TRANSLATOR_TYPE = environ.get("TRANSLATOR_TYPE", TranslatorType.VRS_PYTHON.value)

def create_translator(translator_type: TranslatorType = TRANSLATOR_TYPE) -> _Translator:
    if translator_type == TranslatorType.VRS_PYTHON:
        return VrsPythonTranslator()
    raise NotImplementedError(f"No translator available for {translator_type}")