griffithlab / civicpy
A Python interface for the CIViC db application
License: MIT License
Currently cache building with CIViCpy is very hard on the CIViC server.
CIViC server now has built-in pagination: griffithlab/civic-server#632
We should update the CIViCpy caching behavior to use the pagination features to reduce this burden.
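A minimal sketch of how a cache build might walk a paginated endpoint page by page instead of issuing one very large request. The `fetch_page` callable, its `page`/`count` arguments, and the `records`/`total_pages` response keys are assumptions made for illustration; the actual CIViC response layout may differ.

```python
from typing import Callable, Dict, Iterator


def iter_paginated(fetch_page: Callable[[int, int], Dict], page_size: int = 25) -> Iterator[dict]:
    """Yield records one page at a time.

    `fetch_page(page, count)` is a hypothetical callable returning a dict
    shaped like {"records": [...], "total_pages": N}. In practice it would
    wrap an HTTP request to the server.
    """
    page = 1
    while True:
        payload = fetch_page(page, page_size)
        yield from payload["records"]
        if page >= payload["total_pages"]:
            break
        page += 1


# Stand-in for the server so the sketch runs offline.
_FAKE_DB = [{"id": i} for i in range(1, 8)]


def fake_fetch(page: int, count: int) -> Dict:
    start = (page - 1) * count
    total_pages = max(1, -(-len(_FAKE_DB) // count))  # ceiling division
    return {"records": _FAKE_DB[start:start + count], "total_pages": total_pages}


all_records = list(iter_paginated(fake_fetch, page_size=3))
```

Keeping the page size small trades a few more requests for much lighter individual responses, which is the burden this ticket is concerned with.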
Might save us some effort down the road.
Provides CIViC variant / assertion IDs to VCF
Currently the civic client displays `None` values as `-`. Some users may interpret this to mean that `-` is a discrete value used in the CIViC data tables.
We should correct users if they use a `-` in queries for CIViCpy, and consider issuing a warning message about improper usage.
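One possible shape for that correction, sketched with the standard `warnings` module. `normalize_query_value` is a hypothetical helper name, not an existing CIViCpy function.

```python
import warnings

PLACEHOLDER = "-"  # how the client renders None when displaying records


def normalize_query_value(value):
    """Map the display placeholder back to None and warn the caller.

    Hypothetical helper: the name and behaviour are assumptions, not CIViCpy API.
    """
    if value == PLACEHOLDER:
        warnings.warn(
            "'-' is only how None is displayed; it is not a value stored in CIViC. "
            "Use None in queries instead.",
            UserWarning,
            stacklevel=2,
        )
        return None
    return value


# Demonstration: capture the warning instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = normalize_query_value("-")
```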
Dear CIViCPy team,
Maybe I am doing something wrong, but I don't know why I am getting this message when I try to access a civicrecord as shown in the example code snippet:
nexus-253:~ vipin$ export CIVICPY_CACHE_FILE=/Users/vipin/Downloads/nightly-civicpy_cache.pkl
nexus-253:~ vipin$ python
Python 3.7.6 (default, Dec 22 2019, 01:09:06)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from civicpy import civic, version
>>> version()
'1.1.2'
>>> variant = civic.get_variant_by_id(12)
WARNING:root:Local cache at /Users/vipin/Downloads/nightly-civicpy_cache.pkl is stale, updating from remote.
WARNING:root:Downloading remote cache from https://civicdb.org/downloads/nightly/nightly-civicpy_cache.pkl.
WARNING:root:Local cache at /Users/vipin/Downloads/nightly-civicpy_cache.pkl is stale, updating from remote.
....
This just keeps repeating in the Python console and I am not able to access the records. Do you have any idea what is wrong here? The machine has an outside connection to access the service, and I would like to use the cache file I have downloaded to get the records.
Regards,
Vipin
We should add in caching of actions as an optional extension to CIViCpy. This will support the work in #50.
Currently, we do some very large (and low-count) requests. This might not be captured by CIViC's throttling mechanism, which is request-count based. We may want to consider strategies to mitigate the server load from widespread adoption of CIViCpy, particularly if we fail to update our nightly civicpy cache and clients revert to hard (API-driven) updates.
@susannasiebert and @acoffman what do you think?
Right now we only support basic assertion and evidence information. We should add at minimum all fields shared between assertions and evidence items.
We've had users request that we provide GRCh38 coordinates. This could be either a query that finds CIViC evidence using GRCh38 coordinates, or a CIViC converter that supplies EIDs of interest annotated with GRCh38 coordinates as well (potentially the whole dataset).
Reviewer comments:
I can't find documentation on http://docs.civicpy.org about how to do the bulk coordinate query against the VCI described in the article and Figure 2. Is it documented in the API guide?
We should address this and more broadly ensure documentation matches current functionality.
Some users of CIViCpy may be interested in using it to do secure annotation of variants. We should consider adding an ENV flag that prevents CIViCpy from contacting CIViC for anything other than full-db cache building. While this is typical behavior anyway, such a flag would enforce this expectation, throwing an error if someone tries to do variant queries from an instance without a cache.
This is a small QoL feature that would be nice-to-have and (I expect) simple to implement.
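A rough sketch of such a guard. The variable name `CIVICPY_OFFLINE`, the `purpose` labels, and the exception class are all hypothetical and only illustrate the idea of refusing anything but a full cache build.

```python
import os

OFFLINE_ENV_VAR = "CIVICPY_OFFLINE"  # hypothetical variable name


class OfflineModeError(RuntimeError):
    """Raised when a remote request is attempted while the flag is set."""


def offline_mode_enabled(environ=os.environ) -> bool:
    return environ.get(OFFLINE_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def guard_remote_request(purpose: str, environ=os.environ) -> None:
    """Allow only full-db cache builds to reach the server when the flag is set."""
    if offline_mode_enabled(environ) and purpose != "cache_build":
        raise OfflineModeError(
            f"{OFFLINE_ENV_VAR} is set; refusing remote request for {purpose!r}."
        )


# Demonstration with an explicit environment mapping.
env = {OFFLINE_ENV_VAR: "1"}
guard_remote_request("cache_build", env)  # permitted
try:
    guard_remote_request("variant_query", env)
    blocked = False
except OfflineModeError:
    blocked = True
```

Calling the guard at the single point where requests are issued would keep the check in one place.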
As seen here. Currently the jobs run simultaneously, but the PyPI release must be made before the Docker image is created.
Thanks a lot.
I see CIViC updates frequently; does the cache update itself?
And are there any useful examples of using
Also, there is no $HOME/.civicpy/cache.pkl after sudo pip3 install civicpy,
and it is also not located at /usr/local/lib/python3.6/site-packages/civicpy
Variants with no non-rejected evidence, rejected assertions, and rejected evidence should be excluded from the respective get-all routines.
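The filtering described above could look roughly like this. The `Evidence` and `Variant` classes are simplified stand-ins for the real record types, and the `status` values are assumed to include "rejected".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    id: int
    status: str  # assumed values: "accepted", "submitted", "rejected"


@dataclass
class Variant:
    id: int
    evidence_items: List[Evidence] = field(default_factory=list)


def drop_rejected(records):
    """Keep evidence (or assertions) whose status is not 'rejected'."""
    return [r for r in records if r.status != "rejected"]


def variants_with_live_evidence(variants):
    """Keep variants that have at least one non-rejected evidence item."""
    return [
        v for v in variants
        if any(e.status != "rejected" for e in v.evidence_items)
    ]


variants = [
    Variant(1, [Evidence(10, "accepted"), Evidence(11, "rejected")]),
    Variant(2, [Evidence(12, "rejected")]),
    Variant(3, []),
]
kept_variant_ids = [v.id for v in variants_with_live_evidence(variants)]
kept_evidence_ids = [e.id for e in drop_rejected(variants[0].evidence_items)]
```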
request from VHL group
Query encompassing and record encompassing search strategies need to be implemented for bulk coordinate search. See Figure 2D from manuscript for details.
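As a sketch of how the two strategies differ, assuming 1-based inclusive coordinates: "query encompassing" is read here as the record fitting inside the query, and "record encompassing" as the record containing the query. The `Interval` type and `matches` function are illustrative names, not CIViCpy API.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    stop: int   # 1-based, inclusive


def matches(query: Interval, record: Interval, mode: str) -> bool:
    """Compare a query interval with a record interval under a search mode."""
    if query.chrom != record.chrom:
        return False
    if mode == "any":  # any overlap at all
        return query.start <= record.stop and query.stop >= record.start
    if mode == "query_encompassing":  # record fits inside the query
        return query.start <= record.start and query.stop >= record.stop
    if mode == "record_encompassing":  # record contains the query
        return record.start <= query.start and record.stop >= query.stop
    if mode == "exact":
        return (query.start, query.stop) == (record.start, record.stop)
    raise ValueError(f"unknown search mode: {mode!r}")


query = Interval("7", 140453100, 140453200)
record = Interval("7", 140453136, 140453136)
results = {
    mode: matches(query, record, mode)
    for mode in ("any", "query_encompassing", "record_encompassing", "exact")
}
```

A bulk search would apply the same predicates over sorted coordinate tables rather than pair by pair, but the comparisons stay the same.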
Currently, we support only Python 3.7+, as we use some features that are new to Python 3.7. We have had a request to make CIViCpy compatible with 3.6, and we should consider 3.5 as well, as maintenance plans for 3.5 extend until late 2020.
This ticket will initially track the feasibility / effort of adding support for these versions, as well as progress on the added support if we choose to accept the proposal.
Python 2 will remain unsupported, as it will no longer be maintained in 2020.
Include a select filter function, with some predefined recipes.
Example recipe:
Currently supported as `CivicAttribute` class, promote to full `CivicRecord`.
CIViCpy should, by default, timestamp `civicrecord` objects. There should be routines for storing and loading `civicrecord` objects, and checks should be made based off of a global timedelta value (which should be loaded in `__init__.py`).
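A small sketch of timestamped storage with a freshness check against a global timedelta. It serializes to JSON purely to stay self-contained (the real cache is a pickle file), and the function names and the 7-day window are illustrative assumptions.

```python
import json
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=7)  # illustrative global freshness window


def dump_record(payload: dict, now=None) -> str:
    """Serialize a record together with the time it was stored."""
    now = now or datetime.now()
    return json.dumps({"payload": payload, "stored_at": now.isoformat()})


def load_record(blob: str, now=None, max_age: timedelta = MAX_AGE):
    """Return (payload, is_stale) for a previously stored record."""
    wrapped = json.loads(blob)
    stored_at = datetime.fromisoformat(wrapped["stored_at"])
    now = now or datetime.now()
    return wrapped["payload"], (now - stored_at) > max_age


t0 = datetime(2020, 1, 1)
blob = dump_record({"id": 12}, now=t0)
payload, stale_after_one_day = load_record(blob, now=t0 + timedelta(days=1))
_, stale_after_eight_days = load_record(blob, now=t0 + timedelta(days=8))
```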
Bonus points for:
because we like coverage tracking
Currently our builds take a long time in CI, ~10 minutes each. The majority of this time is spent in tests.
We should consider how we may speed up tests. I expect that this is directly linked to download of the test cache from production.
per @jmcmichael. Waiting on v1.0 release.
Currently only the CIViC default record state is available through CIViCpy. We should add functionality to pre-cache, search, and selectively apply pending revisions to records.
This is anticipated to be a major feature addition.
Sources are treated as first-class entities in CIViC, and have their own search page. These are currently implemented as "smart" `CivicAttribute` objects, but are intended to be `CivicRecord` objects.
In response to reviewer #1, provide example code in manuscript as a Figure, and remove code snippets from text by referring instead to lines of figure, beautifying the text.
We need to add in support for configurable and automated retrieval of CIViCpy caches.
In the process, we can remove stale record checking and just leave it to a single check at cache load.
Hi,
I'm trying to annotate my vcf using civicpy.
I have installed civicpy using pip, and I ran the command below to annotate it.
civicpy annotate-vcf --input-vcf SRR12656923_GATK_filtered.vcf --output-vcf SRR12656923 --reference GRCh38 -i submitted
After a while I'm getting an error:
Traceback (most recent call last):
  File "/usr/local/bin/civicpy", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/civicpy/cli.py", line 88, in annotate_vcf
    variants = civic.search_variants_by_coordinates(query, search_mode='exact')
  File "/usr/local/lib/python3.7/site-packages/civicpy/civic.py", line 1303, in search_variants_by_coordinates
    raise ValueError("Can't use wildcard when searching for non-GRCh37 coordinates")
SRR12656923_GATK_filtered.vcf.tar.gz
I have also enclosed my VCF file for your reference.
Hi,
version 2.0.0.
I was trying to use `civic.search_variants_by_coordinates()` as seen in the provided Genie example.
Command:
coords = civic.CoordinateQuery(chr='7', start=140453136, stop=140453136)
civic.search_variants_by_coordinates(coords, search_mode='any')
However, this is not possible as I get the following error message:
AttributeError                            Traceback (most recent call last)
Cell In [13], line 2
      1 test = civic.CoordinateQuery(chr='7', start=140453136, stop=140453136)
----> 2 civic.search_variants_by_coordinates(test, search_mode='query_encompassing')
File ~/miniconda3/envs/work/lib/python3.10/site-packages/civicpy/civic.py:2085, in search_variants_by_coordinates(coordinate_query, search_mode)
   2083 chromosome = str(coordinate_query.chr)
   2084 # overlapping = (start <= ct.stop) & (stop >= ct.start)
-> 2085 left_idx = chr_idx.searchsorted(chromosome)
   2086 right_idx = chr_idx.searchsorted(chromosome, side='right')
   2087 chr_ct_idx = chr_idx[left_idx:right_idx].index
AttributeError: 'NoneType' object has no attribute 'searchsorted'
We need mock testing to support PR #66. This is a little more involved, but important enough for the "high priority" flag considering the implications in the event that the CIViC pre-cache is unavailable or stale (which happens on occasion).
Reviewer suggests:
In some cases it may be desirable to force a cache refresh but if your server is down, then sticking with the local cache would be better than a failure. Is it possible to request a cache refresh that would return a server down result without destroying the current cache so it could be used as a fall back?
We should implement as suggested.
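One way to get that fallback behaviour is to download to a temporary file and swap it in only on success, so a failed refresh never touches the existing cache. This is a sketch under that assumption; `refresh_cache` and the `download` callable are hypothetical names.

```python
import os
import tempfile


def refresh_cache(cache_path, download) -> bool:
    """Try to refresh the cache; keep the existing file if the download fails.

    `download(dest_path)` is a hypothetical callable that writes the new cache
    to `dest_path` and raises on failure. Returns True if the cache was replaced.
    """
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        download(tmp_path)                # may raise if the server is down
        os.replace(tmp_path, cache_path)  # atomic swap on the same filesystem
        return True
    except Exception:                     # broad on purpose for this sketch
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False


def failing_download(dest):
    raise ConnectionError("server down")


def working_download(dest):
    with open(dest, "w") as fh:
        fh.write("new")


with tempfile.TemporaryDirectory() as workdir:
    cache = os.path.join(workdir, "cache.pkl")
    with open(cache, "w") as fh:
        fh.write("old")
    failed = refresh_cache(cache, failing_download)
    with open(cache) as fh:
        after_failure = fh.read()
    succeeded = refresh_cache(cache, working_download)
    with open(cache) as fh:
        after_success = fh.read()
```

The boolean return value lets the caller report "server down, using local cache" without raising.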
Create function to allow users to query CIViC for a variant from a given gene symbol and variant name.
Currently we hard-code the remote cache:
Line 4 in 2e5de2a
We should allow for the setting of this URL by environment variable for loading from other CIViC instances.
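A sketch of the override, falling back to the nightly URL that appears in the client's log output. The environment variable name `CIVICPY_REMOTE_CACHE_URL` is an assumption, as is the example override URL.

```python
import os

# Default taken from the download message the client logs.
DEFAULT_REMOTE_CACHE_URL = "https://civicdb.org/downloads/nightly/nightly-civicpy_cache.pkl"
REMOTE_URL_ENV_VAR = "CIVICPY_REMOTE_CACHE_URL"  # hypothetical variable name


def remote_cache_url(environ=os.environ) -> str:
    """Return the override from the environment, or the default if unset or empty."""
    return environ.get(REMOTE_URL_ENV_VAR) or DEFAULT_REMOTE_CACHE_URL


default_url = remote_cache_url({})
custom_url = remote_cache_url({REMOTE_URL_ENV_VAR: "https://civic.example.org/cache.pkl"})
```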
Currently revision retrieval is handled in recipes. We should add support to retrieve revisions directly from a `CivicRecord`.
Travis cannot seem to pull down the test pkl from civicdb.org. Our tests will continue to fail until this is resolved.
From manuscript reviewer:
The pip package setup.py includes the testing framework dependencies that are not needed by most users of the application. Could this be setup so they are not installed by default or are they needed?
We will separate out requirements needed only for testing to reduce dependency load.
See relevant documentation here: https://setuptools.readthedocs.io/en/latest/setuptools.html
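A sketch of the split using setuptools' `extras_require`. The package names below are placeholders, not the project's actual dependency list.

```python
# setup.py fragment (sketch). Package names are placeholders.
INSTALL_REQUIRES = ["requests"]  # runtime dependencies only

EXTRAS_REQUIRE = {
    "test": ["pytest", "pytest-cov"],  # installed only on request
}

# In setup.py these would be passed to setuptools.setup() as:
#   install_requires=INSTALL_REQUIRES,
#   extras_require=EXTRAS_REQUIRE,
```

With this layout a plain `pip install civicpy` would skip the testing tools, while `pip install civicpy[test]` would pull them in.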
write up documentation on civicpy.org prior to release.
because we like CI
I'm not sure if this is even possible but we should implement tests for the Jupyter notebooks to catch any syntax that might've changed that would break them.
Hi doctor, my script is as follows:
##################################
from civicpy import civic, exports
with open('civic_variants.vcf', 'w', newline='') as file:
    w = exports.VCFWriter(file)
    all_variants = civic.get_all_variants()
    w.addrecords(all_variants)
    w.writerecords()
########################### The error:
Traceback (most recent call last):
  File "VEP.py", line 1, in <module>
    from civicpy import civic, exports
ImportError: cannot import name 'exports'
What should I do?
Reviewer mentions:
I did not see any description, either in the paper nor in the github repo, on accessing the exact provenance information (e.g. date/time of approval). Is this information available?
We should add lifecycle_actions to cached information.
because we like high test coverage.
Currently hard-coded to 7 days. Let's make it configurable in the vein of other options found at `__init__.py`.
File "/home/ubuntu/.local/lib/python3.7/site-packages/civicpy/exports.py", line 257, in writerecords
    assert '=' not in v
Suggested fix: replace the `=` with the wildtype amino acid, e.g. `E55=` would be `E55E`.
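The suggested rewrite could be sketched as below, assuming variant names use one-letter amino acid codes followed by a position and a trailing `=`. `expand_synonymous` is a hypothetical helper name.

```python
import re

# One-letter wildtype residue, position, then "=" for "no change".
_SYNONYMOUS = re.compile(r"^([A-Z])(\d+)=$")


def expand_synonymous(name: str) -> str:
    """Rewrite e.g. 'E55=' as 'E55E'; leave any other name untouched."""
    match = _SYNONYMOUS.match(name)
    if match is None:
        return name
    wildtype, position = match.groups()
    return f"{wildtype}{position}{wildtype}"


rewritten = expand_synonymous("E55=")
untouched = expand_synonymous("V600E")
```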
Allow searching of records by coordinates. Require pre-caching of all variants to speed up subsequent searches.
Currently, _include_filter gets cached, and could mistakenly be retrieved from cache. Refactor calls to cache to always build new CivicRecord objects to prevent _include_filter (and other private variables) from propagating and potentially creating unintended behaviors.
Until #4 is completed, we should at least validate that the current set of coordinates is both GRCh37 and 1-based.