etal / cnvkit-dnanexus
DNAnexus app for CNVkit
License: MIT License
See etal/cnvkit#236
The key section of the traceback is here:
Traceback (most recent call last):
File "/usr/local/bin/cnvkit.py", line 13, in <module>
args.func(args)
File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1391, in _cmd_gender
table = do_gender(cnarrs, args.male_reference)
File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1412, in do_gender
return pd.DataFrame.from_records(rows, columns=columns)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 981, in from_records
first_row = next(data)
File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1410, in <genexpr>
rows = (guess_and_format(cna) for cna in cnarrs)
File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1406, in guess_and_format
("Female", "Male")[is_xy],
TypeError: tuple indices must be integers, not NoneType
Resulting in:
CalledProcessError: Command
dx-docker run -v /home/dnanexus:/workdir -w /workdir etal/cnvkit:0.8.3 \
cnvkit.py gender -o gender.csv GE0119B.cnr
returned non-zero exit status 1
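The TypeError in the traceback comes from indexing the tuple ("Female", "Male") with is_xy after sex inference returned None. Below is a minimal sketch of a None-safe version of that formatting step. The function name format_sex and the "Unknown" label are hypothetical illustrations, not CNVkit's actual fix:

```python
from typing import Optional


def format_sex(is_xy: Optional[bool]) -> str:
    """Map an inferred sex-chromosome call to a label.

    Hypothetical helper: ("Female", "Male")[is_xy] raises TypeError when
    is_xy is None, so that case is handled explicitly here.
    """
    if is_xy is None:
        # Sex could not be inferred for this sample.
        return "Unknown"
    # bool is a subclass of int, so False -> index 0 and True -> index 1.
    return ("Female", "Male")[is_xy]
```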
Solution: ensure is_xy is not None at this point in the code, then rebuild and publish app version 0.8.3.
The gender command is renamed to sex, but its options are still mostly the same.
Bcbio uses Mosdepth in place of cnvkit.py coverage, then converts the output to .cnn format for downstream processing. It's significantly faster than bedcov for WGS and exomes.
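A rough sketch of converting Mosdepth output to .cnn format. It assumes mosdepth was run with --by on the bin BED, so each line of its regions.bed.gz output is chrom, start, end, an optional name, and mean depth. The function names, the "-" placeholder for unnamed bins, the column order, and the -20.0 floor for zero-depth bins are assumptions for illustration, not bcbio's actual code:

```python
import gzip
import math
from typing import Iterable, List

# Assumed floor for zero-depth bins, since log2(0) is undefined.
ZERO_DEPTH_LOG2 = -20.0

# Assumed column order of a CNVkit *.targetcoverage.cnn file.
CNN_HEADER = ["chromosome", "start", "end", "gene", "depth", "log2"]


def mosdepth_regions_to_cnn(lines: Iterable[str]) -> List[List[str]]:
    """Turn mosdepth per-region mean depths into .cnn-style rows."""
    rows = [list(CNN_HEADER)]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], fields[1], fields[2]
        # A name column is present only if the --by BED had one.
        gene = fields[3] if len(fields) >= 5 else "-"
        depth = float(fields[-1])  # mean depth is the last column
        log2 = math.log2(depth) if depth > 0 else ZERO_DEPTH_LOG2
        rows.append([chrom, start, end, gene, f"{depth:.6g}", f"{log2:.6g}"])
    return rows


def convert_regions_file(regions_bed_gz: str, out_cnn: str) -> None:
    """Read a mosdepth regions.bed.gz file and write a tab-separated .cnn file."""
    with gzip.open(regions_bed_gz, "rt") as src, open(out_cnn, "w") as dst:
        for row in mosdepth_regions_to_cnn(src):
            dst.write("\t".join(row) + "\n")
```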
Skipping 0.9.2, since it had a bug in segmetrics.
For WGS and sometimes WES, the scatter plot PDF is a big file that takes a long time to render in a PDF viewer.
Excluding unmappable or otherwise problematic genomic regions is important when no or few control samples are available to build a pooled reference.
Add an exclude_regions input, an array of files, that is passed internally to access -x.
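A minimal sketch of turning the exclude_regions array into an access call, with one -x flag per file (CNVkit's access accepts -x more than once). The helper name build_access_command is hypothetical:

```python
from typing import List, Sequence


def build_access_command(
    reference_fasta: str, exclude_beds: Sequence[str], output_bed: str
) -> List[str]:
    """Build a `cnvkit.py access` call with one -x per exclude file (hypothetical helper)."""
    cmd = ["cnvkit.py", "access", reference_fasta]
    for bed in exclude_beds:
        # Each BED file's regions are excluded from the accessible regions.
        cmd += ["-x", bed]
    cmd += ["-o", output_bed]
    return cmd
```

With an empty exclude_regions input, the call reduces to a plain access run.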
See guidelines here:
https://github.com/dnanexus/file-apps/blob/master/docs/App%28let%29-Style-Guide.md#licenses
The JSON keys have changed a bit.
Sometimes only a few control samples are available, and they're known to contain SVs/CNAs. This option uses the given controls to calculate bin weights, but then zeros out the log2 values in the reference.cnn file (setting chrY to -1, and chrX too with --haploid-x-reference). It's still better than using a flat reference.
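A sketch of that log2 rewrite, assuming the reference bins are available as dicts with "chromosome" and "log2" keys. The helper neutralize_reference_log2 is hypothetical. Weights and other columns computed from the controls are kept, and only log2 is overwritten:

```python
from typing import Dict, List


def neutralize_reference_log2(
    rows: List[Dict[str, object]], haploid_x_reference: bool = False
) -> List[Dict[str, object]]:
    """Zero out reference log2 values: chrY -> -1, chrX -> -1 only if haploid-X."""
    out = []
    for row in rows:
        chrom = str(row["chromosome"])
        # Accept both "chrX" and "X" naming styles.
        bare = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if bare == "Y" or (bare == "X" and haploid_x_reference):
            log2 = -1.0
        else:
            log2 = 0.0
        # Copy the row so the original bins (and their weights) stay unchanged.
        out.append({**row, "log2": log2})
    return out
```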
Watch for upstream issue etal/cnvkit#135 here.
See also etal/cnvkit#39.
Apparently PyPy 6.0 can run pandas, so it can probably run CNVkit, too. (Unconfirmed.)
If so, see if using PyPy inside this app (or inside the docker image) provides a significant speedup for the post-segmentation steps (which can take a long time on WGS samples).
Ensure CNVkit 0.8.1 and its dependencies are installed properly.
Changes to how cnvkit.py is run:
- --drop-low-coverage is an option to batch; no need to repeat segmentation.
- --method offers a choice of wgs, hybrid, or amplicon.
- access is optional, so skip it.
Calculate coverage for each sample in a separate subjob, then collect the results for the reference command and further downstream processing.
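A local sketch of the scatter/gather pattern for per-sample coverage. On DNAnexus each per-sample task would be a subjob (e.g. launched with dxpy.new_dxjob), which is not shown here. A thread pool stands in for subjobs, and the command runner is passed in as a parameter (in the app, something like subprocess.check_call inside the container). All function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath
from typing import Callable, Dict, List, Optional, Sequence

Command = List[str]


def coverage_commands(bam: str, targets: str, antitargets: Optional[str]) -> List[Command]:
    """`cnvkit.py coverage` calls for one sample (targets, plus antitargets if any)."""
    sample = PurePath(bam).stem
    cmds = [["cnvkit.py", "coverage", bam, targets, "-o", f"{sample}.targetcoverage.cnn"]]
    if antitargets:  # no antitarget bins in WGS mode
        cmds.append(["cnvkit.py", "coverage", bam, antitargets,
                     "-o", f"{sample}.antitargetcoverage.cnn"])
    return cmds


def scatter_gather_coverage(
    bams: Sequence[str],
    targets: str,
    antitargets: Optional[str],
    run: Callable[[Command], str],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Scatter: one task per sample. Gather: map each BAM to its .cnn outputs."""

    def one_sample(bam: str) -> List[str]:
        return [run(cmd) for cmd in coverage_commands(bam, targets, antitargets)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {bam: pool.submit(one_sample, bam) for bam in bams}
        # .result() re-raises any exception from a failed sample.
        return {bam: fut.result() for bam, fut in futures.items()}


def reference_command(coverage_files: Sequence[str], fasta: str,
                      out: str = "reference.cnn") -> Command:
    """The gathered outputs feed a single `cnvkit.py reference` call."""
    return ["cnvkit.py", "reference", *coverage_files, "-f", fasta, "-o", out]
```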
For WGS it might make sense to do the same parallelization for segment, another performance bottleneck. (This and #6 imply no longer using cnvkit.py batch internally.)
For downstream SV analysis, running export vcf
for each sample would be at least as helpful as the currently emitted SEG file.
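A minimal sketch of building one export vcf call per sample from its segment file, using CNVkit's -i option to set the VCF sample ID. The helper name and the output naming scheme ({sample}.cnv.vcf) are assumptions:

```python
from pathlib import PurePath
from typing import List, Sequence


def export_vcf_commands(cns_files: Sequence[str]) -> List[List[str]]:
    """One `cnvkit.py export vcf` call per sample's .cns file (hypothetical helper)."""
    cmds = []
    for cns in cns_files:
        sample = PurePath(cns).stem  # "out/S1.cns" -> "S1"
        cmds.append(["cnvkit.py", "export", "vcf", cns,
                     "-i", sample, "-o", f"{sample}.cnv.vcf"])
    return cmds
```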
After inferring segments, use the script cnv_ztest.py
to run per-bin tests for alterations. Include the output of that script in the app output.
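The sketch below is not cnv_ztest.py's implementation. It only illustrates the general idea of a per-bin z-test, assuming each bin has a log2 ratio and an estimated standard deviation. The expected value of 0.0 (neutral copy number) and the function name are assumptions:

```python
import math
from typing import List, Sequence, Tuple


def per_bin_ztest(
    log2: Sequence[float], sd: Sequence[float], expected: float = 0.0
) -> List[Tuple[float, float]]:
    """Two-sided z-test of each bin's log2 ratio against an expected value."""
    results = []
    for x, s in zip(log2, sd):
        z = (x - expected) / s
        # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results
```

For example, a bin with log2 = 1.0 and sd = 0.5 gives z = 2.0 and p of about 0.046.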
The *.targetcoverage.cnn and, if present, *.antitargetcoverage.cnn files are useful to have for troubleshooting and for ad-hoc/custom work after the bulk processing is done.
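A small sketch of picking those files out of the working directory listing so they can be returned as app outputs. The output field names are hypothetical. In the app, each file would then be uploaded (e.g. with dxpy.upload_local_file) and referenced with dxpy.dxlink, which is not shown here:

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Sequence

# Hypothetical output field names mapped to filename patterns.
COVERAGE_OUTPUTS = {
    "target_coverages": "*.targetcoverage.cnn",
    "antitarget_coverages": "*.antitargetcoverage.cnn",
}


def collect_coverage_outputs(filenames: Sequence[str]) -> Dict[str, List[str]]:
    """Group per-sample coverage files by output field (case-sensitive match)."""
    return {
        field: sorted(f for f in filenames if fnmatchcase(f, pattern))
        for field, pattern in COVERAGE_OUTPUTS.items()
    }
```

Note that "*.targetcoverage.cnn" does not match antitarget files, because the character before "targetcoverage" there is "i", not ".".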
Any exceptions within CNVkit, which runs inside a Docker container, result in a DNAnexus job error that just says the docker call failed.
Instead, capture the upstream exception and re-raise it, or otherwise use that underlying error message in the job-level error message that will be displayed to the user.
(This is tricky with subprocesses and probably trickier with containers.)
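One way to approach this, sketched under assumptions: capture the container's stderr and put its last line (for a Python traceback, the "ExceptionType: message" line) into the error that the app raises. CNVkitError is a hypothetical stand-in for the exception the app would actually raise (likely dxpy.AppError), and the helper names are also hypothetical:

```python
import subprocess
from typing import Sequence


class CNVkitError(RuntimeError):
    """Hypothetical stand-in for the app-level error (e.g. dxpy.AppError)."""


def last_error_line(stderr: str) -> str:
    """Return the last non-empty stderr line, e.g. 'TypeError: ...' from a traceback."""
    lines = [ln.strip() for ln in stderr.splitlines() if ln.strip()]
    return lines[-1] if lines else "(no error output)"


def run_and_surface_errors(cmd: Sequence[str]) -> str:
    """Run a command; on failure, raise an error carrying the underlying message."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    if proc.returncode != 0:
        raise CNVkitError(
            f"Command failed with exit status {proc.returncode}: "
            f"{last_error_line(proc.stderr)}"
        )
    return proc.stdout
```

For the gender failure above, the job-level message would then end with "TypeError: tuple indices must be integers, not NoneType" instead of only reporting that the docker call failed.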
Create a simple test suite that runs the run_cnvkit
function with a few combinations of inputs, using small example files. Consider CI.
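A sketch of the table-driven shape such a suite could take, using itertools.product over input combinations and subTest for per-combination reporting. A real suite would call run_cnvkit on small example files. Here a hypothetical build_batch_args helper (assumed, not part of the app) stands in, covering the --method, --drop-low-coverage, and optional access choices listed earlier:

```python
import itertools
import unittest
from typing import List, Optional

METHODS = ("hybrid", "amplicon", "wgs")


def build_batch_args(method: str, drop_low_coverage: bool,
                     access_bed: Optional[str]) -> List[str]:
    """Hypothetical stand-in for how run_cnvkit assembles `cnvkit.py batch` options."""
    args = ["cnvkit.py", "batch", "-m", method]
    if drop_low_coverage:
        args.append("--drop-low-coverage")
    if access_bed:  # access is optional
        args += ["--access", access_bed]
    return args


class BatchArgsTest(unittest.TestCase):
    def test_input_combinations(self) -> None:
        combos = itertools.product(METHODS, (True, False), (None, "access.bed"))
        for method, drop, access in combos:
            with self.subTest(method=method, drop=drop, access=access):
                args = build_batch_args(method, drop, access)
                self.assertEqual(args[:4], ["cnvkit.py", "batch", "-m", method])
                self.assertEqual("--drop-low-coverage" in args, drop)
                self.assertEqual("--access" in args, access is not None)
```

Run it with `python -m unittest` so that CI can pick it up without extra configuration.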