etal / cnvkit-dnanexus

DNAnexus app for CNVkit

License: MIT License

Python 90.81% Shell 8.20% Makefile 0.99%

cnvkit-dnanexus's Issues

Save scatter plots as PNG, not PDF

For WGS and sometimes WES, the scatter plot PDF is a big file that takes a long time to render in a PDF viewer.
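
A minimal sketch of the change, assuming `cnvkit.py scatter` infers the image format from the extension of the `-o` filename (worth confirming against the installed CNVkit version). The helper name `scatter_command` and the `-scatter` filename suffix are hypothetical:

```python
import os


def scatter_command(cnr_path, cns_path, image_ext="png"):
    """Build a `cnvkit.py scatter` command that writes <sample>-scatter.<ext>.

    Assumption: the output format follows the file extension given to -o.
    """
    sample = os.path.basename(cnr_path).rsplit(".", 1)[0]
    out_path = "%s-scatter.%s" % (sample, image_ext)
    return ["cnvkit.py", "scatter", cnr_path, "-s", cns_path, "-o", out_path]
```

Keeping the extension as a parameter would make it easy to fall back to PDF for small panels if that is ever preferred.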

Catch original CNVkit tracebacks to display in job failure message

Any exceptions within CNVkit, which runs inside a Docker container, result in a DNAnexus job error that just says the docker call failed.

Instead, capture the upstream exception and re-raise it, or otherwise use that underlying error message in the job-level error message that will be displayed to the user.

(This is tricky with subprocesses and probably trickier with containers.)
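
One possible approach, sketched under the assumption that the container's stderr is passed through to the calling process (not verified for dx-docker) and that the app code runs on Python 3.5 or later. The names `ContainerCommandError` and `run_and_surface_error` are hypothetical:

```python
import subprocess
import sys


class ContainerCommandError(Exception):
    """Hypothetical error type whose message carries the child's last stderr line."""


def run_and_surface_error(cmd):
    """Run cmd; on failure, raise with the final non-empty stderr line.

    For a Python traceback, that last line is the exception summary,
    e.g. "TypeError: tuple indices must be integers, not NoneType".
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        stderr_lines = [ln for ln in proc.stderr.splitlines() if ln.strip()]
        summary = stderr_lines[-1].strip() if stderr_lines else "no stderr output"
        raise ContainerCommandError(
            "%s (exit status %d)" % (summary, proc.returncode)
        )
    return proc.stdout
```

The full stderr could also be written to the job log so that the complete traceback remains available for debugging.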

Add a local test script and example files

Create a simple test suite that runs the run_cnvkit function with a few combinations of inputs, using small example files. Consider CI.

Do coverage calculation in subjobs per sample

Calculate coverage for each sample in a separate subjob, then collect the results for the reference command and further downstream processing.

For WGS it might make sense to do the same parallelization for segment, another performance bottleneck.

(This and #6 imply no longer using cnvkit.py batch internally.)
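
A rough sketch of how the work might be split, assuming the standard `cnvkit.py coverage` and `cnvkit.py reference` invocations. The helper names are hypothetical, antitarget coverage is omitted for brevity, and the DNAnexus subjob dispatch itself is left out; each returned command would run in its own subjob:

```python
import os


def plan_coverage_commands(bam_paths, target_bed):
    """Return one `cnvkit.py coverage` command per BAM, keyed by sample name."""
    plan = {}
    for bam in bam_paths:
        sample = os.path.basename(bam).rsplit(".", 1)[0]
        out_cnn = sample + ".targetcoverage.cnn"
        plan[sample] = ["cnvkit.py", "coverage", bam, target_bed, "-o", out_cnn]
    return plan


def reference_command(normal_cnns, fasta, out_path="reference.cnn"):
    """Build the `cnvkit.py reference` command that pools the collected coverages."""
    return (
        ["cnvkit.py", "reference"]
        + sorted(normal_cnns)
        + ["-f", fasta, "-o", out_path]
    )
```

Only the reference step needs all control coverages at once, so it is the natural point to gather the subjob outputs.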

Update citation structure in dxapp.json

See guidelines here:
https://github.com/dnanexus/file-apps/blob/master/docs/App%28let%29-Style-Guide.md#licenses

The JSON keys have changed a bit.

Use mosdepth to speed up coverage calculation

Bcbio uses Mosdepth in place of cnvkit.py coverage, then converts the output to .cnn format for downstream processing. It's significantly faster than bedcov for WGS and exomes.
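
The conversion step could look roughly like the sketch below. It assumes mosdepth was run with `--by` on a BED file that has a name column, so each region line is chrom, start, end, name, mean depth. The `-20.0` placeholder for zero-depth bins and the function name are assumptions, not taken from this repository:

```python
import math

NULL_LOG2 = -20.0  # assumed placeholder log2 value for bins with zero depth


def mosdepth_regions_to_cnn(lines):
    """Convert mosdepth region lines to tab-separated .cnn rows.

    Input columns (assumed): chrom, start, end, name, mean depth.
    Output columns: chromosome, start, end, gene, depth, log2.
    """
    out = ["chromosome\tstart\tend\tgene\tdepth\tlog2"]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        chrom, start, end, name, depth = line.split("\t")[:5]
        depth = float(depth)
        log2 = math.log2(depth) if depth > 0 else NULL_LOG2
        out.append(
            "\t".join([chrom, start, end, name, "%.6g" % depth, "%.6g" % log2])
        )
    return out
```

In practice the input would come from the decompressed `<prefix>.regions.bed.gz` file that mosdepth writes.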

Add export vcf files

For downstream SV analysis, running export vcf for each sample would be at least as helpful as the currently emitted SEG file.

Do single-exon testing

After inferring segments, use the script cnv_ztest.py to run per-bin tests for alterations. Include the output of that script in the app output.

Update to v0.8.1

Ensure CNVkit 0.8.1 and its dependencies are installed properly.

Changes to how cnvkit.py is run:

--drop-low-coverage is now an option to batch, so there is no need to repeat segmentation
--method offers a choice of wgs, hybrid, or amplicon
access is optional, so skip it
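
Taken together, the changes above suggest a single `batch` call along these lines. This is a sketch only: the helper name `batch_command` and its defaults are hypothetical, and the flags should be checked against `cnvkit.py batch -h` for the installed version:

```python
VALID_METHODS = ("hybrid", "amplicon", "wgs")


def batch_command(tumor_bams, normal_bams, fasta, targets=None,
                  method="hybrid", drop_low_coverage=False, output_dir="out"):
    """Build a `cnvkit.py batch` command without a separate access step."""
    if method not in VALID_METHODS:
        raise ValueError("method must be one of %s" % (VALID_METHODS,))
    cmd = ["cnvkit.py", "batch"] + list(tumor_bams)
    cmd += ["--normal"] + list(normal_bams)
    cmd += ["--fasta", fasta, "--method", method, "--output-dir", output_dir]
    if targets:
        # Targets may be left out, e.g. for whole-genome runs.
        cmd += ["--targets", targets]
    if drop_low_coverage:
        cmd.append("--drop-low-coverage")
    return cmd
```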

Sometimes only a few control samples are available, and they're known to contain SVs/CNAs. This option uses the given controls to calculate bin weights, but then zeros out the log2 values in the reference.cnn file (with chrY set to -1, and chrX as well when --haploid-x-reference is given). It's still better than using a flat reference.
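
To illustrate the zeroing-out step only, here is a small sketch operating on reference rows represented as dicts. It is not CNVkit's implementation; the function name and the row representation are assumptions:

```python
def neutralize_reference(rows, haploid_x=False):
    """Reset log2 to neutral values while keeping every other column.

    Autosomes get log2 = 0; chrY gets -1; chrX gets -1 only when the
    reference is treated as haploid-X. Columns derived from the controls
    (for example the spread used for weighting) are left untouched.
    """
    out = []
    for row in rows:
        chrom = row["chromosome"]
        bare = chrom[3:] if chrom.startswith("chr") else chrom
        if bare == "Y" or (bare == "X" and haploid_x):
            log2 = -1.0
        else:
            log2 = 0.0
        out.append(dict(row, log2=log2))
    return out
```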

Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1391, in _cmd_gender
    table = do_gender(cnarrs, args.male_reference)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1412, in do_gender
    return pd.DataFrame.from_records(rows, columns=columns)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 981, in from_records
    first_row = next(data)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1410, in <genexpr>
    rows = (guess_and_format(cna) for cna in cnarrs)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1406, in guess_and_format
    ("Female", "Male")[is_xy],
TypeError: tuple indices must be integers, not NoneType

Resulting in:

CalledProcessError: Command
    dx-docker run -v /home/dnanexus:/workdir -w /workdir etal/cnvkit:0.8.3 \
        cnvkit.py gender -o gender.csv GE0119B.cnr
returned non-zero exit status 1

Solution:

Ensure is_xy is not None at this point in the code. Rebuild and publish app version 0.8.3.
Update the app to upstream CNVkit v0.8.5. Note that the gender command has been renamed to sex, though its options are mostly unchanged.
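
The first option amounts to guarding the tuple lookup shown in the traceback. A minimal sketch of such a guard follows; the function name and the "NA" placeholder are hypothetical, and the actual upstream fix may differ:

```python
def sex_label(is_xy):
    """Map the inferred karyotype flag to a label without indexing a tuple.

    ("Female", "Male")[is_xy] raises TypeError when is_xy is None, so the
    undetermined case is handled explicitly first.
    """
    if is_xy is None:
        return "NA"  # assumed placeholder for an undetermined result
    return "Male" if is_xy else "Female"
```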