cnvkit-dnanexus's People

Contributors

etal, geetduggal

cnvkit-dnanexus's Issues

Catch original CNVkit tracebacks to display in job failure message

Any exceptions within CNVkit, which runs inside a Docker container, result in a DNAnexus job error that just says the docker call failed.

Instead, capture the upstream exception and re-raise it, or otherwise use that underlying error message in the job-level error message that will be displayed to the user.

(This is tricky with subprocesses and probably trickier with containers.)
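One possible approach, sketched below, is to capture the child process's stderr and copy the last Python traceback it contains into the exception that the app raises. The names UpstreamError, last_traceback and run_and_report are hypothetical helpers, not part of this app or of CNVkit, and the sketch assumes the container runner forwards the inner process's stderr unchanged.

```python
import subprocess

TRACEBACK_MARKER = "Traceback (most recent call last):"


class UpstreamError(RuntimeError):
    """Hypothetical error type that carries the child's own error text."""


def last_traceback(stderr_text):
    """Return the last Python traceback in stderr_text.

    Falls back to the final non-empty line, or "" if there is no output.
    """
    idx = stderr_text.rfind(TRACEBACK_MARKER)
    if idx >= 0:
        return stderr_text[idx:].strip()
    lines = [ln for ln in stderr_text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def run_and_report(cmd):
    """Run cmd; on failure, raise UpstreamError with the upstream message."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        detail = last_traceback(proc.stderr) or "no stderr output"
        raise UpstreamError(
            "Command {!r} exited with status {}:\n{}".format(
                cmd, proc.returncode, detail
            )
        )
    return proc.stdout
```

In the app, the message built here could be passed to whatever job-level error the platform displays to the user, in place of the generic "docker call failed" text.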

Do coverage calculation in subjobs per sample

Calculate coverage for each sample in a separate subjob, then collect the results for the reference command and further downstream processing.

For WGS it might make sense to do the same parallelization for segment, another performance bottleneck.

(This and #6 imply no longer using cnvkit.py batch internally.)
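As a rough sketch of what replacing batch could look like, the helpers below build the per-sample coverage commands that each subjob would run and the single reference command that collects their outputs. The function names and the output file naming are assumptions for illustration; launching the subjobs on DNAnexus is deliberately left out.

```python
import os


def sample_id(bam_path):
    """Derive a sample name from the BAM file name (assumed convention)."""
    return os.path.splitext(os.path.basename(bam_path))[0]


def coverage_commands(bam_path, targets_bed, antitargets_bed=None):
    """Return the `cnvkit.py coverage` commands for one sample.

    One command for target bins, plus one for antitarget bins if given.
    """
    sid = sample_id(bam_path)
    cmds = [
        ["cnvkit.py", "coverage", bam_path, targets_bed,
         "-o", sid + ".targetcoverage.cnn"],
    ]
    if antitargets_bed:
        cmds.append(
            ["cnvkit.py", "coverage", bam_path, antitargets_bed,
             "-o", sid + ".antitargetcoverage.cnn"]
        )
    return cmds


def reference_command(coverage_files, fasta, output="reference.cnn"):
    """Return the `cnvkit.py reference` command pooling all coverage files."""
    return (["cnvkit.py", "reference"] + sorted(coverage_files)
            + ["-f", fasta, "-o", output])
```

Each list returned by coverage_commands is independent of the other samples, which is what makes the per-sample split possible; only reference_command needs every subjob to have finished.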

Add export vcf files

For downstream SV analysis, running export vcf for each sample would be at least as helpful as the currently emitted SEG file.

Do single-exon testing

After inferring segments, use the script cnv_ztest.py to run per-bin tests for alterations. Include the output of that script in the app output.

Update to v0.8.1

Ensure CNVkit 0.8.1 and its dependencies are installed properly.

Changes to how cnvkit.py is run:

  • --drop-low-coverage is an option to batch; no need to repeat segmentation
  • --method choice of wgs, hybrid, amplicon
  • access is optional, so skip it

Option to "flatten" reference

Sometimes only a few control samples are available, and they're known to contain SVs/CNAs. This option uses the given controls to calculate bin weights, but then zeros out the log2 values in the reference.cnn file (setting chrY to -1, and chrX as well when --haploid-x-reference is given). The result is still better than using a flat reference.
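A minimal sketch of the flattening step, assuming reference.cnn is a tab-separated table with a header row that includes "chromosome" and "log2" columns, and that the sex chromosomes are named chrX/chrY or X/Y. The function name flatten_reference is hypothetical; all other columns, including whatever the weights are derived from, are passed through unchanged.

```python
import csv
import io


def flatten_reference(in_handle, out_handle, haploid_x=False,
                      x_names=("chrX", "X"), y_names=("chrY", "Y")):
    """Copy a reference table, overwriting only the log2 column.

    Autosomes get 0.0, chrY gets -1.0, and chrX gets -1.0 only when
    haploid_x is true (otherwise 0.0).
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        chrom = row["chromosome"]
        if chrom in y_names or (haploid_x and chrom in x_names):
            row["log2"] = "-1.0"
        else:
            row["log2"] = "0.0"
        writer.writerow(row)


# Usage with in-memory text; real code would open the files instead.
example = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t0\t100\tA\t0.37\n"
    "chrY\t0\t100\tB\t-1.4\n"
)
flattened = io.StringIO()
flatten_reference(io.StringIO(example), flattened)
```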

Use PyPy internally?

Apparently PyPy 6.0 can run pandas, so it can probably run CNVkit, too. (Unconfirmed.)

If so, see if using PyPy inside this app (or inside the docker image) provides a significant speedup for the post-segmentation steps (which can take a long time on WGS samples).

Add optional input "exclude_regions"

Excluding unmappable or otherwise problematic genomic regions is important when no or few control samples are available to build a pooled reference.

Add an exclude_regions input as an array of files that is passed to access -x internally.
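The mapping from an array input to repeated -x options might look like the sketch below. The function name access_command and the default output name are assumptions; the input files are taken to be local BED paths that the app has already downloaded.

```python
def access_command(fasta, exclude_files=(), output="access.bed"):
    """Build a `cnvkit.py access` command with one -x per excluded file."""
    cmd = ["cnvkit.py", "access", fasta]
    for path in exclude_files:
        cmd += ["-x", path]
    cmd += ["-o", output]
    return cmd
```

With an empty exclude_regions array the command reduces to a plain access call, so the input can stay optional.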

Add output coverage.cnn files

The *.targetcoverage.cnn and, where present, *.antitargetcoverage.cnn files are useful to have for troubleshooting and for ad-hoc/custom work after the bulk processing is done.

Crash when inferring chromosomal sex

See etal/cnvkit#236

The key section of the traceback is here:

Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1391, in _cmd_gender
    table = do_gender(cnarrs, args.male_reference)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1412, in do_gender
    return pd.DataFrame.from_records(rows, columns=columns)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 981, in from_records
    first_row = next(data)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1410, in <genexpr>
    rows = (guess_and_format(cna) for cna in cnarrs)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 1406, in guess_and_format
    ("Female", "Male")[is_xy],
TypeError: tuple indices must be integers, not NoneType

Resulting in:

CalledProcessError: Command
    dx-docker run -v /home/dnanexus:/workdir -w /workdir etal/cnvkit:0.8.3 \
        cnvkit.py gender -o gender.csv GE0119B.cnr
returned non-zero exit status 1

Solution:

  1. Ensure is_xy is not None at this point in the code. Rebuild and publish app version 0.8.3.
  2. Update the app to upstream CNVkit v0.8.5. Note the gender command is renamed to sex, but options are still mostly the same.
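The guard in step 1 could be as small as the sketch below. This is not the actual CNVkit patch: format_sex is a hypothetical helper, and the "NA" placeholder for an undetermined result is an assumption.

```python
def format_sex(is_xy, unknown="NA"):
    """Map the inferred-sex flag to a label without indexing by None."""
    if is_xy is None:
        # Inference failed or was not possible; report a placeholder
        # instead of raising TypeError on the tuple lookup.
        return unknown
    return ("Female", "Male")[bool(is_xy)]
```

Converting with bool() before indexing also covers flag values that are truthy or falsy but not plain Python booleans.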
