indelcallingworkflow's People

Contributors

gordi, julseu, jwerner-dkfz, nagacombio, pjb7687, vinjana

indelcallingworkflow's Issues

Port workflow to Nextflow

Migrate all jobs to Nextflow.

  • This is mostly about the structure of the job dependencies, because that is what the workflow management system is meant to manage. It is not about porting top-level (Bash) scripts or refactoring jobs or scripts.
  • Do not port the software environment. The Nextflow workflow should use the existing environment modules (create a dkfzLsf profile).
  • Create a Nextflow workflow that mimics the Roddy workflow.
  • Remove or substitute RODDY_ references from the jobs.
  • Include job-resources from the plugin XML into the NF workflow. As a start it would be sufficient to just use the 'l' resource-set.

Chance of unnoticed column swap affecting germline/somatic classification

(edited, from to @suhrig)

I have processed some external samples, where the issue manifested, and the effect is that somatic and germline variants are swapped if I'm interpreting things correctly.

Normally, Platypus extracts sample names from the @RG header line of the BAM file. It performs some validation of this line. When it encounters an unexpected field, it aborts the extraction and instead takes the BAM file names as sample names. A message is printed to the Platypus log file ("unknown field code ..."), but Platypus proceeds without error. The problem is that all other scripts of the pipeline still extract the sample names from the @RG header line. So the sample names used by Platypus and those used by all other scripts are inconsistent. This results in swapped germline/somatic classification when the tumor BAM file name is lexicographically smaller than the control BAM file name. What's dangerous is that the pipeline just completes without any error and even the checkSampleSwap plot looks fine as far as I can tell.

ID BC CN DS DT FO KS LB PG PI PL PM PU SM <- These are all valid SAM @RG fields
ID    CN DS DT       LB PG PI       PU SM <- These are the fields that Platypus accepts

All fields missing in Platypus' list will trigger the issue if they are present in the @RG line. I have found some bug reports about this from years ago, but the Platypus developers have not fixed this yet, so they possibly never will.
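A pre-flight check along the following lines could flag affected BAM files before Platypus runs. This is only a minimal sketch: the two tag lists are copied from the comparison above, the function name `unsupported_rg_tags` is hypothetical, and the header text is assumed to come from something like `samtools view -H`.

```python
# Tag lists copied from the issue text above; not verified against the Platypus source.
VALID_RG_TAGS = "ID BC CN DS DT FO KS LB PG PI PL PM PU SM".split()
PLATYPUS_RG_TAGS = "ID CN DS DT LB PG PI PU SM".split()


def unsupported_rg_tags(header_text):
    """Return the sorted @RG tags in a SAM header that are not in PLATYPUS_RG_TAGS.

    `header_text` is the plain-text SAM header (hypothetical input, e.g. the
    output of `samtools view -H file.bam`).
    """
    accepted = set(PLATYPUS_RG_TAGS)
    found = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            tag = field.split(":", 1)[0]
            if tag not in accepted:
                found.add(tag)
    return sorted(found)


example_header = "@HD\tVN:1.6\n@RG\tID:run1\tSM:tumor\tPL:ILLUMINA\tLB:lib1\n"
problem_tags = unsupported_rg_tags(example_header)
if problem_tags:
    print("Platypus may fall back to BAM file names; offending @RG tags:", problem_tags)
```

A non-empty result would be a reason to stop the pipeline or to compare the sample names in the Platypus VCF header against the @RG sample names before classifying variants.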

Identify tools affected by the change to CRAM

Goal: overview of all pipelines and tools which need adaptations

Pipelines
Quality Control Workflow
SNV Calling Workflow
Platypus
ACEseq

Tools
samtools
bcftools
pysam

Make results MultiQC-compatible

MultiQC simplifies the visual representation of QC data. To be displayed in MultiQC, result files must either be in one of the already supported formats or be annotated accordingly. Check the output files and restructure their content to support MultiQC.
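As an illustration of the "annotated" route, the sketch below renders per-sample metrics as a tab-separated table with a commented header block, in the style of MultiQC's custom-content files. The function name `render_mqc_table`, the section identifiers, and the metric names are hypothetical, and the header keys (`id`, `section_name`, `plot_type`) and the `_mqc.tsv` file suffix are assumptions that should be checked against the MultiQC documentation.

```python
def render_mqc_table(section_id, section_name, rows):
    """Render {sample: {metric: value}} as an annotated TSV string.

    The '# key: value' header lines are assumed to follow MultiQC's
    custom-content convention; this is not the workflow's actual output format.
    """
    # Collect every metric name so that all rows share the same columns.
    metrics = sorted({metric for values in rows.values() for metric in values})
    lines = [
        f"# id: '{section_id}'",
        f"# section_name: '{section_name}'",
        "# plot_type: 'table'",
        "\t".join(["Sample"] + metrics),
    ]
    for sample in sorted(rows):
        cells = [str(rows[sample].get(metric, "")) for metric in metrics]
        lines.append("\t".join([sample] + cells))
    return "\n".join(lines) + "\n"


# Hypothetical metrics; real values would come from the workflow's QC output.
table_text = render_mqc_table(
    "indel_qc", "Indel QC", {"tumor_vs_control": {"somatic_indels": 12}}
)
print(table_text, end="")
```

Writing the returned text to a file whose name ends in `_mqc.tsv` is, as far as I know, what lets MultiQC pick it up without additional configuration.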
