dkfz-odcf / indelcallingworkflow
License: Other
A Platypus-based workflow for indel calling
Migrate all jobs to Nextflow.
RODDY_
references from the jobs.(edited, from to @suhrig)
I have processed some external samples, where the issue manifested, and the effect is that somatic and germline variants are swapped if I'm interpreting things correctly.
Normally, Platypus extracts sample names from the @RG header line of the BAM file and performs some validation of this line. When it encounters an unexpected field, it aborts the extraction and falls back to using the BAM file names as sample names. A message is printed to the Platypus log file ("unknown field code ..."), but Platypus proceeds without error. The problem is that all other scripts of the pipeline still extract the sample names from the @RG header line, so the sample names used by Platypus and those used by the other scripts are inconsistent. This results in swapped germline/somatic classification when the tumor BAM file name is lexicographically smaller than the control BAM file name. What's dangerous is that the pipeline completes without any error, and even the checkSampleSwap plot looks fine as far as I can tell.
ID BC CN DS DT FO KS LB PG PI PL PM PU SM <- These are all valid SAM @RG fields
ID CN DS DT LB PG PI PU SM <- These are the fields that Platypus accepts
All fields missing from Platypus' list will trigger the issue if they are present in the @RG line. I have found some bug reports about this from years ago, but the Platypus developers have not fixed this yet, so they possibly will never
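A pre-flight check along these lines could flag affected BAM files before Platypus runs. This is a minimal sketch and not part of the workflow: the function name `unsupported_rg_fields` and the example header are hypothetical, and the accepted-field set is copied from the list above rather than read from the Platypus source.

```python
# Field sets copied from the two lists in the issue text above.
SAM_RG_FIELDS = {"ID", "BC", "CN", "DS", "DT", "FO", "KS",
                 "LB", "PG", "PI", "PL", "PM", "PU", "SM"}
PLATYPUS_RG_FIELDS = {"ID", "CN", "DS", "DT", "LB", "PG", "PI", "PU", "SM"}


def unsupported_rg_fields(header_text):
    """Map each read group ID to the @RG tags missing from Platypus' list.

    `header_text` is the plain-text SAM header (tab-separated fields).
    Read groups that only use accepted tags are left out of the result.
    """
    problems = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            tags[tag] = value
        bad = sorted(t for t in tags if t not in PLATYPUS_RG_FIELDS)
        if bad:
            problems[tags.get("ID", "<no ID>")] = bad
    return problems


# Hypothetical header: the first read group carries a PL field.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:run1\tSM:tumor\tPL:ILLUMINA\tLB:lib1\n"
    "@RG\tID:run2\tSM:control\tLB:lib2\n"
)
print(unsupported_rg_fields(example_header))  # {'run1': ['PL']}
```

The header text can be obtained with `samtools view -H file.bam`. A non-empty result means Platypus would likely fall back to file names for that BAM, so the run could be stopped instead of completing with swapped labels.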
Goal: overview of all pipelines and tools which need adaptations
Pipelines
Quality Control Workflow
SNV Calling Workflow
Platypus
ACEseq
Tools
samtools
bcftools
pysam
MultiQC simplifies the visual representation of QC data. To be displayed in MultiQC, result files should either be in one of the already supported formats or be annotated accordingly. Check the output files and restructure their content to support MultiQC.
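One way to read "annotated" is MultiQC's custom-content mechanism, where a tab-separated file whose name ends in `_mqc.tsv` starts with `#` comment lines describing the section. The sketch below assumes that interpretation. The helper name, section id, column names, and values are all made-up placeholders, not actual workflow output.

```python
def to_mqc_custom_content(section_id, section_name, rows, plot_type="table"):
    """Render rows as a MultiQC custom-content TSV string.

    The leading '#' lines are the header comments MultiQC reads for
    custom content; the first row is expected to hold column names.
    """
    header = [
        f"# id: '{section_id}'",
        f"# section_name: '{section_name}'",
        f"# plot_type: '{plot_type}'",
    ]
    body = ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(header + body) + "\n"


# Placeholder data for illustration only.
rows = [
    ("Sample", "somatic_indels", "germline_indels"),
    ("sample_A", 10, 20),
]
text = to_mqc_custom_content("indel_counts", "Indel counts", rows)
print(text)
```

Writing `text` to a file such as `indel_counts_mqc.tsv` in the analysis directory should let MultiQC pick it up without a dedicated module. Whether a table or another plot type fits best depends on the actual output files.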