Bioinformatic wooorkFlooows using WDL and Cromwell.
Documentation: https://umccr.github.io/woof/
blabla
:dog: WooorkFlooows for Bioinformatics
Home Page: https://pdiakumis.github.io/woof/
Train a model on FP vs. TP, check for informative attributes
Instead of a headerless tsv, create a json with the ability to use key-values.
Create a function that grabs status, timings, output paths etc. from cromwell_metadata.json
For the VCF comparison read in the f1 and f2 arguments into cromwell_inputs.json
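A minimal sketch of the metadata helper described above, in Python. It assumes the layout Cromwell normally writes: top-level `workflowName`, `status`, `start`, `end`, `outputs`, and a `calls` mapping from call name to a list of attempts carrying `executionStatus`, `start`, `end` and `outputs`. The function names are hypothetical, and the keys should be checked against a real `cromwell_metadata.json` before relying on them.

```python
import json
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _row(name, entry, status_key):
    """Build one summary row from a workflow-level or call-level entry."""
    start, end = entry.get("start"), entry.get("end")
    seconds = None
    if start and end:
        seconds = (parse_ts(end) - parse_ts(start)).total_seconds()
    return {
        "name": name,
        "status": entry.get(status_key),
        "seconds": seconds,
        "outputs": entry.get("outputs", {}),
    }


def summarise_metadata(meta):
    """Return one row for the workflow, then one row per call attempt."""
    rows = [_row(meta.get("workflowName", "workflow"), meta, "status")]
    for call_name, attempts in meta.get("calls", {}).items():
        for attempt in attempts:
            rows.append(_row(call_name, attempt, "executionStatus"))
    return rows


def summarise_metadata_file(path):
    """Load a metadata JSON file (assumed key layout) and summarise it."""
    with open(path) as fh:
        return summarise_metadata(json.load(fh))
```

Returning a list of dicts keeps the result key-value based, so it can be dumped straight to JSON rather than to a headerless TSV.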
Examples:

woof compare \
  --mode bcbio-dna \
  --bc1 /path/to/run1/sampleA/final \
  --bc2 /path/to/run2/sampleA/final

woof compare \
  --mode umccrise \
  --um1 /path/to/run1/sampleA/umccrised \
  --um2 /path/to/run2/sampleA/umccrised

woof compare \
  --mode bcbio-rna \
  --bc1 /path/to/run1/sampleA/final \
  --bc2 /path/to/run2/sampleA/final
Might be able to get that info from the bcbio config
See http://www.htslib.org/doc/htsfile.html for guessing file types
Fix the recursive copy function. The following error appears when restarting a run where the WDL files already exist:
Directory not copied. Error: [Errno 17] File exists: 'woof/work/wdl'
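One possible fix, sketched under the assumption that the copy is done in Python (the `[Errno 17]` message suggests `shutil.copytree` hitting an existing destination). The helper name `copy_recursive` is hypothetical and not taken from the woof code base. On Python 3.8+, `dirs_exist_ok=True` lets `copytree` merge into an existing directory instead of raising `FileExistsError`.

```python
import shutil
from pathlib import Path


def copy_recursive(src, dst):
    """Copy src to dst, merging into dst if it already exists."""
    src, dst = Path(src), Path(dst)
    if src.is_dir():
        # dirs_exist_ok (Python 3.8+) avoids FileExistsError on restarts
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        # Single file: make sure the parent directory is there first
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return dst
```

Files that already exist in the destination are overwritten, which is probably the desired behaviour when a run is restarted with updated WDL files.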
Get total counts, but only use filtered PASS calls for the evaluation stats
From Oliver (https://umccr.slack.com/archives/C025TLC7D/p1553825820038500):
Diff bcbio 1.1.3 / 1.1.4. Not surprised we're seeing those changes. A fair number of those 'false positives' (new MuTect2 calls) might actually now just be calls that VarDict/Strelka2 already had.
Which made me think of the easiest evaluation metric: how many of the 'new' (potential FP) calls for a caller have support from the other callers? And respectively, how many of the false negative calls (lost in the new version) are unique to that caller?
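The metric suggested above could be sketched as follows. This is an illustration only: variants are assumed to be represented as `(chrom, pos, ref, alt)` tuples already loaded into Python sets, and the function name `support_summary` is hypothetical.

```python
def support_summary(old_calls, new_calls, other_callers):
    """Summarise gained and lost calls for one caller between two runs.

    old_calls, new_calls: sets of (chrom, pos, ref, alt) keys for the caller
    other_callers: dict mapping caller name -> set of the same kind of keys
    """
    # Everything called by at least one of the other callers
    others = set().union(*other_callers.values())
    gained = new_calls - old_calls  # potential false positives
    lost = old_calls - new_calls    # potential false negatives
    return {
        "gained": len(gained),
        "gained_supported": len(gained & others),
        "lost": len(lost),
        "lost_unique": len(lost - others),
    }
```

A high `gained_supported` fraction would suggest that the 'new' calls are mostly variants the other callers already had, while a high `lost_unique` fraction would point to calls that only this caller ever made.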
Move all code into a separate script.
All you need is <f1> and <f2>.
Currently everything happens within the Rmd report.
Workflow would go like:
Check out attributes of FP/FN variants from ensemble-batch comparisons.
Things to look at:
From https://github.com/vladsaveliev/vcf_stuff/blob/master/README.md
For instance, split one record
#CHROM POS ID REF ALT
1 10 . A T,C
Into 2 separate records
#CHROM POS ID REF ALT
1 10 . A T
1 10 . A C
For that, we are using vt tools:
vt decompose -s vcf_file
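To make the splitting step concrete, here is a toy Python sketch. It is not how vt is implemented, and it only handles the five columns shown above; a real tool also has to split per-allele INFO and FORMAT fields. The dict representation and the name `split_multiallelic` are assumptions made for illustration.

```python
def split_multiallelic(record):
    """Return one record per ALT allele of a comma-separated ALT field."""
    return [{**record, "ALT": alt} for alt in record["ALT"].split(",")]


record = {"CHROM": "1", "POS": 10, "ID": ".", "REF": "A", "ALT": "T,C"}
for rec in split_multiallelic(record):
    print(rec["CHROM"], rec["POS"], rec["ID"], rec["REF"], rec["ALT"])
```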
For instance, split the following record:
#CHROM POS ID REF ALT
1 20 . AG CT
into 2 separate ones:
#CHROM POS ID REF ALT
1 20 . A C
1 21 . G T
For that, we are using vcflib's vcfallelicprimitives:
vcfallelicprimitives -t DECOMPOSED --keep-geno vcf_file
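As an illustration of the decomposition (not vcflib's actual implementation), an MNP whose REF and ALT have equal length can be broken into one SNP per differing base. Each SNP sits at the position of its own base, so for `AG>CT` at position 20 the `G>T` change lands at position 21. The function name `decompose_mnp` is hypothetical.

```python
def decompose_mnp(chrom, pos, ref, alt):
    """Split an equal-length REF/ALT pair into per-base SNPs.

    Records whose alleles differ in length (indels) are returned unchanged.
    """
    if len(ref) != len(alt):
        return [(chrom, pos, ref, alt)]
    return [
        (chrom, pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a  # bases that match the reference are not variants
    ]


for snp in decompose_mnp("1", 20, "AG", "CT"):
    print(*snp)
```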
For instance, given that the reference chromosome 1 starts with GCTCCG, split the following record
#CHROM POS ID REF ALT
1 2 . CTCC CCC,C,CCCC
into the following 3:
#CHROM POS ID REF ALT
1 1 . GCTC G
1 2 . CT C
1 3 . T C
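The three output records above can be reproduced with the usual trim-and-left-align procedure applied to each ALT allele separately. The sketch below is one common formulation of that procedure, written for illustration; it is not taken from any particular tool, and the name `normalise` and the plain-string reference are assumptions.

```python
def normalise(pos, ref, alt, ref_seq):
    """Trim and left-align one biallelic variant.

    pos is 1-based; ref_seq is the chromosome sequence (index 0 = position 1).
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    # Step 1: drop shared trailing bases. When an allele would become
    # empty, first pad both alleles on the left with the preceding
    # reference base, which shifts the variant one position left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1:
                break  # nothing to the left to pad with
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]
    # Step 2: drop shared leading bases, keeping at least one per allele
    while min(len(ref), len(alt)) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "GCTCCG"
for alt in "CCC,C,CCCC".split(","):
    print(1, *normalise(2, "CTCC", alt, reference))
```

Run on the example record, this yields `GCTC>G` at position 1, `CT>C` at position 2 and `T>C` at position 3, matching the table above (in a different row order, since the alleles are processed in ALT order).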