Comments (4)
Well spotted, and thanks for documenting this in so much detail, with good suggestions. You and anybody else using this pipeline are welcome to implement this via a PR, and we will look into implementing it if we can find some time. But we are not actively using this pipeline ourselves, so this might take a while.
For further inspiration, the base quality score recalibration (BQSR) setup might also be a good place to look, especially as this uses some of the respective files available in this workflow:
dna-seq-gatk-variant-calling/workflow/rules/mapping.smk
Lines 65 to 101 in cffa77a
from dna-seq-gatk-variant-calling.
Thanks! I am also not actively using this pipeline, as I'm developing my own that already fixes the issue, as stated above, so I won't have much time to work on this here either :-(
I also noticed that the BQSR setup in GATK is similar, although it seems that the wrappers for it have changed. In previous versions, both steps of BQSR were combined into one wrapper, and they were only split into two later on. That might be the reason why this was missed in the pipeline here.
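For anyone reading along, the two BQSR steps mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming the GATK4 command-line tools `BaseRecalibrator` and `ApplyBQSR`; the helper `bqsr_commands` and all file names are hypothetical placeholders, and this is not the wrapper code used in the pipeline.

```python
import shlex


def bqsr_commands(bam, ref, known_sites, recal_table, out_bam):
    """Build the two GATK4 command lines that together make up BQSR.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    # Step 1: build a recalibration table from the reads and known variant sites.
    base_recalibrator = [
        "gatk", "BaseRecalibrator",
        "-I", bam,
        "-R", ref,
        "--known-sites", known_sites,
        "-O", recal_table,
    ]
    # Step 2: apply that table to the same reads and write a recalibrated BAM.
    apply_bqsr = [
        "gatk", "ApplyBQSR",
        "-I", bam,
        "-R", ref,
        "--bqsr-recal-file", recal_table,
        "-O", out_bam,
    ]
    return base_recalibrator, apply_bqsr


if __name__ == "__main__":
    # Placeholder file names, assumed only for this example.
    for cmd in bqsr_commands(
        "sample.dedup.bam", "genome.fasta", "known.vcf.gz",
        "sample.recal.table", "sample.recal.bam",
    ):
        print(shlex.join(cmd))
```

The point is only that the second step consumes the table written by the first, so any input or option that one step needs (for example the reference) has to be wired into both rules once they are split.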
Sure, that could be a root cause here. Or simply overlooking some details...
Does your workflow above follow the GATK best practices and do you actively maintain it? It might be worth deprecating this workflow in favor of yours, if yours ticks the right boxes...
Also, I realized yours isn't yet listed in the Snakemake Workflow Catalog. That catalog is basically a nightly GitHub crawler that aggregates all Snakemake workflows that adhere to its inclusion criteria. And if you walk the couple of extra meters to enable standardized usage, you even get quick and easy deployment help for your workflow that you can, for example, cite in its README.md. As a good example, see the usage info for our dna-seq-varlociraptor workflow.
Does your workflow above follow the GATK best practices and do you actively maintain it?
As far as I am aware, it does indeed implement the best practices. And yes, at least for a while I plan to maintain it. We are currently working on a publication describing the pipeline as well.
Also, I realized yours isn't yet listed in the Snakemake Workflow Catalog.
Indeed, I'd love to have it listed in the catalog! I'll hopefully find some time to walk these extra meters soon ;-)