
Comments (4)

dlaehnemann commented on May 28, 2024

Well spotted, and thanks for documenting this in so much detail, with good suggestions. You and anybody else using this pipeline are welcome to implement this via a PR, and we will look into implementing it ourselves if we can find some time. But we are not actively using this pipeline, so this might take a while.

For further inspiration, the base quality score recalibration (BQSR) setup might also be a good place to look, especially as it uses some of the relevant files that are already available in this workflow:

rule recalibrate_base_qualities:
    input:
        bam=get_recal_input(),
        bai=get_recal_input(bai=True),
        ref="resources/genome.fasta",
        dict="resources/genome.dict",
        known="resources/variation.noiupac.vcf.gz",
        known_idx="resources/variation.noiupac.vcf.gz.tbi",
    output:
        recal_table="results/recal/{sample}-{unit}.grp",
    log:
        "logs/gatk/bqsr/{sample}-{unit}.log",
    params:
        extra=get_regions_param() + config["params"]["gatk"]["BaseRecalibrator"],
    resources:
        mem_mb=1024,
    wrapper:
        "0.74.0/bio/gatk/baserecalibrator"


rule apply_base_quality_recalibration:
    input:
        bam=get_recal_input(),
        bai=get_recal_input(bai=True),
        ref="resources/genome.fasta",
        dict="resources/genome.dict",
        recal_table="results/recal/{sample}-{unit}.grp",
    output:
        bam=protected("results/recal/{sample}-{unit}.bam"),
    log:
        "logs/gatk/apply-bqsr/{sample}-{unit}.log",
    params:
        extra=get_regions_param(),
    resources:
        mem_mb=1024,
    wrapper:
        "0.74.0/bio/gatk/applybqsr"
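
To make the hand-off between the two rules explicit, here is a minimal Python sketch of the GATK4 command lines that the two steps roughly correspond to. The helper name `bqsr_commands` and the example input BAM path are hypothetical, and the exact invocations inside the wrappers may well differ; the point is only that the table written by `BaseRecalibrator` is the same file that `ApplyBQSR` reads.

```python
# Hypothetical helper, not part of the workflow: it only builds the two
# command lines for one sample/unit so the file hand-off is easy to see.
# Paths mirror the rules above; the wrappers' real commands may differ.


def bqsr_commands(sample, unit, input_bam):
    ref = "resources/genome.fasta"
    known = "resources/variation.noiupac.vcf.gz"
    recal_table = f"results/recal/{sample}-{unit}.grp"
    recal_bam = f"results/recal/{sample}-{unit}.bam"

    # Step 1: build the recalibration table from the reads and known sites.
    base_recalibrator = [
        "gatk", "BaseRecalibrator",
        "-R", ref,
        "-I", input_bam,
        "--known-sites", known,
        "-O", recal_table,
    ]

    # Step 2: apply that table to the same input BAM.
    apply_bqsr = [
        "gatk", "ApplyBQSR",
        "-R", ref,
        "-I", input_bam,
        "--bqsr-recal-file", recal_table,
        "-O", recal_bam,
    ]
    return base_recalibrator, apply_bqsr


if __name__ == "__main__":
    # "results/dedup/A-1.bam" is an assumed input path, for illustration only.
    for cmd in bqsr_commands("A", "1", "results/dedup/A-1.bam"):
        print(" ".join(cmd))
```

Any other two-step setup in the workflow would presumably need the same kind of chaining: the output of the first rule is declared as an input of the second, so that Snakemake runs them in order.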

from dna-seq-gatk-variant-calling.

lczech commented on May 28, 2024

Thanks! I am also not actively using this pipeline, as I'm developing my own that already fixes the issue, as stated above, so I won't have much time to work on this here either :-(

I also noted that the BQSR setup in GATK is similar, although it seems that the wrappers for it have changed. In previous versions, both steps of BQSR were combined into one wrapper, and they were only split into two later on. That might be the reason why this was missed in the pipeline here.


dlaehnemann commented on May 28, 2024

Sure, that could be a root cause here. Or simply overlooking some details...

Does your workflow above follow the GATK best practices and do you actively maintain it? It might be worth deprecating this workflow in favor of yours, if yours ticks the right boxes...

Also, I realized yours isn't yet listed in the Snakemake Workflow Catalog. That catalog is basically a nightly GitHub crawler that aggregates all Snakemake workflows that adhere to its inclusion criteria. And if you walk the couple of extra meters to enable standardized usage, you even get quick and easy deployment instructions for your workflow that you can, for example, cite in its README.md. As a good example, see the usage info for our dna-seq-varlociraptor workflow.


lczech commented on May 28, 2024

> Does your workflow above follow the GATK best practices and do you actively maintain it?

As far as I am aware, it does indeed implement the best practices. And yes, at least for a while I plan to maintain it. We are currently working on a publication describing the pipeline as well.

> Also, I realized yours isn't yet listed in the Snakemake Workflow Catalog.

Indeed, I'd love to have it listed in the catalog! I'll hopefully find some time to walk these extra meters soon ;-)

