openomics / genome-seek

Clinical Whole Genome and Exome Sequencing Pipeline

Home Page: https://openomics.github.io/genome-seek/

License: MIT License

Python 77.77% Shell 7.60% Dockerfile 11.44% R 1.13% Perl 2.07%
copy-number-variation germline-variants quality-control structural-variants singularity snakemake whole-genome-sequencing somatic-variants pipeline hla-typing

genome-seek's People

Contributors

jlac, skchronicles

genome-seek's Issues

TODO: v1.0.0 Release

Germline Pipeline

  • Only run DeepVariant (DV) and its subsequent rules on normal samples
    • If --call-somatic is provided with a pairs file (the pairs file is only relevant to the somatic pipeline), then pull out the normal samples from it
    • If --call-somatic is not provided, then all samples are assumed to be normals
    • If --call-somatic is provided without a pairs file, then all samples are assumed to be tumors

This will prevent any unnecessary compute and will reduce overall runtime.
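
The selection rules above can be sketched in Python as follows. This is only an illustration, not the pipeline's actual code: the function name `germline_samples`, the `call_somatic` argument, and the representation of the pairs file as a list of `(tumor, normal)` tuples are all assumptions made for the example.

```python
def germline_samples(samples, call_somatic=False, pairs=None):
    """Return the samples that DV and its downstream rules should run on.

    samples      -- all sample names given to the pipeline
    call_somatic -- True if --call-somatic was provided
    pairs        -- hypothetical parsed pairs file: a list of (tumor, normal)
                    tuples, or None if no pairs file was provided
    """
    if not call_somatic:
        # Germline-only run: every sample is assumed to be a normal
        return list(samples)
    if not pairs:
        # Somatic run without a pairs file: every sample is assumed to be
        # a tumor, so there is nothing to run DV on
        return []
    # Somatic run with a pairs file: keep only the listed normals,
    # preserving the input order of the samples
    normals = {normal for _tumor, normal in pairs if normal}
    return [s for s in samples if s in normals]
```

With this shape, a tumor-only entry in the pairs data (an empty normal) simply contributes no sample to the germline rules.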

Fix strelka issue

GATK filtering does not work on the strelka output. This was resolved by running these two commands before the norm and splitting steps:

bcftools concat -Ov -a  \
    -D somatic.snvs.vcf.gz somatic.indels.vcf.gz \
    -o strelka.merge.indel.snps.vcf
java -Xmx16g -cp /data/OpenOmics/references/genome-seek/hmftools/purple_v3.2.jar com.hartwig.hmftools.purple.tools.AnnotateStrelkaWithAllelicDepth -in strelka.merge.indel.snps.vcf -out strelka.merge.indel.snps.annotated.vcf.gz

The next fix is for splitting the TUMOR sample from the strelka output. When splitting, we cannot use "-c1" for strelka because the VCF file does not contain the tag needed to check --min-ac/--max-ac. We need to update the tumor splitting command for strelka to remove -c1:

bcftools view -s TUMOR  -Oz -o strelka.tumor.vcf.gz \
    strelka.merge.indel.snps.annotated.filtered.vcf.gz

Tumor only

Need to update rule all so that the strelka and muse output files do not get created for tumor-only samples
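
One possible way to express this is to build the list of somatic targets per tumor and request the strelka and muse outputs only when a matched normal exists. The sketch below is hypothetical: the function name, the `tumor_to_normal` mapping, and the `<caller>/<tumor>.vcf.gz` path layout are placeholders, not the pipeline's real rule outputs.

```python
def somatic_targets(tumor_to_normal, callers, paired_only=("strelka", "muse")):
    """Build a list of expected VCF targets, one per (tumor, caller).

    tumor_to_normal -- dict mapping each tumor to its matched normal,
                       or to None/"" for a tumor-only sample
    callers         -- names of all somatic callers enabled in the run
    paired_only     -- callers whose outputs are skipped for tumor-only samples
    """
    targets = []
    for tumor, normal in tumor_to_normal.items():
        for caller in callers:
            if caller in paired_only and not normal:
                # Tumor-only sample: do not request this caller's output
                continue
            # Placeholder path layout, for illustration only
            targets.append(f"{caller}/{tumor}.vcf.gz")
    return targets
```

A list built this way could be passed to the inputs of rule all, so that the strelka and muse rules are never triggered for a tumor-only sample.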

Calling only -> no alignment

Hi,

I wanted to use your package, but ran into the issue that I do not want to realign my data - would it be possible to integrate a shortcut to skip alignment as well?

Or is there something special done within the alignment that the rest builds on?

If so, I had the impression that my multi-lane fastq was not accepted properly.

Could you set something up with an addition of *L{X}*R{1,2}.fastq.gz?

Cheers!
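
For illustration, the multi-lane naming requested above (`*L{X}*R{1,2}.fastq.gz`) could be handled by grouping files per sample and read before alignment. This is a minimal sketch, not something the pipeline is known to do: the regular expression assumes names such as `S1_L001_R1.fastq.gz` or `S1_L001_R1_001.fastq.gz`, and `group_lanes` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_L<lane>_R<1|2>[_<chunk>].fastq.gz
LANE_RE = re.compile(
    r"^(?P<sample>.+?)_L(?P<lane>\d+)_R(?P<read>[12])(?:_\d+)?\.fastq\.gz$"
)

def group_lanes(filenames):
    """Map (sample, read) -> list of per-lane FASTQ names, sorted by lane."""
    groups = defaultdict(list)
    for name in filenames:
        m = LANE_RE.match(name)
        if m is None:
            # Not a multi-lane file under the assumed convention
            continue
        groups[(m["sample"], m["read"])].append((int(m["lane"]), name))
    # Sort each group by lane number and drop the lane from the result
    return {key: [n for _lane, n in sorted(files)] for key, files in groups.items()}
```

The per-lane files of one read could then be concatenated in lane order, since concatenated gzip files still form a valid gzip stream.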

Add option to run pipeline in WES-mode

Feature: Whole exome sequencing (WES) pipeline

Add two new cli options to run the pipeline with rules/parameters optimized for WES datasets.

--wes-bed WES_BED
Path to exome targets BED file. This file can be obtained from the manufacturer of the target capture kit that was used. By default, a set of BED files generated from GENCODE's exon annotation for protein-coding genes is used.

--wes-mode
Run the whole exome pipeline. By default, the whole genome sequencing pipeline is run. This option allows a user to process and analyze whole exome sequencing data. Please note that when this mode is enabled, only a subset of the WGS rules will run. Please see the --wes-bed option above for more information about providing a custom exome targets BED file.

Overview of changes

  • Update the cli with new options.
  • Add new software dependencies to existing docker images: cnvkit, sequenza
  • Update rules to dynamically use options/parameters optimized for WES if the --wes-mode switch/flag is provided
  • Conditionally run exome-only rules. A subset of the tools used in the WGS pipeline do not support WES data, so update rule all so that these rules do not run
  • Test the pipeline end-to-end with all options. Both pipelines need to be tested: WGS and WES
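
As a rough illustration of the two proposed options, here is a minimal `argparse` sketch. It is not the project's actual CLI code: `build_parser`, `resolve_wes_bed`, the program name, and the default BED path `gencode_exons.bed` are hypothetical, and ignoring --wes-bed when --wes-mode is off is an assumption rather than a documented behavior.

```python
import argparse

def build_parser():
    """Parser exposing only the two proposed WES options."""
    parser = argparse.ArgumentParser(prog="genome-seek-sketch")
    parser.add_argument(
        "--wes-mode", action="store_true", default=False,
        help="Run the whole exome pipeline instead of the WGS pipeline.",
    )
    parser.add_argument(
        "--wes-bed", metavar="WES_BED", default=None,
        help="Path to exome targets BED file.",
    )
    return parser

def resolve_wes_bed(args, default_bed="gencode_exons.bed"):
    """Pick the targets BED file to use, or None in WGS mode.

    default_bed is a placeholder for the GENCODE-derived default.
    """
    if not args.wes_mode:
        # WGS mode: no exome targets are used (assumption)
        return None
    # WES mode: prefer the user's kit BED, else fall back to the default
    return args.wes_bed or default_bed
```

For example, `resolve_wes_bed(build_parser().parse_args(["--wes-mode"]))` falls back to the placeholder default, while adding `--wes-bed kit.bed` selects the custom file.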

genome-seek cache error

Hi,

I installed genome-seek through conda:

mamba create -c conda-forge -c bioconda -p /mycondaEnv/snakemake_singularity snakemake singularity
git clone https://github.com/OpenOmics/genome-seek.git
cd genome-seek
mamba activate /mycondaEnv/snakemake_singularity
./genome-seek --version
genome-seek 0.3.3-alpha
snakemake --version
7.25.0
singularity --version
singularity version 3.8.6

But when I tried to run: /genome-seek/genome-seek cache --sif-cache /sif-cache, I got the following error:

genome-seek (0.3.3-alpha)
Image will be pulled from "/data/OpenOmics/SIFs/ccbr_wes_base_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/deepvariant_1.3.0-gpu.sif".
Image will be pulled from "/data/OpenOmics/SIFs/glnexus_v1.4.1.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_opencravat_latest.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_octopus_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_sigprofiler_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_vcf2maf_v0.1.0.sif".
/projectsp/foran/yc790/apps/genome-seek/src/cache.sh: line 210: SLURM_JOB_ID: unbound variable
/projectsp/foran/yc790/apps/genome-seek/src/cache.sh: line 210: SLURM_JOB_ID: unbound variable
WARNING: Failed to run 'set -euo pipefail; /genome-seek/src/cache.sh local  -s '/sif-cache'  -i '/data/OpenOmics/SIFs/ccbr_wes_base_v0.1.0.sif,/data/OpenOmics/SIFs/deepvariant_1.3.0-gpu.sif,/data/OpenOmics/SIFs/glnexus_v1.4.1.sif,/data/OpenOmics/SIFs/ncbr_opencravat_latest.sif,/data/OpenOmics/SIFs/ncbr_octopus_v0.1.0.sif,/data/OpenOmics/SIFs/ncbr_sigprofiler_v0.1.0.sif,/data/OpenOmics/SIFs/ncbr_vcf2maf_v0.1.0.sif'  -t '/sif-cache/yc790/.singularity/' ' command!
        └── Command returned a non-zero exitcode of '1'.
Fatal: Failed to pull all containers. Please try again!

It seems that the image SIF files are missing and that SLURM_JOB_ID is not defined in cache.sh. Is there a way to work around this?

Thanks a lot!

Ying
