openomics / genome-seek

Clinical Whole Genome and Exome Sequencing Pipeline

Home Page: https://openomics.github.io/genome-seek/

License: MIT License

Python 77.77% Shell 7.60% Dockerfile 11.44% R 1.13% Perl 2.07%

copy-number-variation germline-variants quality-control structural-variants singularity snakemake whole-genome-sequencing somatic-variants pipeline hla-typing

genome-seek's People

Contributors

Stargazers

Watchers

Forkers

redekarnr jlac das2000sidd garyzhangyue khornick1 skchronicles bingli2019

genome-seek's Issues

Build docker image for OpenCRAVAT due to bug in mergesqlite sub command

Germline Pipeline

Only run DeepVariant (DV) and its downstream rules with normal samples
- If a pairs file is provided (only relevant to the somatic pipeline, i.e. when --call-somatic is provided), then pull out the normal samples
- If --call-somatic is not provided, then all samples are assumed to be normals
- If --call-somatic is provided without a pairs file, then all samples are assumed to be tumors

This will prevent any unnecessary compute and will reduce overall runtime.
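The three cases above can be sketched as a small helper. This is only an illustration, not the pipeline's actual code: the function name and the `pairs` representation (a list of `(tumor, normal)` tuples, with an empty normal name for an unpaired tumor) are assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def deepvariant_samples(
    samples: Iterable[str],
    call_somatic: bool,
    pairs: Optional[List[Tuple[str, str]]] = None,
) -> List[str]:
    """Return the samples that should be run through DeepVariant.

    `pairs` is a hypothetical in-memory form of the pairs file:
    a list of (tumor, normal) tuples, where normal may be "".
    """
    samples = list(samples)
    if not call_somatic:
        # No somatic calling requested: every sample is treated as a normal.
        return samples
    if not pairs:
        # Somatic calling without a pairs file: every sample is a tumor.
        return []
    # Somatic calling with a pairs file: keep only the listed normals.
    normals = {normal for _tumor, normal in pairs if normal}
    return [s for s in samples if s in normals]
```

With this shape, a rule-all helper could build the DeepVariant targets from the returned list only, so tumor samples never trigger DV or its downstream rules.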

Fix strelka issue

GATK filtering does not work for strelka output. This is resolved by running these two commands before the norm and splitting steps:

bcftools concat -Ov -a  \
    -D somatic.snvs.vcf.gz somatic.indels.vcf.gz \
    -o strelka.merge.indel.snps.vcf
java -Xmx16g \
    -cp /data/OpenOmics/references/genome-seek/hmftools/purple_v3.2.jar \
    com.hartwig.hmftools.purple.tools.AnnotateStrelkaWithAllelicDepth \
    -in strelka.merge.indel.snps.vcf \
    -out strelka.merge.indel.snps.annotated.vcf.gz

The next fix is for splitting TUMOR from strelka. When splitting, we cannot use "-c1" for strelka because the VCF file lacks the tag needed to check --min-ac/--max-ac. We need to update the tumor splitting command for strelka to remove -c1:

bcftools view -s TUMOR  -Oz -o strelka.tumor.vcf.gz \
    strelka.merge.indel.snps.annotated.filtered.vcf.gz
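One way to express the caller-specific split is to build the `bcftools view` arguments conditionally. The sketch below assumes the other callers keep `-c1` in their tumor split command (implied by the issue, not confirmed); the function name and its parameters are hypothetical.

```python
from typing import List


def tumor_split_cmd(
    caller: str, in_vcf: str, out_vcf: str, sample: str = "TUMOR"
) -> List[str]:
    """Build the bcftools command that extracts the tumor sample from a VCF."""
    cmd = ["bcftools", "view"]
    if caller != "strelka":
        # Assumption: non-strelka callers keep the -c1 (--min-ac 1) filter.
        cmd.append("-c1")
    cmd += ["-s", sample, "-Oz", "-o", out_vcf, in_vcf]
    return cmd
```

For example, `tumor_split_cmd("strelka", "strelka.merge.indel.snps.annotated.filtered.vcf.gz", "strelka.tumor.vcf.gz")` yields the argument list for the command shown above, without `-c1`.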

Tumor only

Need to update rule all so that the strelka and muse output files do not get created for tumor-only samples
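A minimal sketch of how the rule all targets could be filtered, assuming a mapping from each tumor to its matched normal (or `None` for tumor-only samples). The output path layout and the function name are made up for illustration and do not reflect the pipeline's real file names.

```python
from typing import Dict, List, Optional

# Callers named in the issue as needing a matched normal.
PAIRED_ONLY_CALLERS = {"strelka", "muse"}


def somatic_targets(
    tumor2normal: Dict[str, Optional[str]], callers: List[str]
) -> List[str]:
    """List expected somatic VCFs, skipping paired-only callers for tumor-only samples."""
    targets = []
    for tumor, normal in tumor2normal.items():
        for caller in callers:
            if caller in PAIRED_ONLY_CALLERS and not normal:
                # Tumor-only sample: do not request strelka/muse outputs.
                continue
            # Hypothetical output layout: <caller>/<tumor>.vcf.gz
            targets.append(f"{caller}/{tumor}.vcf.gz")
    return targets
```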

Create basic scaffold for project

Calling only -> no alignment

Hi,

I wanted to use your package, but ran into the issue that I do not want to realign my data - would it be possible to integrate a shortcut to skip alignment as well?

Or is there something special done within the alignment that the rest builds on?

If so, I had the impression that my multi-lane fastq was not accepted properly.

Could you set something up with an addition of *L{X}*R{1,2}.fastq.gz?

Cheers!
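To illustrate the multi-lane naming requested above, the sketch below groups FASTQ files by sample so that the lanes of each read can be handled together. It assumes names of the form `<sample>_L<lane>_R<1|2>.fastq.gz` with zero-padded lane numbers; the regular expression and function name are hypothetical and not part of genome-seek.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming scheme: <sample>_L<lane>_R<1|2>.fastq.gz
LANE_RE = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$")


def group_lanes(filenames: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group multi-lane FASTQ file names by sample and read (R1/R2)."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"R1": [], "R2": []}
    )
    # Sorting keeps zero-padded lanes in the same order for R1 and R2.
    for name in sorted(filenames):
        match = LANE_RE.match(name)
        if match is None:
            # Not a multi-lane file under the assumed scheme; skip it.
            continue
        grouped[match["sample"]][f"R{match['read']}"].append(name)
    return dict(grouped)
```

The per-read lists could then be concatenated lane by lane (gzip files can be concatenated directly) before being handed to a pipeline that expects one R1 and one R2 file per sample.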

Add optional CNV calling steps

Add optional steps to skip qc and only call variants

Add option to run pipeline in WES-mode

Feature: Whole exome sequencing (WES) pipeline

Add two new cli options to run the pipeline with rules/parameters optimized for WES datasets.

--wes-bed WES_BED
Path to exome targets BED file. This file can be obtained from the manufacturer of the target capture kit that was used. By default, a set of BED files generated from GENCODE's exon annotations for protein-coding genes is used.

--wes-mode
Run the whole exome pipeline. By default, the whole genome sequencing pipeline is run. This option allows a user to process and analyze whole exome sequencing data. Please note that when this mode is enabled, a sub-set of the WGS rules will run. Please see the --wes-bed option for more information about providing a custom exome targets BED file.

Overview of changes

Update the cli with new options.
Add new software dependencies to existing docker images: cnvkit, sequenza
Update rules to dynamically use options/parameters optimized for WES if the --wes-mode switch/flag is provided
Conditionally run exome-only rules. A sub-set of the tools used in the WGS pipeline do not support WES data, so update rule all so that these rules do not run in WES-mode
Test pipeline end-end with all options, need to test both pipelines: WGS and WES
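The two proposed options could be wired into an `argparse` parser roughly as follows. This is a sketch under assumptions: the program name, the helper `resolve_wes_bed`, and the default BED file name are hypothetical, and whether `--wes-bed` should be rejected without `--wes-mode` is left open here.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing only the two WES-related options from the proposal."""
    parser = argparse.ArgumentParser(prog="genome-seek-sketch")
    parser.add_argument(
        "--wes-mode",
        action="store_true",
        default=False,
        help="Run the whole exome pipeline instead of the WGS pipeline.",
    )
    parser.add_argument(
        "--wes-bed",
        metavar="WES_BED",
        default=None,
        help="Path to exome targets BED file.",
    )
    return parser


def resolve_wes_bed(
    args: argparse.Namespace, default_bed: str = "gencode_exons.bed"
) -> Optional[str]:
    """Return the targets BED to use, or None when running in WGS mode.

    `default_bed` is a placeholder name for the GENCODE-derived default.
    """
    if not args.wes_mode:
        return None
    return args.wes_bed or default_bed
```

Rules could then branch on whether `resolve_wes_bed` returns a path, which keeps the WGS behavior unchanged when `--wes-mode` is absent.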

Create somatic pipeline

Add optional SV calling steps

Create a new entry point to pipeline

genome-seek cache error

Hi,

I installed genome-seek through conda:

mamba create -c conda-forge -c bioconda -p /mycondaEnv/snakemake_singularity snakemake singularity
git clone https://github.com/OpenOmics/genome-seek.git
cd genome-seek
mamba activate /mycondaEnv/snakemake_singularity
./genome-seek --version
genome-seek 0.3.3-alpha
snakemake --version
7.25.0
singularity --version
singularity version 3.8.6

But when I tried to run: /genome-seek/genome-seek cache --sif-cache /sif-cache, I got the following error:

genome-seek (0.3.3-alpha)
Image will be pulled from "/data/OpenOmics/SIFs/ccbr_wes_base_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/deepvariant_1.3.0-gpu.sif".
Image will be pulled from "/data/OpenOmics/SIFs/glnexus_v1.4.1.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_opencravat_latest.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_octopus_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_sigprofiler_v0.1.0.sif".
Image will be pulled from "/data/OpenOmics/SIFs/ncbr_vcf2maf_v0.1.0.sif".
/projectsp/foran/yc790/apps/genome-seek/src/cache.sh: line 210: SLURM_JOB_ID: unbound variable
/projectsp/foran/yc790/apps/genome-seek/src/cache.sh: line 210: SLURM_JOB_ID: unbound variable
WARNING: Failed to run 'set -euo pipefail; /genome-seek/src/cache.sh local  -s '/sif-cache'  -i '/data/OpenOmics/SIFs/ccbr_wes_base_v0.1.0.sif,/data/OpenOmics/SIFs/deepvariant_1.3.0-gpu.sif,/data/OpenOmics/SIFs/glnexus_v1.4.1.sif,/data/OpenOmics/SIFs/ncbr_opencravat_latest.sif,/data/OpenOmics/SIFs/ncbr_octopus_v0.1.0.sif,/data/OpenOmics/SIFs/ncbr_sigprofiler_v0.1.0.sif,/data/OpenOmics/SIFs/ncbr_vcf2maf_v0.1.0.sif'  -t '/sif-cache/yc790/.singularity/' ' command!
        └── Command returned a non-zero exitcode of '1'.
Fatal: Failed to pull all containers. Please try again!

It seems that the image SIF files are missing and that SLURM_JOB_ID is not defined in cache.sh. Is there a way to get around this?

Thanks a lot!

Ying