Milne Lab ChIP-seq, ATAC-seq and RNA-seq pipeline
Home Page: https://alsmith151.github.io/SeqNado/
License: GNU General Public License v3.0

Pipeline based on Snakemake to process ChIP-seq, ATAC-seq, RNA-seq and short-read WGS data for SNP calling.
See the SeqNado documentation (https://alsmith151.github.io/SeqNado/) for more information.
Mapping ChIP-seq data against hg38 (using the databank UCSC indices), the pipeline appears to have an issue converting BED to bigBed, possibly because "chrEBV" is missing from the chrom.sizes file? The error message is below.
2021-07-22 12:22:59,297 INFO main control - {"task": "'convert_bed_to_bigbed'", "task_status": "update", "task_total": 1, "task_completed": 0, "task_completed_percent": 0.0}
2021-07-22 12:22:59,297 ERROR main task - \
Exception #1 \
'builtins.OSError(Job 2487560 has non-zero exitStatus 255: hasExited=True, wasAborted=False, hasSignal=True, terminatedSignal='unknown signal?!' \
statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
stderr = chrEBV is not found in chromosome sizes file \
)' raised in ... \
Task = def convert_bed_to_bigbed(...): \
Job = [peaks/homer/ChIP-SEM-MYB_MYB_homer.bed -> peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed] \
\
Traceback (most recent call last): \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions \
return_value = job_wrapper(params, user_defined_work_func, \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 545, in job_wrapper_io_files \
ret_val = user_defined_work_func(*params) \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ngs_pipeline/pipeline_atac_chipseq.py", line 600, in convert_bed_to_bigbed \
P.run( \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 1223, in run \
benchmark_data = r.run(statement_list) \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 803, in run \
stdout, stderr, resource_usage = self.queue_manager.collect_single_job_from_cluster( \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/cluster.py", line 145, in collect_single_job_from_cluster \
raise OSError(error_msg) \
OSError: Job 2487560 has non-zero exitStatus 255: hasExited=True, wasAborted=False, hasSignal=True, terminatedSignal='unknown signal?!' \
statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
stderr = chrEBV is not found in chromosome sizes file \
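A possible workaround, pending a proper fix, would be to drop BED records whose contig is absent from the chromosome sizes file before calling bedToBigBed. A minimal sketch of the idea (the function name is illustrative, not part of the pipeline):

```python
# Illustrative workaround: filter out BED records on contigs (e.g. chrEBV)
# that are missing from the chrom.sizes file before running bedToBigBed.
def filter_bed_to_known_chroms(bed_lines, chrom_sizes_lines):
    # First column of chrom.sizes is the contig name.
    known = {line.split("\t")[0] for line in chrom_sizes_lines if line.strip()}
    # Keep only BED records whose contig appears in chrom.sizes.
    return [line for line in bed_lines if line.split("\t")[0] in known]

bed = ["chr1\t100\t200", "chrEBV\t10\t50", "chr2\t300\t400"]
sizes = ["chr1\t248956422", "chr2\t242193529"]
print(filter_bed_to_known_chroms(bed, sizes))
```

Alternatively, regenerating chrom.sizes from the same index that was used for mapping (so that chrEBV is present) would avoid the mismatch altogether.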
When an input is present in the design, the input samples are not mapped or used for peak calling.
Process the stats in a way that makes them easier to use for a GEO submission and/or a Nature Reporting Summary.
Need to update outputs to allow for merging.
Add support for the SEACR peak caller.
Creating bigwigs in hg38 with HOMER and deeptools results in the HOMER tracks showing regions with absent signal, whereas normal signal is found with deeptools. These regions do not overlap the blacklist, and the same data mapped to hg19 does not show this loss of signal.
I believe this is because hg38 mapping has a much higher level of multimapping than hg19. By default HOMER retains only uniquely mapped reads, so multimapping regions are lost. Adding the -keepAll parameter to HOMER tag directory generation fixes this issue.
Is this a parameter that we want used by default? Or should the template config files have it written in?
Example tracks are attached illustrating the issue: the 1st track shows deeptools, the 2nd shows HOMER, and the 3rd shows HOMER with the -keepAll parameter.
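If the flag were adopted via the template configs, it could be exposed as a tool-options entry along these lines (hypothetical key names; the actual SeqNado config schema may differ):

```yaml
# Hypothetical config fragment: pass -keepAll through to makeTagDirectory
homer:
  maketagdirectory_options: "-keepAll"
```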
Add support for concatenating lanes to SeqNado.
Cookiecutter is not inserting the blacklist key even when one is provided.
Problem running the RNA pipeline: the blacklist parameter is not recognised.
Profile scripts are not executable on install.
Make the preset reference genome and the fastq_screen conf editable.
The STAR commands generated by the pipeline are written out in single-end mode (unless I've missed a pipeline flag?).
The --readFilesIn parameter uses "," to indicate fastq files that should be concatenated (i.e. lanes), while " " is used to separate read 1 and read 2.
So this command (as the pipeline generates it):
STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq,trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4
Should be (for paired end):
STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4
I think this should be a simple fix, unless it needs to be conditional on single/paired-end input?
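The fix presumably amounts to joining lanes with "," and mates with " " when building the --readFilesIn argument. A sketch of that logic, conditional on single/paired-end input (the helper name is illustrative, not the pipeline's actual code):

```python
# Illustrative: build the value for STAR's --readFilesIn.
# Lanes to be concatenated are joined with ",", mates are separated by " ".
def star_read_files_arg(r1_fastqs, r2_fastqs=None):
    arg = ",".join(r1_fastqs)
    if r2_fastqs:  # paired-end: append read 2 files after a space
        arg += " " + ",".join(r2_fastqs)
    return arg

print(star_read_files_arg(["s_L1_R1.fq", "s_L2_R1.fq"],
                          ["s_L1_R2.fq", "s_L2_R2.fq"]))
# -> s_L1_R1.fq,s_L2_R1.fq s_L1_R2.fq,s_L2_R2.fq
```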
Building DAG of jobs...
InputFunctionException in rule fastqc_raw in file /project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk, line 4:
Error:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not float
Wildcards:
sample=H3K27ac_DMSO_H3K27ac_1.fastq.gz
read=1
Traceback:
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk", line 6, in
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils.py", line 191, in translate_fq_files
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 164, in translation
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 155, in _translate_ip_samples
File "", line 19, in exists
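The float in the TypeError suggests a pandas NaN (typically from an empty design-sheet cell) reaching a path-existence check. Assuming that diagnosis is right, a minimal guard would be something like this (the helper name is illustrative, not seqnado's actual code):

```python
import math
import os

def safe_exists(path):
    # An empty design-sheet cell is read by pandas as float('nan');
    # treat that as "no file" instead of letting os.stat() raise TypeError.
    if isinstance(path, float) and math.isnan(path):
        return False
    return os.path.exists(path)

print(safe_exists(float("nan")))  # -> False
```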
WorkflowError:
Failed to pull singularity image from library://asmith151/seqnado/seqnado_pipeline:latest:
FATAL: Unable to get library client configuration: remote has no library client (see https://apptainer.org/docs/user/latest/endpoint.html#no-default-remote)
Fix:
apptainer remote add --no-login SylabsCloud cloud.sylabs.io
apptainer remote use SylabsCloud