Milne Lab ChIP-seq, ATAC-seq and RNA-seq pipeline
Home Page: https://alsmith151.github.io/SeqNado/
License: GNU General Public License v3.0

Pipeline based on Snakemake to process ChIP-seq, ATAC-seq, RNA-seq and short-read WGS data for SNP calling.
See the SeqNado documentation (https://alsmith151.github.io/SeqNado/) for more information.
Mapping ChIP-seq data against hg38 (using the databank UCSC indices), the pipeline appears to have an issue converting BED to bigBed, possibly because "chrEBV" is missing from the chrom.sizes file? The error message is below.
2021-07-22 12:22:59,297 INFO main control - {"task": "'convert_bed_to_bigbed'", "task_status": "update", "task_total": 1, "task_completed": 0, "task_completed_percent": 0.0}
2021-07-22 12:22:59,297 ERROR main task - \
Exception #1 \
'builtins.OSError(Job 2487560 has non-zero exitStatus 255: hasExited=True, wasAborted=False, hasSignal=True, terminatedSignal='unknown signal?!' \
statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
stderr = chrEBV is not found in chromosome sizes file \
)' raised in ... \
Task = def convert_bed_to_bigbed(...): \
Job = [peaks/homer/ChIP-SEM-MYB_MYB_homer.bed -> peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed] \
\
Traceback (most recent call last): \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions \
return_value = job_wrapper(params, user_defined_work_func, \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 545, in job_wrapper_io_files \
ret_val = user_defined_work_func(*params) \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ngs_pipeline/pipeline_atac_chipseq.py", line 600, in convert_bed_to_bigbed \
P.run( \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 1223, in run \
benchmark_data = r.run(statement_list) \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 803, in run \
stdout, stderr, resource_usage = self.queue_manager.collect_single_job_from_cluster( \
File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/cluster.py", line 145, in collect_single_job_from_cluster \
raise OSError(error_msg) \
OSError: Job 2487560 has non-zero exitStatus 255: hasExited=True, wasAborted=False, hasSignal=True, terminatedSignal='unknown signal?!' \
statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
stderr = chrEBV is not found in chromosome sizes file \
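A possible workaround, pending a proper fix, would be to drop BED records whose contig is absent from the chromosome sizes file before calling bedToBigBed. A minimal sketch of the idea (the function name is illustrative, not part of the pipeline):

```python
# Illustrative workaround: filter out BED records on contigs (e.g. chrEBV)
# that are missing from the chrom.sizes file before running bedToBigBed.
def filter_bed_to_known_chroms(bed_lines, chrom_sizes_lines):
    # First column of chrom.sizes is the contig name.
    known = {line.split("\t")[0] for line in chrom_sizes_lines if line.strip()}
    # Keep only BED records whose contig appears in chrom.sizes.
    return [line for line in bed_lines if line.split("\t")[0] in known]

bed = ["chr1\t100\t200", "chrEBV\t10\t50", "chr2\t300\t400"]
sizes = ["chr1\t248956422", "chr2\t242193529"]
print(filter_bed_to_known_chroms(bed, sizes))
```

Alternatively, regenerating chrom.sizes from the same index that was used for mapping (so that chrEBV is present) would avoid the mismatch altogether.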
When an input is present in the design, the input samples are not mapped or used for peak calling.
Process the stats in a way that makes them easier to use for a GEO submission and/or a Nature Reporting Summary.
Need to update outputs to allow for merging.
Add support for the SEACR peak caller.
Creating bigwigs in hg38 with HOMER and deeptools results in the HOMER tracks showing regions with absent signal, whereas normal signal is found with deeptools. These regions do not overlap the blacklist, and the same data mapped to hg19 does not show this loss of signal.
I believe this is because hg38 mapping has a much higher level of multimapping than hg19. By default HOMER retains only uniquely mapped reads, so multimapping regions are lost. Adding the -keepAll parameter to HOMER tag directory generation fixes this issue.
Is this a parameter that we want used by default? Or should the template config files have it written in?
Example tracks are attached illustrating the issue: the 1st track shows deeptools, the 2nd shows HOMER, and the 3rd shows HOMER with the -keepAll parameter.
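If the flag were adopted via the template configs, it could be exposed as a tool-options entry along these lines (hypothetical key names; the actual SeqNado config schema may differ):

```yaml
# Hypothetical config fragment: pass -keepAll through to makeTagDirectory
homer:
  maketagdirectory_options: "-keepAll"
```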
Add support for concatenating lanes to SeqNado.
Cookiecutter is not inserting the blacklist key even when one is provided.
Problem running the RNA pipeline: the blacklist parameter is not recognised.
Profile scripts are not executable on install.
Make the preset reference genome and the fastq_screen conf editable.
The STAR commands generated by the pipeline are written out in single-end mode (unless I've missed a pipeline flag?).
The --readFilesIn parameter uses "," to indicate fastq files that should be concatenated (i.e. lanes), while " " is used to separate read 1 and read 2.
So this command (as the pipeline generates it):
STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq,trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4
Should be (for paired end):
STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4
I think this should be a simple fix, unless it needs to be conditional on single/paired-end input?
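The fix presumably amounts to joining lanes with "," and mates with " " when building the --readFilesIn argument. A sketch of that logic, conditional on single/paired-end input (the helper name is illustrative, not the pipeline's actual code):

```python
# Illustrative: build the value for STAR's --readFilesIn.
# Lanes to be concatenated are joined with ",", mates are separated by " ".
def star_read_files_arg(r1_fastqs, r2_fastqs=None):
    arg = ",".join(r1_fastqs)
    if r2_fastqs:  # paired-end: append read 2 files after a space
        arg += " " + ",".join(r2_fastqs)
    return arg

print(star_read_files_arg(["s_L1_R1.fq", "s_L2_R1.fq"],
                          ["s_L1_R2.fq", "s_L2_R2.fq"]))
# -> s_L1_R1.fq,s_L2_R1.fq s_L1_R2.fq,s_L2_R2.fq
```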
Building DAG of jobs...
InputFunctionException in rule fastqc_raw in file /project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk, line 4:
Error:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not float
Wildcards:
sample=H3K27ac_DMSO_H3K27ac_1.fastq.gz
read=1
Traceback:
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk", line 6, in
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils.py", line 191, in translate_fq_files
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 164, in translation
File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 155, in _translate_ip_samples
File "", line 19, in exists
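The float in the TypeError suggests a pandas NaN (typically from an empty design-sheet cell) reaching a path-existence check. Assuming that diagnosis is right, a minimal guard would be something like this (the helper name is illustrative, not seqnado's actual code):

```python
import math
import os

def safe_exists(path):
    # An empty design-sheet cell is read by pandas as float('nan');
    # treat that as "no file" instead of letting os.stat() raise TypeError.
    if isinstance(path, float) and math.isnan(path):
        return False
    return os.path.exists(path)

print(safe_exists(float("nan")))  # -> False
```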
WorkflowError:
Failed to pull singularity image from library://asmith151/seqnado/seqnado_pipeline:latest:
FATAL: Unable to get library client configuration: remote has no library client (see https://apptainer.org/docs/user/latest/endpoint.html#no-default-remote)
Fix:
apptainer remote add --no-login SylabsCloud cloud.sylabs.io
apptainer remote use SylabsCloud