
seqnado's Introduction


SeqNado Pipeline

Snakemake-based pipeline for processing ChIP-seq, ATAC-seq and RNA-seq data, as well as short-read WGS data for SNP calling.

See the SeqNado documentation https://alsmith151.github.io/SeqNado/ for more information.

seqnado's People

Contributors

alsmith151, cchahrour, joeharman, dependabot[bot]


seqnado's Issues

Chromsizes specification issue

I'm mapping ChIP-seq data to hg38 (using the databank UCSC indices), and the pipeline seems to fail when converting BED to bigBed, possibly because “chrEBV” is missing from chrom.sizes. The error message is below.

2021-07-22 12:22:59,297 INFO main control - {"task": "'convert_bed_to_bigbed'", "task_status": "update", "task_total": 1, "task_completed": 0, "task_completed_percent": 0.0}
2021-07-22 12:22:59,297 ERROR main task - \
    Exception #1 \
      'builtins.OSError(Job 2487560 has non-zero exitStatus 255: hasExited=True,  wasAborted=FalsehasSignal=True, terminatedSignal='unknown signal?!' \
    statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
     stderr = chrEBV is not found in chromosome sizes file \
    )' raised in ... \
       Task = def convert_bed_to_bigbed(...): \
       Job  = [peaks/homer/ChIP-SEM-MYB_MYB_homer.bed -> peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed] \
     \
    Traceback (most recent call last): \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions \
        return_value = job_wrapper(params, user_defined_work_func, \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ruffus/task.py", line 545, in job_wrapper_io_files \
        ret_val = user_defined_work_func(*params) \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/ngs_pipeline/pipeline_atac_chipseq.py", line 600, in convert_bed_to_bigbed \
        P.run( \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 1223, in run \
        benchmark_data = r.run(statement_list) \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 803, in run \
        stdout, stderr, resource_usage = self.queue_manager.collect_single_job_from_cluster( \
      File "/home/j/jharman/.conda/envs/ngs/lib/python3.8/site-packages/cgatcore/pipeline/cluster.py", line 145, in collect_single_job_from_cluster \
        raise OSError(error_msg) \
    OSError: Job 2487560 has non-zero exitStatus 255: hasExited=True,  wasAborted=FalsehasSignal=True, terminatedSignal='unknown signal?!'  \
    statement = cat peaks/homer/ChIP-SEM-MYB_MYB_homer.bed | sort -k1,1 -k2,2n > peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp && bedToBigBed peaks/homer/ChIP-SEM-MYB_MYB_homer.bed.tmp chrom_sizes.txt.tmp peaks/homer/ChIP-SEM-MYB_MYB_homer.bigBed \
     stderr = chrEBV is not found in chromosome sizes file \
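bedToBigBed refuses any record whose chromosome is not listed in the chrom sizes file, which is what the `chrEBV is not found in chromosome sizes file` message says. The cleaner fix is a chrom sizes file that matches the index used for mapping; if peaks on extra contigs such as chrEBV are not needed, one possible workaround is to drop them before conversion. A minimal sketch under that assumption (the function name `filter_bed_to_chromsizes` is hypothetical, not part of SeqNado or ngs_pipeline):

```python
def filter_bed_to_chromsizes(bed_lines, chrom_sizes_lines):
    """Split BED records into those on contigs listed in a chrom sizes file and the rest.

    Hypothetical helper. Returns (kept_lines, dropped_contigs). Comment,
    "track" and "browser" lines and blank lines are skipped.
    """
    # First column of each non-blank chrom sizes line is the contig name.
    known = {line.split()[0] for line in chrom_sizes_lines if line.strip()}
    kept, dropped = [], set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in known:
            kept.append(line)
        else:
            dropped.add(chrom)  # report these so nothing disappears silently
    return kept, dropped
```

The kept lines can then be sorted and passed to bedToBigBed as in the statement above, and the dropped set logged so that removed contigs are visible.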

Easier GEO submission / Reporting Summary

Process the run statistics so that they are easier to use for a GEO submission and/or a Nature Reporting Summary:

  • MD5 checksums
  • Extract stats into a format that works for a reporting summary, e.g. the number of raw, mapped and filtered reads
  • Create a summary of the run parameters in an easy-to-copy format
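The first two items could be covered by a small helper script. A minimal sketch, with hypothetical function names and an assumed input shape for the read counts (a dict of per-sample `raw`/`mapped`/`filtered` counts, which SeqNado does not necessarily produce in this form):

```python
import hashlib
from pathlib import Path


def md5_of_handle(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary file handle, read in chunks."""
    digest = hashlib.md5()
    # Read fixed-size chunks until read() returns b"" (end of file).
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5sum(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_handle(handle)


def md5_table(paths):
    """Return 'filename<TAB>md5' lines, e.g. for a GEO checksum table."""
    return [f"{Path(p).name}\t{md5sum(p)}" for p in paths]


def read_count_rows(sample_counts):
    """Tabulate raw/mapped/filtered read counts with a percent-mapped column.

    `sample_counts` is assumed to look like
    {"sample1": {"raw": 200, "mapped": 150, "filtered": 120}, ...}.
    """
    rows = [("sample", "raw", "mapped", "filtered", "pct_mapped")]
    for sample, counts in sorted(sample_counts.items()):
        raw = counts["raw"]
        pct = 100 * counts["mapped"] / raw if raw else 0.0
        rows.append((sample, raw, counts["mapped"], counts["filtered"], f"{pct:.1f}"))
    return rows
```

Joining each row with tabs gives a block that pastes directly into a spreadsheet or a reporting-summary table.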

Updated profiles

  • Add a Singularity prefix so that users stop re-downloading the container on every run
  • Update default-resources in the profile

hg38 bigWig generation: difference in default behaviour between deepTools and HOMER

When creating bigWigs for hg38, the HOMER tracks show regions with no signal where the deepTools tracks show normal signal. These regions do not overlap the blacklist, and hg19 mappings do not show this loss of signal.

I believe this is because hg38 mapping has a much higher level of multimapping than hg19. By default, HOMER retains only uniquely mapped reads, so signal in multimapping regions is lost. Adding the -keepAll parameter to HOMER tag directory generation fixes this issue.

Should this parameter be used by default? Or should it be written into the template config files?

Example tracks illustrating the issue are attached: the first track shows deepTools, the second HOMER, and the third HOMER with the -keepAll parameter.
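To test the multimapping explanation, one could compare the share of low-MAPQ alignments in hg19 and hg38 BAMs (for example, from `samtools view` output). A minimal sketch, assuming low MAPQ is a reasonable proxy for multimapping; that is an assumption, since how MAPQ encodes multimapping depends on the aligner, and the threshold of 10 is illustrative rather than HOMER's internal rule. The function name is hypothetical.

```python
def multimap_fraction(sam_lines, min_mapq=10):
    """Fraction of mapped primary alignments with MAPQ below `min_mapq`.

    Low MAPQ is used as a rough proxy for multimapping reads (assumption).
    """
    mapped = low = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # SAM columns 2 (FLAG) and 5 (MAPQ)
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # skip unmapped, secondary and supplementary records
        mapped += 1
        if mapq < min_mapq:
            low += 1
    return low / mapped if mapped else 0.0
```

A clearly higher fraction for hg38 than for hg19 on the same sample would be consistent with the explanation above.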

STAR command does not account for paired-end data

The STAR commands generated by the pipeline are written in single-end mode (unless I've missed a pipeline flag?).

The --readFilesIn parameter uses "," to separate FASTQ files that are treated as one concatenated input (e.g. lanes), while " " is used to separate read 1 and read 2.

So this command (as generated by the pipeline):

STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq,trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4

Should be (for paired-end data):

STAR --genomeDir /project/milne_group/shared/custom_genomes/mm10/TetR_Chr8/star --readFilesIn trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq --readFilesCommand cat --outSAMtype BAM Unsorted --runThreadN 4 --outFileNamePrefix bam/RNA-TetO-MYB-TA2-4

I think this should be a simple fix, unless it needs to be conditional on single- vs paired-end input?
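The conditional fix can be sketched as a helper that builds the --readFilesIn value from a list of read 1 files and an optional list of read 2 files. The function name and how it would be wired into the pipeline's STAR rule are hypothetical; the comma/space semantics are the ones described above.

```python
import shlex


def star_read_files_in(r1_files, r2_files=None):
    """Build the argument string for STAR's --readFilesIn option.

    Files belonging to the same mate (e.g. several lanes) are joined with
    ","; the read 1 list and the read 2 list are separated by a space.
    """
    if not r1_files:
        raise ValueError("at least one read 1 file is required")
    groups = [r1_files]
    if r2_files:  # paired-end input
        if len(r2_files) != len(r1_files):
            raise ValueError("read 1 and read 2 need the same number of files")
        groups.append(r2_files)
    # Quote each comma-joined group so the result is safe in a shell command.
    return " ".join(shlex.quote(",".join(group)) for group in groups)
```

For the files in this issue, `star_read_files_in(["trimmed/RNA-TetO-MYB-TA2-4_R1_val_1.fq"], ["trimmed/RNA-TetO-MYB-TA2-4_R2_val_2.fq"])` gives the space-separated paired-end form, while omitting the second argument keeps single-end behaviour.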

Spaces in design file leads to strange bugs

Building DAG of jobs...
InputFunctionException in rule fastqc_raw in file /project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk, line 4:
Error:
  TypeError: stat: path should be string, bytes, os.PathLike or integer, not float
Wildcards:
  sample=H3K27ac_DMSO_H3K27ac_1.fastq.gz
  read=1
Traceback:
  File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/workflow/rules/qc.smk", line 6, in
  File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils.py", line 191, in translate_fq_files
  File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 164, in translation
  File "/project/milne_group/adopicof/software/MambaForge/envs/seqnado/lib/python3.11/site-packages/seqnado/utils_chipseq.py", line 155, in _translate_ip_samples
  File "", line 19, in exists
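The last frame is a function named `exists`, most likely `os.path.exists`, which raises exactly this `TypeError` when handed a float; if the design file is read with pandas, an empty cell becomes the float NaN. One plausible mechanism (an assumption, not confirmed in this issue) is that stray spaces shift or blank a cell so that a FASTQ path ends up as NaN. A minimal sketch of a defensive loader using only the standard-library `csv` module; the function name and the column names in the example are hypothetical.

```python
import csv


def clean_design_rows(lines, delimiter=","):
    """Parse a design file, stripping stray whitespace and rejecting empty cells.

    `lines` is any iterable of text lines (an open file works). Raises
    ValueError with the offending line number instead of letting an empty
    cell reach downstream path checks.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        raise ValueError("design file is empty")
    header = [name.strip() for name in header]
    rows = []
    for raw in reader:
        lineno = reader.line_num  # physical line number in the source
        cells = [cell.strip() for cell in raw]
        if not any(cells):
            continue  # ignore blank lines
        if len(cells) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} columns, got {len(cells)}")
        row = dict(zip(header, cells))
        empty = [name for name, value in row.items() if not value]
        if empty:
            raise ValueError(f"line {lineno}: empty value(s) in column(s) {empty}")
        rows.append(row)
    return rows
```

Failing at load time with a line number points the user at the offending row, rather than surfacing later as a type error inside a Snakemake input function.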
