Git Product home page Git Product logo

tobiasrausch / atacseq Goto Github PK

View Code? Open in Web Editor NEW
71.0 7.0 36.0 338 KB

Analysis Workflow for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq)

Home Page: https://tobiasrausch.com/courses/atac/

License: BSD 3-Clause "New" or "Revised" License

Makefile 3.46% Shell 59.21% R 23.98% Python 11.63% Dockerfile 1.72%
atac-seq sequencing chromatin nucleosome-positioning next-generation-sequencing peak-detection atac-seq-pipeline

atacseq's Introduction

ATAC-Seq Pipeline Installation

git clone https://github.com/tobiasrausch/ATACseq.git

cd ATACseq

make all

If one of the above commands fail your operating system probably lacks some build essentials. These are usually pre-installed but if you lack them you need to install these. For instance, for Ubuntu this would require:

apt-get install build-essential g++ git wget unzip

Building promoter regions for QC and downloading motifs

To annotate motifs and estimate TSS enrichments some simple scripts are included in this repository to download these databases.

cd bed/ && Rscript promoter.R && cd ..

cd motif/ && ./downloadMotifs.sh && cd ..

Running the ATAC-Seq analysis pipeline for a single sample

./src/atac.sh <hg38|hg19|mm10> <read1.fq.gz> <read2.fq.gz> <genome.fa> <output prefix>

Plotting the key ATAC-Seq Quality Control metrics

The pipeline produces at various steps JSON QC files (*.json.gz). You can upload and interactively browse these files at https://gear-genomics.embl.de/alfred/. In addition, the pipeline produces a succinct QC file for each sample. If you have multiple output folders (one for each ATAC-Seq sample) you can simply concatenate the QC metrics of each sample.

head -n 1 ./*/*.key.metrics | grep "TssEnrichment" | uniq > summary.tsv

cat ./*/*.key.metrics | grep -v "TssEnrichment" >> summary.tsv

To plot the distribution for all QC parameters.

Rscript R/metrics.R summary.tsv

ATAC-Seq pipeline output files

The ATAC-Seq pipeline produces various output files.

  • Bowtie BAM alignment files filtered for duplicates and mitochondrial reads.
  • Quality control output files from alfred, samtools, FastQC and cutadapt adapter filter metrics.
  • Macs peak calling files and IDR filtered peak lists.
  • Succinct browser tracks in bedGraph format and IGV's tdf format.
  • Footprint track of nucleosome positions and/or transcription factor bound DNA.
  • Homer motif finding results.

Differential peak calling

Merge peaks across samples and create a raw count matrix.

ls ./Sample1/Sample1.peaks ./Sample2/Sample2.peaks ./SampleN/SampleN.peaks > peaks.lst

ls ./Sample1/Sample1.bam ./Sample2/Sample2.bam ./SampleN/SampleN.bam > bams.lst

./src/count.sh hg19 peaks.lst bams.lst <output prefix>

To call differential peaks on a count matrix for TSS peaks, called counts.tss.gz, using DESeq2 we first need to create a file with sample level information (sample.info). For instance, if you have 2 replicates per condition:

echo -e "name\tcondition" > sample.info

zcat counts.tss.gz | head -n 1 | cut -f 5- | tr '\t' '\n' | sed 's/.final$//' | awk '{print $0"\t"int((NR-1)/2);}' >> sample.info

Rscript R/dpeaks.R counts.tss.gz sample.info

Intersecting peaks with annotation tracks

Peaks can of course be intersected with enhancer or conserved element tracks, i.e.:

cd tracks/ && downloadTracks.sh

bedtools intersect -a ./Sample2/Sample2.peaks -b tracks/conserved.bed

Plotting peak density along all chromosomes

There is a basic Rscript available for plotting peak densities.

Rscript R/karyoplot.R input.peaks

Citation

Tobias Rausch, Markus Hsi-Yang Fritz, Jan O Korbel, Vladimir Benes.
Alfred: Interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing.
Bioinformatics. 2018 Dec 6.

B Erarslan, JB Kunz, T Rausch, P Richter-Pechanska et al.
Chromatin accessibility landscape of pediatric T‐lymphoblastic leukemia and human T‐cell precursors
EMBO Mol Med (2020)

License

This ATAC-Seq pipeline is distributed under the BSD 3-Clause license.

atacseq's People

Contributors

tobiasrausch avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

atacseq's Issues

bowtie step

Dear,
Thank you for this project for ATAC-Seq analyse.
I don't understand how Bowtie step could work, line 46 in align.sh:
bowtie2 --threads ${THREADS} --local --maxins 2000 -x ${HG} -1 ${OUTP}.1.fq.gz -2 ${OUTP}.2.fq.gz 2> ${OUTP}.bowtie.log | samtools view -bT ${HG} - > ${OUTP}.raw.bam

-x ${HG} requires a Bowtie index
-bT ${HG] requires a fasta file

So either I have a error of Bowtie2 ... does not exist or is not a Bowtie 2 index

So what should I indicate as <genome.fa> argument in atac.sh? Or should I indicate the Bowtie index in align.sh?

Best,

Trim fastq.gz step issue

Hi Trausch,

I wonder about one step (cut adapter step) in your pipeline. You set a specific "sequence" to remove the adapter, and I don't know whether this could affect the outcome (follow your pipeline). Is there any chance to take this step automatically or take our option?

This is a great pipeline (ALL in one)!

Thanks,
Haitao

Issue with annotatePeak in ChIPseeker

Hi Tobias,

I am following your ATACseq tutorial and am currently at step 2.9 (Annotate genomic context). I'm able to make the gr object from the data frame. However, when I try to use the annotatePeak function from Dr.Yu's ChIPseeker, I get the following error:

My code (from the tutorial):

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
peakAnno <- annotatePeak(gr, tssRegion=c(-1000, 1000), TxDb=txdb, annoDb="org.Hs.eg.db", verbose=TRUE)

Terminal Output:

preparing features information... 2021-12-08 03:28:04 PM
identifying nearest features... 2021-12-08 03:28:04 PM
Error in which(strand(subject) != "-") :
argument to 'which' is not logical

I interrogated strand(gr) for the sample dataset in your tutorial:
factor-Rle of length 57891 with 1 run
Lengths: 57891
Values : *
Levels(3): + - *

I wonder if the function is having trouble with the factor-Rle object. Are you able to help with this?

Many thanks,

George

The downloaded jaspar.zip seems not a zip file

For the motif/downloadMotifs.sh script,

curl --output jaspar.zip 'http://jaspar.genereg.net/download/CORE/JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.zip'

It seems that the downloaded zip file is actually not a zip file. I manually downloaded the zip file and then the rest lines work.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.