wacs's Introduction

WACS: Improving Peak Calling by Optimally Weighting Controls

Introduction

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) allows biologists to identify protein/DNA binding and histone modifications across the genome. Like every experiment, ChIP-seq is prone to noise and bias. Incorporating control datasets into ChIP-seq analysis is therefore an essential step: the controls account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. A recurrent issue, however, is that different ChIP-seq experiments are subject to different types of bias. Depending on which controls are used, different aspects of that bias are accounted for better or worse, so peak calling can produce different results for the same ChIP-seq experiment.

We introduce WACS (Weighted Analysis of ChIP-seq), an extension of the well-known peak caller MACS2. It allows the use of "smart" controls to model the non-signal effect for a specific ChIP-seq experiment. WACS first estimates a weight for each control so that the weighted controls model the background distribution of that ChIP-seq experiment. Peak calling then follows.
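To make "a weight for each control" concrete, the following is a sketch of one possible formulation. It assumes the weights come from a non-negative least squares fit of binned treatment read counts on binned control read counts, which is our reading of the cited paper and is not stated in this README. The symbols are introduced here for illustration only: $t_i$ is the treatment read count in genomic bin $i$, $c_{ij}$ is the read count of control $j$ in bin $i$, $K$ is the number of controls, and $N$ is the number of bins.

```latex
\hat{w} = \arg\min_{w \ge 0} \sum_{i=1}^{N} \Bigl( t_i - \sum_{j=1}^{K} w_j \, c_{ij} \Bigr)^2,
\qquad
b_i = \sum_{j=1}^{K} \hat{w}_j \, c_{ij}
```

Under this reading, $b_i$ is the weighted background that stands in for a single pooled control during peak calling.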

Install

Make sure Part1.sh, getWeights.py, ManipulateData.py and getData.py are in the same directory. Python, BEDtools and SAMtools must also be available in your environment.
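The requirements above can be checked before running anything. The sketch below is illustrative and not part of WACS: the helper names missing_commands and missing_files are hypothetical, and the executable names (python, bedtools, samtools) are assumptions that may differ on your system.

```sh
# Pre-flight check for the WACS scripts and external tools (illustrative sketch).

# Print every argument that is NOT a command available on PATH.
missing_commands() (
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
)

# Print every file name (arguments 2..n) that is NOT present in directory $1.
missing_files() (
    dir=$1
    shift
    for f in "$@"; do
        [ -f "$dir/$f" ] || echo "$f"
    done
)

# Example, run from the directory that holds Part1.sh:
#   missing_commands python bedtools samtools
#   missing_files . Part1.sh getWeights.py ManipulateData.py getData.py
```

Both helpers print nothing when everything is in place, so empty output means the setup matches the description above.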

Usage

Step 1. Generate the weights per control sample.

./Part1.sh controlDir treatmentFile chromSize

Inputs:

(a) a path to a directory holding all the control BAM files -- required.

(b) full path to the treatment BAM file -- required.

(c) full path to a file containing the chromosome sizes of the species genome -- optional. Each line gives a chromosome name and its length, e.g.: chr1 248956422. If no chromosome size file is provided, the chromosome sizes of the human genome are used by default.

The BAM files can be either (1) unsorted or unindexed BAM files, or (2) sorted and indexed BAM files.
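For a non-human genome, a chromosome size file for input (c) could be derived from the header of one of the BAM files. This is a minimal sketch, not part of WACS: the function name sq_to_chrom_sizes is hypothetical, and it assumes Part1.sh accepts a tab-separated name/length file, which this README does not confirm.

```sh
# Convert SAM header text on stdin into "name<TAB>length" lines.
# Only @SQ records are used; their SN: tag is the sequence name and
# their LN: tag is the sequence length.
sq_to_chrom_sizes() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            if (name != "" && len != "") print name "\t" len
        }'
}

# Example (treatment.bam and chrom.sizes are placeholder names):
#   samtools view -H treatment.bam | sq_to_chrom_sizes > chrom.sizes
```

The sizes come from the reference the reads were aligned to, so the file is only meaningful if the treatment and all controls were aligned to the same assembly.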

Step 2. Call peaks.

For each ChIP-seq sample, read the control names and their corresponding weights from its coefficients file and pass them to macs2 callpeak_wacs.

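# Derive the sample name from the treatment BAM file (name.bam -> name)
FILENAME=$(basename name.bam)
BASE=$(echo "$FILENAME" | cut -d. -f1)
coefFile=Coefficients/$BASE.coefficients.csv
ControlBamDir=BAM/Control
if [ -e "$coefFile" ]
then
    # Column 1 holds the control names, column 2 the matching weights
    controlNames=$(cut -d, -f1 "$coefFile")
    controlWeights=$(cut -d, -f2 "$coefFile")
    controlNames_full=""
    # Build the full path of every control BAM file
    for i in $controlNames; do controlNames_full+="$ControlBamDir/$i.bam "; done
    # Left unquoted on purpose so each name and weight is a separate argument
    macs2 callpeak_wacs -t name.bam -c $controlNames_full -w $controlWeights
fi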
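The per-sample steps above can be wrapped in a helper and applied to several ChIP-seq samples. The sketch below is a dry run: it only prints the macs2 callpeak_wacs command instead of executing it. The function name wacs_command and the BAM/Treatment directory are hypothetical, and it assumes the coefficients file has no header line, uses Unix line endings, and holds one name,weight pair per line, as the cut calls above suggest.

```sh
# Print the callpeak_wacs command for one treatment BAM file.
#   $1 = treatment BAM, $2 = coefficients CSV (name,weight), $3 = control BAM directory
# Paths containing spaces are not supported by this sketch.
wacs_command() (
    treatment=$1
    coef_file=$2
    control_dir=$3
    controls=""
    weights=""
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS=, read -r name weight rest || [ -n "$name" ]; do
        [ -n "$name" ] || continue        # skip blank lines
        controls="$controls $control_dir/$name.bam"
        weights="$weights $weight"
    done < "$coef_file"
    echo "macs2 callpeak_wacs -t $treatment -c$controls -w$weights"
)

# Example loop over all treatment files (BAM/Treatment is a placeholder):
#   for t in BAM/Treatment/*.bam; do
#       base=$(basename "$t" | cut -d. -f1)
#       coef=Coefficients/$base.coefficients.csv
#       [ -e "$coef" ] && wacs_command "$t" "$coef" BAM/Control
#   done
```

Printing the command first makes it easy to confirm that the number of control files matches the number of weights before any peak calling is run.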

Citation

If you use this code for your research, please cite our paper:

Awdeh, Aseel, Marcel Turcotte, and Theodore J. Perkins. "WACS: improving ChIP-seq peak calling by optimally weighting controls." BMC bioinformatics 22.1 (2021): 1-21.

wacs's People

Contributors

aawdeh

wacs's Issues

ImportError: cannot import name quick_pileup

I created a fresh conda environment, installed Python (v2.7.9), and then installed samtools and bedtools. I also installed numpy, scipy and pandas through pip, as mentioned in the README file. Then I ran "python setup.py install" to install the provided macs2 algorithm. macs2 installed successfully, and the callpeak_wacs subcommand is available.

Next I performed peak calling with the callpeak_wacs subcommand, using the BAM files (test and control) and the respective control weights as input. This is the command I used:

macs2 callpeak_wacs \
    -t IP_B5_R1_sorted_unique.bam \
    -c MOCK_B5_R1_sorted_unique.bam 20190125_A-MOCK_B6_R1_sorted_unique.bam \
    -w 0.13866679257704465 0.42968144125181074 \
    -g 3.73e+08 \
    -f BAM \
    -n B5 \
    --outdir B5_Out

This returned the following error.

Traceback (most recent call last):
  File "/home/anaconda3/envs/py279_2/bin/macs2", line 4, in <module>
    __import__('pkg_resources').run_script('MACS2==2.1.3', 'macs2')
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1517, in run_script
    exec(code, namespace, namespace)
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/MACS2-2.1.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 754, in <module>
    main()
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/MACS2-2.1.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 61, in main
    from MACS2.callpeakw_cmd import run
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/MACS2-2.1.3-py2.7-linux-x86_64.egg/MACS2/callpeakw_cmd.py", line 34, in <module>
    from MACS2.OptValidator import opt_validate_wacs
  File "/home/anaconda3/envs/py279_2/lib/python2.7/site-packages/MACS2-2.1.3-py2.7-linux-x86_64.egg/MACS2/OptValidator.py", line 27, in <module>
    from MACS2.IO.Parser import BEDParser, ELANDResultParser, ELANDMultiParser,
  File "MACS2/IO/Parser.pyx", line 27, in init MACS2.IO.Parser
  File "MACS2/IO/FixWidthTrack.pyx", line 31, in init MACS2.IO.FixWidthTrack
  File "MACS2/Pileup.pyx", line 23, in init MACS2.Pileup
  File "MACS2/IO/PairedEndTrack.pyx", line 25, in init MACS2.IO.PairedEndTrack
ImportError: cannot import name quick_pileup

Please let me know whether I made a mistake in the peak calling command or in the installation.
