macs3-project / macs

MACS -- Model-based Analysis of ChIP-Seq

Home Page: https://macs3-project.github.io/MACS/

License: BSD 3-Clause "New" or "Revised" License

Shell 3.09% Python 27.01% C 17.49% Makefile 0.12% C++ 0.86% Cython 51.43%

chip-seq atac-seq dnase-seq peak-caller python poisson-equation macs

macs's Introduction

MACS: Model-based Analysis for ChIP-Seq

Introduction

With improvements in sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is becoming a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, we presented Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. MACS can easily be used on ChIP-Seq data alone, or with a control sample to increase specificity. Moreover, as a general peak caller, MACS can also be applied to any "DNA enrichment assay" if the question to be asked is simply: where can we find read coverage that is significantly higher than the random background?

Please find the MACS3 documentation on the MACS3 website.

Contribute

Please read our CODE OF CONDUCT and How to contribute documents. If you have any questions or suggestions/ideas, or just want to have conversations with developers and other users in the community, we recommend using MACS Discussions instead of posting to our Issues page.

Acknowledgement

The MACS3 project is sponsored by CZI EOSS. We particularly want to thank the user community for their support, feedback, and contributions over the years.

Citation

2008: Model-based Analysis of ChIP-Seq (MACS)

macs's Issues

bdgpeakcall missing peaks

When looking at score bedGraphs generated by bdgcomp, regions that are above the threshold and should be called as peaks are missed by bdgpeakcall. For example:

chr2 0 1 0.00000
chr2 1 1157 0.16251
chr2 1157 1173 0.59598
chr2 1173 1177 1.92692
chr2 1177 1192 2.25381
chr2 1192 1251 3.78114
chr2 1251 1260 5.39507
chr2 1260 1278 4.89302
chr2 1278 1287 6.50066
chr2 1287 1307 6.02890
chr2 1307 1327 7.63021
chr2 1327 1352 9.31474
chr2 1352 1390 8.77969
chr2 1390 1391 8.62740
chr2 1391 1438 8.46170
chr2 1438 1441 8.94880
chr2 1441 1471 7.30162
chr2 1471 1475 7.81930
chr2 1475 1488 6.17264
chr2 1488 1530 4.61371
chr2 1530 1540 5.05366
chr2 1540 1543 5.60691
chr2 1543 1547 3.94172
chr2 1547 1551 5.60691
chr2 1551 1563 3.94172
chr2 1563 1564 3.49481
chr2 1564 1576 3.14424
chr2 1576 1606 1.74960
chr2 1606 1629 0.55045
chr2 1629 5718 0.16251
chr2 5718 5719 0.13415
chr2 5719 51113 0.16251
chr2 51113 51114 0.09686
chr2 51114 81245 0.16251
chr2 81245 81246 0.05880
chr2 81246 82785 0.16251
chr2 82785 82799 0.04603
chr2 82799 86160 0.16251
chr2 86160 86161 0.09345
chr2 86161 86638 0.16251
chr2 86638 86645 0.07291
chr2 86645 105818 0.16251

Even with lenient parameters (-l 10 -c 2 -g 2000), I cannot get the peak in this region to be called.
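
For reference, a naive threshold-and-merge pass over the intervals above does report a region at these settings. This is a minimal sketch and not bdgpeakcall's implementation: the function `call_regions` and its exact semantics (score strictly above the cutoff, gaps up to `max_gap` merged, regions shorter than `min_length` dropped) are assumptions made for illustration.

```python
# Naive threshold-based region calling over bedGraph intervals.
# NOTE: illustrative only; this is not the bdgpeakcall algorithm.

def call_regions(intervals, cutoff, min_length, max_gap):
    """Return (start, end) regions whose score is above `cutoff`."""
    regions = []
    for start, end, score in intervals:
        if score <= cutoff:
            continue
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = end          # extend the previous region
        else:
            regions.append([start, end])  # open a new region
    return [(s, e) for s, e in regions if e - s >= min_length]


# chr2 intervals copied from the report: (start, end, score).
chr2 = [
    (1157, 1173, 0.59598), (1173, 1177, 1.92692), (1177, 1192, 2.25381),
    (1192, 1251, 3.78114), (1251, 1260, 5.39507), (1260, 1278, 4.89302),
    (1278, 1287, 6.50066), (1287, 1307, 6.02890), (1307, 1327, 7.63021),
    (1327, 1352, 9.31474), (1352, 1390, 8.77969), (1390, 1391, 8.62740),
    (1391, 1438, 8.46170), (1438, 1441, 8.94880), (1441, 1471, 7.30162),
    (1471, 1475, 7.81930), (1475, 1488, 6.17264), (1488, 1530, 4.61371),
    (1530, 1540, 5.05366), (1540, 1543, 5.60691), (1543, 1547, 3.94172),
    (1547, 1551, 5.60691), (1551, 1563, 3.94172), (1563, 1564, 3.49481),
    (1564, 1576, 3.14424), (1576, 1606, 1.74960), (1606, 1629, 0.55045),
]

print(call_regions(chr2, cutoff=2.0, min_length=10, max_gap=2000))
# → [(1177, 1576)]
```

Under these assumptions the expected call spans chr2:1177-1576, which is the region the report says is missing.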

Broad peaks adding extra region

MACS2-2.0.10.09132012

example:

This is a single peak that is isolated, peak.bed file entry:
chr6 124894207 124894537 H3K4me1_peak_81597 6.49299

The broad peak bed12 file has it like this
chr6 124893066 124894537 H3K4me1_peak_40500 6 . 124894207 124894537 0 2 1,330 0,1141

Shouldn't this bed12 entry be this:
chr6 124894207 124894537 H3K4me1_peak_40500 6 . 124894207 124894537 0 1 330 0

end_example

The above example adds a 3' extension to the peak call. Other instances add a 5' extension, or both 3' and 5' extensions.
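
To see exactly which bases the reported bed12 line covers, the block columns can be expanded into absolute coordinates. The sketch below follows the standard BED12 layout (columns 11 and 12 hold block sizes and block starts relative to chromStart); the helper name `bed12_blocks` is made up for this example.

```python
# Expand the blocks of a BED12 line into absolute (start, end) pairs.

def bed12_blocks(line):
    fields = line.split()
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    return [(chrom_start + s, chrom_start + s + n)
            for s, n in zip(starts, sizes)]


reported = ("chr6 124893066 124894537 H3K4me1_peak_40500 6 . "
            "124894207 124894537 0 2 1,330 0,1141")
print(bed12_blocks(reported))
# → [(124893066, 124893067), (124894207, 124894537)]
```

The second block matches the entry in peak.bed; the first is a single base at the start of the widened region.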

Add options for customized scaling factors

Adding separate normalisation factor for size difference between input and treatment

Just to revive an issue that has been raised before. It has been shown that for some datasets (particularly those with larger depths) the normalization between the Input and the IP has a critical effect on the FDR calculation. There have been some attempts to develop estimates of the normalizing factor that seem to work nicely across a wide range of sequencing depths (see for example: http://www.biomedcentral.com/1471-2105/13/199).

I've seen some forks from version 1.4 that include an option to give your own normalization factor (instead of using the linear normalization with --to-large or --down-scale). I think it would be great if such an option could be included in the main macs2 callpeak code (and not terribly complicated as far as I can tell). What do you think?

Best, and thanks for the program!
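
A rough sketch of what the request amounts to: scale the control signal by a user-supplied factor when one is given, and fall back to linear scaling by read depth otherwise. The function name and the `factor` argument are hypothetical and do not correspond to an existing macs2 option.

```python
# Scale a control pileup either by read depth or by a user-supplied factor.
# NOTE: `factor` is a hypothetical parameter used only for illustration.

def scale_control(control_pileup, n_treat, n_control, factor=None):
    if factor is None:
        factor = n_treat / n_control   # linear normalization by depth
    return [value * factor for value in control_pileup]


print(scale_control([2.0, 4.0], n_treat=10, n_control=20))
# → [1.0, 2.0]
print(scale_control([2.0, 4.0], n_treat=10, n_control=20, factor=0.25))
# → [0.5, 1.0]
```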

format and use of gappedPeak output file?

Hello,
Thanks for making the tool available. I am testing out macs2 to see how to use it for downstream analysis with MEDIPS or ChIPpeakAnno from R's Bioconductor.
I ran the following command:

macs2 callpeak -t LT1.bam LT2.bam LT3.bam LT4.bam -c C5.bam C6.bam C7.bam C8.bam --broad -n RatMeDIP -f BAM -g 1949947921 --outdir macs2

Among the output files, I don't understand the format and use of the .gappedPeak file. Could you update the README file so that it explains the format and use of this file?

Thanks

Support data w/o control in MACS2 [Orig Title:NameError: poisson_cdf]

Getting this error, not getting a cdf from the right place?

Traceback (most recent call last):
  File "/usr/local/bin/macs2", line 354, in <module>
    main()
  File "/usr/local/bin/macs2", line 171, in main
    peakdetect.call_peaks()
  File "cPeakDetect.pyx", line 83, in MACS2.cPeakDetect.PeakDetect.call_peaks (MACS2/cPeakDetect.c:1266)
  File "cPeakDetect.pyx", line 316, in MACS2.cPeakDetect.PeakDetect.__call_peaks_wo_control (MACS2/cPeakDetect.c:4306)
  File "cBedGraph.pyx", line 486, in MACS2.IO.cBedGraph.bedGraphTrackI.apply_func (MACS2/IO/cBedGraph.c:6431)
  File "cPeakDetect.pyx", line 316, in MACS2.cPeakDetect.__call_peaks_wo_control.lambda6 (MACS2/cPeakDetect.c:3835)
NameError: poisson_cdf
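
For readers unfamiliar with the quantity the failing lambda is looking up, a cumulative Poisson probability can be computed as below. This is a plain-Python sketch under the textbook definition; the name `poisson_cdf_sketch` is deliberately different from MACS2's own `poisson_cdf`, whose signature is not shown in the report.

```python
import math

# Textbook Poisson CDF, summed term by term.
# NOTE: not MACS2's poisson_cdf; an independent illustration.

def poisson_cdf_sketch(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)      # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i        # P(X = i) derived from P(X = i - 1)
        total += term
    return total


def poisson_upper_tail(k, lam):
    """P(X > k): chance of seeing more than k tags at background rate lam."""
    return 1.0 - poisson_cdf_sketch(k, lam)


print(poisson_upper_tail(10, 2.0))
```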

Getting the number of reads per peak

Hi,

I wanted to check whether I can recover from the output of MACS2 the number of reads falling within a peak. Is this provided in the narrowPeak file as the column 5 entitled "5th: integer score for display"?

Thanks,
Oana
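
Column 5 of the narrowPeak format is a display score rather than a read count, so counts generally have to be derived from the alignments themselves. A minimal sketch, assuming read positions for one chromosome are already available as a list of integers and peaks are half-open (start, end) intervals; the helper name is made up for this example.

```python
from bisect import bisect_left

# Count reads whose position falls inside each half-open peak [start, end).

def count_reads_in_peaks(read_positions, peaks):
    positions = sorted(read_positions)
    return [bisect_left(positions, end) - bisect_left(positions, start)
            for start, end in peaks]


print(count_reads_in_peaks([5, 10, 15, 20, 25], [(10, 20), (0, 5)]))
# → [2, 0]
```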

IndexError: list index out of range

INFO @ Wed, 16 Oct 2013 14:27:12: #1 read tag files...
INFO @ Wed, 16 Oct 2013 14:27:12: #1 read treatment tags...
INFO @ Wed, 16 Oct 2013 14:27:17: 1000000
INFO @ Wed, 16 Oct 2013 14:27:22: 2000000
INFO @ Wed, 16 Oct 2013 14:27:27: 3000000
INFO @ Wed, 16 Oct 2013 14:27:32: 4000000
INFO @ Wed, 16 Oct 2013 14:27:37: 5000000
INFO @ Wed, 16 Oct 2013 14:27:42: 6000000
INFO @ Wed, 16 Oct 2013 14:27:47: 7000000
INFO @ Wed, 16 Oct 2013 14:27:52: 8000000
INFO @ Wed, 16 Oct 2013 14:27:57: 9000000
INFO @ Wed, 16 Oct 2013 14:28:03: 10000000
INFO @ Wed, 16 Oct 2013 14:28:08: 11000000
INFO @ Wed, 16 Oct 2013 14:28:13: 12000000
INFO @ Wed, 16 Oct 2013 14:28:18: 13000000
INFO @ Wed, 16 Oct 2013 14:28:23: 14000000
INFO @ Wed, 16 Oct 2013 14:28:28: 15000000
INFO @ Wed, 16 Oct 2013 14:28:33: 16000000
INFO @ Wed, 16 Oct 2013 14:28:37: 17000000
INFO @ Wed, 16 Oct 2013 14:28:42: 18000000
INFO @ Wed, 16 Oct 2013 14:28:47: 19000000
INFO @ Wed, 16 Oct 2013 14:28:52: 20000000
INFO @ Wed, 16 Oct 2013 14:28:57: 21000000
Traceback (most recent call last):
  File "/usr/bin/macs2", line 514, in <module>
    main()
  File "/usr/bin/macs2", line 45, in main
    run( args )
  File "/usr/lib64/python2.7/site-packages/MACS2/callpeak.py", line 69, in run
    else: (treat, control) = load_tag_files_options (options)
  File "/usr/lib64/python2.7/site-packages/MACS2/callpeak.py", line 379, in load_tag_files_options
    treat = tp.build_fwtrack()
  File "cParser.pyx", line 824, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10311)
  File "cParser.pyx", line 828, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10262)
  File "cParser.pyx", line 898, in MACS2.IO.cParser.BAMParser.__build_fwtrack_wo_pysam (MACS2/IO/cParser.c:11404)
IndexError: list index out of range

MACS14 doesn't recognize paired-end reads aligned to minus strand

Dear All.

I have paired-end BAM files for my treatment and control samples. When I run

macs14 -t treatment.bam -c control.bam -n output -g hs -B -S -p 0.05

I get the following warnings for all chromosomes on each file (treatment, and control):

WARNING @ Wed, 11 Dec 2013 12:17:35: NO records for chromosome chr1, minus strand!
.
WARNING @ Wed, 11 Dec 2013 12:17:35: NO records for chromosome chrY, minus strand!

I checked the BAM files, and there are reads aligned to the negative strand (across the files the sam flags are: 77, 83, 99, 141, 147, and 163). This problem doesn't show up when I run MACS on single-end libraries (sam flags: 0, and 16).

I would appreciate any idea about how to solve or circumvent this problem.

Thanks in advance,
Marcelo
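
The SAM flags listed in this report can be decoded bit by bit to confirm which of them mark a reverse-strand alignment. The bit meanings below come from the SAM specification; the helper name `decode_flag` is made up for this example.

```python
# Decode the low eight SAM flag bits into readable names.

FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
}


def decode_flag(flag):
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (77, 83, 99, 141, 147, 163):
    print(flag, decode_flag(flag))

reverse_flags = [f for f in (77, 83, 99, 141, 147, 163) if f & 0x10]
print(reverse_flags)
# → [83, 147]
```

Of the six flags, 83 and 147 carry the reverse-strand bit, while 77 and 141 describe unmapped pairs.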

bdgdiff calling problem

I used bdgdiff to find different/common regions from two samples. The program ran successfully, but when I visualized them in UCSC genome browser, I found this region doesn't make sense.

As you see in the image, the first track is sample1_vs_sample2_cond2.bed, the second track is sample2_treat_pileup.bw, and the third track is sample1_treat_pileup.bw.
Intuitively, the peaks shown in samples 1 and 2 look like common peaks; however, macs2 bdgdiff considered them different peaks.

In case you need more information, here are the peak excel files from macs2 callpeak.

sample 1

chr	start	end	length	abs_summit	pileup	-LOG10(pvalue)	fold_enrichment	-LOG10(qvalue)	name
chrUextra	27371424	27371889	466	27371713	235	53.31093	3.29706	49.35984	zr785_6_minus_control_peak_1858

sample 2

chr	start	end	length	abs_summit	pileup	-LOG10(pvalue)	fold_enrichment	-LOG10(qvalue)	name
chrUextra	27371432	27371888	457	27371731	214	63.16993	4.0555	58.86643	zr785_7_minus_control_peak_1483

sample1_vs_sample2_cond2.bed
chrUextra 27371569 27371799 sample1_vs_sample2_cond2_73 3.70122

macs2 output files

Hi Everyone,

I have recently started using MACS2 for peak calling. I want to understand some columns in the output file NAME_peaks.xls.
What does the 5th column, the absolute peak summit position, tell me?
What does the pileup height at the peak summit mean?

What is the difference between calling macs2 callpeak with -q 0.01 and with -q 0.05?

I also read about the -m MFOLD option but could not understand it properly.

Hope to hear from you soon

Regards
Varun

Logical error in code

https://github.com/taoliu/MACS/blob/2.1.0.20140616/MACS2/IO/cFixWidthTrack.pyx#L284

'n' is the number of repeated tags, and it is not re-initialized to 1 for each chromosome.
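
The kind of per-chromosome duplicate filter this report describes can be sketched as follows, with the repeat counter reset at the start of every chromosome. This is an illustration written for this page, not the code at the linked line; the function name and the dictionary input are assumptions.

```python
# Keep at most `max_dup` tags per position, chromosome by chromosome.
# NOTE: illustrative only; not the code from cFixWidthTrack.pyx.

def filter_duplicates(tags_by_chrom, max_dup):
    kept = {}
    for chrom, positions in tags_by_chrom.items():
        out = []
        n = 1            # repeat counter, reset for every chromosome
        previous = None  # last position seen on this chromosome
        for pos in sorted(positions):
            if pos == previous:
                n += 1
            else:
                n = 1
                previous = pos
            if n <= max_dup:
                out.append(pos)
        kept[chrom] = out
    return kept


print(filter_duplicates({"chr1": [5, 5, 5], "chr2": [5, 5, 7]}, max_dup=1))
# → {'chr1': [5], 'chr2': [5, 7]}
```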

ImportError when use macs2

I ran python setup.py install on an Ubuntu server.
However, when I run "macs2" in the terminal, it shows this:

Traceback (most recent call last):
  File "/usr/local/bin/macs2", line 35, in <module>
    from MACS2.OptValidator import opt_validate
  File "/usr/local/lib/python2.6/dist-packages/MACS2/OptValidator.py", line 26, in <module>
    from MACS2.IO.cParser import BEDParser, ELANDResultParser, ELANDMultiParser, ELANDExportParser, SAMParser, BAMParser, BowtieParser, guess_parser
ImportError: /usr/local/lib/python2.6/dist-packages/MACS2/IO/cParser.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

I've installed numpy and cython on this server, but when I went into the directory /usr/local/lib/python2.6/dist-packages/MACS2/IO, I found that cFeatIO.so was not there. I copied it from another server and put it there, but it didn't work.
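
An undefined `PyUnicodeUCS2_*` symbol typically indicates that an extension module was compiled against a narrow-Unicode (UCS2) Python 2 build and is being loaded by a wide (UCS4) interpreter, which would also fit a .so file copied from another server. A quick check of the running interpreter, as a sketch (the function name is made up here; on Python 3.3 and later the result is always the wide build):

```python
import sys

# Report whether the running interpreter is a narrow or wide Unicode build.

def unicode_build():
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


print(unicode_build())
```

If the two servers report different builds, the extension would need to be rebuilt on the target machine rather than copied.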

MACS2 install issue

I am trying to install MACS2 on a computer running OSX 10.9.2. I have Python 2.7.5, and numpy is properly installed. I get the error message below whether I use easy_install or download from PyPI (and run python setup.py install). Any help would be appreciated. Thanks.

easy_install MACS2

...

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

clang: note: this will be a hard error (cannot be downgraded to a warning) in the future

error: Setup script exited with error: command 'cc' failed with exit status 1

Issue with peak calls

Hi,

I have been using MACS2 to call peaks for transcription factors using the BAM files from the ENCODE database (for the HeLaS3 cell line). Specifically, for any two given transcription factors (TFs), I am interested in picking sites that are bound by only one of them. So I called peaks using different q-value cut-offs: q 0.001 for TF1 ("E2f1" in my case), the peak I want bound, and q 0.1 (or 0.2) for TF2 ("E2f4" in my case), which I don't want bound at the same region where TF1 is bound. The idea was that this would give us E2f1 peaks with few false positives and E2f4 peaks with more false positives (including peaks that we are not so confident about), so that after discarding any E2f1 peak that overlaps such an E2f4 peak (generated using the lenient q-value cut-off), we would be left with peaks that we know are bound only by E2f1.

However, when I implemented this protocol, I came across a number of E2f1 peaks that had significant overlapping E2f4 pileup and therefore did not look uniquely bound by E2f1; the regions looked like peaks should have been called for both E2f1 and E2f4. MACS2 called peaks only for E2f1, not for E2f4, in those regions.

I have attached a plot of pileup coverage for both E2f1 and E2f4 in one such region that was supposed to be a binding site unique to E2f1. The title of the plot gives the peak chromosome and position; the red bars in the plot mark the actual boundaries of the E2f1 peak.

Currently I am using further downstream filtering for this, but I just wanted to let you guys know of this issue.

Thank you.

Best,
Dinesh Manandhar,
Duke University.
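
The filtering step described in this report can be sketched as a simple interval-overlap test. The tuple layout (chrom, start, end) and the function names are assumptions for illustration; note that such a filter only sees regions where a peak was actually called for the second factor, so sub-threshold pileup passes through unnoticed.

```python
# Keep peaks from the first set that overlap no peak in the second set.
# Peaks are (chrom, start, end) with half-open coordinates.

def overlaps(a, b):
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unique_to_first(peaks_a, peaks_b):
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]


e2f1 = [("chr1", 100, 200), ("chr1", 500, 600)]
e2f4 = [("chr1", 150, 250)]
print(unique_to_first(e2f1, e2f4))
# → [('chr1', 500, 600)]
```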

'Keep duplicates' always ignores duplicates if tagged accordingly in SAM/BAM files

From the name of the parameter, it appears that "--keep-dup all" should use all duplicate reads regardless of how they are flagged. However, when a SAM/BAM file contains reads that are properly tagged as PCR or optical duplicates, these reads are ignored at the IO stage.

cParser.pyx contains the following line of code (twice, lines 566 & 594 in current master)
if bwflag & 4 or bwflag & 512 or bwflag & 1024:
which filters reads with corresponding flags (i.e. 1024 for duplicates), therefore ignoring the user's request.

This bug(?) seems to be valid for both macs2 and macs14
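
One way the flag test quoted in this report could honor the user's choice is to treat the duplicate bit separately from the unmapped and QC-fail bits. This is a hypothetical sketch: `should_skip` and `keep_marked_duplicates` are names invented here, not part of cParser.pyx.

```python
# SAM flag bits referenced in the report.
FLAG_UNMAPPED = 0x4      # 4
FLAG_QC_FAIL = 0x200     # 512
FLAG_DUPLICATE = 0x400   # 1024


def should_skip(bwflag, keep_marked_duplicates=False):
    """Decide whether a read is dropped at parse time (hypothetical logic)."""
    if bwflag & FLAG_UNMAPPED or bwflag & FLAG_QC_FAIL:
        return True
    if bwflag & FLAG_DUPLICATE and not keep_marked_duplicates:
        return True
    return False


print(should_skip(1024))                               # → True
print(should_skip(1024, keep_marked_duplicates=True))  # → False
```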

How peaks are calculated ?

I noticed that after callpeak command with treat vs control MeDIP samples, I don't have peaks with fold-changes < 1, which, I suppose, are demethylated spots. Is it correct to expect fold-change < 1 for demethylated peaks? In general, a plain English explanation of how peaks and fold-changes are calculated will be very helpful.

I tried looking for that information in the publication http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120977 but could not find anything. The last resort could be to look into the source code, but I prefer to have the authors' words first.

cannot run bdgdiff or diffpeak

I got ImportError: No module named cDiffScore error when running it. I checked my /usr/local/lib/python2.7/dist-packages/MACS2/IO, and cannot find cDiffScore.pyx inside. I also tried to download your source code from both github master and pypi. I found cDiffScore.pyx inside MACS2/IO, and I can use

import pyximport
pyximport.install()
from cDiffScore import DiffScoreTrackI

to compile it. I copied all files in IO after compiling to /usr/local/lib/python2.7/dist-packages/MACS2/IO. It still doesn't work. Do you have any idea to solve this problem?

Pileup values with non-zero decimals.. (unexpected..?)

Hi,

I recently ran MACS2 to call peaks in some human transcription factors in Helas3 cell line, and obtained pileup.bdg output where the pileup values were not natural numbers as one would expect; I got numbers with non-zero decimal pileup values. I was wondering if somebody could please explain why this might be the case.

Thank you.

ps. I have attached a png snapshot of the data herewith too. (The following is from the MACS2 pileup.bdg output on Helas3 Ap2gamma transcription factor).

gcc install

Dear Sir:

[root@biocc mpc-1.0.2]# ./configure --with-gmp-include=/usr/local/include --with-gmp-lib=/usr/local/lib
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for a sed that does not truncate output... /bin/sed
checking for CC and CFLAGS in gmp.h... yes CC=gcc CFLAGS=-O2 -pedantic -m64
checking for CC=gcc and CFLAGS=-O2 -pedantic -m64... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for ar... ar
checking the archiver (ar) interface... ar
checking how to print strings... printf
checking for a sed that does not truncate output... (cached) /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1966080
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for gmp.h... yes
checking for ANSI C header files... (cached) yes
checking locale.h usability... yes
checking locale.h presence... yes
checking for locale.h... yes
checking for inttypes.h... (cached) yes
checking for stdint.h... (cached) yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking for unistd.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking whether time.h and sys/time.h may both be included... yes
checking complex.h usability... yes
checking complex.h presence... yes
checking for complex.h... yes
checking for library containing creal... -lm
checking whether creal, cimag and I can be used... yes
checking for an ANSI C-conforming const... yes
checking for size_t... yes
checking for gettimeofday... yes
checking for localeconv... yes
checking for setlocale... yes
checking for dup... yes
checking for dup2... yes
checking for __gmpz_init in -lgmp... yes
checking for MPFR... no
configure: error: libmpfr not found or uses a different ABI (including static vs shared).

My version of Linux is Red Hat 4.4.7-7.
I want to upgrade gcc; the three packages are "gmp-6.0.0a.tar.bz2", "mpfr-3.1.2.tar.gz", and "mpc-1.0.1.tar.gz". When I install MPC, the error above occurs. Could you tell me the reason? Thank you.

Negative peaks file?

Hello there,

We are doing some analysis using your program, and we were expecting to find the "negative peak calls" output file.

Was it removed between the version 1.4 and the version 2.0?

Thanks a lot!

Huge memory usage of callpeak for earthworm genome

I have some earthworm ChIP-seq samples. MACS2 callpeak needed a huge amount of memory, and I got an out-of-memory error on my server with 68GB of memory. I am not sure why, but I think it could be fixed.

earthworm

Genome size	Effective genome size	ref genome scaffold/chromosome #	mapped reads #
624328310	437029817 (genome size*0.7)	7779940	477841

The output of MACS2 callpeak

INFO  @ Fri, 25 Jul 2014 16:38:21:
# Command line: callpeak -B -g 437029817 -t sample1.bam -c sample2.bam -n sample1_minus_control --nomodel --extsize 200
# ARGUMENTS LIST:
# name = sample1_minus_control
# format = AUTO
# ChIP-seq file = ['sample1.bam']
# control file = ['sample2.bam']
# effective genome size = 4.37e+08
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off

INFO  @ Fri, 25 Jul 2014 16:38:21: #1 read tag files...
INFO  @ Fri, 25 Jul 2014 16:38:21: #1 read treatment tags...
INFO  @ Fri, 25 Jul 2014 16:38:25: Detected format is: BAM
INFO  @ Fri, 25 Jul 2014 16:38:25: * Input file is gzipped.
INFO  @ Fri, 25 Jul 2014 16:38:38:  1000000
INFO  @ Fri, 25 Jul 2014 16:38:51:  2000000
INFO  @ Fri, 25 Jul 2014 16:39:07:  3000000
INFO  @ Fri, 25 Jul 2014 16:39:24:  4000000
INFO  @ Fri, 25 Jul 2014 16:39:43:  5000000
INFO  @ Fri, 25 Jul 2014 16:39:58:  6000000
comp.sh: line 1: 27146 Killed                  macs2 callpeak -B -g 437029817 -t sample1.bam -c sample2.bam -n sample1_minus_control --nomodel --extsize 200

MACS2 installation error

Hello,

I used Cython-0.15.1 and python 2.7 versions to build MACS2 on my OS X.

It is installing and writing the .egg-info file and other binary files.

I tried building MACS as a root also. There is no problem with installation.

But when I run the command ./macs2, it gives me the following error:

Traceback (most recent call last):
  File "./macs2", line 33, in <module>
    from MACS2.OptValidator import opt_validate
ImportError: No module named MACS2.OptValidator

Any help in this regard will be greatly appreciated.

Thanks,

Feature: use basename of --name as peak names

I like that you can give a full path to --name in order to write files into their respective directories, but it results in having to parse the name field of the bed into something reasonable.

Command:

macs2 callpeak -t bam -n /vol1/home/brownj/xxxxx/test/test ...

Results in:

chr1    1657224 1657228 /vol1/home/brownj/xxxxx/test/test_peak_55   1.63761
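
The requested behavior amounts to stripping the directory part of --name before building peak names. A minimal sketch; the helper `peak_name` and the `_peak_` naming pattern are inferred from the output line in this report rather than taken from the MACS2 source.

```python
import os.path

# Build a peak name from the basename of the --name value.

def peak_name(name_option, index):
    prefix = os.path.basename(name_option)
    return "%s_peak_%d" % (prefix, index)


print(peak_name("/vol1/home/brownj/xxxxx/test/test", 55))
# → test_peak_55
```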

refinepeak raises AttributeError

I know it's experimental, but I'd like to experiment with it.

$ macs2 refinepeak -b MP55_peaks.bed -i MP55.bam -o refinetest
INFO  @ Fri, 12 Apr 2013 10:24:27: read tag files... 
INFO  @ Fri, 12 Apr 2013 10:24:27: read alignment tags... 
INFO  @ Fri, 12 Apr 2013 10:24:27: Detected format is: BAM 
INFO  @ Fri, 12 Apr 2013 10:24:27: * Input file is gzipped. 
INFO  @ Fri, 12 Apr 2013 10:24:31:  1000000 
INFO  @ Fri, 12 Apr 2013 10:24:35:  2000000 
INFO  @ Fri, 12 Apr 2013 10:24:38: tag size is determined as 50 bps 
Traceback (most recent call last):
  File "/usr/local/share/python/macs2", line 510, in <module>
    main()
  File "/usr/local/share/python/macs2", line 77, in main
    run( args )
  File "/usr/local/lib/python2.7/site-packages/MACS2/refinepeak.py", line 66, in run
    retval = fwtrack.compute_region_tags_from_peaks( peaks, find_summit, window_size = options.windowsize, cutoff = options.cutoff )
  File "cFixWidthTrack.pyx", line 654, in MACS2.IO.cFixWidthTrack.FWTrackIII.compute_region_tags_from_peaks (MACS2/IO/cFixWidthTrack.c:10624)
  File "cFixWidthTrack.pyx", line 677, in MACS2.IO.cFixWidthTrack.FWTrackIII.compute_region_tags_from_peaks (MACS2/IO/cFixWidthTrack.c:9756)
AttributeError: 'MACS2.IO.cPeakIO.PeakIO' object has no attribute 'peaks'

$ macs2 --version
macs2 2.0.10.20130404 (tag:beta)

Inconsistent help text for parameter --keep-dup

The help text for the 'keep-dup' option of the callpeak command suggests that the default is "auto" but then lists the default as "1". According to the code the actual default is "1". It would be nice if this could be made more consistent.

Deletion, Insertion and Soft clipping are not taken into consideration

When computing tags on the negative strand for a single-end SAM file, you take the tag to be the end of the read, that is, "start + readLength - 1". This is correct if the read aligns exactly. But when the alignment contains an insertion or deletion of a few bases, the end may not equal "start + readLength - 1".
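
The alignment end on the reference can instead be derived from the CIGAR string, counting only the operations that consume reference bases (M, D, N, = and X in the SAM specification). A minimal sketch with a made-up helper name:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # operations that advance along the reference


def reference_end(start, cigar):
    """Last reference position (inclusive) covered by the alignment."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return start + ref_len - 1


print(reference_end(100, "50M"))         # → 149
print(reference_end(100, "20M2D30M"))    # → 151
print(reference_end(100, "5S20M3I25M"))  # → 144
```

With a 50-base read, only the first case agrees with "start + readLength - 1"; the deletion pushes the end to the right, while the soft clip and insertion pull it to the left.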

Pileup bdg files

I am a little confused about the pileup bdg files. The scores/values in treat are always integers, but the scores/values in control are floating-point numbers. I am wondering whether MACS2 normalizes the control pileup bdg to the treat and keeps the treat un-normalized. If so, when I compare one treat to another, I still need to normalize them. Also, do you think normalizing by mapped reads is a good approach?

Thanks!

Error in bedGraphTrackII.call_peaks when use bdgpeakcall

I hit an error when using bdgpeakcall, and the traceback shows that it occurred at line 882 of cBedGraph.pyx:

[liuj@BioServer1 ~/ENCODE/GEOdata]$ bdgpeakcall -i GSM831025_H1_CHD1_1.bdg
INFO @ Mon, 09 Jan 2012 23:51:58: Read and build bedGraph...
INFO @ Mon, 09 Jan 2012 23:57:45: Call peaks from bedGraph...
Traceback (most recent call last):
  File "/opt/bin/bdgpeakcall", line 86, in <module>
    main()
  File "/opt/bin/bdgpeakcall", line 77, in main
    peaks = btrack.call_peaks(cutoff=options.cutoff,min_length=options.minlen,max_gap=options.maxgap)
  File "cBedGraph.pyx", line 882, in MACS2.IO.cBedGraph.bedGraphTrackII.call_peaks (MACS2/IO/cBedGraph.c:11840)
ValueError: operands could not be broadcast together with shapes (297031) (297030)

Then I changed line 882 from:
above_cutoff_endpos = self.data[chrom]['pos'][above_cutoff] # end positions of regions where score is above cutoff
above_cutoff_startpos = self.data[chrom]['pos'][above_cutoff_flag[1:]] # start position

to:
above_cutoff_endpos = self.data[chrom]['pos'][above_cutoff] # end positions of regions where score is above cutoff
above_cutoff_startpos = self.data[chrom]['pos'][above_cutoff_flag[0:]] # start position

and it seems to pass, but I'm not sure whether it's OK to do that?
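
The shape mismatch in the traceback (297031 versus 297030) fits a boolean mask that is one element shorter than the array it indexes. The pairing that the two original lines appear to aim for can be written without array slicing, under the assumption that each chromosome stores only interval end positions, so that the start of interval i is the end of interval i - 1. The sketch below is plain Python with made-up names, not the cBedGraph code.

```python
# Pair each above-cutoff interval's end with its start, where the start
# is the previous interval's end (0 for the first interval).
# NOTE: assumes end-position-only storage; illustrative, not cBedGraph.

def above_cutoff_intervals(end_positions, scores, cutoff):
    intervals = []
    for i, (end, score) in enumerate(zip(end_positions, scores)):
        if score > cutoff:
            start = end_positions[i - 1] if i > 0 else 0
            intervals.append((start, end))
    return intervals


print(above_cutoff_intervals([10, 20, 30], [1.0, 5.0, 6.0], cutoff=2.0))
# → [(10, 20), (20, 30)]
```

Whether replacing `[1:]` with `[0:]` preserves this start/end pairing is the open question in this report.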

MACS2 Cython compilation fails

When building from source, one has to regenerate the .c file, using the setup_w_cython.py.

This fails for me when compiling the MACS2/IO/cCallPeakUnit.pyx file.

I’m using Python 2.7.3 under Ubuntu 12.04, Cython 0.15.1

Steps to reproduce:

Straight out Git clone
python setup_w_cython.py build

Stack trace:

MACS2/IO/cScoreTrack.c:14071:37: warning: ‘__pyx_v_cutoff’ may be used uninitialized in this function [-Wuninitialized]
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-i686-2.7/MACS2/IO/cScoreTrack.o -o build/lib.linux-i686-2.7/MACS2/IO/cScoreTrack.so
cythoning MACS2/IO/cCallPeakUnit.pyx to MACS2/IO/cCallPeakUnit.c

Error compiling Cython file:
------------------------------------------------------------
...
                if call_summits:
                    #for x in peak_content:
                    #    if x[2] >= 10:
                    #        print peak_content
                    #        break
                    self.__close_peak_with_subpeaks (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
                                                   ^
------------------------------------------------------------

MACS2/IO/cCallPeakUnit.pyx:689:52: Keyword and starred arguments not allowed in cdef functions.

Error compiling Cython file:
------------------------------------------------------------
...
                    #    if x[2] >= 10:
                    #        print peak_content
                    #        break
                    self.__close_peak_with_subpeaks (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
                else:
                    self.__close_peak_wo_subpeaks   (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
                                                   ^
------------------------------------------------------------

MACS2/IO/cCallPeakUnit.pyx:691:52: Keyword and starred arguments not allowed in cdef functions.

Error compiling Cython file:
------------------------------------------------------------
...
        # save the last peak
        if not peak_content:
            return peaks
        else:
            if call_summits:
                self.__close_peak_with_subpeaks (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
                                               ^
------------------------------------------------------------

MACS2/IO/cCallPeakUnit.pyx:699:48: Keyword and starred arguments not allowed in cdef functions.

Error compiling Cython file:
------------------------------------------------------------
...
            return peaks
        else:
            if call_summits:
                self.__close_peak_with_subpeaks (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
            else:
                self.__close_peak_wo_subpeaks   (peak_content, peaks, min_length, chrom, smoothlen = min_length, score_cutoff_s = score_cutoff_s ) # smooth length is min_length, i.e. fragment size 'd'
                                               ^
------------------------------------------------------------

MACS2/IO/cCallPeakUnit.pyx:701:48: Keyword and starred arguments not allowed in cdef functions.
building 'MACS2.IO.cCallPeakUnit' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c MACS2/IO/cCallPeakUnit.c -o build/temp.linux-i686-2.7/MACS2/IO/cCallPeakUnit.o
MACS2/IO/cCallPeakUnit.c:1:2: erreur: #error Do not use this file, it is the result of a failed Cython compilation.
error: command 'gcc' failed with exit status 1

OverflowError: value too large to convert to int

Whenever I run macs on that BAM file (with only uniquely mapped reads), it gets stuck at "36000000" for a few hours and then I get this error.

INFO @ Tue, 08 Oct 2013 11:53:22: #1 read tag files...
INFO @ Tue, 08 Oct 2013 11:53:22: #1 read treatment tags...
INFO @ Tue, 08 Oct 2013 11:53:28: 1000000
INFO @ Tue, 08 Oct 2013 11:53:33: 2000000
INFO @ Tue, 08 Oct 2013 11:53:38: 3000000
INFO @ Tue, 08 Oct 2013 11:53:44: 4000000
INFO @ Tue, 08 Oct 2013 11:53:49: 5000000
INFO @ Tue, 08 Oct 2013 11:53:55: 6000000
INFO @ Tue, 08 Oct 2013 11:54:00: 7000000
INFO @ Tue, 08 Oct 2013 11:54:05: 8000000
INFO @ Tue, 08 Oct 2013 11:54:11: 9000000
INFO @ Tue, 08 Oct 2013 11:54:16: 10000000
INFO @ Tue, 08 Oct 2013 11:54:21: 11000000
INFO @ Tue, 08 Oct 2013 11:54:27: 12000000
INFO @ Tue, 08 Oct 2013 11:54:32: 13000000
INFO @ Tue, 08 Oct 2013 11:54:37: 14000000
INFO @ Tue, 08 Oct 2013 11:54:43: 15000000
INFO @ Tue, 08 Oct 2013 11:54:48: 16000000
INFO @ Tue, 08 Oct 2013 11:54:53: 17000000
INFO @ Tue, 08 Oct 2013 11:54:59: 18000000
INFO @ Tue, 08 Oct 2013 11:55:04: 19000000
INFO @ Tue, 08 Oct 2013 11:55:09: 20000000
INFO @ Tue, 08 Oct 2013 11:55:14: 21000000
INFO @ Tue, 08 Oct 2013 11:55:20: 22000000
INFO @ Tue, 08 Oct 2013 11:55:25: 23000000
INFO @ Tue, 08 Oct 2013 11:55:30: 24000000
INFO @ Tue, 08 Oct 2013 11:55:36: 25000000
INFO @ Tue, 08 Oct 2013 11:55:41: 26000000
INFO @ Tue, 08 Oct 2013 11:55:46: 27000000
INFO @ Tue, 08 Oct 2013 11:55:51: 28000000
INFO @ Tue, 08 Oct 2013 11:55:57: 29000000
INFO @ Tue, 08 Oct 2013 11:56:02: 30000000
INFO @ Tue, 08 Oct 2013 11:56:07: 31000000
INFO @ Tue, 08 Oct 2013 11:56:13: 32000000
INFO @ Tue, 08 Oct 2013 11:56:18: 33000000
INFO @ Tue, 08 Oct 2013 11:56:23: 34000000
INFO @ Tue, 08 Oct 2013 11:56:29: 35000000
INFO @ Tue, 08 Oct 2013 11:56:34: 36000000
Traceback (most recent call last):
File "/usr/bin/macs2", line 514, in
main()
File "/usr/bin/macs2", line 45, in main
run( args )
File "/usr/lib64/python2.7/site-packages/MACS2/callpeak.py", line 69, in run
else: (treat, control) = load_tag_files_options (options)
File "/usr/lib64/python2.7/site-packages/MACS2/callpeak.py", line 379, in load_tag_files_options
treat = tp.build_fwtrack()
File "cParser.pyx", line 824, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10311)
File "cParser.pyx", line 828, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10262)
File "cParser.pyx", line 891, in MACS2.IO.cParser.BAMParser.__build_fwtrack_wo_pysam (MACS2/IO/cParser.c:11246)
File "cParser.pyx", line 1015, in MACS2.IO.cParser.BAMParser.__fw_binary_parse (MACS2/IO/cParser.c:13208)
OverflowError: value too large to convert to int

MACS2 vs MACS14

MACS2 and MACS14 give different results when using the same parameters.

Both versions are installed on Ubuntu 10.4
macs14 1.4.1 20110622
macs2 2.0.8 20110916 (tag:alpha)

The parameters used:
[macs14|macs2] -t Treatment_chr12.bam --keep-dup all --pvalue=1e-3 --nomodel -g 1e8 --name Treatment_nm_dup_p1e-3_g1e8

MACS2 calls fewer peaks than MACS14 and misses some very important ones that MACS14 calls. The --broad parameter does not help.
Why do the results differ?

too many peaks

Dear Tao,

In contrast to issue 5 reported here, I find that macs2 calls more peaks than the original macs. I would say macs2 gives the most realistic-looking peaks. However, for my FAIRE-seq data it calls between 220k and 300k peaks at an FDR of 1%, which seems too many, and it is difficult to know where to draw the line. For FAIRE-seq with so many peaks, what is the best way of limiting the number to, say, 150k, which is more in keeping with reported numbers? One could simply sort on the FDR or p-value and take the top 150k, but is there a way of making macs2 less sensitive from the start? I don't have an input control, but I am comparing two conditions. Interestingly, if I use one condition as the input for the other, macs2 calls 5000 peaks, whereas if I call peaks independently then, as expected, of 200k peaks 120k are shared and 80 unique. Do you regard using one condition as input as a valid analysis with macs2?

kind regards

Michael
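
The "sort and take the top 150k" option mentioned above could look roughly like the sketch below. It assumes the peaks are in narrowPeak format, where column 9 holds -log10(q-value), so larger values are more significant. The function name `top_peaks_by_qvalue` is made up for illustration and is not part of MACS.

```python
def top_peaks_by_qvalue(narrowpeak_lines, n):
    """Return the n peaks with the strongest q-values.

    Assumes narrowPeak rows: column 9 (index 8) is -log10(q-value),
    so sorting in descending order puts the most significant peaks first.
    """
    peaks = []
    for line in narrowpeak_lines:
        # Skip blank lines and optional track/comment lines.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        peaks.append((float(fields[8]), fields))
    peaks.sort(key=lambda item: item[0], reverse=True)
    return [fields for _, fields in peaks[:n]]


if __name__ == "__main__":
    example = [
        "chr1\t100\t300\tpeak_1\t50\t.\t4.2\t8.1\t5.0\t90\n",
        "chr1\t900\t1200\tpeak_2\t120\t.\t9.7\t15.3\t12.0\t140\n",
        "chr2\t50\t250\tpeak_3\t20\t.\t2.1\t3.0\t2.0\t60\n",
    ]
    for fields in top_peaks_by_qvalue(example, 2):
        print("\t".join(fields))
```

This only ranks peaks after the fact; it does not change how sensitive the peak caller itself is.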

MACS2 installation error

Hi,

I am having a problem installing MACS2 on our system.
The software builds without error, but when I run macs2diff, the following error message shows up.

File "/opt/apps/macs2/2.0.10/macs2diff", line 31, in
from MACS2.OptValidator import opt_validate_diff
File "/opt/apps/macs2/2.0.10/MACS2/OptValidator.py", line 27, in
from MACS2.IO.cParser import BEDParser, ELANDResultParser, ELANDMultiParser,
ImportError: No module named cParser

But if I list the files in /opt/apps/macs2/2.0.10/MACS2/IO/, cParser.c and cParser.pyx are there,
and if I list the files in /opt/apps/macs2/2.0.10/lib/python2.7/site-packages/MACS2/IO, which is in PYTHONPATH, cParser.so is there.

Can you tell me what the issue might be?

Thanks,
Jawon
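
One thing that may be worth checking, as a guess rather than a confirmed diagnosis: the traceback shows MACS2 being imported from the source tree (/opt/apps/macs2/2.0.10/MACS2), which holds only cParser.c and cParser.pyx, rather than from site-packages, where the compiled cParser.so lives. The sketch below uses only the Python 3 standard library (the report above is on Python 2.7, so treat it as an illustration). The helper names `package_origin` and `compiled_modules` are hypothetical.

```python
import importlib.util
import os
from importlib.machinery import EXTENSION_SUFFIXES


def package_origin(name):
    """Return the directory Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


def compiled_modules(package_dir):
    """List files in package_dir that look like compiled extension modules."""
    suffixes = tuple(EXTENSION_SUFFIXES)
    return sorted(f for f in os.listdir(package_dir) if f.endswith(suffixes))


if __name__ == "__main__":
    # "json" stands in for the package being diagnosed (e.g. MACS2).
    origin = package_origin("json")
    print("imported from:", origin)
    print("compiled modules there:", compiled_modules(origin))
```

If the reported directory is the source checkout and it contains no compiled modules, running the script from a different working directory would be one way to test the shadowing hypothesis.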

bedGraph version wignorm.

This will be another post-processing script that combines two bedGraph outputs from MACS, calculates a 'score' for each data point, and calls refined peaks using a threshold.

--name option ignored when using -nolambda option (running macs14 on Ubuntu12.10)

Hi,

If I run "macs14 -t treatment_file_name.bed -c control_file_name.bed -f BED -n result_name_prefix -g mm -s 50 -m 10,30 --bw 180 -p 1e-4", the results files have, as expected, "result_name_prefix" as their prefix.

However, if I run the same command with the -nolambda option, the -n option is ignored and "olambda" is used instead.

This behavior is particularly annoying when running scripts with multiple combinations of input files, since each output overwrites the previous one. Is there any way around this in MACS (i.e. without rewriting the script that launches MACS)?

Best,

Sébastien
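
The "olambda" prefix is consistent with how getopt-style parsers read a single-dash `-nolambda`: as the short option `-n` with the attached value `olambda`. The sketch below reproduces that with Python's standard `optparse`. The parser definition is a simplified stand-in, not the actual macs14 option table.

```python
from optparse import OptionParser


def build_parser():
    # Simplified stand-in for the real option table (an assumption).
    parser = OptionParser()
    parser.add_option("-n", "--name", dest="name", default="NA")
    parser.add_option("--nolambda", dest="nolambda",
                      action="store_true", default=False)
    return parser


def parse(argv):
    options, _ = build_parser().parse_args(argv)
    return options.name, options.nolambda


# Single dash: parsed as "-n olambda", which overrides the earlier -n value.
print(parse(["-n", "result_name_prefix", "-nolambda"]))
# ('olambda', False)

# Double dash: parsed as the long flag, so the name is preserved.
print(parse(["-n", "result_name_prefix", "--nolambda"]))
# ('result_name_prefix', True)
```

Under this reading, spelling the flag with two dashes (`--nolambda`) keeps the `-n` prefix intact.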

Problem when using --trackline argument when control file absent

When macs2 is used without a control file and with the --trackline argument, the application crashes at the call peaks step (#3 step).

Version: macs2 2.1.0.20140616 (installed using python package)

Operating system: CentOS 6

Steps to reproduce: use this command line with a valid BAM file

macs2 callpeak -t VALID_BAM_FILE.bam  -f BAM --gsize 3.3e9 -q 0.01 -n output_file  --trackline -B --SPMR

Output example:

[user@ls32]$ macs2 callpeak -t HI.1943.008.Index_14.LCC9_CTRL_MED1_ChIP8_GB_rep1.bam  -f BAM --gsize 3.3e9 -q 0.01 -n output_file  --trackline -B --SPMR
INFO  @ Fri, 15 Aug 2014 10:49:25: 
# Command line: callpeak -t HI.1943.008.Index_14.LCC9_CTRL_MED1_ChIP8_GB_rep1.bam -f BAM --gsize 3.3e9 -q 0.01 -n output_file --trackline -B --SPMR
# ARGUMENTS LIST:
# name = output_file
# format = BAM
# ChIP-seq file = ['HI.1943.008.Index_14.LCC9_CTRL_MED1_ChIP8_GB_rep1.bam']
# control file = None
# effective genome size = 3.30e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# MACS will save fragment pileup signal per million reads
INFO  @ Fri, 15 Aug 2014 10:49:25: #1 read tag files... 
INFO  @ Fri, 15 Aug 2014 10:49:25: #1 read treatment tags... 
INFO  @ Fri, 15 Aug 2014 10:49:30:  1000000 
... 
INFO  @ Fri, 15 Aug 2014 10:51:12:  25000000 
INFO  @ Fri, 15 Aug 2014 10:51:13: #1 tag size is determined as 50 bps 
INFO  @ Fri, 15 Aug 2014 10:51:13: #1 tag size = 50 
INFO  @ Fri, 15 Aug 2014 10:51:13: #1  total tags in treatment: 22592117 
INFO  @ Fri, 15 Aug 2014 10:51:13: #1 user defined the maximum tags... 
INFO  @ Fri, 15 Aug 2014 10:51:13: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 15 Aug 2014 10:51:16: #1  tags after filtering in treatment: 21510983 
INFO  @ Fri, 15 Aug 2014 10:51:16: #1  Redundant rate of treatment: 0.05 
INFO  @ Fri, 15 Aug 2014 10:51:16: #1 finished! 
INFO  @ Fri, 15 Aug 2014 10:51:16: #2 Build Peak Model... 
INFO  @ Fri, 15 Aug 2014 10:51:24: #2 number of paired peaks: 37367 
INFO  @ Fri, 15 Aug 2014 10:51:24: start model_add_line... 
INFO  @ Fri, 15 Aug 2014 10:51:52: start X-correlation... 
INFO  @ Fri, 15 Aug 2014 10:51:52: end of X-cor 
INFO  @ Fri, 15 Aug 2014 10:51:52: #2 finished! 
INFO  @ Fri, 15 Aug 2014 10:51:52: #2 predicted fragment length is 165 bps 
INFO  @ Fri, 15 Aug 2014 10:51:52: #2 alternative fragment length(s) may be 165 bps 
INFO  @ Fri, 15 Aug 2014 10:51:52: #2.2 Generate R script for model : output_file_model.r 
INFO  @ Fri, 15 Aug 2014 10:51:52: #3 Call peaks... 
Traceback (most recent call last):
  File "/lustredirect/commonSoftware/python/python-2.7.8/bin/macs2", line 559, in <module>
    main()
  File "/lustredirect/commonSoftware/python/python-2.7.8/bin/macs2", line 56, in main
    run( args )
  File "/lustredirect/commonSoftware/python/python-2.7.8/lib/python2.7/site-packages/MACS2/callpeak.py", line 261, in run
    peakdetect.call_peaks()
  File "cPeakDetect.pyx", line 110, in MACS2.cPeakDetect.PeakDetect.call_peaks (MACS2/cPeakDetect.c:1714)
  File "cPeakDetect.pyx", line 334, in MACS2.cPeakDetect.PeakDetect.__call_peaks_wo_control (MACS2/cPeakDetect.c:3939)
AttributeError: 'NoneType' object has no attribute 'enable_trackline'

does not compile/install

I downloaded commit dd902d9 and I was unable to install it.

thocking@silene:~/projects/chip-seq/MACS$ python setup.py install
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/MACS2
copying MACS2/Constants.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/bdgbroadcall.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/OptValidator.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/filterdup.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/predictd.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/pileup.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/refinepeak.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/diffpeak.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/bdgpeakcall.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/bdgcmp.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/__init__.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/randsample.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/bdgdiff.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/callpeak.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/OutputWriter.py -> build/lib.linux-x86_64-2.7/MACS2
copying MACS2/cStat.py -> build/lib.linux-x86_64-2.7/MACS2
creating build/lib.linux-x86_64-2.7/MACS2/IO
copying MACS2/IO/WiggleIO.py -> build/lib.linux-x86_64-2.7/MACS2/IO
copying MACS2/IO/BinKeeper.py -> build/lib.linux-x86_64-2.7/MACS2/IO
copying MACS2/IO/__init__.py -> build/lib.linux-x86_64-2.7/MACS2/IO
running build_ext
building 'MACS2.cProb' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/MACS2
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c MACS2/cProb.c -o build/temp.linux-x86_64-2.7/MACS2/cProb.o
gcc: error: MACS2/cProb.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 4
thocking@silene:~/projects/chip-seq/MACS$ python setup.py install
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
running build_ext
building 'MACS2.cProb' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c MACS2/cProb.c -o build/temp.linux-x86_64-2.7/MACS2/cProb.o
gcc: error: MACS2/cProb.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 4

A small bug maybe caused by a new option--broad?

I encountered a bug after updating to the new version of macs2.

[linxq@localhost ~/hmc_Tet1/raw_data/Tet1_hmc]$ macs2 --version
macs2 2.0.8 20110830 (tag:alpha)

[linxq@localhost ~/hmc_Tet1/raw_data/Tet1_hmc]$ macs2 -t Tet1_mES.bed -c Tet1_mES_input.bed -n Tet1 -f BED -g hs
Traceback (most recent call last):
File "/opt/bin/macs2", line 354, in
main()
File "/opt/bin/macs2", line 49, in main
options = opt_validate(prepare_optparser())
File "/opt/bin/lib/python2.6/site-packages/MACS2/OptValidator.py", line 107, in opt_validate
if options.broad:
AttributeError: Values instance has no attribute 'broad'
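
The error above means the validator reads `options.broad` on an `optparse` Values object that was built without a `broad` option. This can happen when the launcher script and the installed library come from different versions, though that is only a guess here. A defensive way to read such an attribute is sketched below. The helper `broad_enabled` is hypothetical and is not the project's actual fix.

```python
from optparse import Values


def broad_enabled(options):
    """Read options.broad, treating a missing attribute as 'off'."""
    return bool(getattr(options, "broad", False))


# Options object produced by an older parser that has no --broad flag.
old_style = Values({"name": "Tet1", "format": "BED"})
# Options object produced by a parser that knows about --broad.
new_style = Values({"name": "Tet1", "format": "BED", "broad": True})

print(broad_enabled(old_style))  # False
print(broad_enabled(new_style))  # True
```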

Error within GZip

There is an error within the gzip module during peak calling. It is possible that my data is corrupted, but samtools converts it from BAM to SAM and back with no issues. From a quick Google search, the problem seems to be in the gzip module; I believe zlib is the preferred Python module for this kind of thing. If I can help in any other way, please let me know.

Traceback (most recent call last):
File "/home/pi/hillst/bin/MACS/bin/macs2", line 557, in
main()
File "/home/pi/hillst/bin/MACS/bin/macs2", line 56, in main
run( args )
File "/home/pi/hillst/.local/lib/python2.7/site-packages/MACS2/callpeak.py", line 70, in run
else: (treat, control) = load_tag_files_options (options)
File "/home/pi/hillst/.local/lib/python2.7/site-packages/MACS2/callpeak.py", line 394, in load_tag_files_options
control = options.parser(options.cfile[0], buffer_size=options.buffer_size).build_fwtrack()
File "cParser.pyx", line 824, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10311)
File "cParser.pyx", line 828, in MACS2.IO.cParser.BAMParser.build_fwtrack (MACS2/IO/cParser.c:10262)
File "cParser.pyx", line 891, in MACS2.IO.cParser.BAMParser.__build_fwtrack_wo_pysam (MACS2/IO/cParser.c:11243)
File "/local/cluster/lib/python2.7/gzip.py", line 261, in read
self._read(readsize)
File "/local/cluster/lib/python2.7/gzip.py", line 308, in _read
self._read_eof()
File "/local/cluster/lib/python2.7/gzip.py", line 347, in _read_eof
hex(self.crc)))
IOError: CRC check failed 0xeddeb258 != 0x498a3873L
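
To tell a corrupted file apart from a parser problem, one option is to read the whole file through Python's `gzip` module independently of MACS. A BAM file is a series of concatenated gzip blocks (BGZF), which `gzip` reads as a multi-member stream, so a CRC mismatch in any block should surface here as well. This is a minimal Python 3 sketch. The function name `gzip_stream_is_intact` is made up for illustration.

```python
import gzip
import zlib


def gzip_stream_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and report whether any check failed.

    Returns False on a CRC mismatch, a truncated stream, or a file
    that is not gzip-compressed at all.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so every member's CRC trailer is verified.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Usage would be along the lines of `gzip_stream_is_intact("control.bam")`, where the file name is a placeholder.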

Manually filtering by q value gives more results than -q float

When I run macs using:

macs2 callpeak -t bHLHb5_bHLHb5_WT_B1.bam -c bHLHb5_bHLHb5_KO_B1.bam -f BAM -g mm -n p05 -p 0.05

to generate a large number of results (as multiple testing is ignored), then manually sort the results and filter out any with q > 0.05, I get 848 significant peaks. However if I just run:

macs2 callpeak -t bHLHb5_bHLHb5_WT_B1.bam -c bHLHb5_bHLHb5_KO_B1.bam -f BAM -g mm -n q05 -q 0.05

I only get 77 significant peaks. Should these methods be equivalent?

Add Tag for latest commit c217f354fe13d86ac8b183cb0a02095b29969590

installation

I'm having difficulty with installation. I installed to my own personal directory. I also put

setenv PYTHONPATH /opt/python/lib/python2.7

in my .cshrc.

However, when I run macs2, I get the following error

Traceback (most recent call last):
File "./macs2", line 362, in
main()
File "./macs2", line 44, in main
from MACS2.callpeak import run
ImportError: No module named callpeak

Thanks.

chip-exo data

Hi,
I want to try MACS on ChIP-exo data. This kind of data differs from ChIP-seq data in that the peaks are sharper and more concentrated, and the distance between the w-peak and the c-peak is smaller. Do you have any suggestions on how to tune the parameters for these data? Thanks.

Fan

incorrect pileup height reported

Perhaps this is related to issue 12, noninteger pileup height...

I have a set of peak calls from the following command (CentOS6 / Python 2.7.2):

macs2 callpeak -t IP.bam -c input.bam -g mm -p 0.05 --nomodel

And in the resulting peaks.xls file, the pileup heights are, almost without exception, higher than the actual pileup heights from the bam file.

Using the reported peak chrom, start, and end, I assay bam pileup heights with the following command:

samtools depth -r chrom:start-end IP.bam | cut -f3 | sort | uniq -c

And the maximum is always lower than the macs2 pileup value. This is especially problematic because I am using pileup heights to filter peaks, and if I want peaks with height >= 10 then macs2 should not tell me it is 10 when in fact it is 4, or 15 when it is 9, or 22 when 12, etc.

Also, I am testing the original bam, and macs2 supposedly removes duplicates, so the disparities are probably even worse.

So how are the pileup values generated, and why are they not the IP height?

Thanks,
Ariel
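
One likely contributor, stated as an assumption rather than a confirmed answer: samtools depth counts coverage of the aligned reads themselves, whereas a peak caller that extends each read to the estimated fragment size before piling up will report taller pileups at the same position. The toy sketch below shows the effect with forward-strand reads only. The read positions, read length and fragment length are invented numbers.

```python
def max_pileup(intervals):
    """Maximum number of overlapping half-open [start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal positions, process interval ends (-1) before starts (+1).
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def as_intervals(starts, length):
    """Turn forward-strand 5' start positions into [start, start + length)."""
    return [(s, s + length) for s in starts]


read_starts = [100, 140, 180, 220]  # invented 5' positions
read_length = 36                    # invented read length
fragment_length = 200               # invented extension size

print(max_pileup(as_intervals(read_starts, read_length)))      # 1
print(max_pileup(as_intervals(read_starts, fragment_length)))  # 4
```

Here the raw reads never overlap, yet the extended fragments stack four deep, which is the same kind of gap as "10 reported versus 4 observed".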

bdgdiff for present or absent?

bdgdiff is good for identifying up- or down-regulated peaks. I am also interested in present or absent peaks. According to the "Call differential binding events" page in the wiki,

A basic requirement is that this region should be at least enriched in either condition.

It looks like bdgdiff cannot identify present or absent peaks. Is that true? Do you have any suggestions for identifying present or absent peaks?

Thanks!

Are output bdg files normalized for library size

I've read that the larger library is scaled to the smaller library.
Do the bdg files (pileup and lambda) reflect this scaling?

I have samples from different conditions and I would like to show the graphs side by side, but the library sizes were quite different. Each replicate has an IP and an input file, so each replicate was run through MACS independently.
My understanding is that each pileup can be considered normalized for library size relative to the corresponding lambda, but not between different IP samples. Is that correct?
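
For side-by-side display, one common approach is to rescale each pileup track to signal per million reads. The `--SPMR` option in the callpeak log of another report on this page describes itself as saving "fragment pileup signal per million reads". The arithmetic is sketched below for already-parsed bedGraph rows. The function name and the read total are illustrative assumptions, and this does not claim to match MACS's internal scaling exactly.

```python
def scale_to_per_million(bedgraph_rows, total_reads):
    """Rescale bedGraph values to signal per million reads.

    bedgraph_rows: iterable of (chrom, start, end, value) tuples.
    total_reads:   number of reads (tags) in the library after filtering.
    """
    factor = 1_000_000 / total_reads
    return [(chrom, start, end, value * factor)
            for chrom, start, end, value in bedgraph_rows]


rows = [("chr1", 0, 100, 20.0), ("chr1", 100, 250, 5.0)]
for row in scale_to_per_million(rows, total_reads=20_000_000):
    print(row)
```

With both tracks on a per-million scale, a value of 1.0 means the same thing in a 20-million-read library as in a 40-million-read one.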

How to run paired end samples?

I read in a Google group that the BAMPE option should be used for paired-end samples, but it throws the following error:

INFO @ Thu, 11 Sep 2014 11:51:38:
# Command line: callpeak -t SRR391645-Paired-treatment-Fully_Aligned_Reads-Filtered_On_Chr1.sam -f BAMPE
# ARGUMENTS LIST:
# name = NA
# format = BAMPE
# ChIP-seq file = ['SRR391645-Paired-treatment-Fully_Aligned_Reads-Filtered_On_Chr1.sam']
# control file = None
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off

INFO @ Thu, 11 Sep 2014 11:51:38: #1 read fragment files...
INFO @ Thu, 11 Sep 2014 11:51:38: #1 read treatment fragments...
Traceback (most recent call last):
File "/usr/local/bin/macs2", line 559, in
main()
File "/usr/local/bin/macs2", line 56, in main
run( args )
File "/usr/local/lib/python2.7/dist-packages/MACS2/callpeak.py", line 69, in run
if options.PE_MODE: (treat, control) = load_frag_files_options (options)
File "/usr/local/lib/python2.7/dist-packages/MACS2/callpeak.py", line 362, in load_frag_files_options
treat = tp.build_petrack()
File "cParser.pyx", line 1060, in MACS2.IO.cParser.BAMPEParser.build_petrack (MACS2/IO/cParser.c:13617)
File "cParser.pyx", line 1064, in MACS2.IO.cParser.BAMPEParser.build_petrack (MACS2/IO/cParser.c:13568)
File "cParser.pyx", line 1085, in MACS2.IO.cParser.BAMPEParser.__build_petrack_wo_pysam (MACS2/IO/cParser.c:13774)
File "cParser.pyx", line 776, in MACS2.IO.cParser.BAMParser.get_references (MACS2/IO/cParser.c:9834)
File "cParser.pyx", line 815, in MACS2.IO.cParser.BAMParser.__get_references_wo_pysam (MACS2/IO/cParser.c:10216)
struct.error: unpack requires a string argument of length 4
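
Note that the input in the command above has a .sam extension while the format is given as BAMPE. If the file really is plain-text SAM, a binary BAM parser would fail when unpacking the header, which would fit the struct.error shown. A BAM file is gzip-compressed (BGZF) and its decompressed content begins with the four bytes `BAM\1`, so a quick check is possible with the standard library alone. The helper name `looks_like_bam` is hypothetical.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file is gzip-compressed and starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip data (e.g. a text SAM file) or a truncated stream.
        return False
```

If this returns False for the input, converting it to BAM first (for example with samtools) before using `-f BAMPE` would be the thing to try.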

Fix "/usr/lib/python2.7/distutils/dist.py:267 UserWarning: Unknown distribution option: 'install_requires'"

As mentioned on http://pythonhosted.org/distribute/, distribute is deprecated:

Distribute is a deprecated fork of the Setuptools project.

Since the Setuptools 0.7 release, Setuptools and Distribute have merged and
Distribute is no longer being maintained. All ongoing effort should reference the
Setuptools project and the Setuptools documentation.

By using setuptools instead of distutils, the "/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'" warning disappears:

$ python setup.py build
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running build
running build_py
running build_ext
building 'MACS2.cProb' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/MACS2
...

$ git diff setup.py
diff --git a/setup.py b/setup.py
index 3044be4..714d3ac 100644
--- a/setup.py
+++ b/setup.py
@@ -19,7 +19,7 @@ the distribution).

 import os
 import sys
-from distutils.core import setup, Extension
+from setuptools import setup, Extension

 # Use build_ext from Cython if found
 command_classes = {}

bdgdiff error: has no attribute 'empirical_distr_llr'

We are comparing two files with the command:

macs2 bdgdiff --t1 rep1_treat_pileup.bdg --t2 rep2_treat_pileup.bdg --c1 rep1_control_lambda.bdg --c2 rep2_control_lambda.bdg --o-prefix REP1_REP2

We get this error message:

Traceback (most recent call last):
  File "/home/dave/bin/macs2", line 559, in <module>
    main()
  File "/home/dave/bin/macs2", line 84, in main
    run( args )
  File "/home/dave/lib/python2.7/site-packages/MACS2/bdgdiff.py", line 92, in run
    depth2 )
  File "cScoreTrack.pyx", line 1415, in MACS2.IO.cScoreTrack.TwoConditionScores.__init__ (MACS2/IO/cScoreTrack.c:20454)
AttributeError: 'MACS2.IO.cScoreTrack.TwoConditionScores' object has no attribute 'empirical_distr_llr'

Is this a bug or is there something wrong with our input? They were generated from ENCODE samples using callpeak:

macs2 callpeak -t rep1.sam -f SAM -g hs -n rep1 -B -q 0.05
macs2 callpeak -t rep2.sam -f SAM -g hs -n rep2 -B -q 0.05

This was also observed for the data in the test folder using test.sh