jinlabbioinfo / hicorr

HiCorr: a Hi-C data bias-correction pipeline

hicorr's Introduction

HiCorr

HiCorr is a pipeline for bias correction and visualization of multi-platform Hi-C data (in-situ Hi-C, Arima, micro-C). HiCorr focuses on mapping chromatin interactions at high resolution, especially sub-TAD enhancer-promoter interactions at ~5 kb resolution, which require more rigorous bias correction. It must be run in a Unix/Linux environment. It currently includes reference files for genome builds hg19 and mm10; reference files for other genome builds will be provided upon request. Please contact Shanshan Zhang ([email protected]) or Fulai Jin ([email protected]). For noise-free, enhanced signal, see DeepLoop, which we recently developed.

Setup

git clone https://github.com/shanshan950/HiCorr.git
cd HiCorr/
chmod 755 HiCorr
chmod -R 755 bin/*

Gateway for different Hi-C data types:

Each section describes reference file downloading, preprocessing (mapping and fragment filtering), and how to run HiCorr.

👉 HiCorr on micro-C (beta version)

👉 HiCorr on Arima (beta version)

👉 HiCorr on HindIII enzyme Hi-C

👉 HiCorr on eHi-C

👉 HiCorr on in-situ Hi-C or DpnII/MboI enzyme Hi-C

👉 Visualize HiCorr contact heatmaps

👉 Compatible with HiCPro valid pairs

👉 Generate reference files for HiCorr

👀 40 Hi-C datasets processed by HiCorr and DeepLoop can be visualized on the website

Citation:

Lu, L. et al. Robust Hi-C Maps of Enhancer-Promoter Interactions Reveal the Function of Non-coding Genome in Neural Development and Diseases. Molecular Cell. doi: https://doi.org/10.1016/j.molcel.2020.06.007

hicorr's Issues

Some trouble when converting HiCPro-format allValidPairs by reads_2_cis_frag_loop.pl

Dear author,

I ran into some trouble when converting HiCPro-format allValidPairs with reads_2_cis_frag_loop.pl.

My command line:

$ perl /work/home/algroup01/username/DH/2023_12_10/9.HiCorr/HiCorr-master/bin/Arima/reads_2_cis_frag_loop.pl /work/home/algroup01/username/DH/2023_12_10/9.HiCorr/DPNII_HiCorr_ref/mm10.DPNII.frag_2.bed 50 SRR3680538.loop.inward SRR3680538.loop.outward SRR3680538.loop.samestrand summary.frag_loop.read_count SRR3680538 SRR3680538.allValidPairs_2

My parameters:

frag_bed: /work/home/algroup01/username/DH/2023_12_10/9.HiCorr/DPNII_HiCorr_ref/mm10.DPNII.frag_2.bed
read_length: 150
inward_outfile: SRR3680538.loop.inward
output_outfile: SRR3680538.loop.outward
outward_outfile: SRR3680538.loop.samestrand
samestrand_outfile: summary.frag_loop.read_count
expt: SRR3680538
reads_file: SRR3680538.allValidPairs_2

SRR3680538.allValidPairs_2 was created by the command line:
$ cat SRR3680538.allValidPairs | cut -f 2-7 > SRR3680538.allValidPairs_2
$ head SRR3680538.allValidPairs_2
1 3050008 + 4 149985221 -
1 3050043 + 1 3051989 -
1 3050115 + MT 1865 +
1 3050142 + 1 8140199 -
1 3050150 + 13 89162868 -
1 3050154 + 1 3064752 +
1 3050157 + Y 1200591 -
1 3050159 - 1 3052276 -
1 3050170 - 18 65410636 +
1 3050175 + 1 3801904 +

$ head mm10.DPNII.frag_2.bed
chr1 1 3000192 frag_1 3000192
chr1 3000193 3000814 frag_2 622
chr1 3000815 3001049 frag_3 235
chr1 3001050 3001120 frag_4 71
chr1 3001121 3001796 frag_5 676
chr1 3001797 3003210 frag_6 1414
chr1 3003211 3003264 frag_7 54
chr1 3003265 3003351 frag_8 87
chr1 3003352 3003414 frag_9 63
chr1 3003415 3003577 frag_10 163

My results:

I successfully got four output files: summary.frag_loop.read_count, SRR3680538.loop.inward, SRR3680538.loop.outward and SRR3680538.loop.samestrand.
However, the results are strange.

The summary.frag_loop.read_count:
SRR3680538 4535864 14307113.5 7135233.5 0 0 0

The other three output files (SRR3680538.loop.inward, SRR3680538.loop.outward and SRR3680538.loop.samestrand) are empty.

I am confused. Maybe I used a wrong parameter, such as "read_length"? I am not sure exactly what it means.

Thank you for your help.

HiCorr/bin/preprocess/pairing_two_SAM_reads.pl: No such file or directory

I ran into some errors when running documents/DPNII_preprocessing.sh
Here is the error:
XXX/HiCorr/bin/preprocess/pairing_two_SAM_reads.pl: No such file or directory
XXX/HiCorr/bin/preprocess/remove_dup_PE_SAM_sorted.pl: No such file or directory

Also, I am not clear about the "$" in front of "bowtie". I deleted it when I ran the script; do you think that is right?

# 1. mapping, take 50bp for mapping

cat $fq1 | $lib/reformat_fastq.py 1 50 | $bowtie -v 3 -m 1 --best --strata --time -p 10 --sam $hg19 - > $name.R1.sam &
cat $fq2 | $lib/reformat_fastq.py 1 50 | $bowtie -v 3 -m 1 --best --strata --time -p 10 --sam $hg19 - > $name.R2.sam &
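
In a shell script, names such as `$lib`, `$bowtie`, `$hg19`, `$fq1` and `$name` are variable expansions, so the commands above only work if each variable has been set to a real path or value beforehand. A minimal Python sketch for checking that the files and tools behind those variables exist before running the script; `hicorr_dir` is a placeholder for the elided `XXX/HiCorr` install location, and the helper names `missing_files` and `resolve_tool` are hypothetical:

```python
import os
import shutil


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


def resolve_tool(tool):
    """Return the full path of an executable given as a path or a bare name, or None."""
    return shutil.which(tool)


# Placeholder: replace with the real HiCorr install location.
hicorr_dir = "XXX/HiCorr"

# The two scripts named in the error messages of this issue.
required = [
    os.path.join(hicorr_dir, "bin/preprocess/pairing_two_SAM_reads.pl"),
    os.path.join(hicorr_dir, "bin/preprocess/remove_dup_PE_SAM_sorted.pl"),
]

for path in missing_files(required):
    print("missing:", path)

# Assumes the aligner is installed under the name "bowtie".
if resolve_tool("bowtie") is None:
    print("bowtie not found on PATH")
```

If a script is reported as missing, the variable holding the HiCorr path is the first thing to check, since a wrong base directory produces exactly this kind of "No such file or directory" message.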

Apply to DNase Hi-C

Hi,

HiCorr seems like a wonderful tool for chromatin loop calling. Is it possible to apply it to DNase Hi-C or Micro-C?

Error with Arima and HiC Pro

I'm really interested in applying HiCorr/DeepLoop to my HiC data set as they are low-input HiC samples. I have used the Arima 4-enzyme low input kit and processed my fastq files using HiC-Pro. Using the workflow, I've converted the validpairs.gz file using validPairs_2_fragloop.md with the mm10.Arima.frag.bed file. When I use HiCorr, I get anchor_2_anchor files for each chromosome, but the expected and observed values are both 0 for each loop. I have attached my output error below:
hicorrOutputError.txt

Do you have any suggestions of where I might be going wrong?

Input:
hicorr.txt

reference genome issue

Hi, I see you only provide reference files for the mm10 and hg19 genome builds. Can I use HiCorr on other species? Are there any special requirements for the reference genome build?

10K resolution

How can I find the 10K resolution reference files?

Arima HiC support?

Hi,
Does HiCorr support Arima Hi-C? If so, how do I use it? Thanks!

Missing parameter in reads_2_cis_frag_loop.pl

Hi team,
Just wanted to bring a small detail to your attention: while converting HiCPro-format allValidPairs, I noticed that the script reads_2_cis_frag_loop.pl is missing a parameter for the count summary file at index position 5 in the list on line 7. This causes the arguments after it to be incorrectly assigned.
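
The effect described here can be illustrated with a small sketch. This is not the code of reads_2_cis_frag_loop.pl (which is Perl); it is a hypothetical Python illustration of how dropping one name from a positional parameter list shifts every later assignment. The parameter names follow the listing in the earlier issue, `summary_file` is an assumed name for the missing parameter, and the argument values are shortened from that issue's command line:

```python
def assign(names, argv):
    """Pair parameter names with positional arguments; extra arguments are dropped."""
    return dict(zip(names, argv))


# Eight positional arguments, in the order used on the command line.
argv = [
    "mm10.DPNII.frag_2.bed",
    "50",
    "SRR3680538.loop.inward",
    "SRR3680538.loop.outward",
    "SRR3680538.loop.samestrand",
    "summary.frag_loop.read_count",
    "SRR3680538",
    "SRR3680538.allValidPairs_2",
]

# Seven names: the summary-file parameter at index 5 is missing.
buggy_names = [
    "frag_bed", "read_length", "inward_outfile",
    "outward_outfile", "samestrand_outfile", "expt", "reads_file",
]

# Eight names: the missing parameter is restored at index 5.
fixed_names = buggy_names[:5] + ["summary_file"] + buggy_names[5:]

buggy = assign(buggy_names, argv)
fixed = assign(fixed_names, argv)

print(buggy["expt"], buggy["reads_file"])  # shifted by one position
print(fixed["expt"], fixed["reads_file"])  # matches the command line
```

With the shorter name list, the summary file name lands in `expt` and the experiment name lands in `reads_file`, while the real reads file is silently ignored.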

Thanks for your attention

call loop with micro-C

Hi, I am trying the HiCorr pipeline on a Micro-C dataset to explore cis-regulatory element interactions such as E-P and E-E, because HiCorr seems to outperform many other algorithms at correcting short-range bias. However, the beta-version Micro-C pipeline in HiCorr does not calculate a p value for each pair of anchor contacts, so I cannot determine the dummy number for the ratio calculation described in the method of the HiCorr paper. Is there another way to calculate it, or have I misunderstood the HiCorr output? Could you please give me some instructions? Thank you very much!