HiCorr is a pipeline for bias correction and visualization of Hi-C/eHi-C data. It focuses on mapping chromatin interactions at high resolution, especially sub-TAD enhancer-promoter interactions, which requires more rigorous bias correction, particularly of distance bias. HiCorr must be run in a Unix/Linux environment. It currently includes reference files for genome builds hg19 and mm10.
If you use HiCorr, please cite:
Lu,L. et al. Robust Hi-C Maps of Enhancer-Promoter Interactions Reveal the Function of Non-coding Genome in Neural Development and Diseases. Molecular Cell; doi: https://doi.org/10.1016/j.molcel.2020.06.007
For any question about HiCorr, please contact [email protected]
git clone https://github.com/shanshan950/HiCorr.git
cd HiCorr/
chmod 755 HiCorr
chmod -R 755 bin/*
After you run the following commands, you will see "ref/" in the current directory. There are 4 subdirectories under "ref/": "DPNII/", "eHiC/", "eHiC-QC/" and "HindIII/".
In each subdirectory, there are reference files for genome build hg19 and mm10.
More descriptions for the reference files.
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr.tar.gz # download reference files
# Needs ~103G of disk space after decompression
tar -xvf HiCorr.tar.gz
ls
ls ref/
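Since the reference files take ~103G once decompressed, it can help to check free disk space before extracting. The sketch below is only an illustration: `has_free_gb` is a hypothetical helper (not part of HiCorr), and the 110 in the usage comment is an arbitrary margin above the stated ~103G.

```shell
# Hypothetical helper, not part of HiCorr.
# has_free_gb NEED_GB [DIR]: succeeds if the filesystem holding DIR
# (default: current directory) has at least NEED_GB gigabytes available.
has_free_gb() {
    need_gb=$1
    dir=${2:-.}
    # df -P gives one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of that line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example (assumed margin of 110G):
#   has_free_gb 110 . && tar -xvf HiCorr.tar.gz
```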
In the HiCorr file, you can manually replace "PATH_TO_REF" with the path to your "ref" directory and "PATH_TO_BIN" with the path to your "bin" directory, or use the commands below:
new_bin=`pwd`"/bin"
new_ref=`pwd`"/ref"
sed -i "s|PATH_TO_REF|${new_ref}|" HiCorr
sed -i "s|PATH_TO_BIN|${new_bin}|" HiCorr
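To confirm the substitution worked, one option is to search the script for leftover placeholders. This is a minimal sketch; `placeholders_replaced` is a hypothetical helper name, not something HiCorr provides.

```shell
# Hypothetical helper, not part of HiCorr.
# placeholders_replaced FILE: succeeds if FILE contains neither
# PATH_TO_REF nor PATH_TO_BIN.
placeholders_replaced() {
    script=$1
    if grep -qE 'PATH_TO_(REF|BIN)' "$script"; then
        echo "placeholders still present in $script" >&2
        return 1
    fi
    return 0
}

# Example:
#   placeholders_replaced HiCorr && echo "HiCorr is configured"
```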
Usage:
./HiCorr <mode> <parameters>
HiCorr has different modes: Bam-process-HindIII, Bam-process-DPNII, HindIII, DPNII, eHiC-QC, eHiC and Heatmap
Bam-process mode takes a sorted bam file as input and generates two output files: an intra-chromosome looping fragment-pair file and an inter-chromosome looping fragment-pair file. These two files are the required input files for the HiCorr HindIII mode.
This mode currently is only able to process bam file of HindIII Hi-C data.
To run the Bam-process mode, you need 6 arguments:
./HiCorr Bam-process-HindIII <bam_file> <name_of_your_data> <mapped_read_length_in_your_bam_file> <genome> HindIII
./HiCorr Bam-process-DPNII <bam_file> <name_of_your_data> <mapped_read_length_in_your_bam_file> <genome> DPNII
More details about the preprocessing (fastq to bam files to fragment loops) are here
HindIII corrects bias of HindIII Hi-C data. It takes two fragment-pair files as input and outputs an anchor_pair file.
- The two input files: one file contains intra-chromosome looping fragment pairs (cis pairs), and the other contains inter-chromosome looping fragment pairs (trans pairs).
- Intra-chromosome looping pairs need to have 4 tab-delimited columns, in the following format:
  frag_id_1  frag_id_2  observed_reads_count  distance_between_two_fragments
  See sample file here: http://hiview.case.edu/test/sample/frag_loop.IMR90.cis.sample
- Inter-chromosome looping pairs need to have 3 tab-delimited columns, in the following format:
  frag_id_1  frag_id_2  observed_reads_count
  See sample file here: http://hiview.case.edu/test/sample/frag_loop.IMR90.trans.sample
- These two files need to be sorted before you run the pipeline (sort -k1 -k2).
- If you do not know how to generate these two files, please take a look at our Bam-process mode.
- The final result of HindIII mode is an anchor-to-anchor looping pairs file, which has 5 columns:
  anchor_id_1  anchor_id_2  observed_reads_count  expected_reads_count  p_value
  See sample file here: http://hiview.case.edu/test/sample/anchor_2_anchor.loop.IMR90.p_val.sample
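The input requirements above (column count and sort order) can be checked before starting a long run. The following is a rough sketch, not part of HiCorr: `check_frag_file` is a hypothetical helper, and the sort check uses whatever locale is active, so it should be run under the same locale that was used to sort the file.

```shell
# Hypothetical helper, not part of HiCorr.
# check_frag_file FILE NCOLS: succeeds if every line of FILE has NCOLS
# tab-delimited columns and FILE is already in "sort -k1 -k2" order.
check_frag_file() {
    file=$1
    ncols=$2
    # Stop at the first line whose field count differs from NCOLS.
    awk -F '\t' -v n="$ncols" \
        'BEGIN { bad = 0 } NF != n { bad = 1; exit } END { exit bad }' "$file" || {
        echo "$file: expected $ncols tab-delimited columns" >&2
        return 1
    }
    # sort -c only checks the order; it does not rewrite the file.
    sort -c -k1 -k2 "$file" 2>/dev/null || {
        echo "$file: not sorted (sort -k1 -k2)" >&2
        return 1
    }
}

# Example: 4 columns for the cis file, 3 for the trans file.
#   check_frag_file <cis_loop_file> 4 && check_frag_file <trans_loop_file> 3
```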
To run the HindIII mode:
./HiCorr HindIII <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome> [options]
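Once an anchor-to-anchor file in the 5-column format is available, a common follow-up is to keep only the pairs below a p-value cutoff and add the observed/expected ratio. The sketch below assumes the output file is tab-delimited with plain numeric columns, which the description above does not state explicitly; `filter_loops` and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper, not part of HiCorr.
# filter_loops MAX_P FILE: prints rows of a 5-column anchor-pair file
# (anchor_id_1, anchor_id_2, observed, expected, p_value) whose p_value
# is <= MAX_P, appending observed/expected as a sixth column.
# Assumes tab-delimited input; rows with expected <= 0 are skipped
# to avoid division by zero.
filter_loops() {
    max_p=$1
    file=$2
    awk -F '\t' -v OFS='\t' -v p="$max_p" \
        '$5 + 0 <= p + 0 && $4 > 0 { print $1, $2, $3, $4, $5, $3 / $4 }' "$file"
}

# Example (anchor_pairs.txt is a placeholder file name):
#   filter_loops 0.01 anchor_pairs.txt > significant_pairs.txt
```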
- The format of the two input files is the same as for the HindIII mode.
To run the DPNII/MboI mode:
./HiCorr DPNII <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome> [options]
eHiC-QC mode takes a pair of fastq.gz files as input, aligns and processes eHiC reads, and outputs fragment-end-pair files for further analysis. This mode also outputs summary numbers that serve as a quality check for eHiC experiments.
Make sure to name your fastq.gz files as <name>.R1.fastq.gz and <name>.R2.fastq.gz.
You need to have Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) and samtools (http://www.htslib.org/) installed, since HiCorr calls Bowtie to do alignments.
You also need a Bowtie index and a fa.fai file.
To run the eHiC-QC mode, you need 4 arguments:
./HiCorr eHiC-QC <bowtie_index> <fa.fai> <name>
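Before launching eHiC-QC, the prerequisites can be checked in a few lines. This sketch assumes the two mates are named <name>.R1.fastq.gz and <name>.R2.fastq.gz, and that the Bowtie and samtools executables are called `bowtie` and `samtools`; both function names are hypothetical helpers, not part of HiCorr.

```shell
# Hypothetical helpers, not part of HiCorr.
# check_ehic_inputs NAME: succeeds if NAME.R1.fastq.gz and NAME.R2.fastq.gz
# both exist (the R1/R2 naming is an assumption).
check_ehic_inputs() {
    name=$1
    status=0
    for mate in R1 R2; do
        if [ ! -f "$name.$mate.fastq.gz" ]; then
            echo "missing $name.$mate.fastq.gz" >&2
            status=1
        fi
    done
    return $status
}

# check_tools TOOL...: succeeds if every named executable is on PATH.
check_tools() {
    status=0
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "$tool not found on PATH" >&2
            status=1
        }
    done
    return $status
}

# Example:
#   check_tools bowtie samtools && check_ehic_inputs <name>
```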
eHiC mode corrects bias of eHi-C data. It takes two fragment-end-pair files as input (use HiCorr's eHiC-QC mode if you need to generate these files) and outputs an anchor_pair file.
- The two input files: one file contains intra-chromosome looping fragment-end pairs (cis pairs), and the other contains inter-chromosome looping fragment-end pairs (trans pairs).
- Intra-chromosome looping pairs need to have 4 tab-delimited columns, in the following format:
  frag_end_id_1  frag_end_id_2  observed_reads_count  distance_between_two_fragments
- Inter-chromosome looping pairs need to have 3 tab-delimited columns, in the following format:
  frag_end_id_1  frag_end_id_2  observed_reads_count
- These two files need to be sorted before you run the pipeline (sort -k1 -k2).
- The final result of eHiC mode is an anchor-to-anchor looping pairs file, which has 5 columns:
  anchor_id_1  anchor_id_2  observed_reads_count  expected_reads_count  p_value
  See sample file here: http://hiview.case.edu/test/sample/anchor_2_anchor.loop.IMR90.p_val.sample
To run the eHiC mode:
./HiCorr eHiC <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome>
This test dataset is Adrenal Hi-C (restriction enzyme: HindIII; genome build: hg19) from GSE87112.
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/frag_loop.Adrenal.cis.gz # cis fragment loop
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/frag_loop.Adrenal.trans.gz # trans fragment loop
gunzip frag_loop.Adrenal.cis.gz
gunzip frag_loop.Adrenal.trans.gz
./HiCorr HindIII frag_loop.Adrenal.cis frag_loop.Adrenal.trans Adrenal hg19
./HiCorr Heatmap chr1 119457772 120457772 HiCorr_output/anchor_2_anchor.loop.chr1 hg19 HindIII # plot Adrenal heatmap
This test dataset is a subsampled bam file for H9 rep1 Hi-C (restriction enzyme: HindIII; genome build: hg19) from GSE130711.
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/H9_rep1.subsample.sorted.bam
./HiCorr Bam-process-HindIII H9_rep1.subsample.sorted.bam H9_rep1.subsample 36 hg19 HindIII
You will find "H9_rep1.subsample.cis.frag_loop" and "H9_rep1.subsample.trans.frag_loop"; the other files are intermediate files.
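A quick sanity check on the two frag_loop files is to count fragment pairs and total observed reads in each. The sketch below relies on the column layout described for the HindIII mode inputs (observed read count in column 3, tab-delimited); `summarize_frag_loops` is a hypothetical helper, not part of HiCorr.

```shell
# Hypothetical helper, not part of HiCorr.
# summarize_frag_loops NAME: for NAME.cis.frag_loop and NAME.trans.frag_loop,
# prints "<kind> <number_of_pairs> <total_observed_reads>" (tab-separated).
# Fails if either file is missing or empty.
summarize_frag_loops() {
    name=$1
    for kind in cis trans; do
        f="$name.$kind.frag_loop"
        # -s: file exists and is non-empty
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
        # %.0f avoids integer overflow in awk implementations with 32-bit %d.
        awk -F '\t' -v k="$kind" \
            '{ pairs++; reads += $3 } END { printf "%s\t%.0f\t%.0f\n", k, pairs, reads }' "$f"
    done
}

# Example:
#   summarize_frag_loops H9_rep1.subsample
```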
Next run HiCorr bias correction using two *frag_loop files.
./HiCorr HindIII H9_rep1.subsample.cis.frag_loop H9_rep1.subsample.trans.frag_loop H9_rep1.subsample hg19 # It takes a few hours to run
This test dataset is a subsampled bam file for H1 Bio1Tech1Ind2 in-situ Hi-C (restriction enzyme: DPNII; genome build: hg19) from 4DNES2M5JIGV.
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/4DNES2M5JIGV.Bio1Tech1Ind2.subsample.sorted.bam
./HiCorr Bam-process-DPNII 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.sorted.bam 4DNES2M5JIGV.Bio1Tech1Ind2.subsample 50 hg19 DPNII
You will find "4DNES2M5JIGV.Bio1Tech1Ind2.subsample.cis.frag_loop" and "4DNES2M5JIGV.Bio1Tech1Ind2.subsample.trans.frag_loop"; the other files are intermediate files.
Next run HiCorr bias correction using two *frag_loop files.
./HiCorr DPNII 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.cis.frag_loop 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.trans.frag_loop 4DNES2M5JIGV.Bio1Tech1Ind2.subsample hg19 # It takes a few hours to run
Heatmap mode generates Hi-C heatmaps of a region you choose (up to 2,000,000 bp). This mode needs to be run after either HindIII mode or eHiC mode, since it takes an anchor-to-anchor looping-pair file as input.
To run the Heatmap mode:
./HiCorr Heatmap <chr> <start> <end> <anchor_loop_file> <reference_genome> <enzyme> [option]
Example run:
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/HiCorr_output.tar.gz
tar -xvf HiCorr_output.tar.gz
ls
ls HiCorr_output
./HiCorr Heatmap chr11 130000000 130800000 HiCorr_output/anchor_2_anchor.loop.chr11 hg19 HindIII
You will see three png files named as "hg19.HindIII.chr11_130000000_130800000.raw.matrix.png", "hg19.HindIII.chr11_130000000_130800000.expt.matrix.png" and "hg19.HindIII.chr11_130000000_130800000.ratio.matrix.png"
- Default
  By default, Heatmap mode generates 3 heatmaps for the region you entered: a raw heatmap of observed reads, a heatmap of expected reads, and a heatmap of bias-corrected reads (the ratio of observed reads over expected reads). If you want all 3 of these heatmaps, leave the option blank.
- -raw
  Only generates a raw heatmap of observed reads
- -expected
  Only generates a heatmap of expected reads
- -ratio
  Only generates a bias-corrected heatmap
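Because Heatmap mode accepts regions of up to 2,000,000 bp, it can be convenient to validate the coordinates before plotting. This is a minimal sketch; `region_ok` is a hypothetical helper, and treating a negative start or an end at or before the start as invalid is an assumption rather than documented HiCorr behavior.

```shell
# Hypothetical helper, not part of HiCorr.
# region_ok START END: succeeds if 0 <= START < END and the region
# is no wider than 2,000,000 bp.
region_ok() {
    start=$1
    end=$2
    [ "$start" -ge 0 ] && [ "$end" -gt "$start" ] && [ $((end - start)) -le 2000000 ]
}

# Example:
#   region_ok 130000000 130800000 && \
#       ./HiCorr Heatmap chr11 130000000 130800000 HiCorr_output/anchor_2_anchor.loop.chr11 hg19 HindIII
```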
We developed DeepLoop to remove noise and enhance signals from low-depth Hi-C data. See more details at https://github.com/JinLabBioinfo/DeepLoop