Git Product home page Git Product logo

hicorr's Introduction

HiCorr

HiCorr is a pipeline designed to do bias-correction and visualization of Hi-C/eHi-C data. HiCorr focuses on the mapping of chromatin interactions at high-resolution, especially the sub-TAD enhancer-promoter interactions, which requires more rigorous bias-correction, especially the correction of distance biases. It needs to be run in an unix/linux environment. Currently it includes reference files of genome build hg19 and mm10.

If you use HiCorr, please site:
Lu,L. et al. Robust Hi-C Maps of Enhancer-Promoter Interactions Reveal the Function of Non-coding Genome in Neural Development and Diseases. Molecular Cell; doi: https://doi.org/10.1016/j.molcel.2020.06.007

For any question about HiCorr, please contact [email protected]

How to setup

Download the code

git clone https://github.com/shanshan950/HiCorr.git
cd HiCorr/
chmod 755 HiCorr
chmod -R 755 bin/*

Download reference files

After you run the following commands, you will see "ref/" in the current directory. There are 4 subdirectories under "ref/": "DPNII/ eHiC/ eHiC-QC/ HindIII". In each subdirectory, there are reference files for genome build hg19 and mm10.
More descriptions for the reference files.

wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr.tar.gz # download reference files 
# It needs ~103G space after decompress
tar -xvf HiCorr.tar.gz 
ls
ls ref/

Change variables ref and bin in HiCorr file

In HiCorr file, you can manually replace the "PATH_TO_REF" with the path to your directory "ref", Replace "PATH_TO_BIN" with the path to your directory "bin" Or use the command below:

new_bin=`pwd`"/bin" 
new_ref=`pwd`"/ref" 
sed -i "s|PATH_TO_REF|${new_ref}|" HiCorr
sed -i "s|PATH_TO_BIN|${new_bin}|" HiCorr

Run HiCorr

Usage:
./HiCorr <mode> <parameters>

HiCorr has different modes: Bam-process-HindIII, Bam-process-DPNII, HindIII, DPNII, eHiC-QC, eHiC and Heatmap

Bam-process

Bam-process mode takes a sorted bam file as input, processes and generates two files as outputs. The two output files are the required input files when using the HiCorr HindIII mode. The two output files are intra-chromosome looping fragment-pair file and inter-chromosome looping fragment-pair file.
This mode currently is only able to process bam file of HindIII Hi-C data.
To run the Bam-process mode, you need 6 arguments:

./HiCorr Bam-process-HindIII <bam_file> <name_of_your_data> <mapped_read_length_in_your_bam_file> <genome> HindIII

./HiCorr Bam-process-DPNII <bam_file> <name_of_your_data> <mapped_read_length_in_your_bam_file> <genome> DPNII

More details about the preprocessing (fastq to bam files to fragment loops) are here

HindIII

HindIII corrects bias of HindIII Hi-C data. It takes two fragment-pair files as input and outputs an anchor_pair file.

  • The two input files: one file contains intra-chromosome looping fragment pairs(cis pairs), and another contains inter-chromosome looping fragment pairs(trans pairs).
    • Intra-chromosome looping pairs need to have 4 tab-delimited columns, in the following format:
      frag_id_1 frag_id_2 observed_reads_count distance_between_two_fragments
      See sample file here: http://hiview.case.edu/test/sample/frag_loop.IMR90.cis.sample
    • Inter-chromosome looping piars need to have 3 tab-delimited columns, in the following format:
      frag_id_1 frag_id_2 observed_reads_count
      See sample file here: http://hiview.case.edu/test/sample/frag_loop.IMR90.trans.sample
    • These two files needs to be sorted before you run the pipeline (sort -k1 -k2).
    • If you do not know how to generate these two files, please take a look at our bam-process mode.
  • The final result of HindIII mode is an anchor-to-anchor looping pairs file, which has 5 columns:
    anchor_id_1anchor_id_2 obserced_reads_count expected_reads_count p_value_
    See sample file here: http://hiview.case.edu/test/sample/anchor_2_anchor.loop.IMR90.p_val.sample

To run the HindIII mode:
./HiCorr HindIII <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome> [options]

DpnII/Mbol

  • The format of the two input files are the same as HindIII To run the DpNII/Mbol mode:
    ./HiCorr DPNII <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome> [options]

eHiC-QC

eHiC-QC mode takes a pair of fastq.gz files as input, aligns and processes eHiC reads, outputs fragment-end-pair files for further analysis. This mode also outputs summarize numbers which works as quality check fo eHiC experiments. Make sure to name your fastq.gz files as .R1.fastq.gz and .R1.fastq.gz. You need to have Bowtie(http://bowtie-bio.sourceforge.net/index.shtml) and samtools(http://www.htslib.org/) installed since HiCorr calls Bowtie to do alignments. You also need Bowtie index and fa.fai file. To run the eHiC-QC mode, you need 4 arguments:
./HiCorr eHiC-QC <bowtie_index> <fa.fai> <name>

eHiC

eHiC mode corrects bias of eHi-C data. It takes two fragment-end-pair files as input (use HiCorr's eHiC-QC mode if you need to generate these files) and outputs an anchor_pair file.

  • The two input files: one file contains intra-chromosome looping fragment-end pairs(cis pairs), and another contains inter-chromosome looping fragment-end pairs(trans pairs).
    • Intra-chromosome looping pairs need to have 4 tab-delimited columns, in the following format:
      frag_end_id_1 frag_end_id_2 observed_reads_count distance_between_two_fragments
      See sample file here:
    • Inter-chromosome looping piars need to have 3 tab-delimited columns, in the following format:
      frag_end_id_1 frag_end_id_2 observed_reads_count
      See sample file here:
    • These two files needs to be sorted before you run the pipeline (sort -k1 -k2).
  • The final result of HindIII mode is an anchor-to-anchor looping pairs file, which has 5 columns:
    anchor_id_1anchor_id_2 obserced_reads_count expected_reads_count p_value_
    See sample file here: http://hiview.case.edu/test/sample/anchor_2_anchor.loop.IMR90.p_val.sample

To run the eHiC mode:
./HiCorr eHiC <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome>

HiCorr test data (fragment loop, HindIII)

This test dataset is Adrenal Hi-C.(restriction enzyme: HindIII; genome build:hg19) from GSE87112.

wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/frag_loop.Adrenal.cis.gz # cis fragment loop
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/frag_loop.Adrenal.trans.gz # trans fragment loop
gunzip frag_loop.Adrenal.cis.gz
gunzip frag_loop.Adrenal.trans.gz
./HiCorr HindIII frag_loop.Adrenal.cis frag_loop.Adrenal.trans Adrenal hg19
../HiCorr Heatmap chr1 119457772 120457772 HiCorr_output/anchor_2_anchor.loop.chr1 hg19 HindIII # plot Adrenal heatmap

HiCorr test data (bam, HindIII)

This test dataset is subsampled bam file for H9 rep1 Hi-C.(restriction enzyme: HindIII; genome build:hg19) from GSE130711.

wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/H9_rep1.subsample.sorted.bam
./HiCorr Bam-process-HindIII H9_rep1.subsample.sorted.bam H9_rep1.subsample 36 hg19 HindIII

You will found "H9_rep1.subsample.cis.frag_loop" and "H9_rep1.subsample.trans.frag_loop", the other files are intermediate files.
Next run HiCorr bias correction using two *frag_loop files.

./HiCorr HindIII H9_rep1.subsample.cis.frag_loop H9_rep1.subsample.trans.frag_loop H9_rep1.subsample hg19 # It take a few hours to run

HiCorr test data (bam, DPNII)

This test dataset is subsampled bam file for H1 Bio1Tech1Ind2 in-situ Hi-C.(restriction enzyme: DPNII; genome build:hg19) from 4DNES2M5JIGV.

wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/4DNES2M5JIGV.Bio1Tech1Ind2.subsample.sorted.bam
./HiCorr Bam-process-DpNII 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.sorted.bam  4DNES2M5JIGV.Bio1Tech1Ind2.subsample 50 hg19 DPNII

You will found "4DNES2M5JIGV.Bio1Tech1Ind2.subsample.cis.frag_loop" and "4DNES2M5JIGV.Bio1Tech1Ind2.subsample.trans.frag_loop", the other files are intermediate files.
Next run HiCorr bias correction using two *frag_loop files.

./HiCorr DPNII 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.cis.frag_loop 4DNES2M5JIGV.Bio1Tech1Ind2.subsample.trans.frag_loop 4DNES2M5JIGV.Bio1Tech1Ind2.subsample hg19 # It take a few hours to run

Heatmap

Heatmap mode generates Hi-C heatmaps of a certain region you choosed(up to 2,000,000bp). This mode need to be run after either HindIII mode or eHiC mode, since it takes an anchor-to-anchor looping-pair file as input.
To run the Heatmap mode:
./HiCorr Heatmap <chr> <start> <end> <anchor_loop_file> <reference_genome> <enzyme> [option]
Example run:

Download test dataset for H9 chr11 (restriction enzyme: HindIII; genome build:hg19) from GSE130711

wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/HiCorr_output.tar.gz 
tar -xvf HiCorr_output.tar.gz
ls
ls HiCorr_output

Plot heatmaps

./HiCorr Heatmap chr11 130000000 130800000 HiCorr_output/anchor_2_anchor.loop.chr11 hg19 HindIII
You will see three png files named as "hg19.HindIII.chr11_130000000_130800000.raw.matrix.png", "hg19.HindIII.chr11_130000000_130800000.expt.matrix.png" and "hg19.HindIII.chr11_130000000_130800000.ratio.matrix.png"

Options

  • Default
    By defult, heatmap mode will generates 3 heatmaps for the region you entered: a raw heatmap of observed reads, a heatmap of expected reads, and a heatmap of bias-corrected reads(as a ratio of observeds reads over expected reads). If you want all 3 of these heatmaps, leave the option as blank.
  • -raw
    Only generates a raw heatmap of observed reads
  • -expected
    Only generates a heatmap of expected reads
  • -ratio
    Only generates a bias-corrected heatmap

Next step analysis

We developed DeepLoop to remove noise and enhance signals from low-depth Hi-C data, See more details in https://github.com/JinLabBioinfo/DeepLoop

40 Processed Hi-C datasets by HiCorr and DeepLoop can be visualized in website

hicorr's People

Contributors

jinlabbioinfo avatar xliu10 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.