iHiC

A set of C++ utilities for pre/post-processing Hi-C data for public software (including CscoreTool for A/B compartment, TopDom for TAD, and Fit-Hi-C for significant contacts), for data visualization with the WashU Epigenome Browser, and for quality inspection.

Installation
After downloading the package, change to its folder in a terminal and run "sh iHiC_install" to compile the C++ code with g++. The executable files will be saved to a subfolder named "bin". Add the "bin" folder to your PATH, or copy the executables to /usr/bin/ or /usr/local/bin.

Descriptions of usage of the utilities

  1. iHiC_BEDPE_Statistics: Calculate statistics of PET alignments from a simplified BEDPE file.
    Usage: iHiC_BEDPE_Statistics BEDPE_file MAPQ
    Inputs:
    BEDPE_file: A simplified version of a regular BEDPE file: only the columns “chrom1”, “start1”, “end1”, “chrom2”, “start2”, “end2”, “score”, “strand1”, and “strand2” are retained.
    MAPQ: Minimal mapping quality score used to filter out PETs mapped to multiple positions. 10 recommended.
    Outputs: screen
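
To make the input format and the MAPQ filter concrete, here is a minimal Python sketch, not the tool's actual implementation. It assumes the simplified BEDPE file is tab-delimited and that the "score" column holds the mapping quality. The function name `bedpe_stats` and the categories it reports are hypothetical; the README does not list which statistics iHiC_BEDPE_Statistics prints.

```python
from collections import Counter

def bedpe_stats(lines, mapq):
    """Count PETs by category from simplified 9-column BEDPE lines.

    Assumption: fields are tab-delimited and column 7 ("score") is the MAPQ.
    """
    stats = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip lines that do not have the nine expected columns
        chrom1, chrom2, score = fields[0], fields[3], int(fields[6])
        stats["total"] += 1
        if score < mapq:
            continue  # below the minimal mapping quality
        stats["passed_mapq"] += 1
        stats["intra" if chrom1 == chrom2 else "inter"] += 1
    return dict(stats)

example = [
    "chr1\t100\t150\tchr1\t5000\t5050\t30\t+\t-",
    "chr1\t200\t250\tchr2\t900\t950\t42\t+\t+",
    "chr2\t300\t350\tchr2\t800\t850\t3\t-\t+",
]
stats = bedpe_stats(example, mapq=10)
print(stats)
```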

  2. iHiC_PETOrientations_Stat: Calculate % of PETs in different combinations of orientations as a function of genomic distance.
    Usage: iHiC_PETOrientations_Stat BEDPE_file MAPQ
    Inputs:
    BEDPE_file: A simplified version of a regular BEDPE file: only the columns “chrom1”, “start1”, “end1”, “chrom2”, “start2”, “end2”, “score”, “strand1”, and “strand2” are retained.
    MAPQ: Minimal mapping quality score used to filter out PETs mapped to multiple positions. 10 recommended.
    Outputs: screen
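
The idea of tabulating orientation combinations against genomic distance can be sketched as follows. This is an illustration under assumptions, not the tool's code: distances are grouped by order of magnitude (powers of ten), the two ends are ordered by coordinate before the strands are combined, and the name `orientation_fractions` is hypothetical.

```python
import math
from collections import defaultdict

def orientation_fractions(pets):
    """Percentage of each strand combination per distance decade.

    pets: iterable of (start1, strand1, start2, strand2) for
    intra-chromosomal PETs. Grouping by log10 decade is an assumption.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for start1, strand1, start2, strand2 in pets:
        # Put the upstream end first so "+-" and "-+" are well defined.
        if start1 > start2:
            start1, strand1, start2, strand2 = start2, strand2, start1, strand1
        distance = start2 - start1
        if distance <= 0:
            continue  # both ends at the same coordinate: no distance to bin
        decade = int(math.log10(distance))
        counts[decade][strand1 + strand2] += 1
    result = {}
    for decade, by_orientation in counts.items():
        total = sum(by_orientation.values())
        result[decade] = {o: 100.0 * n / total for o, n in by_orientation.items()}
    return result

pets = [
    (100, "+", 600, "-"),
    (1000, "-", 1400, "+"),
    (100, "+", 20100, "+"),
    (50000, "-", 30000, "-"),
]
fractions = orientation_fractions(pets)
print(fractions)
```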

  3. iHiC_ContactFreq_Distance: Calculate contact probability as a function of genomic distance.
    Usage: iHiC_ContactFreq_Distance BEDPE_file MAPQ
    Inputs:
    BEDPE_file: A simplified version of a regular BEDPE file: only the columns “chrom1”, “start1”, “end1”, “chrom2”, “start2”, “end2”, “score”, “strand1”, and “strand2” are retained.
    MAPQ: Minimal mapping quality score used to filter out PETs mapped to multiple positions. 10 recommended.
    Outputs: screen
    Note: It is recommended to redirect the output into a local file with the ">" symbol: iHiC_ContactFreq_Distance BEDPE_file MAPQ > local_file_name
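
A contact probability curve of this kind can be approximated by counting PETs per distance bin and dividing by the total. The sketch below assumes linear bins of a fixed width and normalization by the total PET count; the tool's actual binning and normalization are not described in this README, and `contact_probability` is a hypothetical name.

```python
from collections import Counter

def contact_probability(distances, bin_size):
    """Fraction of PETs falling into each distance bin.

    distances: genomic distances (bp) between the two ends of each PET.
    Returns {bin_start: fraction}, ordered by increasing distance.
    """
    counts = Counter(d // bin_size for d in distances)
    total = sum(counts.values())
    return {b * bin_size: n / total for b, n in sorted(counts.items())}

distances = [1200, 1800, 2500, 7000]
prob = contact_probability(distances, bin_size=1000)
print(prob)
```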

  4. iHiC_BEDPE2III: Generate an intra-chromosomal interaction index (III) file that records the numbers of PETs for interacting genomic bins from the same chromosomes.
    Usage: iHiC_BEDPE2III BEDPE_file Bin_size Minimal_Distance Maximal_Distance MAPQ Output_file_name
    Inputs:
    BEDPE_file: A simplified version of a regular BEDPE file: only the columns “chrom1”, “start1”, “end1”, “chrom2”, “start2”, “end2”, “score”, “strand1”, and “strand2” are retained.
    Bin_size: Used to partition the genome into bins of equal size.
    Minimal_Distance: Minimal distance between two PET ends considered for downstream analysis. Should be no less than Bin_size.
    Maximal_Distance: Maximal distance between two PET ends considered for downstream analysis. Set it to 2,000,000,000 bp to include all interacting PETs longer than the Minimal_Distance. However, because the majority of significant contacts span less than 2,000,000 bp, one may want to set Maximal_Distance to that number to save computational time in later analysis.
    MAPQ: Minimal mapping quality score used to filter out PETs mapped to multiple positions. 10 recommended.
    Output_file_name: File name for the III file.
    Outputs: An III file with the following columns, from left to right: chromosome, genomic position of bin1, genomic position of bin2, number of PETs linking the two bins.
    Note: Only uniquely mapped PETs are considered. For multiple PETs mapped to the same position, only one copy is retained.
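
The filtering, de-duplication, and binning described above can be sketched roughly as below. Several details are assumptions rather than documented behavior: a bin is identified by its start coordinate, the distance is measured between the two start positions, and two PETs count as duplicates when both ends share position and strand. The name `bedpe_to_iii` is hypothetical.

```python
from collections import Counter

def bedpe_to_iii(pets, bin_size, min_dist, max_dist, mapq):
    """Build III-style records from intra-chromosomal PETs.

    pets: iterable of (chrom1, start1, chrom2, start2, score, strand1, strand2).
    Returns sorted [((chrom, bin1_start, bin2_start), n_pets), ...].
    """
    unique = set()
    for chrom1, start1, chrom2, start2, score, strand1, strand2 in pets:
        if chrom1 != chrom2 or score < mapq:
            continue  # keep only well-mapped intra-chromosomal PETs
        if not (min_dist <= abs(start2 - start1) <= max_dist):
            continue  # outside the requested distance range
        # Order the two ends so duplicates match regardless of end order.
        end_a, end_b = sorted([(start1, strand1), (start2, strand2)])
        unique.add((chrom1, end_a, end_b))
    counts = Counter()
    for chrom, (pos_a, _), (pos_b, _) in unique:
        bin_a = pos_a // bin_size * bin_size
        bin_b = pos_b // bin_size * bin_size
        counts[(chrom, bin_a, bin_b)] += 1
    return sorted(counts.items())

pets = [
    ("chr1", 1000, "chr1", 52000, 30, "+", "-"),
    ("chr1", 1000, "chr1", 52000, 30, "+", "-"),  # duplicate, counted once
    ("chr1", 4000, "chr1", 58000, 20, "-", "+"),
    ("chr1", 1000, "chr1", 3000, 40, "+", "-"),   # shorter than min_dist
    ("chr1", 2000, "chr2", 9000, 40, "+", "+"),   # inter-chromosomal
    ("chr1", 7000, "chr1", 91000, 2, "+", "+"),   # low mapping quality
]
iii = bedpe_to_iii(pets, bin_size=10000, min_dist=10000,
                   max_dist=2000000, mapq=10)
print(iii)
```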

  5. iHiC_III2WashU: Convert an III file into “longrange” format accepted by the WashU Epigenome Browser.
    Usage: iHiC_III2WashU III_file Bin_size Output_file_name
    Inputs:
    III_file: An intra-chromosomal interaction index (III) file that records the numbers of PETs for interacting genomic bins from the same chromosomes. Output from iHiC_BEDPE2III.
    Bin_size: Size of genomic bin.
    Output_file_name: File name for the "longrange" file.
    Outputs: A file in the "longrange" format accepted by the WashU Epigenome Browser.

  6. iHiC_BEDPE2HiCSummary: Convert a BEDPE file into “HiCsummary” format accepted by CscoreTool.
    Usage: iHiC_BEDPE2HiCSummary BEDPE_file Output_file_name
    Inputs:
    BEDPE_file: A simplified version of a regular BEDPE file: only the columns “chrom1”, “start1”, “end1”, “chrom2”, “start2”, “end2”, “score”, “strand1”, and “strand2” are retained.
    Output_file_name: File name for the "HiCsummary" file.
    Outputs: A file in "HiCsummary" format.

  7. iHiC_Cscore_Adj: Flip the sign of the C-score chromosome by chromosome to ensure that genomic regions with positive scores correspond to A compartment status.
    Usage: iHiC_Cscore_Adj Cscore_file CpG_annotation_file
    Inputs:
    Cscore_file: Output file from CscoreTool, with a name ending in '_cscore.bedgraph'.
    CpG_annotation_file: CpG annotation downloaded from UCSC. Only the 4-5 columns corresponding to chr, start and end are used by the program.
    Outputs: A file in "bedgraph" format, with a name ending in "adj".
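
One plausible way to decide the sign per chromosome is sketched below. It is an assumption, not the documented method: it treats CpG-rich bins as markers of the A compartment and flips all scores on a chromosome when the CpG-weighted sum of scores is negative. The name `adjust_cscore_sign` and the weighting rule are hypothetical.

```python
def adjust_cscore_sign(bins, cpg_starts):
    """Flip C-scores on one chromosome so CpG-rich bins end up positive.

    bins: list of (start, end, cscore) for a single chromosome.
    cpg_starts: start positions of CpG islands on that chromosome.
    Assumption: the sign is chosen from the CpG-count-weighted score sum.
    """
    weighted = 0.0
    for start, end, score in bins:
        n_cpg = sum(1 for pos in cpg_starts if start <= pos < end)
        weighted += n_cpg * score
    sign = -1.0 if weighted < 0 else 1.0
    return [(start, end, sign * score) for start, end, score in bins]

bins = [(0, 100, 0.4), (100, 200, -0.6), (200, 300, -0.2)]
cpg_starts = [120, 150, 250]
adjusted = adjust_cscore_sign(bins, cpg_starts)
print(adjusted)
```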

  8. iHiC_III2MTX4TopDom: Convert an III file into interaction matrices accepted by TopDom.
    Usage: iHiC_III2MTX4TopDom III_file Bin_size Chr_length_file Output_Prefix
    Inputs:
    III_file: An intra-chromosomal interaction index (III) file that records the numbers of PETs for interacting genomic bins from the same chromosomes. Output from iHiC_BEDPE2III.
    Bin_size: Size of genomic bin.
    Chr_length_file: A two-column file: chr and length.
    Output_Prefix: "chr#.mtx4topdom" will be appended to the Output_Prefix to denote the file for chr#.
    Outputs: An n × (n+3) matrix accepted by TopDom as input: the first three columns give bin.chr, bin.start, and bin.end, and the remaining n columns give the number of PETs linking the bin to each of the n bins.
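
The shape of the n × (n+3) matrix can be illustrated with a small sketch. It assumes that III positions are bin start coordinates, that the matrix is symmetric, and that the last bin is truncated at the chromosome end; `iii_to_topdom_matrix` is a hypothetical name, not part of iHiC.

```python
def iii_to_topdom_matrix(chrom, chrom_length, bin_size, iii_records):
    """Build TopDom-style rows: [chr, start, end, count_1, ..., count_n].

    iii_records: iterable of (chrom, bin1_start, bin2_start, n_pets).
    """
    n = (chrom_length + bin_size - 1) // bin_size  # number of bins, rounded up
    counts = [[0] * n for _ in range(n)]
    for rec_chrom, pos1, pos2, n_pets in iii_records:
        if rec_chrom != chrom:
            continue
        i, j = pos1 // bin_size, pos2 // bin_size
        counts[i][j] = n_pets
        counts[j][i] = n_pets  # mirror the count to keep the matrix symmetric
    rows = []
    for i in range(n):
        start = i * bin_size
        end = min(start + bin_size, chrom_length)
        rows.append([chrom, start, end] + counts[i])
    return rows

records = [("chr1", 0, 20000, 5), ("chr1", 0, 10000, 3)]
rows = iii_to_topdom_matrix("chr1", 25000, 10000, records)
for row in rows:
    print("\t".join(str(value) for value in row))
```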

  9. iHiC_TADBoundary_TopDom: Extract TAD boundaries from TopDom predictions.
    Usage: iHiC_TADBoundary_TopDom TopDom_binSignal_file Pvalue_threshold Output_file_name
    Inputs:
    TopDom_binSignal_file: Output file from TopDom with name ending with '.binSignal'.
    Pvalue_threshold: P-value threshold used to select bins corresponding to TAD boundaries.
    Output_file_name: File name for the file of TAD boundaries.
    Outputs: A file in bedgraph format.

  10. iHiC_III2FitHiCInputs: Convert an III file into interaction matrices accepted by Fit-Hi-C.
    Usage: iHiC_III2FitHiCInputs III_file Output_Prefix Chr_length_file Bin_size Minimal_Distance Maximal_Distance
    Inputs:
    III_file: An intra-chromosomal interaction index (III) file that records the numbers of PETs for interacting genomic bins from the same chromosomes. Output from iHiC_BEDPE2III.
    Output_Prefix: "chr#.frag" and "chr#.cnts" will be appended to the Output_Prefix to denote the files for chr#.
    Chr_length_file: A two-column file: chr and length.
    Bin_size: Size of genomic bin.
    Minimal_Distance: Minimal distance between two PET ends considered.
    Maximal_Distance: Maximal distance between two PET ends considered.
    Outputs: "chr#.frag" and "chr#.cnts", the "fragments file" and "interactions file" accepted by Fit-Hi-C.
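
As a rough illustration of the two outputs, the sketch below derives fragment and interaction records from III records. It assumes the commonly documented Fit-Hi-C layouts (fragments: chr, extra field, midpoint, marginal contact count, mappable flag; interactions: chr1, mid1, chr2, mid2, count), that III positions are bin starts, and that a bin is flagged mappable when it has at least one contact. The name `iii_to_fithic` is hypothetical, and the files iHiC actually writes may differ in detail.

```python
from collections import defaultdict

def iii_to_fithic(chrom, chrom_length, bin_size, min_dist, max_dist, iii_records):
    """Return (fragments, interactions) tuples for one chromosome.

    iii_records: iterable of (chrom, bin1_start, bin2_start, n_pets).
    """
    marginal = defaultdict(int)
    interactions = []
    half = bin_size // 2
    for rec_chrom, pos1, pos2, n_pets in iii_records:
        if rec_chrom != chrom:
            continue
        if not (min_dist <= abs(pos2 - pos1) <= max_dist):
            continue  # outside the requested distance range
        mid1, mid2 = pos1 + half, pos2 + half
        interactions.append((chrom, mid1, chrom, mid2, n_pets))
        marginal[mid1] += n_pets
        marginal[mid2] += n_pets
    fragments = []
    for start in range(0, chrom_length, bin_size):
        mid = start + half
        total = marginal[mid]
        # Extra field fixed to 0; mappable flag set when the bin has contacts.
        fragments.append((chrom, 0, mid, total, 1 if total > 0 else 0))
    return fragments, interactions

records = [("chr1", 0, 20000, 5), ("chr1", 0, 10000, 3)]
fragments, interactions = iii_to_fithic(
    "chr1", 30000, 10000, 10000, 2000000, records)
print(fragments)
print(interactions)
```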

  11. iHiC_FitHiC2WashU: Convert the significant contacts predicted by Fit-Hi-C into a “longrange” format accepted by the WashU Epigenome Browser.
    Usage: iHiC_FitHiC2WashU FitHiC_prediction Bin_size Q-value
    Inputs:
    FitHiC_prediction: Output file from Fit-Hi-C with name ending with '.significances.txt'.
    Bin_size: Bin size used for generating inputs for Fit-Hi-C by iHiC_III2FitHiCInputs.
    Q-value: Q-value threshold used to retain significant contacts.
    Outputs: A file with a name ending in "longrange", accepted by the WashU Epigenome Browser for data visualization.

  12. TopDom_wrapper.R: An R script wrapper to run TopDom.
    Usage: Rscript TopDom_wrapper.R Path_to_TopDom_v0.0.2.R ".mtx4topdom"_file
    The ".mtx4topdom" files are the outputs of iHiC_III2MTX4TopDom.

Contributors

hugangqing
