
MrTADFinder

MrTADFinder aims to identify topologically associating domains (TADs) in multiple resolutions.

INPUT FILES:

MrTADFinder takes a whole-genome-to-whole-genome contact map as input. The contact map should be a sparse matrix stored in a tab-delimited file, as follows:

1 1464 39
1 1768 19
1 4455 20
1 4458 70
1 7368 29
1 10413 25
1 10563 30
1 10687 37
1 10690 24
1 11123 28
...

The first two columns are the indices of genomic bins, and the third column is the contact frequency (the corresponding matrix element). This simple format is widely used by mapping tools such as HiC-Pro.
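To make the triplet format concrete, here is a minimal Python sketch of reading such a file into a dictionary keyed by bin pair. The function name `read_sparse_contacts` is hypothetical and is not part of MrTADFinder, which does its own reading in Julia. The mirroring step assumes that the file may store only one triangle of the symmetric matrix. Because the mirrored value is assigned rather than added, a file that already lists both (i, j) and (j, i) is not double counted.

```python
import io


def read_sparse_contacts(handle, symmetrize=True):
    """Read 'bin_i bin_j count' lines into a {(i, j): frequency} dict.

    Bin indices are kept exactly as written (1-based in MrTADFinder input).
    Assumption: the matrix is symmetric, so each off-diagonal entry may be
    mirrored to (j, i) when only one triangle is stored in the file.
    """
    contacts = {}
    for line in handle:
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        i, j, value = int(fields[0]), int(fields[1]), float(fields[2])
        contacts[(i, j)] = value
        if symmetrize and i != j:
            contacts[(j, i)] = value  # assignment, so no double counting
    return contacts


example = io.StringIO("1\t1464\t39\n1\t1768\t19\n1\t4455\t20\n")
cmap = read_sparse_contacts(example)
print(cmap[(1, 1464)], cmap[(1464, 1)])  # 39.0 39.0
```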

The mapping between chromosomes and genomic bins is specified by two annotation files.

To bin the human genome (hg19) at a bin size of 40 kb, an annotation file in the following format is provided (see ./data/bins_file1):

1 chr1 0 6231
2 chr2 6232 12311
3 chr3 12312 17262
4 chr4 17263 22041
5 chr5 22042 26564
6 chr6 26565 30842
7 chr7 30843 34821
8 chr8 34822 38481
9 chr9 38482 42012
10 chr10 42013 45401
11 chr11 45402 48777
12 chr12 48778 52124
13 chr13 52125 55004
14 chr14 55005 57688
15 chr15 57689 60252
16 chr16 60253 62511
17 chr17 62512 64541
18 chr18 64542 66493
19 chr19 66494 67972
20 chr20 67973 69548
21 chr21 69549 70752
22 chr22 70753 72035
23 chrX 72036 75917
24 chrY 75918 77402
25 chrM 77403 77403

The first and second columns give the index and name of each chromosome. The third and fourth columns give the first and last genomic bin of that chromosome. With a bin size of 40 kb, chromosome 1 is divided into 6232 bins (bin 0 to bin 6231). Apart from the last bin of each chromosome, all bins are 40 kb. In this example, the whole human genome is divided into 77404 bins, so the corresponding contact map is a square matrix of size 77404 × 77404. The contact map file indexes these bins with the numbers 1 to 77404.
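The Python sketch below parses the first annotation file and locates one chromosome in the contact map. It assumes that contact-map index = annotation bin + 1, since bins 0 to 77403 are numbered 1 to 77404 in the contact map. All three helpers (`read_chrom_bins`, `contact_map_range`, `extract_chromosome`) are hypothetical illustrations. They are not functions from MrTADFinder.jl.

```python
def read_chrom_bins(lines):
    """Parse bins_file1-style rows: 'index name first_bin last_bin'.

    Returns {chrom_index: (name, first_bin, last_bin)} with the 0-based bin
    numbers exactly as written in the annotation file.
    """
    chroms = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chroms[int(fields[0])] = (fields[1], int(fields[2]), int(fields[3]))
    return chroms


def contact_map_range(chroms, chrom_index):
    """Return (name, first, last): the chromosome's 1-based contact-map indices.

    Assumption: contact-map index = annotation bin + 1.
    """
    name, first_bin, last_bin = chroms[chrom_index]
    return name, first_bin + 1, last_bin + 1


def extract_chromosome(contacts, first, last):
    """Keep intra-chromosomal entries of a {(i, j): value} map, re-indexed from 1."""
    return {
        (i - first + 1, j - first + 1): value
        for (i, j), value in contacts.items()
        if first <= i <= last and first <= j <= last
    }


file1 = ["1 chr1 0 6231", "2 chr2 6232 12311", "10 chr10 42013 45401"]
chroms = read_chrom_bins(file1)
name, first_bin, last_bin = chroms[1]
print(last_bin - first_bin + 1)        # 6232 (bins on chr1)
print(contact_map_range(chroms, 10))   # ('chr10', 42014, 45402)
```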

The second annotation file (see ./data/bins_file2) has the form:

0 1 40000
0 40001 80000
0 80001 120000
0 120001 160000
0 160001 200000
0 200001 240000
0 240001 280000
0 280001 320000
0 320001 360000
0 360001 400000

In this case, there are 77404 lines, one for each genomic bin. Each line gives the chromosome number (0 = chr1, etc.) and the start and end coordinates of the bin.
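Users who need a different bin size or organism could generate rows in this layout from chromosome lengths, as in the Python sketch below. Coordinates are 1-based and inclusive, matching the example rows (1 to 40000, 40001 to 80000, ...). The function name `make_bins_file2` is hypothetical. The sketch also assumes that the last bin of each chromosome is truncated at the chromosome length. For chr1 of hg19 (249,250,621 bp), it yields 6232 rows, which matches bins 0 to 6231 in bins_file1.

```python
def make_bins_file2(chrom_lengths, bin_size=40000):
    """Yield (chrom_number, start, end) rows in the bins_file2 layout.

    chrom_lengths lists chromosome lengths in bins_file1 order; chromosome
    numbers are 0-based (0 = chr1). Coordinates are 1-based and inclusive.
    Assumption: the final bin of a chromosome ends at the chromosome length.
    """
    for chrom_number, length in enumerate(chrom_lengths):
        start = 1
        while start <= length:
            end = min(start + bin_size - 1, length)
            yield chrom_number, start, end
            start += bin_size


# Write the first two rows for hg19 chr1 as tab-delimited lines.
for row in list(make_bins_file2([249250621]))[:2]:
    print("\t".join(map(str, row)))
```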

Annotation files for the human genome (hg19) binned at 40 kb are provided. Users can supply their own annotation files for other bin sizes or other organisms.

OUTPUT FILES:

The output file is simply a CSV file that stores a list of TADs (chromosome, start and end coordinates):

"chr","domain_st","domain_ed","domain_st_bin","domain_ed_bin","idx"
"chr10",40001,200000,42015,42018,1
"chr10",200001,880000,42019,42035,2
"chr10",880001,1120000,42036,42041,3
"chr10",1120001,1160000,42042,42042,4
"chr10",1160001,1200000,42043,42043,5
"chr10",1200001,1240000,42044,42044,6
"chr10",1240001,1280000,42045,42045,7
"chr10",1280001,1320000,42046,42046,8
"chr10",1320001,1360000,42047,42047,9

The 4th and 5th columns give the start and end bins of each domain, based on the binning defined by the annotation files. The last column is a running domain index.
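Although the usage example names the output file with a .bed extension, the content is CSV. The Python sketch below converts it into BED-style tuples for genome browsers. It assumes that domain_st and domain_ed are 1-based inclusive coordinates, as the example rows suggest (40001 to 200000). The 0-based, half-open BED start is therefore domain_st - 1. The function name `tads_to_bed` and the "TAD<idx>" naming are hypothetical choices.

```python
import csv
import io


def tads_to_bed(csv_handle):
    """Convert MrTADFinder's TAD CSV into (chrom, start, end, name) tuples.

    Assumption: domain_st/domain_ed are 1-based inclusive, so the BED start
    (0-based, half-open) is domain_st - 1 while the end is unchanged.
    """
    bed = []
    for rec in csv.DictReader(csv_handle):  # handles the quoted header and names
        start = int(rec["domain_st"]) - 1
        end = int(rec["domain_ed"])
        bed.append((rec["chr"], start, end, "TAD" + rec["idx"]))
    return bed


example = io.StringIO(
    '"chr","domain_st","domain_ed","domain_st_bin","domain_ed_bin","idx"\n'
    '"chr10",40001,200000,42015,42018,1\n'
    '"chr10",200001,880000,42019,42035,2\n'
)
for row in tads_to_bed(example):
    print("\t".join(map(str, row)))
```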

USAGE:

MrTADFinder is written in Julia and has been tested with Julia v0.4.3. If Julia and the required packages are installed (see the first few lines of MrTADFinder.jl), one can simply run the following from the command prompt:

julia run_MrTADFinder.jl contact_map ./data/bins_file1 ./data/bins_file2 res=1.0 10 TAD_chr10.bed

The 1st argument is the contact map.
The 2nd and 3rd arguments are the two annotation files.
The 4th argument is the resolution parameter, written as res=<value>.
The 5th argument is the chromosome of interest (e.g. 10 for chr10).
The 6th argument is the path and name of the TAD output file. An additional file of boundary scores is also generated.
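When scanning several chromosomes or resolution values from a script, it helps to assemble the argument list in the documented order. The Python sketch below does this. The helper `mrtadfinder_command` is hypothetical. Passing the chromosome as a plain integer string is an assumption based on the usage example.

```python
import shlex


def mrtadfinder_command(contact_map, bins_file1, bins_file2, resolution,
                        chrom, out_path, script="run_MrTADFinder.jl"):
    """Build the argv list for run_MrTADFinder.jl in the documented order."""
    return [
        "julia", script,
        contact_map,            # 1st: sparse contact map
        bins_file1,             # 2nd: chromosome-to-bin annotation
        bins_file2,             # 3rd: per-bin coordinate annotation
        f"res={resolution}",    # 4th: resolution parameter as res=<value>
        str(int(chrom)),        # 5th: chromosome of interest (assumed integer)
        out_path,               # 6th: TAD output file
    ]


cmd = mrtadfinder_command("contact_map", "./data/bins_file1", "./data/bins_file2",
                          1.0, 10, "TAD_chr10.bed")
print(shlex.join(cmd))
# julia run_MrTADFinder.jl contact_map ./data/bins_file1 ./data/bins_file2 res=1.0 10 TAD_chr10.bed
```

The list could then be passed to `subprocess.run(cmd, check=True)`. Because the arguments are passed as a list, no shell quoting is needed.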

Author/Support

Koon-Kiu Yan, [email protected]; Mark Gerstein, [email protected]

REFERENCE:

Koon-Kiu Yan, Mark Gerstein: MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005647

ISSUES:

Multiple warnings and errors when executing run_MrTADFinder.jl

Hi,

Running run_MrTADFinder.jl on my computer produces multiple warnings and errors. I have tried both Julia versions 0.4.7 and 0.6.1 (I have been unable to find the source code for version 0.4.3, which is mentioned in the README.md file).

Below are the warnings/errors from the 0.6.1 version:

$ julia --version
julia version 0.6.1

$ julia run_MrTADFinder.jl /sp4work/Projects/MiniPromoters/SN/Hi-C/HiC-Pro/NcoI/hic_results/hicpro.merged/hic_results/matrix/data/iced/40000/data_40000_iced.matrix ./bins/mm10/bins_file1 ./bins/mm10/bins_file2 res=2.875 21 chr19.bed

WARNING: deprecated syntax "abstract ApproxFit" at /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/CurveFit.jl:47.
Use "abstract type ApproxFit end" instead.

WARNING: deprecated syntax "abstract LeastSquares<:ApproxFit" at /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/CurveFit.jl:50.
Use "abstract type LeastSquares<:ApproxFit end" instead.

WARNING: deprecated syntax "inner constructor LinearFit(...) around /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/linfit.jl:54".
Use "LinearFit{T}(...) where T" instead.

WARNING: deprecated syntax "inner constructor LogFit(...) around /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/linfit.jl:62".
Use "LogFit{T}(...) where T" instead.

WARNING: deprecated syntax "inner constructor PowerFit(...) around /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/linfit.jl:69".
Use "PowerFit{T}(...) where T" instead.

WARNING: deprecated syntax "inner constructor ExpFit(...) around /homed/home/oriol/.miniconda2.7/share/julia/site/v0.6/CurveFit/src/linfit.jl:76".
Use "ExpFit{T}(...) where T" instead.
WARNING: could not import Base.call into CurveFit
reading binning information
reading contact map
WARNING: round(::Type{T}, x::AbstractArray) where T is deprecated, use round.(T, x) instead.
Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:70
 [2] round(::Type{Int64}, ::Array{Float64,1}) at ./deprecated.jl:57
 [3] read_generic_WG_contact_map(::String, ::Int64) at /sp4work/Tools/MrTADFinder/MrTADFinder.jl:12
 [4] include_from_node1(::String) at ./loading.jl:576
 [5] include(::String) at ./sysimg.jl:14
 [6] process_options(::Base.JLOptions) at ./client.jl:305
 [7] _start() at ./client.jl:371
while loading /sp4work/Tools/MrTADFinder/run_MrTADFinder.jl, in expression starting on line 49
ERROR: LoadError: ArgumentError: invalid index: 21.0
Stacktrace:
 [1] extract_chr(::SparseMatrixCSC{Float64,Int64}, ::Array{Any,2}, ::Float64) at /sp4work/Tools/MrTADFinder/MrTADFinder.jl:22
 [2] include_from_node1(::String) at ./loading.jl:576
 [3] include(::String) at ./sysimg.jl:14
 [4] process_options(::Base.JLOptions) at ./client.jl:305
 [5] _start() at ./client.jl:371
while loading /sp4work/Tools/MrTADFinder/run_MrTADFinder.jl, in expression starting on line 51

FYI, attached is a zip file with the bin files that I created from the HiC-Pro file containing the Hi-C fragments (i.e. data_40000_abs.bed) using the Create_MrTADFinder_Bins.py script (also attached).

Any help will be welcome! Thank you in advance!

Archive.zip
