CBCR: A Curriculum Based Strategy For Chromosome Reconstruction

OluwadareLab, University of Colorado, Colorado Springs


Developers:
              Van Hovenga
              Department of Mathematics
              University of Colorado, Colorado Springs
              Email: [email protected]

Contact:
              Oluwatosin Oluwadare, PhD
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: [email protected]


Cite: Hovenga V, Oluwadare O. CBCR: A Curriculum Based Strategy For Chromosome Reconstruction. International Journal of Molecular Sciences. 2021; 22(8):4140. https://doi.org/10.3390/ijms22084140


1. Content of folders:

  • src: Matlab source code.
  • GM12878_Output: Output structures and log files generated from the GM12878 cell line using the MboI restriction enzyme at four different resolutions (1Mb, 500Kb, 250Kb, 100Kb). These structures were generated using the primary-replicate combined mappings.
  • Validation: Output structures and log files generated from the GM12878 cell line used for validation at 1Mb resolution.
    • Mpol_Primary_Output: Outputs generated from the primary mappings using the MboI restriction enzyme.
    • Mbol_Replicate_Output: Outputs generated from the replicate mappings using the MboI restriction enzyme.
    • DpnII_Combined: Outputs generated from the primary-replicate combined mappings using the DpnII restriction enzyme.

2. Hi-C Data used in this study:

In our study, we used the synthetic dataset from Adhikari et al. The contact maps, the original models, and their reconstructed models used in this study were downloaded from here. We also used the synthetic dataset from Zou et al., which can be downloaded from here.

The GM12878 cell line Hi-C dataset, GEO Accession number GSE63525, was downloaded from GSDB with GSDB ID: OO7429SF.

3. Input file format:

CBCR allows two input formats:

  • Square matrix input format: An N by N intra-chromosomal contact matrix derived from Hi-C data, where N is the number of equal-sized regions of the chromosome.
  • Tuple input format: A Hi-C contact file in which each line describes one contact as 3 numbers separated by a space: position_1 position_2 interaction_frequencies.
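
The relationship between the two input formats can be illustrated with a short Python sketch that turns tuple-format lines into a square matrix. This is not CBCR's own loader (CBCR is written in Matlab); the function name `tuples_to_matrix`, the choice to index bins by sorted genomic position, and the symmetric fill are all assumptions made for illustration.

```python
def tuples_to_matrix(lines):
    """Build a symmetric contact matrix from 'position_1 position_2 IF' lines.

    Hypothetical helper: bins are indexed (0-based) by sorted genomic position,
    which is an assumption and may differ from CBCR's internal mapping.
    """
    contacts = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pos1 = int(float(parts[0]))
        pos2 = int(float(parts[1]))
        freq = float(parts[2])
        contacts.append((pos1, pos2, freq))

    # Map each distinct genomic position to a 0-based index.
    positions = sorted({p for p1, p2, _ in contacts for p in (p1, p2)})
    index = {pos: i for i, pos in enumerate(positions)}

    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]
    for pos1, pos2, freq in contacts:
        i, j = index[pos1], index[pos2]
        matrix[i][j] = freq
        matrix[j][i] = freq  # contacts are treated as undirected
    return matrix, positions
```

Calling `tuples_to_matrix(["0 1000000 5", "0 2000000 2.5"])` would yield a 3 by 3 matrix together with the list of positions that each row and column corresponds to.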

4. Usage:

To run CBCR, type in the terminal: CBCR(input, learning_rate, conversion, max_iter_0, max_iter_1, verbose)

  • Arguments:
    • input: A string for the path of the input file
    • learning_rate: The learning rate of the algorithm [Recommended value: .2].
    • conversion: Vector or scalar. The factor(s) used to convert IF to distance, distance = 1/(IF^factor). When a vector is used, a structure is generated at every conversion factor in the vector, and the structure that maximizes the distance Spearman correlation coefficient (dSCC) is selected as the representative structure. For example, if the input is [.1, .3, .5, .7, .9, 1, 1.3, 1.5], then CBCR generates a structure for each value and selects the one that maximizes dSCC. A vector input is recommended for a thorough search. When a scalar is used, the user only needs to provide a single value, for example 0.5.
    • max_iter_0: The maximum total number of iterations over all sub-curricula combined. This value should be smaller for smaller inputs, and larger for larger inputs. A value of 1,000 was used for the 1Mb and 500Kb, and 10,000 for the 250Kb and 100Kb resolutions in this study on the GM12878 input data.
    • max_iter_1: The maximum total number of iterations over the final training of CBCR if early convergence is met. A value of 500 was used for the 1Mb and 500Kb, and a value of 1,000 was used for the 250Kb and 100Kb resolutions in this study on the GM12878 input data.
    • verbose: Integer. Controls the output of CBCR in the console. A value of 0 will display only the current curricula. A value of 1 will display the current curricula and each iteration with the corresponding loss and the values of alpha and beta. A value of 2 will display the outputs of verbose = 1 and a plot that shows the evolution of the chromosome as training progresses. Note that verbose = 2 will slow down CBCR.
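
The conversion rule distance = 1/(IF^factor) described for the conversion argument can be sketched in Python as follows. The helper names `if_to_distance` and `distances_for_factors` are hypothetical, and returning `None` for a zero interaction frequency is an assumption; how CBCR itself treats zero entries is not stated here.

```python
def if_to_distance(frequency, factor):
    """Apply distance = 1 / (IF ** factor) to one interaction frequency."""
    if frequency <= 0:
        # Assumption: a non-positive IF has no defined distance.
        return None
    return 1.0 / (frequency ** factor)


def distances_for_factors(frequencies, factors):
    """Convert the same frequencies once per candidate conversion factor."""
    return {
        factor: [if_to_distance(freq, factor) for freq in frequencies]
        for factor in factors
    }
```

With a vector input, each candidate factor produces its own set of target distances, which is why a longer vector gives a more thorough search at the cost of generating more structures.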

5. Output:

CBCR outputs three files:

  1. .pdb: The protein data bank file of the representative structure.
  2. .log: A log file that records the input file, the optimal structure file name, the optimal conversion factor, and the corresponding dSCC, sPCC, and dRMSE.
  3. _coordinate_mapping.txt: Contains the mapping of genomic positions to indices in the model. Note that indices start from 0, while in PyMOL or Chimera, ids start from 1.
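
For readers who want to check the metrics reported in the log file, the Python sketch below computes a Spearman correlation and a root mean squared error between two lists of pairwise distances. It assumes that dSCC and dRMSE compare the distances expected from the input with the distances measured in the output structure; the exact definitions used by CBCR are not given here, and all function names are hypothetical.

```python
import math


def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

A Spearman value close to 1 means the model preserves the ordering of the expected distances, while a lower RMSE means the distances themselves are closer in magnitude.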
