Git Product home page Git Product logo

contrafold-se's Introduction

CONTRAfold-SE - Learning RNA secondary structure (only) from structure probing data

Installation

Clone the repository and run make in src to compile CONTRAfold-SE.

CONTRAfold-SE has been tested on g++ 4.4.7 and 4.8.4.

Usage

Below are some examples for CONTRAfold-SE usage.

Please see the documentation of CONTRAfold for further information on parameters and usage.

Training

Learn the parameters based on a set of sequences, in which sequences with associated probing data have data from 2 sources, and with lambda parameter 0.1.

Assumes that folder "trainset" has a set of sequences of type ".bpseq" in evidence format for the ones with data.

contrafold train --regularize 1 --numdatasources 2 --maxiter 1000 --hyperparam_data 0.1 --initweights contrafold.params.complementary_data2 trainset/*.bpseq

If there are a large number of input files used (> 1000 files; e.g. for training on RMDB data), provide a text file containing the list of example files instead with the --examplefile option.

contrafold train --regularize 1 --numdatasources 1 --maxiter 500 --examplefile examples.txt

Prediction with probing data

Predict the structure of sequence "seq" with evidence "seq.bpseq" for a variety of gamma.

Assumes that folder "testset" has a bpseq file "seq.bpseq" in evidence format.

contrafold predict --numdatasources 1 --params learned_params.params --gamma -1 --evidence testset/seq.bpseq

Prediction without probing data

Predict the structure of sequence "seq" for a variety of gamma.

Assumes that folder "testset" has a seq file "seq.seq".

contrafold predict --numdatasources 1 --params learned_params.params --gamma -1 testset/seq.bpseq

Input file formats

To support structure probing data we adapt the BPSEQ format in two ways to support sequences with only probing data (BPP2SEQ), and sequences with both probing data and known structure (base-pairings) (BPP2TSEQ).

The original BPSEQ format consists of a row for each base in the RNA molecule, describing the index (1-based), the actual base present, and the index of the pairing partner (0 if unpaired).

In the BPP2SEQ format, there is an evidence string e<N> following the base, where <N> is an integer denoting how many probing data sources there are, followed by <N> (positive) values of the unpairedness potential (derived from probing data) for that base. For instance, e2 denotes 2 probing sources, and should be followed by a two positive real numbers; all entries should have the same evidence string. An example is shown below

1 G e2 7.070000e+00 -1.000000e+00
2 A e2 7.570000e+00 3.333333e-02
3 A e2 6.500000e+00 -1.000000e+00
4 A e2 5.310000e+00 4.444444e-02

In the BPP2TSEQ format, the evidence string is t<N> instead, and following the unpairedness potential values is the index of the pairing partner.

In either format, values that are less than 1e-5 are ignored (treated as not present) for numerical stability.

Datasets

Data used to train and evaluate CONTRAfold-SE may be obtained here.

contrafold-se's People

Contributors

csfoo avatar sunrisewestern avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.