rnascan

rnascan is a (mostly) Python suite for scanning RNA sequences and secondary structures with sequence and secondary structure position frequency matrices (PFMs). Secondary structure is represented as weights over different structural contexts, much as a PFM represents weights over different nucleotides or amino acids. This lets secondary structures be scanned in the same way that PFMs are used to scan nucleotide sequences, and it also allows some flexibility in the structure, such as you might find in the Boltzmann distribution of secondary structures.

The secondary structure alphabet is as follows:

  • B - bulge loop
  • E - external (unpaired) RNA
  • H - hairpin loop
  • L - left paired RNA (i.e., a '(' in dot-bracket format)
  • M - multiloop
  • R - right paired RNA (i.e., a ')' in dot-bracket format)
  • T - internal loop
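To make the alphabet concrete, the following Python sketch labels each position of a dot-bracket string using a standard loop decomposition: paired positions become L or R, and each unpaired position is classified by the base pair that closes its loop (no enclosing pair gives E, no inner helix gives H, one inner helix gives B or T depending on whether one side of the loop is empty, and two or more inner helices give M). This is an illustration under those assumptions, not the logic of scripts/parse_secondary_structure.cpp, and all function names are hypothetical.

```python
# Hypothetical sketch: annotate dot-bracket with the B/E/H/L/M/R/T alphabet.
# Generic loop decomposition; not taken from parse_secondary_structure.cpp.


def pair_table(db):
    """Map each paired index to its partner; raise on unbalanced brackets."""
    stack, pairs = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def enclosing_opener(db, pairs, i):
    """Index of the '(' closing the loop that contains unpaired i, or None."""
    p = i - 1
    while p >= 0:
        if db[p] == ")":
            p = pairs[p] - 1  # jump over a complete inner helix
        elif db[p] == "(":
            return p
        else:
            p -= 1
    return None


def loop_context(db, pairs, opener):
    """Classify the loop closed by the pair (opener, pairs[opener])."""
    closer = pairs[opener]
    branches = []
    q = opener + 1
    while q < closer:
        if db[q] == "(":
            branches.append((q, pairs[q]))
            q = pairs[q] + 1  # skip the whole inner helix
        else:
            q += 1
    if not branches:
        return "H"  # hairpin loop
    if len(branches) > 1:
        return "M"  # multiloop
    left = branches[0][0] - opener - 1
    right = closer - branches[0][1] - 1
    return "B" if left == 0 or right == 0 else "T"  # bulge vs internal loop


def annotate(db):
    pairs = pair_table(db)
    out = []
    for i, c in enumerate(db):
        if c == "(":
            out.append("L")
        elif c == ")":
            out.append("R")
        else:
            opener = enclosing_opener(db, pairs, i)
            out.append("E" if opener is None else loop_context(db, pairs, opener))
    return "".join(out)


print(annotate(".((.(...).(...).))."))  # ELLMLHHHRMLHHHRMRRE
```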

The rnascan suite consists of two tools:

  1. run_folding: Calculate an average structural context profile of an RNA sequence by folding overlapping 100 nt subsequences and averaging the results across them.
  2. rnascan: Scan RNA sequences and secondary structures with sequence and secondary structure PFMs.
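The averaging step described for run_folding can be sketched as follows, assuming each overlapping window has already been folded (for example with RNAfold) and annotated with the B/E/H/L/M/R/T alphabet. The window step and the exact averaging used by run_folding are not stated here, so `average_profile` is a hypothetical helper that simply reports, for each position, the fraction of covering windows that assign it each context.

```python
# Hypothetical sketch of averaging per-window structure annotations.
# Assumes windows were already folded and annotated; not run_folding's code.

ALPHABET = "BEHLMRT"


def average_profile(seq_len, windows):
    """windows: list of (start, context_string) pairs covering the sequence.

    Returns one dict per position with the fraction of covering windows in
    which that position was assigned each structural context.
    """
    counts = [dict.fromkeys(ALPHABET, 0) for _ in range(seq_len)]
    coverage = [0] * seq_len
    for start, contexts in windows:
        for offset, ctx in enumerate(contexts):
            pos = start + offset
            counts[pos][ctx] += 1
            coverage[pos] += 1
    profile = []
    for pos in range(seq_len):
        if coverage[pos] == 0:
            raise ValueError("position %d is not covered by any window" % pos)
        profile.append({c: counts[pos][c] / float(coverage[pos]) for c in ALPHABET})
    return profile


# Two overlapping 4 nt windows over a 6 nt sequence (toy annotations).
profile = average_profile(6, [(0, "EEEE"), (2, "LHHR")])
```

Positions covered by both windows end up with split probabilities, which is how a profile can express uncertainty between alternative structures.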

Installation

Follow the steps below to install rnascan. If you do not plan to use the run_folding tool to fold sequences, you may skip the steps marked with an asterisk (*).

1. Install ViennaRNA*

To predict secondary structures, the program RNAfold from the ViennaRNA package is used. Please follow the installation instructions on their website.

2. Download rnascan source

git clone git@github.com:morrislab/rnascan.git
cd rnascan

3. Compile secondary structure parser C++ script*

The compiled binary must be placed in a directory from which it can be executed, i.e., one listed in your PATH environment variable. Here, we use the user's local bin directory (~/bin):

g++ -o ~/bin/parse_secondary_structure scripts/parse_secondary_structure.cpp

4. Install rnascan Python components

This package requires Python 2.7+ or Python 3.5+. To install the package, run the following:

python setup.py install

# alternatively, for user-specific installation:
python setup.py install --user

Dependencies (pandas, numpy, and biopython) will be automatically downloaded and installed, if necessary.

Usage

For full documentation of options, refer to the help messages using the -h option for each command.

run_folding

run_folding sequences.fasta /path/to/output_dir

The second argument, /path/to/output_dir, is the directory where the average structure profiles are saved. One file is written per FASTA record.

rnascan

Scanning can be performed in four modes:

  1. Sequence only (using -p to specify the sequence PFM)
  2. Structure only (using -q to specify the structure PFM)
  3. Sequence and structure (-p and -q)
  4. Sequence and averaged structure (-p and -q, with a directory of run_folding profiles as the second input)
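The modes differ in what is scanned, but each comes down to scoring windows against a matrix. The sketch below assumes log2-odds weights against a background (with a small pseudocount) and scores a one-hot sequence or context string and an averaged structure profile with the same function, weighting each letter's log-odds by its probability. `log_odds`, `one_hot`, and `scan` are hypothetical names; this is not rnascan's scoring code, and how rnascan combines sequence and structure scores is not shown.

```python
import math

# Hypothetical sketch of PFM log-odds scanning; not rnascan's implementation.


def log_odds(pfm, background, pseudocount=1e-3):
    """Convert a PFM (one {letter: probability} dict per position) to log2-odds."""
    k = len(background)
    weights = []
    for col in pfm:
        # Pseudocount avoids log(0); dividing by (1 + k * pc) renormalizes.
        weights.append({a: math.log((col.get(a, 0.0) + pseudocount)
                                    / (1.0 + k * pseudocount) / background[a], 2)
                        for a in background})
    return weights


def one_hot(seq, alphabet):
    """A discrete sequence or context string as a profile of 0/1 probabilities."""
    return [{a: 1.0 if a == s else 0.0 for a in alphabet} for s in seq]


def scan(profile, weights):
    """Score every motif start along a one-hot or averaged profile."""
    width = len(weights)
    scores = []
    for start in range(len(profile) - width + 1):
        score = 0.0
        for j, col in enumerate(weights):
            score += sum(profile[start + j].get(a, 0.0) * w for a, w in col.items())
        scores.append(score)
    return scores


bg = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
pfm = [{"A": 1.0}, {"C": 1.0}]  # toy motif "AC"
weights = log_odds(pfm, bg)
scores = scan(one_hot("GACU", "ACGU"), weights)
print(scores.index(max(scores)))  # 1
```

Because the score is linear in the profile probabilities, an averaged profile scores as the probability-weighted mix of the discrete annotations it summarizes.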

Here are some example commands using minimal options:

# To run a test sequence
rnascan -p pfm_seq.txt -t AGTTCCGGTCCGGCAGAGATCGCG > hits.tab

# Sequence-only (use -p)
rnascan -p pfm_seq.txt sequences.fasta > hits.tab

# Structure-only (use -q)
rnascan -q pfm_struct.txt structures.fasta > hits.tab

# Sequence and structure
rnascan -p pfm_seq.txt -q pfm_struct.txt sequences.fasta structures.fasta > hits.tab

# Sequence and averaged structure
rnascan -p pfm_seq.txt -q pfm_struct.txt sequences.fasta averaged_structures/ > hits.tab

Note that in the last example, the second positional argument is the path to a directory containing the average structure profiles generated by run_folding. rnascan looks inside this directory for files named structure.<sequence_id>.txt.
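The directory lookup can be pictured with a short sketch that pairs FASTA record IDs with profile files. Only the structure.<sequence_id>.txt pattern comes from this README; `find_profiles` and its error handling are hypothetical.

```python
import os

# Hypothetical helper: map sequence IDs to structure.<sequence_id>.txt files.


def find_profiles(directory, sequence_ids):
    prefix, suffix = "structure.", ".txt"
    found = {}
    for name in os.listdir(directory):
        if (name.startswith(prefix) and name.endswith(suffix)
                and len(name) > len(prefix) + len(suffix)):
            seq_id = name[len(prefix):-len(suffix)]  # IDs may contain dots
            found[seq_id] = os.path.join(directory, name)
    missing = [sid for sid in sequence_ids if sid not in found]
    if missing:
        raise KeyError("no structure profile for: %s" % ", ".join(missing))
    return {sid: found[sid] for sid in sequence_ids}
```

Failing early on a missing profile is a design choice for the sketch; silently skipping such records would be another option.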

To print the score at every position, set the score threshold (-m) to -inf. The leading space inside the quotes keeps ' -inf' from being mistaken for an option flag. To change the number of processing cores, use -c:

rnascan -p pfm_seq.txt -q pfm_struct.txt -m ' -inf' -c 8 sequences.fasta averaged_structures/ > hits.tab
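The reason for the quoted leading space can be shown with a toy parser. This assumes rnascan reads its options with Python's argparse; the parser below only mimics an -m option taking a float and is not rnascan's real command-line interface.

```python
import argparse

# Assumption: rnascan uses argparse. This toy parser only mimics -m.
parser = argparse.ArgumentParser(prog="toy_scanner")
parser.add_argument("-m", type=float)


def parses(argv):
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        parser.parse_args(argv)
        return True
    except SystemExit:  # argparse prints a usage error and exits
        return False


# With the leading space, the value does not start with '-', so it is taken
# as the argument of -m; float() ignores the surrounding whitespace.
args = parser.parse_args(["-m", " -inf"])
print(args.m)  # -inf

# Without it, '-inf' looks like another option, so -m gets no value.
print(parses(["-m", "-inf"]))  # False
```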

Computing background probabilities

By default, rnascan computes the background probabilities from the input sequences at the beginning of the run. To apply a uniform background, use the option -u:

rnascan -p pfm_seq.txt -u sequences.fasta > hits.tab
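As a rough illustration of what a background is, the sketch below computes relative letter frequencies from input sequences, or a uniform distribution as with -u. The function name, the T-to-U conversion, and skipping letters outside the alphabet (such as N) are assumptions, not documented rnascan behavior.

```python
from collections import Counter

# Hypothetical sketch of background letter frequencies.


def background(sequences, alphabet="ACGU", uniform=False):
    """Relative letter frequencies over `alphabet`, or a uniform distribution."""
    if uniform:  # analogous to rnascan's -u option
        return {a: 1.0 / len(alphabet) for a in alphabet}
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        if "U" in alphabet and "T" not in alphabet:
            seq = seq.replace("T", "U")  # treat DNA-style input as RNA
        counts.update(c for c in seq if c in alphabet)  # skip e.g. N
    total = float(sum(counts.values()))
    if total == 0:
        raise ValueError("no letters from the alphabet were found")
    return {a: counts[a] / total for a in alphabet}
```

The same function works for structure strings by passing the structure alphabet, e.g. `background(["LLHHRR"], alphabet="BEHLMRT")`, where T means internal loop and is left untouched.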

To compute the background probabilities of a set of input sequences and save them for future use, use the option --bgonly:

rnascan -p pfm_seq.txt --bgonly sequences.fasta > background.txt

rnascan -q pfm_struct.txt --bgonly structures.fasta > background.txt

In this mode, rnascan computes the background probabilities, writes them to standard output as a Python dictionary, and exits without scanning. To re-use a saved background later, pass the file with --bg_seq (sequence background) or --bg_struct (structure background):

rnascan -p pfm_seq.txt --bg_seq background.txt sequences.fasta > hits.tab
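Because the --bgonly output is a Python dictionary, it can also be inspected from Python. This sketch assumes the file holds a single dictionary literal of letter probabilities; `load_background` is a hypothetical helper, and the sums-to-one check is an extra assumption.

```python
import ast

# Hypothetical reader for a saved background file.


def load_background(path):
    with open(path) as handle:
        # literal_eval parses the dictionary without executing arbitrary code
        bg = ast.literal_eval(handle.read())
    if not isinstance(bg, dict):
        raise ValueError("expected a dictionary in %s" % path)
    total = sum(bg.values())
    if abs(total - 1.0) > 1e-6:  # sanity check, assuming probabilities
        raise ValueError("probabilities sum to %g, not 1" % total)
    return bg
```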

Citation

Cook, K.B., Vembu, S., Ha, K.C.H., Zheng, H., Laverty, K.U., Hughes, T.R., Ray, D., Morris, Q.D., 2017. RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection. Methods 126, 18–28. http://www.sciencedirect.com/science/article/pii/S1046202317300312

rnascan's Issues

Implement _pwm.c for custom alphabets

Currently, _pwm.c is modified to support only the RNA alphabet (see 1064f24). We could extend it to handle the other custom alphabets as well, which would speed up secondary structure motif scanning.

Issues with redirecting <stdout> in python3

Hi Kevin,
There seems to be an issue with redirecting STDOUT in Python 3. The following command redirects motif hits to test_hits.tab with no issues in Python 2, but in Python 3 it creates an empty test_hits.tab file and instead writes to a file named <stdout> in the directory where rnascan is executed.

rnascan -p test_rnacompete_pfm.txt --bg_seq background_prob.txt test.fasta > test_hits.tab
