DeepLncCTCF for identification and analysis of consensus RNA motifs binding to the genome regulator CTCF

DeepLncCTCF is a deep learning model for discovering the RNA recognition patterns of CTCF and identifying candidate lncRNAs that may interact with CTCF. It uses convolutional neural networks (CNNs) and an attention-based bidirectional long short-term memory (BLSTM) network. We implemented the DeepLncCTCF model in Python using Keras 2.2.4 on a high-performance computing cluster.

This documentation is part of the supplementary information release for DeepLncCTCF. For details of this work, please refer to our paper "Identification and analysis of consensus RNA motifs binding to the genome regulator CTCF" (S. Kuang and L. Wang, 2020).

Requirements

  • python3
  • numpy
  • pandas
  • scikit-learn (imported as sklearn)
  • keras >=2.0
  • tensorflow
  • h5py
  • scipy
  • Biopython (imported as Bio)

Input Format

The input files are in FASTA format, but the description line (the line beginning with the ">" symbol) must start with the class label. An example sequence is as follows:

>1
CGGCCUCCCCAGCGCAGGGCUCCUCGUUUGAGGGGAGGUGACUUCCCUCCCAGCAGGCUCUUGGACACAGUAAGCUUCCCCAGCCCUGCCUGAGCAGCCUUUCCUCCUUGCCCUGUUCCCCACCUCCCGGCUCCAGGUGAGCGGGCCCUGGAGCUUGCAGUCGGAGGGCCUUGGGCAAGAUCGCCUCCUCCCCUCCAGCCC
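
To make the format concrete, here is a minimal sketch of how such label-prefixed FASTA records could be read and turned into a numeric input. It is not taken from the DeepLncCTCF scripts: the function names `read_labeled_fasta` and `one_hot`, the A/C/G/U channel order, and the use of one-hot encoding at all are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

RNA_ALPHABET = "ACGU"  # assumed channel order; not taken from the repository


def read_labeled_fasta(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (label, sequence) pairs from FASTA lines whose header starts with a class label."""
    label, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, "".join(chunks)
            # The header begins with the class label, e.g. ">1" or ">0 extra text".
            label = int(line[1:].split()[0])
            chunks = []
        else:
            # Sequences may span several lines; DNA-style T is mapped to U.
            chunks.append(line.upper().replace("T", "U"))
    if label is not None:
        yield label, "".join(chunks)


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as 4 indicators; unknown bases (e.g. N) become all zeros."""
    return [[int(base == ref) for ref in RNA_ALPHABET] for base in seq]


# Hypothetical toy input; a real file would be opened with open(path).
example = [">1", "CGGCCUCC", "CCAG", ">0", "ACGUN"]
records = list(read_labeled_fasta(example))
encoded = one_hot(records[1][1])
```

Passing an open file handle instead of the `example` list works the same way, since the parser only iterates over lines.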

Training and Evaluation

The data used to construct the model are available in the Data directory. To train your own model with DeepLncCTCF, substitute your own data for the input files. The command line to train and evaluate DeepLncCTCF is as follows:

$ python train.py -f human_positive_seq.fa -n human_negative_seq.fa -o human.output

During training, the best weights are automatically saved to an "hdf5" file. Our fully trained model has been uploaded to the Weights directory.

Testing

If you want to evaluate the model on separate test data, you can run the following command line:

$ python test.py -f test_positive_seq.fa -n test_negative_seq.fa -o test.output

Please make sure to download the "hdf5" file in the Weights directory or generate your own best weights.

Motif Visualization

If you want to visualize the kernels of the first convolution layer and get their frequency and location information, you can run the following command line:

$ python get_motifs.py -f human_positive_seq.fa -n human_negative_seq.fa

As in the Testing step, the "hdf5" file with the best weights is required.
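
How get_motifs.py turns kernels into motifs is not described here. One common approach, shown below only as an illustrative sketch, is to collect the subsequences that most strongly activate a kernel and summarize them as a position frequency matrix. The helper names `position_frequency_matrix` and `consensus`, and the toy subsequences, are hypothetical and not part of the repository.

```python
from collections import Counter
from typing import Dict, List, Sequence


def position_frequency_matrix(kmers: Sequence[str], alphabet: str = "ACGU") -> Dict[str, List[float]]:
    """Return per-position base frequencies for equal-length, pre-aligned subsequences."""
    if not kmers:
        raise ValueError("at least one subsequence is required")
    width = len(kmers[0])
    if any(len(k) != width for k in kmers):
        raise ValueError("all subsequences must have the same length")
    pfm = {base: [0.0] * width for base in alphabet}
    for pos in range(width):
        counts = Counter(k[pos] for k in kmers)
        # Bases outside the alphabet (e.g. N) are ignored in the totals.
        total = sum(counts[base] for base in alphabet)
        for base in alphabet:
            pfm[base][pos] = counts[base] / total if total else 0.0
    return pfm


def consensus(pfm: Dict[str, List[float]]) -> str:
    """Pick the most frequent base at each position (ties go to the first base in the matrix)."""
    width = len(next(iter(pfm.values())))
    return "".join(max(pfm, key=lambda base: pfm[base][pos]) for pos in range(width))


# Hypothetical subsequences that activated one kernel.
hits = ["ACGU", "ACGA", "UCGU", "ACCU"]
pfm = position_frequency_matrix(hits)
motif = consensus(pfm)
```

A matrix like `pfm` is the usual input for sequence-logo tools, which is one way kernel motifs can be drawn.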

Predicting CTCF-binding RNA sites on lncRNAs

We applied the trained DeepLncCTCF model to predict CTCF-binding RNA sites on human lncRNAs, which were further used to select candidate CTCF-binding lncRNAs. To predict the CTCF-binding RNA sites on lncRNAs using the trained DeepLncCTCF model, you can run the following command line:

$ python prediction.py -f lncRNA_seq.fa -o prediction.output
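
Scanning a long transcript for binding sites is often done by scoring fixed-length windows one at a time. The sketch below illustrates that idea only. Whether prediction.py works this way is an assumption, and the function name `sliding_windows` and the default `window` and `step` values are hypothetical, not settings from DeepLncCTCF.

```python
from typing import List, Tuple


def sliding_windows(seq: str, window: int = 200, step: int = 50) -> List[Tuple[int, str]]:
    """Split a sequence into (start, subsequence) windows, covering the 3' end as well."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        # Short transcripts yield a single window that may be shorter than `window`.
        return [(0, seq)]
    starts = list(range(0, len(seq) - window + 1, step))
    last = len(seq) - window
    if starts[-1] != last:
        # Add one final window flush with the end so no bases are left unscored.
        starts.append(last)
    return [(start, seq[start:start + window]) for start in starts]


# Toy example with small, hypothetical window settings.
toy = "ACGU" * 3
windows = sliding_windows(toy, window=5, step=3)
```

Each window could then be encoded and scored by the trained model, with the `start` offsets mapping high-scoring windows back to positions on the lncRNA.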
