SplicePredictor

:octocat: Finding functional sites in genomic DNA.

Overview

This project re-implements the prediction of donor and acceptor splice site signals using three different models.

Weight Array (WAM)

An optimized frequency-based method that predicts splice site patterns with the weight array model. At bottom, a WAM still extracts splice signals, counts nucleotide frequencies and fills probability matrices, just as the weight matrix model (WMM) does. What distinguishes WAM from WMM is that WAM also accounts for the dependence between the current position and an adjacent position, which we found improves the accuracy of splice site prediction. See Zhang et al. (1993) for a reference.

Overall architecture of the WAM splice predictor. (Framework pictures made by hand)
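
To make the idea concrete, here is a minimal sketch of a first-order weight array model. It is not the code in ./Model/wam.py: the function names (train_wam, log_prob, wam_score), the pseudocount, and the toy sequences are all assumptions made for illustration. Each position is conditioned on the nucleotide immediately before it, and a candidate site is scored by the log-odds between a model trained on true sites and one trained on decoys.

```python
import math

ALPHABET = "ACGT"

def train_wam(seqs, pseudocount=1.0):
    """Estimate P(x_0) and P(x_i | x_{i-1}) from equal-length aligned sequences."""
    length = len(seqs[0])
    # Counts for the first position (no preceding nucleotide to condition on).
    first = {b: pseudocount for b in ALPHABET}
    # trans[i][a][b]: count of nucleotide b at position i given a at position i-1.
    trans = {i: {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
             for i in range(1, length)}
    for s in seqs:
        first[s[0]] += 1
        for i in range(1, length):
            trans[i][s[i - 1]][s[i]] += 1
    # Normalise counts into probabilities.
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    for i in trans:
        for a in ALPHABET:
            row_total = sum(trans[i][a].values())
            trans[i][a] = {b: c / row_total for b, c in trans[i][a].items()}
    return first, trans

def log_prob(model, seq):
    """Log-probability of seq under a trained weight array model."""
    first, trans = model
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i][seq[i - 1]][seq[i]])
    return lp

def wam_score(pos_model, neg_model, seq):
    """Log-odds score: positive values favour a true splice site."""
    return log_prob(pos_model, seq) - log_prob(neg_model, seq)

# Toy, made-up windows (not real training data).
positives = ["AGGTAAGT", "CGGTGAGT", "AGGTAAGA"]
negatives = ["ACCTTGCA", "TTGACCGA", "GCATTCAG"]
pos_model = train_wam(positives)
neg_model = train_wam(negatives)
site_score = wam_score(pos_model, neg_model, "AGGTAAGT")
decoy_score = wam_score(pos_model, neg_model, "ACCTTGCA")
```

Comparing the score against a chosen threshold then gives the predicted label. A WMM would be the same sketch with every position treated independently, that is, without the trans tables.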

Bayesian Network (BN)

A probabilistic graphical model that captures long-range dependencies among nucleotide positions by learning a directed acyclic graph (DAG) and estimating the conditional probabilities of each variable. These probabilities are then used to infer the missing labels of the test sequences. See Chen et al. (2005) for a reference. BNs are implemented using pgmpy and NetworkX.

Overall architecture of the Bayesian network splice predictor.
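
The following sketch illustrates the parameter-estimation half of this idea in plain Python, without pgmpy. It makes several assumptions that are not taken from the repository: the DAG is written by hand rather than learned, the helper names (fit_cpts, log_likelihood) and toy sequences are hypothetical, and classification is done by comparing the likelihood under a network fitted to true sites with one fitted to decoys.

```python
import math

ALPHABET = "ACGT"

def fit_cpts(seqs, parents, pseudocount=1.0):
    """Estimate one conditional probability table per node by smoothed counting.

    parents maps each position to a tuple of parent positions and must
    describe a directed acyclic graph.
    """
    cpts = {}
    for node, pa in parents.items():
        counts = {}
        for s in seqs:
            ctx = tuple(s[p] for p in pa)  # parent configuration seen in s
            row = counts.setdefault(ctx, {b: pseudocount for b in ALPHABET})
            row[s[node]] += 1
        cpts[node] = {ctx: {b: c / sum(row.values()) for b, c in row.items()}
                      for ctx, row in counts.items()}
    return cpts

def log_likelihood(cpts, parents, seq):
    """Sum of log P(node | parents) over all nodes of the DAG."""
    ll = 0.0
    for node, pa in parents.items():
        ctx = tuple(seq[p] for p in pa)
        row = cpts[node].get(ctx)
        # A parent configuration never seen in training falls back to uniform.
        p = row[seq[node]] if row else 1.0 / len(ALPHABET)
        ll += math.log(p)
    return ll

# Hand-written DAG (assumption): a chain, plus a long-range edge 0 -> 7.
parents = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (3,),
           5: (4,), 6: (5,), 7: (0, 6)}

# Toy, made-up windows (not real training data).
positives = ["AGGTAAGT", "CGGTGAGT", "AGGTAAGA"]
negatives = ["ACCTTGCA", "TTGACCGA", "GCATTCAG"]
pos_cpts = fit_cpts(positives, parents)
neg_cpts = fit_cpts(negatives, parents)

def bn_score(seq):
    """Log-likelihood ratio: positive values favour a true splice site."""
    return log_likelihood(pos_cpts, parents, seq) - log_likelihood(neg_cpts, parents, seq)
```

The edge from position 0 to position 7 is what a chain-structured model such as WAM cannot express. In a full pipeline the parent sets would come from structure learning rather than being fixed by hand.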

Support Vector Machine (SVM)

A discriminant method for predicting splice site patterns. Its principle is to map samples that are linearly inseparable in the original n-dimensional feature space into a higher-dimensional feature space, where they become linearly separable through a mathematical transformation. See Chen et al. (2005) for a reference. SVMs are implemented using scikit-learn, whose sklearn.svm module is based on LIBSVM. ThunderSVM is recommended as a substitute for sklearn.svm to support GPU training.

Overall architecture of the support vector machine splice predictor.
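
The mapping principle can be shown on a toy problem that has nothing to do with splice sites. In this sketch (all names and data are illustrative assumptions), four XOR-style points cannot be split by any line in two dimensions, but after an explicit map that adds the product x1*x2 as a third feature, a single hyperplane separates them. A kernel SVM performs this kind of mapping implicitly instead of computing the new features.

```python
from itertools import product

# XOR-style data: label +1 when both coordinates share a sign, else -1.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

def feature_map(x):
    """Explicit map from 2-D to 3-D: (x1, x2) -> (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def predict(w, b, z):
    """Linear decision rule sign(w . z + b), returning +1 or -1."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

# 1) In the original 2-D space, search a grid of lines w1*x1 + w2*x2 + b = 0.
grid = [k / 2 for k in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
separable_2d = any(
    all(predict((w1, w2), b, p) == y for p, y in zip(points, labels))
    for w1, w2, b in product(grid, repeat=3)
)

# 2) In the mapped 3-D space, the plane z3 = 0 classifies every point correctly.
w3d, b3d = (0.0, 0.0, 1.0), 0.0
mapped_predictions = [predict(w3d, b3d, feature_map(p)) for p in points]
separable_3d = mapped_predictions == labels
```

The grid search is only a demonstration; the 2-D case fails for every possible line, because the two positive points force b > 0 while the two negative points force b <= 0.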

For details, see ./Docs.

Quick Start

To train a model and evaluate it at a set threshold, run wam.py, bn.py or svm.py in ./Model:

$ python wam.py

A .model file will be saved in ./Model/donor or ./Model/acceptor, depending on the site type you choose.

Then you can plot the results of your trained model from ./Utils:

$ python plot.py # You may set the model types in plot.py first

Check ./Pics for the precision-recall curves and ROC curves. In addition, a .csv file containing the plotted metrics is saved in ./Model/donor or ./Model/acceptor for data inspection and figure adjustment. You can load these .csv files by setting load_csv=True in Plot.plot(), which saves a large amount of time otherwise spent re-running repetitive prediction tasks.
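
For readers who want to see what goes into those curves, the sketch below computes precision, recall and false positive rate at a list of thresholds. It is a generic illustration, not the code in plot.py: the function names, the dictionary keys and the toy scores are assumptions, and the actual column layout of the saved .csv files is not documented here.

```python
def confusion_at(scores, labels, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def curve_points(scores, labels, thresholds):
    """One row of metrics per threshold; PR and ROC curves are drawn from these."""
    rows = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        rows.append({
            "threshold": t,
            # By convention here, precision is 1.0 when nothing is predicted positive.
            "precision": tp / (tp + fp) if tp + fp else 1.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,  # also the ROC y-axis (TPR)
            "fpr": fp / (fp + tn) if fp + tn else 0.0,     # the ROC x-axis
        })
    return rows

# Toy, made-up model scores and true labels (1 = splice site, 0 = decoy).
scores = [2.1, 0.7, -0.3, 1.5, -1.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
rows = curve_points(scores, labels, thresholds=[0.0, 1.0])
```

Plotting recall against precision gives the precision-recall curve, and plotting fpr against recall gives the ROC curve.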

For more examples and instructions, just see the source code.

Acknowledgment

This work was supported by Professor Zhou from the College of Life Science & Technology, Huazhong University of Science and Technology, and by Wuhan National Laboratory for Optoelectronics, which provided computing resources. We also thank our classmates for their helpful suggestions and corrections.

WARNING: This code is not yet well encapsulated. More refactoring and maintenance work is on the way :tired_face:...

Contributions that improve this project are welcome. ❤️

Citing

Please use the following BibTeX entries to cite this work in your research:

% for Weight Array Model
@article{newiz2021wam,
  title={WAM: A Weight Array Model for Prediction of Eukaryotic Genetic Splice Sites},
  author={Ziwen Zhao},
  journal={Bioinformatics: Data Mining Report},
  year={2021}
}
% for Bayesian Network
@article{newiz2021bn,
  title={Bayesian Network: A Bayesian Statistics Based Probabilistic Graphical Model for Prediction of Eukaryotic Genetic Splice Sites},
  author={Ziwen Zhao},
  journal={Specialty Innovation and Entrepreneurship Training for Bioinformatics},
  year={2021}
}
% for Support Vector Machine
@article{newiz2021svm,
  title={A Support Vector Machine Method for Prediction of Eukaryotic Genetic Splice Sites},
  author={Ziwen Zhao},
  journal={Bioinformatics: Data Mining Report},
  year={2021}
}

Contact

Reach me via E-mail: [email protected] for a conversation. 😆

2021 By Newiz
