Protein Sequence Classification

Protein-Protein Interaction Prediction

Dataset source: https://thebiogrid.org/77183/publication/a-human-protein-protein-interaction-network-a-resource-for-annotating-the-proteome.html

The task is to determine the interaction type between protein A and protein B, using either their identifier IDs (Case-1) or their gene names (Case-2). Here I study how an MLP performs on Case-1 and on Case-2. I also try a possible new approach for Case-2 that feeds the temporal information surrounding each individual point into an MLP classifier.
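To make the two input cases concrete, here is a minimal sketch of how a protein pair might be turned into integer features. The encoding scheme (one index per identifier for Case-1, padded character indices per gene name for Case-2), the helper names and the toy records are all assumptions for illustration; the repository may encode its inputs differently.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (assumption)


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer index, starting at 1."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
    return vocab


def encode_case1(id_a: str, id_b: str, vocab: Dict[str, int]) -> List[int]:
    """Case-1: each protein identifier becomes a single integer index."""
    return [vocab[id_a], vocab[id_b]]


def encode_case2(gene_a: str, gene_b: str,
                 char_vocab: Dict[str, int], max_len: int) -> List[int]:
    """Case-2: each gene name becomes a padded sequence of character indices."""
    def encode_name(name: str) -> List[int]:
        idx = [char_vocab[c] for c in name[:max_len]]
        return idx + [PAD] * (max_len - len(idx))

    return encode_name(gene_a) + encode_name(gene_b)


# Hypothetical toy identifiers and gene names, not taken from the dataset.
id_vocab = build_vocab(["P001", "P002", "P001", "P003"])
char_vocab = build_vocab(list("ABC1XYZ2"))

case1 = encode_case1("P001", "P003", id_vocab)
case2 = encode_case2("ABC1", "XYZ2", char_vocab, max_len=6)
```

In this sketch Case-1 yields a fixed pair of indices, while Case-2 yields an ordered sequence, which is what gives a sequence model something to work with.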

I term this approach "LSTM-Masked-MLP". The schemes of its two topologies follow.

Topology-1

image

Topology-2

image

The difference between the two topologies is that Topology-1 applies the same MLP at every point of the sequence, while Topology-2 associates a different MLP with each point. The points of the sequence are the output hidden states of the LSTM. Each MLP acts on the masked hidden states, in which the hidden state under consideration is removed, and maps the remaining sequence to a low-dimensional sequence. The concatenated outputs are then fed into a classifier, such as an MLP, to determine the classification probabilities.
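The masking step described above can be sketched in NumPy as follows. This is only a shape-level illustration under several assumptions: a random array stands in for the LSTM hidden states, the remaining states are flattened before the MLP, the weights are untrained, and the names `mlp`, `masked_features` and `init_params` are hypothetical. An actual model would use a trainable LSTM and trainable layers.

```python
import numpy as np


def mlp(x, w1, w2):
    """Two-layer perceptron with a tanh hidden activation."""
    return np.tanh(x @ w1) @ w2


def masked_features(hidden, params, shared):
    """For each position t, drop h_t and map the rest to a low-dimensional vector.

    hidden: (T, d) array of hidden states (stand-in for LSTM outputs).
    params: list of (w1, w2) pairs; one pair if shared, otherwise T pairs.
    """
    outputs = []
    for t in range(hidden.shape[0]):
        rest = np.delete(hidden, t, axis=0)        # (T-1, d): considered state removed
        w1, w2 = params[0] if shared else params[t]
        outputs.append(mlp(rest.reshape(-1), w1, w2))  # flatten, then map to (k,)
    return np.concatenate(outputs)                 # (T*k,), input to the final classifier


rng = np.random.default_rng(0)
T, d, hid, k = 5, 8, 16, 3                         # toy sizes (assumption)
hidden = rng.normal(size=(T, d))                   # placeholder for LSTM hidden states


def init_params():
    """Random, untrained weights for one masked MLP."""
    return (rng.normal(size=((T - 1) * d, hid)) * 0.1,
            rng.normal(size=(hid, k)) * 0.1)


# Topology-1: one MLP shared across all positions.
topo1 = masked_features(hidden, [init_params()], shared=True)
# Topology-2: a separate MLP for each position.
topo2 = masked_features(hidden, [init_params() for _ in range(T)], shared=False)
```

Both topologies produce a concatenated vector of the same size; they differ only in whether the weights are shared across positions, so Topology-2 has T times as many masked-MLP parameters in this sketch.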

The following figure compares the training accuracy and loss curves across the cases studied.

image

Determination of Gene Ontologies from Protein Interaction Network

The dataset used for this case is the same as in the paper "Predicting Multicellular Function through Multi-layer Tissue Networks" and is conveniently available through the PyTorch Geometric (torch_geometric) library. The problem is node classification: each node is a protein whose features are positional gene sets, motif gene sets and immunological signatures.

For the case study using a Graph Convolutional Network (GCN) and a Graph Attention Network (GAT), the training and validation profiles are compared below. The bottom accuracy curves are in fact validation curves.
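For readers unfamiliar with the two models, the sketch below shows one GCN layer and one single-head GAT layer in plain NumPy, following the standard textbook formulations. It is not the repository's code, which presumably relies on library layers. The toy graph, the random weights and the function names `gcn_layer` and `gat_layer` are assumptions, and the GAT sketch omits the output nonlinearity and multi-head attention.

```python
import numpy as np


def gcn_layer(adj, x, w):
    """GCN propagation: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 due to self-loops
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ w, 0.0)


def gat_layer(adj, x, w, a):
    """Single-head GAT: attention-weighted sum over each node's neighbourhood."""
    z = x @ w                                      # (N, F') projected features
    f = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts
    e = (z @ a[:f])[:, None] + (z @ a[f:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)                # LeakyReLU with slope 0.2
    mask = (adj + np.eye(adj.shape[0])) > 0        # neighbours plus self
    e = np.where(mask, e, -np.inf)                 # non-neighbours get zero weight
    e = e - e.max(axis=1, keepdims=True)           # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Toy path graph 0 - 1 - 2 with random features and weights (assumption).
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
x = rng.normal(size=(3, 4))
w = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))

gcn_out = gcn_layer(adj, x, w)
gat_out, alpha = gat_layer(adj, x, w, a)
```

The key contrast is that GCN weights neighbours by fixed degree-based coefficients, while GAT computes the coefficients `alpha` from the node features themselves.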

image
