Git Product home page Git Product logo

niloth-p / phylogenetic-tree-construction-using-agglomerative-and-divisive-hierarchical-clustering Goto Github PK

View Code? Open in Web Editor NEW
2.0 1.0 1.0 127 KB

The prog is written to construct the phylogenetic tree (dendrogram) based on DNA/Protein sequences of species given in a dataset using Agglomerative and Divisive Hierarchical Clustering and to compare Agglomerative and Divisive methods

License: MIT License

Python 100.00%
phylogenetic-tree dendrogram hierarchical-clustering agglomerative-clustering divisive-clustering min-linkage needleman-wunsch-algorithm proximity-matrix

phylogenetic-tree-construction-using-agglomerative-and-divisive-hierarchical-clustering's Introduction

Phylogenetic Tree using Agglomerative and Divisive Hierarchical clustering

Description:

The prog is written to construct the phylogenetic tree (dendrogram) based on DNA/Protein sequences of species 
given in a dataset using Agglomerative and Divisive Hierarchical Clustering and to compare Agglomerative and Divisive
methods. I have used min linkage to calculate the proximities.

Dataset used:

Human Gene DNA Sequences http://genome.crg.es/datasets/ggalhsapgenes2005/hg16.311.putative.cds.fa

How to run:

python Clusterer.py

For using some other dataset, change the global variable 'filename' with the name of your dataset. And change the global variable 'first_time' to create and store the pickle file.

PseudoCode/Explanation of the algo:

Read data, preprocessing

(also store the data in an ordered dictionary,
so that a list of exclusive keys can be obtained - to get
all pairs of pts for calculating proximity matrix)

Needleman Wunsch algo - for distance calculation

Calc Z - proximity matrix, dump pickle

Clustering algo - Agglomerative

Set numOfClusters = N
Set clusters = list of keys
Loop function call to clustering method until numOfClusters = 1
    Choose min value in Z -> get i and j indices
    Append to matrix Z
        #1 - Cluster A
        #2 - CLuster B
        #3 - Proximity value
        #4 - numOfpts in new cluster formed i.e Total pts in clusters A & B
        Append 1 to 4 to a list, and append that list to Z
    Update numOfClusters -> --1
    Update Clusters -> remove A or B
    Update Proximity Matrix by Ward's method using Lance Williams Formula
Draw dendrogram

Divisive clustering algo

Set numOfClusters = 1
Set clusters = all points as 1 cluster
Loop function call to clustering method until numOfClusters = N
    Choose the cluster to split
        Get the max dissimilarity for each cluster i.e max distance between 2 points in cluster
        Find the cluster with max dissimilarity - this is the cluster to split first
    Find ele to form the new cluster i.e find the elemenet in the cluster
        whose sum of distances to all other points in the cluster is maximum
    Remove the ele to new cluster A from old cluster B
    Split eles in old original cluster B between the 2 clusters A and B 
        Find ele with max(avdisttoB - avdisttoA)
            max - the max value 
            avdist - average distance (sum of distances / num of points)
            If avdisttoB < avdisttoA, skip to next step
        Add the ele to cluster A and remove from cluster B
        Keep finding eles 1 by 1 and adding them to cluster A, and removing from cluster B
            Do it 1 by 1, bcoz the value of avdist will vary with addition/removal of each point
    Update clusters by updating the old cluster B, and adding new cluster A
    Update numOfClusters += 1
    Add [clustA, clustB] to a temp list which will be needed for computing linkage matrix
Get the linkage matrix X such that
    #1 - num
    #2 - num + 1
    #3 - Proximity value
    #4 - [len(clustA), len(clustB)]
    Get #4 from the temp list after reversing its order
Draw dendrogram

phylogenetic-tree-construction-using-agglomerative-and-divisive-hierarchical-clustering's People

Contributors

niloth-p avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Forkers

yousrabrd

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.