Git Product home page Git Product logo

sequence-alignment's Introduction

Sequence Alignment

Implementation of the classic Dynamic Programming problem using the Needleman–Wunsch algorithm which requires quadratic space & time complexity.

Problem Statement

Given 2 sequences, find the minimum cost of aligning the 2 sequences (case insensitive).
Gaps can be inserted to 1 sequence or the other, but incur a penalty.

2 = Gap Penalty (δ)
If 2 characters are aligned with each other, there may be a mismatch penalty (αi j)

  • 0 = aligning identical letters
  • 1 = aligning a vowel with a vowel, or a consonant with a consonant
  • 3 = aligning a vowel with a consonant

Minimum cost = sum of mismatch & gap penalties (the optimal alignment)

Optimal Substructure

There may be multiple optimal paths, but this only finds 1 of them

Runtime

O(M*N)
Storage: also O(M*N)

Pseudocode

Usage

Requirements & Caveats

  • Alphanumeric characters only (all others including spaces are sanitized out)
  • Case insensitive
  • _ (Underscore character) represents gaps but only when displaying results (it will be removed & ignored if it's present in a sequence)
  • View results in a fixed-width font for the 2 sequences to be lined up
  • predecessorIndexes is calculated when creating memoTable but not used to find the actual alignment, only to show where the values in memoTable came from
    • For example: if predecessorIndexes[4][4] contains the array [4, 3], it means the value of memoTable[4][4] (which is 6) came from memoTable[4][3] (i.e. case3, a character in seq2 was aligned with a gap so it came from the left)
    • predecessorIndexes[0][0] contains the array [-1, -1] because the upper left corner has no predecessor

Setup

  • Provide 2 strings (edit testSequences 2D array) & run calculateAndPrintOptimalAlignment()
  • Optionally change the penalties by passing in arguments to the non-default constructor
    Currently vowelVowelMismatchPenalty, consonantConsonantMismatchPenalty & numberNumberMismatchPenalty are the same, but all are arbitrary

Example: aligning "mean" with "name"

Code Notes

  • seq1 & seq2 have a leading space added (after sanitizing illegal & whitespace characters)
    This is so that seq1.charAt(i) & seq2.charAt(j) work in the loops & the string indexes match up
    It causes some adjustments for the array sizes.
  • memoTable = new int[seq1.length()][seq2.length()] uses the exact value of length() which includes the space
  • Loops like for(int i=0; i < seq1.length(); i++) use i < seq1.length() to not go out of bounds of the string indexes
  • In findAlignment(), int i = seq1.length() - 1; & int j = seq2.length() - 1; since the 2 sequences have leading spaces & arrays indexes start from 0
  • findAlignment() retraces each calculation in the memoTable to see where it came from

References

sequence-alignment's People

Contributors

bolds07 avatar sleekpanther avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.