Git Product home page Git Product logo

smith-waterman-optimization's Introduction

Smith-Waterman performance optimizations

Created as a part of the CSC586C (Data Management on Modern Computer Architectures) course at University of Victoria under supervision of Sean Chester.

Contributors: Ze Shi Li, Rohith Pudari, Haris Smajlovic.

Repository contains performance optimizations for Linear gap Smith-Waterman algorithm.

Optimizations

Note: Check if your CPU supports SSE2/4, AVX2 and/or AVX-512 first. Otherwise the SIMD benchmark will still run, but with no valid results.

So far we have a baseline, bithacked, bithacked-striped, multicore-windowed, windowed, simd-alpern and multicore-alpern version of the very same algorithm for a CPU, and cuda-alpern, cuda-antidiagonal, cuda-hypothetical, and cuda-windowed for a GPU:

  • Baseline: A straight forward baseline version of the SW algorithm.
  • Bithacked: Baseline version with heavy branching replaced with bithacks.
  • Bithacked-striped: Bithacked version with an access pattern that is more L1 cache friendly.
  • Windowed: For a scenario in which traceback is not needed but only the best match score.
  • Multicore windowed: Using the technique above, spreads across multiple CPU cores.
  • SIMDed (Alpern technique): A SIMDed baseline using widest registers that your CPU supports and inter-alignment technique from Alpern et al.
  • Multicore (Alpern technique): Just a SIMDed technique above spread accross multiple CPU cores.
  • CUDA (Alpern technique): A SIMTed baseline using inter-alignment technique akin to SIMD Alpern technique above.
  • CUDA windowed: A SIMT implementation of a windowed version above.
  • CUDA antidiagonal: A 2-dimentional parallelisation exposing both inter and intra alignment parallelism.
  • CUDA hypothetical: A 3-dimentional parallelisation in which all data dependency is ignored and parallelization utilized to full extent.

Testing

In order to benchmark the CPU solutions use perf (for now -- sorry non-linux users). So just compile benchmark.cpp and then run perf on the executable.

For testing the GPU solutions just compile and run benchmark.cu.

Don't forget to provide version string as a CLI argument.

CPU Examples

  • For baseline set version=base in your bash
  • For bithacked set version=bithacked in your bash
  • For bithacked-striped set version=bithacked-striped in your bash
  • For windowed set version=windowed in your bash
  • For multicore-windowed set version=multicore-windowed in your bash
  • For SIMDed (Alpern technique) set version=simd-alpern in your bash
  • For multicore set version=multicore-alpern in your bash

and then do

exe_path=benchmark_${version}.out && \
g++ -D THRD_CNT=2 -march=native -fopenmp -Wall -Og -std=c++17 -o $exe_path benchmark.cpp && \
perf stat -e cycles:u,instructions:u ./$exe_path $version

GPU Examples

  • For CUDA (Aplern technique) set version=cuda-alpern in your bash
  • For CUDA windowed set version=cuda-windowed in your bash
  • For CUDA antidiagonal set version=cuda-antidiagonal in your bash
  • For CUDA hypothetical set version=cuda-hypothetical in your bash

and then do

exe_path=benchmark_${version}.out && \
nvcc -O3 \
    -D QUANTITY_SCALE=13 \
    -D SIZE_SCALE=10 \
    -D XBLOCK_SIZE_SCALE=5 \
    -D YBLOCK_SIZE_SCALE=3 \
    -D ZBLOCK_SIZE_SCALE=2 \
    -D WINDOW_SIZE_SCALE=7 \
    -o $exe_path benchmark.cu && \
./$exe_path $version

Results

Version insn per cycle Seconds
Bithacked-striped (optimised-base) 2.77 41.50
SIMD-alpern 2.95 3.76
multicore-alpern 2.26 2.45

smith-waterman-optimization's People

Contributors

hsmajlovic avatar rohithpudari avatar zeshili avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Forkers

rohithpudari

smith-waterman-optimization's Issues

More details in readme.md results

Add more details to results in readme.md as well as hardware information of benchmark machine. Maybe pick the best results from report 2 after we finish it.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.