libssa

Introduction

libssa is a library for SIMD-accelerated optimal sequence alignments. This project is currently being developed as part of my master's thesis.

Its aim is to provide functionality for aligning query sequences against a sequence database. The main focus is on:

  • optimal global sequence alignments, using the Needleman-Wunsch algorithm
  • optimal local sequence alignments, using the Smith-Waterman algorithm
  • acceleration of the alignments using multiple threads and SIMD instructions
  • a small public API
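
As a point of reference for the local alignment case, the following is a minimal scalar sketch of Smith-Waterman scoring with a linear gap penalty. It is not libssa's implementation or API: the function name `sw_score` and its parameters are assumptions made for illustration, and there is no SIMD, threading, or traceback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Scalar Smith-Waterman score (hypothetical helper, not part of libssa).
 * match is added for equal symbols, mismatch (usually negative) for
 * unequal ones, and gap is a positive penalty subtracted per gap symbol.
 * Returns the best local alignment score, or -1 if allocation fails. */
long sw_score(const char *query, const char *db,
              long match, long mismatch, long gap)
{
    assert(query != NULL && db != NULL);

    size_t qlen = strlen(query);
    size_t dlen = strlen(db);

    /* A single row of the DP matrix is enough to compute the score. */
    long *row = calloc(dlen + 1, sizeof *row);
    if (row == NULL)
        return -1;

    long best = 0;
    for (size_t i = 1; i <= qlen; i++) {
        long diag = 0;                  /* H[i-1][j-1], starts at H[i-1][0] */
        for (size_t j = 1; j <= dlen; j++) {
            long up = row[j];           /* H[i-1][j] */
            long left = row[j - 1];     /* H[i][j-1], already updated */
            long h = diag + (query[i - 1] == db[j - 1] ? match : mismatch);

            if (up - gap > h)
                h = up - gap;
            if (left - gap > h)
                h = left - gap;
            if (h < 0)
                h = 0;                  /* local alignment never drops below 0 */

            row[j] = h;
            if (h > best)
                best = h;
            diag = up;
        }
    }

    free(row);
    return best;
}
```

Calling `sw_score("ACGT", "TTACGTTT", 2, -1, 2)` scores the four matching symbols. A global (Needleman-Wunsch) variant would drop the clamp at zero and initialise the first row and column with gap penalties.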

A main part of my thesis is to show the benefits of the AVX2 instruction set for speeding up optimal sequence alignments. Like SSE, AVX2 is a set of instructions offering SIMD (single instruction, multiple data) processing on modern CPUs.

It has been shown in the past that a significant speedup can be gained using SSE (TODO cite). AVX2 provides wider registers: 256 bits instead of the 128 bits of SSE. The hypothesis is that alignments run twice as fast with AVX2 as with SSE.
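
The factor of two comes from the number of scores that fit into one register at a fixed score width. The small sketch below only does that arithmetic. The function name is hypothetical, and the result is an upper bound from register width alone, not a measured speedup.

```c
#include <assert.h>

/* Number of alignment scores that fit into one SIMD register
 * (hypothetical helper for illustration). */
unsigned lanes_per_register(unsigned register_bits, unsigned score_bits)
{
    assert(score_bits > 0 && register_bits % score_bits == 0);
    return register_bits / score_bits;
}
```

With 16-bit scores this gives 8 lanes for a 128-bit SSE register and 16 lanes for a 256-bit AVX2 register. With 8-bit scores it gives 16 and 32 lanes.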

Limitations

  • the library exits the program when an error occurs
  • still a work in progress

libssa license and third party licenses

The libssa code is licensed under the GNU Affero General Public License version 3.

libssa includes code (the Needleman-Wunsch SSE2 implementation) derived from VSEARCH by Torbjørn Rognes et al.

The sequence translation and mapping code was taken from SWARM by Torbjørn Rognes et al.

libssa's People

Contributors

ronnysoak

libssa's Issues

OSX doesn't know __builtin_cpu_init and __builtin_cpu_supports

gcc -Wall -O3 -std=c99  -march=native -c -o src/cpu_config.o src/cpu_config.c
src/cpu_config.c:54:5: error: use of unknown builtin '__builtin_cpu_init'
      [-Wimplicit-function-declaration]
    __builtin_cpu_init();
    ^
src/cpu_config.c:56:26: error: use of unknown builtin '__builtin_cpu_supports'
      [-Wimplicit-function-declaration]
    if( sse2_enabled && !__builtin_cpu_supports( "sse2" ) ) {
                         ^
src/cpu_config.c:56:26: note: did you mean '__builtin_cpu_init'?
src/cpu_config.c:54:5: note: '__builtin_cpu_init' declared here
    __builtin_cpu_init();
    ^
2 errors generated.
make: *** [src/cpu_config.o] Error 1
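
One possible workaround, sketched here under assumptions, is to query CPUID through the `<cpuid.h>` header that both GCC and clang ship for x86 targets, instead of relying on the two builtins. The function names `cpu_has_sse2` and `cpu_reports_avx2` are hypothetical and not taken from `cpu_config.c`. The AVX2 check only reads the CPUID feature bit; a complete check would also confirm that the operating system saves the YMM registers, which is left out here.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), else 0. */
int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);
}

/* Returns 1 if CPUID leaf 7, subleaf 0 reports AVX2 (EBX bit 5), else 0.
 * This does not verify operating system support for the YMM state. */
int cpu_reports_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 5) & 1u);
}

#else

/* Non-x86 targets: report that neither instruction set is available. */
int cpu_has_sse2(void) { return 0; }
int cpu_reports_avx2(void) { return 0; }

#endif
```

A caller could then replace the failing test with something like `if (sse2_enabled && !cpu_has_sse2())`, keeping the rest of the configuration logic unchanged.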

error: redefinition of typedef ‘__mxxxi’

Getting the following compilation error on OSX using clang.

gcc -Wall -O3 -std=c99  -msse2 -c -o src/algo/16/search_16_util_sse2.o src/algo/16/search_16_util.c
src/algo/16/search_16_util.c:55: error: redefinition of typedef ‘__mxxxi’
src/algo/16/search_16_util.h:38: note: previous declaration of ‘__mxxxi’ was here
make: *** [src/algo/16/search_16_util_sse2.o] Error 1

Removing lines 47-57 from search_16_util.c fixes the problem, since those lines are duplicated from the search_16_util.h header file.

The same issue exists in the 8-bit versions of these files, search_8_util.c and search_8_util.h.
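
For background: C99 does not allow a typedef name to be declared twice in the same scope (C11 relaxed this for identical types), so building with -std=c99 can fail as soon as the same typedef is visible twice. A common remedy is to keep the typedef in exactly one guarded header and include that header wherever the type is needed. The sketch below mimics this in a single file by writing the guarded section twice, as if the header were included twice. All names are hypothetical, and `example_vec_t` only stands in for the real `__mxxxi` definition, which is not shown in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- contents of a hypothetical header, first inclusion ---- */
#ifndef EXAMPLE_VECTOR_TYPES_H
#define EXAMPLE_VECTOR_TYPES_H
typedef struct { int16_t lane[8]; } example_vec_t;  /* stand-in type */
#endif

/* ---- same header included again: the guard skips the typedef ---- */
#ifndef EXAMPLE_VECTOR_TYPES_H
#define EXAMPLE_VECTOR_TYPES_H
typedef struct { int16_t lane[8]; } example_vec_t;
#endif

/* Code in the .c file uses the type without declaring it again. */
size_t example_vec_lanes(void)
{
    example_vec_t v;
    return sizeof v.lane / sizeof v.lane[0];
}
```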

License inconsistency

The README file indicates the code is AGPLv3, but the license file seems to indicate that the code is GPLv3.

alignment results

The aligner returns different CIGAR strings for the alignment scores calculated with the 16 bit and the 64 bit code, although the code is the same for both implementations ...

In some runs the alignment result also contains duplicate entries (same DB-ID):
DB-ID 1055, score: 112, cigar: 7M2I3M3I7M3I3M15I4M14IM10I3M5I4M8I7MI5MI3M4I2MI3MI2M7I
DB-ID 1055, score: 112, cigar: 7M2I3M3I7M3I3M15I4M14IM10I3M5I4M8I7MI5MI3M4I2MI3MI2M7I
DB-ID 1055, score: 112, cigar: 7M2I3M3I7M3I3M15I4M14IM10I3M5I4M8I7MI5MI3M4I2MI3MI2M7I
DB-ID 1151, score: 92, cigar: 5M2I5M3I7M3I3M15I4M13IM10I3M5I4M8I7MI5MI3M4I2MI3MI2M7I
DB-ID 80, score: 92, cigar: 5M2I5M3I8M3I2M15I4M23IMI3M5I4M8I7MI5MI3M4I2MI3MI2M7I

And the DB-IDs for the 64 bit and 16 bit code differ, although the scores are the same ...

This is probably caused by an error in manager.c while gathering the results from the different threads.

constant scoring

Is it useful to use constant scoring with protein sequences?

SWIPE only allows constant scoring if the symtype is 0 (nucleotide sequences). I am wondering if it might be useful for protein sequences as well, or if we can restrict it to nucleotide sequences.
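
Assuming "constant scoring" here means one score for every match and one score for every mismatch, it amounts to a score matrix with only two distinct values. The sketch below builds such a matrix. The function name and the row-major layout are assumptions for illustration, not libssa or SWIPE code.

```c
#include <assert.h>
#include <stddef.h>

/* Fill an n x n row-major score matrix with `match` on the diagonal and
 * `mismatch` everywhere else (hypothetical helper for illustration). */
void fill_constant_scores(int *matrix, size_t n, int match, int mismatch)
{
    assert(matrix != NULL);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            matrix[i * n + j] = (i == j) ? match : mismatch;
}
```

Protein substitution matrices such as BLOSUM62 assign a different score to each residue pair, which is the information a constant matrix would discard.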

error handling

Find a suitable way to handle errors.

We cannot terminate the program, since it is supposed to be a library used by other programs.

Possible solutions

  • provide an error code as the return value, plus a function that returns a human-readable description of it
  • collect non-processable sequences and return them later
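
The first option could look roughly like the sketch below. Every name in it (`ssa_status`, `ssa_status_message`, `ssa_check_sequence`) is hypothetical and chosen only for illustration. The validity rule (non-empty, letters only) is likewise an assumption, not libssa's actual rule.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical status codes returned by library functions. */
typedef enum {
    SSA_OK = 0,
    SSA_ERR_NO_MEMORY,
    SSA_ERR_INVALID_SEQUENCE,
    SSA_ERR_IO
} ssa_status;

/* Maps a status code to a human-readable description. */
const char *ssa_status_message(ssa_status status)
{
    switch (status) {
    case SSA_OK:                   return "success";
    case SSA_ERR_NO_MEMORY:        return "out of memory";
    case SSA_ERR_INVALID_SEQUENCE: return "invalid sequence";
    case SSA_ERR_IO:               return "input/output error";
    }
    return "unknown error";
}

/* Example of reporting a problem to the caller instead of exiting:
 * accepts only non-empty sequences made of letters. */
ssa_status ssa_check_sequence(const char *seq)
{
    if (seq == NULL || seq[0] == '\0')
        return SSA_ERR_INVALID_SEQUENCE;

    for (const char *p = seq; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p))
            return SSA_ERR_INVALID_SEQUENCE;
    }
    return SSA_OK;
}
```

The calling program decides what to do with a non-zero status, for example logging `ssa_status_message(status)` and skipping the sequence, so the library itself never terminates the process.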

error: redefinition of typedef ‘p_s8query’

gcc -Wall -O3 -std=c99 -march=native -c -o src/algo/8/search_8.o src/algo/8/search_8.c
In file included from src/algo/8/search_8.c:21:
src/algo/8/search_8_util.h:44: error: redefinition of typedef ‘p_s8query’
src/algo/8/search_8.h:31: note: previous declaration of ‘p_s8query’ was here
make: *** [src/algo/8/search_8.o] Error 1
