sibelia's Introduction

Sibelia 3.0.7

Release date: 2nd June 2017

Authors

  • Ilya Minkin (St. Petersburg Academic University)
  • Nikolay Vyahhi (St. Petersburg Academic University)
  • Mikhail Kolmogorov (St. Petersburg Academic University)
  • Ekaterina Starostina (St. Petersburg Academic University)
  • Son Pham (University of California, San Diego)

Introduction

This package contains two programs:

  • Sibelia -- "Sibelia" is a tool for finding synteny blocks in closely related genomes, like different strains of the same bacterial species. It takes a set of FASTA files with genomes and locates coordinates of the synteny blocks in these sequences. It also represents genomes as permutations of the blocks.

  • C-Sibelia -- This tool is designed for comparison between two genomes represented either in finished form or as sets of contigs. It is able to detect SNPs/SNVs and indels of different scales. "C-Sibelia" works by locating synteny blocks between the input genomes and aligning different copies of a block. It considers only unique blocks, i.e. blocks having one copy in the reference and one copy in the other genome. It is also possible to get the alignments themselves; in this case you will also get the alignments of repeats within genomes.

It also contains a script for annotation of variants found by "C-Sibelia" using the "snpEff" tool.

Note that Sibelia does not support inputs larger than 1 GB. Please use SibeliaZ and maf2synteny to align and construct blocks for longer genomes.

Installation

See INSTALL.md file.

Usage

See SIBELIA.md for "Sibelia", C-SIBELIA.md for "C-Sibelia" and ANNOTATION.md for the annotation script.

System requirements

This version of "C-Sibelia" supports only "Unix"-like operating systems, but "Sibelia" runs fine on "Windows". To use "C-Sibelia", "Windows" users may use a virtual machine or try our web server:

http://etool.me/software/csibelia

"Windows" support will be restored in future releases. Binary releases for "Windows" contain only "Sibelia" binaries.

"C-Sibelia" requires "Python" and "Perl" to be installed on your system. Please note that "C-Sibelia" runs only under "Python 2", version 2.7 or later. The annotation script requires the "Java" runtime.

Citation

If you use "Sibelia" in your research, please cite:

Ilia Minkin, Anand Patel, Mikhail Kolmogorov, Nikolay Vyahhi, and Son Pham. "Sibelia: a scalable and comprehensive synteny block generation tool for closely related microbial genomes." In Algorithms in Bioinformatics, pp. 215-229. Springer Berlin Heidelberg, 2013.

License

"Sibelia" is distributed under GNU GPL v2 license, see LICENSE.

It also uses third-party libraries and programs:

Contacts

E-mail your feedback to [email protected].

You can also report bugs or suggest features using the issue tracker on GitHub: https://github.com/bioinf/Sibelia/issues

sibelia's People

Contributors

forgotenn, iminkin, kspham, mikolmogorov, starostinak, vyahhi

sibelia's Issues

Bulletproof FASTA parser

Write reliable FASTA format (http://en.wikipedia.org/wiki/FASTA_format) parser.
Specification of the format: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml.
Some notes:

  1. The parser must allow only the characters {A, C, G, T, U, N, a, c, g, t, n, u} in sequences. Any other character in a sequence must be treated as an error.
  2. Allow spaces in headers, i.e. the header "> Header header" must be read as "Header header".

Requirements:

  1. There should be a class "FASTAReader" that takes a file name in constructor
  2. FASTAReader has method "GetSequences", it returns (by reference) a vector of structs, each struct has two fields ("sequence" and "header")
  3. In case of any error it must throw an exception that contains detailed information about the error, like "Wrong character '$' at line 97, position 33".
  4. It can be easily modified to be able to work with gzipped FASTA (or do it now).
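The requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not Sibelia's actual parser: the struct name FASTARecord and the use of std::runtime_error are assumptions, gzip support (item 4) is left out, blank lines are skipped, and Windows line endings are tolerated.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One FASTA record; the field names follow the issue text.
struct FASTARecord
{
    std::string header;
    std::string sequence;
};

class FASTAReader
{
public:
    explicit FASTAReader(const std::string & fileName): fileName_(fileName) {}

    // Fills 'records' with every record in the file.
    // Throws std::runtime_error with a line/position message on any error.
    void GetSequences(std::vector<FASTARecord> & records) const
    {
        static const std::string alphabet = "ACGTUNacgtun";
        std::ifstream in(fileName_.c_str());
        if (!in)
        {
            throw std::runtime_error("Cannot open file '" + fileName_ + "'");
        }

        records.clear();
        std::string line;
        size_t lineNumber = 0;
        while (std::getline(in, line))
        {
            ++lineNumber;
            if (!line.empty() && line[line.size() - 1] == '\r')
            {
                line.erase(line.size() - 1); // tolerate Windows line endings
            }

            if (!line.empty() && line[0] == '>')
            {
                // "> Header header" is stored as "Header header"
                size_t start = line.find_first_not_of(' ', 1);
                FASTARecord record;
                record.header = start == std::string::npos ? std::string() : line.substr(start);
                records.push_back(record);
                continue;
            }

            if (!line.empty() && records.empty())
            {
                std::ostringstream msg;
                msg << "Sequence data before the first header at line " << lineNumber;
                throw std::runtime_error(msg.str());
            }

            for (size_t i = 0; i < line.size(); ++i)
            {
                if (alphabet.find(line[i]) == std::string::npos)
                {
                    std::ostringstream msg;
                    msg << "Wrong character '" << line[i] << "' at line " << lineNumber
                        << ", position " << i + 1;
                    throw std::runtime_error(msg.str());
                }
            }

            if (!records.empty())
            {
                records.back().sequence += line; // multi-line sequences are concatenated
            }
        }
    }

private:
    std::string fileName_;
};
```

Reading line by line keeps the line number available for error messages; gzip input could later be supported by swapping std::ifstream for a decompressing stream with the same interface.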

Ad-hoc optimization in unrolled list

When inserting at position Y, use any free space available before Y. For example (X marks a free slot):
0 1 2 3 4 5
X X X A C G

If we insert before position 3, then the free slots 0, 1, 2 must be used for the insertion.
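A minimal sketch of this idea, assuming a node is stored as a std::vector and free slots hold a known erased value (X in the example). InsertIntoFreeSpace is a hypothetical helper, not part of the existing unrolledlist.h; when there is not enough free space directly before the target position it returns false, so the caller can fall back to the regular (shifting) insertion path.

```cpp
#include <cstddef>
#include <vector>

// Tries to place 'values' immediately before position 'pos' by reusing the
// erased slots that precede it. Returns false if there are not enough of them.
template<class T>
bool InsertIntoFreeSpace(std::vector<T> & node, size_t pos, const std::vector<T> & values, const T & erased)
{
    // Count consecutive free slots directly in front of 'pos'.
    size_t freeBefore = 0;
    while (freeBefore < pos && node[pos - freeBefore - 1] == erased)
    {
        ++freeBefore;
    }
    if (freeBefore < values.size())
    {
        return false;
    }

    // Write the new elements into the free slots closest to 'pos', keeping their order.
    size_t start = pos - values.size();
    for (size_t i = 0; i < values.size(); ++i)
    {
        node[start + i] = values[i];
    }
    return true;
}
```

With the node X X X A C G, inserting T T before position 3 yields X T T A C G without moving A, C or G.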

Installation fails on ace

Linking CXX executable Sibelia
[100%] Built target Sibelia
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/bin/Sibelia
CMake Error at cmake_install.cmake:42 (FILE):
file INSTALL cannot copy file
"/home/vyahhi/bioinf-Sibelia-139af21/build/Sibelia" to
"/usr/local/bin/Sibelia".

make: *** [install] Error 1

Compilation fails

make

[  5%] Building C object libdivsufsort-2.0.1/lib/CMakeFiles/divsufsort.dir/divsufsort.o
[ 11%] Building C object libdivsufsort-2.0.1/lib/CMakeFiles/divsufsort.dir/sssort.o
[ 16%] Building C object libdivsufsort-2.0.1/lib/CMakeFiles/divsufsort.dir/trsort.o
[ 22%] Building C object libdivsufsort-2.0.1/lib/CMakeFiles/divsufsort.dir/utils.o
Linking C static library libdivsufsort.a
[ 22%] Built target divsufsort
Scanning dependencies of target SyntenyFinder
[ 27%] Building CXX object CMakeFiles/SyntenyFinder.dir/main.cpp.o
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/main.cpp:2:
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/outputgenerator.h:4:
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/blockfinder.h:4:
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/hashing.h:4:
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/dnasequence.h:4:
In file included from /Users/vyahhi/projects/SyntenyBuilder/src/fasta.h:5:
/Users/vyahhi/projects/SyntenyBuilder/src/common.h:9:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^
1 error generated.
make[2]: *** [CMakeFiles/SyntenyFinder.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/SyntenyFinder.dir/all] Error 2
make: *** [all] Error 2

Script for auto-release

One button (script) to generate distributions from the master branch:
Sibelia-X.Y.Z-Linux-x86-x64.tar.gz
Sibelia-X.Y.Z-Linux-x86-x64.zip
Sibelia-X.Y.Z-OSX-x86-x64.tar.gz
Sibelia-X.Y.Z-OSX-x86-x64.zip
Sibelia-X.Y.Z-Win-x86-x64.zip

Unrolled list with vectors

Make a version of unrolled list that has vectors instead of static arrays:

  1. two vectors, one for T, the other for A (meta)
  2. erase physically erases elements (and invalidates them, like insert)
  3. size of vectors is limited by NODE_SIZE
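A sketch of a single node in this vector-based design, reading A as the per-element metadata type (the issue calls it meta). VectorNode and its method names are hypothetical; splitting a full node and the list-level bookkeeping are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical node for the vector-based unrolled list: one vector holds the
// elements (T), the other the per-element metadata (A); both stay in step.
template<class T, class A, size_t NODE_SIZE>
class VectorNode
{
public:
    VectorNode()
    {
        // Reserve once so the vectors never need to grow past NODE_SIZE.
        value_.reserve(NODE_SIZE);
        meta_.reserve(NODE_SIZE);
    }

    size_t Size() const { return value_.size(); }
    bool Full() const { return value_.size() == NODE_SIZE; }
    const T & Value(size_t i) const { return value_[i]; }
    const A & Meta(size_t i) const { return meta_[i]; }

    // Inserts before position 'pos'; invalidates iterators into this node.
    void Insert(size_t pos, const T & value, const A & meta)
    {
        if (Full())
        {
            throw std::length_error("node is full, it has to be split first");
        }
        value_.insert(value_.begin() + pos, value);
        meta_.insert(meta_.begin() + pos, meta);
    }

    // Physically removes [begin, end) from both vectors; also invalidates iterators.
    void Erase(size_t begin, size_t end)
    {
        value_.erase(value_.begin() + begin, value_.begin() + end);
        meta_.erase(meta_.begin() + begin, meta_.begin() + end);
    }

private:
    std::vector<T> value_;
    std::vector<A> meta_;
};
```

Keeping values and metadata in two parallel vectors (rather than one vector of pairs) matches the issue's layout and lets code that only scans values touch less memory.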

Replace std::list by unrolled list

Implement https://github.com/bioinf/Sibelia/blob/master/src/unrolledlist.h
First, read http://en.wikipedia.org/wiki/Unrolled_linked_list and http://en.literateprograms.org/Unrolled_linked_list_(C_Plus_Plus).

Implementation details:

  1. Each node contains NODE_SIZE elements (static array)
  2. Erased elements are indicated by value erased_indicator (passed in constructor)
  3. The iterator's ++ and -- operators move the current iterator to the next (previous) valid (not erased) value
  4. Method erase sets all valid values in range [begin, end) to erased_indicator
  5. Method insert inserts range [source_begin, source_end) after the place target. If there is any invalidated iterator x, then x should be put in the vector invalidated iff notify_predicate(x).
  6. Note that implementation must support allocators (as any STL data structure)
  7. Write tests for unrolled_list
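The node layout, iterator and erase rules (items 1 to 4) can be sketched as below. Node, SkippingIterator and EraseRange are hypothetical names, not the contents of unrolledlist.h; insertion with invalidation tracking and allocator support (items 5 and 6) are omitted, the iterator is assumed to start on a valid slot, and decrementing from the past-the-end position is not supported in this sketch.

```cpp
#include <array>
#include <cstddef>

// A node holding NODE_SIZE slots in a static array, linked in both directions.
template<class T, size_t NODE_SIZE>
struct Node
{
    std::array<T, NODE_SIZE> data;
    Node * next;
    Node * prev;
};

// Iterator that only stops on slots not equal to 'erased'.
// A null 'node' means "past the end" (or "before the beginning" after --).
template<class T, size_t NODE_SIZE>
struct SkippingIterator
{
    Node<T, NODE_SIZE> * node;
    size_t index;
    T erased;

    T & operator*() const { return node->data[index]; }

    SkippingIterator & operator++()
    {
        do
        {
            if (++index == NODE_SIZE)
            {
                node = node->next;
                index = 0;
            }
        }
        while (node != nullptr && node->data[index] == erased);
        return *this;
    }

    SkippingIterator & operator--()
    {
        do
        {
            if (index == 0)
            {
                node = node->prev;
                index = NODE_SIZE;
            }
            --index;
        }
        while (node != nullptr && node->data[index] == erased);
        return *this;
    }

    bool operator==(const SkippingIterator & other) const
    {
        return node == other.node && (node == nullptr || index == other.index);
    }

    bool operator!=(const SkippingIterator & other) const { return !(*this == other); }
};

// Logical erase: every valid value in [begin, end) is overwritten with the erased value.
template<class T, size_t NODE_SIZE>
void EraseRange(SkippingIterator<T, NODE_SIZE> begin, SkippingIterator<T, NODE_SIZE> end)
{
    for (; begin != end; ++begin)
    {
        *begin = begin.erased;
    }
}
```

Because erased slots are skipped rather than removed, erase never moves elements, so iterators to surviving elements stay valid.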

Improve index

Get rid of hash tables; use plain vectors to store the bifurcation index.
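One common way to replace a hash table with a plain vector is a sorted vector of key/value pairs searched with std::lower_bound, sketched below. The key and value types (a 64-bit key such as a k-mer hash, a 32-bit value) and the class name FlatIndex are assumptions; the actual layout of the bifurcation index is not described here.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flat replacement for a hash table: (key, value) pairs kept in one
// contiguous vector, sorted once, then searched with binary search.
class FlatIndex
{
private:
    typedef std::pair<std::uint64_t, std::uint32_t> Entry;
    std::vector<Entry> entries_;

public:
    void Add(std::uint64_t key, std::uint32_t value)
    {
        entries_.push_back(Entry(key, value));
    }

    // Call once after all Add() calls and before any Find().
    void Build()
    {
        std::sort(entries_.begin(), entries_.end());
    }

    // Returns true and stores the value for 'key' if it is present.
    bool Find(std::uint64_t key, std::uint32_t & value) const
    {
        // (key, 0) sorts before every entry with this key, so lower_bound finds the first one.
        std::vector<Entry>::const_iterator it =
            std::lower_bound(entries_.begin(), entries_.end(), Entry(key, 0));
        if (it != entries_.end() && it->first == key)
        {
            value = it->second;
            return true;
        }
        return false;
    }
};
```

The trade-off is O(log n) lookups instead of expected O(1), in exchange for one contiguous allocation and no per-bucket overhead, which suits an index that is built once and then only queried.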

Warnings during compilation on Mac

[ 75%] Building CXX object CMakeFiles/SyntenyFinder.dir/hashing.cpp.o
[ 83%] Building CXX object CMakeFiles/SyntenyFinder.dir/serialization.cpp.o
/Users/vyahhi/projects/SyntenyBuilder/src/serialization.cpp:16:48: warning: format specifies type 'unsigned long' but the argument has type 'ull' (aka 'unsigned long long') [-Wformat]
sprintf(&buf[0], "[color=\"%s\", label=\"(%lu, %lu)\"];", color.c_str(), static_cast<ull>(chr), static_cast<ull>(pos));
~~^ ~~~~~~~~~~~~~~~~~~~~~
%llu
/Users/vyahhi/projects/SyntenyBuilder/src/serialization.cpp:16:53: warning: format specifies type 'unsigned long' but the argument has type 'ull' (aka 'unsigned long long') [-Wformat]
sprintf(&buf[0], "[color=\"%s\", label=\"(%lu, %lu)\"];", color.c_str(), static_cast<ull>(chr), static_cast<ull>(pos));
~~^ ~~~~~~~~~~~~~~~~~~~~~
%llu
/Users/vyahhi/projects/SyntenyBuilder/src/serialization.cpp:94:51: warning: format specifies type 'unsigned long' but the argument has type 'ull' (aka 'unsigned long long') [-Wformat]
sprintf(&buf[0], "[color=\"%s\", label=\"chr=%lu pos=%lu len=%lu ch=%c\"];", color.c_str(), uchr, upos, ulength, edge[i].firstChar);
~~^ ~~~~
%llu
/Users/vyahhi/projects/SyntenyBuilder/src/serialization.cpp:94:59: warning: format specifies type 'unsigned long' but the argument has type 'ull' (aka 'unsigned long long') [-Wformat]
sprintf(&buf[0], "[color=\"%s\", label=\"chr=%lu pos=%lu len=%lu ch=%c\"];", color.c_str(), uchr, upos, ulength, edge[i].firstChar);
~~^ ~~~~
%llu
/Users/vyahhi/projects/SyntenyBuilder/src/serialization.cpp:94:67: warning: format specifies type 'unsigned long' but the argument has type 'ull' (aka 'unsigned long long') [-Wformat]
sprintf(&buf[0], "[color=\"%s\", label=\"chr=%lu pos=%lu len=%lu ch=%c\"];", color.c_str(), uchr, upos, ulength, edge[i].firstChar);
~~^ ~~~~~~~
%llu
5 warnings generated.
[ 91%] Building CXX object CMakeFiles/SyntenyFinder.dir/synteny.cpp.o
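The compiler's fix-it hints point to %llu. A minimal sketch of the corrected first call, assuming ull is a typedef for unsigned long long as the warning states; VertexLabel is a hypothetical wrapper, and std::snprintf is used instead of sprintf so the write is bounded by the buffer size.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

typedef unsigned long long ull; // assumed to match the project's 'ull' typedef

// Builds the DOT attribute string; %llu matches the 'unsigned long long' arguments.
std::string VertexLabel(const std::string & color, std::size_t chr, std::size_t pos)
{
    std::vector<char> buf(256);
    std::snprintf(&buf[0], buf.size(), "[color=\"%s\", label=\"(%llu, %llu)\"];",
        color.c_str(), static_cast<ull>(chr), static_cast<ull>(pos));
    return std::string(&buf[0]);
}
```

The same change (%lu to %llu) applies to the three specifiers on line 94, whose arguments are already of type ull.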

Add log mirroring

A log with progress bars should be printed to stdout, and a more detailed log should be written to a file.

make install requires sudo

$ make install
[ 20%] Built target divsufsort
Scanning dependencies of target Sibelia
[ 25%] Building CXX object CMakeFiles/Sibelia.dir/main.cpp.o
[ 30%] Building CXX object CMakeFiles/Sibelia.dir/outputgenerator.cpp.o
[ 35%] Building CXX object CMakeFiles/Sibelia.dir/blockfinder.cpp.o
[ 40%] Building CXX object CMakeFiles/Sibelia.dir/bulgeremoval.cpp.o
[ 45%] Building CXX object CMakeFiles/Sibelia.dir/edge.cpp.o
[ 50%] Building CXX object CMakeFiles/Sibelia.dir/graphalgorithm.cpp.o
[ 55%] Building CXX object CMakeFiles/Sibelia.dir/serialization.cpp.o
[ 60%] Building CXX object CMakeFiles/Sibelia.dir/synteny.cpp.o
[ 65%] Building CXX object CMakeFiles/Sibelia.dir/platform.cpp.o
[ 70%] Building CXX object CMakeFiles/Sibelia.dir/vertexenumeration.cpp.o
Linking CXX executable Sibelia
[100%] Built target Sibelia
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/bin/Sibelia
CMake Error at cmake_install.cmake:43 (FILE):
file cannot create directory: /usr/share/sibelia. Maybe need
administrative privileges.

make: *** [install] Error 1

Circos diagram ids

Show chromosome ids (FASTARecord::description) not their numbers on the diagram.

Create option --inram

If this switch is ON, then all computations are performed in RAM without creating temporary files.
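One way to implement such a switch is to have the code that produces intermediate data write to a std::iostream without knowing whether it is backed by memory or by a file. OpenTemporaryStorage is a hypothetical helper sketching that choice; it is not an existing Sibelia function.

```cpp
#include <fstream>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Returns an in-memory stream when --inram is set, otherwise a read/write
// temporary file at 'path' (truncated if it already exists).
std::unique_ptr<std::iostream> OpenTemporaryStorage(bool inRam, const std::string & path)
{
    if (inRam)
    {
        return std::unique_ptr<std::iostream>(new std::stringstream());
    }
    return std::unique_ptr<std::iostream>(
        new std::fstream(path.c_str(), std::ios::in | std::ios::out | std::ios::trunc));
}
```

Callers then read and write through the returned stream in the same way in both modes; only memory usage differs.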

UnrolledList

Add a default constructor.
Add a method set_erased_value(const T& value).

If someone tries to use the unrolled list without setting a proper erased value, the program must crash in debug mode (use assert).
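A heavily simplified sketch of the requested behaviour: a default constructor, set_erased_value, and an assert that fires when the list is used before the erased value is set. UnrolledListSketch is a hypothetical stand-in backed by a plain std::vector, not the real UnrolledList.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template<class T>
class UnrolledListSketch
{
public:
    UnrolledListSketch(): erased_(), erasedSet_(false) {}
    explicit UnrolledListSketch(const T & erased): erased_(erased), erasedSet_(true) {}

    void set_erased_value(const T & value)
    {
        erased_ = value;
        erasedSet_ = true;
    }

    void push_back(const T & value)
    {
        CheckErasedValue();
        data_.push_back(value);
    }

    // Logical erase: the slot keeps its place but holds the erased value.
    void erase(std::size_t i)
    {
        CheckErasedValue();
        data_[i] = erased_;
    }

    bool is_erased(std::size_t i) const
    {
        CheckErasedValue();
        return data_[i] == erased_;
    }

private:
    // Aborts in debug builds; compiled out when NDEBUG is defined.
    void CheckErasedValue() const
    {
        assert(erasedSet_ && "set_erased_value() must be called before the list is used");
    }

    std::vector<T> data_;
    T erased_;
    bool erasedSet_;
};
```

Since assert is removed in release builds (NDEBUG), the check costs nothing there, which matches the "crash in debug mode" requirement.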

Segmentation fault: 11

Run:

cd build
Sibelia -s loose Helicobacter_pylori.fasta

Got:

Segmentation fault: 11

P.S. There is no file Helicobacter_pylori.fasta in the build folder, but Sibelia should report this with an error message, not a segfault.

Add parameters

The following parameters should be supported:

  1. --infile=$input_file  # input file name, mandatory
  2. --parset=fine|loose  # parameter set, optional, default = loose
  3. --minlength=  # minimum block length, optional, default = 5000
  4. --stagefile=$stage_file  # stage file name, optional, incompatible with --parset
  5. --graphfile=$graph_file  # dot file for condensed graph, optional
  6. --repfile=$rep_file  # path to report file, optional, default = $input_file + "_report"
  7. --chrfile=$chr_file  # path to chr file, optional, default = $input_file + "_chr"
  8. --coordfile=$coord_file  # path to coord file, optional, default = $input_file + "_coord"
  9. --blockfile=$block_file  # path to blocks file, optional
  10. --maxiterations=  # maximum number of simplification iterations, default = 4
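The parameter list above can be sketched as a small --key=value parser that applies the listed defaults and checks. Parameters and ParseParameters are hypothetical names, and rejecting unknown keys is an assumption not stated in the list.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical settings struct mirroring the parameter list.
struct Parameters
{
    std::string infile, parset, stagefile, graphfile, repfile, chrfile, coordfile, blockfile;
    int minlength;
    int maxiterations;
};

// Parses arguments of the form --key=value; throws std::invalid_argument on bad input.
Parameters ParseParameters(const std::vector<std::string> & args)
{
    static const std::set<std::string> known = {"infile", "parset", "minlength", "stagefile",
        "graphfile", "repfile", "chrfile", "coordfile", "blockfile", "maxiterations"};
    std::map<std::string, std::string> opt;
    for (const std::string & arg : args)
    {
        std::size_t eq = arg.find('=');
        if (arg.compare(0, 2, "--") != 0 || eq == std::string::npos)
        {
            throw std::invalid_argument("Expected --key=value, got '" + arg + "'");
        }
        std::string key = arg.substr(2, eq - 2);
        if (known.count(key) == 0)
        {
            throw std::invalid_argument("Unknown parameter --" + key);
        }
        opt[key] = arg.substr(eq + 1);
    }

    // Returns the value given for 'key', or 'fallback' if it was not given.
    auto get = [&opt](const std::string & key, const std::string & fallback)
    {
        std::map<std::string, std::string>::const_iterator it = opt.find(key);
        return it == opt.end() ? fallback : it->second;
    };

    if (opt.count("infile") == 0)
    {
        throw std::invalid_argument("--infile is mandatory");
    }
    if (opt.count("stagefile") != 0 && opt.count("parset") != 0)
    {
        throw std::invalid_argument("--stagefile is incompatible with --parset");
    }

    Parameters p;
    p.infile = get("infile", "");
    p.parset = get("parset", "loose");
    if (p.parset != "fine" && p.parset != "loose")
    {
        throw std::invalid_argument("--parset must be 'fine' or 'loose'");
    }
    p.minlength = std::stoi(get("minlength", "5000"));      // std::stoi throws on non-numeric input
    p.maxiterations = std::stoi(get("maxiterations", "4"));
    p.stagefile = get("stagefile", "");
    p.graphfile = get("graphfile", "");
    p.blockfile = get("blockfile", "");
    p.repfile = get("repfile", p.infile + "_report");
    p.chrfile = get("chrfile", p.infile + "_chr");
    p.coordfile = get("coordfile", p.infile + "_coord");
    return p;
}
```

Collecting all options into a map first makes the cross-parameter checks (mandatory --infile, --stagefile versus --parset) independent of argument order.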

Add circos generation

Extend class OutputGenerator so that it will contain method:

void GenerateCircosOutput(std::ostream & out, ...) const

It must take one (or more, if necessary) std::ostream objects and write the synteny blocks in a form that Circos can visualize.

Add a command-line option that allows the user to generate Circos files. It should be off by default.

Website

Update website with new text and interactive picture.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.