Cactus_Alignments_Tools

A toolbox for working with Cactus whole-genome alignments. Because the efficiency of Perl is limited, I am happy to work with anyone who would like to improve this pipeline.

BTW, another tool, halSynteny, is also a good way to extract reasonable single-copy alignments in a pairwise manner. My pipeline instead makes it possible to process all synteny chains based on one unique reference species.

Requirement: https://github.com/ComparativeGenomicsToolkit/hal.git (add its bin directory to your PATH).

How to use: You can use extract_shell.pl to create the work shell script.

How it works

To obtain single-copy alignments from Cactus alignments, a pipeline to get the best synteny blocks was designed as follows:

  1. Determination of the reference species. The reference species provides the positional coordinates for synteny identification.

  2. For each query species, aligned blocks from the same chromosome and the same strand were regarded as a synteny chain.

  3. For the synteny chain with insertions, the original synteny chain was subdivided into several synteny chains with locally colinear blocks to make sure each synteny chain had a continuous order within a single strand.

  4. For each query species, all synteny chains were ordered by ungapped length and clustered into types at 100-bp length intervals.

  5. For each type, all synteny chains were further ordered by alignment score (S). MA is the number of matched bases, and MI is the number of aligned but unmatched bases. Insertions and deletions were also counted. The number of break points in the query is denoted by B, and the number of gaps in the query is denoted by G. S was then calculated as:

[image: formula for S]

  6. For each query species, bases from the best synteny chains (taken in order from higher to lower length and score) were selected to constitute the final best synteny blocks.

  7. To prevent highly chimeric synteny blocks, all insertions shorter than 10 bp were eliminated.
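
Steps 2 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Perl code shipped in this repository: the `Block` fields, the function names, and the convention that query coordinates are given relative to the aligned strand (so they increase along a colinear chain on either strand) are all hypothetical. The score S is passed in as a function because its formula is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One aligned block. Field names are illustrative, not the pipeline's."""
    ref_start: int  # start on the reference (0-based)
    ref_end: int    # end on the reference (exclusive)
    q_chrom: str    # query chromosome or scaffold
    q_strand: str   # '+' or '-'
    q_start: int    # start on the query, relative to the aligned strand
    q_end: int      # end on the query (exclusive)


def build_chains(blocks):
    """Step 2: group blocks by (query chromosome, strand), sorted by reference start."""
    groups = defaultdict(list)
    for b in blocks:
        groups[(b.q_chrom, b.q_strand)].append(b)
    return [sorted(g, key=lambda b: b.ref_start) for g in groups.values()]


def split_colinear(chain):
    """Step 3: cut a chain wherever the query order stops increasing."""
    runs, current = [], []
    for b in chain:
        if current and b.q_start < current[-1].q_end:
            runs.append(current)
            current = []
        current.append(b)
    if current:
        runs.append(current)
    return runs


def ungapped_length(chain):
    """Total aligned reference length of a chain, ignoring gaps between blocks."""
    return sum(b.ref_end - b.ref_start for b in chain)


def length_bin(chain, bin_size=100):
    """Step 4: assign a chain to a 100-bp length type."""
    return ungapped_length(chain) // bin_size


def rank_chains(chains, score):
    """Steps 4-5: order by length type, then by score, both descending."""
    return sorted(chains, key=lambda c: (length_bin(c), score(c)), reverse=True)
```

With a real scoring function supplied for `score`, `rank_chains` returns the chains in the order in which step 6 would consume them.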

Finally, single-copy synteny blocks were obtained from each query species. To optimize this process, the whole-genome alignment was split into multiple 100-kb (for penguins) or 10-kb (for all birds) windows based on the reference, and the windows were filtered in parallel.
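
The windowing itself is simple to illustrate. The sketch below tiles each reference sequence with fixed-size, half-open windows; the function name and the `chrom_sizes` mapping are assumptions made for the example and are not part of this repository.

```python
def reference_windows(chrom_sizes, window=10_000):
    """Yield (chrom, start, end) half-open windows tiling each reference sequence.

    chrom_sizes: mapping of reference sequence name to its length in bp
                 (hypothetical input format).
    window:      window size in bp, e.g. 100_000 or 10_000.
    """
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            # The last window of a sequence is truncated at the sequence end.
            yield chrom, start, min(start + window, size)


if __name__ == "__main__":
    for chrom, start, end in reference_windows({"chr1": 25_000}):
        print(chrom, start, end)
```

Each window is independent of the others, so the windows can be handed to separate jobs and filtered in parallel.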

Contributors: secretloong
