twisst2

twisst2 is a tool for topology weighting. Topology weighting summarises genealogies in terms of the relative abundance of different sub-tree topologies. It can be used to explore processes like introgression and it can aid the identification of trait-associated loci.
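To make the idea of "relative abundance of sub-tree topologies" concrete, here is a minimal sketch for a single tree and exactly four groups. It is an illustration of the concept only, not twisst2's algorithm or API: the nested-tuple tree format, the function names and the sample names are all made up for this example. For every way of picking one sample per group, it finds which two groups pair up in the resulting four-tip sub-tree, and reports the fraction of sub-trees that show each pairing.

```python
from collections import Counter
from itertools import product


def clades(tree):
    """Return the leaf set below every internal node of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf is just a sample name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def quartet_label(tree_clades, quartet, group_of):
    """Name the unrooted topology of four samples, e.g. 'A,B|C,D'."""
    chosen = set(quartet)
    for clade in tree_clades:
        inside = chosen & clade
        if len(inside) == 2:  # this branch splits the quartet two against two
            sides = (inside, chosen - inside)
            names = sorted(",".join(sorted(group_of[s] for s in side)) for side in sides)
            return "|".join(names)
    return None  # unresolved (cannot happen in a fully bifurcating tree)


def topology_weights(tree, groups):
    """Fraction of one-sample-per-group sub-trees that match each topology."""
    group_of = {s: g for g, members in groups.items() for s in members}
    tree_clades = clades(tree)
    counts = Counter(
        quartet_label(tree_clades, quartet, group_of)
        for quartet in product(*groups.values())
    )
    total = sum(counts.values())
    return {topology: n / total for topology, n in counts.items()}


# Hypothetical toy data: two samples in groups A and B, one in C and D.
tree = (((("a1", "a2"), "b1"), ("b2", "c1")), "d1")
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"], "D": ["d1"]}
weights = topology_weights(tree, groups)
print(weights)  # {'A,B|C,D': 0.5, 'A,D|B,C': 0.5}
```

In this toy tree, sub-trees that include `b1` group A with B, while those that include `b2` group B with C, so the two topologies each receive a weight of 0.5.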

twisst2 has a number of important improvements over the original twisst tool. Most importantly, twisst2 incorporates inference of the tree sequence - local genealogies and their breakpoints along the chromosome. It does this using sticcs. sticcs is a model-free approach and it does not require phased data, so twisst2 can run on unphased genotypes of any ploidy.

The standard way to run twisst2 is to start from polarised genotype data. This means you either need to know the ancestral allele at each site, or you need an appropriate outgroup(s) to allow inference of the derived allele.

An alternative is to first infer a tree sequence using a different tool such as [tsinfer](https://tskit.dev/tsinfer/docs/stable/index.html). However, this requires phased and imputed genotypes, and imputation can introduce biases if model assumptions are violated.

Installation

First install sticcs by following its installation instructions.

If you would like to analyse tree sequence objects from tools like msprime and tsinfer, you will also need to install tskit yourself.

git clone https://github.com/simonhmartin/twisst2.git

cd twisst2

pip install -e .

Command line tool

Starting from unphased (or phased) genotypes

In standard usage, twisst2 takes as input a modified vcf file that contains a DC field, giving the count of derived alleles for each individual at each site.
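As a rough illustration of what a derived allele count means, the sketch below counts the alleles in a VCF-style genotype string that differ from the ancestral allele. This is not how sticcs computes the DC field, which you should always generate with `sticcs prep`. The function name is hypothetical, and the sketch assumes that every non-ancestral allele counts as derived.

```python
def derived_count(genotype, ancestral_index):
    """Count alleles in a GT string that differ from the ancestral allele.

    genotype: VCF-style GT value such as "0/1" or "1|1|0|0" (any ploidy).
    ancestral_index: allele index of the ancestral allele (0 = REF, 1 = first ALT).
    Returns None if any allele is missing (".").
    """
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    # Assumption: anything that is not the ancestral allele is derived.
    return sum(1 for allele in alleles if int(allele) != ancestral_index)


print(derived_count("0/1", 0))      # REF is ancestral: one derived copy
print(derived_count("0/0", 1))      # ALT is ancestral: both REF copies are derived
print(derived_count("0/1/1/1", 0))  # tetraploid, phase not needed
```

Because only the count per individual is used, the order of alleles within a genotype does not matter, which is why unphased data of any ploidy is acceptable.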

Once you have a vcf file for your genotype data, you can make the modified version using sticcs (this needs to be installed, see above):

sticcs prep -i <input vcf> -o <output vcf> --outgroup <outgroup sample ID>

If the vcf file already has the ancestral allele (provided in the AA field in the INFO section), then you do not need to specify outgroups for polarising.

Now you can run twisst2 to count sub-tree topologies:

twisst2 -i <input_vcf> -o <output_prefix> --max_iterations 100 --ploidy 2 --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file

Output

  • <output_prefix>.topocounts.tsv.gz gives the count of each group tree topology for each interval.
  • <output_prefix>.intervals.tsv.gz gives the chromosome, start and end position of each interval.
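Topology counts are usually converted to weights by dividing each count by the row total, so that the weights for each interval sum to 1. The sketch below does this using only the Python standard library. The file layout is an assumption, not something documented here: it expects a tab-separated file with optional comment lines starting with `#`, one header row, and one numeric column per topology. Check your own output and adjust as needed. The function name `read_weights` is hypothetical.

```python
import csv
import gzip


def read_weights(path):
    """Read a gzipped topocounts table and normalise each row to weights.

    Assumed layout (not confirmed by the twisst2 documentation):
    optional '#' comment lines, a header row, then one count per topology.
    Returns (header, weights) with one list of weights per interval.
    """
    with gzip.open(path, "rt", newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(lines, delimiter="\t")
        header = next(reader)
        weights = []
        for row in reader:
            if not row:  # skip blank lines
                continue
            counts = [float(value) for value in row]
            total = sum(counts)
            # Intervals with no counted sub-trees get NaN rather than a division error.
            weights.append([c / total if total else float("nan") for c in counts])
    return header, weights
```

Rows of the result are in the same order as the rows of the counts file, so they can be matched by position to the intervals file to place each set of weights on the chromosome.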

Plotting

The script plot_twisst.R provides functions for importing and plotting. Examples will be provided soon.

