2019_puller_sitespecificgtr's Introduction

Inference of site specific GTR models

Most of the steps described below use the HPC infrastructure at sciCore of the University of Basel. Jobs were submitted to a Slurm-based compute cluster, and the corresponding submission scripts are given here for reference only (see below).

Simulated data generation

Artificial test data are generated by

* generating a site specific GTR model by drawing rates for each site from a Gamma distribution and equilibrium frequencies for each site from a Dirichlet distribution.
* generating a Yule tree with the specified number of leaves
* simulating sequence evolution along the tree using the GTR model
* saving the model, alignment, tree, and ancestral sequences for future analysis

These steps are executed using the script generate_toy_data.py in the directory src.
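
The first step above can be sketched as follows. This is a minimal illustration, not the code in generate_toy_data.py: the function names, the normalization of the Gamma rates to mean 1, and the random symmetric exchangeability matrix shared across sites are all assumptions made for the example. The parameter names rate_alpha and freq_alpha merely echo the `ratealpha` and `freqalpha` fields in the result directory names.

```python
import numpy as np

def random_site_specific_model(n_sites, n_states=4, rate_alpha=1.5,
                               freq_alpha=1.0, seed=0):
    """Draw per-site rates and equilibrium frequencies (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Per-site rates: Gamma with shape rate_alpha, scaled to mean 1 (assumption).
    mu = rng.gamma(shape=rate_alpha, scale=1.0 / rate_alpha, size=n_sites)
    # Per-site equilibrium frequencies: symmetric Dirichlet, one row per site.
    pi = rng.dirichlet(np.full(n_states, freq_alpha), size=n_sites)
    # Symmetric exchangeability matrix shared by all sites (assumption).
    W = rng.gamma(shape=1.0, size=(n_states, n_states))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    return mu, pi, W

def site_rate_matrix(mu_a, pi_a, W):
    """GTR rate matrix for one site: Q[i, j] = mu * pi[i] * W[i, j] for i != j."""
    Q = mu_a * pi_a[:, None] * W
    # Set the diagonal so that every column sums to zero.
    Q -= np.diag(Q.sum(axis=0))
    return Q

mu, pi, W = random_site_specific_model(n_sites=1000)
Q0 = site_rate_matrix(mu[0], pi[0], W)
```

Because W is symmetric, each site's frequency vector is the stationary distribution of its rate matrix, that is, `Q0 @ pi[0]` vanishes up to rounding.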

Inferring site-specific models

The simulated data are analyzed using the script reconstruct_toy_data.py. This script infers site-specific models using a number of different approaches of varying complexity. The results are saved in a pickle file for further analysis.
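
A minimal sketch of how such a pickle file might be opened and inspected. The internal layout of the results object is not documented here, so the helper only reports the top-level structure; `load_results` and `summarize` are hypothetical names, and the file path is whatever reconstruct_toy_data.py wrote.

```python
import pickle

def load_results(path):
    """Load a pickled results object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

def summarize(obj, max_items=10):
    """Describe the top-level structure without assuming a specific layout."""
    if isinstance(obj, dict):
        items = list(obj.items())[:max_items]
        return {str(key): type(value).__name__ for key, value in items}
    if isinstance(obj, (list, tuple)):
        return [type(value).__name__ for value in obj[:max_items]]
    return type(obj).__name__
```

Calling `summarize(load_results(path))` is a quick way to see which keys or entries a results file contains before writing analysis code against it.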

Reconstructing branch lengths

To compare how accurately branch lengths can be inferred with the different inferred models, we jointly infer models and branch lengths. This step is performed in the script src/calculate_branch_length.py.

Results

Generating the results below from scratch is computationally expensive and requires thousands of CPU hours (as of 2019). The output necessary to reproduce the graphs of the paper is therefore provided as a tarball and stored on GitHub's Large File Storage. The file is simulation_results.tar.gz and the corresponding hash can be found in .gitattributes.
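
Once the tarball has been downloaded, it can be checked and unpacked along the following lines. This is a sketch under the assumption that the recorded hash is a SHA-256 digest, which is what Git LFS uses; the function names are hypothetical and the expected digest has to be supplied by the user.

```python
import hashlib
import os
import tarfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def extract_results(path="simulation_results.tar.gz", dest=".",
                    expected_sha256=None):
    """Optionally verify the archive's checksum, then unpack it into dest."""
    if expected_sha256 is not None:
        actual = sha256_of_file(path)
        if actual != expected_sha256:
            raise ValueError(f"checksum mismatch: got {actual}")
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(path, "r:gz") as tar:
        tar.extractall(dest)
```

If the file on disk is only a few hundred bytes long, it is most likely still an LFS pointer rather than the archive itself, and the actual content needs to be fetched first.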

Specific commands

The following scripts submit the jobs on the sciCore compute cluster used to generate and analyse the toy data.

Generate data

python submit_gendata.py --nvals 100 --submit
python submit_gendata.py --nvals 300 --submit
python submit_gendata.py --nvals 1000 --submit
python submit_gendata.py --nvals 3000 --submit

Reconstruct models from simulated data

python submit_reconstruction.py --date 2019-12-25 --nvals 100 --submit
python submit_reconstruction.py --date 2019-12-25 --nvals 300 --submit
python submit_reconstruction.py --date 2019-12-25 --nvals 1000 --submit
python submit_reconstruction.py --date 2019-12-25 --nvals 3000 --submit

Calculate branch length deviations

python submit_branch_length.py --nvals 100 --date 2019-12-25 --submit
python submit_branch_length.py --nvals 300 --date 2019-12-25 --submit
python submit_branch_length.py --nvals 1000 --date 2019-12-25 --submit
python submit_branch_length.py --nvals 3000 --date 2019-12-25 --submit

Collect branch lengths into tabular files

python submit_collect_tree_lengths.py --nval 100 --submit --date 2019-12-25
python submit_collect_tree_lengths.py --nval 300 --submit --date 2019-12-25
python submit_collect_tree_lengths.py --nval 1000 --submit --date 2019-12-25
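
The submission commands above follow one pattern per script, varying only the number of leaves. As a convenience, the command lines can be generated with a small helper like the one below. It is a hypothetical sketch that only builds the strings and does not run anything; the flags are the ones shown above, and `nval_flag` exists because submit_collect_tree_lengths.py is invoked with `--nval` rather than `--nvals`. Flag order differs from some of the listed commands, which is assumed not to matter to the scripts' argument parsing.

```python
import shlex

def build_commands(script, nvals, date=None, nval_flag="--nvals"):
    """Build one submission command line per value in nvals (nothing is executed)."""
    commands = []
    for n in nvals:
        args = ["python", script, nval_flag, str(n)]
        if date is not None:
            args += ["--date", date]
        args.append("--submit")
        commands.append(shlex.join(args))
    return commands

for cmd in build_commands("submit_reconstruction.py",
                          [100, 300, 1000, 3000], date="2019-12-25"):
    print(cmd)
```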

Submit model deviation analysis

Plot figures 1 & 2 -- model inference accuracy

run src/plot_toy_data.py --prefix 2019-12-25_simulatedData_L1000_ratealphaXXX_freqalpha1.0_nuc_results --nvals 1000

corresponding supplementary figures

run src/plot_toy_data.py --prefix 2019-12-25_simulatedData_L1000_ratealphaXXX_freqalpha0.2_aa_results --nvals 1000

Plot figure 3 -- tree length

run src/compare_tree_length.py --nval 300 --treelengths 2019-12-25_simulatedData_L1000_ratealpha1.5_freqalpha1.0_nuc_results/collected_tree_lengths_n300.tsv

corresponding supplementary figure

run src/compare_tree_length.py --nval 300 --treelengths 2019-12-25_simulatedData_L1000_ratealpha1.5_freqalpha0.2_aa_results/collected_tree_lengths_n300.tsv --aa

Plot figure 4 -- model deviation

run src/model_deviation.py --files 2019-12-25_simulatedData_L1000_ratealpha1.5_freqalpha1.0_nuc_results/ModelDeviation_L1000_n300_m0.* --output figures/model_deviation_n300.pdf

Plot figure 5 -- HIV fitness landscape

run src/analyze_HIV_tree.py --prefix HIV_data/results/HIV_B_pol --pc 0.001 --gene pol --redo

run src/analyze_HIV_tree.py --prefix HIV_data/results/HIV_B_pol --pc 0.001 --gene pol --redo --aa

corresponding supplementary figures

run src/analyze_HIV_tree.py --prefix HIV_data/results/HIV_B_nef --pc 0.001 --gene pol --redo

run src/analyze_HIV_tree.py --prefix HIV_data/results/HIV_B_gag --pc 0.001 --gene pol --redo

Plot figure 6 -- Branch length dependence due to ignoring site specificity of HIV

run src/HIV_branch_length.py --prefix HIV_data/results/HIV_B_pol --pc 0.01

corresponding supplementary figures

run src/HIV_branch_length.py --prefix HIV_data/results/HIV_B_pol --pc 0.01 --aa

Contributors

rneher