dmmimesim's Introduction

dmMIMESim

First round MIME:

  1. Simulate sequence variants in pool of a wild type sequence (random mutagenesis 1)
  2. Provide ground truth or randomly assign fitness values for each position-wise mutation and for pairwise epistatic effects
  3. Derive species frequencies in equilibrium
  4. Add statistical noise
  5. Output position-wise mutant counts, sequences and the ground truth
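
Step 1 can be illustrated with a minimal sketch. It assumes that every position mutates independently with probability p_mut and that the wild type is encoded as symbol 0; the function name `mutate` and the dictionary representation are hypothetical and are not taken from the dmMIMESim sources. The max_mut limit is not modelled here.

```python
import random

def mutate(L, q, p_mut, rng):
    """Return {position: symbol} for one variant.

    Assumption: symbol 0 is the wild type, symbols 1..q-1 are mutants.
    """
    mutations = {}
    for pos in range(L):
        if rng.random() < p_mut:
            # pick one of the q - 1 non-wild-type symbols
            mutations[pos] = rng.randint(1, q - 1)
    return mutations

rng = random.Random(42)
pool = [mutate(L=50, q=2, p_mut=0.01, rng=rng) for _ in range(10_000)]
mean_mutations = sum(len(m) for m in pool) / len(pool)
```

With the default values L=50 and p_mut=0.01, the mean number of mutations per sequence should be close to L * p_mut = 0.5.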

Second round MIME:

  1. Read in results from first round of MIME
  2. Simulate sequence variants in unbound pool of first round experiment (random mutagenesis 2)
  3. Derive species frequencies in equilibrium
  4. Add statistical noise
  5. Output position-wise mutant counts, sequences and the ground truth

Detailed explanation will be added

Install

Go into the MIMESim directory, create a build directory, and run cmake and make from there:

cd /path/to/MIMESim
mkdir build
cd build
cmake ..
make

This will create the binary file .../MIMESim/build/bin/MIMESim_prog

Run

Run the program with an optional parameter giving the path to the result directory. If no parameter is given, the output is written to the newly created directory ../results.

1: default

/path/to/MIMESim_prog

First round simulation. The program creates the directory ./results and writes the outputs and the (default) parameters to it.

2: working-dir

/path/to/MIMESim_prog --working-dir path/to/read/and/write

First round simulation. Program will use parameters.txt from working-dir if available.

3: read-gt

/path/to/MIMESim_prog [--working-dir path/to/read/and/write] --read-gt

First round simulation. Program will use ground truth files (single_kds.txt, pairwise_epistasis.txt) from working-dir (default=.../results). Exits with error if files are not available.

4: previous-dir

/path/to/MIMESim_prog [--working-dir path/to/read/and/write] --previous-dir path/to/results/first/round/exp 

Second round simulation. The program uses the outputs of a previous first-round simulation, stored in previous-dir, as the starting point of a second round.
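
For scripted runs, the four invocations above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of dmMIMESim; it only uses the flags shown above (--working-dir, --read-gt, --previous-dir) and does not check whether a given combination is accepted by the program.

```python
def build_command(prog, working_dir=None, read_gt=False, previous_dir=None):
    """Assemble the argument list for one MIMESim_prog call."""
    cmd = [prog]
    if working_dir is not None:
        cmd += ["--working-dir", working_dir]
    if read_gt:
        cmd.append("--read-gt")
    if previous_dir is not None:
        cmd += ["--previous-dir", previous_dir]
    return cmd

first_round = build_command("/path/to/MIMESim_prog", working_dir="round1")
second_round = build_command(
    "/path/to/MIMESim_prog", working_dir="round2", previous_dir="round1"
)
# To execute: subprocess.run(first_round, check=True)
```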

parameters

If you want to set the simulation parameters, create the parameters.txt file in the desired result directory and add the following parameter names and values (tab separated):

parameter     type     description
kd_wt         (float)  absolute Kd for the wildtype sequence (default: 1)
p_effect      (float)  probability per position to have an effect on function when mutated (default: 0.5)
p_epistasis   (float)  probability for each pair of mutations to have an epistatic effect (default: 0.3)
L             (int)    length of the sequence (default: 50)
q             (int)    number of possible symbols per position (default: 2)
M             (int)    size of the initial population (default: 12000000)
p_mut         (float)  probability for each position of being mutated (default: 0.01)
p_error       (float)  probability of introducing an error (default: 0.001)
seed          (int)    seed for the random number generator
epi_restrict  (int)    restriction mode for drawing epistatic interactions, see comments below (default: 0)
B_tot         (float)  total amount of protein in the competition experiment, relative to the number of sequences (default: 2.0)
max_mut       (int)    maximum number of mutations allowed per sequence (default: -1, no maximum)
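
A parameters.txt file can also be generated from a script. The sketch below assumes one tab-separated name/value pair per line, which is how the description above reads; the helper name `write_parameters` is hypothetical, and the exact layout expected by the program should be checked against a parameters.txt written by a default run.

```python
import tempfile
from pathlib import Path

def write_parameters(result_dir, **params):
    """Write name<TAB>value lines to <result_dir>/parameters.txt."""
    path = Path(result_dir) / "parameters.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}\t{value}" for name, value in params.items()]
    path.write_text("\n".join(lines) + "\n")
    return path

# Demo: write a few parameters into a temporary result directory.
demo_dir = tempfile.mkdtemp()
params_path = write_parameters(demo_dir, L=50, q=2, p_mut=0.01, seed=1)
```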

Comments:

  1. Right now, kd_wt is always 1, but this can be easily adjusted in the code.
  2. Epistasis values (strength of interaction) are sampled from a standard log normal distribution.
  3. Possible values for epi_restrict are 0,1,2:
  • 0=unrestricted, all possible pairs of functional mutations mut1=(pos1, symbol1), mut2=(pos2, symb2) with pos1!=pos2, kd_mut1!=kd_wt and kd_mut2!=kd_wt are assigned an epistasis value with probability p_epistasis.
  • 1=semi-restricted, each mutation can only be in an epistatic interaction with one other mutation. The Mf functional mutations mut=(pos, symbol) are grouped into Mf/2 pairs, and each pair is assigned an epistasis value with probability p_epistasis.
  • 2=restricted, same as 1, only now positions themselves are mutually exclusive: each position can only be in an epistatic interaction with one other position.
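
The three epi_restrict modes can be sketched as follows. This is an illustration of the rules as described in the comments above, not the dmMIMESim implementation: the function name `draw_epistasis`, the greedy pairing after a shuffle, and the tuple representation of mutations are all assumptions. Epistasis values are drawn from a standard log-normal distribution (mu=0, sigma=1).

```python
import itertools
import random

def draw_epistasis(functional, p_epistasis, epi_restrict, rng):
    """functional: list of (pos, symbol) mutations with kd != kd_wt.

    Returns {(mut1, mut2): epistasis value}.
    """
    if epi_restrict == 0:
        # unrestricted: every pair of mutations at different positions
        candidates = [
            (a, b)
            for a, b in itertools.combinations(functional, 2)
            if a[0] != b[0]
        ]
    else:
        shuffled = functional[:]
        rng.shuffle(shuffled)
        candidates, used_muts, used_pos = [], set(), set()
        for a, b in itertools.combinations(shuffled, 2):
            if a[0] == b[0] or a in used_muts or b in used_muts:
                continue  # mode 1 and 2: one partner per mutation
            if epi_restrict == 2 and (a[0] in used_pos or b[0] in used_pos):
                continue  # mode 2: one partner per position
            candidates.append((a, b))
            used_muts.update((a, b))
            used_pos.update((a[0], b[0]))
    return {
        pair: rng.lognormvariate(0.0, 1.0)
        for pair in candidates
        if rng.random() < p_epistasis
    }

rng = random.Random(1)
functional = [(pos, 1) for pos in range(10)]  # q=2: one mutation per position
pairs_by_mode = {
    mode: draw_epistasis(functional, 1.0, mode, rng) for mode in (0, 1, 2)
}
```

With ten functional mutations and p_epistasis set to 1.0 for the demonstration, mode 0 yields all 45 pairs, while modes 1 and 2 yield Mf/2 = 5 pairs each.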

Output

  • 1d counts (with noise)
  • 2d counts (with noise)
  • full-length sequences (with noise)
  • species (sequence variants without sequencing noise)
  • amount of free protein in equilibrium, per binding experiment
  • ground truth Kds (position and pairwise) and epistasis values
result_directory1
├── 1d
│   ├── 1.txt
│   ├── 2.txt
│   ├── 3.txt
│   └── 4.txt
├── 2d
│   ├── 1.txt
│   ├── 2.txt
│   ├── 3.txt
│   └── 4.txt
├── free_protein_1_2.txt
├── free_protein_3_4.txt
├── pairwise_epistasis.txt
├── pairwise_kds.txt
├── parameters.txt
├── sequences
│   ├── 1.txt
│   ├── 2.txt
│   ├── 3.txt
│   └── 4.txt
├── single_kds.txt
└── species
    ├── 1.txt
    ├── 2.txt
    ├── 3.txt
    └── 4.txt
result_directory2
├── 1d
│   ├── 5.txt
│   ├── 6.txt
│   ├── 7.txt
│   └── 8.txt
├── 2d
│   ├── 5.txt
│   ├── 6.txt
│   ├── 7.txt
│   └── 8.txt
├── free_protein_5_6.txt
├── free_protein_7_8.txt
├── pairwise_epistasis.txt
├── pairwise_kds.txt
├── parameters.txt
├── sequences
│   ├── 5.txt
│   ├── 6.txt
│   ├── 7.txt
│   └── 8.txt
├── single_kds.txt
└── species
    ├── 5.txt
    ├── 6.txt
    ├── 7.txt
    └── 8.txt

Sequence pool encoding

  • 1 = wt_bound,
  • 2 = wt_unbound,
  • 3 = mut_bound,
  • 4 = mut_unbound,
  • 5 = mut_bound_bound,
  • 6 = mut_bound_unbound,
  • 7 = mut_unbound_bound,
  • 8 = mut_unbound_unbound,
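
When post-processing results, the encoding can be kept in a small lookup table. The sketch below is a hypothetical helper, not part of dmMIMESim; it assumes, based on the directory listings above, that pools 1 to 4 belong to the first round and pools 5 to 8 to the second round.

```python
POOLS = {
    1: "wt_bound",
    2: "wt_unbound",
    3: "mut_bound",
    4: "mut_unbound",
    5: "mut_bound_bound",
    6: "mut_bound_unbound",
    7: "mut_unbound_bound",
    8: "mut_unbound_unbound",
}

def describe_pool(pool_id):
    """Return (round, label, relative path of the 1d counts file)."""
    if pool_id not in POOLS:
        raise ValueError(f"unknown pool id: {pool_id}")
    sim_round = 1 if pool_id <= 4 else 2
    return sim_round, POOLS[pool_id], f"1d/{pool_id}.txt"

first = describe_pool(3)
second = describe_pool(6)
```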

Tests

Tests should be run from within the main directory /path/to/MIMESim, for example:

cd /path/to/MIMESim
./build/bin/all_tests

dmmimesim's People

Contributors

wvdtoorn, maureensmith

Forkers

joshuagensel

dmmimesim's Issues

SpeciesKd calculation seems off

The value for KD_WT in params reflects the Kd of the wildtype species, not of a single position.

In the case that KD_WT == 1, the Kd's of the single positions are equal to the species Kd, and multiplicative processing of all position-wise Kd's (as done in https://github.com/wvandertoorn/dmMIMESim/blob/3ab12d908c8d33167a7dcc34d3b33547b2fbbf71/src/Species.cpp#L223C19-L223C35 ) will return a species Kd of 1.
However, in the case that KD_WT != 1, multiplicative processing of all position-wise Kd's will return incorrect species Kd values.

Possible fix:

Assume all position-wise Kd's in the wildtype have the same magnitude; call it KD_WT_pos. Estimate KD_WT_pos as the Nth root of the KD_WT value, where N is the sequence length.
Use KD_WT_pos as the default value when creating the FunctionalSequence in FunctionalSequence::drawKdValues.
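
A quick numerical check of the proposed fix, written as a standalone sketch rather than as a patch to the C++ code. The function names are hypothetical. The "naive" case assumes that every wildtype position carries KD_WT itself, which is one reading of the problem described above.

```python
import math

def position_kd_wt(kd_wt, n):
    """Per-position wildtype Kd: the nth root of the species Kd."""
    return kd_wt ** (1.0 / n)

def species_kd(position_kds):
    """Multiplicative combination of position-wise Kd values."""
    return math.prod(position_kds)

kd_wt, n = 4.0, 50
kd_pos = position_kd_wt(kd_wt, n)

fixed_kd = species_kd([kd_pos] * n)  # recovers kd_wt up to rounding
naive_kd = species_kd([kd_wt] * n)   # kd_wt ** n, far from kd_wt
```

For KD_WT = 1 both variants agree, since the Nth root of 1 is 1, which matches the observation that the problem only appears for KD_WT != 1.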
