nickbrazeau / polysimibd Goto Github PK
View Code? Open in Web Editor NEWStructured Wright Fisher Model for Malaria Population Genetic Simulations
Home Page: https://nickbrazeau.github.io/polySimIBD/
License: MIT License
Structured Wright Fisher Model for Malaria Population Genetic Simulations
Home Page: https://nickbrazeau.github.io/polySimIBD/
License: MIT License
bvtrees
that was fast... but this is unfortunately terribly slow. It is creating a bottleneck in the space mips work that would be helpful to resolve for that project and any future projects that we want to use this simulator for? Wasn't sure if you had any ideas how to speed up -- or if worth looping though in cpp? Or, it may be "fast enough" -- it is just going to be very time consuming to loop back through the bvtrees and calculate realized IBD (i.e. a cluster problem if you want to do it for a population)get_ver_realized_ibd <- function(swfsim, smpl1, smpl2) {
# pull out ARG pieces
# trees point left -- don't care about number of strains w/in, if any ibd, then loci is ibd
arg <- polySimIBD::get_arg(swf = swfsim, host_index = c(smpl1, smpl2))
conn <- purrr::map(arg, "c")
# now look at loci by loci
get_loci_pairwise_ibd <- function(conni, this_coi) {
smpl1con <- conni[1:this_coi[1]]
smpl2con <- conni[(this_coi[1]+1):(cumsum(this_coi)[2])]
# get IBD connections between 1 and 2
pwconn <- which(smpl2con %in% 0:(this_coi[1]-1) )
# if any, IBD
pwconn <- sum(pwconn)
return(pwconn >= 1)
}
# iterate through
lociibd <- purrr::map_dbl(.x = conn,
.f = get_loci_pairwise_ibd, this_coi = swfsim$coi[c(smpl1, smpl2)])
# loci IBD
return(sum(lociibd)/length(lociibd))
}
polySimIBD::get_realized_pairwise_ibd
function (swf, host_index = NULL)
{
assert_custom_class(swf, "swfsim")
assert_pos_int(host_index)
assert_length(host_index, 2)
arg <- polySimIBD::get_arg(swf = swf, host_index = host_index)
coi1 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[1]])
coi1 <- max(coi1)
coi2 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[2]])
coi2 <- max(coi2)
conn <- purrr::map(arg, "c")
tm <- purrr::map(arg, "t")
get_pairwise_ibd <- function(conni, tmi, this_coi) {
smpl1con <- conni[1:this_coi[1]]
smpl2con <- conni[(this_coi[1] + 1):(cumsum(this_coi)[2])]
pwconn <- which(smpl2con %in% 0:(this_coi[1] - 1))
locimatches <- rep(1, length(pwconn))
if (length(pwconn) != 0) {
for (i in 1:length(pwconn)) {
haplotypeindex <- this_coi[1] + pwconn[i] - 1
internalconn <- which(smpl2con %in% haplotypeindex)
if (length(internalconn) != 0) {
for (i in 1:length(internalconn)) {
internalhaplotypeplace <- this_coi[1] + internalconn[i]
if (tmi[internalhaplotypeplace] < tmi[this_coi[1] +
internalconn[i]]) {
locimatches[i] <- locimatches[i] + 1
}
}
}
}
}
return(sum(locimatches))
}
effCOI <- c(coi1, coi2)
numerator <- purrr::map2_dbl(.x = conn, .y = tm, .f = get_pairwise_ibd,
this_coi = swf$coi[host_index])
numerator <- ifelse(numerator > min(effCOI), min(effCOI),
numerator)
pairwiseIBD <- sum(numerator)/(min(effCOI) * length(conn))
return(pairwiseIBD)
}
Prev broke up chromosomes by adding 1e9 bp --> causes disassoc btwn chrom. Fine conceptually and practically - not the cleanest solution and therefore should be documented prominently somewhere in tutorial
Goal 1: try to break function w/ corner cases
Goal 2: agree this is a good definition for IBD from the ARG
The current implementation of the ZTP distribution becomes a computational bottleneck as the mean COI increases. Take this into Cpp or find a more efficient algorithm
Our zero-truncated COI equation/code currently counts genetic clones as distinct "infections". Although true under a FOI POV, not accurate from a genomics POV.
See Bob's maths and code below:
COI distribution in PolySimIBD model.docx
# COI_equilibrium.R
#
# Author: Bob Verity
# Date: 2022-08-16
#
# Purpose:
# Derive the equilibrium COI distribution under the PolySimIBD model for
# infinite population size.
#
# ------------------------------------------------------------------
library(copula) # needed for Stirling numbers
# define model parameters
lambda <- 3
m <- 0.1
K_max <- 20
# calculate transition matrix
K_mat <- matrix(NA, K_max, K_max)
for (K_prev in 1:K_max) {
p2 <- rep(0, K_max)
for (i in 1:K_max) {
N_clonal <- i
k <- 1:min(N_clonal, K_prev)
p1 <- exp(lfactorial(K_prev) - lfactorial(K_prev - k) - N_clonal*log(K_prev)) * Stirling2.all(N_clonal)[k]
p2 <- p2 + dpois(i, lambda * (1 - m) / K_prev) * c(p1, rep(0, K_max - length(p1)))
}
p2 <- c(dpois(0, lambda * (1 - m) / K_prev), p2)
p3 <- rep(NA, K_max + 1)
lambda2 <- lambda * m + lambda * (1 - m) * (1 - 1 / K_prev)
for (i in 0:K_max) {
p3[i + 1] <- sum(p2[(0:i) + 1] * dpois(i:0, lambda2))
}
K_mat[K_prev,] <- p3[-1]
}
K_mat <- K_mat / (1 - dpois(0, lambda))
# calculate equilibrium solution by applying transition a large number of times
# (I'm having trouble getting the Eigenvalue method to work)
K_init <- rep(1/K_max, K_max)
for (i in 1:100) {
K_init <- K_init %*% K_mat
}
# barplot of equilibrium distribution. Red dots show the raw Poisson(lambda)
# distribtion, i.e. what we would expect under pure super-infection from an
# infinitely large population
barplot(K_init, space = 0, names.arg = 1:K_max, col = grey(0.6))
points(1:K_max - 0.5, dpois(1:K_max, lambda) / (1 - exp(-lambda)), pch = 20, col = 2)
R_ignore/migration_models/
Should not have extinction events, so do not need time away from home model discussed in feature_spatial branch? can delete this branch?
review code
diagonal for migration matrix
GH tests said migration matrix test testthat::expect_equal(nrow(diagnl_comb_hosts_df), 0)
had nrow of 1 --- but not able to reproduce it on GH server when re-running test? Not sure how to find that seed...
community detection not coming through for pairwise ibd as expected? issue in how we are casting tbl_graph or deeper issue w/ my assumptions about community detection from pairwise ibd
Confirm model is respecting ARG and IBD chunks
solutions:
current version = forward in time: coalescent tree/bvtree time criteria --> mutations downstream (no issue w/ speed) --> is breaking up IBD blocks by new mutations = definition by Wakeley of IBD versus the HMM POV
back in time: Use adjacency matrix that is sample x sample x max COI for each unique loci
User can crash program if they enter haploids that are incorrect from host id
need to properly document functions
for the various different marginal trees... move to cowplot
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.