nickbrazeau / polysimibd Goto Github PK

View Code? Open in Web Editor NEW

1.0 1.0 0.0 9.54 MB

Structured Wright Fisher Model for Malaria Population Genetic Simulations

Home Page: https://nickbrazeau.github.io/polySimIBD/

License: MIT License

R 40.86% C++ 59.14%

polysimibd's People

Contributors

Stargazers

Watchers

polysimibd's Issues

IBD calculation code faster

Verity et al. IBD calculation
I was trying to write a Verity et al PMC7192906 IBD calculation on the bvtrees that was fast... but this is unfortunately terribly slow. It is creating a bottleneck in the space mips work that would be helpful to resolve for that project and any future projects that we want to use this simulator for? Wasn't sure if you had any ideas how to speed up -- or if worth looping though in cpp? Or, it may be "fast enough" -- it is just going to be very time consuming to loop back through the bvtrees and calculate realized IBD (i.e. a cluster problem if you want to do it for a population)
IBD_denominator.pptx
?

get_ver_realized_ibd <- function(swfsim, smpl1, smpl2) {
  # pull out ARG pieces
  # trees point left --  don't care about number of strains w/in, if any ibd, then loci is ibd
  arg <- polySimIBD::get_arg(swf = swfsim, host_index = c(smpl1, smpl2))
  conn <- purrr::map(arg, "c")

  # now look at loci by loci
  get_loci_pairwise_ibd <- function(conni, this_coi) {
    smpl1con <- conni[1:this_coi[1]]
    smpl2con <- conni[(this_coi[1]+1):(cumsum(this_coi)[2])]
    # get IBD connections between 1 and 2
    pwconn <- which(smpl2con %in% 0:(this_coi[1]-1) )
    # if any, IBD
    pwconn <- sum(pwconn)
    return(pwconn >= 1)
  }
  # iterate through
  lociibd <- purrr::map_dbl(.x = conn,
                             .f = get_loci_pairwise_ibd, this_coi = swfsim$coi[c(smpl1, smpl2)])
  # loci IBD
  return(sum(lociibd)/length(lociibd))
}

HMMERTIME - esque IBD calculation
Discuss if we agree this is the proper denominator (see attached ppt) for IBD or if we want to name it something specific (e.g. mutually exclusive pairwise realized IBD)? Plus same issue w/ speed in a population setting...

polySimIBD::get_realized_pairwise_ibd
function (swf, host_index = NULL) 
{
    assert_custom_class(swf, "swfsim")
    assert_pos_int(host_index)
    assert_length(host_index, 2)
    arg <- polySimIBD::get_arg(swf = swf, host_index = host_index)
    coi1 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[1]])
    coi1 <- max(coi1)
    coi2 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[2]])
    coi2 <- max(coi2)
    conn <- purrr::map(arg, "c")
    tm <- purrr::map(arg, "t")
    get_pairwise_ibd <- function(conni, tmi, this_coi) {
        smpl1con <- conni[1:this_coi[1]]
        smpl2con <- conni[(this_coi[1] + 1):(cumsum(this_coi)[2])]
        pwconn <- which(smpl2con %in% 0:(this_coi[1] - 1))
        locimatches <- rep(1, length(pwconn))
        if (length(pwconn) != 0) {
            for (i in 1:length(pwconn)) {
                haplotypeindex <- this_coi[1] + pwconn[i] - 1
                internalconn <- which(smpl2con %in% haplotypeindex)
                if (length(internalconn) != 0) {
                  for (i in 1:length(internalconn)) {
                    internalhaplotypeplace <- this_coi[1] + internalconn[i]
                    if (tmi[internalhaplotypeplace] < tmi[this_coi[1] + 
                      internalconn[i]]) {
                      locimatches[i] <- locimatches[i] + 1
                    }
                  }
                }
            }
        }
        return(sum(locimatches))
    }
    effCOI <- c(coi1, coi2)
    numerator <- purrr::map2_dbl(.x = conn, .y = tm, .f = get_pairwise_ibd, 
        this_coi = swf$coi[host_index])
    numerator <- ifelse(numerator > min(effCOI), min(effCOI), 
        numerator)
    pairwiseIBD <- sum(numerator)/(min(effCOI) * length(conn))
    return(pairwiseIBD)
}

IBD_denominator.pptx

Limitations/Workaround section needed

Prev broke up chromosomes by adding 1e9 bp --> causes disassoc btwn chrom. Fine conceptually and practically - not the cleanest solution and therefore should be documented prominently somewhere in tutorial

Limitations section? or workaround section?

pairwise_ibd corner case detection

Goal 1: try to break function w/ corner cases

Goal 2: agree this is a good definition for IBD from the ARG

zero truncated Poisson slow when COI >= 10

The current implementation of the ZTP distribution becomes a computational bottleneck as the mean COI increases. Take this into Cpp or find a more efficient algorithm

ZP(COI) counts clones

Our zero-truncated COI equation/code currently counts genetic clones as distinct "infections". Although true under a FOI POV, not accurate from a genomics POV.

See Bob's maths and code below:
COI distribution in PolySimIBD model.docx

# COI_equilibrium.R
#
# Author: Bob Verity
# Date: 2022-08-16
#
# Purpose:
# Derive the equilibrium COI distribution under the PolySimIBD model for
# infinite population size.
#
# ------------------------------------------------------------------

library(copula) # needed for Stirling numbers

# define model parameters
lambda <- 3
m <- 0.1
K_max <- 20

# calculate transition matrix
K_mat <- matrix(NA, K_max, K_max)
for (K_prev in 1:K_max) {
  
  p2 <- rep(0, K_max)
  for (i in 1:K_max) {
    N_clonal <- i
    k <- 1:min(N_clonal, K_prev)
    p1 <- exp(lfactorial(K_prev) - lfactorial(K_prev - k) - N_clonal*log(K_prev)) * Stirling2.all(N_clonal)[k]
    p2 <- p2 + dpois(i, lambda * (1 - m) / K_prev) * c(p1, rep(0, K_max - length(p1)))
  }
  p2 <- c(dpois(0, lambda * (1 - m) / K_prev), p2)
  
  p3 <- rep(NA, K_max + 1)
  lambda2 <- lambda * m + lambda * (1 - m) * (1 - 1 / K_prev)
  for (i in 0:K_max) {
    p3[i + 1] <- sum(p2[(0:i) + 1] * dpois(i:0, lambda2))
  }
  
  K_mat[K_prev,] <- p3[-1]
}
K_mat <- K_mat / (1 - dpois(0, lambda))

# calculate equilibrium solution by applying transition a large number of times
# (I'm having trouble getting the Eigenvalue method to work)
K_init <- rep(1/K_max, K_max)
for (i in 1:100) {
  K_init <- K_init %*% K_mat
}

# barplot of equilibrium distribution. Red dots show the raw Poisson(lambda)
# distribtion, i.e. what we would expect under pure super-infection from an
# infinitely large population
barplot(K_init, space = 0, names.arg = 1:K_max, col = grey(0.6))
points(1:K_max - 0.5, dpois(1:K_max, lambda) / (1 - exp(-lambda)), pch = 20, col = 2)

Model Migration Examples

Starter code for migration models under polySimIBD spatial model: R_ignore/migration_models/
Code is not going to work "out of the box"
Should be a good launching point for exploring model behaivor

feature_spatial replaced by simple_spatial_feature?

Should not have extinction events, so do not need time away from home model discussed in feature_spatial branch? can delete this branch?

pairwise IBD speed

BV pairwise IBD needs to be sped up
Need unifying code for defining IBD (i/s/o COI) as follows:
Within = average prob. that two strains, sampled without replacement from the same sample, are IBD
Between = average prob. that two strains, sampled from different samples, are IBD

migration matrix and deme model

review code
diagonal for migration matrix

Unreprex Migration Mat Fail

GH tests said migration matrix test testthat::expect_equal(nrow(diagnl_comb_hosts_df), 0) had nrow of 1 --- but not able to reproduce it on GH server when re-running test? Not sure how to find that seed...

network behavior

community detection not coming through for pairwise ibd as expected? issue in how we are casting tbl_graph or deeper issue w/ my assumptions about community detection from pairwise ibd

Mutation model

Confirm model is respecting ARG and IBD chunks

solutions:
current version = forward in time: coalescent tree/bvtree time criteria --> mutations downstream (no issue w/ speed) --> is breaking up IBD blocks by new mutations = definition by Wakeley of IBD versus the HMM POV
back in time: Use adjacency matrix that is sample x sample x max COI for each unique loci

get_arg checks

User can crash program if they enter haploids that are incorrect from host id

either limit to hosts only
or put checks in place

documentation needed

need to properly document functions

facet_wrap having hard time parsing geom_segments

for the various different marginal trees... move to cowplot

nickbrazeau / polysimibd Goto Github PK

polysimibd's People

Contributors

Stargazers

Watchers

polysimibd's Issues

Recommend Projects

Recommend Topics

Recommend Org