Git Product home page Git Product logo

polysimibd's People

Contributors

bobverity avatar nickbrazeau avatar

Stargazers

 avatar

Watchers

 avatar

polysimibd's Issues

IBD calculation code faster

  1. Verity et al. IBD calculation
    I was trying to write a Verity et al PMC7192906 IBD calculation on the bvtrees that was fast... but this is unfortunately terribly slow. It is creating a bottleneck in the space mips work that would be helpful to resolve for that project and any future projects that we want to use this simulator for? Wasn't sure if you had any ideas how to speed up -- or if worth looping though in cpp? Or, it may be "fast enough" -- it is just going to be very time consuming to loop back through the bvtrees and calculate realized IBD (i.e. a cluster problem if you want to do it for a population)
    IBD_denominator.pptx
    ?
get_ver_realized_ibd <- function(swfsim, smpl1, smpl2) {
  # pull out ARG pieces
  # trees point left --  don't care about number of strains w/in, if any ibd, then loci is ibd
  arg <- polySimIBD::get_arg(swf = swfsim, host_index = c(smpl1, smpl2))
  conn <- purrr::map(arg, "c")

  # now look at loci by loci
  get_loci_pairwise_ibd <- function(conni, this_coi) {
    smpl1con <- conni[1:this_coi[1]]
    smpl2con <- conni[(this_coi[1]+1):(cumsum(this_coi)[2])]
    # get IBD connections between 1 and 2
    pwconn <- which(smpl2con %in% 0:(this_coi[1]-1) )
    # if any, IBD
    pwconn <- sum(pwconn)
    return(pwconn >= 1)
  }
  # iterate through
  lociibd <- purrr::map_dbl(.x = conn,
                             .f = get_loci_pairwise_ibd, this_coi = swfsim$coi[c(smpl1, smpl2)])
  # loci IBD
  return(sum(lociibd)/length(lociibd))
}

  1. HMMERTIME - esque IBD calculation
    Discuss if we agree this is the proper denominator (see attached ppt) for IBD or if we want to name it something specific (e.g. mutually exclusive pairwise realized IBD)? Plus same issue w/ speed in a population setting...
polySimIBD::get_realized_pairwise_ibd
function (swf, host_index = NULL) 
{
    assert_custom_class(swf, "swfsim")
    assert_pos_int(host_index)
    assert_length(host_index, 2)
    arg <- polySimIBD::get_arg(swf = swf, host_index = host_index)
    coi1 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[1]])
    coi1 <- max(coi1)
    coi2 <- polySimIBD::get_realized_coi(swf = swf, host_index = host_index[[2]])
    coi2 <- max(coi2)
    conn <- purrr::map(arg, "c")
    tm <- purrr::map(arg, "t")
    get_pairwise_ibd <- function(conni, tmi, this_coi) {
        smpl1con <- conni[1:this_coi[1]]
        smpl2con <- conni[(this_coi[1] + 1):(cumsum(this_coi)[2])]
        pwconn <- which(smpl2con %in% 0:(this_coi[1] - 1))
        locimatches <- rep(1, length(pwconn))
        if (length(pwconn) != 0) {
            for (i in 1:length(pwconn)) {
                haplotypeindex <- this_coi[1] + pwconn[i] - 1
                internalconn <- which(smpl2con %in% haplotypeindex)
                if (length(internalconn) != 0) {
                  for (i in 1:length(internalconn)) {
                    internalhaplotypeplace <- this_coi[1] + internalconn[i]
                    if (tmi[internalhaplotypeplace] < tmi[this_coi[1] + 
                      internalconn[i]]) {
                      locimatches[i] <- locimatches[i] + 1
                    }
                  }
                }
            }
        }
        return(sum(locimatches))
    }
    effCOI <- c(coi1, coi2)
    numerator <- purrr::map2_dbl(.x = conn, .y = tm, .f = get_pairwise_ibd, 
        this_coi = swf$coi[host_index])
    numerator <- ifelse(numerator > min(effCOI), min(effCOI), 
        numerator)
    pairwiseIBD <- sum(numerator)/(min(effCOI) * length(conn))
    return(pairwiseIBD)
}

IBD_denominator.pptx

Limitations/Workaround section needed

Prev broke up chromosomes by adding 1e9 bp --> causes disassoc btwn chrom. Fine conceptually and practically - not the cleanest solution and therefore should be documented prominently somewhere in tutorial

  • Limitations section? or workaround section?

zero truncated Poisson slow when COI >= 10

The current implementation of the ZTP distribution becomes a computational bottleneck as the mean COI increases. Take this into Cpp or find a more efficient algorithm

ZP(COI) counts clones

Our zero-truncated COI equation/code currently counts genetic clones as distinct "infections". Although true under a FOI POV, not accurate from a genomics POV.


See Bob's maths and code below:
COI distribution in PolySimIBD model.docx

# COI_equilibrium.R
#
# Author: Bob Verity
# Date: 2022-08-16
#
# Purpose:
# Derive the equilibrium COI distribution under the PolySimIBD model for
# infinite population size.
#
# ------------------------------------------------------------------

library(copula) # needed for Stirling numbers

# define model parameters
lambda <- 3
m <- 0.1
K_max <- 20

# calculate transition matrix
K_mat <- matrix(NA, K_max, K_max)
for (K_prev in 1:K_max) {
  
  p2 <- rep(0, K_max)
  for (i in 1:K_max) {
    N_clonal <- i
    k <- 1:min(N_clonal, K_prev)
    p1 <- exp(lfactorial(K_prev) - lfactorial(K_prev - k) - N_clonal*log(K_prev)) * Stirling2.all(N_clonal)[k]
    p2 <- p2 + dpois(i, lambda * (1 - m) / K_prev) * c(p1, rep(0, K_max - length(p1)))
  }
  p2 <- c(dpois(0, lambda * (1 - m) / K_prev), p2)
  
  p3 <- rep(NA, K_max + 1)
  lambda2 <- lambda * m + lambda * (1 - m) * (1 - 1 / K_prev)
  for (i in 0:K_max) {
    p3[i + 1] <- sum(p2[(0:i) + 1] * dpois(i:0, lambda2))
  }
  
  K_mat[K_prev,] <- p3[-1]
}
K_mat <- K_mat / (1 - dpois(0, lambda))

# calculate equilibrium solution by applying transition a large number of times
# (I'm having trouble getting the Eigenvalue method to work)
K_init <- rep(1/K_max, K_max)
for (i in 1:100) {
  K_init <- K_init %*% K_mat
}

# barplot of equilibrium distribution. Red dots show the raw Poisson(lambda)
# distribtion, i.e. what we would expect under pure super-infection from an
# infinitely large population
barplot(K_init, space = 0, names.arg = 1:K_max, col = grey(0.6))
points(1:K_max - 0.5, dpois(1:K_max, lambda) / (1 - exp(-lambda)), pch = 20, col = 2)

Model Migration Examples

  • Starter code for migration models under polySimIBD spatial model: R_ignore/migration_models/
  • Code is not going to work "out of the box"
  • Should be a good launching point for exploring model behaivor

pairwise IBD speed

  • BV pairwise IBD needs to be sped up
  • Need unifying code for defining IBD (i/s/o COI) as follows:
    Within = average prob. that two strains, sampled without replacement from the same sample, are IBD
    Between = average prob. that two strains, sampled from different samples, are IBD

Unreprex Migration Mat Fail

GH tests said migration matrix test testthat::expect_equal(nrow(diagnl_comb_hosts_df), 0) had nrow of 1 --- but not able to reproduce it on GH server when re-running test? Not sure how to find that seed...

network behavior

community detection not coming through for pairwise ibd as expected? issue in how we are casting tbl_graph or deeper issue w/ my assumptions about community detection from pairwise ibd

Mutation model

Confirm model is respecting ARG and IBD chunks

solutions:
current version = forward in time: coalescent tree/bvtree time criteria --> mutations downstream (no issue w/ speed) --> is breaking up IBD blocks by new mutations = definition by Wakeley of IBD versus the HMM POV
back in time: Use adjacency matrix that is sample x sample x max COI for each unique loci

get_arg checks

User can crash program if they enter haploids that are incorrect from host id

  • either limit to hosts only
  • or put checks in place

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.