i2extras's Introduction

Scope

i2extras extends the incidence2 package with additional functionality.

What does it do?

The main features of the package include:

  • fit_curve() and growth_rate(): fit a trend (Poisson / negative binomial) to an incidence2 object and calculate the associated growth rate.

  • add_rolling_average(): add a rolling average to an incidence2 object.

  • bootstrap(): generates a bootstrapped incidence2 object by re-sampling, with replacement, the original dates of events.

  • find_peak(): locates the peak time of the epicurve.

  • estimate_peak(): uses bootstrap to estimate the peak time (and related confidence interval) of a partially observed outbreak.
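
The idea behind bootstrap() and estimate_peak() can be sketched in base R. This is only an illustration of re-sampling event dates with replacement, not the package's implementation; the onset dates are simulated and peak_date() is a hypothetical helper.

```r
# Hypothetical line list: one onset date per case (simulated, not real data).
set.seed(42)
onset <- as.Date("2021-01-01") + rpois(200, lambda = 15)

# Peak of an epicurve: the date with the highest daily count.
peak_date <- function(dates) {
  counts <- table(dates)
  as.Date(names(counts)[which.max(counts)])
}
observed_peak <- peak_date(onset)

# Bootstrap: re-sample the event dates with replacement and recompute the peak.
n_boot <- 500
boot_peaks <- vapply(
  seq_len(n_boot),
  function(i) as.numeric(peak_date(sample(onset, replace = TRUE))),
  numeric(1)
)

# 95% percentile interval for the peak date.
ci <- as.Date(round(quantile(boot_peaks, c(0.025, 0.975))), origin = "1970-01-01")
observed_peak
ci
```

The spread of boot_peaks gives a rough sense of how uncertain the peak date is given the observed events.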

Installing the package

You can install the released version of {i2extras} from CRAN with:

install.packages("i2extras")

The development version of {i2extras} can be installed from GitHub with:

remotes::install_github("reconverse/i2extras", build_vignettes = TRUE)

Resources

Vignettes

An overview of i2extras is provided in the included vignettes:

  • vignette("peak_estimation", package = "i2extras")

  • vignette("fitting_epicurves", package = "i2extras")

Getting help online

Bug reports and feature requests should be posted on GitHub using the issue system. All other questions should be posted on the RECON Slack channel; see https://www.repidemicsconsortium.org/forum/ for details on how to join.

i2extras's People

Contributors

timtaylor, thibautjombart, jamesabaker

i2extras's Issues

Add function to detect low counts

It would be useful to have a function that detects abnormally low counts (zero or close to zero), which may indicate under-reporting, and sets them to NA. One possible criterion is count < thres x median(counts). The function should accept a counts argument so that it can handle multiple count variables.
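
A minimal base R sketch of the proposed criterion follows. The function name flag_low_counts() and the default threshold are assumptions made for illustration; neither exists in i2extras.

```r
# Hypothetical helper (not part of i2extras): set counts below
# `thres * median(counts)` to NA.
flag_low_counts <- function(counts, thres = 0.1) {
  cutoff <- thres * median(counts, na.rm = TRUE)
  counts[!is.na(counts) & counts < cutoff] <- NA
  counts
}

# Made-up daily counts with two suspiciously low days.
daily <- c(12, 15, 0, 14, 1, 13, 16)
flag_low_counts(daily, thres = 0.2)

# Several count variables can be handled column by column.
multi <- data.frame(cases = daily, deaths = c(3, 4, 2, 0, 3, 4, 3))
multi[] <- lapply(multi, flag_low_counts, thres = 0.2)
multi
```

Because the comparison is strict, a series whose median is zero has nothing flagged, which may or may not be the desired behaviour.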

Add function for data imputation

As a follow-up to #7, it would be useful to have a procedure for replacing NAs in an incidence2 object. Several methods could be considered, e.g.:

  • using a rolling average
  • interpolating from neighbouring values

In the case of multiple NAs next to each other, we may need to do this replacement recursively.

I will come up with a proposed interface later on.
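
As a rough illustration of the interpolation option, the base R sketch below fills NAs from neighbouring observed values with stats::approx(). The helper name impute_linear() is hypothetical and the counts are made up.

```r
# Hypothetical helper (not part of i2extras): fill NAs by linear
# interpolation between the nearest observed neighbours.
impute_linear <- function(counts) {
  idx <- seq_along(counts)
  ok <- !is.na(counts)
  # rule = 2 carries the nearest observed value to leading/trailing NAs
  approx(x = idx[ok], y = counts[ok], xout = idx, rule = 2)$y
}

x <- c(10, NA, NA, 16, 18, NA, 14)
impute_linear(x)
```

With this approach a run of adjacent NAs is filled in a single pass, so no recursive replacement is needed; a rolling-average method would behave differently in that respect.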

quasipoisson

Add quasipoisson, as provided by the trending package, as a model option in fit.
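
For context, the sketch below shows what a quasipoisson fit changes relative to a Poisson fit, using stats::glm() directly rather than trending or i2extras. The data are simulated, so the numbers are illustrative only.

```r
set.seed(1)
day <- 0:29
# Simulated overdispersed counts with exponential growth (not real data).
count <- rnbinom(length(day), mu = exp(1 + 0.08 * day), size = 2)

fit_pois  <- glm(count ~ day, family = poisson)
fit_quasi <- glm(count ~ day, family = quasipoisson)

# The point estimates are the same under both families ...
coef(fit_pois)
coef(fit_quasi)

# ... but quasipoisson estimates a dispersion parameter and scales the
# standard errors by its square root.
dispersion <- summary(fit_quasi)$dispersion
se_ratio <- summary(fit_quasi)$coefficients[, "Std. Error"] /
  summary(fit_pois)$coefficients[, "Std. Error"]
dispersion
se_ratio
```

In practice this means the growth rate estimate is unchanged while its confidence interval widens when the counts are overdispersed.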

Plotting the output of fit_curve() when it includes warnings

Here is a reprex illustrating the problem. When fit_curve() issues warnings, some downstream functions such as growth_rate() can optionally ignore them (via include_warnings = TRUE), but plot() cannot. In terms of design, I am wondering whether it would be useful to have a user-facing function to ignore warnings, e.g.:

x %>% 
  fit_curve() %>% 
  ignore_warnings() %>% 
  plot()  

Or is it okay to have this as internal and add an option to plot()? Would be useful to discuss before making changes (also, happy to do them).

@TimTaylor tagging you for awareness and future discussions :)

library(tidyverse)
library(incidence2)
library(i2extras)

## set the random seed so we all get the same result
set.seed(1)
days <- 0:30
cases <- rpois(n = length(days), lambda = 3)


## step 3: create dates of infection for these cases
date_infection <- rep(days, cases)
data <- tibble(date_infection)
data
#> # A tibble: 96 × 1
#>    date_infection
#>             <int>
#>  1              0
#>  2              0
#>  3              1
#>  4              1
#>  5              2
#>  6              2
#>  7              2
#>  8              3
#>  9              3
#> 10              3
#> # … with 86 more rows

## build epicurve and fitting
res <- data %>%
  incidence(date_infection) %>%
  fit_curve(model = "negbin",
            control = glm.control(maxit = 1e3))
res
#> # A tibble: 1 × 8
#>   count_variable           data model  estimates   fitting_warning fitting_error
#>   <chr>          <list<tibble[> <list> <list>      <list>          <list>       
#> 1 count                [30 × 2] <negb… <df [30 × … <chr [2]>       <NULL>       
#> # … with 2 more variables: prediction_warning <list>, prediction_error <list>

## Note: there seems to be a 'safe' warning, which I would like to be able to
## ignore in further analyses
res %>% 
  pull(fitting_warning)
#> [[1]]
#> [1] "NaNs produced" "NaNs produced"

## get growth rates: this behaves as expected
res %>%
  growth_rate() # empty result coz of warnings - fine
#> # A tibble: 0 × 9
#> # … with 9 variables: count_variable <chr>, model <list>, r <dbl>,
#> #   r_lower <dbl>, r_upper <dbl>, growth_or_decay <lgl>, time <lgl>,
#> #   time_lower <lgl>, time_upper <lgl>
res %>%
  growth_rate(include_warnings = TRUE) # results as expected - fine
#> # A tibble: 1 × 9
#>   count_variable model        r r_lower r_upper growth_or_decay  time time_lower
#>   <chr>          <lis>    <dbl>   <dbl>   <dbl> <chr>           <dbl>      <dbl>
#> 1 count          <neg… -0.00276 -0.0256  0.0200 halving          251.       27.1
#> # … with 1 more variable: time_upper <dbl>

## but plotting won't go through
res %>%
  plot()
#> Error: Can't subset columns that don't exist.
#> x Column `count` doesn't exist.

Created on 2021-10-21 by the reprex package (v2.0.1)

Can we deal with plotting rolling averages better for grouped incidence objects

Currently grouped incidence objects need regrouping if an overall rolling average is required. Is this the best way to deal with it?

library(outbreaks)
library(incidence2)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist


# without groups ----------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05")
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).

#> Warning: Removed 2 rows containing missing values (position_stack).

# grouped by gender -------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05",
                  groups = gender)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

# facet_plot
inci %>%  
  rolling_average(before = 2) %>% 
  facet_plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).
#> Warning: Removed 4 rows containing missing values (position_stack).

# an individual plot needs regrouping if groups are present
inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).

#> Warning: Removed 4 rows containing missing values (position_stack).

inci %>%  
  regroup() %>% 
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Warning: Removed 2 rows containing missing values (position_stack).

Created on 2020-07-30 by the reprex package (v0.3.0)
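
To make the behaviour above easier to reason about, here is a base R sketch of a trailing average over the current and two previous values, which is what before = 2 appears to mean. It uses stats::filter() on made-up counts and is not the incidence2 implementation.

```r
# Made-up weekly counts.
weekly <- c(4, 7, 9, 12, 20, 18, 25)

# Trailing mean over the current value and the two before it.
trailing_mean <- function(v, before = 2) {
  k <- before + 1
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))
}
roll <- trailing_mean(weekly)
roll  # the first two entries are NA: there is no full window yet

# With two groups, each group loses its own first two values.
grouped <- data.frame(
  group = rep(c("f", "m"), each = 5),
  count = c(2, 3, 5, 6, 9, 1, 4, 4, 7, 8)
)
grouped$roll <- ave(grouped$count, grouped$group, FUN = trailing_mean)
sum(is.na(grouped$roll))
```

This is consistent with the warnings in the reprex, where 2 rows are dropped for the ungrouped or regrouped object and 4 rows for the object grouped by gender.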

Possible interface for simple fitting

library(incidence2)
library(incidence2plus)
library(tidyr)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist

inci <- incidence(
  dat,
  date_index = date_of_onset,
  interval = "week",
  last_date = "2014-10-05",
  groups = gender
)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.
inci %>% 
  fit(model = "poisson")
#> # A tibble: 2 x 6
#>   gender model  fitted                 r `r-lower` `r-upper`
#>   <fct>  <list> <list>             <dbl>     <dbl>     <dbl>
#> 1 f      <glm>  <tibble [26 × 6]> 0.0249    0.0233    0.0265
#> 2 m      <glm>  <tibble [26 × 6]> 0.0250    0.0234    0.0267
inci %>% 
  fit(model = "poisson") %>% 
  add_doubling()
#> # A tibble: 2 x 9
#>   gender model fitted      r `r-lower` `r-upper` doubling `doubling-lower`
#>   <fct>  <lis> <list>  <dbl>     <dbl>     <dbl>    <dbl>            <dbl>
#> 1 f      <glm> <tibb… 0.0249    0.0233    0.0265     27.9             29.8
#> 2 m      <glm> <tibb… 0.0250    0.0234    0.0267     27.7             29.7
#> # … with 1 more variable: `doubling-upper` <dbl>
inci %>% 
  fit(model = "poisson") %>% 
  plot(color = "white", angle = 45, n_breaks = 4)

Created on 2020-09-03 by the reprex package (v0.3.0)

Alternatives to moving averages

Is your feature request related to a problem? Please describe.

Moving averages, which are commonly used to process observational data prior to visualisation, have several issues, the most notable being information loss and lag. Their use is mainly driven by day-of-the-week effects and reporting noise.

Some of this functionality looks like it is supported in fit_curve and would only need some minimal extension.

Describe the solution you'd like

In general, day of the week effects are much easier to account for than reporting effects. Time series decomposition would be one possible alternative with other alternatives requiring some non-parametric driver of reports.

Reporting noise is harder to adjust for and requires some more thought about what kind of model would have both the required simplicity and ability to make the adjustment in a rigorous way.
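
One simple way to account for day-of-the-week effects without a moving average is to include weekday as a factor in a Poisson regression. The sketch below uses stats::glm() on simulated data; the 50% weekend reporting dip and all variable names are assumptions for illustration, not a proposal for the fit_curve() interface.

```r
set.seed(123)
dates <- seq(as.Date("2021-03-01"), by = "day", length.out = 56)
day <- as.numeric(dates - dates[1])

# ISO weekday as a factor: "1" = Monday, ..., "7" = Sunday.
weekday <- factor(format(dates, "%u"))
weekend <- weekday %in% c("6", "7")

# Simulated reports: exponential growth with a 50% reporting dip at weekends.
mu <- exp(3 + 0.03 * day) * ifelse(weekend, 0.5, 1)
count <- rpois(length(dates), mu)

# Growth trend plus a multiplicative weekday effect.
fit <- glm(count ~ day + weekday, family = poisson)
r <- unname(coef(fit)["day"])  # growth rate adjusted for weekday

# Trend with the weekday effect held fixed (here: Monday), with no lag
# and no observations lost at the start of the series.
monday <- factor("1", levels = levels(weekday))
trend <- predict(
  fit,
  newdata = data.frame(day = day, weekday = monday),
  type = "response"
)
```

This handles only the systematic weekday pattern; reporting noise beyond that would need a different model, as noted above.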

I would in principle be happy to support the implementation of some of these features and to discuss them in more detail.

log-linear

We should have an option in fit_curve for a log-linear function. This will require some changes to the upstream trending package due to how data is stored with a fitted model.
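
For reference, a log-linear fit amounts to a linear regression of log counts on time. The sketch below uses stats::lm() on made-up counts and does not reflect how trending or fit_curve() would implement the option.

```r
day <- 0:20
# Made-up counts following exponential growth with rate 0.1, rounded.
count <- round(10 * exp(0.1 * day))

# Log-linear model: log(count) = intercept + r * day.
fit <- lm(log(count) ~ day)

r <- unname(coef(fit)["day"])  # growth rate per day
ci <- confint(fit)["day", ]    # 95% confidence interval for r
doubling <- log(2) / r         # doubling time in days
r
ci
doubling
```

Unlike the Poisson and negative binomial models, this approach requires strictly positive counts, since log(0) is not finite.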
