reconverse / i2extras

Additional functionality for working with incidence2

Home Page: https://www.reconverse.org/i2extras/

License: Other

R 100.00%

i2extras's Introduction

Scope

i2extras adds additional functionality to the incidence2 package.

What does it do?

The main features of the package include:

fit_curve() and growth_rate(): fit a trend (Poisson or negative binomial) to an incidence2 object and calculate the associated growth rate.
add_rolling_average(): add a rolling average to an incidence2 object.
bootstrap(): generates a bootstrapped incidence2 object by re-sampling, with replacement, the original dates of events.
find_peak(): locates the peak time of the epicurve.
estimate_peak(): uses bootstrap to estimate the peak time (and related confidence interval) of a partially observed outbreak.
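
The bootstrap idea behind bootstrap() and estimate_peak() can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the line list `onset_day` and the helpers `peak_day()` and `bootstrap_days()` are hypothetical names introduced here for illustration.

```r
# Base R sketch of bootstrap-based peak estimation (not the i2extras code).
set.seed(1)

# Hypothetical line list: one onset day (integer) per case
onset_day <- c(rep(1, 2), rep(2, 5), rep(3, 9), rep(4, 6), rep(5, 3))

# Peak of a set of event dates: the day with the highest count
# (ties are resolved by taking the earliest day)
peak_day <- function(days) {
  counts <- table(days)
  as.integer(names(counts)[which.max(counts)])
}

# One bootstrap replicate: resample the event dates with replacement
bootstrap_days <- function(days) {
  sample(days, size = length(days), replace = TRUE)
}

observed_peak <- peak_day(onset_day)

# Peak day in each of 200 bootstrap replicates
boot_peaks <- replicate(200, peak_day(bootstrap_days(onset_day)))

# Percentile interval for the peak day
peak_ci <- quantile(boot_peaks, probs = c(0.025, 0.975))
```

The percentile interval is only one way to summarise the replicates; the package may use a different summary.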

Installing the package

You can install the released version of {i2extras} from CRAN with:

install.packages("i2extras")

The development version of {i2extras} can be installed from GitHub with:

remotes::install_github("reconverse/i2extras", build_vignettes = TRUE)

Resources

Vignettes

An overview of i2extras is provided in the included vignettes:

vignette("peak_estimation", package = "i2extras")
vignette("fitting_epicurves", package = "i2extras")

Getting help online

Bug reports and feature requests should be posted on GitHub using the issue system. All other questions should be posted on the RECON Slack channel; see https://www.repidemicsconsortium.org/forum/ for details on how to join.

i2extras's Issues

Add function to detect low counts

It would be useful to have a function that detects abnormally low counts (zero or close to zero), which may indicate under-reporting, and sets them to NA. One possible criterion is count < thres x median(counts). It should accept a counts argument to handle objects with multiple count variables.
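
The proposed criterion could look roughly like this in base R. This is a sketch only: `flag_low_counts()` and its default `thres = 0.1` are hypothetical and not part of i2extras, and the example data are made up.

```r
# Hypothetical helper, not part of i2extras: set abnormally low counts to NA.
# A count is flagged when it is below thres * median(counts).
flag_low_counts <- function(counts, thres = 0.1) {
  cutoff <- thres * median(counts, na.rm = TRUE)
  counts[!is.na(counts) & counts < cutoff] <- NA
  counts
}

# Single count variable (made-up data)
daily_counts <- c(20, 22, 0, 25, 1, 24, 21)
flagged <- flag_low_counts(daily_counts, thres = 0.1)

# Multiple count variables: apply the helper to each selected column
multi <- data.frame(cases = c(20, 22, 0, 25), deaths = c(4, 0, 5, 6))
count_cols <- c("cases", "deaths")
multi[count_cols] <- lapply(multi[count_cols], flag_low_counts)
```

Each column uses its own median here, so a low-volume series such as deaths is not judged against the scale of cases.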

Add function for data imputation

As a follow-up to #7 it would be useful to have a procedure for replacing NAs in an incidence2 object. Different methods could be foreseen, e.g.:

using a rolling average
interpolating from neighbouring values

In the case of multiple NAs next to each other, we may need to do this replacement recursively.

I will come up with a proposed interface later on.
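
As a starting point for discussion, interpolation from neighbouring values can be sketched with stats::approx(). The helper name `impute_linear()` is hypothetical. Linear interpolation fills a run of adjacent NAs in a single pass, so no recursion is needed for this particular method.

```r
# Hypothetical helper, not part of i2extras: fill NAs by linear interpolation
# between the nearest observed neighbours.
impute_linear <- function(counts) {
  idx <- seq_along(counts)
  observed <- !is.na(counts)
  # rule = 1 returns NA outside the observed range, so leading and trailing
  # NAs (which have only one neighbour) are left untouched
  counts[!observed] <- approx(idx[observed], counts[observed],
                              xout = idx[!observed], rule = 1)$y
  counts
}

x <- c(10, NA, NA, 16, 18, NA, 22)
imputed <- impute_linear(x)
```

Interpolated values are not guaranteed to be integers, so rounding may be needed if downstream code expects counts.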

quasipoisson

Add quasipoisson (from trending) as a model option in fit.
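
For reference, this is what a quasi-Poisson fit changes relative to a Poisson fit. The sketch uses stats::glm() directly rather than trending, and the daily counts are made up.

```r
# Sketch with stats::glm(); hypothetical daily counts
days <- 0:9
counts <- c(2, 3, 3, 6, 5, 9, 14, 12, 21, 30)

fit_pois  <- glm(counts ~ days, family = poisson)
fit_quasi <- glm(counts ~ days, family = quasipoisson)

# Point estimates are identical; the quasi-Poisson fit rescales the standard
# errors by the square root of the estimated dispersion
dispersion <- summary(fit_quasi)$dispersion
se_pois    <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi   <- summary(fit_quasi)$coefficients[, "Std. Error"]

# Growth rate on the log scale is the slope for days
r <- unname(coef(fit_quasi)["days"])
```

The growth rate estimate is unchanged, but its confidence interval widens when the dispersion is above 1.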

growth_rate - highlight confidence intervals containing zero

Currently growth_rate marks a fitted curve as either growth or decay depending on the sign of the coefficient. We should think of a way to highlight when the confidence interval around r contains 0.
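
One simple option is an extra logical column. The sketch below assumes columns named r_lower and r_upper, as in the growth_rate() output shown elsewhere on this page; the helper `flag_ci_spanning_zero()` and the column `ci_contains_zero` are hypothetical.

```r
# Hypothetical helper: TRUE when the confidence interval around r contains 0
flag_ci_spanning_zero <- function(r_lower, r_upper) {
  r_lower <= 0 & r_upper >= 0
}

# Values taken from the reprex outputs quoted in the issues on this page
rates <- data.frame(
  r       = c(-0.00276, 0.0249),
  r_lower = c(-0.0256, 0.0233),
  r_upper = c(0.0200, 0.0265)
)
rates$ci_contains_zero <- flag_ci_spanning_zero(rates$r_lower, rates$r_upper)
```

The first row would currently be labelled as decay even though its interval spans zero, which is the case the flag is meant to expose.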

Unvendor {trending} bits

Now that {trending} v0.1.0 is on CRAN, we can unvendor the code in https://github.com/reconverse/i2extras/blob/master/R/compat_trending.R.

highlight spanning of zero in growth_rate function

It would be nice

Plotting the output of fit_curve() when it includes warnings

Here is a reprex illustrating the problem. Basically the problem is that when fit_curve() issues warnings, some functions like growth_rate() down the line can ignore warnings as an option, but plot() cannot. In terms of design, I am wondering if it would be useful to have a user-facing function to ignore warnings e.g.

x %>% 
  fit_curve() %>% 
  ignore_warnings() %>% 
  plot()

Or is it okay to have this as internal and add an option to plot()? Would be useful to discuss before making changes (also, happy to do them).

@TimTaylor tagging you for awareness and future discussions :)

library(tidyverse)
library(incidence2)
library(i2extras)

## set the random seed so we all get the same result
set.seed(1)
days <- 0:30
cases <- rpois(n = length(days), lambda = 3)


## step 3: create dates of infection for these cases
date_infection <- rep(days, cases)
data <- tibble(date_infection)
data
#> # A tibble: 96 × 1
#>    date_infection
#>             <int>
#>  1              0
#>  2              0
#>  3              1
#>  4              1
#>  5              2
#>  6              2
#>  7              2
#>  8              3
#>  9              3
#> 10              3
#> # … with 86 more rows

## build epicurve and fitting
res <- data %>%
  incidence(date_infection) %>%
  fit_curve(model = "negbin",
            control = glm.control(maxit = 1e3))
res
#> # A tibble: 1 × 8
#>   count_variable           data model  estimates   fitting_warning fitting_error
#>   <chr>          <list<tibble[> <list> <list>      <list>          <list>       
#> 1 count                [30 × 2] <negb… <df [30 × … <chr [2]>       <NULL>       
#> # … with 2 more variables: prediction_warning <list>, prediction_error <list>

## Note: there seems to be a 'safe' warning, which I would like to be able to
## ignore in further analyses
res %>% 
  pull(fitting_warning)
#> [[1]]
#> [1] "NaNs produced" "NaNs produced"

## get growth rates: this behaves as expected
res %>%
  growth_rate() # empty result because of warnings - fine
#> # A tibble: 0 × 9
#> # … with 9 variables: count_variable <chr>, model <list>, r <dbl>,
#> #   r_lower <dbl>, r_upper <dbl>, growth_or_decay <lgl>, time <lgl>,
#> #   time_lower <lgl>, time_upper <lgl>
res %>%
  growth_rate(include_warnings = TRUE) # results as expected - fine
#> # A tibble: 1 × 9
#>   count_variable model        r r_lower r_upper growth_or_decay  time time_lower
#>   <chr>          <lis>    <dbl>   <dbl>   <dbl> <chr>           <dbl>      <dbl>
#> 1 count          <neg… -0.00276 -0.0256  0.0200 halving          251.       27.1
#> # … with 1 more variable: time_upper <dbl>

## but plotting won't go through
res %>%
  plot()
#> Error: Can't subset columns that don't exist.
#> x Column `count` doesn't exist.

Created on 2021-10-21 by the reprex package (v2.0.1)

Can we deal with plotting rolling averages better for grouped incidence objects?

Currently grouped incidence objects need regrouping if an overall rolling average is required. Is this the best way to deal with it?

library(outbreaks)
library(incidence2)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist


# without groups ----------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05")
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).

#> Warning: Removed 2 rows containing missing values (position_stack).

# grouped by gender -------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05",
                  groups = gender)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

# facet_plot
inci %>%  
  rolling_average(before = 2) %>% 
  facet_plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).
#> Warning: Removed 4 rows containing missing values (position_stack).

# individual plot needs regrouping if groups are present
inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).

#> Warning: Removed 4 rows containing missing values (position_stack).

inci %>%  
  regroup() %>% 
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Warning: Removed 2 rows containing missing values (position_stack).

Created on 2020-07-30 by the reprex package (v0.3.0)

Possible interface for simple fitting

library(incidence2)
library(incidence2plus)
library(tidyr)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist

inci <- incidence(
  dat,
  date_index = date_of_onset,
  interval = "week",
  last_date = "2014-10-05",
  groups = gender
)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

inci %>% 
  fit(model = "poisson")
#> # A tibble: 2 x 6
#>   gender model  fitted                 r `r-lower` `r-upper`
#>   <fct>  <list> <list>             <dbl>     <dbl>     <dbl>
#> 1 f      <glm>  <tibble [26 × 6]> 0.0249    0.0233    0.0265
#> 2 m      <glm>  <tibble [26 × 6]> 0.0250    0.0234    0.0267

inci %>% 
  fit(model = "poisson") %>% 
  add_doubling()
#> # A tibble: 2 x 9
#>   gender model fitted      r `r-lower` `r-upper` doubling `doubling-lower`
#>   <fct>  <lis> <list>  <dbl>     <dbl>     <dbl>    <dbl>            <dbl>
#> 1 f      <glm> <tibb… 0.0249    0.0233    0.0265     27.9             29.8
#> 2 m      <glm> <tibb… 0.0250    0.0234    0.0267     27.7             29.7
#> # … with 1 more variable: `doubling-upper` <dbl>

inci %>% 
  fit(model = "poisson") %>% 
  plot(color = "white", angle = 45, n_breaks = 4)

Created on 2020-09-03 by the reprex package (v0.3.0)

Alternatives to moving averages

Is your feature request related to a problem? Please describe.

Moving averages, which are commonly used to process observational data prior to visualisation and similar tasks, have several issues, the most notable being information loss and lag. Their use is driven mainly by day-of-the-week effects and reporting noise.

Some of this functionality looks like it is supported in fit_curve and would only need some minimal extension.
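
The information loss and lag mentioned above can be seen on a toy series. This sketch uses stats::filter() from base R; the counts and the window size `k` are made up for illustration.

```r
# Hypothetical counts with a spike at position 4
x <- c(1, 2, 3, 10, 3, 2, 1)
k <- 3

# Trailing (right-aligned) moving average: uses the current and k - 1 past values
trailing <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# Centred moving average, for comparison
centred <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

# Information loss: the first k - 1 trailing values are NA, and the spike
# of 10 is flattened to about 5.3
n_lost <- sum(is.na(trailing))

# Lag: the trailing average peaks one step after the raw series
lag_steps <- which.max(trailing) - which.max(x)
```

The centred version removes the lag but cannot be computed for the most recent observations, which are often the ones of most interest.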

Describe the solution you'd like

In general, day-of-the-week effects are much easier to account for than reporting effects. Time series decomposition would be one possible alternative; others would require some non-parametric driver of reports.

Reporting noise is harder to adjust for and requires some more thought about what kind of model would have both the required simplicity and ability to make the adjustment in a rigorous way.

I would in principle be happy to support the implementation of some of these features and to discuss them in more detail.

log-linear

We should have an option in fit_curve for a log-linear function. This will require some changes to the upstream trending package due to how data is stored with a fitted model.
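
For context, a log-linear fit regresses log counts on time with ordinary least squares. The sketch below uses stats::lm() directly and is not the proposed fit_curve or trending interface; the counts are made up.

```r
# Hypothetical daily counts; all must be > 0 because log(0) is -Inf
days <- 0:9
counts <- c(2, 3, 3, 6, 5, 9, 14, 12, 21, 30)

# Log-linear model: log(counts) = intercept + r * days
fit <- lm(log(counts) ~ days)

# Growth rate r is the slope, with a 95% confidence interval
r <- unname(coef(fit)["days"])
r_ci <- confint(fit, "days", level = 0.95)

# Doubling time implied by exponential growth at rate r
doubling_time <- log(2) / r
```

Zero counts need special handling (dropping them or adding an offset) before the log transform, which is one practical difference from the Poisson and negative binomial models.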