reconverse / incidence2
Compute and visualise incidence (reworking of the original incidence package)
Home Page: https://www.reconverse.org/incidence2
License: Other
Currently we just have a hack that cats output rather than return a meaningful object.
Borrowing rOpenSci guidance:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the ropensci reviewer guide.
The package includes all the following forms of documentation:
Currently testing has become a bit of a mixed bag due to the refactoring and is a little unmaintainable. Work needs to be done to implement a more methodical approach.
Minor thing, but for consistency it would make sense to rename the `n.breaks` argument to `n_breaks` in `plot.incidence2` and `facet_plot`.
Some thoughts on how faceting may work, especially with regard to using groups for faceting and/or color-filling. Nothing hard-set, more for discussion purposes. It would be useful to have a `facet` argument handling which grouping variables are used for faceting. Together with `fill`, this should give more flexibility to the user for designing plots with different grouping variables displayed.
Here are some proposed behaviours:
- `facet_plot(x)`: plot the incidence object using all grouping variables for faceting
- `facet_plot(x, facet = "foo")`: same, using only variable `foo` for faceting
- `facet_plot(x, facet = c(foo, bar))`: same, using variables `foo` and `bar`
- `facet_plot(x, facet = "foo", fill = bar)`: use `foo` for faceting and `bar` for filling
- `facet_plot(x, facet = c("foo", "bar"), fill = bar)`: use `foo` and `bar` for faceting, and `bar` for filling; redundant, but that's okay, the user asked for it

What do you think?
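To make the proposed defaults concrete, here is a minimal base R sketch of how the `facet` and `fill` arguments could be resolved against the grouping variables of an object. The helper name `resolve_facet_fill` and its character-vector interface are assumptions for illustration only; this is not the incidence2 implementation.

```r
# Hypothetical helper, not part of incidence2: decide which grouping
# variables are used for faceting and which one is used for fill.
resolve_facet_fill <- function(groups, facet = NULL, fill = NULL) {
  # Default: facet on every grouping variable of the object
  if (is.null(facet)) {
    facet <- groups
  }
  # Anything requested must be one of the object's grouping variables
  unknown <- setdiff(c(facet, fill), groups)
  if (length(unknown) > 0) {
    stop("Not a grouping variable: ", paste(unknown, collapse = ", "))
  }
  list(facet = facet, fill = fill)
}

groups <- c("foo", "bar")
resolve_facet_fill(groups)                              # facet on foo and bar
resolve_facet_fill(groups, facet = "foo", fill = "bar") # facet on foo, fill by bar
```

Allowing the same variable in both `facet` and `fill` keeps the redundant case from the last bullet valid rather than treating it as an error.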
Currently date binning is performed by an internal function `group_dates`. This function could actually be any monotonically increasing, user-defined function. May be worth implementing.
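As a rough illustration of what a user-defined binning function might look like, the sketch below floors dates to 10-day bins and checks that the function never reorders dates. The names `bin_10_days` and `is_non_decreasing` are hypothetical, and nothing here reflects how `group_dates` works internally.

```r
# Hypothetical user-defined binning function: floor each date to the start
# of its 10-day bin, counted from a fixed origin.
bin_10_days <- function(x, origin = as.Date("2014-01-01")) {
  offset <- as.integer(x - origin)
  origin + (offset %/% 10L) * 10L
}

# A binning function is usable if it preserves order:
# x1 <= x2 must imply f(x1) <= f(x2).
is_non_decreasing <- function(f, x) {
  x <- sort(x)
  all(diff(as.numeric(f(x))) >= 0)
}

dates <- as.Date("2014-04-07") + c(0, 3, 9, 10, 25)
stopifnot(is_non_decreasing(bin_10_days, dates))

# Counting cases per bin is then a plain tabulation of the binned dates
bins <- bin_10_days(dates)
counts <- table(bins)
counts
```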
In the following example, 2 valid dates are provided to `date_index` but generate an error. Sorry if I am missing something obvious. See example below:
library(incidence2)
library(outbreaks)
dat <- ebola_sim_clean$linelist
i <- incidence(dat,
date_index = c(onset = date_of_onset, outcome = date_of_outcome),
interval = 7,
group = "hospital")
#> Error: Names must be unique.
#> x These names are duplicated:
#> * "outcome" at locations 6 and 7.
names(dat)
#> [1] "case_id" "generation"
#> [3] "date_of_infection" "date_of_onset"
#> [5] "date_of_hospitalisation" "date_of_outcome"
#> [7] "outcome" "gender"
#> [9] "hospital" "lon"
#> [11] "lat"
Created on 2021-03-22 by the reprex package (v1.0.0)
See reprex below:
library(magrittr)
library(incidence2)
outbreaks::ebola_sim_clean$linelist %>%
incidence(date_of_onset, groups = outcome) %>%
plot()
#> The number of colors (1) did not match the number of groups (3).
#> Using `col_pal` instead.
It may come from the borders, i.e. the bars are so thin that we see only the border? The plot is fine with wider time intervals.
Created on 2020-07-06 by the reprex package (v0.3.0)
Currently the dplyr reconstructable checking is superfluous in places and missing some bits in others. https://github.com/reconhub/trending/blob/master/R/dplyr.R has a slightly better approach, including a method for `$<-`.
I think I'm using `across` poorly, as the example below is crazily slow. Here I'm using clock but I'm pretty sure that is not the bottleneck:
library(incidence2)
library(clock)
library(microbenchmark)
dat <- covidregionaldataUK
# default uses data.table
default <- function() {
incidence(dat, date_index = date, groups = region, counts = ends_with("new"))
}
# here clock is just used as an example and is not the bottle-neck
record <- function() {
build_incidence(
dat,
date_index = date,
groups = region,
counts = ends_with("new"),
FUN = function(x) calendar_narrow(as_year_month_day(x), precision = "day")
)
}
microbenchmark(default(), record(), times = 10)
#> Unit: milliseconds
#> expr min lq mean median uq
#> default() 5.053506 5.303756 7.184131 5.825458 6.297723
#> record() 4614.132282 4683.771452 4741.717130 4751.499357 4782.525466
#> max neval
#> 18.83551 10
#> 4835.74966 10
Created on 2021-06-28 by the reprex package (v2.0.0)
Count should be made to work on multiple columns at once.
i.e. `tidy` and `augment`
Prepare for release:
devtools::build_readme()
usethis::use_cran_comments()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_news_md()
usethis::use_github_release()
usethis::use_dev_version()
These functions return NAs when the second argument exceeds the size of the object. They should probably return the whole object; I am not even sure we want a warning in there.
library(tidyverse)
library(incidence2)
tibble(date = 1:10) %>%
incidence(date) %>%
keep_last(10)
#> An incidence2 object: 10 x 2
#> 10 cases from 1 to 10
#> interval: 1 day
#> cumulative: FALSE
#>
#> date_index count
#> <int> <int>
#> 1 1 1
#> 2 2 1
#> 3 3 1
#> 4 4 1
#> 5 5 1
#> 6 6 1
#> 7 7 1
#> 8 8 1
#> 9 9 1
#> 10 10 1
tibble(date = 1:10) %>%
incidence(date) %>%
keep_last(11)
#> # A tibble: 10 x 2
#> date_index count
#> <int> <int>
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> 7 NA NA
#> 8 NA NA
#> 9 NA NA
#> 10 NA NA
tibble(date = 1:10) %>%
incidence(date) %>%
keep_first(10)
#> An incidence2 object: 10 x 2
#> 10 cases from 1 to 10
#> interval: 1 day
#> cumulative: FALSE
#>
#> date_index count
#> <int> <int>
#> 1 1 1
#> 2 2 1
#> 3 3 1
#> 4 4 1
#> 5 5 1
#> 6 6 1
#> 7 7 1
#> 8 8 1
#> 9 9 1
#> 10 10 1
tibble(date = 1:10) %>%
incidence(date) %>%
keep_first(11)
#> # A tibble: 10 x 2
#> date_index count
#> <int> <int>
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> 7 NA NA
#> 8 NA NA
#> 9 NA NA
#> 10 NA NA
Created on 2021-03-08 by the reprex package (v1.0.0)
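The expected behaviour can be sketched on a plain data frame: clamp the requested number of rows to what is available, so that over-asking returns everything instead of NAs. The helper names `keep_first_rows` and `keep_last_rows` are hypothetical and this is not the package's `keep_first()` / `keep_last()` code.

```r
# Hypothetical helpers: base R head()/tail() on the row indices already
# clamp n to the number of rows, so no NA rows are ever produced.
keep_first_rows <- function(x, n) {
  x[utils::head(seq_len(nrow(x)), n), , drop = FALSE]
}

keep_last_rows <- function(x, n) {
  x[utils::tail(seq_len(nrow(x)), n), , drop = FALSE]
}

df <- data.frame(date_index = 1:10, count = rep(1L, 10))

nrow(keep_first_rows(df, 11))    # whole object, no NAs
keep_last_rows(df, 3)$date_index # last three time steps
```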
These are comments or discussion items which in themselves seem too small to warrant a dedicated issue. This issue is a work in progress.
In `?incidence`, specify that `x` can be a `data.frame` or a `tibble` for the `incidence` functions, and that the indicated methods dispatch along the second argument (which is unusual, though very cool).
I love that the following works:
library(magrittr)
library(incidence2)
outbreaks::ebola_sim_clean$linelist %>%
incidence(date_of_onset, groups = c(gender, hospital, outcome)) %>%
pool(c(gender, outcome))
#> <incidence object>
#> [5829 cases from days 2014-04-07 to 2015-04-30]
#> [interval: 1 day]
#> [cumulative: FALSE]
#>
#> date_group gender outcome count
#> <date> <fct> <fct> <int>
#> 1 2014-04-07 f Death 0
#> 2 2014-04-07 f Recover 0
#> 3 2014-04-07 f <NA> 1
#> 4 2014-04-07 m Death 0
#> 5 2014-04-07 m Recover 0
#> 6 2014-04-07 m <NA> 0
#> 7 2014-04-08 f Death 0
#> 8 2014-04-08 f Recover 0
#> 9 2014-04-08 f <NA> 0
#> 10 2014-04-08 m Death 0
#> # … with 2,324 more rows
Going forward it would be nice to have a vignette dedicated to handling `incidence` objects and illustrating this feature, amongst others. Also, would it make sense to create an alias `regroup` for `pool` (or merely rename it), as it might make more sense to people? Which makes me realise we have not decided on a policy for keeping names / creating new names.
Error in plot_single(x, group, stack, color, col_pal, alpha, border, xlab, :
A single plot can only stack/dodge one variable.
Please `pool` the object first or use `plot_facet`
Should read facet_plot
`interval = 14` reads semi-weekly (Google has it as "twice a week"); I think bi-weekly may be better (Google says it can be either "once every 2 weeks" or "twice a week").
Hey - not a must-have, but it might be nice to have an option to add a moving average line.
This is pretty commonly used on messy epi data.
Should be quite easy to implement now that {slider} has been released.
See r4epi discussion
Maybe it should just be left for users to do separately (i.e. add to the plot themselves afterwards)?
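For reference, a trailing moving average over a vector of counts takes only a few lines. The sketch below uses base R `stats::filter()` rather than {slider}, purely to stay dependency-free; the counts are illustrative values and the function name `moving_average` is an assumption.

```r
# Illustrative counts per time step (not real data)
counts <- c(1, 1, 5, 4, 12, 17, 15, 19, 23, 21)

# Trailing moving average over the current and the k - 1 previous time steps.
# The first k - 1 values are NA because the window is incomplete there.
moving_average <- function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

ma <- moving_average(counts, k = 3)
ma
```

The resulting vector has the same length as the counts, so a user could add it to an existing plot as an extra line layer.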
There have been debates in the past on where dates should appear on the x-axis. I will try to sum up views / things to take into account below, and maybe some will add thoughts to it.
`geom_point` and `geom_line` as options for `plot` and `facet_plot`, it is preferable to have a consistent label positioning, which works the same for all geoms; label in the middle seems better for this: it still makes sense with `geom_bar` at
I'm mostly just curious, but what is the reasoning behind the design decision to copy over the code from {incidence} initially instead of importing? From my perspective, if there's a bug in the future, then there are two places where it needs to be fixed.
It would be useful to provide a wrapper to convert `incidence` objects from the original package to `incidence2` objects.
I've commented out the mac devel action due to r-lib/actions#140. This is a reminder for me to reinstate when fixed.
A maybe not very frequent use case: define limits between two time intervals defined by grate, e.g. to visually delineate epochs in a graph using a vertical line.
Currently the following will error on purpose:
> as_yrwk("2021-W03") + 1
[1] "2021-W04"
> as_yrwk("2021-W03") + 1.5
Error: Can only add whole numbers to <yrwk> objects
But unsure if we want to change this or not.
hi - like the idea of moving this to cleaner syntax!
Just thought I would suggest using tsibble to do some of the underlying legwork. There are yearweek and year* functions and all works pretty clean.
The advantage over aweek is that it is recognised as a date automatically. The disadvantage compared with aweek is that you cannot (as of yet) set a different start day for a week.
Actually just posted issues this morning on {aweek} and {tsibble} about this.
As a sidenote - while in the process of redoing everything might be worth considering renaming just to make certain epis happy and avoid semantic discussions around incidence vs incidence rate vs prevalence (see)
The current default for x-axis annotations is to be left-justified, e.g.:
library(outbreaks)
library(incidence2)
x <- ebola_sim_clean$linelist %>%
incidence(date_index = date_of_onset, interval = "month")
plot(x) + scale_x_incidence(x, format = "%d %M %Y")
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
I suspect most users will want to center the labels by default. It could be done by adding a `center_labels = TRUE` by default to `scale_x_incidence`, but this argument already exists with a different meaning (position of the tick marks) in the `plot` method.
One solution would be to:
- rename `center_labels` to `center_tick` in the plot method
- add a `center_labels` argument to `scale_x_incidence`
But I am sure there are other options - just can't think of a good one atm.
@nsbatra - the following requires the dev versions (GitHub main/master branches) of `incidence2` and `grates`, but hopefully works as you were hoping. Let me know what you think:
library(outbreaks)
library(incidence2)
dat <- ebola_sim_clean$linelist
x <- incidence(dat, date_of_onset, interval = "month")
x
#> An incidence2 object: 13 x 2
#> 5829 cases from 2014-Apr to 2015-Apr
#> interval: 1 month
#> cumulative: FALSE
#>
#> date_index count
#> <month> <int>
#> 1 2014-Apr 7
#> 2 2014-May 67
#> 3 2014-Jun 102
#> 4 2014-Jul 228
#> 5 2014-Aug 540
#> 6 2014-Sep 1144
#> 7 2014-Oct 1199
#> 8 2014-Nov 779
#> 9 2014-Dec 567
#> 10 2015-Jan 427
#> 11 2015-Feb 307
#> 12 2015-Mar 277
#> 13 2015-Apr 185
# centred dates (default for yearweek, single months, quarters and years)
plot(x, color = "white")
# histogram-esque dates on the breaks (defaults to "%Y-%m-%d")
plot(x, color = "white", centre_dates = FALSE)
# can specify a different format
plot(x, color = "white", centre_dates = FALSE, date_format = "%d-%m-%Y")
Created on 2021-05-19 by the reprex package (v2.0.0)
Add a sort method for the incidence class. It could help with things like reconhub/trendbreaker#45
Currently week labels are added as additional columns to the underlying data. Should the underlying structure just be the binned data with attributes that trigger different behaviour? For example, I think the following is a cleaner implementation:
library(incidence2)
library(dplyr, warn.conflicts = FALSE)
# get some data
data(ebola_sim_clean, package = "outbreaks")
dat <-
ebola_sim_clean$linelist %>%
filter(date_of_onset <= "2014-07-07")
# generate object with our current implementation
inci <- incidence(dat, date_index = date_of_onset, interval = "week")
inci
#> <incidence object>
#> [207 cases from days 2014-04-07 to 2014-07-07]
#> [interval: 1 week]
#> [cumulative: FALSE]
#>
#> date_group weeks isoweeks count
#> <date> <aweek> <chr> <int>
#> 1 2014-04-07 2014-W15 2014-W15 1
#> 2 2014-04-14 2014-W16 2014-W16 1
#> 3 2014-04-21 2014-W17 2014-W17 5
#> 4 2014-04-28 2014-W18 2014-W18 4
#> 5 2014-05-05 2014-W19 2014-W19 12
#> 6 2014-05-12 2014-W20 2014-W20 17
#> 7 2014-05-19 2014-W21 2014-W21 15
#> 8 2014-05-26 2014-W22 2014-W22 19
#> 9 2014-06-02 2014-W23 2014-W23 23
#> 10 2014-06-09 2014-W24 2014-W24 21
#> 11 2014-06-16 2014-W25 2014-W25 30
#> 12 2014-06-23 2014-W26 2014-W26 22
#> 13 2014-06-30 2014-W27 2014-W27 34
#> 14 2014-07-07 2014-W28 2014-W28 3
str(inci)
#> tibble [14 × 4] (S3: incidence/tbl_df/tbl/data.frame)
#> $ date_group: Date[1:14], format: "2014-04-07" "2014-04-14" ...
#> $ weeks : 'aweek' chr [1:14] "2014-W15" "2014-W16" "2014-W17" "2014-W18" ...
#> ..- attr(*, "week_start")= int 1
#> $ isoweeks : chr [1:14] "2014-W15" "2014-W16" "2014-W17" "2014-W18" ...
#> $ count : int [1:14] 1 1 5 4 12 17 15 19 23 21 ...
#> - attr(*, "date")= chr [1:3] "date_group" "weeks" "isoweeks"
#> - attr(*, "count")= chr "count"
#> - attr(*, "interval")= chr "week"
#> - attr(*, "cumulative")= logi FALSE
# drop the extra columns
x <- inci %>%
select(date_group, count)
# create demo subclass of tibble without the additional columns
# give it a "week" attribute (but could be "quarter", "year", "month", etc...)
tbl <- tibble::new_tibble(x,
nrow = nrow(x),
type = "week",
class = "demo")
# users would see the groupings they expect
print.demo <- function(x, ...) {
# title
cat("<demo object>\n")
out <- x
if (attr(x, "type") == "week") {
out$date_group <- aweek::date2week(out$date_group)
}
out <- format(tibble::as_tibble(out))
cat(out[-1], sep = "\n")
cat("\n")
invisible(x)
}
tbl
#> <demo object>
#> date_group count
#> <aweek> <int>
#> 1 2014-W15-1 1
#> 2 2014-W16-1 1
#> 3 2014-W17-1 5
#> 4 2014-W18-1 4
#> 5 2014-W19-1 12
#> 6 2014-W20-1 17
#> 7 2014-W21-1 15
#> 8 2014-W22-1 19
#> 9 2014-W23-1 23
#> 10 2014-W24-1 21
#> 11 2014-W25-1 30
#> 12 2014-W26-1 22
#> 13 2014-W27-1 34
#> 14 2014-W28-1 3
# but underlying we have the original binning data
str(tbl)
#> tibble [14 × 2] (S3: demo/tbl_df/tbl/data.frame)
#> $ date_group: Date[1:14], format: "2014-04-07" "2014-04-14" ...
#> $ count : int [1:14] 1 1 5 4 12 17 15 19 23 21 ...
#> - attr(*, "type")= chr "week"
Created on 2020-07-08 by the reprex package (v0.3.0)
I think the following should probably work; I am guessing it has to do with dates being simple numbers (not Date or grates stuff) but not 100% sure. Here's a reprex:
library(incidence2)
df <- data.frame(
dates = 1:100,
counts = round(rnorm(100, 1000, 200))
)
x <- incidence(df, date_index = dates, counts = counts)
x
#> An incidence2 object: 100 x 2
#> 100028 counts from 1 to 100
#> interval: 1 day
#> cumulative: FALSE
#>
#> date_index counts
#> <int> <dbl>
#> 1 1 775
#> 2 2 1389
#> 3 3 1341
#> 4 4 999
#> 5 5 694
#> 6 6 645
#> 7 7 1000
#> 8 8 1365
#> 9 9 1119
#> 10 10 959
#> # … with 90 more rows
plot(x)
#> Error in `substring<-`(`*tmp*`, 1, 1, value = toupper(first_letter)): replacing substrings in a non-character object
Created on 2021-03-29 by the reprex package (v1.0.0)
Subsetting objects by given time windows may be one of the only things made slightly easier in the original incidence package. For instance, `x[1:5]` would get you the first 5 time steps (days / weeks / months) of the object, which is a little trickier to do now. It would be useful to have some functions helping with this - see some proposed example uses below.
Filter the data to retain the first or last data points, predicated on a duration. There is a question here, as to how duration can be specified:
- integer days
- integer time intervals (as specified by the bins of the object)
- as in `incidence2::incidence`, so we could do things like "3 months" to have the first 3 months of data (possibly months 1 and 3 not being complete)

Examples would be (depending on the option above we retain):
- `filter_first(x, 30)`: retain the first 30 days of data, or all of it if there are fewer than 30 days
- `filter_first(x, "1 month")`: retain the first month of data; may not be a full month, only data from the first reported month
- `filter_last(x, "4 weeks")`: retain the last 4 weeks of data; the last week may not be complete, e.g. if the last date is a Thursday, so this may not be 28 days of data
- `filter_last(x, 28)`: retain the last 28 days of data (irrespective of week definition)

We could re-implement the features of `incidence::subset()`, but possibly renaming the function. It would merely be a wrapper for `filter` on dates.
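A minimal sketch of the "integer days" option, on a plain data frame with a Date column. The function names `filter_first_days` and `filter_last_days`, and the `date_col` argument, are assumptions for illustration; the interval-string variants ("1 month", "4 weeks") are not covered here.

```r
# Hypothetical helpers: keep rows within `days` days of the first / last date.
filter_first_days <- function(x, days, date_col = "date_index") {
  d <- x[[date_col]]
  x[d < min(d) + days, , drop = FALSE]
}

filter_last_days <- function(x, days, date_col = "date_index") {
  d <- x[[date_col]]
  x[d > max(d) - days, , drop = FALSE]
}

# 60 consecutive days with one case each
df <- data.frame(
  date_index = as.Date("2021-01-01") + 0:59,
  count = 1L
)

nrow(filter_first_days(df, 30))  # first 30 days
nrow(filter_last_days(df, 28))   # last 28 days
nrow(filter_first_days(df, 100)) # fewer than 100 days available: all rows
```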
This is apparently caused by a missing `week_var`:
## bug for plotting with interval = 1
library(outbreaks)
library(incidence2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- ebola_sim_clean$linelist
glimpse(dat)
#> Rows: 5,829
#> Columns: 11
#> $ case_id <chr> "d1fafd", "53371b", "f5c3d8", "6c286a", "0f58…
#> $ generation <int> 0, 1, 1, 2, 2, 0, 3, 3, 2, 3, 4, 3, 4, 2, 4, …
#> $ date_of_infection <date> NA, 2014-04-09, 2014-04-18, NA, 2014-04-22, …
#> $ date_of_onset <date> 2014-04-07, 2014-04-15, 2014-04-21, 2014-04-…
#> $ date_of_hospitalisation <date> 2014-04-17, 2014-04-20, 2014-04-25, 2014-04-…
#> $ date_of_outcome <date> 2014-04-19, NA, 2014-04-30, 2014-05-07, 2014…
#> $ outcome <fct> NA, NA, Recover, Death, Recover, NA, Recover,…
#> $ gender <fct> f, m, f, f, f, f, f, f, m, m, f, f, f, f, f, …
#> $ hospital <fct> Military Hospital, Connaught Hospital, other,…
#> $ lon <dbl> -13.21799, -13.21491, -13.22804, -13.23112, -…
#> $ lat <dbl> 8.473514, 8.464927, 8.483356, 8.464776, 8.452…
i <- incidence(dat, date_index = date_of_onset)
plot(i)
#> Error: Must extract column with a single valid subscript.
#> ✖ Subscript `week_var` can't be `NA`.
Created on 2020-07-03 by the reprex package (v0.3.0)
see labelling issue on y-axis below
library(outbreaks)
library(incidence2)
dat <- ebola_sim_clean$linelist
x2w<- incidence(dat, date_of_onset, interval = "2 weeks")
x2w
#> An incidence2 object: 28 x 2
#> 5829 cases from 2014-04-07 to 2015-05-03
#> interval: 14 days
#> cumulative: FALSE
#>
#> date_index count
#> <period> <int>
#> 1 2014-04-07 to 2014-04-20 2
#> 2 2014-04-21 to 2014-05-04 9
#> 3 2014-05-05 to 2014-05-18 29
#> 4 2014-05-19 to 2014-06-01 34
#> 5 2014-06-02 to 2014-06-15 44
#> 6 2014-06-16 to 2014-06-29 52
#> 7 2014-06-30 to 2014-07-13 72
#> 8 2014-07-14 to 2014-07-27 120
#> 9 2014-07-28 to 2014-08-10 166
#> 10 2014-08-11 to 2014-08-24 255
#> # … with 18 more rows
plot(x2w, color = "white")
Created on 2021-05-20 by the reprex package (v2.0.0)
Hello!
Great improvements on the original package - thank you very much! I really like the ability to facet and to use count data.
I would like to ask if the plotting functions can allow a width argument, or otherwise an option for there to be no gap between bars. At the US CDC, and it seems in Europe as well (see ref below), there is a traditional guideline that epidemic curves (when large enough that cases are not shown as boxes) should be histograms and not bar charts - or at least that there be no spaces between the bars. If this option can be offered, I think it would also offer a solution to the varying width and frequency of "white lines"/gaps between bars, which appear for example in the GitHub README (below).
From the vignette - "white lines"/bar gaps appearing at different frequencies across the plot
From the vignette - "white lines"/bar gaps of varying thickness across the plot
I tried to include a width argument in plot() but it was not accepted. When I tried to add a geom_col() to plot() and specify width that also did not work. While experimenting, I tried to use geom_col alone directly on a weekly incidence2 object. When I specified width = 7 I was able to achieve non-overlapping bars without any gaps. This makes sense given that it was a weekly incidence object and according to this ggplot2 issue discussion which says that geom_col width is interpreted in absolute units (days in this case).
Here is that example - the outbreaks ebola_sim_clean linelist
pacman::p_load(incidence2, tidyverse, outbreaks)
b <- incidence2::incidence(outbreaks::ebola_sim_clean$linelist, date_index = date_of_onset, groups = gender, interval = "week")
plot(b, fill = gender) # weird varying white "gaps" between bars
ggplot(data = b)+geom_col(aes(x = bin_date, y = count, fill = gender), width = 7) # no gaps
I just wanted to chime in and see if this was something that is possible. Perhaps at the least the width argument could be allowed to pass to the underlying geom_col? Then the user could tinker and find the correct width?
Thanks very much for considering!
Need to produce an error or convert if data.table object is provided as input
library(data.table)
library(incidence2)
dat <- incidence2::covidregionaldataUK
incidence(dat, date, count = cases_new)
#> An incidence object: 490 x 2
#> date range: [2020-01-30] to [2021-06-02]
#> cases_new: 8379330
#> interval: 1 day
#>
#> date_index cases_new
#> <date> <dbl>
#> 1 2020-01-30 3
#> 2 2020-01-31 0
#> 3 2020-02-01 0
#> 4 2020-02-02 0
#> 5 2020-02-03 0
#> 6 2020-02-04 0
#> 7 2020-02-05 2
#> 8 2020-02-06 0
#> 9 2020-02-07 0
#> 10 2020-02-08 8
#> # … with 480 more rows
setDT(dat)
incidence(dat, date, count = cases_new)
#> Error in `[.data.table`(x, date_index): When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Created on 2021-08-23 by the reprex package (v2.0.1)
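One possible way to handle this is to coerce the input to a plain data frame before doing anything else. The sketch below assumes a hypothetical helper `as_plain_df`; the fake object only mimics the class vector of a data.table so that the example runs without the data.table package.

```r
# Hypothetical input check: strip the data.table class up front so that
# downstream `[` calls use data.frame semantics.
as_plain_df <- function(x) {
  if (inherits(x, "data.table")) {
    x <- as.data.frame(x)
  }
  x
}

# Stand-in with a data.table-like class vector (data.table itself not needed)
fake_dt <- structure(
  data.frame(date = as.Date("2020-01-30") + 0:2, cases_new = c(3, 0, 0)),
  class = c("data.table", "data.frame")
)

class(as_plain_df(fake_dt))
```

Raising an informative error instead of converting would be the stricter alternative mentioned above.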
The 3 requirements of the new color palette would be:
Quite a bit of thinking on these has been done by the viridis package:
https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
As well, these very good resources:
I suggest we use this issue to propose palettes. Ideally put them to a vote at some point.
I find the following potential issue:
library(outbreaks)
library(incidence2)
## this works
ebola_sierraleone_2014 %>%
incidence(date_of_onset, interval = "1 week", groups = district) %>%
plot()
## this throws an error
ebola_sierraleone_2014 %>%
incidence(date_of_onset, interval = "1 month", groups = district) %>%
plot()
## note that month interval without group works
ebola_sierraleone_2014 %>%
incidence(date_of_onset, interval = "1 month") %>%
plot()
The date label when `interval = 1L` would look better if it said `date` rather than `bin_date`.
For instance, `pool` may be better named `regroup`, and I guess there could be more cases like this. Generally speaking, renaming things from the original `incidence` poses some trade-offs. There are several strategies we may consider:
- We keep old names as much as possible, and only use new names for new features.
- As this is a reboot, we can do away with old names, and rely on documentation for people to find out correspondence. A softer version would be to have `incidence2::pool` merely return `NULL` (or an error) and throw a message saying that this feature is now called `regroup` in incidence2.
- We could have `incidence2::regroup <- incidence2::pool`. If so, do we want to:
Currently if `na_as_group` is set to FALSE then you can end up with some dates with 0 counts at the beginning and end. We need to re-trim dates once NAs are removed.
Currently I pass `nrow` through dots to `ggplot2::facet`. Let's make that explicit.
We're now seeking volunteers to review the package code. If interested please leave a message here.
See reprex below:
library(outbreaks)
library(incidence2)
x <- ebola_sim_clean$linelist %>%
incidence(date_index = date_of_onset, interval = "month")
plot(x, show_cases = TRUE)
#> Error in `$<-.data.frame`(`*tmp*`, "width", value = c(30L, 31L, 30L, 31L, : replacement has 13 rows, data has 5829
But the same with weekly intervals works.
The lines below are slow and hacky:
x <- grouped_df(x, c(date_index, groups))
x <- summarise(x, count = count_dates(.data[[date_index]], breaks), .groups = "keep")
x <- mutate(x, {{date_index}} := breaks)
x <- summarise(x, count = sum(.data$count), .groups = "keep")
colnames(x) <- c("bin_date", colnames(x)[-1])
x <- ungroup(x)
Replacing `summarise` with some sort of split / tapply / rbind combo will probably be quicker.
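As one possible direction, counting per bin and group can be done with `findInterval()` and `table()` in base R. The function `count_bins` and its arguments are hypothetical, and this sketch has not been benchmarked against the dplyr version.

```r
# Hypothetical base R alternative: assign each date to a bin, then tabulate
# bin x group in one pass. `breaks` holds the sorted bin start dates.
count_bins <- function(dates, groups, breaks) {
  idx <- findInterval(as.numeric(dates), as.numeric(breaks))
  tab <- table(
    bin_date = factor(idx, levels = seq_along(breaks)),
    group = groups
  )
  out <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
  # Map the bin index back to the bin start date
  out$bin_date <- breaks[as.integer(out$bin_date)]
  out
}

dates <- as.Date("2021-01-01") + c(0, 1, 7, 8, 8)
groups <- c("a", "b", "a", "a", "b")
breaks <- as.Date("2021-01-01") + c(0, 7)

out <- count_bins(dates, groups, breaks)
out
```

Bins with no cases for a group appear with a count of 0, since every bin level is kept in the factor.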
`plot` vs `facet_plot`
The current design, with `plot` only working with 0-1 group, makes sense, and `pool` is nice enough to use. However, I wonder if users would like this to be wrapped automatically through `plot`. Maybe this is something we should ask in community feedback? I think my personal preference is the current implementation.
I would like to add some more plotting options beyond the `geom_col`, which is effectively a histogram. It would be nice to add a `geom_point` and a `geom_line`, e.g.
## only points
dat %>%
incidence(date_of_onset) %>%
plot(type = "point")
## points and lines
dat %>%
incidence(date_of_onset) %>%
plot(type = c("point", "line"))
In terms of x-axis positioning, these would be set at the middle of the corresponding time interval.
It would be nice to offer some small helpers which will be frequently used. Again, community feedback will be useful there. I can think of, for instance:
This is a big selling point, and I like the idea that the new version will look, well, new.
`theme_bw()`?