gavinsimpson / gratia Goto Github PK

View Code? Open in Web Editor NEW

187.0 10.0 28.0 253.12 MB

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package

Home Page: https://gavinsimpson.github.io/gratia/

License: Other

Makefile 0.22% R 99.60% CSS 0.01% SCSS 0.01% TeX 0.16%

r-package generalized-additive-models ggplot2 mgcv gam glm lm random-effects smoothing r

gratia's Introduction

gratia

Overview

Graceful ‘ggplot’-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the ‘mgcv’ package. Provides a reimplementation of the plot() method for GAMs that ‘mgcv’ provides, as well as ‘tidyverse’ compatible representations of estimated smooths.

Features

The main features of gratia are currently

A ggplot2-based replacement for mgcv:::plot.gam(): draw.gam().

For example, the classic four term additive example from Gu & Wahba:
Estimated smooths from a GAM
Or for a bivariate smooth:
Estimated smooths from a GAM
Note that some specialist smoothers (bs %in% c("mrf","sw", "sf")) are not currently supported, but univariate, factor and continuous by-variable smooths, simple random effect smooths (bs = 're'), factor-smooth interaction smooths (bs = "fs"), constrained factor smooths (bs = "sz"), full soap film smooths (bs = "so"), and bivariate, trivariate, and quadvariate TPRS and tensor product smooths are supported,
Estimation of derivatives of fitted smoothers: derivatives(),
Estimation of point-wise across-the-function confidence intervals and simultaneous intervals for smooths: confint.gam().
Model diagnostics via appraise()
Model diagnostics figure

Installing gratia

gratia is now available on CRAN, and can be installed with

install.packages("gratia")

however gratia is under active development and you may wish to install the development version from github. The easiest way to do this is via the install_github() function from package remotes. Make sure you have remotes installed, then run

remotes::install_github("gavinsimpson/gratia")

to install the package. Alternatively, binary packages of the development version are available from rOpenSci’s R Universe service:

# Install gratia in R
install.packages("gratia", repos = c(
  "https://gavinsimpson.r-universe.dev",
  "https://cloud.r-project.org"
))

History

gratia grew out of an earlier package, schoenberg, itself a development of the earlier package tsgam, which was originally intended to be used with GAMs fitted to time series. As I was developing tsgam however it became clear that the package could be used more generally and that the name “tsgam” was no longer appropriate. To avoid breaking blog posts I had written using tsgam I decided to copy the git repo and all the history to a new repo for the package under the name schoenberg. At a later date someone released another package called schoenberg to CRAN, so that scuppered that idea. Now I’m calling the package gratia. Hopefully I won’t have to change it again…

Why gratia?

In naming his greta package, Nick Golding observed the recent phenomena of naming statistical modelling software, such as Stan or Edward, after individuals that played a prominent role in the development of the field. This lead Nick to name his Tensor Flow-based package greta after Grete Hermann.

In the same spirit, gratia is named in recognition of the contributions of Grace Wahba, who did pioneering work on the penalised spline models that are at the foundation of the way GAMs are estimated in mgcv. I wanted to name the package grace, to explicitly recognise Grace’s contributions, but unfortunately there was already a package named Grace on CRAN. So I looked elsewhere for inspiration.

The English word “grace” derives from the Latin gratia, meaning “favor, charm, thanks” (according to Merriam Webster).

The chair that Grace Wabha currently holds is named after Isaac J Schoenberg, a former University Madison-Wisconsin Professor of Mathematics, who in a 1946 paper provided the first mathematical reference to “splines”. (Hence the previous name for the package.)

gratia's People

Contributors

Stargazers

Watchers

gratia's Issues

Implement `newdata` in evaluate_smooth methods

Not all evaluate_smooth() take a newdata argument, and arguably they could.

random effect smooths (evaluate_re_smooth())
Need to check that levels of variables involved in the ranef match with the data.
parametric terms (evaluate_parametric_terms())

Might need an dependency on R 3.5.0 or later for isFALSE - check

Or we could go with ! isTRUE(....) to avoid it...

Check when isFALSE was added to R.

Handle models estimated using gamm4()

Need methods to handle gamm4::gamm4() models.

Add function to convert from posterior_samples & co to tidybayes versions

Add as_tidybayes() methods to convert from posterior_samples() and co to equivalents in the tidybayes package.

Need methods for classes

posterior_samples
fitted_samples
predicted_samples

Need to do the following:

Change names of objects to match tidybayes standard. Usually the names are dotted (e.g. .row) and some variables have different names (fitted vs .value for the fitted_draws() function.
Add .chain and .iteration variables as vectors of NA_integer_ as mgcv doesn't work this way.

Export some of the useful utility functions

From @gavinsimpson on April 6, 2018 17:10

The are quite a few useful utility functions for working with mgcv's GAM models that would be handy to export and make available.

Export the following:

Copied from original issue: gavinsimpson/schoenberg#5

Handle Markov random field smoother

Need functions and methods to handle the plotting of MRF smoothers. The MRFtools package will have much better handling of these smooths, where we know a lot more about the penalties being passed to the smooth constructor. However, the basic smoother could have one of two plot

plot the polygons passed to the basis and fill each polygon with fill equal to the fitted values or smooth effect
if we only have neighbours or even just the penalty, then we could still plot something? Could turn the penalty matrix into an adjacency matrix and thence to a graph which we plot with ggraph.

Add a draw method for smooth_samples()

Add draw() method for smooth_samples()

evaluate_1d_smooth doesn't know about ordered by factor smooths

library('mgcv')
library('gratia')
data(CO2)
CO2 <- transform(CO2, fPlant = factor(Plant))

m <- gam(uptake ~ fPlant + s(conc, by = fPlant, k = 5), data = CO2,
         method = 'REML')

draw(m)

yields

> draw(m)
Error: New columns in `add_column()` must be consistent with `.data`:
* `.data` has 1100 rows
* New column contributes 1200 rows

session info

─ Session info ─────────────────────────────────────────────────────────────── setting value version R version 3.5.3 Patched (2019-03-11 r76238) os Fedora 29 (Workstation Edition) system x86_64, linux-gnu ui X11 language (EN) collate en_CA.UTF-8 ctype en_CA.UTF-8 tz America/Regina date 2019-03-19

─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.1)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.1)
callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.1)
cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.1)
colorspace 1.4-0 2019-01-13 [1] CRAN (R 3.5.1)
cowplot 0.9.4 2019-01-08 [1] CRAN (R 3.5.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.1)
devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
dplyr * 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2)
fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.1)
ggplot2 3.1.0 2018-10-25 [1] CRAN (R 3.5.1)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
gratia * 0.2-8 2019-03-04 [1] local
gtable 0.2.0 2016-02-26 [1] CRAN (R 3.5.1)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.3)
lazyeval 0.2.1 2017-10-29 [1] CRAN (R 3.5.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)
Matrix 1.2-16 2019-03-08 [1] CRAN (R 3.5.3)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1)
mgcv * 1.8-27 2019-02-06 [1] CRAN (R 3.5.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1)
mvtnorm 1.0-10 2019-03-05 [1] CRAN (R 3.5.3)
nlme * 3.1-137 2018-04-07 [1] CRAN (R 3.5.3)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.1)
pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.1)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)
processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.3)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.1)
purrr 0.3.1 2019-03-03 [1] CRAN (R 3.5.3)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.1)
remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.1)
rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.1)
tibble 2.0.1 2019-01-12 [1] CRAN (R 3.5.1)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.5.3)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)
usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.1)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)

[1] /home/gavin/R/build/3.5-patched/library

Export which_smooth

Need this for MRFtools.

Handle 're' smooths of a factor and a continuous variable

Need to see what gratia does with ranef smooths of a factor and one or more continuous variables. The code in evaluate_re_smooth() currently assumes that we have two or more factors only.

Give draw methods the possibility to return ggplot2 objects or data.frame

I have a suggestion for a rather small change to the draw methods. Specifically, the current code does not really allow a further customization of the returned plots, as they are wrapped into a plot_grid call (in line with the documentation). I think it would be great if instead there would be the option to return a ggplot2 object in some cases so one can actually customize the plot further. The main idea is to use the current draw method as the basis for further developments.

I can see three different possibilities:

If only one plot is produced returned always return a ggplot2 object and skip the call to plot_grid. This would clearly be the easiest solution and would not require any additional arguments. This would also make for a straight-forward user experience. If I want only one plot, it is likely I want to focus on this and potentially customize it.
Add an additional argument, e.g., as_list which, if TRUE, returns the list plots as a list also skipping the plot_grid call. This would allow customizing even more than one plot, but the cost would of course be that the different plots would not be printed on the same window.
Add an additional argument, e.g., default is return = "plot", but if return = "data" creating the ggplot2 object is skipped and only the data frames used for creating the plot would be returned. In this case, draw is only a convenience function for creating the basis for the plot and the user can create it exactly as they want.

Does any of this option sound good to you? If so, I am happy to produce a corresponding pull request (might not be instantaneous though).

Add wrappers to smoothCon so that it is easy to visualise basis functions

It would be useful for teaching and understanding GAMs if you could quickly and intuitively generate a basis expansion for a covariate, evaluate it at some values producing a tidy object and then plot it with draw() or ggplot().

In draw(), when select is not NULL set parametric = FALSE

If using select a user might expect not to see any parametric terms plotted, but currently they are plotted.

This might need setting of parametric = NULL and if select is NULL set parametric = TRUE so we get the current behaviour of showing all model terms when not using select. But if select != NULL, then we leave parametric alone, so a user can set parametric = TRUE and use select if they want.

draw() not plotting selected smooths for a selected by variable smooth

In this example draw() fails to plot anything

library('gratia')
library('mgcv')
set.seed(42)
dat <- gamSim(4, n = 400, verbose = FALSE)
m <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)
draw(m, select = 's(x2)')

This works

draw(m, select = c('s(x2):fac1', 's(x2):fac2', 's(x2):fac3'), parametric = FALSE)

but it is a pain to specify all the factor levels.

Could modify gratia:::check_user_select_smooths() so that it does partial matching rather than exact matching as currently. Would need a way to turn this on/off in gratia:::draw.gam() though and have it off by default as it would make it more difficult to select specific smooths if you really wanted those specific smooths. Need a grep() on the smooth labels in gratia:::check_user_select_smooths() if an argument in gratia:::draw.gam() (say partial_match) is TRUE, which we pass on to the check function.

gratia:::draw.gam() is working as documented however.

Handle soap film smoother

Soap film smoothers need special handling. There are potentially two smoothers

the boundary loop, a cyclic cubic or p spline, and
the soap film smoother itself on the inside of the boundary.

The boundary may be estimated or take known values.
For the boundary, if it is estimated, the draw method should use a colour gradient to display "effects".

Need:

eval_smooth.soap.film function
plot_smooth.soap_film method

Notes:
There are two smoothers as part of the soap film and these can be set-up separately, esp in tensor product smooths. Need to look at how these differ from the standard bs = "so" smoother.

Right now (12 March 2024) there is an issue with using the separate basis types "sf" and "sw" with gratia because the "sf" basis has class c("sf", "soap.film") and the sf package defines a [[<-.sf method which causes model fitting to fail if sf is loaded, which it is when ggplot2 is loaded and coord_sf() has been called, which is what gratia does. Until that is fixed (I've emailed Simon Wood about it) I will not handle the separate smooths, just the main "so" type.

Expand data_slice() so it can handle up to 4 variables varying

Follow vis.gam() and allow for user-specified var3, and var4 arguments.

IIRC these should be factors or if continuous vars we need a few cut points to essentially discretise these continous variables so that we can create small-multiples plots of the predicted response whilst varying one or both of the main variables of interest (var1 and var2) and whilst keeping any other model covariates fixed at user-specified values or at the observed value closest to the median.

Parametric term evaluation for ziplss models

I'm working with models from the ziplss family that have parametric (i.e. non-smooth) terms, and these cause the evaluate_parametric_term method (and hence draw) to fail with an error message like

Error in evaluate_parametric_term.gam(object, term = terms[i]) : Term is not in the parametric part of model: <x3>,

because it does not handle the list structure of the two predictor formulae gracefully.

Reprex collapsed in here

#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * 
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)

## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1) 
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence

## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))

## Fit ZIP model... 
b <- gam(list(y~s(x2)+s(x3),~s(x0)+s(x1)),family=ziplss())
draw(b)


b1 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b1) # fails
plot(b1,pages=1, all.terms = TRUE) #works

I've traced the issue to

gratia/R/evaluate_smooth.R

Line 546 in 245ac45

vars <- labels(tt) # names of all parametric terms

which doesn't behave as expected as tt is now a two-element list with labels "1" "2".

My hacky attempt to fix this by changing the above line to

vars <- unlist(lapply(tt, labels))

produces reasonable results if there is only a single parameteric predictor in the model, but falls over with a cryptic plot_grid error that I can trace to align_margin, but which I don't understand:

#after hack-fixing evaluate_parametric_term

b2 <- gam(list(y~s(x2)+x3,~s(x0)+s(x1),family=ziplss())
draw(b2) #this works

b3 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b3) # this fails

#Error in 1:(grep("null", x)[1] - 1) : NA/NaN argument
#In addition: Warning message:
#In predict.gam(object, newdata = mf, type = "terms", terms = term,  :
#  non-existent terms requested - ignoring
#Called from: FUN(X[[i]], ...)

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gratia_0.2-8 mgcv_1.8-28  nlme_3.1-140

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       rstudioapi_0.10  magrittr_1.5     splines_3.6.1    tidyselect_0.2.5 munsell_0.5.0   
 [7] cowplot_1.0.0    colorspace_1.4-1 lattice_0.20-38  R6_2.4.0         rlang_0.4.0      dplyr_0.8.3     
[13] tools_3.6.1      packrat_0.5.0    grid_3.6.1       gtable_0.3.0     lazyeval_0.2.2   assertthat_0.2.1
[19] tibble_2.1.3     crayon_1.3.4     Matrix_1.2-17    tidyr_0.8.3      purrr_0.3.2      ggplot2_3.2.0   
[25] glue_1.3.1       labeling_0.3     compiler_3.6.1   pillar_1.4.2     scales_1.0.0     mvtnorm_1.0-11  
[31] pkgconfig_2.0.2

Export get_smooths_by_id

Export get_smooths_by_id as it's needed for MRFtools

Write some (better) tests for smooth_samples(), posterior_samples(), fitted_samples(), predicted_samples()

Need some better and more comprehensive tests for for

smooth_samples()
posterior_samples()
fitted_samples()
predicted_samples()

draw.derivatives is documented, but not part of CRAN package

draw.derivatives is part of the documentation at https://cran.r-project.org/web/packages/gratia/gratia.pdf, but not part of the distributed package.
How to reproduce:

> install.packages("gratia")
> gratia::draw.derivatives
Error: 'draw.derivatives' is not an exported object from 'namespace:gratia'

Document data sets properly

The datasets in the package need documenting properly. Currently there is little more than roxygen stubs for these.

birdmove
zooplankton

Handle splines on the sphere

Should we try to handle splines on the sphere with a special method (like the perspective plot of a sphere that gets drawn with plot.gam())? Instead, mirror the handling via plot.gam() where scheme > 2, which seems to use basic image plots which we probably already have capability to handle.

Will need to determine if ggplot2 can handle the kind of sphere-plotting that the mgcv special plot method draws, or whether I can cast the predictions on a lat/lon grid (raster) and then project this and plot if using simple features via the sf package?

Rug plots

There is any command in the function draw, that allow to include the observations in the margins (rug plot). I have tried the ggplot function geom_rug() but it does not work.
Thank you

draw doesn't work for gaulss models

New function derivatives()

Add a new function derivatives() to compute finite-difference-based derivatives of fitted splines.

In effect this will soft-deprecate fderiv() as I would prefer a more tidy-friendly return object in keeping with the other functions in gratia.

derivatives() should also offer alternative ways of computing derivatives (forward, backward, or centred).
derivatives() should allow computing second derivatives (see #5 for an initial go at adding this functionality through fderiv())
derivatives() should return a tibble with a variable indexing which smooth the derivatives are for
derivatives() should return confidence or simultaneous intervals
draw() method for derivatives

Handle fs smooths in draw()

From @gavinsimpson on April 12, 2018 17:45

fs smooths are "random smooth + random intercept" smoothers and one might wish to plot all the random smooths on a single plot. In part this is because only a single smoothness parameter is estimated for this type of smoother. This contrasts with factor by variable smooths, where in effect we estimate a separate smooth per level of the factor.

I can see the default fs smooth being drawn as single plot with lots of lines (group = factor in aes()), but with option to facet on factor also well.

Copied from original issue: gavinsimpson/schoenberg#6

Empty plots when model has 2 or more factor variables

This happened to me recently, when I tried to recreate a plot from a previous project (worked perfectly about a month ago). When a model has 2 or more factor variables, an extra empty plot is drawn between them.
As far as I can tell, it's a problem with the number of parametric term elements returned - there is an extra NULL element in the final plot list to draw, which causes the empty plot..
More empty plots are added for each additional factor variable in the model - 1 between factor1 and factor2, 2 between factor2 and factor3, and I suppose it would go on.

Example below.

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
library(gratia)

## simulate some data
dat <- gamSim(4, n = 300)
#> Factor `by' variable example

## make a second factor variable
dat$fac2 <- dat$fac

## GAM with 2 factors and 2 numeric terms
mod.2factors <- gam(y ~ s(x0) + s(x1) + factor(fac) + factor(fac2), data = dat, family = gaussian(link = "identity"))

draw(mod.2factors)

### doesn't happen when there is only one factor
mod.1factor <- gam(y ~ s(x0) + s(x1) + s(x2) + factor(fac), data = dat, family = gaussian(link = "identity"))

draw(mod.1factor)

### the problem seems to be compounded (more extra empty plots) for each additional factor term
dat$fac3 <- dat$fac
mod.3factors <- gam(y ~ s(x0) + s(x1) + factor(fac) + factor(fac2) + factor(fac3), data = dat, family = gaussian(link = "identity"))

draw(mod.3factors)

^{Created on 2019-04-05 by the reprex package (v0.2.1)}

Session info:
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=bg_BG LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=bg_BG LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=bg_BG LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reprex_0.2.1 gratia_0.2-8 mgcv_1.8-28 nlme_3.1-137

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 pillar_1.3.1 compiler_3.5.3 plyr_1.8.4 tools_3.5.3
[6] digest_0.6.18 evaluate_0.13 tibble_2.1.1 gtable_0.3.0 lattice_0.20-38
[11] pkgconfig_2.0.2 rlang_0.3.3 Matrix_1.2-17 rstudioapi_0.10 yaml_2.2.0
[16] mvtnorm_1.0-10 xfun_0.6 knitr_1.22 dplyr_0.8.0.1 fs_1.2.7
[21] grid_3.5.3 cowplot_0.9.4 tidyselect_0.2.5 glue_1.3.1 R6_2.4.0
[26] processx_3.3.0 rmarkdown_1.12 callr_3.2.0 whisker_0.3-2 ggplot2_3.1.0
[31] purrr_0.3.2 tidyr_0.8.3 clipr_0.5.0 magrittr_1.5 ps_1.3.0
[36] htmltools_0.3.6 scales_1.0.0 splines_3.5.3 assertthat_0.2.1 colorspace_1.4-1
[41] labeling_0.3 lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4

Error in confint() when using a location-scale family

The confint function throws an error when used on a location scale family gam model. Are distributional models not supported yet?

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
library(gratia)

set.seed(111)
df <- gamSim(n = 2000, eg = 4)
#> Factor `by' variable example

## global shared smooth + group level smooths and a group level intercept for x0
## only one smoothing parameter
## scale modeled using group level smooths and group level intercept  
sim_gam <- gam(list(y ~ s(x0, bs = "cs") + s(x0, bs = "cs", by = fac) +
                 s(fac, bs = "re"),~ s(x0, bs = "cs", by = fac) + s(fac, bs = "re")), 
                data = df, method = "REML", family = gaulss())

cis <- confint(sim_gam, parm = "x0", unconditional = T, type = "simultaneous")
#> Error in vapply(object[["smooth"]][take], by_level, character(1L)): values must be length 1,
#>  but FUN(X[[1]]) result is length 0




sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 9 (stretch)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] gratia_0.2-8 mgcv_1.8-28  nlme_3.1-140
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2       compiler_3.6.1   pillar_1.4.2     highr_0.8       
#>  [5] tools_3.6.1      digest_0.6.20    evaluate_0.14    tibble_2.1.3    
#>  [9] gtable_0.3.0     lattice_0.20-38  pkgconfig_2.0.2  rlang_0.4.0     
#> [13] Matrix_1.2-17    yaml_2.2.0       mvtnorm_1.0-11   xfun_0.8        
#> [17] stringr_1.4.0    dplyr_0.8.3      knitr_1.24       grid_3.6.1      
#> [21] cowplot_1.0.0    tidyselect_0.2.5 glue_1.3.1       R6_2.4.0        
#> [25] rmarkdown_1.14   tidyr_0.8.3      ggplot2_3.2.1    purrr_0.3.2     
#> [29] magrittr_1.5     scales_1.0.0     htmltools_0.3.6  splines_3.6.1   
#> [33] assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3    lazyeval_0.2.2  
#> [37] munsell_0.5.0    crayon_1.3.4

^{Created on 2019-10-02 by the reprex package (v0.3.0)}

Add appraise methods for other classes of model

appraise() could also work with other similar models. Add methods for

glm()
lm()
gamm()

(Add others to the list as needed and if required discuss below.)

Need an informative error message when selected smooth doesn't exist with evaluate_smooth.gam() etc

If you ask for a smooth that doesn't exist in evaluate_smooth.gam() you'll get this error message:

Error in SMOOTHS[[1]] : subscript out of bounds

Catch this problem earlier and throw a useful error message.

Multiple character select not working in draw()

In the example, only the smooth s(x0) is plotted.

set.seed(1)
dat1 <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat1, method = "REML")
draw(m1, select = c("s(x0)", "s(x1"))

Handle models fitted using mgcv::bam()

There are currently no specific methods for GAMs estimated using bam(). Need to run tests of functionality for bam() models and add specific methods or functions where special handling of the model objects is needed.

problem in code reproduction

When I followed the code in "Supplementary materials for: Modelling palaeoecological time series using generalized additive models" the result of

sw.cint <- confint(mod, parm = "Year", newdata = newYear, type = "confidence")
sw.sint <- confint(mod, parm = "Year", newdata = newYear, type = "simultaneous")
head(sw.sint)

are different from the guide.

  smooth  by_variable  Year   est    se  crit   lower upper
  <chr>   <fct>       <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
1 s(Year) NA          1846. 0.478 0.163  2.92 0.00223 0.954
2 s(Year) NA          1847. 0.481 0.152  2.92 0.0357  0.926
3 s(Year) NA          1848. 0.483 0.143  2.92 0.0665  0.900
4 s(Year) NA          1849. 0.486 0.134  2.92 0.0940  0.878
5 s(Year) NA          1850. 0.489 0.127  2.92 0.118   0.859
6 s(Year) NA          1850. 0.491 0.121  2.92 0.138   0.845

Thus, the d15N values are lower than original values, despite with the same trend.
cannot find the bug after a few trying, please help.

How to draw() the result of get_by_smooth?

Thanks for this great package! This is probably a better question for the blog post about gratia but I wasn't able to make it past the captcha.

I'm trying to 'draw' the result of a 'get_by_smooth" and getting

Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'

I want to plot the smooth on the x axis and the response on the y axis, and can't find a way to do something like this in the help.

Add support for tidybayes fitted_draws and posterior_draws

This came up in mjskay/tidybayes#130

Now that simulate.gam and related methods in gratia do the correct thing, I should revisit this.

@mjskay I follow what structures/layout/naming conventions are needed to support this, but was the intention that I include gratia:::fitted_draws.gam and gratia:::predicted_draws.gam or that I'd have some functions that provide equivalent output that you can wrap in tidybayes?

Either is fine with me.

enable derivatives for 2d surfaces

Ported from comments on website -- folks often do GAMs with a 2d smoother (i.e. lat an long). The Deriv function (outside of gratia) currently "handles" this by returning the same values for both smoother components. @gavinsimpson indicated developments to derivatives() function should include a tidy-based interface for this.

Confidence intervals of bivariate smooths

From @scottkosty on April 6, 2018 16:30

This is a feature request for adding support for confidence intervals of bivariate smooths.

In the below example code, there is one call to confint that succeeds, but when adding a newdata argument or a type argument, it fails.

library(mgcv)
library(gratia)
set.seed(2)
dat <- gamSim(2, n = 4000, dist = "normal", scale = 1)
m2 <- gam(y ~ s(x, z, k = 30), data = dat$data, method = "REML")

# this works (in the sense of no error)
test1 <- confint(m2, parm = "x,z")

# however, these fail
test2 <- confint(m2, parm = "x,z", type = "simultaneous")
newd_ <- data.frame(x = dat$x, y = dat$y, z = dat$z)
test3 <- confint(m2, parm = "x,z", newdata = newd_)

# I'm guessing these are not expected to work, but I tried them just in case,
# and they do not succeed:

# since there is only one smooth, I thought I could exclude the parm
test4 <- confint(m2, newdata = newd_)
test5 <- confint(m2, type = "simultaneous")

# try specifying parm with s()
test6 <- confint(m2, parm = "s(x,z)", newdata = newd_)
test7 <- confint(m2, parm = "s(x,z)", type = "simultaneous")

# try specifying parm with number
test8 <- confint(m2, parm = 1, newdata = newd_)
test9 <- confint(m2, parm = 1, type = "simultaneous")

Copied from original issue: gavinsimpson/schoenberg#3

appraise fails for models with quasi family

Currently appraise fails for models that have any of the quasi families specified

df <- gamSim(n = 400, verbose = TRUE, eg = 3)
Continuous `by' variable example
m <- gam(y ~ s(x1) + s(x2), data = df,
          family = quasi(link = "identity", variance = "constant"),
          method = "REML")
appraise(m) # FAILS
Error in qq_uniform(model, n = n_uniform, type = type) : 
  Quantile function for family <quasi> not available.
gam.check(m)

Method: REML   Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-1.290903e-08,5.530441e-09]
(score 528.3638 & scale 4.873964).
Hessian positive definite, eigenvalue range [0.3017501,198.5413].
Model rank =  19 / 19 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

        k'  edf k-index p-value
s(x1) 9.00 1.94    1.00    0.56
s(x2) 9.00 6.61    1.05    0.86]

confint() for derivatives() objects in addition to fderiv() objects?

@gavinsimpson replied on twitter: "The pointwise intervals can be added easily; I'm thinking add_confint() as the function to add them. I need to think about how to do the simultaneous intervals; they need something not easily passed in a tibble so I need a little more work to figure it out."

Do you think this still on the table? Thanks again for a fantastic package. Very useful.

Add option for draw.gam() to return a list of plots instead of cowplot::plot_grid() output

Since draw.gam() returns the output of cowplot::plot_grid(), it can be difficult to modify the properties of individual panels. It can be convenient to users to have the option to modify those individual panels and then call plot_grid themselves.

Maybe some kind of optional argument for whether to join the panels together?

draw.gam(..., assemble = TRUE) # returns plot_grid output as currently
draw.gam(..., assemble = FALSE) # returns the list of plots in `g`.

Simultaneous overall prediction bands

From @scottkosty on June 13, 2018 17:43

Consider the following example:

library("mgcv")
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

It would be nice to get simultaneous bands for

f(x0) + f(x1) + f(x2) + f(x3)

where f(xn) is the true effect of xn on y.

For example, the following estimated values and standard errors correspond to the overall prediction:

predict(mod, se.fit = TRUE)

The hard part is getting a critical value that yields valid simultaneous bands.

This issue could be generalized to gettig simultaneous band for any combination of the smooth terms, e.g.,

f(x2) + f(x3)

but I'm not sure how much complexity this abstraction adds.

As for implementation, based on reading your extremely useful blog articles, I wonder whether it is possible through simulation from the joint posterior of all the smooth terms (does mgcv allow for this?), or from a joint multivariate normal.

Copied from original issue: gavinsimpson/schoenberg#8

Add reference intervals to simulate method of qq_plot

The simulate method for qq plots can return reference intervals by applying quantile() to the the simulated data for each observation, before we generate the theoretical quantiles for the entire set of simulated values plotted on the x-axis of the QQ plot.

derivatives can't handle by variable smooths

I really like the derivatives function, but I have been running into this issue.
The issue occurs whether I use ordered or unordered factors in the by-factor smooths. I have been mostly testing this with gamm, but it seems to not work for gam either.
For example:

modobj <- gamm(y ~ s(age,by = sex),random=list(ID=~1)
df <- derivatives(modobj,'age)

This returns the derivative for the first level of the by factor smooth, but returns all zeros for the derivative and standard error for the next level.

If I run this in the ordered factor case to test for interactions,
modobj <- gamm(y ~ s(age) +oSex + s(age,by = oSex),random=list(ID=~1)
df <- derivatives(modobj,'age)

It returns the derivative for the reference smooth (s(age)), but all zeros for the derivative and standard error of the by factor smooth.

evaluate_fs_smooth is not setting fs_variable column nor the factor data column variable

From the example on Introducing gratia blog post comments

> evaluate_smooth(mod, "x1")
# A tibble: 1,000 x 6
   smooth    fs_variable      x1 f       est    se
   <chr>     <fct>         <dbl> <fct> <dbl> <dbl>
 1 s(x1,fac) NA          0.00164 1     -2.39 0.859
 2 s(x1,fac) NA          0.0117  1     -2.33 0.838
 3 s(x1,fac) NA          0.0217  1     -2.27 0.817
 4 s(x1,fac) NA          0.0318  1     -2.21 0.796
 5 s(x1,fac) NA          0.0418  1     -2.15 0.776
 6 s(x1,fac) NA          0.0519  1     -2.10 0.756
 7 s(x1,fac) NA          0.0619  1     -2.04 0.736
 8 s(x1,fac) NA          0.0720  1     -1.98 0.717
 9 s(x1,fac) NA          0.0820  1     -1.93 0.698
10 s(x1,fac) NA          0.0921  1     -1.87 0.680

fs_variable should include fac for this term, and f variable should be fac. The former is known; this was not yet implement when I updated the return object due to by variables. The latter is a bug.

Forthcoming release of ggplot2 and gratia

This is just a note to let you know that the upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr dopplegangers when the new version is released.

Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").

If you have any questions, let me know!

Implement functions to add posterior draws to data

Now that we have predicted_samples() and related functions, should add add_predicted_samples() so that we can add predicted samples to data. Implement for

add_predicted_samples()
add_fitted_samples()
add_posterior_samples()

Also possibly for

add_smooth_samples()

but I'm not sure how this one would work as you need to extend the data many times due to the need to evaluate posterior smooths at a number of locations.

Empty plots with selected smooths and parametric = TRUE in draw.gam()

In the following example, if we plot a selected smooth with parametric = TRUE (the default), we get lots of empty plots.

library('gratia')
library('mgcv')
set.seed(42)
dat <- gamSim(4, n = 400, verbose = FALSE)
m <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)
draw(m, select = 's(x2)fac1')

Take another look at the simulate methods & update

Are the simulate() methods doing what is right/consistent?

They simulate from the posterior of a smooth and one might expect to simulate new data from the model. simulate.gam() and co should work as the base R simulate() methods do, so check on that. But still want to simulate from the posterior of specific smooths or model terms.

This also came up in mjskay/tidybayes#130 so consider the discussion there when revisiting this.

We've also thought about posterior_sample() etc for MRFtools (see eric-pedersen/MRFtools#3 ) so we should try to harmonise the two *mgcv(-related packages.

Add a function to load mgcv quietly

mgcv spits out a brief message on loading, which I want to suppress in the reference output for examples. However, suppressPackageStartupMessages() is an obnoxiously long function name. Write a simple wrapper to library() that uses the obnoxiously-long-named function.

load_mgcv()?

Add rootograms as one of the diagnostic plots

Add rootogram support for several response distributions:

For the Tweedie models can use mgcv::ldTweedie() for the log density of the Tweedie distribution.