proback / beyondmlr

Repo for January 2021 version of Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R. The rendered version can be found at: https://bookdown.org/roback/bookdown-BeyondMLR/

TeX 97.57% CSS 0.20% R 2.23%

beyondmlr's Introduction

BeyondMLR

Copyright

© 2021 by Taylor & Francis Group, LLC. Except as permitted under U.S. copyright law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by an electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

beyondmlr's Issues

Answers to the questions at the end of each chapter?

Is it possible to get answers to the questions? I want to check whether my understanding is correct and improving.

HMLdiag -> HLMdiag

In index.Rmd, HMLdiag should be HLMdiag

Chapters 9 & 11 Fail to Knit

Problem

Chapters 9 and 11 are failing to knit due to changes in dplyr (1.0) or broom (0.7).

I consulted this StackOverflow post for guidance (and provided my own solution) to fix the error, which occurs at the locations below.

BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 400 in e6ebd56: tidy(fit) %>%
BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 407 in e6ebd56: tidy(fit) %>%
BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 414 in e6ebd56: glance(fit) %>%
BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 461 in e6ebd56: tidy(fit) %>%
BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 468 in e6ebd56: tidy(fit) %>%
BeyondMLR/09-Two-Level-Longitudinal-Data.Rmd, line 475 in e6ebd56: glance(fit) %>%
BeyondMLR/11-Generalized-Linear-Multilevel-Models.Rmd, line 523 in e6ebd56: tidy(fit) %>%
BeyondMLR/11-Generalized-Linear-Multilevel-Models.Rmd, line 628 in e6ebd56: tidy(fit) %>%

Example Solution

According to its documentation, do() has been superseded, with a recommendation to use nest_by() instead. Conveniently, the documentation examples cover almost exactly this use case (see details):

library(dplyr)
by_cyl <- mtcars %>% group_by(cyl)  # grouped data used by the do() examples

# do() with named arguments becomes nest_by() + mutate() & list()
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
# ->
models <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))
models %>% summarise(rsq = summary(mod)$r.squared)

# use broom to turn models into data
models %>% do(data.frame(
  var = names(coef(.$mod)),
  coef(summary(.$mod)))
)
# ->
if (requireNamespace("broom")) {
  models %>% summarise(broom::tidy(mod))
}

For the chunk containing errors on lines 400-414:

regressions <- smallchart.long %>% 
  nest_by(schoolid) %>% 
  mutate(fit = list(lm(MathAvgScore ~ year08, data=data)))

sd_filter <- smallchart.long %>%
  group_by(schoolid) %>%
  summarise(sds = sd(MathAvgScore)) 

regressions <- regressions %>%
  right_join(sd_filter, by="schoolid") %>%
  filter(!is.na(sds))

lm_info1 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = year08, int = `(Intercept)`)

lm_info2 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, std.error) %>%
  spread(key = term, value = std.error) %>%
  rename(se_rate = year08, se_int = `(Intercept)`)

lm_info <- regressions %>%
  summarise(glance(fit)) %>%
  ungroup() %>%
  select(schoolid, r.squared, df.residual) %>%
  inner_join(lm_info1, by = "schoolid") %>%
  inner_join(lm_info2, by = "schoolid") %>%
  mutate(tstar = qt(.975, df.residual), 
         intlb = int - tstar * se_int, 
         intub = int + tstar * se_int,
         ratelb = rate - tstar * se_rate, 
         rateub = rate + tstar * se_rate)

This solution can be copied nearly line for line to fix the errors on lines 461-475.

Chapter 9 also fails to knit here because the model fails to converge. Raising the maximum number of iterations to 500 (msMaxIter=500) seemed to do the trick:

hcs.lme=lme(MathAvgScore ~ year08 * charter, chart.long, 
  random =  ~ 1 | schoolid, na.action=na.exclude,
  correlation=corCompSymm(form = ~ 1 |schoolid), 
  weights=varIdent(form = ~1|year08), control = lmeControl(msMaxIter=500))

summary(hcs.lme)
# Linear mixed-effects model fit by REML
#   Data: chart.long 
#       AIC     BIC  logLik
#   10299.2 10348.3 -5140.6
# 
# Random effects:
#  Formula: ~1 | schoolid
#         (Intercept) Residual
# StdDev: 0.002264717 6.534915
# 
# Correlation Structure: Compound symmetry
#  Formula: ~1 | schoolid 
#  Parameter estimate(s):
#      Rho 
# 0.8209145 
# Variance function:
#  Structure: Different standard deviations per stratum
#  Formula: ~1 | year08 
#  Parameter estimates:
#        0        1        2 
# 1.000000 1.127902 1.079423 
# Fixed effects:  MathAvgScore ~ year08 * charter 
#                   Value Std.Error   DF   t-value p-value
# (Intercept)    652.3347 0.2828597 1113 2306.2126  0.0000
# year08           1.1831 0.0907869 1113   13.0320  0.0000
# charter         -5.9106 0.8611940  616   -6.8633  0.0000
# year08:charter   0.8316 0.3032040 1113    2.7426  0.0062
#  Correlation: 
#                (Intr) year08 chartr
# year08         -0.208              
# charter        -0.328  0.068       
# year08:charter  0.062 -0.299 -0.308
# 
# Standardized Within-Group Residuals:
#        Min         Q1        Med         Q3        Max 
# -4.9760770 -0.4490767  0.0865079  0.5669240  3.0970658 
# 
# Number of Observations: 1733
# Number of Groups: 618 

hcs.lme$modelStruct
# reStruct  parameters:
#  schoolid 
# -7.967465 
# corStruct  parameters:
# [1] 1.998216
# varStruct  parameters:
# [1] 0.1203593 0.0764270

anova(hcs.lme,cs.lme)   # hcs not converging here
#         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
# hcs.lme     1  9 10299.20 10348.30 -5140.600                        
# cs.lme      2  7 10315.94 10354.13 -5150.973 1 vs 2 20.74528  <.0001

Finally, in Chapter 11, library(broom) is missing and a handful of unscoped select() calls need the dplyr:: prefix.

BeyondMLR/11-Generalized-Linear-Multilevel-Models.Rmd, line 327 in e6ebd56: select(-group)
BeyondMLR/11-Generalized-Linear-Multilevel-Models.Rmd, line 478 in e6ebd56: select(.,4, 5, 6, 7, 9, 10, 15, 19)

Hope this unsolicited help is, well, helpful!

Link to 404 in Preface

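The link to the instructor solutions manual brings the user to https://bookdown.org/roback/bookdown-BeyondMLR/www.routledge.com, which displays a 404 error. The link target in index.Rmd, www.routledge.com, has no URL scheme (such as https://), so it is resolved relative to the book's own address.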

BeyondMLR/index.Rmd

Line 53 in 8614a5d

 Three types of exercises are available for each chapter. **Conceptual exercises** ask about key ideas in the contexts of case studies from the chapter and additional research articles where those ideas appear. **Guided exercises** provide real data sets with background descriptions and lead students step-by-step through a set of questions to explore the data, build and interpret models, and address key research questions. Finally, **Open-ended exercises** provide real data sets with contextual descriptions and ask students to explore key questions without prescribing specific steps. A solutions manual with solutions to all exercises is available to qualified instructors at our [book’s website](www.routledge.com). 

Missing library declarations in index.Rmd

In index.Rmd, line 146, the following block:

{r, echo=FALSE}
package_info(needed_pkgs) %>% 
  as_tibble() %>% 
  filter(attached == TRUE) %>% 
  dplyr::select(package, version = ondiskversion) %>% 
  kable(
    booktabs = TRUE, 
    linesep = "",
    longtable = TRUE,
    col.names = c("R package", "Version used")
  ) %>% 
  kable_styling(font_size = ifelse(is_latex_output(), 9, 16))

needs to start with the following lines for the compilation to succeed:

library(tibble)
library(dplyr)

Section 6.4.1 probability typo

In Section 6.4.1, we read:

When the goalkeeper’s team is behind, the probability of a successful penalty kick is p = 22/24 or 0.833.

This is incorrect. The probability is 0.917.

22 / 24

[1] 0.9166667

Translation

Hi,

I found this book very useful and would like to translate it into my native language, Japanese. I have checked the license (CC-BY-NC-SA), so I think I can translate it and publish the translation for free on bookdown. If I later wanted to publish a hard-copy version, would I need permission?

Final chapter title becomes document title

I have made the changes indicated in Issues 10, 11, 12, and 13.

After making those changes and updating to the latest version of knitr (as of 23 April) with remotes::install_github('yihui/knitr'), I get the following problem.

The title of the final chapter becomes the title of the document overall. Thus, the title of the document built from that repo isn't "Beyond Multiple Linear Regression" but instead "Chapter 11: Multilevel Generalized Linear Models."

I commented out the final chapter in _bookdown.yml (see below), with the result that Chapter 10's title became the overall document title:

book_filename: "bookdown-BeyondMLR"
chapter_name: "Chapter "
output_dir: docs
rmd_files: ["index.Rmd", 
  "01-Introduction.Rmd",
  "02-Beyond-Most-Least-Squares.Rmd",
  "03-Distribution-Theory.Rmd",
  "04-Poisson-Regression.Rmd",
  "05-Generalized-Linear-Models.Rmd",
  "06-Logistic-Regression.Rmd",
  "07-Correlated-Data.Rmd",
  "08-Introduction-to-Multilevel-Models.Rmd",
  "09-Two-Level-Longitudinal-Data.Rmd",
  "10-Multilevel-Data-With-More-Than-Two-Levels.Rmd",
  # "11-Generalized-Linear-Multilevel-Models.Rmd",
  "99-References.Rmd"]
clean: [packages.bib, bookdown.bbl]

A commenter over at SO suggested that something in the index.Rmd could be causing this issue, but I could not find anything obvious…

Missing Section in Chapter 9

The final paragraph in section 9.3.1 says that details on how to convert data from wide to long in R can be found in section 9.8. However, section 9.8 only has information on model fitting using R.

Missing variable from data (Chapter 4.10)

Thank you for putting together this amazing book!

The data set weekendDrinks.csv used in Section 4.10 seems to be missing the variable firstYear. I would be grateful if you could add it.

Question/Suggestion for Chapter 4

Dear authors,

I have found your book quite helpful in understanding some important aspects of Poisson regression - thank you so much for making this great resource available for free!

I have a minor suggestion, which may be misguided both in content and in its place on GitHub; please delete it if that is the case.

In Section 4.4.3, Estimation and Inference, you have the sentence: "or decreases by 0.5% (since 1−.995=.005)". This took me aback when calculating my coefficients, because all of a sudden I got negative percentages where I should have seen an increase. I therefore looked at other resources, in particular Wooldridge's Introductory Econometrics: A Modern Approach, 5th Ed. (page 608), and found that it is more intuitive to do the subtraction the other way around: .995-1 = -.005. You then get a negative percentage for a decrease and a positive percentage for an increase.

I would love to learn more about it, in case I am wrong here.

Thanks again for your work and all the best,
Paul
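
The two conventions in this issue differ only in sign, which can be checked numerically. The following is a minimal sketch in R, assuming a hypothetical coefficient chosen so that exp(beta) equals the 0.995 quoted above; the variable names are illustrative and do not come from the book.

```r
# Hypothetical Poisson regression coefficient, chosen so that exp(beta) = 0.995
beta <- log(0.995)

# Multiplicative effect on the mean count per one-unit increase in the predictor
rate_ratio <- exp(beta)

# Signed percent change: negative for a decrease, positive for an increase
pct_change <- 100 * (rate_ratio - 1)

# Size of the decrease, following the book's wording (1 - 0.995)
pct_decrease <- 100 * (1 - rate_ratio)

round(c(rate_ratio = rate_ratio,
        pct_change = pct_change,
        pct_decrease = pct_decrease), 3)
```

Both quantities describe the same 0.5% change. The signed version has the advantage that one formula, 100 * (exp(beta) - 1), covers increases and decreases alike.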

Typo in Ch3, Gamma example

Hi folks,

One of my students spotted that in Chapter 3, the section on gamma, there may be a small typo in the example. It is currently lines 447–449 in the version on GitHub.

Suggest that:

\begin{align*}
P(Y < 3) = \int_0^3 \frac{2^4}{\Gamma(5)} y^{4} e^{-2y}dy = 0.715.
\end{align*}

should be:

\begin{align*}
P(Y < 3) = \int_0^3 \frac{2^5}{\Gamma(5)} y^{4} e^{-2y}dy = 0.715.
\end{align*}

Change is in the numerator, 2^4 -> 2^5, as r is 5.
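
The proposed correction can be checked numerically. The sketch below assumes the gamma distribution in the example has shape r = 5 and rate lambda = 2, as implied by the integrand; it compares the integral of the corrected density with R's built-in gamma CDF, and also evaluates the integral with the original 2^4 numerator.

```r
# Gamma density with shape r = 5 and rate lambda = 2, written out explicitly
r <- 5
lambda <- 2
gamma_pdf <- function(y) lambda^r / gamma(r) * y^(r - 1) * exp(-lambda * y)

# P(Y < 3) by numerical integration of the corrected density (numerator 2^5)
p_integrated <- integrate(gamma_pdf, lower = 0, upper = 3)$value

# The same probability from the built-in gamma CDF
p_builtin <- pgamma(3, shape = r, rate = lambda)

# The integral as originally printed (numerator 2^4) is half as large
printed_integrand <- function(y) lambda^(r - 1) / gamma(r) * y^(r - 1) * exp(-lambda * y)
p_printed <- integrate(printed_integrand, lower = 0, upper = 3)$value

round(c(integrated = p_integrated, builtin = p_builtin, printed = p_printed), 3)
```

Only the 2^5 version reproduces the stated value of 0.715, which is consistent with the numerator being a typo rather than the final answer being wrong.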

Missing Response variable from data (Chapter 8 exercise)

This concerns the open-ended exercise in Chapter 8 (8.13.3): the response variable ambiguity is not found in the ambiguity.csv file.

Incomplete Weekend Drinking dataset in Subsection 4.10

Hi,
It seems that the Weekend Drinking dataset (weekendDrinks.csv) used in Subsection 4.10 is incomplete.
The provided data does not contain the firstYear column.
Thanks!

Offset in Negative Binomial Regression

In Chapter 4 (Poisson Regression), when the negative binomial model is estimated, the way the offset is included seems to be wrong. The results when "offset(enroll1000)" is added to the formula differ considerably from the results when "offset(log(enroll1000))" is added as an additional argument. Could you check that? Thanks for this extraordinary job!
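
The effect of the offset's scale can be illustrated with simulated data. This is a sketch under stated assumptions, not the book's analysis: it uses a Poisson model fitted with base R's glm() rather than the negative binomial fit from the chapter, and the data frame dat and its variables are hypothetical, with exposure standing in for enroll1000. With a log link the offset must be on the log scale, and the same reasoning is expected to carry over to a negative binomial model with a log link.

```r
set.seed(1)

# Simulated data (hypothetical): counts proportional to an exposure variable
n <- 200
exposure <- runif(n, min = 1, max = 5)   # stands in for enroll1000
x <- rnorm(n)
y <- rpois(n, lambda = exposure * exp(0.3 + 0.5 * x))
dat <- data.frame(y, x, exposure)

# Offset on the log scale, supplied inside the formula
fit_formula <- glm(y ~ x + offset(log(exposure)), family = poisson, data = dat)

# The same offset supplied through the offset argument
fit_argument <- glm(y ~ x, offset = log(exposure), family = poisson, data = dat)

# Offset left on the raw scale: a different, misspecified model
fit_raw <- glm(y ~ x + offset(exposure), family = poisson, data = dat)

rbind(formula  = coef(fit_formula),
      argument = coef(fit_argument),
      raw      = coef(fit_raw))
```

The first two fits give the same coefficients, while the raw-scale offset gives clearly different ones, which matches the discrepancy described in this issue.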

win_pct in NBA1718team.csv

It looks like the win_pct column is the number of wins rather than the percentage.