daviddalpiaz / r4sl

📈 Machine Learning from the perspective of a Statistician using R

Home Page: https://daviddalpiaz.github.io/r4sl/

Languages: TeX 36.31%, CSS 9.67%, R 7.70%, Shell 46.32%

Topics: r, machine-learning, statistical-learning, rmarkdown

r4sl's Introduction

`r4sl`

This text is currently written as a supplement to ISL for use in STAT 432, Basics of Statistical Learning, at the University of Illinois at Urbana-Champaign. Some additional details and topics are included, but the main focus is on providing more thorough R examples. Eventually, the goal is to have a completely self-contained text. However, given its massive popularity in the field, we feel that it is still extremely useful to read ISL in its entirety.

Development Status

April 2020 - Significant updates are expected during the Summer of 2020.
- Re-unification of r4sl with bsl.

r4sl's Issues

One standard error rule details

Could you please provide details on how you get standard errors from cross-validation in caret to implement the one-standard-error rule?
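A minimal base-R sketch of the one-standard-error rule, assuming a hand-rolled k-fold CV loop instead of caret. The built-in `mtcars` data, polynomial degrees 1 to 5 in `hp`, and k = 5 are illustrative choices, not the book's setup. The standard error of each model's mean CV error is taken as the across-fold SD divided by sqrt(k), a common rough approximation since the folds are not fully independent, and the rule then picks the simplest model whose mean CV error is within one SE of the best.

```r
# One-standard-error rule with a hand-rolled k-fold CV (illustrative setup)
set.seed(42)
k <- 5
degrees <- 1:5                                    # candidate model complexities
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# k x length(degrees) matrix: test RMSE of each degree on each held-out fold
fold_rmse <- sapply(degrees, function(d) {
  sapply(seq_len(k), function(fold) {
    train <- mtcars[folds != fold, ]
    test  <- mtcars[folds == fold, ]
    fit   <- lm(mpg ~ poly(hp, degree = d), data = train)
    sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  })
})

cv_mean <- colMeans(fold_rmse)        # CV estimate of RMSE for each degree
cv_sd   <- apply(fold_rmse, 2, sd)    # SD across folds (analogous to caret's RMSESD)
cv_se   <- cv_sd / sqrt(k)            # standard error of the mean CV RMSE

best      <- which.min(cv_mean)
threshold <- cv_mean[best] + cv_se[best]
# simplest (lowest-degree) model whose CV RMSE is within one SE of the best
one_se_degree <- degrees[min(which(cv_mean <= threshold))]

data.frame(degree = degrees, cv_mean, cv_se)
one_se_degree
```

Within caret itself, `trainControl(selectionFunction = "oneSE")` asks `train()` to apply a one-SE rule when choosing the final tuning parameters.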

Fix broken KNN code

Pretty sure the data in some package being used was switched from a data.frame to a tibble, so something that used to be passed as a vector is now a tibble.
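A minimal base-R sketch of the suspected problem, assuming the breakage comes from single-column indexing: `class::knn()` needs `cl` to be a vector (or factor) of labels, but by default a tibble does not drop to a vector under `[`, so `df[, "y"]`, which used to yield a factor, now yields a one-column tibble. The toy data are made up; `drop = FALSE` mimics the tibble behaviour on a plain data.frame, and `[[` extracts a vector in both cases.

```r
library(class)   # provides knn(); ships with R as a recommended package

# Made-up toy training data with two predictors and a factor response
train_df <- data.frame(x1 = c(1, 2, 8, 9),
                       x2 = c(1, 2, 8, 9),
                       y  = factor(c("a", "a", "b", "b")))

class(train_df[, "y"])                # "factor": a data.frame drops to a vector
class(train_df[, "y", drop = FALSE])  # "data.frame": like a tibble's df[, "y"]
class(train_df[["y"]])                # "factor": [[ extracts a vector for both classes

# Passing the labels with [[ works whether train_df is a data.frame or a tibble
pred <- knn(train = train_df[, c("x1", "x2")],
            test  = data.frame(x1 = 1.5, x2 = 1.5),
            cl    = train_df[["y"]],
            k     = 3)
pred
```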

Adding Python code

Hi David,

This is Nima, and I am almost at the end of my PhD program.

I was searching for something related to Introduction to Statistical Learning and found your work (first the PDF, which then led me to this repo). Such great work!

I have had a similar idea about this book. My idea was to create an interactive environment so that readers can play around with the code and try the "what if" questions they have in mind. I recently uploaded three of my notebooks here: blog (it might seem a little bit messy, and I may go through it later to refine it).

I am using Python, and I know it is usually harder to get some statistics there (e.g., I think regression in scikit-learn does not provide p-values, or at least I couldn't find an easy way to get them, so I had to use the statsmodels library). My goal is to provide Python code not just for the labs and exercises but also for the examples in the book, and to try to reproduce them. I also provided some links to Stack Overflow, YouTube videos, etc. where I felt readers might benefit from getting into some details (e.g., I added a link that explains why we need to z-normalize before PCA).

I was wondering if you have any feedback or suggestions about my work. Also, do you think readers of your book would benefit from my notebooks? To be honest, the format you used for writing the book is very nice and clean, and it would be really great if I could use your insight on cleaning up my notebooks and (if you are interested) adding them to the book. For example, maybe just provide a hyperlink that takes readers to a notebook where they can play with the code (or maybe try it on their own first and then click on it to see the difference, and so on).

Please let me know what you think. Sorry for the long post; I got super excited when I found out about this repo :)
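The point about z-normalizing before PCA can be seen with R's built-in `USArrests` data, the data set used in ISL's PCA lab. This sketch uses base `prcomp()` with and without `scale. = TRUE`; without scaling, `Assault`, whose variance is far larger than that of the other columns, dominates the first principal component.

```r
# Column variances differ by orders of magnitude (Assault is in the thousands)
round(apply(USArrests, 2, var), 1)

pca_raw    <- prcomp(USArrests)                 # centered only
pca_scaled <- prcomp(USArrests, scale. = TRUE)  # centered and scaled to unit variance

# First-component loadings: the unscaled PC1 is essentially just Assault
round(pca_raw$rotation[, "PC1"], 3)
round(pca_scaled$rotation[, "PC1"], 3)
```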

Easier Access to .Rmd Files

Maybe hard-code for now, and find a better solution later.

Typo 'glment'

It's in chapter 25. It should be glmnet.

Also, is this repo actively updated?

External References

Gather the external references that are usually posted on the course website at the end of chapters. Also consider ISL and mathematicalmonk videos.

Remove `R` Chapters

Instead, reference appliedstats. Wait until appliedstats is updated.

CV Plots

Tables

Change all tables to knitr::kable().
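A minimal sketch of the `knitr::kable()` pattern, assuming tables are currently printed as raw data frames; `mtcars` and the caption are placeholders.

```r
library(knitr)

# Instead of printing head(mtcars) directly, render it as a formatted table.
# When knitting, kable() picks the output format itself; here it is forced to a
# Markdown pipe table so the result also prints sensibly in the console.
tbl <- kable(head(mtcars[, 1:4]), format = "pipe", digits = 1,
             caption = "First rows of mtcars")
tbl
```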

Cache code chunks

As you continue to add code that is computationally intensive, you will need to start caching the code chunk results. This in turn means you should start labeling your code chunks (otherwise, they're treated like cattle). If you don't, we'll have to move you to pushing a rendered product via a Netlify setup.
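A minimal sketch of turning caching on for a whole chapter from its setup chunk, using knitr's `opts_chunk$set()`. Alternatively, individual expensive chunks can get a label and the option in their header, e.g. `{r knn-tuning, cache = TRUE}` (the label name is hypothetical). Labels matter because unlabeled chunks get auto-numbered names (`unnamed-chunk-1`, ...) that shift whenever a chunk is inserted, which defeats the cache.

```r
# Setup chunk: cache the results of every chunk in this document by default
knitr::opts_chunk$set(cache = TRUE)

# Check the current default (should now be TRUE)
knitr::opts_chunk$get("cache")
```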

caret's model$results outputs SD, not SE, right?

Here you take RMSESD from the `results` element of a train (model) object and then refer to it as se and standard error in the text. However, SD commonly refers to standard deviation. I think something has to be done to calculate the standard error from the standard deviation. Or is the output of caret misleading?
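If `RMSESD` is the standard deviation of RMSE across resamples, which is how caret's `train()` results read, then the standard error of the averaged RMSE is roughly `RMSESD / sqrt(num)`, where `num` is the number of resamples; caret's own `oneSE()` selection function likewise takes the number of resamples as its `num` argument. A minimal sketch with a hypothetical helper `add_se()` and made-up numbers:

```r
# Hypothetical helper: add a standard-error column next to a caret-style SD column.
# num = number of resamples (e.g. 10 for 10-fold CV, 25 for caret's default bootstrap)
add_se <- function(results, metric = "RMSE", num) {
  sd_col <- paste0(metric, "SD")
  results[[paste0(metric, "SE")]] <- results[[sd_col]] / sqrt(num)
  results
}

# Made-up numbers, only to illustrate the arithmetic
toy <- data.frame(k      = c(1, 5, 9),
                  RMSE   = c(3.0, 2.5, 2.7),
                  RMSESD = c(1.0, 0.6, 0.8))
add_se(toy, metric = "RMSE", num = 4)   # RMSESE = 0.5, 0.3, 0.4
```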

Switch from `caret` to `tidymodels` + FES?

Move from caret toward the tidymodels org, with respect to parsnip and recipes.

Consider looking at Max's Feature Engineering and Selection: A Practical Approach for Predictive Models as well, instead of just ISLR.

Another view of Sim Bias Variance Tradeoff

First, I really like the Bias Variance Tradeoff Simulation section! Thank you very much for making such a great book.

One suggestion from me.

Figure 1 is not very intuitive for me; it shows too much information.

Alternatively, I think the following visualization may work better: it shows only the predictions at x = 0.95 across the different models. From the four subplots we can clearly see both bias and variance.

Here is the code.

  # Assumes x0 (= 0.95), f(), n_sims and the n_sims x 4 matrix `predictions`
  # from the simulation section are already defined
  par(mfrow = c(1, 4))
  model_cols <- c("red", "blue", "green", "orange")
  for (j in 1:4) {
    # predictions of model j at x0 across all simulated data sets
    plot(rep(x0, n_sims), predictions[, j], col = model_cols[j],
         xlim = c(0.75, 1), ylim = c(0, 1.5),
         xlab = "x", ylab = "Predicted value at x0")
    # true value f(x0)
    points(x0, f(x0), col = "black", pch = "x", cex = 2)
  }
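The plotting code assumes that `x0`, `f()`, `n_sims` and an `n_sims` x 4 `predictions` matrix already exist. A self-contained sketch that builds them follows; the true function `f(x) = x^2`, the noise SD of 0.3, the sample size of 100, the 250 simulations and the polynomial degrees 0, 1, 2 and 9 are all assumptions for illustration, not necessarily the book's settings. It also reports the squared bias and variance at `x0`, the two quantities the four panels show visually.

```r
set.seed(1)
f <- function(x) x^2      # assumed true regression function
x0 <- 0.95                # point at which the models' predictions are compared
n_sims <- 250
sample_size <- 100
degrees <- c(0, 1, 2, 9)  # assumed model complexities, one per panel

# Row i holds the four models' predictions at x0 for simulated data set i
predictions <- matrix(NA_real_, nrow = n_sims, ncol = length(degrees))
for (i in seq_len(n_sims)) {
  x <- runif(sample_size)
  sim_data <- data.frame(x = x, y = f(x) + rnorm(sample_size, sd = 0.3))
  for (j in seq_along(degrees)) {
    form <- if (degrees[j] == 0) y ~ 1 else
      as.formula(paste0("y ~ poly(x, degree = ", degrees[j], ")"))
    fit <- lm(form, data = sim_data)
    predictions[i, j] <- predict(fit, newdata = data.frame(x = x0))
  }
}

# Monte Carlo estimates of squared bias and variance at x0 for each model
bias_sq  <- (colMeans(predictions) - f(x0))^2
variance <- apply(predictions, 2, var)
round(data.frame(degree = degrees, bias_sq, variance), 4)
```

With these objects in place, the plotting code above runs unchanged.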
