Git Product home page Git Product logo

datatrain_workshop_ml's Introduction

Workshop: Data Train - Intro to ML

Workshop component to the Machine Learning workshop in the Data Train 2023 series.

Repository contains .Rmd notebooks for hands-on ML demos & exercises.
Example code for exercises will be made available after the exercises.

Workshop Quick Start Instructions:

  1. In RStudio, install the usethis package and enter
usethis::create_from_github(repo = "https://github.com/bips-hb/datatrain_workshop_ml.git", fork = FALSE)

This will create a new RStudio project with the workshop materials.
See more detailed installation instructions below.

  1. Open the project and run renv::restore(prompt = FALSE) to install required R packages.

If the following example code produces a plot, you're probably good to go:

library(mlr3verse)
rr <- resample(tsk("sonar"), lrn("classif.ranger", predict_type = "prob"), rsmp("bootstrap", repeats = 5))
autoplot(rr, type = "prc")

(You'll learn what that piece of code does in the workshop :)

Agenda

TBD

Structure

  • July 4+5, 2023, 9:00 AM - 5:00 PM
  • UNICOM 7.1020, Large Seminar Room, Mary-Somerville-Str. 7, 28359 Bremen

Possibly subject to change:

  • 4 * 3-hour blocks, each ~90min Theory, short break, ~90min Hands-on
  • Hands-on exercises on local devices, bring a laptop!
  • Notebooks in this repository correspond to blocks in workshop

Installation Instructions

  1. Install R for your platform: https://cran.r-project.org/
  • Installation instructions depend in whether you're using Windows, Linux (whichever flavor), or macOS.
  • We assume R version 4.3.0 but all recent versions should work fine.
  1. Install RStudio Desktop: https://posit.co/download/rstudio-desktop/

  2. Create a local copy of this workshop repository (https://github.com/bips-hb/datatrain_workshop_ml.git), using any one of these options (use whichever you are most familiar with): a) Using R and the usethis package: usethis::create_from_github(repo = "https://github.com/bips-hb/datatrain_workshop_ml.git", fork = FALSE) b) Running this in the terminal: git clone https://github.com/bips-hb/datatrain_workshop_ml.git c) Using RStudio's New Project -> Version Control dialog to clone the repository.

  3. Install R packages required for the workshop by opening the workshop repository in RStudio (e.g. double-clicking the datatrain_workshop_ml.Rproj file)
    and run renv::restore(prompt = FALSE).
    {renv} will automatically install all R packages with the correct versions listed in renv.lock.

In some cases, installation with renv might fail. Should that be the case, delete the .Rprofile file or comment out the line source("renv/activate.R") in that file, and then refer to the manual package installation instructions below.

Manual package installation instructions

You should only need to install all packages manually if you were not able to use renv to install them automatically. (Or if you're trying to get this code to run in a different environment than this repository)

packages <- c(
  # Data
  "palmerpenguins", "mlr3data",
  # Learner backends
  "ranger", "xgboost", "kknn", "rpart", "e1071", "randomForest",
  "mlr3verse", # installs "mlr3", "mlr3learners", "mlr3viz", "mlr3tuning" ...
  "precrec", # ROC plots via mlr3, not auto-installed with mlr3viz
  "apcluster",
  # Viz / interpretability
  "rpart.plot", "iml", # "vip", "pdp",
  # Plotting / infrastructure, goodies
  "rmarkdown", "ggplot2", "patchwork", "usethis", "dplyr", "purrr", "ragg",
  "remotes", "styler"
)

# Install packages if not available already
sapply(packages, function(x) {
  if (!requireNamespace(x)) install.packages(x)
})

You can also try using {pak} for installation, which may try to automatically install system dependencies on Linux (see next note) if possible:

install.packages("pak")

pak::pak(packages)

Linux Note

If you're working on a Linux distribution such as Ubuntu (or something Ubuntu-based), you may have to install some system packages with sudo apt-get install ... beforehand. To get the system requirements needed, you can use the {remotes} package like so:

install.packages("remotes")

# Get system requirements for Ubuntu 20.04 (the package names likely won't differ between releases)
pkg_reqs <- sapply(packages, function(x) {
  remotes::system_requirements("ubuntu-20.04", package = x)
})

cat(paste("sudo", sort(unique(unlist(pkg_reqs))), collapse = "\n"))

Which will output the required apt-get install lines you can run to install everything.
For Ubunt 22.04, it would look like this, which you can run in the terminal of your choice:

sudo apt-get install -y git
sudo apt-get install -y libcurl4-openssl-dev
sudo apt-get install -y libfontconfig1-dev
sudo apt-get install -y libfreetype6-dev
sudo apt-get install -y libfribidi-dev
sudo apt-get install -y libgit2-dev
sudo apt-get install -y libglpk-dev
sudo apt-get install -y libgmp3-dev
sudo apt-get install -y libharfbuzz-dev
sudo apt-get install -y libicu-dev
sudo apt-get install -y libjpeg-dev
sudo apt-get install -y libpng-dev
sudo apt-get install -y libssl-dev
sudo apt-get install -y libtiff-dev
sudo apt-get install -y libxml2-dev
sudo apt-get install -y make
sudo apt-get install -y pandoc
sudo apt-get install -y zlib1g-dev

Using docker

If the installation instructions above fail for some reason, you can try using Docker as a sort of last resort. You can find out how to install it at https://docs.docker.com/engine/install/. For some more information specifically in the context of R, you can skim https://solutions.posit.co/envs-pkgs/environments/docker/, and for the popular set of images available from the rocker project, see https://rocker-project.org/images/.

The included Dockerfile will create an image will the required packages and dependencies all installed, and afterwards it launches an RStudio server instance you cann access via your browser.

Build the image with this command:

docker build -t self/mlworkshop:1.0 .

On an arm64 Mac (with an M1 or M2 processor), you will need to run the following instead:

docker build --platform linux/amd64 -t self/mlworkshop:1.0 .

Building the image will take a few minutes, but at least you won't have compilation issues.

Afterwards you can run the image with this command, which will make the workshop materials in the current directory ($(pwd)) available at the home directory where the RStudio instance will start at (/home/rstudio/). It also sets the port where the server will be available to you in the browser (the 8787 on the left side of the :, do not change the right side!), and it disables password authentication for simplicity.

docker run \
  -v "$(pwd):/home/rstudio/" \
  -p 8787:8787 \
  -e DISABLE_AUTH=true \
  self/mlworkshop:1.0

Once it's running, visit http://localhost:8787 in your browser and you should be greeted by an RStudio window. This repository is set up such that it tries to detect whether you're running inside a Docker container and will not load renv in that case, as that would only make things more complicated.

datatrain_workshop_ml's People

Contributors

jemus42 avatar mnwright avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.