stat-ml

Statistical Foundations of Machine Learning

This 15-hour course covers aspects of statistical modelling that are particularly relevant to machine learning.

Chapter 1 reviews the mathematical notions that underlie machine learning. In particular, random variables, probability densities and the empirical estimation of model parameters are rigorously defined and illustrated.
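As an illustration of empirical parameter estimation, the sketch below computes the maximum-likelihood estimates of the mean and variance of a Gaussian sample, together with the log-likelihood ℓ(μ, σ²) = −(n/2)·log(2πσ²) − Σ(x_i − μ)²/(2σ²). It is plain Python (standard library only); `gaussian_mle` and `gaussian_log_likelihood` are hypothetical helper names, not functions from the course notebooks.

```python
import math
import statistics


def gaussian_mle(sample):
    """Maximum-likelihood estimates (mu_hat, sigma2_hat) for an i.i.d. Gaussian sample.

    The variance estimate divides by n (not n - 1), so it is slightly biased.
    """
    n = len(sample)
    mu_hat = sum(sample) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat


def gaussian_log_likelihood(sample, mu, sigma2):
    """Log-likelihood of an i.i.d. sample under N(mu, sigma2)."""
    n = len(sample)
    sq = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)


sample = [1.0, 2.0, 3.0, 4.0]
mu_hat, sigma2_hat = gaussian_mle(sample)
print(mu_hat, sigma2_hat)            # 2.5 1.25
print(statistics.pvariance(sample))  # 1.25 (same n-denominator estimator)
print(statistics.variance(sample))   # unbiased estimator, divides by n - 1
```

Any other choice of (μ, σ²) gives a lower value of `gaussian_log_likelihood` on this sample, which is exactly what "maximum likelihood" means here.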

Chapter 2 is central to this course: it presents the linear regression model from a statistical point of view and raises questions that are essential in machine learning, such as overfitting and cross-validation.
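To make fitting and cross-validation concrete, here is a minimal sketch in plain Python (standard library only; the course notebooks themselves use scikit-learn). It fits the 1D least-squares line in closed form, b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b·x̄, then estimates its prediction error by k-fold cross-validation. `fit_ols_1d` and `kfold_mse` are hypothetical helpers, not part of any library or of the course material.

```python
import random


def fit_ols_1d(xs, ys):
    """Least-squares estimates (a, b) of the line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def kfold_mse(xs, ys, k=5, seed=0):
    """K-fold cross-validation estimate of the prediction MSE of fit_ols_1d."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # shuffle so folds are not ordered by x
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering all points
    fold_mse = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        a, b = fit_ols_1d([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


rng = random.Random(1)
xs = [i / 4 for i in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
print(fit_ols_1d(xs, ys))      # should be close to (1.0, 2.0)
print(kfold_mse(xs, ys, k=5))  # should be of the order of the noise variance, 1.0
```

Each point is predicted by a model that never saw it, so the cross-validated MSE estimates the error on new data rather than the (optimistic) training error.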

These issues are developed in chapter 3, which deals with regularization and cross-validation and also introduces the bias-variance trade-off and the curse of dimensionality.
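The bias-variance trade-off can be illustrated by simulation. The sketch below assumes a 1D model y = 1 + 2x + ε with Gaussian noise, fits a ridge-penalised slope (penalty λb², intercept not penalised, so b̂ = Σ(x_i − x̄)(y_i − ȳ) / (Σ(x_i − x̄)² + λ)) on many simulated datasets, and estimates the squared bias and the variance of b̂ for several values of λ. All names and settings (`ridge_slope_1d`, `bias_variance_of_slope`, the design on [0, 1], the noise level) are illustrative assumptions.

```python
import random


def ridge_slope_1d(xs, ys, lam):
    """Ridge estimate of the slope in y = a + b * x (intercept not penalised)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / (sxx + lam)  # lam = 0 gives ordinary least squares


def bias_variance_of_slope(lam, true_b=2.0, noise_sd=1.0, n=10, n_rep=2000, seed=0):
    """Monte Carlo estimate of (bias^2, variance) of the ridge slope."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]  # fixed design on [0, 1]
    estimates = []
    for _ in range(n_rep):
        ys = [1.0 + true_b * x + rng.gauss(0.0, noise_sd) for x in xs]
        estimates.append(ridge_slope_1d(xs, ys, lam))
    mean_est = sum(estimates) / n_rep
    bias2 = (mean_est - true_b) ** 2
    var = sum((e - mean_est) ** 2 for e in estimates) / n_rep
    return bias2, var


for lam in [0.0, 0.5, 2.0, 10.0]:
    b2, v = bias_variance_of_slope(lam)
    print(f"lambda={lam:5.1f}  bias^2={b2:.3f}  variance={v:.3f}")
```

As λ grows, the variance of the slope shrinks while its squared bias grows; choosing λ by cross-validation is one common way to balance the two.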

Chapters 4 and 5 extend the statistical modeling aspects introduced in chapter 3, showing how modeling randomness through different random variables makes it possible to build a statistical test (chapter 4, on ANOVA) or to estimate the parameters of a relatively complex model from observed data (chapter 5, on mixed models).
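As an illustration of how modeling randomness leads to a test, the sketch below computes the one-way ANOVA F statistic from its definition: the between-group mean square divided by the within-group mean square. It is plain Python with a hypothetical helper `one_way_anova_f`; mixed models are not covered by this sketch.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per factor level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)  # 27.0 2 6
```

Under the null hypothesis of equal group means, with independent Gaussian errors of common variance, F follows a Fisher distribution with (df_between, df_within) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df_b, df_w)` gives the p-value and `scipy.stats.f_oneway` can serve as a cross-check.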

Finally, chapter 6 opens up two classic linear models used both in machine learning and in statistics: logistic regression and the PLS (Partial Least Squares) method.
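For logistic regression, a minimal sketch of maximum-likelihood fitting by batch gradient ascent is shown below, in plain Python. Gradient ascent is assumed here only for readability; libraries typically use quasi-Newton or IRLS solvers, and scikit-learn's `LogisticRegression` applies an L2 penalty by default, so its coefficients will differ. The toy data and the helper names are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w0, w, x):
    """P(y = 1 | x) under the logistic model with intercept w0 and weights w."""
    return sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximise the mean log-likelihood by batch gradient ascent.

    The gradient of the log-likelihood is sum_i (y_i - p_i) * x_i, where
    p_i = predict_proba(w0, w, x_i); the intercept uses a constant feature 1.
    """
    n, d = len(X), len(X[0])
    w0, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - predict_proba(w0, w, xi)  # residual y_i - p_i
            g0 += r
            for j in range(d):
                g[j] += r * xi[j]
        w0 += lr * g0 / n
        w = [wj + lr * gj / n for wj, gj in zip(w, g)]
    return w0, w


X = [[-1.75], [-1.25], [-0.75], [-0.25], [0.25], [0.75], [1.25], [1.75]]
y = [0, 0, 0, 1, 0, 1, 1, 1]  # overlapping classes, so the MLE is finite
w0, w = fit_logistic(X, y)
print(w0, w)
print(math.exp(w[0]))  # odds ratio for a one-unit increase in x
```

Because the log-odds are linear in x, exp(w_j) is the multiplicative change in the odds of y = 1 for a one-unit increase in x_j, which is one reason logistic regression is often used as an interpretable model. The toy data are symmetric around 0, so the fitted intercept stays close to 0.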

Schedule

In practice, lectures and practicals will be organized in four blocks, each containing 1 to 2 hours of lectures and 2 hours of practicals. All documents linked below are in French.

  • Block 1: chapter 1 and the 1D linear regression part of chapter 2 will be covered in class. The practicals will deal with linear regression, outlier detection and an illustration of the concept of maximum likelihood.
  • Block 2: this block covers multivariate linear regression (second part of chapter 2), regularisation and cross-validation (chapter 3). These concepts will be put into practice during the practicals.
  • Block 3: this block focuses on the statistical aspects of the linear model in data science, in particular ANOVA (chapter 4). ANOVA will be studied during the practicals, and some time will also be dedicated to further practice with the concepts of block 2.
  • Block 4: this block covers extensions of the previous methods: mixed models (chapter 5) and the mathematical construction of logistic regression and of the PLS method. The practicals will first deal with PLS and then open up questions on the interpretability of machine learning decision rules, using logistic regression as an example.
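For the outlier detection mentioned in block 1, one standard diagnostic is the internally studentized residual of the least-squares fit, r_i = e_i / (s·√(1 − h_ii)), where e_i is the residual, h_ii = 1/n + (x_i − x̄)² / Σ(x_j − x̄)² is the leverage of point i, and s² = SSE / (n − 2). The sketch below is a plain-Python illustration; the helper name, the toy data and the |r_i| > 2 threshold (a common rule of thumb) are assumptions, not taken from the practical.

```python
import math


def studentized_residuals_1d(xs, ys):
    """Internally studentized residuals of the least-squares line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)             # residual variance, n - 2 dof
    lev = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]  # leverages h_ii
    return [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lev)]


xs = [float(i) for i in range(10)]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 20.0, 13.2, 14.8, 17.1, 18.9]  # ys[5] corrupted
r = studentized_residuals_1d(xs, ys)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2.0]
print(flagged)  # [5]
```

In simple regression, internally studentized residuals are bounded by √(n − 2), so with very few points a gross outlier can still look moderate; externally studentized residuals or Cook's distance are common alternatives.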

Practical sessions

All notebooks are in French.
Using scikit-learn for linear regression
Multiple linear regression and statistical inference
Multiple regression with regularization and cross-validation
Using Pandas and sklearn for real data analysis
ANOVA
Partial Least Squares
Logistic regression and explainability

Chapters

All documents are in French. These chapters cover exactly the same content as the four blocks above.
Chapter 1 Introduction (slides)
Chapter 2 Linear regression (slides)
Chapter 3 Model selection in multiple linear regression (slides)
Chapter 4 Analysis of variance
Chapter 5 Linear mixed model
Chapter 6 Further topics
Appendices

Contributors

lrisser, erachelson, d9w, amtoine
