Statistics and machine learning: from undergraduate to research

by Edgar Dobriban, Associate Prof. of Statistics & Data Science, Wharton; w/ Secondary Appointment in Computer and Information Science, Univ. of Pennsylvania

  • This repository contains links to references (books, courses, etc.) that are useful for learning statistics and machine learning, as well as some neighboring topics. References for background material such as linear algebra, calculus/analysis/measure theory, and probability theory are usually not included.

  • The level of the references starts from advanced undergraduate stats/math/CS and in some cases goes up to the research level. The books are often standard references and textbooks, used at leading institutions. In particular, several of the books are used in the standard curriculum of the PhD program in Statistics at Stanford University (where I learned from them as well), as well as at the University of Pennsylvania (where I work). The goal is to benefit students, researchers seeking to enter new areas, and lifelong learners.

  • For each topic, materials are listed in rough order from basic to advanced.

  • The list is highly subjective and incomplete, reflecting my own preferences, interests, and biases. For instance, there is an emphasis on theoretical material. Most of the references included here are ones that I have at least partially (and sometimes extensively) studied and found helpful. Others are on my to-read list. Several topics are omitted due to lack of expertise (e.g., causal inference, Bayesian statistics, time series, sequential decision-making, functional data analysis, biostatistics, ...).

  • The links are to freely available author copies if those are available, or to online marketplaces otherwise (you are encouraged to search for the best price).

  • How to use these materials to learn: To be an efficient researcher, certain core material must be mastered. However, there is too much specialized knowledge, and it can be overwhelming to know it all. Fortunately, it is often enough to know what type of results/methods/tools are available, and where to find them. When they are needed, they can be recalled and used.

  • Please feel free to contact me with suggestions.

Statistics

Principles and overview

  • Casella & Berger: Statistical Inference (2nd Edition) - Possibly the best introduction to the principles of statistical inference at an advanced undergraduate level. Mathematically rigorous but not technical. Covers key ideas and tools for constructing and evaluating estimators:
    • Data reduction (sufficiency, likelihood principle),
    • Methods for finding estimators (method of moments, Maximum likelihood estimation, Bayes estimators), methods for evaluating estimators (mean squared error, bias and variance, best unbiased estimators, loss function optimality),
    • Hypothesis testing (likelihood ratio tests, power), confidence intervals (pivotal quantities, coverage),
    • Asymptotics (consistency, efficiency, bootstrap, robustness).
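The evaluation criteria above (bias, variance, mean squared error) can be seen concretely in a small simulation. The sketch below, not taken from the book, compares the unbiased sample variance with the maximum likelihood variance estimator for normal data; all variable names are my own:

```python
import numpy as np

# Compare two estimators of the variance sigma^2 of a normal distribution:
# the unbiased estimator (divides by n - 1) and the MLE (divides by n).
rng = np.random.default_rng(0)
n, reps, sigma2 = 10, 100_000, 1.0

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1
s2_mle = samples.var(axis=1, ddof=0)       # divides by n (MLE)

bias_unbiased = s2_unbiased.mean() - sigma2   # theory: 0
bias_mle = s2_mle.mean() - sigma2             # theory: -sigma2 / n
mse_unbiased = ((s2_unbiased - sigma2) ** 2).mean()
mse_mle = ((s2_mle - sigma2) ** 2).mean()

print(f"bias: unbiased {bias_unbiased:+.4f}, MLE {bias_mle:+.4f}")
print(f"MSE:  unbiased {mse_unbiased:.4f}, MLE {mse_mle:.4f}")
```

For normal data the biased MLE trades a small bias for lower variance and ends up with smaller MSE, a classic instance of the bias-variance tradeoff discussed under "methods for evaluating estimators."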
  • Wasserman: All of Statistics: A Concise Course in Statistical Inference - A panoramic overview of statistics; mathematical but proofs are omitted. Covers material overlapping with ESL, TSH, TPE (abbreviations defined below), and other books in this list.
  • Cox: Principles of Statistical Inference - Covers a number of classical principles and ideas such as pivotal inference, ancillarity, conditioning, including famous paradoxes. Light on math, but containing deep thoughts.

Statistical Methodology

Statistical Theory

Core Theory: First Year PhD Curriculum

Advanced Theory

This section is the most detailed one, as it is the closest to my research.

Non-parametrics, minimax lower bounds

  • Tsybakov: Introduction to Nonparametric Estimation - The first two chapters contain many core results and techniques in nonparametric estimation, including lower bounds (Le Cam, Fano, Assouad).
  • Weissman, Ozgur, Han: Stanford EE 378 course materials and lecture notes - Possibly the most comprehensive set of materials on information-theoretic lower bounds, including estimation and testing (Ingster's method), with examples from high-dimensional problems, optimization, etc.
  • Johnstone: Gaussian estimation: Sequence and wavelet models - Beautiful overview of estimation in Gaussian noise (shrinkage, wavelet thresholding, optimality). Rigorous and deep, has challenging exercises.
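As a flavor of the lower-bound techniques covered in these references, here is one common form of Le Cam's two-point method (stated up to constants; see Tsybakov's Chapter 2 for precise versions). For a semidistance d and any estimator, testing between two hypotheses θ₀, θ₁ gives

```latex
\inf_{\hat\theta}\;\max_{j\in\{0,1\}}\;
\mathbb{E}_{\theta_j}\, d(\hat\theta,\theta_j)
\;\ge\;
\frac{d(\theta_0,\theta_1)}{4}
\,\bigl(1-\mathrm{TV}(P_{\theta_0},P_{\theta_1})\bigr),
```

where TV is the total variation distance between the two data distributions. Choosing θ₀, θ₁ far apart in d but with nearly indistinguishable distributions yields a minimax lower bound.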

Overviews of statistical machine learning theory

Semiparametrics

Multivariate statistical analysis

Subsampling

Empirical processes

High dimensional (mean field, proportional limit) asymptotics; random matrix theory (RMT) for stats+ML

Applications and case studies

Machine Learning

ML Theory

Deep Learning

DL Practice and Conceptual Understanding

Safe AI

DL Theory

This is subject to active development and research. There is no complete reference.

Language Models

Uncertainty quantification

Complements

Optimization

Probability

Concentration inequalities

Chaining

  • Talagrand: Upper and Lower Bounds for Stochastic Processes - Chaining is a theoretical tool invented by Talagrand that can often give optimal bounds on the tail behavior of stochastic processes, even when standard concentration inequalities fail to do so. This is a readable but rigorous and complete reference by the inventor of the theory.
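The simplest output of chaining is Dudley's entropy integral bound, which the generic chaining in Talagrand's book then sharpens. For a centered process (X_t) with subgaussian increments with respect to a metric d on T,

```latex
\mathbb{E}\,\sup_{t\in T} X_t
\;\le\;
C \int_0^{\infty} \sqrt{\log N(T,d,\varepsilon)}\;\mathrm{d}\varepsilon,
```

where N(T, d, ε) is the ε-covering number of T and C is a universal constant. Generic chaining replaces the entropy integral with the γ₂ functional, which is tight up to constants for Gaussian processes.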
