
Introduction


An Open Machine Learning Course

Jupyter notebooks for teaching machine learning. Based on scikit-learn and Keras, with OpenML for experimenting more extensively across many datasets.

Online course book, powered by Jupyter Book

Sources

Practice-oriented materials

We use many code examples from the following excellent books. We urge you to read them for more complete coverage of machine learning in Python:

Introduction to Machine Learning with Python by Andreas Mueller and Sarah Guido. Focusing entirely on scikit-learn, and written by one of its core developers, this book offers clear guidance on how to do machine learning with Python.

Deep Learning with Python by François Chollet. Written by the author of the Keras library, this book offers a clear explanation of deep learning with practical examples.

Python Machine Learning by Sebastian Raschka. One of the classic textbooks on how to do machine learning with Python.

Python for Data Analysis by Wes McKinney. A more introductory and broader text on doing data science with Python.

Theory-oriented materials

For a deeper understanding of machine learning techniques, we can recommend the following books:

"Mathematics for Machine Learning" by Marc Deisenroth, A. Aldo Faisal and Cheng Soon Ong. This provides the basics of linear algebra, geometry, probabilities, and continuous optimization, and how they are used in several machine learning algorithms. The PDF is available for free.

"The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (2nd edition)" by Trevor Hastie, Robert Tibshirani, Jerome Friedman. One of the key references of the field. Great coverage of linear models, regularization, kernel methods, model evaluation, ensembles, neural nets, unsupervised learning. The PDF is available for free.

"Deep Learning" by Ian Goodfellow, Yoshua Bengio, Aaron Courville. The current reference for deep learning. Chapters can be downloaded from the website.

"An Introduction to Statistical Learning (with Applications in R)" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. More introductory version of the above book, with many code examples in R. The PDF is also available for free. (Note that we won't be using R in the main course materials, but the examples are still very useful).

"Gaussian Processes for Machine Learning" by Carl Edward Rasmussen and Christopher K. I. Williams. The reference for Bayesian Inference. Also see David MacKay's book for additional insights. Also see this course by Neil Lawrence for a great introduction to Gaussian Processes, all from first principles.

Open course

Made with love by Joaquin Vanschoren. Materials are released under the CC0 License. You can use them as you like.

Partly based on notebooks by Andreas Mueller (CC0 licensed), François Chollet (MIT licensed), Sebastian Raschka (MIT licensed), and Neil Lawrence (with permission).

People

Contributors

akratiiet, joaquinvanschoren, pgijsbers


Issues

Missing data.csv in 00 - Tutorial 2a - Python for Data Analysis

import pandas as pd

dfs = pd.read_csv('data.csv')
dfs
dfs.set_value(0, 'a', 10)
dfs.to_csv('data.csv', index=False)  # Don't export the row index

If this is run, you get the following error:

OSError: File b'data.csv' does not exist

I cannot find the data.csv anywhere in the repo either.
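
A minimal workaround (a sketch only; the column name 'a' is taken from the set_value call above, and the actual file shipped with the tutorial may have looked different) is to generate a small data.csv first:

import pandas as pd

# Hypothetical stand-in for the missing file: a tiny frame with a
# column 'a', as referenced by the set_value call in the snippet.
pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).to_csv('data.csv', index=False)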

2 Linear Models

Some questions and suggestions came to mind when I read about the gradient descent method:

  • In section Gradient Descent, I find the formulation of the exponential decay of the learning rate a little bit odd. I would suggest expressing \eta_s in terms of \eta_0 instead (see the sketch after this list).

  • In section Stochastic Gradient Descent (SGD), I believe it would be better to start the index at i=0, otherwise it would make more sense to divide by n+1 when averaging the individual losses. Same goes for the other two sums in that part.

  • Furthermore, the "incremental gradient" method looks a lot like the SAG method described here, rather than the incremental aggregated gradient (IAG) method from this paper, which I found confusing. I also found the SAGA algorithm. Maybe adding some of these references would be helpful to other students.

  • Another suggestion would be to change "random i" to "if i = i_s" and to add "with i_s randomly chosen per iteration".
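
To make the first two suggestions concrete, one possible formulation (my notation; the decay rate \lambda is an assumed symbol, not taken from the course):

\eta_s = \eta_0 \, e^{-\lambda s}

and, for the averaged loss, a normalization that matches the number of summands:

\frac{1}{n} \sum_{i=1}^{n} \ell_i(w) \quad \text{or, equivalently,} \quad \frac{1}{n+1} \sum_{i=0}^{n} \ell_i(w)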

File naming

The names of the neural network slides contain |. This character is allowed on macOS and Linux, but it is invalid in Windows file names, so it prevents Windows users from cloning or updating this repository.

Pulling from upstreams...
remote: Counting objects: 177, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 177 (delta 4), reused 1 (delta 1), pack-reused 166
Receiving objects: 100% (177/177), 11.75 MiB | 582.00 KiB/s, done.
Resolving deltas: 100% (10/10), completed with 1 local object.
From https://github.com/joaquinvanschoren/ML-course
 * branch            master     -> FETCH_HEAD
   66e8297..3162f05  master     -> upstream/master
error: unable to create file N1 | Introduction.ipynb: Invalid argument
error: unable to create file N2 | Artificial Neuron.ipynb: Invalid argument
error: unable to create file N3 | Perceptron Classifier.ipynb: Invalid argument
error: unable to create file N4 | MLP.ipynb: Invalid argument
error: unable to create file N5 | MLP Classifier.ipynb: Invalid argument
error: unable to create file N6 | Optimization and Regularization.ipynb: Invalid argument
error: unable to create file N7 | Convolutional Networks.ipynb: Invalid argument
Updating 66e8297..3162f05
error: unable to create file N8 | Recurrent Networks.ipynb: Invalid argument
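
A possible fix on the repository side (a sketch only, to be run where the files already exist; the glob pattern is assumed from the log above) is to replace the | in each notebook name with a dash:

from pathlib import Path

# Rename 'N1 | Introduction.ipynb' to 'N1 - Introduction.ipynb', etc.,
# so the names are also valid on Windows.
for nb in Path('.').glob('N* | *.ipynb'):
    nb.rename(nb.with_name(nb.name.replace(' | ', ' - ')))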

Issue on page /notebooks/01 - Introduction.html

In section "Neural networks: evaluation and optimization", after "E.g. Gradient descent:", the formula is missing a partial derivative symbol and, more importantly, I guess there should be a minus in order to move towards the minimum.

Error downloading dataset

When I run this code, the process gets interrupted.

import openml as oml
import numpy as np
import matplotlib.pyplot as plt
import sklearn

# Download Streetview data. Takes a while the first time.
SVHN = oml.datasets.get_dataset(41081)
X, y, cats, attrs = SVHN.get_data(dataset_format='array',
                                  target=SVHN.default_target_attribute)

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

I have 12 GB of memory free and can't figure out why I am getting this interrupt.
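
For what it's worth, exit code 137 usually means the operating system's out-of-memory killer stopped the process, and parsing SVHN into a dense array can transiently use much more memory than the final array itself. A possible workaround (untested, assuming the DataFrame format keeps the peak memory lower):

import openml as oml

# Request a pandas DataFrame instead of a dense numpy array; this may
# reduce the peak memory used while the dataset is parsed.
SVHN = oml.datasets.get_dataset(41081)
X, y, cats, attrs = SVHN.get_data(dataset_format='dataframe',
                                  target=SVHN.default_target_attribute)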

Some lab solutions not available

Hi, the solutions to labs 2a and 7a are not available yet.
They are not listed on the home page /intro.html.

Could you put these online?

Error in tutorial 4

In Tutorial 4 there is an error:

print("SVM component: {}".format(pipe.named_steps['svm']f))

The stray f should not be there.
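
The corrected line would read:

print("SVM component: {}".format(pipe.named_steps['svm']))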

Gini index and information gain

  • I could not immediately find the idea behind the Gini impurity index on the internet. A short derivation (sketched after this list) helped me understand the intuition a little better: the index captures "how often a randomly selected element would be labeled incorrectly if the label were chosen randomly according to the actual distribution (in a leaf)".
  • In the definition of information gain, it is unclear to me what X_i is exactly. I would have expected Gain(X, i) and |X| in the denominator of the fraction. Would that make sense? Furthermore, am I correct that the l=1 to L sum loops over what some call the levels of this feature?
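
For reference, my reconstruction of that derivation, with p_k the fraction of elements of class k in the leaf:

\text{Gini} = \sum_{k=1}^{K} p_k (1 - p_k) = \sum_{k=1}^{K} p_k - \sum_{k=1}^{K} p_k^2 = 1 - \sum_{k=1}^{K} p_k^2

Each term p_k (1 - p_k) is the probability of drawing an element of class k and then assigning it a label other than k, when the label is drawn from the same distribution.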

OpenML has a dependency on Visual Studio Build Tools

When I was installing OpenML on Windows, I ran into several issues. As it turns out, one of its dependencies, netifaces, requires the Visual C++ Build Tools to be installed. It would help future students if this were documented somewhere. Even though it is a very common program to have installed, I had recently wiped my hard disk and did not have it. pip does not clearly state that this tool is required.

Not all Gabor kernels are plotted

In FDM_Challenge_Part3 it is confusing that 'Gabor family' and 'Responses' both plot 10 * 10 = 100 images. This gives the impression that len(kernel) == 100, while in reality len(kernel) == 225.

Dollar signs in formulas

You're probably already aware, but in a lot of places in the HTML view of the pages there are dollar signs around the formulas.

For example: [screenshot of a formula rendered with visible $$ delimiters]

It seems to happen when double dollar signs ($$) are used in the notebooks to typeset LaTeX on a full line. For some reason the HTML view then sometimes (not always) treats it as inline LaTeX, as if the formula were only surrounded by single dollar signs. Since the result is correct in the notebooks and PDFs, I'm not sure there is an easy way to resolve this. Luckily it doesn't affect readability too much, so this isn't a very high-priority issue.

Finally, the issue seems to happen in both Firefox and Chrome.

Lab solutions imports are wrong

In the lab solutions, the preamble is still imported, which causes an error (and is overkill).
Update the imports to include only the required packages, as in the original lab notebooks.
