Machine learning is a fascinating subfield of computer science, so I decided to learn more about it. This repository contains some of the code that I have written while introducing myself to various concepts of machine learning.
Who knows, maybe one day this will become an open-source machine learning library... It may not be TensorFlow, but this code should give you a basic understanding of the fundamentals. I will comment the code as best I can and give some of the basic theory behind the concepts in this README file.
NOTE: Readers should be familiar with linear algebra and calculus.
## Linear regression

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables (or independent variables) denoted $X$.
In a nutshell, this means that for any two given data sets of points $X = \{ x^{(1)}, \dots, x^{(m)} \}$ such that $x^{(i)} \in \mathbb{R}^n$ and $Y = \{ y^{(1)}, \dots, y^{(m)} \}$ such that $y^{(i)} \in \mathbb{R}$, we are trying to find a relationship $h$ such that $h(x^{(i)}) \approx y^{(i)}$, where $x^{(i)}$ represents the input state and $y^{(i)}$ represents the output for the corresponding input state $x^{(i)}$.
In this case we are investigating a hypothesis $h_\theta$ that looks something like this:

$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

i.e. we are trying to find a set of coefficients $\theta = (\theta_0, \theta_1, \dots, \theta_n)^T$ such that $h_\theta(x^{(i)})$ is as close to $y^{(i)}$ as possible. Note that $x_0 = 1$ and $\theta_0$ is called the bias term.
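As a quick illustration, the hypothesis can be sketched in NumPy (the function name and coefficient values here are made up for illustration; the repository's actual code may look different):

```python
import numpy as np

# Hypothesis h_theta(x) = theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n = theta . x,
# with x_0 = 1 so that theta_0 acts as the bias term.
def hypothesis(theta, x):
    return theta @ x

theta = np.array([0.5, 2.0])  # [bias, slope] -- illustrative values only
x = np.array([1.0, 3.0])      # x_0 = 1 prepended to the single feature x_1 = 3
print(hypothesis(theta, x))   # 0.5 + 2.0 * 3.0 = 6.5
```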
We measure how well $h_\theta$ describes $Y$ using the cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $x^{(i)}$ is the $i$-th set of inputs (or features), $y^{(i)}$ is the output for $x^{(i)}$ and $m$ is the number of training examples. We can write this in a vector form as:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

where $\theta$ is a vector representing all coefficients, $X$ is the matrix where every row is a vector $(x^{(i)})^T$ with $i$ between 1 and $m$, and $y$ is a vector representing all outputs $y^{(i)}$.
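The vectorized cost takes only a few lines of NumPy (a sketch with hypothetical names; the toy data is chosen so one parameter vector fits it exactly):

```python
import numpy as np

# J(theta) = 1/(2m) * (X @ theta - y)^T (X @ theta - y)
def cost(theta, X, y):
    m = len(y)
    r = X @ theta - y        # residuals, shape (m,)
    return (r @ r) / (2 * m)

# Toy data: the first column of X is all ones (the x_0 = 1 bias column).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2 * x_1
theta = np.array([0.0, 2.0])
print(cost(theta, X, y))       # 0.0 -- this theta fits the toy data perfectly
```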
In order to find the coefficients $\theta$ that minimise our cost function we use the following algorithm (gradient descent):

$$\theta := \theta - \alpha \nabla J(\theta)$$

where $\alpha$ is the learning rate. When we substitute our cost function we get:

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$$
The idea behind this is that $\theta$ will converge to some vector $\theta^*$ which will be the best set of coefficients for our relation $h_{\theta^*}$ to predict $y$. We can choose $\alpha$ to be a scalar, or a diagonal matrix if we want to adjust the learning rate differently for individual coefficients.
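The update rule above can be sketched as follows (a NumPy illustration under my own assumptions, not the repository's actual implementation; `alpha` and the iteration count are arbitrary choices):

```python
import numpy as np

# Repeatedly apply theta := theta - (alpha/m) * X^T (X @ theta - y).
def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta

# Toy data: bias column of ones, then one feature; outputs are y = 2 * x_1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = gradient_descent(X, y)
print(theta)  # converges towards [0, 2], i.e. the relation y = 2 * x_1
```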
## Logistic regression

Instead of our output vector $y$ having components in a continuous range of values, they will be 0 or 1, i.e. $y^{(i)} \in \{0, 1\}$.

We use the sigmoid function as our hypothesis representation:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
The cost function is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

This cost function is chosen because it approaches infinity as $h_\theta(x^{(i)}) \to 1$ while $y^{(i)} = 0$, and vice versa. Also, the gradient of this function has the same form as the gradient of the cost function for linear regression.
Using the cost function above, our algorithm to find the coefficients looks like this:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - y \right)$$

where $g$ is the sigmoid function applied element-wise.
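Putting the sigmoid, the cost and the update rule together, a minimal NumPy sketch might look like this (hypothetical names and toy data; `alpha` and the iteration count are arbitrary choices, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), with h = sigmoid(X @ theta)
def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# The update has the same form as in the linear case:
# theta := theta - (alpha/m) * X^T (sigmoid(X @ theta) - y)
def fit(X, y, alpha=0.5, iters=5000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta

# Toy separable data: the label is 1 when the feature x_1 exceeds 2.5.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y)
print((sigmoid(X @ theta) > 0.5).astype(int))  # predicted labels: [0 0 1 1]
```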
## Resources

- Coursera machine learning course
- *Deep Learning* by Ian Goodfellow, Yoshua Bengio and Aaron Courville