Machine learning is a fascinating subfield of computer science, so I decided to learn more about it. This repository contains some of the code that I have written while introducing myself to various concepts of machine learning.
Who knows, maybe one day this will become an open-source machine learning library... It may not be TensorFlow, but this code should give you a basic understanding of the fundamentals. I will comment the code as best I can and give some of the basic theory behind the concepts in this README file.
NOTE: Readers should be familiar with linear algebra and calculus.
## Linear regression

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables (or independent variables) denoted $X$.
In a nutshell, this means that for any two given data sets of points $X = \{ x^{(1)}, \dots, x^{(m)} \}$ such that $x^{(i)} \in \mathbb{R}^n$ and $Y = \{ y^{(1)}, \dots, y^{(m)} \}$ such that $y^{(i)} \in \mathbb{R}$, we are trying to find a relationship $h$ such that $h(x^{(i)}) \approx y^{(i)}$, where $x^{(i)}$ represents the input state and $y^{(i)}$ represents the output for the corresponding input state $x^{(i)}$.
In this case we are investigating a hypothesis $h_\theta$ that looks something like this:

$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

i.e. we are trying to find a set of coefficients $\theta = (\theta_0, \theta_1, \dots, \theta_n)^T$ such that $h_\theta(x^{(i)})$ is as close to $y^{(i)}$ as possible. Note that $x_0 = 1$ and $\theta_0$ is called the bias term.
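As a quick illustration, the hypothesis can be sketched in NumPy (the function name and coefficient values here are made up for illustration; the repository's actual code may look different):

```python
import numpy as np

# Hypothesis h_theta(x) = theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n = theta . x,
# with x_0 = 1 so that theta_0 acts as the bias term.
def hypothesis(theta, x):
    return theta @ x

theta = np.array([0.5, 2.0])  # [bias, slope] -- illustrative values only
x = np.array([1.0, 3.0])      # x_0 = 1 prepended to the single feature x_1 = 3
print(hypothesis(theta, x))   # 0.5 + 2.0 * 3.0 = 6.5
```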
We measure how well $h_\theta$ describes $Y$ using the cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $x^{(i)}$ is the $i$-th set of inputs (or features), $y^{(i)}$ is the output for $x^{(i)}$ and $m$ is the number of training examples. We can write this in a vector form as:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

where $\theta$ is a vector representing all coefficients, $X$ is the matrix where every row is a vector $(x^{(i)})^T$ with $i$ between 1 and $m$, and $y$ is a vector representing all outputs $y^{(i)}$.
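The vectorized cost takes only a few lines of NumPy (a sketch with hypothetical names; the toy data is chosen so one parameter vector fits it exactly):

```python
import numpy as np

# J(theta) = 1/(2m) * (X @ theta - y)^T (X @ theta - y)
def cost(theta, X, y):
    m = len(y)
    r = X @ theta - y        # residuals, shape (m,)
    return (r @ r) / (2 * m)

# Toy data: the first column of X is all ones (the x_0 = 1 bias column).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2 * x_1
theta = np.array([0.0, 2.0])
print(cost(theta, X, y))       # 0.0 -- this theta fits the toy data perfectly
```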
In order to find the coefficients $\theta$ that minimise our cost function we use the following algorithm (gradient descent):

$$\theta := \theta - \alpha \nabla J(\theta)$$

where $\alpha$ is the learning rate. When we substitute our cost function we get:

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$$
The idea behind this is that $\theta$ will converge to some vector $\theta^*$ which will be the best set of coefficients for our relation $h_{\theta^*}$ to predict $y$. We can choose $\alpha$ to be a scalar, or a diagonal matrix if we want to adjust the learning rate differently for individual coefficients.
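The update rule above can be sketched as follows (a NumPy illustration under my own assumptions, not the repository's actual implementation; `alpha` and the iteration count are arbitrary choices):

```python
import numpy as np

# Repeatedly apply theta := theta - (alpha/m) * X^T (X @ theta - y).
def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta

# Toy data: bias column of ones, then one feature; outputs are y = 2 * x_1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = gradient_descent(X, y)
print(theta)  # converges towards [0, 2], i.e. the relation y = 2 * x_1
```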
## Logistic regression

Instead of our output vector $y$ having components in a continuous range of values, they will be 0 or 1, i.e. $y^{(i)} \in \{0, 1\}$.

We use the sigmoid function as our hypothesis representation:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
The cost function is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

This cost function is chosen because it approaches infinity as $h_\theta(x^{(i)}) \to 1$ while $y^{(i)} = 0$, and vice versa. Also, the gradient of this function has the same form as the gradient of the cost function for linear regression.
Using the cost function above, our algorithm to find the coefficients looks like this:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - y \right)$$

where $g$ is the sigmoid function applied element-wise.
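Putting the sigmoid, the cost and the update rule together, a minimal NumPy sketch might look like this (hypothetical names and toy data; `alpha` and the iteration count are arbitrary choices, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), with h = sigmoid(X @ theta)
def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# The update has the same form as in the linear case:
# theta := theta - (alpha/m) * X^T (sigmoid(X @ theta) - y)
def fit(X, y, alpha=0.5, iters=5000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta

# Toy separable data: the label is 1 when the feature x_1 exceeds 2.5.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y)
print((sigmoid(X @ theta) > 0.5).astype(int))  # predicted labels: [0 0 1 1]
```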
## Resources

- Coursera machine learning course
- *Deep Learning* by Ian Goodfellow, Yoshua Bengio and Aaron Courville