Polynomial-Regression-Optimizer

A regression model that finds the optimal polynomial degree for the input data.

Overview

Polynomial regression is a statistical machine learning tool that fits a simple model relating our features to the values we want to predict. In contrast to the simple linear model, polynomial regression can capture complex, nonlinear functions, which makes it a useful tool in machine learning. The following figure shows an example of a curve fitted using polynomial regression:

[Figure: a curve fitted to data with polynomial regression]

Polynomial regression is a simple extension of the familiar linear model. The simple linear model fits a beta vector to the matrix of X values so that together they give a prediction of the Y values. The matrix may have many columns or a single one (a vector). In polynomial regression, the model receives a vector x and a vector y. It then converts the vector x into a matrix in which each column holds the x values raised to a different power (0, 1, ..., up to the polynomial degree). This matrix can be used directly in the familiar multi-dimensional linear model. The model finds a beta vector of dimension P + 1 (where P is the polynomial degree, giving one coefficient per power from 0 to P) that maps the data to the corresponding Y values. The linear-polynomial model can be written as follows:

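$$\hat{Y} = X\beta$$

Where, for n samples and polynomial degree P:

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^P \\ 1 & x_2 & x_2^2 & \cdots & x_2^P \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^P \end{pmatrix}$$,

$$\beta = (\beta_0, \beta_1, \ldots, \beta_P)^T$$,

$$\hat{Y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)^T$$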
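The construction described above can be sketched with numpy. This is a minimal illustration, not the polyr implementation: the helper names `design_matrix`, `fit_beta`, and `predict` are hypothetical, and an ordinary least-squares solve is assumed for finding the beta vector.

```python
import numpy as np

def design_matrix(x, degree):
    """Return the matrix whose columns are x**0, x**1, ..., x**degree."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

def fit_beta(x, y, degree):
    """Least-squares beta vector (degree + 1 coefficients) for the given degree."""
    X = design_matrix(x, degree)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(x, beta):
    """Evaluate the fitted polynomial at x."""
    return design_matrix(x, len(beta) - 1) @ beta

# Noise-free data generated from y = 1 + 2x - 3x^2
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2

# A degree-2 fit recovers the three generating coefficients
beta = fit_beta(x, y, degree=2)
```

Because the powers run from 0 to the degree, a degree-2 model has three columns in its matrix and three entries in its beta vector.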

Using this method, we can fit regression models to our data with different polynomial degrees:

[Figure: regression curves fitted with different polynomial degrees]

As you can see, some curves follow the original function closely, while others do not fit it well. Unfortunately, the proper polynomial degree for our data cannot be known in advance. Manually choosing a degree that is too low results in underfitting, and choosing a degree that is too high results in overfitting. Therefore, to find the degree that leads to the optimal model, the polyr code evaluates the polynomial model over a wide range of degrees and picks the one that gives the best result, as in the following example:

[Figure: RMSE for each tested polynomial degree]

As can be seen, each polynomial degree leads to a different error value (in our case, RMSE). In this example, the degrees that lead to the best results seem to be between 15 and 20. The polyr code therefore chooses the beta vector that belongs to the optimal polynomial degree. All that remains for the user is to define the range of degrees over which the models are tested, using the max_p parameter. To prevent overfitting, the code uses k-fold cross-validation, where the number of folds can be adjusted via the cv parameter (default = 5).
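The degree search with k-fold cross-validation can be sketched as follows. This is an illustration under stated assumptions, not the polyr source: the function names are hypothetical, the folds are shuffled with a fixed seed, the search starts at degree 1, and the score of a degree is taken to be the mean held-out RMSE over the folds.

```python
import numpy as np

def fit_predict(x_fit, y_fit, x_new, degree):
    """Least-squares polynomial fit on (x_fit, y_fit), evaluated at x_new."""
    X_fit = np.vander(x_fit, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return np.vander(x_new, degree + 1, increasing=True) @ beta

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cv_rmse(x, y, degree, cv=5, seed=0):
    """Mean held-out RMSE of one degree over `cv` shuffled folds."""
    order = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(order, cv)
    scores = []
    for i, held_out in enumerate(folds):
        kept = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_hat = fit_predict(x[kept], y[kept], x[held_out], degree)
        scores.append(rmse(y[held_out], y_hat))
    return float(np.mean(scores))

def select_degree(x, y, max_p, cv=5):
    """Return the degree in 1..max_p with the lowest cross-validated RMSE."""
    scores = {p: cv_rmse(x, y, p, cv=cv) for p in range(1, max_p + 1)}
    return min(scores, key=scores.get), scores

# Synthetic noisy data, used only for illustration
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.05, size=x.size)

best_p, cv_scores = select_degree(x, y, max_p=8, cv=5)

# For contrast: RMSE measured on the same data the model was fitted to
train_scores = {p: rmse(y, fit_predict(x, y, x, p)) for p in range(1, 9)}
```

The contrast with `train_scores` shows why held-out data is needed. A higher-degree least-squares fit contains every lower-degree fit as a special case, so the RMSE on the fitting data can only go down as the degree grows and would always favor the largest degree. The held-out RMSE has no such bias.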

In addition, the polyr code has a built-in plot function that displays the error of each tested model. To use it, follow this example code:

# import code
from polyr import PolyR
import numpy as np

# load data
matrix = np.load('matrix.npy')
x, y   = matrix[:,0], matrix[:,1]

# using the code
PolyR(max_p=29).fit(x,y).plot_rmse()

[Figure: output of plot_rmse, showing the RMSE of each tested model]

All mathematical calculations in the code are implemented with the numpy library, which allows the various regression models to be fitted efficiently and the appropriate polynomial degree to be found quickly.

Libraries

The code uses the following Python libraries:

matplotlib

numpy

Application

An application of the code is attached to this page under the name:

implementation.py

The examples and the outputs are also attached here: examples & outputs.

Example for using the code

To use this code, import it as follows:

# import code
from polyr import PolyR
import numpy as np

# load data
train = np.load('train.npy')
test  = np.load('test.npy')

# define variables
max_p   = 30
cv      = 7
x_test  = test[:,0]
x_train = train[:,0]
y_train = train[:,1]

# using the code
y_prediction = PolyR(max_p = max_p,cv = cv).fit(x_train,y_train).predict(x_test)

Where the parameters are:

max_p: maximum polynomial degree to check

cv: k value for k-fold cross-validation (default = 5)
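If the true target values for the test inputs are available, the predictions can be scored with RMSE, the same metric used above. The sketch below is self-contained and uses small made-up arrays. The `rmse` helper is hypothetical, and treating a second column of the test array as the targets would be an assumption, since the example above reads only the first column.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values standing in for test targets and model predictions
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_prediction = np.array([1.0, 2.0, 3.0, 6.0])

# One prediction is off by 2, so RMSE = sqrt((0 + 0 + 0 + 4) / 4) = 1.0
score = rmse(y_test, y_prediction)
```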

License

MIT © Etzion Harari
