
gradientdescent-haskell's Introduction

Linear Regression Model Estimation and Selection

Elena Badillo Goicoechea, June 10, 2019

Description

This project is a purely functional implementation of multivariate linear model estimation using: i) gradient descent, and ii) greedy best-model selection.

The pipeline of this project consists of the following steps:

1. Take the arguments provided by the user: i) JSON file path, ii) number of iterations, iii) task, iv) scaling factor
2. Read file and load data
3. Parse data into appropriate data types
4. Execute required task
5. Print results
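
As a rough illustration of step 1, the four arguments could be validated with a small pure function before any file is read. This is only a sketch: the names Config, parseArgs and readArg are hypothetical, and treating the task as a number is an assumption, not something the project states.

```haskell
import Text.Read (readMaybe)

-- Hypothetical container for the four command-line arguments.
data Config = Config
  { cfgPath       :: FilePath  -- path to the JSON file
  , cfgIterations :: Int       -- number of gradient descent iterations
  , cfgTask       :: Int       -- task to run (assumed here to be a number)
  , cfgScale      :: Double    -- scaling factor applied to the data
  } deriving (Show, Eq)

-- Validate the raw argument list, in the order listed in step 1.
parseArgs :: [String] -> Either String Config
parseArgs [path, iters, task, scale] =
  Config path
    <$> readArg "number of iterations" iters
    <*> readArg "task" task
    <*> readArg "scaling factor" scale
parseArgs _ =
  Left "expected 4 arguments: <json file> <iterations> <task> <scaling factor>"

-- Parse one argument, reporting which one failed.
readArg :: Read a => String -> String -> Either String a
readArg label raw =
  maybe (Left ("could not parse " ++ label ++ ": " ++ raw)) Right (readMaybe raw)
```

In a program, the list passed to parseArgs would typically come from getArgs in System.Environment.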

The Main module takes the arguments provided by the user on the command line, reads the data into the required form, and executes the required algorithms.

The DoLinReg module implements the gradient descent algorithm to fit a linear regression model to a given dataset.
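
For readers unfamiliar with the technique, batch gradient descent for linear regression can be sketched on plain lists as follows. This is not the DoLinReg code: the names Row, predict, step and gradientDescent, the learning rate parameter alpha, and the all-zero starting point are assumptions made for illustration. It also assumes a non-empty dataset whose rows all have the same length.

```haskell
import Data.List (foldl', transpose)

-- One observation's predictors, without the intercept term.
type Row = [Double]

-- Prediction for one row: intercept plus weighted sum of the predictors.
predict :: [Double] -> Row -> Double
predict (b0 : bs) xs = b0 + sum (zipWith (*) bs xs)
predict []        _  = 0

-- One batch update of every coefficient:
--   theta_j := theta_j - alpha * (1/m) * sum_i (prediction_i - y_i) * x_ij
step :: Double -> [Row] -> [Double] -> [Double] -> [Double]
step alpha xs ys theta = sum next `seq` next  -- force all coefficients each step
  where
    m     = fromIntegral (length xs)
    errs  = zipWith (\x y -> predict theta x - y) xs ys
    cols  = transpose (map (1 :) xs)          -- column of ones for the intercept
    grads = [ sum (zipWith (*) errs col) / m | col <- cols ]
    next  = zipWith (\t g -> t - alpha * g) theta grads

-- Run a fixed number of iterations, starting from all-zero coefficients.
gradientDescent :: Int -> Double -> [Row] -> [Double] -> [Double]
gradientDescent n alpha xs ys =
  foldl' (\theta _ -> step alpha xs ys theta) theta0 [1 .. n]
  where
    theta0 = replicate (1 + maximum (0 : map length xs)) 0
```

The result lists the intercept first, followed by one coefficient per predictor. Too large a value of alpha makes the updates diverge, so in practice it has to be tuned to the scale of the data.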

The RegModels module implements the available tasks, each a different model estimation or selection algorithm; the last three optimize over adjusted R²:

1) Produce a list of all univariate models (one for each predictor variable)
2) Estimate a model including the set of all predictors, X
3) Select the best possible bivariate model
4) Greedily select the best k-variate model for each k = 1,..., |X|
5) Greedily select the best model from all combinations
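
To make the selection criterion concrete, the sketch below shows the standard adjusted R² formula and one common reading of "greedy" selection, namely forward stepwise selection. Whether RegModels proceeds exactly this way is an assumption; adjustedR2 and greedyForward are hypothetical names, and the score argument stands in for "fit a model on these predictors and return its adjusted R²".

```haskell
import Data.List (delete, maximumBy)
import Data.Ord (comparing)

-- Adjusted R^2 for n observations and p predictors, given the plain R^2.
-- Requires n > p + 1.
adjustedR2 :: Int -> Int -> Double -> Double
adjustedR2 n p r2 =
  1 - (1 - r2) * fromIntegral (n - 1) / fromIntegral (n - p - 1)

-- Greedy forward selection: start from the empty set and repeatedly add the
-- predictor that yields the highest score when joined to those already chosen.
-- Returns the chosen subset for each size k = 1 .. number of predictors.
greedyForward :: Eq a => ([a] -> Double) -> [a] -> [[a]]
greedyForward score = go []
  where
    go _ [] = []
    go chosen remaining = next : go next (delete best remaining)
      where
        best = maximumBy (comparing (\v -> score (chosen ++ [v]))) remaining
        next = chosen ++ [best]
```

Under these assumptions, picking the highest-scoring entry of the returned list gives a single overall model, at the cost of scoring far fewer subsets than an exhaustive search would.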

The RTypes module contains all the data type definitions and instances needed by the implementation, namely:

- Coefs
- Record
- DataSet
- Model

Dataset

The program makes the following assumptions about the dataset provided by the user:

- All variables are numeric
- JSON format
- Target variable (y) corresponds to the last field in each record
- All records have the same size
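
The last two assumptions could be checked explicitly once the JSON has been decoded into rows of numbers. The sketch below is hypothetical: splitDataSet is not a function of this project, and it assumes the decoding step has already produced a list of lists of Double.

```haskell
-- Check that every record has the same number of fields, then split each one
-- into (predictors, target), taking the target to be the last field.
splitDataSet :: [[Double]] -> Either String [([Double], Double)]
splitDataSet [] = Left "empty dataset"
splitDataSet rows@(r : _)
  | null r                            = Left "records have no fields"
  | any ((/= length r) . length) rows = Left "records differ in size"
  | otherwise                         = Right [ (init row, last row) | row <- rows ]
```

Returning Either keeps the check pure and lets the caller decide how to report a malformed file.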

To test the program, a sample dataset (student.json) is included in the folder. The dataset comes from the UCI Machine Learning Repository and contains data on Portuguese student achievement in secondary education (the target variable) and various potential predictors. Source: https://archive.ics.uci.edu/ml/datasets/Student+Performance

Advanced Features

The DataSet and Record data types are instances of Semigroup, Monoid, and Functor. The Functor instance is currently used (for both data types, jointly) to scale the input dataset by the factor the user provides.
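
A minimal sketch of what such instances might look like is given below. The actual definitions in RTypes may differ: the list-based representation, the type parameter, and the scaleBy helper are all assumptions made for illustration.

```haskell
-- Hypothetical representations: a record is a list of fields,
-- a dataset is a list of records.
newtype Record a  = Record [a]          deriving (Show, Eq)
newtype DataSet a = DataSet [Record a]  deriving (Show, Eq)

instance Functor Record where
  fmap f (Record xs) = Record (map f xs)

-- Mapping over a dataset maps over every record in it.
instance Functor DataSet where
  fmap f (DataSet rs) = DataSet (map (fmap f) rs)

-- Combining two records concatenates their fields.
instance Semigroup (Record a) where
  Record a <> Record b = Record (a ++ b)

instance Monoid (Record a) where
  mempty = Record []

-- Combining two datasets concatenates their records.
instance Semigroup (DataSet a) where
  DataSet a <> DataSet b = DataSet (a ++ b)

instance Monoid (DataSet a) where
  mempty = DataSet []

-- Scale every value in the dataset by a constant factor.
scaleBy :: Num a => a -> DataSet a -> DataSet a
scaleBy k = fmap (* k)
```

With these definitions, scaling is a single fmap over the whole dataset, and several datasets can be merged with mconcat.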

Potential uses for the Monoid and Semigroup instances include i) merging different datasets, and ii) handling entries of different sizes in the JSON file provided by the user (which would require fewer assumptions about the file than those stated above, and thus make the program more flexible).

Further use of these features, combined with a Monad instance for Model and its components, could make it possible to parallelize the gradient descent algorithm for a more efficient implementation.

References:

- https://gabebw.com/blog/2015/10/11/regular-expressions-in-haskell
- http://mccormickml.com/2014/03/04/gradient-descent-derivation/
- https://www.classes.cs.uchicago.edu/archive/2018/fall/12100-1/pa/pa5/index.html
- https://samcgardner.github.io/2018/10/06/linear-regression-in-haskell.html
