
mlpp's Introduction

ML++

Machine learning is a vast and exciting discipline, garnering attention from specialists in many fields. Unfortunately, for C++ programmers and enthusiasts, there appears to be a lack of support in the field of machine learning. To fill that void and give C++ a true foothold in the ML sphere, this library was written. The intent is for it to act as a crossroads between low-level developers and machine learning engineers.

Installation

Begin by downloading the header files for the ML++ library. You can do this by cloning the repository and extracting the MLPP directory within it:

git clone https://github.com/novak-99/MLPP

Next, execute the "buildSO.sh" shell script:

sudo ./buildSO.sh

After doing so, keep the ML++ source files in a local directory and include them in this fashion:

#include "MLPP/Stat/Stat.hpp" // Including the ML++ statistics module. 

int main(){
...
}

Finally, once you have finished creating your project, compile it using g++:

g++ main.cpp /usr/local/lib/MLPP.so --std=c++17

Usage

Please note that ML++ uses the std::vector<double> data type to emulate vectors, and the std::vector<std::vector<double>> data type to emulate matrices.
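For concreteness, a vector and a matrix under this convention look like this (the values are arbitrary):

std::vector<double> vec = {1.0, 2.0, 3.0};                       // a length-3 vector
std::vector<std::vector<double>> mat = {{1.0, 2.0}, {3.0, 4.0}}; // a 2 x 2 matrix, one inner vector per row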

Begin by including the respective header file of your choice.

#include "MLPP/LinReg/LinReg.hpp"

Next, instantiate an object of the class. Don't forget to pass the input set and output set as parameters.

LinReg model(inputSet, outputSet);

Afterwards, call the optimizer that you would like to use. For iterative optimizers such as gradient descent, pass the learning rate, the number of epochs, and whether or not to utilize the UI panel.

model.gradientDescent(0.001, 1000, 0);

Great, you are now ready to test! To test a single testing instance, utilize the following function:

model.modelTest(testSetInstance);

This will return the model's singular prediction for that example.

To test an entire test set, use the following function:

model.modelSetTest(testSet);

The result will be the model's predictions for the entire dataset.
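Putting the steps above together, here is a minimal end-to-end sketch. The dataset is hypothetical, and it assumes modelTest returns the prediction as a double; compile it as shown in the Installation section.

#include <iostream>
#include <vector>

#include "MLPP/LinReg/LinReg.hpp"

int main(){
    // Hypothetical training data: y = 2x.
    std::vector<std::vector<double>> inputSet = {{1}, {2}, {3}, {4}};
    std::vector<double> outputSet = {2, 4, 6, 8};

    LinReg model(inputSet, outputSet);     // instantiate with input and output sets
    model.gradientDescent(0.001, 1000, 0); // learning rate, epochs, UI panel off

    std::cout << model.modelTest({5.0}) << std::endl; // prediction for a single instance
}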

Contents of the Library

  1. Regression
    1. Linear Regression
    2. Logistic Regression
    3. Softmax Regression
    4. Exponential Regression
    5. Probit Regression
    6. CLogLog Regression
    7. Tanh Regression
  2. Deep, Dynamically Sized Neural Networks
    1. Possible Activation Functions
      • Linear
      • Sigmoid
      • Softmax
      • Swish
      • Mish
      • SinC
      • Softplus
      • Softsign
      • CLogLog
      • Logit
      • Gaussian CDF
      • ReLU
      • GELU
      • Sign
      • Unit Step
      • Sinh
      • Cosh
      • Tanh
      • Csch
      • Sech
      • Coth
      • Arsinh
      • Arcosh
      • Artanh
      • Arcsch
      • Arsech
      • Arcoth
    2. Possible Optimization Algorithms
      • Batch Gradient Descent
      • Mini-Batch Gradient Descent
      • Stochastic Gradient Descent
      • Gradient Descent with Momentum
      • Nesterov Accelerated Gradient
      • Adagrad Optimizer
      • Adadelta Optimizer
      • Adam Optimizer
      • Adamax Optimizer
      • Nadam Optimizer
      • AMSGrad Optimizer
      • 2nd Order Newton-Raphson Optimizer*
      • Normal Equation*

      *Only available for linear regression
    3. Possible Loss Functions
      • MSE
      • RMSE
      • MAE
      • MBE
      • Log Loss
      • Cross Entropy
      • Hinge Loss
      • Wasserstein Loss
    4. Possible Regularization Methods
      • Lasso
      • Ridge
      • ElasticNet
      • Weight Clipping
    5. Possible Weight Initialization Methods
      • Uniform
      • Xavier Normal
      • Xavier Uniform
      • He Normal
      • He Uniform
      • LeCun Normal
      • LeCun Uniform
    6. Possible Learning Rate Schedulers
      • Time Based
      • Epoch Based
      • Step Based
      • Exponential
  3. Prebuilt Neural Networks
    1. Multilayer Perceptron
    2. Autoencoder
    3. Softmax Network
  4. Generative Modeling
    1. Tabular Generative Adversarial Networks
    2. Tabular Wasserstein Generative Adversarial Networks
  5. Natural Language Processing
    1. Word2Vec (Continuous Bag of Words, Skip-Gram)
    2. Stemming
    3. Bag of Words
    4. TF-IDF
    5. Tokenization
    6. Auxiliary Text Processing Functions
  6. Computer Vision
    1. The Convolution Operation
    2. Max, Min, Average Pooling
    3. Global Max, Min, Average Pooling
    4. Prebuilt Feature Detectors
      • Horizontal/Vertical Prewitt Filter
      • Horizontal/Vertical Sobel Filter
      • Horizontal/Vertical Scharr Filter
      • Horizontal/Vertical Roberts Filter
      • Gaussian Filter
      • Harris Corner Detector
  7. Principal Component Analysis
  8. Naive Bayes Classifiers
    1. Multinomial Naive Bayes
    2. Bernoulli Naive Bayes
    3. Gaussian Naive Bayes
  9. Support Vector Classification
    1. Primal Formulation (Hinge Loss Objective)
    2. Dual Formulation (Via Lagrangian Multipliers)
  10. K-Means
  11. k-Nearest Neighbors
  12. Outlier Finder (Using z-scores)
  13. Matrix Decompositions
    1. SVD Decomposition
    2. Cholesky Decomposition
      • Positive Definiteness Checker
    3. QR Decomposition
  14. Numerical Analysis
    1. Numerical Differentiation
      • Univariate Functions
      • Multivariate Functions
    2. Jacobian Vector Calculator
    3. Hessian Matrix Calculator
    4. Function Approximator
      • Constant Approximation
      • Linear Approximation
      • Quadratic Approximation
      • Cubic Approximation
    5. Differential Equations Solvers
      • Euler's Method
      • Growth Method
  15. Mathematical Transforms
    1. Discrete Cosine Transform
  16. Linear Algebra Module
  17. Statistics Module
  18. Data Processing Module
    1. Setting and Printing Datasets
    2. Available Datasets
      1. Wisconsin Breast Cancer Dataset
        • Binary
        • SVM
      2. MNIST Dataset
        • Train
        • Test
      3. Iris Flower Dataset
      4. Wine Dataset
      5. California Housing Dataset
      6. Fires and Crime Dataset (Chicago)
    3. Feature Scaling
    4. Mean Normalization
    5. One Hot Representation
    6. Reverse One Hot Representation
    7. Supported Color Space Conversions
      • RGB to Grayscale
      • RGB to HSV
      • RGB to YCbCr
      • RGB to XYZ
      • XYZ to RGB
  19. Utilities
    1. TP, FP, TN, FN function
    2. Precision
    3. Recall
    4. Accuracy
    5. F1 score

What's in the Works?

ML++, like most frameworks, is dynamic and constantly changing. This is especially important in the world of ML, where new algorithms and techniques are being developed day by day. Here are a few of the things currently being developed for ML++:

- Convolutional Neural Networks

- Kernels for SVMs

- Support Vector Regression

Citations

Various materials helped me along the way of creating ML++, and I would like to give credit to several of them here. An article by TutorialsPoint was a big help when implementing the determinant of a matrix, and an article by GeeksForGeeks was very helpful for taking the adjoint and inverse of a matrix.

mlpp's People

Contributors

novak-99, richardscottoz, smartai


mlpp's Issues

introduce CMAKE build

Hi Marc,

I pulled your project into my CLion IDE (IntelliJ) and it looks very interesting. Great work! Are you really 16 years old?

I have introduced a CMake file that allows a more portable and modern build approach and easy, IDE-agnostic integration. I am attaching it here; just drop it next to buildSO.sh. The CMake file configures two targets: a shared library, "mlpp", and mlpp_runner for your main.cpp, which links against the shared library. I also raised the C++ standard to 20.
CMakeLists.txt
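For reference, a minimal CMakeLists.txt along the lines described might look like the following. This is a sketch, not the attached file; the source glob and directory layout are assumptions.

cmake_minimum_required(VERSION 3.16)
project(mlpp LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Shared library target built from the ML++ sources (source layout assumed).
file(GLOB_RECURSE MLPP_SOURCES MLPP/*.cpp)
add_library(mlpp SHARED ${MLPP_SOURCES})
target_include_directories(mlpp PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

# Runner executable for main.cpp, linked against the shared library.
add_executable(mlpp_runner main.cpp)
target_link_libraries(mlpp_runner PRIVATE mlpp)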

Optimizing matrix multiplication

Impressive work!

You should swap the two inner loops here:

for(int i = 0; i < A.size(); i++){
    for(int j = 0; j < B[0].size(); j++){
        for(int k = 0; k < B.size(); k++){
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

That is:

for(int i = 0; i < A.size(); i++){
    for(int k = 0; k < B.size(); k++){
        for(int j = 0; j < B[0].size(); j++){
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

It won't change the result, but it should speed up the multiplication: with the i-k-j order, the innermost loop walks B[k][j] and C[i][j] sequentially through memory, which is much friendlier to the cache. Explanations: https://viralinstruction.com/posts/hardware/#15f5c31a-8aef-11eb-3f19-cf0a4e456e7a

Also, std::vector<std::vector<>> is not the best way to store a matrix: https://stackoverflow.com/a/55478808
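To make the storage suggestion concrete, here is a sketch of a flat, row-major matrix combined with the i-k-j loop order from above. The Matrix struct is hypothetical, not part of ML++.

#include <cstddef>
#include <vector>

// One contiguous allocation; element (i, j) lives at data[i * cols + j].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data; // row-major, contiguous

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j){ return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// i-k-j order: the innermost loop reads B and writes C contiguously.
// Assumes A.cols == B.rows (no shape check in this sketch).
Matrix matmul(const Matrix& A, const Matrix& B){
    Matrix C(A.rows, B.cols);
    for(std::size_t i = 0; i < A.rows; i++){
        for(std::size_t k = 0; k < A.cols; k++){
            double a = A(i, k);
            for(std::size_t j = 0; j < B.cols; j++){
                C(i, j) += a * B(k, j);
            }
        }
    }
    return C;
}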

Possible mistakes in cost functions

Cost::MAEDeriv is wrong: y_hat must be compared with y, not with zero.
Cost::WassersteinLoss is the same as Cost::HingeLoss, but those are not the same loss.
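For reference, a corrected element-wise derivative could look like the following sketch (a hypothetical free function, not the library's actual Cost::MAEDeriv): the subgradient of |y_hat - y| with respect to y_hat is sign(y_hat - y).

#include <cstddef>
#include <vector>

std::vector<double> MAEDeriv(const std::vector<double>& y_hat, const std::vector<double>& y){
    std::vector<double> deriv(y_hat.size());
    for(std::size_t i = 0; i < y_hat.size(); i++){
        if(y_hat[i] > y[i]) deriv[i] = 1;       // prediction above target
        else if(y_hat[i] < y[i]) deriv[i] = -1; // prediction below target
        else deriv[i] = 0;                      // subgradient at zero
    }
    return deriv;
}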

Softmax Optimization

std::vector<double> Activation::softmax(std::vector<double> z){
    std::vector<double> a;
    a.resize(z.size());
    for(int i = 0; i < z.size(); i++){
        double sum = 0;
        for(int j = 0; j < z.size(); j++){
            sum += exp(z[j]);
        }
        a[i] = exp(z[i]) / sum;
    }
    return a;
}

Here is one specific example of code optimization: the softmax function above gives the correct answer, but as currently written it recalculates the same sum z.size() times. It would be much better to calculate the sum once, outside the loop, and then reuse that value inside the loop without recalculating it.

If we wanted to optimize even more, we could look at the use of the exp() function. Even with the above fix, the exponential of each element of z is calculated twice (once for the sum and once for the final output element). Assuming memory allocations and accesses are faster than exp(), it would be better to build an intermediate array of the exponential values and then use that array both to calculate the sum and to calculate the values of a.

The first optimization will make a massive difference, so I think it is definitely worth keeping things like this in mind. The second will have much less of an impact and goes a bit more into the weeds, so I would not worry too much about optimizations like that during the initial coding; I mention it here just to give a more complete idea of what kinds of optimizations are possible even in a very simple function.
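For illustration, a version with both fixes might look like this sketch (written as a free function rather than the Activation member):

#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> softmax(const std::vector<double>& z){
    std::vector<double> a(z.size());
    double sum = 0;
    for(std::size_t i = 0; i < z.size(); i++){
        a[i] = std::exp(z[i]); // each exponential is computed exactly once
        sum += a[i];
    }
    for(std::size_t i = 0; i < z.size(); i++){
        a[i] /= sum; // reuse the precomputed sum
    }
    return a;
}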

Documentation

What is the status of the documentation you mentioned? Thx!

knn implementation problem

your implementation of

    std::vector<double> kNN::nearestNeighbors(std::vector<double> x){
        LinAlg alg;
        // The nearest neighbors
        std::vector<double> knn;
        
        std::vector<std::vector<double>> inputUseSet = inputSet;
        // Perform this loop until all k nearest neighbors are found, appended, and returned
        for(int i = 0; i < k; i++){
            int neighbor = 0;
            for(int j = 0; j < inputUseSet.size(); j++){
                bool isNeighborNearer = alg.euclideanDistance(x, inputUseSet[j]) < alg.euclideanDistance(x, inputUseSet[neighbor]);
                if(isNeighborNearer){
                    neighbor = j;
                }
            }
            knn.push_back(neighbor);
            inputUseSet.erase(inputUseSet.begin() + neighbor); // This is why we maintain an extra input"Use"Set
        }
        return knn;
    }

is wrong. Given an inputSet and x, and assuming the inputSet is sorted in ascending order by distance to x, this implementation will output a list of zeros: after each erase, the next-nearest neighbor is again at index 0 of the shrunken inputUseSet, so the recorded indices no longer refer to positions in the original inputSet.
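For what it's worth, here is a sketch of one way to fix it (a hypothetical free function, not the library's code): pair each point with its original index, order by distance to x, and return the first k original indices.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

std::vector<double> nearestNeighbors(const std::vector<std::vector<double>>& inputSet,
                                     const std::vector<double>& x, int k){
    // Squared Euclidean distance; monotonic in the true distance, so the
    // ordering is identical and the sqrt can be skipped.
    auto dist2 = [&](const std::vector<double>& p){
        double s = 0;
        for(std::size_t d = 0; d < p.size(); d++){
            s += (p[d] - x[d]) * (p[d] - x[d]);
        }
        return s;
    };
    std::vector<int> idx(inputSet.size());
    std::iota(idx.begin(), idx.end(), 0); // 0, 1, ..., n-1: original indices
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b){ return dist2(inputSet[a]) < dist2(inputSet[b]); });
    return std::vector<double>(idx.begin(), idx.begin() + k); // indices into the original inputSet
}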

Is MLPP reinventing the wheel? What would it be used for?

Great work! Very interesting!

In the README, you say that MLPP serves to revitalize C++ as a machine learning front-end. How does MLPP separate itself from the PyTorch C++ API? If you don't mind me asking, why not build wrappers around the already open-source and highly optimized PyTorch C++ code?

Thanks!

Development status?

I was sifting through the codebase, as I am building a similar project from scratch and this project gave me a lot of pointers and inspiration, and noticed several ways things could be optimized. Is the project still under active development?

Hyperbolic Activations

In the README, it looks like you've implemented (or planned to implement) a lot of hyperbolic functions as activation functions.

If you don't mind me asking, are there specific cases where activations with a diverging gradient, such as Cosh and Sinh, would be helpful? I am very curious, as I haven't seen these used before. Thanks!

performance function error?

double Utilities::performance(std::vector<double> y_hat, std::vector<double> outputSet){
    double correct = 0;
    for(int i = 0; i < y_hat.size(); i++){
        if(std::round(y_hat[i]) == outputSet[i]){
            correct++;
        }
    }
    return correct/y_hat.size();
}

Problem: is the comparison std::round(y_hat[i]) == outputSet[i] correct? Comparing doubles with == seems fragile.
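If exact equality is the worry, a slightly more defensive version could compare within a tolerance. This is a sketch, assuming the labels are integral class values stored as doubles.

#include <cmath>
#include <cstddef>
#include <vector>

double performance(const std::vector<double>& y_hat, const std::vector<double>& outputSet){
    double correct = 0;
    for(std::size_t i = 0; i < y_hat.size(); i++){
        // Round the prediction to the nearest class and compare within a
        // small tolerance instead of relying on exact double equality.
        if(std::fabs(std::round(y_hat[i]) - outputSet[i]) < 1e-9){
            correct++;
        }
    }
    return correct / y_hat.size();
}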
