Git Product home page Git Product logo

sparsesvmlearning's Introduction

sparseSVMLearning

Multiple implementations for fast/efficient/sparse SVM learning. Also some data sparse coding implementations.

Soft-Margin SVM learning

To learn a standard SVM with the typical SMO-like optimization algorithm from libSVM and do grid search with k-fold cross-validation, please download the ML_toolbox and add it to your MATLAB search path. The scripts to test this are:

boundary_learning_2d.m
boundary_learning_robots.m

These contain the necessary code snippets to find the optimal sets of hyper-parameters for a 2D toy problem and for the real Big-Data self-collision avoidance dataset, which has to be downloaded found here

This implementation solves the dual QP problem with a type of SMO optimization, which yields accurate results but may take really long for large number of datapoints. The optimization is implemented in C++ but it has MATLAB mex interface to run them from Matlab. In my experience, I tried learning an SVM from 270k points and it took a couple of days to solve on a (not so old) PC.

GPU-Accelerated Soft-Margin SVM learning

This fast implementation of soft-margin SVM learning relies on exploiting the computational power of GPU's. Given the correct NVidia card, the algorithm is extremely fast! The script to test this is:

boundary_learning_gtsvm.m

NOTE: In order to achieve this efficiency the algorithm does some data clustering to reduce the number of sample points. If your data is completely overlapping, this might not given the same performance as the standard Soft-Margin SVM learning algorithm, so be careful! For the 2D dataset example it works very nicely, however, for our self-collision avoidance it just cannot reach the expected performance (in terms of classification rates). In other words, it's fast but less accurate. Check out the description of the algorithm and code here: GTSVM

Sparse Kernel SVM via Cuttting-Plane Training

This is the sparse kernel SVM implementation that can handle datasets >100k points and relies on finding the optimal K support vectors, which are not necessarily points in the dataset. The scripts to test this is:

boundary_learning_sparse_CPSP.m

Typically, one finds the optimal hyper-parametersl i.e. C and sigma (for RBF Kernel) with either of the previous implementations. Then we use these same parameters and set the support vector budget K to our desired value. This algorithm will try to find the hyper-plane that yields the best performance (in terms of classification rate) given the support vector budget K. Check out the description of the algorithm and code here: sparseSVM and sparse CPSP paper

Other stuff

Implementation of Cross-Training Algorithm

I provide an implementation of the paper titled: Breaking SVM Complexity with Cross-Training in the script:

Xtraining.m

It's one of the first algorithms (2005) to tackle SVM complexity issues. Wasn't really useful for my dataset, but might be useful for something else.

Testing of Sparse Modeling Approaches

I test two sparse modeling approaches whose goal is to select a subset of data points as the representatives for a large collection of points, hence, reducing the number of points of a dataset. I provide testing scripts for the Sparse Modeling Representative Selection (SMRS) which is an algorithm based on sparse multiple-measurement-vector recovery theory:

boundary_subset_selectino_smrs.m

and the Dissimilarity-based Sparse Subset Selection (DS3)

boundary_subset_selectino_dissimilarity.m

Again, these algo's didn't really work on my self-collision avoidance dataset as they rely on a sort of clustering or similarity metrics, which will NOT work for completely overlapping classes. This might be useful for very large datasets of multi-class problems where the decision boundaries might be non-linear but there is not much overlap or outliers. The source code of both algo's is provided here

References and Toolboxes

  • libSVM The standard library for SVM learning is libSVM:
  • ML_toolbox Matlab toolbox used for teaching machine learning techniques at EPFL by N. Figueroa among others.
  • GTSVM A GPU-Tailored Approach for Training Kernelized SVMs. Cotter el at.
  • sparseSVM Support Vector Machine for Multivariate Performance Measures by Joachims.
  • sparse CPSP paper Sparse Kernel SVMs via Cutting-Plane Training. Joachims and Yu

sparsesvmlearning's People

Contributors

nbfigueroa avatar alpais avatar

Stargazers

 avatar

Watchers

 avatar James Cloos avatar paper2code - bot avatar

Forkers

cuplgb

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.