ORCA

ORCA (Ordinal Regression and Classification Algorithms) is a MATLAB framework that implements and integrates a wide range of ordinal regression methods and performance metrics from the paper "Ordinal regression methods: survey and experimental study", published in IEEE Transactions on Knowledge and Data Engineering. ORCA also helps accelerate experimental comparison of classifiers with automatic fold execution, experiment parallelisation and performance reports. A basic definition of ordinal regression can be found on Wikipedia.

As a generic experimental framework, its two main objectives are:

  1. To run experiments easily to facilitate the comparison between algorithms and datasets.
  2. To provide an easy way of including new algorithms into the framework by simply defining the training and test methods and the hyperparameters of the algorithms.

To serve these purposes, ORCA is mainly used through configuration files that describe experiments, but the methods can also easily be used through a common API.

Cite ORCA

If you use ORCA and/or associated datasets, please cite the following works:

J. Sánchez-Monedero, P. A. Gutiérrez and M. Pérez-Ortiz, 
"ORCA: A Matlab/Octave Toolbox for Ordinal Regression", 
Journal of Machine Learning Research. Vol. 20. Issue 125. 2019. http://jmlr.org/papers/v20/18-349.html

P.A. Gutiérrez, M. Pérez-Ortiz, J. Sánchez-Monedero, F. Fernandez-Navarro and C. Hervás-Martínez.
"Ordinal regression methods: survey and experimental study",
IEEE Transactions on Knowledge and Data Engineering, Vol. 28, January, 2016, pp. 127-146. http://dx.doi.org/10.1109/TKDE.2015.2457911

Bibtex entry:

@article{JMLR:v20:18-349,
  author  = {Javier S{{\'a}}nchez-Monedero and Pedro A. Guti{{\'e}}rrez and Mar{{\'i}}a P{{\'e}}rez-Ortiz},
  title   = {ORCA: A Matlab/Octave Toolbox for Ordinal Regression},
  journal = {Journal of Machine Learning Research},
  year    = {2019},
  volume  = {20},
  number  = {125},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v20/18-349.html}
}

@Article{Gutierrez2015,
  Title                    = {Ordinal regression methods: survey and experimental study},
  Author                   = {P.A. Guti\'errez and M. P\'erez-Ortiz and J. S\'anchez-Monedero and  F. Fernandez-Navarro and C. Herv\'as-Mart\'inez},
  Journal                  = {IEEE Transactions on Knowledge and Data Engineering},
  Year                     = {2016},
  Url                      = {http://dx.doi.org/10.1109/TKDE.2015.2457911},
  Volume                   = {28},
  Number                   = {1},
  pages                    = {127-146},
}

For more information about the paper and the ordinal datasets used please visit the associated website: http://www.uco.es/grupos/ayrna/orreview

For more information about our research group, please visit the Learning and Artificial Neural Networks (AYRNA) website at the University of Córdoba (Spain).

Installation, tutorials and documentation

The documentation can be found in the doc folder.

Methods included

The Algorithms folder includes the MATLAB classes for the algorithms included and the original code (where applicable). The config-files folder includes different configuration files for running all the algorithms. To use these files, the datasets from the previously cited review paper are needed. To add your own method, see Adding a new method to ORCA.

The running time of the algorithms was analysed in "Ordinal regression methods: survey and experimental study" (2016). From this analysis, it can be concluded that ELMOP, SVORLin and POM are the best options if computational cost is a priority. The training time of the neural network methods (NNPOM and NNOP) and GPOR is in general the highest. This cost can be assumed for GPOR, given that it obtains very good performance for balanced ordinal datasets, while neural-network-based methods are generally beaten by the ordinal SVM variants. Concerning scalability, the experimental setup in the review also included some relatively large datasets, so practitioners can check the time it takes to train one of those models with the ORCA framework. In general, linear models such as POM and SVORLin perform very well in these scenarios where there is plenty of data, while still having a reasonably low running time (e.g. around 10 seconds for cross-validating, training and testing on a dataset of almost 22,000 patterns). Although very high-dimensional datasets were not considered in the analysis, it is well known that SVMs can handle high-dimensional data, and given that they are among the best-performing methods in ordinal regression, they might be a good choice in such scenarios.

Ordinal regression algorithms

  • SVR [2]: Standard Support Vector Regression with normalised targets (considered as a naïve approach for ordinal regression since equal distances between targets are assumed).
  • CSSVC [1]: Nominal SVM with the OneVsAll decomposition, where absolute costs are included as different weights for the negative class of each decomposition (it is considered as a naïve approach for ordinal regression since equal distances between targets are assumed).
  • SVMOP [3,4]: Binary ordinal decomposition methodology with SVM as base method, it imposes explicit weights over the patterns and uses a probabilistic framework for the prediction.
  • ELMOP [5]: Standard Extreme Learning Machine imposing an ordinal structure in the coding scheme representing the target variable.
  • POM [6]: Extension of the linear binary Logistic Regression methodology to Ordinal Classification by means of Cumulative Link Functions.
  • SVOREX [7]: Ordinal formulation of the SVM paradigm, which computes discriminant parallel hyperplanes for the data and a set of thresholds by imposing explicit constraints in the optimization problem.
  • SVORIM [7]: Ordinal formulation of the SVM paradigm, which computes discriminant parallel hyperplanes for the data and a set of thresholds by imposing implicit constraints in the optimization problem.
  • SVORLin [7]: Linear version of the SVORIM method (considering a linear kernel instead of the Gaussian one) to check how the kernel trick affects the final performance.
  • KDLOR [8]: Reformulation of the well-known Kernel Discriminant Analysis for Ordinal Regression by imposing an order constraint in the projected classes.
  • NNPOM [6,9]: Neural Network based on Proportional Odd Model (NNPOM), implementing a neural network model for ordinal regression. The model has one hidden layer and one output layer with only one neuron but as many thresholds as the number of classes minus one. The standard POM model is applied in this neuron to provide probabilistic outputs.
  • NNOP [10]: Neural Network with Ordered Partitions (NNOP), this model considers the OrderedPartitions coding scheme for the labels and a rule for decisions based on the first node whose output is higher than a predefined threshold (T=0.5). The model has one hidden layer and one output layer with as many neurons as the number of classes minus one.
  • REDSVM [11]: Augmented Binary Classification framework that solves the Ordinal Regression problem by a single binary model (SVM is applied in this case).
  • ORBoost [12]: Ensemble model based on the threshold model structure, where normalised sigmoid functions are used as the base classifier. The weights parameter configures whether the All-margins version (weights=true) or the Left-Right-margins version (weights=false) is used.
  • OPBE [13]: Ordinal Projection-Based Ensemble (OPBE) based on three-class decompositions, following the ordinal structure. A specific method for fusing the probabilities returned by the different three-class classifiers is implemented (product combiner, logit function and equal distribution of the probabilities). The base classifier is SVORIM, but potentially any of the methods in ORCA can be set up as the base classifier.
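Several of the methods above (SVMOP, ELMOP, NNOP) rely on ordinal binary decompositions, in which a label y among Q ordered classes is encoded as Q-1 binary targets answering "is y greater than q?". As a rough, framework-independent illustration (plain Python, not ORCA's MATLAB code; the function names below are ours), the OrderedPartitions coding and the NNOP-style decoding rule could be sketched as:

```python
def ordered_partitions_encode(y, num_classes):
    """Encode label y (1..Q) as Q-1 binary targets: t_q = 1 if y > q."""
    return [1 if y > q else 0 for q in range(1, num_classes)]

def ordered_partitions_decode(outputs, threshold=0.5):
    """Predict the first class whose 'greater than' output falls below the
    threshold; if every output exceeds it, predict the last class."""
    for q, output in enumerate(outputs, start=1):
        if output < threshold:
            return q
    return len(outputs) + 1

# Label 3 out of 5 classes -> targets for "y>1", "y>2", "y>3", "y>4"
print(ordered_partitions_encode(3, 5))                  # -> [1, 1, 0, 0]
print(ordered_partitions_decode([0.9, 0.8, 0.2, 0.1]))  # -> 3
```

ORCA's implementations handle this coding internally; the sketch is only meant to make the decomposition scheme concrete.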

Partial order methods

  • HPOLD [16]: Hierarchical Partial Order Label Decomposition with linear and non-linear base methods.

Nominal methods

  • SVC1V1 [1]: Nominal Support Vector Machine using the OneVsOne formulation (considered as a naïve approach for ordinal regression since it ignores the order information).
  • SVC1VA [1]: Nominal Support Vector Machine with the OneVsAll paradigm (considered as a naïve approach for ordinal regression since it ignores the order information).
  • LIBLINEAR: Implementation of logistic regression and linear SVM based on LIBLINEAR.

Performance metrics

The measures folder contains the MATLAB classes for the metrics used for evaluating the classifiers. The measures included in ORCA are the following (more details about the metrics can be found in [14,15]):

  • MAE: Mean Absolute Error between predicted and expected categories, representing classes as integer numbers (1, 2, ...).
  • MZE: Mean Zero-one Error or standard classification error (1-accuracy).
  • AMAE: Average MAE, considering MAEs individually calculated for each class.
  • CCR: Correctly Classified Ratio or percentage of correctly classified patterns.
  • GM: Geometric Mean of the sensitivities individually calculated for each class.
  • MMAE: Maximum MAE, considering MAEs individually calculated for each class.
  • MS: Minimum Sensitivity, representing the ratio of correctly classified patterns for the worst classified class.
  • Spearman: Spearman's Rho.
  • Tkendall: Kendall's Tau.
  • Wkappa: Weighted Kappa statistic, using ordinal weights.

Utilities, classes and scripts

  • DataSet.m: Class for data preprocessing.
  • Experiment.m: Class that runs the different experiments.
  • Utilities.m: Class that pre-processes the experiment files, runs the different algorithms and produces the results.
  • runtests_single.m: Script to run all the methods using the ORCA API. Performance is compared with reference values on the toy dataset in order to check that the installation is correct.
  • runtests_cv.m: This script runs full experiment tests using the ORCA configuration files to describe experiments.

Datasets

The example-data folder includes partitions of several small ordinal datasets for code-testing purposes. We have also collected 44 publicly available ordinal datasets from various sources; these can be downloaded from datasets-OR-review. The link also contains the data partitions as used in different papers in the literature, to ease experimental comparison. The characteristics of these datasets are the following:

Dataset #Pat. #Attr. #Classes Class distribution
pyrim5 (P5) 74 27 5 ~15 per class
machine5 (M5) 209 7 5 ~42 per class
housing5 (H5) 506 14 5 ~101 per class
stock5 (S5) 700 9 5 140 per class
abalone5 (A5) 4177 11 5 ~836 per class
bank5 (B5) 8192 8 5 ~1639 per class
bank5' (BB5) 8192 32 5 ~1639 per class
computer5 (C5) 8192 12 5 ~1639 per class
computer5' (CC5) 8192 21 5 ~1639 per class
cal.housing5 (CH5) 20640 8 5 4128 per class
census5 (CE5) 22784 8 5 ~4557 per class
census5' (CEE5) 22784 16 5 ~4557 per class
pyrim10 (P10) 74 27 10 ~8 per class
machine10 (M10) 209 7 10 ~21 per class
housing10 (H10) 506 14 10 ~51 per class
stock10 (S10) 700 9 10 70 per class
abalone10 (A10) 4177 11 10 ~418 per class
bank10 (B10) 8192 8 10 ~820 per class
bank10' (BB10) 8192 32 10 ~820 per class
computer10 (C10) 8192 12 10 ~820 per class
computer10' (CC10) 8192 21 10 ~820 per class
cal.housing (CH10) 20640 8 10 2064 per class
census10 (CE10) 22784 8 10 ~2279 per class
census10' (CEE10) 22784 16 10 ~2279 per class
Dataset #Pat. #Attr. #Classes Class distribution
contact-lenses (CL) 24 6 3 (15,5,4)
pasture (PA) 36 25 3 (12,12,12)
squash-stored (SS) 52 51 3 (23,21,8)
squash-unstored (SU) 52 52 3 (24,24,4)
tae (TA) 151 54 3 (49,50,52)
newthyroid (NT) 215 5 3 (30,150,35)
balance-scale (BS) 625 4 3 (288,49,288)
SWD (SW) 1000 10 4 (32,352,399,217)
car (CA) 1728 21 4 (1210,384,69,65)
bondrate (BO) 57 37 5 (6,33,12,5,1)
toy (TO) 300 2 5 (35,87,79,68,31)
eucalyptus (EU) 736 91 5 (180,107,130,214,105)
LEV (LE) 1000 4 5 (93,280,403,197,27)
automobile (AU) 205 71 6 (3,22,67,54,32,27)
winequality-red (WR) 1599 11 6 (10,53,681,638,199,18)
ESL (ES) 488 4 9 (2,12,38,100,116,135,62,19,4)
ERA (ER) 1000 4 9 (92,142,181,172,158,118,88,31,18)
marketing 8993 74 9 (1745,775,667,813,722,1110,969,1308,884)
thyroid 7200 21 3 (6666,166,368)
winequality-white 4898 11 7 (20,163,1457,2198,880,175,5)

Experiments parallelization with HTCondor

The condor folder contains the necessary files and steps for using HTCondor with our framework.

External software

ORCA makes use of the following external software implementations. For some of them, a MATLAB interface has been developed through the use of MEX files.

  • libsvm-weights-3.12: framework used for Support Vector Machine algorithms. The version considered was 3.12.
  • libsvm-rank-2.81: implementation used for the REDSVM method. The version considered was 2.81.
  • orensemble: implementation used for the ORBoost method.
  • SVOR: implementation used for the SVOREX, SVORIM and SVORLin methods.

Other contributors

Apart from the authors of the paper and the authors of the implementations referenced in the "External software" section, other people have also contributed to the ORCA framework.

References

  • [1] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Transaction on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
  • [2] A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
  • [3] E. Frank and M. Hall, “A simple approach to ordinal classification,” in Proceedings of the 12th European Conference on Machine Learning, ser. EMCL ’01. London, UK: Springer-Verlag, 2001, pp. 145–156.
  • [4] W. Waegeman and L. Boullart, “An ensemble of weighted support vector machines for ordinal regression,” International Journal of Computer Systems Science and Engineering, vol. 3, no. 1, pp. 47–51, 2009.
  • [5] W.-Y. Deng, Q.-H. Zheng, S. Lian, L. Chen, and X. Wang, “Ordinal extreme learning machine,” Neurocomputing, vol. 74, no. 1–3, pp. 447– 456, 2010.
  • [6] P. McCullagh, “Regression models for ordinal data,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 42, no. 2, pp. 109–142, 1980.
  • [7] W. Chu and S. S. Keerthi, “Support Vector Ordinal Regression,” Neural Computation, vol. 19, no. 3, pp. 792–815, 2007.
  • [8] B.-Y. Sun, J. Li, D. D. Wu, X.-M. Zhang, and W.-B. Li, “Kernel discriminant learning for ordinal regression,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 906–910, 2010.
  • [9] M. J. Mathieson, “Ordinal models for neural networks,” in Proc. 3rd Int. Conf. Neural Netw. Capital Markets, 1996, pp. 523–536.
  • [10] J. Cheng, Z. Wang, and G. Pollastri, "A neural network approach to ordinal regression," in Proc. IEEE Int. Joint Conf. Neural Netw. (IEEE World Congr. Comput. Intell.), 2008, pp. 1279-1284.
  • [11] H.-T. Lin and L. Li, “Reduction from cost-sensitive ordinal ranking to weighted binary classification,” Neural Computation, vol. 24, no. 5, pp. 1329–1367, 2012.
  • [12] H.-T. Lin and L. Li, “Large-margin thresholded ensembles for ordinal regression: Theory and practice,” in Proc. of the 17th Algorithmic Learning Theory International Conference, ser. Lecture Notes in Artificial Intelligence (LNAI), J. L. Balcazar, P. M. Long, and F. Stephan, Eds., vol. 4264. Springer-Verlag, October 2006, pp. 319–333.
  • [13] M. Pérez-Ortiz, P. A. Gutiérrez and C. Hervás-Martínez, “Projection based ensemble learning for ordinal regression,” IEEE Transactions on Cybernetics, vol. 44, May 2014, pp. 681–694.
  • [14] M. Cruz-Ramírez, C. Hervás-Martínez, J. Sánchez-Monedero and P. A. Gutiérrez. “Metrics to guide a multi-objective evolutionary algorithm for ordinal classification,” Neurocomputing, Vol. 135, July, 2014, pp. 21-31.
  • [15] J. C. Fernandez-Caballero, F. J. Martínez-Estudillo, C. Hervás-Martínez and P. A. Gutiérrez, “Sensitivity Versus Accuracy in Multiclass Problems Using Memetic Pareto Evolutionary Neural Networks,” IEEE Transactions on Neural Networks, vol. 21, 2010, pp. 750–770.
  • [16] J. Sánchez-Monedero, M. Pérez-Ortiz, A. Sáez, P.A. Gutiérrez and C. Hervás-Martínez. "Partial order label decomposition approaches for melanoma diagnosis". Applied Soft Computing. Vol. 64, March 2018, pp. 341-355.

Contributors

durka, javism, mperezortiz, pagutierrez


Issues

Installation under Octave

Hi,

I have Octave installed but when compiling the sources (I'm following the install guide), the Makefile in src/Algorithms assumes that I have Matlab installed, resulting in the following error:

Folder /usr/local/MATLAB/R2017a/ does not exist. Please, set up MATLABDIR propertly
false
Makefile:18: recipe for target '/usr/local/MATLAB/R2017a/' failed
make: *** [/usr/local/MATLAB/R2017a/] Error 1

Hello, I have a question

Hello, I am a student. When I run the code, for some datasets I get a "Matrix dimensions must agree" error in the results method of the Utilities class:

cm = confusionmat(act{h},pred{h});
cm_sum = cm_sum + cm;

I have traced the problem to the confusionmat function: the matrix cm it returns does not agree in dimensions with cm_sum. Can you give me some advice?
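For context (an illustrative sketch, not ORCA code): this error typically arises because confusionmat sizes its output by the class labels actually present in a given fold, so a fold missing a class yields a smaller matrix that cannot be added to cm_sum. In MATLAB the fix would be to pass the full label set (e.g. confusionmat(act{h}, pred{h}, 'Order', 1:numClasses)); the idea, sketched in Python:

```python
def confusion_matrix(actual, predicted, num_classes):
    """Build a confusion matrix over a FIXED label set 1..num_classes,
    so folds missing a class still produce same-sized matrices."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        cm[a - 1][p - 1] += 1
    return cm

def add_matrices(m1, m2):
    """Element-wise sum of two equally-sized matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# First fold has no class-3 patterns, yet the matrix is still 3x3,
# so it can be accumulated with the next fold's matrix.
cm_sum = confusion_matrix([1, 2, 2], [1, 2, 1], 3)
cm_sum = add_matrices(cm_sum, confusion_matrix([1, 3], [3, 3], 3))
print(cm_sum)  # -> [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
```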

Windows port

Windows-port-related topics. ORCA is being ported to Windows. The methods that do not depend on external C/C++ code should work out of the box. For the rest, some work has to be done using GCC and Make on Windows.

List of methods that work on Windows (R2017b):

  • CSSVC
  • ELMOP
  • KDLOR
  • OPBE
  • ORBOOST
  • POM
  • REDSVM
  • SVC1V1
  • SVC1VA
  • SVMOP
  • SVOREX
  • SVORLin
  • SVORIM
  • SVR

Parallel running of experiments

Issues related to parallelisation of experiments:

  • matlabpool is removed in recent versions of MATLAB: add compatibility between versions
  • test compatibility with Octave

Unified experiments ini

Hi! I have been trying to run a unified experiment file (ini format) with a mixture of classifiers (POM, KDLOR), and I always get the same error:

Setting up experiments...
Running experiment exp-kdlor-amae-tormentas-1.ini
Running experiment exp-pom-tormentas-1.ini
error: Error 'C' is not a recognized class parameter name
error: called from
    paramopt>checkParameters at line 135 column 9
    paramopt at line 28 column 1
    crossValideParams at line 180 column 22
    run at line 68 column 26
    launch at line 55 column 13
    runExperiments at line 90 column 25

This is the configuration file:

[pom]
{general-conf}
seed = 1
basedir = ../exampledata/1-holdout/
datasets = tormentas
standarize = true

{algorithm-parameters}
algorithm = POM

[kdlor-amae]
{general-conf}
seed = 1
basedir = ../exampledata/1-holdout/
datasets = tormentas
standarize = true
num_folds = 5
cvmetric = amae

{algorithm-parameters}
algorithm = KDLOR
kernelType = rbf

{algorithm-hyper-parameters-to-cv}
C = 10.^(-3:1:3)
k = 10.^(-3:1:3)
u = 0.01,0.001,0.0001,0.00001,0.000001

It seems that POM is trying to use the hyper-parameters for KDLOR.

Thanks a lot!

REDSVM - Possible memory leak

While migrating REDSVM to ORCA-Python, I detected a memory leak during the execution of the algorithm. The problem appears to be in this part of the svm_free_model_content function:

if(model_ptr->free_sv && model_ptr->l > 0 && model_ptr->SV != NULL)
	free((void *)(model_ptr->SV[0]));

This code only frees the memory of the first SV, but not the rest of them. Changing it to:

if(model_ptr->free_sv && model_ptr->l > 0 && model_ptr->SV != NULL){
	for(int i = 0; i < model_ptr->l; i++)
		free((void *)(model_ptr->SV[i]));
}

solved the problem for me.

Homogenize algorithms API

Some algorithms present an inconsistent API. For instance, POM receives a matrix of patterns in its test method, instead of the dataset structure.

Parameter processing can confuse parameters with similar names

Parameter processing can be unstable depending on the parameter names the user chooses. Currently, if we have two different parameters, algorithm and algorithmDefault, either of them can be randomly chosen to be assigned to obj.method. The issue comes from Experiment.m:

elseif strncmpi('algorithm',nueva_linea, 3),

which compares only the first three characters. A quick fix is to use length(), so that:

elseif strncmpi('algorithm',nueva_linea, length('algorithm')),

Improve addpath in methods using C code

Methods using C code, such as SVORIM, SVOREX, etc., perform addpath only in runAlgorithm. However, if the train/test methods are called directly there is an error, since the path is only added in runAlgorithm.

Potential solutions are:

  • Place addpath in the constructor and rmpath in the destructor (more general)
  • Add addpath/rmpath in train and test methods.

Continuous integration

The software is not under continuous integration. We could integrate Octave testing with Travis CI.

SVOREX - Segmentation Fault

First of all, I'm running ORCA on MATLAB R2018a.

I've been cross-validating SVOREX with a big set of parameters. At some point (i.e. with the specific combination of parameters detailed below), SVOREX returned a segmentation fault with the following error description:

Warning: KKT conditions are violated on bias!!! -0.101231 with C=1.000 K=0.001 Segmentation Fault

To my knowledge, this comes from the following lines in smo_routine.c (included in the SVOREX folder):

if (settings->bmu_low[loop-1] - settings->bmu_up[loop-1] > TOL){
	printf("Warning: KKT conditions are violated on bias!!! %f with C=%.3f K=%.3f\r\n",
		settings->bmu_low[loop-1] + settings->bmu_up[loop-1], VC, KAPPA);
	exit(1);
}

In my case, removing the exit(1); line makes the code work successfully; however, I could be omitting some criterion that must be satisfied.

The dataset (patterns and labels of both train and test) is attached to this issue. The algorithm is SVOREX, and the parameter combination is: C=1.000 -- K=0.001.

Dataset.zip

Unit tests

  • Unit tests: coverage tests for all the algorithms and experiments.

Issues testing from Octave

Hi,

I'm running ORCA from Octave on Ubuntu 18.10.

Installation goes fine, but when I try to run the tests (from the Octave shell) I get the following errors:

warning: struct: converting a classdef object into a struct overrides the access restrictions defined for properties. All properties are returned, including private and protected ones.
warning: called from
    fieldnames at line 47 column 11
    parseArgs at line 114 column 29
    POM at line 56 column 13
    pomTest at line 7 column 14
    runtestssingle at line 35 column 5
panic: Segmentation fault -- stopping myself...
.........................
Performing test for POM
Accuracy Train 0.408889, Accuracy Test 0.333333
Test accuracy matchs reference accuracy
Processing redsvmTest.m...
attempting to save variables to 'octave-workspace'...
error: octave_base_value::save_binary(): wrong type argument 'object'

Any clue on what is causing the error?

Refactor predict()

The algorithms' classification method is obj.predict(patterns, model). Following OOP conventions, the model should be stored in obj.model, thereby allowing obj.predict(patterns). However, this would affect ensemble models and binary decomposition methods, since these usually hold several models and reuse predict several times to perform the prediction (see OPBE).

The changes can be done, but they are not straightforward.

POM improvements

  • Include more link functions
  • Rewrite predict() to use mnrval()

Avoid using combvec

The code only uses the function combvec from the nnet toolbox (in the file Experiment.m). However, it could easily be replaced by:

  • Link1
  • Link2

We should do this to reduce dependencies and avoid future problems with Octave.

Include software binaries

Provide binaries in case compilation fails, or to allow the use of the software in environments without a suitable compiler. Should we provide 32-bit binaries?

  • Linux Matlab binaries
  • Linux Octave binaries
  • Windows Matlab binaries
  • Windows Octave binaries

Compatibility between versions

Some functions are deprecated depending on Matlab's version. Examples are:

Warning: The RandStream.setDefaultStream static method will be removed in a future release. Use
RandStream.setGlobalStream instead.
There are also deprecation warnings related to the optimisation routines used by KDLOR.

To properly fix this we need:

  • To better detect the version (improve the regular expressions in KDLOR.m)
  • To update code according to version
  • To add makefiles to ensure MEX compatibility

test failure in ORBoost and SVORIM

After making a few edits to get the toolbox compiling on macOS (see #45), I got these errors when running runtestssingle:

ORBoost

Index exceeds matrix dimensions.

Error in ORBoost/privpredict (line 123)
            predicted = all(:,1);

Error in Algorithm/predict (line 80)
            [projected, predicted]= privpredict(obj,test);

Error in ORBoost/privfit (line 84)
            [projectedTrain,predictedTrain] = obj.predict(train.patterns);

Error in Algorithm/fit (line 65)
            [projectedTrain, predictedTrain] = obj.privfit(train, param);

Error in Algorithm/runAlgorithm (line 33)
            [mInf.projectedTrain, mInf.predictedTrain] = obj.fit(train,param);

Error in orboostTest (line 13)
info = algorithmObj.runAlgorithm(train,test);

Error in runtestssingle (line 37)
        eval(cmd(1:end-2))

SVORIM

Error using svorimTest (line 35)
Test accuracy does NOT match reference accuracy

Error in runtestssingle (line 37)
        eval(cmd(1:end-2))

Add datasets with description

  • Add ordinal regression datasets including data properties
  • Rename 'gpor' to 'matlab' in datasets and scripts
  • Add real problems datasets description

Indentation, comments and variables naming

We need to prettify the code:

  • Code indentation is not consistent across the files
  • All class and method descriptions have to match MATLAB's comment style
  • Some variable names are in Spanish

Abstract methods not available in Octave

Abstract methods are not available in Octave (see bug).

The current workaround is to comment out those methods in the abstract classes. However, the proper solution may be to define them as standard methods that throw an exception in the parent class, so they can only be called if implemented in child classes. That way we get a kind of interface with the tools available in both MATLAB and Octave.
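The pattern described above (base-class methods that raise until overridden) is language-agnostic; here it is sketched in Python purely for illustration (ORCA itself is MATLAB/Octave; privfit is the method name that appears in ORCA's Algorithm class):

```python
class Algorithm:
    """Base class: a pseudo-abstract method raises unless overridden,
    emulating an abstract method in environments that lack them."""
    def privfit(self, train, param):
        raise NotImplementedError("privfit must be implemented by subclasses")

class POM(Algorithm):
    """Child class providing a concrete implementation."""
    def privfit(self, train, param):
        return "fitted-model"  # stand-in for the real fitting code

print(POM().privfit(None, None))      # works: the child implements it
try:
    Algorithm().privfit(None, None)   # raises: base 'interface' method
except NotImplementedError as e:
    print(e)
```

The same structure works in MATLAB/Octave with error() in the parent class method, which is the compatibility route suggested above.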

Bug in parallel processing

After doing a parallel run of tests:

Parallel pool using the 'local' profile is shutting down.
Calculating results...
Undefined function or variable
'myExperiment'.

Error in Utilities.runExperiments (line
100)
            Utilities.results([logsDir '/'
            'Results'],'report_sum',
            myExperiment.report_sum,
            'train', true);

Error in runtestscv (line 38)
    exp_dir =
    Utilities.runExperiments([tests_dir '/'
    files(i).name], 'parallel', true);

Framework basic tests

This is mandatory to verify code correctness and to ease further code improvement. Tasks are ordered by priority; the first two are the basis for the installation tests.

  • Method level. For each method, create a test of the base functionality. The test consists of predefined hyper-parameters and a test dataset with known reference performance.
  • Experiment script level. We have to check that executions of the example experiments end correctly. Because of the non-deterministic behaviour of hyper-parameter optimisation, we initially do not consider reference performance.

Update or suppress bash Makefiles

We need to update the bash Makefiles to match the rules in the MATLAB/Octave make.m scripts. However, I'd suggest suppressing the bash Makefiles, since they can be confusing for the user: the environment variables pointing to the MATLAB/Octave install dir have to be set up properly.

Handling fold execution failure

If a method fails to finish a fold of an experiment (for example, fold 15), the results table is built without any notification to the user.

The second issue is that the fold rows in the report file are numbered sequentially, so if fold 15 fails it still appears in 'dataset/results_test.csv' (e.g. test_dermatology.15), and the last identifier of the experiment is suppressed (test_dermatology.9).

Flag 'all' in .ini file

Hi!

When I try to run an experiment with 20 datasets using the option datasets = all, only the first dataset in alphabetical order is considered. I have to write the names of the 20 datasets separated by commas in order to run the experiment properly.

The configuration file with the datasets = all option:

[test]
{general-conf}
seed = 1
report_sum = true

; Datasets path
basedir = ../../data/datasets/orca/5classes/original/

; Datasets
datasets = all

; Standardization
standarize = true

; Method: algorithm and parameter
{algorithm-parameters}
algorithm = POM

Returns:

Setting up experiments...
Running experiment exp-test-BTC-AR-1.ini
Calculating results...
Experiments/exp-2019-7-23-13-29-59/Results/BTC-AR-test/dataset
Experiments/exp-2019-7-23-13-29-59/Results/BTC-AR-test/dataset
ans = Experiments/exp-2019-7-23-13-29-59

ORCA only uses one dataset. However, when I list all of them, comma-separated, the experiment runs correctly:

[test]
{general-conf}
seed = 1
report_sum = true

; Datasets path
basedir = ../../data/datasets/orca/5classes/original/

; Datasets
datasets = BTC-AR-trend-CNDL, ETH-AR-trend, ETH-AR-crypto-CNDL, ETH-AR-trend-crypto, BTC-AR-trend-crypto-CNDL, ETH-AR-CNDL, BTC-AR-CC-trend-CNDL, BTC-AR-CNDL, BTC-AR-CC-trend, ETH-AR-crypto, ETH-AR-trend-crypto-CNDL, BTC-AR-trend, ETH-AR-CC-trend, ETH-AR-trend-CNDL, BTC-AR-crypto-CNDL, BTC-AR-crypto, ETH-AR-CC-trend-CNDL, ETH-AR, BTC-AR, BTC-AR-trend-crypto

; Standardization
standarize = true

; Method: algorithm and parameter
{algorithm-parameters}
algorithm = POM

Returns:

Setting up experiments...
Running experiment exp-test-BTC-AR-1.ini
Running experiment exp-test-BTC-AR-CC-trend-1.ini
Running experiment exp-test-BTC-AR-CC-trend-CNDL-1.ini
Running experiment exp-test-BTC-AR-CNDL-1.ini
Running experiment exp-test-BTC-AR-crypto-1.ini
Running experiment exp-test-BTC-AR-crypto-CNDL-1.ini
Running experiment exp-test-BTC-AR-trend-1.ini
Running experiment exp-test-BTC-AR-trend-CNDL-1.ini
Running experiment exp-test-BTC-AR-trend-crypto-1.ini
Running experiment exp-test-BTC-AR-trend-crypto-CNDL-1.ini
Running experiment exp-test-ETH-AR-1.ini
Running experiment exp-test-ETH-AR-CC-trend-1.ini
Running experiment exp-test-ETH-AR-CC-trend-CNDL-1.ini
Running experiment exp-test-ETH-AR-CNDL-1.ini
Running experiment exp-test-ETH-AR-crypto-1.ini
Running experiment exp-test-ETH-AR-crypto-CNDL-1.ini
Running experiment exp-test-ETH-AR-trend-1.ini
Running experiment exp-test-ETH-AR-trend-CNDL-1.ini
Running experiment exp-test-ETH-AR-trend-crypto-1.ini
Running experiment exp-test-ETH-AR-trend-crypto-CNDL-1.ini
Calculating results...
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CC-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CC-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CC-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CC-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CC-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CC-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/BTC-AR-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CC-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CC-trend-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-crypto-CNDL-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-crypto-test/dataset
Experiments/exp-2019-7-23-13-38-56/Results/ETH-AR-trend-test/dataset
ans = Experiments/exp-2019-7-23-13-38-56

I'm running the experiment from a Jupyter notebook with an Octave kernel (version 4.4.1) on macOS High Sierra 10.13.6.

All the parameters should be configurable

All the parameters should be configurable and passed to the runAlgorithm method as variable arguments.
The types of the parameters should be inferred from the default values in the "parameters" structure.
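A hypothetical sketch of what this could look like, using SVORIM and its C/k parameters as an example (the call style shown is the proposal, not the current ORCA API):

```matlab
% Hypothetical usage sketch: name-value overrides for an algorithm's parameters.
algorithmObj = SVORIM();
% Each algorithm already carries defaults in its "parameters" structure,
% e.g. struct('C', 1, 'k', 0.1); the types of those defaults (here, numeric
% scalars) would be used to parse and validate the overrides.
info = algorithmObj.runAlgorithm(train, test, 'C', 1000, 'k', 0.01);
```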

ORCA not listed on the Octave package index

Dear maintainers,

I noticed your package does not appear on the Octave package index. Please consider adding it! You can find the Octave package index, along with instructions on how to add your package, here: https://gnu-octave.github.io/packages/

Ideally you would create a package in the Octave package format, so that it could be installed via Octave's pkg command; however, this is no longer a strict requirement, and packages with custom installation steps are also accepted on the index, as long as they provide clear installation instructions. :)

Tutorials

  • Normal use through MATLAB
  • Use for parallelization (Condor, parfor)
  • Getting started (git clone, compilation...)

Framework installation and build

There are some pending tasks related to the ORCA installation. The installation is done with a Makefile (Linux) or the make() function (Linux/Windows).

Complete build/clean from src/Algorithms folder:

  • Makefile Linux Matlab
  • Makefile Linux Octave
  • make() Linux Matlab
  • make() Linux Octave
  • make() Windows Matlab
  • make() Windows Octave

Clean of objects:

  • Makefile Linux Matlab/Octave
  • make() Linux Matlab
  • make() Linux Octave
  • make() Windows Matlab
  • make() Windows Octave

Clean all (objects + executables). This is useful when using several versions of MATLAB or Octave.

  • Makefile Linux Matlab/Octave
  • make() Linux Matlab
  • make() Linux Octave
  • make() Windows Matlab
  • make() Windows Octave

Incoherent output when disabling cv

Hi everyone,

I am using the ORCA library and am trying to run the SVORIM algorithm on my own data. I have already cross-validated it and now I want to disable cross-validation. Therefore, I am using the following .ini file:

;SVORIM experiments
; Experiment ID
[test]
{general-conf}
seed = 1
; Datasets path
basedir = ../data
; Datasets to process (comma separated list or all to process all)
datasets = test1981
; Activate data standardization
standarize = false
; Number of folds for the parameters optimization
;num_folds = 0
; Crossvalidation metric
cvmetric = mae

; Method: algorithm and parameter
{algorithm-parameters}
algorithm = SVORIM
;kernelType = rbf

; Method's hyper-parameter values
{algorithm-hyper-parameters}
C = 1000
k = 0.01

Unfortunately, my predictions are now an incoherent mess of symbols such as "ਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲਲ". When I leave cross-validation enabled I don't have this problem. However, running this code with cross-validation takes over a day, and since I have to run it a number of times, skipping it is preferred. Is my method of disabling cross-validation in the .ini file incorrect, or is something else happening that is causing this?

Finally, the "results_test.csv" and "results_train.csv" files are created correctly, with data in them that seems to be correct (although they report C = 0.1 and k = 0.1 instead of C = 1000 and k = 0.01, which is also strange). I hope you can help, and thanks in advance!
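For clarity, this is how the cross-validation lines of the {general-conf} block read with num_folds left uncommented (my assumption being that num_folds = 0 must actually be set, rather than commented out, for the grid search to be skipped):

```ini
; Number of folds for the parameters optimization
; (assumption: 0 means "do not cross-validate")
num_folds = 0
; Crossvalidation metric
cvmetric = mae
```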

"Index exceeds matrix dimensions" error during ORBoost training

I encountered an "Index exceeds matrix dimensions" error while using the ORBoost function in my MATLAB code. I followed the instructions in orca_quick_install.md to run runtests_single in MATLAB 2017b on my MacBook, and the following errors occurred:
"
Index exceeds matrix dimensions.
Error in ORBoost/privpredict (line 123)
predicted = all(:,1);

Error in Algorithm/predict (line 80)
[projected, predicted] = privpredict(obj, test);

Error in ORBoost/privfit (line 84)
[projectedTrain, predictedTrain] = obj.predict(train.patterns);

Error in Algorithm/fit (line 65)
[projectedTrain, predictedTrain] = obj.privfit(train, param);

Error in Algorithm/fitpredict (line 33)
[mInf.projectedTrain, mInf.predictedTrain] = obj.fit(train, param);

Error in orboostTest (line 13)
info = algorithmObj.fitpredict(train, test);

Error in runtests_single (line 44)
eval(cmd(1))
"
Any guidance on resolving this error would be greatly appreciated.

Thank you!

[bug] In DataSet.standarizeData

In DataSet.standarizeFunction (line 106), XStds = std(X) operates across columns, not rows.

Example:
>> X = [1 2 3; 4 5 6]

X =

 1     2     3
 4     5     6

>> std(X)

ans =

2.1213    2.1213    2.1213

MATLAB version: 9.6.0.1072779 (R2019a)

Suggested solution: change line 106 to XStds = std(X.')
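For context, a minimal illustration of std's default dimension and the row-wise alternatives (this is plain MATLAB/Octave behaviour, independent of ORCA):

```matlab
X = [1 2 3; 4 5 6];
% Default: std works down each column (one value per column).
std(X)        % -> [2.1213 2.1213 2.1213]
% Row-wise alternatives (one value per row):
std(X.')      % transpose first, as the report suggests -> [1 1]
std(X, 0, 2)  % same values via the dim argument, as a column -> [1; 1]
```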
