glycerine / sofia-ml
Automatically exported from code.google.com/p/sofia-ml
License: Apache License 2.0
sofia-ml

Project homepage: http://code.google.com/p/sofia-ml/

==Introduction==

The suite of fast incremental algorithms for machine learning (sofia-ml) can be used for training models for classification or ranking, using several different techniques. This release is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets.

Supported learners include:
  * Pegasos SVM
  * Stochastic Gradient Descent (SGD) SVM
  * Passive-Aggressive Perceptron
  * Perceptron with Margins
  * ROMMA

These learners can be configured for classification and ranking, with several sampling methods available.

This implementation gives very fast training times. For example, 100,000 Pegasos SVM training iterations can be performed on data from the CCAT task from the RCV1 benchmark data set (with roughly 780,000 examples) in 0.1 CPU seconds on an ordinary 2.4GHz laptop, with no loss in classification performance compared with other SVM methods. On LETOR learning to rank benchmark tasks, training with 100,000 Pegasos SVM rank steps completes in 0.2 CPU seconds on an ordinary laptop.

The primary computational bottleneck is actually reading the data off of disk; sofia-ml reads and parses data from disk substantially faster than other SVM packages we tested. For example, sofia-ml can read and parse data nearly 10 times faster than the reference Pegasos implementation by Shalev-Shwartz, and nearly 3 times faster than svm_perf by Joachims.

This package provides a commandline utility for training models and using them to predict on new data, and also exposes an API for model training and prediction. The underlying libraries for data sets, weight vectors, and example vectors are also provided for researchers wishing to use these classes to implement other algorithms.

==Quick Start==

These quick-start instructions assume the use of the unix/linux commandline, with g++ installed. There are no external code dependencies.

Step 1
Check out the code:
> svn checkout http://sofia-ml.googlecode.com/svn/trunk/sofia-ml sofia-ml-read-only

Step 2
Compile the code:
> cd sofia-ml-read-only/src/
> make
> ls ../sofia-ml
# Executable should be in main sofia-ml-read-only directory.
# If the above did not succeed, run the unit tests to help locate the problem:
> make clean
> make all_test

Step 3
Test the code:
> cd ..
> ./sofia-ml
# This should display the set of commandline flags and descriptions.

# Train a model on the demo training data.
> ./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1 --iterations 100000 --dimensionality 150000 --training_file demo/demo.train --model_out demo/model
# This should display something like the following:
Reading training data from: demo/demo.train
Time to read training data: 0.056134
Time to complete training: 0.075364
Writing model to: demo/model
   Done.

# Test the model on the demo data.
> ./sofia-ml --model_in demo/model --test_file demo/demo.train --results_file demo/results.txt
# Should display the following:
Reading model from: demo/model
   Done.
Reading test data from: demo/demo.train
Time to read test data: 0.046729
Time to make test prediction results: 0.000844
Writing test results to: demo/results.txt
   Done.

# Examine a few results in the results file:
> head -5 demo/results.txt
# Format is: <prediction value>\t<label from test file>. Each line in the results
# file corresponds to the same line (in order) in the test file.
1.02114 1
1.18046 1
-1.24609 -1
-1.12822 -1
-1.41046 -1
# Note that exact results may vary slightly because these algorithms train
# by randomly sampling one example at a time.

# Evaluate the results:
> perl eval.pl demo/results.txt
# Should display something like:
Results for demo/results.txt:

Accuracy 0.9880 (using threshold 0.00) (988/1000)
Precision 0.9719 (using threshold 0.00) (311/320)
Recall 0.9904 (using threshold 0.00) (311/314)
ROC area: 0.999406
Total of 1000 trials.
# Note that this evaluation script has limited functionality. For more
# options, we recommend using the perf software by Rich Caruana (developed for
# the KDD Cup 2004), available at: http://kodiak.cs.cornell.edu/kddcup/software.html

==Data Format==

This package uses the popular SVM-light sparse data format.

<class-label> <feature-id>:<feature-value> ... <feature-id>:<feature-value>\n
<class-label> qid:<optional-query-id> <feature-id>:<feature-value> ... <feature-id>:<feature-value>\n
<class-label> <feature-id>:<feature-value> ... <feature-id>:<feature-value># Optional comment or extra data, following the optional "#" symbol.\n

The feature ids are expected to be in ascending numerical order. The lowest allowable feature-id is 1 (0 is reserved for the bias term internally). Any feature not specified is assumed to have value 0 to allow for sparse representation.

The class label for test data is required but not used; it's okay to put in a dummy placeholder value such as 0 for test data. For binary-class classification problems, the training labels should be 1 or -1. For ranking problems, the labels may be any numeric value, with higher values being judged as "more preferred".

Currently, the comment string is not used. However, it is available for use in other algorithms, and can also be useful to aid in bookkeeping of data files.

Examples:

# Class label is 1, feature 1 has value 1.2, feature 2 (not listed) has value 0,
# and feature 3 has value -0.5.
1 1:1.2 3:-0.5

# Class label is -1, belongs to qid 3, and all feature values are zero except
# for feature 5011 with value 1.2.
-1 qid:3 5011:1.2

# Class label is -1, feature 1 has value 7, feature 3 has value -0.5, and the
# comment string is "This example is especially interesting."
-1 1:7 3:-0.5#This example is especially interesting.

==Commandline Details==

File Input and Output

--model_in
  * Read in a model from this file before doing any training or testing.
--model_out
  * Write the model to this file when done with everything.
--training_file
  * File to be used for training. When set, causes model training to occur.
--test_file
  * File to be used for testing. When set, causes current model (either loaded from --model_in or trained from --training_file) to be tested on test data.
--results_file
  * File to which to write predictions, when --test_file is used. Results for each line are in the format <prediction>\t<label from test file>\n and correspond line-by-line with the examples from the --test_file.

Learning Options

--learner_type
  * Type of learner to use.
  * Options are:
      o pegasos
        Use the Pegasos SVM learning algorithm. --lambda sets the regularization parameter, with values closer to zero giving less regularization. Note that Pegasos enforces a hard constraint that the model weight vector must lie within an L2 ball of radius at most 1/sqrt(lambda). Also relies on --eta_type.
      o sgd-svm
        Use the SGD-SVM learning algorithm. --lambda sets the regularization parameter, with values closer to zero giving less regularization. Also relies on --eta_type.
      o passive-aggressive
        Use the Passive Aggressive Perceptron learning algorithm. --passive_aggressive_c sets the largest step size to be taken on any update step; this operates as a capacity term with values closer to zero encouraging simpler models. --passive_aggressive_lambda will force the model weight vector to lie within an L2 ball of radius 1/sqrt(passive_aggressive_lambda).
      o margin-perceptron
        Use the Perceptron with Margins algorithm. --perceptron_margin_size sets the update margin. When set to 0, this is exactly equivalent to the classical Perceptron by Rosenblatt. When set to 1, this is equivalent to optimizing SVM hinge-loss without regularization. Increasing values may give additional tolerance to noise. Also relies on --eta_type.
      o romma
        Use the ROMMA algorithm. No parameters to set.
      o logreg-pegasos
        Use Logistic Regression with Pegasos updates; we optimize logistic loss and enforce Pegasos-style regularization and constraints, with --lambda being the regularization parameter. Also relies on --eta_type.
  * Default: pegasos
--loop_type
  * Type of sampling loop to use for training, controlling how examples are selected.
  * Options are:
      o stochastic
        Perform normal stochastic sampling for stochastic gradient descent, for training binary classifiers. On each iteration, pick a new example uniformly at random from the data set.
      o balanced-stochastic
        Perform a balanced sampling from positives and negatives in data set. For each iteration, samples one positive example uniformly at random from the set of all positives, and samples one negative example uniformly at random from the set of all negatives. This can be useful for training binary classifiers with a minority-class distribution.
      o rank
        Perform indexed sampling of candidate pairs for pairwise learning to rank. Useful when there are examples from several different qid groups.
      o roc
        Perform indexed sampling to optimize ROC Area.
      o query-norm-rank
        Perform sampling of candidate pairs, giving equal weight to each qid group regardless of its size. Currently this is implemented with rejection sampling rather than indexed sampling, so this may run more slowly.
  * Default: stochastic
--eta_type
  * Type of update for learning rate to use.
  * Options are:
      o basic
        On the i-th iteration, the learning rate eta is set to: 1000 / (i + 1000)
      o pegasos
        On the i-th iteration, the learning rate eta is set to: 1 / (i * lambda)
      o constant
        Always use learning rate eta of 0.02.
  * Default: pegasos
--dimensionality
  * Index id of largest feature index in training data set, plus one.
  * Default: 2^17 = 131072
--iterations
  * Number of stochastic gradient steps to take.
  * Default: 100000
--lambda
  * Value of lambda for SVM regularization, used by both Pegasos SVM and SGD-SVM.
  * Default: 0.1
--passive_aggressive_c
  * Maximum size of any step taken in a single passive-aggressive update.
--passive_aggressive_lambda
  * Lambda for pegasos-style projection for passive-aggressive update.
  * When set to 0 (default) no projection is performed.
--perceptron_margin_size
  * Width of margin for perceptron with margins.
  * Default of 1 is equivalent to unregularized SVM-loss.
--hash_mask_bits
  * When set to a non-zero value, causes the use of a hashed weight vector with hashed cross product features. This allows learning on conjunctions of features, at some increase in computational cost. Note that this flag must be set both in training and testing to function properly.
  * The size of the hash table is set to 2^--hash_mask_bits.
  * Default value of 0 means that hash cross products are not used.

Other Options

--random_seed
  * When set to non-zero value, use this seed instead of seed from system clock.
  * This can be useful in testing and in parameter tuning.
  * Default: 0
--training_objective
  * Compute value of objective function on training data, after training.
  * Default is not to do this.

==References==

If you use this source code for scientific research, please cite the following:
  * D. Sculley. Large Scale Learning to Rank. NIPS Workshop on Advances in Ranking, 2009.
    Presents the indexed sampling methods used for learning to rank, including the rank and roc loops.

Additional reading and references:
  * K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. J. Mach. Learn. Res., 7, 2006.
    Presents the Passive-Aggressive Perceptron algorithm.
  * T. Joachims. Optimizing search engines using clickthrough data. In KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002.
    Presents the RankSVM objective function, a pairwise objective function used by the rank loop method in sofia-ml.
  * Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Mach. Learn., 46(1-3), 2002.
    Presents the ROMMA algorithm.
  * S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML ’07: Proceedings of the 24th international conference on Machine learning,

Hello D.
I've started to work on the multi-label branch. I have made the following
changes:
- Parse comma-separated list of labels.
- Add a MultiplePassOuterLoop routine: it shuffles the dataset and makes
several passes over it. It's more intuitive to determine a number of passes and
results can sometimes be more stable on some datasets.
- Add a MultiLabelWeightVector. It is compatible with other weight classes
(both API-wise and file-wise). It also has a bunch of additional methods such
as "SelectLabel".
- Add Multi-Label Passive-Aggressive. Strictly speaking, the learner optimizes
a label ranking (relevant labels should be ranked higher than irrelevant
labels). On the 20 newsgroup dataset, it gives 82% accuracy (liblinear gave
85%). (I didn't optimize the hyperparameters though.)
- Add a "--prediction_type multi-label" option.
- Infer the number of dimensions from the training dataset when --dimensionality
is set to 0.
I wanted to add one-vs-all but unfortunately, the fact that the labels are
attached to the vectors makes it hard (or inefficient): I need to be able to
pass +1 or -1 instead of the real label to the update function.
Possible short-term plans could include optimizing the multi-class hinge loss
and the multinomial logistic loss by SGD.
Original issue reported on code.google.com by [email protected]
on 28 Apr 2011 at 8:38
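
The multiple-pass idea described in this report (shuffle the data set, then make several full passes over it) can be sketched roughly as follows. This is only an illustration, not the code from the multi-label branch: `MultiplePassLoop` and the `update` callback are hypothetical names, and the callback stands in for whatever update step a learner performs on one example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch: visit every example exactly once per pass, using a
// freshly shuffled order for each pass. Returns the number of update steps.
template <typename Update>
std::size_t MultiplePassLoop(std::size_t num_examples, int num_passes,
                             unsigned seed, Update update) {
  std::vector<std::size_t> order(num_examples);
  std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., num_examples - 1
  std::mt19937 rng(seed);
  std::size_t steps = 0;
  for (int pass = 0; pass < num_passes; ++pass) {
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t index : order) {
      update(index);  // Stand-in for one learner update on example `index`.
      ++steps;
    }
  }
  return steps;
}
```

Compared with sampling one example at random per iteration, this guarantees that each example is used the same number of times, which is one way to read the remark about more stable results.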
What steps will reproduce the problem?
1. Create this training file:
======= train.txt =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic
--lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt
--model_out debug-model.txt
3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The model should spit out 201 terms, the first being the bias term. Instead
it spits out 200, and clips off the last weight. When I set dimensionality to
201, I get what I would expect:
0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0.263645
This was compiled from source a couple weeks ago. The program should probably
crash if you say dimensionality is 200 and there is a "200:x" term in the
sparse vector representation, unless the no-bias flag is set.
Original issue reported on code.google.com by [email protected]
on 26 Feb 2013 at 3:24
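
The README describes --dimensionality as the largest feature index in the training data plus one, which is why 201 works for the file above and 200 does not. A minimal sketch of computing that value from SVM-light text follows; `RequiredDimensionality` is a hypothetical helper written for this note and is not part of sofia-ml.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns (largest feature id found in `data`) + 1.
// Returns 1 when no features are found (index 0 only).
int RequiredDimensionality(const std::string& data) {
  std::istringstream lines(data);
  std::string line;
  int max_id = 0;
  while (std::getline(lines, line)) {
    // Drop the optional trailing comment.
    std::string::size_type hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);
    std::istringstream tokens(line);
    std::string token;
    tokens >> token;  // Skip the class label.
    while (tokens >> token) {
      if (token.compare(0, 4, "qid:") == 0) continue;  // Optional query id.
      std::string::size_type colon = token.find(':');
      if (colon == std::string::npos) continue;  // Not an id:value pair.
      int id = std::atoi(token.substr(0, colon).c_str());
      if (id > max_id) max_id = id;
    }
  }
  return max_id + 1;
}
```

For the training file in this report, whose largest feature id is 200, the helper returns 201.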
Run make all_test; an error occurs while testing sf-sparse-vector_test: the
assertion assert(x1.GetGroupId() == "2"); fails at line 27 of file
sf-sparse-vector_test.cc.
Solution: add the line "group_id_c_string[end - position]=0;" at line 145 of
sf-sparse-vector.cc, because the string produced by strncpy is not always
'\0' terminated.
Original issue reported on code.google.com by [email protected]
on 5 Nov 2012 at 4:22
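
The fix proposed above relies on a documented property of strncpy: when the source has at least as many characters as the requested count, no terminating '\0' is written. A small sketch of the pattern, with the explicit terminator added, is shown below. `CopyRange` and its buffer are hypothetical and only mirror the shape of the code described in the report.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch: copy input[position, end) into a std::string via a
// fixed-size C buffer, terminating the buffer explicitly after strncpy.
std::string CopyRange(const char* input, int position, int end) {
  char buffer[1000];
  std::memset(buffer, 'x', sizeof(buffer));  // Simulate leftover stack data.
  int length = end - position;
  if (length < 0) length = 0;
  const int max_length = static_cast<int>(sizeof(buffer)) - 1;
  if (length > max_length) length = max_length;
  std::strncpy(buffer, input + position, length);
  // strncpy writes no terminator when it copies `length` non-null characters.
  buffer[length] = '\0';
  return std::string(buffer);
}
```

Without the explicit `buffer[length] = '\0'` line, the resulting string would run on into whatever bytes already sat in the buffer, which matches the failing GetGroupId() assertion.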
For the k-means training, does label (in my case, face label) have an influence
on the clustering?
Original issue reported on code.google.com by [email protected]
on 24 Jun 2013 at 1:01
Hi there.
What steps will reproduce the problem?
./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1
--iterations 10000 --dimensionality 450000 --training_file ../data/m256
--model_out demo/model
What is the expected output?
What do you see instead?
Reading training data from: ../data/final/catted/train/m256
Segmentation fault (core dumped)
What version of the product are you using? On what operating system?
Ubuntu 13.10 64bit
Please provide any additional information below.
I guess it is because my training data (attached) is so sparse that in some
lines all features are zero. Can sofia-ml support such a dataset? Thank you!
Original issue reported on code.google.com by [email protected]
on 18 Mar 2014 at 3:34
What steps will reproduce the problem?
1. Create 2-dimensional data drawn from 2-dim multivariate Gaussian
distributions with different means and variance = 1, e.g. 21 different
distributions with 1000 draws each, for a total of 21,000 points. (I have tried
many different variations, and none has any positive effect on the reported
issue.)
2. Train sofia-kmeans with any batch size (tested 500:500:5000) and with any
number of k clusters (tested 64 128 256) using mini_batch_kmeans with fixed
random seed.
command line: sofia-kmeans --k 64 --dimensionality 3 --random_seed 124
--init_type random --opt_type mini_batch_kmeans --mini_batch_size 500
--iterations 10 --objective_after_init --objective_after_training
--training_file traindatafile.svmlight --model_out modelfile.sofia
3. Calculate the training error
command line: sofia-kmeans --model_in modelfile.sofia --test_file
traindatafile.svmlight --objective_on_test --cluster_assignments_out
trainingassignments.sofia
4. Run this in a loop as a function of the number of iterations. I ran [1 10
100e3 500e3 and 1000e3].
What is the expected output? What do you see instead?
I expect the training error to fall as the number of iterations increases.
Since the seed is fixed, the random initialization is the same each time.
This holds until 100e3 iterations; after that it starts to diverge, i.e. the
training error starts increasing dramatically and becomes even larger than the
error after random initialization. This is very puzzling to me.
What version of the product are you using? On what operating system?
svn checkout http://sofia-ml.googlecode.com/svn/trunk/sofia-ml
sofia-ml-read-only
performed 10/3-2015
OS: Ubuntu 14.04
Please provide any additional information below.
Attached are the commands and output from sofia-kmeans (sofia_kmeans.txt);
all model, assignment and data files are also provided to reproduce these
findings (tmp.zip).
Original issue reported on code.google.com by [email protected]
on 11 Mar 2015 at 12:36
in sofia-ml.cc
337 float objective = sofia_ml::SvmObjective(training_data,
338 *w,
339 CMD_LINE_BOOLS["--lambda"]);
Note that lambda is read from CMD_LINE_BOOLS, not CMD_LINE_FLOATS, which
results in lambda=0. In TrainModel the correct value of lambda is used:
176 float lambda = CMD_LINE_FLOATS["--lambda"];
Original issue reported on code.google.com by [email protected]
on 9 May 2013 at 1:20
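
Why the wrong table silently yields 0 can be illustrated with a short sketch. It assumes the two flag tables behave like std::map keyed by flag name, which this report does not confirm; `bool_flags` and `float_flags` are hypothetical stand-ins, not the real CMD_LINE_BOOLS and CMD_LINE_FLOATS.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the two flag tables named in the report.
std::map<std::string, bool> bool_flags;
std::map<std::string, float> float_flags;

float LambdaFromWrongTable() {
  // operator[] default-inserts `false` for a missing key; false converts to 0.
  return bool_flags["--lambda"];
}

float LambdaFromRightTable() {
  return float_flags["--lambda"];
}
```

Under that assumption the lookup compiles without any warning, because a bool converts implicitly to float, so the objective is computed with lambda = 0 regardless of the value given on the command line.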
Download & build, run the demo commands adding --hash_mask_bits to the
arguments. Training proceeds fine, but testing of the model gives the malloc
error:
$ ./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1
--iterations 100000 --dimensionality 150000 --training_file demo/demo.train
--model_out demo/model --hash_mask_bits 8
hash_mask_ 255
Reading training data from: demo/demo.train
Time to read training data: 0.061278
Time to complete training: 52.3639
Writing model to: demo/model
Done.
$ ./sofia-ml --model_in demo/model --test_file demo/demo.train --results_file
demo/results.txt --hash_mask_bits 8
hash_mask_ 255
sofia-ml(6235) malloc: *** error for object 0x800000: pointer being freed was
not allocated
*** set a breakpoint in malloc_error_break to debug
Reading model from: demo/model
Done.
Reading test data from: demo/demo.train
Time to read test data: 0.06114
Time to make test prediction results: 0.008274
Writing test results to: demo/results.txt
Done.
========
$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
Original issue reported on code.google.com by [email protected]
on 18 Jun 2010 at 6:43
Is there any example in sofia-ml for multilabel classification?
Original issue reported on code.google.com by [email protected]
on 20 Jan 2015 at 5:34
What steps will reproduce the problem?
1. make all_tests
What is the expected output?
PASS.
What do you see instead?
sf-weight-vector_test: sf-weight-vector_test.cc:95: int main(int, char**):
Assertion `w_6.ValueOf(3) == 1' failed.
What version of the product are you using? On what operating system?
Latest sofia-ml from svn, Debian 5, GCC 4.3.2.
Original issue reported on code.google.com by [email protected]
on 14 Feb 2010 at 3:22
This affects large ids.
The issue is in cluster-src/sofia-kmeans.cc
The solution diff is:
345c345
< << test_data->VectorAt(i).GetY() << std::endl;
---
> << (int)test_data->VectorAt(i).GetY() << std::endl;
Original issue reported on code.google.com by [email protected]
on 6 Mar 2013 at 2:37
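
One plausible reading of this report is that a large id stored as a float is printed in scientific notation by the default stream formatting, and the cast to int avoids that. The sketch below demonstrates the formatting difference only; `PrintAsFloat` and `PrintAsInt` are hypothetical helpers, and the value 1234567 is an arbitrary example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: format a float with default stream settings
// (six significant digits, switching to scientific notation when needed).
std::string PrintAsFloat(float label) {
  std::ostringstream out;
  out << label;
  return out.str();
}

// Hypothetical helper: format the same value after casting to int.
std::string PrintAsInt(float label) {
  std::ostringstream out;
  out << static_cast<int>(label);
  return out.str();
}
```

With default formatting, 1234567.0f is written as 1.23457e+06, which loses the low digits of the id, while the cast version writes 1234567. The cast is only exact for ids that a float can represent exactly.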
What steps will reproduce the problem?
cd "sofia-ml-read-only"
./sofia-kmeans --k 100 --init_type random --opt_type mini_batch_kmeans
--mini_batch_size 100 --iterations 1000 --cluster_mapping_type rbf_kernel
--test_file <test file location goes here> --cluster_mapping_out <cluster
mapping output location goes here>
What is the expected output? What do you see instead?
The expected output is a cluster mapping text file. Instead, I see:
cd "sofia-ml-read-only"
sofia-kmeans: sf-cluster-centers.cc:93: float
SfClusterCenters::SqDistanceToClosestCenter(const SfSparseVector&, int*) const:
Assertion `!cluster_centers_.empty()' failed.
What version of the product are you using? On what operating system?
I don't know where to find the product version. The most recent version is the
one I have been using.
Operating system: Ubuntu 12.04.5 LTS
Please provide any additional information below.
N/A
Original issue reported on code.google.com by [email protected]
on 23 Sep 2014 at 6:46
What steps will reproduce the problem?
1. run make with gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC)
What is the expected output? What do you see instead?
a proper build
What version of the product are you using? On what operating system?
trunk on 2010-03-30 15:52
Please provide any additional information below.
gcc output:
:sofia-ml-read-only/src$ make
g++ -O3 -lm -Wall -o sofia-ml sofia-ml.cc sofia-ml-methods.cc
sf-weight-vector.cc sf-sparse-vector.cc sf-data-set.cc
sf-hash-weight-vector.cc sf-hash-inline.cc
sf-sparse-vector.cc: In member function ‘void SfSparseVector::Init(const
char*)’:
sf-sparse-vector.cc:132: error: ‘sscanf’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int)’:
sf-hash-weight-vector.cc:40: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int, const std::string&)’:
sf-hash-weight-vector.cc:54: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In member function ‘virtual void
SfHashWeightVector::AddVector(const SfSparseVector&, float)’:
sf-hash-weight-vector.cc:96: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc:111: error: ‘exit’ was not declared in this scope
make: *** [sofia-ml] Error 1
Original issue reported on code.google.com by [email protected]
on 30 Mar 2010 at 1:55
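
The errors above are the usual symptom of relying on headers that older compilers pulled in indirectly: sscanf is declared in <cstdio> and exit in <cstdlib>, so including both explicitly is the standard remedy. A minimal sketch follows; `ParseFeature` and `ParseOrDie` are hypothetical helpers for illustration and are not functions from sofia-ml.

```cpp
#include <cassert>
#include <cstdio>   // Declares sscanf and fprintf.
#include <cstdlib>  // Declares exit.

// Hypothetical helper: parse one "id:value" token.
// Returns true and fills the outputs when both fields are read.
bool ParseFeature(const char* token, int* id, float* value) {
  return std::sscanf(token, "%d:%f", id, value) == 2;
}

// Hypothetical helper: same as above, but terminate on malformed input.
void ParseOrDie(const char* token, int* id, float* value) {
  if (!ParseFeature(token, id, value)) {
    std::fprintf(stderr, "Bad feature token: %s\n", token);
    std::exit(1);
  }
}
```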
There is a problem with the source code. Many files forget to include standard
library headers, and some of the assertions in the unit tests fail.
What steps will reproduce the problem?
1. Follow the instructions on https://code.google.com/p/sofia-ml/
2. Run make all_test in src/
What is the expected output? What do you see instead?
I see lots of compile time errors.
What version of the product are you using? On what operating system?
Ubuntu 14.04, G++ 4.7, sofia-ml
Please provide any additional information below.
The following updates fixed everything for me:
sf-sparse-vector_test.cc
l27 //assert(x1.GetGroupId() == "2");
l75 //assert(x6.GetGroupId() == "3");
simple-cmd-line-helper.h
l68 #include <cstdlib>
l69 #include <stdio.h>
sofia-ml-methods_test.cc
l19 #include <cstdlib>
Original issue reported on code.google.com by [email protected]
on 16 Jul 2014 at 4:55
What steps will reproduce the problem?
1. Grab source from SVN.
2. cd cluster-src/
3. make all_test
What is the expected output? What do you see instead?
Test fails with:
sf-kmeans-methods_test: sf-kmeans-methods_test.cc:50: int main(int, char**):
Assertion `cluster_centers_3->ClusterCenter(0).ValueOf(1) == 1.0' failed.
Adding some debug output just before the assert failure results in:
cluster_centers_3->ClusterCenter(0).ValueOf(1) : 0 (should be 1.0)
What version of the product are you using? On what operating system?
SVN version:
r25 | [email protected] | 2010-04-28 04:52:54 +1000 (Wed, 28 Apr 2010) | 1 line
Running on x86_64 linux with gcc version 4.7.2 (Debian 4.7.2-5).
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 17 Feb 2013 at 12:28
What steps will reproduce the problem?
Follow the instructions in the README "Quick Start" section (on Ubuntu
14.04):
1. svn checkout http://sofia-ml.googlecode.com/svn/trunk/ sofia-ml-read-only
2. cd sofia-ml-read-only/src
3. make clean
4. make all_test
What is the expected output? What do you see instead?
Expecting success in all tests
Seeing instead:
Test is failing immediately with:
g++ -O3 -lm -Wall -o sf-sparse-vector_test sf-sparse-vector_test.cc
sf-sparse-vector.cc
./sf-sparse-vector_test
sf-sparse-vector_test: sf-sparse-vector_test.cc:27: int main(int, char**):
Assertion `x1.GetGroupId() == "2"' failed.
make: *** [sf-sparse-vector_test] Aborted (core dumped)
make: *** Deleting file `sf-sparse-vector_test'
What version of the product are you using? On what operating system?
Latest source:
r31 | [email protected] | 2010-07-26 14:17:11 -0700 (Mon, 26 Jul 2010) | 1 line
On Ubuntu 14.04 (LTS)
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 May 2015 at 9:22
What steps will reproduce the problem?
1. Run make in the src folder with gcc version 4.3
Adding ...
#include <cstring>
#include <cstdlib>
to the top of sf-sparse-vector.cc file fixed this problem for me.
Go Jumbos.
Original issue reported on code.google.com by [email protected]
on 24 Jan 2010 at 11:42
Hi,
I have changed my training data into sparse data format you mentioned.
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations
1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt
--model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following errors:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
I opened your demo.train and saw a square box at the end of every
vector. How can I change my data format to match yours, since the square box
at the end may not be the only difference? I tried to load your demo.train file
in MATLAB, and it doesn't let me do that either.
For the example of kmeans:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans
--mini_batch_size 100 --iterations 500 --objective_after_init
--objective_after_training --training_file demo/demo.train --model_out
demo/clusters.txt
The above command will return the five centroid locations, right?
In this case, since it only produces the 5 cluster center locations, the class
labels in the training data (demo.train) can be assigned any value, right?
For example, I chose all 1 from among these values: 1, 0, -1.
I look forward to your clarification.
Thank you,
Fred
Original issue reported on code.google.com by [email protected]
on 23 Sep 2011 at 3:56
I noticed in the demo that all the features have a value of "1". Does sofia-ml
support and/or make use of higher integer values (like for # of times a word is
seen in a document) or for floating point numbers?
Original issue reported on code.google.com by [email protected]
on 25 Feb 2013 at 4:34
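
The Data Format section of the README gives examples with values such as 1.2, -0.5 and 7, so the format itself carries real-valued features and not only 1s. How such values would enter a linear score can be sketched as below. This is an illustration under stated assumptions, not sofia-ml's prediction code: `LinearScore` and `SparseFeatures` are hypothetical names, and index 0 of the weight vector is treated as the bias weight, following the README's note that feature id 0 is reserved for the bias term.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse example: (feature id, feature value) pairs.
typedef std::vector<std::pair<int, float> > SparseFeatures;

// Hypothetical linear score: weights[0] is treated as the bias weight and
// feature ids start at 1. Ids outside the weight vector are ignored here.
float LinearScore(const std::vector<float>& weights, const SparseFeatures& x) {
  float score = weights.empty() ? 0.0f : weights[0];
  for (std::size_t i = 0; i < x.size(); ++i) {
    int id = x[i].first;
    if (id >= 1 && static_cast<std::size_t>(id) < weights.size()) {
      score += weights[id] * x[i].second;
    }
  }
  return score;
}
```

In this sketch a word count of 3 on feature 1 contributes three times the weight of that feature, and a fractional value of 0.5 contributes half of it.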