Comments (8)
The main issue is that you have too little data. When -b 1 is used (for probabilistic outputs), we internally run a cross-validation procedure, so there is some randomness. To get deterministic results, either fix the seed or, if probability outputs are not needed, remove -b 1.
from libsvm.
@cjlin1 thank you very much for your reply.
In my case I need probabilistic results, so the option -b 1 is mandatory.
Yes, I could change the seed to make this particular example work, but in general I cannot consider this a solution, since for other input data I could run into the same low-accuracy problem.
Why do you think that constraining the sigmoid function to always be increasing (given that we expect high probabilities for samples labeled +1 and low probabilities for those labeled -1) could not be a solution? Are there any drawbacks to somehow constraining probA < 0 inside the sigmoid_train function?
I think the sigmoid is already always increasing, so I don't understand your question.
I'm sorry: I probably confused the sign of probA (I edited the messages above to fix them). Let me try to explain better in other words.
Consider the input data attached to my first message. Working on this data, I found that if I set the random generator seed to some integer at startup, then for most seeds (about 96%) the resulting trained probabilistic model has 100% accuracy. However, for some seeds (only about 3.5%; srand(42) is an example on my machine) the trained model has 0% accuracy.
I noticed that the 0% accuracy models have a positive value of probA, while the 100% accuracy models have a negative one. The sigmoid function is defined as SF(x) = 1/(1+exp(probA*x+probB)), where x is the decision value. I said that the 0% accuracy models are associated with a decreasing sigmoid because, as x tends to +infinity, SF(x) tends to 1 if probA < 0 and to 0 if probA > 0.
Since we expect high probabilities (i.e., high values of SF(x)) for samples labeled +1 and low probabilities for those labeled -1, I suspect there is room for improvement if we constrain probA to always be lower than 0.
I attach the adaptation of svm-train.c that I used for the experiments, hoping it can help.
Yes, you are correct.
By simply replicating the input data 10 times, the input file becomes like this; with this input, all 1000 tested seeds lead to a 100% accuracy model.
I think, however, that there is no reason not to try to directly improve the algorithm so that it also works better for lower-cardinality datasets, as is often the case.
I tried to impose the constraint probA < 0 by adding the line
newA = newA > -eps ? -2 * eps - newA : newA;
immediately after
newA = A + stepsize * dA;
in the backtracking loop of the sigmoid_train function. Furthermore, I set the initial value to A = 1 instead of A = 0.
In practice, I implemented the constraint by reflecting, at every iteration, the point (A, B) of the parameter search space around the line A = -eps.
With these changes, even if the random seed choice is unfortunate, the accuracy of the trained models never falls below 50%. This is because, in the worst case, a very flat sigmoid (when A is near 0) classifies all samples into the same class; but, unlike before, the classification can no longer be opposite to the labeling.
Are there any disadvantages to these modifications that I didn't foresee?