
Comments (8)

cjlin1 commented on July 18, 2024

The main issue is that you have too little data. If -b 1 is used, which enables probabilistic outputs, we internally run a cross-validation process, so there is some randomness. To get deterministic results, either fix the seed or, if probability outputs are not needed, remove -b 1.

from libsvm.

giulio-datamind commented on July 18, 2024

@cjlin1 thank you very much for your reply.

In my case I need probabilistic results, so option -b 1 is mandatory.

Yes, I could change the seed to make this particular example work, but in general I cannot consider this a solution, since other input data could run into the same low-accuracy problem.

Why do you think that constraining the sigmoid function to be always increasing (given that we expect high probabilities for samples labeled +1 and low probabilities for those labeled -1) could not be a solution? Are there any drawbacks in somehow constraining probA < 0 inside the sigmoid_train function?

cjlin1 commented on July 18, 2024

I think we do have that the sigmoid is always increasing, so I don't understand your question.

giulio-datamind commented on July 18, 2024

I'm sorry: I probably confused myself with the sign of probA (I have edited the messages above to fix them). Let me try to explain better in other words.

Consider the input data attached to my first message. Working on this data, I found that by setting (at startup) the random generator seed to some integer, the resulting trained probabilistic model normally (i.e., for about 96% of these seeds) has 100% accuracy. However, for some seeds (only about 3.5%; an example is srand(42) on my machine) the trained model has an accuracy of 0%.

I noticed that the 0%-accuracy models have a positive value of probA, while the 100%-accuracy models have a negative one. The sigmoid function is defined as SF(x) = 1/(1+exp(probA*x+probB)), where x is the decision value. I stated that 0%-accuracy models are associated with a decreasing sigmoid function because, as x tends to +infinity, SF(x) tends to 1 if probA < 0 and to 0 if probA > 0.

Considering that we expect high probabilities (i.e., high values of SF(x)) for samples labeled +1 and low probabilities for those labeled -1, I suspect there is room for improvement if we constrain probA to always be lower than 0.

I attach the adaptation of svm-train.c that I used to make the experiments, hoping it can help.

cjlin1 commented on July 18, 2024

giulio-datamind commented on July 18, 2024

Yes, you are correct.

By simply replicating the input data 10 times, the input file becomes like this; with this input, all of the 1000 seeds I tried lead to a 100%-accuracy model.

I think, however, that there is no reason not to try to directly improve the algorithm so that it also works better on lower-cardinality datasets, as is often the case in practice.

giulio-datamind commented on July 18, 2024

I tried to impose the constraint probA < 0 by adding the line

newA = newA > -eps ? -2 * eps - newA : newA;

immediately after the

newA = A + stepsize * dA;

in the backtracking loop of the sigmoid_train function. Furthermore, I set the initial value to A = 1 instead of A = 0.

In practice, I implemented the constraint by reflecting, at every iteration, the point (A, B) of the parameter search space around the line A = -eps.

With these changes, even if the random seed choice is unfortunate, the accuracy of the trained models never falls below 50%. This happens because, in the worst case, the samples are all classified into the same class with a very flat sigmoid (when A is near 0); but, unlike before, the classification can no longer be opposite to the labeling.

Are there any disadvantages I didn't foresee in these modifications?

cjlin1 commented on July 18, 2024
