
Comments (9)

jaseweston commented on June 26, 2024

The test program "starspace test" assumes there is only one label per example in the evaluation code (see https://github.com/facebookresearch/StarSpace/blob/master/src/starspace.cpp#L330), so that's not the right thing to use for you.
However, training should work fine for multilabel tasks, and you should be able to use the model e.g. with the utility "query_predict", which shows the top K predictions.
I'm not sure why the predictions would be totally wrong; in your example they appear correct (tag1 and tag2 are at the top)? I guess you need to check first that the training set is learned correctly, and then whether it generalizes to test. Note that the program doesn't output probabilities; it uses a ranking loss by default.
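
For reference, the README documents that utility roughly as follows (signature hedged from memory; check the repository README for the exact form):

./query_predict <model> k [basedocs]

It loads the trained model, reads one query per line from stdin, and prints the top k predicted labels for each query.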

agemagician commented on June 26, 2024

Hello @jaseweston,

Thanks a lot for your reply.

Actually, I thought the model output a probability (0...1), which is why I thought the output was totally wrong. Since the test set is the same as the training set, I just wanted to check whether the model output is correct.

I have 2 more follow-up questions:
1- Can the model provide a probability output (0...1)?
I don't want just the top K labels. There are only 5 labels, and I want to know which of the 5 should be assigned to each sentence.
2- After training the model, how can I use it for prediction?
I only found the train and test commands; I couldn't find a command for prediction.

jaseweston commented on June 26, 2024

1 - You can try the softmax loss instead of the ranking loss, although @ledw could say more.
Another approach would be to train a second classifier that predicts the number of labels, and then take the top-ranked N, where N is that second classifier's prediction. No idea how well this would work, but it's pretty simple to try (a sketch follows at the end of this comment).

2 - In test mode there is a flag predictionFile ("file path for save predictions. If not empty, top K predictions for each example will be saved.") which will save the predictions; an example invocation follows below.
You can also use/adapt the utility query_predict described near the end of the README.
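
For example, an invocation along these lines (flag names as listed among the test arguments; the file names here are placeholders):

./starspace test -model mymodel -testFile test.txt -predictionFile preds.txt -K 5

This evaluates on test.txt and writes the top K predictions for each example to preds.txt.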
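
Returning to the second-classifier idea in point 1, a minimal sketch in Python (all names are hypothetical; the ranked labels would come from the prediction file, and the count model could be any off-the-shelf classifier trained to predict how many labels apply to an example):

def select_labels(ranked_labels, count_model, features):
    # ranked_labels: labels sorted by StarSpace similarity score, best first.
    # count_model: hypothetical classifier that predicts the number of labels.
    n = count_model.predict([features])[0]  # predicted label count N
    return ranked_labels[:n]                # keep the top-ranked N labels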

agemagician commented on June 26, 2024

Thanks a lot for your reply. I have already tested the dataset using softmax.
However, the accuracy is lower than a GRU and/or CNN with pretrained embeddings, and the loss is much higher of course.

Using the same dataset with a GRU and/or CNN with pretrained embeddings, I get a loss of less than 0.04; with StarSpace, however, the loss is more than 2.0, even after leaving it to train for many epochs.

I should mention that the training set has an imbalance issue: almost 90% of the dataset is assigned to one label, while the remaining 10% is spread across the other 6 labels.

jaseweston commented on June 26, 2024

What's the error like with the StarSpace ranking loss? What is the loss exactly? Are you sure they are the same? Imbalance can be addressed by simply repeating some of the examples in the training set, or else by changing the weights.
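
A hedged sketch of the repetition idea in Python (the output file name and repeat factor are made up; this assumes the fastText-format file carries the #tag labels inline, as in your runs):

REPEAT = 9  # assumed factor, chosen to roughly balance a 90/10 split

with open("starspace_input_v2.txt") as src, open("balanced.txt", "w") as dst:
    for line in src:
        dst.write(line)
        if "#tag1" not in line:           # minority-label example
            for _ in range(REPEAT - 1):   # write REPEAT copies in total
                dst.write(line)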

agemagician commented on June 26, 2024

I use the same file for both training and testing, to check whether the accuracy is correct.

Using "hinge loss", here is the result for training:

Arguments: 
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Start to initialize starspace model.
Build dict from input file : starspace_input_v2.txt
Read 8M words
Number of words in dictionary:  175273
Number of labels in dictionary: 7
Loading data from file : starspace_input_v2.txt
Total number of examples loaded : 95846
Training epoch 0: 0.01 0.002
Epoch: 100.0%  lr: 0.008000  loss: 0.003669  eta: 0h1m  tot: 0h0m24s  (20.0%)
 ---+++                Epoch    0 Train error : 0.00369100 +++--- ☃
Training epoch 1: 0.008 0.002
Epoch: 100.0%  lr: 0.006021  loss: 0.002374  eta: 0h1m  tot: 0h0m48s  (40.0%)
 ---+++                Epoch    1 Train error : 0.00236855 +++--- ☃
Training epoch 2: 0.006 0.002
Epoch: 100.0%  lr: 0.004000  loss: 0.002302  eta: <1min   tot: 0h1m13s  (60.0%)
 ---+++                Epoch    2 Train error : 0.00232403 +++--- ☃
Training epoch 3: 0.004 0.002
Epoch: 100.0%  lr: 0.002000  loss: 0.001946  eta: <1min   tot: 0h1m38s  (80.0%)
 ---+++                Epoch    3 Train error : 0.00194239 +++--- ☃
Training epoch 4: 0.002 0.002
Epoch: 100.0%  lr: -0.000000  loss: 0.001748  eta: <1min   tot: 0h2m1s  (100.0%)
 ---+++                Epoch    4 Train error : 0.00176628 +++--- ☃

During test:

lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Start to load a trained starspace model.
STARSPACE-2017-2
Model loaded.
Loading data from file : starspace_input_v2.txt
Total number of examples loaded : 95846
------Loaded model args:
Arguments: 
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Predictions use 7 known labels.
Evaluation Metrics : 
hit@1: 0.895614 hit@10: 1 hit@20: 1 hit@50: 1 mean ranks : 1.34728 Total examples : 95846

Here is part of the output file:

Example 78:
LHS:
Some text .....
RHS:
#tag1
Predictions:
(++) [0.807774] #tag1
(--) [0.755205] #tag2
(--) [0.755042] #tag3
(--) [0.754421] #tag4
(--) [0.751218] #tag5
(--) [0.748473] #tag6
(--) [0.744363] #tag7

Example 79:
LHS:
Some text.....
RHS:
#tag2
Predictions:
(++) [-0.434155]        #tag1
(--) [-0.435466]        #tag2
(--) [-0.444126]        #tag3
(--) [-0.444145]        #tag4
(--) [-0.444961]        #tag5
(--) [-0.454735]        #tag6
(--) [-0.503305]        #tag7

Since it is a multi-label problem, I need result values between 0 and 1 to get the probability that each sentence is assigned to one or more tags, so I cannot use the hinge loss.
Following your advice, I used softmax.

Using "softmax loss", here is the result for training:

Arguments: 
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: softmax
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Start to initialize starspace model.
Build dict from input file : starspace_input_v2.txt
Read 8M words
Number of words in dictionary:  175273
Number of labels in dictionary: 7
Loading data from file : starspace_input_v2.txt
Total number of examples loaded : 95846
Training epoch 0: 0.01 0.002
Epoch: 100.0%  lr: 0.008000  loss: 2.422373  eta: 0h1m  tot: 0h0m26s  (20.0%)
 ---+++                Epoch    0 Train error : 2.42903161 +++--- ☃
Training epoch 1: 0.008 0.002
Epoch: 100.0%  lr: 0.006063  loss: 2.425689  eta: 0h1m  tot: 0h0m53s  (40.0%)
 ---+++                Epoch    1 Train error : 2.42404556 +++--- ☃
Training epoch 2: 0.006 0.002
Epoch: 100.0%  lr: 0.004000  loss: 2.423630  eta: <1min   tot: 0h1m20s  (60.0%)
 ---+++                Epoch    2 Train error : 2.42313027 +++--- ☃
Training epoch 3: 0.004 0.002
Epoch: 100.0%  lr: 0.002021  loss: 2.434748  eta: <1min   tot: 0h1m48s  (80.0%)
 ---+++                Epoch    3 Train error : 2.42263532 +++--- ☃
Training epoch 4: 0.002 0.002
Epoch: 100.0%  lr: 0.000042  loss: 2.429571  eta: <1min   tot: 0h2m16s  (100.0%)
 ---+++                Epoch    4 Train error : 2.42253876 +++--- ☃

During test:

Arguments: 
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: softmax
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Start to load a trained starspace model.
STARSPACE-2017-2
Model loaded.
Loading data from file : starspace_input_v2.txt
Total number of examples loaded : 95846
------Loaded model args:
Arguments: 
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
saveEveryEpoch: 0
loss: softmax
margin: 0.05
similarity: cosine
maxNegSamples: 10
negSearchLimit: 50
thread: 10
minCount: 1
minCountLabel: 1
label: #
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
Predictions use 7 known labels.
Evaluation Metrics : 
hit@1: 0.897878 hit@10: 1 hit@20: 1 hit@50: 1 mean ranks : 1.42807 Total examples : 95846

Here is part of the output file:

Example 78:
LHS:
Some text....
RHS:
#tag1
Predictions:
(++) [0.999999] #tag1
(--) [-0.999964]        #tag2
(--) [-0.999968]        #tag3
(--) [-0.999978]        #tag4
(--) [-0.999979]        #tag5
(--) [-0.999991]        #tag6
(--) [-0.999998]        #tag7

Example 79:
LHS:
Some text ....
RHS:
#tag2
Predictions:
(--) [0.999999] #tag1
(--) [-0.999963]        #tag3
(--) [-0.999967]        #tag4
(++) [-0.999979]        #tag2
(--) [-0.99998] #tag5
(--) [-0.999991]        #tag6
(--) [-0.999998]        #tag7

As you can see, the error using softmax is greater than 2; however, when I use a CNN or LSTM, I usually get an error rate around 0.04 or less.
I am not sure why, with softmax, the algorithm always selects tag1. Maybe it is because 90% of the data was tagged with tag1 and the algorithm can't handle imbalanced data, or maybe it applies a softmax over all the outputs so that the probabilities sum to one across all labels.

Do you have any idea or solution?

ledw commented on June 26, 2024

@agemagician the outputs in prediction are not a probability distribution over the labels; rather, they are the similarity between the two entities (sentence and tag). To obtain a probability distribution, you can apply a softmax to the predictions to map the values into (0, 1).
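
A minimal sketch of that conversion in Python, using the similarity scores printed for Example 78 above:

import math

# Similarity scores for the 7 tags, as printed in the prediction file.
scores = [0.807774, 0.755205, 0.755042, 0.754421, 0.751218, 0.748473, 0.744363]

# Softmax: exponentiate and normalize so the values lie in (0, 1) and sum to 1.
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]

Note that with cosine scores this close together, the resulting distribution is nearly uniform, which is another reason to consider 'dot' similarity as suggested below.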
In the case that you use softmax as the loss function, I'd suggest using 'dot' as the similarity function instead of 'cosine'. As for the absolute value of the loss (greater than 2), I would not worry about it, as the way it is calculated may differ from the CNN or LSTM case.
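
Concretely, that would mean training with something like (flags as shown in the argument dumps above; the model file name is a placeholder):

./starspace train -trainFile starspace_input_v2.txt -model mymodel -loss softmax -similarity dot
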
Finally, for the imbalanced data, you can try repeating examples or using different weights for the labels, as @jaseweston suggested. Please refer to the README for how to use weighted examples.
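
For instance, with the useWeight option enabled, the input format allows per-token weights (syntax hedged from the README; the weight value here is made up), so a rare label can be up-weighted directly in the training file:

some example text for a rare class #tag2:10.0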

ledw commented on June 26, 2024

@agemagician any updates on this? Thanks.

ledw commented on June 26, 2024

Closing the issue as there are no updates.
