Some Questions on ConvLab-Module-NLG-SCLSTM about convlab HOT 4 CLOSED

convlab commented on June 1, 2024

Some Questions on ConvLab-Module-NLG-SCLSTM

Comments (4)

zqwerty commented on June 1, 2024

Thanks for the question!

In general, you should use ConvLab/convlab/modules/nlg/multiwoz/evaluate.py for evaluation.

The way we calculate BLEU is different from the usual machine translation setting. We group the sentences by their dialog act. For example, if the Inform-Hotel-Addr dialog act has 3 golden sentences [r1, r2, r3], then every sentence generated for that dialog act uses [r1, r2, r3] as its reference sentences.
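A minimal sketch of this grouped-reference BLEU, written from the description above rather than copied from evaluate.py. It assumes whitespace tokenization, unsmoothed corpus BLEU-4, and the function names group_references and grouped_corpus_bleu, all of which are illustrative and hypothetical. Each generated sentence is clipped against the maximum n-gram count found in any reference of its dialog act.

```python
import math
from collections import Counter, defaultdict


def group_references(samples):
    """Collect every golden sentence under its dialog act key.

    samples: iterable of (dialog_act, golden_sentence) pairs.
    Whitespace tokenization is a simplifying assumption.
    """
    refs = defaultdict(list)
    for act, gold in samples:
        refs[act].append(gold.split())
    return dict(refs)


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def grouped_corpus_bleu(hyps, refs_by_act, max_n=4):
    """Corpus BLEU where each hypothesis is scored against all references of its act.

    hyps: iterable of (dialog_act, generated_sentence) pairs.
    Every act is assumed to have at least one reference in refs_by_act.
    """
    matched = [0] * max_n
    possible = [0] * max_n
    hyp_len = ref_len = 0
    for act, hyp in hyps:
        hyp_toks = hyp.split()
        refs = refs_by_act[act]
        hyp_len += len(hyp_toks)
        # Brevity penalty uses the reference length closest to the hypothesis (shorter wins ties)
        closest = min(refs, key=lambda r: (abs(len(r) - len(hyp_toks)), len(r)))
        ref_len += len(closest)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_toks, n)
            # Clip each n-gram by its maximum count in any single reference of this act
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            possible[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if min(matched) == 0:
        return 0.0  # no smoothing: any n-gram order with zero matches gives BLEU 0
    log_precision = sum(math.log(m / p) for m, p in zip(matched, possible)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

For instance, with three golden sentences grouped under Inform-Hotel-Addr, a generated sentence that exactly matches any one of them scores 1.0, even though it differs from the other two.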

The differences between ConvLab/convlab/modules/nlg/multiwoz/evaluate.py and ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/bleu.py:

The former replaces each value in the generated sentence with its corresponding dialog_act-slot, while the latter directly uses the delexicalized output of SCLSTM (see sclstm.res). So the former can also be used for other NLG models.
The former generates one sentence at a time (beam_size=1), which is slower than batched generation.
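A rough sketch of the value-replacement step described in the first point, to show why it makes the evaluation model-agnostic: any NLG model's lexicalized output can be mapped back to placeholders before scoring. The dialog act layout (act name mapped to [slot, value] pairs) and the "Act-Slot" placeholder format are assumptions for illustration, and delexicalize is a hypothetical name, not ConvLab's function.

```python
def delexicalize(sentence, dialog_act):
    """Replace slot values in a generated sentence with act-slot placeholders.

    dialog_act: dict mapping an act name (e.g. "Inform-Hotel") to a list of
    [slot, value] pairs. The placeholder format "<act>-<slot>" is hypothetical.
    """
    pairs = [(act, slot, value)
             for act, slot_values in dialog_act.items()
             for slot, value in slot_values]
    out = sentence
    # Replace longer values first so a value that contains a shorter one is handled whole
    for act, slot, value in sorted(pairs, key=lambda p: len(p[2]), reverse=True):
        if value:  # skip empty values
            out = out.replace(value, f"{act}-{slot}")
    return out
```

A real implementation would likely also normalize case and tokenization before matching; plain substring replacement is kept here for brevity.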

ToSev7en commented on June 1, 2024

@zqwerty Thank you!

I think I understand how you group the sentences by dialog act. This way, a system response generated from one dialog act (since beam_size=1, as you said) may have multiple references.

Compared with another approach, which scores a system response generated from one dialog act against its single golden system response, the approach used in ConvLab may get a higher BLEU-4 score, right?

And one more question: in ConvLab/convlab/modules/nlg/multiwoz/evaluate.py, the SCLSTM model is loaded from a remote source:

print("Loading", model_name)
if model_name == 'SCLSTM':
    model_sys = SCLSTM(model_file="https://convlab.blob.core.windows.net/models/nlg-sclstm-multiwoz.zip")

And config.cfg shows that the model was trained and evaluated only on Boo_ResDataSplitRand0925.json rather than train.json. Could this be a problem?

[DATA]
vocab_file =	%(dir)s/resource/vocab.txt
feat_file =		%(dir)s/resource/feat.json
text_file =		%(dir)s/resource/text.json
template_file =	%(dir)s/resource/template.txt
dataSplit_file= %(dir)s/resource/Boo_ResDataSplitRand0925.json
batch_size = 256
shuffle = true
dir = 

[MODEL]
dec_type = sclstm
hidden_size = 100
dropout = 0.25
clip = 0.5
learning_rate = 0.001

[TRAINING]
model_epoch = best
n_epochs = 75
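The %(dir)s placeholders in this config follow Python configparser's interpolation syntax, with the empty dir key presumably filled in at run time. The sketch below assumes the file is read with configparser (not confirmed from the source) and uses a hypothetical data directory /data/sclstm and a hypothetical helper load_sclstm_config, to show how dataSplit_file resolves.

```python
import configparser

# The config from above, with tabs normalized to spaces (configparser strips them anyway)
CONFIG_TEXT = """\
[DATA]
vocab_file = %(dir)s/resource/vocab.txt
feat_file = %(dir)s/resource/feat.json
text_file = %(dir)s/resource/text.json
template_file = %(dir)s/resource/template.txt
dataSplit_file = %(dir)s/resource/Boo_ResDataSplitRand0925.json
batch_size = 256
shuffle = true
dir =

[MODEL]
dec_type = sclstm
hidden_size = 100
dropout = 0.25
clip = 0.5
learning_rate = 0.001

[TRAINING]
model_epoch = best
n_epochs = 75
"""


def load_sclstm_config(text, data_dir):
    """Parse the config and resolve %(dir)s against data_dir (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Fill the empty `dir` key; %(dir)s references are interpolated when values are read
    parser.set("DATA", "dir", data_dir)
    data, model, training = parser["DATA"], parser["MODEL"], parser["TRAINING"]
    return {
        # Option names are case-insensitive in configparser
        "dataSplit_file": data.get("dataSplit_file"),
        "vocab_file": data.get("vocab_file"),
        "batch_size": data.getint("batch_size"),
        "shuffle": data.getboolean("shuffle"),
        "hidden_size": model.getint("hidden_size"),
        "dropout": model.getfloat("dropout"),
        "n_epochs": training.getint("n_epochs"),
        "model_epoch": training.get("model_epoch"),
    }
```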

zqwerty commented on June 1, 2024

Compared with another approach, which scores a system response generated from one dialog act against its single golden system response, the approach used in ConvLab may get a higher BLEU-4 score, right?

Yes.
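The reason can be seen in BLEU's clipped (modified) n-gram precision: each hypothesis n-gram is credited up to its maximum count in any reference, so adding references can only keep or raise that precision. The brevity penalty, which uses the closest reference length, can move either way, which is why the score "may" rather than must go up. A small sketch with made-up delexicalized sentences; clipped_precision is a hypothetical helper name.

```python
from collections import Counter


def clipped_precision(hyp, refs, n):
    """Modified n-gram precision of one tokenized hypothesis against a list of references."""
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    max_ref = Counter()
    for ref in refs:
        # Counter union keeps the maximum count of each n-gram across references
        max_ref |= Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_counts.values())
    matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
    return matched / total if total else 0.0


hyp = "it is at slot-addr .".split()
r1 = "the address is slot-addr .".split()
r2 = "it is located at slot-addr .".split()
single = clipped_precision(hyp, [r1], 2)       # one golden response only
grouped = clipped_precision(hyp, [r1, r2], 2)  # all responses sharing the dialog act
```

Here the bigram precision rises from 1/4 against r1 alone to 3/4 against the grouped references.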

truthless11 commented on June 1, 2024

And config.cfg shows that the model was trained and evaluated only on Boo_ResDataSplitRand0925.json rather than train.json. Could this be a problem?

We migrated SCLSTM from the MultiWOZ benchmark, which uses Boo_ResDataSplitRand0925.json. This JSON file splits the dataset into train/valid/test sets in the same way as ConvLab, which uses valListFile.json and testListFile.json from the original dataset.
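One way to sanity-check the claim that the two splits agree is to rebuild the split from the validation and test id lists and compare it set by set. The file formats are not shown here, so this sketch assumes the dialog ids have already been loaded into plain Python lists; split_dialogs, same_split, and the ids are hypothetical.

```python
def split_dialogs(all_ids, val_ids, test_ids):
    """Assign each dialog id to test, valid, or train (everything else)."""
    val, test = set(val_ids), set(test_ids)
    splits = {"train": [], "valid": [], "test": []}
    for dialog_id in all_ids:
        if dialog_id in test:
            splits["test"].append(dialog_id)
        elif dialog_id in val:
            splits["valid"].append(dialog_id)
        else:
            splits["train"].append(dialog_id)
    return splits


def same_split(a, b):
    """True if both splits contain the same ids in each partition (order ignored)."""
    return all(set(a[k]) == set(b[k]) for k in ("train", "valid", "test"))
```

If same_split returns True for the split built from valListFile.json/testListFile.json and the one read from Boo_ResDataSplitRand0925.json, training on either yields the same train/valid/test partition.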
