thu-coai / convlab-2
ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
License: Apache License 2.0
I have a question regarding inform-actions in ConvLab-2:
Consider for instance the action “Attraction-Inform-Area-3”. If I am correct, this will result in informing the area of the THIRD entity in the database query. Is that correct?
If so, what is the idea behind having “Attraction-Inform-Area-1”, “Attraction-Inform-Area-2”, “Attraction-Inform-Area-3”, … instead of only having one action “Attraction-Inform-Area” and selecting an entity from the DB-query randomly? Doesn’t your method only blow up the action space?
Moreover, consider the function “lexicalize_da” in /convlab2/util/multiwoz/lexicalize.py together with the actions “Attraction-Inform-Area-3” and “Attraction-Inform-Phone-2”.
Wouldn’t that output the phone number of the second entity and the area of the third entity? Shouldn’t the area and the phone number belong to the same entity?
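The concern can be made concrete with a toy sketch. This is not the code from lexicalize.py: `toy_lexicalize` and the sample entities are hypothetical, and the sketch only mirrors the reading described in the question, in which the trailing number is an index into the DB query results.

```python
# Toy illustration only: NOT the real lexicalize_da from ConvLab-2.
# Assumption: the trailing number of an action is a 1-based index into
# the list of entities returned by the database query.

def toy_lexicalize(actions, entities):
    """Map 'Domain-Intent-Slot-N' strings to (slot, value) pairs."""
    result = []
    for action in actions:
        domain, intent, slot, index = action.split("-")
        entity = entities[int(index) - 1]  # 1-based index into query results
        result.append((slot.lower(), entity[slot.lower()]))
    return result


# Hypothetical query results for the attraction domain.
entities = [
    {"name": "A", "area": "north", "phone": "111"},
    {"name": "B", "area": "south", "phone": "222"},
    {"name": "C", "area": "west", "phone": "333"},
]

mixed = toy_lexicalize(
    ["Attraction-Inform-Area-3", "Attraction-Inform-Phone-2"], entities
)
print(mixed)  # the area comes from entity C, the phone from entity B
```

Under this reading, the two slots in one system turn describe different entities, which is exactly the inconsistency the question asks about.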
Hey guys, I am a big fan of Convlab1, which implements actor-critic, PPO and DQN. Do you plan to add the other two RL models to convlab2? If not, is there a convenient way for me to add these RL methods myself?
Describe the bug
util/multiwoz/dbquery.py:42
The query() method expects elements of the constraints argument to be mutable.
However, tuples are used as the elements and passed to the method in the ConvLab code MultiWozEvaluator._final_goal_analyze().
To Reproduce
The error occurs when Analyzer.comprehensive_analyze() is called and the generated goal contains specific values; see the call stack:
File "/lscratch/kulhajon/873944.stallo-adm.uit.no/source/train.py", line 214, in _run_evaluation
result = analyzer(agent, num_dialogs=self.args.evaluation_dialogs)
File "/lscratch/kulhajon/873944.stallo-adm.uit.no/source/evaluate.py", line 58, in __call__
self.comprehensive_analyze(agent, agent.name, total_dialog=num_dialogs)
File "/home/kulhajon/source/ConvLab-2/convlab2/util/analysis_tool/analyzer.py", line 145, in comprehensive_analyze
task_success = sess.evaluator.task_success()
File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 298, in task_success
goal_sess = self.final_goal_analyze()
File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 401, in final_goal_analyze
match, mismatch = self._final_goal_analyze()
File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 382, in _final_goal_analyze
query_result = self.database.query(domain, constraints)
File "/home/kulhajon/source/ConvLab-2/convlab2/util/multiwoz/dbquery.py", line 34, in query
ele[1] = 'centre'
TypeError: 'tuple' object does not support item assignment
Expected behavior
The method does not throw an error.
Additional context
It's reasonable to use immutable elements for other reasons too.
I've created a pull request with a suggested fix.
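For illustration, here is a minimal sketch of one way to avoid the in-place assignment. It is not the content of the pull request; `normalize_constraints` is a hypothetical helper, and the 'center' value is an assumed input chosen to match the `ele[1] = 'centre'` line in the traceback.

```python
# Sketch only: `normalize_constraints` is a hypothetical helper, not part
# of ConvLab-2. It copies each (slot, value) pair into a new list so that
# later in-place edits never touch the caller's tuples.

def normalize_constraints(constraints):
    """Return mutable [slot, value] copies of the given pairs."""
    return [list(pair) for pair in constraints]


goal_constraints = [("area", "center"), ("pricerange", "cheap")]

# In-place assignment on a tuple fails, as in the traceback above.
try:
    goal_constraints[0][1] = "centre"
    failed = False
except TypeError:
    failed = True

mutable = normalize_constraints(goal_constraints)
mutable[0][1] = "centre"  # works on the copy; the original tuples are untouched
print(failed, mutable[0], goal_constraints[0])
```

Copying inside the query method, rather than requiring callers to pass lists, keeps immutable inputs valid and avoids mutating the caller's goal as a side effect.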
"Our documents are on https://thu-coai.github.io/ConvLab-2_docs/convlab2.html#module-convlab2."
Maybe the url is https://thu-coai.github.io/ConvLab-2_docs/convlab2.html ?
Hi, how can I train the models in ConvLab-2 on my private dataset to build my own task-oriented dialogue system?
Describe the bug
Since I recently started learning how to use this library, I'm aiming for a very simple concrete task, which is training a PPO policy from scratch.
Given my limited resources at the moment I'm using Google Colab Pro, which gives me a Tesla P100-PCIE-16GB for cheap.
The approach I'm following is (it mostly follows the tutorial on Colab, however I'll be extra detailed because the only mistake here could be my way of using the different modules) :
I clone the convlab2 GitHub repo and install it locally (I tried both installing just in the runtime and installing into a Google Drive folder connected to the notebook).
After importing all necessary libraries, I create a simple dialogue system as in /ppo/train.py:
# simple rule DST
dst_sys = RuleDST()
policy_sys = PPO(True)
policy_sys.load(args.load_path)
# not use dst
dst_usr = None
# rule policy
policy_usr = RulePolicy(character='usr')
# assemble
simulator = PipelineAgent(None, None, policy_usr, None, 'user')
evaluator = MultiWozEvaluator()
env = Environment(None, simulator, None, dst_sys, evaluator)
for i in range(args.epoch):
    update(env, policy_sys, args.batchsz, i, args.process_num)
The key point is that I'm training for only 100 epochs. I would not expect the resulting policy to be any good, but I would at least expect it to be able to generate dialogues.
Once the policy (policy_sys) is trained, I create a session for testing dialogues. To do so, I create the user and system agents (pipelines), using the trained policy for the system agent.
# --- system ---
# BERT nlu
sys_nlu = BERTNLU()
# simple rule DST
sys_dst = RuleDST()
# TRAINED PPO POLICY ###
sys_policy = policy_sys
# template NLG
sys_nlg = TemplateNLG(is_user=False)
# assemble
sys_agent = PipelineAgent(sys_nlu, sys_dst, sys_policy, sys_nlg, name='sys')
# --- user ---
# MILU
user_nlu = MILU()
# not use dst
user_dst = None
# rule policy
user_policy = RulePolicy(character='usr')
# template NLG
user_nlg = TemplateNLG(is_user=True)
# assemble
user_agent = PipelineAgent(user_nlu, user_dst, user_policy, user_nlg, name='user')
# --- evaluator and session ---
evaluator = MultiWozEvaluator()
sess = BiSession(sys_agent=sys_agent, user_agent=user_agent, kb_query=None, evaluator=evaluator)
sys_response = ''
sess.init_session()
print('init goal:')
pprint(sess.evaluator.goal)
print('-'*50)
for i in range(20):
    sys_response, user_response, session_over, reward = sess.next_turn(sys_response)
    print('user:', user_response)
    print('sys:', sys_response)
    print()
    if session_over is True:
        break
print('task success:', sess.evaluator.task_success())
print('book rate:', sess.evaluator.book_rate())
print('inform precision/recall/f1:', sess.evaluator.inform_F1())
print('-'*50)
print('final goal:')
pprint(sess.evaluator.goal)
print('='*100)
The error I'm getting is "RuntimeError: CUDA error: device-side assert triggered", apparently raised from the jointBERT library.
I suspect this happens because the utterances generated by the system are longer than the MAX_LEN of the BERT model.
So the main question here (apart from the obvious one, "why is this happening?") is: is this the right approach for training and testing an RL policy from scratch?
I've seen that some sort of imitation learning helps as a pre-training step to improve PPO's performance. However, my aim is not to fine-tune PPO, but simply to train it for a few epochs (100, 200, etc.) with different hyperparameters and to use the analyzer library to assess the model's behaviour (e.g. average success rate) as the number of epochs or the hyperparameters change.
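Regarding the device-side assert: CUDA reports such asserts asynchronously, so running the same code on CPU or with the environment variable CUDA_LAUNCH_BLOCKING=1 usually gives a clearer error location. If the cause really is over-long input (an unconfirmed guess from this report), clipping the token sequence before it reaches the model would avoid it. The sketch below is framework-free; `truncate_tokens` is a hypothetical helper and 512 is assumed as the usual position limit of BERT-base.

```python
# Sketch under an unconfirmed assumption: the assert is caused by token
# sequences longer than the model's position limit. MAX_LEN = 512 is the
# usual limit for BERT-base; the helper name is hypothetical.

MAX_LEN = 512


def truncate_tokens(tokens, max_len=MAX_LEN):
    """Keep at most max_len tokens, counting [CLS] and [SEP]."""
    body_budget = max_len - 2  # reserve room for the two special tokens
    return ["[CLS]"] + tokens[:body_budget] + ["[SEP]"]


long_utterance = ["word"] * 600
clipped = truncate_tokens(long_utterance)
print(len(clipped))
```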
To Reproduce
All the code is here and it runs out of the box, it should reproduce the "CUDA error: device-side assert triggered" error I'm facing
https://colab.research.google.com/drive/1nz73WBKLohohScsZIFjpDJz0y0CG4SRB?usp=sharing
Expected behavior
I expected that building a system agent again and using the previously trained policy would work out of the box within the analyzer tool.
Given the release of MultiWoZ 2.2, it seems like the baselines should all be retrained using the cleanest version of the dataset. Paper: https://www.aclweb.org/anthology/2020.nlp4convai-1.13/
Describe the bug
The following case will be judged as a mismatch in _final_goal_analyze:
Constraints from final goal:
[('price', '75.10 pounds'), ('arriveBy', '15:45'), ('day', 'friday'), ('departure', 'cambridge'), ('destination', 'birmingham new street')]
self.booked[train]:
{'arriveBy': '07:44', 'day': 'friday', 'departure': 'cambridge', 'destination': 'birmingham new street', 'duration': '163 minutes', 'leaveAt': '05:01', 'price': '75.10 pounds', 'trainID': 'TR9678', 'Ref': '00002092'}
ConvLab-2/convlab2/evaluator/multiwoz_eval.py
Line 390 in db9ddd4
Expected behavior
In the train domain, arriveBy and leaveAt in the goal may not exactly equal the values of the booked item in the DB, yet the booking is still correct.
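A sketch of the comparison this report asks for, using the values quoted in the report. It is not the evaluator's code; `train_matches` is hypothetical and assumes that a booked train satisfies the goal when it arrives no later than the goal's arriveBy and leaves no earlier than the goal's leaveAt, while all other slots must match exactly.

```python
# Sketch only; not the code in multiwoz_eval.py.
# Assumption: arriveBy is an upper bound, leaveAt is a lower bound,
# and every other constraint must match exactly.

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def train_matches(goal, booked):
    for slot, value in goal:
        if slot == "arriveBy":
            if to_minutes(booked[slot]) > to_minutes(value):
                return False
        elif slot == "leaveAt":
            if to_minutes(booked[slot]) < to_minutes(value):
                return False
        elif booked.get(slot) != value:
            return False
    return True


goal = [("price", "75.10 pounds"), ("arriveBy", "15:45"), ("day", "friday"),
        ("departure", "cambridge"), ("destination", "birmingham new street")]
booked = {"arriveBy": "07:44", "day": "friday", "departure": "cambridge",
          "destination": "birmingham new street", "duration": "163 minutes",
          "leaveAt": "05:01", "price": "75.10 pounds", "trainID": "TR9678",
          "Ref": "00002092"}

print(train_matches(goal, booked))  # True
```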
Excuse me, I would like to know why there is such a large gap between the results of the DAMD model in ConvLab-2 and those in the original paper.
The success rate is 60.4% in the original paper and 33.6% in ConvLab-2.
When I try to run the test files in the tests directory, downloading the pre-trained models always fails, although I have tried many times. Is there another way to download them?
I want to confirm whether this script is the final evaluation script of the cross-lingual DST task. In this script, only slots in semi are of interest, and slots in book are not.
ConvLab-2/convlab2/dst/evaluate.py
Line 61 in 50424c3
Describe the feature
For the following goal:
{
    "attraction": {
        "info": {
            "area": "west"
        },
        "reqt": {
            "entrance fee": "?",
            "phone": "?"
        }
    },
    "hotel": {
        "info": {
            "internet": "yes",
            "pricerange": "cheap",
            "type": "guesthouse"
        },
        "reqt": {
            "phone": "?"
        }
    },
    "taxi": {
        "info": {
            "arriveBy": "18:30"
        },
        "reqt": {
            "car type": "?",
            "phone": "?"
        }
    }
}
The user simulator said:
I ca n't wait to get there and see some of the local attractions . I am looking for a place to stay . Can you help me ? I also need a place to go in the west .
It is not clear from this utterance that west is the value of the area slot.
Describe the feature
SimpleTOD performs strongly on the MultiWOZ dataset. It can be used for DST, policy, and end-to-end tasks.
Expected behavior
SimpleTOD can be used as a DST, policy, and end-to-end model.
Additional context
Performance: https://github.com/budzianowski/multiwoz
I don't understand why the mismatch is counted using the constraints of self.goal in lines 376-384.
self.goal isn't modified anywhere else, so it is still the same as the initial goal.
Describe the bug
This platform could not train an MLE model.
When I load the MLE model for GDPL, PPO, or PG, training runs without problems, but it never reaches the optimal score (I run evaluate.py to check the model). In fact, the score goes down after a few epochs. Here is a graph I made for GDPL; PPO looks very similar.
To Reproduce
Steps to reproduce the behavior:
Simply run train.py in PG/GDPL/PPO and the issue appears. I wrote a script that evaluates all of the models in one directory, and here is the graph I made for GDPL.
Expected behavior
The evaluation score should go up when training starts from the loaded MLE model.
Score for PPO:
[0.52, 0.53, 0.54, 0.49, 0.44, 0.49, 0.46, 0.47, 0.44, 0.43, 0.42, 0.44, 0.47, 0.48, 0.49, 0.46, 0.45, 0.46, 0.45, 0.48, 0.46, 0.48, 0.49, 0.49, 0.49, 0.48, 0.45, 0.47, 0.43, 0.43, 0.43, 0.42, 0.42, 0.41, 0.42, 0.43, 0.44, 0.47, 0.45, 0.43]
So the PPO score peaks at about 0.54 and never approaches 0.74.
Describe the bug
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [9,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
This happens when running train.py in convlab2/policy/pg, after about 15 epochs, even though I have already loaded the MLE model.
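The failed assertion means the multinomial sampler received a probability that is negative or NaN. One common cause, not confirmed for this report, is that the policy output becomes NaN after a diverging update. A framework-free sketch of a sanity check that could be run on the action probabilities before sampling follows; `check_probs` is a hypothetical helper, not part of ConvLab-2 or PyTorch.

```python
import math

# Sketch only: `check_probs` is a hypothetical helper. It rejects the
# inputs that would trip the sampler's `val >= zero` assertion and
# reports where the bad value sits.

def check_probs(probs):
    """Return True if probs can be sampled from, else raise ValueError."""
    for i, p in enumerate(probs):
        if math.isnan(p) or math.isinf(p):
            raise ValueError(f"non-finite probability at index {i}: {p}")
        if p < 0:
            raise ValueError(f"negative probability at index {i}: {p}")
    if sum(probs) <= 0:
        raise ValueError("probabilities sum to zero")
    return True


print(check_probs([0.2, 0.8]))

try:
    check_probs([0.5, float("nan")])
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

Running such a check on CPU right before the sampling call turns the opaque device-side assert into an ordinary Python exception that points at the offending value.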
Describe the bug
The last update introduced a dependency on the fuzzywuzzy package in the file convlab2/util/multiwoz/dbquery.py.
It needs to be added to setup so that the package is installed when running the pip install -e command.
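A minimal sketch of the change, assuming the project declares its dependencies through a list passed to setuptools' setup() as install_requires. The variable name is illustrative and the existing entries are elided.

```python
# Fragment in the style of a setup.py; the variable name is hypothetical
# and the project's existing requirements are not reproduced here.
INSTALL_REQUIRES = [
    # ... existing requirements stay unchanged ...
    "fuzzywuzzy",  # imported by convlab2/util/multiwoz/dbquery.py
]

# In setup.py this list would be passed as
# setup(..., install_requires=INSTALL_REQUIRES).
print(INSTALL_REQUIRES)
```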
Describe the feature
I think it would be a good idea to have more implementations of policies, networks and memory (all of these classes are implemented in convlab2/policy/rlmodule.py), such as ReplayMemory, PrioritizedReplay, CNN, etc.
Most of these classes are implemented in Convlab.
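To make the request concrete, here is a minimal sketch of what a uniform replay memory could look like. It is a generic textbook design, not the Convlab implementation and not an existing class in rlmodule.py; the `Transition` fields are assumptions.

```python
import random
from collections import deque, namedtuple

# Generic sketch of a uniform replay buffer; field names are assumptions.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size buffer that drops the oldest transition when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Copy to a list so sampling does not depend on deque indexing.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(step, 0, 1.0, step + 1, False)

batch = memory.sample(2)
print(len(memory), len(batch))
```

A prioritized variant would replace the uniform `random.sample` call with sampling weighted by each transition's priority.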
Describe the bug
sess.user_agent.get_in_da in BERTNLU always outputs Parking-none and Internet-none, which lowers our recall and causes the task to fail.
Expected behavior
parking and internet should be yes or no.
I used set_seed(20200202) in test_end2end.py, but I got a different domain frequency on each run. Why?
Shouldn't a fixed random seed also fix the user utterances?
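As background for this question: a seed only fixes the generator it is applied to. Python's random module, NumPy, and PyTorch each keep their own state, and GPU kernels or multiple worker processes can add further nondeterminism. Whether set_seed covers all of these is not stated here. The sketch below only shows the baseline behaviour for Python's own generator; `DOMAINS` and `sample_domains` are hypothetical.

```python
import random

# Hypothetical domain list and sampler, for illustration only.
DOMAINS = ["attraction", "hotel", "restaurant", "taxi", "train"]


def sample_domains(seed, n=10):
    """Seed Python's generator, then draw n domains."""
    random.seed(seed)
    return [random.choice(DOMAINS) for _ in range(n)]


first = sample_domains(20200202)
second = sample_domains(20200202)
print(first == second)  # True: same seed, same generator, same draws
```

If two runs still differ after seeding, a generator that was not seeded or an unordered data structure (for example iteration over a set) is a likely place to look.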
Describe the feature
I notice that the vhus in convlab2 only implements HUS and VHUS, without goal regularization. In the original paper, this part plays an important role, so I wonder whether you will implement it.
In fact, I tried to implement this part myself, but I am puzzled by the BOW function. Do you have any idea how to calculate it, especially BOW(Ut)?
Thanks for this great tool. I have a few questions about the scoring metric.
The benchmark table uses "Complete rate", "Success rate", "Book rate", and "Inform P/R/F" as metrics, while many recent papers use "Inform (%)", "Success (%)", "BLEU", and "Combined score" (a weighted combination of the previous three metrics). Can I assume "Success rate" = "Success (%)" and "Inform F1" = "Inform (%)"? It would also be nice if you could add "BLEU" and "Combined score", to align with others' work.
In the benchmark table, DAMD achieves a 33.6 success rate and 57.4 inform F1, but in the original paper (https://arxiv.org/pdf/1911.10484.pdf), DAMD achieves 60.4 success (%) and 76.3 inform (%). Why is there such a large difference?
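For reference, the combined score commonly reported in MultiWOZ papers is defined as 0.5 × (Inform + Success) + BLEU. A small sketch follows; the input values are illustrative only and are not results from ConvLab-2 or any paper.

```python
def combined_score(inform, success, bleu):
    """Combined score as commonly used for MultiWOZ: 0.5 * (Inform + Success) + BLEU."""
    return 0.5 * (inform + success) + bleu


# Illustrative numbers only, not measured results.
print(combined_score(inform=80.0, success=60.0, bleu=20.0))  # 90.0
```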
Describe the feature
Are training curves saved anywhere? How can I see them after training a model?
I'm currently wrapping the policy to retrieve them, but perhaps they are being saved somewhere I'm not aware of.
When I tried to run the evaluate command of “Translation-train SUMBT for cross-lingual DST”, I found some code problems.
DOWNLOAD_DIRECTORY = os.path.join(SUMBT_PATH, 'pre-trianed')
value = value_trans.get(value, value)
value_trans.get(value, value) should be trans_value(value)
DOWNLOAD_DIRECTORY = os.path.join(SUMBT_PATH, 'pre-trianed')
Hi! When I tried to evaluate the translation-train SUMBT model, I found that eval mode was not set, which had a noticeable impact on the results. In my local test, there is a difference of about two points on the MultiWOZ-zh human-val dataset. I think it may be necessary to re-evaluate the SUMBT model after modifying the code. The current results do not reflect the real model performance.
My Local Result on MultiWOZ-zh
not set eval mode
{'Joint Acc': 0.4821722435545804, 'Turn Acc': 0.9738983360760534, 'Joint F1': 0.8826705748001639}
set eval mode
{'Joint Acc': 0.49972572682391664, 'Turn Acc': 0.9751935149631128, 'Joint F1': 0.8885012208542876}
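Why eval mode matters can be shown with a toy dropout layer. In PyTorch, calling model.eval() switches layers such as dropout to their inference behaviour; the class below is a framework-free imitation written for illustration, not PyTorch or SUMBT code.

```python
import random

# Toy imitation of a dropout layer; not PyTorch code.
class ToyDropout:
    def __init__(self, p=0.5):
        self.p = p
        self.training = True  # layers start in training mode

    def eval(self):
        self.training = False
        return self

    def __call__(self, values):
        if not self.training:
            return list(values)  # inference: inputs pass through unchanged
        scale = 1.0 / (1.0 - self.p)
        # training: drop each value with probability p, rescale the rest
        return [0.0 if random.random() < self.p else v * scale for v in values]


layer = ToyDropout(p=0.5)
inputs = [1.0] * 8
train_out = layer(inputs)       # random zeros and rescaled values
eval_out = layer.eval()(inputs)  # identical to the inputs
print(train_out, eval_out)
```

Evaluating with the layer still in training mode injects this randomness into every prediction, which is consistent with the lower scores reported for the run without eval mode.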
Describe the bug
Not quite a bug, but I've noticed that some policies (like PPO and PG) have an init_session method that is not implemented, yet it is called by the agent (dialog_agent/agent.py) when the pipeline is built. Is this method unused, or is there a plan to add something here?
When I use the interactive tool, I can't open it in the browser after running run.py. What went wrong, and what should I modify? Thanks!
Describe the bug
When I train the GDPL model, with the MLE pretrained model loaded, the evaluation result always stays around 0.26. The training log is below; could you help me out? GDPL is reported to perform well, and I plan to use it as my baseline model.
To Reproduce
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
DEBUG:root:<> epoch 0, loss_real:-0.5383382267836068, loss_gen:-1.5583195904683735
INFO:root:<> epoch 0: saved network to mdl
DEBUG:root:<> weight -3.7587242126464844
DEBUG:root:<> log pi -11.807324409484863
/home/raliegh/视频/convlab2_github_code_theirs/ConvLab-2/convlab2/policy/gdpl/gdpl.py:183: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
torch.nn.utils.clip_grad_norm(self.policy.parameters(), 10)
DEBUG:root:<> epoch 0, iteration 0, value, loss 3489.1388260690787
DEBUG:root:<> epoch 0, iteration 0, policy, loss -0.0036238288800967368
DEBUG:root:<> epoch 0, iteration 1, value, loss 3480.9435135690787
DEBUG:root:<> epoch 0, iteration 1, policy, loss -0.09092773252019756
DEBUG:root:<> epoch 0, iteration 2, value, loss 3498.0641061883225
DEBUG:root:<> epoch 0, iteration 2, policy, loss -0.11517706787899921
DEBUG:root:<> epoch 0, iteration 3, value, loss 3488.2195530941613
DEBUG:root:<> epoch 0, iteration 3, policy, loss -0.12360558266702451
DEBUG:root:<> epoch 0, iteration 4, value, loss 3476.682437294408
DEBUG:root:<> epoch 0, iteration 4, policy, loss -0.12722392360630788
INFO:root:<> epoch 0: saved network to mdl
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 1, loss_real:-2.1718062476107947, loss_gen:-6.248041303534257
INFO:root:<> epoch 1: saved network to mdl
DEBUG:root:<> weight -9.06725788116455
DEBUG:root:<> log pi -11.601991653442383
DEBUG:root:<> epoch 1, iteration 0, value, loss 1590.3297087016858
DEBUG:root:<> epoch 1, iteration 0, policy, loss -0.0042587477517755405
DEBUG:root:<> epoch 1, iteration 1, value, loss 1590.0544883326481
DEBUG:root:<> epoch 1, iteration 1, policy, loss -0.07637144262461286
DEBUG:root:<> epoch 1, iteration 2, value, loss 1589.7801545795642
DEBUG:root:<> epoch 1, iteration 2, policy, loss -0.09997303185886458
DEBUG:root:<> epoch 1, iteration 3, value, loss 1589.4738512541119
DEBUG:root:<> epoch 1, iteration 3, policy, loss -0.11133970398651927
DEBUG:root:<> epoch 1, iteration 4, value, loss 1589.1489193564967
DEBUG:root:<> epoch 1, iteration 4, policy, loss -0.11775584558123037
INFO:root:<> epoch 1: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hospital
WARNING:root:illegal booking slot: time, slot: attraction domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 2, loss_real:-3.781325187948015, loss_gen:-10.217867334683737
INFO:root:<> epoch 2: saved network to mdl
DEBUG:root:<> weight -12.925418853759766
DEBUG:root:<> log pi -12.265064239501953
DEBUG:root:<> epoch 2, iteration 0, value, loss 4830.441213507402
DEBUG:root:<> epoch 2, iteration 0, policy, loss -0.020781385271172775
DEBUG:root:<> epoch 2, iteration 1, value, loss 4839.154656661184
DEBUG:root:<> epoch 2, iteration 1, policy, loss -0.08836260036026176
DEBUG:root:<> epoch 2, iteration 2, value, loss 4831.741853412829
DEBUG:root:<> epoch 2, iteration 2, policy, loss -0.10602868407180435
DEBUG:root:<> epoch 2, iteration 3, value, loss 4824.3883634868425
DEBUG:root:<> epoch 2, iteration 3, policy, loss -0.12300284697036994
DEBUG:root:<> epoch 2, iteration 4, value, loss 4831.304481907895
DEBUG:root:<> epoch 2, iteration 4, policy, loss -0.12597578234578433
INFO:root:<> epoch 2: saved network to mdl
WARNING:root:illegal booking slot: time, domain: attraction
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 3, loss_real:-5.254823472764757, loss_gen:-13.987894455591837
INFO:root:<> epoch 3: saved network to mdl
DEBUG:root:<> weight -16.43012809753418
DEBUG:root:<> log pi -11.844439506530762
DEBUG:root:<> epoch 3, iteration 0, value, loss 6681.600123355263
DEBUG:root:<> epoch 3, iteration 0, policy, loss -0.014684114396866215
DEBUG:root:<> epoch 3, iteration 1, value, loss 6697.302657277961
DEBUG:root:<> epoch 3, iteration 1, policy, loss -0.08244152585546927
DEBUG:root:<> epoch 3, iteration 2, value, loss 6687.997532894737
DEBUG:root:<> epoch 3, iteration 2, policy, loss -0.10515823467683635
DEBUG:root:<> epoch 3, iteration 3, value, loss 6690.9089997944075
DEBUG:root:<> epoch 3, iteration 3, policy, loss -0.11676324161357786
DEBUG:root:<> epoch 3, iteration 4, value, loss 6678.3968313116775
DEBUG:root:<> epoch 3, iteration 4, policy, loss -0.12235850389697589
INFO:root:<> epoch 3: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: attraction
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 4, loss_real:-6.739606408511891, loss_gen:-18.933229109820196
INFO:root:<> epoch 4: saved network to mdl
DEBUG:root:<> weight -21.545021057128906
DEBUG:root:<> log pi -12.236998558044434
DEBUG:root:<> epoch 4, iteration 0, value, loss 16275.491156684027
DEBUG:root:<> epoch 4, iteration 0, policy, loss -0.014838041116793951
DEBUG:root:<> epoch 4, iteration 1, value, loss 16267.9013671875
DEBUG:root:<> epoch 4, iteration 1, policy, loss -0.09151227782583898
DEBUG:root:<> epoch 4, iteration 2, value, loss 16256.190104166666
DEBUG:root:<> epoch 4, iteration 2, policy, loss -0.11655553637279405
DEBUG:root:<> epoch 4, iteration 3, value, loss 16265.713351779514
DEBUG:root:<> epoch 4, iteration 3, policy, loss -0.12722003553062677
DEBUG:root:<> epoch 4, iteration 4, value, loss 16243.192165798611
DEBUG:root:<> epoch 4, iteration 4, policy, loss -0.13666448928415775
INFO:root:<> epoch 4: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: taxi
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 5, loss_real:-7.912413765402401, loss_gen:-22.03384522830739
INFO:root:<> epoch 5: saved network to mdl
DEBUG:root:<> weight -24.468324661254883
DEBUG:root:<> log pi -12.261258125305176
DEBUG:root:<> epoch 5, iteration 0, value, loss 27010.648274739582
DEBUG:root:<> epoch 5, iteration 0, policy, loss -0.013149608030087419
DEBUG:root:<> epoch 5, iteration 1, value, loss 27043.53125
DEBUG:root:<> epoch 5, iteration 1, policy, loss -0.0839987989101145
DEBUG:root:<> epoch 5, iteration 2, value, loss 27066.318250868055
DEBUG:root:<> epoch 5, iteration 2, policy, loss -0.10623834199375576
DEBUG:root:<> epoch 5, iteration 3, value, loss 27043.93825954861
DEBUG:root:<> epoch 5, iteration 3, policy, loss -0.11813025466269916
DEBUG:root:<> epoch 5, iteration 4, value, loss 26953.104600694445
DEBUG:root:<> epoch 5, iteration 4, policy, loss -0.1252221003588703
INFO:root:<> epoch 5: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: hospital
DEBUG:root:<> epoch 6, loss_real:-9.242614388465881, loss_gen:-24.15580458111233
INFO:root:<> epoch 6: saved network to mdl
DEBUG:root:<> weight -26.42582893371582
DEBUG:root:<> log pi -11.808538436889648
DEBUG:root:<> epoch 6, iteration 0, value, loss 35887.18179481908
DEBUG:root:<> epoch 6, iteration 0, policy, loss -0.020953503682425146
DEBUG:root:<> epoch 6, iteration 1, value, loss 35494.21656558388
DEBUG:root:<> epoch 6, iteration 1, policy, loss -0.08569272891863396
DEBUG:root:<> epoch 6, iteration 2, value, loss 35628.84801603619
DEBUG:root:<> epoch 6, iteration 2, policy, loss -0.10266891509098441
DEBUG:root:<> epoch 6, iteration 3, value, loss 35657.03916529605
DEBUG:root:<> epoch 6, iteration 3, policy, loss -0.11386555943049882
DEBUG:root:<> epoch 6, iteration 4, value, loss 35917.57833059211
DEBUG:root:<> epoch 6, iteration 4, policy, loss -0.11797217848269563
INFO:root:<> epoch 6: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 7, loss_real:-11.321088128619724, loss_gen:-29.293851852416992
INFO:root:<> epoch 7: saved network to mdl
DEBUG:root:<> weight -32.10945129394531
DEBUG:root:<> log pi -11.713705062866211
DEBUG:root:<> epoch 7, iteration 0, value, loss 44522.42914496528
DEBUG:root:<> epoch 7, iteration 0, policy, loss -0.015966814425256517
DEBUG:root:<> epoch 7, iteration 1, value, loss 44453.58452690972
DEBUG:root:<> epoch 7, iteration 1, policy, loss -0.07723193801939487
DEBUG:root:<> epoch 7, iteration 2, value, loss 44377.24782986111
DEBUG:root:<> epoch 7, iteration 2, policy, loss -0.09828437285290824
DEBUG:root:<> epoch 7, iteration 3, value, loss 44297.86208767361
DEBUG:root:<> epoch 7, iteration 3, policy, loss -0.11189984074897236
DEBUG:root:<> epoch 7, iteration 4, value, loss 44211.8828125
DEBUG:root:<> epoch 7, iteration 4, policy, loss -0.12044301960203382
INFO:root:<> epoch 7: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 8, loss_real:-14.25956932703654, loss_gen:-33.106894387139214
INFO:root:<> epoch 8: saved network to mdl
DEBUG:root:<> weight -35.563194274902344
DEBUG:root:<> log pi -11.887650489807129
DEBUG:root:<> epoch 8, iteration 0, value, loss 61228.02682976974
DEBUG:root:<> epoch 8, iteration 0, policy, loss -0.019527194384289414
DEBUG:root:<> epoch 8, iteration 1, value, loss 60913.86245888158
DEBUG:root:<> epoch 8, iteration 1, policy, loss -0.08493027012599141
DEBUG:root:<> epoch 8, iteration 2, value, loss 60804.58943256579
DEBUG:root:<> epoch 8, iteration 2, policy, loss -0.10401363087523925
DEBUG:root:<> epoch 8, iteration 3, value, loss 60740.71361019737
DEBUG:root:<> epoch 8, iteration 3, policy, loss -0.11570279148872942
DEBUG:root:<> epoch 8, iteration 4, value, loss 60633.64113898026
DEBUG:root:<> epoch 8, iteration 4, policy, loss -0.12276971943088268
INFO:root:<> epoch 8: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 9, loss_real:-16.396672407786053, loss_gen:-39.38313462999132
INFO:root:<> epoch 9: saved network to mdl
DEBUG:root:<> weight -42.118408203125
DEBUG:root:<> log pi -11.91506290435791
DEBUG:root:<> epoch 9, iteration 0, value, loss 102404.39268092105
DEBUG:root:<> epoch 9, iteration 0, policy, loss -0.023536940546412217
DEBUG:root:<> epoch 9, iteration 1, value, loss 102286.93421052632
DEBUG:root:<> epoch 9, iteration 1, policy, loss -0.0810224729541101
DEBUG:root:<> epoch 9, iteration 2, value, loss 101849.27960526316
DEBUG:root:<> epoch 9, iteration 2, policy, loss -0.10366031547126017
DEBUG:root:<> epoch 9, iteration 3, value, loss 101598.78638980263
DEBUG:root:<> epoch 9, iteration 3, policy, loss -0.11581830601943166
DEBUG:root:<> epoch 9, iteration 4, value, loss 101350.11461759868
DEBUG:root:<> epoch 9, iteration 4, policy, loss -0.1236358410433719
INFO:root:<> epoch 9: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 10, loss_real:-17.94006437725491, loss_gen:-41.33010853661431
INFO:root:<> epoch 10: saved network to mdl
DEBUG:root:<> weight -43.82462692260742
DEBUG:root:<> log pi -12.179319381713867
DEBUG:root:<> epoch 10, iteration 0, value, loss 111196.29091282895
DEBUG:root:<> epoch 10, iteration 0, policy, loss -0.015502721169277242
DEBUG:root:<> epoch 10, iteration 1, value, loss 108579.41981907895
DEBUG:root:<> epoch 10, iteration 1, policy, loss -0.08138108037804302
DEBUG:root:<> epoch 10, iteration 2, value, loss 108351.37541118421
DEBUG:root:<> epoch 10, iteration 2, policy, loss -0.10115281825787142
DEBUG:root:<> epoch 10, iteration 3, value, loss 109070.85341282895
DEBUG:root:<> epoch 10, iteration 3, policy, loss -0.10706739313900471
DEBUG:root:<> epoch 10, iteration 4, value, loss 108081.73663651316
DEBUG:root:<> epoch 10, iteration 4, policy, loss -0.11929772833460256
INFO:root:<> epoch 10: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 11, loss_real:-22.859329329596626, loss_gen:-50.24238416883681
INFO:root:<> epoch 11: saved network to mdl
DEBUG:root:<> weight -53.37864685058594
DEBUG:root:<> log pi -12.136919975280762
DEBUG:root:<> epoch 11, iteration 0, value, loss 201200.13569078947
DEBUG:root:<> epoch 11, iteration 0, policy, loss -0.023343098202818317
DEBUG:root:<> epoch 11, iteration 1, value, loss 195454.23190789475
DEBUG:root:<> epoch 11, iteration 1, policy, loss -0.09736867954856471
DEBUG:root:<> epoch 11, iteration 2, value, loss 199148.953125
DEBUG:root:<> epoch 11, iteration 2, policy, loss -0.10236057227379397
DEBUG:root:<> epoch 11, iteration 3, value, loss 203306.05283717104
DEBUG:root:<> epoch 11, iteration 3, policy, loss -0.10679333225676887
DEBUG:root:<> epoch 11, iteration 4, value, loss 197667.32565789475
DEBUG:root:<> epoch 11, iteration 4, policy, loss -0.12387701702353202
INFO:root:<> epoch 11: saved network to mdl
Thank you guys, have a good day! Appreciate your help.
Describe the bug
When I follow the Installation step and run tutorials/Getting_Started.ipynb, it runs into trouble.
To Reproduce
Steps to reproduce the behavior:
conda create -n convlab-2
pip install -e .
jupyter notebook
tutorials/Getting_Started.ipynb
Expected behavior
The notebook runs without errors.
Additional context
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-3-765703729d4c> in <module>
1 # common import: convlab2.$module.$model.$dataset
----> 2 from convlab2.nlu.jointBERT.multiwoz import BERTNLU
3 from convlab2.nlu.milu.multiwoz import MILU
4 from convlab2.dst.rule.multiwoz import RuleDST
5 from convlab2.policy.rule.multiwoz import RulePolicy
~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/multiwoz/__init__.py in <module>
----> 1 from convlab2.nlu.jointBERT.multiwoz.nlu import BERTNLU
~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/multiwoz/nlu.py in <module>
7 from convlab2.util.file_util import cached_path
8 from convlab2.nlu.nlu import NLU
----> 9 from convlab2.nlu.jointBERT.dataloader import Dataloader
10 from convlab2.nlu.jointBERT.jointBERT import JointBERT
11 from convlab2.nlu.jointBERT.multiwoz.postprocess import recover_intent
~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/dataloader.py in <module>
2 import torch
3 import random
----> 4 from transformers import BertTokenizer
5 import math
6 from collections import Counter
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/__init__.py in <module>
46 from .configuration_xlm_roberta import XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMRobertaConfig
47 from .configuration_xlnet import XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP, XLNetConfig
---> 48 from .data import (
49 DataProcessor,
50 InputExample,
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/__init__.py in <module>
4
5 from .metrics import is_sklearn_available
----> 6 from .processors import (
7 DataProcessor,
8 InputExample,
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/processors/__init__.py in <module>
3 # module, but to preserve other warnings. So, don't check this module at all.
4
----> 5 from .glue import glue_convert_examples_to_features, glue_output_modes, glue_processors, glue_tasks_num_labels
6 from .squad import SquadExample, SquadFeatures, SquadV1Processor, SquadV2Processor, squad_convert_examples_to_features
7 from .utils import DataProcessor, InputExample, InputFeatures, SingleSentenceClassificationProcessor
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/processors/glue.py in <module>
23
24 from ...file_utils import is_tf_available
---> 25 from ...tokenization_utils import PreTrainedTokenizer
26 from .utils import DataProcessor, InputExample, InputFeatures
27
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/tokenization_utils.py in <module>
24
25 from .file_utils import add_end_docstrings
---> 26 from .tokenization_utils_base import (
27 ENCODE_KWARGS_DOCSTRING,
28 ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING,
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in <module>
29
30 import numpy as np
---> 31 from tokenizers import AddedToken
32 from tokenizers import Encoding as EncodingFast
33
~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/__init__.py in <module>
15 EncodeInput = Union[TextEncodeInput, PreTokenizedEncodeInput]
16
---> 17 from .tokenizers import Tokenizer, Encoding, AddedToken
18 from .tokenizers import decoders
19 from .tokenizers import models
ImportError: dlopen(/Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so, 2): Symbol not found: ____chkstk_darwin
Referenced from: /Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so (which was built for Mac OS X 10.15)
Expected in: /usr/lib/libSystem.B.dylib
in /Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so
Describe the feature
Improve user simulator, mostly agenda policy
Expected behavior
dontcare response for those system requests that are not in the user goal.
name first, then other slots.
Additional context
Have a look at a few simulated dialogues.
Describe the feature
I have created a small dataset for my own domain that contains only a few lines of conversation. How can I build a simple bot on my own few-shot dataset, as in Rasa?
Expected behavior
I hope this will be supported.
Additional context
Also, how do I develop a new dataset for ConvLab-2? Is there documentation for that?
Hi there,
Thanks for the excellent paper of Takanobu et al. (2020). I thoroughly appreciate the systematic comparison across so many points, and especially the call out of mismatch between single turn performance and overall performance.
One question I have is about the source code. Right now, the top line README for this repo shows a very similar table to Table 1 of the paper, but with very different results (and indeed, quite a few different systems, with some disagreeing). Furthermore, you have similar tables for 2, 3, etc in the README, but with different metrics.
Best wishes from the ParlAI team & collaborators
Thanks,
Stephen
I find that the results of Sequicity in convlab2 (e2e/sequicity) are quite bad. When I print the output, it looks like this:
user: May I have the address ?
sys: the address is .
When I train Sequicity in convlab2 myself, the test results are at least not as bad as those from running the script. So I wonder whether there is a bug in Sequicity inference.
Some slots such as restaurant_name may occur in system utterances, e.g. in dialogue MUL2499 the bot says, "All saints church looks good , would you like to head there ?". Logically there is no need to recognize slot values in system utterances, but baseline models like SUMBT and TRADE did it. I want to confirm whether you will supply the annotations for system utterances in the final test set.
This may be a silly issue, but when trying to run the tutorial notebook on Colab, convlab2 doesn't get imported after cloning the repo and installing. This is a sample notebook with the two lines involved:
https://colab.research.google.com/drive/1B6p8_GXoPUau9AaZUnKmCm3hQX8BcsIv?usp=sharing
Thanks!
Nick
When training policies that use the UserPolicyAgendaMultiWoz, such as PG, PPO and GDPL, it throws an Exception.
This Exception is related to the file convlab2/task/multiwoz/goal_generator.py, line 181, where self.corpus_path seems to be None instead of a path.
Indeed, in the UserPolicyAgendaMultiWoz class, we initialize the GoalGenerator with default values (corpus_path = None included), and we do not set this anywhere.
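The failure mode described above can be illustrated with a minimal sketch. It assumes the generator first looks for a cached goal model and only reads the corpus when that cache is missing. `GoalGeneratorSketch`, its arguments and the toy "goal model" are hypothetical stand-ins, not the real convlab2 class; the point is only that a `None` default is better caught with an explicit error than passed on to `open()`.

```python
import json
import os
import tempfile


class GoalGeneratorSketch:
    """Hypothetical stand-in, not the real convlab2 GoalGenerator."""

    def __init__(self, goal_model_path, corpus_path=None):
        self.goal_model_path = goal_model_path
        self.corpus_path = corpus_path
        if os.path.exists(self.goal_model_path):
            # A cached goal model exists, so the corpus is never needed.
            with open(self.goal_model_path) as f:
                self.goal_model = json.load(f)
        else:
            self.goal_model = self._build_goal_model()

    def _build_goal_model(self):
        # Fail early with a clear message instead of passing None to open().
        if self.corpus_path is None:
            raise ValueError(
                "no cached goal model at %r and corpus_path is None"
                % self.goal_model_path
            )
        with open(self.corpus_path) as f:
            corpus = json.load(f)
        # Toy "model": just count the dialogues in the corpus.
        return {"num_dialogs": len(corpus)}


def demo():
    """Show both outcomes: missing corpus path, then an explicit one."""
    with tempfile.TemporaryDirectory() as tmp:
        missing_model = os.path.join(tmp, "goal_model.json")
        corpus = os.path.join(tmp, "corpus.json")
        with open(corpus, "w") as f:
            json.dump({"d1": {}, "d2": {}}, f)
        try:
            GoalGeneratorSketch(missing_model)
            error = None
        except ValueError as exc:
            error = str(exc)
        built = GoalGeneratorSketch(missing_model, corpus_path=corpus).goal_model
        return error, built


error, built = demo()
print(error)
print(built)
```

Under this assumption, the exception would only surface when the cached goal model is absent, which could explain why some setups train without problems.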
To Reproduce
Steps to reproduce an example of the behavior (PPO policy):
convlab2/policy/ppo
python train.py
Expected behavior
It should train the policy with the respective algorithm (PPO in this case)
Additional context
It also happens when evaluating a policy using the convlab2/policy/evaluate.py script.
Describe the feature
SOLOIST
Describe the feature
I've noticed that a few issues (#8, #13, #15, #20, #40) mention that it's hard to train RL policies (PG, PPO, GDPL). Thanks to all of you, we have fixed some bugs. To help discussion and debugging, I suggest we report all such bugs under this issue.
Since we have improved our user agenda policy (#31), the performance of RL policies in the README is out-of-date. However, as mentioned in this comment, you can still reproduce the result before the change of user policy.
Currently, we are working on training RL policies with the newest agenda policy. We would greatly appreciate it if you could help!
Since the training of RL policy is unstable and sensitive to hyperparameters, here are some suggestions (and we welcome more):
In the example https://github.com/thu-coai/ConvLab-2/blob/master/convlab2/util/analysis_tool/example.py the analyzer instantiation analyzer = Analyzer(user_agent=user_agent, use_nlu=True, dataset='multiwoz') has a wrong signature, as use_nlu is not implemented. Removing that argument works fine; it seems it was not removed when the analyzer class was changed.
In the restaurant database, 3 restaurants do not have a phone entry.
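A quick way to locate such entries is sketched below, assuming the database is a JSON list of dictionaries with a `name` key. The sample records and phone number are invented for illustration; in practice the list would come from `json.load` on the database file.

```python
def entries_missing_field(entries, field):
    """Return the names of entries where `field` is absent or empty."""
    return [e.get("name", "<unnamed>") for e in entries if not e.get(field)]


# Invented sample records, not taken from the real database.
sample_db = [
    {"name": "restaurant a", "phone": "01223000000"},
    {"name": "restaurant b"},
    {"name": "restaurant c", "phone": ""},
]

missing = entries_missing_field(sample_db, "phone")
print(missing)
```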
Describe the bug
The user simulator with the following configuration generates an empty user utterance for some turns.
Here is a sample conversation of user/system with user utterance empty:
user: I need a restaurant . It just needs to be expensive . I am also in the market for a new restaurant . Is there something in the centre of town ? Do you have portuguese food ?
sys: I have n't found any in the centre. I am unable to find any portuguese restaurants in town .
user: It just needs to be cheap .
sys: It is in the centre area . They serve portuguese . Would you like to try nandos city centre ? They are in the cheap price range . I will book it for you and get a reference number ?
user:
sys:
user: Can I get the postcode for the restaurant ?
sys: The postcode is cb23ar . Is there anything else I can help you with today ?
user: Ok , have a good day . Goodbye .
sys: Thank you and goodbye . You are welcome . Have a good day !.
To Reproduce
Steps to reproduce the behavior:
20200720
and look at the logs from one of the tests. The user utterance is empty for some of the turns.
Expected behavior
The user utterance shouldn't be empty for any turn.
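To spot such turns without reading every log by hand, one could scan the transcript for speaker lines with no text. This is a minimal sketch that assumes the log uses the `user:` / `sys:` line format shown in the sample conversation; `find_empty_turns` is a hypothetical helper, not part of ConvLab-2.

```python
def find_empty_turns(transcript):
    """Return (line_number, speaker) for every turn whose text is empty."""
    empty = []
    for line_no, line in enumerate(transcript.splitlines(), start=1):
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if sep and speaker in ("user", "sys") and not text.strip():
            empty.append((line_no, speaker))
    return empty


sample = """user: It just needs to be cheap .
sys: Would you like to try nandos city centre ?
user:
sys:
user: Can I get the postcode for the restaurant ?"""

empty_turns = find_empty_turns(sample)
print(empty_turns)
```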
Describe the feature
I think convlab-2 is a great toolkit for Chinese students and developers to create fantastic chatbots. But after reading the code, I find that many aspects that a good project should have could be improved, e.g. auto-deploy-devops, auto-testing-devops, full-document-in-code, and so on.
Expected behavior
If you agree, I will go ahead and improve the quality of this project so that it becomes a good open-source project. I really like this project and this field.
Additional context
I am a graduate student at Beijing University of Posts and Telecommunications, the author of the open-source project python-wechaty, and the [mentor of summer-of-code](https://github.com/wechaty/summer-of-code/issues/6). My main interest is task-oriented dialogue.
I ran the test code. The result I got is very different from yours.
For example, the result of BERTNLU-RuleDST-RulePolicy-TemplateNLG model is
complete number of dialogs/tot: 0.901
success number of dialogs/tot: 0.571
average precision: 0.7524100652627299
average recall: 0.905928745583918
average f1: 0.7967497596208643
average book rate: 0.905176116838488
average turn (succ): 10.325744308231174
average turn (all): 12.374
percentage of domains that satisfy the database constraints: 0.752
percentage of dialogs that satisfy the database constraints: 0.617
Is that normal?
train.json.zip: https://github.com/thu-coai/ConvLab-2/blob/master/data/multiwoz/train.json.zip
train_corrected.json.zip: https://github.com/thu-coai/ConvLab-2/blob/master/data/multiwoz/train_corrected.json.zip
MultiWOZ2.1_Cleaned.zip: https://github.com/ConvLab/ConvLab-2/blob/master/data/multiwoz/MultiWOZ2.1_Cleaned.zip
I executed the provided train.py script in convlab2/policy/ppo with the prespecified configurations. During training, the success rate starts at around 25% and then fluctuates around 30-35% for a while. When training finished, I used the evaluation.py script in convlab2/policy to evaluate the performance, which gives me 26%, far from the 74% reported in the table.
My Question: What is the exact configuration that has been used for training the 74% model?
Describe the bug
There are some illegal times in the arriveBy slot, such as:
ConvLab-2/data/multiwoz/db/train_db.json
Line 793 in d046161
False:
ConvLab-2/convlab2/evaluator/multiwoz_eval.py
Line 224 in d046161
Expected behavior
The evaluator should regard them as legal times because they are in the DB.
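One way to get the expected behavior is to accept a value either when it passes a strict format check or when it literally appears in the database. The sketch below is an assumption about how such a check could look, not the evaluator's actual code: `TIME_PATTERN` and `is_legal_time` are hypothetical, and the value "24:30" is invented to stand in for an out-of-range time stored in the DB.

```python
import re

# Strict 24-hour HH:MM check (hypothetical, not copied from the evaluator).
TIME_PATTERN = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def is_legal_time(value, db_values=()):
    """Accept well-formed times, plus any value that exists in the DB."""
    return bool(TIME_PATTERN.match(value)) or value in db_values


# Invented example values; "24:30" fails the strict format check.
db_arrive_by = {"05:51", "24:30"}

strict_ok = is_legal_time("05:51")
strict_bad = is_legal_time("24:30")
db_ok = is_legal_time("24:30", db_arrive_by)
print(strict_ok, strict_bad, db_ok)
```

The alternative would be to clean the database itself, which keeps the evaluator strict but changes the data that goals are sampled from.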
Describe the feature
I was able to run python tests/test_BERTNLU-RuleDST-RulePolicy-TemplateNLG.py and got the same results as in README. Thanks a lot!
However, as soon as I tried to run the same test for CrossWOZ, I ran into the issue that there was no evaluator for CrossWOZ dataset. In fact, currently analyzer only works with the default MultiWOZ evaluator.
I tried to use MultiWOZ evaluator but it didn't work. I looked at CrossWOZ repo but it seemed there wasn't an end-to-end evaluator class, either.
Expected behavior
A working end-to-end evaluator for CrossWOZ dataset.
Additional context
n/a