thu-coai / convlab-2

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems

License: Apache License 2.0

Python 80.74% Shell 0.01% Perl 0.24% HTML 18.27% Jsonnet 0.22% CSS 0.06% JavaScript 0.01% Jupyter Notebook 0.42% Dockerfile 0.04%

dialogue task-oriented-dialogue dialogue-systems


convlab-2's Issues

[BUG] About inform actions and entity selection

I have a question regarding inform-actions in ConvLab-2:

Consider for instance the action “Attraction-Inform-Area-3”. If I am correct, this will result in informing the area of the THIRD entity in the database query. Is that correct?

If so, what is the idea behind having “Attraction-Inform-Area-1”, “Attraction-Inform-Area-2”, “Attraction-Inform-Area-3”, … instead of only having one action “Attraction-Inform-Area” and selecting an entity from the DB-query randomly? Doesn’t your method only blow up the action space?

Moreover, look at the function “lexicalize_da” in /convlab2/util/multiwoz/lexicalize.py and consider, for instance, the actions “Attraction-Inform-Area-3” and “Attraction-Inform-Phone-2”.
Wouldn’t that output the phone number of the second entity and the area of the third entity? Shouldn’t the area and phone number be associated with the same entity?
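
The concern above can be illustrated with a toy sketch. This is not ConvLab-2's lexicalize_da: the helper name `lexicalize_toy` and the database rows are hypothetical, and the only assumption carried over from the question is that the trailing index in an action name selects the N-th entity of the DB query result.

```python
# Toy illustration only: NOT ConvLab-2's lexicalize_da.
# The helper name and the database rows below are hypothetical.
def lexicalize_toy(actions, db_results):
    """Fill each 'Domain-Intent-Slot-Index' action from the Index-th DB result."""
    filled = []
    for action in actions:
        domain, intent, slot, index = action.split("-")
        entity = db_results[int(index) - 1]  # indices in action names are 1-based
        filled.append((domain, intent, slot, entity[slot.lower()]))
    return filled


db_results = [
    {"name": "A", "area": "north", "phone": "111"},
    {"name": "B", "area": "south", "phone": "222"},
    {"name": "C", "area": "west", "phone": "333"},
]

filled = lexicalize_toy(
    ["Attraction-Inform-Area-3", "Attraction-Inform-Phone-2"], db_results
)
print(filled)
```

Under that assumption, the area comes from entity C and the phone number from entity B, so the two informed values describe different entities.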

Is there any method to add DQN to convlab2?

Hey guys, I am a big fan of ConvLab-1, which implements actor-critic, PPO, and DQN. Do you plan to add these two other RL models to ConvLab-2? If not, is there a convenient way for me to add these RL methods?

[BUG] Database query expects mutable elements

Describe the bug
util/multiwoz/dbquery.py:42
The query() method expects elements of the constraints argument to be mutable.
However, tuples are used as the elements and given to the method call in the ConvLab code MultiWozEvaluator._final_goal_analyze().

To Reproduce
The error occurs when Analyzer.comprehensive_analyze() is called and the generated goal contains specific values; see the call stack:

File "/lscratch/kulhajon/873944.stallo-adm.uit.no/source/train.py", line 214, in _run_evaluation
    result = analyzer(agent, num_dialogs=self.args.evaluation_dialogs)

  File "/lscratch/kulhajon/873944.stallo-adm.uit.no/source/evaluate.py", line 58, in __call__
    self.comprehensive_analyze(agent, agent.name, total_dialog=num_dialogs)

  File "/home/kulhajon/source/ConvLab-2/convlab2/util/analysis_tool/analyzer.py", line 145, in comprehensive_analyze
    task_success = sess.evaluator.task_success()

  File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 298, in task_success
    goal_sess = self.final_goal_analyze()

  File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 401, in final_goal_analyze
    match, mismatch = self._final_goal_analyze()

  File "/home/kulhajon/source/ConvLab-2/convlab2/evaluator/multiwoz_eval.py", line 382, in _final_goal_analyze
    query_result = self.database.query(domain, constraints)

  File "/home/kulhajon/source/ConvLab-2/convlab2/util/multiwoz/dbquery.py", line 34, in query
    ele[1] = 'centre'

TypeError: 'tuple' object does not support item assignment

Expected behavior
The method does not throw an error.

Additional context
It's reasonable to use immutable elements for other reasons too.
I've created a pull request with a suggested fix.
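
A minimal sketch of one way to avoid the in-place assignment, assuming the constraints are (slot, value) pairs. The helper name `normalize_constraints` is hypothetical, and the condition that triggers the rewrite to 'centre' is an assumption; only the assignment itself appears in the traceback above. This is not necessarily the fix in the pull request.

```python
def normalize_constraints(constraints):
    """Return a new list of (slot, value) pairs without mutating the input."""
    normalized = []
    for slot, value in constraints:
        # Assumed trigger: the traceback only shows that a value is set to 'centre'.
        if slot == "area" and value == "center":
            value = "centre"
        normalized.append((slot, value))
    return normalized


original = (("area", "center"), ("pricerange", "cheap"))
result = normalize_constraints(original)
print(result)
```

Because a new list is built, the caller can pass tuples, lists, or any other iterable of pairs, and its own data is left untouched.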

[Maintenance] Maybe the URL in README is wrong?

"Our documents are on https://thu-coai.github.io/ConvLab-2_docs/convlab2.html#module-convlab2."

Maybe the url is https://thu-coai.github.io/ConvLab-2_docs/convlab2.html ?

Can ConvLab-2 be used for my private dataset?

Hi, I want to know how I can train models in ConvLab-2 on my private dataset to build my own task-oriented dialogue system.

[BUG] Trained PPO from scratch won't work with Analyzer

Describe the bug

Since I recently started learning how to use this library, I'm aiming for a very simple concrete task, which is training a PPO policy from scratch.

Given my limited resources at the moment I'm using Google Colab Pro, which gives me a Tesla P100-PCIE-16GB for cheap.

The approach I'm following is below (it mostly follows the Colab tutorial, but I'll be extra detailed because the only mistake here could be the way I'm using the different modules):

I clone the ConvLab-2 GitHub repo and install it locally (I tried both "just in runtime" and locally, with the notebook connected to a Google Drive folder).
After importing all necessary libraries, I create a simple dialogue system as in /ppo/train.py:

# simple rule DST
dst_sys = RuleDST()

policy_sys = PPO(True)
policy_sys.load(args.load_path)

# not use dst
dst_usr = None
# rule policy
policy_usr = RulePolicy(character='usr')
# assemble
simulator = PipelineAgent(None, None, policy_usr, None, 'user')

evaluator = MultiWozEvaluator()
env = Environment(None, simulator, None, dst_sys, evaluator)

for i in range(args.epoch):
    update(env, policy_sys, args.batchsz, i, args.process_num)

The key here is that I'm training for only 100 epochs. Even though I would not expect this trained policy to be any good, I'd at least expect it to be able to generate dialogues.

Once the policy (policy_sys) is trained, I create a session for testing dialogues. To do so, I create the user and system agents (pipelines), using the trained policy for the system agent.

# --- system ---

# BERT nlu
sys_nlu = BERTNLU()
# simple rule DST
sys_dst = RuleDST()
# TRAINED PPO POLICY ###
sys_policy = policy_sys
# template NLG
sys_nlg = TemplateNLG(is_user=False)
# assemble
sys_agent = PipelineAgent(sys_nlu, sys_dst, sys_policy, sys_nlg, name='sys')

# --- user ---

# MILU
user_nlu = MILU()
# not use dst
user_dst = None
# rule policy
user_policy = RulePolicy(character='usr')
# template NLG
user_nlg = TemplateNLG(is_user=True)
# assemble
user_agent = PipelineAgent(user_nlu, user_dst, user_policy, user_nlg, name='user')

# --- evaluator and session ---

evaluator = MultiWozEvaluator()
sess = BiSession(sys_agent=sys_agent, user_agent=user_agent, kb_query=None, evaluator=evaluator)

As in the tutorial, I use this simple loop to sample dialogues from the session:

sys_response = ''
sess.init_session()
print('init goal:')
pprint(sess.evaluator.goal)
print('-'*50)
for i in range(20):
    sys_response, user_response, session_over, reward = sess.next_turn(sys_response)
    print('user:', user_response)
    print('sys:', sys_response)
    print()
    if session_over is True:
        break
print('task success:', sess.evaluator.task_success())
print('book rate:', sess.evaluator.book_rate())
print('inform precision/recall/f1:', sess.evaluator.inform_F1())
print('-'*50)
print('final goal:')
pprint(sess.evaluator.goal)
print('='*100)

The error I'm getting is "RuntimeError: CUDA error: device-side assert triggered", apparently from the jointBERT library.

Could this be because the utterances generated by the system are longer than the MAX_LEN of the BERT model?

So the main question here (apart from the obvious one, "why is this happening?") would be: is this the right approach for training and testing an RL policy from scratch?

I've seen that some sort of imitation learning helps as a pre-training step to improve PPO's performance. However, my aim is not to fine-tune PPO, but simply to train it for a few epochs (100, 200, etc.) with different hyperparameters and to use the analyzer library to assess the model's behaviour (e.g. average success rate) as the number of epochs or the hyperparameters change.

To Reproduce

All the code is here and it runs out of the box, it should reproduce the "CUDA error: device-side assert triggered" error I'm facing

https://colab.research.google.com/drive/1nz73WBKLohohScsZIFjpDJz0y0CG4SRB?usp=sharing

Expected behavior

I expected that building a system agent again and using the previously trained policy would work out of the box within the analyzer tool.

[Feature] Update MultiWOZ dataset from 2.1 to 2.2

Given the release of MultiWOZ 2.2, it seems like the baselines should all be retrained using the cleanest version of the dataset. Paper: https://www.aclweb.org/anthology/2020.nlp4convai-1.13/

[BUG] `final_goal_analyze` bug

Describe the bug
The following case will be judged as a mismatch in _final_goal_analyze:

case:

Constraints from final goal:
[('price', '75.10 pounds'), ('arriveBy', '15:45'), ('day', 'friday'), ('departure', 'cambridge'), ('destination', 'birmingham new street')]

self.booked[train]:
{'arriveBy': '07:44', 'day': 'friday', 'departure': 'cambridge', 'destination': 'birmingham new street', 'duration': '163 minutes', 'leaveAt': '05:01', 'price': '75.10 pounds', 'trainID': 'TR9678', 'Ref': '00002092'}

code:

ConvLab-2/convlab2/evaluator/multiwoz_eval.py

Line 390 in db9ddd4

if all(booked.get(k, object()) == v for k, v in constraints):

Expected behavior
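In the train domain, the arriveBy and leaveAt values in the goal may not exactly equal those of the booked DB item, yet the booking is still correct. In the case above, a train arriving at 07:44 satisfies an arriveBy constraint of 15:45.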
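
A minimal sketch of the expected behavior, assuming that arriveBy means "arrive no later than" and leaveAt means "leave no earlier than". The helper names `to_minutes` and `satisfies` are hypothetical and are not part of multiwoz_eval.py.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def satisfies(booked, constraints):
    """Check a booked item against goal constraints, treating times as bounds."""
    for slot, value in constraints:
        got = booked.get(slot)
        if got is None:
            return False
        if slot == "arriveBy":      # assumed: arrive no later than the goal time
            ok = to_minutes(got) <= to_minutes(value)
        elif slot == "leaveAt":     # assumed: leave no earlier than the goal time
            ok = to_minutes(got) >= to_minutes(value)
        else:                       # every other slot must match exactly
            ok = got == value
        if not ok:
            return False
    return True


constraints = [("price", "75.10 pounds"), ("arriveBy", "15:45"), ("day", "friday"),
               ("departure", "cambridge"), ("destination", "birmingham new street")]
booked = {"arriveBy": "07:44", "day": "friday", "departure": "cambridge",
          "destination": "birmingham new street", "duration": "163 minutes",
          "leaveAt": "05:01", "price": "75.10 pounds", "trainID": "TR9678",
          "Ref": "00002092"}

strict = all(booked.get(k, object()) == v for k, v in constraints)
relaxed = satisfies(booked, constraints)
print(strict, relaxed)
```

With the case from this report, the strict equality check rejects the booking while the bound-based check accepts it.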

Result bias of DAMD model in Convlab-2 and original paper, and downloading problem

Excuse me, I just want to know why there is such a big gap between the results of the DAMD model in ConvLab-2 and in the original paper.
The success rate is 60.4% in the original paper and 33.6% in ConvLab-2.

When I try to run the test files in the tests directory, downloading the pre-trained models always fails, although I have tried many times. Is there another way to download them?

confirmation of the evaluation metric of cross-lingual DST task

I want to confirm whether this script is the final evaluation script for the cross-lingual DST task. In this script, only the slots in semi are of interest, and the slots in book are not.

ConvLab-2/convlab2/dst/evaluate.py

Line 61 in 50424c3

domain_data = domain_data['semi']

Thanks.

[Feature] The user simulator always says something that is not clear

Describe the feature
For the following goal:

{
  "attraction": {
    "info": {
      "area": "west"
    },
    "reqt": {
      "entrance fee": "?",
      "phone": "?"
    }
  },
  "hotel": {
    "info": {
      "internet": "yes",
      "pricerange": "cheap",
      "type": "guesthouse"
    },
    "reqt": {
      "phone": "?"
    }
  },
  "taxi": {
    "info": {
      "arriveBy": "18:30"
    },
    "reqt": {
      "car type": "?",
      "phone": "?"
    }
  }
}

The user simulator said:

I ca n't wait to get there and see some of the local attractions . I am looking for a place to stay . Can you help me ? I also need a place to go in the west .

It is not clear what the area slot value west refers to.

[Feature] integrate SimpleTOD into ConvLab-2

Describe the feature
SimpleTOD performs strongly on the MultiWOZ dataset. It can be used in DST, policy, and end-to-end tasks.

Expected behavior
SimpleTOD can be used as a DST, policy, and end-to-end model.

Additional context
Performance: https://github.com/budzianowski/multiwoz

I think there is a problem in convlab2/evaluator/multiwoz_eval.py.

I don't know why we count mismatches using the constraints of self.goal in lines 376-384.
self.goal isn't modified anywhere else, so it is still the same as the initial goal.

[BUG]

Describe the bug
This platform could not train an MLE model.
When I load the MLE model for GDPL, PPO, or PG, training runs without problems, but it never reaches the optimal score (I run evaluate.py to check the model). In fact, the score goes down after a few epochs. Here is a graph I made for GDPL; PPO looks pretty similar.

To Reproduce
Steps to reproduce the behavior:
Simply run train.py in PG/GDPL/PPO, and it will show this issue. I wrote a script that evaluates all of the models in one directory, and here is the graph I made for GDPL.

Expected behavior
The evaluation score should go higher when the MLE model is loaded.
Score for PPO:
[0.52, 0.53, 0.54, 0.49, 0.44, 0.49, 0.46, 0.47, 0.44, 0.43, 0.42, 0.44, 0.47, 0.48, 0.49, 0.46, 0.45, 0.46, 0.45, 0.48, 0.46, 0.48, 0.49, 0.49, 0.49, 0.48, 0.45, 0.47, 0.43, 0.43, 0.43, 0.42, 0.42, 0.41, 0.42, 0.43, 0.44, 0.47, 0.45, 0.43]
So PPO peaks at only 0.54, nowhere near 0.74.
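
A quick check of the scores listed above, as a small sketch. The variable names are illustrative only; the values are copied from the list in this report.

```python
ppo_scores = [0.52, 0.53, 0.54, 0.49, 0.44, 0.49, 0.46, 0.47, 0.44, 0.43,
              0.42, 0.44, 0.47, 0.48, 0.49, 0.46, 0.45, 0.46, 0.45, 0.48,
              0.46, 0.48, 0.49, 0.49, 0.49, 0.48, 0.45, 0.47, 0.43, 0.43,
              0.43, 0.42, 0.42, 0.41, 0.42, 0.43, 0.44, 0.47, 0.45, 0.43]

peak = max(ppo_scores)
peak_index = ppo_scores.index(peak)   # 0-based position of the first peak
lowest = min(ppo_scores)
print(f"peak={peak} at index {peak_index}, lowest={lowest}, last={ppo_scores[-1]}")
```

The best score is the third entry in the list, and every later evaluation stays below it.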

[BUG] for training part of policy gradient

Describe the bug
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [9,0,0], thread: [0,0,0] Assertion `val >= zero` failed.

This happens when running train.py in convlab2/policy/pg. It occurs after 15 epochs, even though I have already loaded the MLE model.
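
An assertion of this form in multinomial sampling usually points to a negative or NaN entry in the probabilities handed to the sampler, since both fail a "greater than or equal to zero" check. The sketch below is a pure-Python debugging aid under that assumption; the helper name `invalid_entries` is hypothetical and it works on a plain list of floats (for example, a tensor converted with tolist()).

```python
import math


def invalid_entries(probs):
    """Return (index, value) pairs that are NaN, infinite, or negative."""
    return [
        (i, p)
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


bad = invalid_entries([0.2, float("nan"), -0.1, 0.7])
bad_indices = [i for i, _ in bad]
print(bad_indices)
```

Running such a check just before sampling can show whether the policy output has already diverged by the epoch where the crash appears.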

[BUG] Missing package in setup

Describe the bug
The last update introduced a dependency on the fuzzywuzzy package in convlab2/util/multiwoz/dbquery.py.
It needs to be added to setup so that it is installed when running the pip install -e command.

[Feature] Improve networks and memory implementations

Describe the feature
I think it would be a good idea to have more implementations of policies, networks, and memory (all of these classes are implemented in convlab2/policy/rlmodule.py), such as ReplayMemory, PrioritizedReplay, CNN, etc.
Most of these classes are implemented in ConvLab.

[BUG] Parking-none Internet-none

Describe the bug
sess.user_agent.get_in_da with BERTNLU always outputs Parking-none and Internet-none, which lowers our recall and causes the task to fail.

Expected behavior
Parking and internet should be yes or no.

Different frequency of domain after using same random seed

I used set_seed(20200202) in test_end2end.py, but I got a different domain frequency on each run. Why?

Can't a random seed fix the user utterances?
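
For reference, a minimal sketch of seeding the common random number generators in one place. The helper name `seed_everything` is hypothetical and is not ConvLab-2's set_seed; NumPy and PyTorch are seeded only if they are installed. Even with all of these seeded, runs can still differ if sampling happens in several processes or if some component draws randomness from a source not covered here.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG and, when available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(20200202)
first = [random.random() for _ in range(3)]
seed_everything(20200202)
second = [random.random() for _ in range(3)]
print(first == second)
```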

[Feature] vhus

Describe the feature

I notice that the vhus module in ConvLab-2 only implements HUS and VHUS, without goal regularization. In the original paper, this part plays an important role, so I wonder whether you will implement it.
In fact, I have tried to implement this part myself, but I am puzzled by the BOW function. Do you have any idea how to calculate it, especially BOW(Ut)?

[Maintenance] scoring metric

Thanks for this great tool. I have a few questions about the scoring metric.

The benchmark table uses "Complete rate", "Success rate", "Book rate", and "Inform P/R/F" as metrics. While in many recent papers, people use "Inform (%)", "Success (%)", "BLEU", and "Combined score" (a combined score of the previous three metrics with different weight). Can I assume "Success rate" = "Success (%)", "Inform F1" = "Inform (%)"? And it would be nice if you can add "BLEU" and "Combined score", just to align with others' work.
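
For reference, the combined score commonly reported in MultiWOZ response-generation work is (Inform + Success) * 0.5 + BLEU. The sketch below assumes that definition; the function name `combined_score` and the example numbers are illustrative only and are not results from the benchmark table.

```python
def combined_score(inform, success, bleu):
    """Combined score as commonly defined: (Inform + Success) * 0.5 + BLEU."""
    return (inform + success) * 0.5 + bleu


# Illustrative inputs only, all on a 0-100 scale.
example = combined_score(inform=80.0, success=60.0, bleu=15.0)
print(example)
```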

In the benchmark table, DAMD achieves a 33.6 success rate and 57.4 inform F1, but in the original paper (https://arxiv.org/pdf/1911.10484.pdf), DAMD achieves 60.4 success (%) and 76.3 inform (%). Why is there such a huge difference?

[Feature] Training curves

Describe the feature
Are training curves saved anywhere? How can I see them after training a model?

I'm currently wrapping the policy to retrieve them, but perhaps they are being saved somewhere I'm not aware of.
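
One possible shape for such a wrapper, as a hedged sketch: a small logger that collects per-epoch metrics and writes them as CSV for later plotting. The class name `CurveLogger` and the metric names are hypothetical and are not part of ConvLab-2.

```python
import csv
import io


class CurveLogger:
    """Collect per-epoch metrics in memory and dump them as CSV."""

    def __init__(self):
        self.rows = []

    def log(self, epoch, **metrics):
        self.rows.append({"epoch": epoch, **metrics})

    def to_csv(self, stream):
        # Column order: epoch first, then metric names sorted alphabetically.
        names = sorted({k for row in self.rows for k in row if k != "epoch"})
        writer = csv.DictWriter(stream, fieldnames=["epoch"] + names)
        writer.writeheader()
        writer.writerows(self.rows)


logger = CurveLogger()
logger.log(0, policy_loss=-0.12, value_loss=3476.68)
logger.log(1, policy_loss=-0.11, value_loss=1589.14)

buffer = io.StringIO()
logger.to_csv(buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Passing an open file instead of the in-memory buffer would persist the curve next to the saved model.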

[BUG] Problem with indentation

Describe the bug
There is an indentation error in the file convlab2/policy/evaluate.py on line 219. The else statement should be nested inside the for loop, aligned with the if statement, right?
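
The two indentations are both valid Python but behave differently, which is why this kind of bug goes unnoticed. The toy functions below (hypothetical names, unrelated to the actual code in evaluate.py) show the difference: an else aligned with for runs once after the loop completes without break, while an else aligned with if runs for each item that fails the condition.

```python
def else_on_for(items):
    hits = []
    for x in items:
        if x > 0:
            hits.append(x)
    else:  # belongs to the for loop: runs once when the loop ends without break
        hits.append("done")
    return hits


def else_on_if(items):
    hits = []
    for x in items:
        if x > 0:
            hits.append(x)
        else:  # belongs to the if: runs for every non-positive item
            hits.append("skip")
    return hits


print(else_on_for([1, -1, 2]))
print(else_on_if([1, -1, 2]))
```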

[BUG] Some code problems

When I tried to run the evaluate command for “Translation-train SUMBT for cross-lingual DST”, I found some code problems.

1. convlab2/dst/sumbt/multiwoz_zh/sumbt.py, line 33
DOWNLOAD_DIRECTORY = os.path.join(SUMBT_PATH, 'pre-trianed')
Maybe 'pre-trianed' should be changed to 'pre-trained'.
2. convlab2/dst/sumbt/multiwoz_zh/convert_to_glue_format.py, line 126
value = value_trans.get(value, value)
Maybe value_trans.get(value, value) should be trans_value(value).
3. convlab2/dst/sumbt/crosswoz_en/sumbt.py, line 36 (same as no. 1)
DOWNLOAD_DIRECTORY = os.path.join(SUMBT_PATH, 'pre-trianed')
Maybe 'pre-trianed' should be changed to 'pre-trained'.

[BUG] The result of SUMBT model

Hi! When I tried to evaluate the translation-train SUMBT model, I found that eval mode was not set, which affects the results. In my local test, there is a difference of about two points on the MultiWOZ-zh human-val dataset. I think it may be necessary to re-evaluate the SUMBT model after fixing the code, since the current results do not reflect the real model performance.

My Local Result on MultiWOZ-zh
not set eval mode
{'Joint Acc': 0.4821722435545804, 'Turn Acc': 0.9738983360760534, 'Joint F1': 0.8826705748001639}
set eval mode
{'Joint Acc': 0.49972572682391664, 'Turn Acc': 0.9751935149631128, 'Joint F1': 0.8885012208542876}
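
The gap between the two runs above can be computed directly; this is a small sketch using the numbers reported in this issue, with illustrative variable names.

```python
without_eval = {"Joint Acc": 0.4821722435545804,
                "Turn Acc": 0.9738983360760534,
                "Joint F1": 0.8826705748001639}
with_eval = {"Joint Acc": 0.49972572682391664,
             "Turn Acc": 0.9751935149631128,
             "Joint F1": 0.8885012208542876}

# Difference in percentage points, rounded to two decimals.
delta_points = {
    name: round((with_eval[name] - without_eval[name]) * 100, 2)
    for name in without_eval
}
print(delta_points)
```

Joint accuracy moves the most, by a little under two points, while turn accuracy is nearly unchanged.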

[BUG] init_session class method

Describe the bug
Not quite a bug, but I've noticed that some policies (like PPO and PG) have an init_session method that is not implemented, yet it is called by the agent (dialog_agent/agent.py) when the pipeline is built. Is this an unused method, or is there a plan to add something here?

help

When I use the interactive tool, I can't find it in the browser after running run.py. What went wrong, and what should I modify? Can you tell me what I should do? Thanks!

[BUG] GDPL could not train.

Describe the bug
When I try to train the GDPL model, even with the MLE pretrained model loaded, the evaluation result always stays around 0.26. The details are below; could you help me out? GDPL is pretty good, and I plan to use it as my baseline model.

To Reproduce

Go to policy/gdpl/train.py and add the argument --load_model with the path of the MLE model. You will see that the loss becomes bigger and bigger. The output looks like this:

WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
DEBUG:root:<> epoch 0, loss_real:-0.5383382267836068, loss_gen:-1.5583195904683735
INFO:root:<> epoch 0: saved network to mdl
DEBUG:root:<> weight -3.7587242126464844
DEBUG:root:<> log pi -11.807324409484863
/home/raliegh/视频/convlab2_github_code_theirs/ConvLab-2/convlab2/policy/gdpl/gdpl.py:183: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
torch.nn.utils.clip_grad_norm(self.policy.parameters(), 10)
DEBUG:root:<> epoch 0, iteration 0, value, loss 3489.1388260690787
DEBUG:root:<> epoch 0, iteration 0, policy, loss -0.0036238288800967368
DEBUG:root:<> epoch 0, iteration 1, value, loss 3480.9435135690787
DEBUG:root:<> epoch 0, iteration 1, policy, loss -0.09092773252019756
DEBUG:root:<> epoch 0, iteration 2, value, loss 3498.0641061883225
DEBUG:root:<> epoch 0, iteration 2, policy, loss -0.11517706787899921
DEBUG:root:<> epoch 0, iteration 3, value, loss 3488.2195530941613
DEBUG:root:<> epoch 0, iteration 3, policy, loss -0.12360558266702451
DEBUG:root:<> epoch 0, iteration 4, value, loss 3476.682437294408
DEBUG:root:<> epoch 0, iteration 4, policy, loss -0.12722392360630788
INFO:root:<> epoch 0: saved network to mdl
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 1, loss_real:-2.1718062476107947, loss_gen:-6.248041303534257
INFO:root:<> epoch 1: saved network to mdl
DEBUG:root:<> weight -9.06725788116455
DEBUG:root:<> log pi -11.601991653442383
DEBUG:root:<> epoch 1, iteration 0, value, loss 1590.3297087016858
DEBUG:root:<> epoch 1, iteration 0, policy, loss -0.0042587477517755405
DEBUG:root:<> epoch 1, iteration 1, value, loss 1590.0544883326481
DEBUG:root:<> epoch 1, iteration 1, policy, loss -0.07637144262461286
DEBUG:root:<> epoch 1, iteration 2, value, loss 1589.7801545795642
DEBUG:root:<> epoch 1, iteration 2, policy, loss -0.09997303185886458
DEBUG:root:<> epoch 1, iteration 3, value, loss 1589.4738512541119
DEBUG:root:<> epoch 1, iteration 3, policy, loss -0.11133970398651927
DEBUG:root:<> epoch 1, iteration 4, value, loss 1589.1489193564967
DEBUG:root:<> epoch 1, iteration 4, policy, loss -0.11775584558123037
INFO:root:<> epoch 1: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hospital
WARNING:root:illegal booking slot: time, slot: attraction domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 2, loss_real:-3.781325187948015, loss_gen:-10.217867334683737
INFO:root:<> epoch 2: saved network to mdl
DEBUG:root:<> weight -12.925418853759766
DEBUG:root:<> log pi -12.265064239501953
DEBUG:root:<> epoch 2, iteration 0, value, loss 4830.441213507402
DEBUG:root:<> epoch 2, iteration 0, policy, loss -0.020781385271172775
DEBUG:root:<> epoch 2, iteration 1, value, loss 4839.154656661184
DEBUG:root:<> epoch 2, iteration 1, policy, loss -0.08836260036026176
DEBUG:root:<> epoch 2, iteration 2, value, loss 4831.741853412829
DEBUG:root:<> epoch 2, iteration 2, policy, loss -0.10602868407180435
DEBUG:root:<> epoch 2, iteration 3, value, loss 4824.3883634868425
DEBUG:root:<> epoch 2, iteration 3, policy, loss -0.12300284697036994
DEBUG:root:<> epoch 2, iteration 4, value, loss 4831.304481907895
DEBUG:root:<> epoch 2, iteration 4, policy, loss -0.12597578234578433
INFO:root:<> epoch 2: saved network to mdl
WARNING:root:illegal booking slot: time, domain: attraction
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 3, loss_real:-5.254823472764757, loss_gen:-13.987894455591837
INFO:root:<> epoch 3: saved network to mdl
DEBUG:root:<> weight -16.43012809753418
DEBUG:root:<> log pi -11.844439506530762
DEBUG:root:<> epoch 3, iteration 0, value, loss 6681.600123355263
DEBUG:root:<> epoch 3, iteration 0, policy, loss -0.014684114396866215
DEBUG:root:<> epoch 3, iteration 1, value, loss 6697.302657277961
DEBUG:root:<> epoch 3, iteration 1, policy, loss -0.08244152585546927
DEBUG:root:<> epoch 3, iteration 2, value, loss 6687.997532894737
DEBUG:root:<> epoch 3, iteration 2, policy, loss -0.10515823467683635
DEBUG:root:<> epoch 3, iteration 3, value, loss 6690.9089997944075
DEBUG:root:<> epoch 3, iteration 3, policy, loss -0.11676324161357786
DEBUG:root:<> epoch 3, iteration 4, value, loss 6678.3968313116775
DEBUG:root:<> epoch 3, iteration 4, policy, loss -0.12235850389697589
INFO:root:<> epoch 3: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: attraction
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 4, loss_real:-6.739606408511891, loss_gen:-18.933229109820196
INFO:root:<> epoch 4: saved network to mdl
DEBUG:root:<> weight -21.545021057128906
DEBUG:root:<> log pi -12.236998558044434
DEBUG:root:<> epoch 4, iteration 0, value, loss 16275.491156684027
DEBUG:root:<> epoch 4, iteration 0, policy, loss -0.014838041116793951
DEBUG:root:<> epoch 4, iteration 1, value, loss 16267.9013671875
DEBUG:root:<> epoch 4, iteration 1, policy, loss -0.09151227782583898
DEBUG:root:<> epoch 4, iteration 2, value, loss 16256.190104166666
DEBUG:root:<> epoch 4, iteration 2, policy, loss -0.11655553637279405
DEBUG:root:<> epoch 4, iteration 3, value, loss 16265.713351779514
DEBUG:root:<> epoch 4, iteration 3, policy, loss -0.12722003553062677
DEBUG:root:<> epoch 4, iteration 4, value, loss 16243.192165798611
DEBUG:root:<> epoch 4, iteration 4, policy, loss -0.13666448928415775
INFO:root:<> epoch 4: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: taxi
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 5, loss_real:-7.912413765402401, loss_gen:-22.03384522830739
INFO:root:<> epoch 5: saved network to mdl
DEBUG:root:<> weight -24.468324661254883
DEBUG:root:<> log pi -12.261258125305176
DEBUG:root:<> epoch 5, iteration 0, value, loss 27010.648274739582
DEBUG:root:<> epoch 5, iteration 0, policy, loss -0.013149608030087419
DEBUG:root:<> epoch 5, iteration 1, value, loss 27043.53125
DEBUG:root:<> epoch 5, iteration 1, policy, loss -0.0839987989101145
DEBUG:root:<> epoch 5, iteration 2, value, loss 27066.318250868055
DEBUG:root:<> epoch 5, iteration 2, policy, loss -0.10623834199375576
DEBUG:root:<> epoch 5, iteration 3, value, loss 27043.93825954861
DEBUG:root:<> epoch 5, iteration 3, policy, loss -0.11813025466269916
DEBUG:root:<> epoch 5, iteration 4, value, loss 26953.104600694445
DEBUG:root:<> epoch 5, iteration 4, policy, loss -0.1252221003588703
INFO:root:<> epoch 5: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, domain: hospital
DEBUG:root:<> epoch 6, loss_real:-9.242614388465881, loss_gen:-24.15580458111233
INFO:root:<> epoch 6: saved network to mdl
DEBUG:root:<> weight -26.42582893371582
DEBUG:root:<> log pi -11.808538436889648
DEBUG:root:<> epoch 6, iteration 0, value, loss 35887.18179481908
DEBUG:root:<> epoch 6, iteration 0, policy, loss -0.020953503682425146
DEBUG:root:<> epoch 6, iteration 1, value, loss 35494.21656558388
DEBUG:root:<> epoch 6, iteration 1, policy, loss -0.08569272891863396
DEBUG:root:<> epoch 6, iteration 2, value, loss 35628.84801603619
DEBUG:root:<> epoch 6, iteration 2, policy, loss -0.10266891509098441
DEBUG:root:<> epoch 6, iteration 3, value, loss 35657.03916529605
DEBUG:root:<> epoch 6, iteration 3, policy, loss -0.11386555943049882
DEBUG:root:<> epoch 6, iteration 4, value, loss 35917.57833059211
DEBUG:root:<> epoch 6, iteration 4, policy, loss -0.11797217848269563
INFO:root:<> epoch 6: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 7, loss_real:-11.321088128619724, loss_gen:-29.293851852416992
INFO:root:<> epoch 7: saved network to mdl
DEBUG:root:<> weight -32.10945129394531
DEBUG:root:<> log pi -11.713705062866211
DEBUG:root:<> epoch 7, iteration 0, value, loss 44522.42914496528
DEBUG:root:<> epoch 7, iteration 0, policy, loss -0.015966814425256517
DEBUG:root:<> epoch 7, iteration 1, value, loss 44453.58452690972
DEBUG:root:<> epoch 7, iteration 1, policy, loss -0.07723193801939487
DEBUG:root:<> epoch 7, iteration 2, value, loss 44377.24782986111
DEBUG:root:<> epoch 7, iteration 2, policy, loss -0.09828437285290824
DEBUG:root:<> epoch 7, iteration 3, value, loss 44297.86208767361
DEBUG:root:<> epoch 7, iteration 3, policy, loss -0.11189984074897236
DEBUG:root:<> epoch 7, iteration 4, value, loss 44211.8828125
DEBUG:root:<> epoch 7, iteration 4, policy, loss -0.12044301960203382
INFO:root:<> epoch 7: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 8, loss_real:-14.25956932703654, loss_gen:-33.106894387139214
INFO:root:<> epoch 8: saved network to mdl
DEBUG:root:<> weight -35.563194274902344
DEBUG:root:<> log pi -11.887650489807129
DEBUG:root:<> epoch 8, iteration 0, value, loss 61228.02682976974
DEBUG:root:<> epoch 8, iteration 0, policy, loss -0.019527194384289414
DEBUG:root:<> epoch 8, iteration 1, value, loss 60913.86245888158
DEBUG:root:<> epoch 8, iteration 1, policy, loss -0.08493027012599141
DEBUG:root:<> epoch 8, iteration 2, value, loss 60804.58943256579
DEBUG:root:<> epoch 8, iteration 2, policy, loss -0.10401363087523925
DEBUG:root:<> epoch 8, iteration 3, value, loss 60740.71361019737
DEBUG:root:<> epoch 8, iteration 3, policy, loss -0.11570279148872942
DEBUG:root:<> epoch 8, iteration 4, value, loss 60633.64113898026
DEBUG:root:<> epoch 8, iteration 4, policy, loss -0.12276971943088268
INFO:root:<> epoch 8: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 9, loss_real:-16.396672407786053, loss_gen:-39.38313462999132
INFO:root:<> epoch 9: saved network to mdl
DEBUG:root:<> weight -42.118408203125
DEBUG:root:<> log pi -11.91506290435791
DEBUG:root:<> epoch 9, iteration 0, value, loss 102404.39268092105
DEBUG:root:<> epoch 9, iteration 0, policy, loss -0.023536940546412217
DEBUG:root:<> epoch 9, iteration 1, value, loss 102286.93421052632
DEBUG:root:<> epoch 9, iteration 1, policy, loss -0.0810224729541101
DEBUG:root:<> epoch 9, iteration 2, value, loss 101849.27960526316
DEBUG:root:<> epoch 9, iteration 2, policy, loss -0.10366031547126017
DEBUG:root:<> epoch 9, iteration 3, value, loss 101598.78638980263
DEBUG:root:<> epoch 9, iteration 3, policy, loss -0.11581830601943166
DEBUG:root:<> epoch 9, iteration 4, value, loss 101350.11461759868
DEBUG:root:<> epoch 9, iteration 4, policy, loss -0.1236358410433719
INFO:root:<> epoch 9: saved network to mdl
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, domain: hotel
WARNING:root:illegal booking slot: time, slot: taxi domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: attraction domain
DEBUG:root:<> epoch 10, loss_real:-17.94006437725491, loss_gen:-41.33010853661431
INFO:root:<> epoch 10: saved network to mdl
DEBUG:root:<> weight -43.82462692260742
DEBUG:root:<> log pi -12.179319381713867
DEBUG:root:<> epoch 10, iteration 0, value, loss 111196.29091282895
DEBUG:root:<> epoch 10, iteration 0, policy, loss -0.015502721169277242
DEBUG:root:<> epoch 10, iteration 1, value, loss 108579.41981907895
DEBUG:root:<> epoch 10, iteration 1, policy, loss -0.08138108037804302
DEBUG:root:<> epoch 10, iteration 2, value, loss 108351.37541118421
DEBUG:root:<> epoch 10, iteration 2, policy, loss -0.10115281825787142
DEBUG:root:<> epoch 10, iteration 3, value, loss 109070.85341282895
DEBUG:root:<> epoch 10, iteration 3, policy, loss -0.10706739313900471
DEBUG:root:<> epoch 10, iteration 4, value, loss 108081.73663651316
DEBUG:root:<> epoch 10, iteration 4, policy, loss -0.11929772833460256
INFO:root:<> epoch 10: saved network to mdl
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
DEBUG:root:<> epoch 11, loss_real:-22.859329329596626, loss_gen:-50.24238416883681
INFO:root:<> epoch 11: saved network to mdl
DEBUG:root:<> weight -53.37864685058594
DEBUG:root:<> log pi -12.136919975280762
DEBUG:root:<> epoch 11, iteration 0, value, loss 201200.13569078947
DEBUG:root:<> epoch 11, iteration 0, policy, loss -0.023343098202818317
DEBUG:root:<> epoch 11, iteration 1, value, loss 195454.23190789475
DEBUG:root:<> epoch 11, iteration 1, policy, loss -0.09736867954856471
DEBUG:root:<> epoch 11, iteration 2, value, loss 199148.953125
DEBUG:root:<> epoch 11, iteration 2, policy, loss -0.10236057227379397
DEBUG:root:<> epoch 11, iteration 3, value, loss 203306.05283717104
DEBUG:root:<> epoch 11, iteration 3, policy, loss -0.10679333225676887
DEBUG:root:<> epoch 11, iteration 4, value, loss 197667.32565789475
DEBUG:root:<> epoch 11, iteration 4, policy, loss -0.12387701702353202
INFO:root:<> epoch 11: saved network to mdl

Thank you guys, have a good day! Appreciate your help.

[Maintenance] re-test the end-to-end performance

Describe the feature
#32 fixes bugs in the goal generator and dbquery. #31 will improve the agenda user policy. After that, we need to re-run the test scripts under the tests/ dir.
The results in the README may not be accurate after commit bdc9dba.

[BUG] getting-started can not run well

Describe the bug
When I follow the Installation steps and run tutorials/Getting_Started.ipynb, the import cell fails with an ImportError.

To Reproduce
Steps to reproduce the behavior:

conda create -n convlab-2
pip install -e .
jupyter notebook
run the tutorials/Getting_Started.ipynb

Expected behavior
The notebook should run without errors.

Additional context

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-765703729d4c> in <module>
      1 # common import: convlab2.$module.$model.$dataset
----> 2 from convlab2.nlu.jointBERT.multiwoz import BERTNLU
      3 from convlab2.nlu.milu.multiwoz import MILU
      4 from convlab2.dst.rule.multiwoz import RuleDST
      5 from convlab2.policy.rule.multiwoz import RulePolicy

~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/multiwoz/__init__.py in <module>
----> 1 from convlab2.nlu.jointBERT.multiwoz.nlu import BERTNLU

~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/multiwoz/nlu.py in <module>
      7 from convlab2.util.file_util import cached_path
      8 from convlab2.nlu.nlu import NLU
----> 9 from convlab2.nlu.jointBERT.dataloader import Dataloader
     10 from convlab2.nlu.jointBERT.jointBERT import JointBERT
     11 from convlab2.nlu.jointBERT.multiwoz.postprocess import recover_intent

~/vscode/dialogue/ConvLab-2/convlab2/nlu/jointBERT/dataloader.py in <module>
      2 import torch
      3 import random
----> 4 from transformers import BertTokenizer
      5 import math
      6 from collections import Counter

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/__init__.py in <module>
     46 from .configuration_xlm_roberta import XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMRobertaConfig
     47 from .configuration_xlnet import XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP, XLNetConfig
---> 48 from .data import (
     49     DataProcessor,
     50     InputExample,

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/__init__.py in <module>
      4 
      5 from .metrics import is_sklearn_available
----> 6 from .processors import (
      7     DataProcessor,
      8     InputExample,

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/processors/__init__.py in <module>
      3 # module, but to preserve other warnings. So, don't check this module at all.
      4 
----> 5 from .glue import glue_convert_examples_to_features, glue_output_modes, glue_processors, glue_tasks_num_labels
      6 from .squad import SquadExample, SquadFeatures, SquadV1Processor, SquadV2Processor, squad_convert_examples_to_features
      7 from .utils import DataProcessor, InputExample, InputFeatures, SingleSentenceClassificationProcessor

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/data/processors/glue.py in <module>
     23 
     24 from ...file_utils import is_tf_available
---> 25 from ...tokenization_utils import PreTrainedTokenizer
     26 from .utils import DataProcessor, InputExample, InputFeatures
     27 

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/tokenization_utils.py in <module>
     24 
     25 from .file_utils import add_end_docstrings
---> 26 from .tokenization_utils_base import (
     27     ENCODE_KWARGS_DOCSTRING,
     28     ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING,

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in <module>
     29 
     30 import numpy as np
---> 31 from tokenizers import AddedToken
     32 from tokenizers import Encoding as EncodingFast
     33 

~/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/__init__.py in <module>
     15 EncodeInput = Union[TextEncodeInput, PreTokenizedEncodeInput]
     16 
---> 17 from .tokenizers import Tokenizer, Encoding, AddedToken
     18 from .tokenizers import decoders
     19 from .tokenizers import models

ImportError: dlopen(/Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so, 2): Symbol not found: ____chkstk_darwin
  Referenced from: /Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib
 in /Users/wujingwujing/miniconda3/envs/convlab-2/lib/python3.7/site-packages/tokenizers/tokenizers.cpython-37m-darwin.so

[Maintenance] improve agenda policy

Describe the feature
Improve user simulator, mostly agenda policy

Expected behavior

Add a dialogue act that indicates the current domain (such as ['Inform', 'Hotel', 'none', 'none']) when the simulator starts talking about a new domain.
Add a dontcare response for system requests that are not in the user goal.
The user policy should treat the system act ['Internet', 'none'] as ['Internet', 'yes'], and ['Parking', 'none'] as ['Parking', 'yes']. System acts only have 'none' as the value for the 'Internet' and 'Parking' slots, while user acts have 'yes', 'no', and 'none' for these slots.
When the system recommends one or more choices, the user policy could say "ok" or "just randomly pick one".
When the system offers to book but the user does not need to book, the user could say "I 'm not looking to make a booking at the moment."
Improve the order of dialogue acts for TemplateNLG, such as saying the name first, then the other slots.
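
The Internet/Parking rule in the list above could be sketched roughly as follows. This is only an illustration, not the ConvLab-2 implementation: `normalize_system_act` and `BOOLEAN_SLOTS` are hypothetical names, and the [slot, value] pair format is assumed from the examples in this issue.

```python
# Hypothetical helper, not part of ConvLab-2: illustrates the proposed rule only.
BOOLEAN_SLOTS = {"Internet", "Parking"}


def normalize_system_act(slot_value):
    """Read a system ['Internet', 'none'] or ['Parking', 'none'] as value 'yes'."""
    slot, value = slot_value
    if slot in BOOLEAN_SLOTS and value == "none":
        return [slot, "yes"]
    # Every other slot/value pair is passed through unchanged.
    return [slot, value]


acts = [["Internet", "none"], ["Parking", "none"], ["Addr", "none"]]
normalized = [normalize_system_act(act) for act in acts]
print(normalized)
```

Only the two boolean-like slots are rewritten, so a 'none' value on any other slot keeps its original meaning.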

Additional context
Have a look at a few simulated dialogues.

[Feature] Is there a way to create a light bot on my own few-shot dataset ?

Describe the feature
I have created a simple dataset for my own domain that contains only a few lines of conversation. How can I create a simple bot on my own few-shot dataset, as with Rasa?

Expected behavior
I hope this will be supported.

Additional context
Also, how do I develop a new dataset for ConvLab-2? Is there documentation for that?

Questions about "Is your goal oriented model performing well"

Hi there,

Thanks for the excellent paper of Takanobu et al. (2020). I thoroughly appreciate the systematic comparison across so many points, and especially the call out of mismatch between single turn performance and overall performance.

One question I have is about the source code. Right now, the top line README for this repo shows a very similar table to Table 1 of the paper, but with very different results (and indeed, quite a few different systems, with some disagreeing). Furthermore, you have similar tables for 2, 3, etc in the README, but with different metrics.

How is the table in the README different from the one presented in the paper?
Why is there a mismatch in the reported metrics? Can one use ConvLab-2 to produce the rest of the metrics reported in the paper?
The paper calls out that the code was all open source in February, while this repo became available in May. Perhaps I should be looking at a different repo?
If there are two repos, which one do you anticipate receiving more support? I assume this one, as it aims to be a standardized platform.

Best wishes from the ParlAI team & collaborators

Thanks,
Stephen

[BUG] Sequicity

I find that the results of Sequicity are quite bad in convlab2 (e2e/sequicity). When I print the output, it looks like this:
user: May I have the address ?
sys: the address is .
When I train Sequicity in convlab2 myself, the test result is at least not as bad as the result from running the script. So I wonder whether there are bugs in Sequicity inference.

How to use the interactive tool on End2End model?

When will the code be open-sourced?

Shall we recognize the domains and slots in system utterances in Multi-domain Dialog State Tracking task?

Some slots such as restaurant_name may occur in system utterances, e.g. in dialogue MUL2499, the bot says, "All saints church looks good , would you like to head there ?". Logically there is no need to recognize slot values in system utterances, but baseline models like SUMBT and TRADE did it. I want to confirm whether you will supply the annotations for system utterances in the final test set.

error when importing convlab2

This may be a silly issue, but when trying to run the tutorial notebook on Colab, convlab2 doesn't get imported after cloning the repo and installing. This is a sample notebook with the two lines involved:

https://colab.research.google.com/drive/1B6p8_GXoPUau9AaZUnKmCm3hQX8BcsIv?usp=sharing

Thanks!

Nick

[BUG] Not training algorithms that use the 'UserPolicyAgendaMultiWoz'

When training policies that use the UserPolicyAgendaMultiWoz, such as PG, PPO and GDPL, training throws an exception.
This exception is related to the file convlab2/task/multiwoz/goal_generator.py, line 181, where self.corpus_path seems to be None instead of a path.
Indeed, in the UserPolicyAgendaMultiWoz class, we initialize the GoalGenerator with default values (including corpus_path = None), and we do not set the path anywhere.

To Reproduce
Steps to reproduce an example of the behavior (PPO policy):

Go to convlab2/policy/ppo
Run python train.py
It will throw an Exception as shown in the image below.

Expected behavior
It should train the policy with the respective algorithm (PPO in this case)

Additional context
It also happens when evaluating a policy using the convlab2/policy/evaluate.py script.
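
One possible way to guard against a None path is sketched below. It is not a description of the actual GoalGenerator or of the project's fix: `GoalGeneratorSketch` is a hypothetical stand-in class, and `DEFAULT_CORPUS_PATH` is an assumed location chosen only for illustration.

```python
import os

# Assumed default location, for illustration only.
DEFAULT_CORPUS_PATH = os.path.join("data", "multiwoz", "train.json")


class GoalGeneratorSketch:
    """Hypothetical stand-in for a goal generator; not the real ConvLab-2 class."""

    def __init__(self, corpus_path=None):
        # Fall back to a default instead of keeping None, so that later
        # file access never receives None as a path.
        if corpus_path is None:
            corpus_path = DEFAULT_CORPUS_PATH
        self.corpus_path = corpus_path


generator = GoalGeneratorSketch()
print(generator.corpus_path)
```

Callers that do pass a path keep their own value, so the fallback only changes behavior for the default-argument case described in this issue.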

[Feature] integrate SOLOIST into ConvLab-2

Describe the feature
SOLOIST

[Maintenance] RL policy training

Describe the feature
I've noticed that a few issues (#8, #13, #15, #20, #40) mention that it's hard to train RL policies (PG, PPO, GDPL). Thanks to all of you, we have fixed some bugs. To help discussion and debugging, I suggest we report all such bugs under this issue.

Since we have improved our user agenda policy (#31), the performance of RL policies in the README is out-of-date. However, as mentioned in this comment, you can still reproduce the result before the change of user policy.

Currently, we are working on training RL policies with the newest agenda policy. We would greatly appreciate it if you could help!

Since the training of RL policy is unstable and sensitive to hyperparameters, here are some suggestions (and we welcome more):

Initialize the parameters by imitation learning (load parameters trained by MLE policy).
Save the model that performs best during training.
Try multiple runs with different random seeds.
Tune hyperparameters.
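
Two of the suggestions above (keeping the best-performing model and trying several random seeds) can be sketched as a generic loop. Everything here is a placeholder: `train_one_epoch`, `evaluate` and `run` are hypothetical functions, and the "success rate" is a random number rather than a real evaluation.

```python
import random


def train_one_epoch(rng):
    """Placeholder for one epoch of policy training (hypothetical)."""
    return None


def evaluate(rng):
    """Placeholder evaluation that returns a fake success rate in [0, 1)."""
    return rng.random()


def run(seed, epochs=5):
    """Train with one seed and remember the best-scoring epoch."""
    rng = random.Random(seed)
    best_score, best_epoch = float("-inf"), None
    for epoch in range(epochs):
        train_one_epoch(rng)
        score = evaluate(rng)
        if score > best_score:
            # A real script would save the model checkpoint at this point.
            best_score, best_epoch = score, epoch
    return best_score, best_epoch


# Repeat the whole run with several seeds and compare the outcomes.
results = {seed: run(seed) for seed in (0, 1, 2)}
print(results)
```

Seeding a local `random.Random` per run keeps each run reproducible without touching global state.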

[BUG] analyzer example with wrong signature

In the example https://github.com/thu-coai/ConvLab-2/blob/master/convlab2/util/analysis_tool/example.py the analyzer instantiation analyzer = Analyzer(user_agent=user_agent, use_nlu=True, dataset='multiwoz') has a wrong signature, as use_nlu is not implemented. Removing that argument works fine; it seems it was simply left in when the Analyzer class was changed.

restaurant database is not complete

In the restaurant database, 3 restaurants do not have a phone number.
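
A check like this could be sketched as below, assuming the database is a list of records with an optional 'phone' field. The records are invented toy data, not entries from the real database, and `missing_phone` is a hypothetical helper.

```python
# Toy records standing in for the restaurant database (invented for illustration).
restaurants = [
    {"name": "A", "phone": "01223000000"},
    {"name": "B"},
    {"name": "C", "phone": ""},
]


def missing_phone(entries):
    """Return the names of entries whose 'phone' field is absent or empty."""
    return [entry["name"] for entry in entries if not entry.get("phone")]


names = missing_phone(restaurants)
print(names)
```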

[BUG] user utterance from user simulator is randomly empty

Describe the bug
The user simulator with the following configuration generates an empty user utterance for some turns.

Here is a sample user/system conversation with an empty user utterance:

user: I need a restaurant . It just needs to be expensive . I am also in the market for a new restaurant . Is there something in the centre of town ? Do you have portuguese food ?
sys: I have n't found any in the centre. I am unable to find any portuguese restaurants in town .
user: It just needs to be cheap .
sys: It is in the centre area . They serve portuguese . Would you like to try nandos city centre ? They are in the cheap price range . I will book it for you and get a reference number ?
user:
sys:
user: Can I get the postcode for the restaurant ?
sys: The postcode is cb23ar . Is there anything else I can help you with today ?
user: Ok , have a good day . Goodbye .
sys: Thank you and goodbye . You are welcome . Have a good day !.

To Reproduce
Steps to reproduce the behavior:

Set the seed to 20200720 and look at the logs from one of the tests. The user utterance is empty for some of the turns.
Note: I used my own system agent for these tests.

Expected behavior
The user utterance shouldn't be empty for any turn.

[Feature] convlab-2 needs to be improved in many aspects

Describe the feature
I think convlab-2 is a great toolkit for Chinese students and developers to create fantastic chatbots. But after reading the code, I find that many aspects expected of a good project could be improved, e.g. automated deployment (DevOps), automated testing, full in-code documentation, and so on.

Expected behavior
If you agree, I will go ahead and improve the quality of this project so that it becomes a good open-source project. I really like this project and this field.

Additional context
I am a graduate student at Beijing University of Posts and Telecommunications, the author of the open-source project python-wechaty, and a [mentor of summer-of-code](https://github.com/wechaty/summer-of-code/issues/6). My main research interest is task-oriented dialogue.

I got different results on test.

I ran the test code. The result I got is far different from yours.
For example, the result of the BERTNLU-RuleDST-RulePolicy-TemplateNLG model is

complete number of dialogs/tot: 0.901
success number of dialogs/tot: 0.571
average precision: 0.7524100652627299
average recall: 0.905928745583918
average f1: 0.7967497596208643
average book rate: 0.905176116838488
average turn (succ): 10.325744308231174
average turn (all): 12.374
percentage of domains that satisfy the database constraints: 0.752
percentage of dialogs that satisfy the database constraints: 0.617

Is that normal?

What is different between MultiWOZ 2.1 dataset `train.json.zip`, `train_corrected.json.zip` and `MultiWOZ2.1_Cleaned.zip`?

train.json.zip: https://github.com/thu-coai/ConvLab-2/blob/master/data/multiwoz/train.json.zip
train_corrected.json.zip: https://github.com/thu-coai/ConvLab-2/blob/master/data/multiwoz/train_corrected.json.zip
MultiWOZ2.1_Cleaned.zip: https://github.com/ConvLab/ConvLab-2/blob/master/data/multiwoz/MultiWOZ2.1_Cleaned.zip

Training PPO-algorithm

I executed the provided train.py script in convlab2/policy/ppo with the prespecified configuration. During training, the success rate starts fairly high at around 25% and then fluctuates around 30-35% for a while. When training finished, I used the evaluate.py script in convlab2/policy to evaluate the performance, which gives me 26%, far from the 74% reported in the table.

My Question: What is the exact configuration that has been used for training the 74% model?

[BUG] Evaluator regards "arriveBy": "24:**" as illegal time

Describe the bug
There are some illegal times in the arriveBy slot, such as:

ConvLab-2/data/multiwoz/db/train_db.json

Line 793 in d046161

"arriveBy": "24:08",

But the evaluator's check rejects them (returns False):

ConvLab-2/convlab2/evaluator/multiwoz_eval.py

Line 33 in d046161

time_re = re.compile(r'^(([01]\d|2[0-3]):([0-5]\d)|24:00)$')

ConvLab-2/convlab2/evaluator/multiwoz_eval.py

Line 224 in d046161

elif key == "arriveBy" or key == "leaveAt":

Expected behavior
The evaluator should regard them as legal times because they are in the DB.
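
The behavior described above can be reproduced with the pattern quoted from multiwoz_eval.py. The second pattern, `relaxed_time_re`, is a hypothetical relaxation added here only to illustrate the expected behavior; it is an assumption, not the project's actual fix.

```python
import re

# Pattern quoted in this issue from convlab2/evaluator/multiwoz_eval.py.
time_re = re.compile(r'^(([01]\d|2[0-3]):([0-5]\d)|24:00)$')

# Hypothetical relaxed pattern (an assumption, not the project's fix):
# it additionally accepts 24:01 through 24:59.
relaxed_time_re = re.compile(r'^(([01]\d|2[0-4]):([0-5]\d))$')

for value in ["23:59", "24:00", "24:08"]:
    strict_ok = bool(time_re.match(value))
    relaxed_ok = bool(relaxed_time_re.match(value))
    print(value, strict_ok, relaxed_ok)
```

With the quoted pattern, "24:00" is accepted but "24:08" is not, which matches the report. Whether such values should be accepted or normalized in the database instead is a design decision for the maintainers.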

[Feature] end-to-end test with CrossWOZ

Describe the feature
I was able to run python tests/test_BERTNLU-RuleDST-RulePolicy-TemplateNLG.py and got the same results as in README. Thanks a lot!

However, as soon as I tried to run the same test for CrossWOZ, I ran into the issue that there is no evaluator for the CrossWOZ dataset. In fact, the analyzer currently only works with the default MultiWOZ evaluator.

I tried to use MultiWOZ evaluator but it didn't work. I looked at CrossWOZ repo but it seemed there wasn't an end-to-end evaluator class, either.

Expected behavior
A working end-to-end evaluator for CrossWOZ dataset.

Additional context
n/a
