convlab's Introduction

ConvLab

ConvLab is an open-source multi-domain end-to-end dialog system platform, aiming to enable researchers to quickly set up experiments with reusable components and compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in common environments.

Package Overview

  • convlab: an open-source multi-domain end-to-end dialog research library
  • convlab.agent: a module for constructing dialog agents, including RL algorithms
  • convlab.env: a collection of environments
  • convlab.experiment: a module for running experiments at various levels
  • convlab.evaluator: a module for evaluating a dialog session with various metrics
  • convlab.modules: a collection of state-of-the-art dialog system component models, including NLU, DST, Policy, and NLG
  • convlab.human_eval: a server for conducting human evaluation using Amazon Mechanical Turk
  • convlab.lib: a library of common utilities
  • convlab.spec: a collection of experiment spec files

Installation

ConvLab requires Python 3.6.5 or later. Windows is currently not officially supported.

Installing via pip

Setting up a virtual environment

Conda can be used to set up a virtual environment with the version of Python required for ConvLab. If you already have a Python 3.6 or 3.7 environment you want to use, you can skip to the 'installing via pip' section.

  1. Download and install Conda.

  2. Create a Conda environment with Python 3.6.5

    conda create -n convlab python=3.6.5
  3. Activate the Conda environment. You will need to activate the Conda environment in each terminal in which you want to use ConvLab.

    source activate convlab

Installing the library and dependencies

Installing the library and dependencies is simple using pip.

pip install -r requirements.txt

If your Linux system does not have the essential build tools installed, you might need to install them by running

sudo apt-get install build-essential

ConvLab uses the 'stopwords' corpus from NLTK; you need to download it by running

python -m nltk.downloader stopwords
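
Equivalently, the corpus can be fetched from a Python session using NLTK's standard downloader:

    # download the 'stopwords' corpus into the default NLTK data directory
    import nltk
    nltk.download('stopwords')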

Installation tips on CentOS

Please refer to the instructions here: https://github.com/daveta/convlab-notes/wiki/Installing-Convlab-on-Centos7

Installing using Docker

Docker provides more isolation and consistency, and also makes it easy to distribute your environment to a compute cluster.

Once you have installed Docker, just run the following commands to get an environment that will run on either the CPU or GPU.

  1. Pull docker
    docker pull convlab/convlab:0.2.2

  2. Run docker
    docker run -it --rm convlab/convlab:0.2.2

Running ConvLab

Once you've downloaded ConvLab and installed required packages, you can run the command-line interface with the python run.py command.

$ python run.py {spec file} {spec name} {mode}

For non-RL policies:

# to evaluate a dialog system consisting of NLU(OneNet), DST(Rule), Policy(Rule), NLG(Template) on the MultiWOZ environment
$ python run.py demo.json onenet_rule_rule_template eval

# to see natural language utterances 
$ LOG_LEVEL=NL python run.py demo.json onenet_rule_rule_template eval

# to see natural language utterances and dialog acts 
$ LOG_LEVEL=ACT python run.py demo.json onenet_rule_rule_template eval

# to see natural language utterances, dialog acts and state representation
$ LOG_LEVEL=STATE python run.py demo.json onenet_rule_rule_template eval

For RL policies:

# to train a DQN policy with NLU(OneNet), DST(Rule), NLG(Template) on the MultiWOZ environment
$ python run.py demo.json onenet_rule_dqn_template train

# to use the policy trained above (this will load up the onenet_rule_dqn_template_t0_s0_*.pt files under the output/onenet_rule_dqn_template_{timestamp}/model directory)
$ python run.py demo.json onenet_rule_dqn_template eval@output/onenet_rule_dqn_template_{timestamp}/model/onenet_rule_dqn_template_t0_s0

Note that currently ConvLab can only train the policy component by interacting with a user simulator. For the other components, ConvLab supports offline supervised learning. For example, you can train an NLU model using the local training script, as in OneNet.

Creating a new spec file

A spec file is used to fully specify experiments, including a dialog agent and a user simulator. It is a JSON file containing multiple experiment specs, each with the keys agent, env, body, meta, and search.

We based our implementation on SLM-Lab.

Instead of writing one from scratch, you are welcome to modify the convlab/spec/demo.json file. Once you have created a new spec file, place it under the convlab/spec directory and run your experiments. Note that you don't have to prepend convlab/spec/ to your spec file name on the command line.
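
If you want to bootstrap a new spec programmatically, the short sketch below (not part of ConvLab; the new file name and spec name are illustrative) copies an existing entry from demo.json and checks that the keys listed above are present:

    # Minimal sketch: derive a new spec from demo.json and sanity-check its keys.
    import json
    from pathlib import Path

    SPEC_DIR = Path("convlab/spec")
    EXPECTED_KEYS = {"agent", "env", "meta"}  # "body" and "search" may also appear

    demo = json.loads((SPEC_DIR / "demo.json").read_text())
    spec = demo["onenet_rule_rule_template"]      # reuse an existing spec as a template
    missing = EXPECTED_KEYS - spec.keys()
    assert not missing, f"spec is missing keys: {missing}"

    # Hypothetical new spec file and name; edit the components inside 'spec' as needed.
    (SPEC_DIR / "my_spec.json").write_text(json.dumps({"my_new_spec": spec}, indent=2))

You could then run it with: python run.py my_spec.json my_new_spec eval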

Participation in DSTC-8

  1. Extend ConvLab with your code, and include submission.json under the convlab/spec directory.
  2. In submission.json, specify up to 5 specs with the name submission[1-5].
  3. Make sure the code with the config is runnable in the docker environment.
  4. If your code uses external packages beyond the existing docker environment, please choose one of the following two approaches to specify your environment requirements:
    • Add install.sh under the convlab directory. install.sh should include all required extra packages.
    • Create your own Dockerfile with the name dev.dockerfile
  5. Zip the system and submit.

Evaluation

  1. Automatic end2end Evaluation: The submitted system will be evaluated using the user-simulator setting of the spec milu_rule_rule_template in convlab/spec/baseline.json. We will use the evaluator MultiWozEvaluator in convlab/evaluator/multiwoz to report metrics including success rate, average reward, number of turns, precision, recall, and F1 score (see the sketch after this list).
  2. Human Evaluation: The submitted system will be evaluated on Amazon Mechanical Turk. Crowd-workers will communicate with your submitted system and provide a rating based on the whole experience (language understanding, appropriateness, etc.).
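
The precision, recall, and F1 reported by the evaluator follow the standard definitions; the sketch below is only an illustration of that arithmetic over slot-level matches (the actual MultiWozEvaluator may compute TP/FP/FN differently):

    # Standard slot-level precision / recall / F1 over sets of (domain, slot, value) triples.
    def inform_prf1(informed, goal):
        tp = len(informed & goal)   # correctly informed slots
        fp = len(informed - goal)   # informed but not part of the goal
        fn = len(goal - informed)   # part of the goal but never informed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1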

Contributions

The ConvLab team welcomes contributions from the community. Pull requests must have one approving review and no requested changes before they are merged. The ConvLab team reserves the right to reject or revert contributions that we don't think are good additions.

Citing

If you use ConvLab in your research, please cite ConvLab: Multi-Domain End-to-End Dialog System Platform.

@inproceedings{lee2019convlab,
  title={ConvLab: Multi-Domain End-to-End Dialog System Platform},
  author={Lee, Sungjin and Zhu, Qi and Takanobu, Ryuichi and Li, Xiang and Zhang, Yaoqin and Zhang, Zheng and Li, Jinchao and Peng, Baolin and Li, Xiujun and Huang, Minlie and Gao, Jianfeng},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year={2019}
}

convlab's People

Contributors

aatkinson, blpeng1991, cclauss, changukshin, dependabot[bot], jincli, leejayyoon, mathias3, pengbaolin, rxy1212, seungjaeryanlee, sungjinl, temporaer, truthless11, zqwerty, zz-jacob


convlab's Issues

about booking for train

Hi,

I suspect there is a problem in the function add_sys_da (def add_sys_da(self, da_turn):) regarding booking for trains.

MultiWozEvaluator monitors the booked state in add_sys_da, from the line

if da == 'booking-book-ref' and self.cur_domain in ['hotel', 'restaurant', 'train']:

to the line

self.booked['taxi'] = 'booked'

In the line

elif da == 'train-offerbook-ref' or da == 'train-inform-ref':

the condition only handles da == "train-offerbook-ref", not "train-offerbooked-ref".

My understanding is that "train-offerbook" means the system suggests that the user book (not that a booking has been made), while "train-offerbooked" means the system has booked the user's ticket.

Here is why: I logged my system, and these are the messages.

Case1-1) (train-offerbook),
[Syst] tr8464 arrives at 17:43. would you like to book it?
[SYS_DA] train-inform-id-tr8464
[SYS_DA] train-inform-arrive-17:43
[SYS_DA] train-offerbook-none-none

Case1-2) (train-offerbook)
[Syst] i have train tr8842 that will arrive at 07:52. would you like me to book that for you?
[SYS_DA] train-inform-arrive-07:52
[SYS_DA] train-inform-id-tr8842
[SYS_DA] train-offerbook-none-none

Case2-1) (train-offerbooked)
[Syst] i have booked you for the tr8842 train. the total fee is 8.08 gbp payable at the station. your reference number is : r6h3h3h
[SYS_DA] train-offerbooked-id-tr8842
[SYS_DA] train-offerbooked-ticket-8.08 gbp
[SYS_DA] train-inform-ref-r6h3h3h

Case2-2) (train-offerbooked)
[Syst] i have booked 5 tickets for you. the reference number is : zlm0wvqs. is there anything else i can help you with?
[SYS_DA] train-offerbooked-ref-zlm0wvqs
[SYS_DA] train-offerbooked-people-5
[SYS_DA] general-reqmore-none-none

Also, I never saw the "train-offerbook-ref" case in my system.
If the user simulator needs to produce that case, the system would have to say something like:
"Would you like to book that train whose reference number is 01234567"
=> "train-offerbook-ref-01234567"
Q1. Should my model say something like this?

I used the evaluator option (MultiWozEvaluator),
and the user simulator (env) spec in baseline.json is MILU(NLU)-Agenda(UserPolicy)-RuleBased(SysPolicy)-TemplateNLG(NLG).

Q2. Could you add a condition for the "train-offerbooked-ref" case? (If not, could you explain the reason?)

Thank you :)

Is there any method to speed up the training process?

I am trying to train the warm-up DQN; however, it takes about 2 hours to train one model. Is there any way to speed up the training process? I have a server with 96 CPU cores, but it also runs slowly there.

How to measure success rate? function "task_complete" in policy_agenda_multiwoz.py

Hi,

This is the evaluation code (gen_avg_result in convlab/experiment/analysis.py), and I want to know how "success_rate" is measured.

To find out, I guessed that "task_complete" in convlab/modules/policy/user/multiwoz/policy_agenda_multiwoz.py is the core function (in eval mode).

not_sure_vals

task_complete_comment

In lines 286-287 and lines 290-291,
if the slot's value is in NOT_SURE_VALS ('?', 'don't care', 'no', ...), it returns False;
otherwise it returns True.

I think this is too generous a criterion for judging the task a success.

For example, when the slot is a phone number, the sampled expected answer is 000-0000-0000 (or [phone number identifier]).
But as long as my system doesn't say one of the NOT_SURE_VALS, every answer counts as correct.

Did I misunderstand something?
Or can only the 'evaluator mode' (not eval) measure this correctly?

I also checked 'evaluator mode'; there the core function is "task_success",
and its inner function is "inform_F1" (I ignored the "book_rate" function).

To be a success (inform_sess[1] == 1 in the task_success function), 'rec' must be 1,
which means TP must be > 0 and FN must be 0.

So the core function is "_inform_F1_goal" in MultiWozEvaluator.

In lines 179-182, the code just returns True if something is there and False otherwise.
I think the same mechanism as above applies.

Is this really the right way?

maximum limit of submission file in codalab

Hi,

I tried to upload my model code and pretrained weights as a CodaLab test submission.
But my submission.zip file is 1.6 GB.
I uploaded it and waited a while, but it didn't work.

I googled the CodaLab submission file size limit
(codalab/codalab-competitions#1050),
and it seems only about 300 MB is allowed.

Q 1. Is it right to upload my pretrained weight file?
Q 2. If yes, how should I upload the pretrained weight file? (Is the way just to upload my files to a cloud service and download them in my code at runtime?)

Thanks.

Some problems on evaluator(MILU NLU)

Hi, while evaluating our model, I noticed something strange in the evaluator.

  1. Goal: {'restaurant': {'info': {'food': 'italian', 'pricerange': 'moderate', 'area': 'east'}, 'reqt': {'address': '?', 'postcode': '?'}, 'fail_info': {'food': 'italian', 'pricerange': 'expensive', 'area': 'east'}}, 'hotel': {'info': {'type': 'guesthouse', 'parking': 'yes', 'pricerange': 'moderate', 'area': 'north'}, 'reqt': {'stars': '?', 'address': '?'}}, 'taxi': {'info': {'leaveAt': '03:30'}, 'reqt': {'car type': '?', 'phone': '?'}}}

The goal of this episode is to request the address of a moderately priced Italian restaurant in the east. But,
User utterance: I am looking for a expensive restaurant in the center of the city . I would prefer italian food please . I would prefer it in the east area .

The generated user utterance contains two area keywords, center of the city and east area. In this case, our dialog system finds the wrong DB search results.

Goal: {'restaurant': {'info': {'food': 'british', 'pricerange': 'moderate', 'area': 'centre'}, 'book': {'day': 'sunday', 'time': '16:30', 'people': '8'}, 'booked': '?'}, 'hotel': {'info': {'parking': 'yes', 'pricerange': 'moderate'}, 'reqt': {'postcode': '?'}, 'book': {'stay': '2', 'people': '8', 'day': 'sunday'}, 'booked': '?'}}

[2019-08-21 20:38:59,466 PID:46413 NL logger.py nl] User utterance: Actually , can you give me the postcode for that ?
[2019-08-21 20:38:59,466 PID:46413 ACT logger.py act] Inferred user action: Actually , can you give me the postcode for that ?
[2019-08-21 20:38:59,466 PID:46413 STATE logger.py state] Dialog state: Actually , can you give me the postcode for that ?
[2019-08-21 20:39:00,699 PID:46413 ACT logger.py act] System action: the postcode is cb21ab.
[2019-08-21 20:39:00,699 PID:46413 NL logger.py nl] System utterance: the postcode is cb21ab.
[2019-08-21 20:39:00,718 PID:46413 ACT logger.py act] Inferred system action: {'Restaurant-Inform': [['Post', 'cb21ab']]}
[2019-08-21 20:39:00,718 PID:46413 NL logger.py nl] User utterance: And i need a postcode .
[2019-08-21 20:39:00,718 PID:46413 ACT logger.py act] Inferred user action: And i need a postcode .
[2019-08-21 20:39:00,718 PID:46413 STATE logger.py state] Dialog state: And i need a postcode .
[2019-08-21 20:39:03,112 PID:46413 ACT logger.py act] System action: the postcode is cb21ab.
[2019-08-21 20:39:03,112 PID:46413 NL logger.py nl] System utterance: the postcode is cb21ab.
[2019-08-21 20:39:03,125 PID:46413 ACT logger.py act] Inferred system action: {'Restaurant-Inform': [['Post', 'cb21ab']]}
[2019-08-21 20:39:03,125 PID:46413 NL logger.py nl] User utterance: That sounds fine , i just need the postcode though .
[2019-08-21 20:39:03,125 PID:46413 ACT logger.py act] Inferred user action: That sounds fine , i just need the postcode though .
[2019-08-21 20:39:03,125 PID:46413 STATE logger.py state] Dialog state: That sounds fine , i just need the postcode though .
{'food': 'british', 'pricerange': 'moderate', 'name': 'not mentioned', 'area': 'centre'}
[2019-08-21 20:39:04,374 PID:46413 ACT logger.py act] System action: the postcode is cb21ab.
[2019-08-21 20:39:04,374 PID:46413 NL logger.py nl] System utterance: the postcode is cb21ab.
[2019-08-21 20:39:04,389 PID:46413 ACT logger.py act] Inferred system action: {'Restaurant-Inform': [['Post', 'cb21ab']]}
[2019-08-21 20:39:04,389 PID:46413 NL logger.py nl] User utterance: Can i get the postcode for both of them ?
[2019-08-21 20:39:04,390 PID:46413 ACT logger.py act] Inferred user action: Can i get the postcode for both of them ?
[2019-08-21 20:39:04,390 PID:46413 STATE logger.py state] Dialog state: Can i get the postcode for both of them ?

Sometimes the evaluator repeats the same utterance over and over although our system gives a correct response and MILU correctly detects its slot and value. Also, this task fails because the goal contains 'hotel' but the user didn't ask for any information about the hotel. What is the problem?

  2. This episode is related to the police domain.
    Goal: {'police': {'info': {}, 'reqt': {'address': '?', 'phone': '?'}}}

[2019-08-21 20:43:05,742 PID:46413 ACT logger.py act] Inferred user action: Can you give me the phone number please ? Also , can you give me the exact address to the station ?
[2019-08-21 20:43:05,742 PID:46413 STATE logger.py state] Dialog state: Can you give me the phone number please ? Also , can you give me the exact address to the station ?
[2019-08-21 20:43:06,065 PID:46413 ACT logger.py act] System action: the address is parkside, cambridge.
[2019-08-21 20:43:06,065 PID:46413 NL logger.py nl] System utterance: the address is parkside, cambridge.
[2019-08-21 20:43:06,092 PID:46413 ACT logger.py act] Inferred system action: {}

MILU NLU cannot detect any information related to the police and hospital domains. Do you have a plan to address this problem?

  3. Should I handle the case where a user requests information that is not in the DB (the NoOffer, NoBook case)?

thank you.

NLU(MILU) can't understand 'id'(and SpacySplitter can't recognize 'id')

Hi,

I found some weird thing.
Let's see my example.

[User] What 's the train id on that please ?
[Syst] the train id is tr8292 and the price is 16.50 pounds.
................[SYS_DA] train-inform-leave-tr8292
................[SYS_DA] train-inform-ticket-16.50 pounds
[User] Give me the train id , please .
[Syst] the train id is tr8292. would you like me to book it for you?
................[SYS_DA] train-inform-leave-tr8292
................[SYS_DA] train-offerbook-none-none
[User] What 's the train id on that please ?
[Syst] the train id is tr8292. would you like me to book it for you?
................[SYS_DA] train-inform-leave-tr8292
................[SYS_DA] train-offerbook-none-none
[User] What is the train id ?
[Syst] the train id is tr8292. is there anything else i can help you with?
................[SYS_DA] attraction-inform-phone-tr8292
................[SYS_DA] general-reqmore-none-none
[User] Give me the train id , please .
[Syst] the train id is tr8292. is there anything else i can help you with?
................[SYS_DA] attraction-inform-phone-tr8292
................[SYS_DA] general-reqmore-none-none
[User] Could you also give me the train id ?
[Syst] the train id is tr8292. is there anything else i can help you with?
................[SYS_DA] attraction-inform-phone-tr8292
................[SYS_DA] general-reqmore-none-none

The user asks the system for the "train id", and my system says 'the train id is tr8292', but
in the next turn the user (user simulator) still asks for the "train id".
As you can see, the user simulator understood it as 'train-inform-leave-tr8292', and even misunderstood it as 'attraction-inform-phone-tr8292'.

So I suspected this must come from the "NLU" (MILU) part.
And I checked that SpacySplitter can't recognize 'id' (it splits 'id' into 'i', 'd').
This is what I tried (see the spacy screenshot):
it can't recognize 'id'; it can only recognize 'ID'.
I think this is quite a serious issue, because the NLU can't understand 'id'.

Question. Should the designer only use 'ID', not 'id'?

Thank you.

About the tagging of training data

Q1:
We found that in the training data, about 1/4 of the dialog_act annotations lack the "Name" slot and about 1/3 lack the "Ref" slot even though these values actually appear in the raw sentence.
For example the data
"text": "Actually , could I have the phone number of Cote ?"
"dialog_act": {"Restaurant-Request": [["Phone", "?"]]}
lacks the "Name" slot and the data
"text": "That sounds perfect , please book 1 ticket for me , and can I have the reference number ?"
"Train-Inform": [["People", "1"]]}
lacks the "Ref" slot.

Q2:
What is the configuration of GPU and other resources when evaluating?

train booking about the number of people

Hi,

I found something confusing.
This is an example from my system.

[User] I need 5 tickets .
[Syst] i have booked 5 tickets for you. your reference number is : zlm0jqqt. is there anything else i can help you with?
................[SYS_DA] train-offerbooked-ref-zlm0jqqt
................[SYS_DA] train-offerbooked-people-5
................[SYS_DA] general-reqmore-none-none

But in

elif da == 'train-offerbooked-ref' or da == 'train-inform-ref':

there is no case for 'train-offerbooked-people'.

So in my system's evaluation phase, there is no booked information,
like this:
{'attraction': None,
'hospital': None,
'hotel': None,
'police': None,
'restaurant': None,
'taxi': None,
'train': None}

Could you add 'train-offerbooked-people'?

Thanks,

about DQN in Convlab2

Hey guys, are you planning to move DQN from Convlab1 to Convlab2? Since I am a big fan of Convlab2, I really want to know whether DQN will work in Convlab2. I am working on a reward function for RL at the moment, and I need to produce more RL learning curves for my paper.

Here are two reasons why I think it may not work:

  1. The action space (MultiWoz) in Convlab1 is only around 300, and evaluation follows that. But in Convlab2 there is no such small action space; it is very large, up to 8000.
  2. During evaluation, Convlab1 will only consider the 300 actions, but Convlab2 will consider all possible actions.

I am looking forward to your reply. If Convlab2 won't work, I will have to move to Convlab1, which is inconvenient for me since I have already done plenty of work on Convlab2.

The architecture of MILU?

What is the architecture of MILU? I can't find it on the internet; will you release the detailed model architecture in the future?

About the baseline of the dqn policy.

Would you mind giving the baseline result (success rate, etc.) when DQN is used as the dialogue policy?
When I run the original code with:
python run.py demo.json onenet_rule_dqn_template train
the final result is
100 episodes, -57.62 return, 1.00% success rate, 20.82 turns
So I would like to know whether my result is incorrect, or whether I should change some parameters to get better performance.

Thanks.

Undefined names

flake8 testing of https://github.com/ConvLab/ConvLab on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./convlab/agent/algorithm/random.py:36:37: F821 undefined name 'util'
        if body.env.is_venv and not util.in_eval_lab_modes():
                                    ^
./convlab/agent/algorithm/base.py:86:16: F821 undefined name 'action'
        return action
               ^
./convlab/agent/algorithm/base.py:93:16: F821 undefined name 'batch'
        return batch
               ^
./convlab/env/__init__.py:60:65: F821 undefined name 'ENV_DATA_NAMES'
        _reward_v, state_v, done_v = self.aeb_space.init_data_v(ENV_DATA_NAMES)
                                                                ^
./convlab/env/__init__.py:65:69: F821 undefined name 'ENV_DATA_NAMES'
        _reward_space, state_space, done_space = self.aeb_space.add(ENV_DATA_NAMES, (_reward_v, state_v, done_v))
                                                                    ^
./convlab/env/__init__.py:71:64: F821 undefined name 'ENV_DATA_NAMES'
        reward_v, state_v, done_v = self.aeb_space.init_data_v(ENV_DATA_NAMES)
                                                               ^
./convlab/env/__init__.py:79:68: F821 undefined name 'ENV_DATA_NAMES'
        reward_space, state_space, done_space = self.aeb_space.add(ENV_DATA_NAMES, (reward_v, state_v, done_v))
                                                                   ^
./convlab/modules/e2e/multiwoz/Mem2Seq/main_nmt.py:31:26: E999 SyntaxError: positional argument follows keyword argument
                          i==0)
                         ^
./convlab/modules/usr/multiwoz/vhus_usr/usermodule.py:192:30: F821 undefined name 'Attention'
            self.attention = Attention(self.hidden_size)
                             ^
./convlab/modules/nlg/multiwoz/sc_lstm/bleu.py:175:18: F821 undefined name 'score_domain4'
		feat2content = score_domain4(args.res_file)
                 ^
./convlab/modules/nlu/multiwoz/svm/Classifier.py:398:26: F821 undefined name 'utils'
                lines += utils.svm_to_libsvm(self.classifiers[this_tuple].model)
                         ^
./convlab/modules/word_policy/multiwoz/mdrg/utils/dbPointer.py:115:20: F821 undefined name 'flag'
                if flag:
                   ^
1     E999 SyntaxError: positional argument follows keyword argument
11    F821 undefined name 'action'
12

E901,E999,F821,F822,F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues which are merely "style violations" -- useful for readability but they do not affect runtime safety.

  • F821: undefined name name
  • F822: undefined name name in __all__
  • F823: local variable name referenced before assignment
  • E901: SyntaxError or IndentationError
  • E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

NameError: name 'false' is not defined

I ran into a small problem when running the code with the command:
python demo.json onenet_rule_dqn_template train
Then it throws the error below:
Traceback (most recent call last):
File "./convlab/spec/demo.json", line 14, in
"is_user": false
NameError: name 'false' is not defined

In fact I tried all the commands, and they all produce this error.
I have no idea why. Please help me. Thanks.

tensorboardX version cause problems

I finished configuring the ConvLab environment following the instructions, but a problem occurred:

Invalid proto descriptor for file "tensorboard/compat/proto/resource_handle.proto":
tensorboard.ResourceHandleProto.device: "tensorboard.ResourceHandleProto.device" is already defined in file "tensorboardX/src/resource_handle.proto".
tensorboard.ResourceHandleProto.container: "tensorboard.ResourceHandleProto.container" is already defined in file "tensorboardX/src/resource_handle.proto".
tensorboard.ResourceHandleProto.name: "tensorboard.ResourceHandleProto.name" is already defined in file "tensorboardX/src/resource_handle.proto".
tensorboard.ResourceHandleProto.hash_code: "tensorboard.ResourceHandleProto.hash_code" is already defined in file "tensorboardX/src/resource_handle.proto".
tensorboard.ResourceHandleProto.maybe_type_name: "tensorboard.ResourceHandleProto.maybe_type_name" is already defined in file "tensorboardX/src/resource_handle.proto".
tensorboard.ResourceHandleProto: "tensorboard.ResourceHandleProto" is already defined in file "tensorboardX/src/resource_handle.proto".

Then I uninstalled tensorboardX 1.2.0 and installed a higher version, 1.8.0. That works well.

But then a new problem occurred:

ERROR: allennlp 0.8.2 has requirement pytz==2017.3, but you'll have pytz 2019.1 which is incompatible.
ERROR: allennlp 0.8.2 has requirement tensorboardX==1.2, but you'll have tensorboardx 1.8 which is incompatible.

allennlp is also required in requirements.txt.

Some Questions on ConvLab-Module-NLG-SCLSTM

Hi! Thank you for sharing such a good toolkit!
Recently, when I looked into the NLG module, I found some confusing parts:

  1. There are several evaluation scripts in the ConvLab NLG module, including:
  • the evaluate.py in ConvLab/convlab/modules/nlg/multiwoz/evaluate.py


  • the bleu.py in the ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/bleu.py


I don't know which code is the right evaluation code. They all have different methods for calculating BLEU Score.

Also, in evaluate.py, lines 52-57, I think the way refs are constructed may have some problems; it makes the corpus-level BLEU score on test.json higher. I got a corpus-level BLEU score of 0.49, while in another paper the corpus-level BLEU score is 0.18.


Looking forward to your reply. Thanks!

problem of baseline

When I run the command python run.py baseline.json onenet_rule_dqn_template train, it displays the following information in the dataframe (see screenshot).

Why is total_reward always -59? I barely modified your spec. Could you tell me how to adjust it so that this baseline runs normally?
The spec is as follows:
"onenet_rule_dqn_template": {
"agent": [{
"name": "DialogAgent",
"nlu": {
"name": "OneNetLU",
"model_file": "https://convlab.blob.core.windows.net/models/onenet.tar.gz"
},
"dst": {
"name": "RuleDST"
},
"nlg": {
"name": "MultiwozTemplateNLG",
"is_user": false
},
"state_encoder": {
"name": "MultiWozStateEncoder"
},
"action_decoder": {
"name": "MultiWozVocabActionDecoder"
},
"algorithm": {
"name": "DQN",
"action_pdtype": "Argmax",
"action_policy": "epsilon_greedy",
"explore_var_spec": {
"name": "linear_decay",
"start_val": 0.1,
"end_val": 0.01,
"start_step": 1000,
"end_step": 500000
},
"gamma": 0.9,
"training_batch_iter": 1000,
"training_iter": 1,
"training_frequency": 100,
"training_start_step": 32,
"normalize_state": false
},
"memory": {
"name": "Replay",
"batch_size": 16,
"max_size": 50000,
"use_cer": false
},
"net": {
"type": "MLPNet",
"hid_layers": [100],
"hid_layers_activation": "relu",
"clip_grad_val": null,
"loss_spec": {
"name": "MSELoss"
},
"optim_spec": {
"name": "Adam",
"lr": 0.00001
},
"lr_scheduler_spec": {
"name": "StepLR",
"step_size": 1000,
"gamma": 0.999,
},
"update_type": "replace",
"update_frequency": 300,
"polyak_coef": 0,
"gpu": true
}
}],
"env": [{
"name": "multiwoz",
"action_dim": 300,
"observation_dim": 392,
"max_t": 40,
"max_frame": 500000,
"nlu": {
"name": "MILU",
"model_file": "https://convlab.blob.core.windows.net/models/milu.tar.gz"
},
"user_policy": {
"name": "UserPolicyAgendaMultiWoz"
},
"sys_policy": {
"name": "RuleBasedMultiwozBot"
},
"nlg": {
"name": "MultiwozTemplateNLG",
"is_user": true
},
"evaluator": {
"name": "MultiWozEvaluator"
}
}],
"meta": {
"distributed": false,
"num_eval": 100,
"eval_frequency": 1000,
"max_tick_unit": "total_t",
"max_trial": 1,
"max_session": 5,
"resources": {
"num_cpus": 0,
"num_gpus": 1
}
}
},

Question: Adding new domain

Hi there,

I'm about to use ConvLab on a new domain. Is there a guide or similar describing what to do in order to add a new domain to ConvLab? ConvLab is quite comprehensive and I don't want to miss a place where I have to add code when adding a new domain.
Currently I want to focus on the evaluation, i.e. I have to implement a user simulator for my domain and reimplement the metrics. What is the best way to build a new user simulator?

Thank you very much and kind regards,
Stefan

Typo

Typo on the README page describing how to install ConvLab:
connda --> conda

Which Components Should We Develop for DSTC-8 track 1 and How Is a Submission Evaluated?

I do not know whether this is the right place to ask such a question. Sorry if I got the wrong place...

We can make so many changes in a system spec.
For DSTC 8 track 1, I think we should develop models for agent/algorithm, agent/dst, and agent/nlg. Is that right? Or should we make every change possible to achieve a better score?

I also want to know how a submitted system is evaluated, because env is not always the same (nlu does not exist in baseline.json/onenet_rule_dqn_template). Which env should we assume for evaluation?

Thank you.

I found mysterious codes in env/multiwoz.py

Hi,
I found some mysterious code in env/multiwoz.py.

"reset" function in class MultiWozEnvironment,

self.history.extend(["null", f'{user_response}'])

self.history is extended with f"{user_response}",
so the result is ['null', '$REAL_USER_RESPONSE'].

But as the episode progresses, in the "step" function of class MultiWozEnvironment,

self.history.extend([f'sys_response', f'user_response'])

self.history is extended with the literal strings [f'sys_response', f'user_response'],
so the result is ['null', '$REAL_USER_RESPONSE', 'sys_response', 'user_response', ...].

I guess the code's original intention was [f"{sys_response}", f"{user_response}"],
so the expected result would be ['null', '$REAL_USER_RESPONSE', '$REAL_SYS_RESPONSE', '$REAL_USER_RESPONSE'].

This code affects MILU's 'parse' function, so it might be important.

Please check this part and let us know.
Thank you!

Bugs in the user simulator

Hi, while training ConvLab I've discovered a couple of bugs in the user simulator:

  1. Sometimes the check_constraint function in policy_agenda_multiwoz.py (line 254) is passed the string "soonest" for val_usr. The function expects times in the form XX:YY, cannot resolve "soonest", and therefore throws an error.

  2. The function get_request_state in multiwoz_state_encoder.py (line 69) throws an error when it tries to get the index of the pricerange slot in REF_USR_DA[Attraction]. This slot does not exist in the Attraction domain of REF_USR_DA.

Can you implement a fix?

Thanks!
Gabriel

Issue with the Demo Policies

Do you have any successful results training policies other than WDQN? I managed to train WDQN as advised in #33; however, when trying to train any other policies, PPO for example, I am unable to get any success.

Have you had any success with training these policies? If so could you give me some tips on what worked for you?

Thanks for your help.

About time(arriveBy, leaveAt) of train case

Hello!

I found some strange behavior when running the user simulator.
I used "milu-agenda-template".

Let's see my example.

-----Goal-----
{
    "train": {
        "info": {
            "destination": "cambridge",
            "day": "thursday",
            "arriveBy": "17:15",
            "departure": "cambridge"
        },
        "reqt": {
            "leaveAt": "?",
            "price": "?"
        },
        "book": {
            "people": "5"
        },
        "booked": "?"
    },
    "restaurant": {
        "info": {
            "pricerange": "cheap"
        },
        "reqt": {
            "address": "?"
        }
    }
}
-----Goal-----

[User] I need to travel on thursday .
[Syst] Where are you departing from and going to?
[User] Yes I would like to go to cambridge please . Great I also need a train departs from cambridge .
[Syst] What time would you like to leave?
***[User] I need it to arrive by 17:15 .***
[Syst] TR4526 arrives at 17:09. Would you like me to book it for you?
\ \ \ \ Inferred system action : {'Train-Inform': [['Arrive', '17:09'], ['Id', 'TR4526']], 'Train-OfferBook': [['none', 'none']]} (This is from logger.act('Inferred system action : {self.get_sys_act()}'))
***[User] I need it to arrive by 17:15 .***

As you can see, the user simulator repeated the same utterance,
even though MILU understood 17:09 as the arrival time (see the inferred system action).
I traced the steps of how the user simulator (MILU-PolicyAgenda) works.

[policy.predict]

action, session_over, reward = self.policy.predict(None, sys_act)

[agenda.update]

self.agenda.update(sys_action, self.goal)

[update_domain] diaact='train-inform', slot_vals=[['arriveBy','17:09'],['trainID','TR4526']]

if self.update_domain(diaact, slot_vals, goal):

I think this line does not cover our case.

elif len(g_fail_info) <= 0 and slot in g_info and value != g_info[slot]:

[if condition part] is,
slot : 'arriveBy', value : '17:09'
len(g_fail_info) <= 0 : True
slot in g_info : True
value != g_info[slot] : True (17:09 v.s. 17:15)

This means the simulator only accepts "17:15", not anything earlier than "17:15".
(Normally, when the train arrives earlier than expected, that is tolerated; conversely,
when the train departs later than expected, that is tolerated,
and the query function (dbquery.py) and _book_rate_goal also follow this rule.)
Also, there is no train satisfying 17:15 with the previous constraints (arrive cambridge, depart cambridge), so my model can only recommend the earlier but closest train (from the train database).

If this closest-time recommendation is not tolerated, my model should say there is no train matching the requirements. (But there is still a problem, because the time slot is in "info", not "book" or "fail_book".)

Do you have any thoughts on this situation?

I captured my case so it is easy to see (screenshot).
The process after this is that the agenda pushes the unsatisfied slot-value ('arriveBy', '17:15') and the user utterance is repeated.

Thank you for reading my messy issue!

Bug in MILU NLU training/dataset reader

Hello,

We think we found some quite serious issues in
ConvLab/convlab/modules/nlu/multiwoz/milu/dataset_reader.py, when training individual NLUs for agent and user.

  1. Both lines 77 and 88 use variable i to enumerate over data (turns in the dialogue, and tokens in a turn, respectively). The scope of the loop in line 88 effectively overrides the variable used in line 77.

  2. As an effect of this(?), lines 122 and 124 seem to have their tests for user vs. agent turns switched, and call the continue statement on the wrong turns:
    if self._agent and self._agent == "user" and i % 2 != 1:
    continue
    will continue on turns that are even (0, 2 , 4), even though those turns (in the data) are the user turns. Vice-versa for agent.

  3. As an effect of (1), it seems like the i referred to in lines 122 and 124 are actually equal to the number of tokens of the current turn, instead of the ID/counter of the current turn. (Thus it is basically random...)

These issues effectively render training user- and agent-specific models using the system.jsonnet /user.jsonnet configurations useless, as the training is not done with the proper system or user turns.

  4. Regardless of the above, it seems strange to have the turn-number check at the end of the turn loop instead of the beginning. Right now, a whole lot of processing is carried out before it is simply "thrown away" due to the continue statements toward the end.

Please let us know if our observations here are wrong/if we have overlooked anything, but it seems like at the moment, agent- and user-specific MILU training is broken.

Cheers,
Philip

about function _check_value

Hi,

I checked my system's output, and there are some ambiguous issues.
Below are my system's messages.

[User] Are there any 4 stars available ? It should be a guesthouse type hotel. I ' m looking for a place to stay in the centre part of town .
[Syst] i have two guesthouses in the centre of town. one is cheap and the other is moderately priced. do you have a preference?
[SYS_DA] hotel-inform-price-moderately priced
[SYS_DA] hotel-inform-choice-two
[SYS_DA] hotel-inform-price-cheap
[SYS_DA] hotel-inform-area-centre of town
[SYS_DA] hotel-inform-type-guesthouses
[SYS_DA] hotel-inform-choice-other
[SYS_DA] hotel-select-none-none

[SYS_DA] is printed in the function "add_sys_da" in evaluator/multiwoz.py, and this is what the user simulator's NLU (in my case, MILU) understood.

MILU can't normalize "moderately priced" to "moderate" for the price slot, nor "centre of town" to "centre" for the area slot.
The problem arises when inform_F1 and _check_value are invoked:

def _check_value(self, key, value):

_check_value returns False, because it only allows "moderate" and "centre" (not "moderately priced" and "centre of town").

I think these values are actually correct. Could you allow these cases?

A code suggestion:

elif key == "pricerange":
    for candidate in ["cheap", "expensive", "moderate", "free"]:
        if candidate in value:
            return True
    return False

The "area" case could be changed in the same way.

Thank you.

AttributeError: 'str' object has no attribute 'items'

Hi, I cloned the code and got the following error when executing

python run.py demo.json onenet_rule_rule_template eval

I ran it with Python 3.6.5; can anyone help?

Traceback (most recent call last):
  File "run.py", line 102, in <module>
    main()
  File "run.py", line 91, in main
    read_spec_and_run(*args)
  File "run.py", line 68, in read_spec_and_run
    run_spec(spec, lab_mode)
  File "run.py", line 49, in run_spec
    Session(spec).run()
  File "/home/super/DSTC8/ConvLab/convlab/experiment/control.py", line 142, in run
    self.run_eval()
  File "/home/super/DSTC8/ConvLab/convlab/experiment/control.py", line 95, in run_eval
    avg_return, avg_len, avg_success, avg_p, avg_r, avg_f1, avg_book_rate = analysis.gen_avg_result(self.agent, self.eval_env, self.num_eval) 
  File "/home/super/DSTC8/ConvLab/convlab/experiment/analysis.py", line 68, in gen_avg_result
    returns.append(gen_result(agent, env))
  File "/home/super/DSTC8/ConvLab/convlab/experiment/analysis.py", line 59, in gen_result
    _return = gen_return(agent, env)
  File "/home/super/DSTC8/ConvLab/convlab/experiment/analysis.py", line 35, in gen_return
    next_obs, reward, done, info = env.step(action)
  File "/home/super/DSTC8/ConvLab/convlab/env/multiwoz.py", line 242, in step
    env_info_dict = self.u_env.step(action)
  File "/home/super/DSTC8/ConvLab/convlab/env/multiwoz.py", line 110, in step
    action = da_normalize(action, role='sys')
  File "/home/super/DSTC8/ConvLab/convlab/modules/util/multiwoz/da_normalize.py", line 71, in da_normalize
    for act, svs in das.items():
AttributeError: 'str' object has no attribute 'items'

Sc-lstm training is not working

Dear,
I tried to train the sc-lstm module and it didn't work, please find the log below.
What could be the error?

mrc@marco-instance-0:~/ConvLab/convlab/modules/nlg/multiwoz/sc_lstm$ PYTHONPATH=../../../../.. python3 run_woz.py  --mode=train --model_path=sclstm.pt --n_layer=1 --lr=0.005 > sclstm.log
Processing data...
# of turns
Train: 26414
Valid: 3171
Test: 3154
# of batches: Train 103 Valid 12 Test 12
Using deep version with 1 layer
Start training
Epoch 0 (n_layer 1)
Traceback (most recent call last):
  File "run_woz.py", line 382, in <module>
    train(config, args)
  File "run_woz.py", line 226, in train
    train_epoch(config, dataset, model)
  File "run_woz.py", line 144, in train_epoch
    _ = model(input_var, dataset, feats_var)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mrc/ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/model/lm_deep.py", line 50, in forward
    gen=gen, sample_size=sample_size)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mrc/ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/model/layers/decoder_deep.py", line 155, in forward
    output, last_hidden, last_cell, last_dt = self.rnn_step(vocab_t, last_hidden, last_cell, last_dt, gen=gen)
  File "/home/mrc/ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/model/layers/decoder_deep.py", line 111, in rnn_step
    _hidden, _cell, _dt = self._step(input_t, last_hidden, last_cell[i], last_dt[i], i)
  File "/home/mrc/ConvLab/convlab/modules/nlg/multiwoz/sc_lstm/model/layers/decoder_deep.py", line 69, in _step
    w2h = self.w2h[layer_idx](input_t) # (batch_size, hidden_size*4)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm

fix some installation errors

I had to add these 2 commands to make it work with the first method (installing using conda):

sudo apt-get install build-essential
otherwise I get an error when installing from requirements.txt,
and
python3 -m nltk.downloader all
to fix the nltk data path when running run.py.

Evaluation configuration and Pretrained MDBT

  1. Evaluation
    For automatic end2end evaluation, it is stated that the submitted system will be evaluated using the current user simulator in the repository with the milu-rule-template configuration. Can you describe the exact configuration in the convlab/spec folder?

  2. mdbt_rule_template configuration
    For the mdbt_rule_template configuration, can we download the pretrained MDBT model? This is because currently we can't train any components except policy models.

python run.py demo.json mdbt_rule_template eval

FileNotFoundError: [Errno 2] No such file or directory: 'data/mdbt/word-vectors/paragram_300_sl999.txt'

  3. onenet_rule_mdrg
    For this configuration, I faced the following errors.

Response generation error
IndexError: list index out of range

Do you have any idea for this?

Thank you

About the evaluation

We found some bugs in convlab/evaluator/multiwoz.py. Is the code used in the automatic evaluation the same as convlab/evaluator/multiwoz.py? If the code is different, did you fix bugs such as the one in issue 79?

I had questions about database and evaluation.

Thank you for organizing and managing this challenge!
@sungjinl @truthless11

I had 4 questions.

  1. in the "baseline.json", the user simulator side("env") of 4 examples has same components;
    NLU(MILU) - UserPolicy(AgendaMultiWoz) - SysPolicy(RuleBased) - NLG(MultiwozTemplateNLG)
    Will you fix those setting in evaluation session? If so, participants can believe their automatic score right (with evaluator)?

  2. As I asked last time, filling a slot with any string value except "?" (question mark) makes that slot count as a success when the "task_success" function is called.
    But for natural conversations between system and user, the system should at least say a value of the correct slot type; for example, "11:00" for a [time] slot, not "Hilton hotel", and "0123420402" for a [phone] slot, not "Hilton hotel".
    Will you check the slot type for each slot? (As far as I can tell, the code doesn't check this at the moment.)

  3. In the conversation simulation (user-system), the user describes their situation and constrains the range, like "domain hotel, price range: cheap, area: east". Normally the system should find a hotel satisfying those constraints, which means the system should query the MultiWOZ database.
    My question is: does this challenge encourage us to query those DBs ("hotel.db", "restaurant.db", ...)? (i.e., the system should fill in values that satisfy the constraints according to the DB.)
    Or does the evaluator discard the constraints and just check the slot type, or even discard the slot type?

And I think if we can rely on the database, there is no confusion about place names, because the evaluator can grade against values that are in the database: no synonyms or typos allowed, only values present in the DB.

  4. Subsequently, in the human evaluation phase, will crowd-workers check that slot values satisfy the constraints (against the DB), or just check "appropriateness"?
    Thank you for reading this issue!

Plot the learning curve.

Excuse me, how do I plot the graph mentioned in analysis.py?
I checked the training output file, and the graph file is empty.

About human evaluation and Amazon Mechanical Turk

Dear ConvLab Organizers,

Thank you for organizing ConvLab!
I participated in Track 1 Task1 End-to-End Multi-Domain Dialog Challenge in DSTC8.
I want to redo the “human evaluation process” in ConvLab and get human evaluation score using Amazon Mechanical Turk.
I also found the “convlab.human_eval” in convlab official github page.

But I’m not sure the code in “convlab.human_eval” can be used with the AMT system as-is (I guess it is hard to use directly for my case), and I don’t have any experience with the AMT system.

If I can, I hope to upload the human evaluation code in AMT and evaluate my model.

I have 5 questions.

  1. Can I use the human_eval code from the ConvLab GitHub page?

  2. Can the code in “convlab.human_eval” (as published in ConvLab) be used in the AMT system as-is?

  3. Can you give a brief introduction to using the ConvLab human evaluation code? If it is complicated, even a small hint would help me a lot.

  4. By the way, can you explain how each crowd-worker works? (#81 might be related.)
    I would like to know the exact evaluation procedure.
    For example, does each worker interact with one model and then mark it as follows:

a. task success, fail
(Option 1: He/she doesn’t check whether the booked information matches the goal’s informable slots and doesn’t check requestable slots; he/she only checks that the flow of the conversation is natural.
Option 2: He/she “strictly” checks that the booked information fits the goal’s informable slots and checks the requestable slots.
Option 3: All the criteria are up to each crowd-worker.)

b. language understanding score on the 5 point scale,
c. response appropriateness score on the 5 point scale.
(This is what I guessed.)

  5. A minor question: how do you encourage the AMT crowd-workers to work faithfully and prevent unfaithful participation?
    For example, the base reward is small, but when they accomplish something, a success reward is granted.

Thank you for reading.

Sincerely yours,
Jeong-gwan Lee

About the human evaluation

Hello, I have two questions about the human evaluation.

  1. In the final human evaluation, you use task success rate to rank each submission. Issue #80 says that the code for computing task success rate may be different for each submission. Did you fix this for the human evaluation?
  2. The task proposal says that task success rate, dialog length, irrelevant turn rate, redundant turn rate, and user satisfaction score will be considered in the crowdworker-based evaluation, but in the end you only use task success rate.

Thank you.

about automatic evaluation

Hello, I am a participant of DSTC8 track 1 task 1, and today the results of the automatic evaluation were released.
I have some questions about the evaluator used in the automatic evaluation.
While we were evaluating our model, we found a strange case.
[screenshot, 2019-10-15 3:32 PM]
As shown in the image, the MILU NLU always detects the 'parking' and 'internet' slot values as 'none', even if we give correct responses.
But in the evaluator,
image
the _check_value function returns False when the value is 'none'.
The problem is that there are many cases like this. If we allow the 'none' value to pass the _check_value function, our model's success rate increases by almost 10%.
Could you please check this issue?

No this file models/300/model.tar.gz

When I run .convlab\modules\policy\system\multiwoz\vanilla_mle\policy.py,
I found that the models/300/model.tar.gz mentioned in the code does not exist.
How can I get this file?

Where should be the right place I set the seed?

Hello guys, recently I have been trying to set a random seed so that my results are reproducible. I want to set the seed for rule_wdqn or rule_ppo policy training. Is there any clue as to where I should set the seed?

No file in data/glove?

Hi!
I am training the Sequicity model (one of the e2e models).
I downloaded the given data.

But when I run: python model.py -mode train -model tsdf-multiwoz
there is a FileNotFoundError: [Errno 2] No such file or directory: '/convlab/ConvLab/convlab/modules/e2e/multiwoz/Sequicity/data/glove/glove.6B.50d.txt'
So, should there be a txt file in that folder?
Thank you!

About evaluation

Hi,
I have 3 questions.
In the human evaluation phase, what are the criteria for evaluating a participant's model?
I have already read the task proposal PDF (http://workshop.colips.org/dstc7/dstc8/DTSC8_multidomain_task_proposal.pdf).
For task 1, "Crowdworker-based evaluation" is given.
For task 2 (the meta-learning task), they wrote "Human annotators will be asked to judge the appropriateness, informativeness and utility of the responses".

Q1) Could you explain the human evaluation for task 1 in more detail? (Will the same measurements as in task 2 (appropriateness, informativeness, and utility of the responses) be applied?)

Q2) I also wonder how crowdworkers will work in practice. One possibility is that a crowdworker interacts with the participant's model (playing the role of the user) and judges it (appropriateness, informativeness, and utility of the responses). The other is that a conversation with the user simulator is given, and crowdworkers read that conversation and judge it, since the automatic measurement and the user simulator are not perfect (the NLU part sometimes fails to understand the model's intent, so crowdworkers would revise it).

Q3) What kind of automatic metrics will be used? When I run with "evaluator" mode, I get return (average reward), success rate, average turns, P (precision), R (recall), and F1 score (6 metrics).
But previously I saw only three measurements (average reward, success rate, average turns) on the CodaLab DSTC8 page (https://competitions.codalab.org/competitions/20162#learn_the_details-overview); that explanation has now disappeared, which is why I ask. Will the participant's model be evaluated using those 6 metrics, right?

Thank you!

NLU(MILU) can't recognize "free internet"?

Hi,

I found that MILU can't recognize "internet is available or not".

Let's see my example.

ex 1)
[User] I am looking for a particular hotel . its name is called kirkwood house.
[Syst] the kirkwood house is a guesthouse in the north area. it is moderately priced and has 4 stars. would you like me to book it for you?
[SYS_DA] hotel-inform-stars-4
[SYS_DA] hotel-inform-area-north
[SYS_DA] hotel-inform-price-moderately priced
[SYS_DA] hotel-inform-name-the kirkwood house
[SYS_DA] hotel-inform-type-guesthouse
[SYS_DA] booking-inform-none-none
[User] Does it have internet ?
[Syst] yes it has internet.
[SYS_DA] hotel-inform-internet-none

ex 2)
[User] Does it have internet ? I do n't need it booked but what s the address ? What type of hotel is it ?
[Syst] dojo noodle bar is a guesthouse. the address is 40210 millers yard city centre.
[SYS_DA] restaurant-inform-name-dojo noodle bar
[SYS_DA] restaurant-inform-addr-40210 millers yard city centre
[SYS_DA] hotel-inform-type-guesthouse
[User] Can you tell me if the hotel has internet available ? Could i get their address ?
[Syst] yes, it has internet. the address is 40210 millers yard, chesterton.
[SYS_DA] hotel-inform-addr-chesterton
[SYS_DA] hotel-inform-addr-40210 millers yard
[User] No , i just need to know if they have free internet .
[Syst] they do have free internet.
[SYS_DA] hotel-inform-internet-none

ex 3)
[User] I need a place to stay in the don't care please .
[Syst] i have many options available. is there a price range you prefer?
[SYS_DA] hotel-inform-choice-many
[SYS_DA] hotel-request-price-?
[User] How many stars ? No , i just need to know if they have free internet . I just need the postcode .
[Syst] the acorn guest house has internet. the postcode is cb41da.
[SYS_DA] hotel-inform-post-cb41da
[SYS_DA] hotel-inform-name-the acorn guest house
[SYS_DA] hotel-inform-internet-none

I also manually tested whether MILU can parse "free internet" sentences.
I made a wifi sentence list like this:
wifi_list = ["this hotel has free wifi", "it has free wifi", "it has free internet", "it includes internet",
"they has free internet", "it serve free internet", "hotel has free wifi", "this hotel has free internet"]
and only the case "this hotel has free wifi" was parsed as {'Hotel-Inform': [['Internet', 'yes']]}.

To back up my experiment, I captured my result (screenshot: internet_capture).

Can't we overcome this limitation of MILU?

Thank you.

MultiWoz action decoder KeyError: 'Ref'

python run.py milu_rule_dqn_template train

[2019-09-10 22:23:58,059 PID:6203 INFO init.py log_summary] Trial 0 session 0 milu_rule_dqn_template_t0_s0 [train_df] epi: 124 t: 20 wall_t: 520 opt_step: 120000 frame: 2499 fps: 4.80577 total_reward: -59 avg_return: -58.78 avg_len: nan avg_success: nan loss: 0.0269472 lr: 0.001 explore_var: 0.0997296 entropy_coef: nan entropy: nan grad_norm: nan
Process Process-2:
Traceback (most recent call last):
File "/home/ddpghgg/anaconda3/envs/convlab/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ddpghgg/anaconda3/envs/convlab/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ddpghgg/ConvLab/convlab/experiment/control.py", line 32, in mp_run_session
metrics = session.run()
File "/home/ddpghgg/ConvLab/convlab/experiment/control.py", line 145, in run
self.run_rl()
File "/home/ddpghgg/ConvLab/convlab/experiment/control.py", line 128, in run_rl
action = self.agent.act(obs)
File "/home/ddpghgg/ConvLab/convlab/agent/init.py", line 149, in act
output_act, decoded_action = self.action_decode(action, self.body.state)
File "/home/ddpghgg/ConvLab/convlab/agent/init.py", line 187, in action_decode
output_act = self.action_decoder.decode(action, state) if self.action_decoder else action
File "/home/ddpghgg/ConvLab/convlab/modules/action_decoder/multiwoz/multiwoz_vocab_action_decoder.py", line 97, in decode
action[act] = [["Ref", kb_result[0]["Ref"]]]
KeyError: 'Ref'

Hi, I was just following the demo code but got a KeyError in multiwoz_vocab_action_decoder.py where it retrieves the results from the DB query. It seems that the DB query result doesn't have a "Ref" key in it.
