google / airdialogue_model



License: Apache License 2.0

Python 94.52% Shell 5.48%

airdialogue_model's People


josephch405 w31w31 gaoyiyeah zpeng1989 leftice hmjianggatech neotim entn-at sciai-ai isabella232 ghas-results

airdialogue_model's Issues

Unexpected Error When Preprocessing

After running bash scripts/preprocess.sh -p train, I got the following error:

mode = airdialogue
word cutoff = 10
nltk data path = ./data/nltk
data path = ./data/airdialogue
partition = train
json path = ./data/airdialogue/json
tokenized path = ./data/airdialogue/tokenized
tokenizing train data...
2020-05-18 20:39:43.271598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
loading data: 642868it [04:13, 2532.79it/s]
process kb: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 321459/321459 [01:57<00:00, 2728.20it/s]
process raw data: 0%| | 0/321459 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/airdialogue", line 4, in <module>
    __import__('pkg_resources').run_script('airdialogue==0.1', 'airdialogue')
  File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1471, in run_script
    exec(script_code, namespace, namespace)
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/EGG-INFO/scripts/airdialogue", line 33, in <module>
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 276, in main
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 162, in load_data_from_jsons
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 359, in process_main_data
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 286, in get_dialogue_boundary
AssertionError: start token appeared twice: Hello Hello How may I help you ? Can you help me to change my recent reservation because my trip dates are got postponed ? I will help you with that please share your name to proceed further ? Edward hall here . Please wait for a while . Sure , take your own time . There is no active reservation found under your name to amend it . That 's ok , thank you for checking . Thank you for choosing us .

Is this caused by Python 3?

Multiple bugs for evaluating selfplay

In the README, Section 5 (Scoring):

airdialogue score --pred_data ./data/out_dir/dev_selfplay_out.txt \
                  --true_data ./data/airdialogue/tokenized/dev.selfplay.eval.data \
                  --true_kb ./data/airdialogue/tokenized/dev.selfplay.eval.kb \
                  --task selfplay \
                  --output ./data/out_dir/dev_selfplay.json

It passes the tokenized files as --true_data and --true_kb.
However, according to
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L240-L256
the evaluator actually expects JSON files.
Maybe change them to:

                  --true_data ./data/airdialogue/json/dev_data.json \
                  --true_kb ./data/airdialogue/json/dev_kb.json \

After fixing the previous bug, another one appears:

https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L247

It processes pred_json_obj['action'] with action_obj_to_str. However, this conversion has already been done when dev_selfplay_out.txt was generated.

Maybe the action_obj_to_str call should be removed?
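Instead of removing the call outright, a guard could make the conversion safe to apply twice. The sketch below assumes, based on the report above, that the action read from dev_selfplay_out.txt is already a string while a raw JSON action is a dict; action_as_str is a hypothetical helper, and action_obj_to_str is passed in as a one-argument callable so the sketch does not assume its real signature.

```python
from typing import Any, Callable


def action_as_str(action: Any, action_obj_to_str: Callable[[Any], str]) -> str:
    """Return the action as a string, converting it only if it is not one already.

    Assumption: self-play output stores the action as an already-converted
    string, whereas raw JSON stores it as an object (e.g. a dict).
    """
    if isinstance(action, str):
        # Already converted when the self-play output file was written.
        return action
    return action_obj_to_str(action)
```

This keeps the evaluator usable for both already-converted predictions and raw JSON actions, at the cost of relying on the type to tell them apart.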

After that, another one appears:
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L252

pred_json_obj is not compatible with json_obj_to_tokens: pred_json_obj does not have a dialogue key; instead, it has a key called utterance.

I can get the program to run by replacing that line with

pred_raw_text = pred_json_obj['utterance'].replace('<t1> ','').replace('<t2> ','').split(' ')

However, I think that may not be the optimal solution.
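A slightly more robust version of the same workaround splits on arbitrary whitespace and drops the speaker tags as whole tokens. With split(' '), doubled or trailing spaces produce empty tokens, and a tag not followed by a space (for example at the end of the string) survives the replace calls. This is a sketch: utterance_to_tokens and SPEAKER_TAGS are hypothetical names, and whether the result matches the exact token format json_obj_to_tokens produces has not been checked.

```python
from typing import List

# Speaker markers observed in the self-play 'utterance' field.
SPEAKER_TAGS = ("<t1>", "<t2>")


def utterance_to_tokens(utterance: str) -> List[str]:
    """Split a self-play utterance into tokens, dropping standalone speaker tags.

    str.split() with no argument splits on any run of whitespace, so no empty
    tokens are produced for doubled or trailing spaces.
    """
    return [tok for tok in utterance.split() if tok not in SPEAKER_TAGS]


# Usage: pred_raw_text = utterance_to_tokens(pred_json_obj['utterance'])
```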

Inconsistent BatchSize

airdialogue_model/self_play.py

Lines 132 to 144 in 5bdc799

for i in tqdm(list(range(0, ceil))):
  start_ind = i * batch_size
  end_ind = min(i * batch_size + batch_size, len(selfplay_data))

  batch_data = selfplay_data[start_ind:end_ind]
  batch_kb = selfplay_kb[start_ind:end_ind]
  # we indicaet to let agent1 to talk first. Keep in mind that we will
  # swap between agent1 and agent2.
  speaker = flip % 2
  generated_data, _, summary = dialogue.talk(hparams.max_dialogue_len,
                                             batch_data, batch_kb, agent1,
                                             agent2, worker_step,
                                             batch_size, speaker)

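In line 144, batch_size should be replaced with end_ind - start_ind. Otherwise, whenever len(selfplay_data) is not a multiple of batch_size, the batch size passed to dialogue.talk in the last iteration will not match the actual number of examples in batch_data.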
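The fix can be illustrated with a small batching helper that yields the real size of each batch alongside its bounds. This is a minimal sketch, not the repository's code: iter_batches is a hypothetical name, and computing the number of batches with math.ceil is an assumption about how ceil is derived in self_play.py.

```python
import math
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_batches(
    data: Sequence[T], batch_size: int
) -> Iterator[Tuple[int, int, Sequence[T]]]:
    """Yield (start_ind, end_ind, batch); the true batch size is end_ind - start_ind.

    The final batch is smaller whenever len(data) is not a multiple of
    batch_size, so downstream code should use end_ind - start_ind (or
    len(batch)) instead of the nominal batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    num_batches = math.ceil(len(data) / batch_size)
    for i in range(num_batches):
        start_ind = i * batch_size
        end_ind = min(start_ind + batch_size, len(data))
        yield start_ind, end_ind, data[start_ind:end_ind]


# With 10 examples and batch_size=4, the actual sizes are 4, 4 and 2.
for start_ind, end_ind, batch in iter_batches(list(range(10)), 4):
    print(end_ind - start_ind)
```

In the loop above, the corresponding change would be to pass end_ind - start_ind as the second-to-last argument of dialogue.talk.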
