
airdialogue_model's People

Contributors

josephch405, w31w31


airdialogue_model's Issues

Unexpected Error When Preprocessing

After I ran bash scripts/preprocess.sh -p train, I got the following error:

mode = airdialogue
word cutoff = 10
nltk data path = ./data/nltk
data path = ./data/airdialogue
partition = train
json path = ./data/airdialogue/json
tokenized path = ./data/airdialogue/tokenized
tokenizing train data...
2020-05-18 20:39:43.271598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
loading data: 642868it [04:13, 2532.79it/s]
process kb: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 321459/321459 [01:57<00:00, 2728.20it/s]
process raw data: 0%| | 0/321459 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/airdialogue", line 4, in <module>
    __import__('pkg_resources').run_script('airdialogue==0.1', 'airdialogue')
  File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1471, in run_script
    exec(script_code, namespace, namespace)
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/EGG-INFO/scripts/airdialogue", line 33, in <module>
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 276, in main
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 162, in load_data_from_jsons
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 359, in process_main_data
  File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 286, in get_dialogue_boundary
AssertionError: start token appeared twice: Hello Hello How may I help you ? Can you help me to change my recent reservation because my trip dates are got postponed ? I will help you with that please share your name to proceed further ? Edward hall here . Please wait for a while . Sure , take your own time . There is no active reservation found under your name to amend it . That 's ok , thank you for checking . Thank you for choosing us .

Is this caused by Python 3?
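The assertion fires because the utterance begins with a duplicated greeting ("Hello Hello ..."), so a check that requires the start token to occur exactly once rejects the whole record and aborts preprocessing. The hypothetical sketch below contrasts such a strict exactly-once check with a lenient lookup that takes the first occurrence. It is not the actual get_dialogue_boundary code: both function names are invented, and treating "Hello" as the start token is an assumption made purely for illustration.

```python
from typing import List


def find_boundary_strict(tokens: List[str], start_token: str) -> int:
    """Hypothetical strict check: the start token must appear exactly once."""
    count = tokens.count(start_token)
    assert count == 1, f"start token appeared {count} times"
    return tokens.index(start_token)


def find_boundary_lenient(tokens: List[str], start_token: str) -> int:
    """Hypothetical lenient variant: first occurrence, or -1 if absent."""
    return tokens.index(start_token) if start_token in tokens else -1


# Assumption for illustration: "Hello" plays the role of the start token.
tokens = "Hello Hello How may I help you ?".split()
print(find_boundary_lenient(tokens, "Hello"))  # prints 0
try:
    find_boundary_strict(tokens, "Hello")
except AssertionError as err:
    # prints "strict check failed: start token appeared 2 times"
    print("strict check failed:", err)
```

Whether the lenient behaviour is acceptable for training data is a separate question; an alternative is to skip or log such records instead of crashing.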

Multiple bugs for evaluating selfplay

  1. In the README, Section 5 (Scoring):
airdialogue score --pred_data ./data/out_dir/dev_selfplay_out.txt \
                  --true_data ./data/airdialogue/tokenized/dev.selfplay.eval.data \
                  --true_kb ./data/airdialogue/tokenized/dev.selfplay.eval.kb \
                  --task selfplay \
                  --output ./data/out_dir/dev_selfplay.json

It passes the tokenized files as true_data and true_kb.
However, according to
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L240-L256
the evaluator actually expects JSON files.
Maybe these should be changed to:

                  --true_data ./data/airdialogue/json/dev_data.json \
                  --true_kb ./data/airdialogue/json/dev_kb.json \

  2. After fixing the previous bug, another one appears:

https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L247

It processes pred_json_obj['action'] with action_obj_to_str. However, this conversion was already done when dev_selfplay_out.txt was generated.

Maybe the action_obj_to_str call should be removed?

  3. After that, another one appears:
    https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L252

pred_json_obj is not compatible with json_obj_to_tokens: pred_json_obj does not have a dialogue key. Instead, it has a key called utterance.

I can get the program to run by replacing that line with

pred_raw_text = pred_json_obj['utterance'].replace('<t1> ','').replace('<t2> ','').split(' ')

However, I think that may not be the optimal solution.
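For the second problem above, one option short of deleting the call is to make the conversion idempotent: convert the action only if it is still a structured object, and pass it through if it is already a string. This is a hypothetical sketch. The real signature of action_obj_to_str is not shown in this report, so the converter is passed in as a parameter, and fake_to_str is an invented stand-in rather than the library's formatting.

```python
from typing import Any, Callable, Dict, Union

Action = Union[str, Dict[str, Any]]


def ensure_action_str(action: Action,
                      to_str: Callable[[Dict[str, Any]], str]) -> str:
    """Convert an action to a string only if it is not one already."""
    if isinstance(action, str):
        # Already converted upstream (e.g. when the selfplay output was written).
        return action
    # Still a structured object: convert it with the supplied function.
    return to_str(action)


def fake_to_str(obj: Dict[str, Any]) -> str:
    """Hypothetical stand-in converter: join values in sorted-key order."""
    return " ".join(str(obj[k]) for k in sorted(obj))


print(ensure_action_str({"name": "Edward Hall", "status": "no_reservation"},
                        fake_to_str))  # prints "Edward Hall no_reservation"
print(ensure_action_str("Edward Hall no_reservation", fake_to_str))
```

The guard keeps the evaluator usable with both raw and pre-converted predictions, at the cost of silently accepting whatever string format the prediction file already contains.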
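The workaround in the third problem above strips the speaker tags with chained replace calls and then uses split(' '). A slightly more robust version of the same idea is sketched below, assuming, as in the report, that the only speaker tags are <t1> and <t2>. The helper name utterance_to_tokens is hypothetical, and this mirrors the reporter's workaround rather than a confirmed fix for json_obj_to_tokens.

```python
import re
from typing import List

# Assumption: speaker turns in the "utterance" field are prefixed with
# the tags <t1> and <t2>, optionally followed by whitespace.
_SPEAKER_TAG = re.compile(r"<t[12]>\s*")


def utterance_to_tokens(utterance: str) -> List[str]:
    """Remove speaker tags, then split on any run of whitespace."""
    return _SPEAKER_TAG.sub("", utterance).split()


print(utterance_to_tokens("<t1> Hello . <t2> Hi , how can I help ?"))
# ['Hello', '.', 'Hi', ',', 'how', 'can', 'I', 'help', '?']
```

Unlike split(' '), calling split() with no argument does not produce empty tokens from repeated spaces, and the regex also removes a tag that has no trailing space (for example, one at the very end of the string).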

Inconsistent BatchSize

for i in tqdm(list(range(0, ceil))):
  start_ind = i * batch_size
  end_ind = min(i * batch_size + batch_size, len(selfplay_data))
  batch_data = selfplay_data[start_ind:end_ind]
  batch_kb = selfplay_kb[start_ind:end_ind]
  # we let agent1 talk first. Keep in mind that we will
  # swap between agent1 and agent2.
  speaker = flip % 2
  generated_data, _, summary = dialogue.talk(hparams.max_dialogue_len,
                                             batch_data, batch_kb, agent1,
                                             agent2, worker_step,
                                             batch_size, speaker)

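In line 144, batch_size should be replaced with end_ind - start_ind. Otherwise, the batch size is inconsistent in the last iteration whenever len(selfplay_data) is not a multiple of batch_size, because the final slice is shorter than batch_size.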
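The fix amounts to using the actual length of each slice instead of the nominal batch size. The hypothetical helper below (iter_batches is not part of the repository) isolates the slicing from the loop above and yields the real size of every batch alongside the data, which is the value the report suggests passing to dialogue.talk in place of batch_size.

```python
import math
from typing import Iterator, Sequence, Tuple


def iter_batches(data: Sequence, kb: Sequence,
                 batch_size: int) -> Iterator[Tuple[Sequence, Sequence, int]]:
    """Yield (batch_data, batch_kb, actual_size) for each slice."""
    num_batches = math.ceil(len(data) / batch_size)
    for i in range(num_batches):
        start_ind = i * batch_size
        end_ind = min(start_ind + batch_size, len(data))
        # Report the real slice length: the final slice is shorter
        # whenever len(data) is not a multiple of batch_size.
        yield data[start_ind:end_ind], kb[start_ind:end_ind], end_ind - start_ind


data = list(range(10))
kb = list(range(10))
print([size for _, _, size in iter_batches(data, kb, 4)])  # [4, 4, 2]
```

With 10 examples and a batch size of 4, the last batch holds only 2 examples; passing 4 for it is exactly the mismatch described in this issue.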
