KETOD: Knowledge-Enriched Task-Oriented Dialogue

This repo contains the dataset from the NAACL 2022 paper "KETOD: Knowledge-Enriched Task-Oriented Dialogue": https://arxiv.org/abs/2205.05589

Dataset Generation

KETOD is built upon the Google Schema-Guided Dialogue (SGD) dataset. Here we release our knowledge-enriched utterance annotations and the script that generates the final dataset.

  1. Go to https://github.com/google-research-datasets/dstc8-schema-guided-dialogue to download the SGD dataset.
  2. Unzip ketod_release.zip and place its contents in the same directory as the SGD dataset.
  3. Edit the main entry point of gen_ketod_data.py to set your own data paths.
  4. Run 'python gen_ketod_data.py' to generate the full KETOD dataset.

Dataset Format

Each entry of the data is one dialogue. It has the following fields:

"dialogue_id": the unique id of the dialogue.

"turns": the list of dialogue turns. Besides the original fields from the SGD dataset, an enriched turn has the following additional fields:
    {
      "enrich": True. For turns without chitchat enrichment, this field is False.
      "entity_query": the entity query used for knowledge retrieval.
      "enriched_utter": the utterance enriched with chitchat. The separate field "utterance" holds the original response from the SGD dataset.
      "kg_snippets": the indices of the ground-truth knowledge snippets.
      "kg_snippets_text": the content of the ground-truth knowledge snippets.
    }

"dialog_query": all the entity queries used for knowledge retrieval in this dialogue.

"entity_passages": all the Wikipedia passages retrieved in this dialogue.

"entity_passage_sents": all the Wikipedia passages retrieved in this dialogue, broken into snippets associated with index numbers.

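As an illustration of the fields above, here is a minimal Python sketch that walks a list of dialogues and collects the enriched turns. It assumes the generated file is a JSON list of dialogue objects; the helper names (load_dialogues, iter_enriched_turns) and the tiny sample record are hypothetical and made up for this example, not taken from the released data.

```python
import json


def load_dialogues(path):
    """Load a generated KETOD file (assumed here to be a JSON list of dialogues)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_enriched_turns(dialogues):
    """Yield one summary dict per turn whose "enrich" field is true."""
    for dialogue in dialogues:
        for turn in dialogue.get("turns", []):
            if not turn.get("enrich"):
                continue  # plain SGD turn, no chitchat enrichment
            yield {
                "dialogue_id": dialogue["dialogue_id"],
                "original": turn.get("utterance"),
                "enriched": turn.get("enriched_utter"),
                "entity_query": turn.get("entity_query"),
                "snippets": turn.get("kg_snippets_text"),
            }


# Hand-made record for illustration only; all values are invented.
sample = [
    {
        "dialogue_id": "demo_0",
        "turns": [
            {"utterance": "I need a hotel.", "enrich": False},
            {
                "utterance": "How about Hotel X?",
                "enrich": True,
                "entity_query": ["Hotel X"],
                "enriched_utter": "How about Hotel X? It opened in 1900.",
                "kg_snippets": [0],
                "kg_snippets_text": ["Hotel X opened in 1900."],
            },
        ],
    }
]

rows = list(iter_enriched_turns(sample))
for row in rows:
    print(row["dialogue_id"], "|", row["original"], "->", row["enriched"])
```

To use it on real data, pass the result of load_dialogues(...) with the path of a file produced by gen_ketod_data.py in place of the sample list.
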
Code

To run the models, go to the "code" folder.

To run the knowledge selection model, go to the "kg_selection" folder. Run process_data.py first, then train the model with Train.py, and generate the knowledge selection results with Test.py for each of the train, dev, and test sets.

To run the SimpleToDPlus model, go to the "simpletodplus" folder. Modify and run gen_kg_train.py to generate data files that include the knowledge selection results, then run gen_data.py to generate the train/dev/test files in the model's input format. Use the run_simpletod.sh script to run train_simpletod.py for training and test_simpletod_simple.py for testing. To generate the results for each step, modify and follow the steps at the end of test_simpletod_simple.py.
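
The order of the steps above can be summarized in a short Python sketch. It only builds and prints a dry-run plan; it does not execute anything. It assumes that "kg_selection" and "simpletodplus" sit directly inside the "code" folder and that the scripts take no command-line arguments once their paths have been edited by hand; both are assumptions, and the names STEPS and plan are hypothetical.

```python
from pathlib import PurePosixPath

# (folder, script) pairs in the order described above.
# Test.py is meant to be run once each for the train, dev, and test sets.
STEPS = [
    ("kg_selection", "process_data.py"),
    ("kg_selection", "Train.py"),
    ("kg_selection", "Test.py"),
    ("simpletodplus", "gen_kg_train.py"),
    ("simpletodplus", "gen_data.py"),
    # Wraps train_simpletod.py and test_simpletod_simple.py.
    ("simpletodplus", "run_simpletod.sh"),
]


def plan(code_root="code"):
    """Return (working_directory, command) pairs without running them."""
    commands = []
    for folder, script in STEPS:
        runner = "bash" if script.endswith(".sh") else "python"
        workdir = str(PurePosixPath(code_root) / folder)
        commands.append((workdir, f"{runner} {script}"))
    return commands


for workdir, command in plan():
    print(f"cd {workdir} && {command}")
```

Several of these scripts need their data paths edited before they will run, so treat the printed commands as a checklist rather than a ready-made launcher.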

Citation

If you find this project useful, please cite it using the following format:

@inproceedings{DBLP:conf/naacl/ChenLMSCW22,
  author    = {Zhiyu Chen and
               Bing Liu and
               Seungwhan Moon and
               Chinnadhurai Sankar and
               Paul A. Crook and
               William Yang Wang},
  editor    = {Marine Carpuat and
               Marie{-}Catherine de Marneffe and
               Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
  title     = {{KETOD:} Knowledge-Enriched Task-Oriented Dialogue},
  booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
               2022, Seattle, WA, United States, July 10-15, 2022},
  pages     = {2581--2593},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://doi.org/10.18653/v1/2022.findings-naacl.197},
  doi       = {10.18653/v1/2022.findings-naacl.197},
  timestamp = {Mon, 01 Aug 2022 16:27:57 +0200},
  biburl    = {https://dblp.org/rec/conf/naacl/ChenLMSCW22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License

KETOD is released under the MIT license; see LICENSE for details.

ketod's Issues

Missing entity_schemas with schema_all.json for testing

test retrieved
python3 test_simpletod_simple.py \
--saved_model_path="model1_20210818161419/saved_model/loads/11/model.pt" \
--output_path="${root_path}outputs/" \
--model_dir_name="model1_gold_action_retrieved_kg_gold_decision" \
--test_input="${root_path}todkg_dataset/runs/model1/model1.lm.input.eval.test_retrieved.txt" \
--test_input_gold_action="${root_path}todkg_dataset/runs/model1/model1.lm.input.eval.goldaction.test_retrieved.txt" \
--test_input_gold_kg="${root_path}todkg_dataset/runs/model1/model1.lm.input.eval.goldkg.test_retrieved.txt" \
--test_input_gold_decision="${root_path}todkg_dataset/runs/model1/model1.lm.input.eval.golddecision.test_retrieved.txt" \
--test_oracle_input="${root_path}todkg_dataset/runs/model1/model1.lm.input.test_retrieved.txt" \
--test_input_original="${root_path}todkg_dataset/runs/model1/processed_model1_test_retrieved.json" \
--test_inter="${root_path}todkg_dataset/runs/model1/test_retrieved_inter.json" \
--test_inter_res="${root_path}todkg_dataset/runs/model1/predictions_kg_select.json" \
--en_schema="${root_path}todkg_dataset/entity_schemas/schema_all.json" \
--num_passages=2 \
--num_para=2 \
--eos_token_id=50256 \
--batch_size=1 \
--max_seq_len=1024 \
--gold_action

Testing requires the schema_all.json file (passed via --en_schema), but this file is not available.
