Complex Question Decomposition for Semantic Parsing

This is the code base for the ACL'19 paper "Complex Question Decomposition for Semantic Parsing".

1. Preprocess

1.1 Download raw data

Download the ComplexWebQ data, then prepare the environment and libraries.

1.2 Requirements for preprocess

To run the preprocessing step, put the following files in the DATA_PATH directory (DATA_PATH is defined in the script).

  • ComplexWebQuestions_train.json
  • ComplexWebQuestions_dev.json
  • ComplexWebQuestions_test.json (the three files above supply all the other information we need)
  • train.json, dev.json, test.json (these hold the split sub-questions, which are not included in the raw data; we provide generated copies in the complex_questions directory, and you can also generate them yourself with the steps in 1.2.1)
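
Before running the preprocessing step, it can help to confirm that all six files are in place. The following is a minimal sketch, not part of the code base: the helper name missing_files is hypothetical, and it only checks that each file exists, not that its contents are valid.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "ComplexWebQuestions_train.json",
    "ComplexWebQuestions_dev.json",
    "ComplexWebQuestions_test.json",
    "train.json",
    "dev.json",
    "test.json",
]


def missing_files(data_path):
    """Return the required file names that are absent from data_path."""
    root = Path(data_path)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Call missing_files with the same directory you set as DATA_PATH; an empty list means every required file was found.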

1.2.1 Generate golden sub-question split points yourself

  • cd WebAsKB.
  • Prepare a StanfordCoreNLP server on localhost following 1.2.2.
  • Change the data_dir setting in WebAsKB/config.py.
  • Set EVALUATION_SET in WebAsKB/config.py to train, dev, and test in turn, and run python webaskb_run.py gen_golden_sup once for each value (three runs in total).
  • After these runs, train.json, dev.json, and test.json will be in DATA_PATH.

1.2.2 Prepare a StanfordCoreNLP server

To run the POS annotation process, download and start a StanfordCoreNLP server on localhost:9003.

  • Download https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip, unzip it, and cd into the extracted directory.
  • Start the server with java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9003 -timeout 15000.
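
To confirm that something is listening on port 9003 before starting the annotation, a plain TCP check is usually enough. This is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical, and the check only shows that the port accepts connections, not that the process behind it is StanfordCoreNLP.

```python
import socket


def port_is_open(host="localhost", port=9003, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If port_is_open() returns False, check that the java command above is still running and that the port number matches.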

1.3 Modify script

We provide a template script, scripts/run.sh. To run it, you need to change at least the following directory settings.

  • DATA_PATH: the data root directory.
  • RUN_T2T: the root directory of the code base.

1.4 Run Preprocess

Now run scripts/run.sh preprocess. This command generates the data in the format our model expects and annotates POS labels.

2. Prepare

Prepare the GloVe pretrained embedding file glove.6B.300d.txt and put it in DATA_PATH/embed/.
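
A truncated or wrong-dimension embedding file tends to fail late, so a quick format check can save time. The sketch below assumes the usual GloVe text layout (one token followed by space-separated floats on each line); the helper name check_embedding_file and its parameters are hypothetical and not part of the code base.

```python
def check_embedding_file(path, dim=300, max_lines=1000):
    """Check that the first max_lines lines each hold one token and dim floats.

    Returns the number of lines checked; raises ValueError on the first bad line.
    """
    checked = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line_no > max_lines:
                break
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                raise ValueError(
                    f"line {line_no}: expected {dim + 1} fields, got {len(parts)}"
                )
            for value in parts[1:]:
                float(value)  # raises ValueError if a field is not numeric
            checked += 1
    return checked
```

For the file named above, the call would be check_embedding_file("DATA_PATH/embed/glove.6B.300d.txt") with DATA_PATH replaced by your own data root.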

Running scripts/run.sh prepare shuffles the dataset and builds the vocabulary file.

3. Train

To train our decomposition model, use scripts/run.sh train.

To train our semantic parsing model, use scripts/run.sh train_lf.

4. Test

Run scripts/run.sh test to generate decomposed queries from an input file and print the BLEU-4 and ROUGE-L scores against the references.
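
For readers unfamiliar with ROUGE-L, the score is built on the longest common subsequence (LCS) between a generated sentence and its reference. The following is an illustrative sketch only: it assumes whitespace tokenization and an unweighted F1, and the scoring code used by scripts/run.sh may tokenize or weight precision and recall differently. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 over whitespace tokens (assumed tokenization)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the LCS does not require matched tokens to be adjacent, a candidate that keeps the reference's word order but inserts extra words still gets full recall and loses only precision.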

Run scripts/run.sh test_lf to generate logical forms from an input file and print the exact match (EM) score against the references.
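
Exact match counts a prediction as correct only when it equals its reference as a whole string. A minimal sketch of that computation follows; the function name exact_match is hypothetical, and stripping surrounding whitespace before comparing is an assumption, so the normalization used by scripts/run.sh may differ.

```python
def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # strip() is an assumed normalization; the comparison is otherwise exact.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Given lists of generated and reference logical forms read one per line from two files, exact_match returns a value between 0.0 and 1.0.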

Citation

If you use this code in your research, please cite our paper using the following BibTeX.

@inproceedings{Zhang2019HSP,
title = {Complex Question Decomposition for Semantic Parsing},
author = {Zhang, Haoyu and Cai, Jingjing and Xu, Jianjun and Wang, Ji},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2019}
}
