Adjective-Knowledge-for-Text-to-SQL

This is the source code for our paper "Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL" (EMNLP 2019).

In this paper, we propose leveraging adjective-noun phrasing knowledge mined from the web to predict comparison relations in text-to-SQL. For example, the model must decide that "older" applied to an age column implies > rather than <. Experimental results on both the original and the re-split Spider dataset show that our approach significantly improves comparison relation prediction over the syntaxSQL and SQLNet baselines.
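As a rough illustration of the idea (not the paper's model), the toy Python sketch below hard-codes a few adjective-noun pairs and the comparison direction each one implies. It reduces comparative forms such as "older" to a base form before the lookup. TOY_KNOWLEDGE, TOY_LEMMAS, and predict_direction are hypothetical names. In the repository, the knowledge in data/knowledge is mined from the web and converted into weighted or direct features for syntaxSQL and SQLNet, rather than being used as a lookup table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy knowledge base: (adjective lemma, column noun) -> direction.
# The real knowledge under data/knowledge is likely structured differently.
TOY_KNOWLEDGE: Dict[Tuple[str, str], str] = {
    ("old", "age"): ">",
    ("young", "age"): "<",
    ("tall", "height"): ">",
    ("short", "height"): "<",
}

# Hand-written lemma map for comparative/superlative forms (assumption:
# a real system would use a proper lemmatizer instead).
TOY_LEMMAS: Dict[str, str] = {
    "older": "old", "oldest": "old",
    "younger": "young", "youngest": "young",
    "taller": "tall", "tallest": "tall",
    "shorter": "short", "shortest": "short",
}


def predict_direction(adjective: str, column: str) -> Optional[str]:
    """Return '>' or '<' if the (adjective, column) pair is known, else None."""
    word = adjective.lower()
    lemma = TOY_LEMMAS.get(word, word)  # fall back to the word itself
    return TOY_KNOWLEDGE.get((lemma, column.lower()))
```

For example, predict_direction("older", "age") returns ">". predict_direction("heavier", "age") returns None because that pair is not in the toy table.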

Preliminaries

Environment Setup

  1. The baseline code uses Python 2.7 and PyTorch 0.2.0 (GPU). Install the Python dependencies with pip, or pull the Docker image instead:
    • pip install -r requirements.txt
    • docker pull buaa1156/py27torch0.2cuda8vim:latest
  2. The preprocessing scripts require Python >= 3.5.
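The baselines and the preprocessing scripts need different interpreters, so a small check like the following can confirm you are in the right environment before running anything. This is a hypothetical helper, not part of the repository. It avoids type annotations, f-strings, and multi-argument print calls so that it behaves the same under Python 2.7 and Python 3.

```python
import sys


def check_baseline_python(version_info=sys.version_info):
    """True if the interpreter is Python 2.7, as the syntaxSQL/SQLNet baselines expect."""
    return tuple(version_info[:2]) == (2, 7)


def check_preprocess_python(version_info=sys.version_info):
    """True if the interpreter is Python >= 3.5, as the preprocessing scripts expect."""
    return tuple(version_info[:2]) >= (3, 5)


if __name__ == "__main__":
    # Single-argument print() calls work identically on Python 2.7 and 3.x.
    print("Running Python %d.%d" % tuple(sys.version_info[:2]))
    print("OK for baselines (2.7): %s" % check_baseline_python())
    print("OK for preprocessing (>= 3.5): %s" % check_preprocess_python())
```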

Data and Embeddings

  1. The dataset comes from the Spider task website. The singletable and resplitdata versions of the data used in our paper are in data/singletable and data/resplitdata, respectively.
  2. The knowledge used in this paper is under the folder data/knowledge.
  3. Download the pretrained GloVe embeddings and place them in both the syntaxSQL and SQLNet folders as glove/glove.42B.300d.txt.
  4. Download evaluation.py and process_sql.py from the Spider GitHub page, and evaluate the results following the instructions there.
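The GloVe file is plain text. Each line holds one token followed by its vector components, all separated by spaces. To sanity-check the download, you can use the hypothetical load_glove helper below, which reads that format into a dictionary. The baseline code has its own loader, which may handle details differently. Passing a vocab set is optional but advisable, since glove.42B.300d.txt is very large.

```python
import io


def load_glove(path, vocab=None, dim=300):
    """Load GloVe vectors from a space-separated text file into a dict.

    Each line is expected to be: <word> <v1> ... <v_dim>.
    If `vocab` is given, only words in it are kept to save memory.
    Lines that do not have exactly dim + 1 fields are skipped.
    """
    embeddings = {}
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # malformed or truncated line
            word = parts[0]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = [float(x) for x in parts[1:]]
    return embeddings
```

Example: emb = load_glove("syntaxSQL/glove/glove.42B.300d.txt", vocab={"age", "older"}).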

Run syntaxSQL with Knowledge

  1. Generate the train and dev data by running:
    • python3 preprocess_syntaxSQL.py train|dev singletable|resplitdata
  2. Preprocess knowledge features by running:
    • python3 preprocess_direction_features.py syntaxSQL singletable|resplitdata weighted|direct
  3. Run run_train.sh and run_test.sh in the syntaxSQL directory after setting data_type, feats_format, and DATE in the first lines of each script:
    • data_type: singletable or resplitdata
    • feats_format: weighted or direct
    • DATE: set automatically to the local time during training; assign it manually during testing
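The syntaxSQL preprocessing steps can be scripted. The sketch below is a hypothetical wrapper that validates the two options and builds the three commands in the order listed above. It assumes both preprocessing scripts are run from the repository root, which the instructions do not state explicitly. It does not cover run_train.sh or run_test.sh, whose data_type, feats_format, and DATE settings still have to be edited by hand.

```python
import subprocess
import sys

DATA_TYPES = ("singletable", "resplitdata")
FEATS_FORMATS = ("weighted", "direct")


def syntaxsql_preprocess_commands(data_type, feats_format):
    """Return argv lists for the syntaxSQL preprocessing steps, in README order."""
    if data_type not in DATA_TYPES:
        raise ValueError("data_type must be one of %s" % (DATA_TYPES,))
    if feats_format not in FEATS_FORMATS:
        raise ValueError("feats_format must be one of %s" % (FEATS_FORMATS,))
    # Step 1: generate the train and dev data.
    cmds = [["python3", "preprocess_syntaxSQL.py", split, data_type]
            for split in ("train", "dev")]
    # Step 2: build the knowledge features for syntaxSQL.
    cmds.append(["python3", "preprocess_direction_features.py",
                 "syntaxSQL", data_type, feats_format])
    return cmds


if __name__ == "__main__":
    # Hypothetical usage: python3 prepare_syntaxsql.py singletable weighted
    for argv in syntaxsql_preprocess_commands(sys.argv[1], sys.argv[2]):
        print(" ".join(argv))
        subprocess.check_call(argv)  # raises at the first failing step
```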

Run SQLNet with Knowledge

  1. Copy the files in the data/ directory to SQLNet/data/.
  2. Preprocess knowledge features by running:
    • python3 preprocess_direction_features.py SQLNet singletable|resplitdata weighted|direct
  3. Run run_train.sh and run_test.sh in the SQLNet directory after setting data_type, feats_format, and DATE in the first lines of each script:
    • data_type: singletable or resplitdata
    • feats_format: weighted or direct
    • DATE: set automatically to the local time during training; assign it manually during testing
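If you prefer Python to a shell copy, the first SQLNet step can be done with the hypothetical copy_data_dir helper below. It copies everything under data/ into SQLNet/data/, including subfolders such as data/knowledge, and overwrites files that have the same relative path. It assumes you run it from the repository root and that step 1 intends a recursive copy. It avoids exist_ok and similar newer arguments, so it also runs on older Python versions.

```python
import os
import shutil


def copy_data_dir(src="data", dst=os.path.join("SQLNet", "data")):
    """Recursively copy every file under `src` into `dst`, creating folders as needed.

    Files already in `dst` with the same relative path are overwritten.
    Returns the list of destination file paths that were written.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)  # "." for the top-level folder
        target_dir = os.path.normpath(os.path.join(dst, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in files:
            target = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(root, name), target)  # keeps timestamps
            copied.append(target)
    return copied
```

After copying, continue with step 2 as written above.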

Questions

If you have any questions, please open an issue.

Citation

@inproceedings{liu2019leveraging,
  title={Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL},
  author={Liu, Haoyan and Fang, Lei and Liu, Qian and Chen, Bei and Lou, Jian-Guang and Li, Zhoujun},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={3506--3511},
  year={2019}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
