TAMR

This repository contains the official implementation of the experiments in:

  • TAMR - A Lightweight AMR Toolkit For Enhancing NL2SQL Solutions (ICDE 2025 Submission)

Repo structure:

  • configs: JSON files for various run configurations
  • seq2seq: Codebase for the NL2SQL experiments; its file structure is adapted from Picard.
    • datasets: Dataset-related Python files
    • metrics: Metric-related Python files
    • utils: Python utility files
  • T5: A modified T5 architecture augmented with AMR, derived from Hugging Face Transformers.
  • run_seq2seq_internal.py: The Python script for training the main experiments.

For the GPT-4-based experiments, please refer to https://github.com/causalNLP/amr_llm. We modified the prompts as described in Tables III to V.

Dependencies

  • Python 3.7 or above

Install Python packages

pip install -r requirements.txt

Basic Usage

Train AMRT5-large (LN) model

python run_seq2seq_internal.py --config_files=configs/train_amr.json

Train AMRT5-large (SC) model

python run_seq2seq_internal.py --config_files=configs/train_amr_sc.json

Train AMRT5-3B (SC) model

python run_seq2seq_internal.py --config_files=configs/train_amr_3B.json

Evaluate a trained model (checkpoint A)

python run_seq2seq_internal_eval.py --model_path=path/to/ckpt_A

NL2SQL datasets

Dataset Link
Spider https://drive.usercontent.google.com/download?id=1iRDVHLr4mX2wQKSgA9J8Pire73Jahh0m&export=download&authuser=0
SYN https://github.com/ygan/Spider-Syn
DK https://github.com/ygan/Spider-DK
REALISTIC https://zenodo.org/records/5205322

We also provide preprocessed datasets in the data folder.
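The original Spider release stores each split as a JSON list of objects. Among other fields, each object has db_id, question and query. A minimal loader for that layout is sketched below.

Some caveats:
  • Example and load_spider_split are hypothetical helpers, not part of this repository.
  • The preprocessed files in data, or the Spider variants (SYN, DK, REALISTIC), may use different field names, so inspect them before reusing this.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    db_id: str     # database the question is asked against
    question: str  # natural language question (NLQ)
    query: str     # gold SQL query


def load_spider_split(path: str) -> List[Example]:
    """Load a Spider-style JSON file (a list of objects) into Example records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # Only the three fields needed for NL2SQL are kept; others are ignored.
    return [Example(r["db_id"], r["question"], r["query"]) for r in records]
```

For example, load_spider_split("spider/dev.json") would return one Example per development question. The path is an assumption about where you unpacked the download.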

AMR

AMR (Abstract Meaning Representation) is a comprehensive semantic graph representation of a sentence. It uses a rooted, directed acyclic graph in which nodes represent the key concepts and edges represent the semantic relations between them.
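To make the "rooted directed acyclic graph" description concrete, the sketch below encodes the textbook AMR example "The boy wants to go." as:
  • a variable-to-concept map;
  • a list of (source, role, target) edges.

It then checks the two structural properties. The variable b has two parents, which is why AMR is a DAG rather than a tree. The data layout and the is_rooted_dag function are our own illustration, not code from this repository.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# "The boy wants to go." in PENMAN notation:
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
# Variable b is reused (a re-entrancy), so the graph is a DAG, not a tree.
ROOT = "w"
CONCEPTS: Dict[str, str] = {"w": "want-01", "b": "boy", "g": "go-01"}
EDGES: List[Tuple[str, str, str]] = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
]


def is_rooted_dag(root: str, concepts: Dict[str, str],
                  edges: List[Tuple[str, str, str]]) -> bool:
    """True if every node is reachable from root and there is no cycle."""
    children = defaultdict(list)
    for src, _role, tgt in edges:
        children[src].append(tgt)

    visited, on_path = set(), set()

    def dfs(node: str) -> bool:
        if node in on_path:
            return False  # back edge to the current path: a cycle
        if node in visited:
            return True   # re-entrancy: reached again via another parent
        visited.add(node)
        on_path.add(node)
        ok = all(dfs(child) for child in children[node])
        on_path.discard(node)
        return ok

    # Acyclic from the root, and no concept left unreachable.
    return dfs(root) and visited == set(concepts)


if __name__ == "__main__":
    print(is_rooted_dag(ROOT, CONCEPTS, EDGES))
```

Adding an edge from b back to w would create a cycle, and the check would then return False.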

AMR can help pretrained language models (PLMs) augment their semantic understanding, striking a better trade-off between efficiency and effectiveness.
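One common way to expose AMR to a seq2seq PLM is to linearize the graph into PENMAN notation and append it to the question text.

The sketch below illustrates that general idea only. It is not necessarily how the modified T5 in this repository, or its LN and SC variants, consumes AMR. The " | " separator and the to_penman and build_input helpers are hypothetical.

```python
from typing import Dict, List, Tuple


def to_penman(root: str, concepts: Dict[str, str],
              edges: List[Tuple[str, str, str]]) -> str:
    """Serialize a rooted AMR graph to single-line PENMAN notation."""
    children: Dict[str, List[Tuple[str, str]]] = {}
    for src, role, tgt in edges:
        children.setdefault(src, []).append((role, tgt))
    seen = set()

    def walk(var: str) -> str:
        if var in seen:
            return var  # re-entrant node: refer back by its variable only
        seen.add(var)
        parts = [f"({var} / {concepts[var]}"]
        for role, tgt in children.get(var, []):
            parts.append(f"{role} {walk(tgt)}")
        return " ".join(parts) + ")"

    return walk(root)


def build_input(question: str, amr: str, sep: str = " | ") -> str:
    """Hypothetical model input: the question followed by its linearized AMR."""
    return f"{question}{sep}{amr}"


if __name__ == "__main__":
    amr = to_penman(
        "w",
        {"w": "want-01", "b": "boy", "g": "go-01"},
        [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")],
    )
    print(build_input("The boy wants to go.", amr))
```

Linearizing keeps the model architecture unchanged, at the cost of a longer input sequence. Approaches that modify the encoder instead trade that simplicity for a more compact input.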

AMR parser

The parser we chose is originally from here. We adapted it to natural language questions (NLQs) by retraining it on a corpus of NLQ-AMR pairs.
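Retraining a parser on NLQ-AMR pairs starts with reading those pairs from disk. The sketch below assumes the corpus follows the common LDC AMR release layout:
  • blocks are separated by blank lines;
  • metadata lines start with "#", and the sentence is on a "# ::snt" line;
  • the indented graph follows the metadata.

Under that assumption, a minimal reader could look like this. read_amr_corpus is a hypothetical helper, not part of this repository or of the original parser.

```python
import re
from typing import List, Tuple


def read_amr_corpus(text: str) -> List[Tuple[str, str]]:
    """Split LDC-style AMR text into (sentence, single-line graph) pairs."""
    pairs = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        sentence, graph_lines = None, []
        for line in block.splitlines():
            if line.startswith("# ::snt "):
                sentence = line[len("# ::snt "):].strip()
            elif not line.startswith("#"):
                graph_lines.append(line.strip())
        # Skip blocks that lack either a sentence or a graph.
        if sentence is not None and graph_lines:
            # Collapse the indented multi-line graph onto one line.
            pairs.append((sentence, " ".join(graph_lines)))
    return pairs
```

Usage: read_amr_corpus(open(path, encoding="utf-8").read()) returns (NLQ, AMR) pairs ready for a parser's training loop.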

Dataset Citations

@inproceedings{Yu&al.18c,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  year      = {2018}
}

@inproceedings{gan-etal-2021-towards,
  title     = {Towards Robustness of Text-to-{SQL} Models against Synonym Substitution},
  author    = {Gan, Yujian and Chen, Xinyun and Huang, Qiuping and Purver, Matthew and Woodward, John R. and Xie, Jinxia and Huang, Pengsheng},
  month     = aug,
  year      = {2021},
  address   = {Online},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2021.acl-long.195},
  doi       = {10.18653/v1/2021.acl-long.195},
  pages     = {2505--2515}
}

@misc{gan2021exploring,
  title         = {Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization},
  author        = {Yujian Gan and Xinyun Chen and Matthew Purver},
  year          = {2021},
  eprint        = {2109.05157},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}

@article{deng2020structure,
  title   = {Structure-Grounded Pretraining for Text-to-SQL},
  author  = {Deng, Xiang and Awadallah, Ahmed Hassan and Meek, Christopher and Polozov, Oleksandr and Sun, Huan and Richardson, Matthew},
  journal = {arXiv preprint arXiv:2010.12773},
  year    = {2020}
}

License

MIT
