Git Product home page Git Product logo

ambrosia's Introduction

AMBROSIA

AMBROSIA is a new benchmark for recognizing and interpreting ambiguous requests in text-to-SQL. The dataset contains questions with three types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. AMBROSIA and all released software are distributed under CC BY 4.0.

This repository contains data and code used for data collection and evaluation.

image

Data

The dataset is available at ambrosia-benchmark.github.io. Please download and extract it into the data directory.

Setup

To speed up inference, we use various toolkits: TGI, VLLM, and OpenChat server. These libraries can be easily accessed using Docker. To run Docker, first build and run the custom image (it will install additional packages and copy data to the base image):

docker build -t <image_name> -f <Dockerfile> .
docker run -it <image_name> <args_for_base_image> /bin/bash

Dockerfiles for TGI, OpenChat and VLLM inference are available in the docker directory.

Build and run the customized TGI image (for Llama3 and CodeLlama Prompt):

docker build -t ambrosia-docker-image-tgi -f docker/Dockerfile_tgi .
docker run -it ambrosia-docker-image-tgi --gpus all \
    --shm-size 1g \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    --model-id $model \
    --max-best-of '1' \
    /bin/bash

Build and run the customized VLLM image (for Llama3, CodeLlama and OpenChat Beam):

docker build -t ambrosia-docker-image-vllm -f docker/Dockerfile_vllm .
docker run -it ambrosia-docker-image-vllm --runtime nvidia --gpus all \
    --shm-size 1g \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    --model $model \
    --max-model-len 5120 \
    --seed 1 \
    /bin/bash

Build and run the customized OpenChat image (for OpenChat Prompt and database generation):

docker build -t ambrosia-docker-image-openchat -f docker/Dockerfile_openchat .
docker run -it ambrosia-docker-image-openchat --gpus all /bin/bash

Evaluation

All evaluation functions are located in the src/evaluation directory. Prompts for evaluations can be found in src/prompts/evaluation. Scripts for the experiments are provided in src/scripts. For example, to perform zero-shot inference of Llama3-70B with the Prompt method (requires TGI):

./src/scripts/zero_shot_prompt.sh "meta-llama/Meta-Llama-3-70B-Instruct" --tgi

You can also run inference directly (with a fixed random seed and only ambiguous questions):

python src/evaluation/evaluate_model_tgi.py \
    --prompt_file src/prompts/evaluation/prompt \
    --use_tgi \
    --api_url 'http://0.0.0.0/' \
    --model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
    --type_of_questions ambig

For example, to perform zero-shot inference of Llama3-70B with the Beam method (requires VLLM):

./src/scripts/zero_shot_beam_vllm_server.sh "meta-llama/Meta-Llama-3-70B-Instruct"

Or to run inference directly (with a fixed random seed and only ambiguous questions):

python3 src/evaluation/evaluate_model_openai_api.py  \
    --prompt_file src/prompts/evaluation/beam \
    --use_vllm \
    --api_url 'http://localhost:8000/v1' \
    --api_key 'EMPTY' \
    --model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
    --type_of_questions ambig

Database Generation

All evaluation functions are located in the src/db_generation directory. Prompts for evaluations can be found in src/prompts/db_generation. Domains are specified in the data directory. We use the OpenChat model for database generation.

Once you run a Docker container with OpenChat, you can run the generation of key concepts and relations:

./src/scripts/generate_key_concepts_relations.sh

It will run the following command for all ambiguity types:

python src/db_generation/generate_key_concepts_relations.py --prompt_file PROMPT --ambig_type AMBIG_TYPE --data_dir DIR_FOR_CONCEPTS --domain_file DOMAIN_FILE --api_url "http://localhost:18888/v1/"

Run database generation:

./src/scripts/generate_databases.sh

It will run the following command for all ambiguity types:

python src/db_generation/generate_databases.py --data_dir data/ --ambig_type AMBIG_TYPE --api_url "http://localhost:18888/v1/"

Data Collection Interface

You can find the Potato interface for data collection in the annotation directory. To install Potato, follow the instructions in the README. We use the modified version written by Hosking et al., 2024.

To start Potato, run the following command:

cd annotation/
python potato/flask_server.py start ambrosia_data_collection/configs/CONFIG_FILE -p 8000

The config options are:

We provide examples of raw data in annotation/ambrosia_data_collection/data_files. Instructions for annotators can be found in annotation/ambrosia_data_collection/instructions

Paper

More details on data collection and evaluation results are provided in the paper:

π”Έπ•„π”Ήβ„π•†π•Šπ•€π”Έ: A Benchmark for Parsing Ambiguous Questions into Database Queries

Irina Saparina and Mirella Lapata

ambrosia's People

Contributors

saparina avatar

Stargazers

JoΓ£o avatar Vladislav Skripniuk avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.