
PEARL

This is the repository for our paper PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents.

Setup

Please make sure the openai package is installed (pip install openai) and that your OpenAI API key has been exported to the environment variable OPENAI_API_KEY_OAI.
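
As a quick sanity check (a minimal sketch, not part of the repository, and assuming the pre-1.0 openai package interface that was current at release time), you can confirm the key is visible to the library:

import os
import openai

# Read the key from the environment variable named in the Setup note above.
openai.api_key = os.environ["OPENAI_API_KEY_OAI"]
print("Key loaded:", bool(openai.api_key))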

Preprocess Data

  1. Download the QuALITY data and unzip it into the ./data/raw folder
  2. Run python data_preproc.py. This step produces two files in the data/processed folder: quality_dev_q.csv and quality_train_q.csv (a quick check is sketched below)
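
To confirm preprocessing succeeded, a minimal check (assuming pandas is available; no specific column names are assumed) is:

import pandas as pd

# Inspect one of the files produced by data_preproc.py.
df = pd.read_csv("data/processed/quality_dev_q.csv")
print(df.shape)
print(df.columns.tolist())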

Action Mining

PEARL mines actions from data drawn from a similar distribution (in this repo, the training data of the QuALITY dataset) instead of assuming a pre-defined action space. To mine actions from the training set, run

bash ./script.sh action_mining

The above command produces the file ./output/mined_actions_init.txt, which stores the mined actions in the following format:

ANALYZE(CTX, X, Y) #Analyze the relationship, attitude, or feelings between X and Y, or the character, language, tone, or symbolism of X given the input CTX.

Note that the generation process is not entirely deterministic even with both temperature and top_p set to 0. We provide examples of mined actions in output/mined_actions_init_example.txt.
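
Since each mined action follows the pattern NAME(ARGS) #description, the example file can be inspected with a small parser like the hypothetical sketch below (not part of the repository):

import re

# Split e.g. "ANALYZE(CTX, X, Y) #Analyze the relationship ..." into name, args, description.
ACTION_RE = re.compile(r"^(\w+)\(([^)]*)\)\s*#\s*(.*)$")

def parse_action(line):
    match = ACTION_RE.match(line.strip())
    if match is None:
        return None
    name, args, desc = match.groups()
    return {"name": name,
            "args": [a.strip() for a in args.split(",") if a.strip()],
            "desc": desc}

with open("output/mined_actions_init_example.txt") as f:
    actions = [a for a in map(parse_action, f) if a is not None]
print(len(actions), "actions parsed; first:", actions[0]["name"])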

In our experiments, the total length of all mined actions exceeds the maximum context length of GPT-4-8k, so we add a step that simplifies the mined actions:

bash ./script.sh action_simplification

Example actions simplified from output/mined_actions_init_example.txt are provided in output/mined_actions_simplified_example.txt. The number of actions can be adjusted by running multiple rounds of action simplification. More details are in Section 4.1 of our paper.

Plan Generation and Execution with PEARL

We evaluate PEARL on a subset of QuALITY questions that are annotated as requiring long context to answer. For both the baselines and PEARL, the output will be stored in the ./output folder following the format {prompt_type}_out.{split}.{ctx_type}.csv. The {split} and {ctx_type} placeholders denote the original QuALITY split (train or dev) from which the example is drawn and the context size required to answer the question, respectively.

To run the multiple-choice question baseline, run

bash ./script.sh baseline_mcq

We provide output in ./output/baseline_mcq_out.{split}.{ctx_type}.csv

To run the free-form answer baseline, run

bash ./script.sh baseline_gqa

We provide output in ./output/baseline_gqa_out.{split}.{ctx_type}.csv

To run PEARL on the challenge subset of QuALITY, run

bash ./script.sh pearl

For PEARL, two files are generated:

  1. A .csv file containing the plan, the free-form answer, and the mapped answer, with the following fields:
  • qid: the question ID
  • plan: the generated plan
  • open-answer: the free-form answer generated by executing the plan
  • map-answer: the letter choice selected by GPT-4 based on the open-answer
  • gold: the gold label
  2. A .pkl file storing the intermediate outputs, where the keys are the output variables in the plan and the values are the executed results assigned to those variables (a loading sketch is shown after this list).
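
A minimal loading sketch for both files follows; the filename is illustrative (substitute the {split} and {ctx_type} of your run), and the exact structure inside the .pkl file is an assumption based on the description above:

import pickle
import pandas as pd

run = "output/pearl_out.dev.ctx_eval_long"   # illustrative run name

df = pd.read_csv(run + ".csv")
print(df[["qid", "plan", "open-answer", "map-answer", "gold"]].head(1))

# Assumed structure: output-variable name -> executed result.
with open(run + ".pkl", "rb") as f:
    intermediate = pickle.load(f)
print(list(intermediate)[:5])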

We provide example .csv output of two runs with the gpt-4-0314 checkpoint in ./output/pearl_out.{split}.{ctx_type}.csv, as well as the intermediate output of one run as a .pkl file.

To see the intermediate step output, run the command in ./script.sh with the --debug flag. Example output for executing one action is shown below: the parsed action followed by the executed output.

{'action': 'FIND_RELATION', 'args': ['CTX', '"Ro"', '"mother"'], 'output_var': 'ro_mother', 'detailed_action': 'Find and summarize the relationship between Ro and his mother in the input article'}
In the input article, Ro is a young Martian who has returned to his home ... The relationship between Ro and his mother seems to be one of respect and learning, as he remembers her words and uses them to navigate the challenges he faces.

Note that the code currently uses the provided examples in prompt_bank for plan generation. To generate demonstrations with GPT-4 along with self-refinement, run

bash ./script.sh refine

The generated demonstrations will be printed out and can later be incorporated into prompt_bank/plan_gen.txt.

To compute the mapped answer accuracy for each method, run

python comp_acc.py baseline_mcq_out
# File: ./output/baseline_mcq_out.dev.ctx_eval_long.csv, accuracy: 81.2
# File: ./output/baseline_mcq_out.dev.ctx_eval_short.csv, accuracy: 84.4
# File: ./output/baseline_mcq_out.train.ctx_eval_long.csv, accuracy: 71.7
# Total accuracy: 78.7

python comp_acc.py baseline_gqa_out
# File: ./output/baseline_gqa_out.dev.ctx_eval_long.csv, accuracy: 71.5
# File: ./output/baseline_gqa_out.dev.ctx_eval_short.csv, accuracy: 79.1
# File: ./output/baseline_gqa_out.train.ctx_eval_long.csv, accuracy: 57.9
# Total accuracy: 68.8

python comp_acc.py pearl_out
# File: ./output/pearl_out.dev.ctx_eval_long.csv, accuracy: 77.4
# File: ./output/pearl_out.dev.ctx_eval_short.csv, accuracy: 76.7
# File: ./output/pearl_out.train.ctx_eval_long.csv, accuracy: 63.8
# Total accuracy: 72.2
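
Roughly, comp_acc.py compares the mapped letter choice against the gold label. A back-of-the-envelope version of that computation for a single file (a sketch that assumes map-answer and gold are stored in directly comparable form; the actual script may differ) is:

import pandas as pd

df = pd.read_csv("output/pearl_out.dev.ctx_eval_long.csv")
accuracy = (df["map-answer"] == df["gold"]).mean() * 100
print(f"accuracy: {accuracy:.1f}")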

Cite

@misc{sun2023pearl,
      title={PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents}, 
      author={Simeng Sun and Yang Liu and Shuohang Wang and Chenguang Zhu and Mohit Iyyer},
      year={2023},
      eprint={2305.14564},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
