
simulated-trial-and-error's Introduction

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's "imagination" to simulate plausible scenarios for using a tool, after which the LLM interacts with the tool to learn from its execution feedback. Both short-term and long-term memory are employed to improve the depth and breadth of the exploration, respectively. Comprehensive experiments on ToolBench show that STE substantially improves tool learning for LLMs under both in-context learning and fine-tuning settings, bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform GPT-4. We also show effective continual learning of tools via a simple experience replay strategy.


File Structure

STE/
├── tool_metadata/: tool-related metadata
├── prompts/: full prompts used
├── saved_results/: prediction results in JSON
│   ├── {*, *_FT, *_ICL}.json: results for the baseline model, tool-enhanced w/ fine-tuning, and tool-enhanced w/ ICL
│   └── CL_round_*.json: continual learning results (one file per round)
├── main.py: main script for STE
├── postprocessing.py: filtering & paraphrasing for tool enhancement
├── evaluation.ipynb: evaluation script and cached evaluation results
├── my_llm.py: helper functions for LLM API calls
└── utils.py: other helper functions

llama-recipes/ (adapted from https://github.com/facebookresearch/llama-recipes/)
├── configs/: configurations for model training
│   ├── training.py: model training-related arguments
│   └── ...
├── ft_datasets/: cached data files for fine-tuning and testing
│   ├── api2neighbors.json: nearest neighbors for each API (based on API description similarity)
│   ├── flan_v2_2k.json: 2k random examples from flan_v2
│   ├── tool_data_train_STE_*.json: distilled tool-specific training data
│   ├── tool_test*.json: test set (w/ retrieved demonstration examples)
│   └── ...
├── inference/: helper functions for model inference
├── sysmsg_dir/: system messages for tool and non-tool mode
├── jobs/: example bash scripts for training/inference
├── llama_finetuning.py: script for model training
├── data_proc_format.py: data formatting/merging for model training
└── demo_retrieve.ipynb: nearest-neighbor demonstration retrieval

Environment Setup

Put your OpenAI API key in api_key.txt in the parent directory.

  • For STE/, install ToolBench and BMTools, acquire the associated API keys following their respective instructions, and then run:
pip install -r requirements.txt
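
The entry scripts read this key at startup. A minimal sketch of what that loading can look like, assuming the key file sits one directory above (the exact code lives in my_llm.py and may differ):

# Sketch (assumption): load the OpenAI API key from api_key.txt in the
# parent directory and register it with the openai client.
from pathlib import Path
import openai

openai.api_key = Path("../api_key.txt").read_text().strip()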

Exploration w/ STE

cd STE
python main.py \
    --model_ckpt gpt-3.5-turbo-16k-0613 \
    --num_episodes 15 \
    --num_stm_slots 4 \
    --max_turn 4 \
    --dir_write <your_directory_to_write> \
    --rapidapi_key <your_rapidapi_key> \
    --if_visualize
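
The flags map onto the exploration mechanisms from the paper: num_episodes episodes per API, num_stm_slots short-term-memory slots (trials) per episode, and max_turn interaction turns per trial. A rough, hypothetical sketch of that loop in Python (illustrative only, not the actual main.py code):

# Hypothetical sketch of STE's exploration loop; names are illustrative, not
# the actual main.py implementation. `llm` is any object exposing the two
# assumed methods used below.
def explore_api(api_name, llm, num_episodes=15, num_stm_slots=4, max_turn=4):
    long_term_memory = []                    # trials kept across episodes
    for _ in range(num_episodes):
        short_term_memory = []               # trials within the current episode
        for _ in range(num_stm_slots):
            # "Imagination": invent a plausible user query, conditioned on memory.
            query = llm.imagine_query(api_name, short_term_memory, long_term_memory)
            # Trial and error: interact with the real API for up to max_turn turns.
            trial = llm.try_api(api_name, query, max_turns=max_turn)
            short_term_memory.append(trial)
        long_term_memory.extend(short_term_memory)  # commit episode to long-term memory
    return long_term_memory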

Custom tool

For STE with custom APIs, simply append the API names and descriptions to API_list.json and API_descriptions.json in tool_metadata/, and extend the run_tool function in main.py so the newly added tools can be executed (see the sketch below).
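
A minimal sketch of such an extension, assuming run_tool dispatches on the API name; the real signature in main.py may differ, and the endpoint below is a placeholder:

# Hypothetical extension of run_tool; the actual signature in main.py may differ.
import requests

def run_tool(api_name: str, action_input: dict) -> str:
    if api_name == "my_weather_api":             # newly added custom tool
        resp = requests.get(
            "https://example.com/weather",       # placeholder endpoint
            params=action_input,
            timeout=10,
        )
        return resp.text
    raise ValueError(f"Unknown API: {api_name}")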

Exploitation w/ STE

Data preparation

python postprocessing.py \
    --directory <your_directory_to_write> \
    --filter_model_ckpts gpt-4-8k \
    --paraphrase_model_ckpts gpt-3.5-turbo-16k-0613 \
    --target_num_train_per_API 150 \
    --num_para_train_max 6 \
    --save_file_name <your_save_file_name> \
    --if_visualize
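
Conceptually, this step filters out episodes the filter model judges invalid and paraphrases the surviving queries until each API has about target_num_train_per_API training examples, with at most num_para_train_max paraphrases per original. A hypothetical sketch of that logic (the authoritative version is postprocessing.py):

# Hypothetical filter-then-paraphrase sketch; `judge` and `paraphraser` are
# illustrative stand-ins for the filter/paraphrase model checkpoints above,
# and episodes are assumed to be (query, api_call) pairs.
def augment_examples(episodes, judge, paraphraser, target_num=150, max_para=6):
    valid = [ep for ep in episodes if judge.is_valid(ep)]   # drop failed trials
    out = list(valid)
    for query, api_call in valid:
        if len(out) >= target_num:
            break
        # Paraphrase the user query; keep the verified API call as the label.
        for para in paraphraser.paraphrase(query, n=max_para):
            out.append((para, api_call))
            if len(out) >= target_num:
                break
    return out[:target_num]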

Fine-tuning & Inference

cd llama-recipes/
python data_proc_format.py \
    --tool_file <your_save_file_name> \
    --data_save_dir <your_data_directory> \
    --no_general \
    --add_tool_response

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py \
    --enable_fsdp \
    --model_name <your_model_directory> \
    --num_epochs 2 \
    --batch_size_training 16 \
    --micro_batch_size 1 \
    --val_batch_size 8 \
    --lr 2e-5 \
    --num_workers_dataloader 1 \
    --seed 42 \
    --data_path <your_data_directory> \
    --max_words_dataset 2048 \
    --checkpoint_folder <your_directory_to_save> \
    --save_with_hf \
    --warmup_ratio 0.03 \
    --save_epoch_interval 1 \
    --add_token_list ft_datasets/toolken_list_50.json
    
CUDA_VISIBLE_DEVICES=0 python inference/inference_chat.py \
    --model_name <your_model_directory> \
    --data_path ft_datasets/tool_test.json \
    --save_path <your_save_directory> \
    --item_type query \
    --sys_msg_dir sys_msg_dir/sysmsg_tool.json \
    --quantization
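
Because --save_with_hf stores the checkpoint in Hugging Face format, a quick sanity check with plain transformers should work; a sketch, where the path is the --checkpoint_folder value from the training command:

# Sketch: sanity-check a checkpoint saved with --save_with_hf.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "<your_directory_to_save>"  # the --checkpoint_folder from training
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

inputs = tokenizer("What's the weather in Columbus today?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))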

ICL

First run demo_retrieve.ipynb to prepare retrieved demonstration examples.
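
Retrieval is nearest-neighbor over embedding similarity (api2neighbors.json is likewise built from API description similarity). A minimal sketch of the idea using sentence-transformers; the embedding model and similarity metric actually used in demo_retrieve.ipynb are assumptions here:

# Minimal nearest-neighbor demonstration retrieval sketch; the embedding model
# and similarity metric in demo_retrieve.ipynb may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_demos(test_query, demo_pool, k=4):
    query_emb = model.encode(test_query, convert_to_tensor=True)
    pool_emb = model.encode(demo_pool, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_emb)[0]
    top = scores.topk(k).indices.tolist()
    return [demo_pool[i] for i in top]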

  • For GPT-3.5/4:
cd STE/
python test_gpt.py \
    --model_ckpt {gpt-35-turbo-16k-0613|gpt-4-0613} \
    --save_name <save_file_name> \
    --setting ICL \
    --if_visualize
  • For models based on Llama/Mistral:
cd llama-recipes/
CUDA_VISIBLE_DEVICES=0 python inference/inference_chat.py \
    --model_name <your_model_directory> \
    --data_path ft_datasets/tool_test_OTR_DR.json \
    --save_path <your_save_directory> \
    --item_type dialog \
    --sys_msg_dir sys_msg_dir/sysmsg_tool.json \
    --quantization

Continual Learning with Rehearsal

For round {0|1|2|3},

cd llama-recipes/
python data_proc_format.py \
    --tool_file ft_datasets/tool_data_batches.json \
    --batch_id {0|1|2|3} \
    --data_save_dir ft_datasets/CL_round_{0|1|2|3}.json \
    --general_data_file ft_datasets/flan_v2_2k.json \
    --add_tool_response \
    [--no_replay]
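
With replay enabled, each round's training set mixes the new tool batch with data replayed from earlier rounds, plus the general flan_v2 examples; passing the optional --no_replay flag drops the replayed portion. A hypothetical sketch of that mixing (the actual merging lives in data_proc_format.py):

# Hypothetical rehearsal-style data mixing for round `batch_id`; the exact
# merging logic in data_proc_format.py may differ.
import json

def build_round_data(batches_file, general_file, batch_id, replay=True):
    with open(batches_file) as f:
        batches = json.load(f)            # assumed: one list of examples per round
    with open(general_file) as f:
        general = json.load(f)            # general (non-tool) instruction data
    data = list(batches[batch_id]) + list(general)
    if replay:                            # experience replay: revisit earlier rounds
        for past in batches[:batch_id]:
            data.extend(past)
    return data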

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py \
    --enable_fsdp \
    --model_name <your_model_directory> \
    --num_epochs 2 \
    --batch_size_training 16 \
    --micro_batch_size 1 \
    --val_batch_size 8 \
    --lr 2e-5 \
    --num_workers_dataloader 1 \
    --seed 42 \
    --data_path ft_datasets/CL_round_{0|1|2|3}.json \
    --max_words_dataset 2048 \
    --checkpoint_folder <your_directory_to_save_round_{0|1|2|3}> \
    --save_with_hf \
    --warmup_ratio 0.03 \
    --save_epoch_interval 1 \
    --add_token_list ft_datasets/toolken_list_50.json

Evaluation

STE/evaluation.ipynb includes the evaluation scripts and cached evaluation results for all prediction files in STE/saved_results/.
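
For reference, a hypothetical sketch of computing a correctness rate over one of those prediction files; the JSON field shown is an assumed schema, and the notebook remains authoritative:

# Hypothetical correctness-rate computation; the "correct" field is an assumed
# schema, and evaluation.ipynb is the authoritative implementation.
import json

def correctness_rate(pred_file):
    with open(pred_file) as f:
        preds = json.load(f)
    return sum(1 for p in preds if p.get("correct")) / len(preds)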

Citation

@misc{wang2024llms,
      title={LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error}, 
      author={Boshi Wang and Hao Fang and Jason Eisner and Benjamin Van Durme and Yu Su},
      year={2024},
      eprint={2403.04746},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/pdf/2403.04746.pdf}
}


simulated-trial-and-error's Issues


Will the collected data be released?

Hi, dear authors. This is nice work, and I believe it will benefit the tool-learning community.

By the way, will the collected data be released to facilitate subsequent research?

Two questions about the original paper

Thank you for your interesting work! After reading your paper, I have a few small questions I would like to confirm with you:

  1. Is the main purpose of STE to provide a set of high-quality tool use examples, and then augment LLMs with tools, under both in-context learning and fine-tuning settings?
  2. In Table 1, are the baselines directly calling the corresponding LLMs? We know that ICL can improve the tool-invocation accuracy of LLMs, but I didn't see any ICL-based baselines.

All in all, I think STE is a GOOD paper. Looking forward to your further explanation :D.

Error Occurring at loss.backward() Despite Loss Being Calculable

Hello,

I am encountering an issue when running the following code snippet:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py \
    --enable_fsdp \
    --model_name <your_model_directory> \
    --num_epochs 2 \
    --batch_size_training 16 \
    --micro_batch_size 1 \
    --val_batch_size 8 \
    --lr 2e-5 \
    --num_workers_dataloader 1 \
    --seed 42 \
    --data_path <your_data_directory> \
    --max_words_dataset 2048 \
    --checkpoint_folder <your_directory_to_save> \
    --save_with_hf \
    --warmup_ratio 0.03 \
    --save_epoch_interval 1 \
    --add_token_list ft_datasets/toolken_list_50.json

This results in the following error:

File "/mnt/home/xlh/code/simulated-trial-and-error-main/simulated-trial-and-error-main/llama-recipes/utils/train_utils.py", line 94, in train
loss.backward()
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/_tensor.py", line 522, in backward
torch.autograd.backward(
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/autograd/init.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1112, in unpack_hook
frame.recompute_fn(*args)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1401, in recompute_fn
fn(*args, **kwargs)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 741, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/miniconda3/envs/ste/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 671, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (4096) must match the existing size (2048) at non-singleton dimension 3. Target sizes: [1, 32, 2048, 4096]. Tensor sizes: [1, 1, 2048, 2048]

The error occurs at loss.backward(), but the loss value is computed successfully and can be printed. I would appreciate any insights or suggestions on possible causes for this error.

Thank you for your help!
