
buffer-of-thought-llm's Introduction

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models


This repository contains the official implementation of our Buffer of Thoughts (BoT) framework. Affiliations: Peking University, UC Berkeley, Stanford University

🚩 New Updates

  • Release initial code of BoT, supporting GPT-4 and Llama3-70B [2024.6.6]
  • Update the code for smaller LLMs (e.g., Llama3-8B) [2024.6.24]
  • Release the meta-buffer and buffer-manager
  • Extend BoT to more applications

Introduction

We introduce BoT, a novel and versatile thought-augmented reasoning approach designed to enhance the accuracy, efficiency, and robustness of large language models (LLMs). Specifically, we propose a meta-buffer to store a series of high-level thoughts, referred to as thought-templates, distilled from problem-solving processes across various tasks. For each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To ensure scalability and stability, we also propose a buffer-manager to dynamically update the meta-buffer, thus enhancing its capacity as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, achieving significant performance improvements over previous state-of-the-art (SOTA) methods: 11% on Game of 24, 20% on Geometric Shapes, and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and robustness of our BoT, while requiring only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that our Llama3-8B + BoT has the potential to surpass the Llama3-70B model.
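To make the workflow concrete, the following minimal Python sketch shows how the retrieve-instantiate-update loop described above could be wired together. It is illustrative only; the names (ThoughtTemplate, MetaBuffer, call_llm, bot_solve) are hypothetical placeholders and do not correspond to this repository's actual API.

# Illustrative sketch of the BoT loop (hypothetical names, not the repo's API).
from dataclasses import dataclass, field

@dataclass
class ThoughtTemplate:
    task_type: str            # e.g. "gameof24", "checkmate", "wordsorting"
    description: str          # high-level, task-agnostic solution strategy
    reasoning_structure: str  # skeleton the LLM instantiates for a concrete problem

@dataclass
class MetaBuffer:
    templates: list = field(default_factory=list)

    def retrieve(self, problem: str) -> ThoughtTemplate:
        # In the paper, retrieval matches the distilled problem against stored
        # templates; here we simply return the first entry as a placeholder.
        return self.templates[0]

    def update(self, problem: str, solution: str) -> None:
        # The buffer-manager distills a new high-level template from the solved
        # problem and stores it if it is sufficiently novel (omitted here).
        ...

def call_llm(prompt: str) -> str:
    # Placeholder for a call to GPT-4 or a local Llama3 model.
    raise NotImplementedError

def bot_solve(problem: str, buffer: MetaBuffer) -> str:
    template = buffer.retrieve(problem)            # problem distillation + retrieval
    prompt = f"{template.reasoning_structure}\n\nProblem: {problem}"
    solution = call_llm(prompt)                    # instantiated reasoning
    buffer.update(problem, solution)               # dynamic meta-buffer update
    return solution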

Overview of our BoT

Comparison between Different Methods

Task/Method            GPT-4   PAL    ToT    Meta Prompting   BoT (Ours)
Game of 24              3.0    64.0   74.0        67.0           82.4
MGSM (avg)              84.4    72.0   86.4        84.8           89.2
Multi-Step Arithmetic   84.0    87.4   88.2        90.0           99.8
Word Sorting            80.4    93.2   96.4        99.6          100.0
Python Puzzles          31.1    47.3   43.5        45.8           52.4
Geometric Shapes        52.6    51.2   56.8        78.2           93.6
Checkmate-in-One        36.4    10.8   49.2        57.0           86.4
Date Understanding      68.4    76.2   78.6        79.2           88.2
Penguins                71.1    93.3   84.2        88.6           94.7
Sonnet Writing          62.0    36.2   68.4        79.6           80.0

Evaluation with Buffer of Thoughts

1. Benchmarks

For now, we release our demo version of BoT based on three different benchmarks:

  • Game of 24
  • Checkmate-in-One
  • Word Sorting

2. Meta Buffer

For each task, we choose one thought template sampled from our meta-buffer library. Stay tuned for our complete meta-buffer library update!
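As a purely hypothetical illustration (not one of the repository's actual templates), a meta-buffer entry for Game of 24 might capture a generic enumeration strategy along these lines:

# Hypothetical illustration of a meta-buffer entry; not the repository's actual template.
game_of_24_template = {
    "task": "gameof24",
    "thought_template": (
        "Enumerate all orderings of the four numbers and all placements of "
        "+, -, *, / with parentheses; evaluate each candidate expression and "
        "return the first one that equals 24, using every number exactly once."
    ),
}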

3. Quick Start

First, set up the environment:

git clone https://github.com/YangLing0818/buffer-of-thought-llm
cd buffer-of-thought-llm
conda create -n BoT python==3.9
conda activate BoT
pip install -r requirements.txt

3.1. Running on Three Benchmarks

Our BoT is easy to use. Just run:

python run_benchmarks.py --task_name 'gameof24' --api_key 'input your API key here if you want to use GPT-4' --model_id 'the model ID of GPT-4 or the path to your local LLM'

Here, --task_name could be one of gameof24, checkmate, wordsorting.

The --api_key is required if you want to use GPT-series models; otherwise, you can skip it.

The --model_id should be the model ID of a GPT-series model, such as gpt-4o or gpt-4-turbo, or the path to your local LLM if you do not set --api_key.
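For example (the API key and local model path below are placeholders, not real values):

python run_benchmarks.py --task_name 'gameof24' --api_key 'sk-xxxx' --model_id 'gpt-4o'
python run_benchmarks.py --task_name 'checkmate' --model_id '/path/to/your/local/llama3-8b-instruct'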

The data for these three tasks are located in the /benchmarks directory.

The results generated during the experiment are stored in the /test_results directory.
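Each line of these .jsonl files pairs a benchmark input with the model's output. As an assumption (field names not verified against the code), a line looks roughly like:

{"input": "<benchmark input>", "result": "<model response>"}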

3.2. Validate the Test Results

Run the command below to validate the test results of our BoT:

python validate_results.py --task_name 'gameof24' --test_path 'The path to the .jsonl file you want to validate'

This will print out the accuracy of the selected task on your relevant .jsonl file.
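For example, assuming the Game of 24 run above wrote its output to test_results/BoT_gameof24.jsonl (the exact file name may differ):

python validate_results.py --task_name 'gameof24' --test_path 'test_results/BoT_gameof24.jsonl'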

📖 BibTeX

@article{yang2024buffer,
  title={Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models},
  author={Yang, Ling and Yu, Zhaochen and Zhang, Tianjun and Cao, Shiyi and Xu, Minkai and Zhang, Wentao and Gonzalez, Joseph E and Cui, Bin},
  journal={arXiv preprint arXiv:2406.04271},
  year={2024}
}

buffer-of-thought-llm's People

Contributors

bitcodingwalkin, yangling0818


buffer-of-thought-llm's Issues

Large discrepancy in checkmate results

Hello authors, I ran the checkmate benchmark with the Meta-Llama-3-8B-Instruct model and got an accuracy of 4%, which is far from the 56.7% reported in your paper. What might have gone wrong? I made minor modifications to the code in run_benchmarks; my code is below.

#run_benchmarks_modify.py

import json
from bot_pipeline_modify import BoT
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--task_name',type=str,default='gameof24',choices=['gameof24','checkmate','wordsorting'])
parser.add_argument('--api_key',type=str,help='input your api key here')
parser.add_argument('--model_id',type=str,default='gpt-4o',help='Input model id here, if use local model, input the path to the local model')



GameOf24 = """
Let's play a game called 24. You'll be given four integers, and your objective is to use each number only once, combined with any of the four arithmetic operations (addition, subtraction, multiplication, and division) and parentheses, to achieve a total of 24. For example, if the input is 4, 7, 8, and 8, the output could be 7 * 8 - 4 * 8 = 24. You only need to find one feasible solution!
Input:
"""
CheckmateInOne = """
Given a series of chess moves written in Standard Algebraic Notation (SAN), determine the next move that will result in a checkmate.
Input: 
"""
WordSorting = """
Sort a list of words alphabetically, placing them in a single line of text separated by spaces.
Input:
"""


if __name__ == "__main__":
    args = parser.parse_args()
    task = args.task_name
    api_key = args.api_key
    model_id = args.model_id
    benchmark_dict = {
        'gameof24':GameOf24,
        'checkmate':CheckmateInOne,
        'wordsorting':WordSorting
    }
    
    path_dict = {
        'gameof24':'benchmarks/gameof24.jsonl',
        'checkmate':'benchmarks/CheckmateInOne.jsonl',
        'wordsorting':'benchmarks/word_sorting.jsonl'
    }
    
    buffer_dict = {
        'gameof24':0,
        'checkmate':1,
        'wordsorting':2
        
    }
    
    user_prompt = benchmark_dict[task]
    path = path_dict[task]    
    problem_id = buffer_dict[task]

    test_bot = BoT(
        user_input=None,
        problem_id=problem_id,
        api_key=api_key,
        model_id=model_id
    )
    
    for line in (open(path)):
        input = json.loads(line)['input']
        user_input = user_prompt + input

        test_bot.set_problem_id(problem_id=problem_id)  # modification: self.problem_id = problem_id
        test_bot.set_user_input(user_input=user_input)

        result = test_bot.bot_run()
        tmp = {'input':input,'result':result}
        with open(f'test_results/BoT_{task}_modify.jsonl', 'a+', encoding='utf-8') as file:
            json_str = json.dumps(tmp)
            file.write(json_str + '\n')

The run results are attached.
Uploading testResult.zip…

Math problem-solving demo

Your work is quite interesting.

  1. Could you provide an example thought_template for solving math problems?
  2. How should the prompt for generating a math-solving thought_template be written? Could you give an example of that as well?

Thank you very much!

What if I want to set up the conversation for my use case?

I want to know how I can build in my own rules and settings with your BoT framework, for example by providing my own data format, some theoretical formulas, or other logic, and then running it the way you have set up.
That would be wonderful and interesting.

Thought Extractor based on coding repositories

There should be an easy way to ingest a code repository and its git PRs.

Git PRs are prime spaces to implement "thought extractors" or "solution extractors".

For example, a prompt like:

"Given the git diff below, generate a thought process that I will store in a bank of solutions. The thought processes you generate must cover all the bases a software engineer must have gone through to produce the PR.

A thought is basically a solution, e.g.
"Where is the place where we edit the images directory, and how did we solve it?

To solve this question we need to use the
function here_that_solves_the_issue():

<further explanation of the solution and why it logically makes sense; this should be grounded in the
truth from the PR. ONLY use CODE that is FROM the PR so we have 100% accuracy>
"

Here's the context for this PR:

And here's the diff:

Respond in the following manner: here are examples of thoughts / solutions / patterns of solutions.

To add a method that .... we need to add ...
func some_thing_here()

Respond and output all the thoughts you can; I will continue the conversation and ask for more."

ref: #4
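
A minimal sketch of what such an extractor could look like, assuming the OpenAI Python client and a local git checkout; the prompt text, model name, and the JSONL "thought bank" file are illustrative assumptions, not part of this repository:

# thought_extractor_sketch.py -- illustrative only, not part of the BoT repo.
import json
import subprocess

from openai import OpenAI

EXTRACTION_PROMPT = (
    "Given the git diff below, generate high-level thought processes (solution "
    "templates) that explain how the change was made. Ground every statement in "
    "code that appears in the diff.\n\nDiff:\n{diff}"
)

def extract_thoughts_from_diff(diff: str, model: str = "gpt-4o") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(diff=diff)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Diff of the most recent commit; a PR diff could be fetched the same way.
    diff = subprocess.check_output(["git", "diff", "HEAD~1", "HEAD"], text=True)
    thoughts = extract_thoughts_from_diff(diff)
    # Append the extracted template to a simple JSONL "solution bank".
    with open("thought_bank.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps({"source": "HEAD~1..HEAD", "thoughts": thoughts}) + "\n")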

Code gen

I wonder how this performs on code gen
