mathblackbox's Introduction

MCTSr: Mathematic as a Blackbox for LLM

Testing with GPT-4o-mini

Adapted from here.

Prep

Set the OPENAI_API_KEY in your environment

export OPENAI_API_KEY=<key>
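
As an optional sanity check before a long benchmark run, a script can confirm that the key is visible to Python. This is a minimal sketch; `require_api_key` is a hypothetical helper for illustration and is not part of this repository.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of this repository.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running a benchmark")
    return key
```

Calling `require_api_key()` at the top of a run script fails early with a readable message instead of partway through the benchmark.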

Create the Python virtual environment

python -m venv venv

Run a benchmark, say, gsm8k

python run_with_earlystopping.py gpt-4o-mini gsm8k-gpt-4o-mini

Results are saved in the folder named by the second argument above.

Check the accuracy

python anal.py gsm8k-gpt-4o-mini gsm8k-gpt-4o-mini
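
For orientation, the sketch below shows one common way to score numeric answers on a benchmark like gsm8k: take the last number in the model output and in the reference answer, and count the pair as correct when they match. It is an illustration under stated assumptions, not a description of what anal.py actually does; `last_number` and `accuracy` are hypothetical names, and the layout of the saved results is not assumed here.

```python
import re

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(pairs):
    """Fraction of (predicted, reference) pairs whose final numbers agree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = 0
    for predicted, reference in pairs:
        p, r = last_number(predicted), last_number(reference)
        if p is not None and r is not None and abs(p - r) < 1e-6:
            correct += 1
    return correct / len(pairs)
```

The "last number" rule is a simplification; answers written as fractions or ranges would need extra handling.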

Previous instructions below

Server

The server side needs vLLM or another OpenAI-compatible serving method.

pip install vllm

Clients

The client side needs the Hugging Face toolkit (datasets, transformers) and the OpenAI client for inference.

pip install datasets transformers openai

Usage

The script relies on Slurm. If you run it in a non-Slurm environment, use vLLM to create an OpenAI-compatible server and add an entry for it to 'server.csv':

IP,PORT,MODEL_NAME
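
To make the row format concrete, here is a minimal sketch of reading such a file and turning each row into an endpoint description. The assumptions are labeled in the code: one server per row, no header row, and a base URL of the form `http://IP:PORT/v1`, which is the path vLLM's OpenAI-compatible server normally exposes. `load_servers` is a hypothetical helper; the repository's own loading code may differ.

```python
import csv
import io


def load_servers(csv_text):
    """Parse IP,PORT,MODEL_NAME rows into a list of endpoint dicts.

    Assumes one server per row and no header row (an assumption; check your file).
    """
    servers = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or incomplete lines
        ip, port, model = (field.strip() for field in row[:3])
        servers.append({
            # Assumed URL shape for an OpenAI-compatible endpoint.
            "base_url": f"http://{ip}:{port}/v1",
            "model": model,
        })
    return servers
```

Each `base_url` can then be passed to the OpenAI Python client through its `base_url` parameter, with `model` used as the model name in requests.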

If you run it in a Slurm environment, change the partition name in make_n_server.py to your own partition.

Then you can run run_with_earlystopping.py on a dataset:

python run_with_earlystopping.py MODEL_NAME DATA_DIR_NAME

Support Datasets

The dataset is selected by the first part of the DATA_DIR_NAME argument, for example gsm8k-llama3-8b-new-mcts-8 for gsm8k. Supported values:

    'gsm8k-llama3-8b-new-mcts-8',
    'gsmhard-llama3-8b-new-mcts-8',
    'olympiadbench-llama3-8b-new-mcts-8',
    'GAIC-llama3-8b-new-mcts-8',
    'MATH-llama3-8b-new-mcts-8',
    'AIME-llama3-8b-mcts-2'

Use run_olympics.py to run all of them.

Warning: this takes a long time.
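
The naming convention for DATA_DIR_NAME can be sketched as follows. This assumes the dataset name is the text before the first hyphen, which is consistent with the examples listed in this section; the actual scripts may match names differently. `SUPPORTED_DATASETS` and `dataset_from_dir_name` are hypothetical names used only for illustration.

```python
# Dataset prefixes taken from the directory names listed in this README.
SUPPORTED_DATASETS = ("gsm8k", "gsmhard", "olympiadbench", "GAIC", "MATH", "AIME")


def dataset_from_dir_name(data_dir_name):
    """Return the dataset prefix of a run directory name.

    Assumes the prefix is everything before the first hyphen (an assumption).
    """
    prefix = data_dir_name.split("-", 1)[0]
    if prefix not in SUPPORTED_DATASETS:
        raise ValueError(
            f"unknown dataset prefix {prefix!r}; expected one of {SUPPORTED_DATASETS}"
        )
    return prefix
```

Under this assumption, the rest of the name (model, method, run index) is free-form and only serves to keep result folders apart.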

Citation

@misc{zhang2024accessing,
      title={Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B}, 
      author={Di Zhang and Xiaoshui Huang and Dongzhan Zhou and Yuqiang Li and Wanli Ouyang},
      year={2024},
      eprint={2406.07394},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Disclaimer

This project is still at a very early, exploratory stage. Pay attention to the algorithm's output, and do not deploy it in a real-world product without thorough testing.

Read More

https://arxiv.org/abs/2406.07394

Re-implementations

https://github.com/BrendanGraham14/mcts-llm
