
RLMEC

This repo provides the source code & data of our paper: Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint (arXiv 2024)

Framework

Based on sampled LLM solutions that contain errors, we train the generative reward model on the erroneous-solution rewriting task, using data distilled from the teacher model under the minimum editing constraint. Then, we perform RL training on the policy model (i.e., our LLM) with fine-grained supervision, using the token-level RL objective and imitation-based regularization.
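The token-level objective described above can be sketched as a reward-weighted negative log-likelihood plus an imitation regularizer toward the reward model's rewrite. The following is a minimal plain-Python illustration, assuming per-token rewards and the policy's token probabilities are already available; the function name, argument layout, and exact loss form are our assumptions, not the repository's implementation:

```python
import math

def rlmec_loss(token_probs, token_weights, imitation_probs, beta=0.1):
    """Sketch of a token-level RL objective (illustrative, not the repo's code).

    token_probs:     policy probability of each token in the sampled solution.
    token_weights:   fine-grained, token-level reward weights for those tokens.
    imitation_probs: policy probability of each token of the reward model's
                     rewritten solution (the imitation target).
    """
    # Reward-weighted negative log-likelihood: tokens with larger weights
    # contribute more to the update.
    rl_term = -sum(w * math.log(p) for w, p in zip(token_weights, token_probs))
    # Imitation-based regularization: plain negative log-likelihood of the
    # rewritten (corrected) solution under the policy.
    imitation_term = -sum(math.log(p) for p in imitation_probs)
    return rl_term + beta * imitation_term

loss = rlmec_loss([0.9, 0.8], [1.0, 0.5], [0.9, 0.7])  # ≈ 0.263
```

Here `beta` trades off the fine-grained RL signal against staying close to the corrected solution.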

Usage

To run RLMEC, you need to set your Claude 2 API key in miniedit_tm_qa.py, and set the model path and data path in the scripts.

The procedure of RLMEC is as follows:

  1. Utilize the policy model to generate samples.

         cd process_data/Gen_Samples
         bash scripts/run_inference_qa.sh
         cd ../../

  2. Distill samples from the teacher model (i.e., Claude 2).

         cd process_data/Minimum_Edit
         bash scripts/run_miniedit_tm_qa.sh
         cd ../../

  3. Process the data to train the generative reward model.

         cd process_data/Gen_Training_Data
         python gen_grm_data_qa.py
         cd ../../

  4. Utilize the distilled samples to train the generative reward model.

         cd train
         bash scripts/run_train_grm.sh
         cd ..

  5. Utilize the well-trained generative reward model to rewrite the generated samples.

         cd process_data/Minimum_Edit
         bash scripts/run_miniedit_grm_qa.sh
         cd ../../

  6. Utilize the generative reward model to generate the fine-grained supervision signals.

         cd process_data/Gen_Training_Data
         python gen_rlmec_data_qa.py
         cd ../../

  7. Utilize the fine-grained supervision signals to train the policy model.

         cd train
         bash scripts/run_grain_rlmec.sh
         cd ../

  8. Evaluate the well-trained policy model on the mathematical and question-answering tasks.

         cd evaluate/QA
         bash run.sh
         cd ../Math
         bash scripts/run_eval.sh
         cd ../../
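Step 6 derives fine-grained supervision by comparing each sampled solution against the reward model's rewritten version. A minimal sketch of how token-level signals could be read off a minimum-edit alignment is below; the function name, the use of `difflib`, and the keep/edit reward values are illustrative assumptions, not the repository's exact implementation:

```python
from difflib import SequenceMatcher

def token_rewards(sampled, rewritten, keep=1.0, edit=-1.0):
    """Assign a reward to each token of the sampled solution:
    tokens the rewrite keeps get `keep`; tokens it edits or deletes
    get `edit`. Illustrative scheme, not the paper's exact one."""
    rewards = [edit] * len(sampled)
    matcher = SequenceMatcher(a=sampled, b=rewritten, autojunk=False)
    # Tokens inside matching blocks survive the minimum edit unchanged.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            rewards[i] = keep
    return rewards

sampled = ["2", "+", "2", "=", "5"]
rewritten = ["2", "+", "2", "=", "4"]
print(token_rewards(sampled, rewritten))  # only the final, edited token is penalized
```

Because the rewrite obeys the minimum editing constraint, the alignment is sparse: most tokens are kept, so the supervision pinpoints only the erroneous spans.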

Citation

If you find that RLMEC is helpful for your work, please cite the following paper.

@InProceedings{Chen-RLMEC-2024,
      title={Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint}, 
      author={Zhipeng Chen and Kun Zhou and Wayne Xin Zhao and Junchen Wan and Fuzheng Zhang and Di Zhang and Ji-Rong Wen},
      year={2024},
      eprint={2401.06081},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributors

  • timothy023
