qiugen / hh-rlhf
This project forked from anthropics/hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Home Page: https://arxiv.org/abs/2204.05862
License: MIT License