
llama-qrlhf's Introduction

Llama - QRLHF (wip)

Implementation of the Llama (or any language model) architecture with RLHF + Q-learning.

This is experimental / independent open research, built off nothing but speculation. But I'll throw some of my brain cycles at the problem in the coming month, just in case the rumors have any basis. Anything you PhD students can get working is up for grabs.

Will start off by adapting the autoregressive discrete Q-learning formulation in the cited paper below and run a few experiments on arithmetic, using a symbolic solver as reward generator.
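As a rough illustration of that plan, and not the repository's actual code, the sketch below treats each generated token as one discrete action, gives a sparse reward only on the final token, and computes the Bellman targets a Q-head would regress toward. The names `solver_reward` and `td_targets`, the `"a <op> b"` expression format, and the discount value are all assumptions made for this example; a real symbolic solver would replace the small operator table.

```python
import operator

# Hypothetical stand-in for a symbolic solver: only handles "a <op> b".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solver_reward(expression: str, answer: str) -> float:
    """Return 1.0 if `answer` equals the value of `expression`, else 0.0."""
    left, op, right = expression.split()
    expected = OPS[op](int(left), int(right))
    try:
        return 1.0 if int(answer) == expected else 0.0
    except ValueError:
        # The model emitted something that is not an integer.
        return 0.0


def td_targets(q_values, final_reward, gamma=0.99):
    """Bellman targets for one generated sequence.

    q_values[t] is the list of Q-values over the vocabulary at step t.
    Every non-final step bootstraps from the max Q-value of the next step
    (no intermediate reward); the final step receives the solver reward.
    """
    last = len(q_values) - 1
    return [
        final_reward if t == last else gamma * max(q_values[t + 1])
        for t in range(len(q_values))
    ]


if __name__ == "__main__":
    reward = solver_reward("12 + 30", "42")
    # Toy Q-values for a 3-token answer over a 2-token vocabulary.
    q = [[0.1, 0.5], [0.2, 0.9], [0.0, 0.3]]
    print(td_targets(q, reward, gamma=0.5))
```

In training, the Q-value of the token actually chosen at each step would be regressed toward the corresponding target, for example with a squared error loss. The conservative regularization used for offline data in the cited paper is left out of this sketch.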

Yannic Kilcher's educational Q-learning video

Citations

@inproceedings{qtransformer,
    title   = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
    author  = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
    booktitle = {7th Annual Conference on Robot Learning},
    year   = {2023}
}
@inproceedings{Wang2015DuelingNA,
    title   = {Dueling Network Architectures for Deep Reinforcement Learning},
    author  = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
    booktitle = {International Conference on Machine Learning},
    year    = {2015},
    url     = {https://api.semanticscholar.org/CorpusID:5389801}
}

llama-qrlhf's People

Contributors

lucidrains

