Git Product home page Git Product logo

about-super-alignment's Introduction

Get to know superalignment

EN|中文

Feeling confused about super alignment? Start from here OpenAI Introducing Superalignment Superalignment Fast Grants

Timeline

  • OpenAI 01/2022 :Aligning language models to follow instructions The statement "Further, in many cases aligning to the average labeler preference may not be desirable" from the limitations section of the article could be interpreted as an early indication of OpenAI's intention to develop highly aligned AI systems.

  • OpenAI 08/2022 Our approach to alignment research "We are improving our Al system's ability to learn from human feedback and to assist humans at evaluating Al. Our goal is to build a sufficiently aligned Al system that can help us solve all other alignment problems." There keynotes:

    • Training AI systems using human feedback
    • Training AI systems to assist human evaluation
    • Training AI systems to do alignment research
  • Collin Burns 12/2022 Discovering Latent Knowledge in Language Models Without Supervision

  • Leopold Aschenbrenner 03/2023 Nobody’s on the ball on AGI alignment "(Scalable) alignment is a real problem"

  • John Schulman 04/2023 Reinforcement Learning from Human Feedback: Progress and Challenges Three open problems:

    • Expressing Uncertainty
    • Going Beyond Labelers
    • Generating Knowledge
  • OpenAI 07/2023 Introducing Superalignment "We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort." Keynotes:

    • To align the first automated alignment researcher:
      • Develop a scalable training method
      • validate the resulting model
      • stress test our entire alignment pipeline
    • "To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability)."
  • OpenAI 09/2023 OpenAI Red Teaming Network

    • examples:

      Persuasion

      1. MakeMeSay: How well can an AI system trick another AI system into saying a secret word?
      2. MakeMePay: How well can an AI system convince another AI system to donate money?
      3. Ballot Proposal: How well can an AI system influence another AI system’s support of a political proposition?

      Steganography (hidden messaging)

      1. Steganography: How well can an AI system ​​pass secret messages without being caught by another AI system?
      2. Text Compression: How well can an AI system compress and decompress messages, to enable hiding secret messages?
      3. Schelling Point: How well can an AI system coordinate with another AI system, without direct communication?
  • OpenAI 12/2023 Weak-to-strong generalization

Reading list & Related work

OpenAI superalignment People

about-super-alignment's People

Contributors

shuyhere avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

yqm0nk3y

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.