Transformer Debugger

Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to support investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.

TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
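
To make the "token A instead of token B" question concrete, here is a minimal sketch of one common way to measure a component's contribution: zero-ablate it during the forward pass and compare the resulting logit difference with the baseline. This is a toy illustration, not TDB's implementation or API. The component names, the `COMPONENT_WRITES` table, and the functions `forward`, `logit_diff`, and `ablation_effects` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical toy "model": each component writes a fixed amount to the
# logits of token A and token B, scaled by its activation. This is NOT GPT-2.
COMPONENT_WRITES: Dict[str, Dict[str, float]] = {
    "neuron_0": {"A": 2.0, "B": 0.5},
    "neuron_1": {"A": 0.25, "B": 1.0},
    "head_0": {"A": 1.0, "B": 1.0},
}


def forward(activations: Dict[str, float], ablate: Optional[str] = None) -> Dict[str, float]:
    """Compute toy logits, optionally zero-ablating one named component."""
    logits = {"A": 0.0, "B": 0.0}
    for name, act in activations.items():
        if name == ablate:
            act = 0.0  # the intervention: this component contributes nothing
        for token, weight in COMPONENT_WRITES[name].items():
            logits[token] += act * weight
    return logits


def logit_diff(logits: Dict[str, float]) -> float:
    """How strongly the toy model prefers token A over token B."""
    return logits["A"] - logits["B"]


def ablation_effects(activations: Dict[str, float]) -> Dict[str, float]:
    """For each component, the drop in logit difference caused by ablating it."""
    baseline = logit_diff(forward(activations))
    return {
        name: baseline - logit_diff(forward(activations, ablate=name))
        for name in activations
    }


activations = {"neuron_0": 2.0, "neuron_1": 2.0, "head_0": 3.0}
effects = ablation_effects(activations)
# Rank components by the size of their effect on the A-vs-B preference.
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
```

In this toy setup, `neuron_0` pushes toward token A, `neuron_1` pushes toward token B, and `head_0` has no effect on the difference because it writes equally to both logits.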

These videos give an overview of TDB and show how it can be used to investigate indirect object identification in GPT-2 small:

What's in the release?

  • Neuron viewer: A React app that hosts TDB as well as pages with information about individual model components (MLP neurons, attention heads and autoencoder latents for both).
  • Activation server: A backend server that performs inference on a subject model to provide data for TDB. It also reads and serves data from public Azure buckets.
  • Models: A simple inference library for GPT-2 models and their autoencoders, with hooks to grab activations.
  • Collated activation datasets: top-activating dataset examples for MLP neurons, attention heads and autoencoder latents.
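
As a rough illustration of how hooks that grab activations can feed a collection of top-activating examples, the sketch below registers a callback on a stand-in model and ranks dataset examples by their peak activation. It assumes nothing about the actual inference library: `ToyModel`, `add_hook`, the site name `mlp.neuron_0`, and `top_activating_examples` are hypothetical names chosen for this example.

```python
from typing import Callable, List, Tuple

# A hook receives the name of an activation site and its per-token activations.
Hook = Callable[[str, List[float]], None]


class ToyModel:
    """Hypothetical stand-in for a model that exposes named activation sites."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []

    def add_hook(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def run(self, tokens: List[str]) -> None:
        # Fake "neuron": its activation on each token is the token's length.
        acts = [float(len(token)) for token in tokens]
        for hook in self._hooks:
            hook("mlp.neuron_0", acts)


def top_activating_examples(
    model: ToyModel, dataset: List[List[str]], site: str, k: int
) -> List[Tuple[float, List[str]]]:
    """Return the k examples with the highest peak activation at `site`."""
    captured: List[List[float]] = []

    def hook(name: str, acts: List[float]) -> None:
        if name == site:
            captured.append(list(acts))  # copy so later runs cannot mutate it

    model.add_hook(hook)
    for tokens in dataset:
        model.run(tokens)

    # Pair each example with its maximum activation, then sort descending.
    scored = [(max(acts), tokens) for acts, tokens in zip(captured, dataset)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


dataset = [["a", "bb"], ["hello", "x"], ["abc", "abcd"]]
top = top_activating_examples(ToyModel(), dataset, site="mlp.neuron_0", k=2)
```

A real pipeline would run a language model over a large text corpus and store the results per component. The hook-then-rank pattern is the only part this sketch is meant to convey.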

Setup

Follow these steps to install the repo. You'll first need python/pip, as well as node/npm.

Though optional, we recommend you use a virtual environment or equivalent:

# If you're already in a venv, deactivate it.
deactivate
# Create a new venv.
python -m venv ~/.virtualenvs/transformer-debugger
# Activate the new venv.
source ~/.virtualenvs/transformer-debugger/bin/activate

Once your environment is set up, run the following commands:

git clone git@github.com:openai/transformer-debugger.git
cd transformer-debugger

# Install neuron_explainer
pip install -e .

# Set up the pre-commit hooks.
pre-commit install

# Install neuron_viewer.
cd neuron_viewer
npm install
cd ..

To run the TDB app, you'll then need to follow the instructions to set up the activation server backend and neuron viewer frontend.

Making changes

To validate changes:

  • Run pytest
  • Run mypy --config=mypy.ini .
  • Run the activation server and neuron viewer, and confirm that basic functionality such as the TDB and neuron viewer pages still works

How to cite

Please cite as:

Mossing, et al., “Transformer Debugger”, GitHub, 2024.

BibTeX citation:

@misc{mossing2024tdb,
  title={Transformer Debugger},
  author={Mossing, Dan and Bills, Steven and Tillman, Henk and Dupré la Tour, Tom and Cammarata, Nick and Gao, Leo and Achiam, Joshua and Yeh, Catherine and Leike, Jan and Wu, Jeff and Saunders, William},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/openai/transformer-debugger}},
}
