One of the most common use-cases for SAELens is to apply a pre-trained SAE visible on


Proposal: Add helper function to apply a SAE or SAEDict to a model about saelens HOT 2 CLOSED

jbloomAus commented on July 30, 2024

Proposal: Add helper function to apply a SAE or SAEDict to a model

from saelens.

Comments (2)

dtch1997 commented on July 30, 2024

#81 seems to implement the core functionality that would be needed here. I've defined an SAEPatcher class to create forward and backward hooks to inject a single SAE into the model and save its features and activations. The implementation matches this description:

always inject the SAE into model computation + error term by default, so the SAE gets gradients, tracks features, while not modifying model output
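For readers who have not seen the error-term trick, here is a minimal sketch of the behaviour described above. It is not the SAEPatcher code from #81: `ToySAE`, `make_sae_hook`, and the `store` dict are hypothetical names, and a plain `nn.Linear` stands in for the hooked model layer. The idea is to return the SAE reconstruction plus a detached error term, so the downstream value is unchanged while gradients still reach the SAE.

```python
import torch
import torch.nn as nn


class ToySAE(nn.Module):
    """Tiny stand-in for a sparse autoencoder (hypothetical, not the SAELens class)."""

    def __init__(self, d_model: int, d_sae: int) -> None:
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(x))

    def decode(self, feats: torch.Tensor) -> torch.Tensor:
        return self.dec(feats)


def make_sae_hook(sae: ToySAE, store: dict):
    """Build a forward hook that splices the SAE into the layer's output."""

    def hook(module, inputs, output):
        feats = sae.encode(output)
        recon = sae.decode(feats)
        # The error term is detached: it restores the original value
        # without providing a gradient path that bypasses the SAE.
        error = (output - recon).detach()
        store["features"] = feats  # keep the feature activations for inspection
        return recon + error  # numerically equal to the original output

    return hook


torch.manual_seed(0)
layer = nn.Linear(8, 8)  # stand-in for a model layer
sae = ToySAE(d_model=8, d_sae=32)
store = {}
x = torch.randn(4, 8)

clean = layer(x)  # output before the hook is attached
handle = layer.register_forward_hook(make_sae_hook(sae, store))
patched = layer(x)  # output with the SAE spliced in
handle.remove()

patched.sum().backward()  # gradients flow into the SAE parameters
```

After this runs, `patched` matches `clean` up to floating-point rounding, `store["features"]` holds the SAE feature activations, and the SAE parameters have gradients.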

Possibly that could be re-written or extended to more closely match the usage you've defined here, e.g.:

Defining a context manager method SAEPatcher.track_activations that returns an ActivationCache
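A rough sketch of what such a context manager could look like, under stated assumptions: this `track_activations` is a free function written for illustration (not an existing SAEPatcher method), it records into a plain dict rather than an ActivationCache, and it hooks a single `nn.Linear` as a stand-in for the model.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def track_activations(module: nn.Module):
    """Hypothetical helper: record a module's output while the block is active."""
    cache = {}

    def hook(mod, inputs, output):
        # Returning None leaves the module's output unchanged.
        cache["output"] = output.detach()

    handle = module.register_forward_hook(hook)
    try:
        yield cache
    finally:
        # Always remove the hook, even if the body raises.
        handle.remove()


layer = nn.Linear(8, 8)  # stand-in for a model layer
with track_activations(layer) as cache:
    y = layer(torch.randn(2, 8))
```

The `try`/`finally` is the main design point: the hook is removed on exit, so the model is left unmodified once the `with` block ends.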


jbloomAus commented on July 30, 2024

HookedSAETransformer provides this functionality and was merged as part of the 3.0 release.

