Comments (2)
#81 seems to implement the core functionality that would be needed here. I've defined an `SAEPatcher` class to create forward and backward hooks to inject a single SAE into the model and save its features and activations. The implementation matches this description:
always inject the SAE into model computation + error term by default, so the SAE gets gradients, tracks features, while not modifying model output
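The error-term idea in that description can be illustrated with a small, self-contained PyTorch sketch. This is not the code from #81: `ToySAE`, `make_sae_hook`, and the toy `nn.Sequential` model are hypothetical names made up for illustration, and only a forward hook is shown. The hook replaces a layer's output with the SAE reconstruction plus a detached error term, so downstream values should stay the same while gradients still reach the SAE.

```python
import torch
import torch.nn as nn


class ToySAE(nn.Module):
    """Hypothetical stand-in for a sparse autoencoder (not the SAE Lens class)."""

    def __init__(self, d_in: int, d_sae: int) -> None:
        super().__init__()
        self.enc = nn.Linear(d_in, d_sae)
        self.dec = nn.Linear(d_sae, d_in)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.enc(x))
        return self.dec(feats), feats


def make_sae_hook(sae: nn.Module, store: dict):
    """Build a forward hook that splices `sae` into the hooked module's output."""

    def hook(_module, _inputs, output):
        recon, feats = sae(output)
        store["features"] = feats  # keep the SAE features for later inspection
        # Detached error term: adds back what the SAE failed to reconstruct,
        # without giving gradients a path that bypasses the SAE.
        error = (output - recon).detach()
        return recon + error  # a non-None return value replaces the output

    return hook


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
sae = ToySAE(d_in=8, d_sae=16)
x = torch.randn(3, 4)

baseline = model(x)  # output without the SAE

store = {}
handle = model[0].register_forward_hook(make_sae_hook(sae, store))
patched = model(x)  # output with the SAE spliced in after the first layer
handle.remove()

patched.sum().backward()  # gradients flow into the SAE parameters
```

Because the error term is detached, the only differentiable path from the hooked layer to the output runs through the SAE, which is one way to read "the SAE gets gradients ... while not modifying model output".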
Possibly that could be re-written or extended to more closely match the usage you've defined here, e.g.:
- Defining a context manager method `SAEPatcher.track_activations` that returns an `ActivationCache`
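A rough sketch of what such a context manager could look like is given here, purely as an assumption about the shape of the API. The `SAEPatcher` class below is a hypothetical toy, not the class from #81; `ToySAE` is a made-up stand-in for an SAE; and a plain `dict` stands in for the `ActivationCache` mentioned above.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


class ToySAE(nn.Module):
    """Hypothetical stand-in for a sparse autoencoder."""

    def __init__(self, d_in: int, d_sae: int) -> None:
        super().__init__()
        self.enc = nn.Linear(d_in, d_sae)
        self.dec = nn.Linear(d_sae, d_in)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.enc(x))
        return self.dec(feats), feats


class SAEPatcher:
    """Hypothetical sketch of a patcher exposing a context manager."""

    def __init__(self, module: nn.Module, sae: nn.Module) -> None:
        self.module = module
        self.sae = sae

    @contextmanager
    def track_activations(self):
        cache = {}  # plain dict standing in for an ActivationCache

        def hook(_module, _inputs, output):
            recon, feats = self.sae(output)
            cache["sae_input"] = output
            cache["sae_features"] = feats
            # Reconstruction plus detached error keeps the model output unchanged.
            return recon + (output - recon).detach()

        handle = self.module.register_forward_hook(hook)
        try:
            yield cache
        finally:
            handle.remove()  # always unhook, even if the body raises


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
patcher = SAEPatcher(model[0], ToySAE(d_in=8, d_sae=16))
x = torch.randn(3, 4)

with patcher.track_activations() as cache:
    patched = model(x)  # SAE is active only inside the block

clean = model(x)  # hook has been removed again
```

Removing the hook in a `finally` clause means the model is left unpatched after the `with` block, which is the main reason to prefer a context manager over manual hook management here.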
`HookedSAETransformer` provides this functionality and was merged as part of the 3.0 release.