Comments (2)
Questions:
- How do I generate the vector?
They run a two-token forward pass. I think I should run a one-token forward pass, since I don't have an EOS token.
I do have padding tokens, though, so I could pad with those to do the forward pass. I would then need
to align the vector as we go forward, which seems doable.
- Where do I add it in (layer, position)? I can choose the layer; they use resid_pre (editing the residual stream).
# Assumes `model` is a TransformerLens HookedTransformer and that
# `act_diff` and `coeff` are defined in the enclosing scope.
def get_resid_pre(prompt: str, layer: int):
    # Cache only the residual stream entering the chosen block.
    name = f"blocks.{layer}.hook_resid_pre"
    cache, caching_hooks, _ = model.get_caching_hooks(lambda n: n == name)
    with model.hooks(fwd_hooks=caching_hooks):
        _ = model(prompt)
    return cache[name]

def ave_hook(resid_pre, hook):
    if resid_pre.shape[1] == 1:
        return  # caching in model.generate for new tokens

    # We only add to the prompt (first call), not the generated tokens.
    ppos, apos = resid_pre.shape[1], act_diff.shape[1]
    assert apos <= ppos, f"More mod tokens ({apos}) than prompt tokens ({ppos})!"

    # Add to the beginning (position-wise) of the activations, in place.
    resid_pre[:, :apos, :] += coeff * act_diff
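The hook above takes `act_diff` as given. Here is a minimal sketch of how that difference could be built when the two inputs have different lengths, using the padding idea from the first question. Everything here is an assumption for illustration: `PAD_ID`, `right_pad`, `toy_resid` and `steering_vector` are hypothetical names, right-padding is just one possible alignment, and `toy_resid` is a NumPy stand-in for a real cached `resid_pre` rather than an actual forward pass.

```python
import numpy as np

PAD_ID = 0  # hypothetical padding token id


def right_pad(tokens, length, pad_id=PAD_ID):
    """Pad a token list on the right so both inputs share one length."""
    if len(tokens) > length:
        raise ValueError("tokens are longer than the target length")
    return list(tokens) + [pad_id] * (length - len(tokens))


def toy_resid(tokens, d_model=4):
    """Stand-in for get_resid_pre: one deterministic vector per token.

    Returns an array of shape (batch=1, pos, d_model).
    """
    tokens = np.asarray(tokens, dtype=float)
    per_token = tokens[:, None] * np.arange(1, d_model + 1)[None, :]
    return per_token[None, :, :]


def steering_vector(tokens_plus, tokens_minus):
    """Pad both inputs to a common length, then subtract activations."""
    length = max(len(tokens_plus), len(tokens_minus))
    plus = toy_resid(right_pad(tokens_plus, length))
    minus = toy_resid(right_pad(tokens_minus, length))
    return plus - minus


act_diff = steering_vector([3], [1, 2])
print(act_diff.shape)
```

With a real model, `toy_resid` would be replaced by a cached residual stream at the chosen layer, and the padded positions would need to line up with the positions the hook writes to.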
from decisiontransformerinterpretability.
How do I apply this procedure in DTI?
- Take the residual stream of the forward pass at some layer and inject it into the same layer (at some sequence position).
- What would this mean in the memory env? An obvious candidate is to inject the layer-two residual stream from a model with one goal into the residual stream of a model with a separate goal. This is essentially activation patching in reverse: take the residual stream from the corrupted pass and add it to the residual stream of another pass.
I think the easiest way for me to do this is with the AVEC code. I could parameterise it so we can get more information on the outcomes (e.g. by layer, head, etc.).
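One way the parameterisation might look is sketched below, on plain NumPy arrays. This is not the existing AVEC code: `InjectionConfig` and `make_injection_hook` are hypothetical names, the hook takes a bare layer index instead of a library hook object, and `donor_resid` is assumed to be a residual stream cached at the same layer from a run with the other goal.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class InjectionConfig:
    layer: int          # which block's resid_pre to edit
    position: int       # first sequence position to write to
    coeff: float = 1.0  # scale applied to the donor activations


def make_injection_hook(config, donor_resid):
    """Build a hook that adds coeff * donor_resid starting at config.position.

    donor_resid has shape (batch, donor_pos, d_model).
    """
    def hook(resid_pre, layer):
        if layer != config.layer:
            return resid_pre  # leave other layers untouched
        start = config.position
        end = start + donor_resid.shape[1]
        if end > resid_pre.shape[1]:
            raise ValueError(
                f"donor spans positions {start}:{end}, "
                f"but the target has only {resid_pre.shape[1]}"
            )
        edited = resid_pre.copy()  # do not mutate the caller's array
        edited[:, start:end, :] += config.coeff * donor_resid
        return edited

    return hook


config = InjectionConfig(layer=2, position=1, coeff=2.0)
hook = make_injection_hook(config, donor_resid=np.ones((1, 2, 3)))
target = np.zeros((1, 5, 3))
edited = hook(target, layer=2)
print(edited[0, :, 0])
```

Keeping layer, position and coefficient in one config object makes it easy to sweep over them and record which settings change the outcome.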