stanfordnlp / pyvene
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Home Page: http://pyvene.ai
License: Apache License 2.0
Description:
Currently, the library only works for transformer-based models. It does not work well for non-sequence models such as MLPs, or for other sequence-based models such as RNNs.
The first step toward supporting other model types could be to showcase how this library works for MLP models. The MLP model can be hand-crafted so that we know its counterfactual behaviors. We expect there will be hacks here and there to get things to work, but it will enable more model types.
A bug happened! My understanding of intervene_on_prompt=False is that it involves adding an intervention vector even when generating new tokens. However, when I tried setting intervene_on_prompt=False, I encountered a RuntimeError:
CUDA error: device-side assert triggered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging, consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Does this toolkit currently support intervene_on_prompt=False? Please correct me if I am wrong.
No response
Come to my office on Gates 3rd floor anytime!
IntervenableModel.save() doesn't save trained model parameters. This is an issue when you are also training, e.g., a classification head on top of the model and want to upload it to HF or save it locally. save_intervention() does this properly, though.
Description:
Adding tests for vanilla interventions on the GPT-2 model at different streams. This PR focuses on position-based interventions: it should cover a single position as well as multiple positions, and it should also cover different layers.
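A minimal sketch of one such position-based test, reusing the config API shown in the snippet later on this page; the layer index is illustrative, and the default intervention type is assumed to be vanilla (interchange):
import pyvene
from pyvene import (
    IntervenableRepresentationConfig, IntervenableConfig, IntervenableModel
)

_, tokenizer, gpt2 = pyvene.create_gpt2()

# vanilla intervention on layer 2's block output, at most one position
intervenable = IntervenableModel(
    intervenable_config=IntervenableConfig(
        intervenable_representations=[
            IntervenableRepresentationConfig(2, "block_output", "pos", 1),
        ],
    ),
    model=gpt2,
)

base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]
# swap the representation at position 4 from source into base
_, counterfactual = intervenable(
    base, sources, {"sources->base": ([[[4]]], [[[4]]])}
)
assert counterfactual.last_hidden_state.shape[-1] == gpt2.config.n_embd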
Description:
Currently, we assume one source example is associated with one intervention. Often, we want to reuse the same source example for multiple interventions, where these interventions are grouped for a purpose.
We want to support the concept of "grouping" during intervention configuration, as well as for source inputs. With this, you can also flexibly skip some groups if you don't want them.
Changes need to be made in the common classes regarding the config, as well as in how the alignable class consumes the source inputs.
In the new version, self.embed_dim on line 262 of interventions.py is initialized as None and never assigned a value. This causes Boundless_DAS.ipynb to fail:
Traceback (most recent call last):
  File "/work/frink/sun.jiu/function_vectors/src/compute_rotational_subspace.py", line 300, in <module>
    intervenable = IntervenableModel(intervenable_config, model)
  File "/work/frink/sun.jiu/miniconda3/envs/fv/lib/python3.10/site-packages/pyvene/models/intervenable_base.py", line 111, in __init__
    intervention = intervention_function(
  File "/work/frink/sun.jiu/miniconda3/envs/fv/lib/python3.10/site-packages/pyvene/models/interventions.py", line 265, in __init__
    torch.arange(0, self.embed_dim), requires_grad=False
TypeError: arange() received an invalid combination of arguments - got (int, NoneType), but expected one of:
Simply changing the line to:
self.embed_dim = embed_dim
would solve the issue.
Description:
Currently, we have,
class AlignableConfig(PretrainedConfig):
    def __init__(
        self,
        alignable_model_type="gpt2",
        alignable_representations=[
            # we do distributed search over elements in the sublist.
            AlignableRepresentationConfig()
        ],
        alignable_interventions_type=VanillaIntervention,
        alignable_low_rank_dimension=None,
        mode="parallel",
        **kwargs
    ):
Currently, the intervention type must be specified as a class, not as an instance, and this causes trouble. It would be better to accept interventions as a list of actual instances, e.g., alignable_interventions = [VanillaIntervention()]. This would allow more specification for customizable interventions.
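Under the proposal, the config call would look something like this (a sketch, assuming the rest of the signature stays unchanged):
config = AlignableConfig(
    alignable_model_type="gpt2",
    alignable_representations=[AlignableRepresentationConfig()],
    # instances instead of a single class type, so each intervention
    # can carry its own settings
    alignable_interventions=[VanillaIntervention()],
)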
Hi, I am working with the newest version of the repo and got tutorial.ipynb to work. However, when I run run_alignment.py with the training script at the end of your README.md, I run into the following error when the model tries to save checkpoints of the rotation layer:
Traceback (most recent call last):
  File "/net/scratch/zhouy1/github/align-transformers-forked/run_alignment.py", line 183, in <module>
    aligner.train(
  File "/net/scratch/zhouy1/github/align-transformers-forked/trainer.py", line 222, in train
    self.save_model(output_dir, 'pytorch-rotate-best.bin')
  File "/net/scratch/zhouy1/github/align-transformers-forked/trainer.py", line 62, in save_model
    'rotate_layer': self.model.module.model.rotate_layer.state_dict(),
  File "/home/zhouy1/miniconda3/envs/BoundlessDAS/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'AlignableLlamaForCausalLM' object has no attribute 'module'
Thanks!
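The .module attribute exists only when the model is wrapped in torch.nn.DataParallel or DistributedDataParallel. A hedged sketch of a guard inside save_model, with variable names following the traceback above:
# unwrap only when the model is actually wrapped (e.g., torch.nn.DataParallel)
inner = self.model.module if hasattr(self.model, "module") else self.model
torch.save(
    {"rotate_layer": inner.model.rotate_layer.state_dict()},
    os.path.join(output_dir, "pytorch-rotate-best.bin"),
)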
Description:
After training, the intervention's artifacts are stored in memory without a good way to save them to disk with other metadata or share them on the HuggingFace marketplace. This change will provide a smooth way of saving and sharing interventions trained by users.
The key thing will be serializing metadata into a shareable format (i.e., both serialization and deserialization need to be tested). It will still require the sharing parties to know the counterfactual dataset generation process, but that is less a problem of this library and more about sharing the dataset itself; dataset sharing could be a separate process not included in this library.
This change should also consider sharing interventions that contain a vector store (e.g., some truthful direction for steering).
Testing Done:
.Removing testing dir ./test_output_dir_prefix-d9080f
Removing testing dir ./test_output_dir_prefix-dff621
Removing testing dir ./test_output_dir_prefix-9227e2
Removing testing dir ./test_output_dir_prefix-6cb8c4
Removing testing dir ./test_output_dir_prefix-67cd73
----------------------------------------------------------------------
Ran 25 tests in 4.280s
OK
tutorials/basic_tutorials/Load_Save_and_Share_Interventions.ipynb
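A hedged sketch of the intended round trip, with method names assumed from the tutorial above rather than confirmed:
# save the trained interventions plus metadata to disk
intervenable.save(save_directory="./my_boundless_das")

# later, or on another machine: re-attach them to a fresh copy of the model
loaded = IntervenableModel.load("./my_boundless_das", model=gpt2)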
Descriptions:
Currently, all models need to have a config, and the config must inherit from the transformers library's config class. The model config is only used in,
intervention = intervention_function(
    embed_dim=get_dimension(
        get_internal_model_type(model), model.config, representation
    ), **other_metadata
)
to get the component's dimension.
We should, however, allow config-less models where the dimension is directly read off a config dict, or dynamically figured out using some helper functions, e.g.,
mlp_type_to_dimension_mapping = {
    "block_input": (32,),
    "block_output": (32,),
    "mlp_activation": (32,),
}
(This should also stay compatible with torch.compile.) Meanwhile, since the intervening component can accept an arbitrary model component string, e.g., model.h[2].attn.c_proj.output, we can dynamically figure the component out.
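A small sketch of such a helper, using a hypothetical infer_component_dim that resolves a dotted component path with torch's get_submodule and reads the dimension off the module's parameters:
import torch.nn as nn

def infer_component_dim(model: nn.Module, component_path: str) -> int:
    """Hypothetical helper: resolve e.g. "h.2.attn.c_proj" on a config-less
    model and read the output dimension from the module itself."""
    module = model.get_submodule(component_path)
    if isinstance(module, nn.Linear):
        return module.out_features
    # fall back to the trailing dimension of the module's first parameter
    return next(module.parameters()).shape[-1]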
Descriptions:
Currently, we only support basic interventions during model generation, just like in the model forward call. This is not ideal. In model generation, we want to support more free-form interventions (e.g., intervening based on decoding steps or other decoding parameters, not just unit locations as in an intervened forward pass).
The current infra (and this applies to other existing intervention libraries as well) cannot support this. For instance, it does not support intervening at a specific decoding step without invasive code changes. To support complex cases, we plan to introduce a new notion of an Intervention Scheduler.
At a high level, the scheduler is responsible for scheduling interventions dynamically at inference time, and it is customizable. For instance, we could intervene on (1) all decoded punctuation tokens, (2) all verbs that get decoded, or (3) the last entity token decoded from a specific entity set.
This enables a wide spectrum of ways to steer model behavior with interventions. This ticket may require multiple changes.
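A purely hypothetical sketch of what a scheduler hook could look like; nothing here is existing API:
class InterventionScheduler:
    """Hypothetical: decides at each decoding step whether to intervene."""

    def __init__(self, tokenizer, trigger_tokens=(".", ",", "!", "?")):
        self.tokenizer = tokenizer
        self.trigger_tokens = set(trigger_tokens)

    def should_intervene(self, step: int, token_id: int) -> bool:
        # example policy (1): intervene on all decoded punctuation tokens
        return self.tokenizer.decode([token_id]).strip() in self.trigger_tokens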
Descriptions:
A recent PR accidentally removed support for a couple of models. This PR adds them back. See the issue raised here: bf09440#diff-e5c418b4f46cb08393fe2f1c9698b2d20d5e51e746b6d3f092f05d4f9744dc98
Commonly, we want to exhaustively train DAS on every layer and position (or, e.g., every attention head in a layer) to find which ones are causally relevant for the model's computations. When dealing with a fixed dataset, we could speed this process up by caching and reusing activations. It is unclear what the best way to implement this is; it should already be possible to build a minimal example with CollectIntervention and the activations_sources= argument at inference time.
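A library-agnostic sketch of the caching idea using plain torch hooks; inside pyvene, CollectIntervention and activations_sources= would play these two roles:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
source_input, base_input = torch.randn(1, 8), torch.randn(1, 8)
cache = {}

# pass 1: run the source once and cache the activation (CollectIntervention)
handle = model[0].register_forward_hook(
    lambda mod, inp, out: cache.update(src=out.detach())
)
model(source_input)
handle.remove()

# pass 2..N of the sweep: patch the cached activation into the base run
# (activations_sources=) without re-running the source example
handle = model[0].register_forward_hook(lambda mod, inp, out: cache["src"])
counterfactual = model(base_input)
handle.remove()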
Receiving an error message when visiting the pyvene.ai website. The message says the site can't be reached at the moment; it's taking too long to respond.
Visit https://pyvene.ai/ in a browser.
Chrome: Version 123.0.6312.58 (Official Build) (64-bit)
OS: Win10 Pro for Workstations
Descriptions:
The library supports interventions on torch models, and interventions can be attached to any subcomponent of a torch model. However, for a model quantized with another wrapper, the intervention location can be more dynamic. See the quantized function here.
We want to support interventions on quantized models and intervene on the correct components specified in the configuration.
Descriptions:
Currently, the matrix exponential fails with the MPS framework on M1 chips for rotation-based interventions. We need to figure out another way to handle this so that M1 chips can run pyvene with trainable interventions like DAS as well.
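One candidate workaround, as an untested sketch: detour through the CPU just for the exponential and move the result back, which keeps gradients intact:
import torch

def matrix_exp_mps_safe(mat: torch.Tensor) -> torch.Tensor:
    # torch.matrix_exp is unsupported on MPS; fall back to CPU there
    if mat.device.type == "mps":
        return torch.matrix_exp(mat.to("cpu")).to(mat.device)
    return torch.matrix_exp(mat)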
Hi,
I am exploring the repo and using Boundless DAS. Is there a way to save and load the interventions? Thanks!
Description:
When using the hook, we can now support kwargs-based inputs by reading the input as a dictionary. However, we always assume the dictionary contains only a single input (e.g., hidden representations). This assumption can easily go wrong. What we should do instead is specify, in the model config, which part of the inputs we intervene on.
Note that this will still result in code coupled with the Transformers library. Multiple PRs are required to move in this direction.
Description:
Currently, we don't have a systematic way of unit testing the library. A good approach is to create some hand-crafted models (i.e., with fixed weights), do interventions, and check the counterfactual behaviors.
Probably not hand-crafted transformers, but simple MLPs with 3 hidden states, for instance.
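For instance, a hand-crafted MLP with identity weights makes the counterfactual prediction exact; a sketch of the kind of unit test meant here:
import torch
import torch.nn as nn

# identity-weight MLP: the hidden state equals the input, so swapping the
# hidden state of `base` with that of `source` must reproduce source's output
mlp = nn.Sequential(nn.Linear(3, 3, bias=False), nn.Linear(3, 3, bias=False))
with torch.no_grad():
    for layer in mlp:
        layer.weight.copy_(torch.eye(3))

base = torch.tensor([1.0, 0.0, 0.0])
source = torch.tensor([0.0, 1.0, 0.0])
hidden_source = mlp[0](source)          # collect source's hidden state
counterfactual = mlp[1](hidden_source)  # interchange intervention on base
assert torch.allclose(counterfactual, mlp(source))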
Description:
Currently, we assume model weights are frozen when training interventions for alignment. We can also add support so that models can be tuned along with the intervention.
This can help reproduce the interchange intervention training experiments in this paper, or reproduce experiments with the causal proxy model (i.e., using another explainer model to explain a black-box model).
Description:
Right now everything is in the util file, including the common model configs, imports, and various hook helpers. It would be better to separate them into smaller files to increase readability and extensibility.
Could you maybe add "LlamaForCausalLM" here? (It's not captured by "LLaMAForCausalLM".)
My alpaca weights, recovered as guided in https://github.com/tatsu-lab/stanford_alpaca, have the following config for some reason:
LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 32001
}
BTW, thanks for this awesome project!
Descriptions:
We will move away from the concept of "align" soon, in light of future releases of this library. We thus change "alignable" to "intervenable" in all occurrences.
Testing Done:
.Removing testing dir ./test_output_dir_prefix-185ec0
Removing testing dir ./test_output_dir_prefix-f2e887
Removing testing dir ./test_output_dir_prefix-c7b84c
Removing testing dir ./test_output_dir_prefix-9e7acd
Removing testing dir ./test_output_dir_prefix-762318
----------------------------------------------------------------------
Ran 25 tests in 8.546s
OK
zhengxuanwu@DNa8211a3 align-transformers % grep -Ri "alignable" .
Binary file ./.git/objects/pack/pack-96bfa18d843e36c6082333ae210e2c8c92ce2bbd.pack matches
Descriptions:
Intervening on activations at inference time to steer model behavior is a good application of this library and fits its ultimate goal well. Ideally, people should be able to easily share their steering mounting points along with the injected vectors.
Original GitHub:
https://github.com/likenneth/honest_llama
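A hedged sketch of the kind of steering intervention meant here, in the spirit of honest_llama's truthful directions; this is illustrative and not the library's intervention interface:
import torch
import torch.nn as nn

class SteeringIntervention(nn.Module):
    """Illustrative: add a fixed direction to activations at inference."""

    def __init__(self, direction: torch.Tensor, alpha: float = 5.0):
        super().__init__()
        self.register_buffer("direction", direction / direction.norm())
        self.alpha = alpha

    def forward(self, base: torch.Tensor) -> torch.Tensor:
        # shift the hidden representation along the steering direction
        return base + self.alpha * self.direction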
Descriptions:
Ideally, all the models listed here can be supported by this library without exposing model details to its users.
This requires setting up model folders for all model types and writing config metadata for each of them, annotating where to do interventions. This requires a lot of effort; this PR tracks the progress towards supporting as many as we can.
Each model should take less than an hour to (1) configure and (2) write simple unit tests for.
Here is the list of models that are in the pipeline to support (in order):
Need to make sure the HF Trainer can easily hook onto pv.IntervenableModel. This means getting the right parameters for the optimizer, making sure model state is set correctly, etc.
Descriptions:
MLPConfig(num_labels=3).num_labels prints 2; the number of labels has not changed. This is probably because num_labels is a default field in the huggingface pretrained model config and thus overwrites the user's input. We need to rename this field to avoid that.
Descriptions:
The Mistral model shares its architecture with other models, e.g., GPT-2 and also Mixtral. Supporting Mistral along with unit tests will help us support other models in this family.
Descriptions:
Currently, only basic integration tests around modules are included. It would be best if individual functions were tested in a unit test file separate from the integration tests.
Descriptions:
Specifically, focus on
Descriptions:
The library is built to support flexible intervention schemes, which comes at the cost of onboarding ease. To make the library easier for simpler use cases, we want to support more fields in the intervenable config for "more static" intervention schemes.
For instance, suppose that for all examples we want to intervene on the 4th token of the input at the 6th layer's transformer block output. With this, we wouldn't need to provide intervention locations at runtime; we could simply provide a pair of base and source examples.
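A hypothetical sketch of such a static field; the keyword argument below does not exist yet and is exactly what this ticket proposes:
# hypothetical future config: the location is fixed once, at config time
config = IntervenableConfig(
    intervenable_representations=[
        IntervenableRepresentationConfig(6, "block_output", "pos", 1),
    ],
    # proposed field: always swap the 4th token, for every example
    intervenable_unit_locations={"sources->base": ([[[4]]], [[[4]]])},
)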
Descriptions:
Currently, we support only a limited intervention use case on stateful models such as GRUs. For instance, after an intervention, although the causal effect on the intervened site ripples through time, we assume the inputs stay the same as before the intervention. This is fine if the task setup doesn't care about the inputs, is input-agnostic when generating, or allows teacher forcing (forcing to discard) when generating.
Here are some illustrations. Right now, we can support cross-time interventions as,
example 1:
(hiddens) h1, h2, h3, h4, h5, h6
(inputs)  x1, x2, x3, x4, x5, x6
              ^
              |   h3' from example 2 is swapped into h2 of example 1
              v
example 2:
(hiddens) h1', h2', h3', h4', h5', h6'
(inputs)  x1', x2', x3', x4', x5', x6'
where we take h3' from the second example to intervene on h2 from the first example through time. We then also update h3 to h6 after the intervention. However, we assume x3 to x6 still use the inputs from example 1. This is acceptable if, during training, x3 to x6 are agnostic in terms of the model's generation (e.g., x2 is some trigger token, so the model is in generation mode).
However, this is not ideal. If we are dealing with autoregressive LMs, we want x3 to be the intervened model's output at the previous step. This requires the model to pass gradients through time. One simple solution is to update the model to use Gumbel-Softmax to softly select the token and pass it to the next time step as the input embedding.
The change may be only on the modeling side. We need to change the model to do soft token selection, which allows gradients. This is compatible with the library, since this input-based unrolling makes sense only in intervention mode.
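A small sketch of the Gumbel-Softmax step; this is standard torch functionality, not pyvene-specific, and the function name is illustrative:
import torch
import torch.nn.functional as F

def soft_next_input(logits: torch.Tensor, embedding: torch.nn.Embedding,
                    tau: float = 1.0) -> torch.Tensor:
    # differentiable "token selection": a soft one-hot over the vocabulary,
    # mixed into an input embedding for the next time step
    soft_onehot = F.gumbel_softmax(logits, tau=tau, hard=False)  # [vocab]
    return soft_onehot @ embedding.weight                        # [hidden]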
Description:
Currently, we are enabling interventions to take a new argument, subspaces, so that interventions can intervene only on selected subspaces.
Right now, the subspace is partitioned into different chunks during configuration. This makes interventions that work at the discrete neuron-level subspace hard.
We need to change the intervention to accept subspaces as a list of dimensions, e.g., [[1,3,5]], meaning we are intervening on the 1st, 3rd, and 5th neurons.
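In plain torch, the intended semantics are just fancy indexing on the last dimension:
import torch

base, source = torch.zeros(2, 8), torch.ones(2, 8)
subspace = [1, 3, 5]                         # selected neuron indices
base[..., subspace] = source[..., subspace]  # interchange only those neurons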
A lot of the old tutorials use the old API for representation configs and interventions; some of them might even be broken because of this. We should upgrade all of them to the new format to showcase how the library is designed to be used. Thanks to @smejak for pointing this out.
Edit: Specifically, the intro DAS tutorial isn't working in Colab; the intervened forward pass is failing. Will debug this later today.
Descriptions:
Specifically, focus only on these 3 functions:
gather_neurons
scatter_neurons
output_to_subcomponent
Hello,
I was wondering what exactly the counterfactual sampling procedure in lower_bound_alignment_example_sampler does. Do the base and counterfactual labels have to be different, or can they be the same? For example, for a counterfactual label like "No", do we only want to sample base and source amounts such that the base label "Yes" is changed to "No" after intervention, or can the base label also be "No"? The code seems to suggest the latter scenario.
In that case, I put in-line comments on what seems like a potential bug. When base_source_regions is [2, 3], the base left and right boundary values are (Yes, Yes), and those of the source are (Yes, No). The base label is "Yes", but after intervening on the left boundary, it is still "Yes". Any clarification is much appreciated!
Description:
The training of interventions is done through backprop from the model loss. There are other use cases where the intervention itself could be supervised by other objectives, such as probes attached to the intervention site.
We can add a basic classification head on top of the intervention object so that additional gradients can be backpropagated through the probe to the intervention. This would allow the library to support basic probing experiments with desired class labels.
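A minimal sketch of the probe objective; dimensions and names are illustrative:
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, num_classes = 768, 2
probe = nn.Linear(hidden_dim, num_classes)

def probe_loss(site_activations: torch.Tensor, labels: torch.Tensor):
    # gradients flow through the probe back into trainable interventions
    return F.cross_entropy(probe(site_activations), labels)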
Description:
This PR includes only the minimum changes needed to make the library extensible to all types of RNN models. The key difference is that the hook function has to be stateful, i.e., it needs to be aware of its "step" when it is called. This ties back to the fact that RNN-based models, or transformer models when generating sequences, act like stateful models.
This change will ideally include a simple tutorial on how to intervene on a very simple RNN at any step. The input fields should stay the same, but the hook needs to keep extra fields in memory to do stateful interventions.
Pyvene is a library featuring interchange interventions. It frequently needs to process datasets that contain two sets of input_ids and (maybe) two sets of labels. When training with batched datasets, a collator issue arises: there is no existing collator that supports padding both sets of input_ids of different lengths at the same time.
Hugging Face transformers only pads the "input_ids" entries in the dataset. In addition, DataCollatorForSeq2Seq only pads "labels". So dataset entries like "source_input_ids" are not padded, which is problematic.
Adding a utility supporting this may help pyvene development in general.
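A minimal sketch of such a dual-field collator, assuming dataset entries carry "input_ids" and "source_input_ids" as in the issue above; the class name is illustrative:
from dataclasses import dataclass
from transformers import PreTrainedTokenizerBase

@dataclass
class InterchangeDataCollator:
    tokenizer: PreTrainedTokenizerBase

    def __call__(self, features):
        # pad base and source sequences independently, then merge
        base = self.tokenizer.pad(
            [{"input_ids": f["input_ids"]} for f in features],
            return_tensors="pt",
        )
        source = self.tokenizer.pad(
            [{"input_ids": f["source_input_ids"]} for f in features],
            return_tensors="pt",
        )
        return {
            "input_ids": base["input_ids"],
            "attention_mask": base["attention_mask"],
            "source_input_ids": source["input_ids"],
            "source_attention_mask": source["attention_mask"],
        }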
Description:
Currently, the library aims for flexibility in the inputs, as well as a small training batch size in case the intervention is trainable. For instance, we assume each example in a batch can have different intervention locations as well as different intervention subspaces, allowing more flexible configurations.
This is not desired when we have a large batch size and the intervention location does not change within the batch. Suppose we want to localize (a+b) in a simple NN that solves (a+b)*c, using DAS with a fixed dimensionality of 16; the intervention location stays the same. However, the current code actually does the intervention at the example level, not at the batch level. See,
for batch_i, locations in enumerate(unit_locations):
    tensor_input[
        batch_i, locations, start_index:end_index
    ] = replacing_tensor_input[batch_i]
this can be,
tensor_input[
    :, locations, start_index:end_index
] = replacing_tensor_input[:]
Similarly, the subspace intervention,
if subspaces is not None:
    for example_i in range(len(subspaces)):
        # render subspace as column indices
        sel_subspace_indices = []
        for subspace in subspaces[example_i]:
            sel_subspace_indices.extend(
                [
                    i for i in range(
                        subspace_partition[subspace][0],
                        subspace_partition[subspace][1]
                    )
                ]
            )
        if mode == "interchange":
            base[example_i, ..., sel_subspace_indices] = \
                source[example_i, ..., sel_subspace_indices]
        elif mode == "add":
            base[example_i, ..., sel_subspace_indices] += \
                source[example_i, ..., sel_subspace_indices]
        elif mode == "subtract":
            base[example_i, ..., sel_subspace_indices] -= \
                source[example_i, ..., sel_subspace_indices]
can be,
if subspaces is not None:
    if subspace_partition is None:
        sel_subspace_indices = subspaces[0]
    else:
        sel_subspace_indices = []
        for subspace in subspaces[0]:
            sel_subspace_indices.extend(
                [
                    i for i in range(
                        subspace_partition[subspace][0],
                        subspace_partition[subspace][1]
                    )
                ]
            )
    if mode == "interchange":
        base[..., sel_subspace_indices] = \
            source[..., sel_subspace_indices]
    elif mode == "add":
        base[..., sel_subspace_indices] += \
            source[..., sel_subspace_indices]
    elif mode == "subtract":
        base[..., sel_subspace_indices] -= \
            source[..., sel_subspace_indices]
else:
    base[..., :interchange_dim] = source[..., :interchange_dim]
We should add a use_fast flag to the alignable config, and do a validation check that fails fast during the forward call.
This PR tracks the use_fast effort for position-based as well as subspace-based interventions. It does not yet cover head-based or head+position-based interventions; those will be covered in a separate PR.
Testing Done:
In case multiple location tags are passed only the first one will be considered
testing stream: value_output with a single position
WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.
In case multiple location tags are passed only the first one will be considered
.
----------------------------------------------------------------------
Ran 18 tests in 30.117s
OK
Description:
There is a clear use case for intervening on certain positions of selected heads (e.g., intervening on the x-th token in head y). For instance, we can see how different heads at each position handle information, or where there is an induction head on top of certain positions.
We probably need multiple changes to achieve this goal without bugs. This ticket marks the first step: we will support basic nested intervention locations. Specifically, we want the following capability,
_, counterfactual_outputs = alignable(
    base,
    sources,
    {"sources->base": ([
        [[[target_head]], [[pos_i]]]  # intervene w/ target_head's pos_i
    ], [
        [[[target_head]], [[pos_i]]]  # intervene on target_head's pos_i
    ])}
)
where target_head is a list of specific heads, and we want to intervene on a list of pos_i for each head in target_head. With this, we can intervene on the 3rd token representation of the 4th head.
Descriptions:
The first PR to enable use_fast (https://github.com/frankaging/align-transformers/issues/33) does not cover head-related interventions. We want to enable this for head index + position index as well, when used compositionally.
The following code needs to be changed:
https://github.com/frankaging/align-transformers/blob/main/models/modeling_utils.py#L464
if "head" in alignable_representation_type:
start_index = 0 if start_index is None else start_index
end_index = 0 if end_index is None else end_index
# head-based scattering
if alignable_unit in {"h.pos"}:
# we assume unit_locations is a tuple
for head_batch_i, head_locations in enumerate(unit_locations[0]):
for head_loc_i, head_loc in enumerate(head_locations):
for pos_loc_i, pos_loc in enumerate(unit_locations[1][head_batch_i]):
h_start_index = start_index+head_loc*attn_head_size
h_end_index = start_index+(head_loc+1)*attn_head_size
tensor_input[
head_batch_i, pos_loc, h_start_index:h_end_index
] = replacing_tensor_input[head_batch_i, head_loc_i, pos_loc_i] # [dh]
else:
for batch_i, locations in enumerate(unit_locations):
for loc_i, loc in enumerate(locations):
h_start_index = start_index+loc*attn_head_size
h_end_index = start_index+(loc+1)*attn_head_size
tensor_input[
batch_i, :, h_start_index:h_end_index
] = replacing_tensor_input[batch_i, loc_i] # [s, dh]
Descriptions:
The library is not tested for multi-GPU use cases. We assume the intervened model can be loaded onto a single GPU. This is not ideal for interventions on 70B models, for instance. We want to be able to load the model onto multiple GPUs using sharding.
Static interventions need to be attached to the right component on the right machine in case of model sharding. Trainable interventions need to be mapped onto the machine where the corresponding model component lives as well.
This could be a large task, but the first step is clear: try out static interventions (e.g., vanilla interventions) when models are loaded onto multiple GPUs at inference time.
Description:
Currently, in generate mode, the code only works if you intervene on the first prompt token and on every decoding token, given the activation caching mechanism of the huggingface library.
We want to provide more generic support: (1) intervene on different tokens in the prompt; (2) intervene at every decoding step.
Descriptions:
The IntervenableModel is a torch.nn.Module, so it can be used inside another torch model, or even a pipeline object (e.g., a Huggingface pipeline). Here is a quick code snippet,
import pyvene
from pyvene import IntervenableRepresentationConfig, IntervenableConfig, IntervenableModel

# provided wrapper for huggingface gpt2 model
_, tokenizer, gpt2 = pyvene.create_gpt2()

# turn gpt2 into intervenable_gpt2
intervenable_gpt2 = IntervenableModel(
    intervenable_config=IntervenableConfig(
        intervenable_representations=[
            IntervenableRepresentationConfig(
                0,             # intervening layer 0
                "mlp_output",  # intervening mlp output
                "pos",         # intervening based on positional indices of tokens
                1,             # maximally intervening one token
            ),
        ],
    ),
    model=gpt2,
)
import torch
import torch.nn as nn
from typing import List, Optional, Tuple, Union, Dict

class ModelWithIntervenables(nn.Module):
    def __init__(self):
        super(ModelWithIntervenables, self).__init__()
        self.intervenable_gpt2 = intervenable_gpt2
        self.relu = nn.ReLU()
        self.fc = nn.Linear(768, 1)
        # Your other downstream components go here

    def forward(
        self,
        base,
        sources: Optional[List] = None,
        unit_locations: Optional[Dict] = None,
        activations_sources: Optional[Dict] = None,
        subspaces: Optional[List] = None,
    ):
        _, counterfactual_x = self.intervenable_gpt2(
            base,
            sources,
            unit_locations,
            activations_sources,
            subspaces
        )
        counterfactual_x = counterfactual_x.last_hidden_state
        counterfactual_x = self.relu(counterfactual_x)
        counterfactual_x = self.fc(counterfactual_x)
        return counterfactual_x
and then you can run forward as usual,
model = ModelWithIntervenables()
base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [
    tokenizer("The capital of Italy is", return_tensors="pt"),
]
model(
    base, sources, {"sources->base": ([[[4]]], [[[4]]])}
)
which returns,
tensor([[[2.7027],
         [6.3036],
         [6.1785],
         [6.4302],
         [8.0921]]], grad_fn=<ViewBackward0>)
We should add support for training sparse autoencoders (Bricken et al., 2023; Cunningham et al., 2023). Could be cool as a way of obtaining a feature basis for interventions.
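A bare-bones sketch of the kind of sparse autoencoder meant here: an L1-penalized reconstruction of residual-stream activations, with illustrative hyperparameters:
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 4 * 768,
                 l1_coef: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.l1_coef = l1_coef

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(features)           # reconstruction
        loss = ((x_hat - x) ** 2).mean() + self.l1_coef * features.abs().mean()
        return x_hat, features, loss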
Description:
One limitation of the current codebase is that, for DAS or anything similar, we only support aligning one causal variable at a time. This undercuts the point of using DAS to find a single new basis in which orthogonal causal variables can be interpreted along different axes! Learning a separate basis for each causal variable breaks this core assumption.
We need a new type of intervention that supports multi-variable alignment. This is a special need for DAS and other basis-respecting interventions.
Impacted files may include the intervention and the alignable config, as well as the input fields needed to indicate the causal variable index.
Descriptions:
Currently, if only a base is provided, we just run the regular model forward without any interventions. We want to also support collecting activations when only a base is provided.
The activation collection is done using a no-op intervention. See: https://github.com/frankaging/align-transformers/issues/26