caa's People

Contributors

nrimsky


caa's Issues

How is this code licensed?

Adding a LICENSE file to the repo would make it clear how the code is licensed and whether it may be reused elsewhere.

Are the datasets all the same version?

The examples in preprocessed_data/generate_dataset.json seem different from those in llm_generated_data/agree_disagree_dataset.json.

For example, the first example in generate_dataset.json starts with this:

Hello, my name is Linda Johnson and I am a researcher in the field of artificial intelligence with a specialization in natural language processing...

However, the name Linda Johnson doesn't appear at all in agree_disagree_dataset.json, which leads me to believe these might be artifacts from two different versions of the dataset.

Activations are added twice in BlockOutputWrapper

It appears that BlockOutputWrapper adds activations to the output twice, which looks like a bug. augmented_output contains the steering vector added into the outputs at the correctly masked token positions according to add_only_after_end_str. Then, on line 76, the steering vector is added a second time, this time at every token position:

output = (augmented_output + self.add_activations,) + output[1:]

Is this desired behavior? It seems like line 76 should be the following, since the steering vector is already included in augmented_output:

output = (augmented_output,) + output[1:]

If I understand the code correctly, the response tokens currently get the steering vector added twice, and all other tokens get it added once.
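To make the suspected double addition concrete, here is a minimal plain-Python sketch of the behavior described above (a single hidden dimension; the function and variable names are illustrative, not the repo's actual code):

```python
# Toy model of the wrapper's forward pass for one hidden dimension.
def forward_pass(hidden, steer, mask):
    # augmented_output: steering vector added only at masked positions
    augmented = [h + (steer if m else 0.0) for h, m in zip(hidden, mask)]
    # Suspected bug (mirrors line 76): vector added again at every position
    buggy = [a + steer for a in augmented]
    # Proposed fix: pass augmented_output through unchanged
    return buggy, augmented

hidden = [0.0, 0.0, 0.0]        # per-token activations
mask = [False, False, True]     # only the response token is after [/INST]
buggy, fixed = forward_pass(hidden, 1.0, mask)
print(buggy)   # [1.0, 1.0, 2.0] -> response token steered twice
print(fixed)   # [0.0, 0.0, 1.0] -> steered once, as intended
```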

GPU memory

Hi @nrimsky, thanks for your work! I read in your paper that you benchmarked using two L4 GPUs.

I tried to run a VM with this hardware using the 13B model, but I am getting GPU OOM errors with the current main branch.

Are you using parallelization, lower precision, or other techniques to fit the 13B model onto an L4 GPU with 24 GB of memory? By my calculation, the 13B model naively needs 26 GB for inference and 52 GB for fine-tuning.
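For reference, here is the back-of-the-envelope weight-memory math behind those numbers (inference weights only, ignoring KV cache and activations; the byte counts per dtype are standard assumptions, not measurements):

```python
# Approximate weight memory for a 13B-parameter model at various precisions.
params = 13e9
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for dtype, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{dtype}: {gb} GB of weights")

# fp16 weights alone (~26 GB) already exceed a single 24 GB L4, so
# 8-bit/4-bit quantization or sharding across both GPUs seems necessary.
```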

Thank you in advance.

Off-by-one index adding steering vector to generation tokens

It appears that the steering vector is added at the second generation token, rather than the first generation token, when using add_only_after_end_str=True. The function find_instruction_end_position() finds the correct index of the [/INST] token; however, add_vector_after_position() masks out everything including the final ] of the [/INST].

This is likely a mistake because the final input token ] is also the first generated output token.

For instance, this can be seen by running LlamaWrapper with the input "[INST] Paris is in [/INST]", and having the model generate the next token. This sentence tokenizes to the following:

# total len: 11 tokens
['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']']

find_instruction_end_position() correctly finds that index 10, the final token of the sequence, is the end of the [/INST]. However, add_vector_after_position() generates the following mask:

[[[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[False]]]

Specifically, every position is masked out, so the steering vector won't be added at all during first-token generation.

After generating a token, for instance " France", the input then becomes:

# total len: 12 tokens
['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']', 'France']

And the mask from add_vector_after_position() becomes:

[[[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[True]]]

It's also possible I'm misunderstanding the intended behavior of the code, so apologies if that's the case here!
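The off-by-one described above can be reproduced with a small sketch of the masking logic (the function name and `inclusive` flag are hypothetical, not the repo's actual API):

```python
# Illustrative re-implementation of the position masking described above.
def mask_from_position(seq_len, end_pos, inclusive):
    """Return a per-token mask: True where the steering vector is added."""
    if inclusive:
        return [i >= end_pos for i in range(seq_len)]  # proposed fix
    return [i > end_pos for i in range(seq_len)]       # current behavior

tokens = ['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']']
end = 10  # index of the final ']' of [/INST]

print(mask_from_position(len(tokens), end, inclusive=False))
# all False: the final ']' position (the first generation step) is skipped
print(mask_from_position(len(tokens), end, inclusive=True))
# last position True: the vector applies from the first generated token
```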

Adding support for other architectures

Hi!
I see that you are still actively updating this codebase. I've been working on extending it to other model architectures (Mamba, which I already have working, plus RWKV and Hyena). I was wondering whether you would be interested in a pull request adding those models, or whether you think it would clutter the repo and confuse people, since your article only mentions Llama.
Nice work,
Gonçalo
