caa's People

Contributors

nrimsky


caa's Issues

How is this code licensed?

Adding a LICENSE file to the repo would make it clear how the code is licensed and whether it may be reused elsewhere.

Are the datasets all the same version?

The examples in preprocessed_data/generate_dataset.json seem different from those in llm_generated_data/agree_disagree_dataset.json.

For example, the first example in generate_dataset.json starts with this:

Hello, my name is Linda Johnson and I am a researcher in the field of artificial intelligence with a specialization in natural language processing...

However, the name Linda Johnson doesn't appear at all in agree_disagree_dataset.json, which leads me to believe these might be artifacts from two different versions of the dataset.

Activations are added twice in BlockOutputWrapper

It appears that BlockOutputWrapper adds activations to the output twice, which looks like a bug. augmented_output contains the steering vector added into the outputs at the correctly masked token positions according to add_only_after_end_str. Then, on line 76, the steering vector is added a second time, this time at every token position:

output = (augmented_output + self.add_activations,) + output[1:]

Is this desired behavior? It seems like line 76 should be the following, since the steering vector is already included in augmented_output:

output = (augmented_output,) + output[1:]

If I understand the code correctly, the response tokens currently get the steering vector added twice, and all other tokens get it added once.
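To make the suspected double addition concrete, here is a minimal plain-Python sketch of the behavior described above (a single hidden dimension; the function and variable names are illustrative, not the repo's actual code):

```python
# Toy model of the wrapper's forward pass for one hidden dimension.
def forward_pass(hidden, steer, mask):
    # augmented_output: steering vector added only at masked positions
    augmented = [h + (steer if m else 0.0) for h, m in zip(hidden, mask)]
    # Suspected bug (mirrors line 76): vector added again at every position
    buggy = [a + steer for a in augmented]
    # Proposed fix: pass augmented_output through unchanged
    return buggy, augmented

hidden = [0.0, 0.0, 0.0]        # per-token activations
mask = [False, False, True]     # only the response token is after [/INST]
buggy, fixed = forward_pass(hidden, 1.0, mask)
print(buggy)   # [1.0, 1.0, 2.0] -> response token steered twice
print(fixed)   # [0.0, 0.0, 1.0] -> steered once, as intended
```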

GPU memory

Hi @nrimsky, thanks for your work! I read in your paper that you benchmarked using two L4 GPUs.

I tried to run a VM with this hardware using the 13B model, but I am getting GPU OOM errors with the current main branch.

Are you using parallelization, lower precision, or other techniques to fit the 13B model onto an L4 GPU with 24 GB of memory? By my calculation, the 13B model naively needs 26 GB for inference and 52 GB for fine-tuning.
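For reference, here is the back-of-the-envelope weight-memory math behind those numbers (inference weights only, ignoring KV cache and activations; the byte counts per dtype are standard assumptions, not measurements):

```python
# Approximate weight memory for a 13B-parameter model at various precisions.
params = 13e9
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for dtype, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{dtype}: {gb} GB of weights")

# fp16 weights alone (~26 GB) already exceed a single 24 GB L4, so
# 8-bit/4-bit quantization or sharding across both GPUs seems necessary.
```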

Thank you in advance.

Off-by-one index adding steering vector to generation tokens

It appears that the steering vector is added at the second generation token, rather than the first generation token, when using add_only_after_end_str=True. The function find_instruction_end_position() finds the correct index of the [/INST] token; however, add_vector_after_position() masks out everything including the final ] of the [/INST].

This is likely a mistake because the final input token ] is also the first generated output token.

For instance, this can be seen by running LlamaWrapper with the input "[INST] Paris is in [/INST]", and having the model generate the next token. This sentence tokenizes to the following:

# total len: 11 tokens
['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']']

find_instruction_end_position() correctly finds that index 10, the final token of the sequence, is the end of the [/INST]. However, add_vector_after_position() generates the following mask:

[[[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[False]]]

Specifically, every position is masked out, so the steering vector won't be added at all during first-token generation.

After generating a token, for instance " France", the input then becomes:

# total len: 12 tokens
['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']', 'France']

And the mask from add_vector_after_position() becomes:

[[[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[False],[True]]]

It's also possible I'm misunderstanding the intended behavior of the code, so apologies if that's the case here!
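The off-by-one described above can be reproduced with a small sketch of the masking logic (the function name and `inclusive` flag are hypothetical, not the repo's actual API):

```python
# Illustrative re-implementation of the position masking described above.
def mask_from_position(seq_len, end_pos, inclusive):
    """Return a per-token mask: True where the steering vector is added."""
    if inclusive:
        return [i >= end_pos for i in range(seq_len)]  # proposed fix
    return [i > end_pos for i in range(seq_len)]       # current behavior

tokens = ['<s>', '[', 'INST', ']', 'Paris', 'is', 'in', '[', '/', 'INST', ']']
end = 10  # index of the final ']' of [/INST]

print(mask_from_position(len(tokens), end, inclusive=False))
# all False: the final ']' position (the first generation step) is skipped
print(mask_from_position(len(tokens), end, inclusive=True))
# last position True: the vector applies from the first generated token
```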

Adding support for other architectures

Hi!
I see that you are still actively updating this codebase. I've been working on extending it to other model architectures (Mamba, which I already have working, plus RWKV and Hyena). I was wondering whether you would be interested in a pull request adding those models, or whether you think it would clutter the repo and confuse people, since your article only mentions Llama.
Nice work,
Gonçalo
