
memit's People

Contributors

davidbau, kmeng01


memit's Issues

No optimization after first step

Is it normal for the edit to stop optimizing after first step?

Command:
python -m experiments.evaluate --alg_name=MEMIT --model_name="EleutherAI/gpt-j-6B" --hparams_fname=EleutherAI_gpt-j-6B.json --ds_name=cf --dataset_size_limit=1000 --num_edits=1

Example:

Tying optimization objective to 27
Recording initial value of v*
loss 10.159 = 10.159 + 0.0 + 0.0 avg prob of [ bishop] 0.00010042281064670533
loss 7.014 = 7.006 + 0.005 + 0.003 avg prob of [ bishop] 0.0024381510447710752
(the same loss line repeats unchanged for every remaining optimization step)
Init norm 67.62352752685547 | Delta norm 31.999399185180664 | Target norm 75.150146484375

It seems like the NLL loss could be further optimized?
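One possible (unverified) explanation for the frozen loss: the z-optimization in this codebase clamps the learned delta to a fixed multiple of its initial norm, so once the optimizer hits that ball, projection can cancel further progress; the reported Delta norm of ~32 would be consistent with sitting on such a constraint, though that is a guess. A minimal sketch of that kind of projection, with hypothetical numbers:

```python
import math

def clamp_delta(delta, init_norm, clamp_factor):
    """Project delta back onto a ball of radius clamp_factor * init_norm
    (the same style of norm clamp applied to the learned update vector)."""
    max_norm = clamp_factor * init_norm
    norm = math.sqrt(sum(x * x for x in delta))
    if norm > max_norm:
        scale = max_norm / norm
        delta = [x * scale for x in delta]
    return delta

# Hypothetical numbers: a delta of norm 5 clamped to a ball of radius 2.
# Every gradient step that pushes past the ball gets scaled back, so the
# printed loss can freeze at a constant value.
d = clamp_delta([3.0, 4.0], init_norm=2.0, clamp_factor=1.0)
```

If this is what is happening, raising the clamp factor in the hyperparameter file (if such a knob exists there) would let the NLL loss keep falling.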

Discussion: About Knowledge Editing

Dear authors,
I have read two of your papers on knowledge editing and benefited a great deal. Sorry to bother you, but there are two questions I would like to confirm with you:

  1. In MEMIT's Equation 9, why is the objective split into two sums? It looks like they could be combined into a single sum over [1, u]. Does the split into [1, n] and [n+1, u] have any special meaning?

  2. It looks like ROME could also perform batch editing. The original paper computes a single pair [k*, v*] and then updates W_proj, but what if I computed [k*_1, k*_2, ..., k*_n] together with [v*_1, v*_2, ..., v*_n] and then updated W_proj? Wouldn't that make ROME batch-editable as well?

Looking forward to your reply. Thank you.

[1] Locating and Editing Factual Associations in GPT
[2] MASS-EDITING MEMORY IN A TRANSFORMER
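For what it's worth, the batched generalization described in question 2 is essentially what MEMIT's closed form computes. A rough numpy sketch of that closed form (my own paraphrase, not the repo's code), where C is the key covariance acting as the preservation term:

```python
import numpy as np

def batched_update(W, K, R, C):
    """W: (d_out, d_in) weight; K: (d_in, n) matrix of new keys k*;
    R: (d_out, n) desired residuals v* - W k*; C: (d_in, d_in) key covariance.
    Applies Delta = R K^T (C + K K^T)^{-1}, a MEMIT-style closed form (sketch)."""
    return W + R @ K.T @ np.linalg.inv(C + K @ K.T)

# Toy check: with a negligible covariance term and full-column-rank keys,
# the edited weight maps each key to its target value.
W = np.zeros((2, 3))
K = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # (3, 2), two keys
V = np.array([[1.0, 2.0], [3.0, 4.0]])              # (2, 2), two targets
R = V - W @ K
W_new = batched_update(W, K, R, C=1e-9 * np.eye(3))
```

With n = 1 and a suitable C this reduces to a rank-one, ROME-style edit, which is one way to see the relationship between the two methods.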

Can it work with Llama 3 / other 7b models?

Could this process be used on Llama 3 8B? I see that "Llama 3 is an auto-regressive language model that uses an optimized transformer architecture."
I have a task to update the knowledge in Llama 3 for the Phaser game engine. It currently gives mixed responses using both Phaser 2 and Phaser 3 code examples. We'd like it to either learn the difference or forget everything it 'knows' about Phaser 2 (and 3 if necessary - I have an effective RAG to provide Phaser 3 knowledge in this case).
(Llama 3 is just a 'for instance', we're not locked into any single model at the moment. I'm about to start trying a variety of them to see if there's something more suited to this task).

Applying to other models

I understand this was written for autoregressive models, but do you think it could be applied to the decoder portion of an encoder-decoder model? For example, if the input is "What is the capital of France?" and the decoder outputs "The capital of France is", could you change the appropriate MLP to point to Rome, with the subject being France?

IndexError: tuple index out of range at cur_repr processing stage

I'm trying to replicate MEMIT on GPTJ-6B, and I'm getting the following error (just on the first request/prompt example in your memit.ipynb notebook):

Traceback (most recent call last):
  File "notebooks/my_new_file_copying_your_interactive_notebook.py", line 58, in <module>
    model_new, orig_weights = demo_model_editing(
  File "/memit/notebooks/experiments/py/demo.py", line 50, in demo_model_editing
    model_new, orig_weights = apply_method(
  File "/memit/notebooks/memit/memit_main.py", line 44, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/memit/notebooks/memit/memit_main.py", line 160, in execute_memit
    cur_zs = get_module_input_output_at_words(
  File "/memit/notebooks/memit/compute_z.py", line 212, in get_module_input_output_at_words
    l_input, l_output = repr_tools.get_reprs_at_word_tokens(
  File "/memit/notebooks/rome/repr_tools.py", line 32, in get_reprs_at_word_tokens
    return get_reprs_at_idxs(
  File "/memit/notebooks/rome/repr_tools.py", line 150, in get_reprs_at_idxs
    _process(tr.input, batch_idxs, "in")
  File "/memit/notebooks/rome/repr_tools.py", line 131, in _process
    cur_repr = cur_repr[0] if type(cur_repr) is tuple else cur_repr
IndexError: tuple index out of range

I also tried running the non-notebook version via experiments.evaluate and got the same exact error. Doing some debugging printouts there revealed that this error occurred on calling _process for the input, and the tuple cur_repr was an empty tuple, with a batch_idxs value of [[7]]. Thus, I'm unable to apply MEMIT and move forward with the evaluation. Is anyone else running into this issue, and if so how were you able to resolve it?
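For anyone debugging this: one way `tr.input` can end up as an empty tuple is that PyTorch forward hooks receive only the *positional* arguments by default; if a newer transformers version calls the hooked module with keyword arguments, the hook sees `()`. A minimal reproduction of that mechanism (not necessarily the cause here):

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 3)
captured = {}

def hook(module, inputs, output):
    # `inputs` is a tuple of the positional args only; keyword args are
    # dropped unless the hook is registered with with_kwargs=True.
    captured["inputs"] = inputs

handle = lin.register_forward_hook(hook)
lin(input=torch.randn(2, 3))   # call the module with a keyword argument
handle.remove()
# captured["inputs"] is now an empty tuple, so captured["inputs"][0]
# would raise IndexError: tuple index out of range.
```

If this is the cause, pinning transformers to the version the repo was developed against, or registering the hook with `with_kwargs=True` and merging kwargs back in, are the two obvious directions.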

One error

I sincerely hope you can tell me how to handle this error.
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/memit/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lyn/memit/experiments/evaluate.py", line 299, in <module>
    main(
  File "/home/lyn/memit/experiments/evaluate.py", line 146, in main
    edited_model, weights_copy = apply_algo(
  File "/home/lyn/memit/memit/memit_main.py", line 44, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/home/lyn/memit/memit/memit_main.py", line 196, in execute_memit
    adj_k = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: The diagonal element 2 is zero, the solve could not be completed because the input matrix is singular.
I just run with this command:
CUDA_VISIBLE_DEVICES=2 python3 -m experiments.evaluate --alg_name=MEMIT --model_name=/home/lyn/EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=1
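Not the authors' fix, but a generic workaround when `torch.linalg.solve` hits a singular system is to fall back to a least-squares solve. Whether that is numerically appropriate for `adj_k` at this point in the algorithm is an assumption; the usual root cause (a degenerate covariance estimate) is worth checking first. Sketch:

```python
import torch

def solve_or_lstsq(A, B):
    """Try the exact solve; fall back to least squares when A is singular.
    torch's _LinAlgError subclasses RuntimeError, so this catch covers it."""
    try:
        return torch.linalg.solve(A, B)
    except RuntimeError:
        return torch.linalg.lstsq(A, B).solution

# Singular A: the exact solve raises, but the fallback still returns a
# minimum-norm solution to the consistent part of the system.
A = torch.tensor([[1.0, 0.0], [0.0, 0.0]])
B = torch.tensor([[1.0], [0.0]])
x = solve_or_lstsq(A, B)
```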

Distributing the update across multiple layers

Hey,
Thanks for sharing your work!
I have a question about how you chose to spread the residual across the remaining layers at each update step (Eq. 20).
You chose the updated values as:
M' = M + residual / (L - l + 1)
claiming this spreads the residual equally across the updated layers. But actually, if there are 4 updated layers:
the first layer will provide 1/4 of the residual,
the second layer will provide 1/12 (=1/3 - 1/4) of the residual,
the third layer will provide 1/6 (=1/2-1/3) of the residual,
and the fourth layer will provide 1/2 (=1-1/2) of the residual.

Shouldn't the correct update be:
M' = M + residual * (l - first_edited_layer + 1) / (L - first_edited_layer + 1)?

Thanks
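The division in Eq. 20 can be checked with a toy calculation, under the simplifying assumption that each layer's contribution fully propagates, so the residual recomputed at the next layer is exactly whatever remains unexplained (whether that assumption matches the actual recomputation in the code is the crux of the question above):

```python
L = 4                 # number of edited layers
applied = 0.0         # fraction of the original residual already inserted
steps = []
for l in range(1, L + 1):
    remaining = 1.0 - applied        # residual recomputed at layer l (assumption)
    step = remaining / (L - l + 1)   # Eq. 20's division
    steps.append(step)
    applied += step
# Under this assumption every layer contributes an equal 1/4 share;
# the uneven 1/4, 1/12, 1/6, 1/2 split arises only if the residual is
# measured differently at each layer.
```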

Missing `data` folder in root directory

Issue Description

When locally running memit.ipynb, an error occurs as follows:

Retrieving covariance statistics for EleutherAI_gpt-j-6B @ transformer.h.3.mlp.fc_out.
Attempting to download EleutherAI_gpt-j-6B/wikipedia_stats/transformer.h.3.mlp.fc_out_float32_mom2_100000.npz from https://memit.baulab.info/data/stats/EleutherAI_gpt-j-6B/wikipedia_stats/transformer.h.3.mlp.fc_out_float32_mom2_100000.npz.
Unable to download due to [Errno 17] File exists: 'data'. Computing locally....
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
notebooks/memit/memit_main.py:44, in apply_memit_to_model(model, tok, requests, hparams, copy, return_orig_weights, cache_template)
...
ValueError: BuilderConfig 20200501.en not found. Available: ['20220301.aa', ...

Cause

The error seems to occur because the data folder in the root directory is missing, which could be because Git ignores empty folders.

Solution

To resolve the issue, simply add an empty data folder to the root directory. This should allow the script to run without encountering the "ValueError".
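The workaround above as a one-liner from the repository root; `mkdir -p` also tolerates the folder already existing, which sidesteps the `[Errno 17] File exists` path:

```shell
# Run from the memit repository root; creates data/ only if it is missing.
mkdir -p data
```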

Multi-GPU support for MEMIT

[Issue desc]
I tried to run MEMIT on a machine with 2 x 32GB GPUs (Tesla V100), at 10k edits.
However, it fails: the weights are large (GPT-J-6B), and the data (even with a small batch size) cannot fit on a single GPU.

[Comment for Enhancement]
I would like to ask whether there is an implementation supporting multi-GPU for MEMIT.
I mean that the model could be placed on one GPU, and every batch of data consumed by the model could be placed on another GPU. Overall, a non-distributed 2-GPU (or more) training implementation.

Paraphrase prompts' format not compatible with the sample from ROME paper

When reading the ROME paper, I found this sample from CounterFact:

{
  "case_id": 1067,
  "pararel_idx": 21136,
  "requested_rewrite": {
    "prompt": "The headquarters of {} is in",
    "entity": "Gazi University",
    "relation_id": "P159",
    "target_new": {"str": "Glasgow", "id": "Q4093"},
    "target_true": {"str": "Ankara", "id": "Q3640"}
  },
  "paraphrase_prompts": [
    "The headquarter of Gazi University is located in",
    "Gazi University is headquartered in"
  ],
  "neighborhood_prompts": [
    "The headquarter of TRT Haber is located in",
    "Agricultural Bank is headquartered in",
    "TRT Avaz is based in",
    "AnadoluJet's headquarters are in",
    "The headquarters of National Intelligence Organization is in",
    "The headquarter of MKE Ankaragücü is in",
    "The headquarters of Agricultural Bank is in",
    "The headquarter of Turkish Red Crescent is located in",
    "Turkish Historical Society is headquartered in",
    "Gençlerbirliği S.K. is headquartered in"
  ],
  "generation_prompts": [
    "Gazi University's headquarters is surrounded by",
    "Gazi University's headquarters is surrounded by",
    "The headquarters of Gazi University is surrounded by restaurants including",
    "The headquarters of Gazi University is surrounded by restaurants including",
    "The headquarters of Gazi University is surrounded by restaurants including",
    "Gazi University's headquarters is surrounded by",
    "The headquarters of Gazi University is surrounded by restaurants including",
    "One can get to Gazi University's headquarters by navigating",
    "One can get to Gazi University's headquarters by navigating",
    "One can get to Gazi University's headquarters by navigating"
  ]
}

But the actual dataset that can be found here (https://memit.baulab.info/data/dsets/counterfact.json) has a different format for paraphrase prompts. Here is an example (I'll put only the paraphrase prompts):
{
  ...
  "paraphrase_prompts": [
    "Shayna does this and Yossel goes still and dies. Danielle Darrieux, a native",
    "An album was recorded for Capitol Nashville but never released. Danielle Darrieux spoke the language"
  ],
  ...
}

Note the apparently unrelated sentences at the start of each paraphrase prompt; the code does not seem to filter these prefixes out.

If this is not an error, why is there this difference, and what is its impact on the evaluation procedure?

CUDA out of memory

Hi authors,

MEMIT is an interesting work.
When I run: python -m experiments.evaluate --alg_name=MEMIT --model_name=EleutherAI/gpt-j-6B --hparams_fname=EleutherAI_gpt-j-6B.json --num_edits=10 --use_cache

There is error about CUDA out of memory:
  File "\memit-main\memit\memit_main.py", line 97, in
    weights_copy = {k: v.detach().clone() for k, v in weights.items()}
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 24.00 GiB total capacity; 23.15 GiB already allocated; 0 bytes free; 23.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My local GPU is a 24GB RTX 3090.
Could you help me run the MEMIT code? How should I revise the config file (EleutherAI_gpt-j-6B.json) to reduce memory usage?

Thank you very much.
MEMIT is very nice.
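Not an authoritative fix, but the error message itself points at one documented PyTorch knob, `PYTORCH_CUDA_ALLOC_CONF`. Whether 128 MiB is the right split size for a 24GB card is a guess, and reducing the edit batch size in the hparams file is likely still necessary:

```shell
# Documented PyTorch allocator setting; smaller split sizes reduce
# fragmentation at some allocation-speed cost (the value here is a guess).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```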

multi-counterfact and counterfact

What is the difference between the multi-counterfact and counterfact datasets? Which one should I choose to reproduce the results in the paper?

GPU not big enough? I'm using A5500 24GB RAM

The paper uses an A6000 GPU with 48GB of RAM, but my workstation has 4 A5500s with 24GB of RAM each. Can I use the method suggested in the paper by separating out the model editing and model running? Or is there a way for me to run it in parallel across my GPUs? My current idea is to use a library called transformer-utils that uses a smaller model. I'm getting a message that I'm running out of memory when running the model editing.

NotImplementedError for GPT-J-6b

I was trying to run memit.ipynb on the premium GPU in Colab with the GPT-J-6B model. I believe the error message I got is not related to OOM errors but rather to a potential problem in the code. For context, gpt2-xl does work as expected.

[screenshot of the NotImplementedError traceback]
