
knowledge-neurons's People

Contributors

hunter-ddm


knowledge-neurons's Issues

Computational cost for calculating attribution scores

Hi, I'm currently running the first step, bash 1_run_mlm.sh, to get the attribution scores, and I found it takes several hours. Is that normal? In your paper you report that the running time for identifying knowledge neurons is only 13.3 seconds. Is that the time cost of the second step, bash 2_run_kn.sh, rather than the first?
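For reference, the attribution step approximates integrated gradients over each FFN neuron's activation, which is why it is far more expensive than the later identification step. A minimal, self-contained sketch of the Riemann-sum approximation, using a toy linear readout in place of the frozen BERT forward pass (all names here are hypothetical stand-ins, not the repo's API):

```python
import torch

torch.manual_seed(0)
ffn_size = 8
W = torch.randn(ffn_size)                 # toy readout weights
logit_fn = lambda acts: (acts * W).sum()  # neuron activations -> target-token logit

base_acts = torch.randn(ffn_size)         # FFN activations at the target position
m = 20                                    # number of integration steps

# Riemann-sum approximation of the integrated gradient along the straight
# path from the zero vector to the full activation vector.
grad_sum = torch.zeros(ffn_size)
for k in range(1, m + 1):
    scaled = (k / m) * base_acts          # interpolate from 0 to full activation
    scaled.requires_grad_(True)
    logit_fn(scaled).backward()
    grad_sum += scaled.grad

attribution = base_acts * grad_sum / m    # (activation) x (averaged gradient)
print(attribution.shape)                  # torch.Size([8])
```

Each of the m steps costs a full forward and backward pass per prompt and per layer, which is consistent with this step dominating the wall-clock time.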

Question about load pretrained model

Hello,
When I tried to load the pretrained model, I ran into this message:

  • INFO - custom_bert - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']

Questions about the code.

Hi,
I see that in '3_modify_activation.py' the code
_, logits = model(input_ids=input_ids, attention_mask=input_mask, token_type_ids=segment_ids, tgt_pos=tgt_pos, tgt_layer=0, imp_pos=kn_bag, imp_op='remove')
always passes tgt_layer=0, yet in kn_bag some neurons are e.g. [9, 1000] or [10, 1001], i.e. not in layer 0. Why is tgt_layer always 0?
The same happens in other places, such as the edit and erase operations.
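A plausible reading of the code (a toy sketch of what imp_op='remove' presumably does, not the repo's actual implementation): the (layer, neuron) pairs in imp_pos are matched against every layer during the forward pass, so suppression happens in layers 9 or 10 regardless of tgt_layer, and tgt_layer=0 would then just be a placeholder:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy 2-layer FFN stack standing in for BERT's FFN sublayers (hypothetical).
layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])
imp_pos = [(0, 2), (1, 3)]  # (layer_idx, neuron_idx) pairs, in different layers

def forward(x, imp_pos=(), imp_op=None):
    for layer_idx, layer in enumerate(layers):
        x = torch.relu(layer(x))
        if imp_op == 'remove':
            for l, n in imp_pos:
                if l == layer_idx:      # match against the CURRENT layer,
                    x[..., n] = 0.0     # so any layer's neurons can be zeroed
    return x

x = torch.randn(1, 4)
out = forward(x, imp_pos, imp_op='remove')
print(out)  # neuron 3 of the last layer is forced to 0
```

Under this reading, tgt_layer would only matter for other operations (e.g. returning or scaling a specific layer's activations), not for the remove path.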

Also, regarding the paper: do Figures 4 and 5 show the change in the probability of the correct answer, or the change in its ranking? That is, does suppressing or amplifying the neurons actually improve probing performance, or only improve the ranking of the target label (while the model's top output is still wrong)?

Thanks!

Question about your code

_, grad = model(input_ids=input_ids, attention_mask=input_mask, token_type_ids=segment_ids, tgt_pos=tgt_pos, tgt_layer=tgt_layer, tmp_score=batch_weights, tgt_label=pred_label) # (batch, n_vocab), (batch, ffn_size)

I don't understand why this forward call returns a gradient as its output. Could you please explain?
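One possible explanation (a hedged sketch, not the repo's actual code): when tmp_score and tgt_label are passed, the forward pass likely substitutes tmp_score for the FFN activations and calls autograd on the target-label logit internally, returning that gradient alongside the logits. A toy illustration with an invented linear readout:

```python
import torch

torch.manual_seed(0)
readout = torch.randn(6, 3)  # toy FFN -> vocab readout (hypothetical)

def model_forward(tmp_score, tgt_label):
    logits = tmp_score @ readout                   # (batch, n_vocab)
    grad = torch.autograd.grad(
        logits[:, tgt_label].sum(), tmp_score)[0]  # d(target logit)/d(activations)
    return logits, grad

# batch_weights plays the role of the scaled FFN activations fed in as tmp_score.
batch_weights = torch.randn(2, 6, requires_grad=True)
logits, grad = model_forward(batch_weights, tgt_label=1)
print(logits.shape, grad.shape)  # torch.Size([2, 3]) torch.Size([2, 6])
```

Returning the gradient from inside forward() matches the comment in the call site, which annotates the outputs as (batch, n_vocab) and (batch, ffn_size): the second output has the shape of the FFN activations, exactly what a gradient with respect to them would have.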

Dependencies for Libraries

Could you please list the dependencies (libraries and versions) needed to run this code, so that all results can be reproduced?

Question about ig_pred

Hello,
I have a question: why is ig_pred computed but never used in the project? It seems that all the gradients are computed with respect to the gold label. Is there a reason for that?

knowledge neuron in transformers that have both encoder and decoder

Hi,

Firstly, I want to thank you for the paper, which is very inspiring and interesting. I am wondering if you have ever tried to identify knowledge neurons in other Transformer architectures (your paper evaluates BERT models, which are encoder-only). I am curious how knowledge neurons would be distributed in a model that has both an encoder and a decoder: would decoder layers have more knowledge neurons than encoder layers? Is there any related reference?

Thanks!
