
knowledge-neurons's People

Contributors

hunter-ddm


knowledge-neurons's Issues

Computational cost for calculating attribution scores

Hi, I'm currently running the first step, bash 1_run_mlm.sh, to get the attribution scores, and I found it takes several hours. Is that normal? In your paper you report that the running time for identifying knowledge neurons is only 13.3 seconds. Is that the time cost of the second step, bash 2_run_kn.sh, rather than the first?
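For reference, the attribution step approximates integrated gradients over each FFN neuron's activation, which is why it is far more expensive than the later identification step. A minimal, self-contained sketch of the Riemann-sum approximation, using a toy linear readout in place of the frozen BERT forward pass (all names here are hypothetical stand-ins, not the repo's API):

```python
import torch

torch.manual_seed(0)
ffn_size = 8
W = torch.randn(ffn_size)                 # toy readout weights
logit_fn = lambda acts: (acts * W).sum()  # neuron activations -> target-token logit

base_acts = torch.randn(ffn_size)         # FFN activations at the target position
m = 20                                    # number of integration steps

# Riemann-sum approximation of the integrated gradient along the straight
# path from the zero vector to the full activation vector.
grad_sum = torch.zeros(ffn_size)
for k in range(1, m + 1):
    scaled = (k / m) * base_acts          # interpolate from 0 to full activation
    scaled.requires_grad_(True)
    logit_fn(scaled).backward()
    grad_sum += scaled.grad

attribution = base_acts * grad_sum / m    # (activation) x (averaged gradient)
print(attribution.shape)                  # torch.Size([8])
```

Each of the m steps costs a full forward and backward pass per prompt and per layer, which is consistent with this step dominating the wall-clock time.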

Question about load pretrained model

Hello,
When I tried to load the pretrained model, I ran into this message:

  • INFO - custom_bert - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']

Questions about the code.

Hi,
I see that in '3_modify_activation.py' the code
_, logits = model(input_ids=input_ids, attention_mask=input_mask, token_type_ids=segment_ids, tgt_pos=tgt_pos, tgt_layer=0, imp_pos=kn_bag, imp_op='remove')
always passes tgt_layer=0, yet in kn_bag some neurons are e.g. [9, 1000] or [10, 1001], i.e. not in layer 0. Why is tgt_layer always 0?
The same happens in other places, such as the edit and erase operations.
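A plausible reading of the code (a toy sketch of what imp_op='remove' presumably does, not the repo's actual implementation): the (layer, neuron) pairs in imp_pos are matched against every layer during the forward pass, so suppression happens in layers 9 or 10 regardless of tgt_layer, and tgt_layer=0 would then just be a placeholder:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy 2-layer FFN stack standing in for BERT's FFN sublayers (hypothetical).
layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])
imp_pos = [(0, 2), (1, 3)]  # (layer_idx, neuron_idx) pairs, in different layers

def forward(x, imp_pos=(), imp_op=None):
    for layer_idx, layer in enumerate(layers):
        x = torch.relu(layer(x))
        if imp_op == 'remove':
            for l, n in imp_pos:
                if l == layer_idx:      # match against the CURRENT layer,
                    x[..., n] = 0.0     # so any layer's neurons can be zeroed
    return x

x = torch.randn(1, 4)
out = forward(x, imp_pos, imp_op='remove')
print(out)  # neuron 3 of the last layer is forced to 0
```

Under this reading, tgt_layer would only matter for other operations (e.g. returning or scaling a specific layer's activations), not for the remove path.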

Also, regarding the paper: do Figures 4 and 5 show the change in the probability of the correct answer, or the change in its ranking? That is, does suppressing or amplifying the neurons actually improve probing performance, or only improve the ranking of the target label (while the model's top output is still wrong)?

Thanks!

Question about your code

_, grad = model(input_ids=input_ids, attention_mask=input_mask, token_type_ids=segment_ids, tgt_pos=tgt_pos, tgt_layer=tgt_layer, tmp_score=batch_weights, tgt_label=pred_label) # (batch, n_vocab), (batch, ffn_size)

I don't understand why this forward call returns a gradient as its output. Could you please explain?
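One possible explanation (a hedged sketch, not the repo's actual code): when tmp_score and tgt_label are passed, the forward pass likely substitutes tmp_score for the FFN activations and calls autograd on the target-label logit internally, returning that gradient alongside the logits. A toy illustration with an invented linear readout:

```python
import torch

torch.manual_seed(0)
readout = torch.randn(6, 3)  # toy FFN -> vocab readout (hypothetical)

def model_forward(tmp_score, tgt_label):
    logits = tmp_score @ readout                   # (batch, n_vocab)
    grad = torch.autograd.grad(
        logits[:, tgt_label].sum(), tmp_score)[0]  # d(target logit)/d(activations)
    return logits, grad

# batch_weights plays the role of the scaled FFN activations fed in as tmp_score.
batch_weights = torch.randn(2, 6, requires_grad=True)
logits, grad = model_forward(batch_weights, tgt_label=1)
print(logits.shape, grad.shape)  # torch.Size([2, 3]) torch.Size([2, 6])
```

Returning the gradient from inside forward() matches the comment in the call site, which annotates the outputs as (batch, n_vocab) and (batch, ffn_size): the second output has the shape of the FFN activations, exactly what a gradient with respect to them would have.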

Dependencies for Libraries

Could you please list the dependencies (libraries and versions) needed to run this code, so that all results can be reproduced?

Question about ig_pred

Hello,
I have a question: why is ig_pred computed but never used in the project? It seems that all the gradients are computed with respect to the gold label. Is there a reason for that?

knowledge neuron in transformers that have both encoder and decoder

Hi,

Firstly, I want to thank you for the paper, which is very inspiring and interesting. I am wondering if you have ever tried to identify knowledge neurons in other Transformer architectures (your paper evaluates BERT models, which are encoder-only). I am curious how knowledge neurons would be distributed in a model that has both an encoder and a decoder: would decoder layers have more knowledge neurons than encoder layers? Is there any related reference?

Thanks!
