
selfexplain's Issues

LIL Implementation

phrase_level_logits = self.phrase_logits(phrase_level_activations)

The implementation of LIL differs from what is described in the paper, and I am a bit confused about it. If we go by this implementation, the "mean" that is taken is not actually a division by len(nt).

Inaccurate loss

According to the paper (section 2.5), the final loss is calculated as a weighted combination of three loss terms: the LIL loss, the GIL loss, and the task-based CE loss. However, in the code, the logits are combined as a weighted sum of the LIL, GIL, and task-based logits BEFORE the final loss is computed. Because of this, the two are not equivalent.

That is, log(softmax(a + b)) is not equivalent to log(softmax(a)) + log(softmax(b)).
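As a toy illustration of the difference (the logits, labels, and weights below are made up for this example, not the repo's actual values or hyperparameters), the two formulations give different loss values:

```python
import torch
import torch.nn.functional as F

# Made-up logits for a batch of 2 examples and 3 classes, plus made-up weights.
task_logits = torch.tensor([[1.0, 2.0, 0.5], [0.2, 0.1, 1.5]])
lil_logits = torch.tensor([[0.5, -1.0, 0.3], [1.2, 0.0, -0.7]])
gil_logits = torch.tensor([[-0.2, 0.4, 1.1], [0.3, 0.9, -0.5]])
labels = torch.tensor([0, 2])
alpha, beta = 0.1, 0.1

# What the code does: combine the logits first, then take a single CE loss.
combined = task_logits + alpha * lil_logits + beta * gil_logits
loss_from_combined_logits = F.cross_entropy(combined, labels)

# What section 2.5 describes: a weighted combination of three separate losses.
loss_weighted_sum = (F.cross_entropy(task_logits, labels)
                     + alpha * F.cross_entropy(lil_logits, labels)
                     + beta * F.cross_entropy(gil_logits, labels))

print(loss_from_combined_logits.item(), loss_weighted_sum.item())
```

Running this shows the two quantities clearly diverge, which is exactly the non-equivalence the issue points out.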

XLNet has <cls> at the end

phrase_level_activations = phrase_level_activations - self.activation(hidden_state[:,0,:].unsqueeze(1))

I am a bit confused about this line. XLNet places the <cls> token at the END of the sequence, so hidden_state[:, 0, :] would capture the representation of the first token, not of <cls>.
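To make the indexing concrete (toy tensor, assumed shapes): with XLNet-style inputs, where <cls> is the final token, index -1 rather than 0 selects its hidden state.

```python
import torch

# Toy hidden states: batch 2, sequence length 5, hidden size 4.
hidden_state = torch.randn(2, 5, 4)

# What the quoted line selects: the FIRST token's representation.
first_token = hidden_state[:, 0, :].unsqueeze(1)

# For XLNet, <cls> sits at the END of the sequence, so -1 selects it.
cls_token = hidden_state[:, -1, :].unsqueeze(1)
```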

Not runnable on Windows

Hi,

I could not run run_self_explain.sh because of "import resource" on line 8 of run.py.
The "resource" module is only available on UNIX systems, not on Windows. Is there any way I can fix this?
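One possible workaround, assuming `resource` in run.py is only used for memory reporting (`max_rss_mb` below is a hypothetical helper name, not the repo's): guard the import by platform and fall back to a no-op on Windows.

```python
import sys

if sys.platform != "win32":
    import resource

    def max_rss_mb():
        # peak resident set size; ru_maxrss is reported in KB on Linux
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
else:
    def max_rss_mb():
        # the `resource` module does not exist on Windows; return a sentinel
        return -1.0

print(max_rss_mb())
```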

Thanks!

Could you provide a Colab notebook to run all your code?

Thank you very much for sharing your code.

However, I am stuck trying to run the training and inference steps to obtain an explanation. Could you provide a notebook that contains your code? It could be a Colab notebook with a small example that demonstrates how to obtain an explanation.

I have made several attempts to install the required library, but it still fails every time.

Thank you so much.

Missing normalization based on phrase length?

According to the paper (section 2.2), constituent word representations are taken to be the average of token representations of the phrase (non-terminal) tokens.

The code actually does a batch matrix multiplication, and therefore computes the SUM of the token hidden representations. This may affect both the magnitude and the direction of the phrase-level representation after applying the activation.

Am I missing something?
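A minimal sketch of the difference (toy tensors; `nt_onehot` stands in for the non-terminal one-hot matrix, and the variable names are illustrative): the batch matrix multiplication yields a sum over tokens, and dividing by the per-phrase token counts would recover the average the paper describes.

```python
import torch

# 1 sentence, 4 tokens, hidden size 3; 2 phrases as one-hot masks over tokens.
hidden = torch.arange(12, dtype=torch.float32).reshape(1, 4, 3)
nt_onehot = torch.tensor([[[1., 1., 0., 0.],    # phrase 1 = tokens 0-1
                           [0., 0., 1., 1.]]])  # phrase 2 = tokens 2-3

summed = torch.bmm(nt_onehot, hidden)            # what bmm gives: a SUM over tokens
lengths = nt_onehot.sum(dim=-1, keepdim=True)    # tokens per phrase
mean = summed / lengths.clamp(min=1.0)           # paper's average; clamp guards empty rows
```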

Noise on LIL layer due to batching

The codebase uses batching to process multiple sentences at the same time. Each sentence can be broken down into multiple phrases, represented by non-terminal one-hot vectors. Because not ALL sentences in a batch decompose into the same number of phrases, some sentences have empty phrase slots. Ideally, lil_logits should MASK out all such representations so that they do not contribute to the total loss. Otherwise, some of the lil_logits_mean values will be dominated by 0 - pooled_seq_rep.
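One way the masking could look (a sketch with assumed shapes, not the repo's actual code): treat a phrase slot as "empty" when its one-hot row is all zeros, zero out its logits, and average only over the valid slots.

```python
import torch

# Assumed shapes: lil_logits is (batch, max_phrases, num_classes) and
# nt_onehot is (batch, max_phrases, seq_len).
lil_logits = torch.randn(2, 3, 4)
nt_onehot = torch.tensor([[[1., 0.], [0., 1.], [0., 0.]],   # sentence 1: 2 real phrases
                          [[1., 1.], [0., 0.], [0., 0.]]])  # sentence 2: 1 real phrase

valid = (nt_onehot.sum(dim=-1) > 0).float()     # (batch, max_phrases) validity mask
masked = lil_logits * valid.unsqueeze(-1)       # zero out the padded phrase slots
# average over valid slots only, so padding cannot drag the mean toward zero
lil_logits_mean = masked.sum(dim=1) / valid.sum(dim=1, keepdim=True).clamp(min=1.0)
```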

GIL implementation

Hi, I'm trying to understand your code and reproduce it. In the GIL implementation, I have a question:
Your paper says that each q_k in the concept store Q is constantly updated
("As the model M is fine-tuned for a downstream task, the representations q_k are constantly updated").
However, after looking at the bash files and Python files, I found that you build the concept store at the very beginning using the original XLNet checkpoint and save it as a static concept_store.pt file. During training, it seems that the .pt file is never updated.
I'm a bit confused here. Did I miss a detail, or can you point out where the function for updating the embeddings in Q is?
Thanks in advance!
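For reference, a hypothetical sketch of what "constantly updated" could look like (`refresh_concept_store`, `MeanEncoder`, and `concept_inputs` are invented names for illustration, not the repo's API): re-encode the concept inputs with the current encoder, e.g. once per epoch, instead of loading a static concept_store.pt once.

```python
import torch

def refresh_concept_store(encoder, concept_inputs):
    # re-encode every concept with the CURRENT (fine-tuned) encoder weights
    with torch.no_grad():
        return encoder(concept_inputs)   # (num_concepts, hidden)

class MeanEncoder(torch.nn.Module):
    # toy stand-in encoder: mean-pool token embeddings
    def forward(self, x):
        return x.mean(dim=1)

encoder = MeanEncoder()
concept_inputs = torch.randn(5, 7, 16)   # 5 concepts, 7 tokens each, hidden 16
Q = refresh_concept_store(encoder, concept_inputs)
# calling this at the end of each epoch would keep Q in sync with the model
```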
