selfexplain's Issues
LIL Implementation
Line 119 in ec914ca
The implementation of LIL differs from what is in the paper, and I am a bit confused about that aspect as well. If we go by this implementation, then the mean we are taking is not actually a division by the len(nt) matrix.
Inaccurate loss
According to the paper (section 2.5), the final loss is calculated as a weighted combination of three loss terms: the LIL loss, the GIL loss, and the task-based cross-entropy loss. However, in the code, the logits are computed as a weighted sum of the LIL, GIL, and task-based logits BEFORE the final loss is computed. Because of this, the two are not equivalent.
i.e., log(softmax(a + b)) is not equivalent to log(softmax(a)) + log(softmax(b)).
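A quick numerical check (a standalone sketch, not the repository's code) illustrates the gap:

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

a = np.array([1.0, 2.0, 0.5])
b = np.array([0.3, -1.0, 2.0])

combined = log_softmax(a + b)               # loss computed over summed logits
separate = log_softmax(a) + log_softmax(b)  # sum of per-head losses

print(np.allclose(combined, separate))  # False: the two are not equivalent
```

Equality would require the partition function of `a + b` to factor into the product of the individual partition functions, which does not hold in general.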
XLNet has <cls> at the end
Line 118 in ec914ca
I am a bit confused by this line. XLNet places the CLS token last, so taking the first token of the hidden state would capture the first word token rather than CLS.
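For reference, a minimal sketch (dummy tensor shapes of my own choosing, not the repo's actual tensors) of the difference between the two indexing choices:

```python
import numpy as np

# hidden_state: (batch, seq_len, hidden) -- dummy values for illustration
batch, seq_len, hidden = 2, 5, 4
hidden_state = np.arange(batch * seq_len * hidden, dtype=float)
hidden_state = hidden_state.reshape(batch, seq_len, hidden)

first_token = hidden_state[:, 0, :]   # what indexing the first position captures
cls_token = hidden_state[:, -1, :]    # XLNet's <cls> sits at the last position

print(first_token.shape, cls_token.shape)  # (2, 4) (2, 4)
```

The two slices have the same shape but hold different tokens' representations, which is exactly the concern raised above.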
Not runnable on Windows
Hi,
I could not run run_self_explain.sh because of "import resource" on line 8 of run.py.
The resource module is only available on UNIX systems, not on Windows. Is there any way I can fix it?
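One possible workaround (a sketch, assuming `resource` is only used for optional memory-usage reporting in run.py) is to guard the import so the script degrades gracefully on Windows:

```python
# `resource` exists only on Unix; fall back gracefully elsewhere.
try:
    import resource
except ImportError:
    resource = None

def peak_memory_mb():
    """Return peak RSS in MB, or None when `resource` is unavailable."""
    if resource is None:
        return None
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KB on Linux (bytes on macOS); KB assumed here
    return usage / 1024

print(peak_memory_mb())
```

Any call sites that log memory usage would then need to handle the `None` case.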
Thanks!
Could you provide a Colab notebook for running all your code?
Thank you very much for your coding.
However, I am stuck trying to run the training and inference steps to obtain an explanation. Could you provide a notebook that contains your code? It could be a Colab notebook with a small example that demonstrates how to obtain an explanation.
I have made several attempts to install the required libraries, but it still fails every time.
Thank you so much.
Missing normalization based on phrase length?
According to the paper (section 2.2), constituent word representations are taken to be the average of token representations of the phrase (non-terminal) tokens.
The code actually does a batch matrix multiplication, and therefore computes the sum of the hidden token representations rather than their average. This may affect both the magnitude and the direction of the phrase-level representation after applying the activation.
Am I missing something?
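To illustrate (a standalone sketch with hypothetical shapes, not the repository's exact tensors): a batch matrix multiplication with one-hot phrase masks yields the sum of token representations, so dividing by each phrase's token count would recover the mean described in section 2.2:

```python
import numpy as np

batch, seq_len, hidden = 1, 4, 3
tokens = np.random.rand(batch, seq_len, hidden)
# one-hot non-terminal masks: phrase 0 covers tokens 0-1, phrase 1 covers 1-3
nt_mask = np.array([[[1, 1, 0, 0],
                     [0, 1, 1, 1]]], dtype=float)

summed = nt_mask @ tokens                      # (batch, n_phrases, hidden): sum only
lengths = nt_mask.sum(axis=-1, keepdims=True)  # token count per phrase
mean = summed / np.clip(lengths, 1.0, None)    # divide to get per-phrase averages

print(np.allclose(mean[0, 0], tokens[0, :2].mean(axis=0)))  # True
```

The `np.clip` guards against division by zero for empty (padded) phrase slots.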
Noise on LIL layer due to batching
The codebase uses batching to process multiple sentences at the same time. Each sentence can be broken down into multiple phrases, represented by non-terminal one-hot vectors. Because not ALL sentences in a batch decompose into the same number of phrases, some sentences have empty (padded) phrase slots. Ideally, lil_logits should MASK out all such representations so they do not contribute to the total loss. Otherwise, lil_logits_mean will be dominated by 0 - pooled_seq_rep terms.
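A possible masking sketch (variable names modeled on `lil_logits` but otherwise my own assumptions, not the repo's code): zero out the logits of padded phrase slots and average only over the real phrases:

```python
import numpy as np

batch, max_phrases = 2, 3
lil_logits = np.random.rand(batch, max_phrases)
# 1 where a phrase actually exists, 0 where the slot is padding
phrase_mask = np.array([[1, 1, 0],
                        [1, 0, 0]], dtype=float)

# mask out padded slots, then average over the true phrase count per sentence
masked = lil_logits * phrase_mask
lil_logits_mean = masked.sum(axis=1) / phrase_mask.sum(axis=1)

print(lil_logits_mean.shape)  # (2,)
```

This way a sentence with one phrase and a sentence with three phrases contribute comparable per-phrase averages, instead of the padded zeros diluting the mean.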
GIL implementation
Hi, I'm trying to understand your code and reproduce it. In the GIL implementation, I have a question:
Your paper says q_k in the concept store Q will be constantly updated.
(As the model M is fine-tuned for a downstream task, the representations q_k are constantly updated.)
However, after looking at the bash files and Python files, I found that you build the concept store at the very beginning using the original XLNet checkpoint and then save it as a static concept_store.pt file. During training, it seems that the .pt file is never updated.
I'm a bit puzzled here. Did I miss a detail, or can you point out where the function for updating the embeddings in Q is?
Thanks in advance!
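For what it's worth, one way the paper's description could be realized (purely a sketch under my own assumptions; I did not find this in the code) is to re-encode the concept store with the current encoder at the end of each epoch:

```python
import numpy as np

def encode(concepts, weight):
    """Stand-in for running concept phrases through the current encoder."""
    return concepts @ weight

n_concepts, dim = 4, 3
concept_inputs = np.random.rand(n_concepts, dim)  # fixed concept phrase inputs
weight = np.eye(dim)                              # stand-in encoder parameters

for epoch in range(2):
    # hypothetical training step that changes the encoder weights
    weight = weight + 0.1 * np.random.rand(dim, dim)
    # refresh Q so each q_k reflects the fine-tuned model, as the paper states
    concept_store = encode(concept_inputs, weight)

print(concept_store.shape)  # (4, 3)
```

In the actual repo this would presumably mean regenerating concept_store.pt periodically rather than loading it once at startup.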