selfexplain's Issues
LIL Implementation
Line 119 in ec914ca
The implementation of LIL differs from what is in the paper, and I am a bit confused about that aspect as well. If we go by this implementation, then the mean we are taking is not actually a division by the len(nt) matrix.
Inaccurate loss
According to the paper (section 2.5), the final loss is calculated as a weighted combination of three loss terms: the LIL loss, the GIL loss, and the task-based cross-entropy loss. However, in the code, the logits are computed as a weighted sum of the LIL, GIL, and task-based logits BEFORE the final loss is computed. Because of this, the two are not equivalent.
i.e., log(softmax(a + b)) is not equivalent to log(softmax(a)) + log(softmax(b)).
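A quick numerical check (a standalone sketch, not the repository's code) illustrates the gap:

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

a = np.array([1.0, 2.0, 0.5])
b = np.array([0.3, -1.0, 2.0])

combined = log_softmax(a + b)               # loss computed over summed logits
separate = log_softmax(a) + log_softmax(b)  # sum of per-head losses

print(np.allclose(combined, separate))  # False: the two are not equivalent
```

Equality would require the partition function of `a + b` to factor into the product of the individual partition functions, which does not hold in general.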
XLNet has <cls> at the end
Line 118 in ec914ca
I am a bit confused by this line. XLNet places the CLS token last, so taking the first token of the hidden state would capture the first word token rather than CLS.
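For reference, a minimal sketch (dummy tensor shapes of my own choosing, not the repo's actual tensors) of the difference between the two indexing choices:

```python
import numpy as np

# hidden_state: (batch, seq_len, hidden) -- dummy values for illustration
batch, seq_len, hidden = 2, 5, 4
hidden_state = np.arange(batch * seq_len * hidden, dtype=float)
hidden_state = hidden_state.reshape(batch, seq_len, hidden)

first_token = hidden_state[:, 0, :]   # what indexing the first position captures
cls_token = hidden_state[:, -1, :]    # XLNet's <cls> sits at the last position

print(first_token.shape, cls_token.shape)  # (2, 4) (2, 4)
```

The two slices have the same shape but hold different tokens' representations, which is exactly the concern raised above.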
Not runnable on Windows
Hi,
I could not run run_self_explain.sh because of "import resource" on line 8 of run.py.
The resource module is only available on UNIX systems, not on Windows. Is there any way I can fix it?
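One possible workaround (a sketch, assuming `resource` is only used for optional memory-usage reporting in run.py) is to guard the import so the script degrades gracefully on Windows:

```python
# `resource` exists only on Unix; fall back gracefully elsewhere.
try:
    import resource
except ImportError:
    resource = None

def peak_memory_mb():
    """Return peak RSS in MB, or None when `resource` is unavailable."""
    if resource is None:
        return None
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KB on Linux (bytes on macOS); KB assumed here
    return usage / 1024

print(peak_memory_mb())
```

Any call sites that log memory usage would then need to handle the `None` case.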
Thanks!
Could you provide a Colab notebook for running all your code?
Thank you very much for your coding.
However, I am stuck trying to run the training and inference steps to obtain an explanation. Could you provide a notebook that contains your code? It could be a Colab notebook with a small example that demonstrates how to obtain an explanation.
I have made several attempts to install the required libraries, but it still fails every time.
Thank you so much.
Missing normalization based on phrase length?
According to the paper (section 2.2), constituent word representations are taken to be the average of token representations of the phrase (non-terminal) tokens.
The code actually does a batch matrix multiplication, and therefore computes the sum of the hidden token representations rather than their average. This may affect both the magnitude and the direction of the phrase-level representation after applying the activation.
Am I missing something?
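To illustrate (a standalone sketch with hypothetical shapes, not the repository's exact tensors): a batch matrix multiplication with one-hot phrase masks yields the sum of token representations, so dividing by each phrase's token count would recover the mean described in section 2.2:

```python
import numpy as np

batch, seq_len, hidden = 1, 4, 3
tokens = np.random.rand(batch, seq_len, hidden)
# one-hot non-terminal masks: phrase 0 covers tokens 0-1, phrase 1 covers 1-3
nt_mask = np.array([[[1, 1, 0, 0],
                     [0, 1, 1, 1]]], dtype=float)

summed = nt_mask @ tokens                      # (batch, n_phrases, hidden): sum only
lengths = nt_mask.sum(axis=-1, keepdims=True)  # token count per phrase
mean = summed / np.clip(lengths, 1.0, None)    # divide to get per-phrase averages

print(np.allclose(mean[0, 0], tokens[0, :2].mean(axis=0)))  # True
```

The `np.clip` guards against division by zero for empty (padded) phrase slots.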
Noise on LIL layer due to batching
The codebase uses batching to process multiple sentences at the same time. Each sentence can be broken down into multiple phrases, represented by non-terminal one-hot vectors. Because not ALL sentences in a batch decompose into the same number of phrases, some sentences have empty (padded) phrase slots. Ideally, lil_logits should MASK out all such representations so they do not contribute to the total loss. Otherwise, lil_logits_mean will be dominated by 0 - pooled_seq_rep terms.
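A possible masking sketch (variable names modeled on `lil_logits` but otherwise my own assumptions, not the repo's code): zero out the logits of padded phrase slots and average only over the real phrases:

```python
import numpy as np

batch, max_phrases = 2, 3
lil_logits = np.random.rand(batch, max_phrases)
# 1 where a phrase actually exists, 0 where the slot is padding
phrase_mask = np.array([[1, 1, 0],
                        [1, 0, 0]], dtype=float)

# mask out padded slots, then average over the true phrase count per sentence
masked = lil_logits * phrase_mask
lil_logits_mean = masked.sum(axis=1) / phrase_mask.sum(axis=1)

print(lil_logits_mean.shape)  # (2,)
```

This way a sentence with one phrase and a sentence with three phrases contribute comparable per-phrase averages, instead of the padded zeros diluting the mean.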
GIL implementation
Hi, I'm trying to understand your code and reproduce it. In the GIL implementation, I have a question:
Your paper says q_k in the concept store Q will be constantly updated.
(As the model M is fine-tuned for a downstream task, the representations q_k are constantly updated.)
However, after looking at the bash files and Python files, I found that you build the concept store at the very beginning using the original XLNet checkpoint and then save it as a static concept_store.pt file. During training, it seems that the .pt file is never updated.
I'm a bit puzzled here. Did I miss a detail, or can you point out where the function for updating the embeddings in Q is?
Thanks in advance!
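For what it's worth, one way the paper's description could be realized (purely a sketch under my own assumptions; I did not find this in the code) is to re-encode the concept store with the current encoder at the end of each epoch:

```python
import numpy as np

def encode(concepts, weight):
    """Stand-in for running concept phrases through the current encoder."""
    return concepts @ weight

n_concepts, dim = 4, 3
concept_inputs = np.random.rand(n_concepts, dim)  # fixed concept phrase inputs
weight = np.eye(dim)                              # stand-in encoder parameters

for epoch in range(2):
    # hypothetical training step that changes the encoder weights
    weight = weight + 0.1 * np.random.rand(dim, dim)
    # refresh Q so each q_k reflects the fine-tuned model, as the paper states
    concept_store = encode(concept_inputs, weight)

print(concept_store.shape)  # (4, 3)
```

In the actual repo this would presumably mean regenerating concept_store.pt periodically rather than loading it once at startup.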