The contextual repository from kawine

contextual's Issues

sts.csv?

I'm interested in replicating the paper results. Would you be willing to share the sts.csv used? If so, perhaps you could simply attach it to this thread.

difficulty reproducing static embedding

I'm having trouble reproducing the static embedding results from the paper. For reference, here is the results table for static embeddings from the paper:

And here is my current set of results when run on a similarly large corpus of ~20k sentence pairs with a vocabulary of ~8k words (note: I've removed ELMo from my runs):

My scores for GloVe and FastText roughly match the paper, which suggests that the testing procedure is working and that my vocabulary is broad enough. However, there appears to be a systematic issue in creating good static embeddings from the first principal component, and it shows up regardless of which language model I use.
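To make the step in question concrete, here is a minimal sketch of taking the first principal component of one word's contextualized vectors from a single layer. It is not the repo's code: the function name `first_pc_embedding`, the `center` flag, and the sign convention are all assumptions made for illustration. The sign handling matters because the sign of a singular vector is arbitrary, so two runs (or two words) can come out flipped, which may be worth checking when similarity scores look systematically off.

```python
import numpy as np


def first_pc_embedding(occurrences, center=False):
    """Return a unit-length static vector for one word.

    occurrences: array of shape (n_contexts, dim) holding the word's
    contextualized vectors from one layer (hypothetical input layout).
    center: whether to subtract the mean before the SVD. Whether the
    original code centers is not confirmed here, so it is a flag.
    """
    raw = np.asarray(occurrences, dtype=np.float64)
    if raw.ndim != 2 or raw.shape[0] < 1:
        raise ValueError("expected a non-empty (n_contexts, dim) array")

    X = raw - raw.mean(axis=0, keepdims=True) if center else raw

    # Rows of vt are the right singular vectors; vt[0] is the first
    # principal direction of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc = vt[0]

    # Assumed convention: orient the component so it points the same way
    # as the mean of the raw vectors, making the sign deterministic.
    if np.dot(pc, raw.mean(axis=0)) < 0:
        pc = -pc
    return pc
```

Running this on a handful of words and comparing the vectors against an independent implementation (with and without `center`) is one way to narrow down where two pipelines diverge.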

If the repo included a diagnostic or unit test, this might be easier for me to diagnose on my end. For example, it might be useful to include the expected outputs when the code is run on the 99 sentence pairs in sts.csv. But I'm certainly open to any other tips or ideas for probing where the process might be failing.

Note that the other sections seem to replicate well! For example, here is Average Cosine Similarity for Anisotropy adjustment in the paper and my most recent run:

Here's self-similarity. [Note my lower scores on GPT-2, though: my intuition is that this is a result of removing duplicate sentences, which otherwise make up about 20% of the input data.]
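For comparing runs, here is a minimal sketch of self-similarity as I understand the paper's definition: the average cosine similarity between a word's contextualized vectors over all pairs of distinct contexts. The function name `self_similarity` and the input layout are assumptions, and the sketch assumes no vector is all zeros. It also shows why duplicate sentences can push the score up, since an identical context contributes pairs with cosine 1.

```python
import numpy as np


def self_similarity(vectors):
    """Average pairwise cosine similarity over distinct context pairs.

    vectors: array of shape (n_contexts, dim) with one row per occurrence
    of the same word in one layer (hypothetical input layout).
    """
    X = np.asarray(vectors, dtype=np.float64)
    n = X.shape[0]
    if X.ndim != 2 or n < 2:
        raise ValueError("need at least two occurrences of the word")

    # Normalize rows so that dot products are cosine similarities.
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = unit @ unit.T

    # Sum every ordered pair (j, k) with j != k, then divide by n^2 - n.
    off_diagonal_sum = sims.sum() - np.trace(sims)
    return off_diagonal_sum / (n * n - n)


# Two orthogonal contexts, then the same word list with one context
# duplicated: the duplicate raises the average.
without_dup = self_similarity([[1.0, 0.0], [0.0, 1.0]])
with_dup = self_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Here `without_dup` is 0 and `with_dup` is 1/3, which is consistent with scores dropping once duplicates are removed.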

And here's intra-sentence similarity:
