contextual's People

Contributors

kawine

contextual's Issues

sts.csv?

I'm interested in replicating the paper results. Would you be willing to share the sts.csv used? If so, perhaps you could simply attach it to this thread.

difficulty reproducing static embedding

I'm having trouble reproducing the static embedding results from the paper. For reference, here is the results table for static embeddings from the paper:

[Image: pc_static_embeddings]

And here are my current results on a similarly large corpus of ~20k sentence pairs with an ~8k-word vocabulary (note: I've removed ELMo from my runs):

[Image: my_embeddings]

My scores for GloVe and FastText roughly match the paper, which suggests that the testing procedure is working and that my vocabulary is broad enough. However, there appears to be a systematic issue in creating good static embeddings from the first principal component, and it is independent of the language model.

If the repo included a diagnostic or unit test, this might be easier for me to diagnose on my end. For example, it might be useful to include expected outputs when the code is run on the 99 sentence pairs in sts.csv. I'm also open to any other tips or ideas for probing where the process might be failing.
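As one possible self-contained diagnostic, here is a minimal sketch of building a static embedding from the first principal component of a word's contextualized vectors and checking it on synthetic data. The helper name `first_pc_embedding`, the `center` flag, and the sign-orientation step are my own assumptions for illustration, not the repo's actual code. The sign of a principal component is arbitrary, so inconsistent signs across words may be one thing worth ruling out.

```python
import numpy as np

def first_pc_embedding(occurrences, center=False):
    """Hypothetical helper: unit-length first principal direction of a
    (n_contexts, dim) matrix holding one word's contextualized vectors."""
    X = np.asarray(occurrences, dtype=float)
    mean = X.mean(axis=0)
    if center:
        # Assumption: whether the original pipeline centers is unknown to me.
        X = X - mean
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc = vt[0]
    # The sign of a singular vector is arbitrary; orient it to agree with the
    # mean occurrence vector so that signs are consistent across words.
    if np.dot(pc, mean) < 0:
        pc = -pc
    return pc

# Synthetic check: occurrences scattered along a known direction.
rng = np.random.default_rng(0)
direction = np.array([3.0, 4.0, 0.0]) / 5.0
scales = rng.uniform(1.0, 2.0, size=50)
occurrences = scales[:, None] * direction + 0.01 * rng.normal(size=(50, 3))
embedding = first_pc_embedding(occurrences)
```

On this synthetic input the recovered embedding should point almost exactly along `direction`; a similar check on a handful of real words could help separate problems in the PCA step from problems elsewhere in the pipeline.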


Note that the other sections seem to replicate well! For example, here is the average cosine similarity (used for the anisotropy adjustment) in the paper and in my most recent run:

[Image: mean_cosine_similarity_across_words]

Here's self-similarity (though note my lower scores on GPT-2; my intuition is that this results from removing duplicate sentences, which otherwise make up about 20% of the input data):

[Image: self_similarity_above_expected]
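To illustrate the duplicate-sentence intuition, here is a small sketch that treats self-similarity as the average cosine similarity between a word's vectors across distinct contexts. The function name `self_similarity` is hypothetical, and the example assumes that a duplicated sentence yields an identical vector for the word, which may not hold exactly in every setup.

```python
import numpy as np

def self_similarity(vectors):
    """Hypothetical helper: mean cosine similarity over all pairs of distinct
    occurrences (i != j) of one word. Requires at least two occurrences."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    # Drop the diagonal (each occurrence compared with itself).
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))

# Two contexts with orthogonal vectors for the same word.
unique_contexts = [[1.0, 0.0], [0.0, 1.0]]
# Same data with the first context duplicated (assumed identical vector).
with_duplicate = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

score_unique = self_similarity(unique_contexts)
score_duplicated = self_similarity(with_duplicate)
```

Under these assumptions the duplicated pair contributes a similarity of 1, raising the average from 0 to 1/3, so removing duplicates would be expected to lower self-similarity.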

And here's intra-sentence similarity:

[Image: mean_cosine_similarity_between_sentence_and_words]
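For anyone comparing numbers, this is a minimal sketch of how I understand intra-sentence similarity: the mean cosine similarity between each word vector and a sentence vector. Taking the sentence vector to be the mean of the word vectors is an assumption here, and `intra_sentence_similarity` is a hypothetical name, not a function from the repo.

```python
import numpy as np

def intra_sentence_similarity(word_vectors):
    """Hypothetical helper: mean cosine similarity between each word vector
    and the sentence vector (assumed to be the mean of the word vectors)."""
    W = np.asarray(word_vectors, dtype=float)
    sentence = W.mean(axis=0)
    cosines = (W @ sentence) / (
        np.linalg.norm(W, axis=1) * np.linalg.norm(sentence)
    )
    return float(cosines.mean())

# Two orthogonal word vectors: each sits at 45 degrees to their mean.
orthogonal_words = intra_sentence_similarity([[1.0, 0.0], [0.0, 1.0]])
# Identical word vectors: every word coincides with the sentence vector.
identical_words = intra_sentence_similarity([[2.0, 1.0], [2.0, 1.0]])
```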
