This codebase contains a set of simple postprocessing transformations that improve the performance of word embeddings. Prior work has shown that mean subtraction and removal of the top principal components can enhance performance on lexical similarity tasks. We further demonstrate that, simply by performing these transformations only on a strategic subset of the vocabulary, we can consistently achieve even further gains (up to 20% overall) while using less compute and memory. Not only does this behavior offer insights into the linguistic properties of these word representations, but the gains are considerable and hold for both static word embeddings (word2vec and GloVe) and contextual word embeddings (BERT and GPT-2) across a broad range of lexical similarity tasks.
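The core transformation can be sketched as follows. This is an illustrative NumPy implementation, not the repo's actual code: it mean-centers a caller-supplied subset of the embedding rows, removes their top principal components (found via SVD), and leaves the rest of the vocabulary untouched. The function name `partial_postprocess`, the choice of which rows go in `subset_idx`, and the default `n_components=2` are all assumptions for the sake of the example.

```python
import numpy as np

def partial_postprocess(emb, subset_idx, n_components=2):
    """Apply mean subtraction and top-component removal only to the
    rows in `subset_idx`; all other rows are returned unchanged.

    Illustrative sketch only: the subset-selection strategy and the
    number of removed components are placeholders, not the repo's
    actual heuristics.
    """
    out = emb.copy()
    X = emb[subset_idx]                      # rows to transform
    mu = X.mean(axis=0)                      # mean of the subset only
    Xc = X - mu                              # mean subtraction
    # Top principal directions of the centered subset via SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_components]                  # (n_components, dim)
    # Project out the top components from the centered rows.
    out[subset_idx] = Xc - (Xc @ top.T) @ top
    return out
```

Because the mean and principal components are estimated from only the subset, the SVD runs on a smaller matrix, which is where the compute and memory savings come from.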
Repository: google/one-weird-trick
License: Apache License 2.0