Python code for training all models from the ICLR paper "Towards Universal Paraphrastic Sentence Embeddings". These models achieve strong performance on semantic similarity tasks without any training or tuning on those tasks' training data. They also produce features that are at least as discriminative as skip-thought vectors for semantic similarity tasks. Moreover, this code can achieve state-of-the-art results on entailment and sentiment tasks.
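For context, one of the paper's simplest models scores a sentence pair by averaging trained word embeddings and comparing the averages with cosine similarity. The sketch below illustrates that scoring scheme only; the `vectors` table and the 300-dimensional size are hypothetical stand-ins for the trained embeddings this repo produces.

```python
import numpy as np

def embed(sentence, vectors, dim=300):
    """Average the word vectors of a sentence; zeros if no word is in the table."""
    vecs = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def similarity(s1, s2, vectors):
    """Cosine similarity between two averaged sentence embeddings."""
    e1, e2 = embed(s1, vectors), embed(s2, vectors)
    denom = np.linalg.norm(e1) * np.linalg.norm(e2)
    return float(e1 @ e2 / denom) if denom else 0.0

# Toy embedding table for illustration; real use would load trained vectors.
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(300) for w in ["a", "dog", "ran", "sprinted"]}
print(similarity("a dog ran", "a dog sprinted", vectors))
```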
I'm trying to access the STS 2012 files (and those from other years, if possible), and I was wondering where I could download them in the correct format, or find the script that preprocessed them. Unless I'm mistaken, preprocess.java only handles the SICK task.
I do have the original STS 2012 files, but I wanted to preprocess them the same way as was done for https://github.com/PrincetonML/SIF (which mentions that the data was preprocessed here). A sketch of the kind of step I mean is below.
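To pin down what "the same preprocessing" means, here is a minimal sketch of the kind of normalization typically applied to STS sentence pairs (lowercasing plus splitting punctuation off tokens). This is only an illustration of the question, not the actual behavior of preprocess.java or the SIF pipeline, and the tab-separated file layout is an assumption.

```python
import re

def tokenize(sentence):
    # Hypothetical normalization: lowercase, then pad punctuation with spaces
    # so it splits into separate tokens. preprocess.java may differ.
    sentence = sentence.lower()
    sentence = re.sub(r"([.,!?;:()\"'])", r" \1 ", sentence)
    return sentence.split()

def preprocess_sts(path):
    # Assumes one tab-separated sentence pair per line, as in raw STS files.
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                s1, s2 = parts[0], parts[1]
                pairs.append((" ".join(tokenize(s1)), " ".join(tokenize(s2))))
    return pairs
```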
Does this code come with a license? If so, could you add a license file? If you have no strong preference and do intend to release it under an open-source license, may I suggest Apache 2.0?