Academic WOrd Embeddings, trained with gensim on AMiner's 2 billion publication data, and their applications.
- Python 3
- gensim
- spherecluster
- English Paper Keywords (EPK): Download
- Chinese Paper Keywords (CPK): Download
- Bilingual Transformation Matrix: Download
For details on these models, see docs/word2vec.md. (If you only want to use the models, you can skip it.)
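The bilingual transformation matrix (downloaded as W_en2zh.pkl) appears, from its name, to map English vectors into the Chinese embedding space. The sketch below shows how such a linear map is typically used for cross-lingual lookup. It assumes the matrix is applied as a row vector times W and that neighbours are ranked by cosine similarity; both conventions, the helper names, and the toy data are assumptions rather than this project's documented API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def transform(vec: Sequence[float], w: Matrix) -> Vector:
    """Map a row vector into the target space: result[j] = sum_i vec[i] * w[i][j].

    Assumption: the released matrix is applied on the right of a row vector.
    """
    if len(vec) != len(w):
        raise ValueError("vector length must equal the number of rows in W")
    cols = len(w[0])
    return [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(cols)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(query: Sequence[float], vocab: Dict[str, Vector], topn: int = 3) -> List[Tuple[str, float]]:
    """Rank target-language words by cosine similarity to the query vector."""
    scored = [(word, cosine(query, v)) for word, v in vocab.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 2-D example (hypothetical data, not taken from the released models).
W = [[0.0, 1.0],
     [1.0, 0.0]]  # this toy W simply swaps the two axes
en_vec = [1.0, 0.1]  # pretend English vector for "neural network"
zh_vocab = {"神经网络": [0.1, 1.0], "数据库": [1.0, 0.0]}
print(nearest(transform(en_vec, W), zh_vocab, topn=1))
```

With real models, the English vector would come from the EPK embeddings and the candidate vocabulary from the CPK embeddings.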
We have prepared a bash script for downloading the models, so you can fetch only what you need. For example, if you only need the Chinese models, run ./download.sh zh.
chmod +x download.sh
./download.sh zh
./download.sh en
wget https://lfs.aminer.cn/misc/awoe/W_en2zh.pkl -P tmp/
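Once tmp/W_en2zh.pkl is on disk, it can be read with Python's standard pickle module. This is a minimal sketch that assumes the file holds a single pickled 2-D matrix (if it is a NumPy array, NumPy must be installed to unpickle it); `load_matrix` is a hypothetical helper, not part of the project. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any


def load_matrix(path: str = "tmp/W_en2zh.pkl") -> Any:
    """Load a pickled matrix and check that it is two-dimensional.

    Assumption: the pickle contains one 2-D array-like object; see
    docs/word2vec.md for the actual format.
    """
    with Path(path).open("rb") as fh:
        matrix = pickle.load(fh)
    shape = getattr(matrix, "shape", None)  # NumPy arrays expose .shape
    if shape is None:  # fall back to nested Python lists
        shape = (len(matrix), len(matrix[0]))
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D matrix, got shape {shape}")
    return matrix
```

Usage: after running the wget command above, `W = load_matrix()` reads the matrix from the default tmp/ location.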
We provide some utilities for working with the above models, including tokenization, keyword extraction, and sentence-to-vector conversion. Here are some usage examples.
Before using these modules, download the required models.
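Until the utilities are documented, here is a rough illustration of what a sentence-to-vector step commonly does: tokenize the text and average the vectors of the tokens found in the vocabulary. This is a minimal sketch, not this project's implementation; the regex tokenizer, the plain averaging, and the names `tokenize` and `sentence_vector` are all assumptions. The regex only handles English; Chinese text would need a word segmenter instead.

```python
import re
from typing import List, Optional

# Lowercase alphanumeric tokens, keeping hyphenated words together.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> List[str]:
    """Lowercase English text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def sentence_vector(text: str, vectors) -> Optional[List[float]]:
    """Average the vectors of all tokens present in `vectors`.

    `vectors` is anything supporting `token in vectors` and `vectors[token]`,
    such as a dict of lists or gensim's KeyedVectors. Returns None when no
    token is in the vocabulary.
    """
    hits = [vectors[t] for t in tokenize(text) if t in vectors]
    if not hits:
        return None
    dim = len(hits[0])
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]


# Toy vocabulary (hypothetical values).
toy = {"graph": [1.0, 0.0], "mining": [0.0, 1.0]}
print(tokenize("Graph Mining, at scale!"))
print(sentence_vector("Graph Mining, at scale!", toy))
```

Averaging is the simplest choice; weighted schemes (for example, down-weighting frequent words) are common alternatives.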
Docs to complete. For now, you can run test.py.
Docs to complete.
If our work helps you in some way, please consider citing the following publication(s):
- Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’2008).