Comments (2)
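This is what I use:
import numpy as np

def weight(self, vocab_to_index):
    # Map each vocabulary word to its pretrained word vector
    size_vocab = len(vocab_to_index)  # vocabulary size
    embeddings = np.zeros((size_vocab, 300))  # zero-initialized array, 300 dimensions
    found = 0  # number of vocabulary words matched to a pretrained vector
    # Read the pretrained word vector file
    with open(r'..\datasets\sgns.weibo.char', 'r', encoding='utf-8') as f:
        for line in f:  # each line has the form: word, followed by its vector values
            fields = line.strip().split()
            # Skip any line that is not a word plus 300 values (e.g. a header line)
            if len(fields) != 300 + 1:
                continue
            word = fields[0]
            if word in vocab_to_index:
                found += 1
                word_idx = vocab_to_index[word]  # row index for this word
                # Convert the string fields to floats before storing them in that row
                embeddings[word_idx] = np.asarray(fields[1:], dtype=np.float64)
    print('Vectors found: {}, vocabulary size: {}, match rate: {:.2f}%'.format(
        found, size_vocab, found / size_vocab * 100))
    # Save the extracted embedding matrix
    np.savez_compressed(r'..\datasets\vec.npz', embeddings=embeddings)
    #return embeddings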
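To read the saved matrix back, something like the following should work. This is only a minimal sketch: `load_embeddings` and `missing_rows` are hypothetical helper names, the tiny 4x3 matrix stands in for the real 300-dimensional one, and the temporary file stands in for `vec.npz`. The one thing carried over from the code above is the `embeddings` key passed to `np.savez_compressed`.

```python
import os
import tempfile

import numpy as np


def load_embeddings(path):
    """Load the array stored under the 'embeddings' key of an .npz file."""
    with np.load(path) as data:
        return data["embeddings"]


def missing_rows(embeddings):
    """Return indices of all-zero rows, i.e. words that got no pretrained vector.

    Assumption: no real pretrained vector is exactly all zeros.
    """
    return np.flatnonzero(~embeddings.any(axis=1))


# Toy stand-in for the real matrix: rows 1 and 3 were never filled in.
matrix = np.zeros((4, 3))
matrix[0] = [0.1, 0.2, 0.3]
matrix[2] = [0.4, 0.5, 0.6]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vec.npz")  # stand-in for the real output path
    np.savez_compressed(path, embeddings=matrix)
    loaded = load_embeddings(path)

missing = missing_rows(loaded)
print("shape:", loaded.shape, "rows without a vector:", missing.tolist())
```

Rows that stay all zeros belong to vocabulary words that were not found in the pretrained file, so checking them is a quick way to see which words still need some other initialization.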
from chinese-word-vectors.