Comments (2)

stay-leave commented on June 8, 2024

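This is what I use:

    import numpy as np

    def weight(self, vocab_to_index):
        # Map each vocabulary word to its pretrained word vector
        size_vocab = len(vocab_to_index)  # vocabulary size
        embeddings = np.zeros((size_vocab, 300))  # zero-initialized array, 300 dimensions
        found = 0  # number of word vectors matched
        # Read the pretrained word-vector file
        with open(r'..\datasets\sgns.weibo.char', 'r', encoding='utf-8') as f:
            for line in f:  # each line has the form: word, then its vector
                line = line.strip().split()
                if len(line) != 300 + 1:  # keep only entries with exactly 300 dimensions
                    continue
                word = line[0]  # the word
                embedding = line[1:]  # the vector, still as strings
                if word in vocab_to_index:
                    found += 1
                    word_idx = vocab_to_index[word]  # row index for this word
                    # Convert explicitly to floats before storing the row
                    embeddings[word_idx] = np.asarray(embedding, dtype=float)
        print('Word vectors found: ' + str(found) + ', vocabulary size: ' + str(size_vocab)
              + ', match rate: {:.2f}%'.format(found / size_vocab * 100))
        # Save the extracted word-vector array
        np.savez_compressed(r'..\datasets\vec.npz', embeddings=embeddings)
        #return embeddings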
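
For anyone who wants to try the same matching logic without the full vector file, here is a minimal sketch on toy data. The helper name `load_pretrained`, its `dim` parameter, and the three-dimensional sample lines are assumptions made for illustration, not part of the code above. It also counts distinct matched words with a set, so a word that appears twice in the file would not inflate the match rate.

```python
import io
import numpy as np

def load_pretrained(lines, vocab_to_index, dim=300):
    """Hypothetical helper: build a (len(vocab), dim) matrix from word-vector text lines."""
    embeddings = np.zeros((len(vocab_to_index), dim), dtype=np.float32)
    matched = set()  # distinct vocabulary words found in the file
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) != dim + 1:  # skip lines that are not "word + dim values"
            continue
        word = parts[0]
        if word in vocab_to_index:
            embeddings[vocab_to_index[word]] = np.asarray(parts[1:], dtype=np.float32)
            matched.add(word)
    return embeddings, len(matched)

# Toy stand-in for the vector file: a two-field first line, then two 3-dimensional entries
sample = io.StringIO("2 3\n我 0.1 0.2 0.3\n你 0.4 0.5 0.6\n")
vocab = {"<pad>": 0, "我": 1, "他": 2}
emb, found = load_pretrained(sample, vocab, dim=3)
print(emb.shape, found)
```

Words that are in the vocabulary but not in the file (here `<pad>` and `他`) keep their all-zero rows, the same as in the original function.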

from chinese-word-vectors.

HunterHeidy commented on June 8, 2024
