Git Product home page Git Product logo

torchtext-summary's Introduction

torchtext的使用总结,并结合Pytorch实现LSTM

版本说明

  • Pyorch版本:0.4.1
  • torchtext:0.2.3
  • python:3.6

文件说明

  • Test-Dataset.ipynb Test-Dataset.py 使用torchtext进行文本预处理的notebook和py版。
  • Test-Dataset2.ipynb 使用Keras和PyTorch构建数据集进行文本预处理。

使用说明

  • 分别提供了notebook版和标准py文件版。
  • 从零开始逐步实现了torchtext文本预处理过程,包括截断补长,词表构建,使用预训练词向量,构建可用于pytorch的可迭代数据等。

    使用教程参考我的个人博客:http://www.nlpuser.com/pytorch/2018/10/30/useTorchText/

    代码中在数据集中使用预训练词向量部分已注释为markdown格式,如下所示,若要使用预训练的词向量,例如glove开源的预训练词向量,需要在当前目录下创建mycache文件夹作为cache目录,并指定预训练词向量文件所在位置。glove词向量下载可参考此链接:https://pan.baidu.com/s/1i5XmTA9

###  通过预训练的词向量来构建词表的方式示例,以glove.6B.300d词向量为例
cache = 'mycache'
if not os.path.exists(cache):
    os.mkdir(cache)
vectors = Vectors(name='/Users/wyw/Documents/vectors/glove/glove.6B.300d.txt', cache=cache)
# 指定 Vector 缺失值的初始化方式,没有命中的token的初始化方式
vectors.unk_init = init.xavier_uniform_ 
TEXT.build_vocab(train, min_freq=5, vectors=vectors)
# 查看词表元素
TEXT.vocab.vectors

torchtext-summary's People

Contributors

atnlp avatar

Watchers

James Cloos avatar Betterme avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.