
pos's Introduction

Segmentation and POS Model for Chinese

This is a TensorFlow implementation of Chinese word segmentation. It simply treats the task as a sequence-to-sequence problem.
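As a rough illustration of how segmentation can be cast as a sequence problem, the sketch below turns a POS-tagged, segmented sentence into one tag per character. The README does not state the tag scheme this project uses, so the B/M/E/S position prefixes, the `B-NN` style joint tags, and the helper name `to_char_tags` are all assumptions made for this example.

```python
def to_char_tags(tagged_words):
    """Convert [(word, pos), ...] into parallel lists of characters and tags.

    Assumed scheme (not confirmed by this project): each character gets a
    position marker (B = begin, M = middle, E = end, S = single-character
    word) joined to the word's POS tag, e.g. "B-NN".
    """
    chars, tags = [], []
    for word, pos in tagged_words:
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, position in zip(word, positions):
            chars.append(ch)
            tags.append(position + "-" + pos)
    return chars, tags


# Hypothetical example sentence with CTB-style POS tags.
sentence = [("我", "PN"), ("喜欢", "VV"), ("自然语言", "NN")]
chars, tags = to_char_tags(sentence)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

With this encoding the input is a sequence of characters and the output is a tag sequence of the same length, which a tagger can be trained to predict.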

Required Dependencies

  • Python 3.5
  • NumPy
  • TensorFlow 1.1.0-rc2

Data

This project is based on the CTB7.0 data set. Alternatively, you can supply your own data:

  • word embedding file (place it in the ./data/ directory)
  • word dict (place it in the ./src/dict/ directory)
  • tag dict (place it in the ./src/dict/ directory)

To initialize the data, edit the data path in make_data.sh and run:

./make_data.sh

Default parameters:

python3 src/reader_ctb7.0.py \
    ../data/ctb7.0/data/utf-8/postagged/ \
    ../data/ctb7.0_seg.train \
    ../data/ctb7.0_seg.dev \
    50 \
    0.05

Train

To train the model, run:

./train.sh

Default parameters:

python3 -u ./src/train.py --mode train \
    --train_dir ./data/ctb.train \
    --test_dir ./data/ctb.dev \
    --log_dir ./log/freeze-embedding-newdata/ \
    --c2v_path ./data/dnn_parser_vectors.char.20g \
    --embedding_size 650 --batch_size 10 --hidden_size 200 --train_steps 1000 --learning_rate 0.001 --max_sentence_len 50

Default hyperparameters for the model:

Argument            Default  Description
--max_sentence_len  418      maximum number of tokens per query
--embedding_size    650      embedding size
--batch_size        20       batch size
--learning_rate     0.001    learning rate
--train_steps       50000    number of training steps
--num_hidden        200      hidden unit size
--vocab_size        7500     vocabulary size
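To show how these defaults interact with values passed on the command line, here is a minimal sketch using Python's standard argparse module. It is not the project's own flag handling (train.py may define its flags differently); only the flag names and default values are taken from the table, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the documented default hyperparameters."""
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")
    parser.add_argument("--max_sentence_len", type=int, default=418,
                        help="maximum number of tokens per query")
    parser.add_argument("--embedding_size", type=int, default=650,
                        help="embedding size")
    parser.add_argument("--batch_size", type=int, default=20,
                        help="batch size")
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="learning rate")
    parser.add_argument("--train_steps", type=int, default=50000,
                        help="number of training steps")
    parser.add_argument("--num_hidden", type=int, default=200,
                        help="hidden unit size")
    parser.add_argument("--vocab_size", type=int, default=7500,
                        help="vocabulary size")
    return parser


# Flags given explicitly override the defaults; everything else keeps its default.
args = build_parser().parse_args(["--batch_size", "10", "--max_sentence_len", "50"])
print(args.batch_size, args.max_sentence_len, args.train_steps)
```

Any flag that is not passed explicitly keeps the default listed in the table.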

Default Data Params:

Argument          Default
train_dir         ./data/ctb.train
test_dir          ./data/ctb.dev
log_dir           ./log/default_log/
c2v_dir           ./data/dnn_parser_vectors.char.20g
dict_path         ./src/dict/
distinct_tag_num  152

Infer

An infer mode is also available. To enter infer mode, you must set log_path.

Default:

#!/bin/bash
python3 -u ./src/train.py --mode infer \
    --train_dir ./data/ctb.train \
    --test_dir ./data/ctb.dev \
    --log_dir ./log/freeze-embedding-newdata/ \
    --c2v_path ./data/dnn_parser_vectors.char.20g \
    --embedding_size 650 --batch_size 10 --hidden_size 200 --train_steps 1000 --learning_rate 0.001 --max_sentence_len 50
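The README does not describe the output format of infer mode. If the model emits one tag per character with a B/M/E/S position prefix (an assumption, as is the helper name `tags_to_words`), the predictions could be merged back into words roughly like this:

```python
def tags_to_words(chars, tags):
    """Group characters into words using assumed B/M/E/S position prefixes.

    Tags may carry a POS suffix such as "B-NN"; only the part before the
    first "-" is used here. Malformed sequences are handled leniently.
    """
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        position = tag.split("-")[0]
        # A new word starts on B or S; flush any unfinished word first.
        if position in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # A word ends on E or S.
        if position in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical predictions for a seven-character sentence.
pred_chars = list("我喜欢自然语言")
pred_tags = ["S-PN", "B-VV", "E-VV", "B-NN", "M-NN", "M-NN", "E-NN"]
words = tags_to_words(pred_chars, pred_tags)
print(" ".join(words))
```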

Contributors

  • shimozhen
