jcyk / dynet-biaffine-dependency-parser
Dynet-based Biaffine Parser
When I ran your code 5 days ago, it printed the information below. Nothing more since then; no models were created, but the program is still running on my computer. Do you know why? (Dataset: train set 4500 sentences, development set 1100 sentences.)
[dynet] random seed: 969247908
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
Loaded config file successfully.
pretrained_embeddings_file ../data/emb/vi.txt
data_dir ../data/treebank
train_file ../data/treebank/train.conllu
dev_file ../data/treebank/dev.conllu
test_file ../data/treebank/test.conllu
min_occur_count 2
save_dir ../ckpt/default
config_file ../ckpt/default/config.cfg
save_model_path ../ckpt/default/model
save_vocab_path ../ckpt/default/vocab
load_dir ../ckpt/default
load_model_path ../ckpt/default/model
load_vocab_path ../ckpt/default/vocab
lstm_layers 3
word_dims 100
tag_dims 100
dropout_emb 0.33
lstm_hiddens 400
dropout_lstm_input 0.33
dropout_lstm_hidden 0.33
mlp_arc_size 500
mlp_rel_size 100
dropout_mlp 0.33
learning_rate 2e-3
decay .75
decay_steps 5000
beta_1 .9
beta_2 .9
epsilon 1e-12
num_buckets_train 40
num_buckets_valid 10
num_buckets_test 10
train_iters 50000
train_batch_size 5000
test_batch_size 5000
validate_every 100
save_after 5000
#words in training set: 3544
Vocab info: #words 10936, #tags 28 #rels 33
(400, 600)
Orthogonal pretrainer loss: 5.20e-27
(400, 600)
Orthogonal pretrainer loss: 7.02e-27
(400, 1200)
Orthogonal pretrainer loss: 2.79e-30
(400, 1200)
Orthogonal pretrainer loss: 2.77e-30
(400, 1200)
Orthogonal pretrainer loss: 2.82e-30
(400, 1200)
Orthogonal pretrainer loss: 2.93e-30
(600, 800)
Orthogonal pretrainer loss: 3.90e-23
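(For context, the "Orthogonal pretrainer loss" lines above report how far each initialized weight matrix is from having orthonormal rows. A minimal NumPy sketch of one plausible way to measure this, assuming the loss is the squared Frobenius deviation of W Wᵀ from the identity; the function name is mine, not the repo's:)

```python
import numpy as np

def orthonormal_loss(W):
    # For a rows x cols matrix (rows <= cols), measure how far W W^T is
    # from the identity -- the quantity the pretrainer drives toward zero.
    rows = W.shape[0]
    return np.sum((W @ W.T - np.eye(rows)) ** 2)

# A matrix with exactly orthonormal rows gives a loss of (numerically) zero:
q, _ = np.linalg.qr(np.random.randn(600, 400))  # q has orthonormal columns
print(orthonormal_loss(q.T))  # tiny, e.g. ~1e-27, like the log above
```

The tiny values in the log (1e-23 to 1e-30) are consistent with matrices that are orthonormal up to floating-point error.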
Hi @jcyk ,
Thanks for your nice implementation; it is much cleaner than the TF code and even produces higher scores.
On PTB-SD, better scores were reported:
This repo: LAS 95.01, UAS 96.05
TF repo: LAS 94.81, UAS 95.89
I guess the reason is that you halved the batch size.
Is my understanding correct?
Thanks.
Hello, where is your training data from? CoNLL-2012?
I read the code for the decoding process of the parser, mainly arc_argmax in lib/utils.py and tarjan.py, but I think it only guarantees that:
However, there is no guarantee that the result is connected. I have a piece of code for testing:
import numpy as np
import dynet as dy
from lib import arc_argmax

m = np.random.randn(10, 10)
mt = dy.inputTensor(m)
mtp = dy.softmax(mt, d=1)
probs = mtp.npvalue()
mask = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])  # 7 valid words; 8 positions including <ROOT>
sent_len = int(np.sum(mask))  # 8
heads = arc_argmax(probs, sent_len, mask)
dependents = range(1, sent_len)
for d, h in zip(dependents, heads[1:sent_len]):
    print('{0} --> {1}'.format(d, h))
Though I use a randomly initialized m as the logits, trying multiple times will produce results like this:
1 --> 3
2 --> 5
3 --> 2
4 --> 1
5 --> 0
6 --> 5
7 --> 3
The graph is not connected. What's the problem? Did I do something wrong? Thank you!
Sorry, I made a mistake about the offset ( is included).
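(For reference, whether a predicted heads array forms a single tree reaching <ROOT> can be checked by walking each dependent up its head pointers. A small sketch; the function name and heads-array convention, heads[i] = head of token i with 0 as <ROOT>, are my assumptions:)

```python
def is_single_rooted_tree(heads):
    # heads[0] is a placeholder for <ROOT>; every other token must reach
    # position 0 by following head pointers without entering a cycle.
    n = len(heads)
    for start in range(1, n):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # cycle: this token never reaches <ROOT>
            seen.add(node)
            node = heads[node]
    return True

print(is_single_rooted_tree([0, 2, 0, 2]))              # True: 1->2->0, 3->2->0
print(is_single_rooted_tree([0, 3, 5, 2, 1, 0, 5, 3]))  # the output above: True
print(is_single_rooted_tree([0, 2, 1]))                 # cycle 1<->2: False
```

Note that the arcs printed above (1-->3, 3-->2, 2-->5, 5-->0, ...) do in fact all reach <ROOT>, consistent with the offset mistake acknowledged here.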
Thanks for this nice reimplementation of the Biaffine parser.
One question: where is the module 'ConfigParser'? It is required at line 1 of config.py:
from ConfigParser import SafeConfigParser
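(ConfigParser is the Python 2 name of the module; in Python 3 it was renamed to configparser, and SafeConfigParser was folded into ConfigParser. One way to make that import work under both, sketched below; the config section and key are made up for illustration:)

```python
# Compatibility shim: Python 2 exposes ConfigParser.SafeConfigParser,
# Python 3 renames the module to configparser.
try:
    from ConfigParser import SafeConfigParser          # Python 2
except ImportError:
    from configparser import ConfigParser as SafeConfigParser  # Python 3

parser = SafeConfigParser()
parser.read_string(u"[Run]\nlearning_rate = 2e-3\n")  # hypothetical config
print(parser.get('Run', 'learning_rate'))  # -> 2e-3
```

Alternatively, running the code under Python 2 (which DyNet-era code often targets) avoids the issue entirely.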
Could you offer the LAS results on PTB and CTB? Thanks.
We drop 33% of words and 33% of tags during training: when one is dropped the other is scaled by a factor of two to compensate, and when both are dropped together, the model simply gets an input of zeros.
I just modified it, and it seems that the following is better:
scale = 2. / (word_mask + tag_mask + 1e-12)
Am I right?
Good job! I am curious about the time needed for training; it looks a little slow.
Orthogonal pretrainer loss: 2.88e-30
(400, 1200)
Orthogonal pretrainer loss: 2.82e-30
(400, 1200)
Orthogonal pretrainer loss: 2.78e-30
(600, 800)
Orthogonal pretrainer loss: 3.78e-23
2018-05-25 19:04:29 Start training epoch #0
Step #79: Acc: arc 0.59, rel 0.81, overall 0.51, loss 0.980
In train.py, I found that best_UAS is always 0 because best_UAS is not updated even when UAS > best_UAS.
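(A minimal sketch of the bookkeeping this issue describes; the variable names follow the issue, but the loop and scores are hypothetical. The point is that best_UAS must be reassigned inside the comparison, otherwise it stays at 0:)

```python
# Hypothetical validation loop: without the reassignment inside the
# if-branch, best_UAS would remain 0.0 forever.
best_UAS = 0.0
for UAS in [0.90, 0.88, 0.93]:  # dev scores from successive validations
    if UAS > best_UAS:
        best_UAS = UAS  # the update that must not be missing
        # (this is also where the model checkpoint would be saved)
print(best_UAS)  # -> 0.93
```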