jiesutd / LatticeLSTM
Chinese NER using Lattice LSTM. Code for the ACL 2018 paper.
Hello, why is batch_size fixed to 1? I haven't read your code carefully yet; could you explain in advance?
Sorry, I am new to PyTorch, but in the class WordLSTMCell I found that
f, i, g = torch.split(wh_b + wi, split_size=self.hidden_size, dim=1)
In the formula in your paper, wh_b and wi are not added together, so did I misunderstand your code?
def forward(self, input_, hx):
    """
    Args:
        input_: A (batch, input_size) tensor containing input
            features.
        hx: A tuple (h_0, c_0), which contains the initial hidden
            and cell state, where the size of both states is
            (batch, hidden_size).

    Returns:
        c_1: Tensor containing the next cell state.
    """
    h_0, c_0 = hx
    batch_size = h_0.size(0)
    bias_batch = self.bias.unsqueeze(0).expand(batch_size, *self.bias.size())
    wh_b = torch.addmm(bias_batch, h_0, self.weight_hh)
    wi = torch.mm(input_, self.weight_ih)
    f, i, g = torch.split(wh_b + wi, split_size=self.hidden_size, dim=1)
    c_1 = torch.sigmoid(f) * c_0 + torch.sigmoid(i) * torch.tanh(g)
    return c_1
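The addition is not a discrepancy with the paper: the cell stores the per-gate weight matrices stacked into one, so `wh_b + wi` computes all three gate pre-activations at once and `torch.split` recovers them. A minimal sketch (with made-up sizes, not the repo's actual parameters) showing the fused and per-gate forms agree:

```python
import torch

torch.manual_seed(0)
batch, input_size, hidden = 2, 4, 3

x = torch.randn(batch, input_size)            # input_ in the cell
h = torch.randn(batch, hidden)                # h_0 in the cell
W_ih = torch.randn(input_size, 3 * hidden)    # fused weight_ih (f, i, g stacked)
W_hh = torch.randn(hidden, 3 * hidden)        # fused weight_hh
b = torch.randn(3 * hidden)

# Fused form, as in the code: one matmul per source, add, then split.
f, i, g = torch.split(torch.mm(x, W_ih) + torch.mm(h, W_hh) + b, hidden, dim=1)

# Per-gate form, as written in the paper: each gate has its own weight slice.
f2 = torch.mm(x, W_ih[:, :hidden]) + torch.mm(h, W_hh[:, :hidden]) + b[:hidden]

assert torch.allclose(f, f2)  # identical: the sum is just W[x; h] + b done blockwise
```

So the addition is simply the blockwise evaluation of the concatenated linear map from the paper.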
python main.py --status train \
--train ./Weibo/weiboNER_2nd_conll.train.bio \
--dev ./Weibo/weiboNER_2nd_conll.dev.bio \
--test ./Weibo/weiboNER_2nd_conll.test.bio \
--savemodel ./Weibo/model \
Train file: ./Weibo/weiboNER_2nd_conll.train.bio
Dev file: ./Weibo/weiboNER_2nd_conll.dev.bio
Test file: ./Weibo/weiboNER_2nd_conll.test.bio
Raw file: None
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Model saved to: ./Weibo/model
Load gaz file: data/ctb.50d.vec total size: 704368
gaz alphabet size: 10798
gaz alphabet size: 12235
gaz alphabet size: 13671
build word pretrain emb...
Embedding:
pretrain word:11327, prefect match:3281, case_match:0, oov:75, oov%:0.0223413762288
build biword pretrain emb...
Embedding:
pretrain word:0, prefect match:0, case_match:0, oov:42646, oov%:0.999976551692
build gaz pretrain emb...
Embedding:
pretrain word:704368, prefect match:13669, case_match:0, oov:1, oov%:7.31475385853e-05
Training model...
DATA SUMMARY START:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Use bigram: False
Word alphabet size: 3357
Biword alphabet size: 42647
Char alphabet size: 3357
Gaz alphabet size: 13671
Label alphabet size: 18
Word embedding size: 50
Biword embedding size: 50
Char embedding size: 30
Gaz embedding size: 50
Norm word emb: True
Norm biword emb: True
Norm gaz emb: False
Norm gaz dropout: 0.5
Train instance number: 1350
Dev instance number: 270
Test instance number: 270
Raw instance number: 0
Hyperpara iteration: 100
Hyperpara batch size: 1
Hyperpara lr: 0.015
Hyperpara lr_decay: 0.05
Hyperpara HP_clip: 5.0
Hyperpara momentum: 0
Hyperpara hidden_dim: 200
Hyperpara dropout: 0.5
Hyperpara lstm_layer: 1
Hyperpara bilstm: True
Hyperpara GPU: True
Hyperpara use_gaz: True
Hyperpara fix gaz emb: False
Hyperpara use_char: False
DATA SUMMARY END.
Data setting saved to file: ./Weibo/model.dset
@jiesutd May I ask where the problem might be? Thanks!
I read in the paper that you set weight decay in the optimizer, but I didn't see that term in the optimizer initialization in main.py. Have I missed something, or did you really not set regularization in the code? Thanks.
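For reference, PyTorch's SGD exposes L2 regularization through its `weight_decay` argument; a minimal sketch, where the stand-in model and the 1e-8 value are placeholders rather than the repo's actual setting:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 5)  # stand-in for the LatticeLSTM model
# weight_decay adds an L2 penalty to every parameter update;
# 1e-8 here is illustrative, not necessarily the paper's value.
optimizer = optim.SGD(model.parameters(), lr=0.015, momentum=0, weight_decay=1e-8)
```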
(1) We can pass three kinds of argument to main.py: "train", "test" and "decode". In the "train" step you use the dev set to choose the best model and save it, and it seems you also print the model's performance on the test data at each iteration. Am I right? When status="test", you again evaluate on the dev and test data, but they were already used during training. Is that OK?
(2) In main.py you mention "raw" data when the status argument is "decode". Where do we get the "raw" data?
Are the dev and test sets simply split from chtb 0001-0325 and chtb 1001-1078 by odd/even file numbers, with the remainder used for training?
Hi. I'm now dealing with some unannotated clinical data, and I wonder how you manually annotated the resume data in your experiment. Did you use any tricks or ML-based annotation? Thanks XD.
Hello, are there any tricks for pretraining the embeddings? I found that character embeddings I trained myself with word2vec do not work well for NER.
Hi!
When using the ResumeNER data, we found the label column uses "B-", "M-", "E-" and "O", but around line 76 of LatticeLSTM/utils/metric.py the label lookup has no M. So: 1. Is "M-" required for the BMES tagging scheme, and is it missing from the code? 2. Where the gold label is "O" but the code predicts "S-", does this affect the final result?
I reproduced the character-based baseline from the paper in TensorFlow, using the same experimental configuration and embeddings, but only got an F1 of about 61 on the OntoNotes 4 test set (the Weibo corpus is somewhat higher). What changes would be needed to run the character-based baseline with your code?
Hi Jie, thanks for your work on Chinese NER. I downloaded the code and embedding files and ran the demo, but training took a long time and the performance is not good: the F-value is about 0.4 after 50 epochs. I don't know where I went wrong.
Environment: python2.7, pytorch0.3.0, gpu1080
Thanks!
gpu=True, but the GPU is still not used. What needs to change? Is appending cuda() to the model enough, or is there more to modify?
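Calling `.cuda()` on the model moves only its parameters; every input tensor built during batching has to be moved as well. A minimal sketch with a stand-in linear layer (not the repo's actual classes):

```python
import torch
import torch.nn as nn

gpu = torch.cuda.is_available()
model = nn.Linear(10, 5)        # stand-in for the sequence model
inputs = torch.randn(1, 10)

if gpu:
    model = model.cuda()        # moves parameters only
    inputs = inputs.cuda()      # inputs (and labels, masks) must be moved too

out = model(inputs)             # runs on GPU only if both sides were moved
```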
Excuse me, I have some trouble training your model on MSRA dataset with a GTX 1080Ti card. I've found the speed of training is quite slow. So, may I know your solution to this problem? (Note: The video memory almost runs out, but there is still much unused computing power left.)
Running the demo (run_demo.sh) seems to require more data in the "onto4ner.cn" directory. Where can I get demo.train.char, demo.dev.char and demo.test.char?
Hello, I now want to train on my own corpus. Must the tag set be changed to BIOES, or is BIO also fine? And where do I change the tag set?
Thanks.
As the title shows, I upgraded the code. If you want to use the code on PyTorch 0.4.1, please refer to this:
new_version
In character-based NER, which one is used as the pretrained embedding: gigaword_chn.all.a2b.uni.ite50.vec or joint4.all.b10c1.2h.iter17.mchar?
When we use decode for sequence labeling, we found about ten places in raw.out where characters are missing, with neither character nor label. These gaps are not contiguous and are not aligned to sentences; the number of missing characters differs per gap, up to nearly 200 characters in one place. The saved_model we used corresponds to an intermediate "new score" checkpoint, since training has not finished yet. Questions: 1. Must we use the model saved at the highest score (the last saved model) to get correct results? 2. With these missing spans, are the p/r/f metrics produced by the code still correct? 3. What causes the missing characters?
CuDNN: True
GPU available: False
Status: decode
Seg: True
Train file: data/conll03/train.bmes
Dev file: data/conll03/dev.bmes
Test file: data/conll03/test.bmes
Raw file: ./rd_data/test/test.txt
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Data setting loaded from file: ./rd_data/test/test.dset
DATA SUMMARY START:
Tag scheme: BMES
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: False
Use bigram: False
Word alphabet size: 2596
Biword alphabet size: 31940
Char alphabet size: 2596
Gaz alphabet size: 13634
Label alphabet size: 18
Word embedding size: 50
Biword embedding size: 50
Char embedding size: 30
Gaz embedding size: 50
Norm word emb: True
Norm biword emb: True
Norm gaz emb: False
Norm gaz dropout: 0.5
Train instance number: 0
Dev instance number: 0
Test instance number: 0
Raw instance number: 0
Hyperpara iteration: 100
Hyperpara batch size: 1
Hyperpara lr: 0.015
Hyperpara lr_decay: 0.05
Hyperpara HP_clip: 5.0
Hyperpara momentum: 0
Hyperpara hidden_dim: 200
Hyperpara dropout: 0.5
Hyperpara lstm_layer: 1
Hyperpara bilstm: True
Hyperpara GPU: False
Hyperpara use_gaz: True
Hyperpara fix gaz emb: False
Hyperpara use_char: False
DATA SUMMARY END.
Load Model from file: ./rd_data/test/demo_test.6.model
build batched lstmcrf...
build batched bilstm...
build LatticeLSTM... forward , Fix emb: False gaz drop: 0.5
load pretrain word emb... (13634, 50)
build LatticeLSTM... backward , Fix emb: False gaz drop: 0.5
load pretrain word emb... (13634, 50)
build batched crf...
Traceback (most recent call last):
File "main_test.py", line 454, in
decode_results = load_model_decode(model_dir, data, 'raw', gpu, seg)
File "main_test.py", line 348, in load_model_decode
model.load_state_dict(torch.load(model_dir))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 487, in load_state_dict
.format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named lstm.word_embeddings.weight, whose dimensions in the model are torch.Size([2596, 50]) and whose dimensions in the checkpoint are torch.Size([2527, 50]).
Only the main file and run_demo.sh have been modified.
In the main Python script, bichar_emb is set to None. What is this embedding?
Hello:
The MSRA dataset I downloaded from GitHub uses BIO-format labels, but you seem to use BMES-format labels. Could you share yours via Baidu Cloud or some other way?
I tried to train a model on the MSRA data using an NVIDIA 1080 Ti, and it takes about 120 seconds per 500 sentences. That is acceptable on a small dataset, but if the dataset is larger, for instance 5 times bigger than MSRA, the training time becomes too long.
Is there any way to speed up training?
CuDNN: True
GPU available: False
Status: train
Seg: True
Train file: ./rd_data/train.txt
Dev file: ./rd_data/dev.txt
Test file: ./rd_data/test.txt
Raw file: None
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Model saved to: ./rd_data/demo_test
Load gaz file: data/ctb.50d.vec total size: 704368
gaz alphabet size: 31572
gaz alphabet size: 33642
gaz alphabet size: 35512
build word pretrain emb...
Embedding:
pretrain word:11327, prefect match:2497, case_match:0, oov:29, oov%:0.0114760585675
build biword pretrain emb...
Embedding:
pretrain word:0, prefect match:0, case_match:0, oov:91271, oov%:0.999989043737
build gaz pretrain emb...
Embedding:
pretrain word:704368, prefect match:35510, case_match:0, oov:1, oov%:2.81594953818e-05
Training model...
DATA SUMMARY START:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: False
Use bigram: False
Word alphabet size: 2527
Biword alphabet size: 91272
Char alphabet size: 2527
Gaz alphabet size: 35512
Label alphabet size: 5
Word embedding size: 50
Biword embedding size: 50
Char embedding size: 30
Gaz embedding size: 50
Norm word emb: True
Norm biword emb: True
Norm gaz emb: False
Norm gaz dropout: 0.5
Train instance number: 28185
Dev instance number: 5885
Test instance number: 5977
Raw instance number: 0
Hyperpara iteration: 100
Hyperpara batch size: 1
Hyperpara lr: 0.015
Hyperpara lr_decay: 0.05
Hyperpara HP_clip: 5.0
Hyperpara momentum: 0
Hyperpara hidden_dim: 200
Hyperpara dropout: 0.5
Hyperpara lstm_layer: 1
Hyperpara bilstm: True
Hyperpara GPU: False
Hyperpara use_gaz: True
Hyperpara fix gaz emb: False
Hyperpara use_char: False
DATA SUMMARY END.
Traceback (most recent call last):
File "main_test.py", line 444, in
train(data, save_model_dir, seg)
File "main_test.py", line 240, in train
save_data_setting(data, save_data_name)
File "main_test.py", line 90, in save_data_setting
new_data = copy.deepcopy(data)
File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.7/copy.py", line 298, in _deepcopy_inst
state = deepcopy(state, memo)
File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/usr/lib/python2.7/copy.py", line 192, in deepcopy
memo[d] = y
MemoryError
Hi, can you share the pretrained biword embedding with me?
Do we need to download all the files in the two embedding folders, or just one file from each?
Thanks.
I want to use this model for my project. Thanks.
Hello, I am trying to reproduce your work on OntoNotes 4. Could you please provide some code or scripts for preprocessing that dataset? I mean, to split it into train/dev/test sets, and to transform the original OntoNotes format into CoNLL format (BMES).
I have downloaded OntoNotes 4 from LDC using my license, and tried to split the dataset according to the paper Named Entity Recognition with Bilingual Constraints, as mentioned in your ACL18 paper. However, some statistics are not consistent with the results shown in your paper. It would help a lot if you could provide the preprocessing code. Thanks!
Where can I get the Weibo and MSRA data?
I trained on my own annotated corpus and saved the model to disk. At test time, reloading the model and running it raises an error.
The error reports mismatched dimensions.
The error message is as follows:
build batched crf...
Traceback (most recent call last):
File "main.py", line 442, in <module>
load_model_decode(model_dir, data, 'test', gpu, seg)
File "main.py", line 348, in load_model_decode
model.load_state_dict(torch.load(model_dir),strict=False)
File "/root/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 487, in load_state_dict
.format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named lstm.hidden2tag.weight, whose dimensions in the model are torch.Size([9, 200]) and whose dimensions in the checkpoint are torch.Size([7, 200]).
But the network structure was not changed between training and testing, so why is there a dimension mismatch? Or is there some other reason?
Please tell me why.
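A quick way to locate the disagreement is to compare the checkpoint's parameter shapes against the freshly built model's. A mismatch in hidden2tag like the one above usually means the label alphabet was rebuilt with a different size at test time (e.g. the saved .dset was not loaded). A sketch with a stand-in linear layer reproducing the 9-vs-7 label mismatch:

```python
import torch
import torch.nn as nn

model = nn.Linear(200, 9)                # stand-in: hidden2tag built with 9 labels
ckpt = {"weight": torch.zeros(7, 200),   # pretend checkpoint trained with 7 labels
        "bias": torch.zeros(7)}

# Report every parameter whose saved shape disagrees with the current model.
mismatched = [name for name, p in model.state_dict().items()
              if name in ckpt and ckpt[name].size() != p.size()]
print(mismatched)
```

Each reported name points at a layer whose size depends on an alphabet that differed between the training and testing runs.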
In my experiment, the char embeddings and word embeddings are gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec respectively, while bichar_emb is set to None. Other parameters take the default values in the code. Currently, after 80 epochs on an NVIDIA 1080 Ti GPU, the test result on the MSRA test set has not reached the result in the paper; the best result is acc: 0.9891, p: 0.9331, r: 0.9093, f: 0.9210. Where did I go wrong?
In addition, if char embeddings trained on Chinese Wikipedia (bigger than Gigaword; the embeddings contain 16115 words, 100 dimensions) are used instead of gigaword_chn.all.a2b.uni.ite50.vec (11327 words, 50 dimensions), the difference in test results between Bi-LSTM+CRF based on char + softword and LatticeLSTM (also using the same Chinese Wikipedia char embeddings) is small. Is the big difference in the paper due to the use of a weaker char embedding?
Epoch: 0/100
Learning rate is setted as: 0.015
Traceback (most recent call last):
File "main.py", line 436, in
train(data, save_model_dir, seg)
File "main.py", line 281, in train
loss, tag_seq = model.neg_log_likelihood_loss(gaz_list, batch_word, batch_biword, batch_wordlen, batch_char, batch_charlen, batch_charrecover, batch_label, mask)
File "/root/receiveData/LatticeLSTM/model/bilstmcrf.py", line 32, in neg_log_likelihood_loss
scores, tag_seq = self.crf._viterbi_decode(outs, mask)
File "/root/receiveData/LatticeLSTM/model/crf.py", line 159, in _viterbi_decode
partition_history = torch.cat(partition_history,0).view(seq_len, batch_size,-1).transpose(1,0).contiguous() ## (batch_size, seq_len. tag_size)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 2 and 3 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:102
In the readme, you mention that the pretrained character and word embeddings are the same as the embeddings in the baseline of RichWordSegmentor, i.e., gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec respectively. This does not seem to be mentioned in the paper. Were the LatticeLSTM results in the paper obtained with these two embeddings?
In the paper, you mention that the word embeddings were pretrained using word2vec (Mikolov et al., 2013) over automatically segmented Chinese Giga-Word. Is that word embedding only used in the baseline methods?
1. This kind of boundary information should also help word segmentation; have you tried that?
2. Is the segmentation used to build the lattice in the paper produced by an off-the-shelf segmenter, or by unsupervised segmentation to build the vocabulary?
Thanks.
As the title says. Thanks.
Hi,
I see:
It achieves 93.18% F1-value on MSRA dataset, which is the state-of-the-art result on Chinese NER task.
But when I search on Google, I cannot find this MSRA dataset. Where can I find it?
Hello, I tried to use a trained model to run named entity recognition on some external text.
I found that the input cannot be a plain character sequence; it must be formatted as one "char label" pair per line, with a blank line between sentences. (Is there a limit on sentence length?)
I then found that the result recognizes almost none of the entities in my sentences. The training set I used is the ResumeNER dataset; I wonder whether the poor result is related to that.
Command:
python main.py --status decode --raw ./data/bioes2.txt --savedset ./data/saved_model.dset --loadmodel ./data/saved_models/saved_model.35.model --output ./data/res.out
bioes2.txt is the text to run NER on.
res.out is the NER result.
The contents of bioes2.txt are as follows:
东 B-LOC
光 E-LOC
铁 E-LOC
佛 E-LOC
寺 E-LOC
位 O
于 O
沧 B-LOC
州 E-LOC
市 E-LOC
东 B-LOC
光 E-LOC
县 E-LOC
县 O
城 O
内 O
, O
是 O
沧 B-LOC
州 E-LOC
最 O
著 O
名 O
的 O
佛 B-LOC
教 E-LOC
寺 E-LOC
院 O
, O
已 O
有 O
千 O
年 O
历 O
史 O
, O
在 O
沧 B-LOC
州 E-LOC
当 O
地 O
自 O
古 O
就 O
有 O
“ O
沧 B-LOC
州 E-LOC
狮 B-LOC
子 E-LOC
景 E-LOC
州 E-LOC
塔 E-LOC
, O
东 B-LOC
光 E-LOC
县 E-LOC
的 O
铁 B-LOC
菩 E-LOC
萨 E-LOC
” O
的 O
说 O
法 O
, O
很 O
多 O
当 O
地 O
人 O
拜 O
佛 O
祈 O
福 O
都 O
会 O
选 O
择 O
这 O
里 O
。 O
铁 B-LOC
佛 E-LOC
寺 E-LOC
始 O
建 O
于 O
宋 O
代 O
, O
后 O
曾 O
经 O
被 O
毁 O
, O
寺 O
内 O
的 O
古 O
迹 O
和 O
古 O
铁 O
佛 O
早 O
已 O
不 O
存 O
, O
如 O
今 O
的 O
铁 B-LOC
佛 E-LOC
寺 E-LOC
是 O
九 O
十 O
年 O
代 O
时 O
重 O
新 O
修 O
建 O
的 O
。 O
但 O
修 O
建 O
后 O
的 O
寺 O
院 O
庄 O
严 O
大 O
气 O
, O
而 O
且 O
修 O
建 O
时 O
也 O
产 O
生 O
了 O
很 O
多 O
神 O
话 O
传 O
说 O
, O
使 O
得 O
如 O
今 O
的 O
铁 B-LOC
佛 E-LOC
寺 E-LOC
依 O
然 O
香 O
火 O
旺 O
盛 O
。 O
在 O
铁 B-LOC
佛 E-LOC
寺 E-LOC
内 O
游 O
玩 O
时 O
, O
可 O
以 O
着 O
重 O
观 O
看 O
寺 O
内 O
的 O
巨 O
大 O
铁 O
佛 O
, O
铁 O
佛 O
高 O
约 O
8 O
米 O
多 O
, O
非 O
常 O
壮 O
观 O
。 O
寺 O
内 O
另 O
有 O
多 O
座 O
佛 B-LOC
殿 E-LOC
, O
都 O
可 O
以 O
一 O
一 O
参 O
观 O
。 O
另 O
外 O
还 O
有 O
京 B-LOC
剧 O
名 O
旦 O
荀 O
慧 O
生 O
的 O
纪 B-LOC
念 E-LOC
馆 E-LOC
, O
可 O
以 O
进 O
入 O
了 O
解 O
一 O
下 O
。 O
在 O
寺 O
内 O
上 O
香 O
、 O
磕 O
头 O
时 O
, O
一 O
般 O
会 O
被 O
要 O
求 O
给 O
一 O
点 O
香 O
火 O
钱 O
, O
每 O
人 O
1 O
0 O
The contents of res.out are as follows:
东 O
光 O
铁 O
佛 O
寺 O
位 O
于 O
沧 O
州 O
市 O
东 O
光 O
县 O
县 O
城 O
内 O
, O
是 O
沧 O
州 O
最 O
著 O
名 O
的 O
佛 B-ORG
教 M-ORG
寺 M-ORG
院 E-ORG
, O
已 O
有 O
千 O
年 O
历 O
史 O
, O
在 O
沧 O
州 O
当 O
地 O
自 O
古 O
就 O
有 O
“ O
沧 O
州 O
狮 O
子 O
景 O
州 O
塔 O
, O
东 O
光 O
县 O
的 O
铁 O
菩 O
萨 O
” O
的 O
说 O
法 O
, O
很 O
多 O
当 O
地 O
人 O
拜 O
佛 O
祈 O
福 O
都 O
会 O
选 O
择 O
这 O
里 O
。 O
铁 O
佛 O
寺 O
始 O
建 O
于 O
宋 O
代 O
, O
后 O
曾 O
经 O
被 O
毁 O
, O
寺 O
内 O
的 O
古 O
迹 O
和 O
古 O
铁 O
佛 O
早 O
已 O
不 O
存 O
, O
如 O
今 O
的 O
铁 O
佛 O
寺 O
是 O
九 O
十 O
年 O
代 O
时 O
重 O
新 O
修 O
建 O
的 O
。 O
但 O
修 O
建 O
后 O
的 O
寺 O
院 O
庄 O
严 O
大 O
气 O
, O
而 O
且 O
修 O
建 O
时 O
也 O
产 O
生 O
了 O
很 O
多 O
神 O
话 O
传 O
说 O
, O
使 O
得 O
如 O
今 O
的 O
铁 O
佛 O
寺 O
依 O
然 O
香 O
火 O
旺 O
盛 O
。 O
在 O
铁 O
佛 O
寺 O
内 O
游 O
玩 O
时 O
, O
可 O
以 O
着 O
重 O
观 O
看 O
寺 O
内 O
的 O
巨 O
大 O
铁 O
佛 O
, O
铁 O
佛 O
高 O
约 O
0 O
米 O
多 O
, O
非 O
常 O
壮 O
观 O
。 O
寺 O
内 O
另 O
有 O
多 O
座 O
佛 O
殿 O
, O
都 O
可 O
以 O
一 O
一 O
参 O
观 O
。 O
另 O
外 O
还 O
有 O
京 O
剧 O
名 O
旦 O
荀 O
慧 O
生 O
的 O
纪 O
念 O
馆 O
, O
可 O
以 O
进 O
入 O
了 O
解 O
一 O
下 O
。 O
在 O
寺 O
内 O
上 O
香 O
、 O
磕 O
头 O
时 O
, O
一 O
般 O
会 O
被 O
要 O
求 O
给 O
一 O
点 O
香 O
火 O
钱 O
, O
每 O
人 O
0 O
0 O
" O
沧 O
州 O
民 O
谣 O
: O
“ O
一 O
文 O
一 O
武 O
, O
一 O
国 O
宝 O
, O
一 O
人 O
祖 O
。 O
” O
文 O
者 O
, O
是 O
一 O
代 O
文 O
宗 O
纪 O
晓 O
岚 O
, O
武 B-TITLE
者 E-TITLE
, O
是 O
沧 O
州 O
乃 O
驰 O
名 O
中 O
外 O
的 O
武 O
术 O
之 O
乡 O
, O
国 O
宝 O
指 O
沧 O
州 O
铁 O
狮 O
, O
人 O
祖 O
即 O
盘 O
古 O
, O
盘 O
古 O
遗 O
址 O
就 O
在 O
今 O
沧 O
州 O
市 O
所 O
属 O
的 O
青 O
县 O
境 O
内 O
。 O
青 O
县 O
城 O
南 O
0 O
公 O
里 O
有 O
村 O
曰 O
“ O
大 O
盘 O
古 O
” O
, O
村 O
西 O
有 O
座 O
盘 O
古 O
庙 O
。 O
Hello, I read your paper and found it great. I'm a beginner and have a few questions.
Following your code, I reproduced the character-based Weibo and MSRA experiments using only the char embeddings, and found the results on both fall short of the values cited in the paper: Weibo test is only 0.475 vs 0.5277 in the paper; MSRA test is only 85.75 vs 88.81. May I ask what could cause this? I have been debugging for a long time without much improvement.
The sentence is as follows:
Input: 在全国高等医药教材建设研究会和卫生部教材办公室的指导和组织下,在第6版的基础上,经过编委们的精心修改、编撰,完成了本教材的第7版。
After decoding with the trained model file (xxx.model):
Output: 在全国高等医药教材建设研究会和卫生部教材办公室的指导和组织下,在第0版的基础上,经过编委们的精心修改、编撰,完成了本教材的第0版。
I can guarantee that both the embeddings and the dictionary contain vectors for the digits.
Question: why do the digits become "0" in the output?
Hello. I came across your paper while browsing ACL 2018. Based on your paper and the experiment code, I would like to confirm a few points:
1. Char features seem to be entirely discarded (I do not see character features extracted via an LSTM in the code). Is it that char features are not combined with the features extracted by the lattice, and only the lattice features are used?
2. bi_word in the code represents word information, but it does not participate in the LatticeLSTM computation? Is it unused?
3. Can the lattice network alone guarantee the same handling of long-range dependencies as an LSTM? Is there any evidence for this?
Traceback (most recent call last):
File "/home/rui/workspace/lattice-lstm/LatticeLSTM-master/main.py", line 459, in
train(data, save_model_dir, seg)
File "/home/rui/workspace/lattice-lstm/LatticeLSTM-master/main.py", line 286, in train
batch_charlen, batch_charrecover, batch_label, mask)
File "/home/rui/workspace/lattice-lstm/LatticeLSTM-master/model/bilstmcrf.py", line 32, in neg_log_likelihood_loss
scores, tag_seq = self.crf._viterbi_decode(outs, mask)
File "/home/rui/workspace/lattice-lstm/LatticeLSTM-master/model/crf.py", line 159, in _viterbi_decode
partition_history = torch.cat(partition_history,0).view(seq_len, batch_size,-1).transpose(1,0).contiguous() ## (batch_size, seq_len. tag_size)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 2 and 3 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:102
When performing the cat on partition_history, the tensors in the input list have inconsistent ranks.
The first tensor in partition_history is [batch_size, tag_size, 1]:
partition = inivalues[:, START_TAG, :].clone().view(batch_size, tag_size, 1) # bat_size * to_target_size
partition_history.append(partition)
But inside the for loop, the partition returned by torch.max has shape [batch_size, tag_size], which does not match the rank of the first tensor, so the cat fails:
cur_values = cur_values + partition.contiguous().view(batch_size, tag_size, 1).expand(batch_size, tag_size, tag_size)
partition, cur_bp = torch.max(cur_values, 1)
partition_history.append(partition)
How should this be fixed?
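One way to make the ranks agree (an assumption about the intended shapes, not necessarily the author's official fix) is to reshape each `torch.max` result back to three dimensions before appending, so every entry in the list is (batch_size, tag_size, 1). A minimal sketch with made-up values:

```python
import torch

batch_size, tag_size = 1, 5

# First entry is stored as (batch_size, tag_size, 1) ...
partition_history = [torch.randn(batch_size, tag_size).view(batch_size, tag_size, 1)]

# ... while torch.max over dim 1 returns a 2-D (batch_size, tag_size) tensor.
cur_values = torch.randn(batch_size, tag_size, tag_size)
partition, cur_bp = torch.max(cur_values, 1)

# Reshape before appending so every entry has the same rank and cat succeeds.
partition_history.append(partition.view(batch_size, tag_size, 1))

seq_len = len(partition_history)
stacked = (torch.cat(partition_history, 0)
           .view(seq_len, batch_size, -1)
           .transpose(1, 0).contiguous())  # (batch_size, seq_len, tag_size)
```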
The vectors on the two lines below do not have dimension 50 — one has 15 and the other 55. Is this a problem with the file itself, or did something go wrong during my download (no Baidu Cloud membership, so the download was really slow)?
森悄 -0.420138 -0.189634 0.346326 -0.235297 -0.389551 -0.588 1.164976 -0.610863 0.073047 0.531165 -3.343037 -0.666090 2.384061 0.129748 -1.972636
系v 0.108717 -0.042028 -2.452340 -0.387857 1.953125 0.230040 2.203831 3.083842 0.400699 -0.449208 1.321026 -2.430978 1.369693 0.100625 -1.246027 -0.846308 -2.649471 0.168484 0.593922 -0.481574 0.546810 -2.844704 -0.956998 -2.017416 1.072134 -1.407300 -0.145390 -0.086188 -0.896394 2.064528 1.660699 0.500353 0.773185 -2.036687 3.072354 0.667415 -0.520374 -1.668948 0.729110 0.385540 -0.868025 0.600913 1.883432 3.111219 -1.039192 1.274076 1.103154 3.524141 -0.77819 -2.084318 -1.281501 -2.526086 -2.124930 -0.793325 -0.496073
Hi @jiesutd, I have recently been training LatticeLSTM on a medical dataset for labeling, and have the following questions:
1) With the batch size fixed to 1, does the mask still do anything?
2) I see that main.py uses the bilstmcrf model, which in turn uses the bilstm and crf models, and bilstm uses latticelstm. So does the project implement just the LatticeLSTM model, or several models including bilstmcrf and latticelstm? If I run run_main.sh with the default configuration, do I get the LatticeLSTM result?
3) If I run LatticeLSTM on medical data, is it feasible to use the provided gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec?
Have you tried setting data.HP_batch_size to a value greater than 1?
I set data.HP_batch_size to 100, but the training results are not good: on the MSRA dataset, F1 is stable around 0.74. I also switched to the Adam optimizer, but the result is still not as good as the previous default setting, where F1 is about 0.91-0.92.
Normally the computation of c also considers the previous character's cell state. Why is this part absent in the paper? Why does the paper only consider the cell states of the words related to the current character?