
jasoncao11 / nlp-notebook


Implementations of common NLP tasks, including new word discovery as well as PyTorch-based word embeddings, Chinese text classification, entity recognition, abstractive text summarization, sentence similarity, triple extraction, pre-trained models, and more.

License: MIT License

Python 100.00%
textcnn textrcnn bilstm-crf-model bilstm-attention fasttext-embeddings transformer-pytorch bert-chinese textrcnn-bert distill-bert seq2seq gpt2 text-classification glove skip-gram nlp pytorch bert natural-language-processing bert-ner electra

nlp-notebook's Introduction

Project Description

Implementations of common NLP tasks, including new word discovery as well as PyTorch-based word embeddings, Chinese text classification, entity recognition, text generation, sentence similarity, triple extraction, pre-trained models, and more.

Dependencies

python 3.7
pytorch 1.8.0
torchtext 0.9.1
optuna 2.6.0
transformers 3.0.2

Contents

0. New word discovery algorithm

1. Word embeddings

2. Text classification (each model uses optuna internally for hyperparameter tuning; a sketch follows the table below)

Dataset (data folder): a binary-class public-opinion dataset, split as follows:

Split        Size
Training     56700
Validation   7000
Test         6300
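
A minimal sketch of what the Optuna search over a few common hyperparameters could look like; the toy EmbeddingBag classifier and random tensors are illustrative stand-ins, not the repository's actual training code:

import optuna
import torch
import torch.nn as nn

def objective(trial):
    # Search space for a few typical text-classification hyperparameters
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    emb_dim = trial.suggest_categorical("emb_dim", [64, 128, 256])

    # Toy bag-of-embeddings binary classifier standing in for TextCNN/TextRCNN etc.
    model = nn.Sequential(nn.EmbeddingBag(5000, emb_dim),
                          nn.Dropout(dropout),
                          nn.Linear(emb_dim, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randint(0, 5000, (256, 30))   # fake token ids
    y = torch.randint(0, 2, (256,))         # fake binary labels
    for _ in range(5):                       # stand-in for a real training loop
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return (model(x).argmax(dim=-1) == y).float().mean().item()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print(study.best_params)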

3. Named entity recognition (NER)

4. Text summarization

1). Abstractive

2). Extractive

5. Sentence similarity

6. Multi-label classification

7. Triple extraction

8. Pre-trained models (ELECTRA + SimCSE); a loss sketch follows below
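
For context on the SimCSE part: unsupervised SimCSE encodes each sentence twice with different dropout masks and applies an in-batch contrastive (InfoNCE) loss in which the two views of the same sentence are positives and all other sentences in the batch are negatives. A minimal sketch of that loss; the random vectors stand in for the repository's ELECTRA encoder outputs:

import torch
import torch.nn.functional as F

def simcse_loss(emb1, emb2, temperature=0.05):
    # emb1/emb2: [batch, dim] encodings of the same sentences under two dropout masks.
    emb1 = F.normalize(emb1, dim=-1)
    emb2 = F.normalize(emb2, dim=-1)
    sim = emb1 @ emb2.t() / temperature      # [batch, batch] cosine similarity matrix
    labels = torch.arange(sim.size(0))       # diagonal entries are the positive pairs
    return F.cross_entropy(sim, labels)

a, b = torch.randn(8, 768), torch.randn(8, 768)   # stand-ins for encoder outputs
print(simcse_loss(a, b).item())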

9. Prompt learning

This folder collects some papers together with the code of the corresponding models:

11. QA

This folder contains brief summaries of various machine learning / deep learning concepts.

nlp-notebook's People

Contributors

jasoncao11


nlp-notebook's Issues

Bug

  File "/workspace/nlp-notebook/4-3.Transformer/model.py", line 256, in forward
    enc_src = self.encoder(src, src_mask)        
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/nlp-notebook/4-3.Transformer/model.py", line 35, in forward
    src = self.dropout((self.tok_embedding(src) * self.scale) + self.pos_embedding(pos))     
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

The Transformer generation model throws the error above; package version issues have been ruled out. Seq2Seq+attention also has a bug, the same tuple-vs-tensor problem described in an earlier issue.
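
Not a confirmed diagnosis, but "IndexError: index out of range in self" from F.embedding almost always means some index is >= the size of the embedding table: either a token id outside the vocabulary fed to tok_embedding, or a position index beyond the length of pos_embedding. A quick check along these lines (accessing them via model.encoder is an assumption about the 4-3.Transformer code) usually pinpoints which one:

# Compare the largest index against each embedding table's size.
print("max token id:", src.max().item(),
      "/ vocab size:", model.encoder.tok_embedding.num_embeddings)
print("sequence length:", src.size(1),
      "/ max positions:", model.encoder.pos_embedding.num_embeddings)
# If either left value reaches the right value, that embedding is the culprit:
# rebuild the vocab / enlarge the table, or truncate overly long sequences.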

4-1.Seq2seq

In the Seq2Seq model, after changing trg to a tensor type, the following error appears:
Traceback (most recent call last):
File "E:\nlp-notebook-master\4-1.Seq2seq\train_eval.py", line 54, in
trg, src = trg.to(device), src.to(device)
AttributeError: 'NoneType' object has no attribute 'to'
What could be causing this?
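
A guess rather than a confirmed answer: trg being None usually means the batch (or the field unpacked from it) is already None before .to(device) is reached, for example because a Dataset item or collate function returned None. Printing what the iterator actually yields narrows it down (train_iter is an assumed name for the training iterator):

for batch in train_iter:
    print(type(batch), batch)   # if this shows None, the Dataset/collate_fn is at fault,
    break                       # not the .to(device) call itself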

3-2 'tuple' object has no attribute 'last_hidden_state'

(pytorch18) z@z:~/code/nlp-notebook-master/3-2.Bert-CRF$ python demo_train.py
Some weights of the model checkpoint at ./bert-base-chinese were not used when initializing BertForNER: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
This IS expected if you are initializing BertForNER from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
This IS NOT expected if you are initializing BertForNER from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForNER were not initialized from the model checkpoint at ./bert-base-chinese and are newly initialized: ['transitions', 'hidden2label.weight', 'hidden2label.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[Train Epoch 0]: 0%| | 0/1584 [00:00<?, ?it/s]
Traceback (most recent call last):
File "demo_train.py", line 66, in
run()
File "demo_train.py", line 53, in run
loss = model.neg_log_likelihood(input_ids, attention_mask, label_ids, real_lens)
File "/home/z/code/nlp-notebook-master/3-2.Bert-CRF/model.py", line 137, in neg_log_likelihood
feats = self.get_features(input_ids, attention_mask)
File "/home/z/code/nlp-notebook-master/3-2.Bert-CRF/model.py", line 53, in get_features
sequence_output, pooled_output = x.last_hidden_state, x.pooler_output
AttributeError: 'tuple' object has no attribute 'last_hidden_state'

The output is shown above. I also tried model.from_pretrained(model_path, output_hidden_states=True), but it did not help.
Where is the problem? My environment configuration is the same.
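
Not the author's reply, but this error typically means the installed transformers version returns a plain tuple from the BERT forward pass instead of an output object with named fields (older 3.x releases behave this way, and passing output_hidden_states=True does not change the return type). Indexing the tuple works with either behaviour; a suggested patch for the failing line in 3-2.Bert-CRF/model.py, where x is whatever the underlying BERT call returned:

if isinstance(x, tuple):                                  # older transformers: plain tuple
    sequence_output, pooled_output = x[0], x[1]
else:                                                     # newer transformers: ModelOutput
    sequence_output, pooled_output = x.last_hidden_state, x.pooler_output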

RuntimeError: _th_ceil_out not supported on CUDAType for Long

Hello, when running your p-tuning code I hit RuntimeError: _th_ceil_out not supported on CUDAType for Long. I suspect it might be a version issue with the mlm_pytorch package. Could you share your package versions and environment? Thanks!
Traceback (most recent call last):
File "G:/download/yg/nlp-notebook-master/5.PaperwithCode/3.P-tuning/train.py", line 30, in
loss = model(batch_data[0], batch_data[1])
File "C:\Users\Adam-CVTeam\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "G:\download\yg\nlp-notebook-master\5.PaperwithCode\3.P-tuning\model.py", line 77, in forward
inputs_embeds = self.embed_input(queries) #[batch size, spell_length + x, hidden_size]
File "G:\download\yg\nlp-notebook-master\5.PaperwithCode\3.P-tuning\model.py", line 45, in embed_input
replace_embeds = self.prompt_encoder() #[spell_length, hidden_size]
File "C:\Users\Adam-CVTeam\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "G:\download\yg\nlp-notebook-master\5.PaperwithCode\3.P-tuning\prompt_encoder.py", line 53, in forward
output_embeds = self.mlm_head(input_embeds)[0].squeeze() # [9(sum(template)), hidden_size]
File "C:\Users\Adam-CVTeam\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "C:\Users\Adam-CVTeam\Anaconda\envs\pytorch\lib\site-packages\mlm_pytorch\mlm_pytorch.py", line 67, in forward
mask = get_mask_subset_with_prob(~no_mask, self.mask_prob)
File "C:\Users\Adam-CVTeam\Anaconda\envs\pytorch\lib\site-packages\mlm_pytorch\mlm_pytorch.py", line 23, in get_mask_subset_with_prob
mask_excess = (mask.cumsum(dim=-1) > (num_tokens * prob).ceil())
RuntimeError: _th_ceil_out not supported on CUDAType for Long

Process finished with exit code 1
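
Not the author's reply, but this looks like a PyTorch version issue rather than an mlm_pytorch one: on older torch builds, multiplying a Long tensor by a Python float keeps it Long, and ceil() on a CUDA Long tensor raises exactly this error. Upgrading to the torch 1.8.0 listed in the dependencies should avoid it; alternatively, a local workaround in mlm_pytorch/mlm_pytorch.py, get_mask_subset_with_prob (a suggested edit, not the package's official fix):

# Cast to float before ceil so the op is supported on CUDA Long inputs,
# then back to long so the comparison against the Long cumsum stays consistent.
mask_excess = (mask.cumsum(dim=-1) > (num_tokens.float() * prob).ceil().long())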

Environment question

Hello, could you share the environment used for the 4-1.Seq2seq project?

Lattice-LSTM

Hi, I noticed today that the Lattice-LSTM code under this repository is gone. Could you upload it again?
