Git Product home page Git Product logo

yongzhuo / pytorch-nlu Goto Github PK

View Code? Open in Web Editor NEW
292.0 4.0 46.0 384 KB

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词、抽取式文本摘要等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of spee

Home Page: https://blog.csdn.net/rensihui

License: Apache License 2.0

Python 100.00%
python3 pytorch text-classification sequence-labeling named-entity-recognition word-segmentation pos-tagging chinese-text-segmentation chinese-text-classification transformers

pytorch-nlu's People

Contributors

moyongzhuo avatar yongzhuo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

pytorch-nlu's Issues

求回复:用bert-tiny做多标签分类,训练没有问题,保存了tc.config和tc.model,推理的时候在加载模型的地方报错

self.pretrain_model = pretrained_model(self.pretrained_config) # 推理时候只需要加载超参数, 不需要预训练模型的权重

预训练模型是bert-tiny,加载用的AutotModel,不是BertModel
在tcGraph.py的这行报错,报错如下

OSError: AutoModel is designed to be instantiated using the AutoModel.from_pretrained(pretrained_model_name_or_path) or AutoModel.from_config(config) methods.

请问读取数据集内存占用过高的问题

进行文本多标签分类,数据有90多万,txt文件有不到200m,但是读取数据集占用的内存太多了,不知道是不是bug还是本来就这样,机子32g的内存都不够读取四分之一的数据,

选择albert模型tokenizer加载错误

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BertTokenizer'.
The class this function is called from is 'AlbertTokenizer'.
Traceback (most recent call last):
File "/home/efsz/localCode/Pytorch-NLU/test/tc/tet_tc_base_multi_label.py", line 73, in
lc.process()
File "/home/efsz/localCode/Pytorch-NLU/pytorch_nlu/pytorch_textclassification/tcRun.py", line 32, in process
self.corpus = Corpus(self.config, self.logger)
File "/home/efsz/localCode/Pytorch-NLU/pytorch_nlu/pytorch_textclassification/tcData.py", line 25, in init
self.tokenizer = self.load_tokenizer(self.config)
File "/home/efsz/localCode/Pytorch-NLU/pytorch_nlu/pytorch_textclassification/tcData.py", line 206, in load_tokenizer
tokenizer = PRETRAINED_MODEL_CLASSES[config.model_type][1].from_pretrained(config.pretrained_model_name_or_path)
File "/home/efsz/anaconda3/envs/nlu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
return cls._from_pretrained(
File "/home/efsz/anaconda3/envs/nlu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/efsz/anaconda3/envs/nlu/lib/python3.10/site-packages/transformers/models/albert/tokenization_albert.py", line 183, in init
self.sp_model.Load(vocab_file)
File "/home/efsz/anaconda3/envs/nlu/lib/python3.10/site-packages/sentencepiece/init.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/home/efsz/anaconda3/envs/nlu/lib/python3.10/site-packages/sentencepiece/init.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

其他模型好像没问题,但是albert的tokenizer加载会报这个错

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.