wptoux / albert-chinese-large-webqa
An ALBERT Large QA model trained on the Baidu WebQA and DuReader datasets
License: Apache License 2.0
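As the title says: https://huggingface.co/wptoux/albert-chinese-large-qa
I followed the example there but could not get any result.
1. Using the pipeline method raises "TypeError: not a string". How can this be resolved?
2. If the pipeline method cannot be used: my data consists of questions plus passages. How should I use this model with it?
Any guidance would be much appreciated.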
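A minimal sketch of calling the model without pipeline, assuming the checkpoint loads with BertTokenizer. Chinese ALBERT checkpoints usually ship a vocab.txt rather than a SentencePiece model, which is one common cause of "TypeError: not a string" when an ALBERT tokenizer is selected automatically; whether that is the cause here is not confirmed. The helper names best_span and answer_question are hypothetical, and the span search is a generic one, not necessarily what the author used.

```python
from typing import Sequence, Tuple

MODEL_NAME = "wptoux/albert-chinese-large-qa"


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest summed score, start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider spans of at most max_answer_len tokens.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def answer_question(question: str, context: str) -> str:
    # Imported lazily so best_span can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForQuestionAnswering, BertTokenizer

    # Assumption: the checkpoint uses a BERT-style vocab.txt.
    tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer(question, context, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    answer_ids = inputs["input_ids"][0][start:end + 1]
    # BERT-style decoding puts spaces between Chinese characters; strip them.
    return tokenizer.decode(answer_ids, skip_special_tokens=True).replace(" ", "")
```

Calling answer_question(question, passage) downloads the weights on first use. The sketch does not restrict the span to passage tokens, so a production version would also mask out the question part of the input.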
With 4.1.1, run_squad.py hits the "not a string" issue.
Traceback (most recent call last):
File "/data2//.conda/envs/py36/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1062, in from_pretrained
state_dict = torch.load(resolved_archive_file, map_location="cpu")
File "/data2//.conda/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
with _open_zipfile_reader(f) as opened_zipfile:
File "/data2//.conda/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f71c7b6b193 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f71cacf39eb in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f71cacf4c04 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: + 0x6c53a6 (0x7f7212c243a6 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x2961c4 (0x7f72127f51c4 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #49: __libc_start_main + 0xf0 (0x7f7229a8f840 in /lib/x86_64-linux-gnu/libc.so.6)
My versions:
transformers 4.4.2
pytorch 1.4.0
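The error message itself says the checkpoint was written in a newer file format (version 3) than this PyTorch build can read (version 2). To my knowledge the zipfile-based format became the torch.save default in PyTorch 1.6, so 1.4.0 is most likely too old and upgrading PyTorch is the usual fix. Below is a small guard, a sketch only: can_read_new_checkpoints is a hypothetical helper and the 1.6 threshold is my assumption, not something stated in the thread.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_read_new_checkpoints(torch_version: str,
                             minimum: Tuple[int, int] = (1, 6)) -> bool:
    """True if the given PyTorch version is at least the assumed minimum."""
    return parse_version(torch_version) >= minimum


print(can_read_new_checkpoints("1.4.0"))
```

In practice the check would be fed torch.__version__. If upgrading is not possible, another option is to re-save the weights on a machine with a newer PyTorch using torch.save(state_dict, path, _use_new_zipfile_serialization=False), which writes the legacy format.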
answer = self.tokenizer.decode(input_ids[answer_start[0][i]:answer_end[0][i] + 1], skip_special_tokens=True)
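When converting input_ids back to tokens, some tokens come out as [UNK]. In that case, how can the [UNK] be mapped back to the original text?
Example:
Question: 我居住在哪里? (Where do I live?)
Passage: 我居住在adsffasdf (I live in adsffasdf)
Expected output: adsffasdf
But the code above outputs [UNK]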
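One possible workaround, sketched under assumptions: instead of decoding token ids, slice the original passage using per-token character offsets, so out-of-vocabulary text is recovered verbatim. Fast tokenizers in transformers can return such offsets with return_offsets_mapping=True; whether a fast tokenizer works with this checkpoint is not confirmed here. The function span_text and the hand-written offsets are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple


def span_text(context: str, offsets: List[Tuple[int, int]],
              start_tok: int, end_tok: int) -> str:
    """Return the original characters covered by tokens start_tok..end_tok (inclusive)."""
    char_start = offsets[start_tok][0]
    char_end = offsets[end_tok][1]
    return context[char_start:char_end]


context = "我居住在adsffasdf"
# Hand-written offsets for illustration: one token per Chinese character,
# then a single [UNK] token covering the 9 characters of "adsffasdf".
offsets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 13)]

# Suppose the model predicts the answer span as token 4 only (the [UNK] token).
print(span_text(context, offsets, 4, 4))
```

With a real tokenizer the offsets list also contains entries for the question and the special tokens, so the predicted token indices must be looked up in the full list returned for the encoded pair.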
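Thank you for sharing and developing this. Could you provide the preprocessed DuReader and WebQA datasets so that we can use them for training? Thanks.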