wptoux / albert-chinese-large-webqa

75.0 75.0 15.0 59.16 MB

An ALBERT Large QA model trained on Baidu's WebQA and DuReader datasets.

License: Apache License 2.0

Jupyter Notebook 100.00%

albert-chinese-large-webqa's Issues

Hello, the model you deployed on Hugging Face doesn't seem to produce any results.

As the title says: when I try the example on https://huggingface.co/wptoux/albert-chinese-large-qa, I get no result.

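How do I use this model?

1. Using the pipeline method raises TypeError: not a string. How can this be fixed?
2. If the pipeline method can't be used, how should I apply this model to my data, which consists of questions plus passages?
Any guidance would be much appreciated!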
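For question 2, a minimal sketch of running the model by hand on a question/passage pair. It assumes the checkpoint loads with transformers' AlbertForQuestionAnswering together with a BERT-style BertTokenizerFast. This is an assumption: Chinese ALBERT checkpoints often ship a WordPiece vocab.txt rather than a SentencePiece model. The span-selection step is plain Python. It picks the highest-scoring (start, end) pair inside the passage with start <= end. best_span and answer are hypothetical helper names, not part of this repo.

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              context_mask: Sequence[bool], max_len: int = 30) -> Tuple[int, int]:
    """Pick the (start, end) token pair with the highest start + end score.

    Only passage tokens (context_mask True) are eligible, and the span must
    satisfy start <= end < start + max_len.
    """
    best, best_score = (0, 0), float("-inf")
    n = len(start_logits)
    for s in range(n):
        if not context_mask[s]:
            continue
        for e in range(s, min(n, s + max_len)):
            score = start_logits[s] + end_logits[e]
            if context_mask[e] and score > best_score:
                best, best_score = (s, e), score
    return best


def answer(question: str, context: str,
           model_name: str = "wptoux/albert-chinese-large-qa") -> str:
    """End-to-end sketch; needs torch and transformers installed.

    Assumption: the checkpoint loads with AlbertForQuestionAnswering and a
    BERT-style WordPiece tokenizer (BertTokenizerFast).
    """
    import torch
    from transformers import AlbertForQuestionAnswering, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = AlbertForQuestionAnswering.from_pretrained(model_name)
    model.eval()
    enc = tokenizer(question, context, truncation="only_second", max_length=512,
                    return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        out = model(**enc)
    # sequence_ids(): 0 for question tokens, 1 for passage tokens, None for specials.
    mask = [sid == 1 for sid in enc.sequence_ids(0)]
    s, e = best_span(out.start_logits[0].tolist(), out.end_logits[0].tolist(), mask)
    # Slice the original passage by character offsets rather than decoding ids.
    return context[offsets[s][0]:offsets[e][1]]


# Usage (downloads the checkpoint):
#   print(answer("我居住在哪里？", "我居住在adsffasdf"))
```

Scoring start and end independently and then maximizing their sum over valid spans mirrors the usual extractive-QA decoding. The mask keeps the answer from landing in the question or on [CLS].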

Which version of transformers should be used for fine-tuning?

With 4.1.1, run_squad.py runs into the "not a string" error.
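"TypeError: not a string" is the message SentencePiece raises when it is given no model file. A plausible cause, not confirmed in this thread, is the following. Both pipeline and run_squad.py build their tokenizer through AutoTokenizer, which maps an ALBERT config to the SentencePiece-based ALBERT tokenizer. A Chinese ALBERT checkpoint may instead ship only a BERT-style vocab.txt. The hypothetical helper below encodes that heuristic: it suggests a tokenizer class from a checkpoint's file listing. The commented lines show passing a BERT tokenizer explicitly, which is an assumption about this checkpoint rather than a documented fix.

```python
import os
from typing import Iterable


def pick_tokenizer_class(files: Iterable[str]) -> str:
    """Suggest a tokenizer class name from a checkpoint's file listing.

    Heuristic (assumption): spiece.model means the stock SentencePiece ALBERT
    tokenizer applies; a bare WordPiece vocab.txt means a BERT tokenizer is needed.
    """
    names = {os.path.basename(f) for f in files}
    if "spiece.model" in names:
        return "AlbertTokenizer"
    if "vocab.txt" in names:
        return "BertTokenizer"
    raise ValueError("no recognizable vocabulary file")


print(pick_tokenizer_class(["config.json", "pytorch_model.bin", "vocab.txt"]))  # BertTokenizer

# Passing the tokenizer explicitly sidesteps AutoTokenizer, e.g.:
#   from transformers import BertTokenizerFast, pipeline
#   tok = BertTokenizerFast.from_pretrained("wptoux/albert-chinese-large-qa")
#   qa = pipeline("question-answering", model="wptoux/albert-chinese-large-qa", tokenizer=tok)
# In run_squad.py the analogous change would be BertTokenizer.from_pretrained(...)
# in place of AutoTokenizer.from_pretrained(...).
```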

Loading error, not sure why

Traceback (most recent call last):
  File "/data2//.conda/envs/py36/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1062, in from_pretrained
    state_dict = torch.load(resolved_archive_file, map_location="cpu")
  File "/data2//.conda/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/data2//.conda/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f71c7b6b193 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f71cacf39eb in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f71cacf4c04 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: + 0x6c53a6 (0x7f7212c243a6 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x2961c4 (0x7f72127f51c4 in /data2//.conda/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #49: __libc_start_main + 0xf0 (0x7f7229a8f840 in /lib/x86_64-linux-gnu/libc.so.6)

My versions:
transformers 4.4.2
pytorch 1.4.0
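The RuntimeError says the checkpoint uses serialization format version 3, which PyTorch 1.4.0 cannot read. This is the error typically reported when a file saved with the default zip-based format of PyTorch 1.6 or newer is loaded on an older release. Upgrading PyTorch to 1.6+ should resolve it. Alternatively, on a machine that already has 1.6+, the weights can be re-saved with torch.save(..., _use_new_zipfile_serialization=False). The quick diagnostic below uses only the standard library. uses_zip_format and torch_reads_new_format are hypothetical helper names, and reading "zip archive" as "saved by 1.6+" is a heuristic.

```python
import zipfile


def uses_zip_format(path: str) -> bool:
    """True if the checkpoint is a zip archive (torch.save's default since 1.6)."""
    return zipfile.is_zipfile(path)


def torch_reads_new_format(version: str) -> bool:
    """True if a PyTorch version string such as "1.4.0" or "1.10.2+cu113" is >= 1.6."""
    release = version.split("+")[0]  # drop local tags like "+cu101"
    major, minor = (int(p) for p in release.split(".")[:2])
    return (major, minor) >= (1, 6)


print(torch_reads_new_format("1.4.0"))  # False

# With PyTorch >= 1.6 installed, a copy readable by older releases can be written,
# then used in place of the original pytorch_model.bin:
#   state = torch.load("pytorch_model.bin", map_location="cpu")
#   torch.save(state, "pytorch_model_legacy.bin", _use_new_zipfile_serialization=False)
```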

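The output contains UNK

answer = self.tokenizer.decode(input_ids[answer_start[0][i]:answer_end[0][i] + 1], skip_special_tokens=True)
When converting input_ids back to tokens, some of them come out as UNK. In that case, how can the UNK be mapped back to the corresponding text in the original passage?

Example:
Question: 我居住在哪里？ (Where do I live?)
Passage: 我居住在adsffasdf (I live in adsffasdf)

The expected output is adsffasdf, but the code above outputs [UNK].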
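One way around this is to avoid decoding ids altogether. The idea is to slice the original passage using character offsets, which a transformers fast tokenizer returns when called with return_offsets_mapping=True. Whatever the vocabulary mapped a token to, its offsets still point at the real characters. The offsets below are illustrative and hypothetical, not actual tokenizer output. For a question/passage pair, use only tokens whose sequence id is 1, since offsets of question tokens index into the question string instead. span_text is a hypothetical helper name.

```python
from typing import List, Tuple


def span_text(context: str, offsets: List[Tuple[int, int]], start: int, end: int) -> str:
    """Return the passage characters covered by tokens start..end (inclusive).

    offsets[i] is the (char_start, char_end) of token i within `context`.
    Slicing the original string recovers text even for a token shown as [UNK].
    """
    return context[offsets[start][0]:offsets[end][1]]


# Hypothetical tokenization of the passage from the issue: the last token stands
# for the span reported as [UNK] and covers characters 4..13.
context = "我居住在adsffasdf"
offsets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 13)]

print(span_text(context, offsets, 4, 4))  # adsffasdf
print(span_text(context, offsets, 0, 3))  # 我居住在
```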

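Could the cleaned datasets be made available?

Thank you for sharing this work. Could you provide the processed DuReader and WebQA datasets so that we can use them for training? Thanks.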

albert-chinese-large-webqa's Issues

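Hello, the model you deployed on Hugging Face doesn't seem to produce any results.

How do I use this model?

Which version of transformers should be used for fine-tuning?

Loading error, not sure why

The output contains UNK

Could the cleaned datasets be made available?

I'd like to ask why the parameter count is so small, when weight files are usually over 300 MB. Thanks.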
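The size question comes down to two ALBERT design choices: factorized embeddings and cross-layer parameter sharing. Factorized embeddings store V×E + E×H weights instead of V×H. Cross-layer sharing stores one Transformer layer's weights instead of one per layer. The back-of-the-envelope count below is a sketch under stated assumptions. It uses an ALBERT-large shape (24 layers, H=1024, FFN=4096, E=128) and a 21128-token Chinese BERT-style vocabulary. It counts weight matrices only and stores fp32 at 4 bytes per parameter. The helper names are hypothetical. The result is in the same range as the roughly 18M parameters the ALBERT paper reports for ALBERT-large, which uses a 30k vocabulary.

```python
from typing import Optional


def embedding_params(vocab: int, hidden: int, emb: Optional[int] = None) -> int:
    """Word-embedding weights: V*H, or ALBERT's factorized V*E + E*H when emb is set."""
    if emb is None:
        return vocab * hidden
    return vocab * emb + emb * hidden


def layer_params(hidden: int, ffn: int) -> int:
    """Weight matrices of one Transformer layer: Q, K, V, output, and two FFN matrices."""
    return 4 * hidden * hidden + 2 * hidden * ffn


def encoder_params(layers: int, hidden: int, ffn: int, shared: bool) -> int:
    """Cross-layer sharing keeps a single layer's weights instead of `layers` copies."""
    return layer_params(hidden, ffn) * (1 if shared else layers)


def size_mb(params: int, bytes_per_param: int = 4) -> float:
    """Checkpoint size in MB (10**6 bytes), fp32 by default."""
    return params * bytes_per_param / 1e6


# Assumed shape; biases, LayerNorm, position embeddings, pooler and QA head ignored.
albert = embedding_params(21128, 1024, 128) + encoder_params(24, 1024, 4096, shared=True)
unshared = embedding_params(21128, 1024) + encoder_params(24, 1024, 4096, shared=False)
print(albert, round(size_mb(albert), 1))      # 15418368 61.7
print(unshared, round(size_mb(unshared), 1))  # 323624960 1294.5
```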
