CamelBell (驼铃) is a Chinese language tuning project based on LoRA. CamelBell belongs to Project Luotuo (骆驼), an open-sourced Chinese LLM project created by 冷子昂 @ 商汤科技, 陈启源 @ 华中师范大学, and 李鲁鲁 @ 商汤科技.
License: Apache License 2.0
Jupyter Notebook 97.73%
Python 2.27%
camelbell-chinese-lora's Issues
I have the same problem: I train the model on new knowledge, but when answering it just makes things up.
I'm very interested in the Harry Potter project, but unfortunately I don't know Harry Potter very well. Why not train on the Four Great Classical Novels or similar works, such as Romance of the Three Kingdoms? That way more people who know the story background could participate.
Hello, what about the training code? When do you plan to clean it up and release it? I have tried some methods for Chinese but failed.
Hi authors, I see you used a dataset of 80 question-answer pairs to train the LoRA weights. Was this text dataset constructed by hand? If I have my own Chinese text, how can I quickly turn it into a standard dataset in this format?
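A minimal sketch of one way to do this, assuming an Alpaca-style instruction format: collect (question, answer) pairs from your text, then serialize them as a JSON list. The field names `instruction`/`input`/`output` are the common Alpaca convention and are an assumption here; the exact schema expected by this repo's training notebook may differ.

```python
import json

# Hypothetical (question, answer) pairs extracted from your own Chinese text.
qa_pairs = [
    ("骆驼项目是什么?", "骆驼(Luotuo)是一个开源中文大语言模型项目。"),
    ("LoRA 的全称是什么?", "Low-Rank Adaptation,一种参数高效微调方法。"),
]

# Convert each pair into an Alpaca-style record; "input" is left empty
# because these are standalone questions with no extra context.
records = [
    {"instruction": q, "input": "", "output": a}
    for q, a in qa_pairs
]

# ensure_ascii=False keeps the Chinese characters readable in the file.
with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

Scaling this up is mostly a matter of how you extract the pairs: manual writing, regex over Q&A-formatted text, or generating questions with a larger model and reviewing them by hand.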
When running the following code:
import torch
from transformers import AutoModel, AutoTokenizer
# DeviceMap is a helper defined in this repo's notebooks

torch.set_default_tensor_type(torch.cuda.HalfTensor)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map=DeviceMap("ChatGLM").get()
)
the following error appears:
AttributeError Traceback (most recent call last)
in <cell line: 3>()
1 torch.set_default_tensor_type(torch.cuda.HalfTensor)
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
4
5 model = AutoModel.from_pretrained(
7 frames
~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/8b7d33596d18c5e83e2da052d05ca4db02e60620/tokenization_chatglm.py in vocab_size(self)
242 def vocab_size(self):
243 """ Returns vocab size """
--> 244 return self.sp_tokenizer.num_tokens
245
246 def get_vocab(self):
AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
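This error is commonly reported when a newer `transformers` release is used with the `chatglm-6b` remote code: the tokenizer's `vocab_size` property (which reads `self.sp_tokenizer`) gets called during `__init__` before `sp_tokenizer` has been assigned. A frequently suggested workaround, not verified against this exact environment, is to pin `transformers` to an older release that the remote tokenizer code was written against:

```shell
# Unverified workaround: pin transformers to a release contemporary
# with the chatglm-6b remote code, then re-run the notebook.
pip install "transformers==4.27.1"
```

Alternatively, pulling a newer revision of the THUDM/chatglm-6b repo (which updated the tokenizer code) may resolve it without downgrading.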