lc1332 / camelbell-chinese-lora

172.0 172.0 18.0 740 KB

CamelBell (驼铃) is a Chinese language tuning project based on LoRA. CamelBell belongs to Project Luotuo (骆驼), an open-source Chinese LLM project created by 冷子昂 @ 商汤科技, 陈启源 @ 华中师范大学, and 李鲁鲁 @ 商汤科技.

License: Apache License 2.0

Jupyter Notebook 97.73% Python 2.27%

camelbell-chinese-lora's Issues

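How should training data be designed so that the model learns knowledge better?

I have the same problem: I want the model to learn knowledge, but its answers are nonsense.

A suggestion about the Harry Potter project

I am very interested in the Harry Potter project, but unfortunately I do not know Harry Potter well. Why not train on something like the Four Great Classical Novels of Chinese literature, for example Romance of the Three Kingdoms? That way, more people who know the story background could take part.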

training code

Hello, what about the training code? When do you plan to clean it up and release it? I have tried some methods for Chinese, but they failed.

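Question about preprocessing the training dataset

Hello authors, I saw that you used a dataset of 80 question-answer pairs to train the LoRA weights. Did you build this text dataset by hand? If I have a Chinese text, how can I quickly turn it into a standard dataset like this one?

Error when using the CamelBell-C Colab you provided

When running the following code: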
torch.set_default_tensor_type(torch.cuda.HalfTensor)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map=DeviceMap("ChatGLM").get()
)

the following error appears:

AttributeError Traceback (most recent call last)
in <cell line: 3>()
1 torch.set_default_tensor_type(torch.cuda.HalfTensor)
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
4
5 model = AutoModel.from_pretrained(

7 frames
~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/8b7d33596d18c5e83e2da052d05ca4db02e60620/tokenization_chatglm.py in vocab_size(self)
242 def vocab_size(self):
243 """ Returns vocab size """
--> 244 return self.sp_tokenizer.num_tokens
245
246 def get_vocab(self):

AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
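
The traceback shows `vocab_size` reading `self.sp_tokenizer` while the tokenizer is still being constructed. One way this kind of error can arise is when a base-class constructor reads a property before the subclass has set the attribute that the property depends on. Whether that is what happens here is an assumption, since the seven intermediate frames are not shown. The classes below (`BaseTokenizer`, `BrokenTokenizer`, `FixedTokenizer`, `try_build`) are hypothetical stand-ins for illustration only, not the transformers or ChatGLM API:

```python
class BaseTokenizer:
    """Hypothetical stand-in for a library base class (not the transformers API)."""

    def __init__(self):
        # The base constructor reads a property that the subclass implements.
        self.size_at_init = self.vocab_size


class BrokenTokenizer(BaseTokenizer):
    """Sets sp_tokenizer only after the base constructor has already run."""

    def __init__(self, tokens):
        super().__init__()  # vocab_size is read here, before sp_tokenizer exists
        self.sp_tokenizer = tokens

    @property
    def vocab_size(self):
        return len(self.sp_tokenizer)


class FixedTokenizer(BaseTokenizer):
    """Sets sp_tokenizer before calling the base constructor."""

    def __init__(self, tokens):
        self.sp_tokenizer = tokens
        super().__init__()

    @property
    def vocab_size(self):
        return len(self.sp_tokenizer)


def try_build(cls, tokens):
    """Return (instance, None) on success or (None, error message) on AttributeError."""
    try:
        return cls(tokens), None
    except AttributeError as exc:
        return None, str(exc)


if __name__ == "__main__":
    print(try_build(BrokenTokenizer, ["a", "b"])[1])
    print(try_build(FixedTokenizer, ["a", "b"])[0].size_at_init)
```

If the cause is an initialization-order mismatch of this kind, it would depend on which library version the downloaded tokenizer code is paired with, so comparing the installed version against the one the Colab was written for may be a reasonable first check.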
