lc1332 / camelbell-chinese-lora

CamelBell (驼铃) is a Chinese language-tuning project based on LoRA. CamelBell belongs to Project Luotuo (骆驼), an open-source Chinese LLM project created by 冷子昂 @ SenseTime, 陈启源 @ Central China Normal University, and 李鲁鲁 @ SenseTime.

License: Apache License 2.0

Languages: Jupyter Notebook 97.73%, Python 2.27%
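The page only says CamelBell is "based on LoRA" and does not show the project's training code. For orientation, the sketch below illustrates the generic LoRA idea in NumPy: a frozen weight W is augmented with a trainable low-rank product B·A scaled by alpha/r, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. The function name `lora_forward`, the shapes, and the zero-initialised B follow the common textbook formulation and are illustrative assumptions, not CamelBell's implementation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Generic LoRA layer (illustrative): frozen W plus (alpha / r) * B @ A.

    x: (batch, d_in), W: (d_out, d_in) frozen, A: (r, d_in), B: (d_out, r).
    """
    r = A.shape[0]
    scaling = alpha / r
    # Base path uses the frozen pretrained weight.
    base = x @ W.T
    # Low-rank path: project down to r dims with A, then back up with B.
    update = (x @ A.T) @ B.T
    return base + scaling * update

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.standard_normal((2, d_in))

# With B initialised to zero, the adapted layer starts identical to the base layer.
out = lora_forward(x, W, A, B, alpha)
print(np.allclose(out, x @ W.T))  # True

# Trainable parameters: r * (d_in + d_out) versus d_in * d_out for full fine-tuning.
print(A.size + B.size, W.size)  # 96 128
```

Because the update is a plain matrix sum, W + (alpha/r)·B·A can be merged into a single weight after training, so a merged LoRA layer costs no extra computation at inference time.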

camelbell-chinese-lora's People

Contributors

lc1332, wu-fu

camelbell-chinese-lora's Issues

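A suggestion about the Harry Potter project

I'm very interested in the Harry Potter project, but unfortunately I don't know Harry Potter very well. Why not train on China's Four Great Classical Novels, for example Romance of the Three Kingdoms? That way, more people who know the story's background could take part.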

Training code

Hello, what is the status of the training code? When do you plan to clean it up and release it? I have tried some methods for Chinese, but they failed.

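Question about preprocessing the training dataset

Hello, I see that you trained the LoRA weights on a dataset of 80 question-answer pairs. Did you build this text dataset manually? If I have a Chinese text, how can I quickly turn it into a standard dataset like this one?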
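The issue does not say what format CamelBell's 80-pair dataset uses. As one illustration, assuming a plain-text file where each pair is written as a line starting with `问：` (question) followed by a line starting with `答：` (answer), the sketch below converts it into instruction/input/output records (an Alpaca-style layout assumed here, not taken from the project) and serialises them as JSON. The helper `text_to_qa_records` and the marker strings are hypothetical.

```python
import json

def text_to_qa_records(text, q_marker="问：", a_marker="答："):
    """Hypothetical helper: pair each question line with the answer line that follows it."""
    records = []
    question = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(q_marker):
            question = line[len(q_marker):].strip()
        elif line.startswith(a_marker) and question is not None:
            records.append({
                "instruction": question,
                "input": "",
                "output": line[len(a_marker):].strip(),
            })
            question = None  # each question is paired with at most one answer
    return records

sample = """问：驼铃项目基于什么方法？
答：基于 LoRA。
问：骆驼项目是什么？
答：一个开源的中文大语言模型项目。"""

records = text_to_qa_records(sample)
# ensure_ascii=False keeps the Chinese characters readable in the output.
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Answers that span several lines, or source text without explicit question/answer markers, would need extra handling before a converter this simple applies.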

Error when using the CamelBell-C Colab you provided

When running the following code:
torch.set_default_tensor_type(torch.cuda.HalfTensor)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    # DeviceMap is not a transformers class; presumably a helper defined earlier in the notebook
    device_map=DeviceMap("ChatGLM").get()
)
The following error is raised:

AttributeError Traceback (most recent call last)
in <cell line: 3>()
1 torch.set_default_tensor_type(torch.cuda.HalfTensor)
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
4
5 model = AutoModel.from_pretrained(

~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/8b7d33596d18c5e83e2da052d05ca4db02e60620/tokenization_chatglm.py in vocab_size(self)
242 def vocab_size(self):
243 """ Returns vocab size """
--> 244 return self.sp_tokenizer.num_tokens
245
246 def get_vocab(self):

AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
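A commonly reported cause of this error (not confirmed in the issue itself) is an initialisation-order change in newer `transformers` releases: the base tokenizer's `__init__` started touching the vocabulary (via `get_vocab`/`vocab_size`) before `ChatGLMTokenizer.__init__` had assigned `self.sp_tokenizer`. Reported workarounds include pinning an older `transformers` (for example, below 4.34) or using an updated revision of the model's `tokenization_chatglm.py`; treat the exact version bound as an assumption to verify. The toy classes below (`BaseTokenizer`, `SentencePieceStub`, `BrokenTokenizer`, `FixedTokenizer` are hypothetical stand-ins, not the real library code) reproduce the mechanism and the usual fix of assigning the attribute before calling `super().__init__()`.

```python
class BaseTokenizer:
    """Stand-in for a base class whose __init__ reads the vocabulary size."""
    def __init__(self):
        self.initial_vocab_size = self.vocab_size  # calls the subclass property

class SentencePieceStub:
    """Stand-in for the object stored as sp_tokenizer (vocab size is arbitrary)."""
    num_tokens = 100

class ChatGLMLikeMixin:
    @property
    def vocab_size(self):
        return self.sp_tokenizer.num_tokens

class BrokenTokenizer(ChatGLMLikeMixin, BaseTokenizer):
    def __init__(self):
        super().__init__()                       # reads vocab_size first...
        self.sp_tokenizer = SentencePieceStub()  # ...but the attribute is set too late

class FixedTokenizer(ChatGLMLikeMixin, BaseTokenizer):
    def __init__(self):
        self.sp_tokenizer = SentencePieceStub()  # set the attribute first
        super().__init__()                       # now vocab_size can be read

try:
    BrokenTokenizer()
except AttributeError as exc:
    print("broken:", exc)

print("fixed:", FixedTokenizer().initial_vocab_size)  # fixed: 100
```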
