lyogavin / anima
33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU
License: Apache License 2.0
With only 100 steps of RLHF training, won't the data be seen too few times to be learned well? Would more steps give better results?
from airllm import AirLLMLlama2
MAX_LENGTH = 128
# could use hugging face model repo id:
#model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
model = AirLLMLlama2("Qwen-14B-Chat",
                     compression='4bit'  # specify '8bit' for 8-bit block-wise quantization
                     )
# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")
input_text = [
    'What is the capital of United States?',
    #'I like',
]
input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=True)
generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
bitsandbytes installed
bitsandbytes installed
bitsandbytes installed
0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/luhao/air22.py", line 6, in <module>
    model = AirLLMLlama2("Qwen-14B-Chat",
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/airllm.py", line 75, in __init__
    self.model_local_path, self.checkpoint_path = find_or_create_local_splitted_path(model_local_path_or_repo_id,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 276, in find_or_create_local_splitted_path
    return Path(model_local_path_or_repo_id), split_and_save_layers(model_local_path_or_repo_id, layer_shards_saving_path,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 220, in split_and_save_layers
    if max(shards) > shard:
ValueError: max() arg is an empty sequence
I ran the example code and got the error above.
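The traceback ends in `max(shards)` being called on an empty sequence. One possible cause, which is an assumption and not confirmed from the library source, is that no weight shards were found for the name that was passed in. The sketch below is a pre-flight check that could be run on a local model directory before handing it to AirLLM. The helpers `find_weight_index` and `list_shards` are hypothetical and not part of `airllm`. The two index file names are the ones Hugging Face sharded checkpoints commonly use.

```python
import json
import tempfile
from pathlib import Path

# Index file names commonly used by Hugging Face sharded checkpoints.
INDEX_NAMES = ("model.safetensors.index.json", "pytorch_model.bin.index.json")


def find_weight_index(model_dir):
    """Return the first checkpoint index file found in model_dir, or None."""
    for name in INDEX_NAMES:
        candidate = Path(model_dir) / name
        if candidate.is_file():
            return candidate
    return None


def list_shards(model_dir):
    """Return the sorted shard file names from the index, or raise a clear error."""
    index_path = find_weight_index(model_dir)
    if index_path is None:
        raise FileNotFoundError(f"no checkpoint index found in {model_dir}")
    weight_map = json.loads(index_path.read_text())["weight_map"]
    shards = sorted(set(weight_map.values()))
    if not shards:
        # This is the situation in which a bare max(shards) would fail.
        raise ValueError(f"{index_path.name} lists no shards")
    return shards


# Demonstration on a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    try:
        list_shards(tmp)
    except FileNotFoundError as err:
        message = str(err)
    index = {"weight_map": {"a.weight": "model-00001-of-00002.safetensors",
                            "b.weight": "model-00002-of-00002.safetensors"}}
    (Path(tmp) / INDEX_NAMES[0]).write_text(json.dumps(index))
    shards = list_shards(tmp)

print(message)
print(shards)
```

If a check like this reports a missing index for the local "Qwen-14B-Chat" directory, the path or the download is worth inspecting before looking further into the library.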
Would you be able to share the average time per step on H100 / A100, and how long the 10,000 training steps took in total?
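My understanding is that the optimizations for training on very long text all live in the modeling_flash_llama file, but the longer_training file used for training does not use anything from modeling_flash_llama. The model is still loaded with the original structure and is not replaced by the modified structure from modeling_flash_llama. I don't quite understand this part and would appreciate an explanation. Thanks!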
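I'm quite interested in this model of yours.
Is there any detailed technical discussion or support for custom fine-tuning and inference? A community channel of some kind?
How is the inference performance?
What hardware configuration do you recommend for training and inference?
Also, can vLLM be used for inference?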
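Thank you very much for your work on DPO! When I run inference with the lyogavin/Anima33B-DPO-Belle-1k-merged model, the output ends in repetition (I tried 3 prompt templates during inference, and all of them repeat easily). I hope you can help identify the problem. My inference samples and inference code are below.
1. Inference samples
Because I am not sure which template should be used for inference with the DPO model, I tried three: 1. asking the question directly; 2. the template used in DPO training; 3. the source_template in the test_cn_dataset_lenghts.py file.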
Input: What is the longest river in the world?
输出:世界上最长的河是尼罗河。它从埃及到埃塞俄比亚流经了非洲大陆,长约6650公里。它是世界上最长、最宽、最深、最漫长、最湿润、最有生态产生力的河。它汇集了埃及、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比
Input: \n\nHuman: What is the longest river in the world?\n\nAssistant:
输出:世界上最长的河流是尼罗河。它从非洲的乌吉亚高原开始,流经非洲和欧洲,最终流入埃及的瓦丽纳盆地。它的长度约为6650公里。尼罗河汇集了非洲和欧洲的多个河流,汇集了很多水资源,是非洲和欧洲最重要的河流之一。尼罗河汇集了埃及、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞�
Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the longest river in the world?\n\n### Response:
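Output: The longest river in the world is the Nile. It starts from its source at Lion Mountain in Egypt and flows through Egypt, Sudan, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya,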
2. Inference code
# imports
import torch
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from peft import PeftModel
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer
import json
from tqdm import tqdm
# create tokenizer
base_model = "Anima33B-DPO-Belle-1k-merged"
tokenizer = LlamaTokenizer.from_pretrained(base_model)
# base model
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()
prompt = input("")
# move the input tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
Hi, can this be used to train LLaMA?
Thank you for your work! Do you have any experience fine-tuning other base models with QLoRA? I'm not sure whether most of the base models on Hugging Face would perform well, especially for Chinese.
This is really nice work.
I have a small question:
As the title says, is Anima33B merged obtained by merging the Anima33B adapter model with the original LLaMA?
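For context on what merging an adapter usually means: a LoRA adapter stores two small matrices A and B per adapted weight, and merging folds them into the base weight as W' = W + (alpha / r) * B A. Whether the Anima33B merged checkpoint was produced exactly this way is the question for the author; the sketch below only shows the arithmetic with toy numbers, and the names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


weight = [[1.0, 0.0], [0.0, 1.0]]  # base weight, shape (out=2, in=2)
lora_a = [[1.0, 2.0]]              # shape (rank=1, in=2)
lora_b = [[0.5], [0.25]]           # shape (out=2, rank=1)
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

In the PEFT library, LoRA models expose `merge_and_unload()` for this purpose, which returns a plain model with the adapter folded in.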
My company is doing performance optimizations, so I follow your work closely, to learn.
Your example returns:
What is the capital of United States?
W
Ubuntu 22.04, NVIDIA 4090, CUDA 12.3
(img-caption-py3.10) volker@power:~/workspace/PYTHON/img_caption/img_caption$ python3 llama2.py
Fetching 25 files: 100%|████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 499321.90it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00, 1.35it/s]
returning kvcache size: torch.Size([1, 8, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00, 1.35it/s]
returning kvcache size: torch.Size([1, 8, 10, 128])
> /home/volker/workspace/PYTHON/img_caption/img_caption/llama2.py(30)<module>()
-> for idx, answer in enumerate(generation_output.sequences):
(Pdb) generation_output.sequences
tensor([[ 1, 1724, 338, 278, 7483, 310, 3303, 3900, 29973, 13,
29956]], device='cuda:0')
(Pdb) c
<s> What is the capital of United States?
W
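The tensor printed in the debugger already shows how many tokens were generated. The sketch below splits it into prompt and new tokens. The prompt length of 9 is an assumption taken from the first `kvcache size` line in the log, and the end-of-sequence id of 2 is the usual Llama value, also an assumption here.

```python
def split_generated(sequence, prompt_len):
    """Split a generated id sequence into (prompt_ids, new_ids)."""
    return sequence[:prompt_len], sequence[prompt_len:]


# Token ids copied from the (Pdb) output above.
sequence = [1, 1724, 338, 278, 7483, 310, 3303, 3900, 29973, 13, 29956]
PROMPT_LEN = 9        # assumption: matches the first kv-cache length in the log
MAX_NEW_TOKENS = 20   # value passed to generate() in the example script
EOS_ID = 2            # assumption: usual Llama end-of-sequence id

prompt_ids, new_ids = split_generated(sequence, PROMPT_LEN)
print(len(new_ids), "new tokens:", new_ids)
print("stopped before max_new_tokens:", len(new_ids) < MAX_NEW_TOKENS)
print("EOS among new tokens:", EOS_ID in new_ids)
```

Under these assumptions only two tokens were produced and neither is an end-of-sequence token, which suggests generation stopped early for some other reason and not because the model chose to end its answer.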
If a large model is used as the reference model and a small model as the base model, can the model learn the reference model's characteristics?
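As background for this question, in the standard DPO objective the reference model enters the loss only through the log-probabilities it assigns to the chosen and rejected responses. The sketch below computes that loss for a single pair from per-sequence log-probabilities. The function name and the value `beta=0.1` are illustrative assumptions, and the sketch does not settle whether a mismatched reference works well in practice.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Raising the chosen log-probability relative to the reference lowers the loss.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
print(baseline, improved)
```

Because both models have to score the same token sequences, a reference model with a different tokenizer or vocabulary would need extra handling that this sketch does not cover.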
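Are there any evaluation results? I'd like to see them.
I want to run PPO on llama-13b. Which parts of the code would need to be modified?
Could you share the training-set loss curve? I have only seen the evaluation loss curve.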
Thanks!
Could you support qwen72b and qwen72b-int4? Support for this model is badly needed; otherwise it simply cannot be run.
artidoro/qlora#161
As the title says: that issue has no suitable answer, so I'm asking here how to do it. Much appreciated.
Hello this is an awesome project, I replicated it on Modal Labs on a small T4 GPU.
The problem I see now is that loading one layer at a time does not make full use of the GPU's VRAM. In this case it used only 1.6 GB of VRAM, which I guess is the size of one layer.
Would it be possible instead to load N layers with a configuration parameter?
Code example here: https://gist.github.com/priamai/61aa332c42b89f518dcf134c38dd593d
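The request above amounts to grouping the per-layer files into batches of N before loading them. A minimal sketch of that grouping follows. The parameter name `layers_per_load` is hypothetical and not an existing AirLLM option. The 83 entries match the progress bars in the logs on this page, but the names `model.norm` and `lm_head` for the last two are an assumption.

```python
def group_layers(layer_names, layers_per_load):
    """Split the ordered layer names into consecutive groups of at most layers_per_load."""
    if layers_per_load < 1:
        raise ValueError("layers_per_load must be at least 1")
    return [layer_names[i:i + layers_per_load]
            for i in range(0, len(layer_names), layers_per_load)]


# 83 entries: embedding, 80 decoder layers, final norm, output head (last two assumed).
layer_names = (["model.embed_tokens"]
               + [f"model.layers.{i}" for i in range(80)]
               + ["model.norm", "lm_head"])

groups = group_layers(layer_names, layers_per_load=8)
print(len(groups), "loads instead of", len(layer_names))
print(groups[-1])
```

With roughly 1.6 GB per layer, as reported above, a group of 8 would need on the order of 13 GB, so N would have to be chosen to fit the card.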
Hello,
A Medium article said that there are four different ways to optimize the model.
Are there examples to show each of these methods?
Thanks.
In qlora_dpo.py, I see that chosen is tokenized with max_length=self.source_max_len, while rejected is tokenized with max_length=self.target_max_len. Why is that?
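To make the effect being asked about concrete, the sketch below truncates two equally long token lists with two different limits. The limits and token ids are made-up values, and `truncate` is a stand-in for the tokenizer's `truncation=True, max_length=...` behaviour, not the code in qlora_dpo.py.

```python
def truncate(token_ids, max_length):
    """Keep at most max_length tokens, like tokenizer truncation from the right."""
    return token_ids[:max_length]


source_max_len = 4   # hypothetical value
target_max_len = 8   # hypothetical value

chosen_ids = list(range(100, 110))    # 10 made-up token ids
rejected_ids = list(range(200, 210))  # 10 made-up token ids

chosen_kept = truncate(chosen_ids, source_max_len)
rejected_kept = truncate(rejected_ids, target_max_len)
print(len(chosen_kept), len(rejected_kept))
```

When the two limits differ, the chosen and rejected responses can be cut to different lengths even if the originals were the same length, and that asymmetry is what the question is about.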
Line 491 in dc691b2
Hello, I just installed AirLLM as mentioned in the Medium post.
pip install airllm
inference.py
from airllm import AirLLMLlama2
MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")
input_text = [
    'What is the capital of United States?',
]
input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=True)
generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
AssertionError: Torch not compiled with CUDA enabled
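This assertion is what PyTorch raises when `.cuda()` is called on a build without CUDA support, which is the normal situation on Apple Silicon. The sketch below picks a device that is actually available instead of assuming CUDA. The function `pick_device` is hypothetical, and whether the installed AirLLM version can run on a non-CUDA device is not confirmed here.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" depending on what this PyTorch build supports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing beyond the CPU can be assumed.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend exists only in newer PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

In the script above, `input_tokens['input_ids'].to(device)` would then replace the hard-coded `.cuda()` call, although the library itself may still require CUDA internally.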
Here is the result:
(base) andrey@m2 current % python ./inference.py
Downloading README.md: 100%|███████████████████████████████████████████████████████████████████| 5.15k/5.15k [00:00<00:00, 10.6MB/s]
Downloading Best_Platty_small.jpeg: 100%|██████████████████████████████████████████████████████| 7.35k/7.35k [00:00<00:00, 23.3MB/s]
Downloading generation_config.json: 100%|██████████████████████████████████████████████████████████| 154/154 [00:00<00:00, 2.27MB/s]
Downloading config.json: 100%|█████████████████████████████████████████████████████████████████████| 632/632 [00:00<00:00, 11.5MB/s]
Downloading .gitattributes: 100%|██████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 29.1MB/s]
Downloading (…)l-00006-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [22:08<00:00, 7.38MB/s]
Downloading (…)l-00007-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [22:42<00:00, 7.31MB/s]
Downloading (…)l-00001-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.85G/9.85G [23:20<00:00, 7.04MB/s]
Downloading (…)l-00002-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:15<00:00, 5.78MB/s]
Downloading (…)l-00008-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:54<00:00, 5.65MB/s]
Downloading (…)l-00003-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [29:04<00:00, 5.71MB/s] | 210M/9.80G [00:35<28:43, 5.56MB/s]
Downloading (…)l-00004-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [30:55<00:00, 5.28MB/s] | 283M/9.80G [00:45<23:49, 6.65MB/s]
Downloading (…)l-00005-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [31:52<00:00, 5.12MB/s] | 944M/9.80G [02:36<25:19, 5.83MB/s]
Downloading (…)model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.7k/66.7k [00:00<00:00, 468kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 3.03MB/s]
Downloading tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.56MB/s]
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.15MB/s]
Downloading tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 698/698 [00:00<00:00, 3.92MB/s]
Downloading (…)l-00015-of-00015.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524M/524M [01:27<00:00, 6.02MB/s]
Downloading (…)l-00009-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:29<00:00, 8.38MB/s]
Downloading (…)l-00010-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:19<00:00, 8.45MB/s]
Downloading (…)l-00011-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.97G/9.97G [21:02<00:00, 7.90MB/s]
Downloading (…)l-00014-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.50G/9.50G [18:10<00:00, 8.71MB/s]
Downloading (…)l-00012-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:16<00:00, 8.47MB/s]
Downloading (…)l-00013-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [18:41<00:00, 8.74MB/s]
Fetching 25 files: 100%|███████████████████████████████████████████████████████████████████████████| 25/25 [47:39<00:00, 114.38s/it]██████████████████| 9.50G/9.50G [18:10<00:00, 12.0MB/s]
0%| | 0/83 [00:00<?, ?it/s]Loading shard 1/150013-of-00015.bin: 71%|█████████████████████████████████████████████████████████████████████████████▏ | 7.00G/9.80G [15:26<04:04, 11.4MB/s]
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.embed_tokens.safetensorsownloading (…)l-00013-of-00015.bin: 99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 9.69G/9.80G [18:38<00:04, 23.8MB/s]
1%|█▊ | 1/83 [00:01<02:36, 1.91s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.0.safetensors
2%|███▋ | 2/83 [00:02<01:23, 1.03s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.1.safetensors
4%|█████▍ | 3/83 [00:02<00:58, 1.36it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.2.safetensors
5%|███████▎ | 4/83 [00:03<00:47, 1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.3.safetensors
6%|█████████ | 5/83 [00:03<00:43, 1.78it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.4.safetensors
7%|██████████▉ | 6/83 [00:03<00:38, 1.99it/s]Loading shard 2/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.5.safetensors
8%|████████████▋ | 7/83 [00:05<01:15, 1.01it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.6.safetensors
10%|██████████████▌ | 8/83 [00:06<00:59, 1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.7.safetensors
11%|████████████████▎ | 9/83 [00:06<00:49, 1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.8.safetensors
12%|██████████████████ | 10/83 [00:07<00:42, 1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.9.safetensors
13%|███████████████████▉ | 11/83 [00:07<00:38, 1.85it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.10.safetensors
14%|█████████████████████▋ | 12/83 [00:08<00:36, 1.96it/s]Loading shard 3/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.11.safetensors
16%|███████████████████████▍ | 13/83 [00:10<01:08, 1.02it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.12.safetensors
17%|█████████████████████████▎ | 14/83 [00:10<00:55, 1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.13.safetensors
18%|███████████████████████████ | 15/83 [00:10<00:46, 1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.14.safetensors
19%|████████████████████████████▉ | 16/83 [00:11<00:39, 1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.15.safetensors
20%|██████████████████████████████▋ | 17/83 [00:11<00:36, 1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.16.safetensors
22%|████████████████████████████████▌ | 18/83 [00:12<00:32, 2.00it/s]Loading shard 4/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.17.safetensors
23%|██████████████████████████████████▎ | 19/83 [00:13<00:58, 1.09it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.18.safetensors
24%|████████████████████████████████████▏ | 20/83 [00:14<00:49, 1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.19.safetensors
25%|█████████████████████████████████████▉ | 21/83 [00:14<00:42, 1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.20.safetensors
27%|███████████████████████████████████████▊ | 22/83 [00:15<00:36, 1.68it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.21.safetensors
28%|█████████████████████████████████████████▌ | 23/83 [00:15<00:31, 1.88it/s]Loading shard 5/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.22.safetensors
29%|███████████████████████████████████████████▎ | 24/83 [00:17<00:56, 1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.23.safetensors
30%|█████████████████████████████████████████████▏ | 25/83 [00:18<00:46, 1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.24.safetensors
31%|██████████████████████████████████████████████▉ | 26/83 [00:18<00:38, 1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.25.safetensors
33%|████████████████████████████████████████████████▊ | 27/83 [00:18<00:34, 1.62it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.26.safetensors
34%|██████████████████████████████████████████████████▌ | 28/83 [00:19<00:30, 1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.27.safetensors
35%|████████████████████████████████████████████████████▍ | 29/83 [00:19<00:27, 1.94it/s]Loading shard 6/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.28.safetensors
36%|██████████████████████████████████████████████████████▏ | 30/83 [00:21<00:50, 1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.29.safetensors
37%|████████████████████████████████████████████████████████ | 31/83 [00:22<00:41, 1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.30.safetensors
39%|█████████████████████████████████████████████████████████▊ | 32/83 [00:22<00:34, 1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.31.safetensors
40%|███████████████████████████████████████████████████████████▋ | 33/83 [00:22<00:30, 1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.32.safetensors
41%|█████████████████████████████████████████████████████████████▍ | 34/83 [00:23<00:26, 1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.33.safetensors
42%|███████████████████████████████████████████████████████████████▎ | 35/83 [00:23<00:24, 1.98it/s]Loading shard 7/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.34.safetensors
43%|█████████████████████████████████████████████████████████████████ | 36/83 [00:25<00:45, 1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.35.safetensors
45%|██████████████████████████████████████████████████████████████████▊ | 37/83 [00:26<00:37, 1.24it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.36.safetensors
46%|████████████████████████████████████████████████████████████████████▋ | 38/83 [00:26<00:30, 1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.37.safetensors
47%|██████████████████████████████████████████████████████████████████████▍ | 39/83 [00:27<00:25, 1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.38.safetensors
48%|████████████████████████████████████████████████████████████████████████▎ | 40/83 [00:27<00:23, 1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.39.safetensors
49%|██████████████████████████████████████████████████████████████████████████ | 41/83 [00:27<00:20, 2.04it/s]Loading shard 8/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.40.safetensors
51%|███████████████████████████████████████████████████████████████████████████▉ | 42/83 [00:29<00:38, 1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.41.safetensors
52%|█████████████████████████████████████████████████████████████████████████████▋ | 43/83 [00:30<00:31, 1.29it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.42.safetensors
53%|███████████████████████████████████████████████████████████████████████████████▌ | 44/83 [00:30<00:25, 1.52it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.43.safetensors
54%|█████████████████████████████████████████████████████████████████████████████████▎ | 45/83 [00:30<00:21, 1.74it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.44.safetensors
55%|███████████████████████████████████████████████████████████████████████████████████▏ | 46/83 [00:31<00:19, 1.86it/s]Loading shard 9/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.45.safetensors
57%|████████████████████████████████████████████████████████████████████████████████████▉ | 47/83 [00:33<00:34, 1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.46.safetensors
58%|██████████████████████████████████████████████████████████████████████████████████████▋ | 48/83 [00:33<00:27, 1.28it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.47.safetensors
59%|████████████████████████████████████████████████████████████████████████████████████████▌ | 49/83 [00:34<00:22, 1.51it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.48.safetensors
60%|██████████████████████████████████████████████████████████████████████████████████████████▎ | 50/83 [00:34<00:20, 1.65it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.49.safetensors
61%|████████████████████████████████████████████████████████████████████████████████████████████▏ | 51/83 [00:35<00:17, 1.81it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.50.safetensors
63%|█████████████████████████████████████████████████████████████████████████████████████████████▉ | 52/83 [00:35<00:15, 1.99it/s]Loading shard 10/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.51.safetensors
64%|███████████████████████████████████████████████████████████████████████████████████████████████▊ | 53/83 [00:37<00:28, 1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.52.safetensors
65%|█████████████████████████████████████████████████████████████████████████████████████████████████▌ | 54/83 [00:37<00:22, 1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.53.safetensors
66%|███████████████████████████████████████████████████████████████████████████████████████████████████▍ | 55/83 [00:38<00:18, 1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.54.safetensors
67%|█████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 56/83 [00:38<00:15, 1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.55.safetensors
69%|███████████████████████████████████████████████████████████████████████████████████████████████████████ | 57/83 [00:38<00:13, 1.91it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.56.safetensors
70%|████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 58/83 [00:39<00:12, 2.07it/s]Loading shard 11/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.57.safetensors
71%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 59/83 [00:41<00:22, 1.07it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.58.safetensors
72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 60/83 [00:41<00:18, 1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.59.safetensors
73%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 61/83 [00:42<00:14, 1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.60.safetensors
75%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 62/83 [00:42<00:12, 1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.61.safetensors
76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 63/83 [00:42<00:10, 1.88it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.62.safetensors
77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 64/83 [00:43<00:09, 2.05it/s]Loading shard 12/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.63.safetensors
78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 65/83 [00:45<00:17, 1.05it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.64.safetensors
80%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 66/83 [00:45<00:13, 1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.65.safetensors
81%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 67/83 [00:46<00:10, 1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.66.safetensors
82%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 68/83 [00:46<00:08, 1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.67.safetensors
83%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 69/83 [00:47<00:07, 1.84it/s]Loading shard 13/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.68.safetensors
84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 70/83 [00:48<00:12, 1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.69.safetensors
86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 71/83 [00:49<00:09, 1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.70.safetensors
87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 72/83 [00:49<00:07, 1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.71.safetensors
88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 73/83 [00:50<00:05, 1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.72.safetensors
89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 74/83 [00:50<00:04, 1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.73.safetensors
90%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 75/83 [00:50<00:03, 2.03it/s]Loading shard 14/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.74.safetensors
92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 76/83 [00:52<00:06, 1.08it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.75.safetensors
93%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 77/83 [00:53<00:04, 1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.76.safetensors
94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 78/83 [00:53<00:03, 1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.77.safetensors
95%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 79/83 [00:54<00:02, 1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.78.safetensors
96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 80/83 [00:54<00:01, 1.83it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.79.safetensors
98%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 81/83 [00:55<00:00, 2.00it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.norm.safetensors
Loading shard 15/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/lm_head.safetensors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:55<00:00, 1.50it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
Traceback (most recent call last):
File "/Users/andrey/air_llm/current/./inference.py", line 5, in <module>
model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 184, in __init__
self.init_model()
File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 205, in init_model
set_module_tensor_to_device(self.model, buffer_name, self.running_device, value=buffer,
File "/Users/andrey/miniconda3/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
new_value = value.to(device)
^^^^^^^^^^^^^^^^
File "/Users/andrey/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
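The traceback above ends in a PyTorch build without CUDA support being asked to move a buffer to a CUDA device. A minimal sketch for checking up front which device a given PyTorch install can use is shown here; `pick_device` is a hypothetical helper name, not part of airllm, and it does not make airllm itself run on a non-CUDA machine.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this PyTorch build supports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query; report CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"usable device: {device}")
if device != "cuda":
    print("This PyTorch build cannot use CUDA; code that calls .cuda() will fail.")
```

Running this before the inference script makes it clear whether the failure comes from the install rather than from the model.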
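I tested the inference code, but it will not run on my 4090: the GPU runs out of memory.
Hello,
Are there any plans to open-source the training set?
Thanks!
Hello, what is the minimum GPU memory required to train the model? Would 3×A10 (24 GB each, 72 GB in total) be enough to run it?
My current plan is to take the merged model from this project and run QLoRA on it again with my own dataset. Can I do that simply by changing the parameters in the .sh file?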
Hey @lyogavin, thanks a lot for sharing this awesome work!
I've got a silly question. When training a model with QLoRA, it seems possible to load either the normal or the quantized base model. Both seem to train fine, but loading the unquantized base model appears to train a bit faster. Are there any differences between the models trained in these two modes?
Great work you are doing here!
install_requires=[ # I get to this in a second
'tqdm',
'torch',
'transformers',
'accelerate',
'safetensors',
'optimum',
'huggingface_hub'
'scipy',
#'bitsandbytes' set it to optional to support fallback when not installable
],
->>
'huggingface_hub',
Cheers,
Volker
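For context on why the missing comma matters: Python joins adjacent string literals at parse time, so the list silently ends up with one merged entry instead of two. A small sketch with a shortened list (not the full install_requires above):

```python
# Two adjacent string literals with no comma between them are joined by the parser.
broken = [
    'optimum',
    'huggingface_hub'
    'scipy',
]
fixed = [
    'optimum',
    'huggingface_hub',
    'scipy',
]

print(broken)  # ['optimum', 'huggingface_hubscipy']
print(fixed)   # ['optimum', 'huggingface_hub', 'scipy']
```

No syntax error is raised, which is why this kind of slip is easy to miss in a dependency list.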
request from wx public account
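The current version often answers in English, even when the question is asked in Chinese.
Is there a way to make it answer in Chinese reliably?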
While trying inference by following the example in the README, I made a few changes in response to errors:
# base model
model = LlamaForCausalLM.from_pretrained(
base_model,
torch_dtype=torch.float16,
device_map="auto",
offload_state_dict = True,
offload_folder='./offload',
)
# LORA PEFT adapters
adapter_model = "lyogavin/Anima33B"
model = PeftModel.from_pretrained(
model,
adapter_model,
#torch_dtype=torch.float16,
device_map={"":0},
)
model.eval()
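In the LlamaForCausalLM call I added the offload_folder parameter (following the Hugging Face documentation);
in the PeftModel call, device_map has to be set to the CUDA device (that is, 0), otherwise it fails with OOM.
Even with these changes it still does not run:
accelerate/utils/modeling.py:154
ValueError: weight is on the meta device, we need a `value` to put in on 0.
Is the device_map parameter the problem, and how should I adjust it?
I have searched and found no similar issue.
My configuration is
Error when running the inference code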
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda121.so...
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /srv/chat/Anima/infer.py:9 in <module> │
│ │
│ 6 │
│ 7 # create tokenizer │
│ 8 base_model = "timdettmers/guanaco-33b-merged" │
│ ❱ 9 tokenizer = LlamaTokenizer.from_pretrained(base_model) │
│ 10 │
│ 11 # base model │
│ 12 model = LlamaForCausalLM.from_pretrained( │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1825 in from_pretrained │
│ │
│ 1822 │ │ │ else: │
│ 1823 │ │ │ │ logger.info(f"loading file {file_path} from cache at │
│ 1824 │ │ │
│ ❱ 1825 │ │ return cls._from_pretrained( │
│ 1826 │ │ │ resolved_vocab_files, │
│ 1827 │ │ │ pretrained_model_name_or_path, │
│ 1828 │ │ │ init_configuration, │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1988 in _from_pretrained │
│ │
│ 1985 │ │ │
│ 1986 │ │ # Instantiate tokenizer. │
│ 1987 │ │ try: │
│ ❱ 1988 │ │ │ tokenizer = cls(*init_inputs, **init_kwargs) │
│ 1989 │ │ except OSError: │
│ 1990 │ │ │ raise OSError( │
│ 1991 │ │ │ │ "Unable to load vocabulary from file. " │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenizati │
│ on_llama.py:96 in __init__ │
│ │
│ 93 │ │ self.add_bos_token = add_bos_token │
│ 94 │ │ self.add_eos_token = add_eos_token │
│ 95 │ │ self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwa │
│ ❱ 96 │ │ self.sp_model.Load(vocab_file) │
│ 97 │ │
│ 98 │ def __getstate__(self): │
│ 99 │ │ state = self.__dict__.copy() │
│ │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:905 in │
│ Load │
│ │
│ 902 │ │ raise RuntimeError('model_file and model_proto must be exclus │
│ 903 │ if model_proto: │
│ 904 │ │ return self.LoadFromSerializedProto(model_proto) │
│ ❱ 905 │ return self.LoadFromFile(model_file) │
│ 906 │
│ 907 │
│ 908 # Register SentencePieceProcessor in _sentencepiece: │
│ │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:310 in │
│ LoadFromFile │
│ │
│ 307 │ │ return _sentencepiece.SentencePieceProcessor_serialized_model │
│ 308 │ │
│ 309 │ def LoadFromFile(self, arg): │
│ ❱ 310 │ │ return _sentencepiece.SentencePieceProcessor_LoadFromFile(sel │
│ 311 │ │
│ 312 │ def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, │
│ 313 │ │ return _sentencepiece.SentencePieceProcessor__EncodeAsIds(sel │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: not a string
What is causing this?
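One possible reading of the traceback above, not confirmed anywhere in this thread, is that the vocab_file passed to sp_model.Load was not a string, for example because no tokenizer.model file was found for the model. A minimal sketch for checking a local model directory follows; `find_sentencepiece_vocab` and the directory layout are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_sentencepiece_vocab(model_dir: str) -> Optional[Path]:
    """Return the path to tokenizer.model inside model_dir, or None if it is absent."""
    candidate = Path(model_dir) / "tokenizer.model"
    return candidate if candidate.is_file() else None


# Demonstrate both outcomes with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    before = find_sentencepiece_vocab(tmp)       # file not created yet
    (Path(tmp) / "tokenizer.model").write_bytes(b"")
    after = find_sentencepiece_vocab(tmp)        # file now present
    print("before:", before)
    print("after:", after.name if after else None)
```

If the helper returns None for the directory the tokenizer is loaded from, the slow LlamaTokenizer has no SentencePiece file to load.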
Unclear about how generation works
++ echo 'START TIME: Mon Jun 19 19:22:10 CST 2023'
START TIME: Mon Jun 19 19:22:10 CST 2023
++ ROOT_DIR_BASE=/Anima/saved_models/qlora_cn
++ OUTPUT_PATH=/Anima/saved_models/qlora_cn/output_1687173730
++ mkdir -p /Anima/saved_models/qlora_cn/output_1687173730
++ python qlora.py --dataset=chinese-vicuna --dataset_format=alpaca-clean --learning_rate 0.0001 --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 10000 --model_name_or_path timdettmers/guanaco-33b-merged --source_max_len 512 --target_max_len 512 --eval_dataset_size 1 --do_eval --evaluation_strategy steps --eval_steps 200 --output_dir /Anima/saved_models/qlora_cn/output_1687173730 --report_to wandb --sample_generate --save_steps 200
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /root/miniconda3/envs/anima did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lcoal/cuda/lib64')}
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /usr/lcoal/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
./run_Amina_training.sh: line 48: 36300 Segmentation fault (core dumped) python qlora.py --dataset="chinese-vicuna" --dataset_format="alpaca-clean" #alpaca-clean has similar format to chinese training dataset
--learning_rate 0.0001 # QLoRA paper appendix B Table 9
--per_device_train_batch_size 1 # fix for fitting mem
--gradient_accumulation_steps 16 # QLoRA paper appendix B Table 9
--max_steps 10000 # QLoRA paper appendix B Table 9, follow paper setting even though cn data is 690k much bigger than OASST1 9k, batch size considering accum
--model_name_or_path "timdettmers/guanaco-33b-merged" --source_max_len 512 # default setting in code, cn model 2048 too long
--target_max_len 512 # follow QLoRA paper appendix B Table 9
--eval_dataset_size 1 # mainly for testing, no need to be big
--do_eval --evaluation_strategy "steps" --eval_steps 200 # 10 for debug mode only, 200 for training
--output_dir $OUTPUT_PATH --report_to 'wandb' --sample_generate # test sample generation every once a while
--save_steps 200 # 20 for debug mode only, 200 for training
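The bitsandbytes warnings in the log above mention a non-existent directory, /usr/lcoal/cuda/lib64, in the search path. Whether that is related to the segmentation fault is not established here. A small sketch for listing search-path entries that do not exist, assuming the entry comes from LD_LIBRARY_PATH (`missing_library_dirs` is a hypothetical helper):

```python
import os
from typing import List


def missing_library_dirs(path_value: str) -> List[str]:
    """Return the entries of a search-path string that are not existing directories."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    return [p for p in entries if not os.path.isdir(p)]


# The misspelled entry below is the one reported in the warning above.
example = os.pathsep.join(["/usr/lcoal/cuda/lib64", os.getcwd()])
print(missing_library_dirs(example))

# Check the real environment variable, if it is set.
print(missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", "")))
```

Any directory printed by the second call is a candidate for a typo or a stale entry in the shell configuration.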
Hello,
A large part of the air_llm code is a copy of the work I did during the Kaggle LLM Science Exam competition, to fit a 70B Llama model on T4 GPUs.
Could you add pointers to my contribution and mention :
Thank you,
Simon
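Could you update this? Thanks.
DeepSpeed can lower the GPU memory requirement for full-parameter fine-tuning, but applying it in practice seems to trigger some bugs. For example:
With ZeRO-3, the reference_model cannot produce logits normally.
With ZeRO-2, gradients for some parameters are computed more than once.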
Hello.
I am using an NVIDIA RTX A4000 (16 GB GPU) with airllm 0.9.5 on Ubuntu 20.04.6, Python 3.11.6, and torch 2.1.1.
I used the test code from the airllm GitHub page:
from airllm import AirLLMLlama2
MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("meta-llama/Llama-2-13b-chat-hf")
# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")
input_text = [
'What is the capital of United States?',
#'I like',
]
input_tokens = model.tokenizer(input_text,
return_tensors="pt",
return_attention_mask=False,
truncation=True,
max_length=MAX_LENGTH,
padding=True)
generation_output = model.generate(
input_tokens['input_ids'].cuda(),
max_new_tokens=2,
use_cache=True,
return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
Here is the output after I run python air.py (air.py is the name of the script I used). The relevant part is at the very end: the script prints only "The" and then exits.
README.md: 100%|████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 8.76MB/s]
USE_POLICY.md: 100%|████████████████████████████████████████████████| 4.77k/4.77k [00:00<00:00, 5.98MB/s]
generation_config.json: 100%|████████████████████████████████████████████| 188/188 [00:00<00:00, 429kB/s]
.gitattributes: 100%|███████████████████████████████████████████████| 1.58k/1.58k [00:00<00:00, 4.20MB/s]
LICENSE.txt: 100%|██████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 15.4MB/s]
config.json: 100%|███████████████████████████████████████████████████████| 587/587 [00:00<00:00, 444kB/s]
model.safetensors.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 14.1MB/s]
pytorch_model.bin.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 1.14MB/s]
Responsible-Use-Guide.pdf: 100%|█████████████████████████████████████| 1.25M/1.25M [00:01<00:00, 808kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████| 414/414 [00:00<00:00, 888kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████| 500k/500k [00:00<00:00, 612kB/s]
tokenizer_config.json: 100%|████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 3.52MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████| 1.84M/1.84M [00:06<00:00, 269kB/s]
model-00003-of-00003.safetensors: 100%|█████████████████████████████| 6.18G/6.18G [52:08<00:00, 1.98MB/s]
pytorch_model-00003-of-00003.bin: 100%|█████████████████████████████| 6.18G/6.18G [52:23<00:00, 1.97MB/s]
pytorch_model-00002-of-00003.bin: 100%|███████████████████████████| 9.90G/9.90G [1:12:28<00:00, 2.28MB/s]
model-00002-of-00003.safetensors: 100%|███████████████████████████| 9.90G/9.90G [1:13:00<00:00, 2.26MB/s]
model-00001-of-00003.safetensors: 100%|███████████████████████████| 9.95G/9.95G [1:13:43<00:00, 2.25MB/s]
pytorch_model-00001-of-00003.bin: 100%|███████████████████████████| 9.95G/9.95G [1:13:56<00:00, 2.24MB/s]
Fetching 19 files: 100%|██████████████████████████████████████████████| 19/19 [1:13:57<00:00, 233.55s/it]
0%| | 0/43 [00:00<?, ?it/s]Loading shard 1/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.embed_tokens.safetensors
2%|█▌ | 1/43 [00:29<20:38, 29.50s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.0.safetensors
5%|███▏ | 2/43 [00:31<09:08, 13.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.1.safetensors
7%|████▊ | 3/43 [00:33<05:17, 7.94s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.2.safetensors
9%|██████▍ | 4/43 [00:34<03:29, 5.38s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.3.safetensors
12%|████████ | 5/43 [00:36<02:32, 4.02s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.4.safetensors
14%|█████████▋ | 6/43 [00:37<01:58, 3.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.5.safetensors
16%|███████████▏ | 7/43 [00:39<01:37, 2.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.6.safetensors
19%|████████████▊ | 8/43 [00:41<01:23, 2.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.7.safetensors
21%|██████████████▍ | 9/43 [00:42<01:12, 2.13s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.8.safetensors
23%|███████████████▊ | 10/43 [00:44<01:04, 1.95s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.9.safetensors
26%|█████████████████▍ | 11/43 [00:45<00:57, 1.80s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.10.safetensors
28%|██████████████████▉ | 12/43 [00:47<00:54, 1.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.11.safetensors
30%|████████████████████▌ | 13/43 [00:48<00:51, 1.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.12.safetensors
33%|██████████████████████▏ | 14/43 [00:50<00:50, 1.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.13.safetensors
35%|███████████████████████▋ | 15/43 [00:52<00:48, 1.74s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.14.safetensors
37%|█████████████████████████▎ | 16/43 [00:54<00:45, 1.69s/it]Loading shard 2/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.15.safetensors
40%|██████████████████████████▉ | 17/43 [01:45<07:15, 16.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.16.safetensors
42%|████████████████████████████▍ | 18/43 [01:47<05:06, 12.25s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.17.safetensors
44%|██████████████████████████████ | 19/43 [01:49<03:37, 9.05s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.18.safetensors
47%|███████████████████████████████▋ | 20/43 [01:50<02:36, 6.82s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.19.safetensors
49%|█████████████████████████████████▏ | 21/43 [01:52<01:54, 5.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.20.safetensors
51%|██████████████████████████████████▊ | 22/43 [01:54<01:27, 4.16s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.21.safetensors
53%|████████████████████████████████████▎ | 23/43 [01:55<01:07, 3.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.22.safetensors
56%|█████████████████████████████████████▉ | 24/43 [01:57<00:53, 2.84s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.23.safetensors
58%|███████████████████████████████████████▌ | 25/43 [01:58<00:44, 2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.24.safetensors
60%|█████████████████████████████████████████ | 26/43 [02:00<00:37, 2.20s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.25.safetensors
63%|██████████████████████████████████████████▋ | 27/43 [02:03<00:39, 2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.26.safetensors
65%|████████████████████████████████████████████▎ | 28/43 [02:05<00:33, 2.22s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.27.safetensors
67%|█████████████████████████████████████████████▊ | 29/43 [02:06<00:29, 2.09s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.28.safetensors
70%|███████████████████████████████████████████████▍ | 30/43 [02:11<00:35, 2.72s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.29.safetensors
72%|█████████████████████████████████████████████████ | 31/43 [02:29<01:30, 7.54s/it]Loading shard 3/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.30.safetensors
74%|██████████████████████████████████████████████████▌ | 32/43 [03:23<03:55, 21.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.31.safetensors
77%|████████████████████████████████████████████████████▏ | 33/43 [03:25<02:34, 15.47s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.32.safetensors
79%|█████████████████████████████████████████████████████▊ | 34/43 [03:26<01:41, 11.30s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.33.safetensors
81%|███████████████████████████████████████████████████████▎ | 35/43 [03:28<01:06, 8.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.34.safetensors
84%|████████████████████████████████████████████████████████▉ | 36/43 [03:29<00:44, 6.36s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.35.safetensors
86%|██████████████████████████████████████████████████████████▌ | 37/43 [03:31<00:29, 4.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.36.safetensors
88%|████████████████████████████████████████████████████████████ | 38/43 [03:33<00:19, 3.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.37.safetensors
91%|█████████████████████████████████████████████████████████████▋ | 39/43 [03:35<00:13, 3.49s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.38.safetensors
93%|███████████████████████████████████████████████████████████████▎ | 40/43 [03:39<00:11, 3.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.39.safetensors
95%|████████████████████████████████████████████████████████████████▊ | 41/43 [03:41<00:06, 3.14s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.norm.safetensors
98%|██████████████████████████████████████████████████████████████████▍ | 42/43 [03:41<00:02, 2.23s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/lm_head.safetensors
100%|████████████████████████████████████████████████████████████████████| 43/43 [03:42<00:00, 5.18s/it]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 2%|█▍ | 1/43 [00:04<03:02, 4.34s/it]cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [01:55<00:00, 2.69s/it]
returning kvcache size: torch.Size([1, 40, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [00:24<00:00, 1.73it/s]
returning kvcache size: torch.Size([1, 40, 10, 128])
<s> What is the capital of United States?
The
^^^^ The output above is the question and then "The" and the program exits.
Do you know why this may be happening? Please let me know if you need more information.
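One thing worth checking: the script above passes max_new_tokens=2, and the log shows two generation passes (kvcache sizes 9 and 10), so stopping after a newline and "The" may simply be that cap taking effect. The toy loop below illustrates the idea only; `greedy_generate` and `toy_model` are hypothetical stand-ins, not airllm or transformers code.

```python
from typing import Callable, List


def greedy_generate(prompt: List[str],
                    next_token: Callable[[List[str]], str],
                    max_new_tokens: int) -> List[str]:
    """Append at most max_new_tokens tokens to prompt, one model step per token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


# Hypothetical stand-in for a model: replays a fixed answer token by token.
answer = ["\n", "The", " capital", " is", " Washington", ",", " D.C."]
prompt = ["What", " is", " the", " capital", "?"]


def toy_model(tokens: List[str]) -> str:
    return answer[len(tokens) - len(prompt)]


print("".join(greedy_generate(prompt, toy_model, max_new_tokens=2)))
print("".join(greedy_generate(prompt, toy_model, max_new_tokens=7)))
```

With a cap of 2 the loop stops after the newline and "The"; raising the cap lets the rest of the answer through.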
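How do I handle this error: f"{hf_cache_path}/pytorch_model.bin.index.json should exists."?
My network is rather slow and downloading is a struggle, so I thought it would be easier to ask directly whether it will work.
Sorry, I am a bit of a layman. The main thing is that readme.md does not give concrete GPU memory figures for running it.
Roughly how much slower is AirLLM?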
Hello, this is great work! However, during inference, at
generate_ids = model.generate(**inputs, max_new_tokens=30)
I get this error:
"addmm_impl_cpu_" not implemented for 'Half'.
I suspect this is a peft/transformers version issue. The versions I am using are:
transformers:4.31.0.dev0
peft:0.4.0.dev0
Which versions of transformers and peft are you using?
Thank you very much!
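First of all, I really like this project!
As the title says, I hope the author can provide a quantized demo that supports fine-tuning on 8 V100 GPUs. Thanks 🙏
Great work!
Running a 33B model on a single GPU is very exciting. Are there plans to open-source the parallel training code and run scripts?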