anima's People

Contributors

eltociear, hiemal, likeslab, lyogavin, naozumi520

anima's Issues

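Question about training parameters

RLHF training runs for only 100 steps. Is that enough for the model to see the data often enough, or would more steps give better results?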

max() arg is an empty sequence

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
#model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
model = AirLLMLlama2("Qwen-14B-Chat",
                     compression='4bit' # specify '8bit' for 8-bit block-wise quantization "Yi-34B-Chat",
                    )
# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
        #'I like',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)
           
generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

bitsandbytes installed
bitsandbytes installed
bitsandbytes installed
0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/luhao/air22.py", line 6, in <module>
    model = AirLLMLlama2("Qwen-14B-Chat",
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/airllm.py", line 75, in __init__
    self.model_local_path, self.checkpoint_path = find_or_create_local_splitted_path(model_local_path_or_repo_id,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 276, in find_or_create_local_splitted_path
    return Path(model_local_path_or_repo_id), split_and_save_layers(model_local_path_or_repo_id, layer_shards_saving_path,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 220, in split_and_save_layers
    if max(shards) > shard:
ValueError: max() arg is an empty sequence
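
The traceback ends with `max(shards)` being called on an empty sequence. The sketch below illustrates only that Python behaviour: `max()` raises `ValueError` on an empty list unless a `default` is given. How `shards` is built inside airllm is not shown in this issue, and `highest_shard` is a hypothetical helper, not part of the library.

```python
def highest_shard(shards, default=None):
    """Return the largest shard index, or `default` when no shards were found.

    Hypothetical helper for illustration; not an airllm function.
    """
    return max(shards, default=default)


# Calling max() on an empty list without a default raises ValueError.
try:
    max([])
except ValueError as err:
    message = str(err)
else:
    message = ""

# With a default, the empty case can be detected and reported explicitly.
empty_result = highest_shard([])
normal_result = highest_shard([1, 3, 2])

if empty_result is None:
    print("no shards matched; check the model files and layer names")
```

A guard like this would turn the crash into a clearer message, but it would not by itself explain why no shards were found for this model.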

About training time

Would you be willing to share the average time per step on H100 / A100, and how long the 10,000 training steps took in total?

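How is training on very long texts supported?

My understanding is that the optimizations for training on very long texts all live in the modeling_flash_llama file, but the longer_training file used for training does not appear to use anything from modeling_flash_llama. The model structure it loads is still the original one and has not been replaced with the modified structure from modeling_flash_llama. I don't quite understand this part and would appreciate an explanation. Thanks!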

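Low-cost fine-tuning and inference is really important

I'm quite interested in your model.
Is there any detailed technical discussion or support for custom fine-tuning and inference, such as a community channel?
How is the inference performance?
What hardware configurations do you recommend for training and inference?
Also, can vllm be used to run inference?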

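Heavy repetition when running inference with the lyogavin/Anima33B-DPO-Belle-1k-merged model

Thank you very much for your work on DPO! When I run inference with the lyogavin/Anima33B-DPO-Belle-1k-merged model, the output ends in repeated text. I tried three prompt templates, and all of them easily produce repetition. I hope you can help identify the cause. My inference examples and inference code are below.

1. Inference examples
Because I am not sure which template should be used for inference with the DPO model, I tried three: (1) asking the question directly; (2) the template used in DPO training; (3) the source_template in the test_cn_dataset_lenghts.py file.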

输入:世界上最长的河流是什么?
输出:世界上最长的河是尼罗河。它从埃及到埃塞俄比亚流经了非洲大陆,长约6650公里。它是世界上最长、最宽、最深、最漫长、最湿润、最有生态产生力的河。它汇集了埃及、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比


输入:\n\nHuman: 世界上最长的河流是什么?\n\nAssistant:
输出:世界上最长的河流是尼罗河。它从非洲的乌吉亚高原开始,流经非洲和欧洲,最终流入埃及的瓦丽纳盆地。它的长度约为6650公里。尼罗河汇集了非洲和欧洲的多个河流,汇集了很多水资源,是非洲和欧洲最重要的河流之一。尼罗河汇集了埃及、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞俄比亚、苏丹、埃塞�


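Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the longest river in the world?\n\n### Response:
Output: The longest river in the world is the Nile. It starts from its source at Lion Mountain in Egypt and flows through Egypt, Sudan, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya,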

2. Inference code

# imports
import torch
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only GPU 0 to this process
from transformers import LlamaForCausalLM, LlamaTokenizer

# create tokenizer
base_model = "Anima33B-DPO-Belle-1k-merged"
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# base model (already merged, so no adapter is loaded)
model = LlamaForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        device_map="auto",
    )

model.eval()

# read the prompt from stdin and move the encoded tensors to the model's device
prompt = input("")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=512)

# decode the full sequence (prompt plus continuation)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
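
One way to quantify the looping seen in the outputs above is the fraction of distinct character n-grams: text that cycles through the same few items scores close to 0. The sketch below is illustrative only. `distinct_ngram_ratio` is a hypothetical helper, and the keyword values in `generate_kwargs` are untested assumptions, not settings known to fix this model. `repetition_penalty` and `no_repeat_ngram_size` are existing arguments of the Hugging Face `generate` API.

```python
def distinct_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of character n-grams in `text` that are unique (1.0 = no repeats)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 1.0
    return len(set(grams)) / len(grams)


# Assumed values, not tuned for this model. They could be passed as
# model.generate(**inputs, **generate_kwargs) in the script above.
generate_kwargs = {
    "max_new_tokens": 512,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 4,
}

looping = "Sudan, Ethiopia, " * 10      # mimics the repeated tail in the outputs
varied = "The Nile flows north."

looping_ratio = distinct_ngram_ratio(looping)
varied_ratio = distinct_ngram_ratio(varied)
print(f"looping: {looping_ratio:.2f}, varied: {varied_ratio:.2f}")
```

Comparing the ratio across the three templates, with and without the extra `generate` arguments, would show whether the repetition depends on the prompt format or on the decoding settings.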

fine-tuning QLoRA on other base models

Thank you for your work! Do you have any experience fine-tuning other base models with QLoRA? I'm not sure whether most of the base models on Hugging Face would perform well, especially for Chinese.

Merging the adapter model

This is really nice work.
I have a small question:
As the title says, is Anima33B merged obtained by merging the Anima33B adapter model with the original LLaMA?
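
For context, merging a LoRA-style adapter into a base weight matrix generally means adding the scaled low-rank product to it: W' = W + (alpha / r) * B * A. The toy sketch below shows that arithmetic on tiny matrices in plain Python. Whether the Anima33B merged checkpoint was produced exactly this way is not confirmed in this issue, and `matmul` and `merge_lora` are hypothetical helpers. In practice, the PEFT library's `merge_and_unload()` performs this kind of merge.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_b, lora_a, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
lora_b = [[1.0], [2.0]]           # 2x1 adapter matrix B (rank 1)
lora_a = [[0.5, -0.5]]            # 1x2 adapter matrix A

merged = merge_lora(base, lora_b, lora_a, alpha=2, rank=1)
print(merged)
```

After merging, the adapter is no longer needed at inference time because its effect is already contained in the merged weights.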

Does not deliver the correct result

My company works on performance optimization, so I follow your work closely to learn from it.

Your example returns:

What is the capital of United States?
W

Ubuntu 22.04, NVIDIA 4090, CUDA 12.3

(img-caption-py3.10) volker@power:~/workspace/PYTHON/img_caption/img_caption$ python3 llama2.py
Fetching 25 files: 100%|████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 499321.90it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00,  1.35it/s]
returning kvcache size: torch.Size([1, 8, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00,  1.35it/s]
returning kvcache size: torch.Size([1, 8, 10, 128])
> /home/volker/workspace/PYTHON/img_caption/img_caption/llama2.py(30)<module>()
-> for idx, answer in enumerate(generation_output.sequences):
(Pdb) generation_output.sequences
tensor([[    1,  1724,   338,   278,  7483,   310,  3303,  3900, 29973,    13,
         29956]], device='cuda:0')
(Pdb) c
<s> What is the capital of United States?
W
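
To see how many tokens were actually generated, the prompt tokens can be sliced off the returned sequence. The sketch below uses the token IDs from the Pdb output above. The prompt length of 9 is an assumption, inferred from the first reported kvcache size rather than stated in the issue.

```python
# Token IDs copied from the Pdb output of generation_output.sequences[0].
sequence = [1, 1724, 338, 278, 7483, 310, 3303, 3900, 29973, 13, 29956]

# Assumption: the prompt is 9 tokens long, matching the first kvcache size printed.
prompt_length = 9

prompt_tokens = sequence[:prompt_length]
new_tokens = sequence[prompt_length:]

print(f"prompt tokens: {len(prompt_tokens)}, generated tokens: {len(new_tokens)}")
print(new_tokens)
```

Under that assumption only two tokens were produced after the prompt, which matches the decoded output of a line break followed by "W".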

More clever batching of layers

Hello, this is an awesome project. I replicated it on Modal Labs with a small T4 GPU.

The problem I see is that loading one layer at a time does not make full use of the GPU's VRAM. In this case it used only 1.6 GB of VRAM, which I guess is the size of one layer.

Would it be possible instead to load N layers with a configuration parameter?

Code example here: https://gist.github.com/priamai/61aa332c42b89f518dcf134c38dd593d
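
The request above amounts to grouping consecutive layers so that several are resident on the GPU at once. A minimal sketch of that grouping follows. `group_layers` and the `layers_per_load` parameter are hypothetical names and are not part of the AirLLM API; the layer count of 80 is also just an example.

```python
def group_layers(layer_names, layers_per_load):
    """Split an ordered list of layer names into consecutive groups of at most N."""
    if layers_per_load < 1:
        raise ValueError("layers_per_load must be at least 1")
    return [
        layer_names[i:i + layers_per_load]
        for i in range(0, len(layer_names), layers_per_load)
    ]


# Example: 80 decoder layers loaded 8 at a time gives 10 load steps.
layer_names = [f"model.layers.{i}" for i in range(80)]
groups = group_layers(layer_names, layers_per_load=8)

for group in groups[:2]:
    print(group[0], "...", group[-1])
```

A suitable group size would depend on the per-layer size and the VRAM left over for activations and the KV cache, so it would likely need to be configurable rather than fixed.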

more examples

Hello,

A Medium article describes four ways to optimize the model:

  1. Layer-wise inference
  2. Single-layer optimization with Flash Attention
  3. Model file sharding
  4. Meta device

Are there examples to show each of these methods?
Thanks.
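
As a rough illustration of the first and third items (layer-wise inference over per-layer shard files), the toy sketch below stores each "layer" in its own file and applies them one at a time, so that only one layer's parameters are held in memory at once. This is not AirLLM's implementation: the JSON files, the affine toy layers, and the names `save_layer_shards` and `layerwise_forward` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_layer_shards(layers, directory):
    """Write each layer's parameters to its own file (stand-in for per-layer shards)."""
    paths = []
    for i, params in enumerate(layers):
        path = Path(directory) / f"layer_{i}.json"
        path.write_text(json.dumps(params))
        paths.append(path)
    return paths


def layerwise_forward(x, shard_paths):
    """Apply the layers in order, loading a single layer's parameters at a time."""
    for path in shard_paths:
        params = json.loads(path.read_text())                    # load one layer
        x = [params["scale"] * v + params["shift"] for v in x]   # toy affine layer
        del params                                               # drop it before the next load
    return x


toy_layers = [
    {"scale": 2.0, "shift": 1.0},
    {"scale": 0.5, "shift": 0.0},
    {"scale": 1.0, "shift": -1.0},
]

with tempfile.TemporaryDirectory() as tmp:
    shards = save_layer_shards(toy_layers, tmp)
    result = layerwise_forward([1.0, 2.0], shards)

print(result)
```

The trade-off is that every forward pass pays the cost of reading each layer from disk, which is why per-token latency is high with this approach.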

Doesn't work on Apple M1/M2. AssertionError: Torch not compiled with CUDA enabled.

Problem

Hello, I just installed AirLLM as mentioned in the Medium post.

Environment

  • MacBook M2
  • macOS Ventura 13.5.2
  • Python 3.11.4

How to reproduce

pip install airllm

inference.py

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")

# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)
           
generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

Result

AssertionError: Torch not compiled with CUDA enabled
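
The assertion is raised because the script calls `.cuda()` on a PyTorch build without CUDA support, which is the case on Apple silicon. A common pattern is to choose the device at runtime, as sketched below. `pick_device` is a hypothetical helper, and whether this version of AirLLM can actually run on a non-CUDA device is not established in this issue.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch

    # torch.backends.mps is missing on older PyTorch builds, hence the getattr guard.
    mps_backend = getattr(torch.backends, "mps", None)
    device = pick_device(
        torch.cuda.is_available(),
        bool(mps_backend and mps_backend.is_available()),
    )
except ImportError:
    device = "cpu"   # PyTorch is not installed in this environment

print(f"selected device: {device}")
# Tensors would then be moved with .to(device) instead of .cuda().
```

This only avoids the hard-coded CUDA call in user code; any `.cuda()` calls inside the library itself would still fail on a Mac.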

Logs

Here is the result:

(base) andrey@m2 current % python ./inference.py
Downloading README.md: 100%|███████████████████████████████████████████████████████████████████| 5.15k/5.15k [00:00<00:00, 10.6MB/s]
Downloading Best_Platty_small.jpeg: 100%|██████████████████████████████████████████████████████| 7.35k/7.35k [00:00<00:00, 23.3MB/s]
Downloading generation_config.json: 100%|██████████████████████████████████████████████████████████| 154/154 [00:00<00:00, 2.27MB/s]
Downloading config.json: 100%|█████████████████████████████████████████████████████████████████████| 632/632 [00:00<00:00, 11.5MB/s]
Downloading .gitattributes: 100%|██████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 29.1MB/s]
Downloading (…)l-00006-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [22:08<00:00, 7.38MB/s]
Downloading (…)l-00007-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [22:42<00:00, 7.31MB/s]
Downloading (…)l-00001-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.85G/9.85G [23:20<00:00, 7.04MB/s]
Downloading (…)l-00002-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:15<00:00, 5.78MB/s]
Downloading (…)l-00008-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:54<00:00, 5.65MB/s]
Downloading (…)l-00003-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [29:04<00:00, 5.71MB/s]
Downloading (…)l-00004-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [30:55<00:00, 5.28MB/s]
Downloading (…)l-00005-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [31:52<00:00, 5.12MB/s]
Downloading (…)model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.7k/66.7k [00:00<00:00, 468kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 3.03MB/s]
Downloading tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.56MB/s]
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.15MB/s]
Downloading tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 698/698 [00:00<00:00, 3.92MB/s]
Downloading (…)l-00015-of-00015.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524M/524M [01:27<00:00, 6.02MB/s]
Downloading (…)l-00009-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:29<00:00, 8.38MB/s]
Downloading (…)l-00010-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:19<00:00, 8.45MB/s]
Downloading (…)l-00011-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.97G/9.97G [21:02<00:00, 7.90MB/s]
Downloading (…)l-00014-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.50G/9.50G [18:10<00:00, 8.71MB/s]
Downloading (…)l-00012-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:16<00:00, 8.47MB/s]
Downloading (…)l-00013-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [18:41<00:00, 8.74MB/s]
Fetching 25 files: 100%|███████████████████████████████████████████████████████████████████████████| 25/25 [47:39<00:00, 114.38s/it]
  0%|                                                                                                                                                               | 0/83 [00:00<?, ?it/s]Loading shard 1/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.embed_tokens.safetensors
  1%|█▊                                                                                                                                                     | 1/83 [00:01<02:36,  1.91s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.0.safetensors
  2%|███▋                                                                                                                                                   | 2/83 [00:02<01:23,  1.03s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.1.safetensors
  4%|█████▍                                                                                                                                                 | 3/83 [00:02<00:58,  1.36it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.2.safetensors
  5%|███████▎                                                                                                                                               | 4/83 [00:03<00:47,  1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.3.safetensors
  6%|█████████                                                                                                                                              | 5/83 [00:03<00:43,  1.78it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.4.safetensors
  7%|██████████▉                                                                                                                                            | 6/83 [00:03<00:38,  1.99it/s]Loading shard 2/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.5.safetensors
  8%|████████████▋                                                                                                                                          | 7/83 [00:05<01:15,  1.01it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.6.safetensors
 10%|██████████████▌                                                                                                                                        | 8/83 [00:06<00:59,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.7.safetensors
 11%|████████████████▎                                                                                                                                      | 9/83 [00:06<00:49,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.8.safetensors
 12%|██████████████████                                                                                                                                    | 10/83 [00:07<00:42,  1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.9.safetensors
 13%|███████████████████▉                                                                                                                                  | 11/83 [00:07<00:38,  1.85it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.10.safetensors
 14%|█████████████████████▋                                                                                                                                | 12/83 [00:08<00:36,  1.96it/s]Loading shard 3/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.11.safetensors
 16%|███████████████████████▍                                                                                                                              | 13/83 [00:10<01:08,  1.02it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.12.safetensors
 17%|█████████████████████████▎                                                                                                                            | 14/83 [00:10<00:55,  1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.13.safetensors
 18%|███████████████████████████                                                                                                                           | 15/83 [00:10<00:46,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.14.safetensors
 19%|████████████████████████████▉                                                                                                                         | 16/83 [00:11<00:39,  1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.15.safetensors
 20%|██████████████████████████████▋                                                                                                                       | 17/83 [00:11<00:36,  1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.16.safetensors
 22%|████████████████████████████████▌                                                                                                                     | 18/83 [00:12<00:32,  2.00it/s]Loading shard 4/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.17.safetensors
 23%|██████████████████████████████████▎                                                                                                                   | 19/83 [00:13<00:58,  1.09it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.18.safetensors
 24%|████████████████████████████████████▏                                                                                                                 | 20/83 [00:14<00:49,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.19.safetensors
 25%|█████████████████████████████████████▉                                                                                                                | 21/83 [00:14<00:42,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.20.safetensors
 27%|███████████████████████████████████████▊                                                                                                              | 22/83 [00:15<00:36,  1.68it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.21.safetensors
 28%|█████████████████████████████████████████▌                                                                                                            | 23/83 [00:15<00:31,  1.88it/s]Loading shard 5/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.22.safetensors
 29%|███████████████████████████████████████████▎                                                                                                          | 24/83 [00:17<00:56,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.23.safetensors
 30%|█████████████████████████████████████████████▏                                                                                                        | 25/83 [00:18<00:46,  1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.24.safetensors
 31%|██████████████████████████████████████████████▉                                                                                                       | 26/83 [00:18<00:38,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.25.safetensors
 33%|████████████████████████████████████████████████▊                                                                                                     | 27/83 [00:18<00:34,  1.62it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.26.safetensors
 34%|██████████████████████████████████████████████████▌                                                                                                   | 28/83 [00:19<00:30,  1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.27.safetensors
 35%|████████████████████████████████████████████████████▍                                                                                                 | 29/83 [00:19<00:27,  1.94it/s]Loading shard 6/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.28.safetensors
 36%|██████████████████████████████████████████████████████▏                                                                                               | 30/83 [00:21<00:50,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.29.safetensors
 37%|████████████████████████████████████████████████████████                                                                                              | 31/83 [00:22<00:41,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.30.safetensors
 39%|█████████████████████████████████████████████████████████▊                                                                                            | 32/83 [00:22<00:34,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.31.safetensors
 40%|███████████████████████████████████████████████████████████▋                                                                                          | 33/83 [00:22<00:30,  1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.32.safetensors
 41%|█████████████████████████████████████████████████████████████▍                                                                                        | 34/83 [00:23<00:26,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.33.safetensors
 42%|███████████████████████████████████████████████████████████████▎                                                                                      | 35/83 [00:23<00:24,  1.98it/s]Loading shard 7/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.34.safetensors
 43%|█████████████████████████████████████████████████████████████████                                                                                     | 36/83 [00:25<00:45,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.35.safetensors
 45%|██████████████████████████████████████████████████████████████████▊                                                                                   | 37/83 [00:26<00:37,  1.24it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.36.safetensors
 46%|████████████████████████████████████████████████████████████████████▋                                                                                 | 38/83 [00:26<00:30,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.37.safetensors
 47%|██████████████████████████████████████████████████████████████████████▍                                                                               | 39/83 [00:27<00:25,  1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.38.safetensors
 48%|████████████████████████████████████████████████████████████████████████▎                                                                             | 40/83 [00:27<00:23,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.39.safetensors
 49%|██████████████████████████████████████████████████████████████████████████                                                                            | 41/83 [00:27<00:20,  2.04it/s]Loading shard 8/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.40.safetensors
 51%|███████████████████████████████████████████████████████████████████████████▉                                                                          | 42/83 [00:29<00:38,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.41.safetensors
 52%|█████████████████████████████████████████████████████████████████████████████▋                                                                        | 43/83 [00:30<00:31,  1.29it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.42.safetensors
 53%|███████████████████████████████████████████████████████████████████████████████▌                                                                      | 44/83 [00:30<00:25,  1.52it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.43.safetensors
 54%|█████████████████████████████████████████████████████████████████████████████████▎                                                                    | 45/83 [00:30<00:21,  1.74it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.44.safetensors
 55%|███████████████████████████████████████████████████████████████████████████████████▏                                                                  | 46/83 [00:31<00:19,  1.86it/s]Loading shard 9/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.45.safetensors
 57%|████████████████████████████████████████████████████████████████████████████████████▉                                                                 | 47/83 [00:33<00:34,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.46.safetensors
 58%|██████████████████████████████████████████████████████████████████████████████████████▋                                                               | 48/83 [00:33<00:27,  1.28it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.47.safetensors
 59%|████████████████████████████████████████████████████████████████████████████████████████▌                                                             | 49/83 [00:34<00:22,  1.51it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.48.safetensors
 60%|██████████████████████████████████████████████████████████████████████████████████████████▎                                                           | 50/83 [00:34<00:20,  1.65it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.49.safetensors
 61%|████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 51/83 [00:35<00:17,  1.81it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.50.safetensors
 63%|█████████████████████████████████████████████████████████████████████████████████████████████▉                                                        | 52/83 [00:35<00:15,  1.99it/s]Loading shard 10/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.51.safetensors
 64%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                      | 53/83 [00:37<00:28,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.52.safetensors
 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▌                                                    | 54/83 [00:37<00:22,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.53.safetensors
 66%|███████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 55/83 [00:38<00:18,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.54.safetensors
 67%|█████████████████████████████████████████████████████████████████████████████████████████████████████▏                                                | 56/83 [00:38<00:15,  1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.55.safetensors
 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                               | 57/83 [00:38<00:13,  1.91it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.56.safetensors
 70%|████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                             | 58/83 [00:39<00:12,  2.07it/s]Loading shard 11/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.57.safetensors
 71%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                           | 59/83 [00:41<00:22,  1.07it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.58.safetensors
 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                         | 60/83 [00:41<00:18,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.59.safetensors
 73%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                       | 61/83 [00:42<00:14,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.60.safetensors
 75%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                      | 62/83 [00:42<00:12,  1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.61.safetensors
 76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                    | 63/83 [00:42<00:10,  1.88it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.62.safetensors
 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                  | 64/83 [00:43<00:09,  2.05it/s]Loading shard 12/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.63.safetensors
 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                | 65/83 [00:45<00:17,  1.05it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.64.safetensors
 80%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                              | 66/83 [00:45<00:13,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.65.safetensors
 81%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                             | 67/83 [00:46<00:10,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.66.safetensors
 82%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                           | 68/83 [00:46<00:08,  1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.67.safetensors
 83%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                         | 69/83 [00:47<00:07,  1.84it/s]Loading shard 13/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.68.safetensors
 84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                       | 70/83 [00:48<00:12,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.69.safetensors
 86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                     | 71/83 [00:49<00:09,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.70.safetensors
 87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 72/83 [00:49<00:07,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.71.safetensors
 88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                  | 73/83 [00:50<00:05,  1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.72.safetensors
 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                | 74/83 [00:50<00:04,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.73.safetensors
 90%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌              | 75/83 [00:50<00:03,  2.03it/s]Loading shard 14/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.74.safetensors
 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 76/83 [00:52<00:06,  1.08it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.75.safetensors
 93%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏          | 77/83 [00:53<00:04,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.76.safetensors
 94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉         | 78/83 [00:53<00:03,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.77.safetensors
 95%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊       | 79/83 [00:54<00:02,  1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.78.safetensors
 96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌     | 80/83 [00:54<00:01,  1.83it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.79.safetensors
 98%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍   | 81/83 [00:55<00:00,  2.00it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.norm.safetensors
Loading shard 15/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/lm_head.safetensors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:55<00:00,  1.50it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
Traceback (most recent call last):
  File "/Users/andrey/air_llm/current/./inference.py", line 5, in <module>
    model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 184, in __init__
    self.init_model()
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 205, in init_model
    set_module_tensor_to_device(self.model, buffer_name, self.running_device, value=buffer,
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
                ^^^^^^^^^^^^^^^^
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
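
The traceback above ends inside torch's lazy CUDA initialisation, which means a tensor was being moved to a CUDA device on a PyTorch build without CUDA support (the /Users/... paths suggest macOS). Below is a minimal sketch of checking which backends the local PyTorch build exposes before choosing a device. `pick_device` and `detect_device` are hypothetical helper names, not part of airllm, and the log does not confirm whether airllm's running device can be overridden.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the preferred device string given which backends are usable."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the installed PyTorch build; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # torch.backends.mps only exists in newer PyTorch releases, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda_ok, mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

On a machine like the one in this report, `torch.cuda.is_available()` would be expected to return False, so any code path that hard-codes a CUDA device fails with the assertion shown.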

The model precision used for training with qlora

Hey @lyogavin, thanks a lot for sharing this awesome work!
I have a basic question. When training with QLoRA, is it possible to load either the normal (unquantized) or the quantized base model? Both seem to train fine, but loading the unquantized base model appears to train a bit faster. Are there any differences between the models trained in these two modes?

Typo in air_llm setup.py

Great work you are doing here!

   install_requires=[  # I get to this in a second
        'tqdm',
        'torch',
        'transformers',
        'accelerate',
        'safetensors',
        'optimum',
        'huggingface_hub'
        'scipy',
        #'bitsandbytes' set it to optional to support fallback when not installable
    ],

->>

        'huggingface_hub',

Cheers,
Volker
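
For context on why the missing comma matters: Python concatenates adjacent string literals, so the list silently ends up with one merged entry instead of two. A short illustration, using a shortened copy of the list from the report:

```python
# Adjacent string literals are joined at compile time, so the missing
# comma after 'huggingface_hub' merges it with 'scipy'.
install_requires_buggy = [
    'optimum',
    'huggingface_hub'
    'scipy',
]

# With the comma restored, each requirement is its own list element.
install_requires_fixed = [
    'optimum',
    'huggingface_hub',
    'scipy',
]

print(install_requires_buggy)
print(install_requires_fixed)
```

The buggy list has two elements, the second being the single string 'huggingface_hubscipy', which is not a real package name.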

Inference example in the README fails

While trying inference with the example from the README, I made a few changes based on the errors I got:

# base model
model = LlamaForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        device_map="auto",
        offload_state_dict = True,
        offload_folder='./offload',
    )

# LORA PEFT adapters
adapter_model = "lyogavin/Anima33B"

model = PeftModel.from_pretrained(
        model,
        adapter_model,
        #torch_dtype=torch.float16,
        device_map={"":0},
    )
model.eval()

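In the LlamaForCausalLM call, I added the offload_folder parameter (following the Hugging Face documentation).
In the PeftModel call, device_map has to be set to the CUDA device (i.e. 0); otherwise it fails with an OOM error.
Even with these changes it still does not run: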

accelerate/utils/modeling.py:154
ValueError: weight is on the meta device, we need a `value` to put in on 0.

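I would like to know whether the device_map parameter is the problem, and how it should be adjusted.
I have already searched and found no similar issue.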

My setup:

  • Hardware: 7700X / 4070Ti / DDR5 32 GB
  • OS: WSL2, Ubuntu 22.04
  • Python 3.10.10
  • torch 2.0.1 with CUDA 11.8
  • transformers 4.31.0.dev0
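
A rough back-of-the-envelope check, not a diagnosis of the error above: it estimates how much memory the fp16 weights of a nominally 33B-parameter model need and compares that with the hardware listed. The 12 GiB of VRAM for the 4070Ti is an assumption (the report does not state it), the parameter count is the nominal "33B", and the function name is hypothetical.

```python
def fp16_weight_gib(n_params: float) -> float:
    """Memory for the weights alone at 2 bytes per parameter, in GiB."""
    return n_params * 2 / 2**30


n_params = 33e9   # nominal size of a "33B" model (assumption)
gpu_gib = 12      # assumed VRAM of an RTX 4070 Ti
ram_gib = 32      # system RAM from the report

weights_gib = fp16_weight_gib(n_params)
print(f"fp16 weights: {weights_gib:.1f} GiB")
print(f"GPU + RAM:    {gpu_gib + ram_gib} GiB")
print("fits without disk offload:", weights_gib <= gpu_gib + ram_gib)
```

Under these assumptions the weights alone exceed GPU memory plus system RAM combined, so with device_map="auto" part of the model would be expected to be offloaded to disk via offload_folder.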

TypeError: not a string

Running the inference code raises an error:

CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda121.so...

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /srv/chat/Anima/infer.py:9 in <module>                                       │
│                                                                              │
│    6                                                                         │
│    7 # create tokenizer                                                      │
│    8 base_model = "timdettmers/guanaco-33b-merged"                           │
│ ❱  9 tokenizer = LlamaTokenizer.from_pretrained(base_model)                  │
│   10                                                                         │
│   11 # base model                                                            │
│   12 model = LlamaForCausalLM.from_pretrained(                               │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1825 in from_pretrained                                                  │
│                                                                              │
│   1822 │   │   │   else:                                                     │
│   1823 │   │   │   │   logger.info(f"loading file {file_path} from cache at  │
│   1824 │   │                                                                 │
│ ❱ 1825 │   │   return cls._from_pretrained(                                  │
│   1826 │   │   │   resolved_vocab_files,                                     │
│   1827 │   │   │   pretrained_model_name_or_path,                            │
│   1828 │   │   │   init_configuration,                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1988 in _from_pretrained                                                 │
│                                                                              │
│   1985 │   │                                                                 │
│   1986 │   │   # Instantiate tokenizer.                                      │
│   1987 │   │   try:                                                          │
│ ❱ 1988 │   │   │   tokenizer = cls(*init_inputs, **init_kwargs)              │
│   1989 │   │   except OSError:                                               │
│   1990 │   │   │   raise OSError(                                            │
│   1991 │   │   │   │   "Unable to load vocabulary from file. "
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenizati │
│ on_llama.py:96 in __init__                                                   │
│                                                                              │
│    93 │   │   self.add_bos_token = add_bos_token                             │
│    94 │   │   self.add_eos_token = add_eos_token                             │
│    95 │   │   self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwa │
│ ❱  96 │   │   self.sp_model.Load(vocab_file)                                 │
│    97 │                                                                      │
│    98 │   def __getstate__(self):                                            │
│    99 │   │   state = self.__dict__.copy()                                   │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:905 in     │
│ Load                                                                         │
│                                                                              │
│    902 │   │   raise RuntimeError('model_file and model_proto must be exclus │
│    903 │     if model_proto:                                                 │
│    904 │   │   return self.LoadFromSerializedProto(model_proto)              │
│ ❱  905 │     return self.LoadFromFile(model_file)                            │
│    906                                                                       │
│    907                                                                       │
│    908 # Register SentencePieceProcessor in _sentencepiece:                  │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:310 in     │
│ LoadFromFile                                                                 │
│                                                                              │
│    307 │   │   return _sentencepiece.SentencePieceProcessor_serialized_model │
│    308 │                                                                     │
│    309 │   def LoadFromFile(self, arg):                                      │
│ ❱  310 │   │   return _sentencepiece.SentencePieceProcessor_LoadFromFile(sel │
│    311 │                                                                     │
│    312 │   def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha,  │
│    313 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsIds(sel │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: not a string

What is causing this?

Thanks.

Unclear understanding of generation

Error when reproducing training

++ echo 'START TIME: Mon Jun 19 19:22:10 CST 2023'
START TIME: Mon Jun 19 19:22:10 CST 2023
++ ROOT_DIR_BASE=/Anima/saved_models/qlora_cn
++ OUTPUT_PATH=/Anima/saved_models/qlora_cn/output_1687173730
++ mkdir -p /Anima/saved_models/qlora_cn/output_1687173730
++ python qlora.py --dataset=chinese-vicuna --dataset_format=alpaca-clean --learning_rate 0.0001 --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 10000 --model_name_or_path timdettmers/guanaco-33b-merged --source_max_len 512 --target_max_len 512 --eval_dataset_size 1 --do_eval --evaluation_strategy steps --eval_steps 200 --output_dir /Anima/saved_models/qlora_cn/output_1687173730 --report_to wandb --sample_generate --save_steps 200
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /root/miniconda3/envs/anima did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lcoal/cuda/lib64')}
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /usr/lcoal/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
./run_Amina_training.sh: line 48: 36300 Segmentation fault (core dumped) python qlora.py --dataset="chinese-vicuna" --dataset_format="alpaca-clean" #alpaca-clean has similar format to chinese training dataset --learning_rate 0.0001 # QLoRA paper appendix B Table 9 --per_device_train_batch_size 1 # fix for fitting mem --gradient_accumulation_steps 16 # QLoRA paper appendix B Table 9 --max_steps 10000 # QLoRA paper appendix B Table 9, follow paper setting even though cn data is 690k much bigger than OASST1 9k, batch size considering accum --model_name_or_path "timdettmers/guanaco-33b-merged" --source_max_len 512 # default setting in code, cn model 2048 too long --target_max_len 512 # follow QLoRA paper appendix B Table 9 --eval_dataset_size 1 # mainly for testing, no need to be big --do_eval --evaluation_strategy "steps" --eval_steps 200 # 10 for debug mode only, 200 for training --output_dir $OUTPUT_PATH --report_to 'wandb' --sample_generate # test sample generation every once a while --save_steps 200 # 20 for debug mode only, 200 for training

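Can DeepSpeed be used?

DeepSpeed can reduce the GPU memory requirement for full-parameter fine-tuning. However, applying it here seems to produce some bugs. For example:
With ZeRO-3, the reference_model cannot produce logits properly.
With ZeRO-2, gradients for some parameters are computed more than once.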

Printing one token in output

Hello.

I am using an NVIDIA RTX A4000 (16 GB GPU), with airllm 0.9.5 on Ubuntu 20.04.6, Python 3.11.6, and torch 2.1.1.

I used the test code from the airllm GitHub page:

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("meta-llama/Llama-2-13b-chat-hf")

# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
        #'I like',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=2,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)
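
Note that the script above passes max_new_tokens=2 to generate. In Hugging Face-style generation APIs this value caps how many tokens are appended to the prompt. The toy sketch below shows only that capping behaviour. `greedy_generate` and `toy_next_token` are hypothetical stand-ins and not airllm code, and whether this cap fully explains the output reported in this issue is an assumption.

```python
from typing import Callable, List, Optional


def greedy_generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append at most max_new_tokens tokens to the prompt, stopping early on EOS."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(ids)
        ids.append(tok)
        if eos_id is not None and tok == eos_id:
            break
    return ids


def toy_next_token(ids: List[int]) -> int:
    """Stand-in for a model: always predicts the last id plus one."""
    return ids[-1] + 1


prompt = [10, 11]
short = greedy_generate(prompt, toy_next_token, max_new_tokens=2)
longer = greedy_generate(prompt, toy_next_token, max_new_tokens=20)
print(len(short) - len(prompt), "new tokens with max_new_tokens=2")
print(len(longer) - len(prompt), "new tokens with max_new_tokens=20")
```

With a real model, raising max_new_tokens in the generate call is the analogous change to try when the output stops after a word or two.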

Here is the output after I run python air.py (air.py is the name of my script). The generated text appears at the very end: the script prints only "The" and then exits.

README.md: 100%|████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 8.76MB/s]
USE_POLICY.md: 100%|████████████████████████████████████████████████| 4.77k/4.77k [00:00<00:00, 5.98MB/s]
generation_config.json: 100%|████████████████████████████████████████████| 188/188 [00:00<00:00, 429kB/s]
.gitattributes: 100%|███████████████████████████████████████████████| 1.58k/1.58k [00:00<00:00, 4.20MB/s]
LICENSE.txt: 100%|██████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 15.4MB/s]
config.json: 100%|███████████████████████████████████████████████████████| 587/587 [00:00<00:00, 444kB/s]
model.safetensors.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 14.1MB/s]
pytorch_model.bin.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 1.14MB/s]
Responsible-Use-Guide.pdf: 100%|█████████████████████████████████████| 1.25M/1.25M [00:01<00:00, 808kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████| 414/414 [00:00<00:00, 888kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████| 500k/500k [00:00<00:00, 612kB/s]
tokenizer_config.json: 100%|████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 3.52MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████| 1.84M/1.84M [00:06<00:00, 269kB/s]
model-00003-of-00003.safetensors: 100%|█████████████████████████████| 6.18G/6.18G [52:08<00:00, 1.98MB/s]
pytorch_model-00003-of-00003.bin: 100%|█████████████████████████████| 6.18G/6.18G [52:23<00:00, 1.97MB/s]
pytorch_model-00002-of-00003.bin: 100%|███████████████████████████| 9.90G/9.90G [1:12:28<00:00, 2.28MB/s]
model-00002-of-00003.safetensors: 100%|███████████████████████████| 9.90G/9.90G [1:13:00<00:00, 2.26MB/s]
model-00001-of-00003.safetensors: 100%|███████████████████████████| 9.95G/9.95G [1:13:43<00:00, 2.25MB/s]
pytorch_model-00001-of-00003.bin: 100%|███████████████████████████| 9.95G/9.95G [1:13:56<00:00, 2.24MB/s]
Fetching 19 files: 100%|██████████████████████████████████████████████| 19/19 [1:13:57<00:00, 233.55s/it]
  0%|                                                                             | 0/43 [00:00<?, ?it/s]Loading shard 1/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.embed_tokens.safetensors
  2%|█▌                                                                   | 1/43 [00:29<20:38, 29.50s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.0.safetensors
  5%|███▏                                                                 | 2/43 [00:31<09:08, 13.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.1.safetensors
  7%|████▊                                                                | 3/43 [00:33<05:17,  7.94s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.2.safetensors
  9%|██████▍                                                              | 4/43 [00:34<03:29,  5.38s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.3.safetensors
 12%|████████                                                             | 5/43 [00:36<02:32,  4.02s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.4.safetensors
 14%|█████████▋                                                           | 6/43 [00:37<01:58,  3.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.5.safetensors
 16%|███████████▏                                                         | 7/43 [00:39<01:37,  2.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.6.safetensors
 19%|████████████▊                                                        | 8/43 [00:41<01:23,  2.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.7.safetensors
 21%|██████████████▍                                                      | 9/43 [00:42<01:12,  2.13s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.8.safetensors
 23%|███████████████▊                                                    | 10/43 [00:44<01:04,  1.95s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.9.safetensors
 26%|█████████████████▍                                                  | 11/43 [00:45<00:57,  1.80s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.10.safetensors
 28%|██████████████████▉                                                 | 12/43 [00:47<00:54,  1.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.11.safetensors
 30%|████████████████████▌                                               | 13/43 [00:48<00:51,  1.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.12.safetensors
 33%|██████████████████████▏                                             | 14/43 [00:50<00:50,  1.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.13.safetensors
 35%|███████████████████████▋                                            | 15/43 [00:52<00:48,  1.74s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.14.safetensors
 37%|█████████████████████████▎                                          | 16/43 [00:54<00:45,  1.69s/it]Loading shard 2/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.15.safetensors
 40%|██████████████████████████▉                                         | 17/43 [01:45<07:15, 16.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.16.safetensors
 42%|████████████████████████████▍                                       | 18/43 [01:47<05:06, 12.25s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.17.safetensors
 44%|██████████████████████████████                                      | 19/43 [01:49<03:37,  9.05s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.18.safetensors
 47%|███████████████████████████████▋                                    | 20/43 [01:50<02:36,  6.82s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.19.safetensors
 49%|█████████████████████████████████▏                                  | 21/43 [01:52<01:54,  5.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.20.safetensors
 51%|██████████████████████████████████▊                                 | 22/43 [01:54<01:27,  4.16s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.21.safetensors
 53%|████████████████████████████████████▎                               | 23/43 [01:55<01:07,  3.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.22.safetensors
 56%|█████████████████████████████████████▉                              | 24/43 [01:57<00:53,  2.84s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.23.safetensors
 58%|███████████████████████████████████████▌                            | 25/43 [01:58<00:44,  2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.24.safetensors
 60%|█████████████████████████████████████████                           | 26/43 [02:00<00:37,  2.20s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.25.safetensors
 63%|██████████████████████████████████████████▋                         | 27/43 [02:03<00:39,  2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.26.safetensors
 65%|████████████████████████████████████████████▎                       | 28/43 [02:05<00:33,  2.22s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.27.safetensors
 67%|█████████████████████████████████████████████▊                      | 29/43 [02:06<00:29,  2.09s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.28.safetensors
 70%|███████████████████████████████████████████████▍                    | 30/43 [02:11<00:35,  2.72s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.29.safetensors
 72%|█████████████████████████████████████████████████                   | 31/43 [02:29<01:30,  7.54s/it]Loading shard 3/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.30.safetensors
 74%|██████████████████████████████████████████████████▌                 | 32/43 [03:23<03:55, 21.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.31.safetensors
 77%|████████████████████████████████████████████████████▏               | 33/43 [03:25<02:34, 15.47s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.32.safetensors
 79%|█████████████████████████████████████████████████████▊              | 34/43 [03:26<01:41, 11.30s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.33.safetensors
 81%|███████████████████████████████████████████████████████▎            | 35/43 [03:28<01:06,  8.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.34.safetensors
 84%|████████████████████████████████████████████████████████▉           | 36/43 [03:29<00:44,  6.36s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.35.safetensors
 86%|██████████████████████████████████████████████████████████▌         | 37/43 [03:31<00:29,  4.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.36.safetensors
 88%|████████████████████████████████████████████████████████████        | 38/43 [03:33<00:19,  3.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.37.safetensors
 91%|█████████████████████████████████████████████████████████████▋      | 39/43 [03:35<00:13,  3.49s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.38.safetensors
 93%|███████████████████████████████████████████████████████████████▎    | 40/43 [03:39<00:11,  3.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.39.safetensors
 95%|████████████████████████████████████████████████████████████████▊   | 41/43 [03:41<00:06,  3.14s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.norm.safetensors
 98%|██████████████████████████████████████████████████████████████████▍ | 42/43 [03:41<00:02,  2.23s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/lm_head.safetensors
100%|████████████████████████████████████████████████████████████████████| 43/43 [03:42<00:00,  5.18s/it]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.

cuda:0:   2%|█▍                                                           | 1/43 [00:04<03:02,  4.34s/it]cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [01:55<00:00,  2.69s/it]
returning kvcache size: torch.Size([1, 40, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [00:24<00:00,  1.73it/s]
returning kvcache size: torch.Size([1, 40, 10, 128])
<s> What is the capital of United States?
The

^^^^ The output above is just the question followed by "The", and then the program exits.

Do you know why this may be happening? Please let me know if you need more information.

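peft version issue: "addmm_impl_cpu_" not implemented for 'Half'

Hello, this is great work! However, during inference, the call

generate_ids = model.generate(**inputs, max_new_tokens=30)

fails with the error:

"addmm_impl_cpu_" not implemented for 'Half'.

I suspect this is a version mismatch between peft and transformers. The versions I am using are:

transformers: 4.31.0.dev0
peft: 0.4.0.dev0

Which versions of transformers and peft are you using?

Thank you very much!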
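
For context, the error `"addmm_impl_cpu_" not implemented for 'Half'` is commonly reported when float16 weights run a matrix multiply on the CPU in a PyTorch build without half-precision CPU kernels, so it may be unrelated to the peft or transformers versions. The sketch below is only an illustration of choosing a dtype by device. The helper names `pick_dtype_name` and `pick_device_name` are hypothetical and not part of this repository, and the commented usage assumes a model that supports `.to(device=..., dtype=...)`, which quantized models may not.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu" (hypothetical helper)."""
    return "cuda" if cuda_available else "cpu"


def pick_dtype_name(cuda_available: bool) -> str:
    """Return a torch dtype name: float16 on GPU, float32 on CPU (hypothetical helper).

    float32 is chosen on CPU because half-precision matmul may be
    unimplemented there, which is one known source of the error above.
    """
    return "float16" if cuda_available else "float32"


# Assumed usage with PyTorch (not executed here):
#   import torch
#   has_cuda = torch.cuda.is_available()
#   device = pick_device_name(has_cuda)
#   dtype = getattr(torch, pick_dtype_name(has_cuda))
#   model = model.to(device=device, dtype=dtype)
#   inputs = {k: v.to(device) for k, v in inputs.items()}
#   generate_ids = model.generate(**inputs, max_new_tokens=30)
```

If the model and its inputs already sit on the GPU and the error persists, the cause is likely elsewhere, and comparing library versions as asked above would be the next step.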
