
anima's Introduction

Anima


This is the first open-source 33B Chinese LLM. It is trained with QLoRA and supports DPO-based alignment training. We have also open-sourced Anima100K, a model with a 100K context window; it is based on Llama2 and is available for commercial use. The latest update is AirLLM, a library that lets you run inference on a 70B LLM from a single GPU with just 4GB of memory.


🔄 Updates

[2024/04/20] AirLLM now supports Llama3 natively. Run Llama3 70B on a single GPU with 4GB of VRAM.

[2024/03/07] Open source: Latte text2video training. The open-source model closest to SORA is here. Train your own SORA!

[2023/11/17] Open source: AirLLM, 70B LLM inference on a single 4GB GPU, with no quantization and no model compression required.

[2023/09/06] Open source: a commercially usable, Llama2-based LLM with a 100K context window.

[2023/06/29] Open source: human-feedback alignment training based on DPO + QLoRA.

[2023/06/12] Open source: the first QLoRA-based 33B Chinese large language model.

AirLLM: 70B LLM inference on a single 4GB GPU

AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU. No quantization, distillation, pruning, or other model compression techniques that would degrade model performance are needed.
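The paragraph above does not spell out the mechanism, but one way to avoid holding a whole model in memory is to run it layer by layer: load one layer's weights, apply it, free it, and move on. A log in one of the issues below shows AirLLM writing per-layer files such as `model.layers.0.safetensors`, which is consistent with this idea. The sketch below is a minimal illustration under that assumption, not AirLLM's code: the JSON file layout and the helpers `save_layers` and `run_layerwise` are hypothetical, and each toy "layer" is just a scale-and-shift.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's parameters to its own file (hypothetical per-layer layout)."""
    paths = []
    for i, params in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(params, f)
        paths.append(path)
    return paths


def run_layerwise(x, layer_paths):
    """Run layers in order, holding only one layer's parameters in memory at a time."""
    for path in layer_paths:
        with open(path) as f:
            params = json.load(f)  # load just this layer from disk
        # Toy "forward pass": an elementwise scale and shift.
        x = [params["scale"] * v + params["shift"] for v in x]
        del params  # release this layer before loading the next one
    return x


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = save_layers([{"scale": 2.0, "shift": 1.0}, {"scale": 0.5, "shift": 0.0}], d)
        print(run_layerwise([1.0, 2.0], paths))  # [1.5, 2.5]
```

The trade-off is that every forward pass rereads all layers from disk, so peak memory drops to roughly one layer while latency grows with disk throughput.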

Find out more here

Train your own SORA: open-source Latte text2video training

Check it out here: https://github.com/lyogavin/train_your_own_sora

100K context length LLM

We have released Anima's new open-source 7B model, which supports a 100K input window! It is based on Llama2, so it is available for commercial use!

With long-text question-answering training data curated specifically for 100K inputs, plus many memory optimizations, we enabled the Llama2 model to scale to a 100K input length.

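With 100K input support, you can even put an entire knowledge base into the prompt, or place a whole book directly in the prompt. No more laborious vectorization and text splitting.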

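We stacked up all the latest heavy hitters: XEntropy, Paged 8-bit AdamW, LoRA, and FlashAttention2. We also modified and customized both the training and inference code specifically for long inputs, so that a single 100G GPU can train with a 100K window and a single 40G GPU can run inference.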

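For training data, we selected long texts of 30K to 100K in length, aimed specifically at long inputs, from dozens of public datasets, and trained the model specifically for 100K inputs.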
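To see why the loss computation needs attention at this length: with Llama2's 32,000-token vocabulary, the full logit matrix for 100,000 tokens holds 100,000 × 32,000 = 3.2 billion values, or 12.8 GB in fp32. A common remedy is to compute the cross-entropy chunk by chunk, so logits for only a few tokens exist at once. We don't know exactly how the XEntropy kernel listed above is wired into Anima's code; the pure-Python sketch below only illustrates the general chunked idea, and all function names in it are hypothetical.

```python
import math


def token_logits(h, W):
    """Logits for one hidden vector h (length d) against an output matrix W (d x V)."""
    return [sum(h[k] * W[k][j] for k in range(len(h))) for j in range(len(W[0]))]


def token_nll(logits, target):
    """-log softmax(logits)[target], using the max-subtraction trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def chunked_cross_entropy(hidden, W, targets, chunk_size):
    """Mean token cross-entropy, materializing logits for at most chunk_size tokens at once."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        end = start + chunk_size
        chunk_logits = [token_logits(h, W) for h in hidden[start:end]]
        total += sum(token_nll(l, t) for l, t in zip(chunk_logits, targets[start:end]))
        # chunk_logits is rebound on the next iteration, so only one chunk is held at a time.
    return total / len(hidden)


if __name__ == "__main__":
    hidden = [[0.5, -1.0], [2.0, 0.1], [-0.3, 0.7]]
    W = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
    targets = [0, 1, 2]
    full = chunked_cross_entropy(hidden, W, targets, chunk_size=len(hidden))
    small = chunked_cross_entropy(hidden, W, targets, chunk_size=1)
    print(math.isclose(full, small))  # True: chunking changes memory use, not the result
```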

Find out more here


Anima 33B Chinese

We believe the future of AI will be fully open and democratized. AI should be a tool that is accessible to everyone, not only to the big monopolies (some of which even have the word "open" in their names 😆). QLoRA might be an important step toward that future. To make a small contribution to the historic process of democratizing AI, we are open-sourcing the 33B QLoRA model we trained: all the model parameters, code, datasets, and evaluations are open! 🤗

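That is why we think QLoRA's work is important, possibly important enough to be a game changer. QLoRA's optimization method makes it possible, for the first time, to finetune a 33B-scale model in a relatively democratized, low-cost way and to put it into wide use. We believe a 33B model can both deliver the relatively strong reasoning ability of a large model and be flexibly finetuned on private, domain-specific business data, giving more control over the LLM.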
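The low-cost claim can be made concrete with back-of-the-envelope arithmetic for the weights alone (activations, optimizer state, and quantization constants are ignored). The sketch assumes a nominal 33 billion parameters, a hidden size of 6656 (LLaMA-33B's), and a LoRA rank of 64; the rank is an illustrative assumption, not Anima's published configuration.

```python
def weight_bytes(n_params, bits_per_param):
    """Bytes needed just to store n_params weights at the given precision."""
    return n_params * bits_per_param // 8


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n = 33_000_000_000                     # nominal "33B" parameter count
    print(weight_bytes(n, 16) / 1e9)       # 66.0 (GB at 16-bit)
    print(weight_bytes(n, 4) / 1e9)        # 16.5 (GB at 4-bit)
    d, r = 6656, 64                        # assumed hidden size and LoRA rank
    print(lora_params(d, d, r))            # 851968
    print(lora_params(d, d, r) / (d * d))  # about 0.019, i.e. roughly 1.9% of one d x d matrix
```

Quantizing the frozen base weights to 4 bits cuts their footprint by 4x relative to 16-bit, and only the small low-rank matrices are trained.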

Find out more here


Alignment training based on DPO and QLoRA

We have open-sourced the latest alignment technique, DPO: the Anima model now includes a QLoRA-based implementation of DPO.

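DPO is the newest and most efficient RLHF training method. RLHF has long been one of the hardest problems in generative AI training, and has been regarded as OpenAI's closely guarded secret. DPO changes all of that and makes RLHF dead simple!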

We have open-sourced a low-cost QLoRA implementation of RLHF: a single GPU machine can run DPO training on a 33B model!
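For reference, the DPO objective from the original DPO paper, for a prompt x with a chosen response y_w and a rejected response y_l, is −log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). No reward model or sampling loop is needed, only log-probabilities from the policy and a frozen reference model. The per-pair sketch below assumes those summed log-probabilities are already computed; β = 0.1 is a placeholder value, and this is an illustration of the formula rather than Anima's training code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probs of each response."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return softplus(-beta * margin)


if __name__ == "__main__":
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))   # log(2) ~ 0.693: policy identical to reference
    print(dpo_loss(-1.0, -10.0, -5.0, -5.0))  # smaller: policy already favors the chosen answer
```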

Find out more here


Stay Connected with Us

WeChat official account

Scan the QR code:


WeChat group

Scan the QR code to join the group:


Discord


Blog

Website

Contribution

Buy me a coffee, please! Everyone is welcome to contribute to this project 🙏

If you like our project, please give it a ⭐!

"Buy Me A Coffee"

✍️ Aiwrite (艾写科技) & Anima AI

This work is from Anima AI LLC and aiwrite.ai.


anima's People

Contributors

eltociear, hiemal, likeslab, lyogavin, naozumi520


anima's Issues

peft version issue: "addmm_impl_cpu_" not implemented for 'Half'

Hello, this is really great work! But during inference, when running:

generate_ids = model.generate(**inputs, max_new_tokens=30)

I get the error:

"addmm_impl_cpu_" not implemented for 'Half'.

I suspect this is a peft/transformers version issue. These are the versions I am using:

transformers:4.31.0.dev0
peft:0.4.0.dev0

Which transformers and peft versions are you using?

Thank you very much!

How is training on very long texts supported?

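My understanding is that the optimizations for training on very long texts all live in the modeling_flash_llama file, but the longer_training script doesn't use anything from modeling_flash_llama: the model is still loaded with the original architecture instead of the modified one in modeling_flash_llama. I don't quite understand this part and would appreciate an explanation. Thanks!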

Thanks.
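The question isn't answered in the thread, but a common way to swap a custom attention implementation into an existing model class is to monkey-patch the library module before the model is built; whether longer_training does this somewhere (for example through an import) is not visible from the question. The toy sketch below uses hypothetical names throughout and only shows why ordering matters: the patch affects models built after it, and only when the loader looks the class up on the module at build time.

```python
import types

# A stand-in for a library module such as a modeling file (hypothetical names throughout).
modeling = types.ModuleType("modeling")


class DefaultAttention:
    kind = "default"


class FlashAttention:
    kind = "flash"


modeling.Attention = DefaultAttention


def build_model():
    """Mimics a loader that looks the class up on the module when the model is built."""
    return modeling.Attention()


model_before = build_model()          # built before the patch
modeling.Attention = FlashAttention   # patch the module attribute
model_after = build_model()           # built after the patch

print(model_before.kind, model_after.kind)  # default flash
```

Code that did `from modeling import Attention` before the patch would keep the old class, which is one way such a replacement can silently fail to take effect.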

Confused about how generation works

max() arg is an empty sequence

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
#model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
model = AirLLMLlama2("Qwen-14B-Chat",
                     compression='4bit' # specify '8bit' for 8-bit block-wise quantization "Yi-34B-Chat",
                    )
# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
        #'I like',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)
           
generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

bitsandbytes installed
bitsandbytes installed
bitsandbytes installed
0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/luhao/air22.py", line 6, in <module>
    model = AirLLMLlama2("Qwen-14B-Chat",
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/airllm.py", line 75, in __init__
    self.model_local_path, self.checkpoint_path = find_or_create_local_splitted_path(model_local_path_or_repo_id,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 276, in find_or_create_local_splitted_path
    return Path(model_local_path_or_repo_id), split_and_save_layers(model_local_path_or_repo_id, layer_shards_saving_path,
  File "/root/anaconda3/envs/train/lib/python3.9/site-packages/airllm/utils.py", line 220, in split_and_save_layers
    if max(shards) > shard:
ValueError: max() arg is an empty sequence
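The traceback shows `split_and_save_layers` calling `max(shards)` on an empty list. One plausible cause, which is an assumption the thread does not confirm: the splitter expects Llama-style weight names (the splitting log in a later issue shows files such as `model.embed_tokens.safetensors` and `model.layers.0.safetensors`), while the script loads Qwen-14B-Chat through `AirLLMLlama2`, and a non-Llama checkpoint may name its weights differently. The hypothetical `layer_indices` helper below is not AirLLM's code, and the non-Llama names in the example are only illustrative; it shows how a name-prefix mismatch produces an empty list and therefore this `ValueError`.

```python
import re


def layer_indices(weight_names, prefix="model.layers."):
    """Collect transformer-block indices from weight names that use a Llama-style prefix."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.")
    found = set()
    for name in weight_names:
        m = pattern.match(name)
        if m:
            found.add(int(m.group(1)))
    return sorted(found)


if __name__ == "__main__":
    llama_style = ["model.embed_tokens.weight",
                   "model.layers.0.self_attn.q_proj.weight",
                   "model.layers.1.mlp.up_proj.weight"]
    other_style = ["transformer.h.0.attn.c_attn.weight"]  # illustrative non-Llama naming
    print(layer_indices(llama_style))  # [0, 1]
    print(layer_indices(other_style))  # []
    try:
        max(layer_indices(other_style))
    except ValueError as err:
        print("ValueError:", err)      # same failure mode as the traceback above
```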

Heavy repetition when running inference with the lyogavin/Anima33B-DPO-Belle-1k-merged model

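Thank you very much for your work on DPO! When running inference with the lyogavin/Anima33B-DPO-Belle-1k-merged model, the outputs end in repetition. I tried three prompt templates during inference, and all of them easily produce repetition. I would appreciate help figuring out what the problem is. Below are the inference examples and the inference code I used.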

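1. Inference examples
Since I wasn't sure which template to use for inference with the DPO model, I tried three: (1) asking the question directly; (2) the template used in DPO training; (3) the source_template in the test_cn_dataset_lenghts.py file.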

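Input (asked in Chinese as 世界上最长的河流是什么?): What is the longest river in the world?
Output: The longest river in the world is the Nile. It flows across the African continent from Egypt to Ethiopia and is about 6,650 km long. It is the world's longest, widest, deepest, most lengthy, most humid, and most ecologically productive river. It gathers Egypt, Sudan, Ethiopia, Sudan, Ethiopia, Sudan, Ethiopia, Sudan, Ethiopia, Sudan, Ethiopia, … (the "Sudan, Ethiopia" pair keeps repeating until the output is cut off)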


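Input: \n\nHuman: What is the longest river in the world?\n\nAssistant:
Output: The longest river in the world is the Nile. It starts from Africa's "Wujiya Plateau," flows through Africa and Europe, and finally empties into Egypt's "Walina Basin." It is about 6,650 km long. The Nile gathers multiple rivers of Africa and Europe and collects a great deal of water; it is one of the most important rivers in Africa and Europe. The Nile gathers Egypt, Sudan, Ethiopia, Sudan, Ethiopia, Sudan, Ethiopia, Sudan, Ethiopia, … (the "Sudan, Ethiopia" pair keeps repeating until the output is cut off)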


Input: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the longest river in the world?\n\n### Response:
Output: The longest river in the world is the Nile. It starts from its source at Egypt's "Lion Mountain" and flows through Egypt, Sudan, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, Sudan, Ethiopia, Kenya, Ethiopia, … (the "Sudan, Ethiopia, Kenya, Ethiopia" cycle keeps repeating until the output is cut off)

2. Inference code

# imports
import torch
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from peft import PeftModel
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer
import json
from tqdm import tqdm

# create tokenizer
base_model = "Anima33B-DPO-Belle-1k-merged"
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# base model
model = LlamaForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        device_map="auto",
    )

model.eval()

prompt = input("")
inputs = tokenizer(prompt, return_tensors="pt")
generate_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
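The thread does not establish why this model loops, and looping of this kind is a common failure mode of greedy decoding in general. One widely used mitigation is the repetition penalty from the CTRL paper, which `transformers` exposes as `generate(..., repetition_penalty=...)`; `no_repeat_ngram_size` is a related option. The pure-Python sketch below shows the rule that penalty applies to each logit; it illustrates the technique and is not the library's own code.

```python
def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style penalty: make tokens that already appeared less likely.

    Positive logits are divided by `penalty` and negative ones multiplied by it,
    so a penalty > 1 always lowers an already-seen token's score.
    """
    out = list(logits)
    for tok in set(previous_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


if __name__ == "__main__":
    # Tokens 0 and 1 were generated before; token 2 was not, so it is untouched.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], previous_ids=[0, 1, 0], penalty=2.0))
    # [1.0, -2.0, 0.5]
```

Values slightly above 1 are typical; large values can also suppress tokens that legitimately need to repeat, such as punctuation.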

Does not deliver the correct result

My company is doing performance optimizations, so I follow your work closely to learn from it.

Your example returns:

What is the capital of United States?
W

Ubuntu 22.04, NVIDIA 4090, CUDA 12.3

(img-caption-py3.10) volker@power:~/workspace/PYTHON/img_caption/img_caption$ python3 llama2.py
Fetching 25 files: 100%|████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 499321.90it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00,  1.35it/s]
returning kvcache size: torch.Size([1, 8, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|███████████████████████████████████████████████████████████████████████████████████| 83/83 [01:01<00:00,  1.35it/s]
returning kvcache size: torch.Size([1, 8, 10, 128])
> /home/volker/workspace/PYTHON/img_caption/img_caption/llama2.py(30)<module>()
-> for idx, answer in enumerate(generation_output.sequences):
(Pdb) generation_output.sequences
tensor([[    1,  1724,   338,   278,  7483,   310,  3303,  3900, 29973,    13,
         29956]], device='cuda:0')
(Pdb) c
<s> What is the capital of United States?
W

Merging the adapter model

This is really nice work.
I have a small question:
As the title says, is Anima33B merged obtained by merging the Anima33B adapter model into the original LLaMA?
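In general, merging a LoRA adapter into its base model means folding the low-rank update into each adapted weight, W' = W + (alpha / r) · B · A, after which the adapter is no longer needed at inference time; PEFT provides `merge_and_unload()` for this. The toy sketch below checks, in plain Python, that the merged weight gives the same output as base plus adapter. Whether Anima33B merged was produced exactly this way is the question being asked and is not confirmed here.

```python
def matmul(A, B):
    """Plain-Python product of A (n x k) and B (k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(M, x):
    """Plain-Python product of matrix M and vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in, same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, d_out = d_in = 2
    A = [[1.0, 2.0]]              # r x d_in, with r = 1
    B = [[1.0], [0.0]]            # d_out x r
    x = [1.0, 1.0]
    alpha, r = 2.0, 1
    # Unmerged path: base output plus the scaled adapter output.
    adapter = matvec(B, matvec(A, x))
    unmerged = [b + (alpha / r) * a for b, a in zip(matvec(W, x), adapter)]
    merged = matvec(merge_lora(W, A, B, alpha, r), x)
    print(unmerged, merged)  # [7.0, 1.0] [7.0, 1.0]
```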

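Low-cost finetuning and inference are so important

I'm quite interested in your model.
Is there detailed technical discussion or support for custom finetuning and inference, such as a community?
How is inference performance?
What hardware configurations do you recommend for training and inference?
Also, can vLLM be used for inference?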

More clever batching of layers

Hello, this is an awesome project. I replicated it on Modal Labs on a small T4 GPU.

The problem I see now is that by loading one layer at a time, you are not making full use of GPU VRAM. In this case, for instance, it used only 1.6 GB of VRAM, which I guess is the size of one layer.


Would it be possible instead to load N layers at a time, set by a configuration parameter?

Code example here: https://gist.github.com/priamai/61aa332c42b89f518dcf134c38dd593d
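The proposal amounts to loading several consecutive layers per step instead of one. A minimal sketch of how such groups could be chosen under a VRAM budget, assuming per-layer sizes are known in advance; `group_layers` and the 12 GB budget are hypothetical and not an AirLLM option. The 1.6 GB per-layer figure comes from the observation above, and the sketch ignores activations and the KV cache, which need headroom of their own.

```python
def group_layers(layer_bytes, budget_bytes):
    """Greedily pack consecutive layers into groups whose total size fits the budget.

    Layers stay in order because the forward pass visits them sequentially.
    """
    groups, current, used = [], [], 0
    for i, size in enumerate(layer_bytes):
        if size > budget_bytes:
            raise ValueError(f"layer {i} ({size} bytes) exceeds the budget on its own")
        if current and used + size > budget_bytes:
            groups.append(current)  # this group is full; start a new one
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sizes = [1_600_000_000] * 10                  # ten layers of about 1.6 GB each
    print(group_layers(sizes, 12_000_000_000))    # [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]
```

Fewer, larger loads mean fewer round trips per forward pass, at the cost of a higher peak memory that must still fit on the card.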

Printing one token in output

Hello.

I am using an NVIDIA RTX A4000 (16GB GPU), with airllm 0.9.5 on Ubuntu 20.04.6, Python 3.11.6, and torch 2.1.1.

I used the test code from the airllm GitHub page:

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("meta-llama/Llama-2-13b-chat-hf")

# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
        #'I like',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=2,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

Here is the output after I run python air.py (air.py is the name of my script). At the very end, the only generated output is "The", and then the program exits.

README.md: 100%|████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 8.76MB/s]
USE_POLICY.md: 100%|████████████████████████████████████████████████| 4.77k/4.77k [00:00<00:00, 5.98MB/s]
generation_config.json: 100%|████████████████████████████████████████████| 188/188 [00:00<00:00, 429kB/s]
.gitattributes: 100%|███████████████████████████████████████████████| 1.58k/1.58k [00:00<00:00, 4.20MB/s]
LICENSE.txt: 100%|██████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 15.4MB/s]
config.json: 100%|███████████████████████████████████████████████████████| 587/587 [00:00<00:00, 444kB/s]
model.safetensors.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 14.1MB/s]
pytorch_model.bin.index.json: 100%|█████████████████████████████████| 33.4k/33.4k [00:00<00:00, 1.14MB/s]
Responsible-Use-Guide.pdf: 100%|█████████████████████████████████████| 1.25M/1.25M [00:01<00:00, 808kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████| 414/414 [00:00<00:00, 888kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████| 500k/500k [00:00<00:00, 612kB/s]
tokenizer_config.json: 100%|████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 3.52MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████| 1.84M/1.84M [00:06<00:00, 269kB/s]
model-00003-of-00003.safetensors: 100%|█████████████████████████████| 6.18G/6.18G [52:08<00:00, 1.98MB/s]
pytorch_model-00003-of-00003.bin: 100%|█████████████████████████████| 6.18G/6.18G [52:23<00:00, 1.97MB/s]
pytorch_model-00002-of-00003.bin: 100%|███████████████████████████| 9.90G/9.90G [1:12:28<00:00, 2.28MB/s]
model-00002-of-00003.safetensors: 100%|███████████████████████████| 9.90G/9.90G [1:13:00<00:00, 2.26MB/s]
model-00001-of-00003.safetensors: 100%|███████████████████████████| 9.95G/9.95G [1:13:43<00:00, 2.25MB/s]
pytorch_model-00001-of-00003.bin: 100%|███████████████████████████| 9.95G/9.95G [1:13:56<00:00, 2.24MB/s]
Fetching 19 files: 100%|██████████████████████████████████████████████| 19/19 [1:13:57<00:00, 233.55s/it]
  0%|                                                                             | 0/43 [00:00<?, ?it/s]Loading shard 1/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.embed_tokens.safetensors
  2%|█▌                                                                   | 1/43 [00:29<20:38, 29.50s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.0.safetensors
  5%|███▏                                                                 | 2/43 [00:31<09:08, 13.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.1.safetensors
  7%|████▊                                                                | 3/43 [00:33<05:17,  7.94s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.2.safetensors
  9%|██████▍                                                              | 4/43 [00:34<03:29,  5.38s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.3.safetensors
 12%|████████                                                             | 5/43 [00:36<02:32,  4.02s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.4.safetensors
 14%|█████████▋                                                           | 6/43 [00:37<01:58,  3.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.5.safetensors
 16%|███████████▏                                                         | 7/43 [00:39<01:37,  2.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.6.safetensors
 19%|████████████▊                                                        | 8/43 [00:41<01:23,  2.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.7.safetensors
 21%|██████████████▍                                                      | 9/43 [00:42<01:12,  2.13s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.8.safetensors
 23%|███████████████▊                                                    | 10/43 [00:44<01:04,  1.95s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.9.safetensors
 26%|█████████████████▍                                                  | 11/43 [00:45<00:57,  1.80s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.10.safetensors
 28%|██████████████████▉                                                 | 12/43 [00:47<00:54,  1.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.11.safetensors
 30%|████████████████████▌                                               | 13/43 [00:48<00:51,  1.71s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.12.safetensors
 33%|██████████████████████▏                                             | 14/43 [00:50<00:50,  1.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.13.safetensors
 35%|███████████████████████▋                                            | 15/43 [00:52<00:48,  1.74s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.14.safetensors
 37%|█████████████████████████▎                                          | 16/43 [00:54<00:45,  1.69s/it]Loading shard 2/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.15.safetensors
 40%|██████████████████████████▉                                         | 17/43 [01:45<07:15, 16.73s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.16.safetensors
 42%|████████████████████████████▍                                       | 18/43 [01:47<05:06, 12.25s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.17.safetensors
 44%|██████████████████████████████                                      | 19/43 [01:49<03:37,  9.05s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.18.safetensors
 47%|███████████████████████████████▋                                    | 20/43 [01:50<02:36,  6.82s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.19.safetensors
 49%|█████████████████████████████████▏                                  | 21/43 [01:52<01:54,  5.21s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.20.safetensors
 51%|██████████████████████████████████▊                                 | 22/43 [01:54<01:27,  4.16s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.21.safetensors
 53%|████████████████████████████████████▎                               | 23/43 [01:55<01:07,  3.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.22.safetensors
 56%|█████████████████████████████████████▉                              | 24/43 [01:57<00:53,  2.84s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.23.safetensors
 58%|███████████████████████████████████████▌                            | 25/43 [01:58<00:44,  2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.24.safetensors
 60%|█████████████████████████████████████████                           | 26/43 [02:00<00:37,  2.20s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.25.safetensors
 63%|██████████████████████████████████████████▋                         | 27/43 [02:03<00:39,  2.46s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.26.safetensors
 65%|████████████████████████████████████████████▎                       | 28/43 [02:05<00:33,  2.22s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.27.safetensors
 67%|█████████████████████████████████████████████▊                      | 29/43 [02:06<00:29,  2.09s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.28.safetensors
 70%|███████████████████████████████████████████████▍                    | 30/43 [02:11<00:35,  2.72s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.29.safetensors
 72%|█████████████████████████████████████████████████                   | 31/43 [02:29<01:30,  7.54s/it]Loading shard 3/3
saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.30.safetensors
 74%|██████████████████████████████████████████████████▌                 | 32/43 [03:23<03:55, 21.39s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.31.safetensors
 77%|████████████████████████████████████████████████████▏               | 33/43 [03:25<02:34, 15.47s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.32.safetensors
 79%|█████████████████████████████████████████████████████▊              | 34/43 [03:26<01:41, 11.30s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.33.safetensors
 81%|███████████████████████████████████████████████████████▎            | 35/43 [03:28<01:06,  8.37s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.34.safetensors
 84%|████████████████████████████████████████████████████████▉           | 36/43 [03:29<00:44,  6.36s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.35.safetensors
 86%|██████████████████████████████████████████████████████████▌         | 37/43 [03:31<00:29,  4.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.36.safetensors
 88%|████████████████████████████████████████████████████████████        | 38/43 [03:33<00:19,  3.92s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.37.safetensors
 91%|█████████████████████████████████████████████████████████████▋      | 39/43 [03:35<00:13,  3.49s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.38.safetensors
 93%|███████████████████████████████████████████████████████████████▎    | 40/43 [03:39<00:11,  3.75s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.layers.39.safetensors
 95%|████████████████████████████████████████████████████████████████▊   | 41/43 [03:41<00:06,  3.14s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/model.norm.safetensors
 98%|██████████████████████████████████████████████████████████████████▍ | 42/43 [03:41<00:02,  2.23s/it]saved as: /home/administrator/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496/splitted_model/lm_head.safetensors
100%|████████████████████████████████████████████████████████████████████| 43/43 [03:42<00:00,  5.18s/it]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.

cuda:0:   2%|█▍                                                           | 1/43 [00:04<03:02,  4.34s/it]cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [01:55<00:00,  2.69s/it]
returning kvcache size: torch.Size([1, 40, 9, 128])
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
cuda:0: 100%|████████████████████████████████████████████████████████████| 43/43 [00:24<00:00,  1.73it/s]
returning kvcache size: torch.Size([1, 40, 10, 128])
<s> What is the capital of United States?
The

^^^^ The output above is the question followed by "The", and then the program exits.

Do you know why this may be happening? Please let me know if you need more information.

Doesn't work on Apple M1/M2: AssertionError: Torch not compiled with CUDA enabled.

Problem

Hello, I just installed AirLLM as mentioned in the Medium post.

Environment

  • MacBook M2
  • MacOS Ventura 13.5.2
  • Python 3.11.4

How to reproduce

pip install airllm

inference.py

from airllm import AirLLMLlama2

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")

# or use model's local path...
#model = AirLLMLlama2("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

input_text = [
        'What is the capital of United States?',
    ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)
           
generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

Result

AssertionError: Torch not compiled with CUDA enabled
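PyTorch builds for macOS have no CUDA support, which is what this assertion reports. The script calls `.cuda()` on the input ids, although the logs below do not show exactly which call raised the error. A minimal sketch of picking a device instead of hard-coding CUDA; `pick_device` is a hypothetical helper that takes plain booleans so it runs without PyTorch.

```python
def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's Metal (MPS) backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device(cuda_available=False, mps_available=True))  # mps, as on an M1/M2 Mac
```

With PyTorch, the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the script would move inputs with `input_tokens['input_ids'].to(device)` instead of `.cuda()`. This only removes the script's own CUDA call; whether AirLLM's internals run on MPS at the installed version is not established by this thread.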

Logs

Here is the result:

(base) andrey@m2 current % python ./inference.py
Downloading README.md: 100%|███████████████████████████████████████████████████████████████████| 5.15k/5.15k [00:00<00:00, 10.6MB/s]
Downloading Best_Platty_small.jpeg: 100%|██████████████████████████████████████████████████████| 7.35k/7.35k [00:00<00:00, 23.3MB/s]
Downloading generation_config.json: 100%|██████████████████████████████████████████████████████████| 154/154 [00:00<00:00, 2.27MB/s]
Downloading config.json: 100%|█████████████████████████████████████████████████████████████████████| 632/632 [00:00<00:00, 11.5MB/s]
Downloading .gitattributes: 100%|██████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 29.1MB/s]
Downloading (…)l-00006-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [22:08<00:00, 7.38MB/s]
Downloading (…)l-00007-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [22:42<00:00, 7.31MB/s]
Downloading (…)l-00001-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.85G/9.85G [23:20<00:00, 7.04MB/s]
Downloading (…)l-00002-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:15<00:00, 5.78MB/s]
Downloading (…)l-00008-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [28:54<00:00, 5.65MB/s]
Downloading (…)l-00003-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.97G/9.97G [29:04<00:00, 5.71MB/s]
Downloading (…)l-00004-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [30:55<00:00, 5.28MB/s]
Downloading (…)l-00005-of-00015.bin: 100%|█████████████████████████████████████████████████████| 9.80G/9.80G [31:52<00:00, 5.12MB/s]
Downloading (…)model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.7k/66.7k [00:00<00:00, 468kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 3.03MB/s]
Downloading tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.56MB/s]
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.15MB/s]
Downloading tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 698/698 [00:00<00:00, 3.92MB/s]
Downloading (…)l-00015-of-00015.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524M/524M [01:27<00:00, 6.02MB/s]
Downloading (…)l-00009-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:29<00:00, 8.38MB/s]
Downloading (…)l-00010-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:19<00:00, 8.45MB/s]
Downloading (…)l-00011-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.97G/9.97G [21:02<00:00, 7.90MB/s]
Downloading (…)l-00014-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.50G/9.50G [18:10<00:00, 8.71MB/s]
Downloading (…)l-00012-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [19:16<00:00, 8.47MB/s]
Downloading (…)l-00013-of-00015.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.80G/9.80G [18:41<00:00, 8.74MB/s]
Fetching 25 files: 100%|███████████████████████████████████████████████████████████████████████████| 25/25 [47:39<00:00, 114.38s/it]
  0%|                                                                                                                                                               | 0/83 [00:00<?, ?it/s]Loading shard 1/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.embed_tokens.safetensors
  1%|█▊                                                                                                                                                     | 1/83 [00:01<02:36,  1.91s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.0.safetensors
  2%|███▋                                                                                                                                                   | 2/83 [00:02<01:23,  1.03s/it]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.1.safetensors
  4%|█████▍                                                                                                                                                 | 3/83 [00:02<00:58,  1.36it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.2.safetensors
  5%|███████▎                                                                                                                                               | 4/83 [00:03<00:47,  1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.3.safetensors
  6%|█████████                                                                                                                                              | 5/83 [00:03<00:43,  1.78it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.4.safetensors
  7%|██████████▉                                                                                                                                            | 6/83 [00:03<00:38,  1.99it/s]Loading shard 2/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.5.safetensors
  8%|████████████▋                                                                                                                                          | 7/83 [00:05<01:15,  1.01it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.6.safetensors
 10%|██████████████▌                                                                                                                                        | 8/83 [00:06<00:59,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.7.safetensors
 11%|████████████████▎                                                                                                                                      | 9/83 [00:06<00:49,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.8.safetensors
 12%|██████████████████                                                                                                                                    | 10/83 [00:07<00:42,  1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.9.safetensors
 13%|███████████████████▉                                                                                                                                  | 11/83 [00:07<00:38,  1.85it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.10.safetensors
 14%|█████████████████████▋                                                                                                                                | 12/83 [00:08<00:36,  1.96it/s]Loading shard 3/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.11.safetensors
 16%|███████████████████████▍                                                                                                                              | 13/83 [00:10<01:08,  1.02it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.12.safetensors
 17%|█████████████████████████▎                                                                                                                            | 14/83 [00:10<00:55,  1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.13.safetensors
 18%|███████████████████████████                                                                                                                           | 15/83 [00:10<00:46,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.14.safetensors
 19%|████████████████████████████▉                                                                                                                         | 16/83 [00:11<00:39,  1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.15.safetensors
 20%|██████████████████████████████▋                                                                                                                       | 17/83 [00:11<00:36,  1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.16.safetensors
 22%|████████████████████████████████▌                                                                                                                     | 18/83 [00:12<00:32,  2.00it/s]Loading shard 4/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.17.safetensors
 23%|██████████████████████████████████▎                                                                                                                   | 19/83 [00:13<00:58,  1.09it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.18.safetensors
 24%|████████████████████████████████████▏                                                                                                                 | 20/83 [00:14<00:49,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.19.safetensors
 25%|█████████████████████████████████████▉                                                                                                                | 21/83 [00:14<00:42,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.20.safetensors
 27%|███████████████████████████████████████▊                                                                                                              | 22/83 [00:15<00:36,  1.68it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.21.safetensors
 28%|█████████████████████████████████████████▌                                                                                                            | 23/83 [00:15<00:31,  1.88it/s]Loading shard 5/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.22.safetensors
 29%|███████████████████████████████████████████▎                                                                                                          | 24/83 [00:17<00:56,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.23.safetensors
 30%|█████████████████████████████████████████████▏                                                                                                        | 25/83 [00:18<00:46,  1.25it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.24.safetensors
 31%|██████████████████████████████████████████████▉                                                                                                       | 26/83 [00:18<00:38,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.25.safetensors
 33%|████████████████████████████████████████████████▊                                                                                                     | 27/83 [00:18<00:34,  1.62it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.26.safetensors
 34%|██████████████████████████████████████████████████▌                                                                                                   | 28/83 [00:19<00:30,  1.82it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.27.safetensors
 35%|████████████████████████████████████████████████████▍                                                                                                 | 29/83 [00:19<00:27,  1.94it/s]Loading shard 6/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.28.safetensors
 36%|██████████████████████████████████████████████████████▏                                                                                               | 30/83 [00:21<00:50,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.29.safetensors
 37%|████████████████████████████████████████████████████████                                                                                              | 31/83 [00:22<00:41,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.30.safetensors
 39%|█████████████████████████████████████████████████████████▊                                                                                            | 32/83 [00:22<00:34,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.31.safetensors
 40%|███████████████████████████████████████████████████████████▋                                                                                          | 33/83 [00:22<00:30,  1.66it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.32.safetensors
 41%|█████████████████████████████████████████████████████████████▍                                                                                        | 34/83 [00:23<00:26,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.33.safetensors
 42%|███████████████████████████████████████████████████████████████▎                                                                                      | 35/83 [00:23<00:24,  1.98it/s]Loading shard 7/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.34.safetensors
 43%|█████████████████████████████████████████████████████████████████                                                                                     | 36/83 [00:25<00:45,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.35.safetensors
 45%|██████████████████████████████████████████████████████████████████▊                                                                                   | 37/83 [00:26<00:37,  1.24it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.36.safetensors
 46%|████████████████████████████████████████████████████████████████████▋                                                                                 | 38/83 [00:26<00:30,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.37.safetensors
 47%|██████████████████████████████████████████████████████████████████████▍                                                                               | 39/83 [00:27<00:25,  1.69it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.38.safetensors
 48%|████████████████████████████████████████████████████████████████████████▎                                                                             | 40/83 [00:27<00:23,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.39.safetensors
 49%|██████████████████████████████████████████████████████████████████████████                                                                            | 41/83 [00:27<00:20,  2.04it/s]Loading shard 8/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.40.safetensors
 51%|███████████████████████████████████████████████████████████████████████████▉                                                                          | 42/83 [00:29<00:38,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.41.safetensors
 52%|█████████████████████████████████████████████████████████████████████████████▋                                                                        | 43/83 [00:30<00:31,  1.29it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.42.safetensors
 53%|███████████████████████████████████████████████████████████████████████████████▌                                                                      | 44/83 [00:30<00:25,  1.52it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.43.safetensors
 54%|█████████████████████████████████████████████████████████████████████████████████▎                                                                    | 45/83 [00:30<00:21,  1.74it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.44.safetensors
 55%|███████████████████████████████████████████████████████████████████████████████████▏                                                                  | 46/83 [00:31<00:19,  1.86it/s]Loading shard 9/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.45.safetensors
 57%|████████████████████████████████████████████████████████████████████████████████████▉                                                                 | 47/83 [00:33<00:34,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.46.safetensors
 58%|██████████████████████████████████████████████████████████████████████████████████████▋                                                               | 48/83 [00:33<00:27,  1.28it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.47.safetensors
 59%|████████████████████████████████████████████████████████████████████████████████████████▌                                                             | 49/83 [00:34<00:22,  1.51it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.48.safetensors
 60%|██████████████████████████████████████████████████████████████████████████████████████████▎                                                           | 50/83 [00:34<00:20,  1.65it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.49.safetensors
 61%|████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 51/83 [00:35<00:17,  1.81it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.50.safetensors
 63%|█████████████████████████████████████████████████████████████████████████████████████████████▉                                                        | 52/83 [00:35<00:15,  1.99it/s]Loading shard 10/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.51.safetensors
 64%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                      | 53/83 [00:37<00:28,  1.06it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.52.safetensors
 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▌                                                    | 54/83 [00:37<00:22,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.53.safetensors
 66%|███████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 55/83 [00:38<00:18,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.54.safetensors
 67%|█████████████████████████████████████████████████████████████████████████████████████████████████████▏                                                | 56/83 [00:38<00:15,  1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.55.safetensors
 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                               | 57/83 [00:38<00:13,  1.91it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.56.safetensors
 70%|████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                             | 58/83 [00:39<00:12,  2.07it/s]Loading shard 11/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.57.safetensors
 71%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                           | 59/83 [00:41<00:22,  1.07it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.58.safetensors
 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                         | 60/83 [00:41<00:18,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.59.safetensors
 73%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                       | 61/83 [00:42<00:14,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.60.safetensors
 75%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                      | 62/83 [00:42<00:12,  1.71it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.61.safetensors
 76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                    | 63/83 [00:42<00:10,  1.88it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.62.safetensors
 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                  | 64/83 [00:43<00:09,  2.05it/s]Loading shard 12/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.63.safetensors
 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                | 65/83 [00:45<00:17,  1.05it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.64.safetensors
 80%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                              | 66/83 [00:45<00:13,  1.27it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.65.safetensors
 81%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                             | 67/83 [00:46<00:10,  1.50it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.66.safetensors
 82%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                           | 68/83 [00:46<00:08,  1.72it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.67.safetensors
 83%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                         | 69/83 [00:47<00:07,  1.84it/s]Loading shard 13/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.68.safetensors
 84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                       | 70/83 [00:48<00:12,  1.04it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.69.safetensors
 86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                     | 71/83 [00:49<00:09,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.70.safetensors
 87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 72/83 [00:49<00:07,  1.49it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.71.safetensors
 88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                  | 73/83 [00:50<00:05,  1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.72.safetensors
 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                | 74/83 [00:50<00:04,  1.86it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.73.safetensors
 90%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌              | 75/83 [00:50<00:03,  2.03it/s]Loading shard 14/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.74.safetensors
 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 76/83 [00:52<00:06,  1.08it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.75.safetensors
 93%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏          | 77/83 [00:53<00:04,  1.26it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.76.safetensors
 94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉         | 78/83 [00:53<00:03,  1.47it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.77.safetensors
 95%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊       | 79/83 [00:54<00:02,  1.67it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.78.safetensors
 96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌     | 80/83 [00:54<00:01,  1.83it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.layers.79.safetensors
 98%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍   | 81/83 [00:55<00:00,  2.00it/s]saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/model.norm.safetensors
Loading shard 15/15
saved as: /Users/andrey/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f/splitted_model/lm_head.safetensors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:55<00:00,  1.50it/s]
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
Traceback (most recent call last):
  File "/Users/andrey/air_llm/current/./inference.py", line 5, in <module>
    model = AirLLMLlama2("garage-bAInd/Platypus2-70B-instruct")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 184, in __init__
    self.init_model()
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/airllm/airllm.py", line 205, in init_model
    set_module_tensor_to_device(self.model, buffer_name, self.running_device, value=buffer,
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
                ^^^^^^^^^^^^^^^^
  File "/Users/andrey/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
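The traceback above is raised when a tensor is moved to a CUDA device on a machine (here a Mac) whose PyTorch build has no CUDA support. The sketch below picks the best available backend instead of hard-coding CUDA. It assumes you can pass the chosen device string to whatever loads the model. Whether your installed AirLLM version exposes such a parameter is not confirmed here. `pick_device` and `detect_device` are hypothetical helpers. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a torch device string given which backends are usable.

    Prefers CUDA, then Apple's Metal (MPS) backend, then falls back to CPU.
    """
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query the installed PyTorch build.

    torch is imported lazily, so pick_device can be used without torch installed.
    """
    import torch

    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)
```

On a CPU-only or Apple Silicon build, `detect_device()` returns "cpu" or "mps" rather than "cuda:0". Running a 70B model layer by layer on those backends may be very slow.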

Fine-tuning with QLoRA on other base models

Thank you for your work! Do you have any experience fine-tuning other base models with QLoRA? I'm not sure whether most of the base models on Hugging Face would perform well, especially for Chinese.

About training time

Would you be willing to share the average time per step on H100 / A100, and how long the 10,000 steps took in total?

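Question about training parameters

The RLHF training runs for only 100 steps. Could that be too few passes for the model to learn the data well? Would more steps be better?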

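Inference with the README example fails

While trying to run inference with the example in the README, I made a few changes based on the error messages: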

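import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base_model = "timdettmers/guanaco-33b-merged"  # as in the README example

# base model
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_state_dict=True,
    offload_folder="./offload",  # added per the Hugging Face docs
)

# LoRA PEFT adapters
adapter_model = "lyogavin/Anima33B"

model = PeftModel.from_pretrained(
    model,
    adapter_model,
    # torch_dtype=torch.float16,
    device_map={"": 0},  # everything on GPU 0; other settings ran out of memory
)
model.eval()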

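In the LlamaForCausalLM call, I added the offload_folder argument (following the Hugging Face documentation);
in the PeftModel call, device_map has to be set to the CUDA device (i.e. 0), otherwise it runs out of memory (OOM);
but even with these changes it still does not run: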

accelerate/utils/modeling.py:154
ValueError: weight is on the meta device, we need a `value` to put in on 0.

Is this a problem with the device_map argument, and if so, how should I adjust it?
I have already searched and found no similar issue.

My setup:

  • Hardware: 7700X / 4070 Ti / 32 GB DDR5
  • OS: WSL2, Ubuntu 22.04
  • Python 3.10.10
  • torch 2.0.1 with CUDA 11.8
  • transformers 4.31.0.dev0
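With `device_map="auto"`, accelerate places each module on a GPU, the CPU, or disk. Disk-offloaded weights stay on the meta device until they are needed. A 33B model in float16 is roughly 66 GB of weights, far more than a 4070 Ti's 12 GB plus 32 GB of system RAM. Much of it would therefore be expected to land on disk in this setup. One way to inspect the placement is the `hf_device_map` attribute that transformers sets on models loaded with a `device_map`. The sketch below assumes that attribute maps module names to a GPU index, "cpu" or "disk". `summarize_device_map` is a hypothetical helper, and the example map is made up.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many top-level entries were assigned to each device."""
    return dict(Counter(str(device) for device in device_map.values()))


# Made-up example in the shape of model.hf_device_map for a model
# that did not fit on the GPU.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",
    "model.layers.2": "disk",
    "model.norm": "disk",
    "lm_head": "disk",
}
print(summarize_device_map(example_map))
```

On a real model, the call would be `summarize_device_map(model.hf_device_map)` right after `from_pretrained`. Any "disk" entries are the weights that live on the meta device between uses.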

Typo in air_llm setup.py

Great work you are doing here!

   install_requires=[  # I get to this in a second
        'tqdm',
        'torch',
        'transformers',
        'accelerate',
        'safetensors',
        'optimum',
        'huggingface_hub'
        'scipy',
        #'bitsandbytes' set it to optional to support fallback when not installable
    ],

Should be (note the missing comma):

        'huggingface_hub',

Cheers,
Volker
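Why the missing comma matters: Python concatenates adjacent string literals. As a result, the list silently gets one malformed entry instead of two valid ones, and pip is asked for a package named `huggingface_hubscipy`. The snippet below reproduces this with the same entries as the excerpt, with the bitsandbytes comment omitted.

```python
# Same entries as the setup.py excerpt, with the missing comma reproduced.
broken = [
    'tqdm',
    'torch',
    'transformers',
    'accelerate',
    'safetensors',
    'optimum',
    'huggingface_hub'  # no comma: merged with the next literal at compile time
    'scipy',
]

fixed = [
    'tqdm',
    'torch',
    'transformers',
    'accelerate',
    'safetensors',
    'optimum',
    'huggingface_hub',
    'scipy',
]

print(len(broken), broken[-1])  # 7 huggingface_hubscipy
print(len(fixed), fixed[-2:])   # 8 ['huggingface_hub', 'scipy']
```

Linters such as pylint (its implicit-str-concat check) can flag this pattern before it reaches a release.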

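Can DeepSpeed be used?

DeepSpeed can lower the GPU memory needed for full-parameter fine-tuning, but applying it here seems to introduce some bugs. For example:
with ZeRO-3, the reference_model cannot produce logits normally;
with ZeRO-2, gradients for some parameters are computed more than once.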
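For context on why ZeRO helps full-parameter fine-tuning, the ZeRO paper's accounting for mixed-precision Adam uses 16 bytes per parameter:

* 2 bytes for fp16 weights
* 2 bytes for fp16 gradients
* 12 bytes for fp32 optimizer states (master weights, momentum, variance)

Stage 1 partitions the optimizer states across N GPUs, stage 2 also partitions the gradients, and stage 3 also partitions the weights. The sketch below applies that model-state formula only. It ignores activations, temporary buffers and fragmentation, and it does not explain the reference-model or duplicate-gradient problems described above. The 8-GPU node is a hypothetical example.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU memory (decimal GB) for model states under ZeRO.

    Mixed-precision Adam accounting: 2 B weights + 2 B grads + 12 B optimizer
    states per parameter. Activations and buffers are not included.
    """
    weights, grads, optim = 2 * num_params, 2 * num_params, 12 * num_params
    n = num_gpus
    if stage == 0:    # plain data parallelism: everything replicated
        total = weights + grads + optim
    elif stage == 1:  # optimizer states partitioned
        total = weights + grads + optim / n
    elif stage == 2:  # optimizer states and gradients partitioned
        total = weights + (grads + optim) / n
    elif stage == 3:  # optimizer states, gradients and weights partitioned
        total = (weights + grads + optim) / n
    else:
        raise ValueError("stage must be 0, 1, 2 or 3")
    return total / 1e9


# A nominal 33B-parameter model on a hypothetical 8-GPU node.
for s in range(4):
    print(f"ZeRO stage {s}: {zero_model_state_gb(33e9, 8, s):.1f} GB per GPU")
```

For 33B parameters this gives 528 GB per GPU without partitioning and 66 GB per GPU with stage 3 on 8 GPUs, before activations are counted.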

The model precision used for training with qlora

Hey @lyogavin, thanks a lot for sharing this awesome work!
I've got a silly question. When training a model with QLoRA, can the base model be loaded either unquantized or quantized? Both seem to train fine, but loading the unquantized base model appears to train a bit faster. I was wondering whether there are any differences between models trained in these two modes.
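One concrete difference between the two modes is the memory held by the frozen base weights. The back-of-the-envelope sketch below counts only the weights themselves. It ignores quantization constants, the LoRA adapters, optimizer state and activations. With 4-bit loading, the weights must also be dequantized to the compute dtype during each forward and backward pass. That is one plausible reason the unquantized run is faster. Whether the resulting adapters differ in quality would need to be measured.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory (decimal GB) for the base-model weights alone."""
    return num_params * bits_per_param / 8 / 1e9


# A nominal 33B-parameter base model, as used for Anima 33B.
fp16_gb = weight_memory_gb(33e9, 16)  # unquantized float16 / bfloat16
nf4_gb = weight_memory_gb(33e9, 4)    # 4-bit (e.g. NF4) quantized
print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```

This prints 66.0 GB for 16-bit weights versus 16.5 GB for 4-bit. Only the 4-bit variant fits a 33B model on a single 24 GB or 48 GB card.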

Error when reproducing the training

++ echo 'START TIME: Mon Jun 19 19:22:10 CST 2023'
START TIME: Mon Jun 19 19:22:10 CST 2023
++ ROOT_DIR_BASE=/Anima/saved_models/qlora_cn
++ OUTPUT_PATH=/Anima/saved_models/qlora_cn/output_1687173730
++ mkdir -p /Anima/saved_models/qlora_cn/output_1687173730
++ python qlora.py --dataset=chinese-vicuna --dataset_format=alpaca-clean --learning_rate 0.0001 --per_device_train_batch_size 1 --gradient_accumulation_steps 16 --max_steps 10000 --model_name_or_path timdettmers/guanaco-33b-merged --source_max_len 512 --target_max_len
512 --eval_dataset_size 1 --do_eval --evaluation_strategy steps --eval_steps 200 --output_dir /Anima/saved_models/qlora_cn/output_1687173730 --report_to wandb --sample_generate --save_steps 200
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /root/miniconda3/envs/anima did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lcoal/cuda/lib64')}
warn(msg)
/root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:148: UserWarning: /usr/lcoal/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /root/miniconda3/envs/anima/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
./run_Amina_training.sh: line 48: 36300 Segmentation fault (core dumped) python qlora.py --dataset="chinese-vicuna" --dataset_format="alpaca-clean" #alpaca-clean has similar format to chinese training dataset --learning_rate 0.0001 # QLoRA paper appendix B Table 9 --per_device_train_batch_size 1 # fix for fitting mem --gradient_accumulation_steps 16 # QLoRA paper appendix B Table 9 --max_steps 10000 # QLoRA paper appendix B Table 9, follow paper setting even though cn data is 690k much bigger than OASST1 9k, batch size considering accum --model_name_or_path "timdettmers/guanaco-33b-merged" --source_max_len 512 # default setting in code, cn model 2048 too long --target_max_len 512 # follow QLoRA paper appendix B Table 9 --eval_dataset_size 1 # mainly for testing, no need to be big --do_eval --evaluation_strategy "steps" --eval_steps 200 # 10 for debug mode only, 200 for training --output_dir $OUTPUT_PATH --report_to 'wandb' --sample_generate # test sample generation every once a while --save_steps 200 # 20 for debug mode only, 200 for training
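The flags in this command also determine how much data the run sees. The arithmetic below assumes a single GPU and takes the script comment's figure of roughly 690k Chinese training examples at face value. `training_coverage` is a hypothetical helper.

```python
def training_coverage(max_steps: int, per_device_batch: int,
                      grad_accum: int, num_gpus: int, dataset_size: int):
    """Return (effective batch size, examples seen, fraction of an epoch)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    examples_seen = effective_batch * max_steps
    return effective_batch, examples_seen, examples_seen / dataset_size


# Values from the qlora.py command above; num_gpus=1 is an assumption.
batch, seen, epochs = training_coverage(
    max_steps=10_000, per_device_batch=1, grad_accum=16,
    num_gpus=1, dataset_size=690_000)
print(batch, seen, round(epochs, 3))  # 16 160000 0.232
```

Under these assumptions, 10,000 steps cover about 160k examples, roughly a quarter of one epoch of the Chinese data.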

TypeError: not a string

Error when running the inference code

CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda121.so...

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /srv/chat/Anima/infer.py:9 in <module>                                       │
│                                                                              │
│    6                                                                         │
│    7 # create tokenizer                                                      │
│    8 base_model = "timdettmers/guanaco-33b-merged"                           │
│ ❱  9 tokenizer = LlamaTokenizer.from_pretrained(base_model)                  │
│   10                                                                         │
│   11 # base model                                                            │
│   12 model = LlamaForCausalLM.from_pretrained(                               │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1825 in from_pretrained                                                  │
│                                                                              │
│   1822 │   │   │   else:                                                     │
│   1823 │   │   │   │   logger.info(f"loading file {file_path} from cache at  │
│   1824 │   │                                                                 │
│ ❱ 1825 │   │   return cls._from_pretrained(                                  │
│   1826 │   │   │   resolved_vocab_files,                                     │
│   1827 │   │   │   pretrained_model_name_or_path,                            │
│   1828 │   │   │   init_configuration,                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base │
│ .py:1988 in _from_pretrained                                                 │
│                                                                              │
│   1985 │   │                                                                 │
│   1986 │   │   # Instantiate tokenizer.                                      │
│   1987 │   │   try:                                                          │
│ ❱ 1988 │   │   │   tokenizer = cls(*init_inputs, **init_kwargs)              │
│   1989 │   │   except OSError:                                               │
│   1990 │   │   │   raise OSError(                                            │
│   1991 │   │   │   │   "Unable to load vocabulary from file. "
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenizati │
│ on_llama.py:96 in __init__                                                   │
│                                                                              │
│    93 │   │   self.add_bos_token = add_bos_token                             │
│    94 │   │   self.add_eos_token = add_eos_token                             │
│    95 │   │   self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwa │
│ ❱  96 │   │   self.sp_model.Load(vocab_file)                                 │
│    97 │                                                                      │
│    98 │   def __getstate__(self):                                            │
│    99 │   │   state = self.__dict__.copy()                                   │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:905 in     │
│ Load                                                                         │
│                                                                              │
│    902 │   │   raise RuntimeError('model_file and model_proto must be exclus │
│    903 │     if model_proto:                                                 │
│    904 │   │   return self.LoadFromSerializedProto(model_proto)              │
│ ❱  905 │     return self.LoadFromFile(model_file)                            │
│    906                                                                       │
│    907                                                                       │
│    908 # Register SentencePieceProcessor in _sentencepiece:                  │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py:310 in     │
│ LoadFromFile                                                                 │
│                                                                              │
│    307 │   │   return _sentencepiece.SentencePieceProcessor_serialized_model │
│    308 │                                                                     │
│    309 │   def LoadFromFile(self, arg):                                      │
│ ❱  310 │   │   return _sentencepiece.SentencePieceProcessor_LoadFromFile(sel │
│    311 │                                                                     │
│    312 │   def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha,  │
│    313 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsIds(sel │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: not a string

What is causing this?
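The last frame shows sentencepiece's LoadFromFile receiving something that is not a path string. A common cause of this exact error is that `vocab_file` resolved to None because the `tokenizer.model` SentencePiece file could not be found, for example after an incomplete download. This is not confirmed for this report. The sketch below is a pre-flight check on a local model directory. `missing_tokenizer_files` is a hypothetical helper. `tokenizer.model` is the file name LlamaTokenizer loads its vocabulary from.

```python
import tempfile
from pathlib import Path


def missing_tokenizer_files(model_dir, required=("tokenizer.model",)):
    """Return the required tokenizer files absent from a local model directory."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Demo on a throwaway directory standing in for a downloaded snapshot.
with tempfile.TemporaryDirectory() as demo_dir:
    before = missing_tokenizer_files(demo_dir)  # ['tokenizer.model']
    (Path(demo_dir) / "tokenizer.model").touch()
    after = missing_tokenizer_files(demo_dir)   # []
```

Pointing the check at the model's snapshot directory in the Hugging Face cache shows whether the vocabulary file is actually present before `LlamaTokenizer.from_pretrained` is called.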

More examples

Hello,

A Medium article says that four different techniques are used to optimize the model:

  1. Layer-wise inference
  2. Single-layer optimization (Flash Attention)
  3. Model file sharding
  4. Meta device

Are there examples showing each of these methods?
Thanks.
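The toy sketch below illustrates two of those ideas, model file sharding and layer-wise inference, using only the standard library. Each "layer" is just an elementwise scale and shift, standing in for a transformer block. Each layer is saved to its own file, similar to the per-layer `model.layers.N.safetensors` files in the AirLLM log at the top of this page. The forward pass then loads one layer at a time and drops it before loading the next. This illustrates the idea and is not AirLLM's implementation. JSON stands in for safetensors, and the layer format and helper names are hypothetical. Flash Attention and the meta device are not covered.

```python
import json
import tempfile
from pathlib import Path


def apply_layer(layer, x):
    """Toy 'layer': elementwise y = scale * x + shift."""
    return [s * v + b for s, v, b in zip(layer["scale"], x, layer["shift"])]


def save_sharded(layers, directory):
    """Model file sharding: write each layer to its own file."""
    paths = []
    for i, layer in enumerate(layers):
        path = Path(directory) / f"layer.{i}.json"
        path.write_text(json.dumps(layer))
        paths.append(path)
    return paths


def layerwise_forward(paths, x):
    """Layer-wise inference: only one layer's weights are loaded at a time."""
    for path in paths:
        layer = json.loads(path.read_text())  # load this layer from disk
        x = apply_layer(layer, x)
        del layer                             # drop it before loading the next
    return x


layers = [
    {"scale": [1.0, 2.0], "shift": [0.5, -1.0]},
    {"scale": [3.0, 0.5], "shift": [0.0, 2.0]},
]
x0 = [1.0, 4.0]

# Reference: all layers held in memory at once.
full = x0
for layer in layers:
    full = apply_layer(layer, full)

with tempfile.TemporaryDirectory() as d:
    streamed = layerwise_forward(save_sharded(layers, d), x0)

print(full, streamed)  # [4.5, 5.5] [4.5, 5.5]
```

The two results match. Peak memory, however, now scales with one layer rather than the whole model, at the cost of reading every layer from disk on each forward pass.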
