
kohya-trainer's Introduction

Hi 👋! I'm Linaqruf


kohya-trainer's People

Contributors

electricadev, freecoderai01, huytd2k, kidel, linaqruf, qaneel, uservar


kohya-trainer's Issues

Is this just for anime?

Since I can't find any info on how to run the notebook, I made it to step 3.3, and all the anime-specific stuff lost me. Zip this or that, etc...

Seriously, this needs an instructional video or at least some documentation, unless this is just for anime, in which case I have no need or want of it.

How to resume LoRA training?

I'd like to resume training the LoRA, but I must be missing something: I'm not sure where to load the LoRA, and whether there is any cell I need to tick to confirm that I'm resuming.
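For reference, a minimal sketch of one way to continue a LoRA, assuming this version of train_network.py supports a --network_weights option for loading a previously trained LoRA file (all paths below are placeholders, not values from this issue):

# Sketch only: feed the earlier LoRA back into train_network.py so training
# continues from its weights. --network_weights is assumed to exist here;
# check `python train_network.py --help` in your copy of the repo.
cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path=/content/pre_trained_model/Anything-v3-pruned.ckpt",
    "--network_module=networks.lora",
    "--network_weights=/content/drive/MyDrive/last.safetensors",  # previous LoRA to resume from
    "--train_data_dir=/content/dreambooth/train_data",
    "--output_dir=/content/dreambooth/output",
]
print(" ".join(cmd))  # printed rather than run, since this is just a sketch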

Kohya colab not working properly?

I frequently use the Kohya Colab trainer; the last time I used it was probably a few days ago, January 28th, with great results.

I used it today and it no longer works. I set all the parameters and datasets like I always do, and it runs the "training", but the resulting .safetensors does absolutely nothing when tested: generations using the prompts it was trained on look nothing like the training data.

These are my settings (screenshot attached); am I doing something wrong?

I tried multiple times with different numbers of epochs ranging from 6 to 20, changing the dataset txt files to be more precise, changing the regularization steps, among other things, but no improvement. The resulting safetensors does nothing.

As a reference, the last time it worked the project was not a Dragalia Lost LoRA (I don't remember the project name when it worked fine; I think it was "hito-komoru"?). That's the only new thing I noticed when I tried to train a dataset today.

Need help! learning_rate

No matter how I try to change the values, my learning_rate does not change and remains "2e-06" in the metadata. I need 1e-4, and the same for the unet lr (btw, text_encoder_lr is "5e-05"). I don't understand at all what I'm doing wrong. I use kohya-LoRA-finetuner.ipynb.
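For reference, a sketch of passing the rates explicitly on the command line, which rules out the form value never reaching the script. --learning_rate and --text_encoder_lr appear in the launch commands elsewhere on this page; --unet_lr is assumed to be supported by this script version:

# Sketch (assumed flags; verify with `python train_network.py --help`):
lr_args = [
    "--learning_rate=1e-4",    # overall rate, used by the UNet if --unet_lr is absent
    "--unet_lr=1e-4",          # assumed per-module flag for the UNet
    "--text_encoder_lr=5e-5",  # flag shown in this repo's launch commands
]
print(" ".join(lr_args))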

problem with training wd 1.4

train_db.py: error: unrecognized arguments: --v2_parameterization
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_db.py', '--v2', '--v2_parameterization', '--pretrained_model_name_or_path=/content/pre_trained_model/waifu-diffusion-1-4-anime-e-1.ckpt', '--train_data_dir=/content/dreambooth/train_1girl', '--reg_data_dir=/content/dreambooth/reg_1girl', '--output_dir=/content/drive/MyDrive/dreambooth1', '--prior_loss_weight=1.0', '--resolution=768', '--save_precision', 'fp16', '--train_batch_size=4', '--learning_rate=2e-6', '--max_train_steps=15000', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--gradient_checkpointing', '--save_every_n_epochs=50', '--enable_bucket', '--cache_latents', '--save_model_as', 'ckpt', '--shuffle_caption', '--caption_extension', '=.txt', '--save_state', '--logging_dir=/content/kohya-trainer/logs', '--log_prefix', 'dreambooth-style-sunori']' returned non-zero exit status 2.
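The traceback shows the cause: train_db.py rejects --v2_parameterization, while the other launch commands on this page use --v_parameterization. A corrected flag pair for a v-prediction SD 2.x model such as WD 1.4 (a sketch, with the checkpoint path taken from the failing command) would look like:

base_cmd = [
    "accelerate", "launch", "train_db.py",
    "--pretrained_model_name_or_path=/content/pre_trained_model/waifu-diffusion-1-4-anime-e-1.ckpt",
]
base_cmd += ["--v2", "--v_parameterization"]  # note: no "2" in the second flag
print(" ".join(base_cmd))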

Running into an error on my first attempted use

Good evening, could you assist me with this error? Is it a Hugging Face issue?

File "/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py", line 454, in from_pretrained
config_dict = cls.load_config(
File "/usr/local/lib/python3.8/dist-packages/diffusers/configuration_utils.py", line 363, in load_config
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached files and it looks like is not the path to a directory containing a model_index.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--v2', '--v_parameterization', '--network_dim=128', '--network_alpha=128',
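A hypothetical pre-flight check, for what it's worth: this diffusers OSError usually means the value passed as --pretrained_model_name_or_path is neither a local diffusers folder (one containing model_index.json) nor a Hub repo id reachable over the network. The path below is a placeholder:

import os

model_path = "/content/pre_trained_model/my-model"  # placeholder

if os.path.isdir(model_path):
    has_index = os.path.isfile(os.path.join(model_path, "model_index.json"))
    print("local diffusers folder" if has_index else "folder exists but has no model_index.json")
else:
    print("not a local folder; it will be treated as a Hub repo id and needs internet access")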

Error when start training

full logs

2023-01-25 03:52:04.298041: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-25 03:52:05.003275: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 03:52:05.003421: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 03:52:05.003441: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-25 03:52:07.885109: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-25 03:52:08.539696: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 03:52:08.539797: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 03:52:08.539816: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
Downloading: 100% 961k/961k [00:00<00:00, 2.60MB/s]
Downloading: 100% 525k/525k [00:00<00:00, 1.70MB/s]
Downloading: 100% 389/389 [00:00<00:00, 543kB/s]
Downloading: 100% 905/905 [00:00<00:00, 1.14MB/s]
update token length: 225
Use DreamBooth method.
prepare train images.
found directory 5_masabodo illustration contains 139 image files
695 train images with repeating.
prepare reg images.
found directory 1_illustration contains 0 image files
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
loading image sizes.
100% 139/139 [00:00<00:00, 12514.67it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 832), count: 0
bucket 1: resolution (256, 896), count: 0
bucket 2: resolution (256, 960), count: 0
bucket 3: resolution (256, 1024), count: 0
bucket 4: resolution (320, 704), count: 5
bucket 5: resolution (320, 768), count: 0
bucket 6: resolution (384, 640), count: 345
bucket 7: resolution (448, 576), count: 150
bucket 8: resolution (512, 512), count: 160
bucket 9: resolution (576, 448), count: 35
bucket 10: resolution (640, 384), count: 0
bucket 11: resolution (704, 320), count: 0
bucket 12: resolution (768, 320), count: 0
bucket 13: resolution (832, 256), count: 0
bucket 14: resolution (896, 256), count: 0
bucket 15: resolution (960, 256), count: 0
bucket 16: resolution (1024, 256), count: 0
mean ar error (without repeats): 0.05949234465711338
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
Traceback (most recent call last):
  File "train_network.py", line 424, in <module>
    train(args)
  File "train_network.py", line 64, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
  File "/content/kohya-trainer/library/train_util.py", line 1222, in load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
  File "/content/kohya-trainer/library/model_util.py", line 869, in load_models_from_stable_diffusion_checkpoint
    _, state_dict = load_checkpoint_with_text_encoder_conversion(ckpt_path)
  File "/content/kohya-trainer/library/model_util.py", line 846, in load_checkpoint_with_text_encoder_conversion
    checkpoint = torch.load(ckpt_path, map_location="cpu")
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x09'.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--network_dim=128', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--pretrained_model_name_or_path=/content/pre_trained_model/Anything-v3-pruned.ckpt', '--vae=/content/vae/anime.vae.pt', '--caption_extension=.txt', '--train_data_dir=/content/dreambooth/train_illustration', '--reg_data_dir=/content/dreambooth/reg_illustration', '--output_dir=/content/dreambooth/output', '--enable_bucket', '--prior_loss_weight=1.0', '--output_name=masabodo', '--mixed_precision=fp16', '--save_precision=fp16', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--resolution=512', '--cache_latents', '--train_batch_size=1', '--max_token_length=225', '--use_8bit_adam', '--learning_rate=0.0001', '--max_train_epochs=5', '--gradient_accumulation_steps=1', '--clip_skip=2', '--logging_dir=/content/fine_tune/logs', '--log_prefix=masabodo', '--shuffle_caption', '--xformers']' returned non-zero exit status 1.
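A quick sanity check (a sketch, not part of the trainer): the "invalid load key" error typically means the .ckpt on disk is not a real torch checkpoint, for example an interrupted download, a saved HTML error page, or a Git LFS pointer file. Reading the first bytes usually tells which:

import os

ckpt = "/content/pre_trained_model/Anything-v3-pruned.ckpt"  # path from the log above
print("size (bytes):", os.path.getsize(ckpt))

with open(ckpt, "rb") as f:
    head = f.read(16)
print("first bytes:", head)
# b"PK..."                  -> zip-based torch checkpoint (expected for recent torch saves)
# b"<!DOCTYPE html" or b"<" -> an error page was downloaded instead of the model
# b"version https://git-lf" -> a Git LFS pointer file, not the real weights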

What am I doing wrong?

Last night I somehow made it work, but today I can't. The code seems to run (except for some TensorFlow errors), but when I use the LoRA in SD there's no change in the image. Maybe it's because yesterday I used 51 images for the LoRA and today I used 105?

I upload my folder with 105 images at 512x512 to my Drive, then I run all the cells up to part 3. I run cell 3.1, delete the train_data folder and put my train data folder (previously uploaded to my Drive) in the same path, skip cells 3.2, 3.3 and 4.1, untick BLIP captioning, and from there I run every cell through to 5.2. When 5.2 finishes I put the .safetensors in my Drive, and when I try the LoRA in SD there's no change in the image.

4.2. Data Annotation

4.3. Create JSON file for Finetuning

4.4. Aspect Ratio Bucketing and Cache Latents

5.2. Start Fine-Tuning

WD1.4 Tagger weights Not Found

Can't download weights for the tagger

/content/kohya-trainer
--2022-11-27 11:36:47--  https://huggingface.co/Linaqruf/personal_backup/resolve/main/wd14tagger-weight/wd14Tagger.zip
Resolving huggingface.co (huggingface.co)... 34.227.196.80, 54.147.99.175, 2600:1f18:147f:e800:3df1:c2fc:20aa:9b45, ...
Connecting to huggingface.co (huggingface.co)|34.227.196.80|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-11-27 11:36:48 ERROR 404: Not Found.

Archive:  /content/kohya-trainer/wd14tagger-weight/wd14Tagger.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /content/kohya-trainer/wd14tagger-weight/wd14Tagger.zip or
        /content/kohya-trainer/wd14tagger-weight/wd14Tagger.zip.zip, and cannot find /content/kohya-trainer/wd14tagger-weight/wd14Tagger.zip.ZIP, period.
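A sketch of fetching the tagger from the Hugging Face Hub instead of the dead personal_backup zip URL (a later log on this page shows the notebook already "downloading wd14 tagger model from hf_hub"). The repo id below is an assumption, SmilingWolf publishes the WD 1.4 tagger models there; adjust it to whatever the current notebook expects:

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SmilingWolf/wd-v1-4-vit-tagger",  # assumed repo id
)
print("tagger files downloaded to:", local_dir)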

LoRA file does not work with A1111 Additional Networks

The inference cell in your Colab seems to work fine, but the LoRA doesn't work in the Automatic1111 Colab with the sd-webui-additional-networks extension installed.
When I try to load it:

LoRA weight: 1, model: /content/last.safetensors
Error running process: /content/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/additional_networks.py
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/scripts.py", line 347, in process
    script.process(p, *script_args)
  File "/content/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/additional_networks.py", line 125, in process
    du_state_dict = load_file(model)
  File "/usr/local/lib/python3.8/dist-packages/safetensors/torch.py", line 98, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
Exception: Error while deserializing header: HeaderTooLarge
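For reference, a diagnostic sketch: a valid .safetensors file starts with an 8-byte little-endian length followed by a JSON header of that length, so "HeaderTooLarge" usually means the file is a truncated download or a pickle .ckpt that was merely renamed. The path is taken from the log above:

import json
import struct

path = "/content/last.safetensors"

with open(path, "rb") as f:
    (header_len,) = struct.unpack("<Q", f.read(8))  # declared JSON header length
    print("declared header length:", header_len)
    header = json.loads(f.read(header_len))         # fails if the file is not real safetensors
print("tensor count:", len([k for k in header if k != "__metadata__"]))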

Trained model has an error

I trained a model, and after training I tried to test it in Stable Diffusion, but it gives me the error shown in the attached screenshot. How do I fix it?

How to Resume Training

Hello, I want to resume training, but I'm still confused about which pre-trained model should be used for the resume. I previously used AnythingV3-Pruned for 10,000 steps; to resume, is it better to use last.ckpt, or AnythingV3-Pruned again together with the last-state folder?

Btw, if I may ask for your Discord ID, I'd like to ask some questions.

Run on Runpod / Vast.ai

Is there any way we can run this on a RunPod or Vast.ai instance? I tried but got stuck along the way...

Custom Tag not working

I get this error.

/content
Cloning into 'cafe-aesthetic-scorer'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 22 (delta 7), reused 4 (delta 2), pack-reused 0
Unpacking objects: 100% (22/22), 7.16 KiB | 1.79 MiB/s, done.
/content/cafe-aesthetic-scorer
usage: custom_tagger.py
       [-h]
       [--append]
       folder
       {txt,caption}
       tags
custom_tagger.py: error: argument extension: invalid choice: '{caption_extension}' (choose from 'txt', 'caption')

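The error shows the literal text '{caption_extension}' reaching argparse, i.e. a template placeholder that was never substituted before the command was built. A sketch of how the cell could interpolate the value instead (variable names mirror the notebook form fields and are assumptions):

caption_extension = "txt"          # must be "txt" or "caption", per the usage text above
train_folder = "/content/fine_tune/train_data"
custom_tag = "my_custom_tag"

cmd = f"python custom_tagger.py --append {train_folder} {caption_extension} {custom_tag}"
print(cmd)  # the placeholder is now replaced with its value before the script sees it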

Colab notebook errors, warnings, and Tensor Core issues

2023-01-31 09:23:14.849477: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-31 09:23:16.217681: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-31 09:23:16.217867: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-31 09:23:16.217901: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-31 09:23:20.825993: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-31 09:23:21.745524: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-31 09:23:21.745660: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-31 09:23:21.745681: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Any ideas?

Will this be adapted to a local server (conda preferred), not just Colab?

Firstly I should say, the project is excellent! The Anything model is used by so many people.

Because you cannot train a model for very long in Colab (for example, 30,000 steps of training would take 17 hours), and Colab needs to keep the browser open, will this project be adapted to run on a local server?

about v2 training

Can anyone tell me how to define the "v2" option?
(I just realized that v2.1 is not yet supported, and will start over with 2.0.)
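Based on the launch commands elsewhere on this page, a short sketch of how the two related flags are usually combined: --v2 selects the SD 2.x architecture, and --v_parameterization is added on top for v-prediction ("768-v" style) checkpoints; SD 1.x models use neither.

sd2_base_args = ["--v2"]                          # e.g. SD 2.0 base (512)
sd2_v_args    = ["--v2", "--v_parameterization"]  # e.g. 768-v / v-prediction checkpoints
print(sd2_base_args, sd2_v_args)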

To do: Delete image upscaler cell

I made a mistake: I forgot that the training scripts process image latents instead of the raw images, and every image below min_bucket_reso or above max_bucket_reso is automatically resized by prepare_buckets_latents.py and by any training script. So the image upscaler is basically useless if you use prepare_buckets_latents.py, or --enable_bucket and --cache_latents, for bucketing and caching latents.

Maybe I'll delete that cell in a future update. The main repo is updated too often; in the last 4 days we've already had 3 iteration updates. So I think we need to slow down updates for this repo, because it's a pain to manage all 4 notebooks at once, and every update also breaks everyone's workflow.

Conclusion: you may want to skip the image upscaler cell if you're doing bucketing.

Sorry for the problem.

Create JSON file error: list index out of range

In the LoRA finetuner notebook.
I've used this successfully before, but with a smaller dataset.
Do I have too many images? Are the tags too long?

/content/kohya-trainer/finetune
found 156 images.
new metadata will be created / 新しいメタデータファイルが作成されます
merge tags to metadata json.
79% 124/156 [00:00<00:00, 39430.91it/s]
Traceback (most recent call last):
File "merge_dd_tags_to_metadata.py", line 62, in
main(args)
File "merge_dd_tags_to_metadata.py", line 35, in main
tags = f.readlines()[0].strip()
IndexError: list index out of range
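The IndexError comes from readlines()[0] on an empty tag file, so at least one of the 156 .txt files has no content. A small sketch to list the culprits (the train data path is a placeholder):

import glob
import os

train_dir = "/content/fine_tune/train_data"  # placeholder

empty = []
for p in glob.glob(os.path.join(train_dir, "*.txt")):
    with open(p, encoding="utf-8") as f:
        if not f.read().strip():
            empty.append(p)
print("empty tag files:", empty)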

Runtime error: size mismatch?

Just starting this up as a first test. I think my directories and training data are properly named and are the right file types, but I'm getting this error:

Traceback (most recent call last):
File "train_network.py", line 539, in
train(args)
File "train_network.py", line 149, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "/content/kohya-trainer/library/train_util.py", line 1365, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, args.pretrained_model_name_or_path)
File "/content/kohya-trainer/library/model_util.py", line 880, in load_models_from_stable_diffusion_checkpoint
info = unet.load_state_dict(converted_unet_checkpoint)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
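The 768-vs-1024 mismatch means the --v2 setting and the checkpoint disagree: a 768-wide cross-attention context is the SD 1.x text encoder, 1024-wide is SD 2.x. A diagnostic sketch that peeks at one cross-attention weight to see which family the .ckpt actually belongs to (the path is a placeholder):

import torch

ckpt_path = "/content/pre_trained_model/model.ckpt"  # placeholder
sd = torch.load(ckpt_path, map_location="cpu")
sd = sd.get("state_dict", sd)

key = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
if key in sd:
    width = sd[key].shape[1]
    print("context width:", width,
          "-> SD 2.x (use --v2)" if width == 1024 else "-> SD 1.x (do not pass --v2)")
else:
    print("cross-attention key not found; unexpected checkpoint layout")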


Why is there no "--keep_tokens" field?

Sorry for the trivial question. Is there some problem that explains why this option is missing from the "Kohya LoRA Dreambooth" notebook?
Would it work if I added "--keep_tokens=1" myself in the cell "5.3. Start LoRA Dreambooth"?
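For reference, a sketch of the usual way to use options the notebook form does not expose: append them to the training arguments in that cell, assuming this script version supports --keep_tokens as the upstream trainer does.

# Sketch only (assumed flag support): keep the first tag fixed while the rest are shuffled.
extra_args = ["--shuffle_caption", "--keep_tokens=1"]
print(" ".join(extra_args))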

About dreambooth reg image

Not a technical problem, but a question about the DreamBooth training settings.

  1. The project uses girl images from waifu-research-department, which are images generated by Waifu Diffusion 1.3 (and I think it's not a high-quality dataset). Is this ok? In the original DreamBooth algorithm the regularization images are generated from the model being trained.
  2. Setting the config via the folder name doesn't seem like a good method; for example, if the instance prompt or class prompt is multi-tag Danbooru style, like "1girl, solo, white hair", the project cannot accept this setting.
  3. Could the trainer generate test images while training? That would be better for selecting a model than checking every checkpoint after training finishes.

I can fix the above issues by changing the code, but I'm afraid that it will conflict with your updates. So I hope you can take them into consideration, thanks!

Xformers for A100

Can you help me locate the proper code to run the notebook on Colab's A100 GPUs?

I believe I will need a precompiled xformers wheel.

_pickle.UnpicklingError: invalid load key, '\x09'

Hi, I am using my own dataset on Colab by uploading 2 .png files manually to /content/fine_tune/train_data, and I get the following error at 4.4. Aspect Ratio Bucketing and Cache Latents:

found 12 images.
loading existing metadata: /content/fine_tune/meta_clean.json
load VAE: /content/pre_trained_model/Anything-v3-pruned.ckpt
Traceback (most recent call last):
File "prepare_buckets_latents.py", line 179, in
main(args)
File "prepare_buckets_latents.py", line 57, in main
vae = model_util.load_vae(args.model_name_or_path, weight_dtype)
File "/usr/local/lib/python3.8/dist-packages/library/model_util.py", line 1112, in load_vae
else torch.load(vae_id, map_location="cuda"))
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x09'.

Binning not working

/content/kohya-trainer/finetune
2023-01-03 22:10:21.044580: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-03 22:10:21.238660: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-03 22:10:22.008708: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-03 22:10:22.008815: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-03 22:10:22.008834: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
usage: prepare_buckets_latents.py
       [-h]
       [--v2]
       [--batch_size BATCH_SIZE]
       [--max_resolution MAX_RESOLUTION]
       [--min_bucket_reso MIN_BUCKET_RESO]
       [--max_bucket_reso MAX_BUCKET_RESO]
       [--mixed_precision {no,fp16,bf16}]
       [--full_path]
       [--flip_aug]
       train_data_dir
       in_json
       out_json
       model_name_or_path
prepare_buckets_latents.py: error: the following arguments are required: model_name_or_path

I did include the path to the model.
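For reference, a sketch of a full invocation using the arguments from the usage text above; the argparse message means the final positional argument never reached the script, which in the notebook usually happens when the model-path form field is empty or the path variable is blank. Paths and values are placeholders:

cmd = [
    "python", "prepare_buckets_latents.py",
    "--max_resolution", "512,512",
    "--min_bucket_reso", "256",
    "--max_bucket_reso", "1024",
    "--mixed_precision", "no",
    "/content/fine_tune/train_data",                        # train_data_dir
    "/content/fine_tune/meta_clean.json",                   # in_json
    "/content/fine_tune/meta_lat.json",                     # out_json
    "/content/pre_trained_model/Anything-v3-pruned.ckpt",   # model_name_or_path (must not be empty)
]
print(" ".join(cmd))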

I have a question for you

Hello, I want to use the Kohya Trainer V4,
but I don't know what the command is to do training with preservation of the original model (prior preservation).
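A short sketch using flags that appear in the launch commands elsewhere on this page: prior preservation in the DreamBooth trainer is driven by a regularization image folder plus the prior loss weight (paths are placeholders).

preservation_args = [
    "--reg_data_dir=/content/dreambooth/reg_data",  # class/regularization images
    "--prior_loss_weight=1.0",
]
print(" ".join(preservation_args))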

I trained a LoRA with the latest notebook, but can't load it in the Auto1111 additional networks extension.

It gives the error below:
Traceback (most recent call last):
File "/content/stable-diffusion-webui/modules/scripts.py", line 338, in process
script.process(p, *script_args)
File "/content/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/additional_networks.py", line 177, in process
network, info = lora_compvis.create_network_and_apply_compvis(du_state_dict, weight, text_encoder, unet)
File "/content/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/lora_compvis.py", line 73, in create_network_and_apply_compvis
network_dim = min([s for s in size if s > 1])
ValueError: min() arg is an empty sequence

Strangely, the LoRA I trained a few hours before this one works fine. I assume the latest update may have introduced a bug. I trained this one with a smaller network dimension (32); the bug seems to be related to the network dimension.
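For reference, a diagnostic sketch: the traceback shows the extension deriving network_dim with min([s for s in size if s > 1]), so it found no tensor dimension greater than 1 where it expected one. Listing the keys and shapes of the working and broken files makes it easy to compare them (the path is a placeholder):

from safetensors.torch import load_file

sd = load_file("/content/last.safetensors")  # placeholder path
for k, v in list(sd.items())[:10]:
    print(k, tuple(v.shape))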

Tensorcore errors and it will not run

2023-01-26 19:54:55.306109: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-26 19:54:56.013115: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-26 19:54:56.013226: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-26 19:54:56.013246: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
found 0 images.
no metadata / メタデータファイルがありません: /content/drive/MyDrive/fine_tune/meta_clean.json

/content/drive/MyDrive/kohya-trainer/finetune
2023-01-26 20:01:45.449924: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-26 20:01:46.157549: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-26 20:01:46.157670: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-26 20:01:46.157697: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
load images from /content/drive/MyDrive/Aliens
found 4 images.
loading BLIP caption: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
Downloading (…)solve/main/vocab.txt: 100% 232k/232k [00:00<00:00, 263kB/s]
Downloading (…)okenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 10.9kB/s]
Downloading (…)lve/main/config.json: 100% 570/570 [00:00<00:00, 212kB/s]
100% 1.66G/1.66G [01:14<00:00, 24.0MB/s]
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
BLIP loaded
0% 0/4 [00:00<?, ?it/s]convert image mode RGBA to RGB: /content/drive/MyDrive/Aliens/maxresdefault.png
25% 1/4 [00:00<00:02, 1.03it/s]convert image mode RGBA to RGB: /content/drive/MyDrive/Aliens/newsgeek.png
50% 2/4 [00:01<00:01, 1.03it/s]convert image mode RGBA to RGB: /content/drive/MyDrive/Aliens/FGddysgUYAMQ-sJ.png
100% 4/4 [00:03<00:00, 1.06it/s]
done!
2023-01-26 20:03:42.032881: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-26 20:03:42.958994: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-01-26 20:03:42.959110: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-01-26 20:03:42.959128: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
downloading wd14 tagger model from hf_hub
/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py:1020: FutureWarning: The force_filename parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
warnings.warn(
Downloading (…)"keras_metadata.pb";: 100% 328k/328k [00:00<00:00, 25.9MB/s]
Downloading (…)"saved_model.pb";: 100% 3.81M/3.81M [00:00<00:00, 134MB/s]
Downloading (…)in/selected_tags.csv: 100% 174k/174k [00:00<00:00, 259kB/s]
Downloading (…)ata-00000-of-00001";: 100% 365M/365M [00:01<00:00, 280MB/s]
Downloading (…)"variables.index";: 100% 13.8k/13.8k [00:00<00:00, 6.07MB/s]
found 4 images.
loading model and labels
2023-01-26 20:03:53.629195: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
100% 4/4 [00:00<00:00, 23.32it/s]
done!

I went ahead and tried to train and got the results shown in the attached screenshots.

Looks like something is wrong, and it's possibly using the CPU, because this is taking longer than even a full DreamBooth training run.

RuntimeError: CUDA error: initialization error

full log:

/content/kohya-trainer
2023-01-25 01:51:57.145001: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-25 01:51:58.565155: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 01:51:58.565775: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 01:51:58.565804: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-25 01:52:03.052900: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-25 01:52:03.738588: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 01:52:03.738695: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-25 01:52:03.738715: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
found directory 5_shiverbooru shiver contains 171 image files
855 train images with repeating.
prepare reg images.
found directory 1_shiver contains 0 image files
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
loading image sizes.
100% 171/171 [00:00<00:00, 6699.35it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 832), count: 0
bucket 1: resolution (256, 896), count: 0
bucket 2: resolution (256, 960), count: 0
bucket 3: resolution (256, 1024), count: 0
bucket 4: resolution (320, 704), count: 25
bucket 5: resolution (320, 768), count: 0
bucket 6: resolution (384, 640), count: 205
bucket 7: resolution (448, 576), count: 445
bucket 8: resolution (512, 512), count: 140
bucket 9: resolution (576, 448), count: 30
bucket 10: resolution (640, 384), count: 10
bucket 11: resolution (704, 320), count: 0
bucket 12: resolution (768, 320), count: 0
bucket 13: resolution (832, 256), count: 0
bucket 14: resolution (896, 256), count: 0
bucket 15: resolution (960, 256), count: 0
bucket 16: resolution (1024, 256), count: 0
mean ar error (without repeats): 0.05386308817472627
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.17.layer_norm1.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.bias', 'vision_model.encoder.layers.21.mlp.fc2.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.self_attn.q_proj.weight', 'vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_model.encoder.layers.14.layer_norm1.weight', 'vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_model.encoder.layers.11.self_attn.k_proj.weight', 'vision_model.encoder.layers.19.self_attn.k_proj.weight', 'vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.layer_norm2.bias', 'vision_model.encoder.layers.13.mlp.fc2.bias', 'vision_model.encoder.layers.5.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.layer_norm1.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.bias', 'vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_model.encoder.layers.3.self_attn.k_proj.weight', 'vision_model.encoder.layers.11.mlp.fc1.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.bias', 'vision_model.encoder.layers.15.self_attn.v_proj.bias', 'vision_model.encoder.layers.15.mlp.fc2.bias', 'vision_model.encoder.layers.4.self_attn.v_proj.weight', 'vision_model.encoder.layers.16.self_attn.v_proj.bias', 'vision_model.encoder.layers.10.layer_norm2.bias', 'vision_model.encoder.layers.20.mlp.fc1.weight', 'vision_model.encoder.layers.8.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.layer_norm2.bias', 'vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_model.encoder.layers.19.self_attn.out_proj.weight', 'vision_model.encoder.layers.1.layer_norm1.weight', 'vision_model.encoder.layers.10.mlp.fc1.bias', 'vision_model.encoder.layers.13.self_attn.v_proj.weight', 'vision_model.encoder.layers.17.mlp.fc2.bias', 'vision_model.encoder.layers.10.layer_norm1.bias', 'vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.embeddings.class_embedding', 'vision_model.encoder.layers.15.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.layer_norm1.bias', 'vision_model.encoder.layers.15.layer_norm1.bias', 'vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_model.encoder.layers.2.layer_norm2.bias', 'vision_model.encoder.layers.11.mlp.fc2.weight', 'vision_model.encoder.layers.16.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.layer_norm1.bias', 'vision_model.encoder.layers.3.self_attn.q_proj.bias', 'vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_model.encoder.layers.2.self_attn.v_proj.weight', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.bias', 'vision_model.encoder.layers.13.layer_norm2.bias', 'vision_model.encoder.layers.21.mlp.fc1.bias', 'vision_model.encoder.layers.8.mlp.fc1.bias', 'vision_model.encoder.layers.4.mlp.fc1.weight', 'vision_model.encoder.layers.7.mlp.fc1.weight', 'vision_model.encoder.layers.20.mlp.fc1.bias', 'vision_model.encoder.layers.15.mlp.fc1.bias', 
'vision_model.encoder.layers.22.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.21.layer_norm1.weight', 'vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_model.encoder.layers.3.mlp.fc1.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.layer_norm2.bias', 'vision_model.encoder.layers.2.mlp.fc2.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.bias', 'vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_model.encoder.layers.2.layer_norm1.bias', 'vision_model.encoder.layers.19.mlp.fc2.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.12.layer_norm2.bias', 'vision_model.encoder.layers.12.self_attn.q_proj.weight', 'vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.self_attn.q_proj.bias', 'vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_model.encoder.layers.3.layer_norm1.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.weight', 'vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.mlp.fc1.weight', 'vision_model.encoder.layers.17.mlp.fc1.bias', 'vision_model.encoder.layers.9.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'logit_scale', 'vision_model.encoder.layers.23.self_attn.q_proj.weight', 'vision_model.encoder.layers.3.self_attn.q_proj.weight', 'vision_model.encoder.layers.15.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.mlp.fc2.weight', 'vision_model.encoder.layers.18.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.self_attn.out_proj.weight', 'vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_model.encoder.layers.3.mlp.fc2.weight', 'vision_model.encoder.layers.3.layer_norm1.weight', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.encoder.layers.6.layer_norm2.bias', 'vision_model.encoder.layers.6.self_attn.out_proj.weight', 'vision_model.encoder.layers.23.layer_norm2.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.mlp.fc1.bias', 'vision_model.encoder.layers.5.mlp.fc2.weight', 'vision_model.encoder.layers.11.self_attn.q_proj.weight', 'vision_model.encoder.layers.20.layer_norm1.bias', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.mlp.fc1.weight', 'vision_model.encoder.layers.13.layer_norm2.weight', 'vision_model.encoder.layers.4.layer_norm1.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.bias', 'vision_model.encoder.layers.8.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.self_attn.out_proj.bias', 'vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_model.encoder.layers.21.layer_norm2.weight', 'vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_model.encoder.layers.13.layer_norm1.bias', 'vision_model.encoder.layers.14.mlp.fc1.bias', 'vision_model.encoder.layers.9.mlp.fc1.weight', 'vision_model.encoder.layers.13.layer_norm1.weight', 'vision_model.encoder.layers.1.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.self_attn.v_proj.weight', 'vision_model.encoder.layers.6.self_attn.k_proj.bias', 
…, 'visual_projection.weight', 'text_projection.weight', 'vision_model.encoder.layers.6.self_attn.v_proj.bias'] (the rest of this list, hundreds of unused vision_model.* and projection weight names, truncated)
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
caching latents.
100% 171/171 [00:26<00:00,  6.38it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...
use 8-bit Adam optimizer
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 4290
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 855
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 429
  num epochs / epoch数: 10
  batch size per device / バッチサイズ: 2
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 4290
steps:   0% 0/4290 [00:00<?, ?it/s]epoch 1/10
Traceback (most recent call last):
  File "train_network.py", line 424, in <module>
    train(args)
  File "train_network.py", line 274, in train
    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.8/dist-packages/accelerate/data_loader.py", line 375, in __iter__
    current_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/kohya-trainer/library/train_util.py", line 440, in __getitem__
    example['latents'] = torch.stack(latents_list) if latents_list[0] is not None else None
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

steps:   0% 0/4290 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--network_dim=128', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--pretrained_model_name_or_path=/content/pre_trained_model/Animefull-final-pruned.ckpt', '--caption_extension=.txt', '--train_data_dir=/content/dreambooth/train_shiver', '--reg_data_dir=/content/dreambooth/reg_shiver', '--output_dir=/content/dreambooth/output', '--enable_bucket', '--prior_loss_weight=1.0', '--output_name=shiver', '--mixed_precision=fp16', '--save_precision=fp16', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--resolution=512', '--cache_latents', '--train_batch_size=2', '--max_token_length=225', '--use_8bit_adam', '--learning_rate=0.0001', '--max_train_epochs=10', '--gradient_accumulation_steps=1', '--clip_skip=2', '--logging_dir=/content/fine_tune/logs', '--log_prefix=shiver', '--shuffle_caption', '--xformers']' returned non-zero exit status 1.
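For what it's worth, the "RuntimeError: CUDA error: initialization error" raised inside DataLoader worker process 0 while stacking the cached latents is the classic symptom of CUDA tensors being touched from a forked dataloader worker: CUDA cannot be re-initialized in a forked subprocess. Below is a minimal sketch of that failure mode in plain PyTorch (not the trainer's actual code); keeping the latent cache on the CPU, or loading data in the main process, avoids it.

# Minimal sketch (generic PyTorch, not the trainer's code) of why
# "CUDA error: initialization error" appears inside a DataLoader worker:
# CUDA tensors created in the main process cannot be used from a forked worker.
import torch
from torch.utils.data import Dataset, DataLoader

class CachedLatents(Dataset):
    def __init__(self):
        # hypothetical latent cache kept on the GPU
        self.latents = [torch.randn(4, 64, 64, device="cuda") for _ in range(8)]

    def __len__(self):
        return len(self.latents)

    def __getitem__(self, idx):
        # stacking a CUDA tensor here fails if this runs in a forked worker
        return torch.stack([self.latents[idx]])

# num_workers > 0 reproduces the error on fork-based platforms such as Colab;
# num_workers=0 (load in the main process) avoids it.
loader = DataLoader(CachedLatents(), batch_size=2, num_workers=0)
print(next(iter(loader)).shape)  # torch.Size([2, 1, 4, 64, 64])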

FATAL: this function is for sm80, but was built for sm750

Hi, I'm using Colab Pro with a premium GPU.

I had this error, but once I removed what was inside additional_argument, it worked.
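This error usually means a GPU/compute-capability mismatch: "premium" Colab sessions typically assign an A100 (compute capability 8.0, sm80), while the prebuilt xformers kernels the notebook installs appear to be compiled for a T4 (compute capability 7.5, printed here as sm750), which is why clearing --xformers out of additional_argument helps. A quick check of which GPU the session actually assigned, assuming only that torch is installed:

# Print the GPU Colab assigned and its compute capability.
# (8, 0) corresponds to sm80 (A100); (7, 5) corresponds to sm75 (T4).
import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))

If this prints (8, 0) while the preinstalled build only targets sm75, installing an xformers build compiled for the A100 (or simply dropping --xformers, as above) avoids the mismatch.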

FATAL: this function is for sm80, but was built for sm750 (repeated 11 times)


I see another round of these errors when I try to run inference with it, but I have no idea how to fix them. Is this xformers-related?


Missing dependencies, cannot start training

cell 4.2:
2023-01-23 19:54:33.000677: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-23 19:54:34.273687: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-01-23 19:54:34.273927: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-01-23 19:54:34.273957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
downloading wd14 tagger model from hf_hub
/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py:982: FutureWarning: The force_filename parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
warnings.warn(
Downloading: 100% 328k/328k [00:00<00:00, 5.77MB/s]
Downloading: 100% 3.81M/3.81M [00:00<00:00, 15.7MB/s]
Downloading: 100% 174k/174k [00:00<00:00, 3.12MB/s]
Downloading: 100% 365M/365M [00:04<00:00, 90.2MB/s]
Downloading: 100% 13.8k/13.8k [00:00<00:00, 19.1MB/s]
found 1244 images.
loading model and labels
2023-01-23 19:54:43.276085: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
100% 1244/1244 [02:16<00:00, 9.14it/s]
done!

error on cell 5.2:

/content/kohya-trainer
+-------------------------------+----------------------------------------------------+
| Hyperparameter                | Value                                              |
+-------------------------------+----------------------------------------------------+
| v2                            | True                                               |
| v_parameterization            | False                                              |
| pretrained_model_name_or_path | /content/pre_trained_model/Anything-v4-pruned.ckpt |
| vae                           | False                                              |
| train_data_dir                | /content/fine_tune/train_data                      |
| in_json                       | /content/fine_tune/meta_lat.json                   |
| output_dir                    | /content/drive/MyDrive/fine_tune/output            |
| resume_path                   | False                                              |
| project_name                  | hito-komoru                                        |
| mixed_precision               | fp16                                               |
| save_precision                | fp16                                               |
| save_every_n_epochs           | 0                                                  |
| save_last_n_epochs            | 0                                                  |
| save_model_as                 | ckpt                                               |
| resolution                    | 512                                                |
| train_batch_size              | 1                                                  |
| max_token_length              | 225                                                |
| train_text_encoder            | False                                              |
| use_8bit_adam                 | True                                               |
| learning_rate                 | 2e-06                                              |
| dataset_repeats               | 1                                                  |
| max_train_steps               | 5000                                               |
| seed                          | 0                                                  |
| gradient_checkpointing        | False                                              |
| gradient_accumulation_steps   | 1                                                  |
| clip_skip                     | 2                                                  |
| logging_dir                   | /content/fine_tune/logs                            |
| log_prefix                    | hito-komoru                                        |
| additional_argument           | --save_state --shuffle_caption --xformers          |
+-------------------------------+----------------------------------------------------+
2023-01-23 20:09:37.571896: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-23 20:09:38.284900: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-23 20:09:38.285011: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-23 20:09:38.285033: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-23 20:09:41.168607: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-23 20:09:41.878917: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-23 20:09:41.879020: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-01-23 20:09:41.879038: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
update token length: 225
loading existing metadata: /content/fine_tune/meta_lat.json
Traceback (most recent call last):
File "fine_tune.py", line 341, in
train(args)
File "fine_tune.py", line 33, in train
train_dataset = train_util.FineTuningDataset(args.in_json, args.train_batch_size, args.train_data_dir,
File "/content/kohya-trainer/library/train_util.py", line 592, in init
assert len(abs_path) >= 1, f"no image / 画像がありません: {abs_path}"
AssertionError: no image / 画像がありません: []
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'fine_tune.py', '--v2', '--pretrained_model_name_or_path=/content/pre_trained_model/Anything-v4-pruned.ckpt', '--train_data_dir=/content/fine_tune/train_data', '--in_json=/content/fine_tune/meta_lat.json', '--output_dir=/content/drive/MyDrive/fine_tune/output', '--output_name=hito-komoru', '--mixed_precision=fp16', '--save_precision=fp16', '--save_model_as=ckpt', '--resolution=512', '--train_batch_size=1', '--max_token_length=225', '--use_8bit_adam', '--learning_rate=2e-06', '--dataset_repeats=1', '--max_train_steps=5000', '--gradient_accumulation_steps=1', '--logging_dir=/content/fine_tune/logs', '--log_prefix=hito-komoru', '--save_state', '--shuffle_caption', '--xformers']' returned non-zero exit status 1.
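The AssertionError "no image / 画像がありません" is raised in the dataset constructor (train_util.py __init__) when none of the entries in meta_lat.json can be matched to a file under train_data_dir, for example when the metadata was built from a different folder than the one passed to the training cell. A hedged diagnostic sketch, assuming the JSON is keyed by image path or stem the way the bucketing step writes it:

# List metadata entries that have no matching file under train_data_dir.
# Paths are the ones printed in the hyperparameter table above.
import json
from pathlib import Path

train_data_dir = Path("/content/fine_tune/train_data")
meta = json.loads(Path("/content/fine_tune/meta_lat.json").read_text())

stems = {p.stem for p in train_data_dir.rglob("*") if p.is_file()}
missing = [key for key in meta if Path(key).stem not in stems]
print(f"{len(meta)} metadata entries, {len(missing)} without a matching file")
print(missing[:10])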

Doesn't work in Colab since yesterday

Crashed on training step.

steps:   0% 0/10 [00:00<?, ?it/s]epoch 1/2
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'fine_tune.py', '--pretrained_model_name_or_path=/content/pre_trained_model/Anything-v4-5-pruned.ckpt', '--in_json', '/content/meta_lat.json', '--train_data_dir=/content/kohya-trainer/img', '--output_dir=/content/drive/MyDrive/fine_tune/output', '--shuffle_caption', '--train_batch_size=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--max_token_length=225', '--clip_skip=2', '--mixed_precision=fp16', '--dataset_repeats', '1', '--max_train_steps=10', '--use_8bit_adam', '--xformers', '--gradient_accumulation_steps', '1', '--save_model_as=ckpt', '--save_state', '--resume', '/content/drive/MyDrive/fine_tune/last-state', '--save_every_n_epochs=1', '--save_precision=float', '--logging_dir=/content/fine_tune/training_logs', '--log_prefix', 'fine-tune-style1']' died with <Signals.SIGKILL: 9>.
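"died with <Signals.SIGKILL: 9>" means the process was killed from outside rather than crashing, and on Colab that usually means the host RAM limit was exceeded, plausibly here while loading the pruned checkpoint and the resumed last-state with --save_precision=float. A quick host-RAM check before launching (psutil is preinstalled on Colab):

# Report host RAM before launching training; a SIGKILL during startup usually
# means the "available" figure below was exhausted.
import psutil

vm = psutil.virtual_memory()
print(f"RAM total: {vm.total / 2**30:.1f} GiB, available: {vm.available / 2**30:.1f} GiB")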

model.ckpt file not found

I am having an issue where, after training completes, the model.ckpt file is not in the expected location (/content/kohya-trainer/fine-tuned/model.ckpt); instead there is only a folder (/content/kohya-trainer/fine-tuned/last). Is there a reason why the model.ckpt file is not being generated or saved there?
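The "last" folder is most likely the model saved in Diffusers (folder) format, alongside a "last-state" folder when --save_state is used; a single model.ckpt only appears when save_model_as is set to ckpt, as in the commands quoted elsewhere in this issue list. A hedged sketch of the relevant flags (output_dir is the path from this report; the output name is an example):

# Flags that make the trainer write a single checkpoint file instead of a
# Diffusers-style folder such as "last".
save_args = [
    "--output_dir=/content/kohya-trainer/fine-tuned",
    "--output_name=model",    # -> fine-tuned/model.ckpt (example name)
    "--save_model_as=ckpt",   # "safetensors" also produces a single file
]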

Please help on output parameters and resuming process

I would like the output to be the ~150MB "additional networks" file that can be mounted via the extension. However, after recent updates I can't always get that: the output is usually the ~1.5GB "full model" (I assume), and only sometimes the desired 150MB file. I would like to know how to get the 150MB files consistently.

Also, if I'm understanding correctly, the previous practice of training with 5_conceptA and 10_conceptB folders has been updated into training conceptA 5 times and then conceptB 10 times from a resumed process. How can I resume that process, or is there an easier way to train both concepts in one click?
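On the file size: the ~150 MB "additional networks" file is what train_network.py produces when it trains a LoRA, while fine_tune.py / train_db.py always write a full checkpoint in the gigabyte range, so a 1.5 GB output suggests the run went through the fine-tune/DreamBooth path instead. A sketch of the flags that yield the small file, copied from the train_network.py command quoted earlier in this issue list (the output name is an example):

# train_network.py flags that produce a standalone LoRA (~150 MB at
# network_dim=128) loadable by the Additional Networks extension.
lora_args = [
    "--network_module=networks.lora",
    "--network_dim=128",
    "--save_model_as=safetensors",
    "--output_name=my_lora",    # example name
]

As for the two concepts: as far as I know the DreamBooth-style loader still reads the repeat count from the folder-name prefix, so a train_data_dir containing 5_conceptA and 10_conceptB subfolders trains both in a single run without resuming.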

Vast.ai torch 1.13.1+cu116 issue

It is impossible to run the trainer on vast.ai with the pytorch/pytorch_1.13.1-cuda116-cudnn8-runtime docker image and the xformers-0.0.16rc397-cp310-cp310-manylinux2014_x86_64.whl wheel from Facebook (built for ubuntu-22.04-py3.10-torch1.13.1+cu117). The same thing happens with other prebuilt xformers wheels (e.g. manually installing torch 1.12.1+cu116 and the corresponding xformers).
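Note that the wheel named in this report is tagged torch1.13.1+cu117 while the docker image is cuda116; a torch/CUDA mismatch like that can produce the kind of "CUDA error: invalid argument" seen inside the xformers backward in the traceback below. A quick check of what the environment actually exposes versus what the wheel name promises:

# Compare the runtime's torch/CUDA/GPU against the installed xformers build.
import torch
import xformers

print("torch:", torch.__version__)                  # e.g. 1.13.1+cu116
print("torch built with CUDA:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
print("xformers:", xformers.__version__)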

Replace CrossAttention.forward to use xformers
prepare optimizer, data loader etc.
==============================WARNING: DEPRECATED!==============================
WARNING! This version of bitsandbytes is deprecated. Please switch to pip install bitsandbytes and the new repo: https://github.com/TimDettmers/bitsandbytes
==============================WARNING: DEPRECATED!==============================
use 8-bit Adam optimizer
running training / 学習開始
num examples / サンプル数: 7117
num batches per epoch / 1epochのバッチ数: 7117
num epochs / epoch数: 5
batch size per device / バッチサイズ: 1
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 35000
steps: 0%| | 0/35000 [00:00<?, ?it/s]epoch 1/5
Traceback (most recent call last):
File "/workspace/kohya-trainer/fine_tune.py", line 1059, in
train(args)
File "/workspace/kohya-trainer/fine_tune.py", line 624, in train
accelerator.backward(loss)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 414, in wrapper
outputs = fn(ctx, *args)
File "/opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/init.py", line 111, in backward
grads = _memory_efficient_attention_backward(
File "/opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/init.py", line 381, in _memory_efficient_attention_backward
grads = op.apply(ctx, inp, grad)
File "/opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/cutlass.py", line 184, in apply
(grad_q, grad_k, grad_v,) = cls.OPERATOR(
File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 442, in call
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
steps: 0%| | 0/35000 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'fine_tune.py', '--pretrained_model_name_or_path=/workspace/pre_trained_model/Anything-V3.ckpt', '--in_json', '/workspace/meta_lat.json', '--train_data_dir=/workspace/img', '--output_dir=/workspace/output', '--shuffle_caption', '--train_batch_size=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--max_token_length=225', '--clip_skip=2', '--mixed_precision=fp16', '--max_train_steps=35000', '--use_8bit_adam', '--xformers', '--gradient_checkpointing', '--gradient_accumulation_steps', '1', '--save_model_as=ckpt', '--save_state', '--save_every_n_epochs=1', '--logging_dir=/workspace/fine_tune/training_logs', '--log_prefix', 'fine-tune-style1']' returned non-zero exit status 1.
