
Block-removed Knowledge-distilled Stable Diffusion

Official codebase for BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation [ArXiv] [ICCV 2023 Demo Track] [ICML 2023 Workshop on ES-FoMo].

BK-SDMs are lightweight text-to-image (T2I) synthesis models:

  • Certain residual & attention blocks are eliminated from the U-Net of SD.
  • Distillation pretraining is conducted with very limited data, but it (surprisingly) remains effective.

⚡Quick Links: KD Pretraining | Evaluation on MS-COCO | DreamBooth Finetuning | Demo

Notice

Model Description

Installation

conda create -n bk-sdm python=3.8
conda activate bk-sdm
git clone https://github.com/Nota-NetsPresso/BK-SDM.git
cd BK-SDM
pip install -r requirements.txt

Note on the torch versions we've used:

  • torch 1.13.1 for MS-COCO evaluation & DreamBooth finetuning on a single 24GB RTX3090
  • torch 2.0.1 for KD pretraining on a single 80GB A100
    • If pretraining with a total batch size of 256 on an A100 runs out of GPU memory, check your torch version and consider upgrading to torch>2.0.0.

Minimal Example with 🤗Diffusers

With the default PNDM scheduler and 50 denoising steps:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"
image = pipe(prompt).images[0]
image.save("example.png")

Equivalent code (replacing only the U-Net of SD-v1.4 while keeping its Text Encoder and Image Decoder):
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet = UNet2DConditionModel.from_pretrained("nota-ai/bk-sdm-small", subfolder="unet", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"
image = pipe(prompt).images[0]
image.save("example.png")

Distillation Pretraining

Our code was based on train_text_to_image.py of Diffusers 0.15.0. To access the latest version, use this link.

[Optional] Toy run to check runnability

bash scripts/get_laion_data.sh preprocessed_11k
bash scripts/kd_train_toy.sh
Note
  • A toy dataset (11K img-txt pairs) is downloaded at ./data/laion_aes/preprocessed_11k (1.7GB in tar.gz; 1.8GB data folder).
  • The toy script can be used to verify that the code runs and to find a batch size that fits your GPU. With a batch size of 8 (=4×2), training BK-SDM-Base for 20 iterations takes about 5 minutes and 22GB of GPU memory.

Single-gpu training for BK-SDM-{Base, Small, Tiny}

bash scripts/get_laion_data.sh preprocessed_212k
bash scripts/kd_train.sh
Note
  • The dataset with 212K (=0.22M) pairs is downloaded at ./data/laion_aes/preprocessed_212k (18GB tar.gz; 20GB data folder).
  • With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory.
  • Training BK-SDM-{Small, Tiny} results in a 5∼10% decrease in GPU memory usage.

Single-gpu training for BK-SDM-{Base-2M, Small-2M, Tiny-2M}

bash scripts/get_laion_data.sh preprocessed_2256k
bash scripts/kd_train_2m.sh
Note
  • The dataset with 2256K (=2.3M) pairs is downloaded at ./data/laion_aes/preprocessed_2256k (182GB tar.gz; 204GB data folder).
  • Except for the dataset, kd_train_2m.sh is identical to kd_train.sh; given the same number of iterations, the training computation is the same.

Multi-gpu training

bash scripts/kd_train_toy_ddp.sh
Note
  • Multi-GPU training is supported (sample results: link), although all experiments for our paper were conducted using a single GPU. Thanks @youngwanLEE for sharing the script :)

Compression of SD-v2 with BK-SDM

bash scripts/kd_train_v2-base-im512.sh
bash scripts/kd_train_v2-im768.sh

# For inference, see: 'scripts/generate_with_trained_unet.sh'  

Note on training code

Key segments for KD training
  • Define Student U-Net by adjusting config.json [link]
  • Initialize Student U-Net by copying Teacher U-Net's weights [link]
  • Define hook locations for feature KD [link]
  • Define losses for feature-and-output KD [link]
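
To make the last two items concrete, below is a minimal, self-contained sketch of output- and feature-level KD with forward hooks. It is only an illustration: the module names, shapes, and toy convolutional nets are hypothetical, not the repository's actual hook locations or loss code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy teacher/student pair; in BK-SDM these would be the teacher and student U-Nets.
teacher = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1))
student = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1))

acts_tea, acts_stu = {}, {}

def save_to(store, name):
    def hook(module, inputs, output):
        store[name] = output  # keep the feature map produced at this location
    return hook

# Register hooks at matching locations (illustrative: the second conv in both nets).
teacher[1].register_forward_hook(save_to(acts_tea, "block1"))
student[1].register_forward_hook(save_to(acts_stu, "block1"))

x = torch.randn(2, 4, 64, 64)
with torch.no_grad():
    out_tea = teacher(x)   # teacher runs without gradients
out_stu = student(x)

# Output-level KD + feature-level KD, weighted as in the hyperparameters below.
lambda_kd_output, lambda_kd_feat = 1.0, 1.0
loss = lambda_kd_output * F.mse_loss(out_stu, out_tea) \
     + lambda_kd_feat * F.mse_loss(acts_stu["block1"], acts_tea["block1"])
loss.backward()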
Key learning hyperparams
--unet_config_name "bk_small" # option: ["bk_base", "bk_small", "bk_tiny"]
--use_copy_weight_from_teacher # initialize student unet with teacher weights
--learning_rate 5e-05
--train_batch_size 64
--gradient_accumulation_steps 4
--lambda_sd 1.0
--lambda_kd_output 1.0
--lambda_kd_feat 1.0

Evaluation on MS-COCO Benchmark

We used the following code to obtain the results on MS-COCO. After generating 512×512 images with the PNDM scheduler and 25 denoising steps, we downsampled them to 256×256 for computing the scores.
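
A minimal sketch of the per-image step this protocol implies (generate at 512×512 with 25 steps, then downsample to 256×256); the actual benchmark loop over the 30K prompts lives in src/generate.py:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"           # in the benchmark, one of the 30K MS-COCO prompts
image = pipe(prompt, num_inference_steps=25).images[0]    # 512x512 generation
image = image.resize((256, 256))                           # downsample before scoring
image.save("example_256.png")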

Generation with released models (using BK-SDM-Small as default)

On a single 3090 GPU, '(2)' takes ~10 hours per model, and '(3)' takes a few minutes.

  • (1) Download metadata.csv and real_im256.npz:

    bash scripts/get_mscoco_files.sh
    
    # ./data/mscoco_val2014_30k/metadata.csv: 30K prompts from the MS-COCO validation set (used in '(2)')  
    # ./data/mscoco_val2014_41k_full/real_im256.npz: FID statistics of 41K real images (used in '(3)')
    Note on 'real_im256.npz'
    • Following the evaluation protocol [DALL·E, Imagen], the FID stat for real images was computed over the full validation set (41K images) of MS-COCO. A precomputed stat file is downloaded via '(1)' at ./data/mscoco_val2014_41k_full/real_im256.npz.
    • Alternatively, real_im256.npz can be computed with python3 src/get_stat_mscoco_val2014.py, which downloads all the images, resizes them to 256×256, and computes the FID statistics.
  • (2) Generate 512×512 images over 30K prompts from the MS-COCO validation set → Resize them to 256×256:

    python3 src/generate.py 
    
    # python3 src/generate.py --model_id nota-ai/bk-sdm-base --save_dir ./results/bk-sdm-base
    # python3 src/generate.py --model_id nota-ai/bk-sdm-tiny --save_dir ./results/bk-sdm-tiny  

    [Batched generation] Increase --batch_sz (default: 1) for faster inference at the cost of higher VRAM usage (see the sketch after this list). Thanks @Godofnothing for providing this feature :)

    Inference cost details:
    • Setup: BK-SDM-Small on MS-COCO 30K image generation

    • We used an eval batch size of 1 for our paper results. Different batch sizes affect the sampling of random latent codes, resulting in slightly different generation scores.

      Eval Batch Size    1       2       4       8
      GPU Memory         4.9GB   6.3GB   11.3GB  19.6GB
      Generation Time    9.4h    7.9h    7.6h    7.3h
      FID                16.98   17.01   17.16   16.97
      IS                 31.68   31.20   31.62   31.22
      CLIP Score         0.2677  0.2679  0.2677  0.2675
  • (3) Compute FID, IS, and CLIP score:

    bash scripts/eval_scores.sh
    
    # For the other models, modify the `./results/bk-sdm-*` path in the scripts to specify different models.
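
Relatedly, for the batched generation mentioned in '(2)', one simple way to batch the denoising is to pass a list of prompts to the pipeline in a single call. This is only an illustration of the idea, not the exact mechanism behind --batch_sz in src/generate.py:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# A diffusers pipeline accepts a list of prompts and denoises them as a single batch.
prompts = ["a golden vase with different flowers",
           "a small dog curled up on top of a pair of shoes"]
images = pipe(prompts, num_inference_steps=25).images   # list of PIL images
for i, img in enumerate(images):
    img.resize((256, 256)).save(f"batch_{i}.png")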

[After training] Generation with a trained U-Net

bash scripts/get_mscoco_files.sh
bash scripts/generate_with_trained_unet.sh

Results on Zero-shot MS-COCO 256×256 30K

See Results in MODEL_CARD.md

DreamBooth Finetuning with 🤗PEFT

Our lightweight SD backbones can be used for efficient personalized generation. DreamBooth refines text-to-image diffusion models given a small number of images. DreamBooth+LoRA can drastically reduce finetuning cost.

DreamBooth dataset

The dataset is downloaded at ./data/dreambooth/dataset [folder tree]: 30 subjects × 25 prompts × 4∼6 images.

git clone https://github.com/google/dreambooth ./data/dreambooth

DreamBooth finetuning (using BK-SDM-Base as default)

Our code was based on train_dreambooth.py of PEFT 0.1.0. To access the latest version, use this link.

  • (1) without LoRA — full finetuning & used in our paper
    bash scripts/finetune_full.sh # learning rate 1e-6
    bash scripts/generate_after_full_ft.sh
  • (2) with LoRA — parameter-efficient finetuning
    bash scripts/finetune_lora.sh # learning rate 1e-4
    bash scripts/generate_after_lora_ft.sh  
  • On a single 3090 GPU, finetuning takes 10~20 minutes per subject.
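
After finetuning, the personalized model can be loaded for inference like any other diffusers checkpoint. A minimal sketch, assuming the full-finetuned weights were saved as a diffusers pipeline (the output path and the "sks dog" subject identifier below are illustrative; the generate_after_*.sh scripts are the reference):

import torch
from diffusers import StableDiffusionPipeline

finetuned_dir = "./results/dreambooth_full/dog"   # hypothetical output directory of the finetuning script

pipe = StableDiffusionPipeline.from_pretrained(finetuned_dir, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# "sks" is the rare identifier token conventionally bound to the subject during DreamBooth.
image = pipe("a photo of sks dog in a bucket", num_inference_steps=25).images[0]
image.save("dreambooth_example.png")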

Results of Personalized Generation

See DreamBooth Results in MODEL_CARD.md

Gradio Demo

Check out our Gradio demo and the codes (main: app.py)!

[Aug/01/2023] Featured in Hugging Face Spaces of the week 🔥

Core ML Weights

For iOS or macOS applications, we have converted our models to Core ML format. They are available at 🤗Hugging Face Models (nota-ai/coreml-bk-sdm) and can be used with Apple's Core ML Stable Diffusion library.

  • 4-sec inference on iPhone 14 (with 10 denoising steps): results

License

This project, along with its weights, is subject to the CreativeML Open RAIL-M license, which aims to mitigate any potential negative effects arising from the use of highly advanced machine learning systems. A summary of this license is as follows.

1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content,
2. We claim no rights on the outputs you generate, you are free to use them and are accountable for their use which should not go against the provisions set in the license, and
3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users.

Acknowledgments

Citation

@article{kim2023architectural,
  title={BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion},
  author={Kim, Bo-Kyeong and Song, Hyoung-Kyu and Castells, Thibault and Choi, Shinkook},
  journal={arXiv preprint arXiv:2305.15798},
  year={2023},
  url={https://arxiv.org/abs/2305.15798}
}
@article{kim2023bksdm,
  title={BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation},
  author={Kim, Bo-Kyeong and Song, Hyoung-Kyu and Castells, Thibault and Choi, Shinkook},
  journal={ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo)},
  year={2023},
  url={https://openreview.net/forum?id=bOVydU0XKC}
}

bk-sdm's People

Contributors

bokyeong1015, godofnothing, shinkookchoi, thibaultcastells


bk-sdm's Issues

How to replicate this work offline

Hi, thanks for your great work!
I currently have an A100 GPU server that is not connected to the internet. I can configure the environment offline. **Can I replicate your work offline?** Could you please provide me with some guidance? Thank you.

OSError: Error no file named scheduler_config.json found in directory CompVis/stable-diffusion-v1-4

I downloaded the stable-diffusion-v1-4 checkpoint from CompVis, but I still have this problem. I have tried installing transformers==4.25, 4.27, and so on, but it didn't work. These are the error details:

bash scripts/kd_train_toy.sh
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
warnings.warn(
./results/toy_bk_small/log_loss.csv
03/11/2024 21:34:33 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

Traceback (most recent call last):
File "src/kd_train_text_to_image.py", line 914, in
main()
File "src/kd_train_text_to_image.py", line 429, in main
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/diffusers/schedulers/scheduling_utils.py", line 139, in from_pretrained
config, kwargs, commit_hash = cls.load_config(
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/diffusers/configuration_utils.py", line 331, in load_config
raise EnvironmentError(
OSError: Error no file named scheduler_config.json found in directory CompVis/stable-diffusion-v1-4.
Traceback (most recent call last):
File "/home/lzj/miniconda3/envs/bk-sdm/bin/accelerate", line 8, in
sys.exit(main())
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/launch.py", line 923, in launch_command
simple_launcher(args)
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lzj/miniconda3/envs/bk-sdm/bin/python', 'src/kd_train_text_to_image.py', '--pretrained_model_name_or_path', 'CompVis/stable-diffusion-v1-4', '--train_data_dir', '/home/lzj/work/data/preprocessed_11k', '--use_ema', '--resolution', '512', '--center_crop', '--random_flip', '--train_batch_size', '2', '--gradient_checkpointing', '--mixed_precision=fp16', '--learning_rate', '5e-05', '--max_grad_norm', '1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--report_to=all', '--max_train_steps=20', '--seed', '1234', '--gradient_accumulation_steps', '4', '--checkpointing_steps', '5', '--valid_steps', '5', '--lambda_sd', '1.0', '--lambda_kd_output', '1.0', '--lambda_kd_feat', '1.0', '--use_copy_weight_from_teacher', '--unet_config_path', './src/unet_config', '--unet_config_name', 'bk_small', '--output_dir', './results/toy_bk_small']' returned non-zero exit status 1.

any plans for more models?

Greetings!

These tiny models are amazing! I love the fp16 versions.
Could you please, in the future, make models based on SD 1.5 and mixed with uncensored models such as Lyriel or Deliberate, for better faces and anatomy?

kind regards

Question about the lambda

Hi there,
It's me again. I am curious whether you tried different combinations of the lambdas for feat_loss and out_loss, or maybe adding a lambda for the task_loss?

From my training runs, it seems that feat_loss contributes most of the total loss.

Scale of KD-feature loss for SD inpainting 1.5

Hi there,

I am trying to distill the U-Net in SD-inpainting 1.5 into a smaller U-Net using your code (I swapped in the inpainting pipeline and the corresponding input data).
I have trained for 130K steps with batch size 64.
Right now the kd_feat_loss is around 20.

I am wondering what kd_feat_loss you had when you finished distilling the U-Net in your experiments?

Thank you.

Loading preprocessed_212k laion dataset without any response in terminal

Hi @bokyeong1015 , thanks for your great work!

I modified diffusers/train_text_to_image.py and used your fine-tuning strategy on the 212K subset of LAION. But when I run the training code, loading the dataset takes too much time, and there is no response in the terminal even after 40 minutes. Is this caused by the large number of images or by a bug in my code?

    # In distributed training, the load_dataset function guarantees that only one local process can concurrently
    if args.dataset_name is not None:
        # Downloading and loading a dataset from the hub.
        dataset = load_dataset(
            args.dataset_name,
            args.dataset_config_name,
            cache_dir=args.cache_dir,
            data_dir=args.train_data_dir,
        )
    else:
        data_files = {}
        if args.train_data_dir is not None:
            data_files["train"] = os.path.join(args.train_data_dir, "**")
        print("*** load dataset: start")
        t0 = time.time()
        dataset = load_dataset(
            "imagefolder",
            # data_files=data_files,
            cache_dir=args.cache_dir,
            split="train",
            data_dir=args.train_data_dir,
        )
        print(f"*** load dataset: end --- {time.time()-t0} sec")

        # See more about loading custom images at
        # https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder

    # Preprocessing the datasets.
    # We need to tokenize inputs and targets.
        
    # column_names = dataset["train"].column_names
    
    ##############################################################################################
    column_names = dataset.column_names
    image_column = column_names[0]
    caption_column = column_names[1]
    ###################################################################################################

This is the dataset-loading code. How long should the load_dataset call take?

Thanks for your great work, looking forward to your reply!

Best wishes,
Qianli

About the training speed

I found that the total number of training iterations is 400,000. May I ask how many days it takes you to train a distilled model? I use 8×V100 GPUs, and I found that I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).

Refine generation code

  • remove use_auth_token=True in StableDiffusionPipeline.from_pretrained [ref]
  • disable NSFW filter in recent diffusers versions [ref] [ref] for MS-COCO benchmark
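
For the second item, one way to disable the safety checker at load time is sketched below. This is only a hedged illustration (passing safety_checker=None is a standard StableDiffusionPipeline option; recent diffusers versions print a warning), intended solely for benchmark-style evaluation such as MS-COCO:

import torch
from diffusers import StableDiffusionPipeline

# Load without the NSFW safety checker so that flagged samples are not blacked out.
pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-base",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")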

Question of Dreambooth evaluation

Hi, thank you for sharing your awesome work ☺️
How can we reproduce your DreamBooth quantitative performance in Table 5?
Would you provide the evaluation code?

issue about training iterations

We note that the README says training BK-SDM-Base needs 50K iterations, while we find that "kd_train.py" sets --max_train_steps=400K. Can we assume that 50K is good enough?

Generation with trained unet

response to #10 (comment)

I want to conduct zero-shot MS-COCO evaluation for my intermediate checkpoint trained in a multi-GPU setting, but I'm not sure how to specify the checkpoint.

Could you give me some hints for this?

In your instruction(2), you enter model_id.

Could I change the model_id to my checkpoint path?

However, I don't know which file should be specified.

I guess unet_ema/diffusion_pytorch_model.bin. Am I right?

Thanks in advance.

multi-gpu training error

Hi, I'm really impressed by your work and nice code.

When I ran the training code in a multi-GPU setting, I encountered this error.

Traceback (most recent call last):
File "/home/user01/BK-SDM/src/kd_train_text_to_image.py", line 891, in
main()
File "/home/user01/BK-SDM/src/kd_train_text_to_image.py", line 766, in main
a_stu = acts_stu[m_stu]
KeyError: 'up_blocks.0'

Could you check this?

Thanks in advance :)

Discussion on experimental settings

[Inquiry]

Hi, I tried this method but found that the performance was very poor. My experimental configuration was to train on the laion_11k data for 10K steps with the bk_tiny U-Net; I also swapped in the inpainting pipeline and the corresponding input data. I would like to ask you for any suggestions, thanks.

Queries

@bokyeong1015 Hi, thanks for sharing this wonderful work. I have a few queries and a request:

  1. Can you please share your checkpoint-45000 on OneDrive or Google Drive? I would like to test it, as I do not have the resources to train it on a GPU system.
  2. In your paper you mention deploying on NVIDIA Orin. Did you test on any other platforms, such as NVIDIA AGX / NX / Nano? If so, what was the inference time on them?
  3. When deploying on NVIDIA Orin, did you use Docker or the Hugging Face models directly?
  4. Can the techniques used in this paper and in SnapFusion be combined in this code, and can we expect further improvements?

Thanks in advance.

SDXL support?

Hi there!

I'd like to ask: do you have, or plan to add, support for the SDXL model? It's quite heavy, and making it faster and more lightweight would bring huge benefits to the community.

Thanks for your work!

improved wandb logger

To incorporate the below feature

In addition, the base training script src/kd_train_text_to_image.py logs only the total loss to W&B, while one may be interested in each individual contribution. I added image logging to W&B as well.

Repo update

  • Code for SD-V2 applicability
  • Readme & model card for SD-V2 applicability
    • Updated description & results
    • Updated package info
  • Credit BK-SDXL from KOALA

batched image generation

To incorporate the below feature

The original src/generate.py generates images one by one, which under-utilizes the GPU; as a consequence, generating 30K images takes a while. I've added batched image generation to speed it up.

Snapfusion seems to get better results?

Thanks for generously open-sourcing your work. There was a previous work similar to yours, called SnapFusion, aimed at speeding up Stable Diffusion.

According to their paper, they achieved better results through an efficient U-Net and step distillation, but unfortunately that work is not open source.

Do you have any opinion on this work? https://snap-research.github.io/SnapFusion/

Add DreamBooth finetuning

  • Goal: Efficient personalized generation with lightweight SD backbones
  • Method: DreamBooth finetuning without and with LoRA

Wonderful work and hi from 🧨 diffusers

Hi folks!

Simply amazing work here 🔥

I am Sayak, one of the maintainers of 🧨 diffusers at HF. I see all the weights of BK-SDM are already diffusers-compatible. This is really amazing!

I wanted to know if there is any plan to also open-source the distillation pre-training code. I think that will be beneficial to the community.

Additionally, any plans on doing for SDXL as well?

Cc: @patrickvonplaten

Unhandled exception while generating images that are considered NSFW

Hi! I ran this line of code to generate samples to compute FID:

!python3 src/generate.py --model_id nota-ai/bk-sdm-base --save_dir ./results/bk-sdm-base

Then I got this error:

0/30000 | COCO_val2014_000000000042.jpg **A small dog is curled up on top of the shoes** | 25 steps
Total 751.9M (U-Net 579.4M; TextEnc 123.1M; ImageDec 49.5M)
100% 25/25 [00:03<00:00,  8.14it/s]
Traceback (most recent call last):
  File "/content/BK-SDM/src/generate.py", line 53, in <module>
    img = pipeline.generate(prompt = val_prompt,
  File "/content/BK-SDM/src/utils/inference_pipeline.py", line 34, in generate
    out = self.pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 706, in __call__
    do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]
TypeError: 'bool' object is not iterable

data loading problem with 89M pairs

Hi, thanks to your excellent work, I have conducted many experiments.

When I trained on a subset of LAION-aesthetic-5+ (about 89M pairs), my training process was killed without a specific error message :(

Maybe it occurred in load_dataset.

I guess that the training set is too big, but I'm not sure.

I think this problem may be caused by Hugging Face's datasets library.

Have you ever faced this problem? And have you tried to train your model on a much bigger training set?

Thanks in advance :)


About gpu memory

Thanks for your great work. May I ask a question about GPU memory? You wrote:

A toy script can be used to verify the code executability and find the batch size that matches your GPU. With a batch size of 8 (=4×2), training BK-SDM-Base for 20 iterations takes about 5 minutes and 22GB GPU memory.

With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory.

So the batch size increases about 32× (from 2 to 64), but GPU memory increases less than 3× (from 22GB to 53GB). Why is the GPU memory usage so economical? Is diffusers more GPU-efficient than PyTorch Lightning (which SD v1.5 used)?
Thanks very much

How about KD training without EMA?

Thanks for your paper and code. My question is: how does the model perform when the EMA option is not used, i.e., when I don't pass the --use_ema option?

May I ask whether the training time is accurate

With a batch size of 256 (=4×64), training BK-SDM-Base on a single A100 for 50K iterations takes about 300 hours.
With a batch size of 64 (=4×16), training BK-SDM-Base on a single A100 for 50K iterations takes about 60 hours???
Is it in fact 600 hours?

Discussion on preprocessing of LAION data

[Question]

I have another question.

I split the LAION-aesthetic V2 5+ dataset into several subsets, e.g., 5M, 10M, 89M, etc, and I made metadata.csv for each subset.

Then, when I tried to train with multiple GPUs using a subset, I faced the error below.

I guess that the problem was caused by the data itself.

FYI, I didn't pre-process the data except for the resolution (512×512) when I downloaded it.

Did you also face this problem?

Or did you conduct any pre-processing of the LAION data??

Steps: 0%| | 283/400000 [35:52<813:24:06, 7.33s/it, kd_feat_loss=58.6, kd_output_loss=0.0447, lr=5e-5, sd_loss=0.185, step_loss=58.9]
Traceback (most recent call last):
File "/home/user01/bk-sdm/src/kd_train_text_to_image.py, line 1171, in
main()
File "/home/user01/bk-sdm/src/kd_train_text_to_image.py", line 961, in main
for step, batch in enumerate(train_dataloader):
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/accelerate/data_loader.py", line 388, in iter
next_batch = next(dataloader_iter)
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in next
data = self._next_data()
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
data = self.dataset.getitems(possibly_batched_index)
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in getitems
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
IndexError: index 63 is out of bounds for dimension 0 with size 63
