
latent-consistency-model's Introduction

Latent Consistency Models

Official Repository of the paper: Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference.

Official Repository of the paper: LCM-LoRA: A Universal Stable-Diffusion Acceleration Module.

Project Page: https://latent-consistency-models.github.io

Try our Demos:

🤗 Hugging Face Demo: Hugging Face Spaces 🔥🔥🔥

Replicate Demo: Replicate

OpenXLab Demo: Open in OpenXLab

LCM Community: Join our LCM discord channels for discussions. Coders are welcome to contribute.

Breaking News 🔥🔥!!

  • (🤖New) 2023/12/1 Pixart-α X LCM is out, a high-quality image generation model. See here.
  • (❤️New) 2023/11/10 Training Scripts are released!! Check here.
  • (🤯New) 2023/11/10 Training-free acceleration LCM-LoRA is born! See our technical report here and Hugging Face blog here.
  • (⚡️New) 2023/11/10 LCM has a major update! We release 3 LCM-LoRAs (SD-XL, SSD-1B, SD-V1.5), see here (a minimal Diffusers usage sketch follows this list).
  • (🚀New) 2023/11/10 LCM has a major update! We release 2 Full Param-tuned LCM (SD-XL, SSD-1B), see here.
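For reference, a minimal text-to-image sketch applying the SD-V1.5 LCM-LoRA with 🧨 Diffusers. It mirrors the ControlNet example further down this page; the prompt, step count, and guidance scale are illustrative only:

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load a standard SD-V1.5 checkpoint and swap in the LCM scheduler.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Apply the training-free acceleration module (LCM-LoRA).
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Few steps and a low guidance scale are the typical LCM-LoRA settings.
image = pipe(
    "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]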

News

  • (🔥New) 2023/11/10 We support LCM Inference with C# and ONNX Runtime now! Thanks to @saddam213! Check the link here.
  • (🔥New) 2023/11/01 Real-Time Latent Consistency Models is out!! Github link here. Thanks @radames for the really cool Huggingface🤗 demo Real-Time Image-to-Image, Real-Time Text-to-Image. Twitter/X Link.
  • (🔥New) 2023/10/28 We support Img2Img for LCM! Please refer to "🔥 Image2Image Demos".
  • (🔥New) 2023/10/25 We have official LCM Pipeline and LCM Scheduler in 🧨 Diffusers library now! Check the new "Usage".
  • (🔥New) 2023/10/24 Simple Streamlit UI for local use: see the link. Thanks to @akx.
  • (🔥New) 2023/10/24 We support SD-Webui and ComfyUI now!! Thanks to @0xbitches. See the links: SD-Webui and ComfyUI.
  • (🔥New) 2023/10/23 Running on Windows/Linux CPU is also supported! Thanks to @rupeshs. See the link.
  • (🔥New) 2023/10/22 Google Colab is supported now. Thanks to @camenduru. See the link: Colab.
  • (🔥New) 2023/10/21 We support local gradio demo now. LCM can run locally!! Please refer to the "Local gradio Demos".
  • (🔥New) 2023/10/19 We provide a demo of LCM in 🤗 Hugging Face Space. Try it here.
  • (🔥New) 2023/10/19 We provide the LCM model (Dreamshaper_v7) in 🤗 Hugging Face. Download here.
  • (🔥New) 2023/10/19 LCM is integrated in 🧨 Diffusers library. Please refer to the "Usage".

🔥 Image2Image Demos (Image-to-Image):

We support Img2Img now! Try the impressive img2img demos here: Replicate, SD-webui, ComfyUI, Colab

Local gradio for img2img is on the way!
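In the meantime, here is a minimal local img2img sketch with 🧨 Diffusers, assuming a diffusers version recent enough (>= 0.23) to route LCM_Dreamshaper_v7 to its image-to-image pipeline; the input image URL, strength, and step count are illustrative only:

import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

init_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

# strength trades off fidelity to the input image against the prompt.
image = pipe(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    image=init_image,
    num_inference_steps=4,
    guidance_scale=8.0,
    strength=0.5,
).images[0]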

🔥 Local gradio Demos (Text-to-Image):

To run the model locally, you can download the "local_gradio" folder:

  1. Install PyTorch (CUDA). macOS users can install the "MPS" version of PyTorch; please refer to https://pytorch.org. Install Intel Extension for PyTorch as well if you're using an Intel GPU.
  2. Install the main libraries:
pip install diffusers transformers accelerate gradio==3.48.0 
  3. Launch the gradio demo (macOS users need to set device="mps" in app.py; Intel GPU users should set device="xpu" in app.py; see the device-selection sketch after these steps):
python app.py
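A minimal sketch of the device selection mentioned in step 3; the variable name and fallback order are illustrative, and app.py may organize this differently:

import torch

# Pick the accelerator: CUDA on NVIDIA GPUs, "mps" on Apple Silicon,
# and "xpu" on Intel GPUs (requires Intel Extension for PyTorch).
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"  # replace with "xpu" if intel_extension_for_pytorch is installed

print(f"Running LCM on: {device}")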

Demos & Models Released

Our Hugging Face Demo and Model are released! Latent Consistency Models are supported in 🧨 diffusers.

LCM Model Download: LCM_Dreamshaper_v7

The LCM model has also been uploaded to wisemodel (始智AI); Chinese users can download it there (download link).

For Chinese users, download LCM here: Open in OpenXLab

Hugging Face Demo: Hugging Face Spaces

Replicate Demo: Replicate

OpenXLab Demo: Open in OpenXLab

Tungsten Demo: Tungsten

Novita.AI Demo: Novita.AI Latent Consistency Playground

By distilling classifier-free guidance into the model's input, LCM can generate high-quality images in a very short inference time. We compare inference times at 768 x 768 resolution, CFG scale w=8, batch size 4, using an A800 GPU.
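For reference, classifier-free guidance normally requires two U-Net evaluations per step. In the notation of the LCM paper, the guided noise prediction is

    \tilde{\epsilon}_\theta(z_t, \omega, c, t) = (1 + \omega)\,\epsilon_\theta(z_t, c, t) - \omega\,\epsilon_\theta(z_t, \varnothing, t)

LCM distills this ω-conditioned prediction into the student model, which receives ω as an extra embedding input, so inference needs only a single forward pass per step and no separate unconditional pass.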

Usage

We have official LCM Pipeline and LCM Scheduler in 🧨 Diffusers library now! The older usages will be deprecated.

You can try out Latent Consistency Models directly on: Hugging Face Spaces

To run the model yourself, you can leverage the 🧨 Diffusers library:

  1. Install the library:
pip install --upgrade diffusers  # make sure to use at least diffusers >= 0.22
pip install transformers accelerate
  2. Run the model:
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4 

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
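The pipeline returns a list of PIL images; a minimal follow-up to write them to disk (the filename pattern is just an example):

for i, image in enumerate(images):
    image.save(f"lcm_output_{i}.png")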

For more information, please have a look at the official docs: 👉 https://huggingface.co/docs/diffusers/api/pipelines/latent_consistency_models#latent-consistency-models

Usage (Deprecated)

We have official LCM Pipeline and LCM Scheduler in 🧨 Diffusers library now! The older usages will be deprecated. But you can still use the older usage by adding revision="fb9c5d1" to from_pretrained(...).

To run the model yourself, you can leverage the 🧨 Diffusers library:

  1. Install the library:
pip install diffusers transformers accelerate
  2. Run the model:
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", custom_pipeline="latent_consistency_txt2img", custom_revision="main", revision="fb9c5d")

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4 

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images

Our Contributors:

BibTeX

LCM:
@misc{luo2023latent,
      title={Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference}, 
      author={Simian Luo and Yiqin Tan and Longbo Huang and Jian Li and Hang Zhao},
      year={2023},
      eprint={2310.04378},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

LCM-LoRA:
@article{luo2023lcm,
  title={LCM-LoRA: A Universal Stable-Diffusion Acceleration Module},
  author={Luo, Simian and Tan, Yiqin and Patil, Suraj and Gu, Daniel and von Platen, Patrick and Passos, Apolin{\'a}rio and Huang, Longbo and Li, Jian and Zhao, Hang},
  journal={arXiv preprint arXiv:2311.05556},
  year={2023}
}

latent-consistency-model's People

Contributors

akx, anyisalin, camenduru, chenxwh, eltociear, kalyanimhala, luosiallen, mjpyeon, nuullll, patrickvonplaten, tyq1024, vantang


latent-consistency-model's Issues

python app.py raises an error

requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/SimianLuo/LCM_Dreamshaper_v7 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f6b9794f5b0>: Failed to establish a new connection: [Errno 110] Connection timed out'))"), '(Request ID: 19cc3c18-0229-4474-883c-071d094ae64d)')
How can I solve this problem?

[Bug] CPU offloading is not working with the diffusion pipeline

To reduce VRAM usage we can use CPU offloading, but it raises an error.

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
                    "SimianLuo/LCM_Dreamshaper_v7",
                    custom_pipeline="latent_consistency_txt2img",
                    custom_revision="main")

pipeline.enable_sequential_cpu_offload()

images = pipeline(prompt="a cute cat", num_inference_steps=4, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images

Colab : https://colab.research.google.com/drive/1Fqio_x4IG-5hV6anasSXlnEGp4c9LPF4?usp=sharing


How to train well on my pretrained model?

@luosiallen, great job. I did latent consistency model distillation on my pretrained model according to your paper, using the DDIM solver. The result is OK but blurry, and the step count must be high (16 is OK, 4 is too bad). I found the training very tricky, especially the lr and ema_rate. Do you have practical training guidance for distillation?

Stable Diffusion-V2.1-Base model

Hi, great work! The paper states that the latent consistency model is distilled from Stable Diffusion-V2.1-Base; however, this repo only provides the Dreamshaper v7 model, a version of Stable Diffusion v1-5. Why is this the case? Is Dreamshaper v7 better than Stable Diffusion-V2.1-Base?

Document: Adding a Contributors section to the README.md file.

There is no Contributors section in the README file.
As we know, contributions are what make the open-source community such an amazing place to learn, inspire, and create.
The "Contributors" section in a README.md file is important as it acknowledges and gives credit to those who have contributed to a project, fosters community and collaboration, adds transparency and accountability, and helps document the project's history for current and future maintainers. It also serves as a form of recognition, motivating contributors to continue their efforts.

Question regarding BoundaryConditionScalings

Hey Team,

I have successfully ported LCM to native C#, I am just adding some UI settings and have a few questions regarding a few fields

https://github.com/saddam213/OnnxStack/blob/master/OnnxStack.StableDiffusion/Diffusers/LatentConsistency/TextDiffuser.cs
https://github.com/saddam213/OnnxStack/blob/master/OnnxStack.StableDiffusion/Schedulers/LatentConsistency/LCMScheduler.cs

  1. BoundaryConditionScalings: there is a fixed variable sigma_data = 0.5 # Default: 0.5. Is this something that should be adjusted by the end user, and if so, what are the recommended min/max values? (See the sketch after this issue.)
  2. Original Inference Steps: likewise, is this something we should add next to the Inference Steps slider, or is it safe to fix it to a range of 30-50?

Loving LCM, images are great even with a low step count, awesome work
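For context on question 1 above, here is a rough Python sketch of the boundary-condition scalings as implemented in 🧨 Diffusers' LCMScheduler; the timestep scaling factor of 10.0 is an assumption based on the scheduler's default, and sigma_data is the same fixed 0.5 mentioned above. Since the model was distilled under this parameterization, changing sigma_data at inference time is unlikely to help.

def boundary_condition_scalings(timestep: float, sigma_data: float = 0.5, timestep_scaling: float = 10.0):
    """Return (c_skip, c_out): c_skip -> 1 and c_out -> 0 as timestep -> 0,
    which enforces the consistency boundary condition f(x, 0) = x."""
    scaled_t = timestep * timestep_scaling
    c_skip = sigma_data**2 / (scaled_t**2 + sigma_data**2)
    c_out = scaled_t / (scaled_t**2 + sigma_data**2) ** 0.5
    return c_skip, c_out

# The consistency prediction is roughly c_skip * sample + c_out * predicted_x0.
print(boundary_condition_scalings(0))    # (1.0, 0.0)
print(boundary_condition_scalings(999))  # tiny c_skip, c_out close to 1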

About pred x0 model

I noticed that you have open-sourced the related models and code for LCM on HuggingFace, which is very useful to me.
However, I found that the existing models use either pred_eps or pred_v.
I want to ask if you have conducted experiments with the pred_x0 mode, and do you have the corresponding checkpoint files? If so, could you provide some relevant information or resources? Thank you very much for your help and support!

training code and more models

  1. Would you release the training code, e.g., for fine-tuning on custom datasets?
  2. Will more pretrained models be released in the future, e.g., a distilled version of the official SD-v1.5?

Any plan for ControlNet?

While the community has already presented an img2img approach based on SDEdit, it seems that ControlNet hasn't been embraced yet.

Training data

The training code uses the muse-datasets bucket from S3. However, it seems that the dataset is not publicly accessible. Is it possible to make the S3 bucket public for reproduction? This seems to be a processed version of LAION-A6+, which is different from the LAION-A6+ on Hugging Face.

The dataset path in the command:

s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar

Usage of guidance_scale_embedding as timestep conditional

Running either train_lcm_distill_sd_wds.py or train_lcm_distill_sdxl_wds.py runs into a missing-argument error:

Traceback (most recent call last):
  File "/home/smhu/diffusers/examples/consistency_distillation/train_lcm_distill_sd_wds.py", line 1302, in <module>
    main(args)
  File "/home/smhu/diffusers/examples/consistency_distillation/train_lcm_distill_sd_wds.py", line 857, in main
    teacher_unet.config["time_cond_proj_dim"] = args.unet_time_cond_proj_dim
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Namespace' object has no attribute 'unet_time_cond_proj_dim'

In train_lcm_distill_sd_wds.py, this argument configures the guidance_scale_embedding dimension used as the timestep conditional:
https://github.com/luosiallen/latent-consistency-model/blob/main/LCM_Training_Script/consistency_distillation/train_lcm_distill_sd_wds.py#L1143-L1156

whereas in train_lcm_distill_sdxl_wds.py, guidance_scale_embedding is not used at all.
https://github.com/luosiallen/latent-consistency-model/blob/main/LCM_Training_Script/consistency_distillation/train_lcm_distill_sdxl_wds.py#L1248

Is the use of guidance_scale_embedding as timestep_cond actually helpful? If yes, please fix it with a default dim.
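As a stop-gap until the scripts are fixed upstream, one workaround sketch is to register the missing flag yourself. The flag name matches the args.unet_time_cond_proj_dim reference in the traceback; the default of 256 is an assumed guidance-embedding width, not a value confirmed by the authors:

import argparse

parser = argparse.ArgumentParser()
# ... the script's other arguments ...
parser.add_argument(
    "--unet_time_cond_proj_dim",
    type=int,
    default=256,  # assumed guidance-embedding width
    help="Dimension of the guidance-scale (w) embedding projected into the student UNet.",
)
args, _ = parser.parse_known_args()
print(args.unet_time_cond_proj_dim)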

some questions and suggestions

Hi!
omg, only 1-4 steps???? this is greaaat!

  1. I saw the Dreamshaper model is about 3.4 GB; can we also have fp16 models in the future, for smaller sizes?
  2. PLEASE add AnimateDiff; with only 4 steps, it will be a blessing for us on CPU!
  3. Any plans for img2img or inpainting? Or, if someday the LCM sampler is merged into automatic1111, will we be able to use Dreamshaper with AnimateDiff and img2img normally? [It would be great if you guys supported CPU officially in A1111.]

I found @rupeshs's repo for a CPU GUI; can't wait for new features!
https://github.com/rupeshs/fastsdcpu

kind regards

Candle implementation

Hey 👋, I opened an issue on 🤗 Candle to add LCMs over there.
If you guys are interested you can provide support.

In an ideal scenario a Candle port would enable in-browser generation (like this one).

`mat1 and mat2 must have the same dtype, but got Float and Half`

I got a "mat1 and mat2 must have the same dtype, but got Float and Half" error when running the sample code:

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load LCM-LoRA
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=1.5,
    controlnet_conditioning_scale=0.8,
    cross_attention_kwargs={"scale": 1},
    generator=generator,
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

Here is the exception stack:

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 39
     36 pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
     38 generator = torch.manual_seed(0)
---> 39 image = pipe(
     40     "the mona lisa",
     41     image=canny_image,
     42     num_inference_steps=4,
     43     guidance_scale=1.5,
     44     controlnet_conditioning_scale=0.8,
     45     cross_attention_kwargs={"scale": 1},
     46     generator=generator,
     47 ).images[0]
     48 make_image_grid([canny_image, image], rows=1, cols=2)

File ~\venv\Lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
--> 115     return func(*args, **kwargs)

File ~\venv\Lib\site-packages\diffusers\pipelines\controlnet\pipeline_controlnet.py:1010, in StableDiffusionControlNetPipeline.__call__(self, prompt, image, height, width, num_inference_steps, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, output_type, return_dict, callback, callback_steps, cross_attention_kwargs, controlnet_conditioning_scale, guess_mode, control_guidance_start, control_guidance_end, clip_skip)
-> 1010 down_block_res_samples, mid_block_res_sample = self.controlnet(
   1011     control_model_input,
   1012     t,
   1013     encoder_hidden_states=controlnet_prompt_embeds,
   1014     controlnet_cond=image,
   1015     conditioning_scale=cond_scale,
   1016     guess_mode=guess_mode,
   1017     return_dict=False,
   1018 )

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\venv\Lib\site-packages\diffusers\models\controlnet.py:736, in ControlNetModel.forward(self, sample, timestep, encoder_hidden_states, controlnet_cond, conditioning_scale, class_labels, timestep_cond, attention_mask, added_cond_kwargs, cross_attention_kwargs, guess_mode, return_dict)
--> 736 emb = self.time_embedding(t_emb, timestep_cond)

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\venv\Lib\site-packages\diffusers\models\embeddings.py:226, in TimestepEmbedding.forward(self, sample, condition)
--> 226 sample = self.linear_1(sample)

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)

File ~\venv\Lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\venv\Lib\site-packages\diffusers\models\lora.py:300, in LoRACompatibleLinear.forward(self, hidden_states, scale)
--> 300     out = super().forward(hidden_states)

File ~\venv\Lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

My OS is Windows 11, CUDA version 11.8

Inference problem with loading LoRA weights

Hey~

Good job~ I have trained an SD LoRA on my custom dataset, but I have some problems with inference ONLY.

With the state_dict() we saved by
'''
lora_state_dict = get_peft_model_state_dict(unet_, adapter_name="default")
StableDiffusionPipeline.save_lora_weights(os.path.join(output_dir, "unet_lora"), lora_state_dict)
'''

The keys of the saved model are named like
'''
base_model.model.mid_block.resnets.1.time_emb_proj.lora_B.weight
'''

But I checked that the keys in pytorch_lora_weights.safetensors look like
'''
lora_unet_up_blocks_2_attentions_0_proj_in.lora_up.weight
'''
which can be correctly loaded by "pipe.load_lora_weights()".

But the models we saved cannot be loaded directly.
So, the question is how to load the LoRA weights we save. Or should we convert the LoRA weights before saving?

Thanks~

About Dreamshaper-V7 and inference time

TL; DR:

Thanks for the great paper!

I have two questions about the paper.
Q1) Did you use the LAION subset dataset for distilling Dreamshaper-V7?
Q2) For the inference time in the graph at the end of the README, will LCM have the same speed as DDIM? Or, like DPM++, is LCM faster than DDIM? (Perhaps because of the distillation of CFG.)


Hi, I read the paper, LCM, and found that it is a well-grounded paper with high reproducibility.

After reading the paper, I have a question about the implementation details.

In the paper, there are several results from the distilled Dreamshaper-V7; however, I cannot find the implementation details for it.
For example, for all the quantitative evaluations, the teacher model is the base SD; for the distillation, the LAION-5B-Aesthetic dataset is referred to as the dataset. However, the training dataset for Dreamshaper-V7 is not described well (and the training datasets for Dreamshaper-* do not seem to be publicly available). Did you use the same dataset (LAION subset) for distilling Dreamshaper-V7?

Following the paper, the DDIM sampler is adopted for sampling from LCM. Is the wall-clock inference time of LCM then the same as that of the vanilla Stable Diffusion model with DDIM (I understand that DDIM with fewer steps gives inferior results; the question is just about the inference time)? Or is it faster than DDIM because of the distillation of CFG?

Thanks!

v_prediction error

In the "predicted_origin" method in train.py, prediction_type == "epsilon" can work well, but prediction_type == "v_prediction" will meet this error.

*** RuntimeError: The size of tensor a (10) must match the size of tensor b (64) at non-singleton dimension 3

Should the "sigmas" and "alphas" in prediction_type == "v_prediction" are also processed by extract_into_tensor() for matching shapes as in prediction_type == "epsilon"?

Like this

    if prediction_type == "epsilon":
        sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)
        alphas = extract_into_tensor(alphas, timesteps, sample.shape)
        pred_x_0 = (sample - sigmas * model_output) / alphas
    elif prediction_type == "v_prediction":
        sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)
        alphas = extract_into_tensor(alphas, timesteps, sample.shape)
        pred_x_0 = alphas * sample - sigmas * model_output
    else:
        raise ValueError(f"Prediction type {prediction_type} currently not supported.")

Am I right? Thanks~

An error in vid2vid

Traceback (most recent call last):
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
response = f(*args, **kwargs)
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\extensions\sd-webui-lcm\scripts\main.py", line 291, in generate_v2v
result = pipe(
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\extensions\sd-webui-lcm\lcm\lcm_i2i_pipeline.py", line 305, in call
self.scheduler.set_timesteps(strength, num_inference_steps, original_inference_steps)
File "D:\StabilityMatrix\Packages\Stable Diffusion WebUI\venv\lib\site-packages\diffusers\schedulers\scheduling_lcm.py", line 382, in set_timesteps
timesteps = lcm_origin_timesteps[::-skipping_step][:num_inference_steps]
TypeError: slice indices must be integers or None or have an index method

Combining iCT with LCM for Image Quality Enhancement

Hello,

I've had the opportunity to read your team's paper on iCT, and I find the technology quite impressive.

https://arxiv.org/abs/2310.14189

Are there any plans in place to combine iCT with LCM to enhance image quality further? It seems like something that would be of great interest to Stability AI.

Looking forward to hearing about any such developments.

Best regards,
Alfredplpl

I have some feedback

First of all, my highest respect to you; you are the pride of the Chinese people. Secondly, please at least provide a Chinese version of the documentation. Also, the model cannot be downloaded from within China without a VPN; please provide a domestic (Chinese) mirror.

how to train on custom images?

The paper mentions Latent Consistency Fine-tuning on customised image datasets. Is there any available code to do that?

Also, I noticed the results for the Pokemon and Simpsons 30k fine-tuning are getting there but still kind of bad. Would this be fixed by using more than 30k iterations?

Is timestep_cond needed?

Does the teacher model need to have timestep_cond for LCM distillation? My pretrained model is SD 1.5 and it doesn't have this weight. When I remove the timestep_cond code, the distilled model's results look strange (see the attached image).
I used one A100 to train for about 35,400 steps, and the loss descends normally (see the attached loss curve).

@luosiallen
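Regarding the timestep_cond question above, a hypothetical sketch of how a student UNet can be given the extra guidance-embedding input while the teacher stays untouched; the 256-dim width, model ID, and strict=False weight copy are assumptions, not the authors' exact training code:

import torch
from diffusers import UNet2DConditionModel

# Teacher: a plain SD-1.5 UNet without time_cond_proj_dim / timestep_cond weights.
teacher_unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float32
)

# Student: same architecture plus a guidance-embedding projection.
student_config = dict(teacher_unet.config)
student_config["time_cond_proj_dim"] = 256  # assumed embedding width
student_unet = UNet2DConditionModel.from_config(student_config)

# Copy every shared weight; only the new cond_proj layer stays randomly initialized.
missing, unexpected = student_unet.load_state_dict(teacher_unet.state_dict(), strict=False)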

Is there a normal SD version of this model?

Hi!

After using this model I totally fell in love with it; it is so beautiful. I checked the normal Dreamshaper v7 but it was totally different.

Do you guys have this model as a normal SD 1.5 model as well, to use with Euler or DDIM?

kind regards

TORCH_USE_CUDA_DSA error in recent update

I'm running app.py locally (Windows). The UI opens, but when one of the sample prompts is clicked it errors out with this message:

self.timesteps = torch.from_numpy(timesteps.copy()).to(device=device, dtype=torch.long)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Any ideas on what needs to be done to fix this and get it working again?

To set up a local environment I use these packages/versions:

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.38.4
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transformers==4.34.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts accelerate==0.23.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts gradio==3.48.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts diffusers==0.22.3
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y typing_extensions
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts typing_extensions==4.8.0

Here is the pip list just in case that helps

Package                   Version
------------------------- ------------
accelerate                0.23.0
aiofiles                  23.2.1
altair                    5.1.2
annotated-types           0.6.0
anyio                     3.7.1
attrs                     23.1.0
certifi                   2023.7.22
charset-normalizer        3.3.2
click                     8.1.7
colorama                  0.4.6
contourpy                 1.2.0
cycler                    0.12.1
diffusers                 0.22.3
exceptiongroup            1.1.3
fastapi                   0.104.1
ffmpy                     0.3.1
filelock                  3.13.1
fonttools                 4.44.0
fsspec                    2023.10.0
gradio                    3.48.0
gradio_client             0.6.1
h11                       0.14.0
httpcore                  1.0.1
httpx                     0.25.1
huggingface-hub           0.17.3
idna                      3.4
importlib-metadata        6.8.0
importlib-resources       6.1.1
Jinja2                    3.1.2
jsonschema                4.19.2
jsonschema-specifications 2023.7.1
kiwisolver                1.4.5
MarkupSafe                2.1.3
matplotlib                3.8.1
mpmath                    1.3.0
networkx                  3.2.1
numpy                     1.26.1
orjson                    3.9.10
packaging                 23.2
pandas                    2.1.2
Pillow                    10.1.0
pip                       23.3.1
psutil                    5.9.6
pydantic                  2.4.2
pydantic_core             2.10.1
pydub                     0.25.1
pyparsing                 3.1.1
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.3.post1
PyYAML                    6.0.1
referencing               0.30.2
regex                     2023.10.3
requests                  2.31.0
rpds-py                   0.12.0
safetensors               0.4.0
semantic-version          2.10.0
setuptools                63.2.0
six                       1.16.0
sniffio                   1.3.0
starlette                 0.27.0
sympy                     1.12
tokenizers                0.14.1
toolz                     0.12.0
torch                     2.0.1+cu118
torchaudio                2.0.2+cu118
torchvision               0.15.2+cu118
tqdm                      4.66.1
transformers              4.34.1
typing_extensions         4.8.0
tzdata                    2023.3
urllib3                   2.0.7
uvicorn                   0.24.0.post1
websockets                11.0.3
wheel                     0.38.4
zipp                      3.17.0

Regular LoRA with LCM

When combining a regular LoRA with LCM, the output is quite a bit "lighter" than the output from the LoRA with SD. Any thoughts? Do I need to fine-tune the LoRA, or are there some parameters I need to pay attention to?

Attached images: original; LCM + regular LoRA; SD + regular LoRA.

Why can LCM-LoRA for SDXL be used with SDXL-inpainting?

In the LCM-LoRA Hugging Face demo, lcm-lora-sdxl is proposed for stable-diffusion-xl-1.0-inpainting-0.1.
However, lcm-lora-sdxl is trained for SDXL rather than SDXL-inpainting.
How does it work?

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load LCM-LoRA
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.fuse_lora()

code release

Hi, when will the code be released? Thank you!

Distilled SDXL Models

Firstly, congratulations on this achievement; this technology adds a lot of value to the Stable Diffusion community.

Having used this, one is curious about the following:

  • Is it possible to distill SDXL models?
  • Would the distillation code be released?

Request for training guidance of Latent Consistency Fine-tuning

Hi, very exciting work! I'm trying to reproduce your results of LCF on the Pokemon dataset. However, I can't reach convergence during training. I use MSE loss for training; the training loss stays around 0.005 and does not show a clear descending trend.

Below is a brief list of my training configurations:
Batch size: 72
EMA rate: 0.999943
Learning Rate: 8e-6
Epoch: 800 (approximately 10k steps)

For the skipping step, I keep it the same as in your pretrained model (it is 20 in terms of Stable Diffusion steps, and 1 in terms of your LCM steps).

Do you have any ideas regarding this failure? And could you please provide training code, or help me reproduce your results? Any help would be appreciated!

If it's possible to help me reproduce the results, you may send your WeChat (vx) to my email: [email protected]. Thanks!

Can I get training code?

These are really very, very surprising results
(enough to make me look back at myself for giving up on graduate school).

I hope you guys get good reviews on this paper.

I'm curious: is there any plan to provide scripts for model training? If possible, for the LoRA script too.

About guidance_scale

Hi @luosiallen, I love your work! But I have one question.

As suggested in latent-consistency/lcm-lora-sdv1-5, it is recommended to disable guidance_scale or use values between 1.0 and 2.0. The effect of guidance_scale can be found in this post. But in the paper, it seems that it should be fine to set the guidance_scale to 7.5 for classifier-free guidance. What causes this difference?

About guidance scale with the input prompt! Urgent issue

Hi, thanks for your great work!

I want to understand what value of guidance_scale has no effect. According to the comments in the code, when guidance_scale=1 it should become completely unconditional. In that case, no matter what the input prompt is, as long as the remaining variables (such as the input image and the random seed) are completely fixed, the generated images should be identical. However, when I set it to 1, the generated image still changes significantly when I change the prompt.

Could you tell me how I should set guidance_scale, or what to change in the code, so that my prompt does not affect the generated image, i.e., generation is completely unconditional?

Thanks a lot!

Prevent nudity

LCM-LoRA really needs a way to prevent nudity; since we have CFG in [1, 2], the negative prompt is not strong enough to be effective in A1111.

I've also tried putting the negative prompt into the positive prompt with weight < 1. It works in some cases but not others; furthermore, it dramatically changes the artistic style of the output images compared to not using it, so it's not a reliable solution at all.
