deepcache's People

Contributors

doombeaker, horseee, iamwavecut, lexie-yu, vainf, yuanshi9815

deepcache's Issues

Even more lossless

I experimented a bit with DeepCache, it is very awesome!
I realised that if we delay the start of caching and stop it early (something like starting to cache only after 5% of the iterations and stopping 5% before the end of denoising), we can barely see any change in the image, but the performance is still very good!
Implementing a caching start and stop is quite straightforward: you just have to replace this:

if i in interval_seq:
    prv_features = None

with this:

if i in interval_seq or i <= int(num_inference_steps * cache_start) or i >= int(num_inference_steps * cache_end):
    prv_features = None
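
For reference, here is a minimal sketch of that change as a standalone helper; interval_seq, num_inference_steps and prv_features follow the names used in the snippets above, while cache_start and cache_end are hypothetical fractions (e.g. 0.05 and 0.95):

def needs_full_pass(i, interval_seq, num_inference_steps,
                    cache_start=0.05, cache_end=0.95):
    """Return True when cached features should be discarded and recomputed."""
    before_warmup = i <= int(num_inference_steps * cache_start)
    after_cooldown = i >= int(num_inference_steps * cache_end)
    return i in interval_seq or before_warmup or after_cooldown

# Inside the denoising loop:
# if needs_full_pass(i, interval_seq, num_inference_steps):
#     prv_features = None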

No Speed Gains

Hi,

I am trying to run DeepCache on a V100 using the instructions given on Hugging Face. In all my tests, DeepCache takes ~10s longer than running SD without DeepCache.

import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(
    cache_interval=3,
    cache_branch_id=0,
)
helper.enable()

image = pipe("a photo of an astronaut on a moon").images[0]

This is the code that I am using.
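
A minimal timing sketch (not part of the original report) that could verify whether the helper actually reduces wall-clock time; it assumes the pipe and helper from the code above, and uses a warm-up call to keep one-time CUDA overhead out of the measurement:

import time
import torch

def time_pipe(pipe, prompt, n=3):
    pipe(prompt)  # warm-up run, excluded from the measurement
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n):
        pipe(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n

prompt = "a photo of an astronaut on a moon"
helper.disable()
baseline = time_pipe(pipe, prompt)
helper.enable()
cached = time_pipe(pipe, prompt)
print(f"baseline: {baseline:.2f}s/image, DeepCache: {cached:.2f}s/image")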

Use of cache_block_id

Hi, the work looks very interesting.

I was going through the code and understood most of it, except cache_block_id. I didn't get its usage from the paper, and according to the code, it seems to cache the output of a particular attention block within a UNet layer.

Could you please provide some insight on this? Thanks!
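
For what it's worth, here is a purely illustrative sketch (not the DeepCache source) of how a (cache_layer_id, cache_block_id) pair could address one sub-block inside a nested UNet stage, with that block's output serving as the cache entry:

cached_feature = None  # module-level cache slot, refreshed on full passes

def run_stages(stages, h, cache_layer_id, cache_block_id, use_cache):
    """stages: nested list of callable blocks, e.g. a UNet's up blocks."""
    global cached_feature
    for layer_idx, stage in enumerate(stages):
        for block_idx, block in enumerate(stage):
            is_target = (layer_idx, block_idx) == (cache_layer_id, cache_block_id)
            if use_cache and is_target and cached_feature is not None:
                h = cached_feature  # skip the block, reuse the stored output
            else:
                h = block(h)
                if is_target:
                    cached_feature = h  # store for the next cached steps
    return h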

Diffusers 0.26.3

Hi, you need to update the XL pipeline to work with the latest version of Diffusers...

Stable Diffusion WebUI Implementation

Thanks for the great work!
I have a few questions about it.

Short Implementation

https://gist.github.com/aria1th/04bb78207daeee1f3d0800dc422e6254
WebUI Implementation

We can understand it as caching the result and reusing it for nearby steps, skipping UNet blocks in favor of the cache.

But while I observed that it works pretty well for DDIM and PLMS, for other samplers, such as DPM++ SDE a Karras, the changes are drastic at certain steps.

For those samplers, it seems we can stop caching for the initial steps, focusing on the 'stabilizing' steps: for example, disabling caching for the first 30-40% of steps, or invalidating the cache if the timestep gap between steps is bigger than 50 (see the sketch after this issue).

Is it correct?

Also, I tried an implementation for ControlNet too, but it seems to conflict: ControlNet naturally uses timestep-dependent embeddings to schedule guidance...

Maybe some masked guidance / combined guidance is required?
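
As a sketch of the cache-invalidation idea above, assuming access to the sampler's timestep inside the denoising loop (the names below are hypothetical), the cache could be dropped whenever the jump from the timestep it was computed at exceeds a threshold:

MAX_TIMESTEP_GAP = 50   # threshold suggested above
last_cached_t = None    # timestep at which the cache was last refreshed

def cache_is_stale(t):
    """Return True when cached features should be recomputed at timestep t."""
    global last_cached_t
    if last_cached_t is None or abs(last_cached_t - t) > MAX_TIMESTEP_GAP:
        last_cached_t = t
        return True
    return False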

[issue] Compatibility with torch.compile() if torch >= 2.0

Thanks for the great work once again.

I would like to ask whether it can work with torch.compile().

If it can, it could be even faster. I got the error below when combining the two.

Unsupported: class property UNet2DConditionModel getset_descriptor

from user code:
   File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 211, in __getattr__
    is_in_config = "_internal_dict" in self.__dict__ and hasattr(self.__dict__["_internal_dict"], name)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
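
For reference, a minimal reproduction sketch, assuming the README's setup and that the UNet is the compiled component; compiling it with torch.compile() and then enabling the helper is what would trigger the Dynamo error quoted above:

import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet = torch.compile(pipe.unet)  # torch >= 2.0

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)
helper.enable()

image = pipe("a photo of an astronaut on a moon").images[0]  # Dynamo error here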

Comfyui node

Hello, this looks quite awesome. Do you plan to make this available as an extension/plugin for the common SD tools (Automatic1111, ComfyUI, etc.)?

How to actually use this on a local PC

Do I need Stable Diffusion installed on my PC, or do I need to download models and put them in a folder? The README.md does not explain how to use this well; please add a usage and requirements section.

When I enable DeepCache, my images start being generated significantly more distorted

I know I am a 7800 XT user, running through the new and only partly working ZLUDA at that, so I may be asking too much...
But I have a question? Bug report? Request? Depends on how it should be looked at.

When I enable DeepCache (even with an interval of 2; by "steps" below I mean the DeepCache cache interval), my images are generated with severe flaws that scale up rapidly as the cache interval increases. I guess DeepCache somehow makes inherent model flaws scale up very rapidly until they cannot be contained?

I used the SD 2.1 model, and the examples below include both 512x512 and 768x768 generations on the respective models.
The prompt used everywhere is simple but practical: "Tree in the forest".
I used the default sampler with 20 steps.

A decently large negative prompt does help by reducing the distortion to a great extent, but it only works up to specific cache intervals.

The negative prompt (when used) was taken from another experiment of mine [that is why mountain peaks and a road are in there... also fish eye, as I used "wide angle"]: "lowres, low quality, low details, overexposed, overcontrast, oversaturated, text, watermark, excessively grainy, deformed, tiling, deformed foliage, inaccurate sky, deformed trees, deformed leaves, bad reflections, (unrealistic:1.1), smeary, deformed mountain peaks, road, oil drawing, cropped textures, flat textures, fish eye"

With the 512x512 model, it starts with oversaturation even at the default interval of 3. Then comes a reduction in detail and a mosaic feel, and at about interval 5 the results turn into a "game of light, shadow and color". The negative prompt works up to about interval 3; after that, generation quickly begins to degrade.

512x512, no DeepCache (image)
512x512, interval 3: oversaturation (image)
512x512, interval 5: light, shadow and color "game" (image)
512x512, interval 3 + negative prompt (image)

With the 768x768 model, even without DeepCache some pictures look like paintings, and the leaves often feel like spots or mosaic. But at interval 2, ALL images generated by the prompt tend to look like oil paintings or have a mosaic feel, while at interval 5 images start to look as if they were fractured and shuffled a bit. The negative prompt only helps up to interval 2; then things start to degrade quickly.

768x768, no DeepCache (image)
768x768, interval 3: "oil painting" + mosaic feel (image)
768x768, interval 5: "fractured" and/or "mosaic" images (image)
768x768, interval 10: basically a kaleidoscope; looks pretty unique though, ngl (image)
768x768, interval 2 + negative prompt (image)
768x768, interval 3 + negative prompt (image)

I can see a slight mosaic feel or spottiness on the leaves with the 768x768 model even without DeepCache; it seems to be an inherent problem. But abstract "fracturing", or this level of "spottiness"? I also cannot understand why the negative prompt compensated for DeepCache to some extent; should it even work like this?

Sorry for bothering you with my blabbering and complaints; writing this issue wasn't easy either. Every time I changed the model or a DeepCache-related parameter, I was forced to completely restart the SD WebUI. I wish DeepCache weren't useful to me, but it really does allow a noticeable speed-up in generation. In its current state, though, it seems I just cannot afford to use it. That is the only reason I finally decided to write this up.

pipe.enable_model_cpu_offload() makes every image after the first distorted

Found a weird problem with DeepCache. If you use CPU offload with DeepCache, the first generated batch will be normal (expected speedup and quality), but all subsequent batches will be very distorted (and generation speeds up even more). This does not happen with pipe.enable_sequential_cpu_offload() or when running without offload.
It can be fixed by calling helper.disable() after every batch and helper.enable() again before the next batch, but since that is not required in a pipeline without CPU offload, I decided not to consider it expected behavior.
Code for recreating this issue and an example of the distortion: https://www.kaggle.com/code/ledrose/deepcache-cpu-offload-bug
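
A sketch of the described workaround, assuming the pipe and helper from the README example:

prompts = ["a photo of an astronaut on a moon"] * 3

pipe.enable_model_cpu_offload()
for prompt in prompts:
    helper.enable()
    image = pipe(prompt).images[0]
    helper.disable()  # reset DeepCache state so the next batch is not distorted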

Does it support SD-Inpaint model?

I changed the UNet2DConditionModel, ImageProjection and DiffusionPipeline functions in the pipeline_stable_diffusion_inpaint.py file. The code is fast, but the output is terrible. What can I do to improve this?
