I am reading the SDXL paper and found that the refiner is applied to the latent image.


Question about the refiner step about stable-diffusion-xl-demo HOT 8 OPEN

tonylianlong commented on June 8, 2024

Question about the refiner step

from stable-diffusion-xl-demo.

Comments (8)

TonyLianLong commented on June 8, 2024

I believe it re-encodes the image, so the refiner is still applied to the latents.

The implementation shows that images are transformed to latents prior to processing: https://github.com/huggingface/diffusers/blob/af48bf200860d8b83fe3be92b2d7ae556a3b4111/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py#L841

I believe this is the recommended way to do refinement, since it is what the examples in their PR use.
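A minimal sketch of the check being described, assuming (from my reading of diffusers' SDXL img2img `prepare_latents`, not verified against that exact commit) that an NCHW input with 4 channels is used as latents directly, while a 3-channel RGB image is VAE-encoded first. The helper names are hypothetical; the 4 latent channels and the 8x spatial downsampling are those of the SDXL VAE.

```python
SDXL_LATENT_CHANNELS = 4  # channels in the SDXL VAE latent space
SDXL_VAE_DOWNSAMPLE = 8   # the VAE shrinks height and width by a factor of 8


def sdxl_latent_shape(batch, height, width):
    """Shape (N, C, H, W) of the latents for a batch of height x width images."""
    return (
        batch,
        SDXL_LATENT_CHANNELS,
        height // SDXL_VAE_DOWNSAMPLE,
        width // SDXL_VAE_DOWNSAMPLE,
    )


def refiner_input_is_latent(shape):
    """Hypothetical helper: True if a refiner input with this NCHW shape would be
    used as latents as-is, False if it would first go through the VAE encoder."""
    return shape[1] == SDXL_LATENT_CHANNELS


print(sdxl_latent_shape(1, 1024, 1024))             # (1, 4, 128, 128)
print(refiner_input_is_latent((1, 3, 1024, 1024)))  # False: RGB, gets re-encoded
```

So a base output requested as latents (shape (1, 4, 128, 128) for one 1024x1024 image) would skip the encoder, while decoded RGB images would be encoded again.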


lifeisboringsoprogramming commented on June 8, 2024

Thank you for your reply. The PR uses

images = pipe(prompt=prompt, output_type="latent").images

before the refiner.

There is an output_type="latent" parameter.
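A sketch of that base-to-refiner handoff with diffusers, assuming the public `stabilityai/stable-diffusion-xl-base-1.0` and `stabilityai/stable-diffusion-xl-refiner-1.0` checkpoints (the demo may use different ones) and fp16 weights. `load_sdxl_pipelines` and `base_then_refine` are hypothetical names; the handoff function relies only on the pipeline call convention, so it can be exercised without a GPU.

```python
def load_sdxl_pipelines():
    """Load base + refiner. Imports are local so this module imports without diffusers."""
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    )
    # The refiner reuses the base's second text encoder and VAE, so both stages
    # share one latent space and those weights are not loaded a second time.
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2, vae=base.vae,
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    )
    return base, refiner


def base_then_refine(base, refiner, prompt):
    # With output_type="latent" the base skips VAE decoding; .images holds latents.
    latents = base(prompt=prompt, output_type="latent").images
    # The refiner receives 4-channel latents and uses them directly (no VAE encode).
    return refiner(prompt=prompt, image=latents).images
```

With real pipelines, one would call `base, refiner = load_sdxl_pipelines()`, then either `enable_model_cpu_offload()` or `.to("cuda")` on each, and finally `base_then_refine(base, refiner, "an astronaut riding a horse")`.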

How much VRAM is needed if we move the pipe to CUDA instead of using enable_model_cpu_offload?

I do not have enough VRAM, so I cannot test it myself.

Thanks


TonyLianLong commented on June 8, 2024

> there is an output_type="latent" parameter

In that case you are right: the step of decoding the latents to an image can be skipped.

> How much VRAM is needed if we move the pipe to cuda instead of enable_model_cpu_offload?

With 4 images, a GPU with 24 GB of memory is enough.


lifeisboringsoprogramming commented on June 8, 2024

I have 12 GB of VRAM and cannot generate even one image with the pipe moved to CUDA. Thank you so much.
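Back-of-the-envelope arithmetic for why placing everything on the GPU is tight: weights alone take parameters times bytes per parameter. This sketch assumes the 2.6B UNet parameter count reported in the SDXL paper and counts only the base UNet; the text encoders, VAE, refiner, and activations (which grow with the number of images) come on top. enable_model_cpu_offload avoids holding all of these on the GPU at once by moving each sub-model to the GPU only while it runs. `weights_gb` is a hypothetical helper.

```python
def weights_gb(num_params, bytes_per_param=2):
    """Memory for model weights only, in decimal GB (1e9 bytes).
    fp16/bf16 use 2 bytes per parameter, fp32 uses 4."""
    return num_params * bytes_per_param / 1e9


SDXL_BASE_UNET_PARAMS = 2.6e9  # base UNet size as reported in the SDXL paper

unet_fp16 = weights_gb(SDXL_BASE_UNET_PARAMS)
unet_fp32 = weights_gb(SDXL_BASE_UNET_PARAMS, bytes_per_param=4)
print(f"SDXL base UNet weights: {unet_fp16:.1f} GB in fp16, {unet_fp32:.1f} GB in fp32")
```

This is a lower bound on what the base stage alone needs; measured peak usage will be higher.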


TonyLianLong commented on June 8, 2024

Two updates:

1. If the intermediate images are not needed (i.e., we don't want to compare before/after), the latents are passed straight from the base generation stage to the refinement stage, with no decoding and re-encoding in between.
2. Offloading can be controlled with environment variables.
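A minimal sketch of environment-controlled placement. The variable names OFFLOAD_BASE and OFFLOAD_REFINER and both helpers are hypothetical (the demo's README is the place to check which variables it actually reads); enable_model_cpu_offload() and .to("cuda") are the diffusers calls discussed in this thread.

```python
import os


def env_flag(name, default=True, environ=None):
    """Read a boolean environment variable such as OFFLOAD_BASE=false (hypothetical name)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def place_pipeline(pipe, offload):
    """Offload: sub-models stay on the CPU and each moves to the GPU only while it runs.
    No offload: the whole pipeline sits on the GPU (no transfers, but far more VRAM)."""
    if offload:
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe


# Usage with real pipelines (hypothetical variable names):
# place_pipeline(base, env_flag("OFFLOAD_BASE"))
# place_pipeline(refiner, env_flag("OFFLOAD_REFINER"))
```

Defaulting to offload keeps the low-VRAM path working out of the box; users with a large GPU can opt out per stage.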


lifeisboringsoprogramming commented on June 8, 2024

I got some results:
On the left: images as the refiner input.
On the right: latents as the refiner input.

The head of the guy in the middle shows some differences.
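To make such a comparison fair, both paths should start from the same base sample and use the same refiner seed. A sketch under that assumption: `compare_refiner_inputs` and `make_generator` are hypothetical, and with real diffusers pipelines `make_generator` could be `lambda s: torch.Generator(device="cpu").manual_seed(s)`. Plausible (unverified) sources of the differences: on the image path the latents are decoded to 8-bit RGB and re-encoded by the VAE, so the refiner starts from slightly different latents, and that re-encoding step may also draw from the generator, shifting the noise the refiner adds.

```python
def compare_refiner_inputs(base, refiner, prompt, make_generator, seed=0):
    """Refine the same base sample twice: once via decoded images, once via latents.

    make_generator(seed) must return a fresh, identically seeded RNG on every call.
    """
    # Same seed for both base runs, so both start from the same initial noise.
    images = base(prompt=prompt, generator=make_generator(seed)).images
    latents = base(prompt=prompt, output_type="latent", generator=make_generator(seed)).images
    # Path 1: decoded images, which the refiner VAE-encodes again.
    from_images = refiner(prompt=prompt, image=images, generator=make_generator(seed)).images
    # Path 2: latents, which the refiner uses directly.
    from_latents = refiner(prompt=prompt, image=latents, generator=make_generator(seed)).images
    return from_images, from_latents
```

Running the base twice doubles its cost but avoids decoding the latents by hand; with a fixed seed on the same device, the two base runs should denoise identically.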

Thanks


TonyLianLong commented on June 8, 2024

Thanks for this example. Is using images consistently better than using latents?


lifeisboringsoprogramming commented on June 8, 2024

I did not do any more testing on that.
I think a better workflow is to decide whether to run the refiner only after seeing how the base image looks.
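That preview-first workflow (look at the base results, then refine only the ones worth keeping) could be sketched as below. `preview_then_refine` and the `approve` callback are hypothetical; in this flow the refiner receives decoded images, since the previews have to be decoded anyway in order to look at them.

```python
def preview_then_refine(base, refiner, prompt, approve, num_images=4):
    """Run only the base model first, then refine just the images that were approved."""
    # Full base output (decoded images) so the results can be inspected.
    previews = base(prompt=[prompt] * num_images).images
    chosen = [img for img in previews if approve(img)]
    if not chosen:
        return previews, []
    # One prompt per chosen image keeps the prompt and image batch sizes equal.
    refined = refiner(prompt=[prompt] * len(chosen), image=chosen).images
    return previews, refined
```

Keeping the base latents as well and refining those instead would skip the re-encode, at the cost of decoding the previews yourself.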
