Comments (10)

adhaesitadimo1 avatar adhaesitadimo1 commented on August 25, 2024 1

I think it would be better if I opened a pull request with my revision, so it's more convenient for you.

from streammultidiffusion.

adhaesitadimo1 avatar adhaesitadimo1 commented on August 25, 2024 1

Thanks mate!

ironjr avatar ironjr commented on August 25, 2024 1

Fundamentally, the main cause of the problem is the smaller number of timesteps: when the timesteps are reduced from 50 to 5, the model gets ten times fewer 'chances' to correct the content it is generating.

Bootstrapping steps are meant to alleviate such issues. The recommended solutions are:

  1. Increase the bootstrapping_steps from 1 to 3.
  2. If 1 does not work, increase the number of timesteps from 5 to 8 (bootstrapping_steps=3 is recommended for 8 timesteps).

Specifically, each of the bootstrapping stages does the following:

  • Bootstrapping: The StableDiffusion model is dumb. If you designate two people in a scene, even with separate masks, the diffusion model frequently develops only one person during the intermediate stages and is happy with it, because each prompt (somewhat) agrees with the generated content (a person). The problem is most critical in the early generation steps (roughly the first 20%), when the overall composition of the image is formed. Bootstrapping basically separates the generation process of each masked prompt, so the more you bootstrap, the higher the fidelity to the prompt-mask pairs. However, since the objects are developed without knowing about each other, the overall consistency breaks more easily. Therefore, the recommended number of bootstrapping steps is the first 20-50% of the timesteps.
  • Centering: StableDiffusion tends to generate the prompt-related object at the center of the frame. If the mask is at the side of the frame, the output of the initial generation steps (1-2), which sketch the object in the scene, is unnaturally cropped by the off-center mask. Centering tries to resolve this by moving each prompt-designated object to the center of the frame during the initial generation steps, so the objects are not cropped unintentionally.
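
To make the two ideas concrete, here is a minimal plain-Python sketch. It is not the code from the pipeline: the helper names (recommended_bootstrap_range, centering_offset, roll2d) are made up for illustration, the 20-50% rule is rounded up to whole steps as an assumption, and the mask is a toy nested list rather than a latent tensor.

```python
def recommended_bootstrap_range(num_timesteps: int) -> tuple[int, int]:
    """Return (low, high) bootstrapping steps covering roughly the
    first 20-50% of the timesteps, rounded up to whole steps."""
    if num_timesteps < 1:
        raise ValueError("num_timesteps must be positive")
    low = max(1, -(-num_timesteps // 5))     # ceil(20% of timesteps)
    high = max(low, -(-num_timesteps // 2))  # ceil(50% of timesteps)
    return low, high


def centering_offset(mask: list[list[int]]) -> tuple[int, int]:
    """Return the (dy, dx) shift that moves the mask centroid
    to the center of the frame."""
    h, w = len(mask), len(mask[0])
    points = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not points:
        return 0, 0  # empty mask: nothing to center
    cy = sum(y for y, _ in points) / len(points)
    cx = sum(x for _, x in points) / len(points)
    return round((h - 1) / 2 - cy), round((w - 1) / 2 - cx)


def roll2d(grid: list[list[int]], dy: int, dx: int) -> list[list[int]]:
    """Cyclically shift a 2D grid by (dy, dx) along rows and columns."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


# Toy 8x8 mask with a 2x2 object in the top-right corner.
mask = [[1 if (y < 2 and x >= 6) else 0 for x in range(8)] for y in range(8)]
dy, dx = centering_offset(mask)
centered = roll2d(mask, dy, dx)      # object moved to the frame center
restored = roll2d(centered, -dy, -dx)  # shifted back to its mask position
```

With this rounding, 5 timesteps gives a range of 1 to 3 bootstrapping steps and 8 timesteps gives 2 to 4, which is consistent with the bootstrapping_steps=3 suggestion above. The shift-then-shift-back pattern mirrors the idea of sketching the object at the center first and then returning it to its mask location.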

Hope this helps!

ironjr avatar ironjr commented on August 25, 2024

Could you please share how you produced the results? Thanks!

ironjr avatar ironjr commented on August 25, 2024

Specifically, did you use StableMultiDiffusionPipelineSDXL or StreamMultiDiffusionSDXL? I will check this out.

adhaesitadimo1 avatar adhaesitadimo1 commented on August 25, 2024

Sure, here is the .ipynb I used
https://drive.google.com/file/d/18MtBdlOohfwgIlnT9AwqCyPySS4lDJux/view?usp=drive_link
StableMultiDiffusionSDXLPipeline was used. I first made a couple of fixes in this class to be able to use a custom SDXL checkpoint:

    model_ckpt = 'drive/MyDrive/checkpoints/john_cena_last.ckpt'
    # Use the correct ckpt for your step setting!
    print(model_ckpt)
    #model_ckpt = "sdxl_lightning_8step_unet.safetensors"
    #unet = UNet2DConditionModel.from_config(model_key, subfolder='unet').to(self.device, self.dtype)
    #unet.load_state_dict(load_file(hf_hub_download(lightning_repo, model_ckpt), device=self.device))
    #self.pipe = StableDiffusionXLPipeline.from_pretrained(model_key, unet=unet, torch_dtype=self.dtype, variant=variant).to(self.device)
    self.pipe = StableDiffusionXLPipeline.from_single_file(model_ckpt, torch_dtype=self.dtype, variant="fp16").to(self.device)

Then the fp16 VAE fix:

    vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to(self.device)
    self.pipe.vae = vae
    self.vae = self.pipe.vae

And then the quick fix for the pooled embedding dimensions that I described:

    # INTERPOLATION
    ba = pooled_prompt_embeds[0]
    fa = pooled_prompt_embeds[1]
    pooled_prompt_embeds = torch.lerp(ba, fa, s1)
    # BACKGROUND OBFUSCATION
    #pooled_prompt_embeds = pooled_prompt_embeds[1:,:]
    #print(pooled_prompt_embeds.shape)
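
For anyone unfamiliar with torch.lerp: it computes start + weight * (end - start) elementwise, so the interpolation fix blends the two pooled embeddings into a single vector instead of keeping a batch of two. A plain-Python sketch of that formula follows. The 3-element vectors are toy stand-ins for real embeddings, and s1 = 0.5 is only an assumed scalar weight, since its actual value is not shown in the snippet.

```python
def lerp(start, end, weight):
    """Elementwise linear interpolation: start + weight * (end - start)."""
    return [s + weight * (e - s) for s, e in zip(start, end)]


ba = [0.0, 2.0, -4.0]  # toy stand-in for pooled_prompt_embeds[0]
fa = [1.0, 4.0, 4.0]   # toy stand-in for pooled_prompt_embeds[1]
s1 = 0.5               # assumed weight; 0 keeps ba, 1 keeps fa

blended = lerp(ba, fa, s1)
print(blended)  # [0.5, 3.0, 0.0]
```

A weight of 0 returns the first embedding unchanged and a weight of 1 returns the second, so s1 controls how strongly the result leans toward each prompt.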

adhaesitadimo1 avatar adhaesitadimo1 commented on August 25, 2024

Forgot to mention: there was a typo in the bootstrapping code, which used a bg_latents variable that is never defined. I deduced it should be bg_latent, as used earlier.

ironjr avatar ironjr commented on August 25, 2024

Thanks for the detailed update! I will have a look.

ironjr avatar ironjr commented on August 25, 2024

Thank you again for the report! I just updated StableMultiDiffusionSDXLPipeline to fix the error.
I also added notebooks/demo_inpaint_sdxl.ipynb as a dedicated usage guide.

adhaesitadimo1 avatar adhaesitadimo1 commented on August 25, 2024

Hey, I also have one more question. Sometimes, when using multiple masks, one mask is left empty. Is it a seed instability issue or a problem with centering?
image
