

TonyLianLong commented on June 3, 2024

Our GitHub repo has now been updated as well, so you can pull to update your code.

You might be using the code from our GitHub repo, which has an older copy of our demo. We will update our GitHub repo soon.

I tried two things with our demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion; you can also clone our demo and run it locally: https://huggingface.co/spaces/longlian/llm-grounded-diffusion/tree/main):

  1. I used your prompt but increased the frozen ratio.
[image]

The quality can be improved further (by tuning other hyperparameters or changing seeds), but I feel that with a cartoon style, SD typically renders fewer details (e.g., very few details on the faces).

However, the attribute binding should be right most of the time, with Standard guidance (Faster modes give lower guidance).

  2. I removed the cartoon style.
[image]

Note that the faces are a little weird, but that's a typical SD issue with small faces, which you will also see in the baseline.

from llm-groundeddiffusion.

XiaominLi1997 commented on June 3, 2024

Hi,
To clarify, the results I mentioned above were obtained by running the code locally.

I also tried your new demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion), and the results depend on the seed. With the same seed as yours, I get correct and reasonable results, but with a different seed, failure cases occur. Please see the following results:
[image]

Of course, I agree with this:

However, the attribute binding should be right most of the time, with Standard guidance (Faster modes give lower guidance).

Thanks.


TonyLianLong commented on June 3, 2024

It seems that seed 4354 indeed gives the wrong association. Seed 4353 gives a correct result (still not beautiful, but at a similar level to the SD baseline).

Seed 4354 gets the association wrong even without the man, so I would attribute this to insufficient text-to-image object association learned during training.

Seed 4354 (only the woman):
[image]

Seed 4353 (only the woman):
[image]

Seed 4353 (the man and the woman):
[image]

Longer explanation:

I think your observation can be interpreted this way: SD only sees images and their paired captions during training, not instance annotations, so the associations from text to objects/attributes in the image are learned implicitly (without direct supervision).

Since SD is mostly trained on photos, LMD already works in your case when the photo style is used.

I believe cartoons make up only a small fraction of the training data of the original SD (and of GLIGEN, if you use LMD+), so the association is probably learned only weakly, to the point that the model still misses it for some seeds.

LMD (stage 2), as a training-free method, can guide the generation toward a specific layout. However, if SD doesn't recognize the text-to-object association (between "a woman in blue" and the cartoon woman in blue), then it is hard to convey that information to the model.

One solution is to train a LoRA or fine-tune the model to give it more control. This is how people typically adapt and strengthen SD for a specific style or domain. This is also an opportunity for future research (making the association stronger so that it is correct more often).
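For readers unfamiliar with LoRA: it keeps a pretrained weight matrix W frozen and trains only a low-rank update, so the adapted layer effectively uses W + (alpha / r) * B @ A. Below is a minimal NumPy sketch of a single LoRA-adapted linear layer, assuming the usual LoRA formulation. It is not LMD's code or any library's API; the function name `lora_linear`, the shapes, and the `alpha` value are illustrative assumptions.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha):
    """Frozen linear layer W plus a low-rank LoRA update (illustrative sketch).

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in) trainable down-projection
    B: (d_out, r) trainable up-projection
    alpha: scaling hyperparameter; the update is scaled by alpha / r
    """
    r = A.shape[0]
    base = x @ W.T              # output of the frozen layer
    update = (x @ A.T) @ B.T    # low-rank path: d_in -> r -> d_out
    return base + (alpha / r) * update


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
x = rng.standard_normal((3, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))        # zero init: the adapter starts as a no-op

# With B = 0, the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_linear(x, W, A, B, alpha=4.0), x @ W.T)

# Simulate a trained (nonzero) B; the update can then be merged into one matrix.
B = rng.standard_normal((d_out, r))
W_merged = W + (4.0 / r) * (B @ A)
assert np.allclose(lora_linear(x, W, A, B, alpha=4.0), x @ W_merged.T)
```

Only A and B (here 2 * 8 + 4 * 2 = 24 values versus 32 in W) would be trained on the style/domain data, which is why a LoRA is much cheaper than full fine-tuning. In diffusion models such adapters are typically attached to the attention projection layers.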


XiaominLi1997 commented on June 3, 2024

Thanks for your helpful explanation; I have no more questions now.
I will close this issue.
