Comments (4)
Our GitHub is also updated; you can pull to update your code.
You might be using our GitHub code, which has an older copy of our demo. We will update our GitHub soon.
I tried two things with our demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion; you can also clone the demo and run it locally: https://huggingface.co/spaces/longlian/llm-grounded-diffusion/tree/main):
- I used your prompt but increased the frozen ratio.
The quality can improve further (by tuning other hyperparameters or changing seeds), but I feel that with a cartoon style SD typically puts in fewer details (e.g., very few details on the faces).
However, the attribute binding should be right most of the time with Standard guidance (the Faster modes use lower guidance).
- I removed the cartoon style.
Note that the faces are a little weird, but that's a typical SD issue with small faces that you will also see in the baseline.
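For reference, cloning the Space and running it locally can be sketched as below. This is a generic Hugging Face Spaces workflow, not instructions from the repo itself; in particular, the `app.py` entry point and `requirements.txt` are assumptions based on the usual Gradio Space layout.

```shell
# Clone the demo Space linked above (git-lfs should be installed for model files).
git clone https://huggingface.co/spaces/longlian/llm-grounded-diffusion
cd llm-grounded-diffusion

# Install dependencies and launch; Gradio Spaces conventionally use app.py
# and serve on http://localhost:7860 by default.
pip install -r requirements.txt
python app.py
```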
from llm-groundeddiffusion.
Hi,
To clarify, the results mentioned above were obtained from running locally.
I also tried your new demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion), and the results depend on the seed. When I use the same seed as you, I get correct and reasonable results, but when I use a different seed, failure cases occur. Please see the following results.
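The seed dependence makes sense: the seed fixes the initial Gaussian noise that the diffusion process denoises, so different seeds start from different latents and can land on different attribute bindings. A minimal NumPy sketch of this idea (not the demo's actual code, which uses PyTorch latents):

```python
import numpy as np

def initial_latents(seed: int, shape=(4, 64, 64)) -> np.ndarray:
    """Draw the starting Gaussian noise for one diffusion run from a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

# Same seed -> identical starting noise -> reproducible generations.
a = initial_latents(4353)
b = initial_latents(4353)
assert np.array_equal(a, b)

# A different seed (e.g., 4354) -> different starting noise, which can lead
# to a different (possibly wrong) attribute binding, as observed here.
c = initial_latents(4354)
assert not np.array_equal(a, c)
```

This is why sweeping a few seeds is a cheap way to probe whether a failure is systematic or noise-dependent.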
Of course, I agree with:
> However, the attribute binding should be right most of the time, with Standard guidance (Faster modes give lower guidance).
Thanks.
It seems that seed 4354 indeed gives a wrong association, while seed 4353 gives something right (still not beautiful, at a similar level to the SD baseline).
Seed 4354 gets the association wrong even without the man, so I would attribute this to the insufficient text-to-image object association learned during training.
Seed 4353 (man and the woman):
Longer explanation:
I think your observation can be interpreted this way: SD only sees images and their paired captions during training, not instance annotations, so the associations from text to objects/attributes in the image are learned implicitly (without direct supervision).
Since SD is mostly trained on photos, LMD already works in your case with the photo style.
I believe cartoons are only a small fraction of the training data in the original SD (and in GLIGEN, if you use LMD+), and the association is probably learned only weakly in the original SD, so it still misses the association with some seeds.
LMD (stage 2), as a training-free method, can guide the generation to a specific layout. However, if SD doesn't recognize the text-to-object association (between "a woman in blue" and the cartoon woman in blue), then it is hard to convey that information to the model.
A solution is to train a LoRA or fine-tune the model to give it more control. This is how people adapt and strengthen SD for a specific style/domain. It is also an opportunity for future research (making the association stronger and correct even more often).
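For readers unfamiliar with LoRA: it freezes the pretrained weight W and learns only a low-rank update, so the effective weight becomes W + (alpha/r) * B @ A with far fewer trainable parameters. A minimal NumPy sketch of the idea (not the actual SD fine-tuning code; class and parameter names are illustrative):

```python
import numpy as np

class LoRALinear:
    """A frozen linear map W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W: np.ndarray, r: int = 4, alpha: float = 4.0):
        d_out, d_in = W.shape
        self.W = W                                 # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01   # trainable, small random init
        self.B = np.zeros((d_out, r))              # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Low-rank path adds style-/domain-specific behavior on top of W.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.randn(8, 16)
layer = LoRALinear(W)
x = np.random.randn(16)
# With B initialized to zero, the layer initially reproduces the frozen model,
# so fine-tuning starts from the pretrained behavior and only learns the delta.
assert np.allclose(layer(x), W @ x)
```

The zero-initialized B is the standard trick that makes LoRA a no-op at the start of training, which is why it can strengthen a specific style (e.g., cartoons) without destroying what SD already knows.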
Thanks for your helpful explanation; I have no more questions now.
I will close this issue.