Comments (4)
Our GitHub is also updated; you can pull to update your code.
You might be using our GitHub code, which has an older copy of our demo. We will update our GitHub soon.
I tried two things with our demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion; you can also clone the demo and run it locally: https://huggingface.co/spaces/longlian/llm-grounded-diffusion/tree/main):
- I used your prompt but increased the frozen ratio.
The quality can improve further (by tuning other hyperparameters or changing seeds), but I feel that with a cartoon style SD typically puts in fewer details (e.g., very few details on the faces).
However, the attribute binding should be right most of the time with Standard guidance (the Faster modes use lower guidance).
- I removed the cartoon style.
Note that the faces are a little weird, but that's a typical SD issue with small faces that you will also see in the baseline.
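For reference, cloning the Space and running it locally can be sketched as below. This is a generic Hugging Face Spaces workflow, not instructions from the repo itself; in particular, the `app.py` entry point and `requirements.txt` are assumptions based on the usual Gradio Space layout.

```shell
# Clone the demo Space linked above (git-lfs should be installed for model files).
git clone https://huggingface.co/spaces/longlian/llm-grounded-diffusion
cd llm-grounded-diffusion

# Install dependencies and launch; Gradio Spaces conventionally use app.py
# and serve on http://localhost:7860 by default.
pip install -r requirements.txt
python app.py
```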
from llm-groundeddiffusion.
Hi,
To clarify, the results mentioned above were obtained from running locally.
I also tried your new demo (https://huggingface.co/spaces/longlian/llm-grounded-diffusion), and the results depend on the seed. When I use the same seed as you, I get correct and reasonable results, but when I use a different seed, failure cases occur. Please see the following results.
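The seed dependence makes sense: the seed fixes the initial Gaussian noise that the diffusion process denoises, so different seeds start from different latents and can land on different attribute bindings. A minimal NumPy sketch of this idea (not the demo's actual code, which uses PyTorch latents):

```python
import numpy as np

def initial_latents(seed: int, shape=(4, 64, 64)) -> np.ndarray:
    """Draw the starting Gaussian noise for one diffusion run from a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

# Same seed -> identical starting noise -> reproducible generations.
a = initial_latents(4353)
b = initial_latents(4353)
assert np.array_equal(a, b)

# A different seed (e.g., 4354) -> different starting noise, which can lead
# to a different (possibly wrong) attribute binding, as observed here.
c = initial_latents(4354)
assert not np.array_equal(a, c)
```

This is why sweeping a few seeds is a cheap way to probe whether a failure is systematic or noise-dependent.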
Of course, I agree with:
> However, the attribute binding should be right most of the time, with Standard guidance (Faster modes give lower guidance).
Thanks.
It seems that seed 4354 indeed gives a wrong association, while seed 4353 gives something right (still not beautiful, at a similar level to the SD baseline).
Seed 4354 gets the association wrong even without the man, so I would attribute this to the insufficient text-to-image object association learned during training.
Seed 4353 (man and the woman):
Longer explanation:
I think your observation can be interpreted this way: SD only sees images and their paired captions during training, not instance annotations, so the associations from text to objects/attributes in the image are learned implicitly (without direct supervision).
Since SD is mostly trained on photos, LMD already works in your case with the photo style.
I believe cartoons are only a small fraction of the training data in the original SD (and in GLIGEN, if you use LMD+), and the association is probably learned only weakly in the original SD, so it still misses the association with some seeds.
LMD (stage 2), as a training-free method, can guide the generation to a specific layout. However, if SD doesn't recognize the text-to-object association (between "a woman in blue" and the cartoon woman in blue), then it is hard to convey that information to the model.
A solution is to train a LoRA or fine-tune the model to give it more control. This is how people adapt and strengthen SD for a specific style/domain. It is also an opportunity for future research (making the association stronger and correct even more often).
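For readers unfamiliar with LoRA: it freezes the pretrained weight W and learns only a low-rank update, so the effective weight becomes W + (alpha/r) * B @ A with far fewer trainable parameters. A minimal NumPy sketch of the idea (not the actual SD fine-tuning code; class and parameter names are illustrative):

```python
import numpy as np

class LoRALinear:
    """A frozen linear map W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W: np.ndarray, r: int = 4, alpha: float = 4.0):
        d_out, d_in = W.shape
        self.W = W                                 # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01   # trainable, small random init
        self.B = np.zeros((d_out, r))              # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Low-rank path adds style-/domain-specific behavior on top of W.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.randn(8, 16)
layer = LoRALinear(W)
x = np.random.randn(16)
# With B initialized to zero, the layer initially reproduces the frozen model,
# so fine-tuning starts from the pretrained behavior and only learns the delta.
assert np.allclose(layer(x), W @ x)
```

The zero-initialized B is the standard trick that makes LoRA a no-op at the start of training, which is why it can strengthen a specific style (e.g., cartoons) without destroying what SD already knows.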
Thanks for your helpful explanation; I have no more questions now.
I will close this issue.