Comments (16)
Yes, there is a big difference.
Each prompt (line of text) is first converted to tokens (an array of integers), and those tokens are converted to embeddings (an array where each element is itself a vector of floats).
This step is straightforward, and this is where Embedding Merge operates: it adds or multiplies those embedding vectors, not words or their tokens.
But! Those embeddings (representing all words of the prompt) aren't fed to Stable Diffusion directly. Instead, the CLIP or OpenCLIP transformer network is used to recalculate this two-dimensional array.
It is CLIP that "understands" what the text means. The numbers change drastically, no longer representing simple words but their meanings.
The transformed array represents the high-level prompt, ready to be sent to the U-Net of Stable Diffusion.
And this is where the two other controlling methods come in: prompt weighting and prompt merging.
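The pipeline above can be sketched in a few lines of numpy. This is a toy illustration, not the real implementation: the tokenizer, embedding table, and "transformer" here are made-up stand-ins, with only the shapes following the real flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration; real SD-1.x CLIP uses a ~49k-token vocabulary,
# 768-dimensional embeddings, and 77 token positions per chunk.
VOCAB_SIZE, EMB_DIM = 1000, 8

# Stand-in for the learned token-embedding table.
embedding_table = rng.normal(size=(VOCAB_SIZE, EMB_DIM))

def tokenize(prompt):
    # Hypothetical tokenizer: real CLIP uses BPE; here, one fake id per word.
    return [abs(hash(word)) % VOCAB_SIZE for word in prompt.split()]

def clip_transform(embeddings):
    # Stand-in for the CLIP text transformer: same shape in and out, but the
    # numbers change drastically (here: an arbitrary fixed linear mixing).
    mix = np.random.default_rng(42).normal(size=(EMB_DIM, EMB_DIM))
    return embeddings @ mix

token_ids = tokenize("green hair")          # prompt -> token ids (ints)
embs = embedding_table[token_ids]           # ids -> per-token vectors
# Embedding Merge operates HERE, on `embs`, before CLIP ever runs.
cond = clip_transform(embs)                 # "meaning" array, fed to the U-Net
print(embs.shape, cond.shape)               # (2, 8) (2, 8)
```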
When you write (green) hair
– you are not just increasing the "word's" weight; you are changing the weight of the vectors output by CLIP: they also contain positional information and semantic relations, and could have been influenced by ClipSkip.
When your prompt is longer than 75 tokens, or if you put BREAK
explicitly – your prompt is split, and its parts are transformed by CLIP independently of each other.
Then you have several valid "prompts" (and each of them can be partially weighted independently).
Before they are sent to Stable Diffusion, those parts are summed element-wise, so each vector becomes the sum of the corresponding vectors, each of which was already transformed by CLIP.
So here you are not merging words, but their meanings. green hair BREAK blue eyes
becomes something that simultaneously means both "green hair" and "blue eyes".
(Which doesn't prevent SD from generating blue hair with green eyes, because wrong property binding is an inherent problem, both in the U-Net and in CLIP itself!)
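A minimal numpy sketch of the element-wise summation described above. The shapes follow SD-1.x conventions (77 positions × 768 dims); the arrays themselves are random stand-ins for real CLIP outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two chunks ("green hair" BREAK "blue eyes"), each already transformed by
# CLIP into a (77 positions x 768 dims) conditioning array.
cond_a = rng.normal(size=(77, 768))
cond_b = rng.normal(size=(77, 768))

# The parts are summed element-wise before going to the U-Net, so each
# output vector is the sum of the corresponding CLIP-transformed vectors.
combined = cond_a + cond_b
print(combined.shape)  # (77, 768)
```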
EmbeddingMerge works at a much lower level, merging things at the "word" stage, before CLIP.
This means that merged parts change their properties, no longer representing what they originally were.
<'green hair'+'blue eyes'>
is the same as <'blue hair'+'green eyes'>
or <'green'+'blue'><'eyes'+'hair'>
, and at the end we will see what CLIP thinks it is.
So probably the first word is a color, and the second word is a part of the face.
On the other hand, <'green'+'hair'>
is something different, meaning both a color and an object. Unfortunately, this doesn't in any way help CLIP or SD to separate or localize objects and their properties.
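To see why those expressions collapse to the same thing, here is a toy sketch (the vectors are made up; real ones are 768-dim CLIP embedding rows): the addition happens per token position before CLIP runs, so it no longer matters which word contributed which addend.

```python
import numpy as np

# Hypothetical 3-dim token embeddings, purely for illustration.
green = np.array([1.0, 0.0, 0.0])
blue  = np.array([0.0, 1.0, 0.0])
hair  = np.array([0.0, 0.0, 1.0])
eyes  = np.array([1.0, 1.0, 0.0])

# <'green hair'+'blue eyes'>: position-wise sums taken BEFORE CLIP runs.
merge_1 = np.stack([green + blue, hair + eyes])
# <'blue hair'+'green eyes'> yields exactly the same position-wise sums:
merge_2 = np.stack([blue + green, hair + eyes])

print(np.array_equal(merge_1, merge_2))  # True
```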
The importance of CLIP is huge: it transforms groups of words together, and their meaning may change. In your example, low quality
is a concept of bad generation, while low
and quality
mean different things.
By putting low
in the negative prompt, what would it actually negate? Will it make buildings taller?
It's even worse with quality
: don't you want the concept of "quality" to be positive, not negative?
So what I see is <'low'+'resolution'>
being something that means both "low" and "resolution" simultaneously, but not "low resolution". On the other hand, <'bad'+'low'><'quality'+'resolution'>
might work more or less as expected (just be sure to check the token lengths of your vectors to account for alignment).
Still, CLIP tends to understand even messed-up concepts, so <'eyes'+'blue'>
might work too; my extension serves more of a research purpose than a practical one.
from stable-diffusion-webui-embedding-merge.
By default, the shorter string is padded with zero vectors.
The side effect is that the "amount" of information there is low.
In your example, adding a 4-token text to a 5-token text will give you 5 tokens where the first 4 are merged (and thus have double length, unless you put =/2
at the end), while the last token is unmodified from the second text (it would be halved in length if you went for =/2
at the end).
The good news is that, firstly, absolute vector length (in the Cartesian sense) is not too important; SD tolerates 0.5 ≤ X ≤ 3 just fine: half a dog or thrice a dog is still a dog. Secondly, zero vectors don't mess up general concept understanding, and adding them produces even fewer artifacts than putting extra commas here and there.
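The 4-token plus 5-token case can be sketched directly in numpy (random stand-in arrays, SD-1.x embedding width assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
EMB_DIM = 768  # SD-1.x CLIP embedding width

text_a = rng.normal(size=(4, EMB_DIM))  # a 4-token text
text_b = rng.normal(size=(5, EMB_DIM))  # a 5-token text

# The shorter text is padded with zero vectors before the addition.
padding = np.zeros((5 - 4, EMB_DIM))
merged = np.vstack([text_a, padding]) + text_b

# First 4 rows are sums (roughly double length); row 5 is text_b untouched.
print(np.array_equal(merged[4], text_b[4]))  # True

# Appending =/2 to the expression halves everything, row 5 included.
halved = merged / 2
```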
I heard BREAK
gives very good person-identity merging, like your main prompt BREAK person1 BREAK person2 BREAK masterpiece etc.
Sometimes you also need to account for alignment, if you repeat the same prompt in those parts but with a changed subject.
Try it!
I must say that you've shown me a way to use it, and I'm going to use it more often. Thanks again! ;-)
I'm going to use it more often.
Those who can use BREAK often wonder why nobody else is using such power!
Perfect! I couldn't imagine that I would get such a thorough answer!
Token lengths are my second question, related to my favorite trick of mixing faces. It works like a charm for the usual [name1 | name2], but mixing tokens is unclear to me.
For example:
Laura Vandervoort, Katheryn Winnick
tokenize into 4 vs 5 tokens
Should I do something more than just <'Laura Vandervoort' + 'Katheryn Winnick'> to make the mix work correctly?
So much information, it's hard to take it all in at once.
You have an example:
'kuvshinov' + 'kuvshinov':-1 + 'kuvshinov':-2 + 'kuvshinov':-3 =: 1
As I understand it, in this example you make a single vector from a complex last name.
If such an example is given, there must be some sense to it. What is the sense?
Should I convert all my complex names into single-vector ones like this?
'Vandervoort' + 'Vandervoort':-1 + 'Vandervoort':-2 =: 1
As I understand it, in this example you make a single vector from a complex last name.
And it gave nothing!
Concepts are destroyed by taking their intermediate tokens.
(The example just showed how to do it, not that it would be useful.)
Should I convert all my complex names into single-vector ones like this?
You might have more luck with <'first name'+'last name'>
, but probably not either.
What do you want to achieve? BREAK
is better both at merging and at shortening prompts, so you can, for example, describe a character and the scene separately.
One practical application of EM is simply making chimeras out of simple objects (as I showed in the linked Discussion); it can be fun.
But even then, my preliminary tests with SDXL show that, for example, <'cat'*X+'girl'*Y>
generates either a cat (X≈1, Y≈0.5), or a girl (X<0.5), or a girl with a cat (X==Y), but not a catgirl!
I got kids with feline ears only in very specific ranges like X=0.87, really unstable and seed-dependent.
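For clarity, what <'cat'*X+'girl'*Y> computes on the embeddings is just a weighted position-wise sum; a toy sketch (random stand-in vectors, both words assumed to be single tokens):

```python
import numpy as np

rng = np.random.default_rng(3)
EMB_DIM = 768

cat  = rng.normal(size=(1, EMB_DIM))  # stand-in embedding for "cat"
girl = rng.normal(size=(1, EMB_DIM))  # stand-in embedding for "girl"

def em_weighted_merge(x, y):
    # <'cat'*X+'girl'*Y>: scale each text's embeddings, then add them
    # position-wise (single tokens here, so no zero-padding is needed).
    return x * cat + y * girl

blend = em_weighted_merge(0.87, 1.0)  # one of the "feline ears" settings above
print(blend.shape)  # (1, 768)
```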
I can't use typical mixing with [ | | ]
because the civitai auto-moderator reads prompts and sends such images to a long queue.
Using EM allows me to avoid that check, and I'm looking for information about EM to reach the same visual effect as [ | | ]
has.
Can't you just [ <'one'> | <'two'> | <'three'> ]
?
Sure, I can. As I've already checked, it works the same.
I just like to learn something new, something useful ;-)
Actually, joining vectors and word switching work differently.
A disadvantage of [|]
shows up with persons: at each step, the current person tries to change not only the face but the whole image.
<''> + <''>
has "healthier" behavior, and as a result the final image can be more consistent.
One more thing to keep in mind.
All parts inside [|]
have different weights by their nature, but inside <''+''>
their weights are the same.
So if I have a good face made with [|]
I can't get the same result just by replacing the constructions; I have to play with the weights inside <''+''>.
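Ignoring where in the pipeline each construct operates, the behavioral difference can be sketched like this (the arrays are random stand-ins for per-person conditioning):

```python
import numpy as np

rng = np.random.default_rng(4)
person1 = rng.normal(size=(77, 768))  # stand-in conditioning for person 1
person2 = rng.normal(size=(77, 768))  # stand-in conditioning for person 2

def switching_cond(step):
    # [person1|person2]: the conditioning alternates on every sampling step,
    # so each step pulls the whole image toward a different person.
    return person1 if step % 2 == 0 else person2

def merged_cond():
    # <'person1'+'person2'>: one fixed, equally-weighted sum used at every
    # step; to imitate a tuned [|] result, you would scale the addends.
    return person1 + person2

print(switching_cond(0) is person1, switching_cond(1) is person2)  # True True
```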
Have you tried BREAK?
a female model Cameron Diaz BREAK a female model Lucy Liu
or
a female model BREAK Cameron Diaz BREAK Lucy Liu
(You can still hide words with EM syntax if needed)
BREAK? Why would I need to use it?
I want to mix faces, not separate them.
Try it!
Try it!
Actually, I had used it before, though rarely.
This is my fresh work with it: https://civitai.com/images/5744964