stable-diffusion-webui-embedding-merge's Introduction

stable-diffusion-webui-embedding-merge's People

Contributors

klimaleksus

stable-diffusion-webui-embedding-merge's Issues

Making an embedding from a prompt gives different results

First of all thanks for making such a great extension!

Second: when I try simply to make an embedding from a prompt and negative prompt, the results look different. It's a fairly long, ordinary prompt separated by commas, as is the usual way.

Inline EM fails in XYZ plot

I get different pictures for the same settings using inline EM and XYZ plot.
How to reproduce:
Write a prompt with inline EM. My example is:
<'Tzuyu' + 'Son Ye Jin' + 'Bae Suzy'>
(attached image: 984747234-36-DPM++ 3M SDE Karras-103154_473983)

Run XYZ plot with any useless parameter. I used "hires upscaler" with hires.fix disabled.
What's expected: all the pictures are the same
What I get now: only the first image looks like the image without XYZ. All the other images look the same as each other, but different from the first one.
(attached image: 984747234-36-DPM++ 3M SDE Karras-103310_168499)

Safetensor vs. unsafe pickle support

First up: amazing work. I use this extension every time I'm prompting and it's a thing of beauty.

The issue: Automatic1111 doesn't enable unsafe unpickle by default, so unless the below-mentioned flag is passed to webui at startup (not a great idea, security-wise), creating embeddings via the Embedding Merge extension fails with the following error in the console:

*** Error verifying pickled file from F:\Utility\Automatic1111\stable-diffusion-webui\embeddings\_EmbeddingMerge_temp.pt
*** The file may be malicious, so the program is not going to read it.
*** You can skip this check with --disable-safe-unpickle commandline argument.
***
    Traceback (most recent call last):
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 137, in load_with_extra
        check_pt(filename, extra_handler)
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 84, in check_pt
        check_zip_filenames(filename, z.namelist())
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 76, in check_zip_filenames
        raise Exception(f"bad file inside {filename}: {name}")
    Exception: bad file inside F:\Utility\Automatic1111\stable-diffusion-webui\embeddings\_EmbeddingMerge_temp.pt: _EmbeddingMerge_temp/byteorder

---
Traceback (most recent call last):
  File "F:\Utility\Automatic1111\stable-diffusion-webui\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1151, in need_save_embed
    token = list(pt['string_to_param'].keys())[0]
TypeError: 'NoneType' object is not subscriptable

I started playing around with the EM scripts, but it looks like there's a call to Automatic1111's textual_inversion.py (specifically, create_embedding). I tried pulling just that bit out separately, but kept getting errors; I have no idea what I'm missing (I haven't coded in Python before).

Would it be possible to support saving (both the interim _EmbeddingMerge_temp file and the final output) in safetensors format instead of .pt?
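
In the meantime, a rough workaround sketch for converting a finished .pt embedding to safetensors outside of WebUI (this assumes the usual A1111 embedding layout with a 'string_to_param' dict; the output key name "emb_params" is only a guess at a convention, and this doesn't address the temporary _EmbeddingMerge_temp.pt file itself):

    # Hypothetical conversion sketch, run outside of WebUI.
    import torch
    from safetensors.torch import save_file

    data = torch.load("my_embedding.pt", map_location="cpu")    # unpickles once, outside the safety check
    tensor = list(data["string_to_param"].values())[0]          # the embedding vectors themselves
    save_file({"emb_params": tensor.detach().contiguous()}, "my_embedding.safetensors")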

Please let me know if I can provide additional information.

Extend the token length if at all possible!

I don't know if it IS possible, and I'm aware this is sort of updated at will, but you saved me a TON of time and I made like 10+ embeds last night with this.

Some prompts have 100-200 tokens, so if possible it'd be interesting to see whether you could, in theory, extend the token length with this plugin.

<3
Much adoration.
Thank you!!

Strange behaviour with dynamic prompts

A prompt:
A girl next door <'__celeb-female__' + '__celeb-female__' + '__celeb-female__'> wearing casual clothes, outdoors, [by Luis Royo : RAW photo, film grain : 0.25]
Batch count: 2
Batch size: 2
The text under the image (the seed is OK, it changes):
A girl next door <'EM_1'> wearing casual clothes, outdoors, [by Luis Royo : RAW photo, film grain : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers, text Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3815115804, Size: 512x768, Model hash: 670934b0bd, Model: YACAM-SR-16286.fp16, RNG: CPU, EmbeddingMerge: "<'EM_1'>=<'Kathleen Robertson' + 'Zooey Deschanel' + 'Carla Gugino'>, <'EM_2'>=<'Kirsten Prout' + 'Gemma Ward' + 'Abigail Ratchford'>, <'EM_3'>=<'Miranda Kerr' + 'Anna Faris' + 'Paget Brewster'>, <'EM_4'>=<'Scout Taylor-Compton' + 'Briana Evigan' + 'Gwyneth Paltrow'>, <'EM_5'>=<'__celeb-female__' + '__celeb-female__' + '__celeb-female__'>", Eta: 0.5, Version: v1.7.0

The key point is "door <'EM_1'> wearing": the number increases from image to image, and the extension puts all of the combinations into every image's parameters.

Embedding weights work differently?

I tried creating a new embedding from one I always use with a 0.6 weight.
The results from the newly generated embedding (using the syntax shared here, and trying many different numbers from 0.9 down to 0.2 or so) differ a lot from using the original embedding with a 0.6 weight, as is clearly seen by comparing generations with the same seed.

Is there a structural difference between the operation done by the normal prompt weights and the one applied by EM? Is there a way to save the original 0.6 weight just as it is?

An example is below. While the style is still there, the resulting image is completely different, and depending on the weight used, it behaves nearly the same as using the original with weight 1.

(attached images: 00102-4041420028, 00101-4041420028)
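
For what it's worth, the two operations are applied at different points in the pipeline, which would explain the divergence: as far as I understand, prompt attention weights scale the text-encoder output (with a mean rescale in recent A1111 versions), while EM's multiplication scales the raw token vectors before encoding. A conceptual sketch with dummy tensors (not the actual WebUI or EM code):

    # Conceptual sketch only (dummy tensors; not the actual WebUI/EM code).
    import torch

    torch.manual_seed(0)
    tokens = torch.randn(3, 768)              # stand-in token embedding vectors for "X"
    encoder = torch.nn.Linear(768, 768)       # stand-in for the CLIP text encoder

    # Prompt attention "(X:0.6)": encode first, then scale the output
    # (recent A1111 versions also rescale so the overall mean is preserved).
    z = encoder(tokens)
    weighted = z * 0.6
    weighted = weighted * (z.mean() / weighted.mean())

    # EM-style <'X'*0.6>: scale the raw token vectors, then encode.
    z2 = encoder(tokens * 0.6)

    print(torch.allclose(weighted, z2))       # generally False: different operations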

Unexpected behaviour in complex prompt & hires

My example prompt is:

[A female model : [<'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>] : 0.25]

(attached image: 3564665820-30-DPM++ 3M SDE Karras-010130_661683)

After rendering I see a face that looks like the expected mix, so the extension works as expected, but there is no "EmbeddingMerge" entry in the image info:

[A female model : [ <'EM_1'> <'EM_2'> | <'EM_3'> <'EM_4'> | <'EM_5'> <'EM_6'> ] : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3564665820, Size: 512x768, Model hash: ce49fd5253, Model: YACAM-SR-16111.fp16, RNG: CPU, CDT: "1,1,0,0,0,0,0,0,0,0,1,-1,1,0", Eta: 0.5, Version: v1.7.0

Hires.fix changes the face so much; see below. The face looks as if A1111 painted the checkpoint's default face over it, so I'm not sure the extension works correctly here. After hires.fix I see a separate hires.fix string in the image info:

[A female model : [ <'EM_1'> <'EM_2'> | <'EM_3'> <'EM_4'> | <'EM_5'> <'EM_6'> ] : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3564665820, Size: 512x768, Model hash: ce49fd5253, Model: YACAM-SR-16111.fp16, Denoising strength: 0.4, RNG: CPU, Hires prompt: "[A female model : [<'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>] : 0.25]", Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x_NMKD-Siax_200k, CDT: "1,1,0,0,0,0,0,0,0,0,1,-1,1,0", Eta: 0.5, Version: v1.7.0

(attached image: 3564665820-30-DPM++ 3M SDE Karras-010231_613205)

Converting my prompt to a simpler one changes the image (obviously) but doesn't have any influence on the extension's behaviour:
[A female model | <'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>]

Error when enlarging with SDXL and Forge

When trying the <'artstation' + 'artstation' :4 :+2> example with SDXL and Forge, it gives:

Traceback (most recent call last):
  File "E:\stable-diffusion-webui-forge\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1339, in merge_one_prompt
    (res,err) = merge_parser(part,only_count)
  File "E:\stable-diffusion-webui-forge\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 808, in merge_parser
    target[1][0:vectors] = right[1]
RuntimeError: The expanded size of the tensor (768) must match the existing size (1280) at non-singleton dimension 1.  Target sizes: [2, 768].  Tensor sizes: [2, 1280]

Forge Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Merge hash: 72181d7

Prompt length counter is buggy when attention parentheses are used around merge expressions.

Steps to reproduce:

  1. Write a prompt X; it will have length = 1.
  2. Change it to (X) – it is still 1.
  3. Retry with a merge expression: <'X'> – it is correctly 1.
  4. Change to (<'X'>) – it shows 2, while it should still be 1.

I'll try to fix this by creating another empty type of ephemeral embedding on the fly and caching it, then calling WebUI to count everything as it should have been done in the first place.

Updating for WebUI version 1.4.0

Since my PR to upstream was merged, I need to update Embedding Merge to use that new internal function:
AUTOMATIC1111/stable-diffusion-webui#10803

It should not change anything for users. It is also backwards-compatible with previous versions of WebUI.
There will also be some additional fixes to the table on the EM tab:

  • Restore cell padding after the Gradio update
  • Make the "Index" column show all rows separately when viewing "By vectors"

Better SDXL support? Individual control over two CLIPs

How could the merge expression syntax be enhanced to allow independent manipulation of the L (CLIP, as in SD1) and G (OpenCLIP) encoders of SDXL?

Currently, <'cat'*2+'1girl'> will (see the sketch after this list):

  1. Multiply both the L and G of "cat" by 2.0, independently.
  2. Pad the shorter embedding ("cat") with zero vectors to the max length (that of "1girl", which is 2).
  3. Sum the L of the padded "cat" with the L of "1girl"; do the same with G.
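
A minimal sketch of steps 2 and 3 (illustrative code, not the extension's own; 768 here stands for the L depth, and the same happens for G at its own depth of 1280):

    import torch

    cat  = torch.randn(1, 768) * 2.0    # step 1: "cat" (1 vector), multiplied by 2.0
    girl = torch.randn(2, 768)          # "1girl" (2 vectors)

    padded = torch.zeros(2, 768)        # step 2: zero-pad the shorter one to max length
    padded[:cat.shape[0]] = cat
    merged = padded + girl              # step 3: element-wise sum
    print(merged.shape)                 # torch.Size([2, 768])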

What do we want:

  • Multiply L and G separately from each other (e.g. L*2 but G*1; or L*0.3 and G*0.7)
  • Combine L from one string with G of another string

What we cannot have:

  • Different lengths of L and G in one and the same embedding
  • Swapping the places of L and G vectors (they have different depth dimensions)
  • Loading SD1 or SD2 embeddings to use their L or G, because WebUI does not list them in SDXL mode at all.
  • Parentheses or grouping, since the math parser is rather simple: it can only postpone + and -, or apply *, /, and : right away, operating on just two internal variables (a "left" and a "right" operand: * does right = right * this, while + does left = left + right; right = this). A rough sketch of this two-register scheme follows this list.
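
Sketch of the two-register evaluation described above (plain numbers stand in for embedding tensors; the names and structure are illustrative, not the extension's actual code):

    def evaluate(tokens):
        # tokens: list of (operator, value); operator None marks the first operand
        left, right, sign = 0.0, None, 1.0
        for op, value in tokens:
            if op == '*':                     # applied to the right operand right away
                right = right * value
            elif op in ('+', '-'):            # postponed: fold the right operand into left
                if right is not None:
                    left = left + sign * right
                sign = 1.0 if op == '+' else -1.0
                right = value
            else:                             # first operand
                right = value
        if right is not None:
            left = left + sign * right
        return left

    # <'a'*2 + 'b'*3> with a = b = 1.0  ->  1*2 + 1*3 = 5.0
    print(evaluate([(None, 1.0), ('*', 2.0), ('+', 1.0), ('*', 3.0)]))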

A few ideas:

  1. Two different merge expressions, controlling L (first part) and G (second part) separately:

<'use clip'*1.4 | 'this is OpenCLIP'*0.5>

  • What if the lengths are different? Throw an error or pad silently?

  2. A zero-fill L/G operator:

<'this is OpenCLIP'*0.5:G + 'use clip'*1.4:L>
('X':L will zero-fill the G part of 'X'; read it as "use L")

Also see a89dde6#commitcomment-140709559

[Question] Difference words vs embeddings

I have these phrases in my negative prompt field:
low quality, low resolution
so the word "low" appears twice there.

Is there a difference if I convert them to inline embeddings like these:
<'low' + 'quality'>, <'low' + 'resolution'>

Any plans to work on SDXL?

I realize that SDXL is extremely new, and I'm aware you probably don't have a ton of time, but this is my go-to plugin. Making TIs out of prompts doesn't replace trained ones, but it enhances things in ways that trained TIs sometimes can't; to me, trained TIs feel more static, if you get what I mean.

I'm happy to wait until the post-apocalypse for this to update :P

expected Tensor, but got tuple

After updating Auto1111 a couple of weeks ago, EM stopped working. It can analyze the tokens, but when it tries to save an embedding I get the following stack trace:

Traceback (most recent call last):
  File [file path]"\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1231, in need_save_embed
    vectors = [torch.cat([r[0] for r in pair[0]])]
TypeError: expected Tensor as element 0 in argument 0, but got tuple

[Question] Break and combine words

Let's take the word "description".
It is token "13951".

Is there a way to break the word into parts "des", "crip", "ti", "on" and then combine them, such as <'des' + 'crip' + 'ti' + 'on'>, and get the same sense/token/vector/etc.?
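
One way to check how the fragments actually tokenize (a sketch using the Hugging Face tokenizer for SD1's text encoder; requires the transformers package):

    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

    print(tok("description", add_special_tokens=False)["input_ids"])
    for piece in ["des", "crip", "ti", "on"]:
        print(piece, tok(piece, add_special_tokens=False)["input_ids"])
    # Standalone fragments usually map to different (end-of-word) token ids than the
    # sub-word pieces inside the full word, so summing their vectors with
    # <'des' + 'crip' + 'ti' + 'on'> will generally not reproduce the single
    # "description" vector.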
