stable-diffusion-webui-embedding-merge's Introduction

stable-diffusion-webui-embedding-merge's People

Contributors

klimaleksus

stable-diffusion-webui-embedding-merge's Issues

Making an embedding from a prompt gives different results

First of all thanks for making such a great extension!

Second: when I try simply to make an embedding from a prompt and negative prompt, the results look different. It's a fairly long, ordinary prompt separated by commas, as is the usual way.

Inline EM fails in XYZ plot

I get different pictures for the same settings using inline EM and XYZ plot.
How to reproduce:
Write a prompt with inline EM. My example is:
<'Tzuyu' + 'Son Ye Jin' + 'Bae Suzy'>
(attached image: 984747234-36-DPM++ 3M SDE Karras-103154_473983)

Run XYZ plot with any useless parameter. I used "hires upscaler" with hires.fix disabled.
What's expected: all the pictures are the same
What I get now: only the first image looks like the image without XYZ. All the other images look the same as each other, but different from the first one.
(attached image: 984747234-36-DPM++ 3M SDE Karras-103310_168499)

Safetensor vs. unsafe pickle support

First up: amazing work. I use this extension every time I'm prompting and it's a thing of beauty.

The issue: Automatic1111 doesn't enable unsafe unpickle by default, so unless the below-mentioned flag is passed to webui at startup (not a great idea, security-wise), creating embeddings via the Embedding Merge extension fails with the following error in the console:

*** Error verifying pickled file from F:\Utility\Automatic1111\stable-diffusion-webui\embeddings\_EmbeddingMerge_temp.pt
*** The file may be malicious, so the program is not going to read it.
*** You can skip this check with --disable-safe-unpickle commandline argument.
***
    Traceback (most recent call last):
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 137, in load_with_extra
        check_pt(filename, extra_handler)
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 84, in check_pt
        check_zip_filenames(filename, z.namelist())
      File "F:\Utility\Automatic1111\stable-diffusion-webui\modules\safe.py", line 76, in check_zip_filenames
        raise Exception(f"bad file inside {filename}: {name}")
    Exception: bad file inside F:\Utility\Automatic1111\stable-diffusion-webui\embeddings\_EmbeddingMerge_temp.pt: _EmbeddingMerge_temp/byteorder

---
Traceback (most recent call last):
  File "F:\Utility\Automatic1111\stable-diffusion-webui\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1151, in need_save_embed
    token = list(pt['string_to_param'].keys())[0]
TypeError: 'NoneType' object is not subscriptable

I started playing around with the EM scripts, but it looks like there's a call to Automatic1111's textual_inversion.py (specifically, create_embedding). I tried pulling just that bit out separately, but kept getting errors; I have no idea what I'm missing (I haven't coded in Python before).

Would it be possible to support saving (both the interim _EmbeddingMerge_temp file and the final output) in safetensors format instead of .pt?
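
In the meantime, a rough workaround sketch for converting a finished .pt embedding to safetensors outside of WebUI (this assumes the usual A1111 embedding layout with a 'string_to_param' dict; the output key name "emb_params" is only a guess at a convention, and this doesn't address the temporary _EmbeddingMerge_temp.pt file itself):

    # Hypothetical conversion sketch, run outside of WebUI.
    import torch
    from safetensors.torch import save_file

    data = torch.load("my_embedding.pt", map_location="cpu")    # unpickles once, outside the safety check
    tensor = list(data["string_to_param"].values())[0]          # the embedding vectors themselves
    save_file({"emb_params": tensor.detach().contiguous()}, "my_embedding.safetensors")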

Please let me know if I can provide additional information.

Extend the token length if at all possible!

I don't know if it IS possible, and I'm aware this is sort of updated at will, but you saved me a TON of time and I made like 10+ embeds last night with this.

Some prompts have 100-200 tokens, so if possible it'd be interesting to see whether you could, in theory, extend the token length with this plugin.

<3
Much adoration.
Thank you!!

Strange behaviour with dynamic prompts

A prompt:
A girl next door <'__celeb-female__' + '__celeb-female__' + '__celeb-female__'> wearing casual clothes, outdoors, [by Luis Royo : RAW photo, film grain : 0.25]
Batch count: 2
Batch size: 2
The text under the image (the seed is OK, it changes):
A girl next door <'EM_1'> wearing casual clothes, outdoors, [by Luis Royo : RAW photo, film grain : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers, text Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3815115804, Size: 512x768, Model hash: 670934b0bd, Model: YACAM-SR-16286.fp16, RNG: CPU, EmbeddingMerge: "<'EM_1'>=<'Kathleen Robertson' + 'Zooey Deschanel' + 'Carla Gugino'>, <'EM_2'>=<'Kirsten Prout' + 'Gemma Ward' + 'Abigail Ratchford'>, <'EM_3'>=<'Miranda Kerr' + 'Anna Faris' + 'Paget Brewster'>, <'EM_4'>=<'Scout Taylor-Compton' + 'Briana Evigan' + 'Gwyneth Paltrow'>, <'EM_5'>=<'__celeb-female__' + '__celeb-female__' + '__celeb-female__'>", Eta: 0.5, Version: v1.7.0

The key point is "door <'EM_1'> wearing": the number increases from image to image, and the extension puts all of the combinations into every image's parameters.

Embedding weights work differently?

I tried creating a new embedding from one I always use with a 0.6 weight.
The results from the newly generated embedding (using the syntax shared here, and trying many different numbers from 0.9 down to 0.2 or so) differ a lot from using the original embedding with a 0.6 weight, as is clearly seen by comparing generations with the same seed.

Is there a structural difference between the operation done by the normal prompt weights and the one applied by EM? Is there a way to save the original 0.6 weight just as it is?

An example is below. While the style is still there, the resulting image is completely different, and depending on the weight used, it behaves nearly the same as using the original with weight 1.

(attached images: 00102-4041420028, 00101-4041420028)
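
For what it's worth, the two operations are applied at different points in the pipeline, which would explain the divergence: as far as I understand, prompt attention weights scale the text-encoder output (with a mean rescale in recent A1111 versions), while EM's multiplication scales the raw token vectors before encoding. A conceptual sketch with dummy tensors (not the actual WebUI or EM code):

    # Conceptual sketch only (dummy tensors; not the actual WebUI/EM code).
    import torch

    torch.manual_seed(0)
    tokens = torch.randn(3, 768)              # stand-in token embedding vectors for "X"
    encoder = torch.nn.Linear(768, 768)       # stand-in for the CLIP text encoder

    # Prompt attention "(X:0.6)": encode first, then scale the output
    # (recent A1111 versions also rescale so the overall mean is preserved).
    z = encoder(tokens)
    weighted = z * 0.6
    weighted = weighted * (z.mean() / weighted.mean())

    # EM-style <'X'*0.6>: scale the raw token vectors, then encode.
    z2 = encoder(tokens * 0.6)

    print(torch.allclose(weighted, z2))       # generally False: different operations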

Unexpected behaviour in complex prompt & hires

My example prompt is:

[A female model : [<'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>] : 0.25]

(attached image: 3564665820-30-DPM++ 3M SDE Karras-010130_661683)

After rendering I see a face that looks like the expected mix, so the extension works as expected, but there is no "EmbeddingMerge" entry in the image info:

[A female model : [ <'EM_1'> <'EM_2'> | <'EM_3'> <'EM_4'> | <'EM_5'> <'EM_6'> ] : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3564665820, Size: 512x768, Model hash: ce49fd5253, Model: YACAM-SR-16111.fp16, RNG: CPU, CDT: "1,1,0,0,0,0,0,0,0,0,1,-1,1,0", Eta: 0.5, Version: v1.7.0

Hires.fix changes the face so much; see below. The face looks as if A1111 painted the checkpoint's default face over it, so I'm not sure the extension works correctly here. After hires.fix I see a separate hires.fix string in the image info:

[A female model : [ <'EM_1'> <'EM_2'> | <'EM_3'> <'EM_4'> | <'EM_5'> <'EM_6'> ] : 0.25] Negative prompt: sad, monotone, low quality, low resolution, [mutated | extra | missed | broken] fingers Steps: 30, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 3564665820, Size: 512x768, Model hash: ce49fd5253, Model: YACAM-SR-16111.fp16, Denoising strength: 0.4, RNG: CPU, Hires prompt: "[A female model : [<'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>] : 0.25]", Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x_NMKD-Siax_200k, CDT: "1,1,0,0,0,0,0,0,0,0,1,-1,1,0", Eta: 0.5, Version: v1.7.0

(attached image: 3564665820-30-DPM++ 3M SDE Karras-010231_613205)

Converting my prompt to a simpler one changes the image (obviously) but doesn't have any influence on the extension's behaviour:
[A female model | <'Cameron'><' '><'Diaz'>|<'Drew'><' '><'Barrymore'>|<'Lucy'><' '><'Liu'>]

Error when enlarging with SDXL and Forge

When trying the <'artstation' + 'artstation' :4 :+2> example with SDXL and Forge, it gives:

Traceback (most recent call last):
  File "E:\stable-diffusion-webui-forge\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1339, in merge_one_prompt
    (res,err) = merge_parser(part,only_count)
  File "E:\stable-diffusion-webui-forge\extensions\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 808, in merge_parser
    target[1][0:vectors] = right[1]
RuntimeError: The expanded size of the tensor (768) must match the existing size (1280) at non-singleton dimension 1.  Target sizes: [2, 768].  Tensor sizes: [2, 1280]

Forge Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Merge hash: 72181d7

Prompt length counter is buggy when attention parentheses are used around merge expressions.

Steps to reproduce:

  1. Write a prompt X; it will have length = 1.
  2. Change it to (X) – it is still 1.
  3. Retry with a merge expression: <'X'> – it is correctly 1.
  4. Change to (<'X'>) – it shows 2, while it should still be 1.

I'll try to fix this by creating another empty type of ephemeral embedding on the fly and caching it, then calling WebUI to count everything as it should have been done in the first place.

Updating for WebUI version 1.4.0

Since my PR to upstream was merged, I need to update Embedding Merge to use that new internal function:
AUTOMATIC1111/stable-diffusion-webui#10803

It should not change anything for users. It is also backwards-compatible with previous versions of WebUI.
There will also be some additional fixes to the table on the EM tab:

  • Restore cell padding after the Gradio update
  • Make the "Index" column show all rows separately when viewing "By vectors"

Better SDXL support? Individual control over two CLIPs

How could the merge expression syntax be enhanced to allow independent manipulation of the L (CLIP, as in SD1) and G (OpenCLIP) encoders of SDXL?

Currently, <'cat'*2+'1girl'> will (see the sketch after this list):

  1. Multiply both the L and G of "cat" by 2.0, independently.
  2. Pad the shorter embedding ("cat") with zero vectors to the max length (that of "1girl", which is 2).
  3. Sum the L of the padded "cat" with the L of "1girl"; do the same with G.
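
A minimal sketch of steps 2 and 3 (illustrative code, not the extension's own; 768 here stands for the L depth, and the same happens for G at its own depth of 1280):

    import torch

    cat  = torch.randn(1, 768) * 2.0    # step 1: "cat" (1 vector), multiplied by 2.0
    girl = torch.randn(2, 768)          # "1girl" (2 vectors)

    padded = torch.zeros(2, 768)        # step 2: zero-pad the shorter one to max length
    padded[:cat.shape[0]] = cat
    merged = padded + girl              # step 3: element-wise sum
    print(merged.shape)                 # torch.Size([2, 768])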

What do we want:

  • Multiply L and G separately from each other (e.g. L*2 but G*1; or L*0.3 and G*0.7)
  • Combine L from one string with G of another string

What we cannot have:

  • Different lengths of L and G in one and the same embedding
  • Swapping the places of L and G vectors (they have different depth dimensions)
  • Loading SD1 or SD2 embeddings to use their L or G, because WebUI does not list them in SDXL mode at all.
  • Parentheses or grouping, since the math parser is rather simple: it can only postpone + and -, or apply *, /, and : right away, operating on just two internal variables (a "left" and a "right" operand: * does right = right * this, while + does left = left + right; right = this). A rough sketch of this two-register scheme follows this list.
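
Sketch of the two-register evaluation described above (plain numbers stand in for embedding tensors; the names and structure are illustrative, not the extension's actual code):

    def evaluate(tokens):
        # tokens: list of (operator, value); operator None marks the first operand
        left, right, sign = 0.0, None, 1.0
        for op, value in tokens:
            if op == '*':                     # applied to the right operand right away
                right = right * value
            elif op in ('+', '-'):            # postponed: fold the right operand into left
                if right is not None:
                    left = left + sign * right
                sign = 1.0 if op == '+' else -1.0
                right = value
            else:                             # first operand
                right = value
        if right is not None:
            left = left + sign * right
        return left

    # <'a'*2 + 'b'*3> with a = b = 1.0  ->  1*2 + 1*3 = 5.0
    print(evaluate([(None, 1.0), ('*', 2.0), ('+', 1.0), ('*', 3.0)]))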

A few ideas:

  1. Two different merge expressions, controlling L (first part) and G (second part) separately:

<'use clip'*1.4 | 'this is OpenCLIP'*0.5>

  • What if the lengths are different? Throw an error or pad silently?

  2. A zero-fill L/G operator:

<'this is OpenCLIP'*0.5:G + 'use clip'*1.4:L>
('X':L will zero-fill the G part of 'X'; read it as "use L")

Also see a89dde6#commitcomment-140709559

[Question] Difference words vs embeddings

I have these phrases in my negative prompt field:
low quality, low resolution
so the word "low" appears twice there.

Is there a difference if I convert them to inline embeddings like these:
<'low' + 'quality'>, <'low' + 'resolution'>

Any plans to work on SDXL?

I realize that SDXL is extremely new, and I'm aware you probably don't have a ton of time, but this is my go-to plugin. Making TIs out of prompts doesn't replace trained ones, but it enhances things in ways that trained TIs sometimes can't; to me, trained TIs feel more static, if you get what I mean.

I'm happy to wait until the post-apocalypse for this to update :P

expected Tensor, but got tuple

After updating Auto1111 a couple of weeks ago, EM stopped working. It can analyze the tokens, but when it tries to save an embedding I get the following stack trace:

Traceback (most recent call last):
  File [file path]"\stable-diffusion-webui-embedding-merge\scripts\embedding_merge.py", line 1231, in need_save_embed
    vectors = [torch.cat([r[0] for r in pair[0]])]
TypeError: expected Tensor as element 0 in argument 0, but got tuple

[Question] Break and combine words

Let's take the word "description".
It is token "13951".

Is there a way to break the word into parts "des", "crip", "ti", "on" and then combine them, such as <'des' + 'crip' + 'ti' + 'on'>, and get the same sense/token/vector/etc.?
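
One way to check how the fragments actually tokenize (a sketch using the Hugging Face tokenizer for SD1's text encoder; requires the transformers package):

    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

    print(tok("description", add_special_tokens=False)["input_ids"])
    for piece in ["des", "crip", "ti", "on"]:
        print(piece, tok(piece, add_special_tokens=False)["input_ids"])
    # Standalone fragments usually map to different (end-of-word) token ids than the
    # sub-word pieces inside the full word, so summing their vectors with
    # <'des' + 'crip' + 'ti' + 'on'> will generally not reproduce the single
    # "description" vector.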
