
sd-webui-cutoff's Introduction

Cutoff - Cutting Off Prompt Effect

cover

Update Info

Newer entries are listed first.

026ff95a492a533a4a6e5fb2959c2324258c232c
SDXL support.
527ed922b2c4f8d2620376589dfce0f9f4b622ad
Add support for the newer version of WebUI.
20e87ce264338b824296b7559679ed1bb0bdacd7
Skip empty targets.
03bfe60162ba418e18dbaf8f1b9711fd62195ef3
Add Disable for Negative prompt option. Default is True.
f0990088fed0f5013a659cacedb194313a398860
Accept an empty prompt.

What is this?

This is an extension for stable-diffusion-webui which limits the tokens' influence scope.

SDv1, SDv2 and SDXL are supported.

Usage

  1. Check the Enabled checkbox.
  2. Enter the words whose influence you want to limit in Target tokens.
  3. Generate images.

Note

If the generated image is corrupted or otherwise degraded, try changing the Weight value or switching the interpolation method to SLerp. The interpolation method can be found under Details.

Details section

Disable for Negative prompt.
If enabled, Cutoff is not applied to the negative prompt. Default is true.
Cutoff strongly.
See the description below. Default is false.
Interpolation method
How the "padded" and "original" vectors are interpolated. Default is Lerp.
Padding token
Which token is used as padding in place of the Target tokens. Default is _ (underscore).
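The two interpolation modes blend each original embedding row with the corresponding row produced from the padded prompt. A minimal sketch of the two modes (illustrative only, not the extension's code; assume t plays roughly the role of the Weight slider, with 0 keeping the original vector and 1 moving fully to the padded one):

    import torch

    def lerp(v0: torch.Tensor, v1: torch.Tensor, t: float) -> torch.Tensor:
        # Straight-line blend between the original (v0) and padded (v1) vectors.
        return v0 + (v1 - v0) * t

    def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
        # Spherical blend: move from v0 toward v1 along the arc between their directions.
        v0n = v0 / (v0.norm() + eps)
        v1n = v1 / (v1.norm() + eps)
        omega = torch.acos(torch.clamp((v0n * v1n).sum(), -1.0, 1.0))
        if omega.abs() < eps:  # nearly parallel vectors: fall back to lerp
            return lerp(v0, v1, t)
        so = torch.sin(omega)
        return (torch.sin((1.0 - t) * omega) / so) * v0 + (torch.sin(t * omega) / so) * v1

    original = torch.randn(768)  # one row of the (77, 768) prompt embedding
    padded = torch.randn(768)
    blended = slerp(original, padded, 0.5)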

Examples

SDv1

7th_anime_v3_A-fp16 / kl-f8-anime2 / DPM++ 2M Karras / 15 steps / 512x768
Prompt: a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt
Negative Prompt: (low quality, worst quality:1.4), nsfw
Target tokens: white, green, red, blue, yellow, pink

Sample 1.

sample 1

Sample 2. (use SLerp for interpolation)

sample 2

Sample 3.

sample 3

SDXL

Stability AI's SDXL base model seems much improved at token separation, so the effect of Cutoff is limited.

(some models) / sdxl_vae / DPM++ 3M SDE / 50 steps / 768x1344
Prompt: full body shot of a cute girl, wearing white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt
Negative Prompt: (low quality, worst quality:1.4), nsfw, close up
Target tokens: white, green, red, blue, yellow, pink
Cutoff weight: 1.0

Sample 4. (Model = sd_xl_base_1.0)

sample 4

Sample 5. (Model = hassakuXLSfwNsfw_alphaV07)

sample 5

How it works

See the figure below, or issue #5.

idea

The prompt is encoded by CLIP into a (77, 768)-dimensional embedding. Very roughly, the 77 row vectors can be thought of as corresponding to the 75 tokens of the prompt plus the start and end tokens.

Note: the figure above is drawn with rows and columns swapped relative to this description.

Each row vector carries not only the meaning of its own token but also information aggregated from the whole prompt, such as which words modify which.

Now consider the prompt a cute girl, pink hair, red shoes. The intent of such a prompt is usually that:

  1. pink applies only to hair, not to shoes.
  2. Likewise, red does not apply to hair.
  3. a cute girl applies to the whole image; the hair and shoes should suit the girl.

However, inspecting token relationships with tools such as EvViz2 shows that this often does not hold: the vector at the position of shoes can pick up the influence of pink.

At the same time, we do want the influence of a cute girl to remain, so we need some way to remove the influence of specific tokens only.

This extension achieves that by rewriting the specified tokens as the padding token.

For example, for the red shoes part it generates the prompt a cute girl, _ hair, red shoes, and overwrites the vectors at the positions of red and shoes with the ones produced from this padded prompt, thereby removing the influence of pink.

Seen from pink's side, its influence is now confined to pink hair; that is what "limits the tokens' influence scope" in What is this? means.

As for a cute girl, it may or may not be influenced by pink hair and red shoes. For such "either is fine" parts, the extension lets you choose which prompt is applied:

  1. a cute girl, pink hair, red shoes
  2. a cute girl, _ hair, _ shoes

This is the Cutoff strongly option in Details: when it is off, 1 is used; when it is on, 2 is used. The result stays closer to the original image when it is off, which is also the default.
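Putting this together, here is a minimal sketch of the idea in code (illustrative only, not the extension's actual implementation; toy_encode and the one-row-per-word assumption are simplifications, since real CLIP tokenization and the (77, 768) layout differ):

    import torch

    PAD = "_"  # padding token, as in the example above

    def split_words(chunks):
        # Word positions for each comma-separated chunk, in prompt order.
        positions, cursor = [], 0
        for chunk in chunks:
            n = len(chunk.split())
            positions.append(list(range(cursor, cursor + n)))
            cursor += n
        return positions

    def cutoff(encode, chunks, targets, weight=1.0):
        # encode(prompt) -> (num_words, dim) tensor; a stand-in for the CLIP encoder.
        original = encode(", ".join(chunks))
        result = original.clone()
        for i, rows in enumerate(split_words(chunks)):
            # Pad the target words in every *other* chunk, keep this chunk intact,
            # e.g. "a cute girl, _ hair, red shoes" for the "red shoes" chunk.
            padded_chunks = [
                chunk if j == i else
                " ".join(PAD if w in targets else w for w in chunk.split())
                for j, chunk in enumerate(chunks)
            ]
            padded = encode(", ".join(padded_chunks))
            # Blend this chunk's rows toward the "other targets removed" version.
            result[rows] = torch.lerp(original[rows], padded[rows], weight)
        return result

    def toy_encode(prompt):
        # Toy encoder just to make the sketch runnable: one random row per word.
        return torch.randn(len(prompt.split()), 768)

    emb = cutoff(toy_encode, ["a cute girl", "pink hair", "red shoes"],
                 targets={"pink", "red"}, weight=1.0)
    print(emb.shape)  # torch.Size([7, 768])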

sd-webui-cutoff's People

Contributors

drjkl, hnmr293


sd-webui-cutoff's Issues

AssertionError

Wanted to try it today; it returns this error when generating. It worked fine about a week ago, I think.

Traceback (most recent call last):
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/processing.py", line 626, in process_images_inner
    c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/processing.py", line 570, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1215, in _call_impl
    hook_result = hook(self, input, result)
  File "/content/drive/MyDrive/stable-diffusion-webui-colab/stable-diffusion-webui/extensions/sd-webui-cutoff/scripts/cutoff.py", line 142, in hook
    assert tensor.shape == t.shape
AssertionError

More details about the "gather embeddings"?

It would be nice if you could give more details about the "gather embeddings" step. From the image in README.md, it seems to simply concatenate the CLIP embeddings, yet the result still has shape 77x768, the same shape as the output where all tokens are hidden.

I have a hard time understanding what this CUTS OFF

I tried a different scenario, unrelated to colors.

When I apply a MASK TI to the prompt, instead of adding only the MASK, it changes the character.

ORIGINAL
00062-2756686453

<lora:Pencil-lighter:1> a lion sitting on a bench in front of a pond, a bird flying in the blue sky, (beautiful eyes:1.2), cute cartoon character, very detailed pencil drawing, (solo: 1.9), (beautiful face:1), (beautiful eyes:1)

ADDING MASK
image

<lora:Pencil-lighter:1> a lion sitting on a bench in front of a pond, a bird flying in the blue sky, (beautiful eyes:1.2), cute cartoon character, very detailed pencil drawing, (solo: 1.9), (beautiful face:1), (beautiful eyes:1), OVERMASKED

As you can see it completely changed the LION. Which of these words in the prompt should I add to Target Tokens, and what should the other parameters be?

generation fails when there is no negative prompt

repro steps

  1. attempt to generate something with addon enabled and no negative prompt

error

ValueError: max() arg is an empty sequence

Just making this in case someone else runs into the same issue: make sure to include a negative prompt and everything works fine.

fails if negative tokens over 75

When I try using the extension with a negative embedding that puts the token count over 75, it does this:
Capture

It works fine without it, and also works fine regardless of how high the positive token count is.

[BUG?]: import name issue

With A1111 v1.8.0, I get the following error on startup:

*** Error loading script: cutoff.py
Traceback (most recent call last):
File "V:\AI_programms\stable-diffusion-webui-180\modules\scripts.py", line 527, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "V:\AI_programms\stable-diffusion-webui-180\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "V:\AI_programms\stable-diffusion-webui-180\extensions\sd-webui-cutoff\scripts\cutoff.py", line 13, in
from scripts.cutofflib.embedding import CLIP, CLIP_SDXL, generate_prompts, token_to_block
ImportError: cannot import name 'CLIP_SDXL' from 'scripts.cutofflib.embedding' (V:\AI_programms\stable-diffusion-webui-180\scripts\cutofflib\embedding.py)


Any idea what causes this and whether it can/will be fixed? 🤔
Also, what is the status of Cutoff with regard to photorealistic checkpoints?

ComfyUI requires a CutOff node

Hello, Cutoff is a good extension, because the most troublesome aspect of Stable Diffusion is color pollution. Recently, for work reasons, we have to move our workflow from auto1111 to ComfyUI, and Cutoff is essential. If the author or another developer has time, please create a Cutoff node for ComfyUI.

Thank you, and have a nice day!

Parse Prompt into Target tokens automatically

Great extension; it could be even better.

There would be no need to fill Target tokens manually: you could parse the prompt, split it by "," and " " to get the first word of each chunk, then fill those into Target tokens automatically.

You could also provide a color word list as a filter to remove non-color words, as an option the user can enable.

It wouldn't be perfect, but it would still be more convenient.
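For illustration, a rough sketch of what such auto-filling could look like (hypothetical; auto_targets and the colour list are not part of the extension):

    COLORS = {"red", "blue", "green", "yellow", "orange", "aqua",
              "white", "black", "pink", "purple", "brown", "blonde"}

    def auto_targets(prompt: str, color_filter: bool = True) -> list[str]:
        # Take the first word of each comma-separated chunk, optionally keeping
        # only words from the colour list.
        firsts = [chunk.strip().split()[0]
                  for chunk in prompt.split(",") if chunk.strip()]
        return [w for w in firsts if w.lower() in COLORS] if color_filter else firsts

    print(auto_targets("a cute girl, white shirt with green tie, red shoes"))
    # -> ['white', 'red']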

Request to have the plugin compatible with wildcards

Hello,
When using a wildcard with dynamic prompts (https://github.com/adieyal/sd-dynamic-prompts), assuming I have wildcards for colors named colors1 and colors2, the following prompt will generate random colors at generation time:

a cute girl, __colors1__shirt with __colors2__tie, __colors1__shoes, __colors1__hair, __colors2__eyes, __colors2__skirt

For example, it will generate this:
a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt

I thought it would be cool if we could enter __colors1__, __colors2__ in the settings of the Cutoff extension to make sure that any generated word for colors1 and colors2 is selected for the cutoff. What do you think?

Anyway, thank you for your awesome and much needed extension.
Best regards,

(Documentation Request) Descriptive Colours & General Questions

I was testing this out in Automatic1111 SD and it looks to be pretty promising. I have a few questions though, and I hope you could add to the README after clarifying.

How does this handle multi-word colours and also other non-colour adjectives? Say, for example, you have 1girl, white shirt, dark brown jacket, denim jeans, blonde hair. Would my target tokens be: white, dark brown, denim, blonde?

There are two things I am curious about: can it do the dark brown part, or are multi-word colours not supported?

Secondly, say I'm using denim instead of, say, blue, or blonde/brunette instead of gold/brown; will this still work?

I also have general questions regarding some points someone mentioned on the CIVITAI promotion of this extension. Can you elaborate on the 75 token limit they might be referring to? Is that a limit on it functioning, or is it more of a performance limit and it still works despite this?

They also mentioned the targets must have a trailing comma, e.g. "white, blue," instead of "white, blue". Is this true? It seems like a fairly simple bug to fix and it wouldn't surprise me if it has been resolved.

That's all I need to know for now, great work on this 👍 I look forward to how it will improve in the future, maybe some QoL things like auto detection/autofill like another user suggested.

Working with multiple characters.

I'm working with sd-webui's API + controlnet + sd-webui-cutoff to take black and white line art manga/comics, and color them with the correct colors for the characters that were detected (by gpt4o) in a given panel of a comic/manga.

This works (somewhat) well when there is a single character in the image.

But it doesn't work when there are multiple, it doesn't know which character has what color hair, color clothes, etc.

How do I solve this?

I'm looking for any solution, no matter how difficult, to try to implement. If there are multiple possible ways to attempt this, I'd like to learn about and try all of them.

Any help would be very very welcome.

Thank you

Unable to see the difference

Hello, I'm having a hard time figuring out whether this extension is working for me. Turning it off and on or changing the weight slider just doesn't produce differences I'm able to notice. The logs appear to be doing something, so I'm not sure whether this is intended behavior or not.

xyz_grid-0031-3748336610
image

SDXL support

Does this work with SDXL? I tried but it fails to run with this plugin enabled.

Thanks

Webui api

Hi,
How can I use it with the webui API?
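For reference, the webui API accepts extension parameters through the alwayson_scripts field of the txt2img payload. A hedged sketch (the script name "Cutoff" and the order and meaning of the args list below are assumptions based on the UI controls; check the ui() return order in scripts/cutoff.py before relying on this):

    import requests

    payload = {
        "prompt": "a cute girl, white shirt with green tie, red shoes",
        "negative_prompt": "(low quality, worst quality:1.4)",
        "steps": 15,
        "alwayson_scripts": {
            "Cutoff": {
                # Assumed order: enabled, targets, weight, disable for negative,
                # cutoff strongly, padding token, interpolation method.
                "args": [True, "white, green, red", 0.5, True, False, "_", "Lerp"],
            }
        },
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    print(r.status_code)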

webui style pnginfo

Cutoff targets: ["white", "black", "green", "red", "blue", "yellow", "pink", "purple", , "bronze", "blonde", "silver", "magenta"],

=>

Cutoff targets: "white, black, green, red, blue, yellow, pink, purple, , bronze, blonde, silver, magenta",

Running SD 1.5 in a Colab environment produces the following error

Warning: ControlNet failed to load SGM - will use LDM instead.
[Cutoff] failed to load sgm.modules.GeneralConditioner
*** Error loading script: cutoff.py
Traceback (most recent call last):
File "/content/stable-diffusion-webui/modules/scripts.py", line 274, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "/content/stable-diffusion-webui/modules/script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/content/stable-diffusion-webui/extensions/sd-webui-cutoff/scripts/cutoff.py", line 13, in
from scripts.cutofflib.embedding import CLIP, CLIP_SDXL, generate_prompts, token_to_block
File "/content/stable-diffusion-webui/extensions/sd-webui-cutoff/scripts/cutofflib/embedding.py", line 15, in
class ClipWrapper:
File "/content/stable-diffusion-webui/extensions/sd-webui-cutoff/scripts/cutofflib/embedding.py", line 16, in ClipWrapper
def init(self, te: Union[CLIP,CLIP_SDXL]):
NameError: name 'CLIP_SDXL' is not defined

This error occurs, the section itself is not displayed, and it seems the extension cannot be used.

Add cutoff to the diffusers

Thanks for your nice work! Is there similar code in Hugging Face's diffusers, or could you add this work to diffusers?

Doesn't work

I tried disabling other extensions, changing clip skip, and using different weights, but it still doesn't work.

prompt: a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt
target tokens: white, green, red, blue, yellow, pink

image

del

Is it possible to train a LoRA together with an Embedding? Here are some thoughts that led to this, while training a LoRA for an object:

  1. Training the entire CLIP is wrong. It is best left frozen.
  2. Without a learnable CLIP, we cannot change the meaning of words.
  3. With or without a learned CLIP, given a prompt "a photo of sks in the forest", why would the LoRA learn sks but not also learn photo and forest?
  4. Generally, I do not want to learn anything except my token.
  5. You could say "just use TI then!", but Embeddings are weak at learning complex concepts.
  6. You could say "use regularization then!", but in this case there is no "class word" (and I don't want to introduce one); regularizing against "forest" and anything else I might have in the descriptions feels wrong.
  7. If it were possible to use a learnable embedding in place of a chosen token ("sks", possibly initialized with a class word), that would be more correct, because the object would clearly be stored inside this embedding and not in any other word.
  8. General LoRA training should help the embedding reach its target more quickly. It's a compromise between training the entire CLIP and not training it at all.
  9. The learning rate for the embedding should be set separately from the learning rate for the U-Net (or for CLIP if needed), because the best speed is yet to be discovered.

What do you think? Otherwise, I'm not quite sure how to train a LoRA on something that is neither a character nor a style. For example, to train a LoRA for the "scar" concept, what descriptions should we choose?
Should we say "sks over eye, 1boy, …"? If so, isn't it more logical to say "scar over eye, 1boy, …" directly? But then, how can we be sure that only the concept of "scar" would be changed, and not the concept of "1boy"?

Allow padding token to be a zero-vector embedding, proposing "0"

Working on my extension Embedding Merge, I found that if you zero-fill a TI embedding and replace part of the prompt with it, the result is completely free of the affected concept, with almost no influence on the other concepts in the prompt.

Actually, I implemented a pre-multiplier that you can attach to any word to change its weight before the CLIP phase. So by multiplying by zero you can get rid of anything, while still having "padding" for the other words.
In my experiments, such zero-padding works better than padding with commas, exclamation marks, pad_token and so on, especially when merging parts of different lengths (the primary purpose of my extension).

Would it be possible to implement zero-vector padding in your extension too, so we could compare whether it is better or not?

I propose using the number "0" for the padding token: token 0 stands for "!" while token 256 stands for "!</w>", which is what is actually used whenever "!" is parsed, so token #0 is impossible to enter in prompts anyway; using "0" in your extension currently gives exactly the same result as using "256".
So token 0 is effectively unused and could be redefined to produce zero-filled vectors.
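A minimal sketch of what zero-vector padding could look like (hypothetical and not the extension's code; the toy token ids and the nn.Embedding stand-in for CLIP's token embedding are illustrative only):

    import torch

    vocab_size, dim = 49408, 768
    token_embedding = torch.nn.Embedding(vocab_size, dim)  # stand-in for CLIP's token embedding

    # Toy ids: 49406/49407 are CLIP's start/end tokens, the rest are arbitrary.
    token_ids = torch.tensor([[49406, 320, 2242, 1611, 267, 0, 2225, 49407]])
    embeds = token_embedding(token_ids).detach()            # (1, 8, 768)

    pad_positions = [5]              # where the padding token sits in this prompt
    embeds[:, pad_positions] = 0.0   # zero vector instead of a real token's embedding

    # `embeds` would then be fed through the CLIP text transformer in place of the
    # normal token-embedding lookup (before positional embeddings are added).
    print(embeds[0, 5].abs().sum())  # tensor(0.)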

Use this extension with SDXL, reports an error.

Traceback (most recent call last):
File "D:\SD\webui_forge_cu121_torch21\webui\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
TypeError: 'NoneType' object is not iterable

When I use it with an SD 1.5 model, it works.

Thanks.

PLEASE add Cutoff to Vladmandic

This fork of Auto1111 has all the right features save one: this extension. Any possibility of making it work? I was testing it and it does not work there; it does not produce error messages or anything, it just doesn't function.

10k thanks!

Can't make it work.

I'm not sure if I am doing something wrong, but I feel like I've followed the steps correctly, apart from my prompt having 100 tokens.
But no matter how much I cut off, changed settings, or tried batches of images, it seems like there is ALWAYS contamination.

analog style,model shoot style,RAW photo ((close portrait:1.6)) of an edgy and cool profile picture featuring a young man who resembles Sung Jin Woo from the popular webtoon series "Solo Leveling",
(trimmed` ,orange beard,:1.1),(green eyes:1.1),detailed eyes,
(black headphones:1.1),blue lightning around the headphones,solo focus,dressed in cool and fashionable attire,long coat, clean white backdrop background,feeling of edginess,sharpness,and coolness,capturing the essence of the character while presenting a visually striking profile picture,
lora:detailedeyes:0.9,
Negative prompt: Asian-Less-Neg,color contamination,
Steps: 50, Sampler: DPM++ SDE Karras, CFG scale: 10, Seed: 2077349957, Size: 512x728, Model hash: c194532de5, Model: realMoonAnime_v20, Cutoff enabled: True, Cutoff targets: ["orange", "green", "black", "blue"], Cutoff padding: _, Cutoff weight: 2.0, Cutoff disable_for_neg: True, Cutoff strong: False, Cutoff interpolation: lerp, Lora hashes: "detailedeyes: 3013f7bd7b29", Version: v1.3.1

00044-2077349957
The image above is the one with the cutoff, while the one below is without the cutoff.

00045-2077349957

To be honest, it looks like there was more color contamination in the image with Cutoff activated than in the one without it.

What exactly does Cutoff strongly do?

What exactly does Cutoff strongly do? A more intensive color cutoff? Do I need to configure anything for it? Does it have any side effects? Please tell me.

Request to add settings of default target tokens.

Hi. Thank you for developing this nice extension.

I would like a setting for default target tokens.

I use this extension mainly for separating colors.
So I always set the target tokens as below.

red,blue,green,yellow,orange,aqua,white,black,pink,purple,brown,blonde

I would like the above tokens to be the default ❗

Please consider it. Thanks.

Assertion error with SDXL

Traceback (most recent call last):
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
    processed = processing.process_images(p)
  File "C:\stable-diffusion-webui\modules\processing.py", line 734, in process_images
    res = process_images_inner(p)
  File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "C:\stable-diffusion-webui\modules\processing.py", line 858, in process_images_inner
    p.setup_conds()
  File "C:\stable-diffusion-webui\modules\processing.py", line 1314, in setup_conds
    super().setup_conds()
  File "C:\stable-diffusion-webui\modules\processing.py", line 469, in setup_conds
    self.uc = self.get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, total_steps, [self.cached_uc], self.extra_network_data)
  File "C:\stable-diffusion-webui\modules\processing.py", line 455, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
  File "C:\stable-diffusion-webui\modules\prompt_parser.py", line 188, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "C:\stable-diffusion-webui\modules\sd_models_xl.py", line 31, in get_learned_conditioning
    c = self.conditioner(sdxl_conds, force_zero_embeddings=['txt'] if force_zero_negative_prompt else [])
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1547, in _call_impl
    hook_result = hook(self, args, result)
  File "C:\stable-diffusion-webui\extensions\sd-webui-cutoff\scripts\cutoff.py", line 94, in hook
    assert isinstance(mod, CLIP)
AssertionError

Prompt: A man wearing a red shirt next to a blue space monkey
Negatives: Text, 3d

Cutoff tokens: red,blue

A1111 1.6

Happens with any prompt, not just this one.

Set settings from PNG Info automatically

I see that Cutoff saves settings in PNG Info of images:
image

When I click the "Send to ..." button in the PNG Info tab, I expect those fields to be set automatically in the Cutoff section, but they are not, so I need to set them myself every time.

I don't know whether this is a bug in my Stable Diffusion setup or whether there is simply no such feature yet.

Thank you for the extension anyway.

Have you tried enforcing cut-off on each CLIP layer instead of only on the last one?

Let me explain. As I understand, you:

  1. Split the prompt at each comma, if a target term appears in between.
  2. For each group, create two versions of it: the original, and one with the target terms replaced by the padding token.
  3. Transform everything several times with CLIP.
  4. Somehow combine the resulting arrays so that each group mainly keeps its original part while it "sees" the other groups as replaced.

What if we do this not only for the last clip layer (known as ClipSkip 1), but for each layer available? (Yes, it will work only for SD1 in this case, but it's worth trying!)

I propose something like this:

  1. Replace every term, send to CLIP, and extract the values at each layer (10x77x768, or so).
  2. For each group, freeze all CLIP values except those of the current group, at all layers.
  3. When CLIP transforms each part, it should not rewrite any frozen values belonging to other groups (this can be imagined as "inpainting" all the way from the first to the last layer, but only for a small subset of the vector embeddings).

(Personally, I don't know how to technically hook or replace CLIP's behavior, but theoretically it should be possible.)

In this scenario, not a single bit of color information would leave its group! The composition might change severely (closely resembling the one with the terms already replaced), and the colors might not play nicely with each other (or might be washed out), but we would need to see for ourselves.

What do you think?
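As a starting point for experimenting with this, the intermediate activations can be pulled out of the text encoder with transformers (a sketch of only the extraction step, assuming the standard ViT-L/14 CLIP text model; the per-layer freezing/inpainting itself is not shown and would need custom hooks):

    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    tokens = tokenizer("a cute girl, pink hair, red shoes", padding="max_length",
                       max_length=77, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = text_model(**tokens, output_hidden_states=True)

    # For ViT-L/14 this yields 13 tensors (embeddings + 12 layers), each (1, 77, 768);
    # roughly, Clip skip 1 uses the last entry and Clip skip 2 the one before it.
    print(len(out.hidden_states), out.hidden_states[-1].shape)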
