prompt-to-prompt's People

Contributors

amirhertz, dmarx, jikkuatwork, johnnypeck, mbrukman

prompt-to-prompt's Issues

A question about ddim inversion equation

Hi~

I derived the DDIM inversion equation, but it seems different from the one in your paper.
My derivation:
[image: my derivation]

Equation in your paper:
[image: equation from the paper]

There is no coefficient where the arrow points.

So where did I go wrong? I could not find a source for this equation in the references you cited.
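For reference, here is the deterministic DDIM step rearranged as an inversion update, written in the same notation as the Null-text Inversion discussion further down this page; this is a hedged restatement under the assumption $\sigma_t = 0$, not a quote from the paper:

$z_{t+1} = \sqrt{\frac{\alpha_{t+1}}{\alpha_t}}\, z_t + \sqrt{\alpha_{t+1}}\left(\sqrt{\frac{1}{\alpha_{t+1}} - 1} - \sqrt{\frac{1}{\alpha_t} - 1}\right)\cdot\epsilon_\theta(z_t, \mathcal{C}, t)$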

set_timesteps() got an unexpected keyword argument 'offset'

I get an error:
`
162 # set timesteps
163 extra_set_kwargs = {"offset": 1}
--> 164 model.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)
165 for t in tqdm(model.scheduler.timesteps):
166 latents = diffusion_step(model, controller, latents, context, t, guidance_scale, low_resource)

TypeError: set_timesteps() got an unexpected keyword argument 'offset'`

Could you please help me solve it? Thank you in advance.

TypeError: forward() got an unexpected keyword argument 'encoder_hidden_states'

I am trying to run any of the Jupyter notebooks to test the code, but I am facing this error at the line where the prompts are passed to the model. The code cell and the error are the following:

g_cpu = torch.Generator().manual_seed(8888)
prompts = ["A painting of a squirrel eating a burger"]
controller = AttentionStore()
image, x_t = run_and_display(prompts, controller, latent=None, run_baseline=False, generator=g_cpu)
show_cross_attention(controller, res=16, from_where=("up", "down"))


TypeError Traceback (most recent call last)
Cell In[9], line 4
2 prompts = ["A painting of a squirrel eating a burger"]
3 controller = AttentionStore()
----> 4 image, x_t = run_and_display(prompts, controller, latent=None, run_baseline=False, generator=g_cpu)
5 show_cross_attention(controller, res=16, from_where=("up", "down"))

Cell In[6], line 6, in run_and_display(prompts, controller, latent, run_baseline, generator)
4 images, latent = run_and_display(prompts, EmptyControl(), latent=latent, run_baseline=False, generator=generator)
5 print("with prompt-to-prompt")
----> 6 images, x_t = ptp_utils.text2image_ldm_stable(ldm_stable, prompts, controller, latent=latent, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=GUIDANCE_SCALE, generator=generator, low_resource=LOW_RESOURCE)
7 ptp_utils.view_images(images)
8 return images, x_t

File ~/.conda/envs/prompt/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.call..decorate_context(*args, **kwargs)
24 @functools.wraps(func)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)

File ~/Downloads/prompt-to-prompt/ptp_utils.py:167, in text2image_ldm_stable(model, prompt, controller, num_inference_steps, guidance_scale, generator, latent, low_resource)
165 model.scheduler.set_timesteps(num_inference_steps)
166 for t in tqdm(model.scheduler.timesteps):
--> 167 latents = diffusion_step(model, controller, latents, context, t, guidance_scale, low_resource)
...
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() got an unexpected keyword argument 'encoder_hidden_states'

Any suggestions?

Getting error in null_text_w_ptp

DDIM inversion...
Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_585007/266642345.py", line 3, in
(image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(image_path, prompt, offsets=(0,0,200,0), verbose=True)
File "/tmp/ipykernel_585007/262494972.py", line 168, in invert
image_rec, ddim_latents = self.ddim_inversion(image_gt)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/tmp/ipykernel_585007/262494972.py", line 125, in ddim_inversion
ddim_latents = self.ddim_loop(latent)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/tmp/ipykernel_585007/262494972.py", line 112, in ddim_loop
noise_pred = self.get_noise_pred_single(latent, t, cond_embeddings)
File "/tmp/ipykernel_585007/262494972.py", line 46, in get_noise_pred_single
noise_pred = self.model.unet(latents, t, encoder_hidden_states=context)["sample"]
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 582, in forward
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 837, in forward
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/diffusers/models/attention.py", line 291, in forward
class FeedForward(nn.Module):
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'encoder_hidden_states'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2102, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1310, in structured_traceback
return FormattedTB.structured_traceback(
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1199, in structured_traceback
return VerboseTB.structured_traceback(
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1052, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 978, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 878, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/IPython/core/ultratb.py", line 712, in lines
return self._sd.lines
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.dict[self.func.name] = self.func(obj)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/core.py", line 698, in lines
pieces = self.included_pieces
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.dict[self.func.name] = self.func(obj)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/core.py", line 649, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
value = obj.dict[self.func.name] = self.func(obj)
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/stack_data/core.py", line 628, in executing_piece
return only(
File "/home/nras/miniconda3/envs/py3.8/lib/python3.8/site-packages/executing/executing.py", line 164, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

Bug with 'from diffusers import StableDiffusionPipeline'

I have installed the required diffusers and transformers, but the following error occurs:


TypeError Traceback (most recent call last)
in
1 from typing import Optional, Union, Tuple, List, Callable, Dict
2 import torch
----> 3 from diffusers import StableDiffusionPipeline
4 import torch.nn.functional as nnf
5 import numpy as np

~/anaconda3/lib/python3.8/site-packages/diffusers/init.py in
24 )
25 from .pipeline_utils import DiffusionPipeline
---> 26 from .pipelines import DDIMPipeline, DDPMPipeline, KarrasVePipeline, LDMPipeline, PNDMPipeline, ScoreSdeVePipeline
27 from .schedulers import (
28 DDIMScheduler,

~/anaconda3/lib/python3.8/site-packages/diffusers/pipelines/init.py in
9
10 if is_transformers_available():
---> 11 from .latent_diffusion import LDMTextToImagePipeline
12 from .stable_diffusion import (
13 StableDiffusionImg2ImgPipeline,

~/anaconda3/lib/python3.8/site-packages/diffusers/pipelines/latent_diffusion/init.py in
4
5 if is_transformers_available():
----> 6 from .pipeline_latent_diffusion import LDMBertModel, LDMTextToImagePipeline

~/anaconda3/lib/python3.8/site-packages/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py in
7 import torch.utils.checkpoint
8
----> 9 from transformers.activations import ACT2FN
10 from transformers.configuration_utils import PretrainedConfig
11 from transformers.modeling_outputs import BaseModelOutput

~/anaconda3/lib/python3.8/site-packages/transformers/init.py in
28
29 # Check the dependencies satisfy the minimal versions required.
---> 30 from . import dependency_versions_check
31 from .utils import (
32 OptionalDependencyNotAvailable,

~/anaconda3/lib/python3.8/site-packages/transformers/dependency_versions_check.py in
15
16 from .dependency_versions_table import deps
---> 17 from .utils.versions import require_version, require_version_core
18
19

~/anaconda3/lib/python3.8/site-packages/transformers/utils/init.py in
32 replace_return_docstrings,
33 )
---> 34 from .generic import (
35 ContextManagers,
36 ExplicitEnum,

~/anaconda3/lib/python3.8/site-packages/transformers/utils/generic.py in
31
32 if is_tf_available():
---> 33 import tensorflow as tf
34
35 if is_flax_available():

~/anaconda3/lib/python3.8/site-packages/tensorflow/init.py in
53 from ._api.v2 import autograph
54 from ._api.v2 import bitwise
---> 55 from ._api.v2 import compat
56 from ._api.v2 import config
57 from ._api.v2 import data

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/init.py in
37 import sys as _sys
38
---> 39 from . import v1
40 from . import v2
41 from tensorflow.python.compat.compat import forward_compatibility_horizon

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/init.py in
32 from . import autograph
33 from . import bitwise
---> 34 from . import compat
35 from . import config
36 from . import data

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/compat/init.py in
37 import sys as _sys
38
---> 39 from . import v1
40 from . import v2
41 from tensorflow.python.compat.compat import forward_compatibility_horizon

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/compat/v1/init.py in
49 from tensorflow._api.v2.compat.v1 import layers
50 from tensorflow._api.v2.compat.v1 import linalg
---> 51 from tensorflow._api.v2.compat.v1 import lite
52 from tensorflow._api.v2.compat.v1 import logging
53 from tensorflow._api.v2.compat.v1 import lookup

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/lite/init.py in
9
10 from . import constants
---> 11 from . import experimental
12 from tensorflow.lite.python.lite import Interpreter
13 from tensorflow.lite.python.lite import OpHint

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/lite/experimental/init.py in
8 import sys as _sys
9
---> 10 from . import authoring
11 from tensorflow.lite.python.analyzer import ModelAnalyzer as Analyzer
12 from tensorflow.lite.python.lite import OpResolverType

~/anaconda3/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/lite/experimental/authoring/init.py in
8 import sys as _sys
9
---> 10 from tensorflow.lite.python.authoring.authoring import compatible
11
12 del _print_function

~/anaconda3/lib/python3.8/site-packages/tensorflow/lite/python/authoring/authoring.py in
41
42 # pylint: disable=g-import-not-at-top
---> 43 from tensorflow.lite.python import convert
44 from tensorflow.lite.python import lite
45 from tensorflow.lite.python.metrics_wrapper import converter_error_data_pb2

~/anaconda3/lib/python3.8/site-packages/tensorflow/lite/python/convert.py in
31
32 from tensorflow.lite.python import lite_constants
---> 33 from tensorflow.lite.python import util
34 from tensorflow.lite.python import wrap_toco
35 from tensorflow.lite.python.convert_phase import Component

~/anaconda3/lib/python3.8/site-packages/tensorflow/lite/python/util.py in
53 # pylint: disable=unused-import
54 try:
---> 55 from jax import xla_computation as _xla_computation
56 except ImportError:
57 _xla_computation = None

~/anaconda3/lib/python3.8/site-packages/jax/init.py in
90 # These submodules are separate because they are in an import cycle with
91 # jax and rely on the names imported above.
---> 92 from . import image
93 from . import lax
94 from . import nn

~/anaconda3/lib/python3.8/site-packages/jax/image/init.py in
16
17 # flake8: noqa: F401
---> 18 from jax._src.image.scale import (
19 resize,
20 ResizeMethod,

~/anaconda3/lib/python3.8/site-packages/jax/_src/image/scale.py in
18
19 from jax import jit
---> 20 from jax import lax
21 from jax import numpy as jnp
22 import numpy as np

~/anaconda3/lib/python3.8/site-packages/jax/lax/init.py in
322 while_p,
323 )
--> 324 from jax._src.lax.fft import (
325 fft,
326 fft_p,

~/anaconda3/lib/python3.8/site-packages/jax/_src/lax/fft.py in
85
86 @partial(jit, static_argnums=1)
---> 87 def _rfft_transpose(t, fft_lengths):
88 # The transpose of RFFT can't be expressed only in terms of irfft. Instead of
89 # manually building up larger twiddle matrices (which would increase the

~/anaconda3/lib/python3.8/site-packages/jax/api.py in jit(fun, static_argnums, device, backend, donate_argnums)
179 """
180 if FLAGS.experimental_cpp_jit and config.omnistaging_enabled:
--> 181 return _cpp_jit(fun, static_argnums, device, backend, donate_argnums)
182 else:
183 return _python_jit(fun, static_argnums, device, backend, donate_argnums)

~/anaconda3/lib/python3.8/site-packages/jax/api.py in cpp_jit(fun, static_argnums, device, backend, donate_argnums)
365
366 static_argnums
= (0,) + tuple(i + 1 for i in static_argnums)
--> 367 cpp_jitted_f = jax_jit.jit(fun, cache_miss, get_device_info,
368 get_jax_enable_x64, get_jax_disable_jit_flag,
369 static_argnums_)

TypeError: jit(): incompatible function arguments. The following argument types are supported:
1. (fun: function, cache_miss: function, get_device: function, static_argnums: List[int], static_argnames: List[str] = [], donate_argnums: List[int] = [], cache: jaxlib.xla_extension.CompiledFunctionCache = None) -> object

Invoked with: <function _rfft_transpose at 0x7f44d1e18ee0>, <function _cpp_jit..cache_miss at 0x7f44d1e18f70>, <function _cpp_jit..get_device_info at 0x7f44d1e1e040>, <function _cpp_jit..get_jax_enable_x64 at 0x7f44d1e1e0d0>, <function _cpp_jit..get_jax_disable_jit_flag at 0x7f44d1e1e160>, (0, 2)


I am wondering what I should do to fix it.

Output source image is too different from the input image

Hi, thanks for your work!

When I tried some real images, the null-text inversion output was fine, but the ptp editing output was totally different from the input, for both the source image and the edited image. Could you please explain why this happens? Any advice on how to solve it?

Image Size in Null-Text Inversion

I am testing the inversion accuracy using the COCO dataset, but the result is not stable. The only change I made is relaxing the hard-coded 512x512 image size. Do you see any potential risks with that size change? Thanks.

Cross attention control and xformers memory efficient attention

Hi, awesome paper!

Is it possible to integrate the cross-attention control mechanism into the memory-efficient attention formula?

From what I understand, cross-attention control modifies the attention map to make edits, but memory-efficient attention doesn't compute attention in the same way and doesn't explicitly compute the attention map. How can we tweak the memory-efficient attention formula to support cross-attention control? Is it possible to use both together?

Thank you!
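For what it's worth, a minimal sketch of the workaround the released notebooks themselves rely on: materialize the attention maps by turning memory-efficient attention off before registering the controller (this assumes the ldm_stable pipeline and ptp_utils.register_attention_control from the notebooks; whether xformers can be kept enabled alongside the control is a separate, open question):

# Cross-attention control edits the explicit attention maps, which xformers'
# memory-efficient kernels never materialize, so disable them before registering.
try:
    ldm_stable.disable_xformers_memory_efficient_attention()
except (AttributeError, ModuleNotFoundError):
    pass  # older diffusers, or xformers not installed
ptp_utils.register_attention_control(ldm_stable, controller)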

Bad results?

Compared to Imagic, for example:
This is the null-text method:
prompts = ["A dog",
"A sitting dog"
]
[image]
This is Imagic:
[image]
The complete inference code is as follows:

image_path = "./imgs/dog2.png"
prompt = "A dog"
# offsets=(0,0,200,0)
(image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(image_path, prompt, verbose=True)

print("Modify or remove offsets according to your image!")
prompts = [prompt]
controller = AttentionStore()
image_inv, x_t = run_and_display(prompts, controller, run_baseline=False, latent=x_t, uncond_embeddings=uncond_embeddings, verbose=False)
print("showing from left to right: the ground truth image, the vq-autoencoder reconstruction, the null-text inverted image")
ptp_utils.view_images([image_gt, image_enc, image_inv[0]])
show_cross_attention(controller, 16, ["up", "down"])

prompts = ["A dog",
           "A sitting dog"
        ]

cross_replace_steps = {'default_': .8, }
self_replace_steps = .7
blend_word = ((('dog',),)) # for local edit
eq_params = {"words": ("sitting", ), "values": (5,)}  
 
controller = make_controller(prompts, False, cross_replace_steps, self_replace_steps, blend_word, eq_params)
images, _ = run_and_display(prompts, controller, run_baseline=False, latent=x_t, uncond_embeddings=uncond_embeddings)

I tried many parameters but couldn't edit this dog. Is this a limitation of the current method?

Unexpected argument to scheduler

The Error

I'm getting this when running the original Stable Diffusion notebook with diffusers==0.3.0

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:4                                                                                    │
│ in run_and_display:6                                                                             │
│                                                                                                  │
│ /usr/lib/python3/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context                │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/ubuntu/p2p2/ptp_utils.py:164 in text2image_ldm_stable                                      │
│                                                                                                  │
│   161 │                                                                                          │
│   162 │   # set timesteps                                                                        │
│   163 │   extra_set_kwargs = {"offset": 1}                                                       │
│ ❱ 164 │   model.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)                 │
│   165 │   for t in tqdm(model.scheduler.timesteps):                                              │
│   166 │   │   latents = diffusion_step(model, controller, latents, context, t, guidance_scale,   │
│   167                                                                                            │
│                                                                                                  │
│ /home/ubuntu/.local/lib/python3.8/site-packages/diffusers/schedulers/scheduling_pndm.py:171 in   │
│ set_timesteps                                                                                    │
│                                                                                                  │
│   168 │   │                                                                                      │
│   169 │   │   self.ets = []                                                                      │
│   170 │   │   self.counter = 0                                                                   │
│ ❱ 171 │   │   self.set_format(tensor_format=self.tensor_format)                                  │
│   172 │                                                                                          │
│   173 │   def step(                                                                              │
│   174 │   │   self,                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'PNDMScheduler' object has no attribute 'tensor_format'

When updating to diffusers 0.8.0:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:4                                                                                    │
│ in run_and_display:6                                                                             │
│                                                                                                  │
│ /usr/lib/python3/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context                │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/ubuntu/p2p2/ptp_utils.py:164 in text2image_ldm_stable                                      │
│                                                                                                  │
│   161 │                                                                                          │
│   162 │   # set timesteps                                                                        │
│   163 │   extra_set_kwargs = {"offset": 1}                                                       │
│ ❱ 164 │   model.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)                 │
│   165 │   for t in tqdm(model.scheduler.timesteps):                                              │
│   166 │   │   latents = diffusion_step(model, controller, latents, context, t, guidance_scale,   │
│   167                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: set_timesteps() got an unexpected keyword argument 'offset'

Explanation

In newer Diffusers versions, the PNDM scheduler receives steps_offset at initialization rather than an offset kwarg to set_timesteps().

So we either need to move the "offset": 1 elsewhere or understand why the first error occurs.

Any ideas?
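A minimal sketch of the steps_offset route, assuming a diffusers release (>= 0.8) whose schedulers expose steps_offset in their config; it borrows the DDIMScheduler setup from the null_text_w_ptp notebook, and the other variable names (model, num_inference_steps, diffusion_step, ...) are the ones from ptp_utils.py:

from diffusers import DDIMScheduler
from tqdm import tqdm

# Configure the offset on the scheduler at construction time instead of
# passing {"offset": 1} to set_timesteps().
model.scheduler = DDIMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    clip_sample=False, set_alpha_to_one=False, steps_offset=1,
)
model.scheduler.set_timesteps(num_inference_steps)  # no extra kwargs
for t in tqdm(model.scheduler.timesteps):
    latents = diffusion_step(model, controller, latents, context, t, guidance_scale, low_resource)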

real image inversion

Dear @amirhertz ,

Thank you for sharing this great work, I really like it.

Do you have plans to release the code for real image editing in Section 4.1?

Thank you for your help.

Best Wishes,

Zongze

KeyError: 'up_cross'

When running show_cross_attention(controller, 16, ["up", "down"]), it throws a KeyError. What's the problem?

TypeError: getattr(): attribute name must be string in "null_text_w_ptp.ipynb"

I am trying to run the Jupyter notebook, and the third block gives me the following error.

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
MY_TOKEN = ''
LOW_RESOURCE = False
NUM_DDIM_STEPS = 50
GUIDANCE_SCALE = 7.5
MAX_NUM_WORDS = 77
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN, scheduler=scheduler).to(device)
try:
    ldm_stable.disable_xformers_memory_efficient_attention()
except AttributeError:
    print("Attribute disable_xformers_memory_efficient_attention() is missing")
tokenizer = ldm_stable.tokenizer


TypeError Traceback (most recent call last)
Cell In[3], line 8
6 MAX_NUM_WORDS = 77
7 device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
----> 8 ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN, scheduler=scheduler).to(device)
9 try:
10 ldm_stable.disable_xformers_memory_efficient_attention()

File ~/anaconda3/envs/p2p/lib/python3.8/site-packages/diffusers/pipeline_utils.py:373, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
370 if issubclass(class_obj, class_candidate):
371 load_method_name = importable_classes[class_name][1]
--> 373 load_method = getattr(class_obj, load_method_name)
375 loading_kwargs = {}
376 if issubclass(class_obj, torch.nn.Module):

TypeError: getattr(): attribute name must be string

Any comments?

All the other Jupyter notebooks work well.

  • I also tried Stable Diffusion v2.1, and it also didn't work :(

Colabs do not run

I've tried running both colabs and I get a sequence of errors:

  1. Setting the offset in ptp_utils.py L163 doesn't work with the newest diffusers.
  2. After turning off the offset kwarg above, I get TypeError: forward() got an unexpected keyword argument 'attention_mask'.
  3. I reverted to diffusers==0.3.0 like in the requirements.txt and I get that TypeError: getattr(): attribute name must be string when trying to load the model (ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN).to(device)).

What am I missing?

Installation issues: ModuleNotFoundError when trying to run on Colab

I'm trying to run the Null-text inversion code on colab, but can't seem to install it due to xformers issues.

I think I succeeded in installing the xformers package using !pip install -U --pre xformers.
The versions of the packages are:

Torch version: 1.13.0+cu116
xformers version: 0.0.16rc396
diffusers version: 0.10.0

But I get the following error:


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-15-ab2e3648a6a0>](https://localhost:8080/#) in <module>
      8 ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN, scheduler=scheduler).to(device)
      9 try:
---> 10     ldm_stable.disable_xformers_memory_efficient_attention()
     11 except AttributeError:
     12     print("Attribute disable_xformers_memory_efficient_attention() is missing")

7 frames
[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in disable_xformers_memory_efficient_attention(self)
    829         Disable memory efficient attention as implemented in xformers.
    830         """
--> 831         self.set_use_memory_efficient_attention_xformers(False)
    832 
    833     def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None:

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in set_use_memory_efficient_attention_xformers(self, valid)
    846         Xformers    module = getattr(self, module_name)
    847             if isinstance(module, torch.nn.Module):
--> 848                 fn_recursive_set_mem_eff(module)
    849 
    850     def enable_attention_slicing(self, slice_size: Optional[Union[str, int]] = "auto"):

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in fn_recursive_set_mem_eff(module)
    840 
    841             for child in module.children():
--> 842                 fn_recursive_set_mem_eff(child)
    843 
    844         module_names, _, _ = self.extract_init_dict(dict(self.config))

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in fn_recursive_set_mem_eff(module)
    840 
    841             for child in module.children():
--> 842                 fn_recursive_set_mem_eff(child)
    843 
    844         module_names, _, _ = self.extract_init_dict(dict(self.config))

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in fn_recursive_set_mem_eff(module)
    840 
    841             for child in module.children():
--> 842                 fn_recursive_set_mem_eff(child)
    843 
    844         module_names, _, _ = self.extract_init_dict(dict(self.config))

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in fn_recursive_set_mem_eff(module)
    840 
    841             for child in module.children():
--> 842                 fn_recursive_set_mem_eff(child)
    843 
    844         module_names, _, _ = self.extract_init_dict(dict(self.config))

[/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py](https://localhost:8080/#) in fn_recursive_set_mem_eff(module)
    837         def fn_recursive_set_mem_eff(module: torch.nn.Module):
    838             if hasattr(module, "set_use_memory_efficient_attention_xformers"):
--> 839                 module.set_use_memory_efficient_attention_xformers(valid)
    840 
    841             for child in module.children():

[/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py](https://localhost:8080/#) in set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers)
    289     def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
    290         if not is_xformers_available():
--> 291             raise ModuleNotFoundError(
    292                 "Refer to https://github.com/facebookresearch/xformers for more information on how to install"
    293                 " xformers",

ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers

What am I doing wrong?
Any help would be much appreciated.
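In case it helps, a small sketch of a guard that lets the cell proceed when xformers is effectively unavailable; the prompt-to-prompt notebooks only try to disable memory-efficient attention anyway, so skipping the call is usually harmless (this is an assumption on my side, not an official fix):

# Broaden the except clause: newer diffusers raises ModuleNotFoundError here
# when is_xformers_available() is False, even if a pip package is installed.
try:
    ldm_stable.disable_xformers_memory_efficient_attention()
except (AttributeError, ModuleNotFoundError):
    print("Could not disable xformers memory-efficient attention; continuing without it")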

Inconsistency in the generation of image

Hi ! 😃

I have a problem regarding the prompt-to-prompt notebook.
The image of the squirrel changes a little bit between the Cross-Attention Visualization:

g_cpu = torch.Generator().manual_seed(8888)
prompts = ["A painting of a squirrel eating a burger"]
controller = AttentionStore()
image, x_t = run_and_display(prompts, controller, latent=None, run_baseline=False, generator=g_cpu)
show_cross_attention(controller, res=16, from_where=("up", "down"))

and the Replacement edit cells:

prompts = ["A painting of a squirrel eating a burger",
           "A painting of a lion eating a burger"]

controller = AttentionReplace(prompts, NUM_DIFFUSION_STEPS, cross_replace_steps=.8, self_replace_steps=0.4)
_ = run_and_display(prompts, controller, latent=x_t, run_baseline=True)

sections:

The one on the left was generated from the Cross-Attention Visualization cell, and the one on the right from the Replacement edit cell.
If you look closely at the two black circles on the left, you'll see a difference between the two squirrels, and this is not supposed to happen, I guess.
I think you can reproduce the same error if you run the following code:

controller = EmptyControl()

g_cpu = torch.Generator().manual_seed(8888)
prompts = ["A painting of a squirrel eating a burger"]
image_1, x_t = run_and_display(prompts, controller, latent=None, run_baseline=False, generator=g_cpu)

g_cpu = torch.Generator().manual_seed(8888)
prompts = ["A painting of a squirrel eating a burger","A painting of a squirrel eating a burger"]
image_2, x_t = run_and_display(prompts, controller, latent=None, run_baseline=False, generator=g_cpu)

The single squirrel in image_1 would be different from the two squirrels generated in image_2.

After going through the code a little, I suspect it comes from the size of the prompt batch: because it contains two sentences, we have batch_size = 2 in the Replacement cell. I think that's why it doesn't generate exactly the same picture as when we use only one sentence, so the total batch size may influence the generation of an image from a text even when the prompt itself stays the same.

The same problem arises when working with the Null-text inversion notebook:

[image: difference_cat]

Thank you for your help !! 😊

Failure case of DDIM inversion

I used gnochi_mirror.jpeg and the associated prompt "A cat sitting next to a mirror" to try DDIM inversion with SD v1.4 (50 steps), but found that the reconstruction quality is fine. What are the settings (and code, if available) to reproduce the failure case shown in the Null-Text paper?

Thanks

Overfitting to the source image

Dear authors,
Thank you for sharing this interesting & fun research and releasing the source code!

I was playing with the source code and found that, for some examples, the attention-swapped image is almost the same as the source image (on the attention side).
This was also the case for the example "a photo of a butterfly on a [sth]", as in Fig. 5 of the paper.
This overfitting to the source image seems to occur even when we swap for only a small fraction of the diffusion steps, e.g. with cross_replace_steps > 0.1 or self_replace_steps > 0.1.

Is it expected that small values (< 0.3) of the replace steps are needed for some examples like the above?
Is there any guideline for selecting cross_replace_steps and self_replace_steps?

Thank you for reading!

Null-text inversion worse than PTP?

Based on my test of image translation (lion -> tiger using AFHQ), PTP is better than NT inversion in many cases. I wonder whether the cause is that NT inversion is too sensitive to parameters like cross_step/self_step?

Custom class adaptation

Thanks for your research :)
I wonder whether this robust image-editing method based on attention injection can be adapted to a custom class, as in the DreamBooth research.
Ex) "Photo of a cat riding a bicycle" -> "[V] car": provide some custom-class information, such as images of the same car model.

To those who have issues adapting it with diffusers version >= 0.11.0

You might be encountering an error about:
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask) + hidden_states

Simply rename the argument attention_mask to mask and, for self.attn2, rename encoder_hidden_states to context; that will fix the issue.

This is because the register_attention_control function uses argument names that differ from those in the new versions of diffusers, as sketched below.
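A sketch of one way to apply that rename, inside the patched attention forward that register_attention_control installs; the argument names are the ones newer diffusers passes, and the body is abbreviated (the original ptp_utils implementation is assumed):

# diffusers < 0.11 called the patched forward as forward(x, context=..., mask=...);
# diffusers >= 0.11 calls it with the new keyword names instead.
def forward(x, encoder_hidden_states=None, attention_mask=None):
    context = encoder_hidden_states  # was `context` (used by attn2, the cross-attention)
    mask = attention_mask            # was `mask`
    ...  # rest of the original ptp_utils forward, unchanged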

a question about running code

error:
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.

TypeError Traceback (most recent call last)
Input In [17], in <cell line: 7>()
5 MAX_NUM_WORDS = 77
6 device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
----> 7 ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN).to(device)
8 tokenizer = ldm_stable.tokenizer

File ~/miniconda3/lib/python3.8/site-packages/diffusers/pipeline_utils.py:373, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
370 if issubclass(class_obj, class_candidate):
371 load_method_name = importable_classes[class_name][1]
--> 373 load_method = getattr(class_obj, load_method_name)
375 loading_kwargs = {}
376 if issubclass(class_obj, torch.nn.Module):

TypeError: getattr(): attribute name must be string

get_replacement_mapper_ in seq_aligner.py

Thanks for this amazing work. I tested this function many times, and it always returns a diagonal matrix with 1s on the diagonal. If that is correct, why not use the built-in torch function directly? If it needs to be corrected, can you help explain this issue?

How to support img2img task

This is really great work, thanks for open-sourcing it.
Currently I am trying to change the pipeline to support the img2img task and then edit the resulting image, but my attempts have failed.
If it is convenient, please tell me how to support the img2img task.

learned linear projections l_Q, l_K, l_V

Hi, this question is about the linear projections l_Q, l_K, l_V of the attention module in the Prompt-to-Prompt paper. The paper describes the linear projections as learned. However, the introduction claims that "this method does not require model training". The two statements seem to contradict each other. How do you learn the parameters of l_Q, l_K, l_V?

which cross-attention layer to perform the proposed method

Thanks for sharing this amazing work.
I have a few questions after reading the paper, which are as follows:

  1. The proposed method is applied at every cross-attention layer (64x64, 32x32, 16x16, 8x8), right?
  2. Which value of τ was used for the examples shown in the paper?
  3. For word swap, did you try using the attention map produced by the same prompt but a different z_T?

PNDMScheduler has no attribute timesteps?

The model scheduler looks like:
model scheduler : PNDMScheduler {
"_class_name": "PNDMScheduler",
"_diffusers_version": "0.8.0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"num_train_timesteps": 1000,
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null
}

But this part of the code fails:
for t in tqdm(model.scheduler.timesteps):
latents = diffusion_step(model, controller, latents, context, t, guidance_scale,

Error: TypeError: 'NoneType' object is not iterable
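The traceback suggests scheduler.timesteps is still None, which typically means set_timesteps() was never (successfully) called before the loop; a minimal sketch of the expected order, with names taken from ptp_utils.py:

# timesteps is only populated once set_timesteps() has run.
model.scheduler.set_timesteps(num_inference_steps)
for t in tqdm(model.scheduler.timesteps):
    latents = diffusion_step(model, controller, latents, context, t, guidance_scale, low_resource)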

Request for PyTorch 2.0 Compatibility

First of all, thank you for creating and maintaining this amazing project. I have been using it for a while and it has been really helpful.

I noticed that the current version of the project seems to be compatible with older versions of PyTorch, and I would like to inquire if there are any plans to update the code to support PyTorch 2.0. The latest PyTorch version has introduced several new features and improvements, which could potentially benefit this project as well.

If you are already working on this update or plan to do so in the near future, I would be happy to know the estimated timeline. In case you need any assistance with testing or adapting the code, please let me know. I would be more than happy to contribute and help in any way possible.

Thank you once again for your great work and looking forward to the possibility of using this project with PyTorch 2.0.

Best regards

Understanding AttentionReplace

In replace_cross_attention of AttentionReplace, why is attn_replace not used? According to the paper, I would guess we have to replace attn_base with the corresponding maps from attn_replace.

[image]

Thank you

localblend?

What's the theory behind "LocalBlend"? Can you give me some clues? Thanks.

Learning rate of null-text inversion

Hi, @amirhertz!

Thank you for sharing your cool work!
I have a question about the learning rate of your null-text inversion. According to the notebook, the learning rate is set as below. However, in your paper, the learning rate is set to 0.01.

optimizer = Adam([uncond_embeddings], lr=1e-2 * (1. - i / 100.))

where $i$ is the index of the loop for i in range(NUM_DDIM_STEPS):.
If we set NUM_DDIM_STEPS above 101, the learning rate becomes negative.

My question is: can we use lr=1e-2 instead of 1e-2 * (1. - i / 100.)?

[image: Screenshot 2023-03-13 15 32 34]
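One hedged reading of that line: the learning rate is linearly decayed from 1e-2 over the notebook's (assumed 50) DDIM steps, so presumably the paper's 0.01 refers to the initial value; if you want to run with NUM_DDIM_STEPS > 100, a small defensive variant (my own adjustment, not from the authors) could be:

from torch.optim import Adam

# Keep the linear decay, but scale it to the actual number of steps and
# clamp it so the learning rate can never go negative.
lr = 1e-2 * max(0.0, 1.0 - i / NUM_DDIM_STEPS)
optimizer = Adam([uncond_embeddings], lr=lr)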

About the global null-text Inversion

Excellent job!
I have some questions about global null-text inversion. I attempted to implement this algorithm, but I can't get a good reconstruction from the final optimized null-text embedding, even after increasing the number of optimization steps from 7500 to 10000. Could you kindly release the code for this algorithm, or is there any code snippet? Thanks!


I solved the problem, so I closed the issue.

Question about the equation of deterministic DDIM sampling in the Null-text Inversion paper

Hi,

Thanks for this wonderful work! I have a question about the equation of deterministic DDIM sampling in the Null-text Inversion paper.
[image: equation from the paper]
Based on my understanding, deterministic DDIM sampling sets $\sigma_t=0$ in Equation 12 of the DDIM paper. It should be the following equation:
$z_{t-1} = \sqrt{\alpha_{t-1}}\left(\frac{z_t - \sqrt{1-\alpha_t}\cdot\epsilon_\theta(z_t,\mathcal{C},t)}{\sqrt{\alpha_t}}\right) + \sqrt{1-\alpha_{t-1}}\cdot\epsilon_\theta(z_t, \mathcal{C}, t)$
If you rewrite this equation into the Null-text Inversion paper version, it should be:
$z_{t-1} = \sqrt{\frac{\alpha_{t-1}}{\alpha_t}}z_t + \sqrt{\alpha_{t-1}}\left(\sqrt{\frac{1}{\alpha_{t-1}} - 1}- \sqrt{\frac{1}{\alpha_t}-1} \right)\cdot\epsilon_\theta(z_t, \mathcal{C}, t)$
This is different from the one in the Null-text Inversion paper.

It may be that my understanding is wrong. I would very much appreciate it if you could point me in the right direction!

Thanks,

Jueqi

Support for half precision?

Are there any instructions on how to get this code working in half precision? If I'm not mistaken, diffusers==0.3.0 might be problematic for this (I think the VAE couldn't handle it), so I upgraded the diffusers version, which should fix that. I'm currently running into other errors that I'm slowly debugging. I'm a little worried that the version upgrade might be causing more problems than necessary, so specific instructions on how to get this code working in half precision would be great to hear.
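For what it's worth, a minimal sketch of loading the pipeline in half precision on a newer diffusers release (the model id is the one used in the notebooks; whether the rest of the prompt-to-prompt code then runs cleanly in fp16 is not guaranteed):

import torch
from diffusers import StableDiffusionPipeline

# Load weights in float16; any latents or embeddings created outside the
# pipeline would also need to be cast to torch.float16.
ldm_stable = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")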

optimization on null text

Hi, the null text is optimized over the different timesteps. I am wondering whether an alternative solution would be to optimize the UNet itself over the different timesteps, only for this kind of conditioning (copy the UNet and freeze it ahead of time; then, for normal text input, use the frozen copy, and for the null text, use the optimized UNet).

Request for help, problems with the installation

Hello, I have tried to run this application, but it always gives me this error:
TypeError Traceback (most recent call last)
in
5 MAX_NUM_WORDS = 77
6 device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
----> 7 ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN).to(device)
8 tokenizer = ldm_stable.tokenizer

/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
371 load_method_name = importable_classes[class_name][1]
372
--> 373 load_method = getattr(class_obj, load_method_name)
374
375 loading_kwargs = {}

TypeError: getattr(): attribute name must be string
Could you please help me solve it? Thank you in advance.
