
if's People

Contributors

apolinario, estability, gugutse, ivksu, sayakpaul, shonenkov, williamberman, zeroshot-ai


if's Issues

Can I distribute the stages over multiple GPUs? Like you see below

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

I have 4 Maxwell Titan X GPUs with 12GB VRAM each.

if_I = IFStageI('IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device="cuda:3")
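For reference, here is a minimal sketch of how I would then drive the four stages together, mirroring the dream() call used in the other examples in this repo; whether the reference pipeline shuttles the intermediate tensors between these devices cleanly is exactly what I'm asking about.

from deepfloyd_if.pipelines import dream

# Sketch: reuses the stage objects above (if_I on cuda:0, if_II on cuda:1,
# if_III on cuda:2, t5 on cuda:3) with the standard dream() pipeline.
prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt] * 4,
    seed=42,
    if_I_kwargs={"guidance_scale": 7.0, "sample_timestep_respacing": "smart100"},
    if_II_kwargs={"guidance_scale": 4.0, "sample_timestep_respacing": "smart50"},
    if_III_kwargs={"guidance_scale": 9.0, "noise_level": 20, "sample_timestep_respacing": "75"},
)
if_I.show(result['III'], size=14)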

Can "open source" software require a third party account and access token

I'm aware that what does or does not constitute "open source" is somewhat contentious, but in my understanding, requiring people to sign up for a third-party account, consent to a license through a third-party service, and use a third-party access token in order to use the supposedly "open" software pushes the concept of openness past the breaking point.

Deep Floyd are, of course, perfectly within their rights to impose any restrictions and requirements they like, but to then advertise the release as open source for the community credit seems at least a little disingenuous.

Offload_folder is ignored?

/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py:872 in load_checkpoint_in_model

    869     """
    870     tied_params = find_tied_parameters(model)
    871     if offload_folder is None and device_map is not None and "disk" in device_map.values
--> 872         raise ValueError(
    873             "At least one of the model submodule will be offloaded to disk, please pass
    874         )
    875     elif offload_folder is not None and device_map is not None and "disk" in device_map.
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an offload_folder.

When running on colab, modified demo code

I have actually been playing with both the XL and M models to see the speed vs. quality differences between them.

So I now loaded the XL model again during the same session.
I have been flush()-ing and del-ing the pipes and everything.
Anyway, the line giving me errors is:

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder",
    device_map="auto",
    load_in_8bit=True,
    variant="8bit"
)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    text_encoder=text_encoder,
    unet=None,
    device_map="auto"
)
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

# free some memory
del pipe
del text_encoder

for image in images:
    flush()
    pipe = IFImg2ImgPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0",
        text_encoder=None,
        variant="fp16",
        torch_dtype=torch.float16,
        device_map="auto",
        offload_folder='/content/offload'  # THIS IS APPARENTLY IGNORED? SHOULD IT BE IGNORED?
    )
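For completeness, flush() is not defined in the snippet above; what I mean is the usual helper from the diffusers memory-management examples (a minimal sketch):

import gc
import torch

def flush():
    # Drop unreferenced Python objects and return cached CUDA blocks to the driver.
    gc.collect()
    torch.cuda.empty_cache()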

UNetModel parameters

Would it be possible to post the UNetModel(**model_params) configuration so devs can start integrating and optimizing with randomly initialized weights until the actual ones are released?

It would be great to be able to test optimization ideas and the like, but that's hard without knowing the exact model size, and I couldn't find it in the code unless I missed it.
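As a stopgap once the configs are published on the Hub, a hedged workaround could be to build the UNet from its config with randomly initialized weights; this assumes the config is exposed under the unet subfolder of DeepFloyd/IF-I-XL-v1.0, as with other diffusers pipelines:

import torch
from diffusers import UNet2DConditionModel

# Load only the architecture config (no weights) and build a randomly initialized UNet.
config = UNet2DConditionModel.load_config("DeepFloyd/IF-I-XL-v1.0", subfolder="unet")
unet = UNet2DConditionModel.from_config(config)

n_params = sum(p.numel() for p in unet.parameters())
print(f"UNet parameters: {n_params / 1e9:.2f}B")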

Using Latent Upscaler instead of x4-upscaler

Hi, thanks for releasing this awesome model.

In stage 3, we are currently using "stable-diffusion-x4-upscaler", which has a large memory requirement.

Can we use "stabilityai/sd-x2-latent-upscaler" instead? It has a smaller memory footprint and is faster as well.

Quickstart failing on no distribution found for torch<2.0.0

Running the pip command "pip install deepfloyd_if==1.0.0" on win 10

gives:

ERROR: Could not find a version that satisfies the requirement torch<2.0.0 (from deepfloyd-if) (from versions: 2.0.0)
ERROR: No matching distribution found for torch<2.0.0

Module PIL has no attribute "Resampling"

So, if I install Pillow>=9.2.0, I get: module PIL has no attribute "Resampling".
And if I downgrade to Pillow==9.0.0 to avoid that error, I get: deepfloyd-if 1.0.1 requires Pillow>=9.2.0.
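For what it's worth, a diagnostic and workaround I'd try (hedged, since the reported version doesn't match the error): print the Pillow version that is actually imported at runtime, and shim Resampling if an older install is shadowing the newer one. Image.Resampling only exists from Pillow 9.1.0 onward.

import PIL
from PIL import Image

print(PIL.__version__)  # if this prints < 9.1.0, a stale Pillow is being imported

# Workaround shim: on old Pillow the resampling filters live directly on the Image module.
if not hasattr(Image, "Resampling"):
    Image.Resampling = Image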

512x512

Hello, and thank you for the amazing work you've done on this SOTA text-to-image model. After testing the HF demo, I noticed the 256 -> 1024 super-resolution struggles to give good results. Wouldn't it be possible to introduce a middle step like 256 -> 512 -> 1024 instead?

Some questions about the T5 dtype

First, thanks for answering my questions.

  1. When training, which dtype was used for T5?
  2. Does the T5 dtype have a significant impact on the results?
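While waiting for an answer to question 1, one way to probe question 2 yourself is to encode the same prompt with the text encoder loaded in two dtypes and compare the embeddings. This is a hedged sketch using the transformers loading path; the dtype actually used during IF training is exactly what the question asks and is not assumed here.

import gc
import torch
from transformers import T5EncoderModel, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("DeepFloyd/IF-I-XL-v1.0", subfolder="tokenizer")
ids = tokenizer("a photo of an owl", return_tensors="pt").input_ids

embs = {}
for dtype in (torch.float32, torch.bfloat16):
    # Note: T5-XXL is large; each load needs tens of GB of CPU RAM.
    enc = T5EncoderModel.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", subfolder="text_encoder", torch_dtype=dtype
    )
    with torch.no_grad():
        embs[dtype] = enc(ids).last_hidden_state.float()
    del enc
    gc.collect()

print((embs[torch.float32] - embs[torch.bfloat16]).abs().max())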

Cannot get the beautiful owl picture following the instructions.

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

I got one picture like this:
[image: if_stage_II]

but when I followed this code:

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")
from deepfloyd_if.pipelines import dream

prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)

if_III.show(result['III'], size=14)

I just got this:
[image: generated_image_4]

protobuf not installed on notebook

In the example notebook you are missing:
!pip install protobuf==3.20.1

Just add that after the other pip installs and before loading T5 and it'll work great.
Also, if you're using a Docker image, make sure to use:
nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

Error when running image variation section in Notebook:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
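If the GPU genuinely cannot hold the whole 8-bit text encoder, the message points at allowing fp32 CPU offload. Below is a hedged sketch following the error text literally; the exact kwarg spelling depends on the transformers version (newer releases move it into BitsAndBytesConfig as llm_int8_enable_fp32_cpu_offload), so treat it as a starting point rather than the canonical fix.

from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder",
    load_in_8bit=True,
    load_in_8bit_fp32_cpu_offload=True,  # name taken from the error message above
    device_map="auto",
    variant="8bit",
)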

Error when running through examples: "When passing variant='fp16' upgrade `transformers` to at least 4.27.0.dev0"

Running through one of the examples, I'm finding the following error related to the transformers version:

Traceback (most recent call last):
  File "test3.py", line 9, in <module>
    stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", torch_dtype=torch.float16)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1039, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 431, in load_sub_model
    raise ImportError(
ImportError: When passing `variant='fp16'`, please make sure to upgrade your `transformers` version to at least 4.27.0.dev0

It appears that 4.25.1 is the version installed when using the requirements.txt file and following the README instructions.

I'm currently rerunning now (after removing 4.25.1 and installing transformers 4.28.1). However, would 4.28.1 be compatible, or would we need to keep the library under a certain version?

Thanks! : )

Sharing the sample code I've been utilizing to test:

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch
from huggingface_hub import login

login()

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-M-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

Please Add Discussions Tab

It would be very nice to have a centralized place (a GitHub Discussions tab for this repo) for discussions about getting the code up and running, instead of discussions being divided among random subreddits and Discord servers.

Unreadable notebook

When I tried to open the notebook (via jupyter notebook), I got the following error message:

Error loading notebook
Unreadable Notebook: /home/ogem/codes/public/2023/IF/notebooks/pipes-DeepFloyd-IF.ipynb NotJSONError("Notebook does not appear to be JSON: 'version https://git-lfs.github.com/spec...")

Is there a json syntax error? Or maybe there is another way to open and use the notebook?

Faster sampling by DPM-Solver++

Congrats! Super great work!

I've noticed that you're currently using the original DDPM scheduler, which is rather slow. It would be much faster if we could apply DPM-Solver++ to this work to accelerate sampling.

Note that the original DPM-Solver++ may have numerical issues when using the cosine beta schedule, and I've added a fix here: https://github.com/LuChengTHU/dpm-solver/blob/5c6ee9f1e6b60c8c54f955fbaab0a6717fc2b75b/dpm_solver_pytorch.py#L105

I'm happy to help to integrate DPM-Solver++ into IF when the model is released :)
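For anyone who wants to experiment before an official integration lands, diffusers already ships a DPM-Solver++ multistep scheduler that can be swapped in from the pipeline's own scheduler config. This is a hedged sketch; whether it copes well with IF's cosine schedule and dynamic thresholding is exactly the numerical issue mentioned above.

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
# Swap the default DDPM scheduler for DPM-Solver++ built from the same config.
stage_1.scheduler = DPMSolverMultistepScheduler.from_config(stage_1.scheduler.config)
stage_1.enable_model_cpu_offload()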

Only works with the demo's picture; if I use my own picture, it raises an AssertionError

AssertionError Traceback (most recent call last)
Cell In[24], line 4
1 count = 4
2 prompt = 'a boy'
----> 4 result = style_transfer(
5 t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
6 support_pil_img=zkc,
7 prompt=[prompt]*count,
8 style_prompt=[
9 f'in style lego',
10 f'in style zombie',
11 f'in style origami',
12 f'in style anime',
13 ],
14 seed=42,
15 if_I_kwargs={
16 "guidance_scale": 10.0,
17 "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
18 'support_noise_less_qsample_steps': 5,
19 'positive_mixer': 0.8,
20 },
21 if_II_kwargs={
22 "guidance_scale": 4.0,
23 "sample_timestep_respacing": 'smart50',
24 "support_noise_less_qsample_steps": 5,
25 'positive_mixer': 1.0,
26 },
27 )
28 if_I.show(result['III'], 2, 14)

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/pipelines/style_transfer.py:91, in style_transfer(t5, if_I, if_II, if_III, support_pil_img, style_prompt, prompt, negative_prompt, seed, if_I_kwargs, if_II_kwargs, if_III_kwargs, progress, return_tensors, disable_watermark)
87 if_II_kwargs['progress'] = progress
89 if_II_kwargs['support_noise'] = mid_res
---> 91 stageII_generations, _meta = if_II.embeddings_to_image(**if_II_kwargs)
92 pil_images_II = if_II.to_images(stageII_generations, disable_watermark=disable_watermark)
94 result['II'] = pil_images_II

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/stage_II.py:26, in IFStageII.embeddings_to_image(self, low_res, t5_embs, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, aug_level, dynamic_thresholding_p, dynamic_thresholding_c, sample_loop, sample_timestep_respacing, guidance_scale, img_scale, positive_mixer, progress, seed, sample_fn, **kwargs)
21 def embeddings_to_image(
22 self, low_res, t5_embs, style_t5_embs=None, positive_t5_embs=None, negative_t5_embs=None, batch_repeat=1,
23 aug_level=0.25, dynamic_thresholding_p=0.95, dynamic_thresholding_c=1.0, sample_loop='ddpm',
24 sample_timestep_respacing='smart50', guidance_scale=4.0, img_scale=4.0, positive_mixer=0.5,
25 progress=True, seed=None, sample_fn=None, **kwargs):
---> 26 return super().embeddings_to_image(
27 t5_embs=t5_embs,
28 low_res=low_res,
29 style_t5_embs=style_t5_embs,
30 positive_t5_embs=positive_t5_embs,
31 negative_t5_embs=negative_t5_embs,
32 batch_repeat=batch_repeat,
33 aug_level=aug_level,
34 dynamic_thresholding_p=dynamic_thresholding_p,
35 dynamic_thresholding_c=dynamic_thresholding_c,
36 sample_loop=sample_loop,
37 sample_timestep_respacing=sample_timestep_respacing,
38 guidance_scale=guidance_scale,
39 positive_mixer=positive_mixer,
40 img_size=256,
41 img_scale=img_scale,
42 progress=progress,
43 seed=seed,
44 sample_fn=sample_fn,
45 **kwargs
46 )

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/base.py:181, in IFBaseModule.embeddings_to_image(self, t5_embs, low_res, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, dynamic_thresholding_p, sample_loop, sample_timestep_respacing, dynamic_thresholding_c, guidance_scale, aug_level, positive_mixer, blur_sigma, img_size, img_scale, aspect_ratio, progress, seed, sample_fn, support_noise, support_noise_less_qsample_steps, inpainting_mask, **kwargs)
179 else:
180 assert support_noise_less_qsample_steps < len(diffusion.timestep_map) - 1
--> 181 assert support_noise.shape == (1, 3, image_h, image_w)
182 q_sample_steps = torch.tensor([int(len(diffusion.timestep_map) - 1 - support_noise_less_qsample_steps)])
183 support_noise = support_noise.cpu()

CUDA out of memory.

However, I am using a station with 4 x A100 (40GB):

if_I = IFStageI('/IF/deepfloyd-if/IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('/IF/deepfloyd-if/IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('/IF/deepfloyd-if/stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device="cuda:3")

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 39.39 GiB total capacity; 29.37 GiB already allocated; 6.90 GiB free; 30.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
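Two things worth trying, both hedged suggestions rather than confirmed fixes: the FORCE_MEM_EFFICIENT_ATTN environment variable used in the cuBLAS issue further down (set before the deepfloyd_if imports), and the max_split_size_mb hint from the error message itself.

import os

# Must be set before importing deepfloyd_if so its attention path picks it up.
os.environ['FORCE_MEM_EFFICIENT_ATTN'] = "1"
# Fragmentation hint suggested by the OOM message above.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:512"

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder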

Issue with inpainting

Hi!
I've tried to launch the inpainting example from the internal notebook and got an error.
----> 1 result = inpainting(
2 t5=t5, if_I=if_I,
3 if_II=if_II,
4 if_III=if_III,
5 support_pil_img=raw_pil_image.resize((128, 128), resample=Image.BICUBIC),
6 inpainting_mask=inpainting_mask,
7 prompt=[
8 'blue sunglasses',
9 ],
10 seed=42,
11 if_I_kwargs={
12 "guidance_scale": 7.0,
13 "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
14 'support_noise_less_qsample_steps': 0,
15 },
16 if_II_kwargs={
17 "guidance_scale": 4.0,
18 'aug_level': 0.0,
19 "sample_timestep_respacing": '100',
20 },
21 )
22 if_I.show(result['I'], 2, 3)
23 if_I.show(result['II'], 2, 6)

File ~/miniconda3/envs/df/lib/python3.8/site-packages/deepfloyd_if/pipelines/inpainting.py:61, in inpainting(t5, if_I, if_II, if_III, support_pil_img, prompt, inpainting_mask, negative_prompt, seed, if_I_kwargs, if_II_kwargs, if_III_kwargs, progress, return_tensors, disable_watermark)
57 if_I_kwargs['negative_t5_embs'] = negative_t5_embs
59 if_I_kwargs['support_noise'] = low_res
---> 61 inpainting_mask_I = img_as_bool(resize(inpainting_mask[0].cpu(), (3, image_h, image_w)))
62 inpainting_mask_I = torch.from_numpy(inpainting_mask_I).unsqueeze(0).to(if_I.device)
64 if_I_kwargs['inpainting_mask'] = inpainting_mask_I

File ~/miniconda3/envs/df/lib/python3.8/site-packages/skimage/transform/_warps.py:154, in resize(image, output_shape, order, mode, cval, clip, preserve_range, anti_aliasing, anti_aliasing_sigma)
149 image = image.astype(np.float32)
151 if anti_aliasing is None:
152 anti_aliasing = (
153 not input_type == bool and
--> 154 not (np.issubdtype(input_type, np.integer) and order == 0) and
155 any(x < y for x, y in zip(output_shape, input_shape)))
157 if input_type == bool and anti_aliasing:
158 raise ValueError("anti_aliasing must be False for boolean images")

File ~/miniconda3/envs/df/lib/python3.8/site-packages/numpy/core/numerictypes.py:416, in issubdtype(arg1, arg2)
358 r"""
359 Returns True if first argument is a typecode lower/equal in type hierarchy.
360
(...)
413
414 """
415 if not issubclass_(arg1, generic):
--> 416 arg1 = dtype(arg1).type
417 if not issubclass_(arg2, generic):
418 arg2 = dtype(arg2).type

TypeError: Cannot interpret 'torch.float32' as a data type

libs: [screenshot of installed library versions]

I assume something is wrong with scikit-image, but I'm not sure what.
Please assist.
Thanks!

Repository description

Please consider filling in repository details here on GitHub including topics.

The top right ⚙️ icon.


Fine-tune

How can we fine-tune it on a single subject with some 10-15 photos and instance/class prompts?

Cannot load "stable-diffusion-x4-upscaler"

error info:

from deepfloyd_if.modules.t5 import T5Embedder
device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\file_download.py:1104: FutureWarning: The force_filename parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
warnings.warn(
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\deepfloyd_if\modules\stage_III_sd_x4.py", line 34, in __init__
    self.model = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch_dtype, token=self.hf_token)
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 884, in from_pretrained
    cached_folder = cls.download(
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1208, in download
    config_file = hf_hub_download(
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\utils\_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\utils\_validators.py", line 166, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'stabilityai\stable-diffusion-x4-upscaler'.

Running the txt2img script returns all sorts of errors

Manjaro Linux, RTX 4090, AMD CPU.
I created a deepfloyd env with python=3.10, activated it, and ran:
pip install -U huggingface_hub diffusers transformers safetensors sentencepiece accelerate bitsandbytes torch
I started Python, logged in with the Hugging Face token, created the script file, and ran it. I got these errors:

Can someone just point me in the right direction?

2023-04-29 17:11:30.330731: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-29 17:11:30.466991: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Traceback (most recent call last):
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1146, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 22, in <module>
    from ...image_transforms import (
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/image_transforms.py", line 48, in <module>
    import tensorflow as tf
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/python/eager/context.py", line 27, in <module>
    import six
ModuleNotFoundError: No module named 'six'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vhey/deepfloyd/txt2img.py", line 1, in <module>
    from diffusers import DiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/__init__.py", line 58, in <module>
    from .pipelines import (
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/__init__.py", line 45, in <module>
    from .alt_diffusion import AltDiffusionImg2ImgPipeline, AltDiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/alt_diffusion/__init__.py", line 32, in <module>
    from .pipeline_alt_diffusion import AltDiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py", line 20, in <module>
    from transformers import CLIPImageProcessor, XLMRobertaTokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1137, in __getattr__
    value = getattr(module, name)
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1136, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1148, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
No module named 'six'

Clarification on license reference to removing content filters?

I'm wondering if this section of the license is supposed to be included. It appears to say that any removal of the content filters is not allowed under any circumstances. If that is the case, it's only going to trigger conflict with the community immediately after the release of the weights.

2. All persons obtaining a copy or substantial portion of the Software,
a modified version of the Software (or substantial portion thereof), or
a derivative work based upon this Software (or substantial portion thereof)
must not delete, remove, disable, diminish, or circumvent any inference filters or
inference filter mechanisms in the Software, or any portion of the Software that
implements any such filters or filter mechanisms.

https://github.com/deep-floyd/IF/blob/af64403da0ae2667e5d40670f4014de04bd5c523/LICENSE

Commands

Is there a list of commands somewhere?

x4-upscaler: the deepfloyd-if Python module has problems with Windows paths

On Windows, when running the notebook for the IF-I-XL-v1.0 model, the following error occurs when trying to download the stable-diffusion-x4-upscaler:
HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'stabilityai\stable-diffusion-x4-upscaler'.

A quick fix would be to change line 23 in the file [your-venv-name]\Lib\site-packages\deepfloyd_if\modules to model_id = 'stabilityai/' + self.dir_or_name
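For illustration, the failure and the fix can be reproduced directly with the validator named in the traceback from the earlier "Cannot load" issue (a hedged sketch):

from huggingface_hub.utils import validate_repo_id

# The backslash-built id from the error fails Hub validation...
try:
    validate_repo_id('stabilityai\\stable-diffusion-x4-upscaler')
except Exception as e:
    print(type(e).__name__, e)

# ...while building the id with a forward slash, as in the fix above, validates fine.
validate_repo_id('stabilityai/' + 'stable-diffusion-x4-upscaler')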

VRAM requirements

The readme lists a minimum of 16GB of VRAM without the stable x4 upscaler and 24GB with it. However, you can run it with the stable x4 upscaler on as little as 6GB of VRAM by using sequential offload on the first stage/text encoder (in fp16) and CPU offload on the second and third stages. You can also run all three stages using CPU offload on 16GB (maybe less). You do need sufficient DRAM, though.

import torch
from diffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline

stage_1 = IFPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_2 = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0",
    text_encoder=None,
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)

# ~16 GB of VRAM:
stage_1.enable_model_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

# ~6 GB of VRAM (use sequential offload for stage 1 instead):
stage_1.enable_sequential_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

I tested this on PyTorch 2.0.0+cu118 with torch.cuda.set_per_process_memory_fraction() to limit the amount of VRAM torch can use.
The sequential offload significantly slows down the first stage, but that's better than not being able to run it at all.

Kernel crash on loading model in Ubuntu 22.04

Hey, I'm trying to load the model onto a GPU with 24GB of VRAM.

This is my code:
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()
stage_1.enable_model_cpu_offload()

The kernel crashes while loading the model into memory. I tried loading via deepfloyd_if as well; it also crashes while running the following code.
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")

This is the error shown in the notebook:
Canceled future for execute_request message before replies were done. The Kernel crashed while executing code in the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

I tracked memory usage and it never passes the 14GB mark. How do I resolve this?

Finetune

Will fine-tuning code be released as well? Awesome project, btw! Can't wait to train a custom model.

Installation instructions are not working on Windows (11)

I tried to follow the instructions, and it ended in a total disaster. Each pip package wants to install its own torch version, and I couldn't get anything to work. I followed the instructions 1:1 multiple times in a few different fresh envs, to no avail.

I also tried with a fresh new PT2 venv, also to no avail.

Could you please re-test your instructions, preferably on Windows? I have an RTX 4090 with 24GB of VRAM, and I couldn't even get to the loading-into-VRAM part.

Flan-T5

Can we use FLAN-T5 as a language model?
Those FLAN models can represent English and other languages significantly better in our tests.
"If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages."

Not Implemented Error: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined

After going through the README instructions, I tried the following test script just to get started. However, I am consistently receiving an error: NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined. (Full traceback shared after the test code section.)

Test code:

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

Error traceback:

Traceback (most recent call last):
  File "test2.py", line 8, in <module>00%|████████████████████████████████████████████████████████████████████████| 8.61G/8.61G [1:20:50<00:00, 2.70MB/s]
    stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1448, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1474, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1464, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 227, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 220, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 161, in set_use_memory_efficient_attention_xformers
    raise NotImplementedError(
NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined.
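Until the attention processors support this combination, one workaround consistent with the comments already in the script is to skip xformers on torch >= 2.0 and rely on PyTorch's built-in memory-efficient attention; newer diffusers releases may also handle xformers for these added-KV attention blocks, so upgrading diffusers is the other thing to try. Guarding the calls like this is a hedged sketch, not an upstream fix:

import torch

# Only enable xformers on torch < 2.0; torch 2.x already uses memory-efficient SDPA.
if int(torch.__version__.split(".")[0]) < 2:
    stage_1.enable_xformers_memory_efficient_attention()
    stage_2.enable_xformers_memory_efficient_attention()
    stage_3.enable_xformers_memory_efficient_attention()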

How to get output of Zero Shot Image To Image to match input image size?

How can I ensure the output image size of image-to-image matches the input? Going on the example colab code, I use this:

original_image = Image.open("input.png")

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder", 
    device_map="auto", 
    load_in_8bit=True, 
    variant="8bit"
)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", 
    text_encoder=text_encoder, 
    unet=None, 
    device_map="auto"
)

prompt = "anime style"

prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", 
    text_encoder=None, 
    variant="fp16", 
    torch_dtype=torch.float16, 
    device_map="auto"
)

generator = torch.Generator().manual_seed(0)

image = pipe(
    image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds, 
    output_type="pt",
    generator=generator,
).images

pil_image = pt_to_pil(image)
pil_image[0].save("output.png")

pipe = IFImg2ImgSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", 
    text_encoder=None, 
    variant="fp16", 
    torch_dtype=torch.float16, 
    device_map="auto"
)

image = pipe(
    image=image,
    original_image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds, 
    generator=generator,
).images

image[0].save("output.png")

Which works, but the output size is always smaller than the input image.
What am I missing?

This is the output for a 550x550 input image:
[image: output]

If possible, please give full code examples too. You have a good initial code snippet on the readme for Text to Image, but then the rest of the examples are incomplete. The same sort of full code examples would be very helpful.
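As far as I can tell, the IF pipelines work at fixed resolutions (stage I at 64x64, stage II at 256x256, the x4 upscaler at 1024x1024), so the output will never track an arbitrary 550x550 input by itself. A hedged workaround is to either run the x4 upscaler as a third stage and downscale, or simply resize the final PIL image back to the original size:

from PIL import Image

# Sketch: resize the 256x256 stage-II result back to the input image's size.
final = image[0]  # PIL image returned by the super-resolution pipeline above
final = final.resize(original_image.size, resample=Image.LANCZOS)
final.save("output.png")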

cuBLAS issue.

I have freshly installed CUDA toolkit 11.8 on both the host, and inside a docker container. Within the container I run "jupyter notebook"

Previously I got the same error with CUDA 11.3

My understanding is that cuBLAS is part of the CUDA toolkit, and therefore should be available.

import os
import torch
os.environ['FORCE_MEM_EFFICIENT_ATTN'] = "1"
import sys
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import dream, style_transfer, super_resolution, inpainting
import torch.nn.functional as F
import random
import torchvision.transforms as T
import numpy as np
import requests
from PIL import Image
import torch
import re
print("Loaded modules")

if_I = IFStageI('IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device='cuda:3')

prompt = 'lush garden'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
)
if_I.show(result['I'], size=3)
if_I.show(result['II'], size=6)
if_I.show(result['III'], size=14)

166 return module._hf_hook.post_forward(module, output)

File ~/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py:530, in T5Attention.forward(self, hidden_states, mask, key_value_states, position_bias, past_key_value, layer_head_mask, query_length, use_cache, output_attentions)
525 value_states = project(
526 hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
527 )
529 # compute scores
--> 530 scores = torch.matmul(
531 query_states, key_states.transpose(3, 2)
532 ) # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9
534 if position_bias is None:
535 if not self.has_relative_attention_bias:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
