tothebeginning / pulid
Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
License: Apache License 2.0
I try to load LoRAs with
pipeline.pipe.load_lora_weights("/kaggle/input/lorass/acuarelac1400.safetensors")
I don't know if this is the correct way; it would be helpful if you told me how to load LoRAs.
But I get:
---------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[175], line 23
21 seed_everything(seed)
22 #out=pipeline.inference(prompt, init_image, mask_image , 0.8, (1, H, W), neg_prompt, id_embeddings, id_scale, scale, steps )
---> 23 out=pipeline.pipe(prompt=prompt,
24 image=init_image,
25 mask_image=mask_image,
26 strength=0.8,
27 negative_prompt=neg_prompt,
28 num_images_per_prompt=1,
29 height=H,
30 width=W,
31 num_inference_steps=steps,
32 guidance_scale= scale,
33 cross_attention_kwargs={ 'id_embedding': id_embeddings, 'id_scale': id_scale},)
35 out[0]
File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py:1707, in StableDiffusionXLInpaintPipeline.__call__(self, prompt, prompt_2, image, mask_image, masked_image_latents, height, width, strength, num_inference_steps, timesteps, denoising_start, denoising_end, guidance_scale, negative_prompt, negative_prompt_2, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, ip_adapter_image, output_type, return_dict, cross_attention_kwargs, guidance_rescale, original_size, crops_coords_top_left, target_size, negative_original_size, negative_crops_coords_top_left, negative_target_size, aesthetic_score, negative_aesthetic_score, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, **kwargs)
1705 if ip_adapter_image is not None:
1706 added_cond_kwargs["image_embeds"] = image_embeds
-> 1707 noise_pred = self.unet(
1708 latent_model_input,
1709 t,
1710 encoder_hidden_states=prompt_embeds,
1711 timestep_cond=timestep_cond,
1712 cross_attention_kwargs=self.cross_attention_kwargs,
1713 added_cond_kwargs=added_cond_kwargs,
1714 return_dict=False,
1715 )[0]
1717 # perform guidance
1718 if self.do_classifier_free_guidance:
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:1112, in UNet2DConditionModel.forward(self, sample, timestep, encoder_hidden_states, class_labels, timestep_cond, attention_mask, cross_attention_kwargs, added_cond_kwargs, down_block_additional_residuals, mid_block_additional_residual, down_intrablock_additional_residuals, encoder_attention_mask, return_dict)
1109 if is_adapter and len(down_intrablock_additional_residuals) > 0:
1110 additional_residuals["additional_residuals"] = down_intrablock_additional_residuals.pop(0)
-> 1112 sample, res_samples = downsample_block(
1113 hidden_states=sample,
1114 temb=emb,
1115 encoder_hidden_states=encoder_hidden_states,
1116 attention_mask=attention_mask,
1117 cross_attention_kwargs=cross_attention_kwargs,
1118 encoder_attention_mask=encoder_attention_mask,
1119 **additional_residuals,
1120 )
1121 else:
1122 sample, res_samples = downsample_block(hidden_states=sample, temb=emb, scale=lora_scale)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:1160, in CrossAttnDownBlock2D.forward(self, hidden_states, temb, encoder_hidden_states, attention_mask, cross_attention_kwargs, encoder_attention_mask, additional_residuals)
1158 else:
1159 hidden_states = resnet(hidden_states, temb, scale=lora_scale)
-> 1160 hidden_states = attn(
1161 hidden_states,
1162 encoder_hidden_states=encoder_hidden_states,
1163 cross_attention_kwargs=cross_attention_kwargs,
1164 attention_mask=attention_mask,
1165 encoder_attention_mask=encoder_attention_mask,
1166 return_dict=False,
1167 )[0]
1169 # apply additional residuals to the output of the last pair of resnet and attention blocks
1170 if i == len(blocks) - 1 and additional_residuals is not None:
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/diffusers/models/transformer_2d.py:392, in Transformer2DModel.forward(self, hidden_states, encoder_hidden_states, timestep, added_cond_kwargs, class_labels, cross_attention_kwargs, attention_mask, encoder_attention_mask, return_dict)
380 hidden_states = torch.utils.checkpoint.checkpoint(
381 create_custom_forward(block),
382 hidden_states,
(...)
389 **ckpt_kwargs,
390 )
391 else:
--> 392 hidden_states = block(
393 hidden_states,
394 attention_mask=attention_mask,
395 encoder_hidden_states=encoder_hidden_states,
396 encoder_attention_mask=encoder_attention_mask,
397 timestep=timestep,
398 cross_attention_kwargs=cross_attention_kwargs,
399 class_labels=class_labels,
400 )
402 # 3. Output
403 if self.is_input_continuous:
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/diffusers/models/attention.py:366, in BasicTransformerBlock.forward(self, hidden_states, attention_mask, encoder_hidden_states, encoder_attention_mask, timestep, cross_attention_kwargs, class_labels, added_cond_kwargs)
363 if self.pos_embed is not None and self.use_ada_layer_norm_single is False:
364 norm_hidden_states = self.pos_embed(norm_hidden_states)
--> 366 attn_output = self.attn2(
367 norm_hidden_states,
368 encoder_hidden_states=encoder_hidden_states,
369 attention_mask=encoder_attention_mask,
370 **cross_attention_kwargs,
371 )
372 hidden_states = attn_output + hidden_states
374 # 4. Feed-forward
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/diffusers/models/attention_processor.py:527, in Attention.forward(self, hidden_states, encoder_hidden_states, attention_mask, **cross_attention_kwargs)
508 r"""
509 The forward method of the `Attention` class.
510
(...)
522 `torch.Tensor`: The output of the attention layer.
523 """
524 # The `Attention` class can call different attention processors / attention functions
525 # here we simply pass along all tensors to the selected processor class
526 # For standard processors that are defined here, `**cross_attention_kwargs` is empty
--> 527 return self.processor(
528 self,
529 hidden_states,
530 encoder_hidden_states=encoder_hidden_states,
531 attention_mask=attention_mask,
532 **cross_attention_kwargs,
533 )
File /kaggle/working/PuLID/pulid/attention_processor.py:365, in IDAttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, id_embedding, id_scale)
359 else:
360 zero_tensor = torch.zeros(
361 (id_embedding.size(0), NUM_ZERO, id_embedding.size(-1)),
362 dtype=id_embedding.dtype,
363 device=id_embedding.device,
364 )
--> 365 id_key = self.id_to_k(torch.cat((id_embedding, zero_tensor), dim=1)).to(query.dtype)
366 id_value = self.id_to_v(torch.cat((id_embedding, zero_tensor), dim=1)).to(query.dtype)
368 id_key = id_key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
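For what it's worth, this particular "expected scalar type Float but found Half" usually means some UNet weights ended up in float32 after the LoRA was loaded while the id_embedding is still fp16. A minimal sketch of a workaround (my own suggestion, not an official PuLID recipe) is to load the LoRA first and then cast the UNet back to fp16, so PuLID's id_to_k / id_to_v projections see a consistent dtype:

import torch

pipeline.pipe.load_lora_weights("/kaggle/input/lorass/acuarelac1400.safetensors")
pipeline.pipe.fuse_lora()  # optional: bake the LoRA into the base weights (standard diffusers API)
pipeline.pipe.unet.to(device="cuda", dtype=torch.float16)  # re-align dtypes after loading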
Here is the error I got on Windows after using PowerShell to create a venv, following the instructions, installing requirements.txt, and running python app.py:
C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
Please 'pip install xformers'
Please 'pip install apex'
Please 'pip install xformers'
C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\diffusers\configuration_utils.py:244: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a model, please use <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'>.load_config(...) followed by <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'>.from_config(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Traceback (most recent call last):
File "C:\users\newpc\downloads\pullid\pulid\app.py", line 11, in <module>
pipeline = PuLIDPipeline()
File "C:\users\newpc\downloads\pullid\pulid\pulid\pipeline.py", line 42, in __init__
unet = UNet2DConditionModel.from_config(sdxl_base_repo, subfolder='unet').to(self.device, torch.float16)
File "C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
return self._apply(convert)
File "C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
(venv) PS C:\users\newpc\downloads\pullid\pulid>
I was able to resolve with:
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
Leaving this note just in case anyone else runs into this issue. It's downloading now... but it still mentions the diffusers.models.unet_2d_condition.UNet2DConditionModel message.
C:\users\newpc\downloads\pullid\pulid\venv\lib\site-packages\diffusers\configuration_utils.py:244: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a model, please use <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'>.load_config(...) followed by <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'>.from_config(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
sdxl_lightning_4step_unet.safetensors: 8%|██▋ | 388M/5.14G [00:31<06:11, 12.8MB/s]
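If anyone wants to confirm the CUDA build actually took effect after the reinstall above, a quick sanity check (my own addition, not from the repo):

import torch
print(torch.__version__)           # should report 2.0.1+cu118 after the pip command above
print(torch.cuda.is_available())   # should be True before running python app.py again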
Hello,
Nice project, but since you are using antelopev2 as the face recognition model, it's important to note that your entire project is currently restricted to pure research purposes and cannot be used in commercial environments. To reflect this restriction, I recommend placing a similar notice at the bottom of your page, akin to FaceID's notice found here: https://huggingface.co/h94/IP-Adapter-FaceID
For instance:
'Non-commercial Use: As InsightFace pretrained models are available solely for non-commercial research purposes, IP-Adapter-FaceID models are released exclusively for research purposes and are not intended for commercial use.'
While this restriction is in place, have you considered exploring alternative face recognition models that may not carry such limitations?
Thank you!
Thank you for the open-source code; it has been great for learning. I have a question: I found that the consistency of faces generated at close range (when the characters are close to the camera) is very high, but at long range (when the characters are far from the camera), the faces are prone to collapse.
Will the training scripts be released at any point?
What's the ideal size for the input image? How much of the surrounding environment should it include?
Very good face consistency, but it needs ComfyUI support. Excellent work!
Please add a Hugging Face demo.
I have been testing PuLID in A1111 following this guide
Mikubill/sd-webui-controlnet#2838
But the results are very different from your demo; they are very low quality and low fidelity.
I tested with the same Lightning model, same seed, same steps, and the same ControlNet strength.
As you can see in the second generation, the image is very soft; it looks more like a painted picture. What could cause this issue?
I'm getting this same texture with every model and every sampling method and scheduler I tried, so it must be something in the implementation.
Including the train-from-scratch (tfs) and fine-tuning (ft) scripts, as well as the training dataset? Looking forward to hearing about your plans.
Your efforts and contributions to the open-source community are greatly appreciated.
I notice you use a for loop when num_sample > 1. I tried setting num_images_per_prompt > 1, e.g. 2 or 4, but the results are bad: ID similarity is greatly weakened.
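In case it helps, here is a rough sketch of the looping approach I mean (assuming the seed_everything helper and the inference call signature used in the demo app.py; adjust to your own setup), rather than passing num_images_per_prompt > 1:

images = []
for i in range(4):  # generate 4 samples, each with a different seed
    seed_everything(seed + i)
    img = pipeline.inference(prompt, (1, H, W), neg_prompt, id_embeddings, id_scale, scale, steps)[0]
    images.append(img)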
Using PuLID in combination with FreeU_V2 gives garbled faces with white/pink blotches.
Edit: Problem wasn't FreeU but AYS - the default 10 steps seem too low so cranking up the step count solved the issue.
I got a prompt saying please 'pip install apex' on the console, but PuLID runs fine in ComfyUI without it. I then installed Apex (I'm on Windows) via python setup.py install after git cloning it. The previous warning about installing Apex is gone, but PuLID can no longer run, and I get this error on my ComfyUI console: [No module named 'fused_layer_norm_cuda']. PyTorch is 2.1.2 + CUDA 11.8.
I rolled back by uninstalling Apex, and PuLID works again. I'm just wondering whether a proper Apex install actually speeds up the process. If yes, which version should I install, and how? I'm not that tech-savvy, just an average SD user. Please help... Thanks in advance.
What's the relationship between the Conventional Diffusion branch and the Lightning T2I branch? Do they share the same UNet weights but with different sampling algorithms?
Hello, when you calculated the layout loss and layout-sem loss, which cross-attention layers are the QKV features from? Do you use the features from all the cross-attention layers? I'm looking forward to your reply, thanks!
Thanks for your work, the results are excellent!
I'm doing research related to this right now; could you please upload the stage 2 (maximum ID similarity) model?
Can anyone help me?
(pulid1) D:\PuLID>python app.py
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
File "C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\xformers_init.py", line 55, in _is_triton_available
from xformers.triton.softmax import softmax as triton_softmax # noqa
File "C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\xformers\triton\softmax.py", line 11, in
import triton
ModuleNotFoundError: No module named 'triton'
Please 'pip install apex'
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\diffusers\configuration_utils.py:245: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a model, please use <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>.load_config(...) followed by <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>.from_config(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:00<00:00, 11.20it/s]
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
C:\Users\admin\anaconda3\envs\pulid1\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
INFO:root:Loaded EVA02-CLIP-L-14-336 model config.
INFO:root:Shape of rope freq: torch.Size([576, 64])
INFO:root:Loading pretrained EVA02-CLIP-L-14-336 weights (eva_clip).
INFO:root:incompatible_keys.missing_keys: ['visual.rope.freqs_cos', 'visual.rope.freqs_sin', 'visual.blocks.0.attn.rope.freqs_cos', 'visual.blocks.0.attn.rope.freqs_sin', 'visual.blocks.1.attn.rope.freqs_cos', 'visual.blocks.1.attn.rope.freqs_sin', 'visual.blocks.2.attn.rope.freqs_cos', 'visual.blocks.2.attn.rope.freqs_sin', 'visual.blocks.3.attn.rope.freqs_cos', 'visual.blocks.3.attn.rope.freqs_sin', 'visual.blocks.4.attn.rope.freqs_cos', 'visual.blocks.4.attn.rope.freqs_sin', 'visual.blocks.5.attn.rope.freqs_cos', 'visual.blocks.5.attn.rope.freqs_sin', 'visual.blocks.6.attn.rope.freqs_cos', 'visual.blocks.6.attn.rope.freqs_sin', 'visual.blocks.7.attn.rope.freqs_cos', 'visual.blocks.7.attn.rope.freqs_sin', 'visual.blocks.8.attn.rope.freqs_cos', 'visual.blocks.8.attn.rope.freqs_sin', 'visual.blocks.9.attn.rope.freqs_cos', 'visual.blocks.9.attn.rope.freqs_sin', 'visual.blocks.10.attn.rope.freqs_cos', 'visual.blocks.10.attn.rope.freqs_sin', 'visual.blocks.11.attn.rope.freqs_cos', 'visual.blocks.11.attn.rope.freqs_sin', 'visual.blocks.12.attn.rope.freqs_cos', 'visual.blocks.12.attn.rope.freqs_sin', 'visual.blocks.13.attn.rope.freqs_cos', 'visual.blocks.13.attn.rope.freqs_sin', 'visual.blocks.14.attn.rope.freqs_cos', 'visual.blocks.14.attn.rope.freqs_sin', 'visual.blocks.15.attn.rope.freqs_cos', 'visual.blocks.15.attn.rope.freqs_sin', 'visual.blocks.16.attn.rope.freqs_cos', 'visual.blocks.16.attn.rope.freqs_sin', 'visual.blocks.17.attn.rope.freqs_cos', 'visual.blocks.17.attn.rope.freqs_sin', 'visual.blocks.18.attn.rope.freqs_cos', 'visual.blocks.18.attn.rope.freqs_sin', 'visual.blocks.19.attn.rope.freqs_cos', 'visual.blocks.19.attn.rope.freqs_sin', 'visual.blocks.20.attn.rope.freqs_cos', 'visual.blocks.20.attn.rope.freqs_sin', 'visual.blocks.21.attn.rope.freqs_cos', 'visual.blocks.21.attn.rope.freqs_sin', 'visual.blocks.22.attn.rope.freqs_cos', 'visual.blocks.22.attn.rope.freqs_sin', 'visual.blocks.23.attn.rope.freqs_cos', 'visual.blocks.23.attn.rope.freqs_sin']
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 2575.30it/s]
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
find model: .\models\antelopev2\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
find model: .\models\antelopev2\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
find model: .\models\antelopev2\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
find model: .\models\antelopev2\glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
find model: .\models\antelopev2\scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'device_id': '0', 'has_user_compute_stream': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'user_compute_stream': '0', 'gpu_external_alloc': '0', 'gpu_mem_limit': '18446744073709551615', 'enable_cuda_graph': '0', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0', 'prefer_nhwc': '0', 'use_ep_level_unified_stream': '0', 'use_tf32': '1'}, 'CPUExecutionProvider': {}}
loading from id_adapter
loading from id_adapter_attn_layers
Running on local URL: http://0.0.0.0:7865
INFO:httpx:HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
INFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:7865/startup-events "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: HEAD http://localhost:7865/ "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: HEAD http://localhost:7865/ "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: HEAD http://localhost:7865/ "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: HEAD http://localhost:7865/ "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: HEAD http://localhost:7865/ "HTTP/1.1 502 Bad Gateway"
INFO:httpx:HTTP Request: GET https://api.gradio.app/v2/tunnel-request "HTTP/1.1 200 OK"
Running on public URL: https://b1de752db43ff88cfe.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy
from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Hello, I am discovering your service.
Regarding the photos that we send: are they stored on a server, or is everything done locally with the GPU of the phone or computer?
This is my error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 12.00 GiB total capacity; 10.89 GiB already allocated; 0 bytes free; 11.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Platform: Win10 / 3060 12 GB GPU
When I change Num samples to 1, there's an error too. What should I change in the code? Thanks!
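Not an official fix, but a few standard memory-saving switches worth trying on a 12 GB card (accessing them through pipeline.pipe is my assumption about the wrapper):

# 1) follow the hint in the error message and cap the allocator's split size before launching,
#    e.g. in PowerShell:  $env:PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
# 2) offload idle submodules to CPU and slice the VAE decode (standard diffusers options)
pipeline.pipe.enable_model_cpu_offload()
pipeline.pipe.enable_vae_slicing()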
I wanted to load a custom Stable Diffusion model, and also custom LoRA weights, instead of loading SDXL Lightning.
Is there a method, or a planned method, to load a custom pipeline and/or LoRA weights into the PuLIDPipeline class?
I did read issue #7 and implemented a custom model inside pipeline.py,
but it would be preferable to load it from your own script, so the model can be changed easily.
Hey all, great project here. Will this work with OpenPose? Thanks!
Hello, thanks for your incredible work!
In the 'Accurate ID Loss' section in the bottom right corner of Figure 2 of the paper, there are two generated images, both denoted as 'predict x_0'. Are both of these images produced by the Lightning T2I? I guess they represent T2I w/ ID and T2I w/o ID, respectively. However, upon closer inspection, it appears that the IDs of both images are well-preserved, which contradicts my speculation. What do these two images actually mean, and why do you connect them with a vertical line?
Hello, have you considered making a new version on top of playgroundai 2.5?
I understand you are using ArcFace, but only to calculate the loss during training, so I am not sure what the license of the released checkpoints/model weights is?
requirements.txt
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu
app.py
AssertionError: Torch not compiled with CUDA enabled
We don't want a web UI or ComfyUI face changer with all the cumbersome dependencies that need to be installed; we just want a Rope-style tool that lets users upload pictures and swap faces.
Hi, I'm a developer working on SD and its related pipelines. PuLID is a great tool for maintaining fidelity when generating new images. However, I find the code difficult to integrate with the Diffusers pipeline.
I would appreciate it if you could make it part of the Diffusers library.
As the title says, I cannot find any details about these inference tricks in the paper. In particular, you use different settings for the fidelity and extreme-style modes.
Here is my understanding; I am not sure whether it is correct.
(1) For NUM_ZERO, you add some zero tokens so that the query can discard ID information (maybe to keep the background uncontaminated? But it is done in an implicit manner.)
(2) For ORTHO or ORTHO_v2, you calculate the projection of id_hidden_states onto hidden_states, then take orthogonal = id_hidden_states - projection
to obtain more disentangled ID information. Is this an experimental finding?
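For reference, this is how I read the ORTHO branch, written out as a small sketch (based on my skim of pulid/attention_processor.py; the exact reduction dimension there may differ, so treat this as an assumption):

import torch

def ortho(id_hidden_states: torch.Tensor, hidden_states: torch.Tensor) -> torch.Tensor:
    # project the ID stream onto the main hidden states ...
    projection = (
        torch.sum(id_hidden_states * hidden_states, dim=-1, keepdim=True)
        / (torch.sum(hidden_states * hidden_states, dim=-1, keepdim=True) + 1e-6)
    ) * hidden_states
    # ... and keep only the component of the ID information orthogonal to them
    return id_hidden_states - projection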
I took the latest update Mikubill/sd-webui-controlnet#2891 and I am noticing that if there are a male and a female in a picture, then PuLID replaces every face with the one specified in ControlNet. Previously, the male/female faces were getting applied correctly. In other words, gender detection is messed up :P
Can you guide me on how to choose the gender or ID of the person to apply PuLID to? Thanks.
Hello, first of all, thank you very much for your efforts. The PuLID v1.1 preview is scheduled to be released, and I was wondering if this pretrained model is simply a model trained using a different base model? Or are there any additional structural changes, such as changes in the model architecture?
Hello!
I really love the Replicate PuLID portrait generator that I am trying now. Could you recommend a tutorial on how to use the settings to fine-tune the output? For example, how can I mix two images, and how can I get the most out of the tool?
thanks!
Thank you for sharing. Do you have plans for multi-ID input and ControlNet?
pipeline.py
class PuLIDPipeline:
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.device = 'cuda'
        sdxl_base_repo = 'stabilityai/stable-diffusion-xl-base-1.0'
        sdxl_lightning_repo = 'ByteDance/SDXL-Lightning'
        self.sdxl_base_repo = sdxl_base_repo
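For the custom-model question, the simplest route I know of is a local edit (my own tweak, not a supported option): point the hard-coded repo id at your own checkpoint before the pipeline is built, e.g.

# pulid/pipeline.py, edited locally; the path below is a placeholder for a custom
# SDXL checkpoint in diffusers format
sdxl_base_repo = '/path/to/my-custom-sdxl-base'   # was 'stabilityai/stable-diffusion-xl-base-1.0'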
L_id = CosSim(φ(C_id), φ(...))
I am trying to implement PuLID in A1111 here: Mikubill/sd-webui-controlnet#2838
However, I found that the VRAM management of facexlib and EVA-CLIP is just very broken. In total they allocate about 3 GB of VRAM, and if you move their models to CPU, only about 1 GB is freed. Another finding is that if you pass a really large image directly to facexlib, each time facexlib runs the VRAM usage increases significantly, depending on the input image size. I think this is a strong signal of a VRAM leak. If you use small input files, the effect is probably unnoticeable.
Maybe I should file an issue in the facexlib repo, but anyway, just letting people know.
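A very rough mitigation sketch for the leak described above (the attribute names are assumptions about the PuLID pipeline / facexlib helpers; adjust to whatever your build actually exposes):

import gc, torch

pipeline.face_helper.face_det.to('cpu')    # facexlib detector (name assumed)
pipeline.face_helper.face_parse.to('cpu')  # facexlib parsing model (name assumed)
pipeline.clip_vision_model.to('cpu')       # EVA-CLIP vision tower (name assumed)
gc.collect()
torch.cuda.empty_cache()                   # hand cached blocks back to the driver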