
HCP-Diffusion

📘 English document | 📘 Chinese document

Introduction

HCP-Diffusion is a toolbox for Stable Diffusion models based on 🤗 Diffusers. Compared with webui and sd-scripts, it offers more flexible configurations and broader component support for training.

This toolbox supports Colossal-AI, which can significantly reduce GPU memory usage.

HCP-Diffusion unifies existing training methods for text-to-image generation (e.g., Prompt-tuning, Textual Inversion, DreamArtist, Fine-tuning, DreamBooth, LoRA, and ControlNet) and model structures through a single .yaml configuration file.

The toolbox also implements an upgraded version of DreamArtist with LoRA, named DreamArtist++, for one-shot text-to-image generation. Compared to DreamArtist, DreamArtist++ is more stable, produces higher-quality and more controllable images, and trains faster.
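
For illustration, here is a minimal sketch of how such a .yaml config can be loaded and overridden from the command line, assuming an OmegaConf-style setup (HCP-Diffusion's own loader, load_config_with_cli, and its config schema may differ):

import sys
from omegaconf import OmegaConf

# Load the base .yaml config, then merge key=value overrides from the CLI,
# e.g. `python train.py seed=42`. The config path is taken from the repo layout.
cfg = OmegaConf.load("cfgs/train/cfg_file.yaml")
cfg = OmegaConf.merge(cfg, OmegaConf.from_cli(sys.argv[1:]))
print(OmegaConf.to_yaml(cfg))   # inspect the merged config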

Features

  • Layer-wise LoRA (with Conv2d)
  • Layer-wise fine-tuning
  • Layer-wise model ensemble
  • Prompt-tuning with multiple words
  • DreamArtist and DreamArtist++
  • Aspect Ratio Bucket (ARB) with automatic clustering
  • Multiple datasets with multiple data sources
  • Image attention mask
  • Word attention multiplier
  • Custom words that occupy multiple token positions
  • Maximum sentence length expansion
  • 🤗 Accelerate
  • Colossal-AI
  • xFormers for UNet and text-encoder
  • CLIP skip
  • Tag shuffle and dropout
  • Safetensors support
  • ControlNet (training supported)
  • Min-SNR loss (see the sketch after this list)
  • Custom optimizer (Lion, DAdaptation, pytorch-optimizer, ...)
  • Custom lr scheduler
  • SDXL support
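
As referenced in the list above, Min-SNR loss reweights the per-timestep diffusion loss. Below is a minimal sketch of the Min-SNR-γ weighting for ε-prediction (following Hang et al., 2023); it is an illustration rather than HCP-Diffusion's actual implementation, and `alphas_cumprod` / `gamma` are assumed names:

import torch

# Min-SNR-gamma: weight each timestep's MSE loss by min(SNR(t), gamma) / SNR(t).
# `alphas_cumprod` is the noise scheduler's cumulative alpha-bar table;
# gamma=5.0 is the default suggested in the Min-SNR paper.
def min_snr_weights(timesteps: torch.Tensor,
                    alphas_cumprod: torch.Tensor,
                    gamma: float = 5.0) -> torch.Tensor:
    alpha_bar = alphas_cumprod[timesteps]
    snr = alpha_bar / (1.0 - alpha_bar)   # SNR(t) = alpha_bar_t / (1 - alpha_bar_t)
    return torch.clamp(snr, max=gamma) / snr

# Usage: loss = (min_snr_weights(t, alphas_cumprod) * mse_per_sample).mean()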

Install

Install with pip:

pip install hcpdiff
# Start a new project and make initialization
hcpinit

Install from source:

git clone https://github.com/7eu7d7/HCP-Diffusion.git
cd HCP-Diffusion
pip install -e .
# To build on this project or start a new one, run the initialization:
# hcpinit

To use xFormers to reduce VRAM usage and accelerate training:

# use conda
conda install xformers -c xformers

# use pip
pip install "xformers>=0.0.17"
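
To verify that xFormers works with your torch/CUDA build, you can enable memory-efficient attention on a 🤗 Diffusers model; a quick sanity-check sketch (the model ID is only an example):

import torch
from diffusers import UNet2DConditionModel

# Load a UNet and toggle xFormers attention; this call raises an error
# if xFormers is missing or incompatible with the installed torch build.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
).to("cuda")
unet.enable_xformers_memory_efficient_attention()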

User guidance

Training

Training scripts based on 🤗 Accelerate or Colossal-AI are provided.

# with Accelerate
accelerate launch -m hcpdiff.train_ac --cfg cfgs/train/cfg_file.yaml
# with Accelerate and only one GPU
accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/cfg_file.yaml
# with Colossal-AI
# pip install colossalai
torchrun --nproc_per_node 1 -m hcpdiff.train_colo --cfg cfgs/train/cfg_file.yaml

Inference

python -m hcpdiff.visualizer --cfg cfgs/infer/cfg.yaml pretrained_model=pretrained_model_path \
        prompt='positive_prompt' \
        neg_prompt='negative_prompt' \
        seed=42

Conversion of Stable Diffusion models

The framework is based on 🤗 Diffusers, so the original Stable Diffusion models need to be converted into a supported format with the scripts provided by 🤗 Diffusers.

  • Download the config file
  • Convert models based on config file
python -m hcpdiff.tools.sd2diffusers \
    --checkpoint_path "path_to_stable_diffusion_model" \
    --original_config_file "path_to_config_file" \
    --dump_path "save_directory" \
    [--extract_ema] # Extract EMA model
    [--from_safetensors] # Whether the original model is in safetensors format
    [--to_safetensors] # Whether to save to safetensors format

Convert VAE:

python -m hcpdiff.tools.sd2diffusers \
    --vae_pt_path "path_to_VAE_model" \
    --original_config_file "path_to_config_file" \
    --dump_path "save_directory"
    [--from_safetensors]
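
After conversion, the model (and optionally the converted VAE) should load directly with 🤗 Diffusers. A minimal sketch, where the directory names stand in for the --dump_path values above:

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# "save_directory" and "save_directory_vae" are the --dump_path targets above.
vae = AutoencoderKL.from_pretrained("save_directory_vae", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "save_directory", vae=vae, torch_dtype=torch.float16
).to("cuda")
pipe("a photo of a cat").images[0].save("sample.png")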

Tutorials

Contributing

You are welcome to contribute more models and features to this toolbox!

Team

This toolbox is maintained by HCP-Lab, SYSU [GitHub].

Citation

@article{DBLP:journals/corr/abs-2211-11337,
  author    = {Ziyi Dong and
               Pengxu Wei and
               Liang Lin},
  title     = {DreamArtist: Towards Controllable One-Shot Text-to-Image Generation
               via Positive-Negative Prompt-Tuning},
  journal   = {CoRR},
  volume    = {abs/2211.11337},
  year      = {2022},
  doi       = {10.48550/arXiv.2211.11337},
  eprinttype = {arXiv},
  eprint    = {2211.11337},
}

hcp-diffusion's People

Contributors

chr1st0ph3rgg, coxy7, hcplab-sysu, iamwangyabin, irisrainbowneko, kohakublueleaf, narugo1992, ritsuyan, unrealmj


hcp-diffusion's Issues

RuntimeError: shape '[616, 1, 40]' is invalid for input of size 49280

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None, "__main__", mod_spec)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac_single.py", line 105, in <module>
    trainer.train()
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac.py", line 409, in train
    loss = self.train_one_step(data_list)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac.py", line 501, in train_one_step
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac.py", line 479, in forward
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac_single.py", line 78, in encode_decode
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 481, in forward
    sample, res_samples = downsample_block(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 781, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/hcpdiff/train_ac.py", line 48, in checkpoint_fix
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 237, in checkpoint
    return _checkpoint_without_reentrant(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 383, in _checkpoint_without_reentrant
    output = function(*args)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 774, in custom_forward
    return module(*inputs, return_dict=return_dict)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
    hidden_states = block(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/attention.py", line 307, in forward
    attn_output = self.attn2(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 160, in forward
    return self.processor(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 374, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 192, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 295, in _memory_efficient_attention
    return _fMHA.apply(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 41, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 323, in _memory_efficient_attention_forward_requires_grad
    out = op.apply(inp, needs_gradient=True)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/flash.py", line 235, in apply
    ) = _convert_input_format(inp)
  File "/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/flash.py", line 177, in _convert_input_format
    key=key.reshape([batch * seqlen_kv, num_heads, head_dim_q]),
RuntimeError: shape '[616, 1, 40]' is invalid for input of size 49280

lora_convert.py --to_webui saves the pre-conversion TextEncoder state instead of the converted state

The code here still saves the pre-conversion sd_TE rather than the converted state:

diff --git a/hcpdiff/tools/lora_convert.py b/hcpdiff/tools/lora_convert.py
index 8242ea9..25228eb 100644
--- a/hcpdiff/tools/lora_convert.py
+++ b/hcpdiff/tools/lora_convert.py
@@ -83,5 +83,5 @@ if __name__ == '__main__':
         sd_unet = ckpt_manager.load_ckpt(args.lora_path)
         sd_TE = ckpt_manager.load_ckpt(args.lora_path_TE)
         state = converter.convert_to_webui(sd_unet['lora'], sd_TE['lora'])
-        ckpt_manager._save_ckpt(sd_TE, save_path=args.dump_path)
+        ckpt_manager._save_ckpt(state, save_path=args.dump_path)
         print('save lora to:', args.dump_path)
\ No newline at end of file

Training LoRA encounters an error about Tensor

Followed this article.
Error message:
File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1212, in validate
  raise Exception(
Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(*(tensor([255, 255, 255, ..., 255, 255, 82, 191, 1, 0, 0, 0, 0, 0, 72, 3, 0, 0, 0, 0, 0, 0], dtype=torch.uint8),), **{'memory_format': torch.contiguous_format})

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
    out_code = transform_code_object(code, transform)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
    transformations(instructions, code_options)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
    tracer.run()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
    super().run()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
    getattr(self, inst.opname)(inst)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 517, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(*(tensor([255, 255, 255, ..., 255, 255, 82, 191, 1, 0, 0, 0, 0, 0, 72, 3, 0, 0, 0, 0, 0, 0], dtype=torch.uint8),), **{'memory_format': torch.contiguous_format})

[2023-11-09 16:43:07,786] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT custom_forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py line 2203
2204 0 LOAD_DEREF 1 (return_dict)
2 LOAD_CONST 0 (None)
4 IS_OP 1
6 POP_JUMP_IF_FALSE 11 (to 22)

2205 8 LOAD_DEREF 0 (module)
10 LOAD_FAST 0 (inputs)
12 LOAD_CONST 1 ('return_dict')
14 LOAD_DEREF 1 (return_dict)
16 BUILD_MAP 1
18 CALL_FUNCTION_EX 1
20 RETURN_VALUE

2207 >> 22 LOAD_DEREF 0 (module)
24 LOAD_FAST 0 (inputs)
26 CALL_FUNCTION_EX 0
28 RETURN_VALUE

========== TorchDynamo Stack Trace ==========
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/random.py", line 133, in fork_rng
    yield
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 414, in unpack
    set_device_states(fwd_gpu_devices, fwd_gpu_states)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 58, in set_device_states
    torch.cuda.set_rng_state(state)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/cuda/random.py", line 51, in set_rng_state
    new_state_copy = new_state.clone(memory_format=torch.contiguous_format)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 487, in __torch_dispatch__
    return self.inner_torch_dispatch(func, types, args, kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 512, in inner_torch_dispatch
    out = proxy_call(self, func, args, kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 345, in proxy_call
    out = func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 987, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1066, in dispatch
    args, kwargs = self.validate_and_convert_non_fake_tensors(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1220, in validate_and_convert_non_fake_tensors
    return tree_map_only(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 266, in tree_map_only
    return tree_map(map_only(ty)(fn), pytree)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in tree_map
    return tree_unflatten([fn(i) for i in flat_args], spec)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in <listcomp>
    return tree_unflatten([fn(i) for i in flat_args], spec)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 247, in inner
    return f(x)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1212, in validate
    raise Exception(
Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(*(tensor([255, 255, 255, ..., 255, 255, 82, 191, 1, 0, 0, 0, 0, 0, 72, 3, 0, 0, 0, 0, 0, 0], dtype=torch.uint8),), **{'memory_format': torch.contiguous_format})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torch/dynamo/output_graph.py", line 670, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/root/miniconda3/lib/python3.10/site-packages/torch/dynamo/debug_utils.py", line 1055, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/init.py", line 1390, in call
return compile_fx(model
, inputs
, config_patches=self.config)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 455, in compile_fx
return aot_autograd(
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 48, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2822, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2515, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1715, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2104, in aot_dispatch_autograd
fx_g = make_fx(joint_forward_backward, aot_config.decompositions)(
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 714, in wrapped
t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs))
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 443, in dispatch_trace
graph = tracer.trace(root, concrete_args)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 778, in trace
(self.create_arg(fn(*args)),),
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 652, in flatten_fn
tree_out = root_fn(*tree_args)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 459, in wrapped
out = f(*tensors)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1158, in traced_joint
return functionalized_f_helper(primals, tangents)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1110, in functionalized_f_helper
f_outs = flat_fn_no_input_mutations(fn, f_primals, f_tangents, meta, keep_input_mutations)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1078, in flat_fn_no_input_mutations
outs = flat_fn_with_synthetic_bases_expanded(fn, primals, primals_after_cloning, maybe_tangents, meta, keep_input_mutations)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1050, in flat_fn_with_synthetic_bases_expanded
outs = forward_or_joint(fn, primals_before_cloning, primals, maybe_tangents, meta, keep_input_mutations)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1019, in forward_or_joint
backward_out = torch.autograd.grad(
File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/init.py", line 269, in grad
return handle_torch_function(
File "/root/miniconda3/lib/python3.10/site-packages/torch/overrides.py", line 1534, in handle_torch_function
result = mode.__torch_function__(public_api, types, args, kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 38, in __torch_function__
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/init.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 410, in unpack
with torch.random.fork_rng(devices=rng_devices, enabled=preserve_rng_state):
File "/root/miniconda3/lib/python3.10/contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "/root/miniconda3/lib/python3.10/site-packages/torch/random.py", line 137, in fork_rng
torch.cuda.set_rng_state(gpu_rng_state, device)
File "/root/miniconda3/lib/python3.10/site-packages/torch/cuda/random.py", line 51, in set_rng_state
new_state_copy = new_state.clone(memory_format=torch.contiguous_format)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 487, in torch_dispatch
return self.inner_torch_dispatch(func, types, args, kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 512, in inner_torch_dispatch
out = proxy_call(self, func, args, kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 345, in proxy_call
out = func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_ops.py", line 287, in call
return self._op(*args, **kwargs or {})
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 987, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1066, in dispatch
args, kwargs = self.validate_and_convert_non_fake_tensors(
File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1220, in validate_and_convert_non_fake_tensors
return tree_map_only(
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 266, in tree_map_only
return tree_map(map_only(ty)(fn), pytree)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in tree_map
return tree_unflatten([fn(i) for i in flat_args], spec)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in
return tree_unflatten([fn(i) for i in flat_args], spec)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py", line 247, in inner
return f(x)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1212, in validate
raise Exception(
Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(
(tensor([255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
...  (run of repeated 255s elided)  ...
255, 255, 82, 191, 1, 0, 0, 0, 0, 0, 72, 3, 0, 0,
0, 0, 0, 0], dtype=torch.uint8),), **{'memory_format': torch.contiguous_format})

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
self.output.compile_subgraph(
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 517, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(
(tensor([255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
...  (run of repeated 255s elided)  ...
255, 255, 82, 191, 1, 0, 0, 0, 0, 0, 72, 3, 0, 0,
0, 0, 0, 0], dtype=torch.uint8),), **{'memory_format': torch.contiguous_format})

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/HCP-Diffusion/hcpdiff/train_ac_single.py", line 48, in
trainer.train()
File "/root/HCP-Diffusion/hcpdiff/train_ac.py", line 405, in train
loss = self.train_one_step(data_list)
File "/root/HCP-Diffusion/hcpdiff/train_ac.py", line 488, in train_one_step
self.accelerator.backward(loss)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/accelerator.py", line 1983, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 423, in unpack
raise RuntimeError(
RuntimeError: Attempt to retrieve a tensor saved by autograd multiple times without checkpoint recomputation being triggered in between, this is not currently supported. Please open an issue with details on your use case so that we can prioritize adding this.
Traceback (most recent call last):
File "/root/miniconda3/bin/accelerate", line 8, in
sys.exit(main())
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '-m', 'hcpdiff.train_ac_single', '--cfg', 'cfgs/train/examples/lora_anime_character.yaml', 'character_name=surtr_arknights', 'dataset_dir=data/surtr_dataset']' returned non-zero exit status 1.
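
Both exceptions above appear to stem from the same interaction: the dynamo backend traces the backward pass under FakeTensorMode, while the non-reentrant checkpoint wrapper that hcpdiff installs (checkpoint_fix in hcpdiff/train_ac.py) forks and restores the real CUDA RNG state, so the aten.clone on the real uint8 RNG-state tensor is rejected. A minimal workaround sketch, assuming RNG-state preservation is indeed the trigger (untested; preserve_rng_state is a standard torch.utils.checkpoint argument):

    # Sketch: extend the checkpoint_fix patch from hcpdiff/train_ac.py so the
    # non-reentrant checkpoint skips torch.random.fork_rng entirely.
    # Caveat: with preserve_rng_state=False, dropout draws different random
    # numbers during recomputation than in the original forward pass.
    import torch.utils.checkpoint

    def checkpoint_fix(function, *args, use_reentrant: bool = False,
                       checkpoint_raw=torch.utils.checkpoint.checkpoint, **kwargs):
        return checkpoint_raw(function, *args, use_reentrant=use_reentrant,
                              preserve_rng_state=False, **kwargs)

    torch.utils.checkpoint.checkpoint = checkpoint_fix

Alternatively, launching with accelerate launch --dynamo_backend no keeps torch.compile out of the training step entirely, which also avoids the FakeTensor clash.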

DreamArtist++ training error

cfgs/train/examples/DreamArtist++.yaml

Deleted the dataset_class entry at the end of the data section.

Ran: accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/examples/DreamArtist++.yaml

Error: Expected query.size(0) == key.size(0) to be true, but got false.

(hcpd) โžœ  HCP-Diffusion git:(main) โœ— accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/examples/DreamArtist++2.yaml
[11:10:59] WARNING  The following values were not passed to `accelerate launch` and had defaults used instead:              launch.py:890
                            `--num_processes` was set to a value of `1`
                            `--num_machines` was set to a value of `1`
                            `--mixed_precision` was set to a value of `'no'`
                            `--dynamo_backend` was set to a value of `'no'`
                    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
tensorboard is not available
wandb is not available
2023-05-15 11:11:01.927 | INFO     | hcpdiff.loggers.cli_logger:_info:30 - world_size: 1
2023-05-15 11:11:01.927 | INFO     | hcpdiff.loggers.cli_logger:_info:30 - accumulation: 1
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-05-15 11:11:08.818 | INFO     | hcpdiff.data.bucket:build_buckets_from_images:130 - build buckets from images
/data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
2023-05-15 11:11:08.841 | INFO     | hcpdiff.data.bucket:build_buckets_from_images:159 - buckets info: size:[512 512], num:1
2023-05-15 11:11:08.842 | INFO     | __main__:build_data:57 - len(train_dataset): 4
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 4/4 [00:02<00:00,  1.87it/s]
2023-05-15 11:11:12.413 | INFO     | hcpdiff.loggers.cli_logger:_info:30 - ***** Running training *****
2023-05-15 11:11:12.414 | INFO     | hcpdiff.loggers.cli_logger:_info:30 -   Num batches each epoch = 1
2023-05-15 11:11:12.414 | INFO     | hcpdiff.loggers.cli_logger:_info:30 -   Num Steps = 1000
2023-05-15 11:11:12.414 | INFO     | hcpdiff.loggers.cli_logger:_info:30 -   Instantaneous batch size per device = 4
2023-05-15 11:11:12.414 | INFO     | hcpdiff.loggers.cli_logger:_info:30 -   Total train batch size (w. parallel, distributed & accumulation) = 4
2023-05-15 11:11:12.414 | INFO     | hcpdiff.loggers.cli_logger:_info:30 -   Gradient Accumulation steps = 1
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/runpy.py:196 in _run_module_as_main                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   193 โ”‚   main_globals = sys.modules["__main__"].__dict__                                        โ”‚
โ”‚   194 โ”‚   if alter_argv:                                                                         โ”‚
โ”‚   195 โ”‚   โ”‚   sys.argv[0] = mod_spec.origin                                                      โ”‚
โ”‚ โฑ 196 โ”‚   return _run_code(code, main_globals, None,                                             โ”‚
โ”‚   197 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚    "__main__", mod_spec)                                                 โ”‚
โ”‚   198                                                                                            โ”‚
โ”‚   199 def run_module(mod_name, init_globals=None,                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/runpy.py:86 in _run_code                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    83 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __loader__ = loader,                                                โ”‚
โ”‚    84 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __package__ = pkg_name,                                             โ”‚
โ”‚    85 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __spec__ = mod_spec)                                                โ”‚
โ”‚ โฑ  86 โ”‚   exec(code, run_globals)                                                                โ”‚
โ”‚    87 โ”‚   return run_globals                                                                     โ”‚
โ”‚    88                                                                                            โ”‚
โ”‚    89 def _run_module_code(code, init_globals=None,                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac_single.py:105 in <module>                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   102 โ”‚                                                                                          โ”‚
โ”‚   103 โ”‚   conf = load_config_with_cli(args.cfg, args_list=sys.argv[3:]) # skip --cfg             โ”‚
โ”‚   104 โ”‚   trainer=TrainerSingleCard(conf)                                                        โ”‚
โ”‚ โฑ 105 โ”‚   trainer.train()                                                                        โ”‚
โ”‚   106                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac.py:409 in train                                        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   406 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   407 โ”‚   โ”‚   loss_sum = np.ones(30)                                                             โ”‚
โ”‚   408 โ”‚   โ”‚   for data_list in self.train_loader_group:                                          โ”‚
โ”‚ โฑ 409 โ”‚   โ”‚   โ”‚   loss = self.train_one_step(data_list)                                          โ”‚
โ”‚   410 โ”‚   โ”‚   โ”‚   loss_sum[self.global_step%len(loss_sum)] = loss                                โ”‚
โ”‚   411 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚   412 โ”‚   โ”‚   โ”‚   self.global_step += 1                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac.py:501 in train_one_step                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   498 โ”‚   โ”‚   โ”‚   โ”‚   other_datas = {k:v.to(self.device, dtype=self.weight_dtype) for k, v in    โ”‚
โ”‚   499 โ”‚   โ”‚   โ”‚   โ”‚                                                                              โ”‚
โ”‚   500 โ”‚   โ”‚   โ”‚   โ”‚   latents = self.get_latents(image, self.train_loader_group.get_dataset(id   โ”‚
โ”‚ โฑ 501 โ”‚   โ”‚   โ”‚   โ”‚   model_pred, target, timesteps = self.forward(latents, prompt_ids, **othe   โ”‚
โ”‚   502 โ”‚   โ”‚   โ”‚   โ”‚   loss = self.get_loss(model_pred, target, timesteps, att_mask) * self.tra   โ”‚
โ”‚   503 โ”‚   โ”‚   โ”‚   โ”‚   self.accelerator.backward(loss)                                            โ”‚
โ”‚   504                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac.py:479 in forward                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   476 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   477 โ”‚   โ”‚   # CFG context for DreamArtist                                                      โ”‚
โ”‚   478 โ”‚   โ”‚   noisy_latents, timesteps = self.cfg_context.pre(noisy_latents, timesteps)          โ”‚
โ”‚ โฑ 479 โ”‚   โ”‚   model_pred = self.encode_decode(prompt_ids, noisy_latents, timesteps, **kwargs)    โ”‚
โ”‚   480 โ”‚   โ”‚   model_pred = self.cfg_context.post(model_pred)                                     โ”‚
โ”‚   481 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   482 โ”‚   โ”‚   # Get the target for loss depending on the prediction type                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac_single.py:78 in encode_decode                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    75 โ”‚   โ”‚   โ”‚   โ”‚   feeder(input_all)                                                          โ”‚
โ”‚    76 โ”‚   โ”‚                                                                                      โ”‚
โ”‚    77 โ”‚   โ”‚   encoder_hidden_states = self.text_encoder(prompt_ids, output_hidden_states=True)   โ”‚
โ”‚ โฑ  78 โ”‚   โ”‚   model_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states).sample     โ”‚
โ”‚    79 โ”‚   โ”‚   return model_pred                                                                  โ”‚
โ”‚    80 โ”‚                                                                                          โ”‚
โ”‚    81 โ”‚   def get_loss(self, model_pred, target, timesteps, att_mask):                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in   โ”‚
โ”‚ _call_impl                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1127 โ”‚   โ”‚   # this function, and just call forward.                                           โ”‚
โ”‚   1128 โ”‚   โ”‚   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  โ”‚
โ”‚   1129 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1130 โ”‚   โ”‚   โ”‚   return forward_call(*input, **kwargs)                                         โ”‚
โ”‚   1131 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1132 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1133 โ”‚   โ”‚   if self._backward_hooks or _global_backward_hooks:                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/accelerate/utils/operations.py:521   โ”‚
โ”‚ in forward                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   518 โ”‚   model_forward = ConvertOutputsToFp32(model_forward)                                    โ”‚
โ”‚   519 โ”‚                                                                                          โ”‚
โ”‚   520 โ”‚   def forward(*args, **kwargs):                                                          โ”‚
โ”‚ โฑ 521 โ”‚   โ”‚   return model_forward(*args, **kwargs)                                              โ”‚
โ”‚   522 โ”‚                                                                                          โ”‚
โ”‚   523 โ”‚   # To act like a decorator so that it can be popped when doing `extract_model_from_pa   โ”‚
โ”‚   524 โ”‚   forward.__wrapped__ = model_forward                                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/accelerate/utils/operations.py:509   โ”‚
โ”‚ in __call__                                                                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   506 โ”‚   โ”‚   update_wrapper(self, model_forward)                                                โ”‚
โ”‚   507 โ”‚                                                                                          โ”‚
โ”‚   508 โ”‚   def __call__(self, *args, **kwargs):                                                   โ”‚
โ”‚ โฑ 509 โ”‚   โ”‚   return convert_to_fp32(self.model_forward(*args, **kwargs))                        โ”‚
โ”‚   510 โ”‚                                                                                          โ”‚
โ”‚   511 โ”‚   def __getstate__(self):                                                                โ”‚
โ”‚   512 โ”‚   โ”‚   raise pickle.PicklingError(                                                        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/amp/autocast_mode.py:12 in     โ”‚
โ”‚ decorate_autocast                                                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚     9 โ”‚   @functools.wraps(func)                                                                 โ”‚
โ”‚    10 โ”‚   def decorate_autocast(*args, **kwargs):                                                โ”‚
โ”‚    11 โ”‚   โ”‚   with autocast_instance:                                                            โ”‚
โ”‚ โฑ  12 โ”‚   โ”‚   โ”‚   return func(*args, **kwargs)                                                   โ”‚
โ”‚    13 โ”‚   decorate_autocast.__script_unsupported = '@autocast() decorator is not supported in    โ”‚
โ”‚    14 โ”‚   return decorate_autocast                                                               โ”‚
โ”‚    15                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.p โ”‚
โ”‚ y:724 in forward                                                                                 โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   721 โ”‚   โ”‚   down_block_res_samples = (sample,)                                                 โ”‚
โ”‚   722 โ”‚   โ”‚   for downsample_block in self.down_blocks:                                          โ”‚
โ”‚   723 โ”‚   โ”‚   โ”‚   if hasattr(downsample_block, "has_cross_attention") and downsample_block.has   โ”‚
โ”‚ โฑ 724 โ”‚   โ”‚   โ”‚   โ”‚   sample, res_samples = downsample_block(                                    โ”‚
โ”‚   725 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   hidden_states=sample,                                                  โ”‚
โ”‚   726 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   temb=emb,                                                              โ”‚
โ”‚   727 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   encoder_hidden_states=encoder_hidden_states,                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in   โ”‚
โ”‚ _call_impl                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1127 โ”‚   โ”‚   # this function, and just call forward.                                           โ”‚
โ”‚   1128 โ”‚   โ”‚   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  โ”‚
โ”‚   1129 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1130 โ”‚   โ”‚   โ”‚   return forward_call(*input, **kwargs)                                         โ”‚
โ”‚   1131 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1132 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1133 โ”‚   โ”‚   if self._backward_hooks or _global_backward_hooks:                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:8 โ”‚
โ”‚ 59 in forward                                                                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    856 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   return custom_forward                                                 โ”‚
โ”‚    857 โ”‚   โ”‚   โ”‚   โ”‚                                                                             โ”‚
โ”‚    858 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(  โ”‚
โ”‚ โฑ  859 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states = torch.utils.checkpoint.checkpoint(                        โ”‚
โ”‚    860 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   create_custom_forward(attn, return_dict=False),                       โ”‚
โ”‚    861 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   hidden_states,                                                        โ”‚
โ”‚    862 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   encoder_hidden_states,                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/HCP-Diffusion/hcpdiff/train_ac.py:48 in checkpoint_fix                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    45 # fix checkpoint bug for train part of model                                               โ”‚
โ”‚    46 import torch.utils.checkpoint                                                              โ”‚
โ”‚    47 def checkpoint_fix(function, *args, use_reentrant: bool = False, checkpoint_raw = torch.   โ”‚
โ”‚ โฑ  48 โ”‚   return checkpoint_raw(function, *args, use_reentrant=use_reentrant, **kwargs)          โ”‚
โ”‚    49 torch.utils.checkpoint.checkpoint = checkpoint_fix                                         โ”‚
โ”‚    50                                                                                            โ”‚
โ”‚    51 class Trainer:                                                                             โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/utils/checkpoint.py:237 in     โ”‚
โ”‚ checkpoint                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   234 โ”‚   if use_reentrant:                                                                      โ”‚
โ”‚   235 โ”‚   โ”‚   return CheckpointFunction.apply(function, preserve, *args)                         โ”‚
โ”‚   236 โ”‚   else:                                                                                  โ”‚
โ”‚ โฑ 237 โ”‚   โ”‚   return _checkpoint_without_reentrant(                                              โ”‚
โ”‚   238 โ”‚   โ”‚   โ”‚   function,                                                                      โ”‚
โ”‚   239 โ”‚   โ”‚   โ”‚   preserve,                                                                      โ”‚
โ”‚   240 โ”‚   โ”‚   โ”‚   *args                                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/utils/checkpoint.py:383 in     โ”‚
โ”‚ _checkpoint_without_reentrant                                                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   380 โ”‚   โ”‚   return storage.pop(x)                                                              โ”‚
โ”‚   381 โ”‚                                                                                          โ”‚
โ”‚   382 โ”‚   with torch.autograd.graph.saved_tensors_hooks(pack, unpack):                           โ”‚
โ”‚ โฑ 383 โ”‚   โ”‚   output = function(*args)                                                           โ”‚
โ”‚   384 โ”‚   โ”‚   if torch.cuda._initialized and preserve_rng_state and not had_cuda_in_fwd:         โ”‚
โ”‚   385 โ”‚   โ”‚   โ”‚   # Cuda was not initialized before running the forward, so we didn't            โ”‚
โ”‚   386 โ”‚   โ”‚   โ”‚   # stash the CUDA state.                                                        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:8 โ”‚
โ”‚ 52 in custom_forward                                                                             โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    849 โ”‚   โ”‚   โ”‚   โ”‚   def create_custom_forward(module, return_dict=None):                      โ”‚
โ”‚    850 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   def custom_forward(*inputs):                                          โ”‚
โ”‚    851 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   if return_dict is not None:                                       โ”‚
โ”‚ โฑ  852 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   return module(*inputs, return_dict=return_dict)               โ”‚
โ”‚    853 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   else:                                                             โ”‚
โ”‚    854 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   return module(*inputs)                                        โ”‚
โ”‚    855                                                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in   โ”‚
โ”‚ _call_impl                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1127 โ”‚   โ”‚   # this function, and just call forward.                                           โ”‚
โ”‚   1128 โ”‚   โ”‚   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  โ”‚
โ”‚   1129 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1130 โ”‚   โ”‚   โ”‚   return forward_call(*input, **kwargs)                                         โ”‚
โ”‚   1131 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1132 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1133 โ”‚   โ”‚   if self._backward_hooks or _global_backward_hooks:                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/transformer_2d.py:2 โ”‚
โ”‚ 65 in forward                                                                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   262 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   263 โ”‚   โ”‚   # 2. Blocks                                                                        โ”‚
โ”‚   264 โ”‚   โ”‚   for block in self.transformer_blocks:                                              โ”‚
โ”‚ โฑ 265 โ”‚   โ”‚   โ”‚   hidden_states = block(                                                         โ”‚
โ”‚   266 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states,                                                             โ”‚
โ”‚   267 โ”‚   โ”‚   โ”‚   โ”‚   encoder_hidden_states=encoder_hidden_states,                               โ”‚
โ”‚   268 โ”‚   โ”‚   โ”‚   โ”‚   timestep=timestep,                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in   โ”‚
โ”‚ _call_impl                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1127 โ”‚   โ”‚   # this function, and just call forward.                                           โ”‚
โ”‚   1128 โ”‚   โ”‚   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  โ”‚
โ”‚   1129 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1130 โ”‚   โ”‚   โ”‚   return forward_call(*input, **kwargs)                                         โ”‚
โ”‚   1131 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1132 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1133 โ”‚   โ”‚   if self._backward_hooks or _global_backward_hooks:                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/attention.py:331 in โ”‚
โ”‚ forward                                                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   328 โ”‚   โ”‚   โ”‚   # TODO (Birch-San): Here we should prepare the encoder_attention mask correc   โ”‚
โ”‚   329 โ”‚   โ”‚   โ”‚   # prepare attention mask here                                                  โ”‚
โ”‚   330 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚ โฑ 331 โ”‚   โ”‚   โ”‚   attn_output = self.attn2(                                                      โ”‚
โ”‚   332 โ”‚   โ”‚   โ”‚   โ”‚   norm_hidden_states,                                                        โ”‚
โ”‚   333 โ”‚   โ”‚   โ”‚   โ”‚   encoder_hidden_states=encoder_hidden_states,                               โ”‚
โ”‚   334 โ”‚   โ”‚   โ”‚   โ”‚   attention_mask=encoder_attention_mask,                                     โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in   โ”‚
โ”‚ _call_impl                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1127 โ”‚   โ”‚   # this function, and just call forward.                                           โ”‚
โ”‚   1128 โ”‚   โ”‚   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  โ”‚
โ”‚   1129 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1130 โ”‚   โ”‚   โ”‚   return forward_call(*input, **kwargs)                                         โ”‚
โ”‚   1131 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1132 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1133 โ”‚   โ”‚   if self._backward_hooks or _global_backward_hooks:                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/attention_processor โ”‚
โ”‚ .py:267 in forward                                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    264 โ”‚   โ”‚   # The `Attention` class can call different attention processors / attention func  โ”‚
โ”‚    265 โ”‚   โ”‚   # here we simply pass along all tensors to the selected processor class           โ”‚
โ”‚    266 โ”‚   โ”‚   # For standard processors that are defined here, `**cross_attention_kwargs` is e  โ”‚
โ”‚ โฑ  267 โ”‚   โ”‚   return self.processor(                                                            โ”‚
โ”‚    268 โ”‚   โ”‚   โ”‚   self,                                                                         โ”‚
โ”‚    269 โ”‚   โ”‚   โ”‚   hidden_states,                                                                โ”‚
โ”‚    270 โ”‚   โ”‚   โ”‚   encoder_hidden_states=encoder_hidden_states,                                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/diffusers/models/attention_processor โ”‚
โ”‚ .py:696 in __call__                                                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    693 โ”‚   โ”‚   key = attn.head_to_batch_dim(key).contiguous()                                    โ”‚
โ”‚    694 โ”‚   โ”‚   value = attn.head_to_batch_dim(value).contiguous()                                โ”‚
โ”‚    695 โ”‚   โ”‚                                                                                     โ”‚
โ”‚ โฑ  696 โ”‚   โ”‚   hidden_states = xformers.ops.memory_efficient_attention(                          โ”‚
โ”‚    697 โ”‚   โ”‚   โ”‚   query, key, value, attn_bias=attention_mask, op=self.attention_op, scale=att  โ”‚
โ”‚    698 โ”‚   โ”‚   )                                                                                 โ”‚
โ”‚    699 โ”‚   โ”‚   hidden_states = hidden_states.to(query.dtype)                                     โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:192 in โ”‚
โ”‚ memory_efficient_attention                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   189 โ”‚   โ”‚   and options.                                                                       โ”‚
โ”‚   190 โ”‚   :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]``                     โ”‚
โ”‚   191 โ”‚   """                                                                                    โ”‚
โ”‚ โฑ 192 โ”‚   return _memory_efficient_attention(                                                    โ”‚
โ”‚   193 โ”‚   โ”‚   Inputs(                                                                            โ”‚
โ”‚   194 โ”‚   โ”‚   โ”‚   query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale       โ”‚
โ”‚   195 โ”‚   โ”‚   ),                                                                                 โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:295 in โ”‚
โ”‚ _memory_efficient_attention                                                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   292 โ”‚   โ”‚   )                                                                                  โ”‚
โ”‚   293 โ”‚                                                                                          โ”‚
โ”‚   294 โ”‚   output_shape = inp.normalize_bmhk()                                                    โ”‚
โ”‚ โฑ 295 โ”‚   return _fMHA.apply(                                                                    โ”‚
โ”‚   296 โ”‚   โ”‚   op, inp.query, inp.key, inp.value, inp.attn_bias, inp.p, inp.scale                 โ”‚
โ”‚   297 โ”‚   ).reshape(output_shape)                                                                โ”‚
โ”‚   298                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:41 in  โ”‚
โ”‚ forward                                                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    38 โ”‚   โ”‚   op_fw = op[0] if op is not None else None                                          โ”‚
โ”‚    39 โ”‚   โ”‚   op_bw = op[1] if op is not None else None                                          โ”‚
โ”‚    40 โ”‚   โ”‚                                                                                      โ”‚
โ”‚ โฑ  41 โ”‚   โ”‚   out, op_ctx = _memory_efficient_attention_forward_requires_grad(                   โ”‚
โ”‚    42 โ”‚   โ”‚   โ”‚   inp=inp, op=op_fw                                                              โ”‚
โ”‚    43 โ”‚   โ”‚   )                                                                                  โ”‚
โ”‚    44                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:323 in โ”‚
โ”‚ _memory_efficient_attention_forward_requires_grad                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   320 โ”‚   โ”‚   op = _dispatch_fw(inp)                                                             โ”‚
โ”‚   321 โ”‚   else:                                                                                  โ”‚
โ”‚   322 โ”‚   โ”‚   _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op, inp)    โ”‚
โ”‚ โฑ 323 โ”‚   out = op.apply(inp, needs_gradient=True)                                               โ”‚
โ”‚   324 โ”‚   assert out[1] is not None                                                              โ”‚
โ”‚   325 โ”‚   return (out[0].reshape(output_shape), out[1])                                          โ”‚
โ”‚   326                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/xformers/ops/fmha/cutlass.py:175 in  โ”‚
โ”‚ apply                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   172 โ”‚   โ”‚   if type(inp.attn_bias) not in FwOp.SUPPORTED_ATTN_BIAS_TYPES:                      โ”‚
โ”‚   173 โ”‚   โ”‚   โ”‚   raise NotImplementedError("Unsupported attn_bias type")                        โ”‚
โ”‚   174 โ”‚   โ”‚   seqstart_k, seqstart_q, max_seqlen_q, _ = _get_seqlen_info(inp)                    โ”‚
โ”‚ โฑ 175 โ”‚   โ”‚   out, lse, rng_seed, rng_offset = cls.OPERATOR(                                     โ”‚
โ”‚   176 โ”‚   โ”‚   โ”‚   query=inp.query,                                                               โ”‚
โ”‚   177 โ”‚   โ”‚   โ”‚   key=inp.key,                                                                   โ”‚
โ”‚   178 โ”‚   โ”‚   โ”‚   value=inp.value,                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/torch/_ops.py:143 in __call__        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   140 โ”‚   โ”‚   # is still callable from JIT                                                       โ”‚
โ”‚   141 โ”‚   โ”‚   # We save the function ptr as the `op` attribute on                                โ”‚
โ”‚   142 โ”‚   โ”‚   # OpOverloadPacket to access it here.                                              โ”‚
โ”‚ โฑ 143 โ”‚   โ”‚   return self._op(*args, **kwargs or {})                                             โ”‚
โ”‚   144 โ”‚                                                                                          โ”‚
โ”‚   145 โ”‚   # TODO: use this to make a __dir__                                                     โ”‚
โ”‚   146 โ”‚   def overloads(self):                                                                   โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
RuntimeError: Expected query.size(0) == key.size(0) to be true, but got false.  (Could this error message be improved?  If so, please
report an enhancement request to PyTorch.)
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /data/yoke/anaconda3/envs/hcpd/bin/accelerate:8 in <module>                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   5 from accelerate.commands.accelerate_cli import main                                          โ”‚
โ”‚   6 if __name__ == '__main__':                                                                   โ”‚
โ”‚   7 โ”‚   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         โ”‚
โ”‚ โฑ 8 โ”‚   sys.exit(main())                                                                         โ”‚
โ”‚   9                                                                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.p โ”‚
โ”‚ y:45 in main                                                                                     โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   42 โ”‚   โ”‚   exit(1)                                                                             โ”‚
โ”‚   43 โ”‚                                                                                           โ”‚
โ”‚   44 โ”‚   # Run                                                                                   โ”‚
โ”‚ โฑ 45 โ”‚   args.func(args)                                                                         โ”‚
โ”‚   46                                                                                             โ”‚
โ”‚   47                                                                                             โ”‚
โ”‚   48 if __name__ == "__main__":                                                                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/accelerate/commands/launch.py:918 in โ”‚
โ”‚ launch_command                                                                                   โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   915 โ”‚   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA   โ”‚
โ”‚   916 โ”‚   โ”‚   sagemaker_launcher(defaults, args)                                                 โ”‚
โ”‚   917 โ”‚   else:                                                                                  โ”‚
โ”‚ โฑ 918 โ”‚   โ”‚   simple_launcher(args)                                                              โ”‚
โ”‚   919                                                                                            โ”‚
โ”‚   920                                                                                            โ”‚
โ”‚   921 def main():                                                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /data/yoke/anaconda3/envs/hcpd/lib/python3.10/site-packages/accelerate/commands/launch.py:580 in โ”‚
โ”‚ simple_launcher                                                                                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   577 โ”‚   process.wait()                                                                         โ”‚
โ”‚   578 โ”‚   if process.returncode != 0:                                                            โ”‚
โ”‚   579 โ”‚   โ”‚   if not args.quiet:                                                                 โ”‚
โ”‚ โฑ 580 โ”‚   โ”‚   โ”‚   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)    โ”‚
โ”‚   581 โ”‚   โ”‚   else:                                                                              โ”‚
โ”‚   582 โ”‚   โ”‚   โ”‚   sys.exit(1)                                                                    โ”‚
โ”‚   583                                                                                            โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
CalledProcessError: Command '['/data/yoke/anaconda3/envs/hcpd/bin/python', '-m', 'hcpdiff.train_ac_single', '--cfg',
'cfgs/train/examples/DreamArtist++2.yaml']' returned non-zero exit status 1.

Bug report: [Rank 0] Watchdog caught collective operation timeout

ๅ•ๆœบๅคšๅก่ฎญ็ปƒ่ฟ‡็จ‹ไธญๅ‡บ็Žฐaccelerate ProcessGroupNCCL ่ถ…ๆ—ถ้—ฎ้ข˜ใ€‚

ๅฝ“ๅ‰ไฝฟ็”จ่ฎญ็ปƒๅ‘ฝไปค๏ผš
torchrun --nproc_per_node 8 -m hcpdiff.train_colo --cfg cfgs/train/examples/fine-tuning.yaml

่ฎญ็ปƒ้…็ฝฎ๏ผš
_base_: [cfgs/train/train_base.yaml, cfgs/train/tuning_base.yaml]

unet:
  -
    lr: 1e-6
    layers:
      - '' # fine-tuning all layers in unet

## fine-tuning text-encoder
text_encoder:
  - lr: 1e-6
    layers:
      - ''

tokenizer_pt:
  train: null

train:
  gradient_accumulation_steps: 1
  save_step: 100

  scheduler:
    name: 'constant_with_warmup'
    num_warmup_steps: 50
    num_training_steps: 600

model:
  pretrained_model_name_or_path: '/home/jovyan/data-vol-polefs-1/sd-webui/model/stable-diffusion-v1-5'
  tokenizer_repeats: 1
  ema_unet: 0
  ema_text_encoder: 0

data:
  batch_size: 4
  prompt_template: 'prompt_tuning_template/object.txt'
  caption_file: null
  cache_latents: True
  tag_transforms:
    transforms:
      - _target_: hcpdiff.utils.caption_tools.TagShuffle
      - _target_: hcpdiff.utils.caption_tools.TagDropout
        p: 0.1
      - _target_: hcpdiff.utils.caption_tools.TemplateFill
        word_names: {}
  bucket:
    _target_: hcpdiff.data.bucket.RatioBucket.from_files # aspect ratio bucket
    img_root: '/home/jovyan/data-vol-polefs-1/sd-webui/Data/GOODONES/'
    target_area: {_target_: "builtins.eval", _args_: ['512*512']}
    num_bucket: 10

data_class: null

Error output:

      [E ProcessGroupNCCL.cpp:821] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=106, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801455 milliseconds before timing out.
      [E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
      [E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
      terminate called after throwing an instance of 'std::runtime_error'
        what():  [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=106, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801455 milliseconds before timing out.
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102175 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102176 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102177 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102178 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102179 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102180 closing signal SIGTERM
      WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 102181 closing signal SIGTERM
      ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 102174) of binary: /home/jovyan/miniconda3/envs/sd-webui2/bin/python
      Traceback (most recent call last):
        File "/home/jovyan/miniconda3/envs/sd-webui2/bin/torchrun", line 8, in <module>
          sys.exit(main())
        File "/home/jovyan/miniconda3/envs/sd-webui2/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
          return f(*args, **kwargs)
        File "/home/jovyan/miniconda3/envs/sd-webui2/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
          run(args)
        File "/home/jovyan/miniconda3/envs/sd-webui2/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
          elastic_launch(
        File "/home/jovyan/miniconda3/envs/sd-webui2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
          return launch_agent(self._config, self._entrypoint, list(args))
        File "/home/jovyan/miniconda3/envs/sd-webui2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
          raise ChildFailedError(
      torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
      =======================================================
      hcpdiff.train_colo FAILED
      -------------------------------------------------------
      Failures:
        <NO_OTHER_FAILURES>
      -------------------------------------------------------

=======================================================

้ขๅค–ๅ†…ๅฎน๏ผš
ๆŠฅ้”™ๅ‘็”Ÿๅœจ่ฎญ็ปƒ่ฟ›่กŒไธญๅกไฝๅคงๆฆ‚ๅŠๅฐๆ—ถๅŽๆŠฅ้”™ใ€‚

The training log is as follows:
2023-04-21 17:46:45.897 | INFO | hcpdiff.train_ac:__init__:59 - world_size: 8
2023-04-21 17:46:45.897 | INFO | hcpdiff.train_ac:__init__:60 - accumulation: 1
2023-04-21 17:49:29.190 | INFO | hcpdiff.train_ac:build_data:265 - len(train_dataset): 352
2023-04-21 17:54:20.501 | INFO | hcpdiff.train_ac:train:338 - ***** Running training *****
2023-04-21 17:54:20.502 | INFO | hcpdiff.train_ac:train:339 - Num batches each epoch = 11
2023-04-21 17:54:20.504 | INFO | hcpdiff.train_ac:train:340 - Num Steps = 600
2023-04-21 17:54:20.504 | INFO | hcpdiff.train_ac:train:341 - Instantaneous batch size per device = 4
2023-04-21 17:54:20.505 | INFO | hcpdiff.train_ac:train:342 - Total train batch size (w. parallel, distributed & accumulation) = 32
2023-04-21 17:54:20.506 | INFO | hcpdiff.train_ac:train:343 - Gradient Accumulation steps = 1
2023-04-21 17:54:51.571 | INFO | hcpdiff.train_ac:train:363 - Step [20/600], LR_model: 1.28e-05, LR_word: 0.00e+00, Loss: 0.14078
2023-04-21 17:55:14.095 | INFO | hcpdiff.train_ac:train:363 - Step [40/600], LR_model: 2.56e-05, LR_word: 0.00e+00, Loss: 0.10552
2023-04-21 17:55:35.954 | INFO | hcpdiff.train_ac:train:363 - Step [60/600], LR_model: 3.20e-05, LR_word: 0.00e+00, Loss: 0.12115
2023-04-21 17:55:57.702 | INFO | hcpdiff.train_ac:train:363 - Step [80/600], LR_model: 3.20e-05, LR_word: 0.00e+00, Loss: 0.14846
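
A note on the timeout value: 1800000 ms is the default 30-minute NCCL collective timeout, so one rank apparently never reached (or got stuck inside) the ALLREDUCE at SeqNum=106 while the others waited. If the stall itself is legitimate (for example, one rank spending a long time caching latents or clustering buckets), the timeout can be raised through Accelerate's process-group kwargs. A minimal sketch, assuming a custom entry point rather than HCP-Diffusion's built-in trainer:

from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

# Raise the NCCL collective timeout from the 30-minute default to 2 hours,
# so slow per-rank preprocessing does not kill the process group.
accelerator = Accelerator(
    kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(hours=2))]
)

If the hang is a deadlock rather than slowness (for example, ranks executing different numbers of steps), raising the timeout only delays the crash, and the data/bucket setup should be checked instead.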

Bug Report: Fine-tuning & DreamBooth collapse when using multiple GPUs

2023-04-17 16:36:29.428 | INFO     | __main__:build_data:283 - len(train_dataset): 40
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 40/40 [00:08<00:00,  4.65it/s]
2023-04-17 16:36:41.156 | INFO     | __main__:train:355 - ***** Running training *****
2023-04-17 16:36:41.157 | INFO     | __main__:train:356 -   Num batches each epoch = 2
2023-04-17 16:36:41.157 | INFO     | __main__:train:357 -   Num Steps = 20000
2023-04-17 16:36:41.157 | INFO     | __main__:train:358 -   Instantaneous batch size per device = 1
2023-04-17 16:36:41.157 | INFO     | __main__:train:359 -   Total train batch size (w. parallel, distributed & accumulation) = 5
2023-04-17 16:36:41.157 | INFO     | __main__:train:360 -   Gradient Accumulation steps = 1
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/runpy.py:196 in _run_module_as_main               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   193 โ”‚   main_globals = sys.modules["__main__"].__dict__                                        โ”‚
โ”‚   194 โ”‚   if alter_argv:                                                                         โ”‚
โ”‚   195 โ”‚   โ”‚   sys.argv[0] = mod_spec.origin                                                      โ”‚
โ”‚ โฑ 196 โ”‚   return _run_code(code, main_globals, None,                                             โ”‚
โ”‚   197 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚    "__main__", mod_spec)                                                 โ”‚
โ”‚   198                                                                                            โ”‚
โ”‚   199 def run_module(mod_name, init_globals=None,                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/runpy.py:86 in _run_code                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    83 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __loader__ = loader,                                                โ”‚
โ”‚    84 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __package__ = pkg_name,                                             โ”‚
โ”‚    85 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __spec__ = mod_spec)                                                โ”‚
โ”‚ โฑ  86 โ”‚   exec(code, run_globals)                                                                โ”‚
โ”‚    87 โ”‚   return run_globals                                                                     โ”‚
โ”‚    88                                                                                            โ”‚
โ”‚    89 def _run_module_code(code, init_globals=None,                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/HCP-Diffusion/hcpdiff/train_ac.py:532 in <module>                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   529 โ”‚                                                                                          โ”‚
โ”‚   530 โ”‚   conf = load_config_with_cli(args.cfg, args_list=sys.argv[3:]) # skip --cfg             โ”‚
โ”‚   531 โ”‚   trainer=Trainer(conf)                                                                  โ”‚
โ”‚ โฑ 532 โ”‚   trainer.train()                                                                        โ”‚
โ”‚   533                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/HCP-Diffusion/hcpdiff/train_ac.py:370 in train                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   367 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   368 โ”‚   โ”‚   loss_sum=0                                                                         โ”‚
โ”‚   369 โ”‚   โ”‚   for image, att_mask, prompt_ids in cycle_data(self.train_loader, arb=self.arb_is   โ”‚
โ”‚ โฑ 370 โ”‚   โ”‚   โ”‚   loss=self.train_one_step(image, att_mask, prompt_ids)                          โ”‚
โ”‚   371 โ”‚   โ”‚   โ”‚   loss_sum+=loss                                                                 โ”‚
โ”‚   372 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚   373 โ”‚   โ”‚   โ”‚   self.global_step += 1                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/HCP-Diffusion/hcpdiff/train_ac.py:463 in train_one_step                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   460 โ”‚   โ”‚   โ”‚   else:                                                                          โ”‚
โ”‚   461 โ”‚   โ”‚   โ”‚   โ”‚   loss = self.get_loss(model_pred, target, att_mask)                         โ”‚
โ”‚   462 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚ โฑ 463 โ”‚   โ”‚   โ”‚   self.accelerator.backward(loss)                                                โ”‚
โ”‚   464 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚   465 โ”‚   โ”‚   โ”‚   if hasattr(self, 'optimizer'):                                                 โ”‚
โ”‚   466 โ”‚   โ”‚   โ”‚   โ”‚   if self.accelerator.sync_gradients: # fine-tuning                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/accelerate/accelerator.py:1681 in   โ”‚
โ”‚ backward                                                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1678 โ”‚   โ”‚   elif self.distributed_type == DistributedType.MEGATRON_LM:                        โ”‚
โ”‚   1679 โ”‚   โ”‚   โ”‚   return                                                                        โ”‚
โ”‚   1680 โ”‚   โ”‚   elif self.scaler is not None:                                                     โ”‚
โ”‚ โฑ 1681 โ”‚   โ”‚   โ”‚   self.scaler.scale(loss).backward(**kwargs)                                    โ”‚
โ”‚   1682 โ”‚   โ”‚   else:                                                                             โ”‚
โ”‚   1683 โ”‚   โ”‚   โ”‚   loss.backward(**kwargs)                                                       โ”‚
โ”‚   1684                                                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/torch/_tensor.py:487 in backward    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    484 โ”‚   โ”‚   โ”‚   โ”‚   create_graph=create_graph,                                                โ”‚
โ”‚    485 โ”‚   โ”‚   โ”‚   โ”‚   inputs=inputs,                                                            โ”‚
โ”‚    486 โ”‚   โ”‚   โ”‚   )                                                                             โ”‚
โ”‚ โฑ  487 โ”‚   โ”‚   torch.autograd.backward(                                                          โ”‚
โ”‚    488 โ”‚   โ”‚   โ”‚   self, gradient, retain_graph, create_graph, inputs=inputs                     โ”‚
โ”‚    489 โ”‚   โ”‚   )                                                                                 โ”‚
โ”‚    490                                                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/torch/autograd/__init__.py:200 in   โ”‚
โ”‚ backward                                                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   197 โ”‚   # The reason we repeat same the comment below is that                                  โ”‚
โ”‚   198 โ”‚   # some Python versions print out the first line of a multi-line function               โ”‚
โ”‚   199 โ”‚   # calls in the traceback and some print out the last line                              โ”‚
โ”‚ โฑ 200 โ”‚   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac   โ”‚
โ”‚   201 โ”‚   โ”‚   tensors, grad_tensors_, retain_graph, create_graph, inputs,                        โ”‚
โ”‚   202 โ”‚   โ”‚   allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to ru   โ”‚
โ”‚   203                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/torch/autograd/function.py:274 in   โ”‚
โ”‚ apply                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   271 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      "Function is not allowed. You should only implement one "   โ”‚
โ”‚   272 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      "of them.")                                                 โ”‚
โ”‚   273 โ”‚   โ”‚   user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn                    โ”‚
โ”‚ โฑ 274 โ”‚   โ”‚   return user_fn(self, *args)                                                        โ”‚
โ”‚   275 โ”‚                                                                                          โ”‚
โ”‚   276 โ”‚   def apply_jvp(self, *args):                                                            โ”‚
โ”‚   277 โ”‚   โ”‚   # _forward_cls is defined by derived class                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/torch/utils/checkpoint.py:157 in    โ”‚
โ”‚ backward                                                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   154 โ”‚   โ”‚   โ”‚   raise RuntimeError(                                                            โ”‚
โ”‚   155 โ”‚   โ”‚   โ”‚   โ”‚   "none of output has requires_grad=True,"                                   โ”‚
โ”‚   156 โ”‚   โ”‚   โ”‚   โ”‚   " this checkpoint() is not necessary")                                     โ”‚
โ”‚ โฑ 157 โ”‚   โ”‚   torch.autograd.backward(outputs_with_grad, args_with_grad)                         โ”‚
โ”‚   158 โ”‚   โ”‚   grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None                  โ”‚
โ”‚   159 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚     for inp in detached_inputs)                                          โ”‚
โ”‚   160                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/yabin/miniconda3/envs/hcp/lib/python3.10/site-packages/torch/autograd/__init__.py:200 in   โ”‚
โ”‚ backward                                                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   197 โ”‚   # The reason we repeat same the comment below is that                                  โ”‚
โ”‚   198 โ”‚   # some Python versions print out the first line of a multi-line function               โ”‚
โ”‚   199 โ”‚   # calls in the traceback and some print out the last line                              โ”‚
โ”‚ โฑ 200 โ”‚   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac   โ”‚
โ”‚   201 โ”‚   โ”‚   tensors, grad_tensors_, retain_graph, create_graph, inputs,                        โ”‚
โ”‚   202 โ”‚   โ”‚   allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to ru   โ”‚
โ”‚   203                                                                                            โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following
reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are
not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a
workaround if this module graph does not change during training loop.2) Reused parameters in multiple
reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of
your model, it would result in the same set of parameters been used by different reentrant backward passes
multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in
default. You can try to use _set_static_graph() as a workaround if your module graph does not change over
iterations.
Parameter at index 598 has been marked as ready twice. This means that multiple autograd engine  hooks have
fired for this particular parameter during this iteration. You can set the environment variable
TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

Config files I use:

_base_: [cfgs/train/train_base.yaml, cfgs/train/tuning_base.yaml]

unet:
  -
    lr: 1e-6
    layers:
      - ''

text_encoder:
  - lr: 1e-6
    layers:
      - ''

lora_unet: null
lora_text_encoder: null

tokenizer_pt:
  train: null

train:
  gradient_accumulation_steps: 1
  save_step: 100

  scheduler:
    name: 'constant_with_warmup'
    num_warmup_steps: 50
    num_training_steps: 600

model:
  pretrained_model_name_or_path: 'runwayml/stable-diffusion-v1-5'
  tokenizer_repeats: 1
  ema_unet: 0
  ema_text_encoder: 0
  enable_xformers: False

data:
  batch_size: 1
  prompt_template: 'prompt_tuning_template/object.txt'
  caption_file: null
  cache_latents: True
  tag_transforms:
    transforms:
      - _target_: hcpdiff.utils.caption_tools.TagShuffle
      - _target_: hcpdiff.utils.caption_tools.TagDropout
        p: 0.1
      - _target_: hcpdiff.utils.caption_tools.TemplateFill
        word_names:
          pt1: sks
          class: dog
  bucket:
    _target_: hcpdiff.data.bucket.RatioBucket.from_files
    img_root: '/home/yabin/datasets/custom/enma_ai/'
    target_area: {_target_: "builtins.eval", _args_: ['512*512']}
    num_bucket: 1

data_class:
  null

Second config file:

_base_: [cfgs/train/train_base.yaml, cfgs/train/tuning_base.yaml]

unet:
  - lr: 1e-6
    layers:
      - '' # fine-tuning all layers in unet

# fine-tuning text-encoder
text_encoder:
  - lr: 1e-6
    layers:
      - ''

tokenizer_pt:
  train: null

train:
  gradient_accumulation_steps: 1
  save_step: 100

  scheduler:
    name: 'constant_with_warmup'
    num_warmup_steps: 500
    num_training_steps: 20000

model:
  pretrained_model_name_or_path: 'stabilityai/stable-diffusion-2-1'
#  pretrained_model_name_or_path: '/home/yabin/HCP-Diffusion/converted_models/realismengine'
  tokenizer_repeats: 1
  ema_unet: 0
  ema_text_encoder: 0
  enable_xformers: False

data:
  batch_size: 1
  prompt_template: 'prompt_tuning_template/object.txt'
  caption_file: null
  cache_latents: True
  tag_transforms:
    transforms:
      - _target_: hcpdiff.utils.caption_tools.TagShuffle
      - _target_: hcpdiff.utils.caption_tools.TagDropout
        p: 0.1
      - _target_: hcpdiff.utils.caption_tools.TemplateFill
        word_names: {}
  bucket:
    _target_: hcpdiff.data.bucket.RatioBucket.from_files # aspect ratio bucket
    img_root: '/home/yabin/datasets/custom/enma_ai/'
    target_area: {_target_: "builtins.eval", _args_: ['1024*1024']}
    num_bucket: 1

data_class: null

Single card training and multi-gpu training with LoRA work fine.
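
A hedged observation on this crash: the traceback passes through torch/utils/checkpoint.py, and "Expected to mark a variable ready only once" is the classic conflict between reentrant gradient checkpointing and DDP when checkpointed parameters take part in more than one backward pass. Under that assumption (not a confirmed root cause for HCP-Diffusion), the usual PyTorch-level workarounds look like this sketch:

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    # Toy stand-in for a UNet block wrapped in gradient checkpointing.
    def __init__(self):
        super().__init__()
        self.inner = nn.Linear(16, 16)

    def forward(self, x):
        # use_reentrant=False takes the non-reentrant autograd path, which
        # avoids DDP's "mark variable ready once" bookkeeping conflict.
        return checkpoint(self.inner, x, use_reentrant=False)

# Alternatively, keep reentrant checkpointing and declare the graph static,
# as the error message itself suggests:
#   ddp_model = nn.parallel.DistributedDataParallel(model, ...)
#   ddp_model._set_static_graph()

This reading would also be consistent with LoRA multi-GPU training working fine, since the trainable LoRA parameters sit outside the frozen, checkpointed base weights.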

Following the documentation example, WebUI-format model conversion fails: UnpicklingError: invalid load key, '\xa0'.

I followed the example from the documentation:
https://hcpdiff.readthedocs.io/zh-cn/latest/user_guides/model_convert.html

Final error:

magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xa0'.

System: Windows
HCP-Diffusion version: commit 9573c9b
Date: Tue Nov 14 20:47:31 2023 +0800

Details: after running git clone on HCP-Diffusion, I created a virtual environment with python -m venv venv, activated it, entered the commands from the documentation,
and replaced the model directories with my own.
Is this error caused by a problem with the arguments I entered?

Full command and output:

(venv) E:\AI\HCP-diffusion\HCP-Diffusion-webui\HCP-Diffusion>python -m hcpdiff.tools.sd2diffusers ^
More? --checkpoint_path "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\CKPT\meinamix_meinaV11.safetensors" ^
More? --original_config_file "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\SD1.5-v1-inference.yaml" ^
More? --dump_path "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\Diffuse_Output"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
wandb is not available
Traceback (most recent call last):
  File "C:\Users\erwin\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\erwin\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\AI\HCP-diffusion\HCP-Diffusion-webui\HCP-Diffusion\hcpdiff\tools\sd2diffusers.py", line 385, in <module>
    convert_ckpt(args)
  File "E:\AI\HCP-diffusion\HCP-Diffusion-webui\HCP-Diffusion\hcpdiff\tools\sd2diffusers.py", line 212, in convert_ckpt
    pipe = load_sd_ckpt(
  File "C:\Users\erwin\.pyenv\pyenv-win\versions\3.10.9\lib\site-packages\diffusers\pipelines\stable_diffusion\convert_from_ckpt.py", line 1258, in download_from_original_stable_diffusion_ckpt
    checkpoint = torch.load(checkpoint_path_or_dict, map_location=device)
  File "C:\Users\erwin\.pyenv\pyenv-win\versions\3.10.9\lib\site-packages\torch\serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\erwin\.pyenv\pyenv-win\versions\3.10.9\lib\site-packages\torch\serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xa0'.
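
From the traceback, torch.load is pickle-loading meinamix_meinaV11.safetensors, and "invalid load key, '\xa0'" is the typical symptom of feeding a safetensors file through pickle. A likely fix (an assumption based on the converter's documented flags, not a confirmed diagnosis) is to tell the converter that the source model is in safetensors format:

python -m hcpdiff.tools.sd2diffusers ^
    --checkpoint_path "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\CKPT\meinamix_meinaV11.safetensors" ^
    --original_config_file "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\SD1.5-v1-inference.yaml" ^
    --dump_path "E:\AI\HCP-diffusion\erwin-demo\SD_to_Diffuse_Model\Diffuse_Output" ^
    --from_safetensors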

RuntimeError: unscale_() has already been called on this optimizer since the last update().

I'm trying to train a LoRA with multiple embeddings, but I keep getting this error. I have tried changing a bunch of things in the YAML configuration to see if I could get any further, but I haven't been able to get past this error. Any ideas on what is going wrong?
Contents of hcp-test folder: settings.zip

PS R:\lora-test\hcp-test> accelerate launch -m hcpdiff.train_ac_single --cfg .\lora_anime_character.yaml
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - world_size: 1
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - accumulation: 1
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-11-13 01:24:28.321 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: black_san_magnolia, len: 4, id: 49408
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: bloody_reina, len: 2, id: 49409
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: blue_san_magnolia, len: 4, id: 49410
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: formal_giad, len: 4, id: 49411
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: neck_scar, len: 2, id: 49412
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: personal_room, len: 4, id: 49413
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: shinei_nouzen, len: 3, id: 49414
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: undertaker, len: 4, id: 49415
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: vladilena_millize, len: 3, id: 49416
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
2023-11-13 01:24:29.182 | INFO | hcpdiff.data.caption_loader:load:18 - 144 record(s) loaded with TXTCaptionLoader, from path 'L:/waifu_diffusion/anime-tagger/out/86/512x512'
2023-11-13 01:24:29.183 | INFO | hcpdiff.data.bucket:build_buckets_from_images:241 - build buckets from images size
F:\python\lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)
2023-11-13 01:24:29.632 | INFO | hcpdiff.data.bucket:build_buckets_from_images:262 - buckets info: size:[512 512], num:144
2023-11-13 01:24:29.666 | INFO | hcpdiff.loggers.cli_logger:_info:30 - len(train_dataset): 144
F:\python\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:128: FutureWarning: The configuration file of this scheduler: PNDMScheduler {
"_class_name": "PNDMScheduler",
"_diffusers_version": "0.19.3",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": false,
"steps_offset": 0,
"timestep_spacing": "leading",
"trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - ***** Running training *****
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num batches each epoch = 144
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num Steps = 1000
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Instantaneous batch size per device = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Total train batch size (w. parallel, distributed & accumulation) = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Gradient Accumulation steps = 1
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
F:\python\lib\site-packages\hcpdiff\train_ac.py:425: FutureWarning: Accessing config attribute scaling_factor directly via 'AutoencoderKL' object attribute is deprecated. Please access 'scaling_factor' over 'AutoencoderKL's config object instead, e.g. 'unet.config.scaling_factor'.
latents = latents*self.vae.scaling_factor
F:\python\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\python\lib\site-packages\hcpdiff\train_ac_single.py", line 61, in <module>
trainer.train()
File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 391, in train
loss = self.train_one_step(data_list)
File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 481, in train_one_step
self.accelerator.clip_grad_norm_(clip_param, self.cfgs.train.max_grad_norm)
File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1916, in clip_grad_norm_
self.unscale_gradients()
File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1879, in unscale_gradients
self.scaler.unscale_(opt)
File "F:\python\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 275, in unscale_
raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
Traceback (most recent call last):
File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\python\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "F:\python\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 959, in launch_command
simple_launcher(args)
File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 624, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\python\python.exe', '-m', 'hcpdiff.train_ac_single', '--cfg', '.\lora_anime_character.yaml']' returned non-zero exit status 1.
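
For context, GradScaler raises this whenever unscale_() runs twice on the same optimizer between consecutive update() calls. A minimal sketch of the mechanics in plain PyTorch (generic AMP code, not HCP-Diffusion's actual loop):

import torch

model = torch.nn.Linear(4, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda"):
    loss = model(torch.randn(2, 4, device="cuda")).sum()
scaler.scale(loss).backward()

scaler.unscale_(opt)  # e.g. done implicitly when clipping gradients
scaler.unscale_(opt)  # second call before scaler.step()/scaler.update():
                      # raises "unscale_() has already been called ..."
scaler.step(opt)
scaler.update()

In this report the call site is accelerator.clip_grad_norm_, which unscales internally, so one plausible trigger (an assumption, not verified against the code) is gradient clipping being applied more than once per optimizer step when several parameter groups or embeddings are trained together.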

Slow training speed with the default configuration: accelerate launch -m hcpdiff.train_ac_single

Hi, when training a LoRA with the default command I found the speed very slow: accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/examples/lora_conventional.yaml --model.pretrained_model_name_or_path "C://res/meta" dataset_dir="C://res/data/train"

Opening Task Manager, I see CUDA utilization spiking in bursts. Is something wrong with my configuration? My setup is an RTX 3090 24G, Windows 10.


Tutorial Section links from Readme are not linking to the correct pages

When trying to access the links found in https://github.com/7eu7d7/HCP-Diffusion#tutorials, they return a 404 error because the docs have been moved into /en/ and /zh_cn/ folders.
The links in the Tutorial section should be updated to the following for the English language:

[Model Training Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/guide_train.md) -> [Model Training Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/train.md)
[DreamArtist++ Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/guide_DA.md) -> [DreamArtist++ Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/tutorial/DA.md)
[Model Inference Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/guide_infer.md) -> [Model Inference Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/infer.md)
[Configuration File Explanation](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/guide_cfg.md) -> [Configuration File Explanation](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/cfg.md)
[webui Model Conversion Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/guide_webui_lora.md) -> [webui Model Conversion Tutorial](https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/model_convert.md)

Unfortunately I do not speak Chinese and cannot make recommendations for that.

Posting the updated links here in plain form for temporary use by viewers of this issue until the update is made:

https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/train.md
https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/tutorial/DA.md
https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/infer.md
https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/cfg.md
https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/user_guides/model_convert.md

With this out of the way, I can't wait to test your scripts. Hopefully they work on AMD.

webui support

Proper web UI support, or a GUI, would be a very welcome addition to this project.

SDXL training error: time_ids is None

Traceback (most recent call last):
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/mnt/ssd-array/xx-volume/develop/MLLM/HCP-Diffusion/hcpdiff/train_ac_single.py", line 45, in <module>
trainer.train()
File "/mnt/ssd-array/xx-volume/develop/MLLM/HCP-Diffusion/hcpdiff/train_ac.py", line 397, in train
loss = self.train_one_step(data_list)
File "/mnt/ssd-array/xx-volume/develop/MLLM/HCP-Diffusion/hcpdiff/train_ac.py", line 479, in train_one_step
model_pred, target, timesteps = self.forward(latents, prompt_ids, **other_datas)
File "/mnt/ssd-array/xx-volume/develop/MLLM/HCP-Diffusion/hcpdiff/train_ac.py", line 455, in forward
model_pred = self.TE_unet(prompt_ids, noisy_latents, timesteps, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/mnt/ssd-array/xx-volume/develop/MLLM/HCP-Diffusion/hcpdiff/models/wrapper.py", line 72, in forward
model_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states, added_cond_kwargs=added_cond_kwargs).sample # Predict the noise residual
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/inuyaxia/anaconda3/envs/hcpdiff/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 977, in forward
time_embeds = self.add_time_proj(time_ids.flatten())
AttributeError: 'NoneType' object has no attribute 'flatten'
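
For reference, the SDXL UNet expects added_cond_kwargs to carry both text_embeds and time_ids, and the AttributeError means time_ids reached the UNet as None. A minimal sketch of how the micro-conditioning tensor is normally assembled in the diffusers SDXL convention (the sizes below are illustrative assumptions, and pooled_text_embeds is a placeholder):

import torch

# SDXL micro-conditioning: original size, crop top-left, target size
original_size = (1024, 1024)
crop_coords_top_left = (0, 0)
target_size = (1024, 1024)

add_time_ids = torch.tensor(
    [list(original_size + crop_coords_top_left + target_size)],
    dtype=torch.float32,
)  # shape: (batch, 6)

pooled_text_embeds = torch.zeros(1, 1280)  # placeholder for the pooled CLIP-bigG embedding
added_cond_kwargs = {"text_embeds": pooled_text_embeds, "time_ids": add_time_ids}

So the report suggests the SDXL training path is not filling in time_ids before calling the UNet.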


Inference error with text2img_sdxl.yaml

ๆˆ‘ๅนถไธๆ˜ฏไธ“ไธšไบบๅฃซ๏ผŒ่ฟ™ไธชๆŠฅ้”™่ฎฉๆˆ‘ๅพˆๅ›ฐๆƒ‘๏ผŒๅธŒๆœ›็Ÿฅ้“ๅฆ‚ไฝ•่งฃๅ†ณ

(HCP) C:\webui_git\HCP-Diffusion>python -m hcpdiff.visualizer --cfg cfgs/infer/text2img.yaml pretrained_model=stabilityai/stable-diffusion-xl-base-1.0 prompt=1girl neg_prompt=bad seed=42
2023-12-26 14:43:07.146482: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING:tensorflow:From C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.visualizer' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.visualizer'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Loading pipeline components...: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 5/5 [00:01<00:00, 2.98it/s]
C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\site-packages\diffusers\pipelines\pipeline_utils.py:761: FutureWarning: torch_dtype is deprecated and will be removed in version 0.25.0.
deprecate("torch_dtype", "0.25.0", "")
2023-12-26 14:43:18.309 | INFO | hcpdiff.models.compose.compose_hook:hook:49 - compose hook: clip_B
2023-12-26 14:43:18.312 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: hatsune_miku_bluearchive, len: 4, id: 49408
2023-12-26 14:43:18.312 | INFO | hcpdiff.models.compose.compose_hook:hook:49 - compose hook: clip_bigG
2023-12-26 14:43:18.314 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: hatsune_miku_bluearchive, len: 4, id: 49408
Traceback (most recent call last):
File "C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\webui_git\HCP-Diffusion\hcpdiff\visualizer.py", line 257, in <module>
viser.vis_to_dir(prompt=prompt, negative_prompt=negative_prompt,
File "C:\webui_git\HCP-Diffusion\hcpdiff\visualizer.py", line 235, in vis_to_dir
images = self.vis_images(prompt, negative_prompt, seeds=seeds, **kwargs)
File "C:\Users\momo\AppData\Local\anaconda3\envs\HCP\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\webui_git\HCP-Diffusion\hcpdiff\visualizer.py", line 196, in vis_images
emb, pooled_output, attention_mask = self.te_hook.encode_prompt_to_emb(clean_text_n+clean_text_p)
File "C:\webui_git\HCP-Diffusion\hcpdiff\models\compose\compose_hook.py", line 104, in encode_prompt_to_emb
encoder_hidden_states, pooled_output = list(zip(*emb_list))
ValueError: too many values to unpack (expected 2)
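
One detail worth noting (an observation, not a confirmed fix): the issue title names text2img_sdxl.yaml, but the command above loads the generic cfgs/infer/text2img.yaml with an SDXL checkpoint. Since SDXL composes two text encoders whose hooks return extra outputs, the unpacking failure is consistent with using the non-SDXL config; presumably the SDXL inference config should be used instead:

python -m hcpdiff.visualizer --cfg cfgs/infer/text2img_sdxl.yaml pretrained_model=stabilityai/stable-diffusion-xl-base-1.0 prompt=1girl neg_prompt=bad seed=42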

Feature request: support LyCORIS model-training methods

The project link is here:
https://github.com/KohakuBlueleaf/LyCORIS

LyCORIS is an extension project built on LoRA training. It includes LoRA-based models using other algorithms, such as the convolutional LoCon, the Hadamard-product LoHa, and the latest Kronecker-product LoKr.
There is also IA3, a text-layer model similar to Textual Inversion (TI).

Linux support

Hope Linux training can be supported.

How to use a safetensors model from civit.ai?

I'm following the tutorial from https://github.com/7eu7d7/HCP-Diffusion/blob/main/doc/en/tutorial/lora_anime.md

It mentions using "deepghs/animefull-latest" but is there a way to make it use a safetensors checkpoint on my machine that's not hosted in huggingface? I tried simply replacing "deepghs/animefull-latest" with the path to my safetensors file but it did not work.

I'm thinking I could use runwayml/stable-diffusion-v1-5 but then manually replace the downloaded v1-5-pruned.safetensors with mine; is that the recommended way?

Thanks
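
For readers with the same question: one approach consistent with the toolbox's documented workflow (paths below are illustrative) is to convert the local safetensors checkpoint into Diffusers format with the bundled converter, then point the training config at the converted directory instead of a Hugging Face repo id:

python -m hcpdiff.tools.sd2diffusers \
    --checkpoint_path "/path/to/model_from_civitai.safetensors" \
    --original_config_file "/path/to/v1-inference.yaml" \
    --dump_path "converted_models/my_model" \
    --from_safetensors

# then, in the training config:
# model:
#   pretrained_model_name_or_path: 'converted_models/my_model'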

Automatic model format recognition and conversion

Add model automatic recognition and conversion features; a sketch of one possible detection heuristic follows the list below.

  • Automatically convert models in SD official format.
  • Automatically convert lora in kohya-ss format.
  • Automatically convert controlnet in official format.
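
A sketch of one possible detection heuristic (an illustration of the feature request, not existing HCP-Diffusion code; the key prefixes reflect the common conventions of each format):

from pathlib import Path

import torch
from safetensors.torch import load_file

def detect_format(path: str) -> str:
    """Classify a checkpoint by its tensor-key naming convention."""
    if Path(path).suffix == ".safetensors":
        state = load_file(path)
    else:
        state = torch.load(path, map_location="cpu")
        state = state.get("state_dict", state)

    keys = state.keys()
    if any(k.startswith(("lora_unet_", "lora_te_")) for k in keys):
        return "kohya-lora"           # kohya-ss LoRA naming
    if any(k.startswith("control_model.") for k in keys):
        return "controlnet-official"  # official ControlNet checkpoints
    if any(k.startswith("model.diffusion_model.") for k in keys):
        return "sd-official"          # SD official (ldm) layout
    return "unknown"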

README does not describe training; I hope to get some help! And I hit an error.

The README does not describe training; I hope to get some help!
I ran accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/examples/DreamArtist.yaml

I hit an error after modifying the "pretrained_model_name_or_path" entry in DreamArtist.yaml.
Is there anything else that needs to be modified?
Or could you provide a YAML that runs directly? Thanks!

Traceback (most recent call last):
  File "/home/image1325_user/anaconda3/envs/yudongjian_23_ldm/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/image1325_user/anaconda3/envs/yudongjian_23_ldm/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/image1325_user/ssd_disk1/yudongjian_23/HCP-Diffusion-main/hcpdiff/train_ac_single.py", line 44, in <module>
    trainer = TrainerSingleCard(conf)
  File "/home/image1325_user/ssd_disk1/yudongjian_23/HCP-Diffusion-main/hcpdiff/train_ac.py", line 78, in __init__
    self.build_optimizer_scheduler()
  File "/home/image1325_user/ssd_disk1/yudongjian_23/HCP-Diffusion-main/hcpdiff/train_ac.py", line 366, in build_optimizer_scheduler
    parameters, parameters_pt = self.get_param_group_train()
  File "/home/image1325_user/ssd_disk1/yudongjian_23/HCP-Diffusion-main/hcpdiff/train_ac.py", line 356, in get_param_group_train
    word_emb = self.ex_words_emb[v.name]
KeyError: 'pt-catgirl1'
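
A hedged reading of the KeyError (an assumption from the traceback, not a verified fix): the DreamArtist example config trains a custom word named pt-catgirl1, and get_param_group_train looks that word up among the loaded custom embeddings; if no embedding with that name has been created yet, the lookup fails. The embedding therefore likely needs to be created before training. The invocation below is sketched from the project's docs with placeholders in angle brackets, and the argument order may differ by version:

python -m hcpdiff.tools.create_embedding <pretrained_model_path> pt-catgirl1 <word_length>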

็”Ÿๆˆๅ›พๅƒๆ—ถๆŠฅ้”™

  • ็”Ÿๆˆๅ›พๅƒๆ—ถๆœ€ๅŽvae decode็š„ๆ—ถๅ€™ๆŠฅ้”™
  • ๅค็Žฐ๏ผš่ฟ่กŒColab example็š„Generate Images้ƒจๅˆ†
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ in <cell line: 16>:16                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115 in decorate_context       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   112 โ”‚   @functools.wraps(func)                                                                 โ”‚
โ”‚   113 โ”‚   def decorate_context(*args, **kwargs):                                                 โ”‚
โ”‚   114 โ”‚   โ”‚   with ctx_factory():                                                                โ”‚
โ”‚ โฑ 115 โ”‚   โ”‚   โ”‚   return func(*args, **kwargs)                                                   โ”‚
โ”‚   116 โ”‚                                                                                          โ”‚
โ”‚   117 โ”‚   return decorate_context                                                                โ”‚
โ”‚   118                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/hcpdiff/visualizer.py:202 in vis_images                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   199 โ”‚   โ”‚   โ”‚   โ”‚   for feeder in self.pipe.unet.input_feeder:                                 โ”‚
โ”‚   200 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   feeder(ex_input_dict)                                                  โ”‚
โ”‚   201 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚ โฑ 202 โ”‚   โ”‚   โ”‚   images = self.pipe(prompt_embeds=emb_p, negative_prompt_embeds=emb_n, **kwar   โ”‚
โ”‚   203 โ”‚   โ”‚   return images                                                                      โ”‚
โ”‚   204 โ”‚                                                                                          โ”‚
โ”‚   205 โ”‚   def save_images(self, images, root, prompt, negative_prompt='', save_cfg=True):        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115 in decorate_context       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   112 โ”‚   @functools.wraps(func)                                                                 โ”‚
โ”‚   113 โ”‚   def decorate_context(*args, **kwargs):                                                 โ”‚
โ”‚   114 โ”‚   โ”‚   with ctx_factory():                                                                โ”‚
โ”‚ โฑ 115 โ”‚   โ”‚   โ”‚   return func(*args, **kwargs)                                                   โ”‚
โ”‚   116 โ”‚                                                                                          โ”‚
โ”‚   117 โ”‚   return decorate_context                                                                โ”‚
โ”‚   118                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_dif โ”‚
โ”‚ fusion.py:755 in __call__                                                                        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   752 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   callback(i, t, latents)                                            โ”‚
โ”‚   753 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   754 โ”‚   โ”‚   if not output_type == "latent":                                                    โ”‚
โ”‚ โฑ 755 โ”‚   โ”‚   โ”‚   image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dic   โ”‚
โ”‚   756 โ”‚   โ”‚   โ”‚   image, has_nsfw_concept = self.run_safety_checker(image, device, prompt_embe   โ”‚
โ”‚   757 โ”‚   โ”‚   else:                                                                              โ”‚
โ”‚   758 โ”‚   โ”‚   โ”‚   image = latents                                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py:46 in wrapper        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   43 โ”‚   def wrapper(self, *args, **kwargs):                                                     โ”‚
โ”‚   44 โ”‚   โ”‚   if hasattr(self, "_hf_hook") and hasattr(self._hf_hook, "pre_forward"):             โ”‚
โ”‚   45 โ”‚   โ”‚   โ”‚   self._hf_hook.pre_forward(self)                                                 โ”‚
โ”‚ โฑ 46 โ”‚   โ”‚   return method(self, *args, **kwargs)                                                โ”‚
โ”‚   47 โ”‚                                                                                           โ”‚
โ”‚   48 โ”‚   return wrapper                                                                          โ”‚
โ”‚   49                                                                                             โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py:191 in decode         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   188 โ”‚   โ”‚   โ”‚   decoded_slices = [self._decode(z_slice).sample for z_slice in z.split(1)]      โ”‚
โ”‚   189 โ”‚   โ”‚   โ”‚   decoded = torch.cat(decoded_slices)                                            โ”‚
โ”‚   190 โ”‚   โ”‚   else:                                                                              โ”‚
โ”‚ โฑ 191 โ”‚   โ”‚   โ”‚   decoded = self._decode(z).sample                                               โ”‚
โ”‚   192 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   193 โ”‚   โ”‚   if not return_dict:                                                                โ”‚
โ”‚   194 โ”‚   โ”‚   โ”‚   return (decoded,)                                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py:177 in _decode        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   174 โ”‚   โ”‚   if self.use_tiling and (z.shape[-1] > self.tile_latent_min_size or z.shape[-2] >   โ”‚
โ”‚   175 โ”‚   โ”‚   โ”‚   return self.tiled_decode(z, return_dict=return_dict)                           โ”‚
โ”‚   176 โ”‚   โ”‚                                                                                      โ”‚
โ”‚ โฑ 177 โ”‚   โ”‚   z = self.post_quant_conv(z)                                                        โ”‚
โ”‚   178 โ”‚   โ”‚   dec = self.decoder(z)                                                              โ”‚
โ”‚   179 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   180 โ”‚   โ”‚   if not return_dict:                                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1498 โ”‚   โ”‚   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   โ”‚
โ”‚   1499 โ”‚   โ”‚   โ”‚   โ”‚   or _global_backward_pre_hooks or _global_backward_hooks                   โ”‚
โ”‚   1500 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1501 โ”‚   โ”‚   โ”‚   return forward_call(*args, **kwargs)                                          โ”‚
โ”‚   1502 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1503 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1504 โ”‚   โ”‚   backward_pre_hooks = []                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:463 in forward                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    460 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   self.padding, self.dilation, self.groups)                         โ”‚
โ”‚    461 โ”‚                                                                                         โ”‚
โ”‚    462 โ”‚   def forward(self, input: Tensor) -> Tensor:                                           โ”‚
โ”‚ โฑ  463 โ”‚   โ”‚   return self._conv_forward(input, self.weight, self.bias)                          โ”‚
โ”‚    464                                                                                           โ”‚
โ”‚    465 class Conv3d(_ConvNd):                                                                    โ”‚
โ”‚    466 โ”‚   __doc__ = r"""Applies a 3D convolution over an input signal composed of several inpu  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459 in _conv_forward            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    456 โ”‚   โ”‚   โ”‚   return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=sel  โ”‚
โ”‚    457 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   weight, bias, self.stride,                                    โ”‚
โ”‚    458 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   _pair(0), self.dilation, self.groups)                         โ”‚
โ”‚ โฑ  459 โ”‚   โ”‚   return F.conv2d(input, weight, bias, self.stride,                                 โ”‚
โ”‚    460 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   self.padding, self.dilation, self.groups)                         โ”‚
โ”‚    461 โ”‚                                                                                         โ”‚
โ”‚    462 โ”‚   def forward(self, input: Tensor) -> Tensor:                                           โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same
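
In this trace, torch.HalfTensor (no "cuda" prefix) means the VAE weights are still on the CPU while the latents are CUDA tensors. A minimal sketch of the usual fix when driving diffusers directly (the hcpdiff visualizer would need the equivalent, i.e. making sure the whole pipeline is moved to the GPU):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Without this, pipe.vae stays on the CPU and decoding CUDA latents raises
# exactly the input/weight type mismatch above.
pipe.to("cuda")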

Bug Report: ConfigAttributeError: Missing key clip_skip

Please add clip_skip: 2 to cfgs/infer/v1.yaml.

Below is my inference config, using an enma_ai LoRA trained with HCP-Diffusion.

pretrained_model: /home/yabin/HCP-Diffusion/converted_models/Acertain
prompt: enma_ai, 1girl, tree, solo, plant, bush, outdoors, grass, garden, nature,
  branch, day, sky,
neg_prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit,
  fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,
  signature, watermark, username, blurry
out_dir: output/
emb_dir: embs/
N_repeats: 1
bs: 4
num: 1
seed: null
fp16: true
clip_skip: 2
save:
  save_cfg: true
  image_type: png
  quality: 95
infer_args:
  width: 512
  height: 768
  guidance_scale: 7.5
new_components: {}
merge:
  exp_dir: 2023-04-17-21-59-53
  alpha: 0.8
  group1:
    type: unet
    base_model_alpha: 1.0
    lora:
    - path: exps/${....exp_dir}/ckpts/unet-2000.safetensors
      alpha: ${....alpha}
      layers: all
      mask: null
    part: null
  group2:
    type: TE
    base_model_alpha: 1.0
    lora:
    - path: exps/${....exp_dir}/ckpts/text_encoder-2000.safetensors
      alpha: ${....alpha}
      layers: all
      mask: null
    part: null

Below is an image generated with ACertain, which is also the base model I trained on.

(generated image: 0-enma_ai, 1girl, tree, solo, plant, bush, outdoors, grass, garden, nature, branch, day, sky)

Currently, I am unable to judge the quality of this LoRA compared to my previous LoRA trained with sd-scripts, mainly because there are numerous training-script settings in this repository that I do not yet understand. Still, I am confident this is an excellent project: the code is clean and easy to use, with support for training on Linux and on multiple GPUs, and many algorithms are available (although I am not sure exactly how many methods have been implemented).

I would also expect to see conversion scripts that transfer this LoRA format to the common LoRA format, so the result can be used in the WebUI with more controllable parameters.

By the way, I'm curious about the Rank parameter in the configuration file. Initially I thought it meant the same as dim in sd-scripts, but when I set it to 128 the final output model was more than 200 MB (while the same dim setting yields a 144 MB file in sd-scripts). I also noticed that a scale parameter exists in the code, but I am currently unable to set it from the configuration file; I believe scale is equivalent to alpha.
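
For reference, in the standard LoRA formulation the update is W' = W + (alpha/rank)·B·A, so scale here most likely corresponds to alpha (my assumption, not confirmed by the repo). A back-of-the-envelope sketch of why file size grows with rank, and grows further when more layers are wrapped (HCP-Diffusion's layer-wise LoRA can also cover Conv2d layers):

# Parameter count of one LoRA-adapted Linear layer: down-projection A of
# shape (rank, in_features) plus up-projection B of shape (out_features, rank).
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    return rank * in_features + out_features * rank

# fp16 size of a single 768->768 projection at rank 128:
print(lora_params(768, 768, 128) * 2 / 2**20, "MB")  # ~0.375 MB per layer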

An error occurs when using DreamArtist with SDXL.

I was able to use DreamArtist with SD1.5, but an error occurs with SDXL.

RuntimeError: split_with_sizes expects split_sizes to sum exactly to 768 (input tensor's size at dimension 1), but got split_sizes=[768, 1280]

Do you know what's causing it?
(screenshot: error2023-12-08_115130)
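
For background (a guess at the mismatch, not a confirmed diagnosis): SDXL concatenates the hidden states of two text encoders, the 768-dim CLIP ViT-L and the 1280-dim OpenCLIP ViT-bigG, so code that assumes SD1.5's single 768-dim embedding fails when asked to split a 768-wide tensor into [768, 1280]:

import torch

emb_sd15 = torch.randn(77, 768)          # SD1.5: one text encoder
emb_sdxl = torch.randn(77, 768 + 1280)   # SDXL: two encoders, concatenated

emb_sdxl.split([768, 1280], dim=1)       # fine: 768 + 1280 == 2048
emb_sd15.split([768, 1280], dim=1)       # RuntimeError: split_with_sizes ...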

Model conversion error

Model conversion from the webui format to the diffusers framework won't work with diffusers==0.18.x.
Trying to do so yields the following error:
"TypeError: convert_ldm_clip_checkpoint() got an unexpected keyword argument 'text_encoder'"
Downgrading to diffusers==0.17.1 fixes this issue.
Perhaps the requirements should be modified to enforce the use of a 0.17.x diffusers lib.
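
One way to encode that constraint until the conversion code catches up (my suggestion; the exact upper bound is an assumption):

pip install "diffusers>=0.17.0,<0.18.0"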

lora่ฎญ็ปƒ็”Ÿๆˆๅ›พ็‰‡

cfgs/infer/anime/text2img_anime_lora.yaml
ๆ–‡ไปถไธญ็š„_base_่ฟ่กŒไผšๆŠฅ้”™๏ผšๆฒกๆœ‰cfgs/infer/text2img_anime.yamlๆ–‡ไปถ
ๆ”นไธบcfgs/infer/anime/text2img_anime.yamlๅฐฑไธไผšๆŠฅ่ฟ™ไธช้”™่ฏฏ
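
A minimal sketch of the corrected include in cfgs/infer/anime/text2img_anime_lora.yaml (assuming _base_ takes a list of config paths, as elsewhere in this repo):

_base_:
  - cfgs/infer/anime/text2img_anime.yaml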

Errors at the start of training

/workspace/HCP-Diffusion# accelerate launch -m hcpdiff.train_ac_single \
    --cfg cfgs/train/examples/lora_anime_character.yaml \
    character_name=noah \
    dataset_dir=/workspace/HCP-Diffusion/data/noah
wandb is not available
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
2023-10-13 08:03:42.162 | INFO | hcpdiff.loggers.cli_logger:_info:30 - world_size: 1
2023-10-13 08:03:42.162 | INFO | hcpdiff.loggers.cli_logger:_info:30 - accumulation: 1
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-10-13 08:03:45.678 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: noah, len: 4, id: 28806
2023-10-13 08:03:45.830 | INFO | hcpdiff.data.caption_loader:load:18 - 2 record(s) loaded with JsonCaptionLoader, from path '/workspace/HCP-Diffusion/data/noah/image_captions.json'
2023-10-13 08:03:45.831 | INFO | hcpdiff.data.bucket:build_buckets_from_images:241 - build buckets from images size
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)
2023-10-13 08:03:45.851 | INFO | hcpdiff.data.bucket:build_buckets_from_images:262 - buckets info: size:[640 896], num:2
2023-10-13 08:03:45.851 | INFO | hcpdiff.loggers.cli_logger:_info:30 - len(train_dataset): 4
0%| | 0/4 [00:00<?, ?it/s]/workspace/HCP-Diffusion/hcpdiff/data/pair_dataset.py:107: FutureWarning: Accessing config attribute scaling_factor directly via 'AutoencoderKL' object attribute is deprecated. Please access 'scaling_factor' over 'AutoencoderKL's config object instead, e.g. 'unet.config.scaling_factor'.
data['img'] = (latents*vae.scaling_factor).cpu()
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 4/4 [00:00<00:00, 11.16it/s]
/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:128: FutureWarning: The configuration file of this scheduler: PNDMScheduler {
"_class_name": "PNDMScheduler",
"_diffusers_version": "0.21.4",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": false,
"steps_offset": 0,
"timestep_spacing": "leading",
"trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
2023-10-13 08:03:46.798 | INFO | hcpdiff.loggers.cli_logger:_info:30 - ***** Running training *****
2023-10-13 08:03:46.798 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num batches each epoch = 1
2023-10-13 08:03:46.798 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num Steps = 1000
2023-10-13 08:03:46.798 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Instantaneous batch size per device = 4
2023-10-13 08:03:46.799 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Total train batch size (w. parallel, distributed & accumulation) = 4
2023-10-13 08:03:46.799 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Gradient Accumulation steps = 1
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/HCP-Diffusion/hcpdiff/train_ac_single.py", line 61, in
trainer.train()
File "/workspace/HCP-Diffusion/hcpdiff/train_ac.py", line 383, in train
loss = self.train_one_step(data_list)
File "/workspace/HCP-Diffusion/hcpdiff/train_ac.py", line 480, in train_one_step
self.optimizer_pt.step()
File "/usr/local/lib/python3.10/dist-packages/accelerate/optimizer.py", line 132, in step
self.scaler.step(self.optimizer, closure)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 372, in step
assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'hcpdiff.train_ac_single', '--cfg', 'cfgs/train/examples/lora_anime_character.yaml', 'character_name=noah', 'dataset_dir=/workspace/HCP-Diffusion/data/noah']' returned non-zero exit status 1.

I've been trying all day and haven't been able to solve it.
OS: Ubuntu 22.04.2 LTS
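
For context, PyTorch raises this exact assertion whenever GradScaler.step() is given an optimizer none of whose parameters received a gradient from the scaled backward pass; since optimizer_pt is the prompt-tuning optimizer, my guess is that this config left no embedding actually trainable. A minimal standalone repro of the assertion, independent of HCP-Diffusion:

import torch

model = torch.nn.Linear(4, 4).cuda()   # gets gradients
unused = torch.nn.Linear(4, 4).cuda()  # its params get none
opt = torch.optim.SGD(unused.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda"):
    loss = model(torch.randn(2, 4, device="cuda")).sum()
scaler.scale(loss).backward()  # grads flow only into `model`
scaler.step(opt)               # AssertionError: No inf checks were recorded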

How to train with ControlNet

Hi,
may I ask how to train a ControlNet with this repo? I see that ControlNet training is mentioned in the README.
Many thanks!

The sd2diffusers script errors when converting to diffusers

An error occurs when converting an SD 1.5 model.
Version: 14bfef5e2e6b6f30bcaa492fb5292982c8ab78ec
Stack trace:

Traceback (most recent call last):
  File "/root/git/hcp_proj/env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/git/hcp_proj/env/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/git/hcp_proj/HCP-Diffusion/hcpdiff/tools/sd2diffusers.py", line 372, in <module>
    convert_ckpt(args)
  File "/root/git/hcp_proj/HCP-Diffusion/hcpdiff/tools/sd2diffusers.py", line 210, in convert_ckpt
    pipe = load_sd_ckpt(
TypeError: download_from_original_stable_diffusion_ckpt() got an unexpected keyword argument 'checkpoint_path'

download_from_original_stable_diffusion_ckpt does not have a checkpoint_path parameter.
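
Newer diffusers releases renamed the first parameter (in recent versions it is checkpoint_path_or_dict), which is presumably why the keyword call breaks. Passing it positionally is one version-agnostic workaround (a sketch under that assumption, not a tested fix):

from diffusers.pipelines.stable_diffusion.convert_from_ckpt import (
    download_from_original_stable_diffusion_ckpt,
)

pipe = download_from_original_stable_diffusion_ckpt(
    "path_to_stable_diffusion_model.safetensors",  # first arg passed positionally
    original_config_file="path_to_config_file",
    from_safetensors=True,
)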

The safetensors trained by DreamArtist is only 1 KB

My training steps:

  1. Generated two embeddings using runwayml/stable-diffusion-v1-5
    image
  2. Modified the yaml file
    image
    image
  3. Ran training
    Problem:
    the training result
    image
    an error is raised when generating images
    image

Does this work well for SDXL training?

I can't tell whether it does or doesn't, since from what I'm seeing I'd need to go back to university just to be able to use it. This is the hardest SD-based tool I have attempted so far, and it has no real help: no videos, no guides, nothing to orient a new user, just a blinding white wiki documentation page that holds no real value.

image
Even the tutorial sections are all 404 errors.

I searched the Internet and came up empty-handed for any info on your tool, regardless of version.

Who uses this tool, and does it work well for SDXL training?

Captions generated by the gen_from_ptlist script cannot be loaded correctly through JsonCaptionLoader

The JSON keys produced by gen_from_ptlist carry the file extension. TXTCaptionLoader strips the extension by default when loading; JsonCaptionLoader / YamlCaptionLoader do not.

The modules do not seem to agree on whether names carry the extension.

Bug report: I think we need MSELoss

Please do not remove mse_loss.py. I noticed that the file mse_loss.py, specifically the MSELoss module, has been removed, and this removal causes unexpected errors in the current version, shown in the screenshots below; a minimal stand-in sketch follows them.

image

image

image
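
Until the module is restored, here is a minimal stand-in with the interface I assume the configs expect (reduction='none' so that per-sample weighting such as Min-SNR can still be applied downstream; this is a guess at the original module, not its actual code):

from torch import nn

# Minimal stand-in for the removed hcpdiff mse_loss.py (assumed interface):
# keep per-element losses so downstream weighting can reduce them itself.
class MSELoss(nn.MSELoss):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault("reduction", "none")
        super().__init__(*args, **kwargs)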
