Currently, when adjusting the image size parameter on the guided diffusion model in the s2ML notebook, loading the pretrained checkpoint fails with the following error:
RuntimeError: Error(s) in loading state_dict for UNetModel:
Missing key(s) in state_dict: "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.7.1.norm.weight", "input_blocks.7.1.norm.bias", "input_blocks.7.1.qkv.weight", "input_blocks.7.1.qkv.bias", "input_blocks.7.1.proj_out.weight", "input_blocks.7.1.proj_out.bias", "input_blocks.8.1.norm.weight", "input_blocks.8.1.norm.bias", "input_blocks.8.1.qkv.weight", "input_blocks.8.1.qkv.bias", "input_blocks.8.1.proj_out.weight", "input_blocks.8.1.proj_out.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias".
Unexpected key(s) in state_dict: "input_blocks.15.0.in_layers.0.weight",
... [omitted for brevity]
"input_blocks.17.0.out_layers.3.weight", "input_blocks.17.0.out_layers....
size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]).
size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
... [omitted for brevity]
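From digging around a bit (and I may well be off here), it looks like this happens because the UNet architecture itself is derived from image_size: in OpenAI's guided-diffusion, create_model in guided_diffusion/script_util.py picks the per-level channel multipliers and the attention resolutions from image_size, so a checkpoint trained at one resolution simply has different layer shapes than a model built for another. A minimal sketch of my reading of that logic (paraphrased, so treat the exact values as assumptions):

```python
# Sketch of the image_size -> architecture mapping in guided-diffusion's
# create_model (paraphrased from guided_diffusion/script_util.py; this is my
# reading of it, not copied verbatim).
def unet_shape_for(image_size, num_channels=256, attention_resolutions="32,16,8"):
    if image_size == 512:
        channel_mult = (0.5, 1, 1, 2, 2, 4, 4)
    elif image_size == 256:
        channel_mult = (1, 1, 2, 2, 4, 4)
    elif image_size == 128:
        channel_mult = (1, 1, 2, 3, 4)
    elif image_size == 64:
        channel_mult = (1, 2, 3, 4)
    else:
        raise ValueError(f"unsupported image size: {image_size}")

    # attention_resolutions is given in pixels and converted to downsample
    # factors, so changing image_size also moves which blocks get attention --
    # hence the missing/unexpected qkv/proj_out keys above.
    attention_ds = [image_size // int(res) for res in attention_resolutions.split(",")]

    # Number of channels in the very first conv (input_blocks.0.0).
    first_conv_channels = int(channel_mult[0] * num_channels)
    return channel_mult, attention_ds, first_conv_channels

print(unet_shape_for(512))  # ((0.5, 1, 1, 2, 2, 4, 4), [16, 32, 64], 128)
print(unet_shape_for(256))  # ((1, 1, 2, 2, 4, 4), [8, 16, 32], 256)
```

If I'm reading that right, the 128 vs 256 in the size-mismatch messages above is just channel_mult[0] * num_channels for the checkpoint's native resolution versus the resolution the notebook built the model for, so the checkpoint and image_size have to agree.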
I would be happy to take a stab at this if you have any ideas, but I'm not entirely sure the request even makes sense, since I'm still learning how the diffusion model works. Is it possible to have it generate smaller images (and ultimately reduce the memory footprint of the model)?
If you have other ideas for reducing VRAM usage, I'd be interested in hearing those as well / discussing further!
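For what it's worth, here is roughly what I was picturing for the VRAM side: keep image_size at the checkpoint's native resolution (e.g. use the 256x256 unconditional checkpoint rather than the 512x512 one) and run the UNet in fp16, which guided-diffusion supports via use_fp16 / convert_to_fp16(). This is only a hypothetical sketch assuming the notebook builds the model with create_model_and_diffusion; the actual config keys/values and checkpoint filename in s2ML may differ.

```python
import torch
from guided_diffusion.script_util import (
    model_and_diffusion_defaults,
    create_model_and_diffusion,
)

# Hypothetical sketch: load the 256x256 unconditional checkpoint at its native
# resolution and run the UNet in half precision to reduce VRAM, instead of
# changing image_size (which changes the architecture, as above).
model_config = model_and_diffusion_defaults()
model_config.update({
    "image_size": 256,            # must match the checkpoint's training resolution
    "num_channels": 256,
    "num_res_blocks": 2,
    "num_head_channels": 64,
    "attention_resolutions": "32,16,8",
    "class_cond": False,
    "learn_sigma": True,
    "resblock_updown": True,
    "use_scale_shift_norm": True,
    "use_fp16": True,             # halves weight/activation memory for the UNet
    "diffusion_steps": 1000,
    "timestep_respacing": "250",  # fewer sampling steps (saves time, not memory)
})

model, diffusion = create_model_and_diffusion(**model_config)
model.load_state_dict(torch.load("256x256_diffusion_uncond.pt", map_location="cpu"))
model.requires_grad_(False).eval().to("cuda")
if model_config["use_fp16"]:
    model.convert_to_fp16()
```

The fp16 conversion only touches the UNet itself, so the rest of the notebook (CLIP, loss functions, etc.) should be unaffected, but I haven't measured how much memory it actually saves in practice.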