
vqgan-clip's Introduction

VQGAN-CLIP Overview

A repo for running VQGAN+CLIP locally. This started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.

Original notebook: Open In Colab

Some example images:

Environment:

  • Tested on Ubuntu 20.04
  • GPU: Nvidia RTX 3090
  • Typical VRAM requirements:
    • 24 GB for a 900x900 image
    • 10 GB for a 512x512 image
    • 8 GB for a 380x380 image

You may also be interested in CLIP Guided Diffusion

Set up

This example uses Anaconda to manage virtual Python environments.

Create a new virtual Python environment for VQGAN-CLIP:

conda create --name vqgan python=3.9
conda activate vqgan

Install PyTorch in the new environment:

Note: This installs the CUDA version of PyTorch. If you want to use an AMD graphics card, read the AMD section below.

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Install other required Python packages:

pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer setuptools==59.5.0

Or use the requirements.txt file, which includes version numbers.
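For example, to install the pinned versions from the repository root:

pip install -r requirements.txt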

Clone required repositories:

git clone 'https://github.com/nerdyrodent/VQGAN-CLIP'
cd VQGAN-CLIP
git clone 'https://github.com/openai/CLIP'
git clone 'https://github.com/CompVis/taming-transformers'

Note: In my development environment both CLIP and taming-transformers are present in the local directory, and so aren't present in the requirements.txt or vqgan.yml files.

As an alternative, you can also pip install taming-transformers and CLIP.
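A minimal sketch of that alternative (this is not the setup path used in this repo; installing CLIP straight from the OpenAI repository via pip is one common approach):

pip install taming-transformers
pip install git+https://github.com/openai/CLIP.git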

You will also need at least 1 VQGAN pretrained model. E.g.

mkdir checkpoints

curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 16384

Note that users of curl on Microsoft Windows should use double quotes.
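For example, the same downloads with double quotes:

curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - "https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1"
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - "https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1"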

The download_models.sh script is an optional way to download a number of models. By default, it will download just 1 model.

See https://github.com/CompVis/taming-transformers#overview-of-pretrained-models for more information about VQGAN pre-trained models, including download links.

By default, the model .yaml and .ckpt files are expected in the checkpoints directory. See https://github.com/CompVis/taming-transformers for more information on datasets and models.

Video guides are also available:

Using an AMD graphics card

Note: This hasn't been tested yet.

ROCm can be used for AMD graphics cards instead of CUDA. You can check if your card is supported here: https://github.com/RadeonOpenCompute/ROCm#supported-gpus

Install ROCm according to the instructions and don't forget to add the user to the video group: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

The usage and set up instructions above are the same, except for the line where you install Pytorch. Instead of pip install torch==1.9.0+cu111 ..., use the one or two lines which are displayed here (select Pip -> Python-> ROCm): https://pytorch.org/get-started/locally/

Using the CPU

If no graphics card can be found, the CPU is automatically used and a warning displayed.

Regardless of an available graphics card, the CPU can also be used by adding this command line argument: -cd cpu

This works with the CUDA version of Pytorch, even without CUDA drivers installed, but doesn't seem to work with ROCm as of now.
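For example, to force CPU generation with the example prompt used later in this README (expect this to be much slower than a GPU run):

python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu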

Uninstalling

Remove the Python environment:

conda remove --name vqgan --all

and delete the VQGAN-CLIP directory.

Run

To generate images from text, specify your text prompt as shown in the example below:

python generate.py -p "A painting of an apple in a fruit bowl"

Multiple prompts

Text and image prompts can be split using the pipe symbol in order to allow multiple prompts. You can also use a colon followed by a number to set a weight for that prompt. For example:

python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25"

Image prompts can be split in the same way. For example:

python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"

Story mode

Sets of text prompts can be created using the caret symbol, in order to generate a sort of story mode. For example:

python generate.py -p "A painting of a sunflower|photo:-1 ^ a painting of a rose ^ a painting of a tulip ^ a painting of a daisy flower ^ a photograph of daffodil" -cpe 1500 -zvid -i 6000 -zse 10 -vl 20 -zsc 1.005 -opt Adagrad -lr 0.15 -se 6000

"Style Transfer"

An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:

python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25
Example output styles (images omitted): Picasso, Sketch, Psychedelic.

A video style transfer effect can be achieved by specifying a directory of video frames in video_style_dir. Output will be saved in the steps directory, using the original video frame filenames. You can also use this as a sort of "batch mode" if you have a directory of images you want to apply a style to. It can also be combined with Story Mode if you don't wish to apply the same style to every image, but would rather roll through a list of styles.
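A rough sketch of the workflow, assuming ffmpeg is installed and that the long-form --video_style_dir flag matches the option name (check python generate.py -h for the exact spelling; input.mp4 and the frame pattern are placeholders):

mkdir video_frames
ffmpeg -i input.mp4 video_frames/frame_%04d.png
python generate.py -p "A painting in the style of Picasso" --video_style_dir video_frames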

Feedback example

By feeding back the generated images and making slight changes, some interesting effects can be created.

The example zoom.sh shows this by applying a zoom and rotate to generated images before feeding them back in again. To use zoom.sh, specify a text prompt, output filename and number of frames. E.g.

./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150

If you don't have ImageMagick installed, you can install it with sudo apt install imagemagick

There is also a simple zoom video creation option available. For example:

python generate.py -p "The inside of a sphere" -zvid -i 4500 -zse 20 -vl 10 -zsc 0.97 -opt Adagrad -lr 0.15 -se 4500

Random text example

Use random.sh to make a batch of images from random text. Edit the text and number of generated images to your taste!

./random.sh

Advanced options

To view the available options, use "-h".

python generate.py -h
usage: generate.py [-h] [-p PROMPTS] [-ip IMAGE_PROMPTS] [-i MAX_ITERATIONS] [-se DISPLAY_FREQ]
[-s SIZE SIZE] [-ii INIT_IMAGE] [-in INIT_NOISE] [-iw INIT_WEIGHT] [-m CLIP_MODEL]
[-conf VQGAN_CONFIG] [-ckpt VQGAN_CHECKPOINT] [-nps [NOISE_PROMPT_SEEDS ...]]
[-npw [NOISE_PROMPT_WEIGHTS ...]] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-sd SEED]
[-opt {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}] [-o OUTPUT] [-vid] [-zvid]
[-zs ZOOM_START] [-zse ZOOM_FREQUENCY] [-zsc ZOOM_SCALE] [-cpe PROMPT_FREQUENCY]
[-vl VIDEO_LENGTH] [-ofps OUTPUT_VIDEO_FPS] [-ifps INPUT_VIDEO_FPS] [-d]
[-aug {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]]
[-cd CUDA_DEVICE]
optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height) (default: [512, 512])
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -in INIT_NOISE, --init_noise INIT_NOISE
                        Initial noise image (pixels or gradient)
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model (e.g. ViT-B/32, ViT-B/16)
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -sd SEED, --seed SEED
                        Seed
  -opt, --optimiser {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}
                        Optimiser
  -o OUTPUT, --output OUTPUT
                        Output file
  -vid, --video         Create video frames?
  -zvid, --zoom_video   Create zoom video?
  -zs ZOOM_START, --zoom_start ZOOM_START
                        Zoom start iteration
  -zse ZOOM_FREQUENCY, --zoom_save_every ZOOM_FREQUENCY
                        Save zoom image iterations
  -zsc ZOOM_SCALE, --zoom_scale ZOOM_SCALE
                        Zoom scale
  -cpe PROMPT_FREQUENCY, --change_prompt_every PROMPT_FREQUENCY
                        Prompt change frequency
  -vl VIDEO_LENGTH, --video_length VIDEO_LENGTH
                        Video length in seconds
  -ofps OUTPUT_VIDEO_FPS, --output_video_fps OUTPUT_VIDEO_FPS
                        Create an interpolated video (Nvidia GPU only) with this fps (min 10. best set to 30 or 60)
  -ifps INPUT_VIDEO_FPS, --input_video_fps INPUT_VIDEO_FPS
                        When creating an interpolated video, use this as the input fps to interpolate from (>0 & <ofps)
  -d, --deterministic   Enable cudnn.deterministic?
  -aug, --augments {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]
                        Enabled augments
  -cd CUDA_DEVICE, --cuda_device CUDA_DEVICE
                        Cuda device to use
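As an illustrative combination of these options (the prompt and values are arbitrary), the following would render a 10-second zoom video and then produce 60 fps output interpolated from 15 fps input on an Nvidia GPU:

python generate.py -p "A city made of glass" -zvid -i 3000 -zse 10 -vl 10 -zsc 1.02 -ifps 15 -ofps 60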

Troubleshooting

CUSOLVER_STATUS_INTERNAL_ERROR

For example:

RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)

Make sure you have specified the correct size for the image.

RuntimeError: CUDA out of memory

For example:

RuntimeError: CUDA out of memory. Tried to allocate 150.00 MiB (GPU 0; 23.70 GiB total capacity; 21.31 GiB already allocated; 78.56 MiB free; 21.70 GiB reserved in total by PyTorch)

Your request doesn't fit into your GPU's VRAM. Reduce the image size and/or number of cuts.
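For example, a lower-memory variation of the earlier example (smaller image, fewer cuts):

python generate.py -p "A painting of an apple in a fruit bowl" -s 256 256 -cuts 16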

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Katherine Crowson - https://github.com/crowsonkb

Public Domain images from Open Access Images at the Art Institute of Chicago - https://www.artic.edu/open-access/open-access-images

vqgan-clip's People

Contributors

caladri, drjkl, microraptor, nerdyrodent, thehappydinoa

vqgan-clip's Issues

requirements.txt

Hey! Would you mind adding a requirements.txt? I'm really just looking for the version #s of the relevant repos that are used here. It should be straightforward to extract from the output of "pip freeze". Thanks in advance!
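(For reference, a pinned list like this can be captured from a known-good environment with pip freeze:)

pip freeze > requirements.txt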

About args.augment: what are these parameters for?

I pulled the new code and found that many new parameters have appeared. I would like to know what these parameters do. I am looking forward to your reply.

This error occurred when I adjusted the code so that the output path could be something other than output.png, but I could not solve it. The same error also occurred with a freshly downloaded copy of the source code, so I do not know where I went wrong. (screenshot attached)

problems with video generation

When I try running the example line python generate.py -p "The inside of a sphere" -zvid -i 4500 -zse 20 -vl 10 -zsc 0.97 -opt Adagrad -lr 0.15 -se 4500, I receive an error that states:

  File "C:\Users\user\VQGAN-CLIP\generate.py", line 808, in <module>
    p = Popen(['ffmpeg',
  File "C:\Users\user\anaconda3\envs\vqgan\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\user\anaconda3\envs\vqgan\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I'm not really sure where to go with this error, and I don't know if this is a problem on my side or something with the program.
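This error usually means Windows cannot find ffmpeg on the PATH; one possible fix, assuming you are working inside the conda environment (not an official instruction from this repo), is to install it from conda-forge:

conda install -c conda-forge ffmpeg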

custom filename and location for video is giving errors

The -o flag works properly for image generation, but there is no specific information available on how to create a video with a custom name. Providing a file name with any extension results in the following error:

ValueError: unknown file extension: .png'

On Windows we cannot use the zoom.sh script in the conda prompt, so I am using the command:

python generate.py -p "An apple in a bowl" -zvid -i 2000 -vl 10 -o "output/test.mp4"

Saving each iteration to create a video

Is there a way I can save each image along the way rather than just the final output, and then use ffmpeg to combine the images into an animation? I've got it working on my PC! I'm just interested in that feature as I can't get 900x900 on Colab. Thanks!
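One possible approach, assuming the frames written by -vid land in the steps directory and are numbered sequentially (the filename pattern below is a guess; adjust it to match what you actually see on disk):

python generate.py -p "A painting of an apple in a fruit bowl" -i 500 -vid
ffmpeg -framerate 30 -i steps/%d.png -c:v libx264 -pix_fmt yuv420p animation.mp4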

requires_grad_ is not supported on ScriptModules

using cuda 11.2, built torch from source

Traceback (most recent call last):
  File "/home/julianallchin/github/VQGAN-CLIP/generate.py", line 548, in <module>
    perceptor = clip.load(args.clip_model, jit=jit)[0].eval().requires_grad_(False).to(device)
  File "/home/julianallchin/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/jit/_script.py", line 915, in fail
    raise RuntimeError(name + " is not supported on ScriptModules")
RuntimeError: requires_grad_ is not supported on ScriptModules

not an issue -

In your YouTube videos you have an Ubuntu desktop showing some heads-up display.
Does it show the GPU? What's it called?

Can't go past iteration 1500

A temporary solution is to pass in a 'cpe' argument that is any value greater than the 'i' argument. As in:

python generate.py -i 2000 -p "vase" -cpe 3000

The error message points to generate.py line 620. The code is trying to go to the next story step when the prompt is a single sentence.

About code implementation Feedback example

  1. I particularly like this example; it is a great discovery. Could you show how to do this in code? I'm running under Windows, so I can't run zoom.sh.
  2. Is there a way to generate text prompts automatically, to replace random.sh? I wonder if I can generate them myself.

Why do I get different outputs when using the same input on a different repo?

I got an amazing output on the Google Colab notebook and am trying to replicate it at a larger scale running locally on my 3090. For some reason the outputs appear to have a different style than those from Colab (I'm using the same model, prompt, seed and save interval).

Is there something that's been altered? Is it to do with the optimizer or learning rate? (These can't be specified on the Colab notebook.)

Thanks a lot for this bit of software; it has given me hours of experimenting and fun!

seed argument on random.sh

When I use '--seed 42' on generate.py it performs as expected, but when using random.sh it doesn't appear to be using seed 42, or at least the print command isn't listing the same value. It doesn't make sense that it's not behaving the same. Any ideas?

Problem unidentified by a newbie (me)

Hello, I followed your video (thanks a lot by the way; it seems like I did not follow it well, actually).
Maybe you'll understand what I can do at this point:

(vqgan) C:\Users\Milaj\github\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
Traceback (most recent call last):
File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 466, in
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 436, in load_vqgan_model
config = OmegaConf.load(config_path)
File "C:\Users\Milaj\anaconda3\envs\vqgan\lib\site-packages\omegaconf\omegaconf.py", line 183, in load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Milaj\github\VQGAN-CLIP\checkpoints\vqgan_imagenet_f16_16384.yaml'

Thank you in advance, tell me if you need more infos

Requirements

Are there any specific requirements to get this working? i.e. do I need an NVIDIA GPU / CUDA?

RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`

Hey!

Thanks for this, I am so ready to create bizarreness.

Hardware:
Ryzen 7 3700X
32GB RAM
RTX 2070 Super

OS: Windows 10 Pro

I'm getting the below error when running generate.py:

python generate.py -p "Yee"

Output:
(vqgan) PS C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP> python generate.py -p "Yee"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torchvision\transforms\transforms.py:280: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Using device: cuda:0
Optimising using: Adam
Using text prompts: ['Yee']
Using seed: 329366907029900
0it [00:01, ?it/s]
Traceback (most recent call last):
  File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 461, in <module>
    train(i)
  File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 444, in train
    lossAll = ascend_txt()
  File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 423, in ascend_txt
    iii = perceptor.encode_image(normalize(make_cutouts(out))).float()
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 241, in forward
    batch = self.augs(torch.cat(cutouts, dim=0))
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 245, in forward
    output = self.apply_func(in_tensor, in_transform, self._params, return_transform)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 210, in apply_func
    output[to_apply] = self.apply_transform(in_tensor[to_apply], params, trans_matrix[to_apply])
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\augmentation.py", line 684, in apply_transform
    return warp_affine(
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\imgwarp.py", line 192, in warp_affine
    dst_norm_trans_src_norm: torch.Tensor = normalize_homography(M_3x3, (H, W), dsize)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\homography_warper.py", line 380, in normalize_homography
    src_pix_trans_src_norm = _torch_inverse_cast(src_norm_trans_src_pix)
  File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\utils\helpers.py", line 48, in _torch_inverse_cast
    return torch.inverse(input.to(dtype)).to(input.dtype)
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)

No module named 'CLIP'

After following your video (using the conda approach, making the environment, updating it with the .yml and getting torch==1.9.0), I am getting the following error from generate.py:

ModuleNotFoundError: No module named 'CLIP'

I even tried installing the CLIP repo via pip before re-installing torch and everything else, but it didn't work...

I am sure this is a silly issue

How to save generated art?

Hi there! I'm completely new to this.
Where are the images saved after generation? Sorry if this question is stupid.

attempting to spit out an image in the style of....

(Reference image: 7531b6c8513bbbbdd5913cb396f5f221.png)

I use this
python generate.py --image_prompts '/home/jp/Desktop/7531b6c8513bbbbdd5913cb396f5f221.png' --prompts "dream" -i 1000

but the results are not quite there.

I'll have a play with the iw parameter and see if I can get a look and feel like the original image.

Simple zoom option zooms into corner?

Excuse my tech illiteracy if this is obvious. I'm on Windows. I'm trying to combine a zoom with the "storyboard" mode, with multiple sequential text inputs. Currently I only know how to do this with the simple built-in zoom option (as opposed to zoom.sh), but it zooms into the bottom-right corner by, apparently, displacing the entire image up and left 5 pixels per iteration. Is there a solution to this?

Most likely unimportant, but here's what I use:
python generate.py -p "Roses|photo:-1 ^ Sunflowers ^ Daisies ^ Daffodils" -cpe 1500 -zvid -i 6000 -zse 10 -vl 20 -zsc 1.005 -opt Adagrad -lr 0.15 -se 6000 -s 250 250

yaml.scanner.ScannerError: mapping values are not allowed here

(base) PS C:\Users\Alex\vqgan-clip> python generate.py -p "A painting of an apple in a fruit bowl"
Traceback (most recent call last):
  File "generate.py", line 546, in <module>
    model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
  File "generate.py", line 516, in load_vqgan_model
    config = OmegaConf.load(config_path)
  File "C:\Users\Alex\anaconda3\lib\site-packages\omegaconf\omegaconf.py", line 184, in load
    obj = yaml.load(f, Loader=get_yaml_loader())
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\__init__.py", line 114, in load
    return loader.get_single_data()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 58, in compose_document
    self.get_event()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 118, in get_event
    self.current_event = self.state()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 193, in parse_document_end
    token = self.peek_token()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 129, in peek_token
    self.fetch_more_tokens()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 223, in fetch_more_tokens
    return self.fetch_value()
  File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 577, in fetch_value
    raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here
  in "C:\Users\Alex\vqgan-clip\checkpoints\vqgan_imagenet_f16_16384.yaml", line 43, column 15

Error on taming.models

Please consider adding to your setup instructions:

I was receiving the error:

  File "generate.py", line 18, in <module>
    from taming.models import cond_transformer, vqgan
ModuleNotFoundError: No module named 'taming'

Solution:
pip3 install taming-transformers

Multi GPU ?

Could not make this run on multiple GPUs. Would love some help!

Magic Key Words

Are there any good resources of key words that work well with VQGAN+CLIP?

I compiled some I heard so far:

  • unreal engine | hyperrealistic | vray
  • trending on artstation
  • photorealistic
  • render
  • psychedelic | surreal | weird
  • pencil art sketch
  • drawn by a child
  • in the style of xxx
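For example, keywords can simply be appended to a prompt as extra weighted segments (the prompt below is arbitrary):

python generate.py -p "A castle on a hill | unreal engine | trending on artstation:0.5"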

Since this isn't really an issue, perhaps opening up Github discussions in this repo would be better for these kinds of topics.

No such file or directory: 'ffmpeg'

When I run the generate.py script, I get the following error when the video is being made.

Generating video...
Traceback (most recent call last):
  File "/home/ubuntu/VQGAN-CLIP/generate.py", line 581, in <module>
    p = Popen(['ffmpeg',
  File "/home/ubuntu/anaconda3/envs/vqgan/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/ubuntu/anaconda3/envs/vqgan/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
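One likely fix on Ubuntu, assuming ffmpeg simply isn't installed, is:

sudo apt install ffmpeg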

I keep getting a traceback trying to make a video

At line 988 I get "AttributeError: 'int' object has no attribute 'stdin'":

ffmpeg command failed - check your installation
0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Caleb\VQGAN-CLIP\generate.py", line 988, in <module>
    im.save(p.stdin, 'PNG')
AttributeError: 'int' object has no attribute 'stdin'

Maybe I'm trying to make a video wrong, but the issue persists even with the provided example of the telephone box.

CUDA out of memory.

How can I fix this?

"CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 2.00 GiB total capacity; 1.13 GiB already allocated; 0 bytes free; 1.16 GiB reserved in total by PyTorch)"

I understand that I need to allocate more memory or change the batch parameters, but in which file should I change it? Or what command should I use? I'm a newbie, btw...

Model mirrors have been removed

First of all, thank you so much for this notebook. It's my favorite version of the VQGAN + CLIP notebooks out there 😊.

As noted by @nerdyrodent in a previous issue, since a couple of days ago, no matter what model you choose to download you'll get the message Could not resolve host: mirror.io.community.

Accidental multi-GPU?

I have a cut of this code from a week or two ago.

Funnily enough I also added the option to run it on another GPU. When I do choose cuda:1, though, I get 2GB allocated on cuda:0 although that device is not specified anywhere in generate.py. Combined with disabling ECC (nvidia-smi -i 1 -e 0) this is Fine, because I can get over 912KibiPixels (1280x720 or 1488x624), but it would be good to understand what, why and how.
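One possible workaround, assuming the stray allocation comes from a CUDA context being initialised on the default device, is to hide GPU 0 from PyTorch so that the remaining card appears as cuda:0:

CUDA_VISIBLE_DEVICES=1 python generate.py -p "A painting of an apple in a fruit bowl" -cd cuda:0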

Central Park - exploration

Prompt explorations (result images omitted):

central park concept art
central park photoillustration
central park watercolor

I think this was "nyc,drone,tiltshift,behance hd"

"central park, 35mm film"

Quite impressed by prompting a location and passing in "drone": "vernazza,italy drone"

Error message about conda activate

Error message about conda activate:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash

But if I try to run conda init I get this:

conda init
no change     /root/anaconda3/condabin/conda
no change     /root/anaconda3/bin/conda
no change     /root/anaconda3/bin/conda-env
no change     /root/anaconda3/bin/activate
no change     /root/anaconda3/bin/deactivate
no change     /root/anaconda3/etc/profile.d/conda.sh
no change     /root/anaconda3/etc/fish/conf.d/conda.fish
no change     /root/anaconda3/shell/condabin/Conda.psm1
no change     /root/anaconda3/shell/condabin/conda-hook.ps1
no change     /root/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /root/anaconda3/etc/profile.d/conda.csh
no change     /root/.bashrc
No action taken.

And conda activate vqgan still does not work.
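One workaround that often helps, assuming the Anaconda paths shown above, is to source the conda hook in the current shell before activating:

source /root/anaconda3/etc/profile.d/conda.sh
conda activate vqgan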

RuntimeError: requires_grad_ is not supported on ScriptModules

I don't know what happened; I had a working setup, then was tinkering with Facebook faiss and gcc, and now I hit this problem.

python generate.py -p "The fashion of tomorrow"
/home/jp/Documents/gitWorkspace/VQGAN-CLIP/CLIP/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
  warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
  File "generate.py", line 361, in <module>
    perceptor = clip.load(args.clip_model, jit=jit)[0].eval().requires_grad_(False).to(device)
  File "/home/jp/miniconda3/lib/python3.8/site-packages/torch/jit/_script.py", line 919, in fail
    raise RuntimeError(name + " is not supported on ScriptModules")
RuntimeError: requires_grad_ is not supported on ScriptModules

I'm on 1.10 nightly build of pytorch.

>>> print(torch.__version__)
1.10.0.dev20210715+cu111
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> exit()

_pickle.UnpicklingError: invalid load key, 'm'.

I installed it yesterday on my work machine and it worked just as it should.

Today I tried to install it on my home machine, but I get the following error:

vstil@DESKTOP-R251CM7 MINGW64 ~/VQGAN-CLIP (main)
$ python generate.py -se 1 -p "a cat"
C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
  warnings.warn(
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
  File "C:\Users\vstil\VQGAN-CLIP\generate.py", line 546, in <module>
    model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
  File "C:\Users\vstil\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
    model.init_from_ckpt(checkpoint_path)
  File "taming-transformers\taming\models\vqgan.py", line 45, in init_from_ckpt
    sd = torch.load(path, map_location="cpu")["state_dict"]
  File "C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'm'.

I already tried to redownload the VQGAN-CLIP repo to no avail...

Any help would be greatly appreciated!

which CUDA version is required for pytorch here?

I'm getting UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:115.). I do have a GPU, but I'm using CUDA version 8 (it's a shared lab machine).

Is the old CUDA version why I get the above error? Any way to fix this, apart from setting up a brand new system?
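You can check which CUDA version your installed PyTorch was built against, and whether it can see a GPU, with a quick one-liner:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"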

Wikiart checkpoint issue

If I specify the wikiart_16384 checkpoint, the following error occurs:

Traceback (most recent call last):
  File "C:\Development\ml\VQGAN-CLIP\generate.py", line 364, in <module>
    model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
  File "C:\Development\ml\VQGAN-CLIP\generate.py", line 338, in load_vqgan_model
    model.init_from_ckpt(checkpoint_path)
  File "C:\Development\ml\VQGAN-CLIP\taming-transformers\taming\models\vqgan.py", line 52, in init_from_ckpt
    self.load_state_dict(sd, strict=False)
  File "C:\ProgramData\Miniconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
        size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([512, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([1, 256, 4, 4]).

Is there a way to specify the initial model shape?
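One thing worth checking (a guess based on the size mismatch rather than a confirmed fix) is that the config and checkpoint passed to generate.py belong to the same wikiart_16384 model; the filenames below are placeholders for wherever you saved them:

python generate.py -p "A painting of an apple in a fruit bowl" -conf checkpoints/wikiart_16384.yaml -ckpt checkpoints/wikiart_16384.ckpt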

Improvement: Add cog files

https://github.com/replicate/cog makes it easy to build Docker containers for machine learning. A cog.yaml has to be configured and the interface code written, which looks pretty straightforward. The project could probably also be added here: https://replicate.ai/explore
Anyone who has Docker installed could then run it on their system as easily as executing something like this:

docker run -d -p 5000:5000 r8.im/nerdyrodent/VQGAN-CLIP@sha256:fe8d040a80609ff5643815e28bc3c488faf8870d968f19e045c4d0e043ffae59
curl http://localhost:5000/predict -X POST -F p="A painting of an apple in a fruit bowl"

memory issues

I've been trying to generate larger resolution images, but no matter what size GPU I use, I get a message like the one below, where it seems PyTorch is using a massive amount of the available memory. Any advice on how to go about creating larger images?

GPU 0; 31.75 GiB total capacity; 29.72 GiB already allocated; 381.00 MiB free; 29.94 GiB reserved in total by PyTorch

Error when running in CPU mode

Bug

I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this against my CPU.

To reproduce

$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu

Gives

Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
  File "/home/daniel/repos/vqgan-clip/generate.py", line 633, in <module>
    embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 344, in encode_text
    x = self.transformer(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 199, in forward
    return self.resblocks(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 186, in forward
    x = x + self.attention(self.ln_1(x))
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 183, in attention
    return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1031, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 5082, in multi_head_attention_forward
    attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 4828, in _scaled_dot_product_attention
    attn = softmax(attn, dim=-1)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 1679, in softmax
    ret = input.softmax(dim)
RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'

Expected behavior

No error; generate an output image.

Additional notes

  • I followed the setup described in the readme (kudos - it's very thorough!)
  • Image generation using my GPU works fine, i.e. without the -cd cpu parameter

Environment

Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1 
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-88-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.9.0+cu111
[pip3] torch-optimizer==0.1.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.10.0+cu111
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] pytorch-lightning         1.4.9                    pypi_0    pypi
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torch-optimizer           0.1.0                    pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchmetrics              0.5.1                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

RuntimeError: Error(s) in loading state_dict for VQModel

So I'm trying to be brave and set this up on my Windows 10 machine running Conda, since my Titan RTX GPU is on that box. I was able to install everything without any issues, but when I try to run the example it bails out. I'm not 100% sure what the error is.

(vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> python generate.py -p "A painting of an apple in a fruit bowl"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
  File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 546, in <module>
    model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
  File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
    model.init_from_ckpt(checkpoint_path)
  File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\taming\models\vqgan.py", line 48, in init_from_ckpt
    self.load_state_dict(sd, strict=False)
  File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
        size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
        size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
(vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> ls


    Directory: C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         9/30/2021   3:52 PM                checkpoints
d-----         9/30/2021   3:23 PM                CLIP
d-----         9/30/2021   3:19 PM                samples
d-----         9/30/2021   3:54 PM                taming
d-----         9/30/2021   3:23 PM                taming-transformers
-a----         9/30/2021   3:19 PM            190 .gitignore
-a----         9/30/2021   3:19 PM           5277 download_models.sh
-a----         9/30/2021   3:19 PM          42380 generate.py
-a----         9/30/2021   3:19 PM           1095 LICENSE
-a----         9/30/2021   3:19 PM           1592 opt_tester.sh
-a----         9/30/2021   3:19 PM           1474 random.sh
-a----         9/30/2021   3:19 PM          13240 README.md
-a----         9/30/2021   3:19 PM           1187 requirements.txt
-a----         9/30/2021   3:19 PM           1544 video_styler.sh
-a----         9/30/2021   3:19 PM           2376 vqgan.yml
-a----         9/30/2021   3:19 PM           1444 zoom.sh

Idea, if we're being extra arty about videos.

Another change I've made for myself is to break every n iterations (after checkin) and await user input. If I input Y it reloads the image from disk and reinitialises the optimiser (the same as you do for a zoom video). This way I can "guide" it quite forcefully: if I want a skull with glowing blue eyes, and the blue eyes are not picked up from the init image (or have dissolved into nothing) by the 50th step, I can paint them in. I can also "promote" features in the output by exaggerating their presence.

Since we're reinitialising the optimiser, we can presumably also switch up the prompts 'in the middle' of the run, when loss has 'stabilised'? Depending on how far you want to take this (and I'll be doing my own experimentation) maybe we can draw up a timeline and construct a video based on prompts that change over time.

Can't generate video

I get this error when using -vid

Traceback (most recent call last):
  File "C:\Users\vanceagher\vqgan\generate.py", line 669, in <module>
    p = Popen(['ffmpeg',
  File "C:\Users\vanceagher\anaconda3\envs\vqgan\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\vanceagher\anaconda3\envs\vqgan\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Model Not Loading

What do these lines mean and why aren't they working?

FileNotFoundError                         Traceback (most recent call last)
<ipython-input> in <module>()
      3 #@markdown Once this has been run successfully you only need to run parameters and then the program to execute with new parameters
      4 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
----> 5 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      6 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
      7

/usr/local/lib/python3.7/dist-packages/omegaconf/omegaconf.py in load(file_)
    181
    182     if isinstance(file_, (str, pathlib.Path)):
--> 183         with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    184             obj = yaml.load(f, Loader=get_yaml_loader())
    185     elif getattr(file_, "read", None):

FileNotFoundError: [Errno 2] No such file or directory: '/content/vqgan_imagenet_f16_16384.yaml'

Tensor is not a torch image

Hi. Thanks for the repo. I was just trying to test it, but I keep running into this:

Traceback (most recent call last):
  File "/home/paperspace/vqgan-clip/generate.py", line 552, in <module>
    train(i)
  File "/home/paperspace/vqgan-clip/generate.py", line 535, in train
    lossAll = ascend_txt()
  File "/home/paperspace/vqgan-clip/generate.py", line 514, in ascend_txt
    iii = perceptor.encode_image(normalize(make_cutouts(out))).float()
  File "/home/paperspace/anaconda3/envs/vqgan/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 163, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/paperspace/anaconda3/envs/vqgan/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 201, in normalize
    raise TypeError('tensor is not a torch image.')
TypeError: tensor is not a torch image.

Any idea how to fix it? Really appreciate any help.

Fixed

Hi there, just trying to generate video using the -vid argument but getting the following error

Error when trying to generate image (noob) any help would be appreciated

(vqgan) D:\art\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
File "D:\art\VQGAN-CLIP\generate.py", line 546, in
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "D:\art\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
model.init_from_ckpt(checkpoint_path)
File "D:\art\VQGAN-CLIP\taming-transformers\taming\models\vqgan.py", line 52, in init_from_ckpt
self.load_state_dict(sd, strict=False)
File "D:\ana3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
