
ResShift's People

Contributors

chenxwh · diveshjain-phy · eltociear · wyf0912 · zsyOAOA


ResShift's Issues

About faceir test results

Hi, thanks for offering such good work.
I'm facing a test problem: the result is a totally black image. Is there a configuration problem for the faceir task?

Question about custom datasets

Hello author, thank you for your outstanding work.
I am trying to train your proposed model on my own dataset.
In the README I noticed that I only need to modify
txt_file_path: [ '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt', ]

But do these two txt files directly contain image paths? And if everything goes into one txt file, how are GT and LQ distinguished?
I would be very grateful if you could clarify.
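
For illustration only, a guess at the format (not confirmed against the repo): such a file may simply list one high-quality image path per line, with the low-quality input synthesized on the fly by the degradation pipeline rather than read from a second list:

    /path/to/my_dataset/train/img_0001.png
    /path/to/my_dataset/train/img_0002.png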

ResShift without the autoencoder

Hi

Thanks for the great work! I have a small query about training the model without the autoencoder. If I directly declare it to be none in the config file, i.e.,

autoencoder: None

it errors out due to other dependencies on the autoencoder config params in the script. I also tried making only the target none and leaving the rest, but that doesn't seem to work either. Could you please guide me on how to train ResShift without the autoencoder, i.e. in image space instead of latent space? Thanks a ton, and I hope to hear from you soon.
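
For reference, a minimal sketch of the kind of guard such a change would need (the helper name instantiate_from_config is an assumption borrowed from similar latent-diffusion codebases, not verified against this repo):

    # Skip all latent-space code paths when the config disables the autoencoder.
    autoencoder_cfg = configs.get('autoencoder', None)
    if autoencoder_cfg is None or autoencoder_cfg.get('target') is None:
        autoencoder = None  # run the diffusion directly in pixel space
    else:
        autoencoder = instantiate_from_config(autoencoder_cfg)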

NameError: name 'vqgan_dir' is not defined

$ CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i input -o output --task realsrx4 --chop_size 512
C:\Python310\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Downloading: "https://github.com/zsyOAOA/ResShift/releases/download/v1.0/resshift_realsrx4_s15.pth" to C:\ResShift-master\weights\resshift_realsrx4_s15.pth

100%|##########| 456M/456M [00:17<00:00, 27.0MB/s]
Traceback (most recent call last):
  File "C:\ResShift-master\inference_resshift.py", line 101, in <module>
    main()
  File "C:\ResShift-master\inference_resshift.py", line 87, in main
    configs, chop_stride, chop_bs = get_configs(args)
  File "C:\ResShift-master\inference_resshift.py", line 56, in get_configs
    model_dir=vqgan_dir,
NameError: name 'vqgan_dir' is not defined

How to fix this? Thanks
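
A hypothetical stopgap (path and placement are assumptions, not the repo's official fix): define the missing variable inside get_configs() before line 56 references it, pointing at the same weights/ folder the SR checkpoint was downloaded to:

    from pathlib import Path  # likely already imported by the script

    vqgan_dir = Path('./weights')  # where the VQGAN checkpoint should be stored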

Gradio version question

I'm not sure which gradio version the author used. The latest version raises AttributeError: module 'gradio' has no attribute 'outputs', while switching to an older version raises AttributeError: module 'gradio' has no attribute 'Image'.
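
For what it's worth, the legacy gradio.inputs/gradio.outputs API was removed in Gradio 4.0, so pinning a 3.x release is a common workaround (the version bound below is illustrative, not taken from the repo's requirements):

    pip install "gradio<4"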

Test error

I want to test faceir. When I run "python inference_resshift.py -i /home/xielangren/project/ResShift/testdata/eval15/low --task faceir --scale 1", I get this error:

Traceback (most recent call last):
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 197, in <module>
    main()
  File "/home/xielangren/project/ResShift/inference_resshift.py", line 170, in main
    resshift_sampler = ResShiftSampler(
  File "/home/xielangren/project/ResShift/sampler.py", line 53, in __init__
    self.setup_dist()  # setup distributed training: self.num_gpus, self.rank
  File "/home/xielangren/project/ResShift/sampler.py", line 72, in setup_dist
    rank = int(os.environ['LOCAL_RANK'])
  File "/home/xielangren/miniconda3/envs/resshift/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'LOCAL_RANK'
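
A minimal workaround sketch (an assumption about intent, not the repo's official fix): fall back to single-process defaults when the script is launched with plain python rather than torchrun, which is what normally sets LOCAL_RANK:

    # sampler.py, setup_dist(): tolerate a non-distributed launch.
    import os

    rank = int(os.environ.get('LOCAL_RANK', 0))        # default to rank 0
    world_size = int(os.environ.get('WORLD_SIZE', 1))  # single process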

Res-Shift weights without VQGAN

Thanks for the good work! I assessed the quality of VQGAN on my data and it was really poor, which caused poor quality when I used your model as well. So I would like to avoid using any autoencoder, and I was wondering if you have released model weights trained without one, since Figure 2 of the official paper says that using an autoencoder is optional. I would really appreciate it; otherwise I would need to train the model from scratch...

Hello, a question about a problem encountered during training

Since I don't have the VQGAN weights, I train the diffusion model directly in pixel space. Around step 1966 the loss becomes NaN. I checked my own data and found no anomalies or outliers. Do you know why this happens?
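
A generic stabilization sketch (not the author's method; compute_loss, model, and optimizer are placeholders): skip optimizer steps whose loss is non-finite and clip gradients, which often helps pixel-space diffusion training:

    import torch

    loss = compute_loss(batch)  # placeholder for the training loss
    if torch.isfinite(loss):
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    optimizer.zero_grad(set_to_none=True)  # also discards a skipped step's grads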

2x Scale

Hello,
I am trying to adapt the model to perform 2x upscaling instead of 4x.
I am referring to this issue: #22, comment: #22 (comment)

The link provided in that comment is broken; could you please re-send it?

Training Datasets Used Question

Dear author,

I hope this message finds you well. While engaging with your work and attempting to replicate the experiments based on the realsr_swinunet_realesrgan256.yaml configuration file provided, I came across a detail regarding the usage of datasets during training. The configuration file lists two dataset paths as follows:

      txt_file_path: [
                      '/mnt/lustre/zsyue/database/ImageNet/train/image_path_all.txt', 
                      '/mnt/lustre/zsyue/database/FFHQ/files_txt/ffhq256.txt',
                     ] 

However, in the relevant section of your paper, while the ImageNet dataset is mentioned as a training resource, there is no explicit indication that the FFHQ dataset is also included. Consequently, I would like to ask which datasets were actually used for the model training and the experimental results reported in your paper. Could you please clarify whether the outcomes presented in your research were derived solely from the ImageNet dataset, or whether they also incorporated data from the FFHQ dataset? Thank you!

How long does it take to train on ImageNet?

Thank you for your awesome research.
I'm trying to train on ImageNet with your code, and it turns out it will take at least 14 days to run 50k iterations.
Can you tell me how long it took you to train?
And is there a technique that can expedite training?
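
One generic speed-up worth checking (not specific to this repo; model, batch, and optimizer are placeholders) is mixed-precision training:

    import torch

    scaler = torch.cuda.amp.GradScaler()
    with torch.autocast(device_type='cuda', dtype=torch.float16):
        loss = model(batch)            # forward pass in fp16 where safe
    scaler.scale(loss).backward()      # scale to avoid fp16 underflow
    scaler.step(optimizer)             # unscale, then step
    scaler.update()
    optimizer.zero_grad(set_to_none=True)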

How to test ResShift on the RealSR x4 dataset?

The LR and GT resolutions are the same in the RealSR x4 [1] dataset.

It doesn't work when I just change the "--scale" param from 4 to 1 in inference.py.

Maybe I should downsample the LR images first?

Looking forward to your reply!

Thank you!

[1]. Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3086–3095, 2019.

Seek help

Hello author!
Can image denoising and enhancement be performed on the following images to enhance their clarity?
Thank you!
[image attached]

Issues about Image Size

Dear author, thanks for your excellent work!
After carefully reading your paper and your code, I see that training is based on the 64-256 SR task. But it seems that you use the pretrained weights (VQGAN encoder/decoder and SwinUnetModel) from the 64-512 SR task in the 128-512 SR task. How does applying the same weights to different tasks work? Or did I misunderstand your work? Thank you!

Compare with StableSR

Hi, thanks for the great work !

ResShift shows great performance compared with BSRGAN, RealESRGAN, SwinIR, LDM, etc. in your paper. Have you ever compared it with StableSR? The comparison should be very interesting.

About batch size vs. microbatch size

What do batch and microbatch in the config file mean? Is batch the total training batch size, and microbatch the batch size assigned to each GPU? In other words, if I train on 8 GPUs with batch set to 64, should microbatch be set to 8? Is my understanding correct? Thanks!
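
For context, a convention common in guided-diffusion-style trainers (an assumption about this repo, not a confirmed answer): the global batch is split evenly across GPUs, and each GPU's share may be split further into microbatches that are gradient-accumulated before the optimizer step:

    global_batch = 64                          # `batch` in the config
    num_gpus = 8
    per_gpu_batch = global_batch // num_gpus   # 8 samples per GPU per step
    microbatch = 4                             # hypothetical value; 8 would mean no accumulation
    accum_steps = per_gpu_batch // microbatch  # 2 forward/backward passes per optimizer step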

Google Colab demo

Hello, your work is very impressive; the quality of the results is really very high.

Could you please make an online demo in Google Colab?
Thank you.

Dataset of Blind Face Restoration

Hi,

Thanks for releasing the code for the journal version.

Could you upload the blind face restoration dataset? I can't find the download link.

Weird pink shade in images

Hi,

I trained on my dataset and got a pink-ish shade in some images, and I don't know the cause of it.
Did this happen to any of you?
Thanks

Discrepancies in CLIPIQA and MUSIQ Scores When Testing ResShift on RealSet65

Hi @zsyOAOA,

I am experiencing inconsistencies in the evaluation metrics while testing ResShift with the RealSet65 dataset. Below is a detailed description of my process and the issues encountered:

  1. Data Verification and Command Execution:

    • Confirmed the presence of the dataset in ./testdata/RealSet65.
    • Ran the ResShift inference using the following command:
      CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i testdata/RealSet65 -o result/RealSet65 --scale 4 --task realsrx4 --chop_size 512
  2. Evaluation Metrics Assessment:

    • Utilized IQA-PyTorch for computing CLIPIQA and MUSIQ metrics.
    • Obtained the following results for the RealSet65 dataset:
      CLIPIQA: 0.6418642669916153 (expected 0.6537)
      MUSIQ: 58.211212921142575 (expected 61.330)
    • Additionally, I observed these results for another subset of RealSR:
      CLIPIQA: 0.5409876523911953 (expected 0.5958)
      MUSIQ: 53.28555391311645 (expected 59.873)
  3. Issue and Inquiry:

    • Despite varying the random seed with the --seed option, the scores did not align with the reported values.
    • This discrepancy persists across different datasets and metrics, prompting me to question if a step was missed or executed incorrectly.

Questions:

  • Could there be an oversight in my testing methodology or a specific procedure I should follow?
  • Is evaluating CLIPIQA and MUSIQ on the Y channel necessary or recommended for accurate results?

I am keen to understand and rectify these discrepancies and would greatly appreciate your insights.

Thank you for your assistance.
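
For reference, a minimal sketch of how such scores are typically computed with IQA-PyTorch (the file path is an assumption):

    import pyiqa
    import torch

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    clipiqa = pyiqa.create_metric('clipiqa', device=device)
    musiq = pyiqa.create_metric('musiq', device=device)

    # pyiqa metrics accept an image path (or a 4-D tensor in [0, 1]).
    print(float(clipiqa('result/RealSet65/example.png')))
    print(float(musiq('result/RealSet65/example.png')))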

Train error

Hello, I am not able to start training and get this error.
Could you please help me?
Greetings
[screenshot attached: Screenshot from 2023-08-21 15-39-42]

Dependencies in environment.yml

I've seen and installed all sorts of projects, including stable diffusion, traiNNer, chaiNNer, and all sorts of upscaler tools.
But in none of them have I seen so many libraries listed as dependencies.
Have you tried removing unnecessary and redundant libraries?
Or do you need every single library ever written for Python just to run/test ResShift?
It's really a tangle of Python libraries of all kinds, and tightly version- and OS-specific too.

About inference time

Thank you for the excellent work! I used 'CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512' to super-resolve ImageNet-Test on a Tesla V100 GPU, and it took about 55 minutes, which is much more than the time reported in the paper. Is something wrong with the inference code? Thank you in advance for your reply.

OpenXLab demo error

@zsyOAOA
Thank you for sharing your work. I am getting errors when trying the demo on OpenXLab. Is there a specific input size or something?

Have you tried training on predicting 'epsilon' rather than 'xstart'?

Very awesome work; it inspired me a lot!

I have a question regarding the experiment on training objectives. Have you tried training on reconstructing 'epsilon'? To me, it's not very intuitive why the model needs to output the same 'x_0' at different time steps.

I would appreciate it if you have further insights!
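
For context, the generic DDPM identity below (background only; ResShift defines its own forward process, which differs) shows that, given x_t, predicting epsilon and predicting x_0 are affinely equivalent, so the choice mainly changes how the loss is weighted across timesteps:

    x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon
    \quad\Longrightarrow\quad
    \hat{x}_0 = \frac{x_t - \sqrt{1 - \bar\alpha_t}\, \hat\epsilon}{\sqrt{\bar\alpha_t}}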

Questions about training configs

Hello, I'd like to ask about the diffusion.step setting for training. I see the default is 15. Will a larger step setting train a better model? I currently set diffusion.step to 50 and also use step=50 for inference, but my trained model's inference quality falls far short of your pretrained model. Should I set diffusion.step smaller during training?

A question about noisy pixels

Thanks for your great work. I find your results are always perceptually better than StableSR's, but some output images have noisy pixels, as shown below. I wonder why this happens, and how I can fix or mitigate this defect by adjusting parameters or re-training the model?
[images attached]

VRAM issue

Hello, can a GPU with 24GB of VRAM train this model?

CUDA out of memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 24.00 GiB total capacity; 6.71 GiB already allocated; 12.96 GiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I only have 1 GPU with 24GB; is there any way to run this?
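
One knob already present in this repo's CLI (see the commands quoted in the other issues above) is --chop_size, which processes the image in tiles; a smaller value such as 256 (assuming it is supported) lowers peak memory at some cost in speed:

    CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i input -o output --task realsrx4 --scale 4 --chop_size 256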

Reproducing the results on ImageNet-Val

Hi. To evaluate our setup, we are trying to reproduce the results mentioned in the paper. To do so, we followed these steps from the README:

  1. Sample 3k images from the ImageNet-Val set using the script (https://github.com/zsyOAOA/ResShift/blob/master/scripts/prepare_testing_imagenet.py)
  2. Generate reconstruction images with inference script and models shared
    CUDA_VISIBLE_DEVICES=gpu_id python inference_resshift.py -i [image folder/image path] -o [result folder] --scale 4 --task realsrx4 --chop_size 512

The PSNR and SSIM from this set don't match the numbers reported in the paper. Can you confirm the steps and point out anything we might be missing?

Can this model improve the resolution of real-person images?

May I ask the author: can this model improve the resolution of real-person images? I have a bunch of screenshots of real-life videos with poor quality. Can I use this model to achieve better image quality?

Asking about the inference time

Hi,

Thank you so much for your contributions!

I'd like to ask about the inference time when using "realsr_swinunet_realesrgan256.yaml".

In particular, it takes me around 6 s to handle an image of size 500x400x3.
My GPU is an RTX 4090 24GB.

The reason I ask is that I ran exactly the same setup yesterday, and it took less than 1 s.

Therefore, I'd like to ask whether this inference time (6 s) is normal, or whether there are any settings I should modify to speed it up.

Ran into a memory allocation problem

I already added a 0.9 memory cap to the run code:

    import torch
    # Set the percentage of GPU memory CUDA may occupy
    torch.cuda.set_per_process_memory_fraction(0.9)  # set to 90% here

But it still reports the error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB (GPU 0; 23.99 GiB total capacity; 7.85 GiB already allocated; 14.25 GiB free; 21.59 GiB allowed; 8.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
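
The error message itself points to another mitigation: setting max_split_size_mb via the PYTORCH_CUDA_ALLOC_CONF environment variable to reduce fragmentation (the value and command below are illustrative):

    PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference_resshift.py -i input -o output --task realsrx4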

test dataset

Can you kindly provide the ImageNet-Test dataset?

Training sf:1 (deblurring)

Thank you for providing your code. I already tested the super-resolution and it works great. Is it possible to adapt the config to do deblurring on lq: 256 x gt: 256, i.e. without any super-resolution? What would I have to change?
