tobran / DF-GAN
A Simple and Effective Baseline for Text-to-Image Synthesis (CVPR 2022 oral)
License: Other
Can someone share the coco_val256_FIDK0.npz file? Thank you!
I am currently studying and trying to reproduce the DF-GAN code, but I don't know how to run the FID evaluation. When I prepare the evaluation using the GitHub code and the provided model (netG_600.pth), 2928 images are generated. However, when I score them with DM-GAN's FID code as described on GitHub, I only get a score of 55, so I am worried that I did something wrong. The command I used to get the FID score was: python fid_score.py --gpu 0 --batch-size 24 --path1 bird_val.npz --path2 ../../test/valid/single
What does the model predict during pretraining, and how is the loss computed?
How can the IS metric be reproduced with this code?
Hey @tobran ,
Big fan of your work; I have been following it for the past 3 months. I have implemented the DF-GAN and CLIP paper code, and both are excellent.
I have a question regarding what values are stored in the bird_val256_FIDK0.npz file. I know they are used for the FID score calculation, but can you tell me how exactly they are computed, and whether I should change them if I change my text encoder?
Hoping for a prompt response.
Thank You
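For background, reference-statistics files of this kind conventionally store the mean and covariance of Inception-v3 pool3 activations over the real validation images; this is how pytorch-fid and the StackGAN-family FID scripts lay them out, and it is an assumption that DF-GAN's bird_val256_FIDK0.npz follows the same convention. A minimal sketch with random vectors standing in for real Inception features:

```python
import numpy as np

# Sketch of producing FID reference statistics. Real code would run the
# validation images through Inception-v3 and take the 2048-d pool3
# activations; here random vectors stand in for those features.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 2048))  # one activation per real image

mu = np.mean(features, axis=0)               # (2048,)
sigma = np.cov(features, rowvar=False)       # (2048, 2048)
np.savez("bird_val256_FIDK0_example.npz", mu=mu, sigma=sigma)

stats = np.load("bird_val256_FIDK0_example.npz")
print(stats["mu"].shape, stats["sigma"].shape)
```

If that layout holds, the statistics depend only on the real image set and the Inception weights, so changing the text encoder should not require regenerating the file.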
My IP address is blocked by Google Drive. Could anyone share a new copy of the pretrained COCO encoder, or upload it to Baidu Yun?
Hi, I have cloned the repository and have been following the steps mentioned in the readme.md file.
While following the steps: The text encoder (pretrained for CUB dataset) is giving the below error:
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
    size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([1, 300]).
Can anyone share some thoughts and a solution for this?
I understand there is a shape mismatch, but I am unable to pinpoint exactly where it happens.
Package versions with Python 3.7:
torch 1.9.0+cu102, torchsummary 1.5.1, torchtext 0.10.0, torchvision 0.10.0+cu102
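One plausible cause of the [1, 300] shape (an assumption; the error can have other sources): in the AttnGAN-style code this repo inherits, RNN_ENCODER sizes its embedding from the dataset's n_words, so if the captions pickle is missing or the data path is wrong, the vocabulary can default to 1 and the checkpoint no longer fits. A minimal reproduction of the symptom:

```python
import torch
import torch.nn as nn

# The checkpoint was trained with a 5450-word vocabulary; an embedding
# built with the wrong vocab size cannot load it.
ckpt = {"weight": torch.zeros(5450, 300)}  # stands in for encoder.weight

mismatched = False
try:
    nn.Embedding(1, 300).load_state_dict(ckpt)   # wrong n_words -> error
except RuntimeError as e:
    mismatched = "size mismatch" in str(e)
print("size mismatch reproduced:", mismatched)

good = nn.Embedding(5450, 300)                   # correct n_words
good.load_state_dict(ckpt)                       # loads cleanly
```

So the thing to check is that the dataset loads successfully and reports n_words = 5450 before RNN_ENCODER is constructed.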
I have trained the model as per the instructions, but now I am stuck: how do I generate images from text alone? There are no clear instructions for this in the README.
On the CUB dataset, we used the pretrained model provided on the official page for testing; the calculated FID score was 26.89. However, the result reported in the paper is 14.81. May I ask what causes such a big difference?
Hello, when I reproduce the code you provided, the FID metric is fine, but the IS value does not reach 5.1.
You said that IS can be overfitted by jointly training with Inception-V3. What does that mean? Could you please explain?
[1/1][365/368] Loss_D: 1.329 Loss_G 1.297
[1/1][366/368] Loss_D: 1.162 Loss_G 3.206
[1/1][367/368] Loss_D: 1.162 Loss_G 2.319
Traceback (most recent call last):
  File "main.py", line 273, in <module>
    count = train(dataloader, netG, netD, text_encoder, optimizerG, optimizerD, state_epoch, batch_size, device)
  File "main.py", line 179, in train
    return count
NameError: name 'count' is not defined
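A likely reading of the traceback (an assumption based only on the line numbers shown): count is bound by the epoch loop, e.g. for count, data in enumerate(dataloader):, so if the dataloader yields no batches, say because the data path is wrong, return count hits an unbound name. A defensive sketch of the fix:

```python
# Simplified stand-in for the train() in main.py; the real signature
# takes the dataloader, networks, and optimizers.
def train(dataloader):
    count = 0  # bound even if the dataloader yields no batches
    for count, data in enumerate(dataloader):
        pass  # ... forward/backward steps elided ...
    return count

print(train([]))         # no batches: returns 0 instead of NameError
print(train([1, 2, 3]))  # returns the last batch index, 2
```

If the dataloader really is empty, the fix above only hides the underlying problem, so it is also worth checking that the dataset path resolves and len(dataloader) is nonzero.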
Hello, thank you for all the hard work you've done.
I got interested in your work and read the code and the paper meticulously.
However, I'm still struggling to understand what the image_encoder is for, so it would be really helpful if you could give a detailed explanation, and I apologise in advance if I have missed an obvious answer!
In code/lib/prepare.py we have the function:

    def prepare_models(...):
        ...
        return image_encoder, ...

But image_encoder seems to never be used by any module.
Thank you !
Hello. The FID reported in the paper for the final model on the COCO dataset is 19.32, but I downloaded the checkpoint provided in your README and ran bash scripts/calc_fid.sh ./cfg/coco.yml to measure FID on COCO val2014, and got 15.59.
I set everything up step by step in exactly the order given in the README. The only difference is that the PyTorch and torchvision versions used by the codebase do not support my 3090, so I upgraded torch to 1.13.1 and torchvision to 0.14; measured this way, the result is 15.59.
I don't know which step went wrong; I only upgraded the torch version to match my machine.
Looking forward to your reply.
Thank you so much for sharing your work and code!
I have noticed that the results and trained models in this repo were updated compared with the results in the paper. The released checkpoints are now ./saved_models/bird/pretrained/state_epoch_1220.pth and ./saved_models/coco/pretrained/state_epoch_290.pth. I am wondering where I can download the checkpoint files at epoch 600 and epoch 120 for CUB and COCO respectively. Since those epoch settings are the defaults for the Text2Image task and, I believe, what you used to produce the results in the paper, they are essential for us to follow your work.
Hi
when I run calc_fid.sh I get the following error:
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
This error also occurs during the FID calculation when I run train.sh.
This error did not occur before.
Can anyone help me?
I got a result of mean: 4.16 std: 0.15 on the CUB dataset, which is much worse than 5.1.
I did not change any code.
Hello,
Thank you for your work.
I have some problems when retraining the model on COCO. The retraining FID is 56.46 using the default settings of coco.yml (batch size = 24, nf = 32), and the environment is Python 3.6, PyTorch 1.5 with a 2080Ti. However, the provided pretrained model achieves 26.68 FID with the same settings. In addition, have you tried the Inception Score on COCO?
I'd really appreciate it if you could give me some advice.
After training the network, how do I generate images from my own captions?
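A rough sketch of what such a sampling script tends to look like in this code family. Every name here (the encoder and generator classes, wordtoix, the embedding sizes) is a stand-in rather than the repository's actual API, and the two modules are tiny stubs so the example runs end to end:

```python
import torch
import torch.nn as nn

# Stub for an RNN text encoder: caption token ids -> sentence embedding.
class StubTextEncoder(nn.Module):
    def __init__(self, n_words=10, emb=256):
        super().__init__()
        self.emb = nn.Embedding(n_words, emb)
    def forward(self, caps, cap_lens):
        return self.emb(caps).mean(dim=1)  # (batch, emb)

# Stub generator: noise + sentence embedding -> image tensor.
class StubGenerator(nn.Module):
    def __init__(self, z_dim=100, emb=256):
        super().__init__()
        self.fc = nn.Linear(z_dim + emb, 3 * 8 * 8)
    def forward(self, noise, sent_emb):
        x = self.fc(torch.cat([noise, sent_emb], dim=1))
        return x.view(-1, 3, 8, 8)  # tiny stand-in for a 256x256 image

def caption_to_image(caption, wordtoix, text_encoder, netG, z_dim=100):
    tokens = [wordtoix[w] for w in caption.lower().split() if w in wordtoix]
    caps = torch.tensor(tokens).unsqueeze(0)            # (1, seq_len)
    sent_emb = text_encoder(caps, torch.tensor([len(tokens)]))
    noise = torch.randn(1, z_dim)
    with torch.no_grad():
        return netG(noise, sent_emb)

wordtoix = {"a": 1, "small": 2, "yellow": 3, "bird": 4}
img = caption_to_image("a small yellow bird", wordtoix,
                       StubTextEncoder(), StubGenerator())
print(img.shape)
```

With the real repo, the stubs would be replaced by the trained RNN_ENCODER and NetG checkpoints, and wordtoix would come from the captions pickle.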
@tobran, what are the different pickle files used in the project, like the caption_DAMSM pickle?
Hello,
First of all, thank you for your work, but I have a few questions about the code:
1. The activation function in DFBlock: the paper shows DFBlock using ReLU, but the provided code uses LeakyReLU.
2. About the provided pretrained model: the IS measured with your pretrained model is very low, but a model retrained with your code looks normal. Could there be a problem with the provided pretrained model?
3. Regarding "A titan xp (set nf=32 in *.yaml) or a V100 32GB (set nf=64 in *.yaml)": was the bird IS=4.86 in the paper measured with nf=32 or nf=64? Does the choice of 32 vs 64 make a big difference? Of course, I will also run my own experiments.
That's quite a few questions; I would greatly appreciate any answers.
How many images did you test on the MS-COCO dataset for the FID metric?
There are only 4490 images, from sample_times: 1 * len(filenames) with a test split of 4490.
Shouldn't it be 30000 images?
If I want to use GRU as text encoder instead of LSTM, where should I make changes?
Thank you.
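As a sketch of the change (the class layout below mirrors the AttnGAN-style RNN_ENCODER but is an assumption, not the repository's exact code): the swap is mostly confined to the module construction, plus the hidden-state handling, since nn.GRU returns only h_n where nn.LSTM returns (h_n, c_n):

```python
import torch
import torch.nn as nn

# Hypothetical text encoder with a switchable recurrent cell.
class TextEncoder(nn.Module):
    def __init__(self, n_words, emb_dim=300, hidden_dim=128, rnn_type="GRU"):
        super().__init__()
        self.encoder = nn.Embedding(n_words, emb_dim)
        if rnn_type == "GRU":
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        else:
            self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, captions):
        emb = self.encoder(captions)
        out, hidden = self.rnn(emb)
        # LSTM returns (h_n, c_n); GRU returns h_n only.
        h_n = hidden[0] if isinstance(hidden, tuple) else hidden
        return out, h_n[-1]  # word features, sentence feature

words, sent = TextEncoder(5450, rnn_type="GRU")(torch.randint(0, 5450, (2, 15)))
print(words.shape, sent.shape)
```

If the repo's encoder already branches on an rnn_type flag, the change may be just a config value; otherwise the constructor and the init_hidden/forward state handling are the places to edit.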
How should the dataset be prepared to train a text-to-image model in PyTorch? I have not found an answer to this question in other code. Thank you.
I checked the network-structure code of your v1 version and the current version and found only two differences. When DFBlock fuses text information, the v1 code uses only the sentence vector, while the current version also concatenates the noise. And the gradient update of MA-GP changed from the original step-by-step update to a joint update together with the hinge loss. Apart from these two, are there any other differences?
Transforming an image causes the following error:

    File "datasets.py", line 301, in __getitem__
      self.transform, normalize=self.norm)
    File "datasets.py", line 77, in get_imgs
      ret.append(normalize(img))
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 34, in __call__
      img = t(img)
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 70, in __call__
      img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/PIL/Image.py", line 738, in tobytes
      e.setimage(self.im)
    SystemError: tile cannot extend outside image

It looks like one of the image dimensions becomes zero after the transform, resulting in the above error. Please suggest how to resolve this!
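One common cause and a defensive fix (an assumption; the helper name and the (x1, y1, x2, y2) box format are hypothetical): the CUB bounding-box crop in the data pipeline can produce a degenerate or out-of-bounds region for some images, which is what PIL's "tile cannot extend outside image" complains about. Clamping the box before cropping avoids it:

```python
from PIL import Image

def safe_crop(img, box):
    # Clamp the box to the image bounds before cropping.
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(img.width, x2), min(img.height, y2)
    if x2 <= x1 or y2 <= y1:        # degenerate box: fall back to full image
        return img
    return img.crop((x1, y1, x2, y2))

img = Image.new("RGB", (100, 80))
print(safe_crop(img, (-10, -10, 120, 90)).size)  # clamped to (100, 80)
print(safe_crop(img, (50, 50, 50, 70)).size)     # fallback: (100, 80)
```

Printing the image size right before the failing transform would confirm whether a zero-sized crop is really the trigger here.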
Clicking into the birds data image link gives "Sorry, but the page you were trying to view does not exist." How can this be resolved?
How did you build the text encoder? Also, what are the contents of the metadata files?
Thanks for uploading the new version of the code. I am trying to reproduce it, but found the following problem.
Line 21 of train.py contains:
from lib.datasets import get_fix_data
datasets is also used in modules:
from lib.datasets import TextImgDataset as Dataset
from lib.datasets import prepare_data, encode_tokens
However, I cannot find a datasets file in the lib folder. Was it left out of the upload, or is there a mistake in my reproduction steps?
I used the pretrained model provided by the author and followed the instructions (bash ./scripts/calc_fid.sh ./cfg/bird.yml), but every time I measure an FID score of 32.79. What could be going wrong?
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([5598, 300]).
I would like to ask how to calculate the IS of the Bird dataset. Thank you.
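For reference, the Inception Score itself is straightforward once you have Inception-v3 class probabilities for the generated images: IS = exp(E_x[KL(p(y|x) || p(y))]). A self-contained sketch with random probabilities standing in for real Inception predictions (the repo's own IS script may differ in splits and preprocessing):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (num_images, num_classes) softmax outputs from Inception-v3.
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

rng = np.random.default_rng(0)
logits = rng.standard_normal((500, 1000))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
score = inception_score(probs)
print(score)
```

In practice the standard protocol also splits the generated images into ~10 groups and reports the mean and std of the per-group scores, which is where the "mean: 4.16 std: 0.15" numbers come from.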
Hello, your work is excellent. I would like to ask how to reproduce the model on my own dataset. Could you tell me?