tobran / DF-GAN
A Simple and Effective Baseline for Text-to-Image Synthesis (CVPR 2022 oral)
License: Other
Can someone share the coco_val256_FIDK0.npz file? Thank you!
I am currently studying and trying to reproduce the DF-GAN code, but I don't know how to run the FID evaluation. When I prepare the evaluation using the GitHub code and the provided model (netG_600.pth), 2928 images are generated. However, when I score them with DM-GAN's FID code as described on GitHub, I only get a score of 55, so I am worried that I did something wrong. The command I used to get the FID score was: python fid_score.py --gpu 0 --batch-size 24 --path1 bird_val.npz --path2 ../../test/valid/single
What does the model predict during pretraining, and how is the loss computed?
How can the IS metric be reproduced with this code?
Hey @tobran ,
Big fan of your work; I have been following it for the past 3 months. I have implemented the DF-GAN and CLIP paper code, and both are excellent.
I have a question regarding what values are stored in the bird_val256_FIDK0.npz file. I know they are used for the FID score calculation, but can you tell me how exactly they are computed, and whether I should change them if I change my text encoder?
Hoping for a prompt response.
Thank You
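For background, reference-statistics files of this kind conventionally store the mean and covariance of Inception-v3 pool3 activations over the real validation images; this is how pytorch-fid and the StackGAN-family FID scripts lay them out, and it is an assumption that DF-GAN's bird_val256_FIDK0.npz follows the same convention. A minimal sketch with random vectors standing in for real Inception features:

```python
import numpy as np

# Sketch of producing FID reference statistics. Real code would run the
# validation images through Inception-v3 and take the 2048-d pool3
# activations; here random vectors stand in for those features.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 2048))  # one activation per real image

mu = np.mean(features, axis=0)               # (2048,)
sigma = np.cov(features, rowvar=False)       # (2048, 2048)
np.savez("bird_val256_FIDK0_example.npz", mu=mu, sigma=sigma)

stats = np.load("bird_val256_FIDK0_example.npz")
print(stats["mu"].shape, stats["sigma"].shape)
```

If that layout holds, the statistics depend only on the real image set and the Inception weights, so changing the text encoder should not require regenerating the file.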
My IP address is blocked by Google Drive. Could anyone share a new copy of the pretrained COCO encoder, or upload it to Baidu Yun?
Hi, I have cloned the repository and have been following the steps mentioned in the readme.md file.
While following the steps: The text encoder (pretrained for CUB dataset) is giving the below error:
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
    size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([1, 300]).
Can anyone share some thoughts and a solution for this?
I understand there is a shape mismatch, but I am unable to pinpoint exactly where it happens.
Package versions with Python 3.7:
torch 1.9.0+cu102, torchsummary 1.5.1, torchtext 0.10.0, torchvision 0.10.0+cu102
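One plausible cause of the [1, 300] shape (an assumption; the error can have other sources): in the AttnGAN-style code this repo inherits, RNN_ENCODER sizes its embedding from the dataset's n_words, so if the captions pickle is missing or the data path is wrong, the vocabulary can default to 1 and the checkpoint no longer fits. A minimal reproduction of the symptom:

```python
import torch
import torch.nn as nn

# The checkpoint was trained with a 5450-word vocabulary; an embedding
# built with the wrong vocab size cannot load it.
ckpt = {"weight": torch.zeros(5450, 300)}  # stands in for encoder.weight

mismatched = False
try:
    nn.Embedding(1, 300).load_state_dict(ckpt)   # wrong n_words -> error
except RuntimeError as e:
    mismatched = "size mismatch" in str(e)
print("size mismatch reproduced:", mismatched)

good = nn.Embedding(5450, 300)                   # correct n_words
good.load_state_dict(ckpt)                       # loads cleanly
```

So the thing to check is that the dataset loads successfully and reports n_words = 5450 before RNN_ENCODER is constructed.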
I have trained the model as per the instructions, but now I am stuck: how do I generate images from text alone? There are no clear instructions for this in the README.
On the CUB dataset, we used the pretrained model provided on the official page for testing; the calculated FID score was 26.89. However, the result reported in the paper is 14.81. May I ask what causes such a big difference?
Hello, when I reproduce the code you provided, the FID metric is fine, but the IS value does not reach 5.1.
You said that IS can be overfitted by jointly training with Inception-V3. What does that mean? Could you please explain?
[1/1][365/368] Loss_D: 1.329 Loss_G 1.297
[1/1][366/368] Loss_D: 1.162 Loss_G 3.206
[1/1][367/368] Loss_D: 1.162 Loss_G 2.319
Traceback (most recent call last):
  File "main.py", line 273, in <module>
    count = train(dataloader, netG, netD, text_encoder, optimizerG, optimizerD, state_epoch, batch_size, device)
  File "main.py", line 179, in train
    return count
NameError: name 'count' is not defined
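A likely reading of the traceback (an assumption based only on the line numbers shown): count is bound by the epoch loop, e.g. for count, data in enumerate(dataloader):, so if the dataloader yields no batches, say because the data path is wrong, return count hits an unbound name. A defensive sketch of the fix:

```python
# Simplified stand-in for the train() in main.py; the real signature
# takes the dataloader, networks, and optimizers.
def train(dataloader):
    count = 0  # bound even if the dataloader yields no batches
    for count, data in enumerate(dataloader):
        pass  # ... forward/backward steps elided ...
    return count

print(train([]))         # no batches: returns 0 instead of NameError
print(train([1, 2, 3]))  # returns the last batch index, 2
```

If the dataloader really is empty, the fix above only hides the underlying problem, so it is also worth checking that the dataset path resolves and len(dataloader) is nonzero.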
Hello, thank you for all the hard work you've done.
I got interested in your work and read the code and the paper meticulously.
However, I'm still struggling to understand what the image_encoder is for, so it would be really helpful if you could give a detailed explanation, and I apologise in advance if I have missed an obvious answer!
In code/lib/prepare.py we have the function:

    def prepare_models(...):
        ...
        return image_encoder, ...

But image_encoder seems to never be used by any module.
Thank you !
Hello. The FID reported in the paper for the final model on the COCO dataset is 19.32, but I downloaded the checkpoint provided in your README and ran bash scripts/calc_fid.sh ./cfg/coco.yml to measure FID on COCO val2014, and got 15.59.
I set everything up step by step in exactly the order given in the README. The only difference is that the PyTorch and torchvision versions used by the codebase do not support my 3090, so I upgraded torch to 1.13.1 and torchvision to 0.14; measured this way, the result is 15.59.
I don't know which step went wrong; I only upgraded the torch version to match my machine.
Looking forward to your reply.
Thank you so much for sharing your work and code!
I have noticed that the results and trained models in this repo were updated compared with the results in the paper. The released checkpoints are now ./saved_models/bird/pretrained/state_epoch_1220.pth and ./saved_models/coco/pretrained/state_epoch_290.pth. I am wondering where I can download the checkpoint files at epoch 600 and epoch 120 for CUB and COCO respectively. Since those epoch settings are the defaults for the Text2Image task and, I believe, what you used to produce the results in the paper, they are essential for us to follow your work.
Hi
when I run calc_fid.sh I get the following error:
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
This error also occurs during the FID calculation when I run train.sh.
This error did not occur before.
Can anyone help me?
I got a result of mean: 4.16 std: 0.15 on the CUB dataset, which is much worse than 5.1.
I did not change any code.
Hello,
Thank you for your work.
I have some problems when retraining the model on COCO. The retraining FID is 56.46 using the default settings of coco.yml (batch size = 24, nf = 32), and the environment is Python 3.6, PyTorch 1.5 with a 2080Ti. However, the provided pretrained model achieves 26.68 FID with the same settings. In addition, have you tried the Inception Score on COCO?
I'd really appreciate it if you could give me some advice.
After training the network, how do I generate images from my own captions?
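A rough sketch of what such a sampling script tends to look like in this code family. Every name here (the encoder and generator classes, wordtoix, the embedding sizes) is a stand-in rather than the repository's actual API, and the two modules are tiny stubs so the example runs end to end:

```python
import torch
import torch.nn as nn

# Stub for an RNN text encoder: caption token ids -> sentence embedding.
class StubTextEncoder(nn.Module):
    def __init__(self, n_words=10, emb=256):
        super().__init__()
        self.emb = nn.Embedding(n_words, emb)
    def forward(self, caps, cap_lens):
        return self.emb(caps).mean(dim=1)  # (batch, emb)

# Stub generator: noise + sentence embedding -> image tensor.
class StubGenerator(nn.Module):
    def __init__(self, z_dim=100, emb=256):
        super().__init__()
        self.fc = nn.Linear(z_dim + emb, 3 * 8 * 8)
    def forward(self, noise, sent_emb):
        x = self.fc(torch.cat([noise, sent_emb], dim=1))
        return x.view(-1, 3, 8, 8)  # tiny stand-in for a 256x256 image

def caption_to_image(caption, wordtoix, text_encoder, netG, z_dim=100):
    tokens = [wordtoix[w] for w in caption.lower().split() if w in wordtoix]
    caps = torch.tensor(tokens).unsqueeze(0)            # (1, seq_len)
    sent_emb = text_encoder(caps, torch.tensor([len(tokens)]))
    noise = torch.randn(1, z_dim)
    with torch.no_grad():
        return netG(noise, sent_emb)

wordtoix = {"a": 1, "small": 2, "yellow": 3, "bird": 4}
img = caption_to_image("a small yellow bird", wordtoix,
                       StubTextEncoder(), StubGenerator())
print(img.shape)
```

With the real repo, the stubs would be replaced by the trained RNN_ENCODER and NetG checkpoints, and wordtoix would come from the captions pickle.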
@tobran, what are the different pickle files used in the project, like the caption_DAMSM pickle?
Hello,
First of all, thank you for your work, but I have a few questions about the code:
1. The activation function in DFBlock: the paper shows DFBlock using ReLU, but the provided code uses LeakyReLU.
2. About the provided pretrained model: the IS measured with your pretrained model is very low, but a model retrained with your code looks normal. Could there be a problem with the provided pretrained model?
3. Regarding "A titan xp (set nf=32 in *.yaml) or a V100 32GB (set nf=64 in *.yaml)": was the bird IS=4.86 in the paper measured with nf=32 or nf=64? Does the choice of 32 vs 64 make a big difference? Of course, I will also run my own experiments.
That's quite a few questions; I would greatly appreciate any answers.
How many images did you test on the MS-COCO dataset for the FID metric?
There are only 4490 images, from sample_times: 1 * len(filenames) with a test split of 4490.
Shouldn't it be 30000 images?
If I want to use GRU as text encoder instead of LSTM, where should I make changes?
Thank you.
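As a sketch of the change (the class layout below mirrors the AttnGAN-style RNN_ENCODER but is an assumption, not the repository's exact code): the swap is mostly confined to the module construction, plus the hidden-state handling, since nn.GRU returns only h_n where nn.LSTM returns (h_n, c_n):

```python
import torch
import torch.nn as nn

# Hypothetical text encoder with a switchable recurrent cell.
class TextEncoder(nn.Module):
    def __init__(self, n_words, emb_dim=300, hidden_dim=128, rnn_type="GRU"):
        super().__init__()
        self.encoder = nn.Embedding(n_words, emb_dim)
        if rnn_type == "GRU":
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        else:
            self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, captions):
        emb = self.encoder(captions)
        out, hidden = self.rnn(emb)
        # LSTM returns (h_n, c_n); GRU returns h_n only.
        h_n = hidden[0] if isinstance(hidden, tuple) else hidden
        return out, h_n[-1]  # word features, sentence feature

words, sent = TextEncoder(5450, rnn_type="GRU")(torch.randint(0, 5450, (2, 15)))
print(words.shape, sent.shape)
```

If the repo's encoder already branches on an rnn_type flag, the change may be just a config value; otherwise the constructor and the init_hidden/forward state handling are the places to edit.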
How should the dataset be prepared to train a text-to-image model in PyTorch? I have not found an answer to this question in other code. Thank you.
I checked the network-structure code of your v1 version and the current version and found only two differences. When DFBlock fuses text information, the v1 code uses only the sentence vector, while the current version also concatenates the noise. And the gradient update of MA-GP changed from the original step-by-step update to a joint update together with the hinge loss. Apart from these two, are there any other differences?
Transforming an image causes the following error:

    File "datasets.py", line 301, in __getitem__
      self.transform, normalize=self.norm)
    File "datasets.py", line 77, in get_imgs
      ret.append(normalize(img))
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 34, in __call__
      img = t(img)
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 70, in __call__
      img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
    File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/PIL/Image.py", line 738, in tobytes
      e.setimage(self.im)
    SystemError: tile cannot extend outside image

It looks like one of the image dimensions becomes zero after the transform, resulting in the above error. Please suggest how to resolve this!
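One common cause and a defensive fix (an assumption; the helper name and the (x1, y1, x2, y2) box format are hypothetical): the CUB bounding-box crop in the data pipeline can produce a degenerate or out-of-bounds region for some images, which is what PIL's "tile cannot extend outside image" complains about. Clamping the box before cropping avoids it:

```python
from PIL import Image

def safe_crop(img, box):
    # Clamp the box to the image bounds before cropping.
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(img.width, x2), min(img.height, y2)
    if x2 <= x1 or y2 <= y1:        # degenerate box: fall back to full image
        return img
    return img.crop((x1, y1, x2, y2))

img = Image.new("RGB", (100, 80))
print(safe_crop(img, (-10, -10, 120, 90)).size)  # clamped to (100, 80)
print(safe_crop(img, (50, 50, 50, 70)).size)     # fallback: (100, 80)
```

Printing the image size right before the failing transform would confirm whether a zero-sized crop is really the trigger here.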
Clicking into the birds data image link gives "Sorry, but the page you were trying to view does not exist." How can this be resolved?
How did you build the text encoder? Also, what are the contents of the metadata files?
Thanks for uploading the new version of the code. I am trying to reproduce it, but found the following problem.
Line 21 of train.py contains:
from lib.datasets import get_fix_data
datasets is also used in modules:
from lib.datasets import TextImgDataset as Dataset
from lib.datasets import prepare_data, encode_tokens
However, I cannot find a datasets file in the lib folder. Was it left out of the upload, or is there a mistake in my reproduction steps?
I used the pretrained model provided by the author and followed the instructions (bash ./scripts/calc_fid.sh ./cfg/bird.yml), but every time I measure an FID score of 32.79. What could be going wrong?
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([5598, 300]).
I would like to ask how to calculate the IS of the Bird dataset. Thank you.
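For reference, the Inception Score itself is straightforward once you have Inception-v3 class probabilities for the generated images: IS = exp(E_x[KL(p(y|x) || p(y))]). A self-contained sketch with random probabilities standing in for real Inception predictions (the repo's own IS script may differ in splits and preprocessing):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (num_images, num_classes) softmax outputs from Inception-v3.
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

rng = np.random.default_rng(0)
logits = rng.standard_normal((500, 1000))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
score = inception_score(probs)
print(score)
```

In practice the standard protocol also splits the generated images into ~10 groups and reports the mean and std of the per-group scores, which is where the "mean: 4.16 std: 0.15" numbers come from.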
Hello, your work is excellent. I would like to ask how to reproduce the model on my own dataset. Could you tell me?