df-gan's People

Contributors

ha0tang, tobran

df-gan's Issues

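How to obtain the FID score

I am currently studying DF-GAN and trying to reproduce the code, but I don't know how to compute the FID score. When I prepare for scoring with the GitHub code and the provided model (netG_600.pth), 2928 images are generated. However, when I score them with the DM-GAN FID code as the GitHub instructions describe, I only get an FID of 55, so I am worried that I did something wrong. The command I used to get the FID is: python fid_score.py --gpu 0 --batch-size 24 --path1 bird_val.npz --path2 ../../test/valid/single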

What does the bird_val256_FIDK0.npz file contain?

Hey @tobran,
Big fan of your work; I have been following it for the past 3 months. I have implemented the code for the DFGAN and CLIP papers, and they work excellently.

I have a question about what values are stored in the "bird_val256_FIDK0.npz" file. I know they are used for the FID score calculation, but can you tell me exactly how they are computed, and whether I should change them if I change my text encoder?

Hoping for a quick response.

Thank you
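
For readers with the same question, here is a minimal sketch of what such a statistics file typically holds. It assumes the file follows the common FID convention of storing the mean `mu` and covariance `sigma` of Inception features computed on real validation images; the keys and the way bird_val256_FIDK0.npz was actually produced are not confirmed by this thread. Under that assumption the statistics depend only on the real images, not on the text encoder. The random features below are stand-ins for Inception activations, and `feature_stats` and `frechet_distance` are hypothetical helper names.

```python
import io
import numpy as np

def feature_stats(features):
    """Mean vector and covariance matrix of a (num_images, dim) feature array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # tr(sqrt(sigma1 @ sigma2)) is the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2; clip tiny negative values caused by
    # floating-point noise before taking the square root.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Stand-in features (assumption: real FID code uses Inception-V3 activations).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
fake_feats = rng.normal(loc=0.5, size=(500, 8))

# Save the real-image statistics the way an .npz statistics file would hold them.
mu_real, sigma_real = feature_stats(real_feats)
buffer = io.BytesIO()
np.savez(buffer, mu=mu_real, sigma=sigma_real)
buffer.seek(0)

# Inspect the stored arrays, then score the generated features against them.
stats = np.load(buffer)
for key in stats.files:
    print(key, stats[key].shape)

mu_fake, sigma_fake = feature_stats(fake_feats)
print(frechet_distance(stats["mu"], stats["sigma"], mu_fake, sigma_fake))
```

Loading the real file with `np.load` and printing `files` in the same way would show which arrays it actually contains.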

IP address blocked by Google Drive

My IP address has been blocked by Google Drive. Could anyone share a new copy of the pretrained COCO encoder or upload it to Baidu Yun?

RNN_ENCODER size mismatch

Hi, I have cloned the repository and followed the steps in the readme.md file.
When loading the text encoder (pretrained for the CUB dataset), I get the error below:

File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([1, 300]).

Could anyone share some thoughts or a solution?

I understand that there is a shape mismatch, but I am unable to pinpoint exactly where it happens.

Package versions with python3.7:
torch 1.9.0+cu102, torchsummary 1.5.1, torchtext 0.10.0, torchvision 0.10.0+cu102
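
A small sketch for narrowing down this kind of error. It assumes `encoder.weight` is the word-embedding table, so its first dimension is the vocabulary size the model was built with; a value of 1 would then suggest the vocabulary was not loaded from the CUB caption data before the model was constructed. That diagnosis is a guess, not something the thread confirms. `find_shape_mismatches` is a hypothetical helper that works on plain shape dictionaries.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches

# With PyTorch, the shape dictionaries could be built from state dicts as
#   {k: tuple(v.shape) for k, v in state_dict.items()}
# The shapes below are copied from the error message in this issue.
checkpoint_shapes = {"encoder.weight": (5450, 300)}
model_shapes = {"encoder.weight": (1, 300)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
    if ckpt[1:] == model[1:]:
        # Only the first dimension differs: under the assumption above,
        # the model was built with a different vocabulary size.
        print(f"  vocabulary size expected by checkpoint: {ckpt[0]}, model built with: {model[0]}")
```

Printing the vocabulary size right before the text encoder is constructed would show whether it matches the 5450 rows stored in the checkpoint.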

FID score on the CUB dataset

On the CUB dataset, we tested the officially provided pretrained model and obtained an FID of 26.89. However, the paper reports 14.81. What causes such a large difference?

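Hello author, a question about the IS metric

Hello author, when I reproduce the code you provided, the FID metric is fine, but the IS value does not reach 5.1.
You said that IS can be overfitted through joint training with Inception-V3. What does that mean? Could you please explain?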

NameError: name 'count' is not defined, please help

[1/1][365/368] Loss_D: 1.329 Loss_G 1.297
[1/1][366/368] Loss_D: 1.162 Loss_G 3.206
[1/1][367/368] Loss_D: 1.162 Loss_G 2.319
Traceback (most recent call last):
File "main.py", line 273, in <module>
count = train(dataloader,netG,netD,text_encoder,optimizerG,optimizerD, state_epoch,batch_size,device)
File "main.py", line 179, in train
return count
NameError: name 'count' is not defined
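
A note on reading this message, with a minimal sketch. In Python, a plain `NameError: name 'count' is not defined` raised inside a function means `count` is looked up as a global, which happens when the function never assigns to it anywhere (a local that is assigned only on some paths raises `UnboundLocalError` instead). The functions below are hypothetical stand-ins, not the repository's `train`; they only illustrate the failure and one possible fix, assuming `count` is meant to be a step counter.

```python
def train_buggy(batches):
    """Returns a name that is never assigned inside the function."""
    for _ in batches:
        pass
    return count  # looked up as a global, which does not exist

def train_fixed(batches, start_count=0):
    """Initialises the counter before the loop so it is always defined."""
    count = start_count
    for _ in batches:
        count += 1
    return count

try:
    train_buggy(range(3))
except NameError as err:
    print(type(err).__name__, err)

print(train_fixed(range(3)))
```

If the counter is not needed, removing `return count` and the assignment at the call site would be an alternative fix.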

About `image_encoder`

Hello, thank you for all the hard work you've done.

I became interested in your work and read the code and the paper carefully.
However, I'm still struggling to understand what the image_encoder is for, so it would be really helpful if you could explain it in some detail. I apologise in advance if I have missed an obvious answer!

In code.lib.prepare.py we have the function:

def prepare_models(...)
    ....
    return image_encoder, ....

However, image_encoder never seems to be used by any module.

Thank you !

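How to reproduce the FID on the COCO dataset

Hello author. The paper reports an FID of 19.32 on the COCO dataset for the final model, but I downloaded the checkpoint provided in your README and ran bash scripts/calc_fid.sh ./cfg/coco.yml to measure the FID on COCO val2014, and I got 15.59.

I followed the README step by step to set everything up. The only difference is that the PyTorch and torchvision versions used by the codebase do not support my 3090, so I upgraded torch to 1.13.1 and torchvision to 0.14. The result measured this way is 15.59.

I don't know which step went wrong; I only upgraded the torch version to fit my machine.

Looking forward to your reply.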

Training result on CUB

[Image: fake_samples_epoch_601]
Great repository. I have some questions about the result. Is the result supposed to look like this after training for 600 epochs? I trained with nF=32. Is there any way to improve the result? Looking forward to your reply.

Where can I download the checkpoints at epoch 600 and epoch 120 for CUB and COCO respectively?

Thank you so much for sharing your work and code!

I have noticed that the results and trained models in this repo have been updated compared with the results in the paper. The released checkpoints are now ./saved_models/bird/pretrained/state_epoch_1220.pth and ./saved_models/coco/pretrained/state_epoch_290.pth. Where can I download the checkpoint files at epoch 600 and epoch 120 for CUB and COCO respectively? Those epoch settings are the defaults for the Text2Image task, and I believe they are what you used to produce the results in the paper, so they are essential for us to follow your work.

The retrained result on COCO

Hello,

Thank you for your work.
I have a problem when retraining the model on COCO. With the default settings in coco.yml (batchsize = 24, nf = 32), the retrained model reaches an FID of 56.46; the environment is py36 and pytorch1.5 on a 2080Ti. However, the provided pretrained model achieves an FID of 26.68 with the same settings. In addition, have you tried computing the Inception Score on COCO?

I would really appreciate any advice.

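Several questions about DFGAN

Hello,

First of all, thank you for your work. I have some questions about the code:

  1. The activation function in DFBlock
    The DFBlock shown in the paper uses ReLU as its activation function, but the provided code uses LeakyReLU.

  2. About the provided pretrained model
    The IS measured with the pretrained model you provide is very low, but the model looks normal after I retrain it with your code. Is there a problem with the provided pretrained model?

  3. A titan xp (set nf=32 in *.yaml) or a V100 32GB (set nf=64 in *.yaml)
    Was the IS of 4.86 for generated birds reported in the paper measured with nf=32 or nf=64? Does the choice between nf=32 and nf=64 make a large difference? Of course, I will also run the experiment myself later.

That is quite a few questions; I would be very grateful if you could answer them.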

FID on MSCOCO dataset

How many images did you use on the MSCOCO dataset when computing the FID metric?
With sample_times: 1, only 1 * len(filenames) = 4490 images are generated, since the test split reports 4490 filenames.
Shouldn't it be 30000 images?
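
The arithmetic behind this question can be sketched as follows. It assumes the number of generated images is `sample_times * len(filenames)`, as the quoted config line suggests; whether the evaluation script then truncates to exactly 30000 images is not stated in this thread. `sample_times_needed` is a hypothetical helper.

```python
import math

def sample_times_needed(num_filenames, target_images):
    """Smallest sample_times such that sample_times * num_filenames >= target_images."""
    return math.ceil(target_images / num_filenames)

num_filenames = 4490      # test-split size reported in the issue
target_images = 30000     # image count the issue author expected

print(1 * num_filenames)  # images generated with sample_times: 1
times = sample_times_needed(num_filenames, target_images)
print(times, times * num_filenames)
```

Under that assumption, reaching at least 30000 images from 4490 filenames would take sample_times of 7, which yields 31430 images.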

About the differences between V1, V2 and V3 version paper code

I compared the network-structure code of your V1 version with the current version and found only two differences. First, when DFBlock fuses text information, the V1 code uses only the sentence vector, whereas the current version also combines it with noise. Second, the gradient update of MAGP changed from the original separate step-by-step update to a joint update together with the hinge loss. Besides these two, are there any other differences?


SystemError: tile cannot extend outside image (Image transform error)

Transforming an image causes the following error:
File "datasets.py", line 301, in __getitem__
self.transform, normalize=self.norm)
File "datasets.py", line 77, in get_imgs
ret.append(normalize(img))
File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 34, in __call__
img = t(img)
File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/torchvision/transforms.py", line 70, in __call__
img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
File "/home/j/anaconda3/envs/env_con/lib/python3.6/site-packages/PIL/Image.py", line 738, in tobytes
e.setimage(self.im)
SystemError: tile cannot extend outside image

It looks like one of the image dimensions becomes zero after the transform, resulting in the above error. Please suggest how to resolve it.
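
One way to test the zero-dimension suspicion is to validate crop boxes before they are applied. The sketch below assumes the empty image comes from a bounding-box crop that falls partly or wholly outside the image, which the traceback does not confirm. `clamp_crop_box` is a hypothetical helper in plain Python; its result could be passed to Pillow's `Image.crop`.

```python
def clamp_crop_box(box, width, height):
    """Clamp (x1, y1, x2, y2) to the image bounds.

    Returns the clamped box, or None if the clamped region is empty.
    """
    x1, y1, x2, y2 = box
    x1 = max(0, min(x1, width))
    x2 = max(0, min(x2, width))
    y1 = max(0, min(y1, height))
    y2 = max(0, min(y2, height))
    if x2 <= x1 or y2 <= y1:
        return None  # nothing left to crop; caller should skip or use the full image
    return (x1, y1, x2, y2)

# A box that pokes outside a 100x200 image is trimmed to fit.
print(clamp_crop_box((-10, 5, 50, 300), 100, 200))

# A box entirely to the right of the image has no valid region.
print(clamp_crop_box((120, 0, 150, 50), 100, 200))
```

Logging the file name whenever the helper returns None would identify which images or bounding boxes trigger the error.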

birds data image

Clicking the "birds data image" link shows "Sorry, but the page you were trying to view does not exist." How can this be resolved?

Text Embeddings

How did you build the text encoder? Also, what do the metadata files contain?

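lib.datasets is missing from the code

Thank you for uploading the new version of the code. I am trying to reproduce it, but I ran into the following problem.
Line 21 of train.py contains:
from lib.datasets import get_fix_data

The modules also use datasets:
from lib.datasets import TextImgDataset as Dataset
from lib.datasets import prepare_data, encode_tokens

I could not find a datasets file in the lib folder. Was it left out of the upload, or did I make a mistake in my reproduction steps?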

About testing FID

Using the author's pretrained model and following the instructions (bash ./scripts/calc_fid.sh ./cfg/bird.yml), I get an FID score of 32.79 every time. What went wrong?

Where can I download the pretrained text encoder file?

RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
size mismatch for encoder.weight: copying a param with shape torch.Size([5450, 300]) from checkpoint, the shape in current model is torch.Size([5598, 300]).

Switching to my own dataset

Hello author, your work is excellent. I would like to ask how to reproduce the model on my own dataset. Could you tell me?
