
dcgan_code's Introduction

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

All images in this paper are generated by a neural network. They are NOT REAL.

Full paper here: http://arxiv.org/abs/1511.06434

### Other implementations of DCGAN

## Summary of DCGAN

We

  • stabilize Generative Adversarial Networks with some architectural constraints (see the sketch after this list):
    • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
    • Use batchnorm in both the generator and the discriminator
    • Remove fully connected hidden layers for deeper architectures. Just use average pooling at the end.
    • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
    • Use LeakyReLU activation in the discriminator for all layers.
  • use the discriminator as a pre-trained net for CIFAR-10 classification and show pretty decent results.
  • generate really cool bedroom images that look super real
  • To convince you that the network is not cheating:
    • show the interpolated latent space, where transitions are really smooth and every image in the latent space is a bedroom.
    • show bedrooms after one epoch of training (with a 0.0002 learning rate); come on, the network can't really memorize at this stage.
  • To explore the representations that the network has learnt,
    • show deconvolution over the filters, to show that maximal activations occur at objects like windows and beds
    • figure out a way to identify and remove filters that draw windows in generation.
      • Now you can control the generator to not output certain objects.
  • Because we are tripping
    • Smiling woman - neutral woman + neutral man = Smiling man. Whuttttt!
    • man with glasses - man without glasses + woman without glasses = woman with glasses. Omg!!!!
  • learnt, in a completely unsupervised fashion, a latent space where ROTATIONS ARE LINEAR. WHHHAAATT????!!!!!!
  • Figure 11, trained on ImageNet, has a plane with bird legs. So cooool.
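
These five constraints pin down the whole architecture. As a hedged illustration (this repo is Theano; the following is a PyTorch transcription of the list above for the paper's 64x64 setting, not the authors' code):

    # Sketch only: the DCGAN constraints transcribed to PyTorch. Layer sizes
    # follow the paper's 64x64 configuration; z enters as (N, nz, 1, 1).
    import torch.nn as nn

    def make_generator(nz=100, ngf=64, nc=3):
        # fractional-strided convs, batchnorm everywhere except the output,
        # ReLU inside, Tanh on the output layer
        return nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),            # 4x4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),            # 8x8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),            # 16x16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(True),                # 32x32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh())                                         # 64x64 image

    def make_discriminator(nc=3, ndf=64):
        # strided convs instead of pooling, LeakyReLU on every layer,
        # no fully connected hidden layers
        return nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, True),                           # 32x32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),  # 16x16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),  # 8x8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, True),  # 4x4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid())                                      # real/fake score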

Bedrooms after 5 epochs

Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated textures across multiple samples.

Bedrooms after 1 epoch

Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate in only one epoch.

Walking from one point to another in bedroom latent space

Interpolation between a series of 9 random points in Z shows that the learned space has smooth transitions, with every image in the space plausibly looking like a bedroom. In the 6th row, you see a room without a window slowly transforming into a room with a giant window. In the 10th row, you see what appears to be a TV slowly being transformed into a window.

Forgetting to draw windows

Top row: un-modified samples from the model. Bottom row: the same samples generated with "window" filters dropped out. Some windows are removed; others are transformed into objects with a similar visual appearance, such as doors and mirrors. Although visual quality decreased, overall scene composition stayed similar, suggesting the generator has done a good job disentangling scene representation from object representation. Extended experiments could remove other objects from the image and modify the objects the generator draws.

Google image search from generations

Arithmetic on faces

Rotations are linear in latent space

More faces

Album covers

Imagenet generations

dcgan_code's People

Contributors

newmu, soumith


dcgan_code's Issues

Subset instances cannot be defined by a slice

Hello guys!
I got this error when I run the faces training script train_uncond_dcgan.py, and this is its traceback:
"
Traceback (most recent call last):
File "train_uncond_dcgan.py", line 52, in
tr_data, te_data, tr_stream, val_stream, te_stream = faces(ntrain=ntrain)
File "/home/jerry/dcgan_code/faces/load.py", line 14, in faces
te_data = H5PYDataset(path, which_sets=('test',))
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 197, in init
self.num_examples)
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 504, in num_examples
return self.subsets[0].num_examples
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/init.py", line 441, in lazy_property_getter
self.load()
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 465, in load
subsets = self.get_subsets(handle, self.which_sets, self.sources)
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/datasets/hdf5.py", line 448, in get_subsets
slice(row['start'], row['stop']), len(h5file[source]))
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/init.py", line 53, in init
self._subset_sanity_check(list_or_slice, original_num_examples)
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/init.py", line 313, in _subset_sanity_check
self._slice_subset_sanity_check(list_or_slice, num_examples)
File "/home/jerry/anaconda2/lib/python2.7/site-packages/fuel-0.2.0-py2.7-linux-x86_64.egg/fuel/utils/init.py", line 338, in _slice_subset_sanity_check
raise ValueError('Subset instances cannot be defined by a slice '
ValueError: Subset instances cannot be defined by a slice whose start value is greater than or equal to the original number of examples
"

I don't understand what kind of problem this is....
Can anybody help me to solve it? PLEASE!!!

My hdf5 file is built from 128 jpeg photos of 64x64x3 faces......
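
For reference, a hedged sketch of a likely cause and fix (assumed names, not from the repo): Fuel raises this error when a split's start index is at or beyond the number of rows actually stored, e.g. a 'test' split defined past the 128 examples in the file. Rewriting the split attribute so both splits stay in range looks roughly like:

    import h5py
    from fuel.datasets.hdf5 import H5PYDataset

    # Hypothetical fix: keep every split inside the file's 128 examples.
    with h5py.File('faces.hdf5', 'r+') as f:
        n = f['features'].shape[0]                          # e.g. 128
        split_dict = {'train': {'features': (0, n - 28)},
                      'test':  {'features': (n - 28, n)}}   # stays in range
        f.attrs['split'] = H5PYDataset.create_split_array(split_dict)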

ValueError: total size of new array must be unchanged error

The code from load.py raises the error "total size of new array must be unchanged". It just loads the MNIST dataset into an array and then reshapes it. The error occurs while reshaping (in the 3rd line), as shown below:

 fd = open('C:\\Users\\***\\Desktop\\MNIST Dataset\\train-images.idx3-ubyte.gz')
 loaded = np.fromfile(file=fd,dtype=np.uint8)

---> trX = loaded[16:].reshape((60000,28*28)).astype(float)

 ValueError: total size of new array must be unchanged

I know what the reshape function does; I just can't figure out how to resolve this error. I have tried many things, but nothing worked in my favour. Can anyone suggest a solution?
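
A hedged guess at the root cause: the file is gzip-compressed, but open() plus np.fromfile read the raw compressed bytes, so there are not 60000 * 784 usable values to reshape. Decompressing first should fix it (paths are the poster's):

    import gzip
    import numpy as np

    # read the decompressed bytes, then parse them as uint8 pixel values
    with gzip.open('train-images.idx3-ubyte.gz', 'rb') as fd:
        loaded = np.frombuffer(fd.read(), dtype=np.uint8)

    trX = loaded[16:].reshape((60000, 28 * 28)).astype(float)  # skip 16-byte header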

requirements.txt for installing deps?

Hi, new to Python, so I'm just poking around, and the initial setup could use some help in the form of a requirements.txt file. Would love a pip freeze > requirements.txt if you've got a virtual env with the right deps.

Awesome project.

Saved model for faces

Would it be possible to upload the parameters for the model trained on faces?
In the same manner that the parameters for imagenet and svhn have been uploaded?

Thanks!

Batch normalization and inference in the DCGAN model

I am using the DCGAN code and am pretty happy with the results. However, I am curious: should one not treat the Batch Normalization operation in a special way when doing inference (after training is completed)?

i) the original BatchNorm paper mentions that we need to freeze the mean and variance when doing inference with the model (https://arxiv.org/pdf/1502.03167v3.pdf, algorithm 2)

ii) the DCGAN code does not fix the batch statistics, so when we generate new samples with the _gen function it seems we calculate the batch norm statistics on the fly. This still works and produces nice images, to my surprise

iii) now here is a case where it does not work: start with a black image X and optimize it with respect to the discriminator function to make it close to the "true" images. Within a few iterations of gradient descent I can get an image X which is predicted as 1 (true), but which still looks pretty much black. So the discriminator seems to be pretty bad in that case, even though the images I can generate are quite good. My guess would be that batch normalization fails here, since the statistics of the single black image are totally different from the statistics of a proper random minibatch.

iv) has anyone implemented fixing the minibatch statistics for inference, as advocated in the original paper? This might be a useful option for the DCGAN code.

v) as a next experiment, I will try to remove batch normalization and train without it, and then see whether my black-image experiment works correctly

If anyone has more insights about the use of batch normalization in DCGAN, it would be really helpful to discuss that, or to get the code for a simple modification of DCGAN that uses a fixed batch normalization operation when doing inference.

thanks a lot
Nikolay
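
For discussion, a minimal numpy sketch of the standard fix raised in (iv): running statistics accumulated during training and frozen at inference. This is an illustration of the technique, not code from this repo:

    import numpy as np

    class BatchNorm:
        """Minimal batchnorm with running statistics for inference (sketch)."""
        def __init__(self, dim, momentum=0.9, eps=1e-8):
            self.gamma, self.beta = np.ones(dim), np.zeros(dim)
            self.run_mean, self.run_var = np.zeros(dim), np.ones(dim)
            self.momentum, self.eps = momentum, eps

        def __call__(self, x, train=True):
            if train:
                mean, var = x.mean(axis=0), x.var(axis=0)
                # accumulate statistics for later reuse at inference time
                self.run_mean = self.momentum * self.run_mean + (1 - self.momentum) * mean
                self.run_var = self.momentum * self.run_var + (1 - self.momentum) * var
            else:
                # frozen statistics: output no longer depends on batch composition
                mean, var = self.run_mean, self.run_var
            return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta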

Specify a license for the code

Awesome project!!

It would be great if you could specify a license for your code by adding a LICENSE file to the root of the git repo. If you're unfamiliar with source code licensing, check out http://choosealicense.com/

(shameless plug -- I'm a fan of the "GPL, version 2 or later" license because, in the terms of http://choosealicense.com/ I "care about sharing improvements".)

How to generate a single sample?

Could someone tell me how to generate a single sample? For a single input, batchnorm changes it to [0, 0, 0, ...], so the generated images are always the same no matter what the input is.
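
One hedged workaround (not from the repo): tile the single z into a normal-sized minibatch so batchnorm sees realistic statistics, then keep only the first output. `_gen` below is assumed to be the compiled generator function from the training scripts:

    import numpy as np

    z = np.random.uniform(-1., 1., size=(1, 100)).astype('float32')
    z_batch = np.repeat(z, 128, axis=0)   # fake a full minibatch for batchnorm
    samples = _gen(z_batch)               # assumed: compiled generator function
    single = samples[0]                   # the one sample we actually wanted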

Training input dimensions

I'm trying to create my own dataset. What's the order of the dimensions of the input tensor? Is it (batch_size, channels, height, width), (batch_size, height, width, channels), (batch_size, height * width * channels, 1), or something different? I'm having a lot of trouble creating the hdf5 file. Has anyone else had any luck?

ImportError: cannot import name gpu_alloc_empty

I'm getting an import error on gpu_alloc_empty in the following:

from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                           host_from_gpu,
                                           gpu_contiguous, HostFromGpu,
                                           gpu_alloc_empty)

The other imports work fine by themselves; just gpu_alloc_empty fails. I have cuDNN installed and just reinstalled Theano with pip.

There is no source code.

Bug: There is no source code for the model.

Expected Result: For source code to be released, since the project is titled dcgan_code :)

How to reproduce:

michael@halifax ~> git clone https://github.com/Newmu/dcgan_code
Cloning into 'dcgan_code'...
remote: Counting objects: 46, done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 46 (delta 5), reused 45 (delta 4), pack-reused 0
Unpacking objects: 100% (46/46), done.
Checking connectivity... done.
michael@halifax ~> cd dcgan_code/
michael@halifax ~/dcgan_code> find . -iname '*.lua' | wc -l
       0

Is there a schedule to release any code? Many thanks.

How to use it with cpu

I don't have a GPU; how can I use it with a CPU? I also want to know which part of your code covers "discriminator as a pre-trained net for CIFAR-10 classification". Thanks

dead zone near origin of latent space

In looking at manifolds in my own experiments, I have noticed a consistent "dead zone" near the origin of the latent space. Here is an example generated with faces/train_uncond_dcgan.py and z=100:

[sample grid image: i00_e017]

I can post the math later, but suffice it to say that the area near the center of the image is proportionally near zero in all z dimensions.

My strong suspicion is that this could be replicated by replacing this line in train_uncond_dcgan.py:

sample_zmb = floatX(np_rng.uniform(-1., 1., size=(nvis, nz)))

with

sample_zmb = floatX(np_rng.uniform(-0.1, 0.1, size=(nvis, nz)))

and seeing if this results in poor quality samples. I can follow up and try this if that is useful - I haven't done so yet because I first need to implement the load operation to use one of the models that is being saved each epoch.

This isn't causing me any consternation, but I thought I would mention it since it's an unexpected curiosity and so might be a bug or might just be something I don't understand about the nature of this latent space.

using GAN and deconv with "valid" border mode

Hi, after playing with your code and getting very good results with it as-is, I am now looking to try other architectures and modify it. I want to try a model with no zero padding, i.e. border_mode="valid", instead of the current (2, 2).

I have modified the deconv function from lib/ops.py to have the proper dimensions for the valid border mode; it is now called deconvV:

    def deconvV(X, w, subsample=(1, 1), border_mode=(0, 0), conv_mode='conv'):
        img = gpu_contiguous(X)
        kerns = gpu_contiguous(w)
        desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
                              conv_mode=conv_mode)(gpu_alloc_empty(
                                  img.shape[0], kerns.shape[1],
                                  (img.shape[2] - 1) * subsample[0] + kerns.shape[2],
                                  (img.shape[3] - 1) * subsample[1] + kerns.shape[3]).shape,
                              kerns.shape)
        out = gpu_alloc_empty(img.shape[0], kerns.shape[1],
                              (img.shape[2] - 1) * subsample[0] + kerns.shape[2],
                              (img.shape[3] - 1) * subsample[1] + kerns.shape[3])
        d_img = GpuDnnConvGradI()(kerns, img, out, desc)
        return d_img

The code works with this operation, but I wonder whether it is correct. GpuDnnConvGradI and GpuDnnConvDesc do not give an error even if I pass other values for the sizes, so I can never be sure whether I have a bug there or not.
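
One cheap sanity check (a sketch, under the assumption that a 'valid'-mode fractionally-strided conv should invert a 'valid'-mode strided conv): the output size must come out as (in - 1) * stride + kernel, which is exactly what the gpu_alloc_empty shape above computes:

    # Sketch: expected output size of a 'valid' (no-padding) deconv.
    def deconv_valid_size(in_size, kernel, stride):
        return (in_size - 1) * stride + kernel

    assert deconv_valid_size(4, 5, 2) == 11    # e.g. 4x4 maps -> 11x11
    assert deconv_valid_size(11, 5, 2) == 25   # 11x11 -> 25x25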

I have also replaced the respective border mode in both the discriminator and the generator, and taken care of all model dimensions so that it runs. However, it runs for many iterations, sometimes looks as if it has learned something, but afterwards it just produces noise. Could it be that I have made some error in my implementation of the deconv dimensions?

Another explanation could be that the resulting architecture (without zero padding) is much more unstable with respect to generator collapse, as described by Tim Salimans in "Improved Techniques for Training GANs".

thanks a lot for your help

A small question regarding conv_cond_concat

Hi, recently I've been studying your code, especially the conditional DCGAN you made for the MNIST dataset.

I see that you concatenate the condition onto every layer right after BatchNorm and ReLU, but I am still puzzled by the conv_cond_concat function that you use to concatenate the condition into a hidden layer. On some layers you simply use T.concatenate to join them, but on other layers you join them using the conv_cond_concat function described below:

def conv_cond_concat(x, y):
    """ 
    concatenate conditioning vector on feature map axis 
    """
    return T.concatenate([x, y*T.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))], axis=1)

My questions are,

  • why using this function, instead of simple T.concatenate?
  • judging from reshaping of y, I assume you are depth-concatenating it. Am I correct?
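
On the second question, a hedged numpy analogue of what the function computes (illustrative shapes, not from the repo) suggests yes, it is depth-concatenation: the multiply-by-ones is just a broadcast that tiles the (batch, n_cond, 1, 1) condition across the feature map's spatial extent so the shapes line up, which plain T.concatenate cannot do when the spatial sizes differ:

    import numpy as np

    x = np.zeros((128, 64, 14, 14))   # hidden feature maps (batch, depth, h, w)
    y = np.zeros((128, 10, 1, 1))     # condition vector reshaped to 4D
    # broadcast y over h and w, then depth-concatenate
    y_tiled = y * np.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))
    out = np.concatenate([x, y_tiled], axis=1)   # -> (128, 74, 14, 14)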

ImageNet pretrained model layer dimensions

I'm studying this code and would like to know if my understanding is correct.

Generator:

| layer  | gifn                   | gain_ifn        | bias_ifn        |
|--------|------------------------|-----------------|-----------------|
| 1      | (100, 128 * 8 * 4 * 4) | 128 * 8 * 4 * 4 | 128 * 8 * 4 * 4 |
| 2      | (1024, 512, 5, 5)      | 512             | 512             |
| 3      | (512, 256, 5, 5)       | 256             | 256             |
| 4      | (256, 128, 5, 5)       | 128             | 128             |
| 5      | (128, 64, 5, 5)        | 64              | 64              |
| 6      | (64, 32, 5, 5)         | 32              | 32              |
| output | (32, 3, 5, 5)          | ----            | ----            |

Discriminator:

| layer  | gifn                   | gain_ifn | bias_ifn |
|--------|------------------------|----------|----------|
| 1      | (32, 3, 5, 5)          | ----     | ----     |
| 2      | (64, 32, 5, 5)         | 64       | 64       |
| 3      | (128, 64, 5, 5)        | 128      | 128      |
| 4      | (256, 128, 5, 5)       | 256      | 256      |
| 5      | (512, 256, 5, 5)       | 512      | 512      |
| 6      | (1024, 512, 5, 5)      | 1024     | 1024     |
| output | (128 * 8 * 4 * 4, 1)   | ----     | ----     |

Are these dimensions correct for the ImageNet model?

What does "GpuDnnConvGradI" do in deconv?

Hello, thank you very much for making this great project public.
I was going through your code and ran into a point that was not quite clear to me: in the "deconv" function in lib/ops.py, lines 92 and 95, you put the part that calculates the gradient w.r.t. the input. I checked the counterpart in the Torch version, and it was implemented using a regular convolution layer there.

Why is this GpuDnnConvGradI used?
Thank you again for this great source code.

-Taeksoo
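
For intuition, a hedged 1-D numpy analogue (not the repo's code): the gradient of a strided convolution with respect to its input is itself a fractionally-strided convolution, which is why GpuDnnConvGradI can serve as the "deconv". Zero-stuffing the input and then running an ordinary convolution gives the same picture:

    import numpy as np

    def deconv1d(x, k, stride=2):
        # insert (stride - 1) zeros between samples, then convolve normally
        up = np.zeros(len(x) * stride - (stride - 1))
        up[::stride] = x
        return np.convolve(up, k, mode='full')

    print(deconv1d(np.array([1., 2., 3.]), np.array([1., 1., 1.])))
    # [1. 1. 3. 2. 5. 3. 3.]  -- a 3-sample input upsampled to 7 samples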

binary data?

What is the best way to make this work effectively on binary data?
I'm assuming the network won't learn to output binary values on its own, even if all the inputs are binary. Trivially you can just threshold each output value at 0.5, but is there a better way to do this? I'm hoping to take advantage of having only two states to get away with using a larger input vector.
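
One hedged option (not from the repo): have the generator end in a sigmoid so outputs live in (0, 1), then binarize either by thresholding or by treating each output as a Bernoulli probability:

    import numpy as np

    probs = np.random.rand(4, 4)   # stand-in for generator sigmoid outputs
    hard = (probs > 0.5).astype(np.uint8)                            # threshold
    sampled = (np.random.rand(*probs.shape) < probs).astype(np.uint8)  # Bernoulli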

"faces" dataset: how to create hdf5 from images?

Given a set of pictures, it's unclear how to create a correctly formatted dataset. I understand if the pictures used cannot be made available, but could you please share whatever script or method you used to create the hdf5 dataset used as an input?

Thanks.
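
In the absence of the original script, here is a hedged sketch of one way to do it (assumed filenames and a 90/10 split; the layout follows Theano's (batch, channels, height, width) convention used by the training code):

    import glob
    import h5py
    import numpy as np
    from PIL import Image
    from fuel.datasets.hdf5 import H5PYDataset

    paths = sorted(glob.glob('faces/*.jpg'))   # assumed image folder
    images = np.stack([
        np.asarray(Image.open(p).convert('RGB').resize((64, 64))).transpose(2, 0, 1)
        for p in paths]).astype('uint8')       # shape: (n, 3, 64, 64)

    n = len(images)
    n_train = int(0.9 * n)
    with h5py.File('faces.hdf5', 'w') as f:
        feats = f.create_dataset('features', images.shape, dtype='uint8')
        feats[...] = images
        # Fuel locates the train/test rows via this split attribute
        split_dict = {'train': {'features': (0, n_train)},
                      'test': {'features': (n_train, n)}}
        f.attrs['split'] = H5PYDataset.create_split_array(split_dict)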

How to use this for image retrieval?

I trained DCGAN on my own dataset, and now I want to use this net for image retrieval. I have the generator network, which takes a length-100 encoding and transforms it into an image. Is there a simple way to reverse this process, so that I can give it an image and get back a length-100 encoding? I tried to reverse the net like so:

def gen_inv(X, w, g, b, w2, g2, b2, w3, g3, b3, w4, g4, b4, wx):
    x = dnn_conv(X, wx, subsample=(2, 2), border_mode=(2, 2))
    x = relu(batchnorm(x, g=g4, b=b4))
    x = dnn_conv(x, w4, subsample=(2, 2), border_mode=(2, 2))

    h3 = relu(batchnorm(x, g=g3, b=b3))
    h3 = dnn_conv(h3, w3, subsample=(2, 2), border_mode=(2, 2))

    h2 = relu(batchnorm(h3, g=g2, b=b2))
    h2 = dnn_conv(h2, w2, subsample=(2, 2), border_mode=(2, 2))

    h2 = T.flatten(h2,2)
    h = relu(batchnorm(h2, g=g, b=b))
    h = T.dot(h, w.T)
    return h

but I got the same output no matter what the input is, so I figured that batchnorm is in training mode. I tried removing batchnorm as well, but still got very weird results.

any advice?

cheers,
SH
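
One hedged alternative to a feed-forward inverse (a sketch, assuming gen/gen_params are the trained generator from train_uncond_dcgan.py and my_image is the (1, 3, 64, 64) float32 target): keep the generator frozen and recover z by gradient descent on the reconstruction error. Tiling z to a full batch may be needed if batchnorm misbehaves at batch size 1:

    import numpy as np
    import theano
    import theano.tensor as T

    z = theano.shared(np.random.uniform(-1., 1., (1, 100)).astype('float32'))
    x_target = T.tensor4()
    x_recon = gen(z, *gen_params)            # frozen, trained generator (assumed)
    loss = T.mean(T.sqr(x_recon - x_target))
    step = theano.function([x_target], loss,
                           updates=[(z, z - 0.05 * T.grad(loss, z))])
    for i in range(1000):
        step(my_image)                       # my_image: (1, 3, 64, 64) float32
    encoding = z.get_value()                 # the recovered length-100 code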

a typo?

I see this in the ops.py

    def conv_cond_concat(x, y):
        """concatenate conditioning vector on feature map axis"""
        return T.concatenate([x, y * T.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))], axis=1)

does it really mean y.shape[1], and not x.shape[1]?

Album cover art dataset

It would be a great help if you could link the album cover art dataset you used to train the network. Thanks very much.


how to deal with overfitting?

It would be nice if the README contained some tips on how to detect and avoid overfitting.
I'm running the system and I believe I am overfitting.

So here are my tips, which need editing before they go in any README, because I'm sure I don't understand the entire picture:

  1. Currently I'm running on 50K examples, so perhaps this is the main cause of the problem.
  2. Also, the output from running looks like the dump below. If I understand correctly, the last two numbers
    are supposed to both fall as the iterations progress, but as you can see they just wander around. Maybe this is another sign of overfitting. I'm guessing that the first 3 numbers are distances to nearest neighbours on different sample sizes. But is 55-53 a low or high number? If it is low, then this is yet another indication of overfitting.
    0 55.06 53.97 53.16 4.0025 0.2856
    1 58.91 57.44 56.40 4.7697 0.7055
    2 57.16 54.62 52.88 1.7402 0.4073
    3 58.61 55.97 54.14 2.1908 0.3683
    4 57.77 54.55 52.74 2.6172 0.3062
    5 53.55 51.07 49.17 3.7945 0.1111
    6 56.28 52.57 50.30 5.4140 0.1525
    7 57.81 53.84 51.49 5.6486 0.1883
    8 56.94 53.39 51.10 6.3922 0.0688
    9 59.04 55.49 53.08 3.5038 0.3072
    10 55.08 51.79 49.73 4.6309 0.0904
    11 56.03 52.80 50.83 3.6019 0.1094
    12 55.67 52.30 50.22 5.2213 0.3286
    13 55.65 52.55 50.65 4.2390 0.2232

No batch norm in last layer

This is more of a question than an issue.
Why don't you add batch norm to the last layer of your discriminator and generator?

CudaNdarrayType only supports dtype float32 for now

In the MNIST training code, I get the following traceback. Looks like a CUDA versioning issue? Do I need to upgrade CUDA, or is there some way around this?

    gX = gen(Z, Y, *gen_params)

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 9, in gen
      File "lib/ops.py", line 90, in deconv
        img = gpu_contiguous(X)
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 509, in __call__
        node = self.make_node(*inputs, **kwargs)
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 3806, in make_node
        input = as_cuda_ndarray_variable(input)
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 47, in as_cuda_ndarray_variable
        return gpu_from_host(tensor_x)
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 509, in __call__
        node = self.make_node(*inputs, **kwargs)
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/basic_ops.py", line 139, in make_node
        dtype=x.dtype)()])
      File "/Users/gene/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda/type.py", line 70, in __init__
        (self.__class__.__name__, dtype, name))
    TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype float64 for variable None
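
A hedged note on the usual cause: Theano's old CUDA backend only supports float32, while Theano's floatX defaults to float64, so any array created without an explicit dtype becomes float64. Forcing float32 should avoid the error:

    # Assumed fix: make float32 the default before building the graph.
    import theano
    theano.config.floatX = 'float32'
    # or, from the shell, before launching the training script:
    #   THEANO_FLAGS='floatX=float32,device=gpu' python train_cond_dcgan.py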

Do you have a docker image of DCGAN?

Hi Alec,

I read on the Indico blog that you guys use Docker a lot; just wondering if you have a Docker image of DCGAN handy to share? Thanks! Really cool work, by the way!

Xinxin
