BEGAN in Tensorflow

Tensorflow implementation of BEGAN: Boundary Equilibrium Generative Adversarial Networks.


Requirements

Usage

First, download the CelebA dataset with:

$ apt-get install p7zip-full # ubuntu
$ brew install p7zip # Mac
$ python download.py

or you can use your own dataset by placing images like:

data
└── YOUR_DATASET_NAME
    ├── xxx.jpg (name doesn't matter)
    ├── yyy.jpg
    └── ...
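
Before training on a custom dataset, it can help to confirm that the folder really contains images. The sketch below is not part of the repository: the helper name list_images and the list of extensions are assumptions made for illustration, and only the data/YOUR_DATASET_NAME layout comes from the tree above.

```python
import os

# Assumed set of extensions; the repository's loader may accept a different set.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(root, dataset):
    """Return sorted image paths found directly inside root/dataset."""
    folder = os.path.join(root, dataset)
    if not os.path.isdir(folder):
        raise FileNotFoundError("dataset folder not found: " + folder)
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]


if __name__ == "__main__":
    try:
        paths = list_images("data", "YOUR_DATASET_NAME")
        print("found %d images" % len(paths))
    except FileNotFoundError as err:
        print(err)
```

An empty result here would mean the loader has nothing to read, which is worth ruling out before starting a long training run.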

To train a model:

$ python main.py --dataset=CelebA --use_gpu=True
$ python main.py --dataset=YOUR_DATASET_NAME --use_gpu=True

To test a model (use your load_path):

$ python main.py --dataset=CelebA --load_path=CelebA_0405_124806 --use_gpu=True --is_train=False --split valid

Results

Generator output (64x64) with gamma=0.5 after 300k steps

all_G_z0_64x64

Generator output (128x128) with gamma=0.5 after 200k steps

all_G_z0_64x64

Interpolation of Generator output (64x64) with gamma=0.5 after 300k steps

interp_G0_64x64

Interpolation of Generator output (128x128) with gamma=0.5 after 200k steps

interp_G0_128x128

Interpolation of Discriminator output of real images


Related works

Author

Taehoon Kim / @carpedm20

began-tensorflow's People

Contributors

carpedm20, chengdazhi, pocorall, sugyan

began-tensorflow's Issues

ValueError: last dimension shape must be known but is None?

This error came up as soon as I ran the code, so I printed some debugging information and pasted it here. Can you tell me why it happens? Thank you! I am using my own dataset and TensorFlow 0.10.0.
From trainer.py:
self.z:
Tensor("random_uniform:0", shape=(?, ?), dtype=float32)
from models.py :
self.z:
Tensor("random_uniform:0", shape=(?, ?), dtype=float32)

Here is the complete error message:

root: /media/z/lhj/images/testData_aligned/testData_aligned/
[16, 64, 64, 3]
3 64 64
self.data_loader:
Tensor("ToFloat:0", shape=(16, 3, 64, 64), dtype=float32)
self.z_num:
64
trainer x:
Tensor("sub:0", shape=(16, 3, 64, 64), dtype=float32)
tf.shape(x)[0]:
Tensor("strided_slice:0", shape=(), dtype=int32)
self.z:
Tensor("random_uniform:0", shape=(?, ?), dtype=float32)
self.conv_hidden_num:
128
self.channel:
3
self.repeat_num:
4
self.data_format:
NCHW
from models begin:
num_output:
8192
z:
Tensor("random_uniform:0", shape=(?, ?), dtype=float32)
Traceback (most recent call last):
File "main.py", line 44, in <module>
main(config)
File "main.py", line 32, in main
trainer = Trainer(config, data_loader_)
File "/home/lhj/software/BEGAN-tensorflow-master/trainer.py", line 93, in __init__
self.build_model()
File "/home/lhj/software/BEGAN-tensorflow-master/trainer.py", line 199, in build_model
self.repeat_num, self.data_format, reuse=False)
File "/home/lhj/software/BEGAN-tensorflow-master/models.py", line 13, in GeneratorCNN
x = slim.fully_connected(z, num_output, activation_fn=None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 171, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 792, in fully_connected
num_input_units = utils.last_dimension(inputs_shape, min_rank=2)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/utils.py", line 199, in last_dimension
raise ValueError('last dimension shape must be known but is None')
ValueError: last dimension shape must be known but is None

Pause the training

I wanted to pause training on my own dataset and resume it later.

As mentioned in this issue:

I tried to keep the same parameters:
python main.py --dataset=mydata

python main.py --dataset=mydata --load_path=mydata_[08...]

Training still begins from scratch and erases the old checkpoint.

Since training takes quite a long time, I wonder how I could pause it.

Bug in trainer.build_model()?

Hi,

thanks for sharing the code. I have a question about the discriminator loss for real images - self.d_loss_real in trainer.build_model(). You compute it as
self.d_loss_real = tf.reduce_mean(tf.abs(AE_x - x))
Before that you compute x to be normalized versions of real images by using x = norm_img(self.x), but AE_x is a de-normalized version of the discriminator output for x. Is it a bug, or am I missing something? I'm talking about this commit.

Thanks in advance,
Anastasia
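
The concern raised above can be illustrated with plain numbers. The sketch below assumes that norm_img maps pixel values from [0, 255] to [-1, 1] and that the de-normalization is its inverse; those definitions are assumptions for illustration and are not copied from the repository. It only shows why an L1 loss between a de-normalized tensor and a normalized one would be on the wrong scale, without claiming that the referenced commit actually does this.

```python
def norm_img(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (assumed convention)."""
    return [p / 127.5 - 1.0 for p in pixels]


def denorm_img(values):
    """Map values from [-1, 1] back to [0, 255], clipping out-of-range values."""
    return [min(255.0, max(0.0, (v + 1.0) * 127.5)) for v in values]


def l1_loss(a, b):
    """Mean absolute difference between two equally sized sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


pixels = [0.0, 127.5, 255.0]
x = norm_img(pixels)   # normalized input, as the discriminator would see it
ae_x = list(x)         # pretend the autoencoder reconstructs x perfectly

consistent = l1_loss(ae_x, x)          # both operands in [-1, 1]
mixed = l1_loss(denorm_img(ae_x), x)   # [0, 255] compared against [-1, 1]
print(consistent, mixed)
```

With a perfect reconstruction the consistent loss is zero, while the mixed-scale loss stays large, so a mismatch of this kind would dominate the training signal if it were present.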

downloading dataset

When I run "python download.py", I get the following problem:
image
Has anyone else faced this problem and found a way to solve it?

mode collapse?

Hi @carpedm20 - this codebase is now working well for me, and I wanted to compare notes. With your latest changes I get pretty good results on 64x64 CelebA. 128x128 doesn't do as well; it probably needs different hyperparameters.

Here are some 64x64 results after 130k cycles of training. First x_fixed, then D_real:

x_fixed

130000_d_real

That looks like pretty good auto-encoder reconstruction results - the resemblance is pretty clear. OK, now here's the D_fake, and G:

130000_d_fake

130000_g

Now here my knowledge of this model gets a little hazy. These look fairly good and they look like each other, but they no longer bear much resemblance to x_fixed. Should they? If so, then I think this would likely be a symptom of a pretty significant mode collapse.

Just for comparison, here's what taking the same x_fixed looks like when passed through a trained ALI model:

x_fixed

20170414_ali64_montage_began

Some of these come out better than others, but note the overall variation in hair color, skin tone, glasses, hats, backgrounds that seems to be missing from the current BEGAN generator. Curious if you have any thoughts from your own investigations.

Unable to train datasets

Hi,
I put the dataset in the data folder and executed this command: $ python main.py --dataset=CelebA --use_gpu=True
It produced the following error:
Traceback (most recent call last):
File "main.py", line 4, in <module>
from trainer import Trainer
File "E:\product\faceVerification\faceAttributes\BEGAN-tensorflow\trainer.py", line 4, in <module>
import StringIO
ImportError: No module named 'StringIO'

Thank you for your help!

How to do transfer learning?

How can I use existing model (weight) for new dataset?
If anyone has already done this please share your idea.

Interpolation of images done differently

Just a word of caution: As far as I understand, the way the interpolation of images is implemented here does not correspond to the way it was done in the paper:

In the paper, they approximate the G's z_1 and z_2 for two real images using Adam (reversing G(z)), then they interpolate between those z_1, z_2 and show G(z_{interp}).

In this implementation, the discriminator's latent codes h_1, h_2 are calculated from the two images, and the discriminator output for the interpolation is shown.
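
Both schemes described above share the same interpolation step and differ only in which codes are interpolated (the generator's z or the discriminator's h) and which network decodes them. A minimal sketch of that shared step, using plain Python lists as stand-ins for latent vectors; the function names are hypothetical and this is not the repository's code:

```python
def lerp(a, b, t):
    """Linearly interpolate between vectors a and b at position t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(code_1, code_2, steps):
    """Return `steps` evenly spaced codes from code_1 to code_2, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(code_1, code_2, i / (steps - 1)) for i in range(steps)]


# Each code on the path would then be decoded: G(z) in the paper's scheme,
# or the discriminator's decoder in the scheme described for this repository.
path = interpolation_path([0.0, 1.0], [1.0, -1.0], steps=5)
for code in path:
    print(code)
```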

some questions regarding BEGAN

I am wondering whether it is possible to use BEGAN for unpaired data training? In that case, the LD and LG are not really matched, please help.

Regularization Losses

Hi,
I cannot see where the regularization losses have been added to the data loss. Am I missing something?

Error using Tensorflow 1.2.0 - Conv2DCustomBackpropInputOp only supports NHWC.

This is a great model, thanks for publishing it.

After upgrading Tensorflow to version 1.2.0 with:
pip install tensorflow --upgrade

I'm getting the following error:

InvalidArgumentError (see above for traceback): Conv2DCustomBackpropInputOp only supports NHWC.
         [[Node: gradients/D/Conv_20/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/D/Conv_20/convolution_grad/Shape, D/Conv_20/weights/read, gradients/D/Conv_20/add_grad/tuple/control_dependency)]]

I'm getting the attached error, any thoughts on how to bring the model up to the current version of Tensorflow?

Full trace:

$ python main.py --dataset=CelebA --use_gpu=True
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.6 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
attempting to open ['data/CelebA/splits/train/139407.jpg', 'data/CelebA/splits/train/157712.jpg', 
.........
'data/CelebA/splits/train/116308.jpg', 'data/CelebA/splits/train/123878.jpg', 'data/CelebA/splits/train/155290.jpg', 'data/CelebA/splits/train/034270.jpg']
2017-06-17 02:38:27.935960: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-17 02:38:27.936041: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-17 02:38:27.936063: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-17 02:38:27.936083: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-17 02:38:27.936123: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[*] MODEL dir: logs/CelebA_0617_023824
[*] PARAM path: logs/CelebA_0617_023824/params.json
  0%|                                                                                                                                                                               | 0/500000 [00:00<?, ?it/s]2017-06-17 02:38:30.372334: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropInputOp only supports NHWC.
         [[Node: gradients/D/Conv_20/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/D/Conv_20/convolution_grad/Shape, D/Conv_20/weights/read, gradients/D/Conv_20/add_grad/tuple/control_dependency)]]

Traceback (most recent call last):
  File "main.py", line 43, in <module>
    main(config)
  File "main.py", line 35, in main
    trainer.train()
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 140, in train
    result = self.sess.run(fetch_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
         [[Node: gradients/D/Conv_20/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/D/Conv_20/convolution_grad/Shape, D/Conv_20/weights/read, gradients/D/Conv_20/add_grad/tuple/control_dependency)]]

Caused by op u'gradients/D/Conv_20/convolution_grad/Conv2DBackpropInput', defined at:
  File "main.py", line 43, in <module>
    main(config)
  File "main.py", line 31, in main
    trainer = Trainer(config, data_loader)
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 92, in __init__
    self.build_model()
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 199, in build_model
    d_optim = d_optimizer.minimize(self.d_loss, var_list=self.D_var)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 315, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_grad.py", line 445, in _Conv2DGrad
    op.get_attr("data_format")),
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 488, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

...which was originally created as op u'D/Conv_20/convolution', defined at:
  File "main.py", line 43, in <module>
    main(config)
[elided 1 identical lines from previous traceback]
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 92, in __init__
    self.build_model()
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 180, in build_model
    self.conv_hidden_num, self.data_format)
  File "/home/medgar/models/GAN/carpedm20_BEGAN-tensorflow/BEGAN-tensorflow/models.py", line 50, in DiscriminatorCNN
    out = slim.conv2d(x, input_channel, 3, 1, activation_fn=None, data_format=data_format)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 947, in convolution
    outputs = layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 492, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 158, in call
    data_format=utils.convert_data_format(self.data_format, self.rank + 2))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 670, in convolution
    op=op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 338, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 662, in op
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 131, in _non_atrous_convolution
    name=name)

InvalidArgumentError (see above for traceback): Conv2DCustomBackpropInputOp only supports NHWC.
         [[Node: gradients/D/Conv_20/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/D/Conv_20/convolution_grad/Shape, D/Conv_20/weights/read, gradients/D/Conv_20/add_grad/tuple/control_dependency)]]
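
The failing node in the trace above is placed on cpu:0 with data_format="NCHW", and the message says the CPU kernel only supports NHWC. One way to avoid that combination, assuming the graph really does run on the CPU, is to derive the data format from the device choice. The helpers below are a hypothetical sketch and are not taken from the repository:

```python
def choose_data_format(use_gpu):
    """Channels-first on GPU, channels-last on CPU."""
    return "NCHW" if use_gpu else "NHWC"


def image_batch_shape(batch_size, height, width, channels, data_format):
    """Return the 4-D batch shape for the given data format."""
    if data_format == "NCHW":
        return (batch_size, channels, height, width)
    if data_format == "NHWC":
        return (batch_size, height, width, channels)
    raise ValueError("unknown data format: " + data_format)


fmt = choose_data_format(use_gpu=False)
print(fmt, image_batch_shape(16, 64, 64, 3, fmt))
```

Every layer that takes a data_format argument would need to receive the same value for this to be consistent.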

Unable to reproduce results: Generator output has only two face types

Anyone else having trouble training and reproducing example results for CelebA dataset?

After training on the CelebA dataset for 500,000 iterations, or about 33 hours, with the default settings, I had a look at the output and noticed something odd. The generated images, *_G.png, and fake images, fake.png, have only two face types in each file. Between files they were different, but in each file there are always just two distinct faces.

The same problem occurs in the interpolation tests. The output from setting --is_train=False has the same problem and tries to interpolate between two distinct faces:
python main.py --dataset=CelebA --load_path=CelebA_0625_080047 --use_gpu=True --is_train=False --split valid

  1. Is this an example of mode collapse?
  2. Is this a bug or incorrect parameter defaults?
  3. Has anyone been able to reproduce the results on the README?

Naive question on training with A to B datasets

Is it possible to use BEGAN to predict a "B" image based on an input "A" image? I know this is generally possible with GANs, but in BEGAN implementations there seems to be no implicit A-B training and prediction.

Sudden mode collapse after 400k epochs

I have had this weird behaviour where after 380000 epochs the generator suddenly mode collapses.
For example see:

epoch 350k
349500_g

epoch 390k
389500_g

epoch 396k
396000_g

epoch 400k
402000_g

epoch 480k
483000_g

I suspect the portrait guy in the middle of the lower row is the poltergeist that caused the mode collapse, but I can't prove it 😆. Has anyone seen this happen before?

You can also see the g_loss and d_loss_fake increasing around epoch 380k
screenshot from 2017-04-30 10-23-16

Mode collapsing

I couldn't find the reason, but the model always shows mode collapse, even after one or two learning rate decays, unlike what the paper describes. I guess there could be an error in the loss, but first of all, I don't know how BEGAN tackles this without the push-away factor of EBGAN (I'm not sure learning rate decay is enough).
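
For anyone checking the loss, the scalar bookkeeping from the BEGAN paper can be written out as below. This is a sketch of the paper's equations on plain floats, not the repository's TensorFlow code; gamma=0.5 matches the README, while lambda_k=0.001 is the paper's value and is assumed here.

```python
def began_step(d_loss_real, d_loss_fake, k_t, gamma=0.5, lambda_k=0.001):
    """One step of BEGAN loss bookkeeping on scalar reconstruction losses.

    d_loss_real: autoencoder loss L(x) on real images
    d_loss_fake: autoencoder loss L(G(z)) on generated images
    """
    d_loss = d_loss_real - k_t * d_loss_fake        # discriminator objective
    g_loss = d_loss_fake                            # generator objective
    balance = gamma * d_loss_real - d_loss_fake     # equilibrium error
    measure = d_loss_real + abs(balance)            # convergence measure M
    k_next = min(1.0, max(0.0, k_t + lambda_k * balance))  # k stays in [0, 1]
    return d_loss, g_loss, measure, k_next


# Values taken from the first log line quoted in the
# "ValueError on train with celebA" issue, assuming k_t started at 0.
d_loss, g_loss, measure, k_next = began_step(0.538686, 0.048095, k_t=0.0)
print(round(measure, 4), round(k_next, 4))  # → 0.7599 0.0002
```

The printed values agree with the measure and k_t in that log line, which suggests the logged quantities follow these formulas at step 0, though it says nothing about later steps.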

Can't run the test code

After I finished BEGAN training with my own dataset, I got an error when running the test code as follows: python main.py --dataset=my_dataset --load_path=my_dataset_0824_004250 --use_gpu=True --is_train=False --split valid
The error is:
ValueError: Cannot feed value of shape (32, 64) for Tensor 'random_uniform:0', which has shape '(64, 64)'
Has anyone else met the same error? How can I use BEGAN to generate fake images?
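
The message above says the fed value has 32 rows while the tensor expects 64, so the latent batch has to match the batch size the graph was built with. The sketch below draws a uniform latent batch in [-1, 1] and checks it against a target shape before feeding; both helper names are hypothetical, and None stands for a dimension that accepts any size.

```python
import random


def sample_z(batch_size, z_num, seed=None):
    """Draw a batch of latent vectors uniformly from [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(z_num)]
            for _ in range(batch_size)]


def is_feedable(value_shape, tensor_shape):
    """True if value_shape matches tensor_shape; None matches any size."""
    return len(value_shape) == len(tensor_shape) and all(
        t is None or v == t for v, t in zip(value_shape, tensor_shape))


z = sample_z(batch_size=64, z_num=64, seed=0)
print(is_feedable((len(z), len(z[0])), (64, 64)))   # batch matches the graph
print(is_feedable((32, 64), (64, 64)))              # the case from the error
```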

data format problems about training with custom dataset

When I directly train the model on cifar-10, I received:

tensorflow.python.framework.errors_impl.InvalidArgumentError: CPU BiasOp only supports NHWC.
[[Node: G/Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](G/Conv/convolution, G/Conv/biases/read)]]

Full stack trace:

E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
[[Node: G/Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](G/Conv/convolution, G/Conv/biases/read)]]

Traceback (most recent call last):
File "main.py", line 43, in <module>
main(config)
File "main.py", line 35, in main
trainer.train()
File "/media/bitss/0D53F42E56755822/BEGAN-tensorflow/trainer.py", line 140, in train
result = self.sess.run(fetch_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: CPU BiasOp only supports NHWC.
[[Node: G/Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](G/Conv/convolution, G/Conv/biases/read)]]

Caused by op u'G/Conv/BiasAdd', defined at:
File "main.py", line 43, in <module>
main(config)
File "main.py", line 31, in main
trainer = Trainer(config, data_loader)
File "/media/bitss/0D53F42E56755822/BEGAN-tensorflow/trainer.py", line 92, in __init__
self.build_model()
File "/media/bitss/0D53F42E56755822/BEGAN-tensorflow/trainer.py", line 176, in build_model
self.repeat_num, self.data_format, reuse=False)
File "/media/bitss/0D53F42E56755822/BEGAN-tensorflow/models.py", line 11, in GeneratorCNN
x = slim.conv2d(x, hidden_num, 3, 1, activation_fn=tf.nn.elu, data_format=data_format)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 907, in convolution
outputs = layer.apply(inputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 303, in apply
return self.__call__(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 273, in __call__
outputs = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 169, in call
data_format=utils.convert_data_format(self.data_format, 4))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1316, in bias_add
return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 281, in _bias_add
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
[[Node: G/Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](G/Conv/convolution, G/Conv/biases/read)]]

ValueError on train with celebA

I'm getting an error during training. I tried it both with a version of celebA I already had on hand and with the one output by download.py and I got the same error. I'm running the system with --use_gpu=False (in case that matters).

Thanks for your help!

Here's the output and stack trace:

[*] MODEL dir: logs/celebA_0418_104602
[*] PARAM path: logs/celebA_0418_104602/params.json
0%| | 0/500000 [00:00<?, ?it/s][0/500000] Loss_D: 0.538686 Loss_G: 0.048095 measure: 0.7599, k_t: 0.0002
[*] Samples saved: logs/celebA_0418_104602/0_G.png

Traceback (most recent call last):
File "main.py", line 43, in <module>
main(config)
File "main.py", line 35, in main
trainer.train()
File "/home/mvertolli/BEGAN/trainer.py", line 158, in train
self.autoencode(x_fixed, self.model_dir, idx=step, x_fake=x_fake)
File "/home/mvertolli/BEGAN/trainer.py", line 263, in autoencode
x = self.sess.run(self.AE_x, {self.x: img})
File "/home/mvertolli/virtualenvs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/mvertolli/virtualenvs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 961, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (16, 3, 64, 64) for Tensor u'ToFloat:0', which has shape '(16, 64, 64, 3)'
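
The shapes in this error differ only in axis order: the fed value is channels-first (N, C, H, W) while the tensor expects channels-last (N, H, W, C). A minimal sketch of that reordering on nested Python lists, written without any array library so the indexing is explicit; the function name is hypothetical:

```python
def nchw_to_nhwc(batch):
    """Reorder batch[n][c][h][w] into out[n][h][w][c] for nested lists."""
    return [
        [
            [[image[c][h][w] for c in range(len(image))]
             for w in range(len(image[0][0]))]
            for h in range(len(image[0]))
        ]
        for image in batch
    ]


# One image with 3 channels and 2x2 pixels; each value encodes (c, h, w).
batch = [[[[c * 100 + h * 10 + w for w in range(2)]
           for h in range(2)]
          for c in range(3)]]
out = nchw_to_nhwc(batch)
print(out[0][1][0])  # all channels of the pixel at h=1, w=0
```

With NumPy arrays the same reordering is usually written as a transpose with axes (0, 2, 3, 1).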

Unable to train

Hi, I ran into the same problem:

File "main.py", line 44, in <module>
main(config)
File "main.py", line 31, in main
config.data_format, config.split)
File "C:\Users\repon\Downloads\BEGAN-tensorflow\BEGAN-tensorflow\data_loader.py", line 21, in get_loader
print(paths[0])
IndexError: list index out of range

I ran download.py, but it still shows the same error.
Tracing the code, I found that there is no "data_path" in config.py.

Pretrained weights for CelebA 64X64 and 128X128

Thanks for the awesome repo.
Would it be possible for you to release the pre-trained weights for CelebA at 64X64 and 128X128 resolution?

Thanks,
Avisek Lahiri
Indian Institute of Technology Kharagpur

[BUG] StringIO module not found

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    from trainer import Trainer
  File "/home/ubuntu/workspace/BEGAN-tensorflow/trainer.py", line 4, in <module>
    import StringIO
ModuleNotFoundError: No module named 'StringIO'

Resolved.
Open trainer.py and add "from io" before "import StringIO", like this:
from io import StringIO
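
If the same file has to run under both Python 2 and Python 3, a guarded import is a common pattern. The sketch below only shows the import and a text round trip; whether trainer.py writes text or binary data into the buffer is not shown in this issue, and for binary data under Python 3 the class to use would be io.BytesIO instead.

```python
try:
    from StringIO import StringIO  # Python 2 module
except ImportError:
    from io import StringIO        # Python 3 location

buffer = StringIO()
buffer.write(u"hello")
print(buffer.getvalue())
```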

To Continue The Training.

Hello.
I want to stop my training and continue it later.
To do that, I set --load_path to "<model's name>_0215_104129".
"_0215_104129" comes from the directory name.
Is this the correct way?
Or should I do it another way?

I'd be happy if you could answer.
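
To find the exact name to pass to --load_path, it may help to list the run directories that already exist. The sketch below assumes runs are stored as logs/<dataset>_<MMDD>_<HHMMSS>, which is the pattern visible in the "MODEL dir" log lines quoted elsewhere on this page; the helper find_runs is hypothetical, and the sketch does not address whether the repository actually resumes from the directory.

```python
import os


def find_runs(log_dir, dataset):
    """Return run directory names under log_dir that start with '<dataset>_'.

    Names are sorted alphabetically, which matches chronological order
    only for runs within the same calendar year.
    """
    if not os.path.isdir(log_dir):
        return []
    prefix = dataset + "_"
    return sorted(
        name for name in os.listdir(log_dir)
        if name.startswith(prefix)
        and os.path.isdir(os.path.join(log_dir, name))
    )


if __name__ == "__main__":
    for run in find_runs("logs", "CelebA"):
        print(run)
```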

Same learning speed with different GPUs

I have tried the code with same configuration and the dataset on 3 different GPUs: GTX 1070, TitanX, Tesla M60.

I am pretty sure that I should see a significant difference, at least 60%, but I am getting the same speed on all three, around 1.5 it/s. What could be the reason? Is it data loading overhead? How fast are your GPUs?

Can't train

I had some issues before; luckily, most of them were addressed here.

I am using a custom dataset, and this is the error I am getting:
I have no idea what it could be.

Traceback (most recent call last):
File "C:/Users/Johannes/AppData/Local/Programs/Python/Python35/BEGAN-tensorflow-master/main.py", line 43, in <module>
main(config)
File "C:/Users/Johannes/AppData/Local/Programs/Python/Python35/BEGAN-tensorflow-master/main.py", line 35, in main
trainer.train()
File "C:\Users\Johannes\AppData\Local\Programs\Python\Python35\BEGAN-tensorflow-master\trainer.py", line 158, in train
self.autoencode(x_fixed, self.model_dir, idx=step, x_fake=x_fake)
File "C:\Users\Johannes\AppData\Local\Programs\Python\Python35\BEGAN-tensorflow-master\trainer.py", line 274, in autoencode
x = self.sess.run(self.AE_x, {self.x: img})
File "C:\Users\Johannes\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
run_metadata_ptr)
File "C:\Users\Johannes\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 961, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (16, 3, 64, 64) for Tensor 'ToFloat:0', which has shape '(16, 64, 64, 3)'

can't train

Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
return fn(*args)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
status, run_metadata)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_0_synthetic_inputs/random_shuffle_queue' is closed and has insufficient elements (requested 16, current size 0)
[[Node: synthetic_inputs = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](synthetic_inputs/random_shuffle_queue, synthetic_inputs/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 43, in <module>
main(config)
File "main.py", line 35, in main
trainer.train()
File "E:\Workspaces\ideaPro\pythondemo\began_tf\trainer.py", line 123, in train
x_fixed = self.get_image_from_loader()
File "E:\Workspaces\ideaPro\pythondemo\began_tf\trainer.py", line 349, in get_image_from_loader
x = self.data_loader.eval(session=self.sess)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 569, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3741, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_0_synthetic_inputs/random_shuffle_queue' is closed and has insufficient elements (requested 16, current size 0)
[[Node: synthetic_inputs = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](synthetic_inputs/random_shuffle_queue, synthetic_inputs/n)]]

Caused by op 'synthetic_inputs', defined at:
File "main.py", line 43, in <module>
main(config)
File "main.py", line 30, in main
config.data_format, config.split)
File "E:\Workspaces\ideaPro\pythondemo\began_tf\data_loader.py", line 41, in get_loader
min_after_dequeue=min_after_dequeue, name='synthetic_inputs')
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\input.py", line 1214, in shuffle_batch
name=name)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\input.py", line 784, in _shuffle_batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_0_synthetic_inputs/random_shuffle_queue' is closed and has insufficient elements (requested 16, current size 0)
[[Node: synthetic_inputs = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](synthetic_inputs/random_shuffle_queue, synthetic_inputs/n)]]``
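The message `current size 0` says the queue was closed while it held no elements. One possible cause, which the traceback alone does not confirm, is that no image files were found in the dataset directory. The sketch below is a hypothetical pre-flight check, not part of this repository; the helper name `list_images` and the `jpg`/`png` extensions are assumptions.

```python
import glob
import os


def list_images(root, extensions=("jpg", "png")):
    """Return sorted image paths directly under `root`.

    Raises IOError early when nothing matches, instead of letting an
    input queue close with zero elements later on.
    """
    paths = []
    for ext in extensions:
        # os.path.join keeps the pattern valid on both Windows and POSIX.
        paths.extend(glob.glob(os.path.join(root, "*." + ext)))
    if not paths:
        raise IOError(
            "No images with extensions {} found in {!r}".format(extensions, root)
        )
    return sorted(paths)
```

Calling `list_images` on the dataset folder before building the input pipeline turns an empty or mistyped path into an immediate, readable error.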

cannot run test model perfectly

I tried to run a model I had trained earlier in test mode, but it failed.
Everything produced by the 'generate()' function looks like noise.
I checked the input z_r, and its values lie between -1 and 1, which is correct.
I also printed the tensors and variables from the checkpoint files, and they contain numbers.
Can anyone help me? Thanks!

Train problem

I'm new to machine learning. Can anyone tell me how to solve this problem? Thanks in advance.

File "main.py", line 43, in <module>
main(config)
File "main.py", line 35, in main
trainer.train()
File "/Users/developer/Documents/git/opensource/BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 158, in train
self.autoencode(x_fixed, self.model_dir, idx=step, x_fake=x_fake)
File "/Users/developer/Documents/git/opensource/BEGAN-tensorflow/BEGAN-tensorflow/trainer.py", line 263, in autoencode
x = self.sess.run(self.AE_x, {self.x: img})
File "/Users/developer/anaconda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/Users/developer/anaconda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 961, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (16, 3, 64, 64) for Tensor 'ToFloat:0', which has shape '(16, 64, 64, 3)'
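The error says the fed array is channels-first, `(16, 3, 64, 64)`, while the placeholder expects channels-last, `(16, 64, 64, 3)`. As a minimal sketch of the layout conversion only (the helper name `nchw_to_nhwc` is hypothetical, and this is not necessarily how the repository should be fixed), a NumPy transpose moves the channel axis to the end:

```python
import numpy as np


def nchw_to_nhwc(batch):
    """Convert a (N, C, H, W) batch to (N, H, W, C)."""
    batch = np.asarray(batch)
    if batch.ndim != 4:
        raise ValueError("expected a 4-D batch, got shape {}".format(batch.shape))
    # Axis order: keep N, then H and W, then move C to the last position.
    return batch.transpose(0, 2, 3, 1)


img = np.zeros((16, 3, 64, 64), dtype=np.float32)
print(nchw_to_nhwc(img).shape)  # (16, 64, 64, 3)
```

Which side should change (the fed array or the data format the graph was built with) depends on the configuration in use, which the traceback does not show.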

What about MNIST dataset?

The model shows no sign of convergence on the MNIST dataset during my training.
agg_d_mnist
After 1600 batches of training, the output remains unchanged, with severe mode collapse.
Is something wrong?

Training on Multi-GPUs

I am pretty new to TensorFlow, although I have some experience with Keras. If I have two GPUs in my system, is it possible to accelerate training? If so, how? Apologies if this is an obvious question.

ValueError: Dimensions must be equal when using --use_authors_model

With the --use_authors_model flag, I get a ValueError (probably related to the data format):

(env) <me>@cuda4xl-02:~/Projects/BEGAN/src$ python main.py --use_gpu True --use_authors_model True

The same command without the --use_authors_model flag works perfectly.

cheers and keep up the great work 👍

Traceback (most recent call last):
  File "main.py", line 42, in <module>
    main(config)
  File "main.py", line 30, in main
    trainer = Trainer(config, data_loader)
  File "/home/<me>/Projects/BEGAN/src/trainer.py", line 79, in __init__
    self.build_model()
  File "/home/<me>/Projects/BEGAN/src/trainer.py", line 164, in build_model
    G, self.G_var = G_enc(G_in, 0)
  File "/home/<me>/Projects/BEGAN/src/layers.py", line 97, in __call__
    vout = unboxn(convs[0](vout), 2)
  File "/home/<me>/Projects/BEGAN/src/layers.py", line 48, in __call__
    padding=self.padding, data_format=self.data_format), self.bias)
  File "/home/<me>/Projects/BEGAN/src/layers.py", line 78, in <lambda>
    def __init__(self, name, n, width, colors, depth, scales, nl=lambda x, y: x + y, data_format="NCHW"): 
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 73, in add
    result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2338, in create_op
    set_shapes_for_outputs(ret)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1719, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1669, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/home/<me>/Projects/BEGAN/env/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 8 and 128 for 'add' (op: 'Add') with input shapes: [?,128,8,8], [128].
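The shapes in the message, `[?,128,8,8]` plus `[128]`, show a per-channel bias being added to a channels-first tensor: broadcasting aligns the bias with the last axis (size 8), not the channel axis (size 128). The NumPy sketch below reproduces the same mismatch and one way around it; `add_bias_nchw` is a hypothetical helper, not code from `layers.py`.

```python
import numpy as np


def add_bias_nchw(x, bias):
    """Add a per-channel bias of shape (C,) to a (N, C, H, W) array."""
    x = np.asarray(x)
    bias = np.asarray(bias)
    if x.ndim != 4 or bias.shape != (x.shape[1],):
        raise ValueError(
            "expected x of shape (N, C, H, W) and bias of shape (C,), "
            "got {} and {}".format(x.shape, bias.shape)
        )
    # Reshape to (1, C, 1, 1) so broadcasting lines up with the channel axis.
    return x + bias.reshape(1, -1, 1, 1)


x = np.zeros((2, 128, 8, 8))
bias = np.arange(128.0)

try:
    x + bias  # trailing axis 8 cannot broadcast against 128
except ValueError as err:
    print("plain add fails:", err)

print(add_bias_nchw(x, bias).shape)  # (2, 128, 8, 8)
```

In TensorFlow, `tf.nn.bias_add` takes a `data_format` argument for this case; whether that is the right change for the `--use_authors_model` path is not established by the traceback.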

Checkpoint

Hi,
Thank you for your amazing work.
Can you share your trained model? I find it hard to get satisfactory results.
Thank you!

Save Image

How can we separate the individual images from a generated image?

Training loss goes to NaN

I started training on the CelebA dataset, but at around 50k iterations the loss goes to NaN and k_t goes to 1. Have you seen this before?
10%|█████▎ | 49650/500000 [5:29:56<49:13:32, 2.54it/s][49650/500000] Loss_D: 0.112764 Loss_G: 0.053848 measure: 0.1190, k_t: 0.0458
10%|█████▎ | 49700/500000 [5:30:16<49:18:01, 2.54it/s][49700/500000] Loss_D: nan Loss_G: nan measure: nan, k_t: 1.0000

Having said that, the output at 49k is starting to look good. Below are my last good outputs (real, d_real, g, d_fake).
x_fixed
49500_d_real
49500_g
49500_d_fake

My training stats look like this:
began_training_screenshot
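For reference when reading these logs, the BEGAN paper updates the balance term as k_{t+1} = k_t + lambda_k * (gamma * L(x) - L(G(z))), kept in [0, 1], and reports the convergence measure L(x) + |gamma * L(x) - L(G(z))|. The sketch below is a plain-Python illustration of that rule with a guard that stops on the first non-finite loss. The function name and the defaults `gamma=0.5` and `lambda_k=0.001` are assumptions, and it does not diagnose why the loss became NaN here.

```python
import math


def began_step(k_t, loss_real, loss_fake, gamma=0.5, lambda_k=0.001):
    """Return (k_next, measure) for one BEGAN balance update.

    loss_real is the autoencoder loss on real images, L(x);
    loss_fake is the autoencoder loss on generated images, L(G(z)).
    """
    if not (math.isfinite(loss_real) and math.isfinite(loss_fake)):
        # Stop before a NaN/inf propagates into k_t and the weights.
        raise FloatingPointError(
            "non-finite loss: L(x)={}, L(G(z))={}".format(loss_real, loss_fake)
        )
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k_t + lambda_k * balance, 0.0), 1.0)  # clip to [0, 1]
    measure = loss_real + abs(balance)
    return k_next, measure
```

Raising on the first non-finite value makes it possible to inspect the last good checkpoint and the offending batch instead of continuing with NaN weights.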

Error in interpolation_G (function build_test_model)

Hello, I think there is a problem in the generator interpolation, at line 237:
--> self.z_r_loss = tf.reduce_mean(tf.abs(self.x - G_z_r))
self.x is not normalized to [-1, 1], so the loss is computed incorrectly. The correct form may be:
--> x = norm_img(self.x)
--> self.z_r_loss = tf.reduce_mean(tf.abs(x - G_z_r))
I also found the parameter "train_epoch=0" in the function "interpolate_G". I think you may have tried to map a real image and its mirror to noise and interpolate between them to generate new images with different poses, but it failed. After fixing this problem, you could try the interpolation again.
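To illustrate the scale mismatch described in this report, the sketch below compares an L1 loss computed on raw pixel values in [0, 255] with one computed after mapping them to [-1, 1]. The `norm_img` defined here is a stand-in written for this example (assumed to be `image / 127.5 - 1`); the repository's own helper may differ.

```python
import numpy as np


def norm_img(image):
    """Map pixel values from [0, 255] to [-1, 1] (assumed normalization)."""
    return np.asarray(image, dtype=np.float32) / 127.5 - 1.0


def l1_loss(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


x_raw = np.full((1, 4, 4, 3), 255.0, dtype=np.float32)  # an all-white image
g_out = np.ones((1, 4, 4, 3), dtype=np.float32)         # generator output in [-1, 1]

print(l1_loss(x_raw, g_out))            # large: the two ranges do not match
print(l1_loss(norm_img(x_raw), g_out))  # zero: both are in [-1, 1]
```

With unnormalized input, even a generator output that matches the image exactly leaves a large residual loss.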

Running in Python 3

The code also works in Python 3.5 if it is modified as follows:

diff --git a/trainer.py b/trainer.py
index 3fd8868..9670209 100644
--- a/trainer.py
+++ b/trainer.py
@@ -1,7 +1,7 @@
from __future__ import print_function
import os
-import StringIO
+#import StringIO
import scipy.misc
import numpy as np
from glob import glob
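The `StringIO` module does not exist in Python 3, which is why the import above fails there. If an in-memory buffer is actually needed (whether `trainer.py` uses one is not shown here), `io.BytesIO` is available on both Python 2.6+ and Python 3. A minimal sketch, with the hypothetical helper `roundtrip` only demonstrating the buffer:

```python
from io import BytesIO  # available on Python 2.6+ and Python 3


def roundtrip(data):
    """Write bytes to an in-memory buffer and read them back."""
    buf = BytesIO()
    buf.write(data)
    buf.seek(0)  # rewind before reading
    return buf.read()


print(roundtrip(b"\x89PNG") == b"\x89PNG")  # True
```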

shape of data error?

Below is the error I got. Does anybody know why this happens? Thank you.

Traceback (most recent call last):
File "main.py", line 45, in <module>
main(config)
File "main.py", line 37, in main
trainer.train()
File "D:\GAN study\BEGAN-tensorflow-master\trainer.py", line 161, in train
self.autoencode(x_fixed, self.model_dir, idx=step, x_fake=x_fake)
File "D:\GAN study\BEGAN-tensorflow-master\trainer.py", line 266, in autoencode
x = self.sess.run(self.AE_x, {self.x: img})
File "C:\Users\Josh\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
run_metadata_ptr)
File "C:\Users\Josh\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1100, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (16, 3, 64, 64) for Tensor 'ToFloat:0', which has shape '(16, 64, 64, 3)'

Batch size

I see that this implementation (and all the others I've seen online) uses a batch size of 16. The paper also mentions this as the default batch size they tried. Is there a reason why the batch size is kept so small? A batch size of 32 or 64 would speed up learning. Have you tried bigger batch sizes?

'image' must be three-dimensional

Hi, sorry to bother you. I ran into another problem when running the program:

Traceback (most recent call last):
File "main.py", line 42, in <module>
main(config)
File "main.py", line 28, in main
config.data_format, config.split)
File "/home/chikiuso/Downloads/BEGAN-tensorflow/data_loader.py", line 48, in get_loader
queue = tf.image.crop_to_bounding_box(queue, 50, 25, 128, 128)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops_impl.py", line 473, in crop_to_bounding_box
assert_ops += _Check3DImage(image, require_static=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops_impl.py", line 131, in _Check3DImage
raise ValueError("'image' must be three-dimensional.")
ValueError: 'image' must be three-dimensional.

Resuming training and dataset size

Hi there,
Thanks for the amazing work! I was training on a custom dataset and things were looking good until I ran into a queue error: only 12 elements were available out of the 16 requested. Does my dataset size need to be a multiple of 16 to avoid this?

Is there a way to resume training from previously saved checkpoints? I've started looking into how to resume training in the official TensorFlow documentation, and I assume I'll need to add some functionality to do this.
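As a quick way to check the arithmetic behind the first question, the sketch below reports how many full batches a dataset yields and how many images are left over, and trims a path list to a whole number of batches. Both helper names are hypothetical, and whether a partial final batch is really what closed the queue in this case is not established.

```python
def batch_stats(num_images, batch_size=16):
    """Return (full_batches, leftover) for a dataset of `num_images`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return divmod(num_images, batch_size)


def trim_to_full_batches(paths, batch_size=16):
    """Drop trailing items so len(result) is a multiple of batch_size."""
    full_batches, _ = batch_stats(len(paths), batch_size)
    return list(paths[: full_batches * batch_size])


print(batch_stats(1004))  # (62, 12): 62 full batches, 12 images left over
```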
