alecgraves / bvae-tf
Disentangled Variational Auto-Encoder in TensorFlow / Keras (Beta-VAE)
License: The Unlicense
What was past me thinking? I do not know.
if self.reg == 'bvae':
    # kl divergence:
    latent_loss = -0.5 * K.mean(1 + stddev
                                - K.square(mean)
                                - K.exp(stddev), axis=-1)
    # use beta to force less usage of vector space:
    # also try to use <capacity> dimensions of the space:
    latent_loss = self.beta * K.abs(latent_loss - self.capacity/self.shape.as_list()[1])
    self.add_loss(latent_loss, x)
I just randomly subtract a constant from my loss?
This is more like it:
if self.reg == 'bvae':
    # kl divergence:
    latent_losses = -0.5 * (1 + stddev
                            - K.square(mean)
                            - K.exp(stddev))
    # use beta to force less usage of vector space:
    # also try to use <capacity> dimensions of the space:
    bvae_weight = self.beta * K.ones(shape=(self.shape.as_list()[1]-self.capacity))
    if self.capacity > 0:
        vae_weight = K.ones(shape=(self.capacity))
        bvae_weight = K.concatenate([vae_weight, bvae_weight], axis=-1)
    latent_loss = K.abs(K.mean(bvae_weight*latent_losses, axis=-1))
    self.add_loss(latent_loss, x)
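For readers following the arithmetic, here is a minimal sketch of the per-dimension weighting described above, written in plain Python rather than Keras so it runs on its own. It assumes that `stddev` actually holds the log-variance (as the `K.exp(stddev)` term suggests) and that `capacity` is an integer count of latent dimensions. The function names are hypothetical and not part of the repository.

```python
import math

def kl_per_dimension(mean, logvar):
    """Closed-form KL(N(mean, exp(logvar)) || N(0, 1)) for each latent dimension."""
    return [-0.5 * (1.0 + lv - m * m - math.exp(lv)) for m, lv in zip(mean, logvar)]

def weighted_latent_loss(mean, logvar, beta, capacity):
    """Weight the first `capacity` dimensions by 1 and the rest by `beta`, then average."""
    kl = kl_per_dimension(mean, logvar)
    weights = [1.0] * capacity + [beta] * (len(kl) - capacity)
    return sum(w * k for w, k in zip(weights, kl)) / len(kl)

# Example: 4 latent dimensions, the first 2 are penalised like a plain VAE.
mean = [0.5, -1.0, 0.0, 2.0]
logvar = [0.0, -2.0, 0.0, 1.0]
loss = weighted_latent_loss(mean, logvar, beta=4.0, capacity=2)
```

Each per-dimension KL term is non-negative, so with positive weights the outer `K.abs` in the snippet above should not change the value.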
heyo,
really like your implementation but noticed the static batch size was causing me all sorts of grief when i wanted to play around with training. after a bit of mucking around i came up with a solution that i feel is a little more elegant.
basically the issue arises because during construction there is a call to the instantiated layer. at that point the tensor being passed in as "x" to the sampling layer's call() function has an undefined batch_size. at build time all we need to do is return a tensor with the appropriate shape; we don't actually need to call the K.random_normal() function, which is the only part of this function that needs the batch_size explicitly.
long story short, stick this in your Sampling.call() function:
# trick to allow setting batch at train/eval time
if x[0].shape[0].value == None:
    return mean + 0*stddev
in context that is (i made some other slight changes to the function but you can ignore them, this is just so you can see how my fix would fit into the function):
def call(self, x):
    if len(x) != 2:
        raise Exception('input layers must be a list: mean and stddev')
    if len(x[0].shape) != 2 or len(x[1].shape) != 2:
        raise Exception('input shape is not a vector [batchSize, latentSize]')
    mean = x[0]
    stddev = x[1]
    # trick to allow setting batch at train/eval time
    if x[0].shape[0].value == None:
        return mean + 0*stddev
    if self.reg:
        # kl divergence:
        latent_loss = -0.5 * K.mean(1 + stddev
                                    - K.square(mean)
                                    - K.exp(stddev), axis=-1)
        if self.reg == 'bvae':
            # use beta to force less usage of vector space:
            # also try to use <capacity> dimensions of the space:
            latent_loss = self.beta * K.abs(latent_loss - self.capacity/self.shape.as_list()[1])
        self.add_loss(latent_loss, x)
    epsilon = K.random_normal(shape=self.shape,
                              mean=0., stddev=1.)
    if self.random:
        # 'reparameterization trick':
        return mean + K.exp(stddev / 2) * epsilon
    else: # do not perform random sampling, simply grab the impulse value
        return mean + 0*stddev # Keras needs the *0 so the gradient is not None
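An alternative to the early return, sketched here in plain Python so it runs without TensorFlow, is to size the noise from the inputs at call time instead of from a stored static shape. In Keras the analogous call would be something like `K.random_normal(shape=K.shape(mean))`; that is an assumption about how the layer could be changed, not what the repository does. The helper name `sample_latent` is hypothetical, and `stddev` is treated as a log-variance to match `K.exp(stddev / 2)` above.

```python
import math
import random

def sample_latent(mean, logvar, rng, random_sample=True):
    """Reparameterization trick on a batch given as a list of rows.

    The noise shape is read from `mean` when the function is called,
    so any batch size works.
    """
    out = []
    for mean_row, logvar_row in zip(mean, logvar):
        row = []
        for m, lv in zip(mean_row, logvar_row):
            # Draw one standard-normal value per latent entry, or none at all.
            eps = rng.gauss(0.0, 1.0) if random_sample else 0.0
            row.append(m + math.exp(lv / 2.0) * eps)
        out.append(row)
    return out

rng = random.Random(0)
small = sample_latent([[0.0, 1.0]], [[0.0, 0.0]], rng)          # batch of 1
large = sample_latent([[0.0, 1.0]] * 5, [[0.0, 0.0]] * 5, rng)  # batch of 5
```

With `random_sample=False` the function simply returns the mean, mirroring the `else` branch of `call()`.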
Hi,
I've been using your code in some experiments.
I have the following questions:
Applying your recently committed changes to the loss actually resulted in predicted values with weird (larger) ranges in my experiments, which were weirder to convert to an image. I had to "roll back" to the previous version... Have you noticed such an impact?
Shouldn't the last layer have a sigmoid as activation so that the output has values between 0 and 1? These values should be comparable to the input ones, which I think are rescaled to be between 0 and 1, am I correct? Does this affect the reconstruction loss?
Also, in some other implementations the common reconstruction loss is the mean squared error and not the mean absolute error. Do you use 'mae' for some reason?
This is an extra issue that I'm having. Have you been able to use the TensorBoard callback to log the losses and metrics? When trying to add the TensorBoard callback I get an error, which I think is because the ae model is made of two models and thus internally has more than one loss. I get the following error:
    line 1050, in _write_custom_summaries
        summary_value.simple_value = value.item()
    ValueError: can only convert an array of size 1 to a Python scalar
I could not find a solution yet!
Minor detail: Why change the stddev to its absolute value? Can it ever be negative?
I'm sorry for the long text and for raising all these issues, but I think they may be relevant for more users too!
Thank you in advance!
Hi,
Thanks for providing the code!
I was wondering if you know how to apply the code to 1D data instead of an image? I have made some edits to the code, but I am getting an error.
Here are my edits to the code:
class Darknet19Encoder(Architecture):
    '''
    This encoder predicts distributions then randomly samples them.
    Regularization may be applied to the latent space output
    a simple, fully convolutional architecture inspired by
    pjreddie's darknet architecture
    https://github.com/pjreddie/darknet/blob/master/cfg/darknet19.cfg
    '''
    def __init__(self, inputShape=(16889,), batchSize=32,
                 latentSize=1024, latentConstraints='bvae', beta=100., capacity=0.,
                 randomSample=True):
        '''
        params
        -------
        latentConstraints : str
            Either 'bvae', 'vae', or 'no'
            Determines whether regularization is applied
            to the latent space representation.
        beta : float
            beta > 1, used for 'bvae' latent_regularizer
            (Unused if 'bvae' not selected, default 100)
        capacity : float
            used for 'bvae' to try to break input down to a set number
            of basis. (e.g. at 25, the network will try to use
            25 dimensions of the latent space)
            (unused if 'bvae' not selected)
        randomSample : bool
            whether or not to use random sampling when selecting from distribution.
            if false, the latent vector equals the mean, essentially turning this into a
            standard autoencoder.
        '''
        self.latentConstraints = latentConstraints
        self.beta = beta
        self.latentCapacity = capacity
        self.randomSample = randomSample
        print('inputShape ', inputShape, 'batchSize ', batchSize,'latentSize ', latentSize)
        super().__init__(inputShape, batchSize, latentSize)
    def Build(self):
        # create the input layer for feeding the network
        inLayer = Input(shape=(16889,))
        net = Dense(1024, activation='relu',kernel_initializer='glorot_uniform')(inLayer)
        net = BatchNormalization()(net)
        net = Activation('relu')(net)
        mean = Dense(1024, name = 'mean')(net)
        stddev = Dense(1024, name = 'std')(net)
        sample = SampleLayer(self.latentConstraints, self.beta,
                             self.latentCapacity, self.randomSample)([mean, stddev])
        return Model(inputs=inLayer, outputs=sample)
and this is the error that I'm getting:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-124-14cadf28fcb2> in <module>()
----> 1 d19e = Darknet19Encoder()
2 d19e.model()
<ipython-input-123-3d464c6af1ad> in __init__(self, inputShape, batchSize, latentSize, latentConstraints, beta, capacity, randomSample)
78 print('inputShape ', inputShape, 'batchSize ', batchSize,'latentSize ', latentSize)
79
---> 80 super().__init__(inputShape, batchSize, latentSize)
81
82 def Build(self):
<ipython-input-123-3d464c6af1ad> in __init__(self, inputShape, batchSize, latentSize)
37 self.latentSize = latentSize
38
---> 39 self.model = self.Build()
40
41
<ipython-input-123-3d464c6af1ad> in Build(self)
93
94 sample = SampleLayer(self.latentConstraints, self.beta,
---> 95 self.latentCapacity, self.randomSample)([mean, stddev])
96
97 return Model(inputs=inLayer, outputs=sample)
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
734
735 if not in_deferred_mode:
--> 736 outputs = self.call(inputs, *args, **kwargs)
737 if outputs is None:
738 raise ValueError('A layer\'s `call` method should return a Tensor '
<ipython-input-121-c50a024c4c69> in call(self, x)
110
111 epsilon = K.random_normal(shape=self.shape,
--> 112 mean=0., stddev=1.)
113 if self.random:
114 # 'reparameterization trick':
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/backend.py in random_normal(shape, mean, stddev, dtype, seed)
4512 seed = np.random.randint(10e6)
4513 return random_ops.random_normal(
-> 4514 shape, mean=mean, stddev=stddev, dtype=dtype, seed=seed)
4515
4516
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/random_ops.py in random_normal(shape, mean, stddev, dtype, seed, name)
70 """
71 with ops.name_scope(name, "random_normal", [shape, mean, stddev]) as name:
---> 72 shape_tensor = _ShapeTensor(shape)
73 mean_tensor = ops.convert_to_tensor(mean, dtype=dtype, name="mean")
74 stddev_tensor = ops.convert_to_tensor(stddev, dtype=dtype, name="stddev")
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/random_ops.py in _ShapeTensor(shape)
41 else:
42 dtype = None
---> 43 return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
44
45
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
996 name=name,
997 preferred_dtype=preferred_dtype,
--> 998 as_ref=False)
999
1000
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
1092
1093 if ret is None:
-> 1094 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1095
1096 if ret is NotImplemented:
/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in _tensor_shape_tensor_conversion_function(s, dtype, name, as_ref)
236 if not s.is_fully_defined():
237 raise ValueError(
--> 238 "Cannot convert a partially known TensorShape to a Tensor: %s" % s)
239 s_list = s.as_list()
240 int64_value = 0
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 1024)
and here are some printed results that may help:
inputShape (16889,) batchSize 32 latentSize 1024
len(x) 2 len(x[0].shape) 2 len(x[1].shape) 2 x [<tf.Tensor 'mean_10/BiasAdd:0' shape=(?, 1024) dtype=float32>, <tf.Tensor 'std_8/BiasAdd:0' shape=(?, 1024) dtype=float32>]
mean = Tensor("mean_10/BiasAdd:0", shape=(?, 1024), dtype=float32)
stddev = Tensor("std_8/BiasAdd:0", shape=(?, 1024), dtype=float32)
latent_loss Tensor("sample_layer_33/mul:0", shape=(), dtype=float32)
latent_loss Tensor("sample_layer_33/mul_1:0", shape=(), dtype=float32)
As @beatriz-ferreira mentions in #3, the paper normalizes beta, which gives more consistent behavior across different latent vector sizes. This behavior (as default or an option) would be a good addition to the layer.
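As a rough sketch of what such an option might compute, assuming the normalization meant here is beta_norm = beta * M / N, with M the latent size and N the number of input values (my reading of the beta-VAE paper, not something confirmed in this thread). The function names are hypothetical and not part of the layer.

```python
def normalized_beta(beta, latent_size, input_size):
    """beta_norm = beta * M / N under the assumed convention."""
    return beta * latent_size / input_size

def beta_from_normalized(beta_norm, latent_size, input_size):
    """Invert the assumed convention to recover the raw beta used in the loss."""
    return beta_norm * input_size / latent_size

# Example: a 64x64x3 image (12288 values) with a 32-dimensional latent vector.
input_size = 64 * 64 * 3
beta = beta_from_normalized(0.01, latent_size=32, input_size=input_size)
```

Under this assumption, a fixed `beta_norm` would translate to a different raw `beta` for each latent size, which is the point of the normalization.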
Line 91 in c26a21d
in_train_phase should call the reparameterization function when the Keras backend is in its training phase, but it does not appear to be running at all.
I checked various PyTorch repos; all of them have a loss on the mean and log_var values, but yours does not.
Also, the resampler formula in your repo is wrong.
In your case, you use beta = 100. How should one choose a proper beta value (rather than a fixed constant)? And is a large or a small beta value better?
I was trying to run the code, and encountered the following error. Please tell me how I can fix it.
Traceback (most recent call last):
File "ae.py", line 65, in <module>
test()
File "ae.py", line 45, in test
encoder = Darknet19Encoder(inputShape, latentSize=latentSize, latentConstraints='bvae', beta=69)
File "/home/ies/billa/BVAE-tf/bvae/models.py", line 72, in __init__
super().__init__(inputShape, batchSize, latentSize)
File "/home/ies/billa/BVAE-tf/bvae/models.py", line 41, in __init__
self.model = self.Build()
File "/home/ies/billa/BVAE-tf/bvae/models.py", line 114, in Build
sample = SampleLayer(self.latentConstraints, self.beta)([mean, logvar], training=self.training)
File "/home/ies/billa/miniconda3/envs/pfprint/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/ies/billa/miniconda3/envs/pfprint/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 265, in wrapper
raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:
/home/ies/billa/BVAE-tf/bvae/sample_layer.py:70 call *
if mean.shape[0].value == None or logvar.shape[0].value == None:
AttributeError: 'NoneType' object has no attribute 'value'
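This error is consistent with running on TensorFlow 2.x, where indexing a `TensorShape` returns a plain `int` or `None` rather than a `Dimension` object with a `.value` attribute (the TensorFlow 1.x behavior). A minimal sketch of a version-tolerant check follows. The helper name `static_dim` is hypothetical, and the `FakeDimension` class only stands in for the TF 1.x object so the snippet runs without TensorFlow.

```python
def static_dim(dim):
    """Return the static size of a shape entry, or None if it is unknown.

    Accepts either a TF 1.x style object exposing `.value` or a plain int/None.
    """
    return getattr(dim, "value", dim)

class FakeDimension:
    """Stand-in for a TF 1.x Dimension, used only for this illustration."""
    def __init__(self, value):
        self.value = value

# Works for both conventions:
unknown_tf1 = static_dim(FakeDimension(None))
unknown_tf2 = static_dim(None)
known = static_dim(32)
```

The condition in `sample_layer.py` could then be written as `if static_dim(mean.shape[0]) is None or static_dim(logvar.shape[0]) is None:`, though I have not tested that against the repository.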
I do not know if I should abs the stddev component of latent space or not... I think it breaks the loss function if it is negative?
# kl divergence:
latent_loss = -0.5 * K.mean(1 + stddev
                            - K.square(mean)
                            - K.exp(stddev), axis=-1)
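One way to look at this question: the `K.exp(stddev)` term means the tensor named `stddev` is being used as a log-variance, and a log-variance is negative whenever the variance is below 1. The sketch below (plain Python, hypothetical function name) evaluates the same closed-form KL term for a negative value and for its absolute value, under that assumption.

```python
import math

def kl_term(mean, logvar):
    """KL(N(mean, exp(logvar)) || N(0, 1)) for a single latent dimension."""
    return -0.5 * (1.0 + logvar - mean ** 2 - math.exp(logvar))

# A log-variance of -2 corresponds to a variance of exp(-2), about 0.135.
kl_negative = kl_term(0.0, -2.0)        # valid and non-negative
kl_after_abs = kl_term(0.0, abs(-2.0))  # abs() silently changes the variance to exp(2)
```

Both values are non-negative, so a negative input does not break the loss. Taking the absolute value would instead stop the encoder from expressing variances below 1.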