alecgraves / bvae-tf

Disentangled Variational Auto-Encoder in TensorFlow / Keras (Beta-VAE)

License: The Unlicense

Python 100.00%

variational autoencoder disentangled beta tensorflow keras

bvae-tf's Introduction

BVAE-tf

Disentangled Variational Auto-Encoder in TensorFlow (Beta-VAE)

⭐ 💥 THE UNLICENSE 💥 ⭐

Example Reconstructed Image

What has been done

- Darknet-19 (fully convolutional & fast) encoder and decoder
- Custom Keras sampling layer for sampling the distribution of variational autoencoders
- Custom loss in the sampling layer for latent space regularization
  - Options are no reg, vae reg (KL divergence), or bvae reg (beta * KL divergence)
  - You can also set a target capacity for dimension usage of the latent space
- Simple interface for setting up your own VAE or B-VAE
  - See the test function in ae.py for usage information

Environment Setup

I am using conda so that the environment is easy to install.

1. Install Anaconda or Miniconda (the Python 3 version) for your platform.
2. Recreate the conda environment from the yml: conda env create -f environment.yml
3. Activate the environment:
   - Windows: go to cmd and run activate bvae-tf
   - Linux: source activate bvae-tf
4. If you want to use CPU only, run pip uninstall tensorflow-gpu followed by pip install tensorflow==1.4.0 after you activate the environment.

If you do not want to / cannot use conda, I am using tensorflow 1.4.0; see the environment.yml for more package info.

Demo

For a simple overfitting demonstration, run ae.py in your terminal. This will cause the autoencoder to run on the included demo image.

Note: The demo takes a few minutes on my 1060 6GB, so it will take a while on a CPU...

bvae-tf's People

stoplime kylelamarrc jperl mochiliu kapitsa2811 blasscoc bmanobel fuqianggu bgtoo daringpig yoann-fleytoux atanas1054 kdkuldeep

bvae-tf's Issues

Is negative stddev a problem?

I do not know if I should abs the stddev component of latent space or not... I think it breaks the loss function if it is negative?

# kl divergence:
latent_loss = -0.5 * K.mean(1 + stddev
                        - K.square(mean)
                        - K.exp(stddev), axis=-1)
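
One way to read the snippet above: because the formula uses 1 + stddev - mean^2 - exp(stddev), the tensor named stddev behaves like a log-variance rather than a standard deviation. Under that reading (an assumption, not something the repository states), negative values are legitimate and simply mean a variance below 1. The following is a framework-free sketch of the same formula; the function name kl_to_standard_normal is hypothetical.

```python
import math


def kl_to_standard_normal(mean, logvar):
    """Mean over dimensions of KL(N(mean, exp(logvar)) || N(0, 1)).

    `logvar` plays the role of the tensor called `stddev` in the snippet above
    (assumption: it is a log-variance, so it may be negative).
    """
    terms = [
        -0.5 * (1.0 + lv - m * m - math.exp(lv))
        for m, lv in zip(mean, logvar)
    ]
    return sum(terms) / len(terms)


# Matching the prior exactly gives zero divergence.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Negative log-variances still give a non-negative divergence.
kl_negative_logvar = kl_to_standard_normal([0.0, 0.0], [-2.0, -1.0])
```

If that reading holds, taking the absolute value of the tensor would change the distribution being regularized rather than protect the loss.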

wrong implementation?

I checked various PyTorch repos;
all of them have a loss on the mean and log_var values, but yours does not.
Also, the resampling formula in your repo looks wrong.

better way to handle batch_size

heyo,

really like your implementation but noticed the static batch size was causing me all sorts of grief when i wanted to play around with training. after a bit of mucking around i came up with a solution that i feel is a little more elegant.

basically the issue arises because during construction there is a call to the instantiated layer. at that point the tensor being passed in as "x" to the sampling layer's call() function has an undefined batch_size. at build time all we need to do is return a tensor with the appropriate shape, we don't actually need to call the K.random_normal() function which is the only part of this function that needs the batch_size explicitly.

long story short, stick this in your Sampling.call() function:

        # trick to allow setting batch at train/eval time
        if x[0].shape[0].value is None:
            return mean + 0*stddev

in context that is (i made some slight other changes to the function but you can ignore them, this is just so you can see how my fix would fit into the function):

    def call(self, x):
        if len(x) != 2:
            raise Exception('input layers must be a list: mean and stddev')
        if len(x[0].shape) != 2 or len(x[1].shape) != 2:
            raise Exception('input shape is not a vector [batchSize, latentSize]')
        
        mean = x[0]
        stddev = x[1]        
        
        # trick to allow setting batch at train/eval time
        if x[0].shape[0].value is None:
            return mean + 0*stddev

        if self.reg:
            # kl divergence:
            latent_loss = -0.5 * K.mean(1 + stddev
                                        - K.square(mean)
                                        - K.exp(stddev), axis=-1)        
    
            if self.reg == 'bvae':
                # use beta to force less usage of vector space:
                # also try to use <capacity> dimensions of the space:
                latent_loss = self.beta * K.abs(latent_loss - self.capacity/self.shape.as_list()[1])

            self.add_loss(latent_loss, x)

        epsilon = K.random_normal(shape=self.shape,
                              mean=0., stddev=1.)
        if self.random:
            # 'reparameterization trick':
            return mean + K.exp(stddev / 2) * epsilon
        else: # do not perform random sampling, simply grab the impulse value
            return mean + 0*stddev # Keras needs the *0 so the gradient is not None
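
An alternative to the early return is to take the noise shape from the input at call time instead of from a stored static shape. The sketch below shows that idea on plain nested lists, with no TensorFlow dependency; sample_latent is a hypothetical name. In the Keras backend the analogous step would presumably be passing K.shape(mean) as the shape argument to K.random_normal, but that is an assumption and has not been checked against tensorflow 1.4.0.

```python
import math
import random


def sample_latent(mean, logvar, rng=random):
    """Reparameterization trick: z = mean + exp(logvar / 2) * epsilon.

    The batch size is never passed in; it is whatever len(mean) happens to be
    when the function is called.
    """
    return [
        [
            m + math.exp(lv / 2.0) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean_row, logvar_row)
        ]
        for mean_row, logvar_row in zip(mean, logvar)
    ]


# The same function handles any batch size without rebuilding anything.
small_batch = sample_latent([[0.0, 1.0]], [[0.0, 0.0]])
large_batch = sample_latent([[0.0, 1.0]] * 5, [[0.0, 0.0]] * 5)
```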

Unable to run ae.py due to NoneType error

I was trying to run the code, and encountered the following error. Please tell me how I can fix it.

Traceback (most recent call last):
  File "ae.py", line 65, in <module>
    test()
  File "ae.py", line 45, in test
    encoder = Darknet19Encoder(inputShape, latentSize=latentSize, latentConstraints='bvae', beta=69)
  File "/home/ies/billa/BVAE-tf/bvae/models.py", line 72, in __init__
    super().__init__(inputShape, batchSize, latentSize)
  File "/home/ies/billa/BVAE-tf/bvae/models.py", line 41, in __init__
    self.model = self.Build()
  File "/home/ies/billa/BVAE-tf/bvae/models.py", line 114, in Build
    sample = SampleLayer(self.latentConstraints, self.beta)([mean, logvar], training=self.training)
  File "/home/ies/billa/miniconda3/envs/pfprint/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/ies/billa/miniconda3/envs/pfprint/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 265, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:

    /home/ies/billa/BVAE-tf/bvae/sample_layer.py:70 call  *
        if mean.shape[0].value == None or  logvar.shape[0].value == None:

    AttributeError: 'NoneType' object has no attribute 'value'
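
The traceback shows that mean.shape[0] is already None, so calling .value on it fails. That is consistent with TensorFlow versions where indexing a shape returns a plain int or None instead of a Dimension object. A version-tolerant check could look like the sketch below; static_dim and FakeDimension are hypothetical names used only for illustration, and FakeDimension merely stands in for an older Dimension-style object.

```python
def static_dim(dim):
    """Return an int or None for a Dimension-like object or a plain int/None."""
    return getattr(dim, "value", dim)


class FakeDimension:
    """Illustrative stand-in for an object exposing `.value` (not a real TF class)."""

    def __init__(self, value):
        self.value = value


# Both styles of shape entry resolve to the same answer.
unknown_old_style = static_dim(FakeDimension(None))
unknown_new_style = static_dim(None)
known_new_style = static_dim(32)
```

With such a helper, the failing condition would read static_dim(mean.shape[0]) is None instead of mean.shape[0].value == None.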

Some questions about implementation

Hi,

I've been using your code in some experiments.
I have the following questions:

1. Applying your recently committed changes to the loss actually resulted in predicted values with weird (larger) ranges in my experiments, which were weirder to convert to an image. I had to "roll back" to the previous version... Have you noticed such an impact?
2. Shouldn't the last layer have a sigmoid activation so that the output has values between 0 and 1? These values should be comparable to the input ones, which I think are rescaled to be between 0 and 1, am I correct? Does this affect the reconstruction loss?
3. Also, in some other implementations the common reconstruction loss is the mean squared error and not the mean absolute error. Do you use 'mae' for some reason?
4. This is an extra issue that I'm having. Have you been able to use the Tensorboard callback to log the losses and metrics? When trying to add the Tensorboard callback I get an error which I think is because the ae model is made of two models, and thus internally has more than one loss. I get the following error:
   line 1050, in _write_custom_summaries
   summary_value.simple_value = value.item()
   ValueError: can only convert an array of size 1 to a Python scalar
   I could not find a solution yet..!
5. Minor detail: Why change the stddev to its absolute value? Can it ever be negative?!

I'm sorry for the long text and for raising all these issues, but I think they may be relevant for more users too!

Thank you in advance!
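
To make the sigmoid and MAE/MSE questions concrete, here is a small framework-free sketch. It only illustrates the arithmetic and says nothing about what the repository's decoder actually does; all names and values are made up for the example.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mae(predicted, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def mse(predicted, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


raw_outputs = [-3.0, 0.0, 4.0]   # unbounded values from a final linear layer
targets = [0.0, 0.5, 1.0]        # pixel values rescaled to [0, 1]
bounded = [sigmoid(v) for v in raw_outputs]

mae_value = mae(bounded, targets)
mse_value = mse(bounded, targets)
```

Once predictions and targets both lie in [0, 1], every per-pixel error has magnitude below 1, so squaring shrinks it. MSE therefore penalizes small errors less than MAE does in that range.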

I do not think the capacity argument works

What was past me thinking? I do not know 😄

if self.reg == 'bvae':
    # kl divergence:
    latent_loss = -0.5 * K.mean(1 + stddev
                                - K.square(mean)
                                - K.exp(stddev), axis=-1)
    # use beta to force less usage of vector space:
    # also try to use <capacity> dimensions of the space:
    latent_loss = self.beta * K.abs(latent_loss - self.capacity/self.shape.as_list()[1])
    self.add_loss(latent_loss, x)

I just randomly subtract a constant from my loss?

This is more like it:

if self.reg == 'bvae':
    # kl divergence:
    latent_losses = -0.5 * (1 + stddev
                            - K.square(mean)
                            - K.exp(stddev))
    # use beta to force less usage of vector space:
    # also try to use <capacity> dimensions of the space:
    bvae_weight = self.beta * K.ones(shape=(self.shape.as_list()[1]-self.capacity))
    if self.capacity > 0:
        vae_weight = K.ones(shape=(self.capacity))
        bvae_weight = K.concatenate([vae_weight, bvae_weight], axis=-1)
    latent_loss = K.abs(K.mean(bvae_weight*latent_losses, axis=-1))

    self.add_loss(latent_loss, x)
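
The weighting idea in the second snippet can be sketched without Keras: the first capacity dimensions keep weight 1 (plain VAE regularization) and the remaining ones get weight beta. The helper names below are hypothetical. The int() cast is an assumption added here because capacity defaults to a float elsewhere in the code, while a count of dimensions has to be an integer.

```python
def kl_weights(latent_size, capacity, beta):
    """Per-dimension weights: 1.0 for the first `capacity` dims, beta for the rest."""
    capacity = int(capacity)
    if not 0 <= capacity <= latent_size:
        raise ValueError("capacity must be between 0 and latent_size")
    return [1.0] * capacity + [beta] * (latent_size - capacity)


def weighted_latent_loss(per_dim_kl, capacity, beta):
    """Mean of the per-dimension KL terms after applying the weights."""
    weights = kl_weights(len(per_dim_kl), capacity, beta)
    return sum(w * k for w, k in zip(weights, per_dim_kl)) / len(per_dim_kl)


weights_example = kl_weights(latent_size=4, capacity=1, beta=100.0)
loss_example = weighted_latent_loss([1.0, 1.0], capacity=1, beta=3.0)
```

With capacity equal to 0, every dimension is weighted by beta, which reduces to the usual beta * KL term.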

a simple, fully convolutional architecture inspired by
    pjreddie's darknet architecture
https://github.com/pjreddie/darknet/blob/master/cfg/darknet19.cfg
'''
def __init__(self, inputShape=(16889,), batchSize=32,
             latentSize=1024, latentConstraints='bvae', beta=100., capacity=0.,
             randomSample=True):
    '''
    params
    -------
    latentConstraints : str
        Either 'bvae', 'vae', or 'no'
        Determines whether regularization is applied
            to the latent space representation.
    beta : float
        beta > 1, used for 'bvae' latent_regularizer
        (Unused if 'bvae' not selected, default 100)
    capacity : float
        used for 'bvae' to try to break input down to a set number
            of basis. (e.g. at 25, the network will try to use 
            25 dimensions of the latent space)
        (unused if 'bvae' not selected)
    randomSample : bool
        whether or not to use random sampling when selecting from distribution.
        if false, the latent vector equals the mean, essentially turning this into a
            standard autoencoder.
    '''
    self.latentConstraints = latentConstraints
    self.beta = beta
    self.latentCapacity = capacity
    self.randomSample = randomSample
    print('inputShape ', inputShape, 'batchSize ', batchSize,'latentSize ',  latentSize)

    super().__init__(inputShape, batchSize, latentSize)

def Build(self):
    
    
    # create the input layer for feeding the network
    inLayer = Input(shape=(16889,))
    net = Dense(1024, activation='relu',kernel_initializer='glorot_uniform')(inLayer)
    net = BatchNormalization()(net)
    net = Activation('relu')(net)
    
    mean = Dense(1024, name = 'mean')(net)
    stddev = Dense(1024, name = 'std')(net)
    
    sample = SampleLayer(self.latentConstraints, self.beta,
                        self.latentCapacity, self.randomSample)([mean, stddev])

    return Model(inputs=inLayer, outputs=sample)

and this is the error that I'm getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-124-14cadf28fcb2> in <module>()
----> 1 d19e = Darknet19Encoder()
      2 d19e.model()

<ipython-input-123-3d464c6af1ad> in __init__(self, inputShape, batchSize, latentSize, latentConstraints, beta, capacity, randomSample)
     78         print('inputShape ', inputShape, 'batchSize ', batchSize,'latentSize ',  latentSize)
     79 
---> 80         super().__init__(inputShape, batchSize, latentSize)
     81 
     82     def Build(self):

<ipython-input-123-3d464c6af1ad> in __init__(self, inputShape, batchSize, latentSize)
     37         self.latentSize = latentSize
     38 
---> 39         self.model = self.Build()
     40 
     41 

<ipython-input-123-3d464c6af1ad> in Build(self)
     93 
     94         sample = SampleLayer(self.latentConstraints, self.beta,
---> 95                             self.latentCapacity, self.randomSample)([mean, stddev])
     96 
     97         return Model(inputs=inLayer, outputs=sample)

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    734 
    735       if not in_deferred_mode:
--> 736         outputs = self.call(inputs, *args, **kwargs)
    737         if outputs is None:
    738           raise ValueError('A layer\'s `call` method should return a Tensor '

<ipython-input-121-c50a024c4c69> in call(self, x)
    110 
    111         epsilon = K.random_normal(shape=self.shape,
--> 112                               mean=0., stddev=1.)
    113         if self.random:
    114             # 'reparameterization trick':

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/backend.py in random_normal(shape, mean, stddev, dtype, seed)
   4512     seed = np.random.randint(10e6)
   4513   return random_ops.random_normal(
-> 4514       shape, mean=mean, stddev=stddev, dtype=dtype, seed=seed)
   4515 
   4516 

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/random_ops.py in random_normal(shape, mean, stddev, dtype, seed, name)
     70   """
     71   with ops.name_scope(name, "random_normal", [shape, mean, stddev]) as name:
---> 72     shape_tensor = _ShapeTensor(shape)
     73     mean_tensor = ops.convert_to_tensor(mean, dtype=dtype, name="mean")
     74     stddev_tensor = ops.convert_to_tensor(stddev, dtype=dtype, name="stddev")

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/random_ops.py in _ShapeTensor(shape)
     41   else:
     42     dtype = None
---> 43   return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
     44 
     45 

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    996       name=name,
    997       preferred_dtype=preferred_dtype,
--> 998       as_ref=False)
    999 
   1000 

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
   1092 
   1093     if ret is None:
-> 1094       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1095 
   1096     if ret is NotImplemented:

/projects/sysbio/projects/czi/immune/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in _tensor_shape_tensor_conversion_function(s, dtype, name, as_ref)
    236   if not s.is_fully_defined():
    237     raise ValueError(
--> 238         "Cannot convert a partially known TensorShape to a Tensor: %s" % s)
    239   s_list = s.as_list()
    240   int64_value = 0

ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 1024)

and here some printing results that may help?

inputShape  (16889,) batchSize  32 latentSize  1024
len(x) 2 len(x[0].shape) 2 len(x[1].shape) 2 x [<tf.Tensor 'mean_10/BiasAdd:0' shape=(?, 1024) dtype=float32>, <tf.Tensor 'std_8/BiasAdd:0' shape=(?, 1024) dtype=float32>]
mean =  Tensor("mean_10/BiasAdd:0", shape=(?, 1024), dtype=float32)
stddev =  Tensor("std_8/BiasAdd:0", shape=(?, 1024), dtype=float32)
latent_loss Tensor("sample_layer_33/mul:0", shape=(), dtype=float32)
latent_loss Tensor("sample_layer_33/mul_1:0", shape=(), dtype=float32)

why is the in_train_phase not working

BVAE-tf/bvae/sample_layer.py

Line 91 in c26a21d

 return K.in_train_phase(reparameterization_trick, mean + 0*logvar, training=training) # TODO figure out why this is not working in the specified tf version??? 

in_train_phase should call the reparameterization function when K.backend is in its training phase.... It does not appear to be running at all.
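
For reference, the behavior the line above is expected to have can be written as a plain Python stand-in. This is not K.in_train_phase itself and does not diagnose why the real call misbehaves; choose_latent and the counter are hypothetical names used to show that the sampling function should run only in the training phase.

```python
def choose_latent(sample_fn, mean, training):
    """Return sample_fn() while training, otherwise the deterministic mean."""
    return sample_fn() if training else mean


calls = {"count": 0}


def fake_reparameterization():
    """Illustrative stand-in that records each call and returns a marker."""
    calls["count"] += 1
    return "sampled"


train_output = choose_latent(fake_reparameterization, "mean", training=True)
eval_output = choose_latent(fake_reparameterization, "mean", training=False)
```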