I was playing around with the notebook, trying to look at the intermediate representations of the training data. I was expecting the output of the y layer to be (pretty) sparse and (nearly) binarized, but that doesn't seem to be the case:
...
Step 40001, ELBO: -101.598
Step 45001, ELBO: -99.799
>>> np_x, _ = data.next_batch(1)
>>> emb = sess.run(y, {x : np_x})
>>> emb.max(axis=-1) # Value of maximum of embedding -- would expect to be 1
array([ 0.13201179, 0.36978129, 0.41773844, 0.26891398, 0.24909849,
0.21777716, 0.1552867 , 0.47244716, 0.16195767, 0.39042374,
0.17623694, 0.2765696 , 0.19546057, 0.18048088, 0.12659149,
0.64287513, 0.14742081, 0.2126791 , 0.53717244, 0.23660626,
0.14906606, 0.15466955, 0.1191797 , 0.20597951, 0.25431085,
0.1979771 , 0.16981648, 0.2198326 , 0.17538837, 0.27005175], dtype=float32)
>>> ((emb < 0.01) | (emb > 0.99)).mean()
0.12
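For comparison, here's the sanity check I ran to see what those two metrics should give for a perfectly binarized embedding (a NumPy sketch; the shape `(1, 30, 10)` — 30 categorical variables with 10 categories each — is my assumption based on the 30 maxima printed above):

```python
import numpy as np

# Hypothetical perfectly one-hot embedding with an assumed shape of
# (batch=1, 30 categorical variables, 10 categories each).
idx = np.random.randint(0, 10, size=(1, 30))
one_hot = np.eye(10, dtype=np.float32)[idx]   # shape (1, 30, 10)

print(one_hot.max(axis=-1))                    # every max is exactly 1.0
print(((one_hot < 0.01) | (one_hot > 0.99)).mean())  # fraction saturated: 1.0
```

Against that baseline, a max around 0.1-0.6 and only 12% of entries saturated looks very far from one-hot.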
So it looks like the intermediate representations are still dense and far from binary. Any thoughts? (I'm new to TensorFlow/VAEs, so I may be making a silly coding or conceptual mistake...)