Comments (18)
I found that the NaN is caused by a small sigma, which leads to a very large likelihood. You may try to compute log(p) instead of p and calculate r with tf.reduce_logsumexp. @www0wwwjs1
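A pure-Python sketch of why the log-domain computation helps (the sigma value, number of pose dimensions, and variable names here are illustrative, not the repo's actual tensors): summing per-dimension Gaussian log-densities with a small sigma produces log-likelihoods whose direct exp() overflows, while normalizing via log-sum-exp never exponentiates a large number.

```python
import math

def logsumexp(xs):
    # Shift by the max so exp() is never called on a large positive number.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Per-dimension Gaussian log-density at the mean, with a small sigma.
sigma = 1e-20
log_p_h = -0.5 * math.log(2 * math.pi * sigma ** 2)

# Summing over 16 pose dimensions gives log-likelihoods around 722;
# math.exp(722) overflows float64, so computing p directly fails.
log_p = [16 * log_p_h, 16 * log_p_h - 5.0]  # two hypothetical capsules

# Normalizing in log space keeps every exp() argument non-positive:
log_z = logsumexp(log_p)
r = [math.exp(lp - log_z) for lp in log_p]  # responsibilities, sum to 1
```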
from matrix-capsules-em-tensorflow.
Thanks a lot for the suggestion!
Is the suggestion resolved?
There are still some problems. I printed out the activations of the class-capsule layer, and it seems that all of them are 1. I guess this is caused by a large -log(sigma_h), which makes the logits very large. It seems that decreasing the temperature (lambda) to 1e-2 works. I don't know whether that is reasonable; what do you think? @www0wwwjs1 Thanks!
Could you please mark the particular lines that you're improving? Thanks!
\lambda = 0.01 is already used in the latest version of the code, and it helps robustness. Another implementation also adopts a similar value. It looks like a reasonable configuration here; however, the original paper rarely mentions the specific values of such parameters.
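A small sketch of the temperature effect (the logit values are made up for illustration; the real logits come from the beta_a and cost terms of the M-step): with lambda = 1 the large -log(sigma_h) terms push the sigmoid flat against 1, while lambda = 0.01 keeps the activations distinguishable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical activation logits, large because of -log(sigma_h) terms.
logits = [120.0, 80.0, -40.0]

acts_hot = [sigmoid(1.0 * z) for z in logits]    # lambda = 1: saturated
acts_cool = [sigmoid(1e-2 * z) for z in logits]  # lambda = 0.01: spread out
```

With lambda = 1 the first two activations both round to exactly 1.0 in float64, so the gradient through the sigmoid vanishes; the smaller temperature keeps all three strictly inside (0, 1).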
@yhyu13
I changed the E-step to:
# E-step: per-dimension Gaussian log-density of each vote
p_c_h = -0.5 * tf.log(2 * math.pi * sigma_square) - tf.square(votes - miu) / (2 * sigma_square)
# sum over the pose dimensions to get each capsule's log-likelihood
p_c = tf.reduce_sum(p_c_h, axis=3)
# add the log of the higher-level capsule activations
a1 = tf.log(tf.reshape(activation1, shape=[batch_size, 1, caps_num_c]))
ap = p_c + a1
# normalize in log space, then exponentiate to get the assignments r
sum_ap = tf.reduce_logsumexp(ap, axis=2, keep_dims=True)
r = tf.exp(ap - sum_ap)
Thanks for the suggestion about stability; a larger number of iterations is supported now.
I'm still a little confused. Is this necessary:
log_p_c_h = log_p_c_h - (tf.reduce_max(log_p_c_h, axis=[2, 3], keep_dims=True) - tf.log(10.0))
@www0wwwjs1 @yhyu13
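For what it's worth, the subtraction is a constant shift per capsule, so it cancels when the responsibilities are normalized; its only job is to pin the largest log-probability at log(10) so exp() stays in a safe range. A sketch of the idea, with made-up log-probabilities:

```python
import math

def responsibilities(log_p):
    # Shift so the largest log-prob becomes log(10); the constant
    # cancels in the normalization, so r is mathematically unchanged.
    shift = max(log_p) - math.log(10.0)
    exp_p = [math.exp(lp - shift) for lp in log_p]  # largest is about 10
    z = sum(exp_p)
    return [e / z for e in exp_p]

# exp() of these would overflow float64 without the shift:
log_p = [750.0, 745.0, 700.0]
r = responsibilities(log_p)
```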
Yes. If you comment out that line, it gives NaN gradients and loss immediately during training on both datasets. Let me know if this happens on your machine too.
Also, we are still using Yunzhi Shi's contribution because I find the network learns a bit faster with his settings. However, your contribution in this discussion is valuable and much appreciated.
The purpose of this line is only to help numerical stability, as shown in @yhyu13's comment. It does not correspond to any part of the algorithm in the original paper.
Thanks.
It seems that the results on MNIST don't match your report; maybe there should be a check? @www0wwwjs1
The experiments on MNIST were run with an old version of the code. Afterwards, we added more features to further improve performance, especially on the more challenging smallNORB dataset. These auxiliary parts may affect the performance on MNIST as well; however, I expect the influence to be positive. I also plan to rerun the experiments on MNIST after we finish the current experiments on smallNORB. If a negative influence is observed, please let us know. Many thanks.
I cloned the latest version and ran it without changing any hyper-parameters. The loss is decreasing; however, the test accuracy does not improve. I also tried setting the "is_train" flag to false, and the test accuracy still does not improve. I just ran the commands "python3 train.py "mnist"" and "python3 eval.py "mnist"". If there's anything wrong with my experimental settings, please let me know. Thanks!
Sorry, that's my fault. I pushed some experimental configurations without solid validation. Please clone the project again; the latest configuration should be valid. The same hyper-parameters are also used in our newest experiments. Although the final results are still pending, the test accuracy is already going up.
Thanks! It performs well now.
Did anyone manage to get this to work with 3 or more iterations?
In EM routing's M-step, why is the mean not updated in the first two iterations (capsule_em.py L352)? The else branch is not clear. Can you give some explanation here? Thanks!