www0wwwjs1 / matrix-capsules-em-tensorflow
A Tensorflow implementation of CapsNet based on paper Matrix Capsules with EM Routing
License: Apache License 2.0
I ran ./download.sh under ./data, and the directory smallNORB and 4 files, smallnorb-xxxx-{testing|training}-{cat|dat}.mat, became available.
But when I tried to train the network on the smallNORB dataset with the command python train.py "smallNORB", the following errors were returned.
$ python3 train.py "smallNORB"
Using TensorFlow backend.
2018-08-07 19:49:45,984 [5964] INFO __main__: Using dataset: smallNORB
2018-08-07 19:49:45,990 [5964] CRITICAL root: Traceback (most recent call last):
File "train.py", line 156, in <module>
tf.app.run()
File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 52, in main
batch_x, batch_labels = create_inputs()
File "/path/to/Matrix-Capsules-EM-Tensorflow/config.py", line 84, in <lambda>
'smallNORB': lambda: create_inputs_norb(is_train, epochs),
File "/path/to/Matrix-Capsules-EM-Tensorflow/utils.py", line 31, in create_inputs_norb
image, label = norb.read_norb_tfrecord(chunk_files, epochs)
File "/path/to/Matrix-Capsules-EM-Tensorflow/data/smallNORB.py", line 128, in read_norb_tfrecord
filename_queue = tf.train.string_input_producer(filenames, num_epochs=epochs)
File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 241, in string_input_producer
raise ValueError(not_null_err)
ValueError: string_input_producer requires a non-null input tensor
Did I miss anything to do before training with smallNORB?
I think the 4 downloaded files should be converted to TFRecord files.
I'm a beginner with TensorFlow, so what I said may be beside the point...
Thanks.
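The traceback shows that tf.train.string_input_producer received an empty list of file names, which usually means the file pattern in data/smallNORB.py matched nothing, for example because only the downloaded .mat files exist and no TFRecord files have been generated from them yet. A minimal pre-flight check, assuming a hypothetical "*.tfrecords" pattern (the repository's actual pattern may differ), fails earlier with a clearer message:

```python
import glob
import os


def find_tfrecords(data_dir, pattern="*.tfrecords"):
    """Return sorted TFRecord paths under data_dir, or raise a descriptive error.

    The default pattern is a hypothetical placeholder; adjust it to whatever
    data/smallNORB.py actually globs for.
    """
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        # Fail before the input pipeline is built, with a clearer message than
        # "string_input_producer requires a non-null input tensor".
        raise FileNotFoundError(
            "No files matching %r in %r; the downloaded .mat files may still "
            "need to be converted to TFRecords." % (pattern, data_dir)
        )
    return files
```

Calling this on the directory before passing the list to the input producer separates "data not prepared" from genuine pipeline bugs.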
Could you please explain why the loss values increase after 1200 iterations?
2018-12-28 07:17:58,417 [19476] INFO main: 412 iteration finishs in 3.548040 second loss=0.288399
2018-12-28 08:17:33,187 [19476] INFO main: 1399 iteration finishs in 3.582372 second loss=0.510083
Dataset: Mnist
Thanks
Jalil
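One possible explanation, assuming this implementation follows the paper's margin schedule: the paper's spread loss is L = Σ_{i≠t} max(0, m − (a_t − a_i))², and the margin m is increased linearly from 0.2 to 0.9 during training. With a growing margin, the same activations produce a larger loss, so a rising curve alone does not prove divergence. The sketch below uses a hypothetical schedule parameter ramp_steps; it is not the repository's code.

```python
def spread_loss(activations, target, margin):
    """Spread loss: sum over wrong classes i of max(0, margin - (a_t - a_i))**2."""
    a_t = activations[target]
    return sum(
        max(0.0, margin - (a_t - a_i)) ** 2
        for i, a_i in enumerate(activations)
        if i != target
    )


def linear_margin(step, ramp_steps, m_start=0.2, m_end=0.9):
    """Hypothetical linear margin schedule from m_start to m_end over ramp_steps."""
    frac = min(step / float(ramp_steps), 1.0)
    return m_start + frac * (m_end - m_start)
```

For fixed activations [0.9, 0.4, 0.3] with target 0, the loss is zero at margin 0.2 but positive at margin 0.9, even though nothing about the model has changed.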
There is now an error when we try to run on the smallNORB dataset:
chmod +x download.sh;./download.sh
Most likely the URLs have expired or are no longer available.
In the code, the shape of beta_v is [caps_num_c, n_channels]. However, in the authors' response on OpenReview (https://openreview.net/forum?id=HJWLfGWRb), they state:
"beta_v and beta_a are per capsule type. Therefore, they are vectors for both convolutional capsules and final capsules. For example in terms of the notation in fig.1 beta_a and beta_v for convCaps1 are C dimensional vectors."
So should the shape of beta_v be just [caps_num_c]?
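To see what a per-type beta_v means, here is a sketch of the activation computed in the paper's M-step, cost_h = (β_v + log σ_j^h) Σ_i R_ij and a_j = logistic(λ(β_a − Σ_h cost_h)), with β_v and β_a as one scalar per capsule type. Function and argument names are hypothetical, not taken from the repository.

```python
import math


def _logistic(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def capsule_activation(sigma_j, r_sum_j, beta_v, beta_a, inverse_temperature):
    """Activation a_j of one output capsule j, following the paper's M-step.

    sigma_j: the H per-dimension standard deviations sigma_j^h.
    r_sum_j: sum_i R_ij, the assignment mass routed to capsule j.
    beta_v, beta_a: scalars for capsule j's type. Stacking them over the
    C types gives the C-dimensional vectors described in the OpenReview reply.
    """
    cost = sum((beta_v + math.log(s)) * r_sum_j for s in sigma_j)
    return _logistic(inverse_temperature * (beta_a - cost))
```

With a scalar β_v, the β_v term contributes H·β_v·Σ_i R_ij in total, a single learned offset per type; a [caps_num_c, n_channels] β_v instead gives each pose dimension its own offset.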
I want to implement the initial normalization of smallNORB, but I could not get the values of the mean and variance (std) in data/smallNORB.py, line 177:
mean, variance = tf.nn.moments(image, [0, 1, 2])
I am sure both mean and variance are scalars.
Is there some way to get the mean and variance (std) of the smallNORB dataset?
Thanks so much if you could help me!
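If image at that line is a single [H, W, C] image (an assumption), then tf.nn.moments over axes [0, 1, 2] returns that one image's mean and variance, not dataset-wide statistics. One way to get the latter is to stream once over every training pixel with Welford's online algorithm, sketched below in plain Python:

```python
import math


class RunningMoments:
    """Welford's online algorithm for the mean and variance of a stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, values):
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance (no Bessel correction), like tf.nn.moments.
        return self._m2 / self.count if self.count else float("nan")

    @property
    def std(self):
        return math.sqrt(self.variance)
```

Feeding the flattened pixels of each training image to update() yields the mean and std to hard-code for normalization; Welford's update avoids the cancellation error of accumulating Σx and Σx² separately.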
What happens if we remove the coord_add function from the code? Why do we need this function?
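According to the paper, the class-capsule layer shares its transformation matrices across positions, so without extra information the votes would not encode where a part was detected. "Coordinate addition" restores this: the scaled (row, column) of the centre of each capsule's receptive field is added to the first two elements of the right-hand column of its 4x4 vote matrix. A sketch for one vote matrix; the (index + 0.5) / size scaling is an assumption, since the paper only says the coordinates are scaled:

```python
def add_coordinates(vote, row, col, height, width):
    """Return a copy of a 4x4 vote matrix with scaled coordinates added.

    vote: 4x4 matrix as a list of 4 rows.
    (row, col): receptive-field centre on a height x width grid.
    The scaled row goes into element [0][3] and the scaled column into
    element [1][3], i.e. the first two entries of the right-hand column.
    """
    out = [list(r) for r in vote]
    out[0][3] += (row + 0.5) / height
    out[1][3] += (col + 0.5) / width
    return out
```

Removing this step leaves the class capsules position-blind, which the paper suggests matters for generalization; how much it affects accuracy in this repository would need to be measured.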
Thanks for posting the code!
I tried your code on the smallNORB dataset with the parameters specified in the paper: A=64, B=8, C=D=16, routing iterations = 3, batch_size = 64 (I chose this number myself). But the result is very bad (it does not even converge), whereas the paper says the accuracy should be 97.8%. Why is it so sensitive to the number of capsules in A and to the number of routing iterations?
Could the author also post more test results with different numbers of capsules, routing iterations, learning rates, batch sizes, etc.?
Thank you very much!
I tried replacing batch_squash with batch_x, and the reconstruction loss became very large.
Why can't we train with the original MNIST data?
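We cannot see here what batch_squash does. If it rescales pixels to [0, 1] and batch_x holds raw values in [0, 255] (both assumptions), a sum-of-squared-errors reconstruction loss grows by a factor of 255² for the same relative error, which alone would make the loss look "very large" and swamp the classification loss. A small illustration:

```python
def sse(a, b):
    """Sum of squared errors, a common form of reconstruction loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


target = [0.0, 0.5, 1.0]
recon = [0.1, 0.4, 0.9]
scale = 255.0  # assumed raw pixel range

unit_loss = sse(target, recon)
raw_loss = sse([scale * x for x in target], [scale * x for x in recon])
# The same relative error costs scale**2 times more on raw pixel values.
ratio = raw_loss / unit_loss
```

If this is the cause, dividing batch_x by 255 (or lowering the reconstruction-loss weight accordingly) should bring the loss back into range.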
The fully connected layers added on top of the capsule network consist of ~1.6m parameters, whereas the capsules only have roughly 60k trainable parameters in the small configuration. As the matrix capsules are supposed to generalize better with fewer parameters, as compared to traditional architectures, this approach seems counterintuitive to me.
However, removing the reconstruction loss and training with the spread loss alone doesn't appear to converge (on smallNORB). Were you able to train your network with the spread loss only (as suggested by the paper)?
Nice implementation of the capsule network with EM routing! However, I do have a question about the function mat_transform() in the ConvCaps layer. In your implementation the weights are not shared over the patches of capsules. It is true that in the paper, the authors didn't mention whether the weights should be shared in the ConvCaps layer. But judging from the number of parameters that is reported in the paper, if the weights are not shared, there will be way too many parameters to be trained. Check this out:
This paragraph is on page 5 of the paper.
It would also be reasonable to use shared weights if one wants to build a deeper system, but I might be wrong about this.
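A quick count makes the argument concrete. It assumes the small configuration B=8, C=D=16 mentioned in another issue above, one 4x4 (16-weight) transformation matrix per (input capsule in the kernel, output type) pair, and output grids of 6x6 for ConvCaps1 and 4x4 for ConvCaps2, which follow from a 32x32 input crop; those grid sizes are an assumption.

```python
def convcaps_weights(kernel, in_types, out_types, out_positions=1, pose_size=16):
    """Number of transformation-matrix weights in a ConvCaps layer.

    Each (input capsule in the kernel window, output type) pair has its own
    4x4 matrix (pose_size = 16 weights). With out_positions > 1 every output
    position gets a private copy, i.e. the weights are NOT shared.
    """
    return kernel * kernel * in_types * out_types * pose_size * out_positions


# Shared across positions, as in an ordinary convolution.
shared = convcaps_weights(3, 8, 16) + convcaps_weights(3, 16, 16)
# Unshared: one copy per output position (6x6 for ConvCaps1, 4x4 for ConvCaps2).
unshared = (convcaps_weights(3, 8, 16, out_positions=6 * 6)
            + convcaps_weights(3, 16, 16, out_positions=4 * 4))
```

Under these assumptions the two ConvCaps layers need 55,296 weights when shared and 1,253,376 when unshared, so only the shared variant is compatible with a total in the tens of thousands.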
The batch is fetched from the data queue only once. Can this cover all the data in the queue?
In capsnet_em.py em_routing():
r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / 32)
Should this be:
r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / caps_num_c)
as in the paper:
∀i, c: R_ic ← 1/size(L + 1)
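Assuming caps_num_c is the number of capsules in layer L+1 that each input capsule routes to, the paper's initialization makes every input capsule's assignments sum to 1. A plain-Python equivalent of np.ones(...) / caps_num_c shows why a hard-coded 32 is only correct when caps_num_c == 32:

```python
def uniform_assignments(batch_size, caps_num_i, caps_num_c):
    """Initial routing assignments R_ic = 1 / caps_num_c for every i and c.

    Each input capsule's assignments over the caps_num_c output capsules
    then sum to 1. Dividing by a fixed 32 instead gives rows that sum to
    caps_num_c / 32, which is 1 only when caps_num_c == 32.
    """
    value = 1.0 / caps_num_c
    return [[[value] * caps_num_c for _ in range(caps_num_i)]
            for _ in range(batch_size)]
```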
Is it a problem with the weights?
Hi,
I have two questions:
Hi, thanks for your code first. I don't understand the meaning of coord; I hope you can explain it. Thank you.
Hi @www0wwwjs1! I had a quick question -- I just set up this code and am running it, and on both MNIST and smallNORB the loss starts at <1 (usually around 0.3) and flatlines during training. I'm confused as to why the loss starts so low and doesn't change. When testing on smallNORB, the accuracy is quite low, around 0.45. Please let me know what you think, thank you!
Open a Chrome browser and visit http://127.0.1.1:6006/
(or maybe http://127.0.0.1:6006/)
I completed the code for CIFAR-10 and got it running.
The model reaches only 0.37 accuracy, which is very disheartening. Do you think there is a problem with the code, or is the model itself bad?
Thank you so much for contributing your excellent code! I have read it in detail and it is really inspiring!
However, there is a small part of the code in the EM routing section that I cannot understand:
It is the calculation of "miu" (mean) and "sigma_square" (variance) in the M-step of the function em_routing (lines 341-350, capsnet_em.py). The calculation appears to differ from the way I would do it, which strictly follows the paper
_(procedure M-STEP, line 3, Procedure 1 in the paper)_:
1. multiply R_ij and V_ij^h, then sum over dimension "i"
2. sum R_ij over dimension "i"
3. divide the first result by the second
(I think I would implement it in a more naive way:
# v_in: votes with shape [1250, 72, 16, 16]; r: assignments with shape [1250, 72, 16]
r = tf.reshape(r, (1250, 72, 16, 1))
up = tf.reduce_sum(v_in * r, axis=1, keepdims=True)
down = tf.reduce_sum(r, axis=1, keepdims=True)
miu = up / down
)
I have trouble understanding your version!
Could you please explain how you turned this into a matrix computation?
Where can I learn how to do this kind of transformation?
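The question in Chinese (translated): How should I understand the matrix operations in the m-step? The procedure seems somewhat different from the original paper, and I don't see why it computes the mean and variance. Above, I tried to write code that is closer to the paper's procedure (though I am not confident in it). What is your reasoning behind these matrix operations? Where can I find related material, especially on how to convert a computation into matrix operations? What are the advantages of implementing it this way? Thank you.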
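Lines 341-350 of capsnet_em.py are not reproduced here, so the sketch below is not the repository's code. It spells out the paper's M-step formulas for one output capsule j, μ^h = Σ_i R_ij V_ij^h / Σ_i R_ij and (σ^h)² = Σ_i R_ij (V_ij^h − μ^h)² / Σ_i R_ij, next to the algebraically equal form E_R[V²] − μ², which needs only weighted sums over i. Each such sum is exactly one reduction along the i axis (axis=1 in the TensorFlow snippet above), which is what makes a vectorized version possible.

```python
def m_step_moments(votes, r):
    """Weighted mean and variance of the votes for one output capsule j.

    votes: list over input capsules i of H-dimensional vote vectors V_ij.
    r: list over i of assignment weights R_ij (already multiplied by a_i,
       as in the paper's M-step).
    Returns (mu, sigma_sq_two_pass, sigma_sq_one_pass), each a list over h.
    """
    r_sum = sum(r)
    dims = len(votes[0])
    # mu^h = sum_i R_ij V_ij^h / sum_i R_ij
    mu = [sum(ri * v[h] for ri, v in zip(r, votes)) / r_sum for h in range(dims)]
    # Paper form: weighted mean of squared deviations from mu.
    two_pass = [
        sum(ri * (v[h] - mu[h]) ** 2 for ri, v in zip(r, votes)) / r_sum
        for h in range(dims)
    ]
    # Equal form: weighted mean of V^2 minus mu^2; needs only sums over i.
    one_pass = [
        sum(ri * v[h] ** 2 for ri, v in zip(r, votes)) / r_sum - mu[h] ** 2
        for h in range(dims)
    ]
    return mu, two_pass, one_pass
```

Both variance forms agree mathematically; the one-pass form can lose precision when the variance is tiny relative to μ², which is one reason implementations add a small epsilon.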
Hi, I am the author of a PyTorch implementation of Matrix Capsules.
I see that you are facing the problem of unstable training, which did not happen in my project.
I guess the difference may be due to my use of clipping, on both gradients and variables. For example, if you have a/b or log(b) in your computational graph, you may need to clamp b to be larger than a small number such as 0.01 for numerical stability.
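The clamping suggested above can be sketched as follows, with the 0.01 floor taken from the comment; the helper names are hypothetical, and in TensorFlow the same idea is usually written with tf.maximum on the tensor before dividing or taking the log.

```python
import math

EPS = 0.01  # floor suggested above; smaller values are also common


def safe_div(a, b, eps=EPS):
    """a / b with the denominator clamped away from zero (assumes b >= 0)."""
    return a / max(b, eps)


def safe_log(b, eps=EPS):
    """log(b) with b clamped to at least eps, so log(0) cannot occur."""
    return math.log(max(b, eps))
```

Typical candidates in EM routing are the division by Σ_i R_ij in the M-step and the log σ term in the cost.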
Both a colleague and I have tried to run your network, but it doesn't train and gets stuck at a loss of 0.36. Maybe one of the latest updates caused this?
Btw, thanks for sharing your code!
Hello,
Thanks for releasing a great code base for capsule networks.
One quick query: are you not stopping the gradient during em_routing?
I tried python3 train.py "mnist" with the master version and the default parameters, but the spread loss does not decrease; it jumps from 0.36 to 0.65 within 1200 iterations. The experiment version, however, seems to work fine.
Hi,
I would like to know
(1) the intuition behind the pose matrix: how is it formed for each capsule from the ReLU output feature maps?
(2) I am trying to evaluate the expensive operations in CapsNet. Is the reshaping after each output necessary for the next stage?
Thanks,
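For question (1), the paper describes each primary capsule's pose as a learned linear transformation of the ReLU outputs at that location, and its activation as the sigmoid of a weighted sum of the same ReLUs, i.e. a 1x1 convolution. A sketch for one capsule at one location; the bias terms and argument names are assumptions. The reshape in question (2) only reinterprets the 16 outputs as a 4x4 matrix; it does not change any values.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def primary_capsule(features, pose_weights, pose_bias, act_weights, act_bias):
    """One primary capsule at one spatial location (a 1x1 convolution).

    features: the A ReLU outputs at this location.
    pose_weights: 16 rows of A weights; pose_bias: 16 biases.
    act_weights: A weights; act_bias: scalar.
    Returns (pose as a 4x4 list of rows, activation in (0, 1)).
    """
    flat = [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(pose_weights, pose_bias)]
    # Reshape the 16 linear outputs into a 4x4 pose matrix.
    pose = [flat[k:k + 4] for k in range(0, 16, 4)]
    logit = sum(w * f for w, f in zip(act_weights, features)) + act_bias
    return pose, _sigmoid(logit)
```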
I just realized that the paper says the image samples are first normalized, then cropped, and only then given random brightness and contrast. But in your data creation code (data/smallNORB.py), you add the random brightness and contrast before cropping.
Please move lines 164 and 165 to after line 180. I guess this should have some effect on the accuracy?
Thanks!
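The order described in this issue (normalize, then crop, then jitter) can be sketched in plain Python on a 2-D list. All helpers and the jitter ranges below are hypothetical; the paper does not give brightness and contrast values here, and this is not the repository's pipeline.

```python
import random


def normalize(img):
    """Zero-mean, unit-variance normalization of one image (a list of rows)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # guard against constant images
    return [[(p - mean) / std for p in row] for row in img]


def random_crop(img, size, rng):
    """Take a random size x size window from the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def jitter(img, rng, max_delta=0.2, contrast_range=(0.8, 1.2)):
    """Hypothetical random brightness (additive) and contrast (scaling) jitter."""
    delta = rng.uniform(-max_delta, max_delta)
    factor = rng.uniform(*contrast_range)
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[(p - mean) * factor + mean + delta for p in row] for row in img]


def preprocess(img, crop_size, rng):
    # Order described in the issue: normalize, then crop, then jitter.
    return jitter(random_crop(normalize(img), crop_size, rng), rng)
```

Jittering after normalization keeps the brightness and contrast perturbations on the same scale for every image, which is one plausible reason the order affects accuracy.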
@www0wwwjs1 Hi Prof. Suofei! I trained CapsNet on CIFAR-10, but the accuracy is only 50% after 50 epochs. Could you please give me some advice about parameter settings?
Thank you very much!
There is no file "eval.py" in the code files.
I want to train this program on my own dataset, but I am having difficulty adding it to the get_coord_add function. Could someone please help me?
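For a new dataset, the coordinate table mainly depends on the spatial size of the feature map where coordinates are added, which in turn depends on the input size. A hypothetical helper (not the repository's get_coord_add) that derives the table from the grid size instead of hard-coding it; the (index + 0.5) / size scaling is an assumption:

```python
def coord_table(height, width):
    """Scaled (row, col) centre coordinates for every position of a grid.

    Hypothetical helper: it only shows how the table can be generated from
    the feature-map size of a new dataset. Scaling by (index + 0.5) / size
    is an assumption.
    """
    return [[((r + 0.5) / height, (c + 0.5) / width) for c in range(width)]
            for r in range(height)]
```

You would call it with the height and width of the final capsule grid for your input size, then convert the result into whatever tensor shape the existing code expects.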
In capsule_em.py:
votes = tf.reshape(tf.matmul(output, w), [batch_size, caps_num_i, caps_num_c, 16])
When training with tf.matmul, GPU usage is very low, usually around 50% and sometimes less. Why does this happen?
I want to train on the IMDB dataset.
Hi,
I see huge variation in convergence across multiple runs on the same dataset. Sometimes it converges in 3k iterations, while other times it runs for 100k iterations without converging. Do you know why this happens?
Thank you very much for your work.
After printing r, I found that r is the same across capsules in layer L. Is this reasonable?
Hi,
If the number of routing iterations is greater than 1, the loss becomes NaN. How can I modify the code to fix this?
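One common source of NaN once the E-step runs (i.e. with more than one iteration) is that the Gaussian densities p_j underflow to 0 in high dimensions, so R_ij = a_j p_j / Σ_k a_k p_k becomes 0/0. Whether this is the cause here is an assumption. A sketch of the E-step computed in log space with the log-sum-exp trick; names are hypothetical, not the repository's:

```python
import math


def log_gaussian(v, mu, sigma_sq):
    """log p_j for an axis-aligned Gaussian over the H pose dimensions."""
    return -sum(
        (x - m) ** 2 / (2.0 * s) + 0.5 * math.log(2.0 * math.pi * s)
        for x, m, s in zip(v, mu, sigma_sq)
    )


def e_step(log_p, a):
    """Assignments R_ij = a_j p_j / sum_k a_k p_k for one input capsule i.

    log_p: list over output capsules j of log p_j.
    a: list over j of activations a_j (assumed > 0; clamp them if needed).
    Subtracting the maximum logit before exponentiating keeps the
    normalizer at least 1, so the division can never be 0/0.
    """
    logits = [lp + math.log(aj) for lp, aj in zip(log_p, a)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Here log p values around −1000 would underflow if exponentiated directly, but the assignments remain well defined.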
Hi,
thanks for sharing your implementation. I read through your high-level description on https://openreview.net/forum?id=HJWLfGWRb , and I have a question about your implementation of the EM routing for convolutional capsule layers. I haven't looked deeply into your code yet, so I apologize if I'm wrong.
In particular, I think that you clone each patch / receptive field into a column matrix and then handle each receptive field separately, as in the fully connected case. But this disregards the fact that each input capsule appears in multiple receptive fields, and these influence each other. So if capsule I1 appears in the receptive fields of capsules O1 and O2, and the EM determines that O1 is a good match for I1, then it cannot be a good match for O2.
This cross-influencing of capsules is briefly mentioned in the paper:
For convolutional capsules, each capsule in layer L+1 sends feedback only to capsules within its receptive field in layer L. Therefore each convolutional instance of a capsule in layer L receives at most kernel_size x kernel_size feedback from each capsule type in layer L+1.
Am I correct that this means that one cannot treat each input patch in isolation but has to run a global EM pass, respecting the more complex connectivity?
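The overlap in question can be counted directly. This sketch (valid padding, square grids, hypothetical helper names) computes how many output positions per capsule type have a given input position in their receptive field; with stride 1 the maximum is kernel_size × kernel_size, matching the quoted sentence. It only measures the overlap and says nothing about how the repository handles it.

```python
def coverage_counts(in_size, kernel, stride):
    """For one axis, how many output positions' receptive fields include
    each input position (valid padding)."""
    out_size = (in_size - kernel) // stride + 1
    counts = [0] * in_size
    for o in range(out_size):
        for k in range(kernel):
            counts[o * stride + k] += 1
    return counts


def max_feedback_2d(in_size, kernel, stride):
    """Maximum number of output positions (per capsule type) that send
    feedback to a single input position on a square grid."""
    c = max(coverage_counts(in_size, kernel, stride))
    return c * c
```

Whenever a count exceeds 1, an input capsule's assignments would have to be normalized jointly over all the patches that contain it, which is the global EM pass the question describes; treating each patch in isolation normalizes them separately.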