www0wwwjs1 / matrix-capsules-em-tensorflow

A Tensorflow implementation of CapsNet based on paper Matrix Capsules with EM Routing

License: Apache License 2.0

Python 99.60% Shell 0.40%

matrix-capsules-em-tensorflow's Issues

Is there something to be done after running download.sh for smallNORB data to be available?

I ran ./download.sh under ./data, and the directory smallNORB and the 4 files smallnorb-xxxx-{testing|training}-{cat|dat}.mat became available.
But when I tried to train the network on the smallNORB dataset with the command python train.py "smallNORB", the following errors were returned.

$ python3 train.py "smallNORB"
Using TensorFlow backend.
2018-08-07 19:49:45,984 [5964] INFO     __main__: Using dataset: smallNORB
2018-08-07 19:49:45,990 [5964] CRITICAL root: Traceback (most recent call last):
  File "train.py", line 156, in <module>
    tf.app.run()
  File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 52, in main
    batch_x, batch_labels = create_inputs()
  File "/path/to/Matrix-Capsules-EM-Tensorflow/config.py", line 84, in <lambda>
    'smallNORB': lambda: create_inputs_norb(is_train, epochs),
  File "/path/to/Matrix-Capsules-EM-Tensorflow/utils.py", line 31, in create_inputs_norb
    image, label = norb.read_norb_tfrecord(chunk_files, epochs)
  File "/path/to/Matrix-Capsules-EM-Tensorflow/data/smallNORB.py", line 128, in read_norb_tfrecord
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=epochs)
  File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 241, in string_input_producer
    raise ValueError(not_null_err)
ValueError: string_input_producer requires a non-null input tensor

Did I miss a step before training with smallNORB?
I think the 4 downloaded files should be converted to tfrecord files.

I'm a beginner with TensorFlow, so what I said may be beside the point...

Thanks.
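
A note on the traceback: `tf.train.string_input_producer` raises this ValueError when it is given an empty list of filenames, which would be consistent with no converted record files being found. A minimal sketch of a check that could run before the queue is built; the helper name and the `*.tfrecords` pattern are assumptions for illustration, not the repository's actual naming:

```python
import glob
import os

def find_record_files(data_dir, pattern="*.tfrecords"):
    """Return sorted record files under data_dir, or raise a clear error if none exist.

    The default pattern is a hypothetical naming scheme, not taken from the repository.
    """
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(
            "no files matching %r in %r; the downloaded .mat files may still "
            "need to be converted" % (pattern, data_dir))
    return files
```

Failing early like this points at the missing conversion step instead of at the input queue.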

Wrong Results

Could you please explain why the loss values keep growing after 1200 iterations?

2018-12-28 07:17:58,417 [19476] INFO main: 412 iteration finishs in 3.548040 second loss=0.288399
2018-12-28 08:17:33,187 [19476] INFO main: 1399 iteration finishs in 3.582372 second loss=0.510083

Dataset: Mnist
Thanks
Jalil

smallNORB dataset urls error "connection forbidden"

There is now an error when we try to download the smallNORB dataset:
chmod +x download.sh;./download.sh
Most likely the URLs have expired or are no longer available.

Shape of beta_v

In the code, the shape of beta_v is [caps_num_c, n_channels]; however, in the authors' response on OpenReview (https://openreview.net/forum?id=HJWLfGWRb), they state:

"beta_v and beta_a are per capsule type. Therefore, they are vectors for both convolutional capsules and final capsules. For example in terms of the notation in fig.1 beta_a and beta_v for convCaps1 are C dimensional vectors."

So should the shape of beta_v be just [caps_num_c]?

the mean and std of smallNORB dataset

I want to reproduce the first normalization step for smallNORB, but I could not get the values of the mean and variance (std) in data/smallNORB.py:
line 177: mean, variance = tf.nn.moments(image, [0, 1, 2])
I am sure both mean and variance are scalars.
Is there some way to get the mean and variance (std) of the smallNORB dataset?

Thanks so much if you could help me!
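
For reference, `tf.nn.moments(image, [0, 1, 2])` reduces over all three axes of a single image, so it yields one scalar mean and one scalar variance per image rather than dataset-wide statistics. A NumPy sketch of both computations follows; the function names are hypothetical and the random arrays only stand in for real smallNORB samples:

```python
import numpy as np

def per_image_moments(image):
    """Scalar mean and (population) variance over all pixels of one H x W x C image."""
    image = np.asarray(image, dtype=np.float64)
    return image.mean(), image.var()

def dataset_moments(images):
    """Mean and variance over every pixel of every image, accumulated in one pass."""
    total, total_sq, count = 0.0, 0.0, 0
    for image in images:
        image = np.asarray(image, dtype=np.float64)
        total += image.sum()
        total_sq += np.square(image).sum()
        count += image.size
    mean = total / count
    return mean, total_sq / count - mean ** 2

# Stand-in data: random 8-bit images, not real smallNORB samples.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 96, 96, 1))
mean, variance = dataset_moments(images)
std = np.sqrt(variance)
```

Dataset-wide values would have to be computed once over the training files in this way, since the line quoted above only ever sees one image at a time.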

Why do we need the coord_add function?

What happens if we remove the coord_add function from the code? Why do we need this function?
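
For context, the paper describes coordinate addition as follows: the transformation matrices are shared across positions, so the scaled (row, column) coordinate of each capsule's receptive-field centre is added to the first two elements of the right-hand column of its vote matrix, which lets position information survive the sharing. A NumPy sketch of that idea is below. The function name, the particular scaling, and the tensor layout are assumptions for illustration and do not reproduce the repository's coord_add code:

```python
import numpy as np

def add_scaled_coordinates(votes, height, width):
    """votes: [height, width, n_out, 4, 4] vote matrices, one grid cell per position."""
    votes = votes.copy()
    # Cell centres scaled into (0, 1); this particular scaling is an assumption.
    rows = (np.arange(height) + 0.5) / height
    cols = (np.arange(width) + 0.5) / width
    # Right-hand column of each 4x4 vote: element 0 gets the row, element 1 the column.
    votes[:, :, :, 0, 3] += rows[:, None, None]
    votes[:, :, :, 1, 3] += cols[None, :, None]
    return votes

votes = np.zeros((3, 3, 2, 4, 4))
shifted = add_scaled_coordinates(votes, 3, 3)
```

Without such an offset, two identical patches at different positions would cast identical votes, so the layer above could not tell where a part was seen.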

Test on smallNORB with parameters specified in the paper gives very bad results

Thanks for posting the code!

I tried your code on the smallNORB dataset with the parameters specified in the paper: A=64, B=8, C=D=16, routing iterations = 3, batch_size = 64 (I chose this number myself). But the result is very bad (it does not even converge). However, the paper reports an accuracy of 97.8%. I am wondering why it is so sensitive to the number of capsules in A and to the number of routing iterations.

Could the author also post more test results with different numbers of capsules, routing iterations, learning rates, batch sizes, etc.?

Thank you very much!

Why use "batch_squash = tf.divide(batch_x, 255.)" rather than just batch_x?

I tried replacing batch_squash with batch_x, and the reconstruction loss became very large.
Why can't we train with the original MNIST data?
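
One possible explanation, offered as an assumption rather than a diagnosis of this code: a squared reconstruction error grows with the square of the input range, so targets in [0, 255] give errors roughly 255² = 65025 times larger than the same targets scaled to [0, 1]. If the decoder's output is bounded to [0, 1] (also an assumption here), it could never match raw pixel values at all. A small pure-Python illustration with made-up pixel values:

```python
def mean_squared_error(target, prediction):
    """Average squared difference between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)

raw_pixels = [0, 64, 128, 255]           # made-up 8-bit values
raw_guess = [10, 60, 120, 250]           # an imperfect reconstruction

scaled_pixels = [p / 255.0 for p in raw_pixels]
scaled_guess = [p / 255.0 for p in raw_guess]

raw_error = mean_squared_error(raw_pixels, raw_guess)
scaled_error = mean_squared_error(scaled_pixels, scaled_guess)
ratio = raw_error / scaled_error         # close to 255 ** 2
```

The same relative mistake therefore contributes far more to the total loss on unscaled inputs, which can swamp the classification term.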

Training without reconstruction loss

The fully connected layers added on top of the capsule network consist of ~1.6m parameters, whereas the capsules only have roughly 60k trainable parameters in the small configuration. As the matrix capsules are supposed to generalize better with fewer parameters, as compared to traditional architectures, this approach seems counterintuitive to me.
However, removing the reconstruction loss and training with spread loss alone doesn't appear to converge (on smallNORB). Were you able to train your network with spread loss only (as suggested by the paper)?

Shared Weights in ConvCaps Layer

Nice implementation of the capsule network with EM routing! However, I do have a question about the function mat_transform() in the ConvCaps layer. In your implementation the weights are not shared over the patches of capsules. It is true that in the paper, the authors didn't mention whether the weights should be shared in the ConvCaps layer. But judging from the number of parameters that is reported in the paper, if the weights are not shared, there will be way too many parameters to be trained. Check this out:
This paragraph is on page 5 of the paper.
It would also be reasonable to use shared weights if one wants to build a deeper system, but I might be wrong about this.

batch from the data queue

The batch is fetched from the data queue only once. Can this cover all the data in the queue?

Initial value assigned to r.

In capsnet_em.py em_routing():

r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / 32)

should this be:

r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / caps_num_c)

as in paper:

∀i, c: Ric ← 1/size(L + 1)
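
A minimal NumPy sketch of the initialization the question proposes, where each input capsule spreads a total weight of 1 evenly over the capsules in layer L+1. The function name is hypothetical; the argument names follow the snippet above:

```python
import numpy as np

def init_routing(batch_size, caps_num_i, caps_num_c):
    """Uniform assignment: each input capsule spreads weight 1 evenly over the outputs."""
    return np.full((batch_size, caps_num_i, caps_num_c),
                   1.0 / caps_num_c, dtype=np.float32)

r = init_routing(batch_size=2, caps_num_i=72, caps_num_c=16)
row_sums = r.sum(axis=-1)   # each entry is 1 up to float rounding
```

A hard-coded divisor of 32 gives the same result only when caps_num_c happens to be 32; for any other layer size the rows would not sum to 1.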

ValueError: ('Convolution not supported for input with rank', 2)

Is it a problem with the weights?

get a different loss and accuracy curve

Hi, I am training the net with the default parameters, but I got loss and training accuracy curves that differ from those in the README file. Has anyone else seen this?

smallNORB all loss:
smallNORB training accuracy:
mnist all loss:
mnist training accuracy:

Softmax as logistic function

Hi,
I have two questions:

After training on MNIST, testing gives me an average accuracy of 0.3. Any ideas why?
Activations are updated by calculating the cost and applying a logistic function. Why is a softmax used instead of a sigmoid?
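
To make the second question concrete, here is a small illustration of how the two choices differ: a logistic (sigmoid) function scores each capsule independently, while a softmax makes the activations compete and sum to one. The input values are made up and the snippet says nothing about which one the repository should use:

```python
import math

def sigmoid(values):
    """Element-wise logistic function; outputs are independent of one another."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def softmax(values):
    """Normalised exponentials; outputs sum to 1, so they compete with one another."""
    shift = max(values)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 2.0, -1.0]                    # made-up values
independent = sigmoid(logits)                # two entries can both be close to 1
competing = softmax(logits)                  # the same two entries split the mass
```

With a sigmoid, several capsules can be strongly active at once; with a softmax, two equally supported capsules can each reach at most one half.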

A code question

Hi, thanks for your code. I don't understand what coord means; I hope you can explain it. Thank you!

Loss starting at 0.2

Hi @www0wwwjs1! I had a quick question -- I just set up this code and am running it, and on both MNIST and smallNORB the loss starts at <1 (usually around 0.3) and flatlines during training. I'm confused as to why the loss starts so low and doesn't change. When testing on smallNORB, the accuracy is quite low, around 0.45. Please let me know what you think, thank you!

README file has an issue

The README says: Open a chrome browser, visit the site: http://127.0.1.1:6006/
Should it be http://127.0.0.1:6006/?

CIFAR10 giving only 0.37 accuracy

I completed the code for cifar10 and made it run.
The model gets 0.37 accuracy, which is very disheartening. Do you think there is a problem with the code, or is the model itself bad?

Strange jump in training loss and accuracy

Hi. Has anyone seen this type of behavior during training? I just cloned the repository and ran it without changing anything.

EDIT: This is what it looks like when it has completely finished training.

Issue: Calculating the "Mean" and "Variance" in EM Routing

Thank you so much for contributing your excellent code! I have read it in detail and it is really inspiring!
However, there is a small part of the code in the EM routing section that I cannot understand:
It is the calculation of "miu" (mean) and "sigma_square" (variance) in the M-step of the function em_routing (lines 341-350, capsnet_em.py). The calculation differs from what I would expect from following the paper strictly:

      _(procedure M-STEP, line 3, Procedure 1 in the paper)_
       1. multiply R_ij by V_ij^h, then sum over dimension "i"
       2. sum R_ij over dimension "i"
       3. divide the first result by the second
       (I think I would implement that in a more naive way:
                        # v_in has shape [1250, 72, 16, 16]
                        # r has shape [1250, 72, 16]
                        r = tf.reshape(r, (1250, 72, 16, 1))
                        up = v_in * r
                        up = tf.reduce_sum(up, axis=1, keep_dims=True)
                        down = tf.reduce_sum(r, axis=1, keep_dims=True)
                        miu = up / down
         )

and I have trouble understanding the version in the code.
Could you please explain how you turned that into a matrix calculation?
Where can I learn how to do this kind of transformation?

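The same question, translated from Chinese: How should I understand the matrix operations in the M-step? The computation seems to differ somewhat from the paper, and I cannot see why it yields the mean and variance. Above I tried to write code that follows the paper's computation more closely (I am not sure it is right). What was your line of thinking for the matrix operations? Where can I find related material, especially on how to convert a computation into matrix operations? What is the advantage of implementing it this way? Thank you.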
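
The three steps listed in the question can be checked with a broadcasting-based NumPy sketch, extended to the variance in the same way. This follows the question's naive formulation, not the repository's lines 341-350; the function name and the small `eps` guard are assumptions added here:

```python
import numpy as np

def m_step_moments(votes, r, eps=1e-9):
    """votes: [batch, n_in, n_out, 16]; r: [batch, n_in, n_out] routing weights.

    Returns the weighted mean and variance over the input-capsule axis.
    eps is an assumed guard against division by zero, not part of the paper.
    """
    r = r[..., np.newaxis]                               # [batch, n_in, n_out, 1]
    r_sum = r.sum(axis=1, keepdims=True) + eps           # [batch, 1, n_out, 1]
    mu = (r * votes).sum(axis=1, keepdims=True) / r_sum  # [batch, 1, n_out, 16]
    sigma_sq = (r * np.square(votes - mu)).sum(axis=1, keepdims=True) / r_sum
    return mu, sigma_sq

rng = np.random.default_rng(0)
votes = rng.normal(size=(4, 72, 16, 16))   # small stand-in for the [1250, 72, 16, 16] case
r = np.full((4, 72, 16), 1.0 / 16)         # uniform weights
mu, sigma_sq = m_step_moments(votes, r)
```

With uniform weights the result reduces to the ordinary mean and variance over the input capsules, which gives a quick sanity check for any vectorized rewrite of the same step.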

A suggestion on stable training

Hi, I am the author of a PyTorch implementation of Matrix Capsules.
I see that you are facing unstable training, which did not happen in my project.
I guess this may be due to my use of clipping, on both gradients and variables. For example, if you have a/b or log(b) in your computational graph, you may need to clamp b to be larger than a small number such as 0.01 for numerical stability.
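
A pure-Python sketch of the clamping idea described above. The helper names are hypothetical, the 0.01 threshold is the example value from this post, and the same pattern would be written with the framework's own clip operations in a real graph:

```python
import math

EPS = 0.01   # the threshold mentioned above; the right value is problem dependent

def safe_divide(a, b, eps=EPS):
    """a / b with the denominator clamped away from zero (b assumed non-negative)."""
    return a / max(b, eps)

def safe_log(b, eps=EPS):
    """log(b) with the argument clamped away from zero."""
    return math.log(max(b, eps))

def clip_gradient(grad, limit):
    """Clamp a single gradient value to [-limit, limit]."""
    return max(-limit, min(limit, grad))
```

In EM routing the natural candidates for such guards are the variance terms, which appear both in a denominator and inside a logarithm.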

Network no longer trains correctly

Both a colleague and I have tried to run your network, but it doesn't train and gets stuck on loss 0.36. Maybe one of the latest updates caused this?

Btw, thanks for sharing your code!

Back propagation within em_routing

Hello,
Thanks for releasing a great code base for capsule networks.
One quick query: are you not stopping the gradient during em_routing?

spread loss is not decreasing

I tried python3 train.py "mnist" with master version and the default parameters. But the spread loss is not decreasing, it jumps from 0.36 to 0.65 in 1200 iterations. However, the experiment version seems to work fine.

Formation of Pose matrix and then votes

Hi,
I would like to know
(1) the intuition behind the pose matrix: how is it formed for each capsule from the ReLU ofmaps?
(2) I am trying to evaluate the expensive operations in CapsNet: is the reshaping after each output necessary for the next stage?
Thanks,

SmallNORB Data Creation

I just realized that the paper says that image samples are first normalized, then cropped, and then they add random brightness and contrast. But in your data creation code (data/smallNORB.py), you have added the random brightness and contrast before cropping.

Please move lines 164 and 165 to after line 180. It should have some effect on the accuracy, I guess?

Thanks!

cifar10 parameters setting

@www0wwwjs1 Hi Prof. Suofei! I trained CapsNet on cifar-10, but the accuracy is 50% after 50 epochs. Could you please give me some advice about parameter settings?

Thank you very much!

eval.py

There is no file "eval.py" in the code files.

How to add a dataset to the Coord_add function?

I want to train this program with my own dataset, but I am having difficulty adding it to the get_coord_add function. Could someone please help me?

tf.matmul slows training and causes low GPU usage.

In capsule_em.py,

votes = tf.reshape(tf.matmul(output, w), [batch_size, caps_num_i, caps_num_c, 16])
When training with tf.matmul, GPU usage is very low, usually around 50% or sometimes less. Why does this happen?

How to add a dataset to the get_coord_add function

I want to train on the IMDB dataset.

Query on Randomness across multiple runs

Hi,
I see huge variation in convergence across multiple runs on the same dataset. Sometimes it converges in 3k iterations, while other times it runs for 100k iterations without converging. Do you know why?

Is r supposed to be the same between capsules in layer L?

Thank you very much for your work.
After printing r, I find that r is the same between capsules in layer L. Is that reasonable?

a problem on routing iteration

Hi,

If the number of routing iterations is greater than 1, the loss becomes nan. What can I change to fix this?

EM routing for convolutional capsules

Hi,

thanks for sharing your implementation. I read through your high-level description on https://openreview.net/forum?id=HJWLfGWRb , and I have a question about your implementation of the EM routing for convolutional capsule layers. I haven't looked deeply into your code yet, so I apologize if I'm wrong.

In particular, I think that you copy each patch / receptive field into a column matrix and then handle each receptive field separately, as in the fully connected case. But this disregards the fact that each input capsule appears in multiple receptive fields, and these influence each other. So if capsule I1 appears in the receptive fields of capsules O1 and O2, and the EM determines that O1 is a good match for I1, then it cannot be a good match for O2.

This cross-influencing of capsules is briefly mentioned in the paper:

For convolutional capsules, each capsule in layer L+1 sends feedback only to capsules within its receptive field in layer L. Therefore each convolutional instance of a capsule in layer L receives at most kernel_size x kernel_size feedback from each capsule type in layer L+1.

Am I correct that this means that one cannot treat each input patch in isolation but has to run a global EM pass, respecting the more complex connectivity?
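
The connectivity in the quoted passage can be illustrated in one dimension by counting how many output receptive fields cover each input position. This is only a sketch of the overlap structure under assumed settings (no padding, a hypothetical helper name), not an EM implementation:

```python
def receptive_field_counts(size, kernel, stride):
    """For a 1-D row of `size` input positions, count how many output
    receptive fields (width `kernel`, step `stride`, no padding) cover each one."""
    counts = [0] * size
    n_out = (size - kernel) // stride + 1
    for out in range(n_out):
        start = out * stride
        for pos in range(start, start + kernel):
            counts[pos] += 1
    return counts

counts = receptive_field_counts(size=6, kernel=3, stride=1)
```

Interior positions are covered by up to `kernel` fields in 1-D (kernel × kernel in 2-D), while border positions are covered by fewer, which is the "at most" in the quoted sentence. Any input covered by more than one field is shared between patches, which is the situation the question asks about.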