
matrix-capsules-em-tensorflow's People

Contributors

www0wwwjs1, yhyu13


matrix-capsules-em-tensorflow's Issues

Is there something to be done after running download.sh for the smallNORB data to be available?

I ran ./download.sh under ./data, and the directory smallNORB with the 4 files smallnorb-xxxx-{testing|training}-{cat|dat}.mat became available.
But when I tried to train the network on the smallNORB dataset with the command python train.py "smallNORB", the following error was returned.

$ python3 train.py "smallNORB"
Using TensorFlow backend.
2018-08-07 19:49:45,984 [5964] INFO     __main__: Using dataset: smallNORB
2018-08-07 19:49:45,990 [5964] CRITICAL root: Traceback (most recent call last):
  File "train.py", line 156, in <module>
    tf.app.run()
  File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 52, in main
    batch_x, batch_labels = create_inputs()
  File "/path/to/Matrix-Capsules-EM-Tensorflow/config.py", line 84, in <lambda>
    'smallNORB': lambda: create_inputs_norb(is_train, epochs),
  File "/path/to/Matrix-Capsules-EM-Tensorflow/utils.py", line 31, in create_inputs_norb
    image, label = norb.read_norb_tfrecord(chunk_files, epochs)
  File "/path/to/Matrix-Capsules-EM-Tensorflow/data/smallNORB.py", line 128, in read_norb_tfrecord
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=epochs)
  File "/home/user/python3.6/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 241, in string_input_producer
    raise ValueError(not_null_err)
ValueError: string_input_producer requires a non-null input tensor

Did I miss a step I should have done before training with smallNORB?
I think the 4 downloaded files need to be converted to TFRecord files.
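For reference, here is a minimal sketch of such a conversion, assuming the images and labels have already been decoded from the .mat files into numpy arrays (the helper name and shapes are illustrative; the repo's own converter in data/smallNORB.py may differ):

    import tensorflow as tf

    def write_tfrecord(images, labels, out_path):
        # Hypothetical helper: images is a uint8 numpy array [N, H, W] and
        # labels an int array [N], both already decoded from the .mat files.
        with tf.python_io.TFRecordWriter(out_path) as writer:
            for img, lbl in zip(images, labels):
                example = tf.train.Example(features=tf.train.Features(feature={
                    'image': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[img.tobytes()])),
                    'label': tf.train.Feature(
                        int64_list=tf.train.Int64List(value=[int(lbl)])),
                }))
                writer.write(example.SerializeToString())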

I'm a beginner with TensorFlow, so what I said may be beside the point...

Thanks.

Wrong Results

Could you please explain why the loss values are increasing after 1200 iterations?

2018-12-28 07:17:58,417 [19476] INFO main: 412 iteration finishs in 3.548040 second loss=0.288399
2018-12-28 08:17:33,187 [19476] INFO main: 1399 iteration finishs in 3.582372 second loss=0.510083

Dataset: MNIST
Thanks
Jalil

Shape of beta_v

In the code, the shape of beta_v is [caps_num_c, n_channels]; however, in the authors' response on OpenReview (https://openreview.net/forum?id=HJWLfGWRb), they state:

"beta_v and beta_a are per capsule type. Therefore, they are vectors for both convolutional capsules and final capsules. For example in terms of the notation in fig.1 beta_a and beta_v for convCaps1 are C dimensional vectors."

So should the shape of beta_v be just [caps_num_c]?
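For illustration, the two candidate shapes side by side (a sketch only; caps_num_c and n_channels stand in for the repo's configuration values):

    import tensorflow as tf

    caps_num_c, n_channels = 16, 16   # example sizes

    # Current code (per this issue): one beta_v entry per (capsule, channel).
    beta_v_matrix = tf.get_variable('beta_v_matrix',
                                    shape=[caps_num_c, n_channels])

    # Per the authors' OpenReview comment: one scalar per output capsule
    # type, broadcast over the pose dimensions when computing the cost.
    beta_v_vector = tf.get_variable('beta_v_vector', shape=[caps_num_c])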

The mean and std of the smallNORB dataset

I want to reproduce the initial normalization of smallNORB, but I could not work out the values of the mean and variance (std) used in data/smallNORB.py, line 177:

    mean, variance = tf.nn.moments(image, [0, 1, 2])

I am sure both the mean and the variance are scalars.
Is there some way to get the mean and variance (std) of the whole smallNORB dataset?

Thanks so much if you could help me!
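For what it's worth, a simple sketch for dataset-wide statistics, assuming the training images have been dumped to a numpy array first (the file name is hypothetical):

    import numpy as np

    images = np.load('smallNORB_train_images.npy')  # hypothetical dump, uint8 [N, H, W]
    images = images.astype(np.float64) / 255.0
    mean = images.mean()   # scalar mean over every pixel of every image
    std = images.std()     # scalar standard deviation
    print('mean=%.6f std=%.6f' % (mean, std))

Note that tf.nn.moments(image, [0, 1, 2]) on line 177 computes per-image statistics inside the graph, which is a different quantity from fixed dataset-wide constants.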

Test on smallNORB with the parameters specified in the paper gives very bad results

Thanks for posting the code!

I tried your code on the smallNORB dataset with the parameters specified in the paper: A=64, B=8, C=D=16, routing iterations = 3, batch_size = 64 (a number I set myself). But the result is very bad (it cannot even converge), whereas the paper reports an accuracy of 97.8%. I am wondering why it is so sensitive to the number of capsules in A and to the number of routing iterations.

Could the author also post more test results with different numbers of capsules, routing iterations, learning rates, batch sizes, etc.?

Thank you very much!

Training without reconstruction loss

The fully connected layers added on top of the capsule network contain ~1.6M parameters, whereas the capsules themselves have only roughly 60k trainable parameters in the small configuration. As matrix capsules are supposed to generalize better with fewer parameters than traditional architectures, this approach seems counterintuitive to me.
However, removing the reconstruction loss and training with the spread loss alone doesn't appear to converge (on smallNORB). Were you able to train your network with the spread loss only (as suggested by the paper)?
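For reference, a sketch of the spread loss alone, as I read it from the paper (names and shapes are illustrative, not the repo's exact code):

    import tensorflow as tf

    def spread_loss(activations, labels, margin):
        # activations: [N, num_classes] output-capsule activations
        # labels:      [N] integer class ids
        # margin:      scalar m, annealed from 0.2 up to 0.9 in the paper
        num_classes = activations.get_shape().as_list()[-1]
        mask_t = tf.one_hot(labels, num_classes)            # 1 at the target class
        a_t = tf.reduce_sum(activations * mask_t,
                            axis=1, keep_dims=True)         # [N, 1]
        loss = tf.square(tf.maximum(0., margin - (a_t - activations)))
        loss = tf.reduce_sum(loss * (1. - mask_t), axis=1)  # drop the i = t term
        return tf.reduce_mean(loss)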

Shared Weights in ConvCaps Layer

Nice implementation of the capsule network with EM routing! However, I do have a question about the function mat_transform() in the ConvCaps layer. In your implementation, the weights are not shared over the patches of capsules. It is true that the paper doesn't say whether the weights should be shared in the ConvCaps layer, but judging from the number of parameters reported in the paper, there would be far too many parameters to train if the weights were not shared. Check this out:
[screenshot of the parameter-count paragraph on page 5 of the paper]
It would also be reasonable to use shared weights if one wants to build a deeper system, but I might be wrong about this.
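To make the difference concrete, a rough sketch of the two weight layouts (shapes are illustrative, not the repo's exact variables):

    import tensorflow as tf

    kh, kw, B, C = 3, 3, 8, 16   # example kernel size and capsule counts
    num_patches = 6 * 6          # example number of spatial positions

    # Unshared: a separate transform per patch, so the parameter count
    # scales with the number of spatial positions.
    w_unshared = tf.get_variable(
        'w_unshared', shape=[num_patches, kh * kw * B, C, 4, 4])

    # Shared across patches (what the paper's parameter counts suggest):
    # one [kh*kw*B, C, 4, 4] tensor reused at every spatial position.
    w_shared = tf.get_variable('w_shared', shape=[kh * kw * B, C, 4, 4])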

Batch from the data queue

The batch is fetched from the data queue only once. Can this cover all the data in the queue?

Initial value assigned to r.

In capsnet_em.py em_routing():

r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / 32)

should this be:

r = tf.constant(np.ones([batch_size, caps_num_i, caps_num_c], dtype=np.float32) / caps_num_c)

as in the paper:

∀i, c: R_ic ← 1 / |Ω_{L+1}| (i.e. one over the number of output capsules in layer L+1)

get a different loss and accuracy curve

Hi, I am training the net with the default parameters, but I got loss and training-accuracy curves that differ from those in the README. Has anyone met this kind of thing?

  • smallNORB all loss: [plot sn_all_loss]
  • smallNORB training accuracy: [plot sn_train_acc]
  • mnist all loss: [plot mn_all_loss2]
  • mnist training accuracy: [plot mn_train_acc2]

Softmax as logistic function

Hi,
I have two questions:

  1. After training on mnist, testing gets me an average accuracy of 0.3; any idea why?
  2. Activations are updated by calculating the cost and applying a logistic function; why is a softmax used instead of a sigmoid? (See the sketch after this list.)
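For context on question 2, the paper's M-step updates the activation with a logistic function; a sketch (names and shapes are illustrative):

    import tensorflow as tf

    def update_activations(cost_h, beta_a, inverse_temperature):
        # cost_h: [N, caps_num_c, 16] per-dimension cost; beta_a: [caps_num_c]
        cost_sum = tf.reduce_sum(cost_h, axis=2)            # [N, caps_num_c]
        return tf.sigmoid(inverse_temperature * (beta_a - cost_sum))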

A code question

Hi, thanks for your code first. I don't know the meaning of coord; I hope you can give some explanation of it. Thank you.

Loss starting at 0.2

Hi @www0wwwjs1! I had a quick question: I just set up this code and am running it, and on both MNIST and smallNORB the loss starts at <1 (usually around 0.3) and flatlines during training. I'm confused as to why the loss starts so low and doesn't change. When testing on smallNORB, the accuracy is quite low, around 0.45. Please let me know what you think, thank you!

CIFAR10 giving only 0.37 accuracy

I completed the code for CIFAR10 and got it running.
The model gets 0.37 accuracy, which is very disheartening. Do you think there is a problem with the code, or is the model itself bad?

Strange jump in training loss and accuracy

Hi. Has anyone had this type of behavior during training? I have just cloned and run the code without changing anything.

[training-curve screenshots: capture1, capture]

EDIT: This is what it looks like when it has completely finished training.

[final training-curve screenshot: unreal3_96x96_1]

Issue: Calculating the "Mean" and "Variance" in EM Routing

Thank you so much for contributing your excellent code! I have read it in detail and it is really inspiring!
However, there is a small part of the code that I cannot understand in the EM routing section:
it is the calculation of "miu" (mean) and "sigma_square" (variance) in the M-step of the function em_routing (lines 341 to 350, capsnet_em.py). The calculation procedure is different from what I would naively write, which strictly follows the paper:

      _(procedure M-STEP, line 3, Procedure 1 in the paper)_
       1. multiply R_ij and V_ij^h, then sum over dimension "i"
       2. sum R_ij over dimension "i"
       3. divide the two
       (I think I would implement that in a more naive way:
                        # v_in: votes, shape [1250, 72, 16, 16]
                        # r: routing assignments, shape [1250, 72, 16]
                        r = tf.reshape(r, (1250, 72, 16, 1))
                        up = tf.reduce_sum(v_in * r, axis=1, keep_dims=True)   # weighted vote sum
                        down = tf.reduce_sum(r, axis=1, keep_dims=True)        # sum of assignments
                        miu = up / down                                        # weighted mean
       )

and I have trouble understanding that!
Could you please explain how you turned it into a matrix calculation procedure?
How (and where) can I learn about doing this kind of transformation?

(The same question in Chinese, translated:) I would like to ask how to understand the matrix operations in this M-step part; the computation seems a little different from the original paper, and I can't see why it computes the mean and variance. Above I tried to write code that follows the original paper's computation more closely (I'm not confident in it). What was your line of thinking for the matrix formulation? Where can I find related material, especially on turning a computation into matrix operations? And what is the advantage of implementing it this way? Thank you.
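For comparison, a compact sketch of the M-step mean and variance in the broadcast/matrix form (shapes as in the snippet above; this is my reading, not necessarily the repo's exact code):

    import tensorflow as tf

    def m_step_statistics(votes, r):
        # votes: [N, caps_num_i, caps_num_c, 16]; r: [N, caps_num_i, caps_num_c]
        r = tf.expand_dims(r, axis=-1)                       # [N, I, C, 1]
        r_sum = tf.reduce_sum(r, axis=1, keep_dims=True)     # [N, 1, C, 1]
        miu = tf.reduce_sum(r * votes, axis=1, keep_dims=True) / r_sum
        sigma_square = tf.reduce_sum(
            r * tf.square(votes - miu), axis=1, keep_dims=True) / r_sum
        return miu, sigma_square                             # each [N, 1, C, 16]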

A suggestion on stable training

Hi, I am the author of a PyTorch implementation of Matrix Capsules.
I see that you are facing unstable training, which did not happen in my project.
I guess it may be due to my use of clipping, both on gradients and on variables. For example, if you have a/b or log(b) in your computational graph, you may need to clamp b to be bigger than a small number like 0.01 for numerical stability.
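A minimal sketch of both kinds of clipping in TensorFlow (the tensors a, b and the loss are illustrative placeholders, not code from either repo):

    import tensorflow as tf

    eps = 1e-2   # a small constant like the 0.01 suggested above

    a = tf.constant([1.0, 2.0])      # illustrative tensors only
    b = tf.Variable([0.001, 0.5])    # could get tiny -> unstable without clamping

    # Clamp denominators and log arguments before use:
    safe_b = tf.maximum(b, eps)
    loss = tf.reduce_mean(a / safe_b + tf.log(safe_b))

    # Clip gradients globally before applying them:
    optimizer = tf.train.AdamOptimizer(1e-3)
    grads, variables = zip(*optimizer.compute_gradients(loss))
    grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
    train_op = optimizer.apply_gradients(zip(grads, variables))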

Network no longer trains correctly

Both a colleague and I have tried to run your network, but it doesn't train and gets stuck at a loss of 0.36. Maybe one of the latest updates caused this?

Btw, thanks for sharing your code!

Spread loss is not decreasing

I tried python3 train.py "mnist" with the master version and the default parameters, but the spread loss is not decreasing: it jumps from 0.36 to 0.65 within 1200 iterations. However, the experiment version seems to work fine.

Formation of Pose matrix and then votes

Hi,
I would like to know:
(1) What is the intuition behind the pose matrix, and how is it formed for each capsule from the ReLU output feature maps (ofmaps)?
(2) I am trying to evaluate the expensive operations in CapsNet: is the reshaping after each output necessary for the next stage?
Thanks,

SmallNORB Data Creation

I just realized that the paper says the image samples are first normalized, then cropped, and only then is random brightness and contrast added. But in your data-creation code (data/smallNORB.py), you add the random brightness and contrast before cropping.

Please move lines 164 and 165 to after line 180. I guess this should have some effect on the accuracy?

Thanks!
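For clarity, a sketch of the order the paper describes (parameter values are illustrative; `image` stands in for one decoded sample):

    import tensorflow as tf

    image = tf.placeholder(tf.float32, [96, 96, 1])   # one decoded sample

    # Paper's order: normalize, then crop, then random brightness/contrast.
    image = tf.image.per_image_standardization(image)
    image = tf.random_crop(image, [32, 32, 1])
    image = tf.image.random_brightness(image, max_delta=32. / 255.)
    image = tf.image.random_contrast(image, lower=0.5, upper=1.5)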

CIFAR10 parameter settings

@www0wwwjs1 Hi Prof. Suofei! I trained CapsNet on CIFAR-10, but the accuracy is only 50% after 50 epochs. Could you please give me some advice about the parameter settings?

Thank you very much!

eval.py

There is no file "eval.py" in the code files.

tf.matmul slows training and causes low GPU usage.

In capsule_em.py,

votes = tf.reshape(tf.matmul(output, w), [batch_size, caps_num_i, caps_num_c, 16])

When training with tf.matmul, GPU usage is very low, usually around 50% or less. Why does this happen?
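One thing that sometimes helps in cases like this (a hypothetical sketch with assumed shapes, not the repo's actual tensors) is fusing the many small matmuls into a single contraction:

    import tensorflow as tf

    batch_size, caps_num_i, caps_num_c = 64, 72, 16   # example sizes

    pose = tf.placeholder(tf.float32, [batch_size, caps_num_i, 4, 4])
    w = tf.placeholder(tf.float32, [caps_num_i, caps_num_c, 4, 4])

    # votes[n, i, c] = pose[n, i] @ w[i, c], computed as one fused einsum
    # instead of many small matmuls, which can raise GPU occupancy.
    votes = tf.einsum('niab,icbd->nicad', pose, w)
    votes = tf.reshape(votes, [batch_size, caps_num_i, caps_num_c, 16])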

Query on Randomness across multiple runs

Hi,
I see huge variation in convergence across multiple runs on the same dataset. Sometimes it converges within 3k iterations, while other times it runs for 100k iterations without converging. Do you know why this happens?
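One way to separate bad luck from genuine instability is to fix the seeds so that initialization and shuffling are repeatable across runs (a minimal sketch):

    import numpy as np
    import tensorflow as tf

    np.random.seed(1234)        # numpy-side randomness (e.g. shuffling)
    tf.set_random_seed(1234)    # graph-level seed for variable initializers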

EM routing for convolutional capsules

Hi,

Thanks for sharing your implementation. I read through your high-level description at https://openreview.net/forum?id=HJWLfGWRb, and I have a question about your implementation of EM routing for convolutional capsule layers. I haven't looked deeply into your code yet, so I apologize if I'm wrong.

In particular, I think that you clone each patch / receptive field into a column matrix and then handle each receptive field separately, as in the fully connected case. But this disregards the fact that each input capsule appears in multiple receptive fields, and these influence each other. So if capsule I1 appears in the receptive fields of capsules O1 and O2, and the EM determines that O1 is a good match for I1, then it cannot be a good match for O2.

This cross-influencing of capsules is briefly mentioned in the paper:

For convolutional capsules, each capsule in layer L+1 sends feedback only to capsules within its receptive field in layer L. Therefore each convolutional instance of a capsule in layer L receives at most kernel_size x kernel_size feedback from each capsule type in layer L+1.

Am I correct that this means that one cannot treat each input patch in isolation but has to run a global EM pass, respecting the more complex connectivity?
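To illustrate the cloning being described, a small sketch with assumed shapes (not the repo's actual code):

    import tensorflow as tf

    # Assumed input: [N, H, W, B] activations for B input capsule types.
    x = tf.placeholder(tf.float32, [None, 12, 12, 8])
    patches = tf.extract_image_patches(
        x, ksizes=[1, 3, 3, 1], strides=[1, 2, 2, 1],
        rates=[1, 1, 1, 1], padding='VALID')
    # patches: [N, 5, 5, 3*3*8]; one input capsule can appear in several of
    # the 5x5 output positions, and per-patch EM treats those copies
    # independently rather than letting them constrain each other.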
