dynamic routing #1

Closed
InnerPeace-Wu opened this issue Nov 1, 2017 · 4 comments

@InnerPeace-Wu

Thanks for the amazing work you've done. I adapted dynamic routing from your code and want to share some of my ideas about it. Here is my repo with a TensorFlow implementation.

Bias updating
You mentioned that you fix the bias to 0, but during dynamic routing you are updating it, is that so? Code: here and here.
In my opinion, the bias should not be updated, since it is just the initial value for dynamic routing. With your implementation, the bias gets updated every time you feed in data, even with the Variable set to trainable=False, and of course the same goes for the testing procedure. I think the easiest fix is to make a temporary variable with temp_bias = bias and use that for dynamic routing.

Bias summing
Code here: it seems you are trying to keep the shape of the bias as [num_caps, 10], and you sum over all the training examples. I think that is problematic. The paper says the bias is independent of the image, but during routing the capsule predictions from the layer below vary per image, so the updated bias should differ too. After the bias is updated, its shape should be [batch_size, num_caps, 10].
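To make both points concrete, here is a minimal sketch of what I have in mind (TensorFlow-2-style; the names, shapes, and the squash helper are illustrative, not the exact code from either repo): the routing logits live in a temporary tensor with a batch dimension, re-created as zeros on every forward pass, so nothing persists across batches and each image is routed independently.

```python
import tensorflow as tf

def squash(s, axis=-1):
    # standard squashing non-linearity from the paper
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    return sq_norm / (1. + sq_norm) * s / tf.sqrt(sq_norm + 1e-8)

def dynamic_routing(u_hat, num_routing=3):
    # u_hat: predictions from the layer below,
    #        shape [batch_size, input_num_caps, num_caps, caps_dim]
    shape = tf.shape(u_hat)

    # temporary logits, NOT a persistent tf.Variable: re-created as zeros on
    # every forward pass, with one set of logits per image in the batch
    b = tf.zeros([shape[0], shape[1], shape[2], 1])

    for _ in range(num_routing):
        c = tf.nn.softmax(b, axis=2)                           # coupling coefficients per image
        s = tf.reduce_sum(c * u_hat, axis=1, keepdims=True)    # sum over input capsules only
        v = squash(s)                                          # [batch_size, 1, num_caps, caps_dim]
        b += tf.reduce_sum(u_hat * v, axis=-1, keepdims=True)  # agreement, keeps the batch dim
    return tf.squeeze(v, axis=1)                               # [batch_size, num_caps, caps_dim]
```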

I tried 3 iterations of dynamic routing, and after fewer than 4 epochs (2k iterations) the validation accuracy is 99.16%, so it seems to work, though still not as efficient as the paper suggests.
But I have a big problem: training is slow, at almost 2 s per iteration with batch_size 100 on an Nvidia 1060, which is way more than yours.

Just some of my ideas; glad to discuss with you.
Best.

@iwasaki-kenta

I agree with what you say about bias updates, though I am not so sure about the bias summation.

As you said, the bias is independent of the image. If we keep it independent of the image, then where an image sits in the batch shouldn't affect the route it takes.

If the bias has the shape [batch_size, num_caps, 10], then where the image is routed depends on its batch index.

The reason I believe [num_caps, 10] is correct is that the image after convolution could simply be global average/max-pooled, so no image-specific features relate to the bias during the dynamic routing process.

The paper itself states that, in MNIST's case, 10 is the number of capsules in DigitCaps, and 32 * 6 * 6 represents 6 * 6 capsules with 32 channels of these 6 * 6 capsules. Hence, the architecture itself treats the image space as a set of capsules.

@InnerPeace-Wu commented Nov 1, 2017

Discussion clears things up. Thanks for sharing your ideas, @iwasaki-kenta.
My point is that I make the log prior the same for every image before dynamic routing. Referring to the routing algorithm:
[figure: the dynamic routing algorithm (Procedure 1) from the paper]
Before routing, $b_{ij}$ is initially set to 0, but the prediction $\hat{u}_{j|i}$ varies for every image. So in my opinion, during routing the bias differs across images, because their capsule predictions differ. But the initial $b_{ij}$ is still the same for every image; this is why we should not update THE bias itself during routing, but only use the VALUE of the bias to build up agreement with the layer above.

Imagine that you first feed in an image of a 5 and get the digit capsules, and then you feed in an image of an 8. Shouldn't the bias during routing differ between the two?

In short, I think the initial log prior and the bias during dynamic routing are not the same thing. The latter just uses the VALUE of the initial log prior as its starting point for routing.
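A sketch of what I mean (illustrative shapes and names, not code from either repo): the shared prior has no batch dimension, but before routing its value is broadcast to every image, and those per-image copies are what get updated.

```python
import tensorflow as tf

batch_size, input_num_caps, num_caps = 100, 32 * 6 * 6, 10   # illustrative sizes

# the initial log prior is shared across images (it could also be a learned
# Variable); note it has no batch dimension
log_prior = tf.zeros([input_num_caps, num_caps, 1])

# before routing, every image starts from the same prior VALUE
b = tf.tile(log_prior[None, ...], [batch_size, 1, 1, 1])   # [batch_size, input_num_caps, num_caps, 1]

# during routing, b is updated per image (each image's predictions u_hat
# differ), but nothing is ever written back into log_prior itself
```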

Best,

@iwasaki-kenta commented Nov 1, 2017

Alright, I completely understand and agree with what you mean :).

As we go through the r routing iterations, each image in a batch goes through the routing process. The initial log priors representing the biases have no relation to the images being routed, but we should keep and track the batch dimension rather than summing over it. Otherwise, we lose information about where each image is being routed by summing/averaging their posteriors.
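For instance (illustrative shapes, not the exact variable names), the agreement update should reduce only over the capsule-dimension axis and keep the batch axis:

```python
import tensorflow as tf

u_hat = tf.random.normal([100, 1152, 10, 16])   # per-image predictions from the layer below
v = tf.random.normal([100, 1, 10, 16])          # per-image output capsules

# keep the batch dimension: one set of routing logits per image
agreement = tf.reduce_sum(u_hat * v, axis=-1, keepdims=True)   # shape [100, 1152, 10, 1]

# reducing over axis 0 as well would collapse every image's route into a
# single shared posterior and lose the per-image information
```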

I reflected the change in a version of the model I'm working on right now. Thanks a lot for the clarification.

@XifengGuo (Owner)

Thanks for this discussion; I have fixed this issue. Please check the newest commit. @InnerPeace-Wu @iwasaki-kenta
