dynamic routing #1
I agree with what you say about bias updates, though I am not so sure about bias summation. As you said, the bias is independent of the image. If we keep it independent of the image, then an image's position within a batch shouldn't affect the route it takes. If the bias is in the form of … The reason I believe … The paper itself states that …
Discussion clears things up. Thanks for sharing your idea @iwasaki-kenta. Imagine that you first feed in an image of the digit 5 and get the digit caps, and then feed in an image of an 8. Did the bias during routing differ between the two? In a word, I think the best …
Alright, I completely understand and agree with what you mean :). As we went through the discussion, I reflected the change in a version of the model I'm working on right now. Thanks a lot for the clarification.
Thanks for this discussion; I have fixed this issue. Please check the newest commit. @InnerPeace-Wu @iwasaki-kenta
Thanks for the amazing work you've done. Since I adapted dynamic routing from your code, I want to share some of my ideas about it. Here is my repo with TensorFlow.
Bias updating
You mentioned that you fix the bias to 0, but during dynamic routing you are updating it, is that so? Code: here and here. In my opinion, the bias should not be updated, since it's just the initial value for dynamic routing. With your implementation, you update the bias every time you feed in data, even with the Variable set to `trainable=False`, and of course the same goes for the testing procedure. I think the easiest way is to make a temporary variable with `temp_bias = bias` and use it for dynamic routing.
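To make the "temporary, not persistent" point concrete, here is a minimal NumPy sketch of dynamic routing (illustrative only, not the actual TensorFlow code from either repo): the routing logits start at zero on every forward pass, so no trained variable is ever modified during routing.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1, eps=1e-8):
    # Shrinks vector length into [0, 1) while preserving direction.
    norm2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: [batch, num_caps_in, num_caps_out, dim_out] predictions.
    The logits b are a fresh per-call temporary (the `temp_bias` idea),
    never a persistent Variable that carries state between batches."""
    batch, n_in, n_out, _ = u_hat.shape
    b = np.zeros((batch, n_in, n_out))          # reset to 0 every forward pass
    for _ in range(num_iters):
        c = softmax(b, axis=2)                  # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=1)  # [batch, n_out, dim_out]
        v = squash(s)
        # Agreement update; note b keeps its batch axis.
        b = b + (u_hat * v[:, None, :, :]).sum(axis=-1)
    return v
```

Because `b` is a local array, training and testing batches cannot leak routing state into each other, which is the failure mode described above.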
Bias summing
Code here: it seems you are trying to keep the shape of the bias as `[num_caps, 10]`, summing over all the training examples. I think that's problematic. The paper mentions that the bias is independent of the image, but during routing the capsule predictions from the layer below vary between images, so the updated bias should differ too. After the bias update, the shape of the bias should be `[batch_size, num_caps, 10]`.
I tried 3 iterations of dynamic routing; after fewer than 4 epochs (2k iterations) the validation accuracy was 99.16%, so it seems to work, though still not as efficient as the paper claims.
But I have a big problem: training is slow, almost 2 s per iteration with batch_size 100 on an Nvidia 1060, which is much slower than yours.
Just some of my ideas; glad to discuss with you.
Best.