Question about the shape of tform #7

Closed
andrewliao11 opened this issue Nov 8, 2017 · 1 comment

Comments

@andrewliao11

Hi @tinghuiz
Thanks for making the code open source. I wonder if you could elaborate on how you encode the pose data from the KITTI dataset? The original pose in KITTI is a 12-D vector, while in your code I found that the dimension is 1,6,224,224.

Could you please elaborate on your encoding method?

@tinghuiz
Owner

tinghuiz commented Nov 8, 2017

For the paper, I fed the 12-D difference vector through two FC layers (12 -> 128 -> 256) and concatenated the output with the image features (4096) to form the input to the flow decoder pathway.
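For readers who want to see the shapes involved, here is a minimal NumPy sketch of that paper-time encoding. Only the layer sizes (12 -> 128 -> 256) and the 4096-D image feature come from the comment above; the ReLU activations, the random placeholder weights, and all variable names are assumptions made for illustration, not the released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_relu(x, weight, bias):
    """Fully connected layer followed by ReLU (the activation is an assumption)."""
    return np.maximum(x @ weight + bias, 0.0)

# Placeholder weights; in a real model these would be learned.
w1, b1 = rng.standard_normal((12, 128)) * 0.01, np.zeros(128)
w2, b2 = rng.standard_normal((128, 256)) * 0.01, np.zeros(256)

pose_diff = rng.standard_normal((1, 12))         # 12-D pose difference vector
image_features = rng.standard_normal((1, 4096))  # stand-in for the image encoder output

pose_features = fc_relu(fc_relu(pose_diff, w1, b1), w2, b2)  # shape (1, 256)

# Concatenate pose and image features to form the flow decoder input.
decoder_input = np.concatenate([image_features, pose_features], axis=1)
print(decoder_input.shape)
```

The decoder input here is a flat vector of 4096 + 256 values, which is why this variant needs FC layers on the pose branch.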

After the submission, I found that it actually performs better to use an Euler angles + 3D translation pose representation (6 numbers) and concatenate it along the color channels of the input image (spatially replicated at every pixel) as the input to the network. This way there is no need for FC layers, and the network can be fully convolutional.
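The 1,6,224,224 shape from the question matches this second encoding: 6 pose numbers, each replicated over a 224x224 grid. A rough NumPy sketch follows. The spatial replication and channel concatenation come from the comment above; the Euler convention (R = Rz(yaw) Ry(pitch) Rx(roll)), the ordering of the 6 numbers, the row-major 3x4 [R|t] layout of the 12-D vector, and the function names are all assumptions, since the thread does not specify them.

```python
import numpy as np

def pose12_to_euler_translation(pose12):
    """Convert a 12-D pose (assumed row-major 3x4 [R|t]) to 6 numbers.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll); the convention actually used
    in the repository may differ.
    """
    T = np.asarray(pose12, dtype=np.float32).reshape(3, 4)
    R, t = T[:, :3], T[:, 3]
    roll = np.arctan2(R[2, 1], R[2, 2])
    pitch = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.array([roll, pitch, yaw, t[0], t[1], t[2]], dtype=np.float32)

def replicate_pose(pose6, height, width):
    """Tile a 6-D pose into a (1, 6, H, W) tensor, one constant plane per number."""
    planes = np.asarray(pose6, dtype=np.float32).reshape(1, 6, 1, 1)
    return np.broadcast_to(planes, (1, 6, height, width)).copy()

# Example: identity rotation with translation (1, 2, 3).
pose12 = [1, 0, 0, 1,
          0, 1, 0, 2,
          0, 0, 1, 3]
pose6 = pose12_to_euler_translation(pose12)
tform = replicate_pose(pose6, 224, 224)              # shape (1, 6, 224, 224)

image = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder RGB input
net_input = np.concatenate([image, tform], axis=1)    # shape (1, 9, 224, 224)
```

Because every pixel carries the same 6 pose values, the pose can be consumed by ordinary convolutions alongside the RGB channels.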

@tinghuiz tinghuiz closed this as completed Nov 8, 2017