Question about the shape of `tform` #7

andrewliao11 · 2017-11-08T04:00:48Z

Hi @tinghuiz
Thanks for making the code open source. I wonder if you could elaborate on how you encode the pose data in the KITTI dataset? The original pose data in KITTI is a 12-D vector, while in your code I found that the dimension is 1,6,224,224.

Could you please elaborate on your encoding method?

tinghuiz · 2017-11-08T05:28:32Z

For the paper, I fed the 12-D difference vector through two FC layers (12 -> 128 -> 256) and concatenated the output with the image features (4096-D) to form the input to the flow decoder pathway.
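To make the shapes in the paper setup concrete, here is a minimal NumPy sketch of that pathway. It is not the repository's code: the weights are random placeholders, the ReLU activation is an assumption (the comment only gives the layer sizes), and the names `fc`, `pose_diff`, `image_feat`, and `decoder_input` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """Fully connected layer followed by ReLU (the activation is an assumption)."""
    return np.maximum(x @ w + b, 0.0)

# Random placeholder weights for the two FC layers: 12 -> 128 -> 256.
w1, b1 = rng.standard_normal((12, 128)) * 0.01, np.zeros(128)
w2, b2 = rng.standard_normal((128, 256)) * 0.01, np.zeros(256)

pose_diff = rng.standard_normal((1, 12))     # stand-in for the 12-D pose difference vector
image_feat = rng.standard_normal((1, 4096))  # stand-in for the 4096-D image features

pose_feat = fc(fc(pose_diff, w1, b1), w2, b2)                    # shape (1, 256)
decoder_input = np.concatenate([image_feat, pose_feat], axis=1)  # shape (1, 4352)
print(decoder_input.shape)
```

Under these assumptions the flow decoder would receive a 4096 + 256 = 4352-D vector per sample.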

After the submission, I found that it actually performs better to use an Euler angles + 3D translation pose representation (6 numbers) and concatenate it along the color channels of the input image (spatially replicated at every pixel) as the input to the network. This way there is no need for FC layers, and the network can be fully convolutional.
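The 1,6,224,224 shape asked about above follows from this spatial replication. Below is a rough NumPy sketch, assuming the relative pose is available as a 3x4 `[R | t]` matrix and using a ZYX Euler angle convention; both of those, along with the helper names `rotation_to_euler` and `make_tform` and the ordering of the 6 numbers, are assumptions for illustration and may differ from the repository.

```python
import numpy as np

def rotation_to_euler(R):
    """Convert a 3x3 rotation matrix to (x, y, z) Euler angles.

    Uses the ZYX convention, which is an assumption; the thread does not
    say which convention the repository uses.
    """
    sy = np.hypot(R[0, 0], R[1, 0])
    if sy > 1e-6:
        x = np.arctan2(R[2, 1], R[2, 2])
        y = np.arctan2(-R[2, 0], sy)
        z = np.arctan2(R[1, 0], R[0, 0])
    else:  # near gimbal lock: z is not observable, so fix it to 0
        x = np.arctan2(-R[1, 2], R[1, 1])
        y = np.arctan2(-R[2, 0], sy)
        z = 0.0
    return np.array([x, y, z])

def make_tform(pose_3x4, height=224, width=224):
    """Build a (1, 6, H, W) tensor by copying the 6-D pose to every pixel."""
    R, t = pose_3x4[:, :3], pose_3x4[:, 3]
    pose6 = np.concatenate([rotation_to_euler(R), t]).astype(np.float32)
    return np.broadcast_to(pose6[None, :, None, None], (1, 6, height, width)).copy()

# Example relative pose: no rotation, translation of (0.5, 0.0, 1.0).
pose = np.hstack([np.eye(3), np.array([[0.5], [0.0], [1.0]])])
tform = make_tform(pose)

image = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder RGB input
net_input = np.concatenate([image, tform], axis=1)    # pose appended along channels
print(tform.shape, net_input.shape)
```

With a single RGB input image, the concatenated tensor has 3 + 6 = 9 channels, and every spatial location carries the same 6 pose values, so ordinary convolutions can consume the pose without any FC layer.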

tinghuiz closed this as completed Nov 8, 2017
