
Use Linemod dataset #14

Closed
Trulli99 opened this issue Feb 15, 2023 · 9 comments
@Trulli99

Hi,

Do you think it would be possible to run shapo with the Linemod dataset if I follow the tips from some of the issues related to using custom datasets?

Thank you!

@zubair-irshad
Owner

Yes, of course. Unfortunately we don't have a script that prepares the Linemod dataset, but if it is very similar to NOCS and provides all the relevant data, you could follow the step-by-step instructions mentioned in thread #11 and thread #13 to train with Linemod.

@Trulli99
Author

Do you know how I can generate these images?

0000_coord

@zubair-irshad
Owner

These are object NOCS images; you can find more information in hughw19/NOCS_CVPR2019#62, or use BlenderProc.

But please note that you don't need these NOCS maps for training our repo i.e. shapo, if you already have the GT 6D poses and sizes of the rendered images (which I think Linemod already provides). Please see my answer here #11 (comment). You would have to save the relevant pose, image and depth data as datapoints here to train shapo. This information could be retrieved in any form, i.e. 6D pose and size estimated from GT NOCS, or just the GT 6D poses if available!
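
For illustration, here is a minimal sketch (not the repo's exact format) of how one might pack a Linemod frame's GT pose, image and depth into a per-frame datapoint. The dictionary keys and the scale convention (bounding-box diagonal of the canonical model) are assumptions for illustration only, so check the repo's data-generation code and thread #11 for the exact format shapo expects.

```python
# Hypothetical sketch: pack one Linemod frame into a pickle datapoint.
# Keys and scale convention are assumptions, not the repo's real schema.
import pickle
import numpy as np
import cv2

def save_linemod_datapoint(rgb_path, depth_path, R, t, model_points, out_path):
    """R: (3,3) GT rotation, t: (3,) GT translation, model_points: (N,3) object model."""
    image = cv2.imread(rgb_path)                           # HxWx3 BGR image
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)   # HxW depth map

    # One common convention (as in NOCS) is to use the 3D extent of the
    # canonical model as the size, and its diagonal length as the scale s.
    extents = model_points.max(axis=0) - model_points.min(axis=0)
    s = float(np.linalg.norm(extents))

    RT = np.eye(4, dtype=np.float32)
    RT[:3, :3] = R
    RT[:3, 3] = t

    datapoint = {
        "image": image,    # RGB observation
        "depth": depth,    # depth observation
        "RT": RT,          # 4x4 GT pose from Linemod
        "scale": s,        # scalar scale factor
        "size": extents,   # 3D extent of the canonical model
    }
    with open(out_path, "wb") as f:
        pickle.dump(datapoint, f)
```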

@Trulli99
Author

Thank you, I will have a look. I thought it was needed because you process all the images in the camera train folder.

@Trulli99
Author

In Linemod I have the GT 6D poses; will I still need to generate the files in the sdf_rgb_pretrained folder?

@zubair-irshad
Owner

Yes, you would still have to train the SDF and RGB MLPs as well as the respective latent codes per object (if your categories are different from the categories we train on, i.e. bottle, bowl, camera, mug and laptop), since our network requires them as a strong prior which we regress and later optimize from single-view observations. Please see thread #13 on how you can train these for your own dataset.
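
To illustrate the idea of per-object latent codes trained jointly with an SDF decoder (a DeepSDF-style prior, which is the flavor described above), here is a minimal PyTorch sketch. Layer sizes, the loss and the data pipeline are placeholders rather than the repo's actual training code; see thread #13 for the real procedure.

```python
# Hypothetical DeepSDF-style sketch: learn one latent code per object
# jointly with an SDF MLP. Not the repo's actual sdf_rgb_pretrained code.
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                 # predicted signed distance
        )

    def forward(self, latent, xyz):
        return self.net(torch.cat([latent, xyz], dim=-1))

num_objects, latent_dim = 10, 64
decoder = SDFDecoder(latent_dim)
latents = nn.Embedding(num_objects, latent_dim)   # one learnable code per object
opt = torch.optim.Adam(list(decoder.parameters()) + list(latents.parameters()), lr=1e-4)

def train_step(obj_ids, xyz, sdf_gt):
    """obj_ids: (B,) long, xyz: (B,3) sampled points, sdf_gt: (B,1) GT signed distances."""
    z = latents(obj_ids)                          # look up each object's code
    pred = decoder(z, xyz)
    loss = nn.functional.l1_loss(pred, sdf_gt)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At test time the trained decoder stays fixed and a latent code is regressed (and later optimized) for the observed object, which is the "strong prior" role mentioned above.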

@peng25zhang

@zubair-irshad Hi, thanks for your work. Can you tell me what "the sizes of rendered images" means? I see the datapoint uses *_norm.txt.

@DavidYaonanZhu

Also for the YCB dataset.

@zubair-irshad
Owner

@peng25zhang the transformations are defined by R, T, s, where R is a 3x3 rotation matrix, T is a 3x1 translation vector and s is a scalar scale value. The scale value determines the scaling factor of the observed instance relative to the canonical shape, i.e. point clouds, where the size is the 3-dimensional extent of the canonical point clouds.

At inference time we are only given a single RGB-D observation, so we get the size information from our predicted shape, i.e. the extracted (canonical) point clouds, and we regress the R, T, s values with a neural-network MLP; hence we can transform the canonical point clouds to camera-frame point clouds. Hope it helps!
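
As a quick illustration of that transform, a canonical point cloud maps to the camera frame with the similarity transform defined by R, T, s. This is a small numpy sketch, not code from the repo.

```python
# Apply the similarity transform p_cam = s * R @ p_canonical + T to a point cloud.
import numpy as np

def canonical_to_camera(points_canonical, R, T, s):
    """points_canonical: (N,3), R: (3,3), T: (3,), s: scalar."""
    return (s * points_canonical) @ R.T + T

# Example with an identity rotation, a translation along z, and scale 0.2.
pc = np.random.rand(100, 3) - 0.5        # canonical points in [-0.5, 0.5]^3
R = np.eye(3)
T = np.array([0.0, 0.0, 1.0])
pc_cam = canonical_to_camera(pc, R, T, 0.2)
```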
