some question about process 2d datasets #12

Open
yuchen-ji opened this issue Jul 5, 2022 · 3 comments
Comments

@yuchen-ji

I found that for 3D datasets you crop the closest person when an image contains multiple people, but for 2D datasets you seem to crop every person in the image. Why are these datasets processed differently?
In addition, the cropped image may contain another person. Won't this introduce ambiguity for the network?
Thank you very much!

@hongsukchoi
Owner

I found that for 3D datasets you crop the closest person when an image contains multiple people, but for 2D datasets you seem to crop every person in the image. Why are these datasets processed differently?

The cropping process is the same regardless of the dataset. Even if you crop the closest person in a 3D dataset, there can still be other people in the cropped image. Also, MuCo, which you are referring to, does not originally have multiple people in one image. It combines multiple real single-person images into one image using depth.
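
The depth-based synthesis mentioned here can be illustrated with a minimal sketch. This is not MuCo's actual pipeline. It assumes each single-person image comes with a per-pixel depth map and a foreground mask, and the function name `composite_by_depth` and the list-of-lists image format are hypothetical. At each pixel, the foreground layer with the smallest depth is kept, so nearer people occlude farther ones.

```python
def composite_by_depth(layers):
    """Merge single-person layers into one image (illustrative only).

    layers: list of (image, depth, mask) tuples, each a 2D list of the
    same height and width. mask is truthy on foreground (person) pixels.
    Pixels covered by no layer stay 0.
    """
    height = len(layers[0][0])
    width = len(layers[0][0][0])
    out = [[0] * width for _ in range(height)]
    # Smallest depth seen so far at each pixel.
    nearest = [[float("inf")] * width for _ in range(height)]
    for image, depth, mask in layers:
        for y in range(height):
            for x in range(width):
                # Keep this layer's pixel only if it is foreground and
                # closer to the camera than anything placed before.
                if mask[y][x] and depth[y][x] < nearest[y][x]:
                    nearest[y][x] = depth[y][x]
                    out[y][x] = image[y][x]
    return out


# Two people on a 1x3 strip; they overlap at the middle pixel.
person_a = ([[1, 1, 0]], [[2.0, 2.0, 0.0]], [[1, 1, 0]])
person_b = ([[0, 2, 2]], [[0.0, 1.0, 1.0]], [[0, 1, 1]])
composite = composite_by_depth([person_a, person_b])
```

In this toy case the middle pixel goes to `person_b` because its depth is smaller, which is how a composited image ends up with one person partly hiding another.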

In addition, the cropped image may contain another person. Won't this introduce ambiguity for the network?

That is the challenge of crowded scenes, which 3DCrowdNet resolves. Please see the paper.

@yuchen-ji
Author

Thanks for your reply!
I found that many top-down methods are trained on single-person datasets, where the cropped image does not contain another person, which avoids ambiguity during training. At inference, the cropped image often contains other people, yet these methods often still regress the right person's SMPL parameters. Does this mean that including other people in the cropped image during training introduces ambiguity for the network and makes training difficult?
In 3DCrowdNet, the cropped image contains other people even during training, but a robust 2D pose heatmap is added to resolve the ambiguity. Is my understanding correct?
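
To make the heatmap idea concrete, here is a minimal sketch of rendering one Gaussian heatmap per 2D joint of the target person. It is not 3DCrowdNet's implementation. The function names, the `sigma` value, and the heatmap size are assumptions chosen for illustration. The point is only that such maps peak at the target person's joints, which gives a network a cue about which person in the crop to regress.

```python
import math


def joint_heatmap(joint_xy, height, width, sigma=1.0):
    """Return a height x width Gaussian heatmap that peaks at joint_xy.

    joint_xy is (x, y) in heatmap pixel coordinates (hypothetical format).
    """
    jx, jy = joint_xy
    return [
        [
            math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def pose_heatmaps(joints, height, width, sigma=1.0):
    """Stack one heatmap per joint of the target person."""
    return [joint_heatmap(joint, height, width, sigma) for joint in joints]


# Two joints of the target person on a 4x5 grid.
heatmaps = pose_heatmaps([(1, 2), (3, 0)], height=4, width=5)
```

Each map is indexed as `heatmaps[joint][y][x]`. Joints of other people in the crop are simply not rendered, so they contribute no peaks.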

@hongsukchoi
Owner

Does this mean that including other people in the cropped image during training introduces ambiguity for the network and makes training difficult?

No. I think you are confused about how deep learning works. Given accurate ground truth, a neural network becomes robust to this ambiguity during training. It then performs better on such ambiguous inputs at test time.
