Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector3dvector and other vector Eigen bindings speed up #657

Merged
merged 4 commits into from
Nov 5, 2018

Conversation

yxlao
Copy link
Collaborator

@yxlao yxlao commented Nov 2, 2018

Improves speed of open3d.Vector3dVector, Vector3iVector, Vector2iVector, Matrix4dVector by 40-200x, resolving issue #403 and pybind/pybind11#1481.

Special thanks to Wenzel's feedback. According to Wenzel, the slowness is due to "casting millions of small vectors, which requires a proportional amount of Python API calls".

Comparison

# in the build dir, inside virtualenv
➜  pip install pytest
➜  make install-pip-package -j && pytest ../src/UnitTest -s

Before

open3d.Vector3dVector: (2000000, 3)
open3d -> numpy: 1.005225s
numpy -> open3d: 0.000018s

open3d.Vector3iVector: (2000000, 3)
open3d -> numpy: 0.983502s
numpy -> open3d: 0.001925s

open3d.Vector2iVector: (2000000, 3)
open3d -> numpy: 0.944884s
numpy -> open3d: 0.000886s

After

open3d.Vector3dVector: (2000000, 3)
open3d -> numpy: 0.024798s
numpy -> open3d: 0.000017s

open3d.Vector3iVector: (2000000, 3)
open3d -> numpy: 0.010451s
numpy -> open3d: 0.001906s

open3d.Vector2iVector: (2000000, 3)
open3d -> numpy: 0.005553s
numpy -> open3d: 0.000865s

Discussions

This solution is not ideal yet:

  1. We pay the copy penalty. The way to avoid this is to replace the underlying storage of say vector<Eigen::Vector3d> to one blob of buffer. However, this requires significant rework of the code base.
    2) We pay the penalty accessing numpy array index individually. We can do more aggressive optimizations (e.g. more direct memory mapping) if we we handle double and int types separately (i.e. handle Vector3dVector and Vector3iVector separately) instead of using a generic function. assert that the incoming array is contiguous. Preliminary tests shows about 20% - 30% speed up.
    Edit:2) is addressed in the improved direct mapping approach

Future works

Some of the templated functions can be further merged. Please let me know if you have suggestions.


This change is Reviewable

@qianyizh
Copy link
Collaborator

qianyizh commented Nov 2, 2018

This tries to address #403

@yxlao
Copy link
Collaborator Author

yxlao commented Nov 2, 2018

Update: squeezed in another optimization ed122ac with direct memory mapping, 30% more speed up:

2e6 points:

open3d.Vector3dVector: (2000000, 3)
open3d -> numpy: 0.017295s
numpy -> open3d: 0.000016s

open3d.Vector3iVector: (2000000, 3)
open3d -> numpy: 0.009439s
numpy -> open3d: 0.002394s

open3d.Vector2iVector: (2000000, 3)
open3d -> numpy: 0.004427s
numpy -> open3d: 0.001198s

2e5 points (as used in #403):

open3d.Vector3dVector: (200000, 3)
open3d -> numpy: 0.001392s
numpy -> open3d: 0.000009s

open3d.Vector3iVector: (200000, 3)
open3d -> numpy: 0.000263s
numpy -> open3d: 0.000012s

open3d.Vector2iVector: (200000, 3)
open3d -> numpy: 0.000198s
numpy -> open3d: 0.000013s

Copy link
Contributor

@syncle syncle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 4 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @takanokage and @qianyizh)

Copy link
Contributor

@syncle syncle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great PR. It is good to have unit test for this as well. As a sanity check, could you also check other tutorial examples that uses VectorXXVectors?

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @takanokage and @qianyizh)

Copy link
Collaborator Author

@yxlao yxlao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@syncle For vector and matrices, we have

Vector3dVector
Vector3iVector
Vector2iVector
Matrix4dVector

The first 3 has been optimized. The fourth Matrix4dVector could be optimized in the same way, however, it is only used for converting camera parameters, so performance shouldn't be an issue for now.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @takanokage and @qianyizh)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants