Figure out why multithreading gains aren't great #12

Open
clbarnes opened this issue Nov 16, 2020 · 2 comments
Comments

@clbarnes
Owner

clbarnes commented Nov 16, 2020

Using 8 threads gives only a ~2x speedup for containment checks and a 1.1x speedup for ray intersections.

Off the top of my head, there's probably some combination of 3 sources:

  1. The python-rust bridge. This would manifest as the query spending a significant portion of its time thrashing a single CPU at the start and end of the query, and possibly the multithreading gains getting worse with the number of queries, but improving with the number of rays cast for each containment query. It would be improved by #1 ("Use rust-numpy for data IO").
  2. The constant startup cost of rayon. This would manifest as the multithreading gains improving with the number of queries. Unavoidable, unless there's a lighter runtime available (smol?). Could automatically switch to single-threaded for small number of queries when threads=True?
  3. The N = number of queries cost of the work-stealing job scheduler, if the ray casts are very cheap. Would be improved by chunking the queries so that N = number of chunks.
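The mitigations suggested for sources 2 and 3 can be sketched together. This is a minimal illustration in Python, not the project's implementation: `run_queries`, `min_parallel` and `chunks_per_thread` are hypothetical names, the cutoff values are guesses that would need benchmarking, and a `ThreadPoolExecutor` stands in for rayon only to show the scheduling shape (Python threads will not give real parallelism for pure-Python work). On the Rust side, something like rayon's `par_chunks` could plausibly play the same role.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


def run_queries(
    query_fn: Callable[[Q], R],
    queries: Sequence[Q],
    threads: int = 8,
    min_parallel: int = 1024,    # hypothetical cutoff; needs benchmarking
    chunks_per_thread: int = 4,  # hypothetical; a few chunks per thread for load balancing
) -> List[R]:
    """Run query_fn over queries, scheduling chunks rather than single queries."""
    n = len(queries)

    # Source 2: for small inputs, skip the pool and its startup cost entirely.
    if threads <= 1 or n == 0 or n < min_parallel:
        return [query_fn(q) for q in queries]

    # Source 3: the scheduler sees roughly n_chunks jobs instead of n jobs.
    n_chunks = min(n, threads * chunks_per_thread)
    chunk_size = -(-n // n_chunks)  # ceiling division
    chunks = [queries[i:i + chunk_size] for i in range(0, n, chunk_size)]

    def run_chunk(chunk: Sequence[Q]) -> List[R]:
        return [query_fn(q) for q in chunk]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in submission order, so output lines up with input.
        per_chunk = pool.map(run_chunk, chunks)
        return [r for chunk_result in per_chunk for r in chunk_result]
```

The design choice here is that scheduling overhead is paid once per chunk, so it is amortised over `chunk_size` queries; using several chunks per thread keeps some load balancing when query costs vary.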

Some combination of 2 and 3 is certainly already a problem: benchmarked containment checks are ~2.5x faster on 0 threads than on 1.

@clbarnes
Owner Author

clbarnes commented Nov 9, 2021

The fact that containment checks (multiple ray casts per task) get a better speedup than ray intersections (1 ray cast per task) implies that 3 is definitely a factor.
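The reasoning above can be made concrete with a toy cost model. It is a sketch under strong assumptions, not a fit to the benchmarks: every task is assumed to pay a fixed scheduling overhead in the parallel run only, everything else is assumed to scale ideally, and `modelled_speedup` and its parameters are hypothetical names.

```python
def modelled_speedup(
    threads: int,
    casts_per_task: int,
    cast_cost: float,
    task_overhead: float,
) -> float:
    """Toy model of speedup when each parallel task pays a fixed scheduling cost."""
    work = casts_per_task * cast_cost
    serial_time = work                                # per task, no scheduler involved
    parallel_time = (work + task_overhead) / threads  # per task, otherwise ideal scaling
    return serial_time / parallel_time
```

Under this model, the speedup approaches the thread count as `casts_per_task` grows and collapses when `task_overhead` dominates a single cheap ray cast, which is the pattern described for containment checks versus ray intersections.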

@clbarnes
Owner Author

clbarnes commented Feb 22, 2022

Gains are still not great with #28, which eliminates the python-rust bridge (although I think there's still a copy involved on the rust side) and at least some of rayon's startup cost (because it uses the global thread pool rather than building a new one for every query). So I guess it's the job scheduler sapping our gains.

However, reorganising to use chunks would be a massive faff, so this is unlikely to be fixed any time soon.
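For readers unfamiliar with the global-pool point, the idea of reusing one pool instead of building a new one per query can be sketched in Python. This is only an analogy: the actual change in #28 is on the Rust side with rayon, and `shared_pool` and `_SHARED_POOL` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Created lazily on first use, then reused by every later query.
_SHARED_POOL: Optional[ThreadPoolExecutor] = None


def shared_pool(max_workers: int = 8) -> ThreadPoolExecutor:
    """Return a process-wide pool, so thread startup is paid once rather than per query."""
    global _SHARED_POOL
    if _SHARED_POOL is None:
        # max_workers only takes effect on the first call.
        _SHARED_POOL = ThreadPoolExecutor(max_workers=max_workers)
    return _SHARED_POOL
```

This removes the per-query construction cost but does nothing about per-task scheduling overhead, which is consistent with the gains remaining modest.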
