-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: get all partition widths/lengths in parallel instead of serially. #4494
Comments
I see this is already implemented for Dask by #4420. Is adding a similar implementation for PandasOnRay sufficient, or do we also need similar changes for PyarrowOnRay and OmnisciOnNative as well? |
@noloerino here are all the places I can find where we are getting uncached partition shapes serially. I found them by searching for
We do get lengths serially in |
Signed-off-by: Jonathan Shi <jhshi@ponder.io>
It's hard to find a case where this optimization is useful. See my comment here: #4683 (comment) Given that this optimization doesn't seem to give any major gains, it doesn't seem to be worth the extra code complexity in #4683. I'll close this issue for now. |
The reproducing script is the same as in #4493, but the solution here is different: instead of getting all lengths/widths serially, we should do so in parallel. We will need to add a new method at the physical layer for that.
This solution will not save us the cost of serializing the call queue drain result after computing length, but it will let us get all the lengths/widths at once instead of serially.
The text was updated successfully, but these errors were encountered: