Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Fixes DAG construction slowness #169

Merged
merged 1 commit into from
Aug 5, 2022
Merged

Fixes DAG construction slowness #169

merged 1 commit into from
Aug 5, 2022

Conversation

elijahbenizzy
Copy link
Collaborator

@elijahbenizzy elijahbenizzy commented Aug 4, 2022

This is an issue with check_output. We were using sum(lists) which does
a concat operation for each list. This is fine at a small scale, but at
a larger scale this is an O(n^2) operation. We can still optimize this
significantly but this fixes the underlying issue in #168.

Note that this does not fix the issues around ray yet, this is just
the underlying piece. With Ray, the hard part is validating everything
which is inherently slow. Worth looking into why parallelism didn't help
though.

[Short description explaining the high-level reason for the pull request]

Changes

Testing

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Python - local testing

  • python 3.6
  • python 3.7

This is an issue with check_output. We were using sum(lists) which does
a concat operation for each list. This is fine at a small scale, but at
a larger scale this is an O(n^2) operation. We can still optimize this
significantly but this fixes the underlying issue in #168.

Note that this does *not* fix the issues around ray yet, this is just
the underlying piece. With Ray, the hard part is validating everything
which is inherently slow. Worth looking into why parallelism didn't help
though.
@elijahbenizzy elijahbenizzy requested a review from skrawcz August 4, 2022 21:37
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we have test coverage here? If so, LGTM.

@skrawcz
Copy link
Collaborator

skrawcz commented Aug 4, 2022

how much does it improve #168 ?

@elijahbenizzy
Copy link
Collaborator Author

Need to run some actual benchmark but 7x for 10k nodes and 100x for 100k nodes.

@skrawcz skrawcz mentioned this pull request Aug 4, 2022
@elijahbenizzy
Copy link
Collaborator Author

Scales linearly:

Created DAG with 10 columns in: 0.0002918243408203125 seconds
Created DAG with 100 columns in: 0.0019080638885498047 seconds
Created DAG with 1000 columns in: 0.031607866287231445 seconds
Created DAG with 10000 columns in: 0.3109908103942871 seconds
Created DAG with 100000 columns in: 4.007328033447266 seconds

Compared to:

Created DAG with 10 columns in: 0.00030612945556640625 seconds
Created DAG with 100 columns in: 0.0021920204162597656 seconds
Created DAG with 1000 columns in: 0.05421900749206543 seconds
Created DAG with 10000 columns in: 2.8478641510009766 seconds
# don't feel like waiting

@elijahbenizzy
Copy link
Collaborator Author

OK, so where does it spend its time?

image

No obvious bang for our buck -- @skrawcz your intuition is right once we fixed it getting super slow. Object creation takes 20=25% of the time. Then resolving arguments (I think we can probably cache that) takes another 230%. So, not big solutions here. Going to call this a success and merge.

@elijahbenizzy elijahbenizzy merged commit 9ecf72c into main Aug 5, 2022
@elijahbenizzy elijahbenizzy deleted the slow-dag-build branch August 5, 2022 22:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants