Efficient queries for connection list #17360
Merged
pmossman changed the title from "Parker/optimize connection list" to "Efficient queries for connection list" on Sep 28, 2022
github-actions bot added the area/platform (issues related to the platform) and area/server labels on Sep 28, 2022
pmossman force-pushed the parker/optimize-connection-list branch from 7da0acd to af043f7 on September 30, 2022 17:00
pmossman force-pushed the parker/optimize-connection-list branch from 518f06a to 15fc834 on October 3, 2022 16:17
pmossman force-pushed the parker/optimize-connection-list branch from f5b39bb to 058c5df on October 3, 2022 17:54
davinchia reviewed on Oct 6, 2022, leaving comments on the following files (one line per comment):
...yte-persistence/job-persistence/src/main/java/io/airbyte/persistence/job/JobPersistence.java (outdated)
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java (outdated)
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java (outdated)
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java (outdated)
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java (outdated)
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java (outdated)
airbyte-config/config-persistence/src/main/java/io/airbyte/config/persistence/DbConverter.java
airbyte-config/config-persistence/src/main/java/io/airbyte/config/persistence/DbConverter.java
...-config/config-persistence/src/main/java/io/airbyte/config/persistence/ConfigRepository.java
...sistence/job-persistence/src/main/java/io/airbyte/persistence/job/DefaultJobPersistence.java (outdated)
...sistence/job-persistence/src/main/java/io/airbyte/persistence/job/DefaultJobPersistence.java (outdated)
...yte-persistence/job-persistence/src/main/java/io/airbyte/persistence/job/JobPersistence.java (outdated)
Good points:
I'm curious what kind of performance improvement we have here.
…nd group-by connectionID in memory
… a different order. verify operationIds separately from rest of object
pmossman force-pushed the parker/optimize-connection-list branch from 84409c8 to 67d1441 on October 10, 2022 15:38
This was referenced Oct 13, 2022
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request on Oct 31, 2022
* query once for all needed models, instead of querying within connections loop
* cleanup and fix failing tests
* pmd fix
* fix query and add test
* return empty if input list is empty
* undo aggressive autoformatting
* don't query for connection operations in a loop, instead query once and group-by connectionID in memory
* try handling operationIds in a single query instead of two
* remove optional
* fix operationIds query
* very annoying, test was failing because operationIds can be listed in a different order. verify operationIds separately from rest of object
* combined queries/functions instead of separate queries for actor and definition
* remove leftover lines that aren't doing anything
* format
* add javadoc
* format
* use leftjoin so that connections that lack operations aren't left out
* clean up comments and format
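Two of the commit notes above (grouping operations by connection ID in memory, and using a left join so connections without operations are kept) can be illustrated with a minimal sketch. This is not the code from this PR: the `Row` record, the `OperationGroupingSketch` class, and `groupOperationIds` are hypothetical names, and the rows are assumed to come from a connection-to-operation LEFT JOIN in which `operationId` is null for a connection that has no operations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class OperationGroupingSketch {

  // Hypothetical row shape for a LEFT JOIN result: operationId is null
  // when the connection has no operations.
  record Row(UUID connectionId, UUID operationId) {}

  // Groups operation ids by connection id in memory. Every connection that
  // appears in the rows gets an entry, even if its operation list is empty.
  static Map<UUID, List<UUID>> groupOperationIds(List<Row> rows) {
    Map<UUID, List<UUID>> byConnection = new LinkedHashMap<>();
    for (Row row : rows) {
      List<UUID> operationIds =
          byConnection.computeIfAbsent(row.connectionId(), id -> new ArrayList<>());
      if (row.operationId() != null) {
        operationIds.add(row.operationId());
      }
    }
    return byConnection;
  }

  public static void main(String[] args) {
    UUID withOps = UUID.randomUUID();
    UUID withoutOps = UUID.randomUUID();
    UUID op1 = UUID.randomUUID();
    UUID op2 = UUID.randomUUID();

    List<Row> rows = new ArrayList<>();
    rows.add(new Row(withOps, op1));
    rows.add(new Row(withoutOps, null)); // kept only because of the LEFT JOIN
    rows.add(new Row(withOps, op2));

    Map<UUID, List<UUID>> grouped = groupOperationIds(rows);

    // Row order is not guaranteed by the database, so compare ids as a set.
    Set<UUID> actual = new HashSet<>(grouped.get(withOps));
    System.out.println("operation ids match: " + actual.equals(Set.of(op1, op2)));
    System.out.println("connection without operations kept: "
        + grouped.get(withoutOps).isEmpty());
  }
}
```

With an inner join, the second row would never be returned and the connection would silently drop out of the result. Comparing the operation IDs as a set mirrors the test fix described in the notes, where the IDs could come back in a different order.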
What
The web_backend connections list handler finds all connections within a workspace and loops over each one, calling other handlers to build source reads and destination reads and to fetch the latest job info. Doing this once per connection is very slow and can require hundreds of queries for workspaces with many connections.
Instead, this PR performs all necessary queries up front and stores the information in maps that can then be referenced when building the response model for each connection. This should drastically reduce the number of database queries.
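As a rough illustration of the change described above, here is a minimal sketch of the per-connection pattern next to the up-front pattern. It is not the code from this PR: `ConnectionListSketch`, `FakeRepository`, and the `Connection`, `Actor`, and `ConnectionRead` records are hypothetical, simplified stand-ins, and the repository only counts simulated queries rather than talking to a database.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class ConnectionListSketch {

  // Hypothetical, simplified models; the real types carry many more fields.
  record Connection(UUID id, UUID sourceId, UUID destinationId) {}
  record Actor(UUID id, String name) {}
  record ConnectionRead(UUID connectionId, String sourceName, String destinationName) {}

  /** In-memory stand-in for the repository layer that counts simulated queries. */
  static class FakeRepository {
    final Map<UUID, Actor> actorsById = new HashMap<>();
    int queryCount = 0;

    // One simulated query per actor: the per-connection pattern.
    Actor getActor(UUID id) {
      queryCount++;
      return actorsById.get(id);
    }

    // One simulated query for the whole id set: the up-front pattern.
    Map<UUID, Actor> getActorsById(Collection<UUID> ids) {
      Map<UUID, Actor> found = new HashMap<>();
      if (ids.isEmpty()) {
        return found; // nothing to look up, so skip the query entirely
      }
      queryCount++;
      for (UUID id : ids) {
        Actor actor = actorsById.get(id);
        if (actor != null) {
          found.put(id, actor);
        }
      }
      return found;
    }
  }

  // Before: two lookups inside the loop, so 2 * N queries for N connections.
  static List<ConnectionRead> listPerConnection(List<Connection> connections, FakeRepository repo) {
    List<ConnectionRead> reads = new ArrayList<>();
    for (Connection c : connections) {
      Actor source = repo.getActor(c.sourceId());
      Actor destination = repo.getActor(c.destinationId());
      reads.add(new ConnectionRead(c.id(), source.name(), destination.name()));
    }
    return reads;
  }

  // After: collect every needed id, query once, then build reads from the map.
  static List<ConnectionRead> listBatched(List<Connection> connections, FakeRepository repo) {
    Set<UUID> actorIds = new LinkedHashSet<>();
    for (Connection c : connections) {
      actorIds.add(c.sourceId());
      actorIds.add(c.destinationId());
    }
    Map<UUID, Actor> actors = repo.getActorsById(actorIds);
    List<ConnectionRead> reads = new ArrayList<>();
    for (Connection c : connections) {
      reads.add(new ConnectionRead(
          c.id(), actors.get(c.sourceId()).name(), actors.get(c.destinationId()).name()));
    }
    return reads;
  }

  public static void main(String[] args) {
    FakeRepository repo = new FakeRepository();
    Actor source = new Actor(UUID.randomUUID(), "example-source");
    Actor destination = new Actor(UUID.randomUUID(), "example-destination");
    repo.actorsById.put(source.id(), source);
    repo.actorsById.put(destination.id(), destination);

    List<Connection> connections = new ArrayList<>();
    for (int i = 0; i < 250; i++) {
      connections.add(new Connection(UUID.randomUUID(), source.id(), destination.id()));
    }

    listPerConnection(connections, repo);
    System.out.println("simulated per-connection queries: " + repo.queryCount);
    repo.queryCount = 0;
    listBatched(connections, repo);
    System.out.println("simulated batched queries: " + repo.queryCount);
  }
}
```

The counts printed by this sketch are simulated lookups, not measurements from the PR. The point is only that the batched version's query count stays constant as the number of connections grows, while the looped version grows linearly with it.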
How
Performance Testing
I did some comparison in dev between master and this branch for a workspace with ~250 active connections.
Tagging @malikdiarra and @davinchia for review, as they both have context on our repository layer and experience with making similar optimizations. I'm definitely looking for thoughts and feedback on the new queries and in-memory grouping code.