ci(bigquery): bigquery ci is very slow #8987

Closed
cpcloud opened this issue Apr 17, 2024 · 6 comments · Fixed by #9418
Labels
bigquery (The BigQuery backend), ci (Continuous Integration issues or PRs), developer-tools (Tools related to ibis development), performance (Issues related to ibis's performance)

Comments

Member

cpcloud commented Apr 17, 2024

This CI run took nearly an hour and a half: https://github.com/ibis-project/ibis/actions/runs/8697121662/job/23851753722.

Is there something we can do to speed this up a bit?

cc @tswast

cpcloud added the ci, performance, bigquery, and developer-tools labels Apr 17, 2024
Member Author

cpcloud commented Apr 17, 2024

Created a notebook in a gist showing the issue: https://gist.github.com/cpcloud/b019ed898312d422190152b02b029377.

tswast self-assigned this Apr 25, 2024
Collaborator

tswast commented Apr 25, 2024

Not sure why the increase in variability. One thing we might want to try is the new query_and_wait method (added in google-cloud-bigquery 3.14.0, googleapis/python-bigquery#1722) which is optimized for queries that return small (< 100 MB) results.
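As a rough sketch of how a caller could try this while still tolerating older installs: the helper below prefers `query_and_wait` when the client object has it and otherwise falls back to `query(...).result()`. The helper name `run_small_query` is hypothetical and not part of ibis or google-cloud-bigquery; `client` is assumed to be a `google.cloud.bigquery.Client` (or anything with the same methods).

```python
from typing import Any


def run_small_query(client: Any, sql: str) -> Any:
    """Run `sql` and return an iterator of rows.

    Hypothetical helper: uses Client.query_and_wait when available
    (google-cloud-bigquery >= 3.14.0), else the older two-step pattern.
    """
    if hasattr(client, "query_and_wait"):
        # Newer clients: a single call that runs the query and waits for rows.
        return client.query_and_wait(sql)
    # Older clients: start the job, then block on .result().
    return client.query(sql).result()
```

If 3.14.0 becomes the minimum supported version, the fallback branch is unnecessary and the call can be made directly.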

Collaborator

tswast commented Apr 25, 2024

> optimized for queries that return small (< 100 MB) results.

Note: It falls back to the existing BQ Storage Read API implementation for larger results.

Collaborator

tswast commented Jun 20, 2024

Copying from an email here for easier reference.

I added Client.query_and_wait in google-cloud-bigquery 3.14.0 late last year (https://github.com/googleapis/python-bigquery/blob/main/CHANGELOG.md#3140-2023-12-08). Since then there have been a few fixes and optimizations, but I think that should be safe as the minimum version for what ibis is doing. For small queries with small results (< 500 KB or so) that can save anywhere from a few hundred milliseconds to 3 seconds.

Looking at where .query() is currently called in ibis

    query = self.client.query(
        stmt, job_config=job_config, project=self.billing_project
    )
    query.result()  # blocks until finished

it won't be quite trivial. For example, the pattern of waiting until .result() to set the page size

    query_result = query.result(page_size=chunk_size)

won't work. It needs to be set at query_and_wait time.
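To make the page-size difference concrete, here is a minimal side-by-side sketch. The function names `fetch_with_query` and `fetch_with_query_and_wait` are hypothetical and are not the actual ibis code; `client` is assumed to behave like a `google.cloud.bigquery.Client` from version 3.14.0 or later.

```python
from typing import Any, Optional


def fetch_with_query(
    client: Any,
    stmt: str,
    *,
    job_config: Any = None,
    project: Optional[str] = None,
    chunk_size: Optional[int] = None,
) -> Any:
    """Existing pattern: page size is chosen late, when calling .result()."""
    job = client.query(stmt, job_config=job_config, project=project)
    return job.result(page_size=chunk_size)  # blocks until finished


def fetch_with_query_and_wait(
    client: Any,
    stmt: str,
    *,
    job_config: Any = None,
    project: Optional[str] = None,
    chunk_size: Optional[int] = None,
) -> Any:
    """query_and_wait pattern: page size has to be known at call time."""
    return client.query_and_wait(
        stmt, job_config=job_config, project=project, page_size=chunk_size
    )
```

The practical consequence is that any code path that currently creates the job in one place and picks `chunk_size` in another would need the chunk size passed down to wherever the query is issued.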

Member Author

cpcloud commented Jun 20, 2024

Yep, working through it now!

Member Author

cpcloud commented Jun 20, 2024

Thanks for the reference!

2 participants