-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use faster query_and_wait method from google-cloud-bigquery when available #722
Conversation
I tested this manually in a local ipython instance to verify that the latency is improved when using this new functionality: After this change: # In [1]:
import pandas_gbq
# In [2]: %%timeit -n10 -r10
df = pandas_gbq.read_gbq("SELECT 1", progress_bar_type=None)
# Out [2]:
# 394 ms ± 37.8 ms per loop (mean ± std. dev. of 10 runs, 10 loops each) Before this change: # In [1]:
import pandas_gbq
# In [2]: %%timeit -n10 -r10
df = pandas_gbq.read_gbq("SELECT 1", progress_bar_type=None)
# Out [2]:
# 1.08 s ± 102 ms per loop (mean ± std. dev. of 10 runs, 10 loops each) This is quite similar to the results I observed when implementing this optimization in the DB-API connector. googleapis/python-bigquery#1747 (comment) |
…n available fix unit tests fix python 3.7 fix python 3.7 fix python 3.7 fix python 3.7 fix wait_timeout units boost test coverage remove dead code boost a little more coverage
51fa1a4
to
7739f41
Compare
tests/system/test_gbq.py
Outdated
@@ -426,33 +426,6 @@ def test_query_inside_configuration(self, project_id): | |||
) | |||
tm.assert_frame_equal(df, DataFrame({"valid_string": ["PI"]})) | |||
|
|||
def test_configuration_without_query(self, project_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure why this isn't failing anymore. I guess jobs.query is more lenient about unknown parameters than jobs.insert is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share a link to a failing test log? I wasn't able to find it in test fusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran this test in a debugger. The request body we send to jobs.query
looks like:
{
'useLegacySql': True,
'formatOptions': {'useInt64Timestamp': True},
'query': 'SELECT 1',
'requestId': '8c4508cb-b8e4-463b-bef8-4a8f894d3644'
}
Notice that none of the copy
configuration is included. I believe this to be a bug in the query_and_wait
implementation. I'll send a PR today to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested locally with googleapis/python-bigquery#1793 and the request body now includes unknown parameters, but surprisingly the REST API seems to ignore these. I'll follow-up on that PR to raise on common errors, such as passing in a configuration object of the wrong type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated googleapis/python-bigquery#1793 with some client-side checks for common mistakes of sending the wrong type for job_config
and now this test is passing.
Will need to wait for a google-cloud-bigquery
release with that fix before merging this PR.
All looks good except for that I wanted to double check on the removed system test. |
@Linchin Tests are passing after the most recent google-cloud-bigquery release. OK for another look. |
LGTM, thank you! |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Requires #720, #721, and googleapis/python-bigquery#1793 to be merged first.
Fixes #710 🦕