simple query hangs in 0.14.1, works in 0.13.3 #343

Closed
mmwilbert opened this issue Nov 20, 2020 · 8 comments · Fixed by googleapis/python-bigquery#400 or #354
Labels
type: question Request for information or clarification. Not an issue.

Comments

@mmwilbert

SUMMARY_TABLE is a very simple million-record table in BigQuery containing nothing but short strings, numbers, and a timestamp.

Installed pandas-gbq and got the 0.14.1 version.

Run

query = f"select * from {SUMMARY_TABLE} LIMIT 20000"
df = pd.read_gbq(query,
                 project_id=PROJECT_NAME,
                 dialect='standard',
                 progress_bar_type='tqdm',
)

and it works normally.

Run

query = f"select * from {SUMMARY_TABLE} LIMIT 50000"
df = pd.read_gbq(query,
                 project_id=PROJECT_NAME,
                 dialect='standard',
                 progress_bar_type='tqdm',
)

With LIMIT 50000 (or no LIMIT at all), it runs forever. I also tried setting max_results, and it made no difference, even when set below 20000.

Uninstall 0.14.1, install 0.13.3, the problem is gone.

@tswast tswast added the type: question Request for information or clarification. Not an issue. label Nov 20, 2020
@tswast
Collaborator

tswast commented Nov 20, 2020

Please share what package versions you are using.

According to the package manager:

conda list
pip freeze

According to Python itself:

python -c 'import six ; print(six.__version__)'
python -c 'import google.cloud.bigquery ; print(google.cloud.bigquery.__version__)'
python -c 'import google.cloud.bigquery_storage ; print(google.cloud.bigquery_storage.__version__)'
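As a single-step alternative to the commands above, here is a minimal sketch that asks Python's own package metadata (standard-library `importlib.metadata`, available from Python 3.8) for the relevant versions. The package list is an assumption, chosen from the packages discussed in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata  # standard library in Python 3.8+
from typing import Dict, Iterable, Optional

# Distribution names (as pip knows them) that matter for this issue.
PACKAGES = (
    "pandas",
    "pandas-gbq",
    "google-cloud-bigquery",
    "google-cloud-bigquery-storage",
    "pyarrow",
    "six",
)


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Because it runs inside the interpreter that executes the failing script, it reports what that interpreter actually imports, which can differ from what `conda list` shows for another environment.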

Also, when you stop the program, please share the stacktrace so that we can see where it is getting stuck.
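If pressing Ctrl+C is inconvenient (for example, in a scheduled job), Python's standard-library `faulthandler` can print every thread's stack after a delay without stopping the program. This is a minimal sketch; `run_with_stack_dump` is a hypothetical helper name, and the delay is an arbitrary choice left to the caller.

```python
import faulthandler
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_stack_dump(fn: Callable[[], T], after_seconds: float) -> T:
    """Call fn(). If it is still running after `after_seconds`, print the
    stack of every thread once, then keep waiting for fn() to finish."""
    # faulthandler writes to a raw file descriptor, so use the process's
    # original stderr rather than a possibly replaced sys.stderr object.
    faulthandler.dump_traceback_later(
        after_seconds, repeat=False, file=sys.__stderr__, exit=False
    )
    try:
        return fn()
    finally:
        # Disarm the timer so a call that finishes in time prints nothing.
        faulthandler.cancel_dump_traceback_later()
```

For example, `run_with_stack_dump(lambda: pd.read_gbq(query, project_id=PROJECT_NAME, dialect="standard"), after_seconds=120)` would print the stacks once if the query is still waiting after two minutes.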

@mmwilbert
Author

mmwilbert commented Nov 20, 2020

While generating the requirements files, I changed the version back to 0.14.1 and it still worked. I then created a new environment using the freeze file and that worked too. So I tried recreating the problem from scratch, and I succeeded.

Made new virtualenv (python 3.8)
pip install pandas
pip install pandas-gbq

And the problem returns. I diffed the pip freeze output of the working 0.14.1 environment against the failing one.

(good left, bad right)
6d5
< google-api-python-client==1.12.8
8d6
< google-auth-httplib2==0.0.4
10,11c8,9
< google-cloud-bigquery==1.28.0
< google-cloud-bigquery-storage==1.1.0
---
> google-cloud-bigquery==2.4.0
> google-cloud-bigquery-storage==2.1.0
17d14
< httplib2==0.18.1
27c24
< pyarrow==1.0.1
---
> pyarrow==2.0.0
41d37
< uritemplate==3.0.1

Replacing google-cloud-bigquery==2.4.0 with google-cloud-bigquery==1.28.0 fixes the issue. Didn't change anything else in the environment.
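For larger environments, the same comparison can be done without `diff`, showing old and new pins side by side. This is a minimal sketch assuming both inputs are plain `pip freeze` output with `name==version` lines; `parse_freeze` and `diff_freeze` are hypothetical names.

```python
from typing import Dict, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `pip freeze` output into {lowercased name: version}."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable or URL installs
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_freeze(good: str, bad: str) -> Dict[str, Tuple[str, str]]:
    """Return {package: (good_version, bad_version)} for every pin that
    differs; a package missing on one side is shown as "-"."""
    good_pins, bad_pins = parse_freeze(good), parse_freeze(bad)
    changes: Dict[str, Tuple[str, str]] = {}
    for name in sorted(good_pins.keys() | bad_pins.keys()):
        old = good_pins.get(name, "-")
        new = bad_pins.get(name, "-")
        if old != new:
            changes[name] = (old, new)
    return changes
```

With an excerpt of the pins above, `diff_freeze("google-cloud-bigquery==1.28.0\nhttplib2==0.18.1\npyarrow==1.0.1\n", "google-cloud-bigquery==2.4.0\npyarrow==2.0.0\n")` returns `{"google-cloud-bigquery": ("1.28.0", "2.4.0"), "httplib2": ("0.18.1", "-"), "pyarrow": ("1.0.1", "2.0.0")}`.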

Just in case it matters, here's a session record from a run that didn't work. FYI, I actually left one of these running for five hours and it still didn't finish.

loading summary dataframe
Requesting query...
Query running...
Job ID: 7dc2ff96-173e-4e16-91a7-1ab12593cb39
Elapsed 6.49 s. Waiting...
Elapsed 7.51 s. Waiting...
Elapsed 8.53 s. Waiting...
Elapsed 9.55 s. Waiting...
Elapsed 10.56 s. Waiting...
Elapsed 11.58 s. Waiting...
Elapsed 12.59 s. Waiting...
Elapsed 13.61 s. Waiting...
Elapsed 14.62 s. Waiting...
Elapsed 15.65 s. Waiting...
Elapsed 16.66 s. Waiting...
Elapsed 17.68 s. Waiting...
Elapsed 18.69 s. Waiting...
Elapsed 19.71 s. Waiting...
Elapsed 20.72 s. Waiting...
Elapsed 21.74 s. Waiting...
Elapsed 22.76 s. Waiting...
Elapsed 23.78 s. Waiting...
Elapsed 24.79 s. Waiting...
Elapsed 25.8 s. Waiting...
Elapsed 26.82 s. Waiting...
Elapsed 27.83 s. Waiting...
Elapsed 28.85 s. Waiting...
^CTraceback (most recent call last):
File "test.py", line 56, in <module>
df = loadSummaryTableIntoDf()
File "test.py", line 39, in loadSummaryTableIntoDf
df = pd.read_gbq(query,
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/pandas/io/gbq.py", line 184, in read_gbq
return pandas_gbq.read_gbq(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/pandas_gbq/gbq.py", line 972, in read_gbq
final_df = connector.run_query(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/pandas_gbq/gbq.py", line 507, in run_query
query_reply.result(timeout=timeout_sec)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1160, in result
super(QueryJob, self).result(retry=retry, timeout=timeout)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 631, in result
return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 129, in result
self._blocking_poll(timeout=timeout, **kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1017, in _blocking_poll
super(QueryJob, self)._blocking_poll(timeout=timeout, **kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 107, in _blocking_poll
retry_(self._done_or_raise)(**kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
return retry_target(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/future/polling.py", line 85, in _done_or_raise
if not self.done(**kwargs):
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1000, in done
self._reload_query_results(retry=retry, timeout=timeout)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1106, in _reload_query_results
self._query_results = self._client._get_query_results(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 1557, in _get_query_results
resource = self._call_api(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 636, in _call_api
return call()
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
return retry_target(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/_http.py", line 424, in api_request
response = self._make_request(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/_http.py", line 288, in _make_request
return self._do_request(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/cloud/_http.py", line 326, in _do_request
return self.http.request(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/google/auth/transport/requests.py", line 464, in request
response = super(AuthorizedSession, self).request(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/mwilbert/bqtest/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
KeyboardInterrupt
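The session log shows the client reporting elapsed time while it waits, and the traceback shows that wait stuck inside a single HTTP read. A generic sketch of a bounded wait-and-report loop makes the limitation visible: an overall deadline only helps if each individual check also returns. The names are hypothetical, and this is not the pandas-gbq or google-cloud-bigquery implementation.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    total_timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_done() until it returns True and return the elapsed seconds.

    Raises TimeoutError if it is still not done after total_timeout seconds.
    The deadline is only checked between calls: if a single is_done() call
    blocks forever, the loop never regains control, so each call needs its
    own timeout as well.
    """
    start = clock()
    while True:
        if is_done():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= total_timeout:
            raise TimeoutError(f"still not done after {elapsed:.2f} s")
        print(f"Elapsed {elapsed:.2f} s. Waiting...")
        # Never sleep past the deadline.
        sleep(min(poll_interval, total_timeout - elapsed))
```

The injectable `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.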

@tswast
Collaborator

tswast commented Nov 20, 2020

Could you try with google-cloud-bigquery==2.3.1?

I wonder if this is an unintended consequence of googleapis/python-bigquery#374

@mmwilbert
Author

Works with google-cloud-bigquery==2.3.1

@bhachauk

I had the same issue, and this fixed it. Thanks!

@tswast
Collaborator

tswast commented Nov 24, 2020

In googleapis/python-bigquery#400, I am reverting what I believe caused the regression. Hopefully that means this issue will not occur in google-cloud-bigquery==2.4.1 or possibly google-cloud-bigquery==2.5.0.

gcf-merge-on-green bot pushed a commit to googleapis/python-bigquery that referenced this issue Nov 24, 2020
When there are large result sets, fetching rows while waiting for the
query to finish can cause the API to hang indefinitely. (This may be due
to an interaction between connection timeout and API timeout.)

This reverts commit 86f6a51 (#374).

Fixes googleapis/python-bigquery-pandas#343
Fixes #394 🦕
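The commit message's explanation is hedged, but the traceback above does end inside a blocking `ssl` socket read. As an illustration only (standard-library `socket`, no BigQuery involved), this sketch shows why a read with no timeout can wait indefinitely while a read with a timeout gives up. `recv_or_none` is a hypothetical helper name.

```python
import socket
from typing import Optional


def recv_or_none(sock: socket.socket, nbytes: int, timeout: Optional[float]) -> Optional[bytes]:
    """Read up to nbytes from sock.

    With timeout=None the call blocks until data arrives, possibly forever.
    With a number of seconds it returns None when the timeout expires instead.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected pair in which the peer does not reply at first,
    # standing in for a server that never finishes its response.
    client, peer = socket.socketpair()
    try:
        print(recv_or_none(client, 1, timeout=0.2))  # None, after about 0.2 s
        peer.sendall(b"x")
        print(recv_or_none(client, 1, timeout=0.2))  # b'x'
    finally:
        client.close()
        peer.close()
```

In `requests`, which appears in the traceback, the same idea is exposed through a request's `timeout=` argument (a single number or a `(connect, read)` tuple).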
@tswast
Collaborator

tswast commented Dec 17, 2020

Fixed in google-cloud-bigquery>=2.5.0 https://github.com/googleapis/python-bigquery/blob/master/CHANGELOG.md#250-2020-12-02

I've closed out googleapis/python-bigquery#362, which was the cause of this behavior, as I don't believe multi-second API requests are ideal for the often-interactive Python use cases.

@tswast tswast closed this as completed Dec 17, 2020
@tswast tswast reopened this Dec 17, 2020
@tswast
Collaborator

tswast commented Dec 17, 2020

Actually, I'll re-open this to mark version 2.4.x as unsupported.
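The thread establishes that 1.28.0 and 2.3.1 work, 2.4.0 hangs, and the fix shipped in 2.5.0. Here is a minimal sketch of a runtime check that flags the 2.4.x series. The helper names and warning text are hypothetical, and this is not necessarily how pandas-gbq marks those versions as unsupported.

```python
import re
import warnings
from importlib import metadata


def is_affected_bigquery(version: str) -> bool:
    """True for google-cloud-bigquery 2.4.x, the series discussed in this issue."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (2, 4)


def warn_if_affected_bigquery_installed() -> bool:
    """Warn, and return True, if the installed google-cloud-bigquery is 2.4.x."""
    try:
        installed = metadata.version("google-cloud-bigquery")
    except metadata.PackageNotFoundError:
        return False  # not installed, nothing to check
    if is_affected_bigquery(installed):
        warnings.warn(
            f"google-cloud-bigquery {installed} can hang on large query results; "
            "upgrade to >=2.5.0 or pin <2.4.",
            RuntimeWarning,
        )
        return True
    return False
```

Matching on major and minor numbers only means patch releases such as a hypothetical 2.4.1 are treated the same as 2.4.0, which mirrors marking the whole 2.4.x series as unsupported.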
