increased memory usage in 2.4.0 #394
Comments
We are caching the first page of results in the QueryJob class, which is why the memory is still being used in this example. (You're hanging on to the QueryJob class.) What's the reason you'd want to retain a reference to this job class?
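To make the retention effect concrete, here is a minimal simulation using only the standard library. It is not the library's code: `FakeQueryJob`, its `_first_page` attribute, and `drop_cached_page()` are hypothetical stand-ins for "a job object that caches its first page of rows", and the row sizes are made up. It only illustrates that memory held by a cached page stays allocated for as long as the job object is referenced.

```python
import gc
import tracemalloc


class FakeQueryJob:
    """Hypothetical stand-in for a job object that caches its first page of results."""

    def __init__(self, rows_in_first_page):
        # Simulate the cached first page: one distinct 1 KiB payload per row.
        self._first_page = [bytes(1024) for _ in range(rows_in_first_page)]

    def drop_cached_page(self):
        # Hypothetical helper: release the cached rows but keep the job object.
        self._first_page = None


def traced_bytes():
    """Return the number of bytes currently allocated, as seen by tracemalloc."""
    gc.collect()
    current, _peak = tracemalloc.get_traced_memory()
    return current


tracemalloc.start()
baseline = traced_bytes()

# Keep references to the jobs, as in the issue's reproduction scenario.
jobs = [FakeQueryJob(1_000) for _ in range(10)]
held = traced_bytes() - baseline

# Drop only the cached pages; the job objects themselves stay referenced.
for job in jobs:
    job.drop_cached_page()
released = traced_bytes() - baseline

tracemalloc.stop()
print(f"holding jobs with cached pages: {held / 1e6:.1f} MB")
print(f"after dropping cached pages:    {released / 1e6:.1f} MB")
```

The second figure should be a small fraction of the first, since only the job shells remain once the cached rows are released.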
Workaround: You can call [...]. We can investigate doing this automatically, though it's a bit tricky since the first page of results isn't actually used until the RowIterator is iterated over.
@tswast the reason to keep around the QueryJob objects is to launch multiple queries in parallel, and then call [...]
I'm reverting this change in #400. I'm also doing some more intense benchmarking with a variety of table / row sizes, as it's clear that there are many combinations for which this caching behavior was a regression.
When there are large result sets, fetching rows while waiting for the query to finish can cause the API to hang indefinitely. (This may be due to an interaction between connection timeout and API timeout.)

This reverts commit 86f6a51 (#374).

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-bigquery/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)

Fixes googleapis/python-bigquery-pandas#343
Fixes #394 🦕
Version 2.4.0 of the library allocates much more memory than the previous version, 2.3.1, when running multiple queries.
In particular, it seems that the QueryJob object is retaining the results of the query internally, and that memory is not deallocated.
I think that the problem is related to #374.
Environment details
google-cloud-bigquery version: 2.4.0
Steps to reproduce
Run the script in the code example with google-cloud-bigquery versions 2.4.0 and 2.3.1.
You will also need to install:
The outputs on my machine are:
With 2.4.0:
With 2.3.1:
Code example
Please note that we are storing a reference to the QueryJob objects, but not to the resulting DataFrames.