
BigQuery: Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc #4929

Closed
inglesp opened this issue Feb 25, 2018 · 8 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: process A process-related concern. May include testing, release, or the like.

Comments

@inglesp

inglesp commented Feb 25, 2018

The code samples at https://cloud.google.com/bigquery/docs/python-client-migration are meant to show equivalent code before and after the big API changes in 0.28. However, the before and after samples are often written quite differently, which makes comparing them much harder than it needs to be.

@chemelnucfin chemelnucfin self-assigned this Feb 26, 2018
@chemelnucfin chemelnucfin added documentation api: bigquery Issues related to the BigQuery API. type: process A process-related concern. May include testing, release, or the like. labels Feb 26, 2018
@chemelnucfin chemelnucfin changed the title Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc BigQuery: Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc Feb 26, 2018
@theacodes
Contributor

@alixhami @tswast can you take a look?

@tswast
Contributor

tswast commented Feb 26, 2018

@inglesp Is there a specific sample that you believe does too much?

@inglesp
Author

inglesp commented Feb 27, 2018

Is there a specific sample that you believe does too much?

All of them! (And it's not really that the examples are doing too much, just that the before and after versions are written in totally different ways.)

For instance, in the Querying data synchronously section, the before version is:

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

while the after version is:

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
TIMEOUT = 30  # in seconds
rows = list(client.query_rows(QUERY, timeout=TIMEOUT))  # API request

assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']

The after version defines QUERY and TIMEOUT; the before version assumes that query is already defined, and doesn't specify a timeout.

The before version iterates over the results, printing them one by one; the after version turns the results into a list, asserts the length of the list, and asserts something about the first result.

Given the wholesale API changes introduced in 0.28, I would expect much more thought to go into the migration docs, especially as many (most?) of your library's users are paying customers.

@tswast
Contributor

tswast commented Feb 27, 2018

Yeah, you're right that they aren't doing exactly the same thing.

Your specific sample is a good one to highlight, actually. query_rows is no longer a method in 0.29+, since the query function can be used in exactly the same way as query_rows is here. I'll leave this open so we can give that page another pass.

@ye

ye commented Mar 19, 2018

Also, in the before version the rows are plain Python tuples, which are easy to compare and to de-duplicate (by passing them to set()). The after version (once the iterator is converted to a list) returns google.cloud.bigquery._helpers.Row objects, which aren't hashable: passing them to set() raises a TypeError: unhashable type: 'Row' exception. I understand the reason for a wrapper object that supports both positional and key access to a result row, but this broke our expectations about the result object, and the change isn't documented either.

Can we 1. update the docs in all the places where the return type changed, and 2. make google.cloud.bigquery._helpers.Row hashable so that calling set() on the rows won't break?
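In the meantime, one workaround is to convert each row to a plain tuple before building the set. A minimal sketch, using a hypothetical UnhashableRow stand-in class to mimic the behavior described above (it is not the library's actual Row class, which is iterable over its values but not hashable):

```python
# Workaround sketch: de-duplicate unhashable row objects by converting
# each one to a plain tuple first. UnhashableRow is a stand-in for
# google.cloud.bigquery._helpers.Row, NOT the real class.

class UnhashableRow:
    __hash__ = None  # mimic an unhashable type

    def __init__(self, values):
        self._values = tuple(values)

    def __iter__(self):
        return iter(self._values)

rows = [
    UnhashableRow(("TX", "Ann")),
    UnhashableRow(("TX", "Ann")),  # duplicate
    UnhashableRow(("CA", "Bo")),
]

# set(rows) would raise TypeError: unhashable type: 'UnhashableRow'.
unique = {tuple(row) for row in rows}
print(sorted(unique))  # [('CA', 'Bo'), ('TX', 'Ann')]
```

Since the real Row supports iteration, the same tuple(row) conversion should apply there too.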

@tswast
Contributor

tswast commented Mar 19, 2018

Can we 1. update the docs in all the places where the return type changed, and 2. make google.cloud.bigquery._helpers.Row hashable so that calling set() on the rows won't break?

@ye Pull requests welcome. :-) There's definitely precedent for adding methods to the Row class. For example, #4393 makes it so that the Row class acts more like a dictionary. Since I'd expect rows not to be mutable, I think it would make sense for them to be hashable.
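A minimal sketch of what that could look like, using a hypothetical HashableRow class rather than the library's actual Row implementation: since rows are treated as immutable, hashing over the value tuple is safe and stays consistent with equality.

```python
# Hypothetical sketch of a hashable Row-like class; not the actual
# google.cloud.bigquery Row implementation. Hash and equality are both
# defined over the immutable tuple of values, satisfying the contract
# that equal objects must have equal hashes.

class HashableRow:
    def __init__(self, values):
        self._values = tuple(values)

    def __iter__(self):
        return iter(self._values)

    def __eq__(self, other):
        return isinstance(other, HashableRow) and self._values == other._values

    def __hash__(self):
        return hash(self._values)

rows = [HashableRow(("TX", "Ann")), HashableRow(("TX", "Ann"))]
unique = set(rows)  # no TypeError now
print(len(unique))  # 1
```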

@tseaver
Contributor

tseaver commented Aug 28, 2018

@tswast Seems like this issue has changed from a docs bug to a feature request. Is it still relevant, given that the current release of google-cloud-bigquery is at 1.5.0?

@tswast
Contributor

tswast commented Aug 28, 2018

I never updated the migration page samples, but I do think there's probably less demand to do so now. We can close this out.


7 participants