
BigQuery: Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc #4929

Closed
inglesp opened this issue Feb 25, 2018 · 8 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: process A process-related concern. May include testing, release, or the like.

Comments

@inglesp

inglesp commented Feb 25, 2018

The code samples at https://cloud.google.com/bigquery/docs/python-client-migration are meant to show equivalent code before and after the big API changes in 0.28. However, the before and after samples are often written quite differently, which makes comparing them much harder than it needs to be.

@chemelnucfin chemelnucfin self-assigned this Feb 26, 2018
@chemelnucfin chemelnucfin added documentation api: bigquery Issues related to the BigQuery API. type: process A process-related concern. May include testing, release, or the like. labels Feb 26, 2018
@chemelnucfin chemelnucfin changed the title Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc BigQuery: Unhelpful code samples in "Migrating to Python Client Library v0.28+" doc Feb 26, 2018
@theacodes
Contributor

@alixhami @tswast can you take a look?

@tswast
Contributor

tswast commented Feb 26, 2018

@inglesp Is there a specific sample that you believe does too much?

@inglesp
Author

inglesp commented Feb 27, 2018

Is there a specific sample that you believe does too much?

All of them! (And it's not really that the examples are doing too much, just that the before and after versions are written in totally different ways.)

For instance, in the Querying data synchronously section, the before version is:

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

while the after version is:

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
TIMEOUT = 30  # in seconds
rows = list(client.query_rows(QUERY, timeout=TIMEOUT))  # API request

assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']

The after version defines QUERY and TIMEOUT; the before version assumes that query is already defined, and doesn't specify a timeout.

The before version iterates over the results, printing them one by one; the after version turns the results into a list, asserts the length of the list, and asserts something about the first result.

Given the wholesale API changes introduced in 0.28, I would expect much more thought to go into the migration docs, especially as many (most?) of your library's users are paying customers.

@tswast
Contributor

tswast commented Feb 27, 2018

Yeah, you're right that they aren't doing exactly the same thing.

Your specific sample is a good one to highlight, actually. query_rows is no longer a method in 0.29+, since the query function can be used in exactly the same way as query_rows is here. I'll leave this open so we can give that page another pass.

@ye

ye commented Mar 19, 2018

Also, in the before version the rows are plain Python tuples, which are easy to compare and to de-duplicate (by passing them to set()). The after version (once the iterator is converted to a list) returns google.cloud.bigquery._helpers.Row objects, which aren't hashable: passing them to set() raises a TypeError: unhashable type: 'Row' exception. I understand the reason for a wrapper object that supports both positional and key access to a result row, but this broke our expectations about the result object, and the change isn't documented either.

Can we 1. update the docs in all the places where the return type changed, and 2. make google.cloud.bigquery._helpers.Row hashable so that calling set() on the rows won't break?
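In the meantime, one workaround is to convert each row to a plain tuple before building the set. A minimal sketch, using a hypothetical UnhashableRow stand-in class to mimic the behavior described above (it is not the library's actual Row class, which is iterable over its values but not hashable):

```python
# Workaround sketch: de-duplicate unhashable row objects by converting
# each one to a plain tuple first. UnhashableRow is a stand-in for
# google.cloud.bigquery._helpers.Row, NOT the real class.

class UnhashableRow:
    __hash__ = None  # mimic an unhashable type

    def __init__(self, values):
        self._values = tuple(values)

    def __iter__(self):
        return iter(self._values)

rows = [
    UnhashableRow(("TX", "Ann")),
    UnhashableRow(("TX", "Ann")),  # duplicate
    UnhashableRow(("CA", "Bo")),
]

# set(rows) would raise TypeError: unhashable type: 'UnhashableRow'.
unique = {tuple(row) for row in rows}
print(sorted(unique))  # [('CA', 'Bo'), ('TX', 'Ann')]
```

Since the real Row supports iteration, the same tuple(row) conversion should apply there too.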

@tswast
Contributor

tswast commented Mar 19, 2018

Can we 1. update the docs in all the places where the return type changed, and 2. make google.cloud.bigquery._helpers.Row hashable so that calling set() on the rows won't break?

@ye Pull requests welcome. :-) There's definitely precedent for adding methods to the Row class. For example, #4393 makes it so that the Row class acts more like a dictionary. Since I'd expect rows not to be mutable, I think it would make sense for them to be hashable.
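A minimal sketch of what that could look like, using a hypothetical HashableRow class rather than the library's actual Row implementation: since rows are treated as immutable, hashing over the value tuple is safe and stays consistent with equality.

```python
# Hypothetical sketch of a hashable Row-like class; not the actual
# google.cloud.bigquery Row implementation. Hash and equality are both
# defined over the immutable tuple of values, satisfying the contract
# that equal objects must have equal hashes.

class HashableRow:
    def __init__(self, values):
        self._values = tuple(values)

    def __iter__(self):
        return iter(self._values)

    def __eq__(self, other):
        return isinstance(other, HashableRow) and self._values == other._values

    def __hash__(self):
        return hash(self._values)

rows = [HashableRow(("TX", "Ann")), HashableRow(("TX", "Ann"))]
unique = set(rows)  # no TypeError now
print(len(unique))  # 1
```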

@tseaver
Contributor

tseaver commented Aug 28, 2018

@tswast Seems like this issue has changed from a docs bug to a feature request. Is it still relevant, given that the current release of google-cloud-bigquery is at 1.5.0?

@tswast
Contributor

tswast commented Aug 28, 2018

I never updated the migration page samples, but I do think there's probably less demand to do so now. We can close this out.


7 participants