BigQuery Python Client v0.28.0: num_rows, schema property of a Table returning None and empty list #4373

Closed
pramodvspk opened this issue Nov 9, 2017 · 6 comments
Labels: api: bigquery, priority: p1, type: bug

Comments

@pramodvspk

Hi,

I am using the BigQuery Python client v0.28.0.

When I use the client's list_dataset_tables method to list all tables in a dataset, I get an iterator of the tables in the dataset. When I access the num_rows and schema properties of each Table object, they return None and [] respectively.

But when I fetch the table with client.get_table(), both the num_rows and schema properties are populated.

I am confused because both methods return a google.cloud.bigquery.table.Table object, yet they behave inconsistently.

for table in bqclient.list_dataset_tables(dataset_ref):
	print(table.num_rows)
	print(table.schema)

Returns None and []

table = bqclient.get_table(table)
print(table.num_rows)
print(table.schema)

Returns the number of rows and a list of SchemaField objects

Thanks

tseaver added the api: bigquery and type: question labels Nov 9, 2017
@tswast
Contributor

tswast commented Nov 13, 2017

Thanks for the report.

It appears as though the BigQuery tables.list method only returns a subset of the fields of a table.
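To make the symptom concrete, here is a minimal sketch, assuming a simplified Table wrapper that reads its properties lazily from the REST resource dict. The `Table` class below is a hypothetical stand-in, not the library's code. The keys `numRows`, `schema`, and `tableReference` follow the BigQuery REST Table resource, but exactly which fields tables.list omits is an assumption here. A missing key surfaces as None or [] rather than as an error, which matches the behavior in the report.

```python
from typing import Any, Dict, List, Optional


class Table:
    """Hypothetical stand-in: a thin wrapper that reads properties from a REST resource dict."""

    def __init__(self, resource: Dict[str, Any]) -> None:
        self._properties = resource

    @property
    def num_rows(self) -> Optional[int]:
        # numRows is an int64 encoded as a JSON string; a missing key becomes None.
        value = self._properties.get("numRows")
        return int(value) if value is not None else None

    @property
    def schema(self) -> List[Dict[str, Any]]:
        # A missing schema becomes an empty list rather than an error.
        return self._properties.get("schema", {}).get("fields", [])


# Illustrative list-style resource: identifiers and type only (assumed subset).
listed = {
    "tableReference": {"projectId": "p", "datasetId": "d", "tableId": "t"},
    "type": "TABLE",
}
# Illustrative get-style resource: the same fields plus the expensive ones.
fetched = dict(listed, numRows="42", schema={"fields": [{"name": "x", "type": "INTEGER"}]})

print(Table(listed).num_rows, Table(listed).schema)          # None []
print(Table(fetched).num_rows, len(Table(fetched).schema))   # 42 1
```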

I see a few potentially sane fixes:

  1. (most desirable) See if we can get the backend team to expose a full Table resource in list operations (maybe with an optional parameter).
  2. (breaking, but maybe our best option) Make list_dataset_tables return a list of table references and discard the few extra table fields that the list call returns.
  3. (inefficient) Make a call to get the table resource for each item in the returned list.
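A toy comparison of options (2) and (3), assuming a single-page list response and an in-memory backend. `FakeClient`, `TableRef`, and `FullTable` are hypothetical names for illustration only, not the google-cloud-bigquery API. Option (2) costs one list request; option (3) costs that list request plus one get request per table.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class TableRef:
    dataset_id: str
    table_id: str


@dataclass
class FullTable:
    ref: TableRef
    num_rows: int
    schema: List[str]


@dataclass
class FakeClient:
    """Hypothetical in-memory backend that counts API requests."""

    tables: Dict[TableRef, FullTable]
    api_calls: int = 0

    def list_table_refs(self) -> Iterator[TableRef]:
        # Option (2): a single list request yields references only.
        self.api_calls += 1
        yield from self.tables.keys()

    def get_table(self, ref: TableRef) -> FullTable:
        # One get request per table returns the complete resource.
        self.api_calls += 1
        return self.tables[ref]


def list_full_tables(client: FakeClient) -> List[FullTable]:
    # Option (3): list, then fetch every table individually (N + 1 requests).
    return [client.get_table(ref) for ref in client.list_table_refs()]


refs = [TableRef("d", f"t{i}") for i in range(3)]
client = FakeClient(tables={r: FullTable(r, 10 * i, ["x"]) for i, r in enumerate(refs)})
full = list_full_tables(client)
print(len(full), client.api_calls)  # 3 4
```

With many tables in a dataset, the N + 1 requests of option (3) are what make it the inefficient choice.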

I'll send a note to my contacts on the BigQuery backend team about doing (1), but the most likely fix will be (2).

tswast added the type: bug and priority: p1 labels and removed the type: question label Nov 13, 2017
@tswast
Contributor

tswast commented Nov 13, 2017

I guess there's another option: (4) document that only a subset of fields is exposed on the Table objects returned by the list request.

I dislike that option the most, as it leaves a lot of room for confusion.

@tswast
Contributor

tswast commented Nov 13, 2017

I got confirmation from the backend team that (1) is infeasible. Properties like the schema and total rows of the table take much longer to fetch in the backend. The list API call includes only properties which are fast for the backend to fetch.

I'd like to propose another option: (5) introduce a new type containing only this subset of properties, perhaps called PartialTable or TableListItem, that exposes just the properties present in the list API response. That way the documentation makes it much clearer which properties are included.
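A rough sketch of what option (5) could look like, assuming the list response carries `tableReference` and `type`. The `TableListItem` and `TableReference` classes here are hypothetical illustrations, not the classes from the eventual fix. The design point is that the list type simply has no `num_rows` or `schema` attributes, so accessing them raises AttributeError instead of silently returning None or [].

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TableReference:
    project: str
    dataset_id: str
    table_id: str


class TableListItem:
    """Hypothetical read-only view over one tables.list entry.

    It deliberately defines no num_rows or schema, so code that needs them
    fails loudly instead of silently seeing None or [].
    """

    def __init__(self, resource: Dict[str, Any]) -> None:
        self._properties = resource

    @property
    def reference(self) -> TableReference:
        ref = self._properties["tableReference"]
        return TableReference(ref["projectId"], ref["datasetId"], ref["tableId"])

    @property
    def table_type(self) -> Optional[str]:
        return self._properties.get("type")


item = TableListItem({
    "tableReference": {"projectId": "p", "datasetId": "d", "tableId": "t"},
    "type": "TABLE",
})
print(item.reference.table_id, item.table_type)  # t TABLE
print(hasattr(item, "num_rows"))                 # False
```

A caller that needs the row count would then pass `item.reference` to a get call explicitly, which makes the extra request visible in the calling code.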

@tseaver
Contributor

tseaver commented Nov 15, 2017

I had (naively, I guess) assumed that this issue was the exact reason TableReference even exists.

@tswast
Contributor

tswast commented Nov 15, 2017

No, unfortunately I hadn't anticipated that listing would return some but not all properties. TableReference was added for a very similar problem: properties such as the destination table of a query job include only the table ID.
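For contrast, a reference type carries identifiers and nothing else. This sketch assumes the REST job resource's `configuration.query.destinationTable` field with `projectId`/`datasetId`/`tableId` keys, and shows a hypothetical `TableReference` built from a query job's destination table. The job dict is trimmed and illustrative; fetching metadata such as the row count would still require a separate get call.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TableReference:
    """Hypothetical sketch: identifies a table by IDs only and carries no metadata."""

    project: str
    dataset_id: str
    table_id: str

    @classmethod
    def from_api_repr(cls, resource: Dict[str, str]) -> "TableReference":
        # Build the reference from a REST-style {projectId, datasetId, tableId} dict.
        return cls(resource["projectId"], resource["datasetId"], resource["tableId"])

    @property
    def path(self) -> str:
        # Dotted project.dataset.table form.
        return f"{self.project}.{self.dataset_id}.{self.table_id}"


# Trimmed, illustrative query-job resource: destinationTable holds only IDs.
job = {"configuration": {"query": {"destinationTable": {
    "projectId": "p", "datasetId": "d", "tableId": "t"}}}}
dest = TableReference.from_api_repr(job["configuration"]["query"]["destinationTable"])
print(dest.path)  # p.d.t
```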

I'm thinking we really do need to introduce a third type.

@tswast
Contributor

tswast commented Nov 20, 2017

Just sent #4427 to add TableListItem. This issue is also present on datasets and will need a similar change for them.

3 participants