Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid policy tags 403 error in load_table_from_dataframe #557

Merged
merged 4 commits into from
Mar 19, 2021

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Mar 17, 2021

In internal issue 182204971, as customer is encountering a 403 error for missing permissions to set policy tags on a table when trying to append a dataframe to a table with load_table_from_dataframe. This is because we get the schema from the table and then pass it back to the API. We only need to set the field names (and maybe type + mode) in this case.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal bug 182204971 🦕

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Mar 17, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 17, 2021
@tswast tswast changed the title WIP: fix: don't set policy tags in load job from dataframe fix: avoid policy tags 403 error in load_table_from_dataframe Mar 17, 2021
@tswast tswast marked this pull request as ready for review March 18, 2021 14:34
@tswast tswast requested review from a team, stephaniewang526, shollyman and plamut and removed request for a team March 18, 2021 14:34
}
if mode is not None:
self._properties["mode"] = mode.upper()
if description is not _DEFAULT_VALUE:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shollyman This is one of the key changes: we no longer set the resource value for "description" if it's not explicitly set.

We already omit policy_tags from the resource if none (though arguably it should get the same treatment so that someone can unset policy tags from Python)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My default inclination would be for special handling for None values to happen at the places where it's significant, like when calling tables.update. It's also the case that schema fields can't be manipulated individually, so perhaps I'm simply just not thinking this through properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called that out as a possibility in #558, but that'd require updating our field mask logic to support sub-fields, which gets into some hairy string parsing (perhaps not all that hairy, as it could be as simple as split on '.', but is definitely a departure from what we've been doing).

Also, it might mean that we'd have to introduce a field mask to our load job methods. Based on the error message we're seeing, it sounds like it's possible to make updates to fields like policy tags from a load job.

field for field in table.schema if field.name in columns_and_indexes
# Field description and policy tags are not needed to
# serialize a data frame.
SchemaField(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual bug fix. Rather than populate all properties of schema field from the table schema, just populate the minimum we need to convert to parquet/CSV and then upload

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to revisit this for parameterization constraints, but that's a problem for future Tim.

Also, check that sent schema matches DataFrame order, not table order
}
if mode is not None:
self._properties["mode"] = mode.upper()
if description is not _DEFAULT_VALUE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My default inclination would be for special handling for None values to happen at the places where it's significant, like when calling tables.update. It's also the case that schema fields can't be manipulated individually, so perhaps I'm simply just not thinking this through properly.

field for field in table.schema if field.name in columns_and_indexes
# Field description and policy tags are not needed to
# serialize a data frame.
SchemaField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to revisit this for parameterization constraints, but that's a problem for future Tim.

@tswast
Copy link
Contributor Author

tswast commented Apr 26, 2021

For posterity, here is the error you get if you include policyTags in the schema, even if they aren't actually changed:

[1] Reason: 403 POST
https://bigquery.googleapis.com/upload/bigquery/v2/projects/YOUR-PROJECT-ID/jobs?uploadType=resumable:
Access Denied: Taxonomy projects/REDACTED/locations/eu/taxonomies/REDACTED: 
User does not have permission to get taxonomy projects/REDACTED/locations/eu/taxonomies/REDACTED\n'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants