feat: add reference_file_schema_uri to LoadJobConfig, ExternalConfig #1399

Merged
aribray merged 29 commits into googleapis:main from aribray--federated-formats on Nov 14, 2022

Conversation


@aribray aribray commented Nov 4, 2022

Current behavior:

  • For load jobs from federated formats such as AVRO, PARQUET, and ORC, BigQuery uses the schema of whichever source file is lexicographically last.

Example:

source_uris = [
    "gs://{project}/{bucket_name}/c-file.avro",
    "gs://{project}/{bucket_name}/b-file.avro",
    "gs://{project}/{bucket_name}/r-file.avro",
]

"gs://{project}/{bucket_name}/r-file.avro" is lexicographically last, so BigQuery uses its schema for the load.
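
A quick way to see which file that rule picks, as a minimal sketch: it assumes Python's plain string comparison is a fair stand-in for the lexicographic ordering BigQuery applies, and it leaves the `{project}` and `{bucket_name}` placeholders unformatted.

```python
# URIs from the example above; the placeholders are intentionally left as-is.
source_uris = [
    "gs://{project}/{bucket_name}/c-file.avro",
    "gs://{project}/{bucket_name}/b-file.avro",
    "gs://{project}/{bucket_name}/r-file.avro",
]

# Python compares strings code point by code point, so max() returns the
# lexicographically last URI in the list.
schema_source = max(source_uris)
print(schema_source)
```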

New behavior:

  • The reference_file_schema_uri field lets users specify the file whose schema BigQuery should use.
  • The reference_file_schema_uri does not have to be one of the files in the source_uris list.
  • To prevent data loss, the schema of the file at reference_file_schema_uri should be a superset of the schemas of the files in the source_uris list.
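
The points above could look roughly like this with the Python client. This is a hedged sketch, not code from the PR: `is_superset` and `build_load_config` are hypothetical helpers written for illustration, the field names are invented, and it assumes a google-cloud-bigquery release that includes this change. The superset check compares field names only, which simplifies whatever schema compatibility BigQuery actually enforces.

```python
def is_superset(reference_fields, source_schemas):
    """Return True if the reference schema has every field of every source schema."""
    return all(set(fields) <= set(reference_fields) for fields in source_schemas)


def build_load_config(reference_uri):
    """Build an AVRO load config that takes its schema from reference_uri."""
    # Imported lazily so the pure-Python check above runs without the client
    # library installed.
    from google.cloud import bigquery

    config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)
    # The property added by this PR.
    config.reference_file_schema_uri = reference_uri
    return config


# Invented field names, for illustration only.
source_schemas = [
    ["id", "name"],
    ["id", "name", "email"],
]
reference_fields = ["id", "name", "email", "created_at"]

print(is_superset(reference_fields, source_schemas))  # True
print(is_superset(["id"], source_schemas))  # False
```

With a `bigquery.Client` in hand, the config would be passed as `client.load_table_from_uri(source_uris, table_id, job_config=build_load_config(reference_uri))`, where `table_id` and `reference_uri` are assumed to be supplied by the caller.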

Googlers see 246809557

@aribray aribray marked this pull request as ready for review November 4, 2022 15:56
@aribray aribray requested a review from a team November 4, 2022 15:56
@aribray aribray requested a review from a team as a code owner November 4, 2022 15:56

@leahecole leahecole left a comment


Yeah with the nits, I'm honestly torn. Use your best judgment - it's nbd if it's not changed.

@aribray aribray merged commit 931285f into googleapis:main Nov 14, 2022
@aribray aribray deleted the aribray--federated-formats branch November 14, 2022 22:26
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this pull request Apr 17, 2023
googleapis#1399)

* feat: add 'reference_file_schema_uri' to LoadJobConfig and ExternalConfig