[Feature Request]: Support table schema field addition with Storage Write API autoSchemaUpdate #27478
What would you like to happen?
Auto-schema updates allow Storage API writes to adapt to BigQuery schema changes (e.g., a required field being relaxed to nullable). However, they currently don't support schema field additions (i.e., a new column being added). This is best explained with an example:
Say we are writing to a table with schema 1: {field1, field2},
and while we are writing, the table is updated to schema 2: {field1, field2, field3}.
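To make the mismatch concrete, here is a plain-Java sketch with no Beam dependency. It assumes a simplified model in which a schema is just a set of field names and a TableRow is a Map; the class SchemaDiff and its helper unknownFields are hypothetical names used only for illustration. A row written against schema 2 carries one field that schema 1 does not know about.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a schema is a set of field names, a TableRow is a Map.
class SchemaDiff {
  // Returns the row's field names that the given schema does not contain.
  static Set<String> unknownFields(Map<String, ?> row, Set<String> schema) {
    Set<String> unknown = new LinkedHashSet<>(row.keySet());
    unknown.removeAll(schema);
    return unknown;
  }

  public static void main(String[] args) {
    Set<String> schema1 = new LinkedHashSet<>(List.of("field1", "field2"));
    Set<String> schema2 = new LinkedHashSet<>(List.of("field1", "field2", "field3"));

    Map<String, Object> row = new LinkedHashMap<>();
    row.put("field1", 1);
    row.put("field2", "a");
    row.put("field3", true); // column that only exists in schema 2

    System.out.println(unknownFields(row, schema1)); // prints [field3]
    System.out.println(unknownFields(row, schema2)); // prints []
  }
}
```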
If we start writing TableRows that match schema 2, they fail and are sent to the dead-letter queue (DLQ) PCollection. The failure happens not at the data insertion step but earlier, when the TableRows are converted to protos by StorageApiDynamicDestinationsTableRow. This dynamic destinations object contains a MessageConverter that is instantiated once per destination. The destination table's schema is fetched in the MessageConverter's constructor and is never updated afterwards: beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsTableRow.java
Lines 93 to 123 in 843e7fd
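The behavior described above can be modeled with a small, self-contained Java sketch. This is not Beam code: CachedSchemaConverter and StaleSchemaDemo are hypothetical classes, the Supplier stands in for the table-schema lookup, and the IllegalArgumentException stands in for the TableRow-to-proto conversion error that routes a row to the DLQ. The key point it illustrates is that the schema is read once in the constructor, so a later column addition is invisible to the converter.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-destination converter that reads the
// table schema once, in its constructor, and never refreshes it.
class CachedSchemaConverter {
  private final Set<String> schema; // fetched once, never updated

  CachedSchemaConverter(Supplier<Set<String>> schemaFetcher) {
    this.schema = Set.copyOf(schemaFetcher.get());
  }

  // Stands in for the TableRow -> proto step: rejects fields missing from the
  // cached schema; otherwise returns the number of fields "converted".
  int convert(Map<String, ?> row) {
    for (String field : row.keySet()) {
      if (!schema.contains(field)) {
        throw new IllegalArgumentException("Unknown field: " + field);
      }
    }
    return row.size();
  }
}

class StaleSchemaDemo {
  public static void main(String[] args) {
    // Simulated table schema, currently schema 1.
    AtomicReference<Set<String>> table = new AtomicReference<>(Set.of("field1", "field2"));
    CachedSchemaConverter converter = new CachedSchemaConverter(table::get);

    // The table gains a column while the pipeline is running (schema 2).
    table.set(Set.of("field1", "field2", "field3"));

    try {
      converter.convert(Map.of("field1", 1, "field2", "a", "field3", true));
    } catch (IllegalArgumentException e) {
      // In the real pipeline this is roughly where the row would land in the DLQ.
      System.out.println("Rejected: " + e.getMessage()); // prints Rejected: Unknown field: field3
    }
  }
}
```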
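There should be a way to update the converter's cached schema when appropriate, for example when an incoming TableRow contains fields that the cached schema does not know about.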
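One possible approach, sketched here under the same simplified model (schema as a set of field names, row as a Map), is to re-fetch the schema when a row contains an unknown field and re-check before rejecting it. RefreshingSchemaConverter and RefreshDemo are hypothetical names, and this is not necessarily how Beam addressed the issue; a real implementation would presumably also need to keep the write stream's proto descriptor in sync with the refreshed schema, which this sketch does not model.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: re-fetch the destination schema when a row contains a
// field the cached schema does not know about, then re-check once.
class RefreshingSchemaConverter {
  private final Supplier<Set<String>> schemaFetcher;
  private Set<String> schema;
  private int refreshCount = 0;

  RefreshingSchemaConverter(Supplier<Set<String>> schemaFetcher) {
    this.schemaFetcher = schemaFetcher;
    this.schema = Set.copyOf(schemaFetcher.get());
  }

  // Returns the number of fields "converted"; synchronized so concurrent
  // callers don't race on the cached schema.
  synchronized int convert(Map<String, ?> row) {
    if (!schema.containsAll(row.keySet())) {
      // The table may have gained a column: refresh the cached schema once.
      schema = Set.copyOf(schemaFetcher.get());
      refreshCount++;
      if (!schema.containsAll(row.keySet())) {
        throw new IllegalArgumentException("Unknown field(s) after refresh: " + row.keySet());
      }
    }
    return row.size();
  }

  synchronized int refreshCount() {
    return refreshCount;
  }
}

class RefreshDemo {
  public static void main(String[] args) {
    AtomicReference<Set<String>> table = new AtomicReference<>(Set.of("field1", "field2"));
    RefreshingSchemaConverter converter = new RefreshingSchemaConverter(table::get);

    table.set(Set.of("field1", "field2", "field3")); // schema 2
    int n = converter.convert(Map.of("field1", 1, "field2", "a", "field3", true));
    System.out.println(n + " fields converted, " + converter.refreshCount() + " refresh(es)");
    // prints 3 fields converted, 1 refresh(es)
  }
}
```

As written, every row with a genuinely invalid field triggers a schema fetch, so a production version would likely want to rate-limit or debounce refreshes to avoid hammering the BigQuery metadata API.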
Issue Priority
Priority: 3 (nice-to-have improvement)