Schema_on_Change parameter causes model to fail #486

Open
Christy1984 opened this issue Dec 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Christy1984

Christy1984 commented Dec 18, 2024

Describe the bug

I am running a model with the following configuration (note: file_format is Iceberg):

{{ config(
    materialized='incremental',
    incremental_strategy='append',  
    on_schema_change='append_new_columns',
    partition_by=['extract_date'],
    pre_hook="""
    {{ delete_if_table_exists(this, delete_date=var('run_date', 'date_sub(current_date(), 1)')) }}
    """
) }}

When running this I get the error message:

Glue adapter: Glue returned `error` for statement None for code SqlWrapper2.execute('''describe lindy_raw.raw_pwh_actors_tmp''', use_arrow=False, location='s3://sb1-prod-dev-transaction-prod-dev-data'), AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `lindy_raw`.`raw_pwh_actors_tmp` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 9;

I assume this worked in 1.8.6 because the parameter was ignored entirely, but it is now taken into account.

When I delete on_schema_change='append_new_columns' from the configuration it works again, but I get a new error; see issue 487.

Christy1984 added the bug label Dec 18, 2024
@bhawesh490

Hi, any update on this?

@Christy1984
Author

Christy1984 commented Feb 10, 2025

1. Issue with Temporary Table Creation when using on_schema_change

When running with on_schema_change, one of the first steps is to generate a temporary table. This occurs in the following adapter macros:

{% macro glue__create_table_as(temporary, relation, sql) -%}
  {% if temporary -%}  {# This needs to be changed: works if {% if temporary == true -%}, input is too flexible #}
    {{ create_temporary_view(relation, sql) }}
  {%- else -%}
    {%- set file_format = config.get('file_format', validator=validation.any[basestring]) -%}
    {%- set table_properties = config.get('table_properties', default={}) -%}

    {%- set create_statement_string -%}
      {% if file_format in ['delta', 'iceberg'] -%}
        create or replace table
      {%- else -%}
        create table
      {% endif %}
    {%- endset %}

    {{ create_statement_string }} {{ relation }}  {# This needs to contain a schema reference #}
    {% set contract_config = config.get('contract') %}
    {% if contract_config.enforced %}
      {{ get_assert_columns_equivalent(sql) }}
      {# This does not enforce constraints and needs to be addressed #}
      {# We'll need to modify the query because with CREATE TABLE AS SELECT, columns are not explicitly specified #}
    {% endif %}
    
    {{ glue__file_format_clause() }}
    {{ partition_cols(label="partitioned by") }}
    {{ clustered_cols(label="clustered by") }}
    {{ set_table_properties(table_properties) }}
    {{ glue__location_clause() }}
    {{ comment_clause() }}
    as
    {{ sql }}
  {%- endif %}
{%- endmacro -%}

Temporary Physical Table Creation:

{% macro glue__create_temporary_view(relation, sql) -%}
  create or replace table glue_catalog.{{ relation.include(schema=true) }} as  {# Added glue_catalog and schema reference #}
    {{ sql }}
{% endmacro %}

Loading Data from the Temporary Table:

{% macro get_insert_into_sql(source_relation, target_relation) %}
    {%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}
    insert into table {{ target_relation }}
    select {{ dest_cols_csv }}
    from glue_catalog.{{ source_relation.schema }}.{{ source_relation }}  {# Added glue_catalog and schema reference #}
{% endmacro %}

Problem Explanation:

A key issue lies in the condition {% if temporary -%}: it results in the creation of only a temporary view, which Iceberg cannot use to compare columns, so identifying new columns fails.

When a physical temporary table is created instead, the insert into statement still references the temporary view, causing conflicts. This logic needs to be revisited so that handling is consistent depending on whether the relation is temporary or permanent.
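
Below is a minimal sketch of how the temporary-relation macro could branch on the file format, so Iceberg gets a physical staging table while other formats keep a view. The glue_catalog prefix and relation.include(schema=true) mirror the modified macros above; the default view syntax in the else branch is an assumption about current adapter behaviour.

{% macro glue__create_temporary_view(relation, sql) -%}
  {%- set file_format = config.get('file_format', default='parquet') -%}
  {%- if file_format == 'iceberg' -%}
    {# Iceberg cannot compare columns against a temporary view, so
       materialize a physical staging table in glue_catalog instead #}
    create or replace table glue_catalog.{{ relation.include(schema=true) }} as
      {{ sql }}
  {%- else -%}
    {# other formats keep the lightweight temporary view behaviour #}
    create or replace temporary view {{ relation }} as
      {{ sql }}
  {%- endif -%}
{%- endmacro %}

If the temporary relation becomes a physical table, get_insert_into_sql (and any cleanup/drop step) must reference the same glue_catalog-qualified name, otherwise the insert still points at a view that does not exist.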


2. Execution Fails on First Run When Adding a New Column but Works on Subsequent Runs

When adding a new column, the first run fails immediately after the column is added. This seems to occur because the ALTER TABLE ADD COLUMN command does not return metadata in the subsequent call:

Glue adapter: GlueCursor fetchall results=[]

In contrast, calls like SHOW TABLES IN glue_catalog.lindy1_raw LIKE 'xxx' return:

GlueCursor fetchall results=['namespace', 'tableName', 'isTemporary']

This indicates that after adding a column, metadata needs to be refreshed to reflect the changes. This likely requires a new fetch or refresh operation in the Python adapter, which cannot be solved through macros alone.

On subsequent runs, since both the metadata and table structure are now consistent, execution proceeds without issues.
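
As an interim, macro-level workaround one could issue a Spark SQL REFRESH TABLE right after the ALTER TABLE statement, assuming the Glue interactive session honours it; this is only a sketch (the macro name is hypothetical), and a proper fix would still need the refresh/fetch in the Python adapter as noted above.

{% macro glue__refresh_relation_metadata(relation) %}
  {# hypothetical helper: ask Spark to reload cached table metadata after
     ALTER TABLE ADD COLUMNS so the next describe/fetch sees the new column #}
  {% call statement('refresh_relation') %}
    refresh table glue_catalog.{{ relation.schema }}.{{ relation.identifier }}
  {% endcall %}
{% endmacro %}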


3. Adding New Columns Only at the End of a Table (Iceberg Constraint)

The following macro needs to be extended to address Iceberg’s requirement that new columns can only be added at the end of a table:

{% macro default__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}

To meet this constraint, use the syntax:

ALTER TABLE table_name ADD COLUMN new_column data_type AFTER existing_column;

This will ensure that columns are appended correctly per Iceberg’s requirements.
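
A sketch of what the corresponding adapter override could look like is shown below; the glue__ macro name follows the usual dispatch convention, and the assumption that each column object exposes name and data_type attributes is illustrative rather than confirmed.

{% macro glue__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}
  {% if add_columns %}
    {%- set existing_columns = adapter.get_columns_in_relation(relation) -%}
    {# track the current last column so every new column is appended at the end #}
    {%- set ns = namespace(last_column=existing_columns[-1].name) -%}
    {% for column in add_columns %}
      {% call statement('add_column_' ~ loop.index) %}
        alter table {{ relation }}
          add column {{ column.name }} {{ column.data_type }} after {{ ns.last_column }}
      {% endcall %}
      {%- set ns.last_column = column.name -%}
    {% endfor %}
  {% endif %}
  {% if remove_columns %}
    {% for column in remove_columns %}
      {% call statement('drop_column_' ~ loop.index) %}
        alter table {{ relation }} drop column {{ column.name }}
      {% endcall %}
    {% endfor %}
  {% endif %}
{% endmacro %}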


4. Optional: Support for Data Type Changes in Iceberg (If Supported)

In many cases, changing data types doesn’t result in data loss. Where Iceberg supports such changes, dbt-glue should accommodate them.

This can be achieved by maintaining a dictionary mapping which data type changes are allowed. For instance:

supported_type_changes = {
  "int": ["bigint", "float"],
  "float": ["double"],
  "string": ["text", "varchar"]
}

In a second phase, this could involve implementing add/drop logic to handle more complex data type changes based on user preferences (even if data is lost). I can provide a detailed specification for this if needed.
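
A minimal sketch of how such a mapping could gate type changes inside a macro (the helper name and the return-based check are hypothetical; the mapping simply mirrors the example above):

{% macro glue__is_safe_type_change(from_type, to_type) %}
  {# hypothetical helper: returns true only for widening changes listed in the mapping #}
  {%- set supported_type_changes = {
        "int": ["bigint", "float"],
        "float": ["double"],
        "string": ["text", "varchar"]
      } -%}
  {{ return(to_type | lower in supported_type_changes.get(from_type | lower, [])) }}
{% endmacro %}

The schema-change logic could then emit an alter column ... type statement only when this check passes, and fall back to add/drop (or raise an error) otherwise.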
