Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove subquery from incremental models when no sql_where is specified #1351

Closed
ianterrell opened this issue Mar 14, 2019 · 3 comments · Fixed by #1408
Closed

Remove subquery from incremental models when no sql_where is specified #1351

ianterrell opened this issue Mar 14, 2019 · 3 comments · Fixed by #1408

Comments

@ianterrell
Copy link

ianterrell commented Mar 14, 2019

Issue

Issue description

Redshift Spectrum does not support (some?) nested queries. This includes a subquery used in the default incremental materialization.

The generated simple query for my incremental model looks like this:

  select
      "..." as surrogate_key
    , "etl-date" as etl_date
    , ...
 from spectrum.table
where "etl-date" < to_char(current_date, 'YYYY-MM-DD')
  and "etl_date" > coalesce((select max("etl_date") from "dev"."raw_table"), '0000-00-00')

With configuration in dbt_project.yml like so:

    raw:
      materialized: incremental
      dist: surrogate_key
      sort: ['etl_date']

Results

On runs with --full-refresh specified, the model and resulting incremental table is created and selected into as expected. On subsequent runs without --full-refresh Redshift complains that the nested query is not allowed. From logs/dbt.log:

Database Error in model raw_activity (models/raw/raw_activity.sql)
  Spectrum nested query error
  DETAIL:
    -----------------------------------------------
    error:  Spectrum nested query error
    code:      8001
    context:   Expression is not supported by Spectrum.
    query:     0
    location:  nested_query_rewriter.cpp:683
    process:   padbmaster [pid=7955]

Workaround

The nested query in question is on L61 of the incremental materialization.

       {% set tmp_table_sql -%}
         {# We are using a subselect instead of a CTE here to allow PostgreSQL to use indexes. -#}
         select * from (
           {{ sql }}
         ) as dbt_incr_sbq

         {% if sql_where %}
         where ({{ sql_where }})
           or ({{ sql_where }}) is null
         {% endif %}
       {%- endset %}

The subquery as dbt_incr_sbq appears unused in the case where sql_where is not present. If sql_where is not used, this issue can be worked around by creating a custom materialization that replaces the above with simply:

       {% set tmp_table_sql -%}
         {{ sql }}
       {%- endset %}

System information

(venv) $ dbt --version
installed version: 0.12.2
   latest version: 0.12.2

Up to date!
(venv) $ python --version
Python 2.7.10
(venv) $ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.14.3
BuildVersion:	18D42
@drewbanin
Copy link
Contributor

Thanks for the report @ianterrell! I think your workaround sounds like a reasonable one, though the best solution is probably to omit the sql_where config in favor of using the new is_incremental() pattern.

Can you try omitting the sql_where flag in favor of the is_incremental() pattern, then try running the model incrementally? That should remove the invalid spectrum subquery pattern

@ianterrell
Copy link
Author

ianterrell commented Mar 27, 2019

Hi @drewbanin!

My queries were already written with is_incremental() and had omitted the sql_where flag.

That should remove the invalid spectrum subquery pattern

It appears to me that the dbt_incr_sbq subquery is generated regardless, and it's only the subsequent where clause that is dependent on the sql_where flag.

@drewbanin
Copy link
Contributor

Got it - thanks @ianterrell. Let's remove the subquery if no sql_where is specified. Thanks for the additional info!

@drewbanin drewbanin changed the title Incremental materialization does not work with Redshift Spectrum source table Remove subquery from incremental models when no sql_where is specified Mar 27, 2019
@drewbanin drewbanin added this to the Wilt Chamberlain milestone Mar 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants