Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_incremental() macro should return false when the target relation is not "incremental" #1249

Closed
davehowell opened this issue Jan 18, 2019 · 8 comments · Fixed by #1608
Closed

Comments

@davehowell
Copy link

If the is_incremental() macro is used to write a SQL where clause, it doesn't really make sense to apply that where clause when the deployed model is a view. Or does it? I can't think of the use case.

The docs on understanding the is_incremental() macro also state that this applies to a destination table, not mentioning that it applies to a view:

The is_incremental() macro will return True if:

the destination table already exists in the database
dbt is not running in full-refresh mode

I propose a small modification to change this line:
https://github.com/fishtown-analytics/dbt/blob/3b0d14bd5d5805da10835fa522368152e048b4cd/core/dbt/include/global_project/macros/etc/is_incremental.sql#L8

To this
return(relation.type == relation.Table and not flags.FULL_REFRESH)

The relation should not only exist, it should be a table.

@drewbanin
Copy link
Contributor

Hey @davehowell - You're right that the is_incremental() macro isn't sensible to use inside of a view model, and my recommendation is therefore to not write code that uses the is_incremental() macro inside of view models!

Can you tell me a little bit more about behavior you're interested in seeing here? I'm not sure I totally understand who the intended benefactor of changing the is_incremental() logic would be.

@davehowell
Copy link
Author

Thanks for quick response Drew.

The docs suggest starting out with all models as views, and then later changing to table or incremental as required, which makes total sense.

I've started to do that, adding an if is_incremental() block to modify a model. So in this transition state, there is a view on the remote target, and I expected that until I switched the local config.materialized to incremental and the remote state switched from view to table that the block would be ignored. It surprised me that it doesn't matter what the local config is set to. It could be view or incremental or table and the only thing that matters is if the relation exists remotely (table or view) and the FULL-REFRESH flag hasn't been set.

The consequence is I have a view with an incremental block where clause. When I switch it to materialized='incremental' I will have a table with only the latest incremental data in it. Maybe that's fine because FULL-REFRESH is the only guarantee for an initial load.

If writing code that uses is_incremental() inside view models is the wrong way to use the tool then perhaps that should be disallowed? So instead of my original suggestion, testing config.get('materialized') == incremental` would be appropriate.

Maybe it's just me that's confused, and I have workarounds available. I can comment-out the incremental block until actually switching it to incremental, and I can use a different custom macro to get the behaviour I expected.

@davehowell
Copy link
Author

I've thought some more and it seems the intention is that it will ignore the `is_incremental()' block and do a full refresh either the very first time the model is deployed, or if the user explicitly asks to do a full refresh, and the existing code does do that. The behaviour I expect though is that it honours the config.materialized setting. I'm using this custom macro to achieve that.

{% macro is_incremental_extended() -%}
    {% if not execute %}
        {{ return(false) }}
    {% else %}
        {%- set materialized = config.get('materialized') -%}
        {# TODO: Next dbt release will include `database` in the args for adapter.get_relation() #}
        {%- set relation = adapter.get_relation(this.schema, this.table) -%}
        {{ return(materialized == 'incremental' and relation is not none and not flags.FULL_REFRESH) }}
    {% endif %}
{% endmacro %}

ps) I notice there's already a nice protection against accidentally dropping a table if you are going from incremental/table --> view in the bigquery__handle_existing_table macro. Thanks for including that protection.

@drewbanin
Copy link
Contributor

Thanks for the additional info @davehowell. Check out the previous discussion over here: #744

I think you're right, the name is_incremental() isn't perfectly descriptive of the macro logic. I tried a lot of different names, and I couldn't come up with one that was both 1) clear and 2) concise. We're replacing strings like

{% if adapter.already_exists(...) and not flags.FULL_REFRESH %}
...

so I was a little opposed to picking a macro name like:

{% if table_exists_and_not_full_refresh() %}
...

and instead was relatively happy with is_incremental(). I think it's important to note that the logic implemented by this macro is the same as the logic that folks would type out manually before 0.12.2. This is just some syntactic sugar.

Last thing to say about this: dbt allows users to override (or define their own) materializations. As such, I have a great opposition to hard-coding the name of a dbt materialization in a macro like this. Imagine you wrote a custom materialization that slightly modifies the incremental materialization logic, and suppose that you also called this materialization my_incremental. If you did that, the is_incremental_extended macro wouldn't work as expected.

So, all in all, I think what you're asking for is reasonable, but there are really tiny reasons why I'm not super bullish on making this change now. Happy to leave this issue open so other interested folks can agglomerate their opinions here.

Thanks!

@davehowell
Copy link
Author

Thanks for your considered response.

Overriding the built in materializations would be a recipe for confusing others used to the standard dbt behaviour, but assuming I did and named it 'incremental' then it would be my fault if it then didn't cooperate with the built in macros. If I wrote a custom materialization it would be reasonable to expect that it may not work with built in macros like is_incremental.

I see your point about avoiding the longer name for it, although it kind of already does have the name of a materialization hardcoded into the name of the macro itself so makes it hard to separate the two concepts.

I'm happy with my workaround, and as you've said the behaviour hasn't changed so other users will already be used to it. Avoiding impact to them is the best reason not to change it, so I'm happy to close this.

Reminds me of this.
There's 2 hard things in software engineering:

  1. Naming things
  2. Exactly once delivery
  3. Off by one errors

@drewbanin drewbanin changed the title is_incremental() macro should return false when the target relation is a View is_incremental() macro should return false when the target relation is not a relation May 23, 2019
@drewbanin drewbanin changed the title is_incremental() macro should return false when the target relation is not a relation is_incremental() macro should return false when the target relation is not "incremental" May 23, 2019
@drewbanin drewbanin reopened this May 23, 2019
@drewbanin
Copy link
Contributor

Had a chat with at-samk in the dbt Slack. This is probably a change worth making! Custom materialization implementers can handle this in a multitude of ways -it's ok to hard-code the incremental materialization name here

@davehowell
Copy link
Author

Oh nice, and I see #1292 implements what I first suggested in this issue.

I agree with "someone who goes through all the trouble of implementing a custom materialization could override the is_incremental() macro too".

I think implementing a custom materialization is a lot more complex than just making a new macro or overriding it.

@drewbanin
Copy link
Contributor

FYI: This is implemented in #1608

Should be going out in 0.14.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants