Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added dbt incremental macro to default_config.cfg #363

Merged
merged 2 commits into from
Nov 26, 2020
Merged

Added dbt incremental macro to default_config.cfg #363

merged 2 commits into from
Nov 26, 2020

Conversation

azhard
Copy link
Contributor

@azhard azhard commented May 20, 2020

Currently when using the is_incremental dbt macro inside a SQL file, sqlfluff gets confused such as in the following file:

SELECT
  *
FROM {{ source('dbt', 'stage') }}
{% if is_incremental() %}
WHERE ts > (
  SELECT
    MAX(ts)
  FROM {{ this }}
)
{% endif %}

This change sets the is_incremental macro to always evaluate to true so that the full SQL in the file can be evaluated.

@alanmcruickshank
Copy link
Member

@azhard - I like the thinking here. There's already an implementation of is_incremental however in a different place (which I accept is a bit crufty already), which defaults to False (check out templaters.py).

That other implementation is there to allow these macros to be used in macro folders. I think we might need a refactor of some of this code more broadly.

I'm curious for your views on what we should do with the is_incremental macro, because whether we choose True or False, we always miss some logic. I think we should have the option to automatically both the True path and the False path. Do you think that might be useful? If so, any ideas on how we might do it?

@azhard
Copy link
Contributor Author

azhard commented May 21, 2020

Hm that's actually a really good point about wanting to test the logic for both True and False. Maybe you want to have a list of boolean "conditionals" and if one of those conditionals exist in your file, you run the file once for each condition.

Then on pass / fail your output would be something like

file.sql__condition_true (100%)
file.sql__condition_false (0%)

Of course I'm sure you'd inevitably end up with a scenario where there are multiple conditions used per file which gets more complicated but can potentially still be handled by looping through all permutations. Output for that case might be more like:

file.sql__condition1_true_condition2_true (100%)
file.sql__condition1_true_condition2_false (100%)
file.sql__condition1_false_condition2_true (100%)
file.sql__condition1_false_condition2_false (100%)

Also not familiar enough with the parser to know if something like that is even feasible but those are some initial thoughts

@alanmcruickshank
Copy link
Member

I totally agree with your intent of having multiple runs of the files. On whether it's feasible that's an excellent question!

I'm thinking more and more that trying to "mock" the dbt compile step is more and more of a headache to maintain. Maybe we should just use the dbt compile step from dbt itself (and then use all the dependencies and modules that people have installed)? In this case we would do two compiles, one for incremental and one for not.

Thoughts?

@alanmcruickshank alanmcruickshank added the question Further information is requested label Aug 3, 2020
@azhard
Copy link
Contributor Author

azhard commented Aug 24, 2020

I like that idea a lot! My only concern would be with macros where dbt adds SQL which might not conform to the sqlfluff linting rules eg.
Macro

{% macro cast_columns(columns) %}
{% for col in columns %}
CAST({{ col }} AS STRING) {{ col }},
{% endfor %}
{% endmacro %}

SQL

SELECT
    col1,
    col2,
    {{ cast_columns([col3, col4, col5]) }}
    col 6
...

If you use dbt and compile this you might end up with linting errors around extra line breaks or incorrect tabbing

@alanmcruickshank
Copy link
Member

@azhard - I think you're totally right. In my head we have a few options:

  1. Consider "it's still SQL", and lint it anyway. You're right that this might make some people mad because there will be errors caused that they're unable to fix. For some macros it might be impossible to have nicely formatted macros and nicely formatted output SQL.
  2. Ignore any templated block. This is the most lenient, and potentially lets through some bugs. Worst case if there's a very large looped block, it would be totally skipped and that might not be a desired outcome.
  3. Somehow match up sql within the macro and the sql after the macro and selectively lint it based on some more complicated rules. This is probably the best, but definitely the hardest to implement.

Which do you think would be the most sensible to start with?

@azhard
Copy link
Contributor Author

azhard commented Sep 1, 2020

I'm inclined to pick a hybrid of 2 to start although 3 would definitely be a great long-term solution. My thinking against 1 is like you mentioned in some cases it might be impossible to have the formats match as well as the fact that if 3 is the ideal long term goal, people might spend unnecessary time cleaning up macros which becomes redundant after 3 is release.

Allowing bugs to go through with 2 is a bit iffy as people are probably reliant on sqlfluff for some level of validation so instead of fully ignoring errors from templated blocks, what if you instead introduced warnings. You can add a caveat that sqlfluff does not currently fully support jinja templating but will still warn you (but not block you) about potential bugs although there might be some false positives

@pwildenhain pwildenhain added the templating Fix or enhance templating capabilities label Oct 4, 2020
@dmateusp
Copy link
Contributor

In #508 we are implementing a templater that re-uses the dbt compiler

In #541 there is mapping between the source "pre-templating" and "post-templating" code, which should allow work on downgrading or silencing errors coming from templated code (#563)

From #554 @alanmcruickshank you say there's still potentially value in supporting the JinjaTemplater (and Built-in macros) so this PR should still be merged!

On the following:

In this case we would do two compiles, one for incremental and one for not.

I'm wondering if we wouldn't get into a lot of complexity. For example if there are 2 different if blocks in the code that use 2 different conditionals. Are we now testing TRUE FALSE, TRUE TRUE, FALSE TRUE, FALSE FALSE? What about if the conditional block is made of multiple if and elif statements?

@alanmcruickshank alanmcruickshank merged commit a9d2d5d into sqlfluff:master Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested templating Fix or enhance templating capabilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants