state:modified picks up unchanged models if they have round brackets in their configs #2941
Comments
Nice find @XiaozhouWang85! Thanks for providing a thorough explication of the bug and a reproduction case. We made the data type of When dbt ultimately renders

```
ipdb> unrendered
{'unique_key': 'my_key', 'materialized': 'table', 'cluster_by': ('my_cluster_id_1', 'my_cluster_id_2'), 'partition_by': {'field': 'partition_id', 'data_type': 'timestamp'}}
ipdb> other
{'unique_key': 'my_key', 'materialized': 'table', 'cluster_by': ['my_cluster_id_1', 'my_cluster_id_2'], 'partition_by': {'field': 'partition_id', 'data_type': 'timestamp'}}
ipdb> key
'cluster_by'
ipdb> unrendered[key] == other[key]
False
```

I think our options are one of:
I'm leaning toward the former, for simplicity's sake. @kwigley I'd be curious to hear which of these you prefer.

One other thought: It's obvious that, if you went in and changed the config so that

```yaml
models:
  +cluster_by: 'my_cluster_id_1'
```

becomes

```yaml
models:
  +cluster_by:
    - my_cluster_id_1
```

It's not clear to me whether switching the representation of the same, single cluster key from string to list should mark the model as modified. If we pursue path #1, it would be a modification; if we pursue path #2, it wouldn't be.
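The `ipdb` session above comes down to plain Python equality: a tuple never compares equal to a list, even when the elements are identical. Below is a minimal sketch of one way to make the comparison insensitive to that, assuming tuples are recursively coerced to lists before comparing. The helpers `normalize` and `configs_equal` and the `single_as_list` flag are hypothetical and not part of dbt. The flag illustrates the looser behaviour discussed above, where a lone string and a one-element list count as the same value.

```python
from typing import Any


def normalize(value: Any) -> Any:
    """Recursively turn tuples into lists so ('a', 'b') and ['a', 'b'] compare equal.

    Hypothetical helper for illustration only; not dbt's implementation.
    """
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    return value


def configs_equal(unrendered: dict, other: dict, key: str, single_as_list: bool = False) -> bool:
    """Compare one config key after normalization.

    With single_as_list=True, a bare string is wrapped in a one-element list,
    so 'my_cluster_id_1' and ['my_cluster_id_1'] are treated as unchanged.
    """
    a, b = normalize(unrendered.get(key)), normalize(other.get(key))
    if single_as_list:
        a = [a] if isinstance(a, str) else a
        b = [b] if isinstance(b, str) else b
    return a == b


unrendered = {"cluster_by": ("my_cluster_id_1", "my_cluster_id_2")}
other = {"cluster_by": ["my_cluster_id_1", "my_cluster_id_2"]}

print(unrendered["cluster_by"] == other["cluster_by"])  # False: tuple vs list
print(configs_equal(unrendered, other, "cluster_by"))   # True after normalization

single = {"cluster_by": "my_cluster_id_1"}
as_list = {"cluster_by": ["my_cluster_id_1"]}
print(configs_equal(single, as_list, "cluster_by"))                       # False
print(configs_equal(single, as_list, "cluster_by", single_as_list=True))  # True
```

Whether the string-to-list case should be normalized is exactly the open design question in the comment. The sketch only shows that both behaviours are a one-line difference once values are normalized.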
I just hit that issue. We have a custom materialization that takes a list of tuples as an argument inside the config, like:

```
deduplication_date_cols=[
    ("event_date", "desc"),
    ("extraction_date", "desc"),
    ("batch_date", "asc")
],
```

This leads to the model always being picked up as a change. I changed it to

```
deduplication_date_cols={
    "event_date": "desc",
    "extraction_date": "desc",
    "batch_date": "asc"
},
```

and adapted my loops:

```diff
- {% for col, order in deduplication_date_cols %}
+ {% for col, order in deduplication_date_cols.items() %}
```

@jtcohen6 what do you think about adding it as a caveat in https://docs.getdbt.com/reference/node-selection/state-comparison-caveats#false-positives ?
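One plausible mechanism for the workaround above, assuming the previous state's config is read back from `manifest.json`: JSON has no tuple type, so Python's `json` module writes tuples as arrays and reads them back as lists, while a dict of strings survives the round trip unchanged. A minimal standard-library sketch; the `round_trip` helper is hypothetical and only stands in for writing and re-reading a manifest.

```python
import json


def round_trip(value):
    """Serialize to JSON and parse it back, roughly as a value stored in a JSON manifest would be."""
    return json.loads(json.dumps(value))


as_tuples = [("event_date", "desc"), ("extraction_date", "desc"), ("batch_date", "asc")]
as_dict = {"event_date": "desc", "extraction_date": "desc", "batch_date": "asc"}

# Tuples come back as lists, so comparing against the original fails even though nothing changed.
print(round_trip(as_tuples))               # [['event_date', 'desc'], ['extraction_date', 'desc'], ['batch_date', 'asc']]
print(round_trip(as_tuples) == as_tuples)  # False

# A dict of strings comes back equal to the original.
print(round_trip(as_dict) == as_dict)      # True

# The Jinja loop `{% for col, order in deduplication_date_cols.items() %}` is the equivalent of:
for col, order in as_dict.items():
    print(col, order)
```

Iterating a dict directly yields only its keys, which is why the loop needed `.items()` after switching from a list of tuples.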
@dmateusp There is a note at the bottom of that docs page that links to all open issues with the tag Frankly, I'd be even more interested in resolving this one, plain and simple. What do you think? Should we adapt the
Sounds good @jtcohen6 (re: fixing it rather than adding more docs).
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Describe the bug
When using `--defer`/`--state` to run changed models, a lot of models that are definitely unchanged are selected and run:

```
dbt run -m state:modified+ --defer --state ./
```
I've already read these and don't believe this is covered there:
After comparing the production `manifest.json` to the one I'm generating locally, I noticed that the only difference appears to be that lists in the `config` of those models sometimes have a different order. This problem only seems to occur when round brackets are used instead of square brackets.

Steps To Reproduce
When defining config as below, state:modified selects unchanged models.
This problem seems to disappear once square brackets are used.
Expected behavior
There should not be a difference in behaviour between square and round brackets. Ideally, `state:modified` should work correctly with both.

Screenshots and log output
n/a
System information
Which database are you using dbt with?
The output of `dbt --version`:
The operating system you're using:
macOS / Linux
The output of `python --version`:
Python 3.7.0