[Bug] New columns not populated with data in a snapshot #10088
Thanks for reporting this @elsander. I wasn't able to get the same result as you with dbt=1.6.5 and bigquery=1.6.9. Instead, I got this after the 2nd snapshot:
Could you try using the files below and let me know if there's something that can be tweaked to see the scenario you are reporting?
@dbeatty10 that is truly bizarre; I used your example and replicated the same issue I reported. The only differences between your test and my replication of your test:
We are using slightly different microversions of dbt and the BigQuery plugin, and you're probably using a somewhat different OS environment. My best guess would be that the OS environment or a microversion update is the root cause here. Unfortunately, I can't easily tweak these, because I use a Docker image that I am not the maintainer for.
Thanks for trying this out @elsander 👍 The OS environment shouldn't make a difference here. I went ahead and adjusted my modified files
```yaml
name: "my_project"
version: "1.0.0"
config-version: 2
profile: "bigquery"

snapshots:
  my_project:
    +target_database: "{{ target.database }}"
    +target_schema: "{{ target.schema }}"
```
```sql
{% snapshot snapshot__snapshot_test %}

{{
    config(
        unique_key='id',
        strategy='check',
        invalidate_hard_deletes=True,
        check_cols='all'
    )
}}

SELECT *
FROM {{ ref('snapshot_test') }}

{% endsnapshot %}
```

But when I tried this out with the same microversions as you, it still worked as expected for me:
To reduce as many confounding variables as possible, could you try using all the same files and commands as me and share the terminal output?
@dbeatty10 Really bizarre: I tried your version, using all of the same files and commands, and I got the behavior you did, where it correctly updates the table with new data. The main difference I could imagine being relevant is that I directly specified the target database and schema in the snapshot config. Here is the snapshot section of my dbt_project.yml in case it's helpful for debugging:
That is bizarre, but also good news that we're narrowing in! Could you try tweaking my version and see if you can get it to give the same unexpected results that you originally posted? If you can do that, then we'll have a solid reproducible example ("reprex").
Ok @dbeatty10, I spent some time on this and was able to reproduce the issue with the following changes:

- dual.sql
- snapshot_test__v1.sql
- snapshot_test__v2.sql

Without both of those config options, I am not able to reproduce the issue. With them, I do. These match settings I had as default in my
@elsander I still haven't been able to reproduce this. Do you get something different when you use the files and commands below? Do you see anything I've configured differently than the example you used to reproduce the issue? Project files and commands
```sql
SELECT DISTINCT
    '123' as id,
    1 as val_1
FROM {{ ref("dual") }}

UNION ALL

SELECT DISTINCT
    '456' as id,
    1 as val_1
FROM {{ ref("dual") }}
```
```sql
{{ config(schema="cohort_self_storage", materialized="table") }}

SELECT DISTINCT
    '123' as id,
    1 as val_1,
    'test' as val_2
FROM {{ ref("dual") }}

UNION ALL

SELECT DISTINCT
    '456' as id,
    1 as val_1,
    'test_2' as val_2
FROM {{ ref("dual") }}
```
```sql
select
    id,
    val_1,
    cast({{ date_trunc('day', 'dbt_valid_from') }} as date) as dbt_valid_from,
    cast({{ date_trunc('day', 'dbt_valid_to') }} as date) as dbt_valid_to
from {{ ref("snapshot__snapshot_test") }}
order by dbt_valid_from, id
```
```sql
select
    id,
    val_1,
    val_2,
    cast({{ date_trunc('day', 'dbt_valid_from') }} as date) as dbt_valid_from,
    cast({{ date_trunc('day', 'dbt_valid_to') }} as date) as dbt_valid_to
from {{ ref("snapshot__snapshot_test") }}
order by dbt_valid_from, id
```

Run these commands:

```shell
dbt run-operation drop_table --args '{name: snapshot__snapshot_test }'
dbt run -s dual
cp snapshot_test__v1.sql models/snapshot_test.sql
dbt run -s models/snapshot_test.sql
dbt snapshot -s snapshots/snapshot__snapshot_test.sql
dbt show -s analyses/snapsphot_1.sql
cp snapshot_test__v2.sql models/snapshot_test.sql
dbt run -s models/snapshot_test.sql
dbt snapshot -s snapshots/snapshot__snapshot_test.sql
dbt show -s analyses/snapsphot_2.sql
```

Get this output:
Yeah, I reran it today with the same setup as before, and I couldn't reproduce it either. I'm really at a loss. It was on the same commit where I'd replicated the issue before. It's really bizarre.
I think my next step will be to see if others on my team can reproduce the issue from my original setup, to figure out whether it's something about the dbt project config or something specific to my computer/Docker container/??
@elsander I'm going to close this for now, but if you're able to reproduce it, let me know and we can re-open!
Makes sense to me. I've spent some time working with folks on my end, and although we see the behavior sometimes, it has been difficult to track down a reproducible example outside our specific environment. |
Is this a new bug in dbt-core?
Current Behavior
I experienced an edge case where new columns get added to a snapshot, but are not populated with data.
Steps to reproduce:
1. Create a source table.
2. Snapshot it using the check strategy with `check_cols='all'`.
3. Add a new column to the source table.
4. Run the snapshot again.
After step 2, I had the expected snapshot with one row per unique id, populated with the data from the table in step 1. After step 4, I had the same snapshot, but with a new column added. The new column was populated entirely with nulls.
Expected Behavior
In step 4, I expect the snapshot to add a new row for each unique id, which contains all data in the table, including the values in the new column. Instead, no new rows were added to the snapshot, and the new column values were not populated in the snapshot.
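The expected behavior can be sketched as a toy SCD-2 loop (pure Python, illustrative only; `snapshot_step` is a made-up helper, not dbt code). Here every source column, including a newly added one, participates in the change check, so adding `val_2` yields a new, fully populated row for each id:

```python
from datetime import date

# Toy SCD-2 snapshot step -- illustrative only, NOT dbt code. All source
# columns (including newly added ones) participate in the change check.
def snapshot_step(snapshot, source, today):
    live_by_id = {r["id"]: r for r in snapshot if r["dbt_valid_to"] is None}
    for row in source:
        live = live_by_id.get(row["id"])
        data = {k: v for k, v in (live or {}).items()
                if k not in ("dbt_valid_from", "dbt_valid_to")}
        if live is None or data != row:
            if live is not None:
                live["dbt_valid_to"] = today  # close out the old version
            snapshot.append({**row, "dbt_valid_from": today, "dbt_valid_to": None})
    return snapshot

snap = snapshot_step([], [{"id": "123", "val_1": 1}], date(2024, 1, 1))
# Source gains val_2; the comparison sees a difference, so a new row is added.
snap = snapshot_step(snap, [{"id": "123", "val_1": 1, "val_2": "test"}], date(2024, 1, 2))
live = [r for r in snap if r["dbt_valid_to"] is None]
print(len(snap), live[0]["val_2"])  # -> 2 test
```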
One interesting note: if any previously snapshotted values change (even in only one row), ALL unique ids get a new snapshot row with data from the new columns. So you would only encounter this problem if the previously snapshotted data stays entirely static.
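One plausible (unconfirmed) mechanism for that note: if the check comparison only considers columns present in both the existing snapshot and the new source data, a brand-new column can never trigger a new version on its own. A minimal sketch of that reading, not dbt's actual implementation:

```python
# Sketch of a "compare only shared columns" reading of check_cols='all'.
# Illustrative only -- NOT dbt's implementation.
def row_changed(old_row, new_row):
    shared = old_row.keys() & new_row.keys()
    return any(old_row[c] != new_row[c] for c in shared)

old = {"id": "123", "val_1": 1}

# Source gains val_2, but no shared column differs -> no new snapshot row,
# so val_2 stays null in the snapshot (the reported symptom).
print(row_changed(old, {"id": "123", "val_1": 1, "val_2": "test"}))  # -> False

# A change to an existing column does register, producing a new row that
# would carry the new column's value.
print(row_changed(old, {"id": "123", "val_1": 2, "val_2": "test"}))  # -> True
```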
Steps To Reproduce
Create a table `snapshot_test`, filling in `tbl` with some valid table for your environment. Then snapshot `snapshot_test`.

This results in a snapshot with only one row for each id, where the `val_2` column is present but filled with nulls.

Note that if you run the following in step 3 instead (note the update to one of the values in `val_1`), a new row will be added to the snapshot as expected, with non-null values for the new columns.
Relevant log output
No response
Environment
Linux 0584341454eb 5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022 x86_64 GNU/Linux
Python version 3.9.19
DBT version 1.6.13, bigquery plugin version 1.6.1
Which database adapter are you using with dbt?
bigquery
Additional Context
No response