Update docs for snapshot configuration #2900
Labels: `content` (Improvements or additions to content), `improvement` (Use this when an area of the docs needs improvement as it's currently unclear)
Contributions
Link to the page on docs.getdbt.com requiring updates
https://docs.getdbt.com/docs/build/snapshots
What part(s) of the page would you like to see updated?
Things to know
Python `datetime` objects distinguish between timestamps that are "aware" and those that are "naive". Aware timestamps represent a unique instant in time because they explicitly store the relevant UTC offset. Naive timestamps cannot represent a unique instant in time unless the offset is determined by "mutual agreement".

An example of mutual agreement is a producer and a consumer both agreeing that a particular naive timestamp represents UTC. dbt has been a proponent of mutually agreeing that naive timestamps are implicitly in UTC and can therefore be considered to represent a unique instant in time.
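A minimal sketch with Python's standard `datetime` module shows the difference. The dates and the UTC+2 offset are arbitrary values chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

# Naive: no UTC offset is attached, so on its own the value is ambiguous.
naive = datetime(2023, 6, 1, 12, 0)
print(naive.tzinfo, naive.utcoffset())  # None None

# Aware: the UTC offset is stored, so the value names a unique instant.
aware = datetime(2023, 6, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(aware.utcoffset())  # 2:00:00

# "Mutual agreement": producer and consumer both treat naive values as UTC.
# Attaching UTC turns the naive value into an aware one.
agreed = naive.replace(tzinfo=timezone.utc)
print(agreed.isoformat())  # 2023-06-01T12:00:00+00:00

# Same wall-clock reading, different instants: 12:00 at UTC+2 is 10:00 UTC.
print(aware.astimezone(timezone.utc).isoformat())  # 2023-06-01T10:00:00+00:00
print(agreed - aware)  # 2:00:00
```

The last line shows why the agreement matters: the same wall-clock reading names two different instants depending on the offset attached to it.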
The timestamp data type of the `snapshot_get_time()` macro is often a "naive" data type rather than an "aware" one. This matters when the data or configuration of a snapshot includes aware timestamps, especially because of the implicit data type conversions the database performs.

Overview
Here are some key pieces of information for us to communicate:
- `timestamp` strategy
- `check` strategy
- `snapshot_get_time()` macro

Configuration options
Specific things that are configurable:
- `updated_at` config
- `invalidate_hard_deletes` config

`updated_at` config

`timestamp` strategy:
- `updated_at` is required.
- `updated_at` must be a column name (at least for the Snowflake adapter, and possibly others as well); i.e., expressions will not work (which could be considered a bug to fix).

`check` strategy:
- `updated_at` is optional (which is not clearly documented currently).
- `updated_at` defaults to the expression given by the `snapshot_get_time()` macro.
- `updated_at` may be either a column name or an expression.

The `updated_at` config is used to populate the `dbt_valid_from`, `dbt_valid_to`, and `dbt_updated_at` columns.

`invalidate_hard_deletes` config (defaults to `false`)

`timestamp` strategy:
- When `invalidate_hard_deletes` is `true`, the `snapshot_get_time()` macro is used to represent when the record ceased to be valid.

`check` strategy:
- When `invalidate_hard_deletes` is `true`, the `updated_at` config is used to represent when the record ceased to be valid.
- `updated_at` defaults to the expression given by the `snapshot_get_time()` macro.

The valid from/to intervals could exhibit undesirable behavior if any of the following occurs:
- `updated_at` is not monotonically increasing for each `unique_key` value across successive snapshots (e.g., out-of-order snapshot times).
- `updated_at` has the `timestamp_ntz` data type, but it represents a local time zone other than UTC.
- `invalidate_hard_deletes` is `true` and the `updated_at` column or expression is an aware timestamp, but the `snapshot_get_time()` macro is a naive timestamp.

Additional information
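As a rough illustration of the three failure modes listed above, here is a minimal Python sketch. It does not call dbt or any database. `to_naive_utc`, `drop_offset`, and `is_monotonic` are hypothetical helpers, the UTC+2 offset is arbitrary, and `drop_offset` only stands in for a lossy implicit cast; how a given database actually converts between aware and naive types is adapter-specific and is an assumption here.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
LOCAL = timezone(timedelta(hours=2))  # arbitrary fixed UTC+2 offset, for illustration only


def to_naive_utc(ts: datetime) -> datetime:
    """Normalize to naive UTC, following the 'naive means UTC' agreement."""
    if ts.tzinfo is None:
        return ts  # already naive; assumed by agreement to be UTC
    return ts.astimezone(UTC).replace(tzinfo=None)


def drop_offset(ts: datetime) -> datetime:
    """Hypothetical lossy cast: keeps the wall-clock time and discards the offset."""
    return ts.replace(tzinfo=None)


def is_monotonic(values: list) -> bool:
    """True if each successive updated_at is strictly later than the previous one."""
    return all(earlier < later for earlier, later in zip(values, values[1:]))


# 1. updated_at values for one unique_key arriving out of order.
history = [datetime(2023, 1, 1), datetime(2023, 1, 3), datetime(2023, 1, 2)]
print(is_monotonic(history))  # False

# 2. A timestamp_ntz value that actually holds UTC+2 wall-clock time.
stored = datetime(2023, 1, 1, 9, 0)  # naive, but really 09:00 at UTC+2
actual = to_naive_utc(stored.replace(tzinfo=LOCAL))  # 07:00 UTC
print(stored - actual)  # 2:00:00 (read as UTC, the value is two hours too late)

# 3. Aware updated_at (dbt_valid_from under the timestamp strategy) versus a
#    naive snapshot time (dbt_valid_to for a hard delete).
updated_at = datetime(2023, 1, 1, 9, 0, tzinfo=LOCAL)  # 07:00 UTC
snapshot_time = datetime(2023, 1, 1, 8, 0)             # naive, agreed to mean 08:00 UTC
try:
    _ = updated_at < snapshot_time
except TypeError as exc:
    print("Python refuses to compare:", exc)
print(to_naive_utc(updated_at) < snapshot_time)  # True: 07:00 UTC precedes 08:00 UTC
print(drop_offset(updated_at) < snapshot_time)   # False: the interval now ends before it starts
```

Python refuses to compare naive and aware values outright. A database that silently casts instead can produce a `dbt_valid_to` earlier than `dbt_valid_from`, which is the third condition in the list.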
Related issue: