
Implement archival with blocks #1175

Closed
drewbanin opened this issue Dec 5, 2018 · 7 comments · Fixed by #1361
Labels
snapshots Issues related to dbt's snapshot functionality

Comments

@drewbanin
Contributor

drewbanin commented Dec 5, 2018

Feature

Feature description

Let's take archival out of configuration and redefine it using code. In so doing, users will be able to more flexibly control the semantics of their archival jobs.

Functionality to support:

  1. all existing archive functionality (i.e., archiving by unique_key and updated_at).
  2. archive from one database to another (#838)
  3. archive records when a subset of fields have changed (#706)
  4. archives should be ref'able
  5. archives should be able to call macros

Proposed spec

{% archive your_archive_name_here() %}
  {{ config(
          target_database='<optional database name>', 
          target_schema='<schema name>',
          target_table='<table name>',
          strategy={'timestamp' | 'check'},

          -- always required
          unique_key='id',

          -- strategy == 'timestamp'
          updated_at='updated_at',

          -- strategy == 'check'
          check_cols=['object_status', 'object_name'],
  ) }}

select
    id,
    object_status,
    object_name,
    updated_at

from source_data.table_name

{% endarchive %}

Parameters:

  • {% archive {archive_name} %}. Use this name to ref the archive
  • config params:
    • target_database: the destination database (if supported by the warehouse)
    • target_schema: the destination schema
    • target_table: the destination table
    • unique_key: The column that uniquely identifies an entity in the query
    • strategy: Can be one of timestamp or check
      • timestamp: implements the existing behavior of archival. Requires an updated_at config
      • check: dbt will compare the columns in check_cols to previous values for the unique_key. Archival will occur when these values change. If check_cols is set to "all", then dbt will check all columns in the table. (those exact semantics TBD)
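Since archives are ref'able by their block name, a downstream model could select from the archive table like so (a sketch; `your_archive_name_here` is the placeholder name from the spec above):

```sql
-- hypothetical downstream model, e.g. models/current_objects.sql
select
    id,
    object_status,
    object_name
from {{ ref('your_archive_name_here') }}
```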

Notes:

  • If there are a large number of "check cols", users can build a surrogate key in the SQL and use that as a single check_col
  • the archive's destination table can be ref'd using the name specified in the archive block
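A sketch of the surrogate-key approach from the note above, assuming a warehouse that supports `md5` and `||` string concatenation (column names are illustrative, taken from the spec's example query):

```sql
select
    id,
    object_status,
    object_name,
    updated_at,
    -- hypothetical: collapse many "check cols" into a single hash column,
    -- then configure the archive with check_cols=['check_hash']
    md5(
        coalesce(object_status, '') || '|' ||
        coalesce(object_name, '')
    ) as check_hash

from source_data.table_name
```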

Considerations:

  • We should provide a mechanism for migrating existing archives. How do we do that?
  • These archives will live in an archives/ dir by default. Users can change this with an archive-paths config in dbt_project.yml.
  • let's normalize the metadata column names as described in #251
  • Archives should be individually selectable on the CLI, and should support either FQN or Tag type selectors
  • Archives should be testable in schema.yml files
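For the last point, archive tests might look like ordinary model tests in schema.yml (a sketch; whether archives use the `models:` key or get a dedicated resource key is TBD):

```yaml
version: 2
models:  # hypothetical — could instead be a dedicated `archives:` key
  - name: your_archive_name_here
    columns:
      - name: id
        tests:
          - not_null
```

Note that a uniqueness test on unique_key alone wouldn't hold here, since an archive stores multiple versions of each record.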

Who will this benefit?

Archive users

@drewbanin
Contributor Author

cc @jthandy

@drewbanin drewbanin added the snapshots Issues related to dbt's snapshot functionality label Dec 5, 2018
@drewbanin drewbanin added this to the Wilt Chamberlain milestone Dec 5, 2018
@drewbanin
Contributor Author

cc @jtcohen6

@jthandy
Member

jthandy commented Dec 6, 2018

I really love this. If, in the process, we can also improve the logging of archival sql to the standard log that would make me incredibly happy. My guess is we will in the process of touching this code anyway.

Here are some thoughts:

  • do we really need archive-paths? these are in blocks...can't we just put them alongside of something else and have the compiler just parse all of the blocks together? seems like they could live in a macros folder... are you assuming that we'll just continue putting things in their own folders until we make everything blocks and then transition all at once?
  • what if we used the term "delta" instead of "check" for the strategy? "check" feels like a weird name of a strategy to me...
  • do archival names need to be unique within the same namespace as other "refable" things (i.e. models)? if so, should note that. the other alternative i had considered was having a sub-namespace like my_project.archives.archive_name. theoretically we could namespace all objects like this.

@drewbanin
Contributor Author

drewbanin commented Dec 6, 2018

improve the logging of archival sql to the standard log

i don't actually know what you mean by that!

do we really need archive-paths?

yeah, my idea is very much that we'll be able to get rid of the "-paths" notion altogether once everything is defined in blocks. I will say: these paths can be overlapping, so you could just make a single directory that's specified as your macro-paths and archive-paths. We can't put models or custom data tests in there yet, but hopefully.... soon....
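The overlapping-paths idea might look like this in dbt_project.yml (a sketch; the `blocks` directory name is just illustrative):

```yaml
# hypothetical: point both path configs at the same directory,
# so macros and archive blocks can live side by side
macro-paths: ["blocks"]
archive-paths: ["blocks"]
```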

  • i'm into delta

  • archive names will indeed need to be globally unique, though I am keen to answer the larger "namespacing" question too. Will see if there's an opportunity to broach that topic with archival.

really great feedback 👍

@jthandy
Member

jthandy commented Dec 6, 2018

improve the logging of archival sql to the standard log

The last time I checked, archival didn't actually output the queries it was running against your warehouse to the standard dbt.log file. I'm rather used to having all queries logged there, and the absence of this SQL has made it hard for me to troubleshoot archival issues in the past.

If this is no longer the case then 👍 but would be great if we could just do a super-quick audit of what log statements we have in the archival process. And maybe the archival sql should actually go to /target as well...?

@joevandyk
Contributor

I think I saw the archival sql statements being logged yesterday.

@drewbanin
Contributor Author

Yeah - these will be logged to dbt.log for sure. Will also be good to compile them to the target/ dir as @jthandy indicated above
