New snapshot configs (#5817)
## What are you changing in this pull request and why?

Closes #5805

[First round of updates]
- Adds notes to the `target_schema` and `target_database` pages stating that the configs are optional.
- Updates the snapshot page examples and config tables to use the new configs.

## Checklist
<!--
Uncomment when publishing docs for a prerelease version of dbt:
- [ ] Add versioning components, as described in [Versioning
Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages)
- [ ] Add a note to the prerelease version [Migration
Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
- [ ] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [ ] For [docs
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning),
review how to [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this
PR is merged, such as "needs technical review" or "change base branch."

Adding or removing pages (delete if not applicable):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links
that point to deleted pages

---------

Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com>
Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com>
3 people committed Jul 25, 2024
1 parent 76f8ab5 commit b08aff4
Showing 12 changed files with 372 additions and 65 deletions.
8 changes: 8 additions & 0 deletions website/dbt-versions.js
@@ -22,6 +22,14 @@ exports.versions = [
]

exports.versionedPages = [
  {
    "page": "/reference/resource-configs/target_database",
    "lastVersion": "1.8",
  },
  {
    "page": "/reference/resource-configs/target_schema",
    "lastVersion": "1.8",
  },
  {
    "page": "reference/global-configs/indirect-selection",
    "firstVersion": "1.8",
185 changes: 184 additions & 1 deletion website/docs/docs/build/snapshots.md
@@ -35,6 +35,8 @@ This order is now in the "shipped" state, but we've lost the information about w

In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes.

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -58,6 +60,33 @@ select * from {{ source('jaffle_shop', 'orders') }}

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{{
    config(
        unique_key='id',
        schema='snapshots',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

:::info Preview or Compile Snapshots in IDE

It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the `dbt snapshot` command in the IDE by completing the following steps.
@@ -107,6 +136,8 @@ select * from {{ source('jaffle_shop', 'orders') }}

5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)).

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -132,9 +163,38 @@ select * from {{ source('jaffle_shop', 'orders') }}

6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. Changing the `target_database` configuration, the `target_schema` configuration, or the name of the snapshot (as defined in `{% snapshot .. %}`) changes how dbt names this table.

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{{
    config(
        schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

6. Run the `dbt snapshot` [command](/reference/commands/snapshot) &mdash; for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro.

</VersionBlock>

```
$ dbt snapshot
Running with dbt=0.16.0
Running with dbt=1.8.0
15:07:36 | Concurrency: 8 threads (target='dev')
15:07:36 |
@@ -179,6 +239,8 @@ The `timestamp` strategy requires the following configurations:

**Example usage:**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
@@ -200,6 +262,33 @@ The `timestamp` strategy requires the following configurations:

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
{% snapshot orders_snapshot_timestamp %}

{{
    config(
        schema='snapshots',
        strategy='timestamp',
        unique_key='id',
        updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

### Check strategy
The `check` strategy is useful for tables which do not have a reliable `updated_at` column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.
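
As a rough illustration of the comparison described above, the following Python sketch models a single snapshot run under the `check` strategy. It is not dbt's implementation (dbt performs this work in SQL); the function name `apply_check_strategy`, the in-memory row dictionaries, and the `now` argument are assumptions made for the example. The `dbt_valid_from` and `dbt_valid_to` keys stand in for the snapshot meta-columns.

```python
from datetime import datetime


def apply_check_strategy(snapshot, source_rows, unique_key, check_cols, now):
    """Return a new snapshot list after one hypothetical `check` run.

    Illustrative only: dbt performs this comparison in SQL, not Python.
    """
    # Index the currently valid snapshot rows (dbt_valid_to is None) by key.
    current = {
        row[unique_key]: row for row in snapshot if row["dbt_valid_to"] is None
    }
    result = [dict(row) for row in snapshot]  # Copy; leave the input untouched.

    for src in source_rows:
        key = src[unique_key]
        old = current.get(key)
        if old is not None and all(old[c] == src[c] for c in check_cols):
            continue  # Checked columns are identical: take no action.
        if old is not None:
            # A checked column changed: invalidate the old record.
            for row in result:
                if row[unique_key] == key and row["dbt_valid_to"] is None:
                    row["dbt_valid_to"] = now
        # Record the new version (this also covers keys seen for the first time).
        result.append({**src, "dbt_valid_from": now, "dbt_valid_to": None})
    return result


# Assumed sample data: one order whose status changes between runs.
first_run = datetime(2024, 1, 1)
second_run = datetime(2024, 1, 2)
orders_snapshot = [
    {"id": 1, "status": "pending", "is_cancelled": False,
     "dbt_valid_from": first_run, "dbt_valid_to": None},
]
orders_source = [{"id": 1, "status": "shipped", "is_cancelled": False}]
updated = apply_check_strategy(
    orders_snapshot, orders_source, "id", ["status", "is_cancelled"], second_run
)
print(len(updated))  # 2: the invalidated "pending" row plus the new "shipped" row
```

Columns outside `check_cols` are ignored by the comparison in this sketch, which is why the choice of checked columns determines which changes get recorded.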

@@ -220,6 +309,8 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
@@ -241,6 +332,32 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
{% snapshot orders_snapshot_check %}

{{
    config(
        schema='snapshots',
        strategy='check',
        unique_key='id',
        check_cols=['status', 'is_cancelled'],
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

### Hard deletes (opt-in)

@@ -252,6 +369,8 @@ For this configuration to work with the `timestamp` strategy, the configured `up

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
@@ -274,11 +393,40 @@ For this configuration to work with the `timestamp` strategy, the configured `up

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
{% snapshot orders_snapshot_hard_delete %}

{{
    config(
        schema='snapshots',
        strategy='timestamp',
        unique_key='id',
        updated_at='updated_at',
        invalidate_hard_deletes=True,
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
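
To make the effect of `invalidate_hard_deletes=True` more concrete, here is a small Python sketch of the idea: snapshot rows that are still current, but whose key no longer appears in the source, have their `dbt_valid_to` set to the current time. This is a simplified model rather than dbt's SQL; the Python function `invalidate_hard_deletes`, the row dictionaries, and the `now` argument are assumptions for illustration.

```python
from datetime import datetime


def invalidate_hard_deletes(snapshot, source_rows, unique_key, now):
    """Close out current snapshot rows whose key is missing from the source."""
    source_keys = {row[unique_key] for row in source_rows}
    result = []
    for row in snapshot:
        row = dict(row)  # Copy so the input list is left untouched.
        is_current = row["dbt_valid_to"] is None
        if is_current and row[unique_key] not in source_keys:
            row["dbt_valid_to"] = now  # The record was hard deleted upstream.
        result.append(row)
    return result


# Assumed sample data: order 2 has been deleted from the source table.
loaded_at = datetime(2024, 1, 1)
run_at = datetime(2024, 1, 2)
orders_snapshot = [
    {"id": 1, "dbt_valid_from": loaded_at, "dbt_valid_to": None},
    {"id": 2, "dbt_valid_from": loaded_at, "dbt_valid_to": None},
]
orders_source = [{"id": 1}]
after_run = invalidate_hard_deletes(orders_snapshot, orders_source, "id", run_at)
print(after_run[1]["dbt_valid_to"])  # 2024-01-02 00:00:00
```

Rows that were already invalidated keep their original `dbt_valid_to` in this sketch; only rows that are still current are closed out.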

## Configuring snapshots
### Snapshot configurations
There are a number of snapshot-specific configurations:

<VersionBlock lastVersion="1.8">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics |
@@ -295,6 +443,30 @@ Snapshots can be configured from both your `dbt_project.yml` file and a `config`

Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively.

</VersionBlock>

<VersionBlock firstVersion="1.9">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics |
| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots |
| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot |
| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp |
| [unique_key](/reference/resource-configs/unique_key) | A <Term id="primary-key" /> column or expression for the record | Yes | id |
| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] |
| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at |
| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True |

In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into, shared across all users and environments. This caused problems when developing or testing a snapshot, because development and production runs were not clearly separated. v1.9 adds support for environment-aware snapshots by making `target_schema` optional. When neither `target_schema` nor `target_database` is defined, a snapshot now resolves the schema and database it builds into using the `generate_schema_name` and `generate_database_name` macros. You can optionally define a custom location for a snapshot with the [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types.
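
The schema resolution for environment-aware snapshots can be sketched in Python. The function below mirrors the logic of dbt's default `generate_schema_name` macro: the target schema alone when no custom schema is set, otherwise the target schema and custom schema joined with an underscore. The Python function name `resolve_schema` and the sample schema names are assumptions, and projects that override the macro will behave differently.

```python
from typing import Optional


def resolve_schema(target_schema: str, custom_schema: Optional[str]) -> str:
    """Sketch of the default schema resolution (hypothetical helper name)."""
    if custom_schema is None:
        # No `schema` config: build into the target's own schema.
        return target_schema
    # Custom schema set: concatenate target schema and custom schema.
    return f"{target_schema}_{custom_schema.strip()}"


# Assumed sample targets: a developer schema and a production schema.
print(resolve_schema("dbt_alice", "snapshots"))  # dbt_alice_snapshots
print(resolve_schema("analytics", None))         # analytics
```

Because the target schema is part of the result, two developers with different target schemas end up building the same snapshot into different locations under this default logic.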

A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs).

You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs).

</VersionBlock>

### Configuration best practices
#### Use the `timestamp` strategy where possible
@@ -303,9 +475,20 @@ This strategy handles column additions and deletions better than the `check` str
#### Ensure your unique key is really unique
The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)).

<VersionBlock lastVersion="1.8">

#### Use a `target_schema` that is separate to your analytics schema
Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

<VersionBlock firstVersion="1.9">

#### Use a schema that is separate to your models' schema
Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

## Snapshot query best practices

#### Snapshot source data.
27 changes: 27 additions & 0 deletions website/docs/docs/core/connect-data-platform/glue-setup.md
@@ -681,6 +681,9 @@ from events
group by 1
```
#### Iceberg Snapshot source code example

<VersionBlock lastVersion="1.8">

```sql

{% snapshot demosnapshot %}
@@ -699,6 +702,30 @@ select * from {{ ref('customers') }}

```

</VersionBlock>

<VersionBlock firstVersion="1.9">

```sql

{% snapshot demosnapshot %}

{{
    config(
        strategy='timestamp',
        schema='jaffle_db',
        updated_at='dt',
        file_format='iceberg'
    ) }}

select * from {{ ref('customers') }}

{% endsnapshot %}

```

</VersionBlock>

## Monitoring your Glue Interactive Session

Monitoring is an important part of maintaining the reliability, availability,
21 changes: 0 additions & 21 deletions website/docs/faqs/Snapshots/snapshot-target-schema.md

This file was deleted.

4 changes: 2 additions & 2 deletions website/docs/guides/debug-schema-names.md
@@ -93,7 +93,7 @@ Now, re-read through the logic of your `generate_schema_name` macro, and mentall

You should find that the schema dbt is constructing for your model matches the output of your `generate_schema_name` macro.

Be careful. Snapshots do not follow this behavior, check out the docs on [target_schema](/reference/resource-configs/target_schema) instead.
Be careful. Snapshots do not follow this behavior if `target_schema` is set. To have environment-aware snapshots in v1.9+ or dbt Cloud, remove the [`target_schema` config](/reference/resource-configs/target_schema) from your snapshots. If you still want a custom schema for your snapshots, use the [`schema`](/reference/resource-configs/schema) config instead.

## Adjust as necessary

@@ -103,4 +103,4 @@ Now that you understand how a model's schema is being generated, you can adjust

If you change the logic in `generate_schema_name`, it's important that you consider whether two users will end up writing to the same schema when developing dbt models. This consideration is the reason why the default implementation of the macro concatenates your target schema and custom schema together — we promise we were trying to be helpful by implementing this behavior, but acknowledge that the resulting schema name is unintuitive.

</div>
</div>