Merge branch 'current' into joellabes-patch-1
mirnawong1 authored Jun 7, 2023
2 parents 4ce3096 + 9a4b9a9 commit 8db40ba
Showing 280 changed files with 5,293 additions and 3,270 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/autoupdate.yml
@@ -0,0 +1,33 @@
name: Auto Update

on:
  # This will trigger on all pushes to all branches.
  push: {}
  # Alternatively, you can only trigger if commits are pushed to certain branches, e.g.:
  # push:
  #   branches:
  #     - current
  #     - unstable
jobs:
  autoupdate:
    name: autoupdate
    runs-on: ubuntu-22.04
    steps:
      - uses: chinthakagodawita/autoupdate@v1.7.0
        id: autoupdate # gives this step an id so the steps below can read steps.autoupdate.outputs.conflicted
        env:
          GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
          DRY_RUN: "false"
          PR_FILTER: "labelled"
          PR_LABELS: "auto update"
          PR_READY_STATE: "all"
          # EXCLUDED_LABELS: "do not merge"
          MERGE_MSG: "This branch was auto-updated!"
          RETRY_COUNT: "5"
          RETRY_SLEEP: "300"
          MERGE_CONFLICT_ACTION: "ignore"

      - run: echo 'We found merge conflicts when updating this PR. Please fix them as soon as you can.'
        if: ${{ steps.autoupdate.outputs.conflicted }}

      - run: echo 'Good news! No merge conflicts this time around.'
        if: ${{ !steps.autoupdate.outputs.conflicted }}
2 changes: 1 addition & 1 deletion contributing/adding-page-components.md
@@ -1,6 +1,6 @@
## Using warehouse components

You can use the following components to provide code snippets for each supported warehouse. You can see a real-life example in the docs page [Initialize your project](/docs/quickstarts/dbt-cloud/databricks#initialize-your-dbt-project-and-start-developing).
You can use the following components to provide code snippets for each supported warehouse. You can see a real-life example in the docs page [Initialize your project](/quickstarts/databricks?step=6).

Identify code by labeling with the warehouse names:

@@ -30,7 +30,7 @@ In short, a jaffle is:

*See above: Tasty, tasty jaffles.*

Jaffle Shop is a demo repo referenced in [dbt’s Getting Started Guide](/docs/quickstarts/overview), and its jaffles hold a special place in the dbt community’s hearts, as well as on Data Twitter™.
Jaffle Shop is a demo repo referenced in [dbt’s Getting Started Guide](/quickstarts), and its jaffles hold a special place in the dbt community’s hearts, as well as on Data Twitter™.

![jaffles on data twitter](/img/blog/2022-02-08-customer-360-view/image_1.png)

18 changes: 8 additions & 10 deletions website/blog/2022-05-24-joining-snapshot-complexity.md
@@ -82,7 +82,7 @@ This boils down to the following steps:
1. Get rid of dupes if needed
2. Snapshot your data tables
3. Future-proof your `valid_to` dates
4. Join all your tables to build a fanned out spine containing the grain ids onto which we will join the rest of the data
4. Join your non-matching grain tables to build a fanned out spine containing the grain ids onto which we will join the rest of the data
5. Join the snapshots to the data spine on the appropriate id in overlapping timespans, narrowing the valid timespans per row as more tables are joined
6. Clean up your columns in final <Term id="cte" />
7. Optional addition of global variable to filter to current values only
@@ -152,15 +152,15 @@ coalesce(dbt_valid_to, cast('{{ var("future_proof_date") }}' as timestamp)) as v

You will thank yourself later for building in a global variable. Adding important global variables will set your future-self up for success. Now, you can filter all your data to the current state by just filtering on `where valid_to = future_proof_date`. You can also ensure that all the data-bears with their data-paws in the data-honey jar are referencing the **same** `future_proof_date`, rather than `9998-12-31`, or `9999-12-31`, or `10000-01-01`, which will inevitably break something eventually. You know it will; don’t argue with me! Global vars for the win!
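
For illustration, once `future_proof_date` is declared as a project-level var in `dbt_project.yml`, any downstream model can filter to current rows by referencing it. A minimal sketch (the model and column names here are hypothetical, not from this post):

```sql
-- filter a snapshot-derived model down to its current rows only
select *
from {{ ref('fct_orders_history') }}
where valid_to = cast('{{ var("future_proof_date") }}' as timestamp)
```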

## Step 4: Join all your tables together to build a fanned out id spine
## Step 4: Join your tables together to build a fanned out id spine

:::important What happens in this step?
Step 4 walks you through how to do your first join, in which you need to fan out the data spine to the finest grain possible and to include all the id onto which we will join the rest of the data. This step is crucial to joining the snapshots in subsequent steps.
Step 4 walks you through how to do your first join, in which you need to fan out the data spine to the finest grain possible and to include the id onto which we will join the rest of the data. This step is crucial to joining the snapshots in subsequent steps.
:::

Let’s look at how we’d do this with an example. You may have many events associated with a single `product_id`. Each `product_id` may have several `order_ids`, and each `order_id` may have another id associated with it, which means that the grain of each table needs to be identified. The point here is that we need to build in an id at the finest grain. To do so, we’ll add in a [dbt_utils.generate_surrogate_key](https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_surrogate_key.sql) in the staging models that live on top of the snapshot tables.
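
To make that concrete, a staging model on top of a snapshot might add the surrogate key like this (a sketch; the model and column names are illustrative):

```sql
-- hypothetical staging model over a snapshot table
select
    *,
    {{ dbt_utils.generate_surrogate_key(['product_id', 'order_id']) }} as product_order_id
from {{ ref('snapshot_product_orders') }}
```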

Then, in your joining model, let’s add a CTE to build out our spine with all of our ids.
Then, in your joining model, let’s add a CTE to build out our spine with our ids of these different grains.

```sql
build_spine as (
@@ -178,7 +178,7 @@
... )
```

The result will be all the columns from your first table, fanned out as much as possible by the added id columns. We will use these id columns to join the historical data from our tables.
The result will be all the columns from your first table, fanned out as much as possible by the added `id` columns. We will use these `id` columns to join the historical data from our tables. It is extremely important to note that if you have tables as part of this pattern that are captured at the same grain as the original table, you **do not** want to join in that table and id as part of the spine. It will fan out _too much_ and cause duplicates in your data. Instead, simply join the tables with the same grain as the original table (in this case, `historical_table_1` on `product_id`) in the next step, using the macro.

| product_id | important_status | dbt_valid_from | dbt_valid_to | product_order_id |
| --- | --- | --- | --- | --- |
@@ -225,16 +225,14 @@ Your parameters are `cte_join`, the table that is creating the spine of your fin

from {{cte_join}}
left join {{cte_join_on}} on {{cte_join}}.{{cte_join_id}} = {{cte_join_on}}.{{cte_join_on_id}}
and (({{cte_join_on}}.{{cte_join_on_valid_from}} >= {{cte_join}}.{{cte_join_valid_from}}
and {{cte_join_on}}.{{cte_join_on_valid_from}} < {{cte_join}}.{{cte_join_valid_to}})
or ({{cte_join_on}}.{{cte_join_on_valid_to}} >= {{cte_join}}.{{cte_join_valid_from}}
and {{cte_join_on}}.{{cte_join_on_valid_to}} < {{cte_join}}.{{cte_join_valid_to}}))
and ({{cte_join_on}}.{{cte_join_on_valid_from}} <= {{cte_join}}.{{cte_join_valid_to}}
and {{cte_join_on}}.{{cte_join_on_valid_to}} >= {{cte_join}}.{{cte_join_valid_from}})


{% endmacro %}
```

The joining logic finds where the ids match and where the timestamps overlap between the two tables. We use the **greatest** `valid_from` and the **least** `valid_to` between the two tables to ensure that the new, narrowed timespan for the row is when the rows from both tables are valid.
The joining logic finds where the ids match and where the timestamps overlap between the two tables. We use the **greatest** `valid_from` and the **least** `valid_to` between the two tables to ensure that the new, narrowed timespan for the row is when the rows from both tables are valid. _**Update: Special thank you to Allyn Opitz for simplifying this join logic! It's so much prettier now.**_
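
As a rough sketch of the SQL this pattern produces for a single pair of CTEs (the aliases and columns here are illustrative, not the macro's literal output):

```sql
select
    spine.product_order_id,
    spine.important_status,
    greatest(spine.valid_from, orders.valid_from) as valid_from,
    least(spine.valid_to, orders.valid_to) as valid_to
from spine
left join orders
    on spine.product_order_id = orders.product_order_id
    and orders.valid_from <= spine.valid_to
    and orders.valid_to >= spine.valid_from
```

The macro repeats this pattern for each table you chain on, so every additional join can only narrow a row's valid window.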

You should see something like this as your end result:

@@ -45,7 +45,7 @@ As a rule of thumb, you can consider that if your table partition length is less
When we designed ingestion partitioning table support with the dbt Labs team, we focused on ease of use and how to have seamless integration with incremental materialization.

One of the great features of incremental materialization is to be able to proceed with a full refresh. We added support for that feature and, luckily, `MERGE` statements are working as intended for ingestion-time partitioning tables. This is also the approach used by the [dbt BigQuery connector](/reference/warehouse-setups/bigquery-setup).
One of the great features of incremental materialization is to be able to proceed with a full refresh. We added support for that feature and, luckily, `MERGE` statements are working as intended for ingestion-time partitioning tables. This is also the approach used by the [dbt BigQuery connector](/docs/core/connect-data-platform/bigquery-setup).

The complexity is hidden in the connector and it’s very intuitive to use. For example, if you have a model with the following SQL:

@@ -62,7 +62,7 @@ Before you can get started:
- You must have Python 3.8 or above installed
- You must have dbt version 1.3.0 or above installed
- You should have a basic understanding of [SQL](https://www.sqltutorial.org/)
- You should have a basic understanding of [dbt](https://docs.getdbt.com/docs/quickstarts/overview)
- You should have a basic understanding of [dbt](https://docs.getdbt.com/quickstarts)

### Step 2: Clone the repository

2 changes: 1 addition & 1 deletion website/blog/ctas.yml
@@ -9,4 +9,4 @@
header: "Just Getting Started?"
subheader: Check out guides on getting your warehouse set up and connected to dbt Cloud.
button_text: Learn more
url: https://docs.getdbt.com/docs/quickstarts/overview
url: https://docs.getdbt.com/quickstarts
66 changes: 52 additions & 14 deletions website/docs/docs/build/custom-schemas.md
@@ -3,7 +3,6 @@ title: "Custom schemas"
id: "custom-schemas"
---

## What is a custom schema?
By default, all dbt models are built in the schema specified in your target. In dbt projects with lots of models, it may be useful to instead build some models in schemas other than your target schema – this can help logically group models together.

For example, you may wish to:
@@ -52,7 +51,6 @@
## Understanding custom schemas
### Why does dbt concatenate the custom schema to the target schema?
When first using custom schemas, it's common to assume that a model will be built in a schema that matches the `schema` configuration exactly, for example, a model that has the configuration `schema: marketing`, would be built in the `marketing` schema. However, dbt instead creates it in a schema like `<target_schema>_marketing` by default – there's a good reason for this!
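
For example (the model name and target schema below are hypothetical):

```sql
-- models/marketing/paid_ads.sql
{{ config(schema='marketing') }}

select 1 as placeholder
```

With a target schema of `dbt_alice`, this model is built into `dbt_alice_marketing` by default rather than into `marketing`.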

In a typical setup of dbt, each dbt user will use a separate target schema (see [Managing Environments](/docs/build/custom-schemas#managing-environments)). If dbt created models in a schema that matches a model's custom schema exactly, every dbt user would create models in the same schema.
@@ -62,7 +60,10 @@ Further, the schema that your development models are built in would be the same
If you prefer to use different logic for generating a schema name, you can change the way dbt generates a schema name (see below).

### How does dbt generate a model's schema name?
Under the hood, dbt uses a macro called `generate_schema_name` to determine the name of the schema that a model should be built in. The code for the macro that expresses the current logic follows:

dbt uses a default macro called `generate_schema_name` to determine the name of the schema that a model should be built in.

The following code represents the default macro's logic:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
@@ -83,26 +84,63 @@ Under the hood, dbt uses a macro called `generate_schema_name` to determine the

## Advanced custom schema configuration

You can customize how dbt generates schema names to fit your needs: either create a custom macro named `generate_schema_name` in your project, or use the built-in macro for environment-based schema names, which is a convenient alternative.

If your dbt project has a macro that’s also named `generate_schema_name`, dbt will always use the macro in your dbt project instead of the default macro.

### Changing the way dbt generates a schema name
If your dbt project includes a macro that is also named `generate_schema_name`, dbt will _always use the macro in your dbt project_ instead of the default macro.

Therefore, to change the way dbt generates a schema name, you should add a macro named `generate_schema_name` to your project, where you can then define your own logic.
To modify how dbt generates schema names, you should add a macro named `generate_schema_name` to your project and customize it according to your needs:

- Copy and paste the `generate_schema_name` macro into a file named 'generate_schema_name'.

- Modify the target schema by either using [target variables](/reference/dbt-jinja-functions/target) or [env_var](/reference/dbt-jinja-functions/env_var). Check out our [Advanced Deployment - Custom Environment and job behavior](https://courses.getdbt.com/courses/advanced-deployment) course video for more details.

**Note**: dbt will ignore any custom `generate_schema_name` macros included in installed packages.
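
For instance, a project-level override might look like the sketch below, which keeps the default concatenation but adds a prefix when the target is a hypothetical `ci` environment (the target name and prefix are assumptions, not dbt defaults):

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- elif target.name == 'ci' -%}
        {# hypothetical: prefix CI builds so they are easy to spot and clean up #}
        ci_{{ default_schema }}_{{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}
```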

<details>
<summary>❗️ Warning: Don't replace <code>default_schema</code> in the macro.</summary>

If you're modifying how dbt generates schema names, don't just replace ```{{ default_schema }}_{{ custom_schema_name | trim }}``` with ```{{ custom_schema_name | trim }}``` in the ```generate_schema_name``` macro.

Removing ```{{ default_schema }}``` causes developers to override each other's models when custom schemas are defined. This can also cause issues during development and continuous integration (CI).

❌ The following code block is an example of what your code _should not_ look like:
```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        # The following is incorrect as it omits {{ default_schema }} before {{ custom_schema_name | trim }}.
        {{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}
```
Note: dbt ignores any custom `generate_schema_name` macros that are part of a package installed in your project.
</details>

### An alternative pattern for generating schema names
A frequently used pattern for generating schema names is to change the behavior based on dbt's environment, such that:

- In prod:
- If a custom schema is provided, a model's schema name should match the custom schema, rather than being concatenated to the target schema.
- If no custom schema is provided, a model's schema name should match the target schema.
A common way to generate schema names is by adjusting the behavior according to the environment in dbt. Here's how it works:

**Production environment**

- If a custom schema is specified, the schema name of a model should match the custom schema, instead of concatenating to the target schema.
- If no custom schema is specified, the schema name of a model should match the target schema.

**Other environments** (like development or quality assurance (QA)):

- In other environments (e.g. `dev` or `qa`):
- Build _all_ models in the target schema, as in, ignore custom schema configurations.
- Build _all_ models in the target schema, ignoring any custom schema configurations.

dbt ships with a global macro that contains this logic `generate_schema_name_for_env`.
dbt ships with a global, predefined macro that contains this logic - `generate_schema_name_for_env`.

If you want to use this pattern, you'll need a `generate_schema_name` macro in your project that points to this logic. You can do this by creating a file in your `macros` directory (we normally call it `get_custom_schema.sql`), and pasting in the following:
If you want to use this pattern, you'll need a `generate_schema_name` macro in your project that points to this logic. You can do this by creating a file in your `macros` directory (typically named `get_custom_schema.sql`), and copying/pasting the following code:

<File name='macros/get_custom_schema.sql'>

3 changes: 2 additions & 1 deletion website/docs/docs/build/incremental-models.md
@@ -252,7 +252,8 @@ the reliability of your `unique_key`, or the availability of certain features.

* [Snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models): `merge` (default), `delete+insert` (optional), `append` (optional)
* [BigQuery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models): `merge` (default), `insert_overwrite` (optional)
* [Spark](/reference/resource-configs/spark-configs#incremental-models): `append` (default), `insert_overwrite` (optional), `merge` (optional, Delta-only)
* [Databricks](/reference/resource-configs/databricks-configs#incremental-models): `append` (default), `insert_overwrite` (optional), `merge` (optional, Delta-only)
* [Spark](/reference/resource-configs/spark-configs#incremental-models): `append` (default), `insert_overwrite` (optional), `merge` (optional, Delta-only)
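
For instance, opting into a non-default strategy is a config change on the model (the model, column, and strategy below are illustrative only):

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id'
    )
}}

select * from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- only process rows that arrived since the last successful run of this model
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```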

<VersionBlock firstVersion="1.3">

2 changes: 1 addition & 1 deletion website/docs/docs/build/metrics.md
@@ -490,7 +490,7 @@ You may find some pieces of functionality, like secondary calculations, complica
| Input | Example | Description | Required |
| ----------- | ----------- | ----------- | -----------|
| <VersionBlock firstVersion="1.2">metric_list</VersionBlock><VersionBlock lastVersion="1.1">metric_name</VersionBlock> | <VersionBlock firstVersion="1.2">`metric('some_metric)'`, <br />[`metric('some_metric)'`, <br />`metric('some_other_metric)'`]<br /></VersionBlock><VersionBlock lastVersion="1.1">`'metric_name'`<br /></VersionBlock> | <VersionBlock firstVersion="1.2">The metric(s) to be queried by the macro. If multiple metrics required, provide in list format.</VersionBlock><VersionBlock lastVersion="1.1">The name of the metric</VersionBlock> | Required |
| grain | `'day'`, `'week'`, <br />`'month'`, `'quarter'`, <br />`'year'`, `'all_time'`<br /> | The time grain that the metric will be aggregated to in the returned dataset | Required |
| grain | `'day'`, `'week'`, <br />`'month'`, `'quarter'`, <br />`'year'`<br /> | The time grain that the metric will be aggregated to in the returned dataset | Optional |
| dimensions | [`'plan'`,<br /> `'country'`] | The dimensions you want the metric to be aggregated by in the returned dataset | Optional |
| secondary_calculations | [`metrics.period_over_period( comparison_strategy="ratio", interval=1, alias="pop_1wk")`] | Performs the specified secondary calculation on the metric results. Examples include period over period calculations, rolling calculations, and period to date calculations. | Optional |
| start_date | `'2022-01-01'` | Limits the date range of data used in the metric calculation by not querying data before this date | Optional |
2 changes: 1 addition & 1 deletion website/docs/docs/build/models.md
@@ -18,4 +18,4 @@ The top level of a dbt workflow is the project. A project is a directory of a `.

Your organization may need only a few models, but more likely you’ll need a complex structure of nested models to transform the required data. A model is a single file containing a final `select` statement; a project can have multiple models, and models can even reference each other. Scale that across numerous projects, and the effort required to transform complex data sets can drop drastically compared to older methods.
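
As a minimal illustration (file and column names here are hypothetical), a model is just a SQL file such as:

```sql
-- models/customers.sql
select
    id as customer_id,
    first_name,
    last_name
from {{ ref('stg_customers') }}
```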

Learn more about models in [SQL models](/docs/build/sql-models) and [Python models](/docs/build/python-models) pages. If you'd like to begin with a bit of practice, visit our [Getting Started Guide](/docs/quickstarts/overview) for instructions on setting up the Jaffle_Shop sample data so you can get hands-on with the power of dbt.
Learn more about models in [SQL models](/docs/build/sql-models) and [Python models](/docs/build/python-models) pages. If you'd like to begin with a bit of practice, visit our [Getting Started Guide](/quickstarts) for instructions on setting up the Jaffle_Shop sample data so you can get hands-on with the power of dbt.