dbt-labs · gwenwindflower · Jul 5, 2023 · Jul 7, 2022 · Jul 7, 2022 · Jul 8, 2022
@@ -0,0 +1,27 @@
+---
+title: How we style our dbt projects
+id: 0-how-we-style-our-dbt-projects
+---
+
+## Why does style matter?
+
+Style might seem like a trivial, surface-level issue, but in fact it's a deeply material aspect of a well-built project. Consistent, clear style enhances readability, which in turn makes it easier to understand and maintain your project. Highly readable code helps you build clear mental models of your project, which makes it easier to debug and extend. It's not just a favor to yourself, though. Equally importantly, it makes it easier for others to understand and contribute to your project, which is essential for peer collaboration, open source work, and onboarding new team members.
+
+## What's important about style?
+
+There are two crucial tenets of code style:
+
+- Clarity
+- Consistency
+
+Style your code in such a way that you can read it quickly and understand it. It's also important to consider code review and git diffs. If you're making a change to a model, you want reviewers to be able to clearly see just the material changes you're making.
+
+Once you've established a clear style, stay consistent. This is the most important thing. Everybody on your team needs to have a unified style, which is why having a style guide is so crucial. If you're writing a model, you should be able to look at other models in the project that your teammates have written and read the same style. If you're writing a macro or a test, you should see the same style as your models. Consistency is key.
+
+## How should I style?
+
+You should style the project in the way you and your teammates or collaborators agree on. The most important thing is that you have a style guide and that you stick to it. This guide is just a suggestion to get you started and to give you a sense of what a style guide might look like. It covers various areas you may want to consider, with suggested rules. It emphasizes lots of whitespace, clarity, clear naming, and comments. We believe one of the strengths of SQL is that it reads like English, so we lean into that declarative nature throughout our projects. Even within dbt Labs, though, there are differing opinions on how to style, including a small but passionate contingent of leading comma enthusiasts! So again, the important thing is not that you follow this style guide, but that you make _your_ style guide and follow it.
+
+## Automation
+
+Use formatters and linters as much as possible. We're all human, we make mistakes. Not only that, but we all have different preferences and opinions while writing code. Automation is a great way to ensure that your project is styled consistently and correctly and that people can write in a way that's quick and comfortable for them, while still getting perfectly consistent output.
@@ -0,0 +1,17 @@
+---
+title: How we style our dbt models
+id: 1-how-we-style-our-dbt-models
+---
+
+## Fields and model names
+
+- Each model should have a primary key.
+- The primary key of a model should be named `<object>_id`, e.g. `account_id` – this makes it easier to know what `id` is being referenced in downstream joined models.
+- Booleans should be prefixed with `is_` or `has_`.
+- Timestamp columns should be named `<event>_at`, e.g. `created_at`, and should be in UTC. If a different timezone is being used, this should be indicated with a suffix, e.g. `created_at_pt`.
+- Price/revenue fields should be in decimal currency (e.g. `19.99` for $19.99; many app databases store prices as integers in cents). If non-decimal currency is used, indicate this with a suffix, e.g. `price_in_cents`.
+- Avoid reserved words as column names.
+- Consistency is key! Use the same field names across models where possible, e.g. a key to the `customers` table should be named `customer_id` rather than `user_id` or `id`.
+- Schema, table and column names should be in `snake_case`.
+- Use names based on the _business_ terminology, rather than the source terminology. For example, if the source database uses `user_id` but the business calls them `customer_id`, use `customer_id` in the model.
+- Versions of models should use the suffix `_v1`, `_v2`, etc for consistency, e.g. `customers_v1`, `customers_v2`.
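+
+As a rough sketch, the naming rules above might come together like this in a staging-style model. The `raw_accounts` model and its columns (`id`, `active`, `created`, `plan_price_cents`) are hypothetical and used only for illustration, as is the assumption that `created` is already stored in UTC.
+
+```sql
+with
+
+raw_accounts as (
+
+    select
+        id,
+        active,
+        created,
+        plan_price_cents
+
+    from {{ ref('raw_accounts') }}
+
+),
+
+renamed as (
+
+    select
+        -- primary key named <object>_id
+        id as account_id,
+
+        -- boolean prefixed with is_
+        active as is_active,
+
+        -- timestamp named <event>_at (assumed to already be in UTC)
+        created as created_at,
+
+        -- integer cents converted to decimal currency
+        plan_price_cents / 100.0 as plan_price
+
+    from raw_accounts
+
+)
+
+select * from renamed
+```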
@@ -0,0 +1,179 @@
+---
+title: How we style our SQL
+id: 2-how-we-style-our-sql
+---
+
+## Basics
+
+- Use [SQLFluff](https://sqlfluff.com/) to maintain these style rules automatically.
+  - Reference this [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for the rules we use.
+- Use Jinja comments (`{# #}`) for comments that should not be included in the compiled SQL.
+- Use trailing commas.
+- Indents should be four spaces.
+- Lines of SQL should be no longer than 80 characters.
+- Field names, keywords, and function names should all be lowercase.
+- The `as` keyword should be used explicitly when aliasing a field or table.
+
+## Fields, aggregations, and grouping
+
+- Fields should be stated before aggregates and window functions.
+- Aggregations should be executed as early as possible (on the smallest data set possible) before joining to another table to improve performance.
+- Ordering and grouping by a number (e.g. `group by 1, 2`) is preferred over listing the column names (see [this classic rant](https://blog.getdbt.com/write-better-sql-a-defense-of-group-by-1/) for why). Note that if you are grouping by more than a few columns, it may be worth revisiting your model design.
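+
+A minimal sketch of these rules, assuming a hypothetical `orders` model with `customer_id`, `order_total`, and `order_date` columns. Plain fields come before aggregates, the grouping is done by position, and the aggregation happens in its own CTE so that any later join works on one row per customer.
+
+```sql
+with
+
+orders as (
+
+    select
+        customer_id,
+        order_total,
+        order_date
+
+    from {{ ref('orders') }}
+
+),
+
+orders_aggregated_to_customers as (
+
+    select
+        -- plain fields first, then aggregates
+        customer_id,
+        count(*) as order_count,
+        sum(order_total) as lifetime_order_total,
+        min(order_date) as first_order_date
+
+    from orders
+
+    group by 1
+
+)
+
+select * from orders_aggregated_to_customers
+```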
+
+## Joins
+
+- Prefer `union all` to `union` unless you explicitly want to remove duplicates.
+- Avoid table aliases in join conditions (especially initialisms): it's harder to understand what a table called "c" contains than one called "customers".
+- If joining two or more tables, _always_ prefix your column names with the table alias. If only selecting from one table, prefixes are not needed.
+- Be explicit about your join type (i.e. write `inner join` instead of `join`).
+- Always move left to right to make joins easy to reason about - `right joins` often indicate that you should change which table you select `from` and which one you `join` to.
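+
+The join rules above could be sketched as follows. The `customers` and `orders` models and their columns are assumptions for illustration only. Note the explicit `left join`, the full table names in place of initialisms, and the table prefix on every selected column.
+
+```sql
+with
+
+customers as (
+
+    select
+        customer_id,
+        customer_name
+
+    from {{ ref('customers') }}
+
+),
+
+orders as (
+
+    select
+        order_id,
+        customer_id,
+        order_total
+
+    from {{ ref('orders') }}
+
+),
+
+customers_joined_to_orders as (
+
+    select
+        customers.customer_id,
+        customers.customer_name,
+        orders.order_id,
+        orders.order_total
+
+    from customers
+
+    left join orders
+        on customers.customer_id = orders.customer_id
+
+)
+
+select * from customers_joined_to_orders
+```
+
+If this had been written as `orders right join customers`, swapping the two tables and using a `left join` gives the same result while keeping the left-to-right reading order.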
+
+## 'Import' CTEs
+
+- All `{{ ref('...') }}` statements should be placed in CTEs at the top of the file.
+- 'Import' CTEs should be named after the table they are referencing.
+- Limit the data scanned by CTEs as much as possible. Only select the columns you're actually using and use `where` clauses to filter out unneeded data.
+- For example:
+
+```sql
+with
+
+orders as (
+
+    select
+        order_id,
+        customer_id,
+        order_total,
+        order_date
+
+    from {{ ref('orders') }}
+
+    where order_date >= '2020-01-01'
+
+)
+```
+
+## 'Functional' CTEs
+
+- Where performance permits, CTEs should perform a single, logical unit of work.
+- CTE names should be as verbose as needed to convey what they do e.g. `events_joined_to_users` instead of `user_events` (this could be a good model name, but does not describe a specific function or transformation).
+- CTEs that are duplicated across models should be pulled out into their own intermediate models. Look out for chunks of repeated logic that should be refactored into their own model.
+
+## Model configuration
+
+- Model-specific attributes (like sort/dist keys) should be specified in the model.
+- If a particular configuration applies to all models in a directory, it should be specified in the `dbt_project.yml` file.
+
+- In-model configurations should be specified like this:
+
+```sql
+{{
+    config(
+        materialized = 'table',
+        sort = 'id',
+        dist = 'id'
+    )
+}}
+```
+
+## Example SQL
+
+```sql
+with
+
+events as (
+
+    ...
+
+),
+
+{# CTE comments go here #}
+filtered_events as (
+
+    ...
+
+)
+
+select * from filtered_events
+```
+
+### Example SQL
+
+```sql
+with
+
+my_data as (
+
+    select
+        id,
+        field_1,
+        field_2,
+        field_3,
+        cancellation_date,
+        expiration_date,
+        start_date
+
+    from {{ ref('my_data') }}
+
+),
+
+some_cte as (
+
+    select
+        id,
+        field_4,
+        field_5
+
+    from {{ ref('some_cte') }}
+
+),
+
+some_cte_agg as (
+
+    select
+        id,
+        sum(field_4) as total_field_4,
+        max(field_5) as max_field_5
+
+    from some_cte
+
+    group by 1
+
+),
+
+joined as (
+
+    select
+        my_data.field_1,
+        my_data.field_2,
+        my_data.field_3,
+
+        -- use line breaks to visually separate calculations into blocks
+        case
+            when my_data.cancellation_date is null
+                and my_data.expiration_date is not null
+                then my_data.expiration_date
+            when my_data.cancellation_date is null
+                then my_data.start_date + 7
+            else my_data.cancellation_date
+        end as cancellation_date,
+
+        some_cte_agg.total_field_4,
+        some_cte_agg.max_field_5
+
+    from my_data
+
+    left join some_cte_agg
+        on my_data.id = some_cte_agg.id
+
+    where my_data.field_1 = 'abc' and
+        (
+            my_data.field_2 = 'def' or
+            my_data.field_2 = 'ghi'
+        )
+
+)
+
+select * from joined
+```
@@ -0,0 +1,12 @@
+---
+title: How we style our Python
+id: 3-how-we-style-our-python
+---
+
+## Python tooling
+
+- Python has a more mature and robust ecosystem for formatting and linting (helped by the fact that it doesn't have a million distinct dialects), so we recommend using those tools to format and lint your code in the style you prefer.
+
+- Our current recommendations are:
+  - [black](https://pypi.org/project/black/) formatter
+  - [ruff](https://pypi.org/project/ruff/) linter
@@ -0,0 +1,38 @@
+---
+title: How we style our Jinja
+id: 4-how-we-style-our-jinja
+---
+
+## Jinja style guide
+
+- When using Jinja delimiters, use spaces on the inside of your delimiter, like `{{ this }}` instead of `{{this}}`.
+- Use newlines to visually indicate logical blocks of Jinja.
+- Indent 4 spaces into a Jinja block to indicate visually that the code inside is wrapped by that block.
+- Don't worry about Jinja whitespace control; focus on your project code being readable. The time you save by not worrying about whitespace control will far outweigh the time you spend reading compiled code where the whitespace might not be perfect.
+
+## Examples of Jinja style
+
+```jinja
+{% macro make_cool(uncool_id) %}
+
+    do_cool_thing({{ uncool_id }})
+
+{% endmacro %}
+```
+
+```sql
+select
+    entity_id,
+    entity_type,
+    {% if this %}
+
+        {{ that }},
+
+    {% else %}
+
+        {{ the_other_thing }},
+
+    {% endif %}
+    {{ make_cool('uncool_id') }} as cool_id
+```
@@ -0,0 +1,38 @@
+---
+title: How we style our YAML
+id: 5-how-we-style-our-yaml
+---
+
+- Indents should be two spaces.
+- List items should be indented.
+- Use a new line to separate list items that are dictionaries where appropriate.
+- Lines of YAML should be no longer than 80 characters.
+- Use the [dbt JSON schema](https://github.com/dbt-labs/dbt-jsonschema) with any compatible IDE and a YAML formatter (we recommend [Prettier](https://prettier.io/)) to validate your YAML files and format them automatically.
+
+### Example YAML
+
+```yaml
+version: 2
+
+models:
+  - name: events
+    columns:
+      - name: event_id
+        description: This is a unique identifier for the event
+        tests:
+          - unique
+          - not_null
+
+      - name: event_time
+        description: "When the event occurred in UTC (e.g. 2018-01-01 12:00:00)"
+        tests:
+          - not_null
+
+      - name: user_id
+        description: The ID of the user who recorded the event
+        tests:
+          - not_null
+          - relationships:
+              to: ref('users')
+              field: id
+```
@@ -282,7 +282,10 @@ const sidebarSettings = {
           type: "category",
           label: "Model governance",
           collapsed: true,
-          link: { type: "doc", id: "docs/collaborate/govern/about-model-governance" },
+          link: {
+            type: "doc",
+            id: "docs/collaborate/govern/about-model-governance",
+          },
           items: [
             "docs/collaborate/govern/model-access",
             "docs/collaborate/govern/model-contracts",
@@ -735,6 +738,21 @@ const sidebarSettings = {
             "guides/best-practices/how-we-structure/5-the-rest-of-the-project",
           ],
         },
+        {
+          type: "category",
+          label: "How we style our dbt projects",
+          link: {
+            type: "doc",
+            id: "guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects",
+          },
+          items: [
+            "guides/best-practices/how-we-style/1-how-we-style-our-dbt-models",
+            "guides/best-practices/how-we-style/2-how-we-style-our-sql",
+            "guides/best-practices/how-we-style/3-how-we-style-our-python",
+            "guides/best-practices/how-we-style/4-how-we-style-our-jinja",
+            "guides/best-practices/how-we-style/5-how-we-style-our-yaml",
+          ],
+        },
         {
           type: "category",
           label: "Materializations best practices",
@@ -802,7 +820,7 @@ const sidebarSettings = {
             "guides/orchestration/custom-cicd-pipelines/2-lint-on-push",
             "guides/orchestration/custom-cicd-pipelines/3-dbt-cloud-job-on-merge",
             "guides/orchestration/custom-cicd-pipelines/4-dbt-cloud-job-on-pr",
-            "guides/orchestration/custom-cicd-pipelines/5-something-to-consider",  
+            "guides/orchestration/custom-cicd-pipelines/5-something-to-consider",
           ],
         },
         {
@@ -873,7 +891,7 @@ const sidebarSettings = {
               ],
             },
             "guides/migration/tools/migrating-from-spark-to-databricks",
-            "guides/migration/tools/refactoring-legacy-sql"
+            "guides/migration/tools/refactoring-legacy-sql",
           ],
         },
       ],
@@ -949,7 +967,8 @@ const sidebarSettings = {
     {
       type: "category",
       label: "Advanced",
-      items: ["guides/advanced/creating-new-materializations",
+      items: [
+        "guides/advanced/creating-new-materializations",
         "guides/advanced/using-jinja",
       ],
     },
@@ -1154,4 +1173,4 @@ const sidebarSettings = {
   ],
 };
 
-module.exports = sidebarSettings;
+module.exports = sidebarSettings;