Add style guide #3300

Merged
merged 29 commits into from
Jul 5, 2023
Changes from 15 commits
Commits
4eb9959
add style guide to guides section
Jul 7, 2022
7c338b2
update sidebars.js
Jul 7, 2022
e457ad8
Break up structure for consistency
Jul 8, 2022
9bcc57f
add style guide to guides section
Jul 7, 2022
50566d1
update sidebars.js
Jul 7, 2022
856cddc
Break up structure for consistency
Jul 8, 2022
0373fde
Finish draft of guide
gwenwindflower Apr 27, 2023
037e6cd
Add style guide files to sidebar
gwenwindflower Apr 27, 2023
567549b
Merge branch 'add-style-guide' of github.com:dbt-labs/docs.getdbt.com…
gwenwindflower Apr 27, 2023
390be8b
Remove extra moved files from remote
gwenwindflower Apr 27, 2023
8cd5c8f
Remove extra files from sidebar
gwenwindflower Apr 27, 2023
f14975d
Try to fix stupid sidebar
gwenwindflower Apr 27, 2023
ce17bdc
Write intro
gwenwindflower Apr 27, 2023
d206648
Merge branch 'current' into add-style-guide
gwenwindflower Apr 28, 2023
a5c6148
Add rec on model version naming
gwenwindflower May 1, 2023
9eb5f76
Incorporate Wasila's review
gwenwindflower Jun 23, 2023
6738ae7
Expand model and python examples
gwenwindflower Jun 23, 2023
92e4805
Merge branch 'current' into add-style-guide
gwenwindflower Jun 23, 2023
94cc15d
Add conclusion
gwenwindflower Jun 23, 2023
82083b0
Merge branch 'add-style-guide' of github.com:dbt-labs/docs.getdbt.com…
gwenwindflower Jun 23, 2023
5e37162
Add conclusion to sidebar nav
gwenwindflower Jun 23, 2023
f02be1b
Update 0-how-we-style-our-dbt-projects.md
matthewshaver Jun 23, 2023
e45a49d
Apply suggestions from Matt's code review
gwenwindflower Jun 27, 2023
8445df8
Apply suggestions from Joel's code review
gwenwindflower Jun 27, 2023
091e906
Fix link to Prettier
gwenwindflower Jun 27, 2023
d6d1d50
Link to dbt-checkpoint project
gwenwindflower Jun 27, 2023
5649800
Link to code review article
gwenwindflower Jun 27, 2023
4ddead7
Add dbt Cloud formatting callouts
gwenwindflower Jun 29, 2023
7ffff15
Merge branch 'current' into add-style-guide
gwenwindflower Jul 5, 2023
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
---
title: How we style our dbt projects
id: 0-how-we-style-our-dbt-projects
---

## Why does style matter?

Style might seem like a trivial, surface-level issue, but it's a deeply material aspect of a well-built project. Consistent, clear style enhances readability, which in turn makes your project easier to understand and maintain. Highly readable code helps you build clear mental models of your project, which makes it easier to debug and extend. It's not just a favor to yourself, though. Equally importantly, it makes it easier for others to understand and contribute to your project, which is essential for peer collaboration, open source work, and onboarding new team members.

## What's important about style?

There are two crucial tenets of code style:

- Clarity
- Consistency

Style your code in such a way that you can read it quickly and understand it. It's also important to consider code review and git diffs. If you're making a change to a model, you want reviewers to be able to clearly see just the material changes you're making.

Once you've established a clear style, stay consistent. This is the most important thing. Everybody on your team needs to have a unified style, which is why having a style guide is so crucial. If you're writing a model, you should be able to look at other models in the project that your teammates have written and read the same style. If you're writing a macro or a test, you should see the same style as your models. Consistency is key.

## How should I style?

You should style the project in the way you and your teammates or collaborators agree on. The most important thing is that you have a style guide and that you stick to it. This guide is just a suggestion to get you started and to give you a sense of what a style guide might look like. It covers various areas you may want to consider, with suggested rules. It emphasizes lots of whitespace, clarity, clear naming, and comments. We believe one of the strengths of SQL is that it reads like English, so we lean into that declarative nature throughout our projects. Even within dbt Labs, though, there are differing opinions on how to style, including a small but passionate contingent of leading comma enthusiasts! So again, the important thing is not to follow this style guide; it's to make _your_ style guide and follow it.

## Automation

Use formatters and linters as much as possible. We're all human, and we all make mistakes. Not only that, but we all have different preferences and opinions while writing code. Automation is a great way to ensure that your project is styled consistently and correctly and that people can write in a way that's quick and comfortable for them, while still getting perfectly consistent output.
@@ -0,0 +1,17 @@
---
title: How we style our dbt models
id: 1-how-we-style-our-dbt-models
---

## Fields and model names

- Each model should have a primary key.
- The primary key of a model should be named `<object>_id`, e.g. `account_id` – this makes it easier to know what `id` is being referenced in downstream joined models.
- Booleans should be prefixed with `is_` or `has_`.
- Timestamp columns should be named `<event>_at`, e.g. `created_at`, and should be in UTC. If a different timezone is being used, this should be indicated with a suffix, e.g. `created_at_pt`.
- Price/revenue fields should be in decimal currency (e.g. `19.99` for $19.99; many app databases store prices as integers in cents). If non-decimal currency is used, indicate this with a suffix, e.g. `price_in_cents`.
- Avoid reserved words as column names.
- Consistency is key! Use the same field names across models where possible, e.g. a key to the `customers` table should be named `customer_id` rather than `user_id` or `id`.
- Schema, table and column names should be in `snake_case`.
- Use names based on the _business_ terminology, rather than the source terminology. For example, if the source database uses `user_id` but the business calls them `customer_id`, use `customer_id` in the model.
- Versions of models should use the suffix `_v1`, `_v2`, etc for consistency, e.g. `customers_v1`, `customers_v2`.
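
As a rough illustration of how these naming rules might look together, here is a sketch of a staging-style model. The `raw_app.accounts` source and every column name in it are hypothetical, chosen only to show the conventions, and the division assumes the source stores prices as integer cents.

```sql
with

source_accounts as (

    -- hypothetical source table; column names are illustrative only
    select
        id,
        user_id,
        active,
        trial_used,
        monthly_price_cents,
        created

    from {{ source('raw_app', 'accounts') }}

),

renamed as (

    select
        -- primary key named <object>_id
        id as account_id,

        -- business term (customer) rather than source term (user)
        user_id as customer_id,

        -- booleans prefixed with is_ or has_
        active as is_active,
        trial_used as has_used_trial,

        -- decimal currency, assuming the source stores integer cents
        monthly_price_cents / 100.0 as monthly_price,

        -- timestamps named <event>_at, assumed to already be in UTC
        created as created_at

    from source_accounts

)

select * from renamed
```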
@@ -0,0 +1,179 @@
---
title: How we style our SQL
id: 2-how-we-style-our-sql
---

## Basics

- Use [SQLFluff](https://sqlfluff.com/) to maintain these style rules automatically.
- Reference this [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for the rules we use.
- Use Jinja comments (`{# #}`) for comments that should not be included in the compiled SQL.
- Use trailing commas.
- Indents should be four spaces.
- Lines of SQL should be no longer than 80 characters.
- Field names, keywords, and function names should all be lowercase.
- The `as` keyword should be used explicitly when aliasing a field or table.

## Fields, aggregations, and grouping

- Fields should be stated before aggregates and window functions.
- Aggregations should be executed as early as possible (on the smallest data set possible) before joining to another table to improve performance.
- Ordering and grouping by a number (e.g. `group by 1, 2`) is preferred over listing the column names (see [this classic rant](https://blog.getdbt.com/write-better-sql-a-defense-of-group-by-1/) for why). Note that if you are grouping by more than a few columns, it may be worth revisiting your model design.
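
A minimal sketch of these three rules, reusing the `orders` model from the import CTE example in this guide. The output column names (`order_count`, `lifetime_order_total`, `first_order_date`) are made up for illustration.

```sql
with

orders as (

    select
        customer_id,
        order_total,
        order_date

    from {{ ref('orders') }}

),

customer_orders_summarized as (

    select
        -- the plain field comes first, then the aggregates
        customer_id,
        count(*) as order_count,
        sum(order_total) as lifetime_order_total,
        min(order_date) as first_order_date

    from orders

    -- group by position rather than repeating the column name
    group by 1

)

select * from customer_orders_summarized
```

Aggregating here, before any join, means a downstream join only has to handle one row per customer.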

## Joins

- Prefer `union all` to `union` unless you explicitly want to remove duplicates.
- Avoid table aliases in join conditions (especially initialisms): it's harder to understand what a table called "c" is than one called "customers".
- If joining two or more tables, _always_ prefix your column names with the table name. If only selecting from one table, prefixes are not needed.
- Be explicit about your join type (i.e. write `inner join` instead of `join`).
- Always move left to right to make joins easy to reason about: a `right join` often indicates that you should change which table you select `from` and which one you `join` to.
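
To make the join rules concrete, here is a small sketch that keeps every customer and attaches their orders. The `customers` and `orders` models are the ones mentioned elsewhere in this guide; `customer_name` is a hypothetical column added for illustration.

```sql
with

customers as (

    select
        customer_id,
        customer_name

    from {{ ref('customers') }}

),

orders as (

    select
        order_id,
        customer_id,
        order_total

    from {{ ref('orders') }}

),

customers_joined_to_orders as (

    select
        -- two tables are involved, so every column carries its table name
        customers.customer_id,
        customers.customer_name,
        orders.order_id,
        orders.order_total

    -- start from the table whose rows must all be kept, then left join,
    -- rather than writing `from orders right join customers`
    from customers

    left join orders
        on customers.customer_id = orders.customer_id

)

select * from customers_joined_to_orders
```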

## 'Import' CTEs

- All `{{ ref('...') }}` statements should be placed in CTEs at the top of the file.
- 'Import' CTEs should be named after the table they are referencing.
- Limit the data scanned by CTEs as much as possible. Only select the columns you're actually using and use `where` clauses to filter out unneeded data.
- For example:

```sql
with

orders as (

    select
        order_id,
        customer_id,
        order_total,
        order_date

    from {{ ref('orders') }}

    where order_date >= '2020-01-01'

)
```

## 'Functional' CTEs

- Where performance permits, CTEs should perform a single, logical unit of work.
- CTE names should be as verbose as needed to convey what they do e.g. `events_joined_to_users` instead of `user_events` (this could be a good model name, but does not describe a specific function or transformation).
- CTEs that are duplicated across models should be pulled out into their own intermediate models. Look out for chunks of repeated logic that should be refactored into their own model.
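
One way this could look in practice is sketched below, with each functional CTE doing one job and carrying a name that says what it does. The `events` and `users` models come from the YAML example in this guide; `user_name` and the cutoff date are assumptions made for illustration.

```sql
with

events as (

    select
        event_id,
        user_id,
        event_time

    from {{ ref('events') }}

),

users as (

    select
        id as user_id,
        user_name

    from {{ ref('users') }}

),

{# one unit of work: keep only events on or after a cutoff date #}
events_filtered_to_recent as (

    select
        event_id,
        user_id,
        event_time

    from events

    where event_time >= '2023-01-01'

),

{# one unit of work: attach user attributes to each remaining event #}
recent_events_joined_to_users as (

    select
        events_filtered_to_recent.event_id,
        events_filtered_to_recent.event_time,
        users.user_id,
        users.user_name

    from events_filtered_to_recent

    left join users
        on events_filtered_to_recent.user_id = users.user_id

)

select * from recent_events_joined_to_users
```

If `events_filtered_to_recent` started showing up in several models, that would be a hint to pull it into its own intermediate model.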

## Model configuration

- Model-specific attributes (like sort/dist keys) should be specified in the model.
- If a particular configuration applies to all models in a directory, it should be specified in the `dbt_project.yml` file.

- In-model configurations should be specified like this:

```sql
{{
    config(
        materialized = 'table',
        sort = 'id',
        dist = 'id'
    )
}}
```

## Example SQL

```sql
with

events as (

    ...

),

{# CTE comments go here #}
filtered_events as (

    ...

)

select * from filtered_events
```

### Example SQL

```sql
with

my_data as (

    select
        id,
        field_1,
        field_2,
        field_3,
        cancellation_date,
        expiration_date,
        start_date

    from {{ ref('my_data') }}

),

some_cte as (

    select
        id,
        field_4,
        field_5

    from {{ ref('some_cte') }}

),

some_cte_agg as (

    select
        id,
        sum(field_4) as total_field_4,
        max(field_5) as max_field_5

    from some_cte

    group by 1

),

joined as (

    select
        my_data.field_1,
        my_data.field_2,
        my_data.field_3,

        -- use line breaks to visually separate calculations into blocks
        case
            when my_data.cancellation_date is null
                and my_data.expiration_date is not null
                then my_data.expiration_date
            when my_data.cancellation_date is null
                then my_data.start_date + 7
            else my_data.cancellation_date
        end as cancellation_date,

        some_cte_agg.total_field_4,
        some_cte_agg.max_field_5

    from my_data

    left join some_cte_agg
        on my_data.id = some_cte_agg.id

    where my_data.field_1 = 'abc' and
        (
            my_data.field_2 = 'def' or
            my_data.field_2 = 'ghi'
        )

)

select * from joined
```
@@ -0,0 +1,12 @@
---
title: How we style our python

Contributor
Is this lowercase python a specific choice? everywhere else it's capitalised but as the lowercase-your-proper-nouns company, I'm loath to immediately label this a typo

Contributor Author
good q -- @matthewshaver what's our official style stance here do we capitalize python?

id: 3-how-we-style-our-python
---

## Python tooling

- Python has a more mature and robust ecosystem for formatting and linting (helped by the fact that it doesn't have a million distinct dialects), so we recommend using those tools to format and lint your code in the style you prefer.

- Our current recommendations are:
    - [black](https://pypi.org/project/black/) formatter
    - [ruff](https://pypi.org/project/ruff/) linter
@@ -0,0 +1,38 @@
---
title: How we style our Jinja
id: 4-how-we-style-our-jinja
---

## Jinja style guide

- When using Jinja delimiters, use spaces on the inside of your delimiter, like `{{ this }}` instead of `{{this}}`.
- Use newlines to visually indicate logical blocks of Jinja.
- Indent 4 spaces into a Jinja block to indicate visually that the code inside is wrapped by that block.
- Don't worry about Jinja whitespace control; focus on your project code being readable. The time you save by not worrying about whitespace control will far outweigh the time you spend in your compiled code where it might not be perfect.

## Examples of Jinja style

```jinja
{% macro make_cool(uncool_id) %}

    do_cool_thing({{ uncool_id }})

{% endmacro %}
```

```sql
select
    entity_id,
    entity_type,
    {% if this %}

        {{ that }},

    {% else %}

        {{ the_other_thing }},

    {% endif %}
    {{ make_cool('uncool_id') }} as cool_id
```
@@ -0,0 +1,38 @@
---
title: How we style our YAML
id: 5-how-we-style-our-yaml
---

- Indents should be two spaces.
- List items should be indented.
- Use a new line to separate list items that are dictionaries where appropriate.
- Lines of YAML should be no longer than 80 characters.
- Use the [dbt JSON schema](https://github.com/dbt-labs/dbt-jsonschema) with any compatible IDE and a YAML formatter (we recommend [Prettier](https://prettier.io/)) to validate your YAML files and format them automatically.

### Example YAML

```yaml
version: 2

models:
  - name: events
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null

      - name: event_time
        description: "When the event occurred in UTC (eg. 2018-01-01 12:00:00)"
        tests:
          - not_null

      - name: user_id
        description: The ID of the user who recorded the event
        tests:
          - not_null
          - relationships:
              to: ref('users')
              field: id
```
29 changes: 24 additions & 5 deletions website/sidebars.js
@@ -282,7 +282,10 @@ const sidebarSettings = {
       type: "category",
       label: "Model governance",
       collapsed: true,
-      link: { type: "doc", id: "docs/collaborate/govern/about-model-governance" },
+      link: {
+        type: "doc",
+        id: "docs/collaborate/govern/about-model-governance",
+      },
       items: [
         "docs/collaborate/govern/model-access",
         "docs/collaborate/govern/model-contracts",
@@ -735,6 +738,21 @@ const sidebarSettings = {
         "guides/best-practices/how-we-structure/5-the-rest-of-the-project",
       ],
     },
+    {
+      type: "category",
+      label: "How we style our dbt projects",
+      link: {
+        type: "doc",
+        id: "guides/best-practices/how-we-style/0-how-we-style-our-dbt-projects",
+      },
+      items: [
+        "guides/best-practices/how-we-style/1-how-we-style-our-dbt-models",
+        "guides/best-practices/how-we-style/2-how-we-style-our-sql",
+        "guides/best-practices/how-we-style/3-how-we-style-our-python",
+        "guides/best-practices/how-we-style/4-how-we-style-our-jinja",
+        "guides/best-practices/how-we-style/5-how-we-style-our-yaml",
+      ],
+    },
     {
       type: "category",
       label: "Materializations best practices",
@@ -802,7 +820,7 @@ const sidebarSettings = {
         "guides/orchestration/custom-cicd-pipelines/2-lint-on-push",
         "guides/orchestration/custom-cicd-pipelines/3-dbt-cloud-job-on-merge",
         "guides/orchestration/custom-cicd-pipelines/4-dbt-cloud-job-on-pr",
-        "guides/orchestration/custom-cicd-pipelines/5-something-to-consider",
+        "guides/orchestration/custom-cicd-pipelines/5-something-to-consider",
       ],
     },
     {
@@ -873,7 +891,7 @@ const sidebarSettings = {
           ],
         },
         "guides/migration/tools/migrating-from-spark-to-databricks",
-        "guides/migration/tools/refactoring-legacy-sql"
+        "guides/migration/tools/refactoring-legacy-sql",
       ],
     },
   ],
@@ -949,7 +967,8 @@ const sidebarSettings = {
     {
       type: "category",
       label: "Advanced",
-      items: ["guides/advanced/creating-new-materializations",
+      items: [
+        "guides/advanced/creating-new-materializations",
         "guides/advanced/using-jinja",
       ],
     },
@@ -1154,4 +1173,4 @@ const sidebarSettings = {
   ],
 };
 
-module.exports = sidebarSettings;
+module.exports = sidebarSettings;