Optimize primarily for readability, maintainability, and robustness rather than for fewer lines of code.
Newlines are cheap; people's time is expensive.
If a select
statement is so large it can't be easily comprehended, it would be better to refactor it into multiple smaller CTEs that are later joined back together.
Very long lines are harder to read, especially in situations where space may be limited like on smaller screens or in side-by-side version control diffs.
It's more readable, easier to keep consistent, and avoids having to quote identifiers due to capitalization, spaces, or other special characters.
Otherwise the identifier will have to be quoted everywhere it's used.
It's easier to keep things consistent in version control when only space characters are used.
Lowercase is more readable than uppercase, and you won't have to constantly be holding down a shift key.
/* Good */
select *
from customers
/* Good */
select count(*) as customers_count
from customers
/* Bad */
SELECT *
FROM customers
/* Bad */
Select *
From customers
/* Bad */
select COUNT(*) as customers_count
from customers
!=
reads like "not equal" which is closer to how we'd say it out loud.
||
is a standard SQL operator, and in some databases like Redshift concat
only accepts two arguments.
coalesce
is universally supported, whereas Redshift doesn't supportifnull
and BigQuery doesn't supportnvl
.coalesce
is more flexible and accepts an arbitrary number of arguments.
isnull
and notnull
are specific to Redshift.
case
statements are universally supported, whereas Redshift doesn't support iff
, and in BigQuery the function is named if
instead of iff
.
/* Good */
select count(*) as customers_count
from customers
/* Bad */
select count(*) customers_count
from customers
/* Good */
select max(id) as max_customer_id
from customers
/* Bad */
select max(id)
from customers
Queries filter on the where
clause earlier in their processing, so where
filters are more performant.
union all
is more performant because it doesn't have to sort and de-duplicate the rows.
This makes the intention clear.
/* Good */
select distinct
customer_id
, date_trunc('day', created_at) as purchase_date
from orders
/* Bad */
select
customer_id
, date_trunc('day', created_at) as purchase_date
from orders
group by 1, 2
There's no need to incur the performance hit. If consumers of the query need the results ordered they can normally do that themselves.
For functions that take date part parameters, specify the date part as a string rather than a keyword.
- While some advanced SQL editors can helpfully auto-complete and validate date part keywords, if they get it wrong they'll show superfluous errors.
- Less advanced SQL editors won't syntax highlight date part keywords, so using strings helps them stand out.
- Using a string makes it unambiguous that it's not a column reference.
/* Good */
date_trunc('month', created_at)
/* Bad */
date_trunc(month, created_at)
This allows single-line comments to naturally expand into multi-line comments without having to change their syntax.
When expanding a comment into multiple lines:
- Keep the opening
/*
on the same line as the first comment text and the closing*/
on the same line as the last comment text. - Indent subsequent lines by 4 spaces, and add an extra space before the first comment text to align it with the text on subsequent lines.
/* Good */
-- Bad
/* Good: Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Dolor sed viverra ipsum nunc aliquet bibendum enim. */
/* Bad: Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Dolor sed viverra ipsum nunc aliquet bibendum enim. */
-- Bad: Lorem ipsum dolor sit amet, consectetur adipiscing elit,
-- sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-- Dolor sed viverra ipsum nunc aliquet bibendum enim.
Some SQL dialects like BigQuery support using double quotes or even triple quotes for strings, but for most dialects:
- Double quoted strings represent identifiers.
- Triple quoted strings will be interpreted like the value itself contains leading and trailing single quotes.
/* Good */
select *
from customers
where email like '%@domain.com'
/* Bad */
select *
from customers
where email like "%@domain.com"
/* Will probably result in an error like `column "%@domain.com" does not exist`. */
/* Bad */
select *
from customers
where email like '''%@domain.com'''
/* Will probably be interpreted like '\'%domain.com\''. */
- Having all joins use
on
is more consistent. - If additional join conditions need to be added later,
on
is easier to adapt. using
can produce inconsistent results with outer joins in some databases.
It's better to be explicit so that the join type is crystal clear.
/* Good */
select *
from customers
inner join orders on customers.id = orders.customer_id
/* Bad */
select *
from customers
join orders on customers.id = orders.customer_id
This makes it easier to determine if the join is going to cause the results to fan out.
/* Good */
select *
from customers
left join orders on customers.id = orders.customer_id
/* primary key = foreign key --> one-to-many --> fan out */
/* Good */
select *
from orders
left join customers on orders.customer_id = customers.id
/* foreign key = primary key --> many-to-one --> no fan out */
/* Bad */
select *
from customers
left join orders on orders.customer_id = customers.id
You should be able to tell at a glance where a column is coming from.
/* Good */
select
customers.email
, orders.invoice_number
, orders.total_amount
from customers
inner join orders on customers.id = orders.customer_id
/* Bad */
select
email
, invoice_number
, total_amount
from customers
inner join orders on customers.id = orders.customer_id
Only join conditions should be put in a join
clause. All filter conditions should be put together in the where
clause.
/* Good */
select
...
from orders
inner join customers on orders.customer_id = customers.id
where
orders.total_amount >= 100
and customers.email like '%@domain.com'
/* Bad */
select
...
from orders
inner join customers on
orders.customer_id = customers.id
and customers.email like '%@domain.com'
where orders.total_amount >= 100
- Where performance permits, CTEs should perform a single, logical unit of work.
- CTE names should be as verbose as needed to convey what they do.
- CTE names should not be prefixed or suffixed with
cte
. - CTEs with confusing or notable logic should be commented.
CTEs will make your queries more straightforward to read/reason about, can be referenced multiple times, and are easier to adapt/refactor later.
/* Good */
with
paying_customers as (
select *
from customers
where plan_name != 'free'
)
select ...
from paying_customers
/* Bad */
select ...
from (
select *
from customers
where plan_name != 'free'
) as paying_customers
This allows us to easily tell at a glance whether a column is a primary key, helps us discern whether joins are one-to-many or many-to-one, and is more succinct than other primary key naming conventions (particularly in join conditions).
/* Good */
select ...
from orders
left join customers on orders.customer_id = customers.id
/* Easier to tell this is a many-to-one join and thus won't fan out. */
/* Bad */
select ...
from orders
left join customers on orders.customer_id = customers.customer_id
/* Good */
select ...
from orders
left join some_exceedingly_long_name on orders.some_exceedingly_long_name_id = some_exceedingly_long_name.id
/* Bad */
select ...
from orders
left join some_exceedingly_long_name on orders.some_exceedingly_long_name_id = some_exceedingly_long_name.some_exceedingly_long_name_id
- Date columns based on UTC should be named like
<event>_date
. - Date columns based on a specific timezone should be named like
<event>_date_<timezone indicator>
(e.g.order_date_et
). - Date+time columns based on UTC should be named like
<event>_at
. - Date+time columns based on a specific timezone should be named like
<event>_at_<timezone indicator>
(e.gcreated_at_pt
). - US timezone indicators:
et
= Eastern Time.ct
= Central Time.mt
= Mountain Time.pt
= Pacific Time.
- Boolean columns should be prefixed with a present or past tense third-person singular verb, such as:
is_
orwas_
.has_
orhad_
.does_
ordid_
.
Some examples:
price_usd
weight_oz
weight_kg
weight_grams
Suggested guidelines:
- If the table name consists of 3 words or less don't alias it.
- Use a subset of the words as the alias if it makes sense (e.g. if
partner_shipments_order_line_items
is the only line items table being referenced it could be reasonable to alias it as justline_items
).
/* Good */
select
customers.email
, orders.invoice_number
from customers
inner join orders on customers.id = orders.customer_id
/* Bad */
select
c.email
, o.invoice_number
from customers as c
inner join orders as o on c.id = o.customer_id
An overarching pattern is:
- If there's only one thing, put it on the same line as the opening keyword.
- If there are multiple things, put each one on its own line (including the first one), indented one level more than the opening keyword.
This is easier to keep consistent, and is also easier to write.
/* Good */
select email
from customers
where email like '%@domain.com'
/* Bad */
select email
from customers
where email like '%@domain.com'
/* Good */
select
id
, email
from customers
where
email like '%@domain.com'
and plan_name != 'free'
/* Bad */
select
id
, email
from customers
where email like '%@domain.com'
and plan_name != 'free'
If code containing such operators needs to be split across multiple lines, put the operators at the beginning of the subsequent lines.
- You should be able to scan the left side of the query text to see the logic being used without having to read to the end of every line.
- The operator is only there for/because of what follows it. If nothing followed the operator it wouldn't be needed, so putting the operator on the same line as what follows it makes it clearer why it's there.
/* Bad */
select *
from customers
where
email like '%@domain.com' and
plan_name != 'free'
/* Good */
select *
from customers
where
email like '%@domain.com'
and plan_name != 'free'
If code containing commas needs to be split across multiple lines, put the commas at the beginning of the subsequent lines, followed by a space.
- This makes it easier to spot missing commas.
- Version control diffs will be cleaner when adding to the end of a list because you don't have to add a trailing comma to the preceding line.
- The comma is only there for/because of what follows it. If nothing followed the comma it wouldn't be needed, so putting the comma on the same line as what follows it makes it clearer why it's there.
/* Good */
with
customers as (
...
)
, paying_customers as (
...
)
select
id
, email
, date_trunc('month', created_at) as signup_month
from paying_customers
where email in (
'user-1@example.com'
, 'user-2@example.com'
, 'user-3@example.com'
)
/* Bad */
with
customers as (
...
),
paying_customers as (
...
)
select
id,
email,
date_trunc('month', created_at) as signup_month
from paying_customers
where email in (
'user-1@example.com',
'user-2@example.com',
'user-3@example.com'
)
- If there is only one column expression, put it on the same line as
select
. - If there are multiple column expressions, put each one on its own line (including the first one), indented one level more than
select
. - If there is a
distinct
qualifier, put it on the same line asselect
.
/* Good */
select id
/* Good */
select
id
, email
/* Bad */
select id, email
/* Bad */
select id
, email
/* Good */
select distinct country
/* Good */
select distinct
state
, country
/* Bad */
select distinct state, country
- Put the initial table being selected from on the same line as
from
. - If there are other tables being joined:
- Put each
join
on its own line, at the same indentation level asfrom
. - If there is only one join condition, put it on the same line as the
join
. - If there are multiple join conditions, end the
join
line withon
and put each condition on its own line (including the first one), indented one level more than thejoin
.
- Put each
/* Good */
from customers
/* Good */
from customers
left join orders on customers.id = orders.customer_id
/* Bad */
from customers
left join orders on customers.id = orders.customer_id
/* Bad */
from customers
left join orders
on customers.id = orders.customer_id
/* Good */
from customers
left join orders on
customers.id = orders.customer_id
and customers.region_id = orders.region_id
/* Bad */
from customers
left join orders on customers.id = orders.customer_id
and customers.region_id = orders.region_id
/* Bad */
from customers
left join orders
on customers.id = orders.customer_id
and customers.region_id = orders.region_id
- If there is only one condition, put it on the same line as
where
. - If there are multiple conditions, put each one on its own line (including the first one), indented one level more than
where
.
/* Good */
where email like '%@domain.com'
/* Good */
where
email like '%@domain.com'
and plan_name != 'free'
/* Bad */
where email like '%@domain.com' and plan_name != 'free'
/* Bad */
where email like '%@domain.com'
and plan_name != 'free'
- If grouping/ordering by column numbers, put all numbers on the same line as
group by
/order by
. - If grouping/ordering by column names/aliases:
- If there is only one column, put it on the same line as
group by
/order by
. - If there are multiple columns, put each on its own line (including the first one), indented one level more than
group by
/order by
.
- If there is only one column, put it on the same line as
/* Good */
group by 1, 2, 3
/* Bad */
group by
1
, 2
, 3
/* Good */
order by plan_name
/* Good */
order by
plan_name
, signup_month
/* Bad */
order by plan_name, signup_month
/* Bad */
order by plan_name
, signup_month
- Start each CTE on its own line, indented one level more than
with
(including the first one, and even if there is only one). - Use a single blank line around CTEs to add visual separation.
- Put any comments about the CTE within the CTE's parentheses, at the same indentation level as the
select
.
/* Good */
with
paying_customers as (
select ...
from customers
)
select ...
from paying_customers
/* Bad */
with paying_customers as (
select ...
from customers
)
select ...
from paying_customers
/* Good */
with
paying_customers as (
select ...
from customers
)
, paying_customers_per_month as (
/* CTE comments... */
select ...
from paying_customers
)
select ...
from paying_customers_per_month
/* Bad */
with paying_customers as (
select ...
from customers
)
/* CTE comments... */
, paying_customers_per_month as (
select ...
from paying_customers
)
select ...
from paying_customers_per_month
- You can put a
case
statement all on one line if it only has a singlewhen
clause and doesn't cause the line's length to be too long. - For multi-line
case
statements:when
:when
clauses should start on their own line, indented one level more than thecase
statement.- If a
when
clause has multiple conditions, keep the first condition on the same line aswhen
and put subsequent conditions on their own lines. - If a
when
clause has multiple lines, all its subsequent lines should be indented at least one level more thanwhen
.
then
:then
clauses can go on the same line as a single-linewhen
clause if it doesn't cause the line's length to be too long.- Otherwise,
then
clauses should go on their own line, indented one level more than its associatedwhen
. - If a
then
clause has multiple lines, all its subsequent lines should be indented at least one level more thanthen
.
else
:else
clauses should go on their own line, at the same indentation level as thewhen
clauses.- If an
else
clause has multiple lines, all its subsequent lines should be indented at least one level more thanelse
.
end
:end
should go on its own line, at the same indentation level ascase
.- If the
case
starts after a leading comma and space, alignend
withcase
by adding two extra spaces before it.
- If a multi-line
case
statement is within a function call,case
andend
should go on their own lines, indented one level more than the function call.
- If using
case <expression>
syntax, keep the expression on the same line ascase
.
/* Good */
select
case when customers.status_code = 1 then 'Active' else 'Inactive' end as customer_status
/* Bad */
select
case when customers.status_code = 1 and customers.deleted_at is null then 'Active' else 'Inactive' end as customer_status
/* Good */
select
case
when customers.status_code = 1 then 'Active'
else 'Inactive'
end as customer_status
, ...
/* Bad */
select
case when customers.status_code = 1 then 'Active'
else 'Inactive' end as customer_status
, ...
/* Good */
select
...
, case
when customers.status_code = 1
and customers.deleted_at is null
then 'Active'
else 'Inactive'
end as customer_status
/* Bad */
select
...
, case
when customers.status_code = 1 and customers.deleted_at is null
then 'Active'
else 'Inactive'
end as customer_status
/* Good */
select
...
, sum(
case
when customers.status_code = 1
and customers.deleted_at is null
then customers.lifetime_value
else 0
end
) as active_customers_lifetime_value
/* Bad */
select
...
, sum(case
when customers.status_code = 1 and customers.deleted_at is null then customers.lifetime_value
else 0
end) as active_customers_lifetime_value
- You can put a window function all on one line if it doesn't cause the line's length to be too long.
- If breaking a window function into multiple lines:
- Put each sub-clause within
over ()
on its own line, indented one level more than the window function:partition by
order by
rows between
/range between
- Put the closing
over ()
parenthesis on its own line at the same indentation level as the window function.
- Put each sub-clause within
/* Good */
select
customer_id
, invoice_number
, row_number() over (partition by customer_id order by created_at) as order_rank
from orders
/* Good */
select
customer_id
, invoice_number
, row_number() over (
partition by customer_id
order by created_at
) as order_rank
from orders
/* Bad */
select
customer_id
, invoice_number
, row_number() over (partition by customer_id
order by created_at) as order_rank
from orders
- Break long lists of
in
values into multiple indented lines with one value per line.
/* Good */
where email in (
'user-1@example.com'
, 'user-2@example.com'
, 'user-3@example.com'
)
/* Bad */
where email in ('user-1@example.com', 'user-2@example.com', 'user-3@example.com')
/* Bad */
select *
from customers
where plan_name in ( 'monthly', 'yearly' )
/* Good */
select *
from customers
where plan_name in ('monthly', 'yearly')
This style guide was inspired in part by: