Include required columns in exposures? #5102
Replies: 2 comments
-
@louis-vines Thanks for opening, and sorry for the delay in getting back to you. I'm definitely interested in hearing more about what you're thinking here. There was a good Slack thread about this a few months ago. I've been hesitant in the past to define specific columns in exposures, for fear that it can get quickly out of sync with the BI layer / downstream consumption and require a dbt project PR to update each time. This is also captured in #2835 (comment):
That being said, I see the argument in favor of having this as an option, when you're ready to lock in a set of columns that are definitely used downstream. One way to emulate this functionality today: Create a view/ephemeral model, paired with a one-off test, that selects the set of required columns. The appeal of using a model here is that you can That model also includes one more critical piece: the So when I think about something like
exposures:
- name: my_dashboard
depends_on:
- ref('orders')
- ref('customers')
required_columns:
- order_id
- customers.customer_id
- order_date
- order_amount -- test SQL
select order_id, customers.customer_id, order_date, order_amount
from (select * from {{ ref('orders') }} where 1=2 limit 0) orders
cross join (select * from {{ ref('customers') }} where 1=2 limit 0) customers
exposures:
- name: my_dashboard
depends_on:
- ref('orders')
- ref('customers'): # i dont love this
join: left
on: orders.order_id = ref('customers').order_id
required_columns:
- order_id
- customer_id
- order_date
- order_amount -- test SQL
select order_id, customers.customer_id, order_date, order_amount
from (select * from {{ ref('orders') }} where 1=2 limit 0) orders
left join (select * from {{ ref('customers') }} where 1=2 limit 0) customers
on orders.order_id = customers.order_id Very curious to hear what you're thinking! |
Beta Was this translation helpful? Give feedback.
-
Suggested syntax: exposures:
name: customer_orders
- model: ref('orders')
relationship:
- model: ref('customers')
type: left
join_on: ref('orders').order_id = ref('customers').order_id
- model: ref('product')
type: left
using: product_id
exposures:
# columns required in all exposures
required_columns:
- _ingested_at
- _curated_at
# columns required in ‘jaffle_shop’ exposure
# three required columns in total here
jaffle_shop:
required_columns:
- customer_id |
Beta Was this translation helpful? Give feedback.
-
Describe the feature
I'm not sure what other people think of this but I think it would be nice feature to be able define required columns in
exposures
? This would mean that the exposure definitions would be similar to API definitions and thus we could make our DBT runs fail if we remove a column which is required by an exposure. Feeback on this idea welcome...Describe alternatives you've considered
I haven't thought about alternatives but I guess any other way where we can ensure that certain columns aren't getting removed from models as we are certain they are required by downstream systems.
Additional context
NA
Who will this benefit?
This will improve robustness of interfacing DBT models with downstream systems.
Are you interested in contributing this feature?
I would be happy to start looking at this but I'm unfamiliar with the DBT codebase at present so may need some help!
Beta Was this translation helpful? Give feedback.
All reactions