Feature: Cleaning up your DAG with hidden ref() #7733
danielwilczak101
started this conversation in
Ideas
Replies: 1 comment 1 reply
-
Hi! You can already achieve what you are after today by overwriting the default |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Feature request:
Add the capability to toggle if references are shown on the DAG.
Problem we're trying to solve:
There are multiple data modeling design patterns, of which the most common approaches involve denormalized and normalized tables. In a denormalized table, all of the relevant dimensional data (columns/attributes) are stored directly in the table. In a normalized table, only a reference key is stored in the fact table, which is used as a foreign key to other tables which store dimensional data. Regardless of which approach is used, it is generally considered best practice to do calculations using these keys where possible, because the keys contain all of the relevant information needed, tends to be more performant, etc.
In DBT however, nodes are connected based on references, and there isn't a good way to represent the fact that a key is equivalent to a dimensional model. For denormalized design patterns, this results in having all of the dimensional models connect to the final fact model when it finally extracts the dimensional data out of the key. This often shows up as a bunch of staging models reaching all the way across your DAG (Directed Acyclic Graph) to the final fact model, which makes the DAG unnecessarily confusing. An example of this can be seen above.
To simplify the DAG, the hidden ref macro was created, which allows DBT models to reference other models without having them show up on the DAG.
It's simple to implement once the macro is created:
Macro Code:
The standard schema approach:
For a standard database, schema, and table alias, the ref function can be replicated simply using "database.schema.table". This makes for an easy implementation of hidden ref without using the ref function:
Custom schema approach:
To use hidden ref, it manually implements what the normal ref macro does without calling it at all, since calling it triggers Python code to add it to the DAG and the model's dependencies (more on this later). We used a straightforward approach with brute forced yaml:
Usually only a few models will ever need to be added to this hardcoded yaml, since there are only so many dimensional models that are used, and in such a way that using hidden ref becomes beneficial.
Dev, test, prod approach:
Hiding references on the DAG is only useful when looking at the DAG in dev. By setting "target.name" for different environments, you can ensure certain environments use the original ref function instead of the custom hidden ref implementation. This ensures that even if a mistake is made with the hidden ref function in dev, your code should continue to work in test or production.
Dbt Recommendation:
A better approach would be to have this functionality built directly into the ref function so that users do not have to worry about adding macros or potentially missing a materialization by improper use.
Contributions:
Jack Nguyen - Co-Author of the article.
Michael Johnson (Dbt Labs)- For inspiring us to share the macro with the community.
Beta Was this translation helpful? Give feedback.
All reactions