
Support PK and FK declarations for ER diagram and modelling purposes #3295

Closed

JC-Lightfold opened this issue Apr 26, 2021 · 11 comments

Labels: enhancement (New feature or request), stale (Issues that have gone stale)


JC-Lightfold commented Apr 26, 2021

DBT needs to be able to play nice with data model diagrams!

If you point any diagramming tool at a schema that DBT produces, it won't be able to draw connections between tables, because those tools all rely on the CONSTRAINT DDL syntax, regardless of whether or not it's enforced (this is why Snowflake has the syntax but doesn't enforce it). Wouldn't it be great to simply be able to declare PK and FK constraints in the model YML, which seems the EXACT place to do it?!

You can currently attempt to declare the necessary DDL with post-hooks and macros, but it's a lot of work for each and every column and table.
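
For illustration, a minimal sketch of that workaround as an inline post-hook (the dim_customer model and the pk_dim_customer constraint name are hypothetical):

-- models/dim_customer.sql
{{ config(
    materialized = 'table',
    post_hook = "alter table {{ this }} add constraint pk_dim_customer primary key (customer_id)"
) }}

select 1 as customer_id

Repeating this for every column of every table is exactly the toil described above.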

Being able to diagram the structures created and maintained by DBT with more than just the lineage of the DAG generator is critical to meeting many enterprise governance requirements, since those teams are used to communicating logical models to stakeholders with ERDs.

Likewise, given the HUGE potential for DBT to basically become the "Universal Data Transformer" for MPP systems like Snowflake, it seems overly limited to expect that it will only ever define transient/ephemeral tables that are slices of a DAG on their way to a final analytics transformation. Don't we want to use DBT to build data vaults? Star schemas? 3NF models of any kind?

If the answer is yes, then people will expect to be able to understand the relationship model of the schemas DBT is persisting. That means diagrams using protocols like IDEF1X or crow's feet.

Recommended implementation patterns:

  1. One option is to simply annotate PK/FK status and linkages in the COMMENT field that all databases support (especially Snowflake). This will still require custom parsing solutions to convert into reliable diagrams, but it's a start

  2. I think the preferred implementation is to let the column declarations in the model YAML specify PK and FK CONSTRAINTs and pass those through to the compiled DDL (a hypothetical sketch follows this list)
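
For illustration only, option 2 might look something like this in the model YAML (the constraint property and its pass-through to DDL are invented here, not an existing dbt feature):

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        constraint: primary key   # hypothetical property, compiled into the DDL
      - name: customer_id
        constraint: foreign key references customers (customer_id)   # hypothetical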

This issue is raised specifically with regard to how it can support standard model diagrams using ER schema notation, but it connects well to the following issues:

#2936
#2191
#1570

JC-Lightfold added the enhancement and triage labels Apr 26, 2021
jtcohen6 removed the triage label May 6, 2021

jtcohen6 commented May 6, 2021

@JC-Lightfold Thanks so much for opening this issue, and for the really thoughtful comments in the original Slack thread.

You raise a really good question: even though table constraints don't do anything in most modern data warehouses (or worse, they're sometimes applied but not validated!), should dbt support table constraints as a way to serve older schema-diagramming software?

I do think the answer here could take one of two forms:

  1. "Strict" schema validation that includes a column list and constraints as part of the CTAS (create table as select) statement in databases that support it (Snowflake, BigQuery). This is the idea proposed in the issue you linked; models would fail if there's a mismatch between the columns specified in .yml and the columns produced by the query.
  2. Post hoc addition of constraints. This is totally doable today, with a little bit of custom work.
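
To make option 1 concrete, the compiled DDL could look roughly like this (a sketch only; exact CTAS-with-constraints syntax varies by warehouse):

create table my_model (
    column_a integer,
    primary key (column_a)
) as (
    select 1 as column_a
);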

> Wouldn't it be great to simply be able to declare PK and FK constraints in the model YML, which seems the EXACT place to do it?!

> You can currently attempt to declare the necessary DDL with post-hooks and macros, but it's a lot of work for each and every column and table.

I think you can have it both ways!

Consider an approach where I define a macro, macros/add_constraints.sql (using PostgreSQL):

{% macro add_constraints() %}
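
    {# Loop over the columns declared in schema.yml; for each column whose meta
       declares a constraint, build one "alter table ... add constraint" statement,
       then return the statements joined by semicolons #}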

    {% set constraint_sql_list = [] %}
    {% set column_dict = model.columns %}
    
    {% for column_name in column_dict %}
    
        {% set quoted_name = adapter.quote(column_name) if column_dict[column_name]['quote'] else column_name %}
        {% set constraint = column_dict[column_name]['meta']['constraint'] %}
        
        {% if constraint %}
        
            {% set constraint_name = 
                this.identifier + '_' + constraint|replace(' ','_') + '_' + column_name %}
        
            {% set constraint_sql %}
            alter table {{ this }} add constraint {{ constraint_name }}
                {{ constraint }} ({{ quoted_name }})
            {% endset %}
            
            {% do constraint_sql_list.append(constraint_sql) %}
            
        {% endif %}
        
    {% endfor %}
    
    {% do return(constraint_sql_list | join(';\n')) %}
        
{% endmacro %}

Then I have a model, models/my_model.sql:

{{ config(
    materialized = 'table',
    post_hook = "{{ add_constraints() }}"
) }}

select 1 as column_a

I define properties for it in models/schema.yml:

version: 2

models:
  - name: my_model
    columns:
      - name: column_a
        meta:
          constraint: primary key

And voilà, dbt runs:

create  table "jerco"."dbt_jcohen"."my_model__dbt_tmp" as (select 1 as column_a);
alter table "jerco"."dbt_jcohen"."my_model" rename to "my_model__dbt_backup"
alter table "jerco"."dbt_jcohen"."my_model__dbt_tmp" rename to "my_model"
alter table "jerco"."dbt_jcohen"."my_model" add constraint my_model_primary_key_column_a
                primary key (column_a);
commit;
drop table if exists "jerco"."dbt_jcohen"."my_model__dbt_backup" cascade;

The meta property lets you define whatever values you want, wherever you want, entirely for your own use. If it makes more sense to define those constraints as meta properties of the model, rather than of the individual column, that can also work; a hypothetical sketch follows.
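
For example, a model-level variant could look like this (the constraints shape under meta is invented for illustration, and the add_constraints macro above would need adjusting to read model-level meta instead of column meta):

version: 2

models:
  - name: my_model
    meta:
      constraints:                # hypothetical structure
        - type: primary key
          columns: [column_a]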

The downside of this approach is that you still need to specify post_hook = "{{ add_constraints() }}" in each individual model. You could also look into:

  • Reimplementing the 'table' + 'incremental' materializations to add the macro calls there
  • Reimplementing the persist_docs macro, which already loops over model columns to add descriptions as comments, to also include a step that adds constraints. As long as you have the persist_docs config turned on for columns, it would "just work." (This is how dbt-bigquery supports column-level policy tags.)
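
As a lighter-weight alternative to reimplementing either of those, the hook can be set once at the project level in dbt_project.yml, so individual models don't each need the config (a sketch assuming the add_constraints macro above and a project named my_project):

# dbt_project.yml
models:
  my_project:
    +post-hook: "{{ add_constraints() }}"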

I'm curious to hear what you think! If any of these approaches feels especially promising, we could think about adding it into dbt proper.


jtcohen6 commented May 6, 2021

Of course, there's another question at play here: Should dbt be able to create its own ERDs?

It's totally possible from a conceptual perspective, and there's been a lot of discussion on it (dbt-labs/dbt-docs#84, discourse). These could be generated from unique + relationships tests (or from user-supplied meta properties, or new properties entirely). Personally, I'd like to see a version that:

  • Leverages dbt artifacts: manifest.json includes test definitions + properties, catalog.json includes database representations and column names
  • Creates one ERD per exposure, including the relationships between sources, models, snapshots, and seeds "exposed" there. I'm skeptical of the value of an ERD that includes hundreds of models, many in staging/intermediate layers; I feel like the exposure has a lot of potential for defining which dbt resources are user-facing, queryable, and best represented in the diagram.
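
For reference, the relationships test that could power such an ERD is already standard dbt syntax; a test like this encodes exactly the FK-style edge between two models (orders and customers are hypothetical names):

version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id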


jmriego commented Jul 19, 2021

@jtcohen6 Totally agree that one ERD with hundreds of models wouldn't do much. In our case we keep the most important models in one folder, but it might be difficult to know what most people would do.
In my opinion, a good solution might be to make ERD creation configurable: something like an ERD generation command that accepts selection flags. If it's for the docs, it probably makes sense to make it configurable there in some way as well.

@blake-enyart

@JC-Lightfold, just curious if there has been any more traction on this? I'm super interested in this functionality.

Thank you @jtcohen6 for the solution above! Will be using that shortly on some customer projects.

@github-actions

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added, then removed, the stale label Feb 25, 2022

github-actions bot commented Sep 4, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label Sep 4, 2022
@blake-enyart

I definitely still think this issue is relevant for documenting data architecture.

lostmygithubaccount removed the stale label Sep 8, 2022

MazrimT commented Jan 18, 2023

Not being able to set primary and foreign keys is basically what's keeping us from using dbt. We have requirements to produce this kind of documentation and keep it up to date, and no one wants to maintain it by hand, off to the side, where it won't be updated automatically.
In our case it's only for documentation for the users of the data, not to actually constrain the tables.

@jtcohen6

@MazrimT There's in-progress work to make this possible! (#6271)

@github-actions

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions bot added the stale label Jul 18, 2023
@github-actions

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
