Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated generate_model_yaml macro to correctly handle nested bigquery… #54

Merged
merged 5 commits into from
Jun 23, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
# Unreleased

## New features
- Add support for importing descriptions from columns with the same names in upstream models. It is available by setting the parameter `upstream_descriptions` to `True` in `generate_model_yaml` ([#61](https://github.com/dbt-labs/dbt-codegen/pull/61))
- Add support for including description placeholders for the source and table, which changes the behavior of `generate_source` when `include_descriptions` is set to `True`. Previous logic only created description placeholders for the columns.
- Add optional `name` arg to `generate_source`
- Add optional `table_names` arg to `generate_source` (#50 @rahulj51)

## Fixes
- generate_model_yaml now correctly handles nested bigquery fields (#27)

# dbt-codegen v0.6.0

This release creates breaking changes to the `generate_source.sql` macro.
Expand Down
24 changes: 20 additions & 4 deletions macros/generate_model_yaml.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,21 @@
{% macro generate_column_yaml(column, model_yaml, column_desc_dict, parent_column_name="") %}
{% if parent_column_name %}
{% set column_name = parent_column_name ~ "." ~ column.name %}
{% else %}
{% set column_name = column.name %}
{% endif %}

{% do model_yaml.append(' - name: ' ~ column.name | lower ) %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for poking an old PR but shouldn't this line read(?):

{% do model_yaml.append('      - name: ' ~ column_name  | lower ) %}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure one way or the other @Zatte

@bodschut Some questions:

  1. What do you think of @Zatte 's question about column_name versus column.name?
  2. Is there something more descriptive than column_name that would be accurate, like aliased_parent_column_name?
  3. Is parent_column_name even used as a parameter? It's not explicitly set here, and the generate_column_yaml looks like a "private" macro internal to codegen rather than a "public" macro intended to be used by consumers.

Copy link

@Zatte Zatte Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context; I use generate_model_yaml-source (which calls this private macro) on a nested struct in BigQuery

With the current state i only get "tail columns" as the column names (not the full parent hierarchy, dot delimited). I would expect the column names to contain the full parent hierarchy.

Example.
SQL

SELECT 
   struct(
      "a" as a, 
      "b" as b
   ) as substruct

Current output

models:
  - name: model_struct
    description: ""
    columns:
      - name: a
        description: ""

      - name: b
        description: ""

Expected output (with column_name change)

models:
  - name: model_struct
    description: ""
    columns:
      - name: substruct.a
        description: ""

      - name: substruct.b
        description: ""

The current output will be ambiguous in the case of multiple sub-structs (nested columns) with the same name like this.

SELECT 
   struct(
      "a1" as a, 
      "b1" as b
   ) as substruct1, 
   struct(
      "a2" as a, 
      "b2" as b
   ) as substruct2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This info is perfect @Zatte !

I was able to reproduce and see that your suggested fix works.

Opened this issue and will follow-up with a PR:
#98

I took a peek at the commit history on this -- looks like @bodschut had it correct the first time around, and I accidentally foobar'd it when resolving a merge conflict 😬

{% do model_yaml.append(' description: "' ~ column_desc_dict.get(column.name | lower,'') ~ '"') %}
{% do model_yaml.append('') %}
{% if column.fields|length > 0 %}
{% for child_column in column.fields %}
{% set model_yaml = generate_column_yaml(child_column, model_yaml, column_desc_dict, parent_column_name=column_name) %}
{% endfor %}
{% endif %}
{% do return(model_yaml) %}
{% endmacro %}

{% macro generate_model_yaml(model_name, upstream_descriptions=False) %}

{% set model_yaml=[] %}
Expand All @@ -14,9 +32,7 @@
{%- set columns = adapter.get_columns_in_relation(relation) -%}

{% for column in columns %}
{% do model_yaml.append(' - name: ' ~ column.name | lower ) %}
{% do model_yaml.append(' description: "' ~ column_desc_dict.get(column.name | lower,'') ~ '"') %}
{% do model_yaml.append('') %}
{% set model_yaml = generate_column_yaml(column, model_yaml, column_desc_dict) %}
dbeatty10 marked this conversation as resolved.
Show resolved Hide resolved
{% endfor %}

{% if execute %}
Expand All @@ -27,4 +43,4 @@

{% endif %}

{% endmacro %}
{% endmacro %}