Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Error Message for Persisted Comments on Columns #3039

Closed
1 of 5 tasks
elikastelein opened this issue Jan 28, 2021 · 4 comments · Fixed by #3149
Closed
1 of 5 tasks

Improve Error Message for Persisted Comments on Columns #3039

elikastelein opened this issue Jan 28, 2021 · 4 comments · Fixed by #3149
Labels
bug Something isn't working
Milestone

Comments

@elikastelein
Copy link
Contributor

Describe the bug

  • When using persisted_docs for table columns in Snowflake, the error messaging could be improved in the specific case where you have a column in the .yml file that doesn't exist in the model
  • Snowflake tries to comment on a table column that doesn't actually exist (since it's not in the model), resulting in an error. dbt passes through the snowflake error during the dbt run saying Database Error ... invalid identifier '<colum_name>' but this message could be potentially improved to direct you closer to the specific problem.

Steps To Reproduce

models/staging/stg_orders.sql

{{ config(materialized='table') }}

with source as (
    select * from {{ ref('raw_orders') }}
)

select id as order_id from source /* No column called 'foo' */

models/staging/schema.yml

version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

      - name: foo
        description: 'test comment'
        tests:
          - not_null

dbt_project.yml

...
models:
  +persist_docs:
    columns: true

On snowflake:

Run dbt run --models stg_orders

Running with dbt=0.19.0
Found 8 models, 15 tests, 0 snapshots, 0 analyses, 143 macros, 0 operations, 3 seed files, 0 sources, 0 exposures

09:54:44 | Concurrency: 8 threads (target='dev')
09:54:44 | 
09:54:44 | 1 of 1 START table model eli.stg_orders.............................. [RUN]
09:54:48 | 1 of 1 ERROR creating table model eli.stg_orders..................... [ERROR in 3.56s]
09:54:51 | 
09:54:51 | Finished running 1 table model in 11.17s.

Completed with 1 error and 0 warnings:

Database Error in model stg_orders (models/staging/stg_orders.sql)
  000904 (42000): SQL compilation error: error line 5 at position 8
  invalid identifier 'FOO'
  compiled SQL at target/run/jaffle_shop/models/staging/stg_orders.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

dbt.log

dbt.log
2021-01-28 18:00:42.714447 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:42.714515 (Thread-1): On model.jaffle_shop.stg_orders: /* {"app": "dbt", "dbt_version": "0.19.0", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.stg_orders"} */


      create or replace transient table DEV.eli.stg_orders  as
      (

with source as (
    select * from DEV.eli.raw_orders
)

select id as order_id from source /* No column called 'foo' */
      );
2021-01-28 18:00:44.407204 (Thread-1): SQL status: SUCCESS 1 in 1.69 seconds
2021-01-28 18:00:44.408124 (Thread-1): On model.jaffle_shop.stg_orders: COMMIT
2021-01-28 18:00:44.408251 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:44.408318 (Thread-1): On model.jaffle_shop.stg_orders: COMMIT
2021-01-28 18:00:44.607178 (Thread-1): SQL status: SUCCESS 1 in 0.20 seconds
2021-01-28 18:00:44.617705 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:44.617827 (Thread-1): On model.jaffle_shop.stg_orders: /* {"app": "dbt", "dbt_version": "0.19.0", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.stg_orders"} */

    alter table DEV.eli.stg_orders alter
    
        order_id COMMENT $$$$ ,
    
        foo COMMENT $$test comment$$ ;
2021-01-28 18:00:44.907090 (Thread-1): Snowflake query id: 0199ea98-04b1-6014-0008-f983000a5082
2021-01-28 18:00:44.907203 (Thread-1): Snowflake error: 000904 (42000): SQL compilation error: error line 5 at position 8
invalid identifier 'FOO'
2021-01-28 18:00:44.907587 (Thread-1): finished collecting timing info
2021-01-28 18:00:44.907807 (Thread-1): On model.jaffle_shop.stg_orders: Close
2021-01-28 18:00:45.260005 (Thread-1): Database Error in model stg_orders (models/staging/stg_orders.sql)
  000904 (42000): SQL compilation error: error line 5 at position 8
  invalid identifier 'FOO'
  compiled SQL at target/run/jaffle_shop/models/staging/stg_orders.sql

Expected behavior

  • Multiple members of my team including myself have struggled debugging this error message the first time it happens to us. It seems obvious in retrospect but in the moment it isn't clear that the column comment is the source of the error.
    • Because it happens during the run, it's not immediately obvious that the cause of the error is the "foo" column existing in the .yml but not the .sql model (we tend to mentally associate the .yml file only with the testing step)
    • Also, when you look at the target file target/run/jaffle_shop/models/staging/stg_orders.sql the comment commands aren't there, making debugging a bit harder
  • I think this experience could be improved by having dbt raise it's own error to warn about trying to comment on columns that doesn't exist, rather than just passing through the snowflake error which isn't as direct (not sure about the experience on other dbs)

Screenshots and log output

dbt.log
2021-01-28 18:00:42.714447 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:42.714515 (Thread-1): On model.jaffle_shop.stg_orders: /* {"app": "dbt", "dbt_version": "0.19.0", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.stg_orders"} */


      create or replace transient table DEV.eli.stg_orders  as
      (

with source as (
    select * from DEV.eli.raw_orders
)

select id as order_id from source /* No column called 'foo' */
      );
2021-01-28 18:00:44.407204 (Thread-1): SQL status: SUCCESS 1 in 1.69 seconds
2021-01-28 18:00:44.408124 (Thread-1): On model.jaffle_shop.stg_orders: COMMIT
2021-01-28 18:00:44.408251 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:44.408318 (Thread-1): On model.jaffle_shop.stg_orders: COMMIT
2021-01-28 18:00:44.607178 (Thread-1): SQL status: SUCCESS 1 in 0.20 seconds
2021-01-28 18:00:44.617705 (Thread-1): Using snowflake connection "model.jaffle_shop.stg_orders".
2021-01-28 18:00:44.617827 (Thread-1): On model.jaffle_shop.stg_orders: /* {"app": "dbt", "dbt_version": "0.19.0", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.stg_orders"} */

    alter table DEV.eli.stg_orders alter
    
        order_id COMMENT $$$$ ,
    
        foo COMMENT $$test comment$$ ;
2021-01-28 18:00:44.907090 (Thread-1): Snowflake query id: 0199ea98-04b1-6014-0008-f983000a5082
2021-01-28 18:00:44.907203 (Thread-1): Snowflake error: 000904 (42000): SQL compilation error: error line 5 at position 8
invalid identifier 'FOO'
2021-01-28 18:00:44.907587 (Thread-1): finished collecting timing info
2021-01-28 18:00:44.907807 (Thread-1): On model.jaffle_shop.stg_orders: Close
2021-01-28 18:00:45.260005 (Thread-1): Database Error in model stg_orders (models/staging/stg_orders.sql)
  000904 (42000): SQL compilation error: error line 5 at position 8
  invalid identifier 'FOO'
  compiled SQL at target/run/jaffle_shop/models/staging/stg_orders.sql

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.19.0
   latest version: 0.19.0

Up to date!

Plugins:
  - bigquery: 0.19.0
  - snowflake: 0.19.0
  - redshift: 0.19.0
  - postgres: 0.19.0

The operating system you're using: Mac OS

The output of python --version: Python 3.8.5

Additional context

I'm very interested in working on this issue myself but wanted to first see if anybody else had feedback or thoughts to share about it.

@elikastelein elikastelein added bug Something isn't working triage labels Jan 28, 2021
@jtcohen6 jtcohen6 removed the triage label Feb 1, 2021
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 1, 2021

Thanks for the detailed write-up @elikastelein! I really like a lot of what you've got to say above :)

I think there are a few options here:

  1. Catch and spruce up these error messages
  2. Check to see if a column exists in the (just-created) table before running comment for that column, i.e. via adapter.get_columns_in_relation
  3. Use if exists (docs) to ensure that the comment DML never raises an error.

Personally, I like the third option, for two reasons:

There's one drawback: Not every database supports if exists for comment or equivalent DML statements. If we want parity on those, we may need to use the introspection-based method instead.

What do you think?

@elikastelein
Copy link
Contributor Author

Thanks for the reply @jtcohen6. Using if exists never occurred to me, but I like that as a solution.

Simplicity. This code change would be very straightforward

Cool, I'll start a PR for this in the next couple of weeks

@jtcohen6
Copy link
Contributor

Nice, glad to hear it!

There is a small downside that I failed to mention above. To take advantage of if exists, we'd need to switch from one DDL statement for all column comments:

alter table dev_jerco.my_model alter
  col1 comment 'hello'
  col2 comment 'goodbye'
  badcol comment 'this one is missing'
;

To one DDL statement per column comment:

comment if exists on column dev_jerco.my_model.col1 is 'hello';
comment if exists on column dev_jerco.my_model.col2 is 'goodbye';
comment if exists on column dev_jerco.my_model.badcol is 'this one is missing';

IMO that's a small price to pay, as each statement should still run quite quickly.

@elikastelein
Copy link
Contributor Author

I made a very small PR here: #3149

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants