Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

Closed
1 of 5 tasks
magnusfors opened this issue Aug 30, 2021 · 3 comments
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors!

Comments

@magnusfors
Copy link

magnusfors commented Aug 30, 2021

Describe the bug

If updating a quoted model the dbt tests are not executed against the newly built model, instead they are deferring to the models in the previous state. For regular (non quoted) models the tests run against the correct newly built models.

In the provided example dbt 0.19.1 has been used but the same issue occurs in dbt 0.20.1.

Attached is a dbt project used to reproduce the issue (dbt_bug_project.zip)

Steps To Reproduce

  1. Created some source data in Snowflake like below.
create schema afs_source;
create table sourcedata(col1 number);
insert into sourcedata(col1)  values(1),(2),(3);
  1. dbt run --target=dev
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:13:00 | Concurrency: 1 threads (target='dev')
13:13:00 |
13:13:00 | 1 of 2 START view model [dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077915/dbt_bug_project.zip).not_quoted........................... [RUN]
13:13:02 | 1 of 2 OK created view model AFS_DEV.not_quoted...................... [SUCCESS 1 in 2.20s]
13:13:02 | 2 of 2 START view model AFS_DEV."quoted"............................. [RUN]
13:13:04 | 2 of 2 OK created view model AFS_DEV."quoted"........................ [SUCCESS 1 in 1.91s]
13:13:06 |
13:13:06 | Finished running 2 view models in 12.25s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. dbt test --target=dev
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:16:01 | Concurrency: 1 threads (target='dev')
13:16:01 |
13:16:01 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:16:04 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.61s]
13:16:04 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:16:07 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 2.87s]
13:16:07 |
13:16:07 | Finished running 2 tests in 7.24s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. dbt clean
Running with dbt=0.19.1
Checking target/*
 Cleaned target/*
Finished cleaning all paths.
  1. dbt run --target=prod
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:17:48 | Concurrency: 1 threads (target='prod')
13:17:48 |
13:17:48 | 1 of 2 START view model AFS_PROD.not_quoted.......................... [RUN]
13:17:49 | 1 of 2 OK created view model AFS_PROD.not_quoted..................... [SUCCESS 1 in 1.96s]
13:17:49 | 2 of 2 START view model AFS_PROD."quoted"............................ [RUN]
13:17:51 | 2 of 2 OK created view model AFS_PROD."quoted"....................... [SUCCESS 1 in 1.65s]
13:17:53 |
13:17:53 | Finished running 2 view models in 11.40s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. dbt test --target=prod
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:18:27 | Concurrency: 1 threads (target='prod')
13:18:27 |
13:18:27 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:18:29 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.20s]
13:18:29 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:18:31 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 2.20s]
13:18:31 |
13:18:31 | Finished running 2 tests in 6.14s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. dbt docs generate --target=prod
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:19:56 | Concurrency: 1 threads (target='prod')
13:19:56 |
13:19:56 | Done.
13:19:56 | Building catalog
13:20:02 | Catalog written to /src/target/catalog.json
  1. cp ./target/manifest.json .

  2. dbt clean

Running with dbt=0.19.1
Checking target/*
 Cleaned target/*
Finished cleaning all paths.
  1. Modify both the quoted and the not_quoted models by adding an extra column.
  2. dbt run -m state:modified+ --defer --state ./ --target=dev
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:23:35 | Concurrency: 1 threads (target='dev')
13:23:35 |
13:23:35 | 1 of 2 START view model AFS_DEV.not_quoted........................... [RUN]
13:23:36 | 1 of 2 OK created view model AFS_DEV.not_quoted...................... [SUCCESS 1 in 1.74s]
13:23:36 | 2 of 2 START view model AFS_DEV."quoted"............................. [RUN]
13:23:38 | 2 of 2 OK created view model AFS_DEV."quoted"........................ [SUCCESS 1 in 1.52s]
13:23:39 |
13:23:39 | Finished running 2 view models in 8.88s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. dbt test -m state:modified+ --defer --state ./ --target=dev
Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:24:27 | Concurrency: 1 threads (target='dev')
13:24:27 |
13:24:27 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:24:29 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.67s]
13:24:29 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:24:31 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 1.49s]
13:24:31 |
13:24:31 | Finished running 2 tests in 6.27s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. cat ./logs/dbt.log
    Please see the logs attached in the zip file. Both the quoted and the unquoted models are changed but they render different results in the log as below. They should be executing against the AFS_DEV [dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077922/dbt_bug_project.zip) schema.
The with validation as (

  select count(*) as count
  from QLIK_SCRATCH.[dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077915/dbt_bug_project.zip).not_quoted

)


with validation as (

  select count(*) as count
  from QLIK_SCRATCH.AFS_PROD."quoted"

)

Expected behavior

The dbt tests should run against the latest built models and not defered to the previously built models.

Screenshots and log output

Logs found in logs/dbt.log in the attached zip file.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.20.1

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 0.19.1

The operating system you're using:
Both locally MacOS Big Sur v11.5.2 and remotely EC2 instance with AMI name amzn2-ami-hvm-2.0.20200520.1-x86_64-gp2 (i.e. Linux).

The output of python --version:

Python 3.8.11
@magnusfors magnusfors added bug Something isn't working triage labels Aug 30, 2021
@magnusfors
Copy link
Author

dbt_bug_project.zip

@jtcohen6
Copy link
Contributor

jtcohen6 commented Sep 2, 2021

@magnusfors Thanks for the really clear reproduction case! I followed the steps you outlined, with dbt_jcohen as my dev schema and dbt_project_prod as my prod schema.

explanation

This isn't actually an error with --defer as much as it's a cache miss. One of the conditions dbt checks, when deciding if a relation should be deferred during a --defer run, is to look it up in the just-built relation cache via adapter.get_relation:

https://github.com/dbt-labs/dbt/blob/9d7a6556efb9bdbbfedf24b6f01ded90595a572d/core/dbt/contracts/graph/manifest.py#L930-L937

I re-ran with cache logging turned on (--log-cache-events), and I see this in the logs:

$ dbt --debug test -m state:modified+ --defer --state ./ --target=dev --log-cache-events
...
Adding relation: _CachedRelation(database=ad_hoc, schema=dbt_jcohen, identifier=quoted, inner="AD_HOC"."DBT_JCOHEN"."quoted")

Notice that dbt doesn't know to quote the identifier. That cached relation ends up being slightly different from the relation dbt thinks it's looking for, which has database: 'ad_hoc', schema: 'dbt_jcohen', and identifier: '"quoted"'.

resolution

Ultimately, there's a significantly better way to tell dbt that you want your relations to have quoted identifiers, and it's using the project-level quoting config: https://docs.getdbt.com/reference/project-configs/quoting. Rather than including quotes explicitly in the alias config, you want to set this in dbt_project.yml:

# dbt_project.yml
quoting:
  identifier: true

Note that, when you set that config, you're setting it for all resources in your project. There's no option to have it both ways.

In general, we highly recommend against setting quoting: true on Snowflake—you're going to have a bad time! dbt will try to do the thing as often as possible, but it makes a lot of things harder than they need to be, both for you and for dbt.

All that said, if you set that config to quote all identifiers, and remove the alias config from your models, everything works the way you'd expect—the cache lookups succeed, and the sanity_check_model test selects from the dev version (ad_hoc.dbt_jcohen."quoted").

next steps

The exact reproduction case isn't a behavior I'd expect dbt to handle. That said, there's a better error that dbt should have been raising, as soon as it finds a candidate match in the cache that's approximately but not exactly right:

Encountered an error:
Compilation Error
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "AD_HOC"."DBT_JCOHEN"."quoted", or rename it to be less ambiguous.
  Searched for: AD_HOC.DBT_JCOHEN."QUOTED"
  Found: "AD_HOC"."DBT_JCOHEN"."quoted"

I can trip that error by tweaking the logic for identifying approximate (but not exact) matches, to strip the adapter quote character in addition to performing a case-insensitive comparison:
https://github.com/dbt-labs/dbt/blob/9d7a6556efb9bdbbfedf24b6f01ded90595a572d/core/dbt/adapters/base/relation.py#L92-L93

            if (
                self.path.get_lowered_part(k).strip(self.quote_character) != 
                v.lower().strip(self.quote_character)
            ):
                approximate_match = False

Is adjusting the logic, so as to trip a better error earlier on, a change you'd be interested in contributing? Otherwise, I'm going to close this issue.

@jtcohen6 jtcohen6 removed the triage label Sep 2, 2021
@jtcohen6 jtcohen6 changed the title dbt test --defer --state uses models from previous state when updated models are quoted Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} Sep 2, 2021
@jtcohen6 jtcohen6 added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Sep 2, 2021
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 2, 2022

Resolved by #4076

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors!
Projects
None yet
Development

No branches or pull requests

2 participants