Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

magnusfors · 2021-08-30T15:54:05Z

Describe the bug

If updating a quoted model the dbt tests are not executed against the newly built model, instead they are deferring to the models in the previous state. For regular (non quoted) models the tests run against the correct newly built models.

In the provided example dbt 0.19.1 has been used but the same issue occurs in dbt 0.20.1.

Attached is a dbt project used to reproduce the issue (dbt_bug_project.zip)

Steps To Reproduce

Created some source data in Snowflake like below.

create schema afs_source;
create table sourcedata(col1 number);
insert into sourcedata(col1)  values(1),(2),(3);

dbt run --target=dev

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:13:00 | Concurrency: 1 threads (target='dev')
13:13:00 |
13:13:00 | 1 of 2 START view model [dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077915/dbt_bug_project.zip).not_quoted........................... [RUN]
13:13:02 | 1 of 2 OK created view model AFS_DEV.not_quoted...................... [SUCCESS 1 in 2.20s]
13:13:02 | 2 of 2 START view model AFS_DEV."quoted"............................. [RUN]
13:13:04 | 2 of 2 OK created view model AFS_DEV."quoted"........................ [SUCCESS 1 in 1.91s]
13:13:06 |
13:13:06 | Finished running 2 view models in 12.25s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbt test --target=dev

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:16:01 | Concurrency: 1 threads (target='dev')
13:16:01 |
13:16:01 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:16:04 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.61s]
13:16:04 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:16:07 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 2.87s]
13:16:07 |
13:16:07 | Finished running 2 tests in 7.24s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbt clean

Running with dbt=0.19.1
Checking target/*
 Cleaned target/*
Finished cleaning all paths.

dbt run --target=prod

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:17:48 | Concurrency: 1 threads (target='prod')
13:17:48 |
13:17:48 | 1 of 2 START view model AFS_PROD.not_quoted.......................... [RUN]
13:17:49 | 1 of 2 OK created view model AFS_PROD.not_quoted..................... [SUCCESS 1 in 1.96s]
13:17:49 | 2 of 2 START view model AFS_PROD."quoted"............................ [RUN]
13:17:51 | 2 of 2 OK created view model AFS_PROD."quoted"....................... [SUCCESS 1 in 1.65s]
13:17:53 |
13:17:53 | Finished running 2 view models in 11.40s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbt test --target=prod

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:18:27 | Concurrency: 1 threads (target='prod')
13:18:27 |
13:18:27 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:18:29 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.20s]
13:18:29 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:18:31 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 2.20s]
13:18:31 |
13:18:31 | Finished running 2 tests in 6.14s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbt docs generate --target=prod

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:19:56 | Concurrency: 1 threads (target='prod')
13:19:56 |
13:19:56 | Done.
13:19:56 | Building catalog
13:20:02 | Catalog written to /src/target/catalog.json

cp ./target/manifest.json .
dbt clean

Running with dbt=0.19.1
Checking target/*
 Cleaned target/*
Finished cleaning all paths.

Modify both the quoted and the not_quoted models by adding an extra column.
dbt run -m state:modified+ --defer --state ./ --target=dev

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:23:35 | Concurrency: 1 threads (target='dev')
13:23:35 |
13:23:35 | 1 of 2 START view model AFS_DEV.not_quoted........................... [RUN]
13:23:36 | 1 of 2 OK created view model AFS_DEV.not_quoted...................... [SUCCESS 1 in 1.74s]
13:23:36 | 2 of 2 START view model AFS_DEV."quoted"............................. [RUN]
13:23:38 | 2 of 2 OK created view model AFS_DEV."quoted"........................ [SUCCESS 1 in 1.52s]
13:23:39 |
13:23:39 | Finished running 2 view models in 8.88s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbt test -m state:modified+ --defer --state ./ --target=dev

Running with dbt=0.19.1
Found 2 models, 2 tests, 0 snapshots, 0 analyses, 144 macros, 0 operations, 0 seed files, 1 source, 0 exposures

13:24:27 | Concurrency: 1 threads (target='dev')
13:24:27 |
13:24:27 | 1 of 2 START test sanity_check_model_not_quoted_..................... [RUN]
13:24:29 | 1 of 2 PASS sanity_check_model_not_quoted_........................... [PASS in 2.67s]
13:24:29 | 2 of 2 START test sanity_check_model_quoted_......................... [RUN]
13:24:31 | 2 of 2 PASS sanity_check_model_quoted_............................... [PASS in 1.49s]
13:24:31 |
13:24:31 | Finished running 2 tests in 6.27s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

cat ./logs/dbt.log
Please see the logs attached in the zip file. Both the quoted and the unquoted models are changed but they render different results in the log as below. They should be executing against the AFS_DEV [dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077922/dbt_bug_project.zip) schema.

The with validation as (

  select count(*) as count
  from QLIK_SCRATCH.[dbt_bug_project.zip](https://github.com/dbt-labs/dbt/files/7077915/dbt_bug_project.zip).not_quoted

)


with validation as (

  select count(*) as count
  from QLIK_SCRATCH.AFS_PROD."quoted"

)

Expected behavior

The dbt tests should run against the latest built models and not defered to the previously built models.

Screenshots and log output

Logs found in logs/dbt.log in the attached zip file.

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.20.1

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 0.19.1

The operating system you're using:
Both locally MacOS Big Sur v11.5.2 and remotely EC2 instance with AMI name amzn2-ami-hvm-2.0.20200520.1-x86_64-gp2 (i.e. Linux).

The output of python --version:

Python 3.8.11

The text was updated successfully, but these errors were encountered:

magnusfors · 2021-08-30T15:57:38Z

dbt_bug_project.zip

jtcohen6 · 2021-09-02T09:56:28Z

@magnusfors Thanks for the really clear reproduction case! I followed the steps you outlined, with dbt_jcohen as my dev schema and dbt_project_prod as my prod schema.

explanation

This isn't actually an error with --defer as much as it's a cache miss. One of the conditions dbt checks, when deciding if a relation should be deferred during a --defer run, is to look it up in the just-built relation cache via adapter.get_relation:

https://github.com/dbt-labs/dbt/blob/9d7a6556efb9bdbbfedf24b6f01ded90595a572d/core/dbt/contracts/graph/manifest.py#L930-L937

I re-ran with cache logging turned on (--log-cache-events), and I see this in the logs:

$ dbt --debug test -m state:modified+ --defer --state ./ --target=dev --log-cache-events
...
Adding relation: _CachedRelation(database=ad_hoc, schema=dbt_jcohen, identifier=quoted, inner="AD_HOC"."DBT_JCOHEN"."quoted")

Notice that dbt doesn't know to quote the identifier. That cached relation ends up being slightly different from the relation dbt thinks it's looking for, which has database: 'ad_hoc', schema: 'dbt_jcohen', and identifier: '"quoted"'.

resolution

Ultimately, there's a significantly better way to tell dbt that you want your relations to have quoted identifiers, and it's using the project-level quoting config: https://docs.getdbt.com/reference/project-configs/quoting. Rather than including quotes explicitly in the alias config, you want to set this in dbt_project.yml:

# dbt_project.yml
quoting:
  identifier: true

Note that, when you set that config, you're setting it for all resources in your project. There's no option to have it both ways.

In general, we highly recommend against setting quoting: true on Snowflake—you're going to have a bad time! dbt will try to do the thing as often as possible, but it makes a lot of things harder than they need to be, both for you and for dbt.

All that said, if you set that config to quote all identifiers, and remove the alias config from your models, everything works the way you'd expect—the cache lookups succeed, and the sanity_check_model test selects from the dev version (ad_hoc.dbt_jcohen."quoted").

next steps

The exact reproduction case isn't a behavior I'd expect dbt to handle. That said, there's a better error that dbt should have been raising, as soon as it finds a candidate match in the cache that's approximately but not exactly right:

Encountered an error:
Compilation Error
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "AD_HOC"."DBT_JCOHEN"."quoted", or rename it to be less ambiguous.
  Searched for: AD_HOC.DBT_JCOHEN."QUOTED"
  Found: "AD_HOC"."DBT_JCOHEN"."quoted"

I can trip that error by tweaking the logic for identifying approximate (but not exact) matches, to strip the adapter quote character in addition to performing a case-insensitive comparison:
https://github.com/dbt-labs/dbt/blob/9d7a6556efb9bdbbfedf24b6f01ded90595a572d/core/dbt/adapters/base/relation.py#L92-L93

            if (
                self.path.get_lowered_part(k).strip(self.quote_character) != 
                v.lower().strip(self.quote_character)
            ):
                approximate_match = False

Is adjusting the logic, so as to trip a better error earlier on, a change you'd be interested in contributing? Otherwise, I'm going to close this issue.

jtcohen6 · 2022-02-02T12:46:19Z

Resolved by #4076

magnusfors added bug Something isn't working triage labels Aug 30, 2021

jtcohen6 removed the triage label Sep 2, 2021

jtcohen6 changed the title ~~dbt test --defer --state uses models from previous state when updated models are quoted~~ Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} Sep 2, 2021

jtcohen6 added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Sep 2, 2021

emilieschario mentioned this issue Oct 15, 2021

Adjust logic when finding approx matches for model or test matching #4076

Merged

4 tasks

dataders mentioned this issue Nov 2, 2021

When dbt finds existing approximate matches via adapter.get_relation() use the same quoting when printing error message #4187

Closed

1 task

dataders mentioned this issue Nov 16, 2021

adapter.get_relation mistakes exact match for approximate match firebolt-db/dbt-firebolt#11

Closed

jeremyyeo mentioned this issue Jan 18, 2022

[CT-53] [Bug] Possible regression - dbt 1.0 doesn't work properly if alias has quotes in them #4581

Closed

1 task

jtcohen6 closed this as completed Feb 2, 2022

jtcohen6 mentioned this issue Feb 3, 2023

Optimize BaseRelation.matches() #6844

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

magnusfors commented Aug 30, 2021 •

edited

Loading

magnusfors commented Aug 30, 2021

jtcohen6 commented Sep 2, 2021

jtcohen6 commented Feb 2, 2022

Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

Cache miss when alias includes explicit quotes, instead of configuring quoting: {identifier: true} #3835

Comments

magnusfors commented Aug 30, 2021 • edited Loading

Describe the bug

Steps To Reproduce

Expected behavior

Screenshots and log output

System information

magnusfors commented Aug 30, 2021

jtcohen6 commented Sep 2, 2021

explanation

resolution

next steps

jtcohen6 commented Feb 2, 2022

magnusfors commented Aug 30, 2021 •

edited

Loading