dbt 0.20.0 tries to create two tables with the same name when --store-failures is used with similar-looking expression_is_true tests #3626

joellabes · 2021-07-25T22:16:06Z

Describe the bug

This schema.yml file:

version: 2

sources:
    - name: crm
      database: reporting
      schema: crm
      loader: Amazon DMS

      tables: 
        - name: schoolattributevalues
          tests:
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) > 0 and len(value) > 0)" #only one of memovalue and value can be populated
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) = 0 and len(value) = 0)" #... but, one of them does have to be

gives an error when running dbt test --store-failures:

Encountered an error:
Compilation Error
  dbt found two resources with the database representation ""reporting"."dev_jlabes_dbt_test__audit"."dbt_utils_source_expression_is_d74da5f62da07f2223c634b4409dd71d"".
  dbt cannot create two resources with identical database representations. To fix this,
  change the configuration of one of these resources:
  - test.educationperfect.dbt_utils_source_expression_is_true_crm_schoolattributevalues_not_len_memovalue_0_and_len_value_0_.77167f5215 (models\staging\crm\other\crm_sources.yml)
  - test.educationperfect.dbt_utils_source_expression_is_true_crm_schoolattributevalues_not_len_memovalue_0_and_len_value_0_.c19220b2a8 (models\staging\crm\other\crm_sources.yml)

I guess the problem comes from the only difference between the expressions being the > vs =, which both get turned into _s. However, the .77167f5215 and .c19220b2a8 suffixes provide deduplication for the tests' FQNs (I think those are the FQNs?)

Expected behavior

Both tests to be given different DB representations.

Screenshots and log output

See above

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.20.0
   latest version: 0.20.0

Up to date!

Plugins:
  - bigquery: 0.18.2
  - postgres: 0.20.0
  - redshift: 0.20.0
  - snowflake: 0.18.2

The operating system you're using:
Win10

The output of python --version:
Python 3.8.0

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2021-07-26T13:16:05Z

@joellabes Thanks for the report!

dbt does try to create a unique hashed alias of <64 characters for tests with too-long names. The issue is that we're first substituting all special characters for underscores, then hashing the flattened arguments into a unique identifier:

https://github.com/dbt-labs/dbt/blob/72722635f275666b34efc9baf8485be79dfcaa88/core/dbt/parser/schema_test_builders.py#L41-L58

Since the two tests only differ by a special character, both get stringified to dbt_utils_source_expression_is_true_crm_schoolattributevalues_not_len_memovalue_0_and_len_value_0_, then hashed and suffixed to dbt_utils_source_expression_is_d74da5f62da07f2223c634b4409dd71d.

fix

The most obvious fix here is to hash flat_args before flattening into clean_flat_args. I think the code change for that is fairly straightforward, so I'm going to call this a good first issue.

There's one issue I can foresee: If a test name (including all arguments and underscores) is <64 characters, we aren't including a hashed suffix at all. Special characters would still be substituted out for underscores, causing a collision. Should we start adding hash suffixes for all tests with arguments, regardless of length? Frankly, I don't know how common it is for two tests to both have very few/concise arguments and differ by only special characters between them. We will have at least narrowed this edge case into an even rarer occasion.

workaround

In the meantime, you could certainly work around this by adding an extra condition to one of the tests, and "salting" the hash:

          tests:
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) > 0 and len(value) > 0)" #only one of memovalue and value can be populated
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) = 0 and len(value) = 0) and true" #... but, one of them does have to be

longer-term options

Tests already support an alias config, just not yet as a test-specific modifier (defined in the same .yml file where the test is defined). We could add support for that, giving users the option of overriding whatever default behavior dbt is attempting around argument flattening and hashing:

          tests:
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) > 0 and len(value) > 0)" #only one of memovalue and value can be populated
              alias: test_only_one_of_memovalue_and_value
          - dbt_utils.expression_is_true:
              expression: "not (len(memovalue) = 0 and len(value) = 0)" #... but, one of them does have to be
              alias: test_at_least_one_of_memovalue_and_value

We've also got some broader thinking in the works about node naming + identification: #3348, #3548

joellabes · 2021-07-26T21:51:31Z

Thanks for all the options!

Should we start adding hash suffixes for all tests with arguments, regardless of length?

Feels like a lot of uglification on the happy path to protect against an edge case of edge cases, so I'd go 👎 until it becomes a clearer issue.

I worked around it by flipping the order of the clauses on the second test! It was nice to have them in the same order to make it easy to visually diff, but not a big deal. (Made for a fun code review though

)

joellabes · 2021-08-22T01:41:57Z

@jtcohen6 I've just spent some time looking into this, and have found another wrinkle:

    columns:
      - name: total_unweighted
        tests:
          - not_null
          - accepted_range:
              where: "is_resub_adjustment is false"
              min_value: 0
              tags: 
                - hubspot_bad_revenue
          - accepted_range:
              where: "is_resub_adjustment is true"
              max_value: 0

The args dict for the latter is

{'max_value': 0, 'column_name': 'total_unweighted', 'model': "{% if config.get('where') %}(select * from {{ ref('hubspot_revenue__revenue_lines') }} where {{config.get('where')}}) hubspot_revenue__revenue_lines{% else %}{{ ref('hubspot_revenue__revenue_lines') }}{% endif %}"}

and the where clause is nowhere to be found. That means that I wind up with total_unweighted__0 as the entirety of even the unclean flat args.

Should I be passing config into get_nice_schema_test_name as well? Is it even available at that point? I haven't gone looking, figured I'd check in before going too far down the rabbit hole.

jtcohen6 · 2021-08-23T11:38:48Z

@joellabes Ah, that's a really good point. We've made it so that config values are not included in the generation of a test's name (or unique_id, I believe), so that dbt is able to detect it's the same test, just with different configs (rather than a deleted test + a new test). That's a pretty subtle difference, and it may not matter much, except for the few folks who use the state:new selection method.

The thing that feels weirder to me here is that both min_value: 0 and max_value: 0 are showing up as just __0. I wonder if we should be including arg names as well as values in the generated name?

As it happens, thanks to the work in #3616, v0.21 will actually include support for the "long-term option" I described above: defining a custom alias under each specific test, which would enable you to avoid collisions with --store-failures. What do you think about that as a workable resolution here?

joellabes · 2021-08-24T01:04:47Z

I wonder if we should be including arg names as well as values in the generated name?

I had the same thought, but decided that it was still papering around the edges of the problem instead of fixing it forever so it didn't really move things forward.

What do you think about that as a workable resolution here?

I strongly prefer this 😍 our use case here is to use a combo of get_relations_by_pattern and union_relations to build a dashboard of failing tests over time. It'll be much easier to get the specific relations by pattern if there's.. a pattern, especially one that we can control.

Let's close this issue and I'll jump on the .21 beta quicksharp!

joellabes added bug Something isn't working triage labels Jul 25, 2021

jtcohen6 added dbt tests Issues related to built-in dbt testing functionality and removed triage labels Jul 26, 2021

joellabes mentioned this issue Aug 26, 2021

dbt 0.21.0-b2: test aliases don't play well with anchored yaml #3815

Closed

joellabes closed this as completed Oct 19, 2021

This was referenced Oct 19, 2021

[Feature] Increase tests' aliasability discoverability #4097

Closed

Document that alias is available for tests now dbt-labs/docs.getdbt.com#890

Closed

This was referenced Feb 7, 2022

Better node identification #3348

Closed

[CT-166] [Bug] accepted_values test causing OSError: File name too long #4684

Closed

joellabes mentioned this issue Nov 7, 2023

remove callout and add tabs as examples dbt-labs/docs.getdbt.com#4402

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dbt 0.20.0 tries to create two tables with the same name when --store-failures is used with similar-looking expression_is_true tests #3626

dbt 0.20.0 tries to create two tables with the same name when --store-failures is used with similar-looking expression_is_true tests #3626

joellabes commented Jul 25, 2021 •

edited

Loading

jtcohen6 commented Jul 26, 2021

joellabes commented Jul 26, 2021

joellabes commented Aug 22, 2021

jtcohen6 commented Aug 23, 2021

joellabes commented Aug 24, 2021

dbt 0.20.0 tries to create two tables with the same name when --store-failures is used with similar-looking expression_is_true tests #3626

dbt 0.20.0 tries to create two tables with the same name when --store-failures is used with similar-looking expression_is_true tests #3626

Comments

joellabes commented Jul 25, 2021 • edited Loading

Describe the bug

Expected behavior

Screenshots and log output

System information

jtcohen6 commented Jul 26, 2021

fix

workaround

longer-term options

joellabes commented Jul 26, 2021

joellabes commented Aug 22, 2021

jtcohen6 commented Aug 23, 2021

joellabes commented Aug 24, 2021

joellabes commented Jul 25, 2021 •

edited

Loading