Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake's QUOTED_IDENTIFIERS_IGNORE_CASE setting causes errors #1815

Closed
1 of 5 tasks
bmedx opened this issue Oct 8, 2019 · 4 comments
Closed
1 of 5 tasks

Snowflake's QUOTED_IDENTIFIERS_IGNORE_CASE setting causes errors #1815

bmedx opened this issue Oct 8, 2019 · 4 comments
Labels
bug Something isn't working snowflake

Comments

@bmedx
Copy link

bmedx commented Oct 8, 2019

Due to issues with our legacy system's use of Snowflake reserved words in column names, we've found ourselves needing to set the QUOTED_IDENTIFIERS_IGNORE_CASE parameter. https://docs.snowflake.net/manuals/sql-reference/identifiers-syntax.html#controlling-case-using-the-quoted-identifiers-ignore-case-parameter

This setting has a number of unfortunate downstream effects, most notably that it upper cases returned column names. So far we have run into issues:

  • https://github.com/fishtown-analytics/dbt/blob/0.14.latest/core/dbt/task/generate.py#L135-L136 ('table_name' and 'schema_name' are returned capitalized with this setting on). It looks like this has been overhauled since 0.14.x so perhaps this is no longer an issue.
  • dbt run when a schema defined in the project.yml file is not all upper cased. In this case it looks like on startup dbt will query Snowflake for schemas, decide somehow that they don't match based on case, and try to CREATE IF NOT EXISTS it. Normally this would be ok, but our dbt user does not have permissions to create views in the given schema, so we get an error.

Steps To Reproduce

  • Create a Snowflake database "PROD" with schema "finance"
  • Create a dbt project yml with a model:
models:
  warehouse_transforms:
    finance:
      schema: finance
  • Do a dbt run, you should see no "CREATE SCHEMA IF NOT EXISTS" in the Snowflake query log
  • Change your Snowflake connection settings to turn QUOTED_IDENTIFIERS_IGNORE_CASE on
  • Do a dbt run again, you should now see "CREATE SCHEMA IF NOT EXISTS" in the Snowflake query log
  • Change the project yml file to be:
models:
  warehouse_transforms:
    finance:
      schema: FINANCE
  • Do a dbt run again, you should now see "CREATE SCHEMA IF NOT EXISTS" does not appear Snowflake query log

Expected behavior

dbt respects the quirks of Snowflake's QUOTED_IDENTIFIERS_IGNORE_CASE, or a warning that it may not play nicely with dbt in the docs.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.14.0
   latest version: 0.14.2

The operating system you're using:
Ubuntu 16.04

The output of python --version:
Python 3.6.9

Additional context

We are pretty early in on our dbt deployment and so likely haven't flushed out all of the issues this setting can cause. The things I've listed so far have been minor and we've been able to work around them. As such, these issues may simply be something to document instead of trying to fix.

@bmedx bmedx added bug Something isn't working triage labels Oct 8, 2019
@drewbanin drewbanin added snowflake and removed triage labels Oct 9, 2019
@drewbanin
Copy link
Contributor

Thanks for the report @bmedx!

We typically solve problems like this by simply making comparisons on Snowflake (and other databases) case-insensitive. Rather than trying to find an exact-match for a schema name, we can check to see if the lower-cased version of the provided schema is present in the set of lower-cased schemas in the database. This can definitely lead to some false-positives -- both finance and FINANCE` should (normally) be able to live beside one another in Snowflake, but in practice, that's probably a terrible idea.

I think the fix here is to:

  1. make the schema lookup you've described case-insensitive
  2. potentially lower-case column names in a post-processing step (eg. in docs generation)

I feel great about our ability to make this change for schema names, but lower-casing all column names that pass through Snowflake in pursuit of supporting QUOTED_IDENTIFIERS_IGNORE_CASE feels undesirable to me. We may very well be able to handle this on a case-by-case basis (eg. in docs generation) though.

@bmedx
Copy link
Author

bmedx commented Oct 15, 2019

For sure, I agree on both points. In most cases it probably doesn't matter as the object names would be coming from Snowflake in the first place. It's only in places where they are hard coded or coming from configuration that it seems to be an issue.

@sean-rose
Copy link

Perhaps dbt could use the model/column names from schema.yml files as the canonical names in terms of capitalization?

@beckjake
Copy link
Contributor

I'm closing this, as we mostly believe this is fixed in more recent versions of dbt. In particular, dbt 0.17.0 adds more support around this kind of behavior - as part of implementing #2322 and #2324, dbt takes special pains to provide whatever the user provided with their quoting parameters directly to snowflake for queries. The only case-insensitive matching that should occur is when dbt is creating the catalog, and for now that seems unavoidable.

That said, dbt doesn't really test with this parameter enabled, so regressions are always a risk - please open issues for any regressions you experience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working snowflake
Projects
None yet
Development

No branches or pull requests

4 participants