Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make BigQuery cache lookups case-insensitive #1810

Closed
4 tasks
haukeduden opened this issue Oct 7, 2019 · 6 comments · Fixed by #1881
Closed
4 tasks

Make BigQuery cache lookups case-insensitive #1810

haukeduden opened this issue Oct 7, 2019 · 6 comments · Fixed by #1881
Labels
bigquery bug Something isn't working

Comments

@haukeduden
Copy link
Contributor

haukeduden commented Oct 7, 2019

UPDATE: added step 4 to reproduction recipe

Describe the bug

The is_incremental() macro always returns false when the target is BigQuery, effectively disabling the incremental build feature.

Steps To Reproduce

  1. Create a fresh DBT project with dbt init
  2. Configure the profile to use BigQuery
  3. Change the "my_first_dbt_model.sql" file to this:

{{ config(materialized='incremental') }}
select 1 as id
{% if is_incremental() %}
THIS SHOULD CAUSE AN ERROR
{% endif %}

  1. execute "dbt run" to initially create the table
  2. execute "dbt run" again
    => no error is displayed (bug)
  3. execute "dbt run --bypass-cache"
    => error is displayed (expected behaviour)

Expected behavior

An error message should be displayed when calling "dbt run" because the SQL inside the is_incremental condition is invalid.
When running "dbt run --bypass-cache" the result is as expected.

Additional information

I did a little debugging and verified that is_incremental is indeed returning false. The cause is that adapter.get_relation apparently returns None. Since the problem vanishes when --bypass-cache is specified, I suspect that the problem is caused by a bug in the relation cache.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • [X ] bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.14.2
   latest version: 0.14.2

Up to date!

The operating system you're using:
macOS Mojave

The output of python --version:
Python 3.7.4

@haukeduden haukeduden added bug Something isn't working triage labels Oct 7, 2019
@drewbanin drewbanin removed the triage label Oct 7, 2019
@drewbanin
Copy link
Contributor

hey @haukeduden - I think this is the normal and expected behavior of the is_incremental() macro -- this macro should return True when:

  1. a model is materialized as incremental AND
  2. the --full-refresh flag was not provided AND
  3. the target table exists in the database

I think the issue you're seeing is that for the first run of a model, the destination table does not yet exist in the database.

The is_incremental() block is frequently useful for self-queries, eg.

-- models/my_model.sql
select *
from source_table
{% if is_incremental() %}
  where event_time > (select max(event_time) from {{ this }})
{% endif %}

Here, {{ this }} refers to the model itself, eg analytics.my_model. The is_incremental() logic prevents this self-query from running if analytics.my_model does not yet exist in the database, that is: when adapter.get_relation returns None.

It's certainly possible that there is some other cache-related bug at play here, but I'm not sure that the example code you sent through exercises that bug if it does exist. I'm going to close this issue out, but please feel free to follow up if I have the wrong idea - happy to discuss this further.

@haukeduden
Copy link
Contributor Author

Hi Drewbanin,
thank you for the quick response. I realize now that my reproduction recipe was oversimplified. I have modified the original problem description and added an additional run step.
The problem occurs even if the target table already exists, so it would seem that this is an actual bug.

@drewbanin drewbanin reopened this Oct 7, 2019
@drewbanin
Copy link
Contributor

@haukeduden thanks for the additional info. When I try to reproduce this, I do see dbt failing on the second invocation of dbt run.

Screen Shot 2019-10-07 at 11 36 51 AM

If this is a cache-related issue, then it probably has something to do with the state of your particular BQ project. Does your project name or dataset name contain any interesting/special characters? When we've seen caching issues like this in the past, they usually had to do with case-sensitive comparisons in the cache lookup. Any chance your dataset is named something like MydbtProject or something like that?

@haukeduden
Copy link
Contributor Author

Yes, the dataset has some camel-casing. It is called "AshData".

@haukeduden
Copy link
Contributor Author

haukeduden commented Oct 7, 2019

I just tried again with a lower-case dataset name. Then I also get the expected error message on the second call. So it seems that this might indeed be caused by upper characters in the original dataset name.

@drewbanin
Copy link
Contributor

Thanks @haukeduden - that's super helpful. I appreciate you taking the time to report this one -- we'll take a look at fixing it!

@drewbanin drewbanin changed the title is_incremental always returns false for BigQuery Make BigQuery cache lookups case-insensitive Oct 7, 2019
@drewbanin drewbanin added this to the Louisa May Alcott milestone Oct 8, 2019
beckjake added a commit that referenced this issue Nov 4, 2019
…nsitive

Fix bigquery case sensitive caching issue (#1810)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants