Issue generating docs on BigQuery #1984

Closed
drewbanin opened this issue Dec 5, 2019 · 0 comments · Fixed by #2005
Labels
bigquery, bug (Something isn't working)

Describe the bug

Docs generation can fail in new and interesting ways in dbt v0.15.0. Whereas on most databases, the information_schema is accessed at the database-level, on BigQuery, it is sometimes accessed at the project level and other times at the dataset level. This means that the catalog query on BigQuery can fail if a dataset does not exist. That is not the case on other plugins.

I've seen some reports of users whose dbt docs generate task is failing on BigQuery with the error "dataset xxx does not exist in region US". This appears to be happening because dbt is querying the information schema for a dataset which does not exist.
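To make the project-level versus dataset-level distinction concrete, here is a minimal sketch of how the two kinds of INFORMATION_SCHEMA references are addressed in BigQuery standard SQL. The helper names are hypothetical and this is not dbt's actual catalog query; it only illustrates why a dataset-scoped reference depends on the dataset existing.

```python
def project_info_schema(project: str, view: str = "SCHEMATA") -> str:
    """Project-scoped view: does not name any particular dataset."""
    return f"`{project}`.INFORMATION_SCHEMA.{view}"


def dataset_info_schema(project: str, dataset: str, view: str = "TABLES") -> str:
    """Dataset-scoped view: the dataset is part of the path, so the
    query can only succeed if that dataset exists."""
    return f"`{project}`.{dataset}.INFORMATION_SCHEMA.{view}"


# Hypothetical helper names; the project and dataset come from the report below.
listing = project_info_schema("dev-de1-eu-datatools")
tables = dataset_info_schema("dev-de1-eu-datatools", "dbt_ryan")
```

A query against the second reference fails with a "not found" error when dbt_ryan has never been created, whereas the first one can be used to list the datasets that do exist.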

There are two ways this can happen:

  1. dbt is checking an information schema errantly. From @ryanhempenstall on Slack:

Each of the folders within our models folder uses its own schema mapping in the dbt_project.yml file (pasted in below). The schema for our models is then generated based on the target + the schema in our profile + the schema mapping in the yml file. For example, a model might go into dbt_ryan_facts for me but would be added to dbt_alice_facts for Alice when we run with the target dev but if we’re running in production it would just be facts.

# dbt_project.yml file
models:
  dbtpoc:
    materialized: table
    staging:
      materialized: ephemeral
      tags: ["staging", "hourly"]
      schema: stage
    dimensions:
      materialized: ephemeral
      schema: dbtpoc
    facts:
      materialized: table
      schema: dbtpoc
    test_models:
      materialized: table
      schema: test
      tags: testing

The error that dbt encounters is:

  404 Not found: Dataset dev-de1-eu-datatools:dbt_ryan was not found in location US

This seems to indicate that dbt is checking a dataset called dbt_ryan which no models are configured to build into! Here, it's possible that dbt is errantly generating an information_schema relation for a resource that should not participate in docs generation (e.g. an ephemeral model, or a disabled model).

  2. There might not be an error at all here, but the project might be configured to build a resource into a schema that doesn't exist (e.g. a seed file exists, but dbt seed has never been run).
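The first possibility above amounts to a filtering problem: only resources that are actually materialized in the database should contribute a schema to the catalog query. A minimal sketch of that filter follows, assuming a simplified Node record; the class and function names are hypothetical and do not reflect dbt's internal manifest types.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified stand-in for a manifest entry."""
    name: str
    database: str
    schema: str
    materialized: str = "table"
    enabled: bool = True


def catalog_schemas(nodes: Iterable[Node]) -> Set[Tuple[str, str]]:
    """Collect (database, schema) pairs only from nodes that can exist
    in the database: enabled and not ephemeral."""
    return {
        (n.database, n.schema)
        for n in nodes
        if n.enabled and n.materialized != "ephemeral"
    }


nodes = [
    Node("facts", "proj", "dbt_ryan_facts"),
    Node("stg", "proj", "notrealnotreal", materialized="ephemeral"),
    Node("old", "proj", "notrealnotreal", enabled=False),
]
schemas = catalog_schemas(nodes)
```

This only addresses the first possibility. In the second one (a seed that has never been run), the node is enabled and materialized, so its schema would still be collected and the missing dataset has to be handled separately.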

In either case, I think that failing the entire docs generation step for a single missing dataset is suboptimal. Can we either:

  • check the list of existing datasets and prune the supplied information_schemas to only search in existing datasets
  • run a single catalog query for each information schema, then combine the results at the end. If one of these queries fails with a DatabaseError: 404 not found, we can catch the error and print a warning. Note: running one query per dataset might incur a non-trivial cost on BigQuery.
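Both options above can be sketched in a few lines. This is an illustration under stated assumptions, not the fix that was merged: DatasetNotFound is a hypothetical stand-in for whatever 404 error the adapter raises, and run_query is a hypothetical callable that returns catalog rows for one dataset.

```python
from typing import Callable, Iterable, List


class DatasetNotFound(Exception):
    """Hypothetical stand-in for the adapter's 404 'not found' error."""


def prune_to_existing(requested: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Option 1: keep only the requested datasets that actually exist."""
    known = set(existing)
    return [d for d in requested if d in known]


def catalog_per_dataset(
    datasets: Iterable[str],
    run_query: Callable[[str], List[dict]],
    warn: Callable[[str], None] = print,
) -> List[dict]:
    """Option 2: one query per dataset, combining the results.
    A missing dataset produces a warning instead of failing the run."""
    rows: List[dict] = []
    for dataset in datasets:
        try:
            rows.extend(run_query(dataset))
        except DatasetNotFound as exc:
            warn(f"Skipping dataset {dataset!r}: {exc}")
    return rows
```

Option 1 needs one extra listing call but keeps a single catalog query. Option 2 tolerates datasets that disappear between the listing and the query, at the cost of one query per dataset, which is the BigQuery cost concern noted above.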

Steps To Reproduce

case #1: configure a model to build into a dataset that does not exist. Run dbt docs generate on BigQuery and observe the failure.

case #2: configure a disabled model with:

{{ config(disabled=true, schema='notrealnotreal') }}

select 1 as id

Run dbt docs generate on BigQuery and observe the failure.

case #3: configure an ephemeral model with:

{{ config(materialized='ephemeral', schema='notrealnotreal') }}

select 1 as id

Run dbt docs generate on BigQuery and observe the failure.

Expected behavior

dbt docs generate should succeed even if a dataset present in the manifest does not exist in the database.

System information

Which database are you using dbt with?

  • bigquery

The output of dbt --version:

0.15.0
drewbanin added the bug and bigquery labels on Dec 5, 2019
drewbanin added this to the 0.15.1 milestone on Dec 5, 2019
beckjake added a commit that referenced this issue Dec 13, 2019
…-info-schema-queries

Fix bigquery catalog queries with nonexistent schemas (#1984)