Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3383][Bug] Metadata freshness checks that include a filter cause dbt to hang #9078

Closed
2 tasks done
Tracked by #9425
sp-tkerlavage opened this issue Nov 13, 2023 · 3 comments · Fixed by #9825
Closed
2 tasks done
Tracked by #9425
Assignees
Labels
backport 1.7.latest bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe
Milestone

Comments

@sp-tkerlavage
Copy link

sp-tkerlavage commented Nov 13, 2023

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I found this when I was testing the new information_schema/metadata driven freshness feature released in v1.7.

I removed the loaded_at field from my freshness test configs and ran them, not realizing I still had a filter on my freshness check definition. The output in the terminal showed the freshness checks being executed, but they never completed. Viewing my warehouse query history, I confirmed that dbt didn't actually issue any freshness queries at all. dbt output just sat there for several minutes before I manually cancelled it.

Removing the filter from the freshness check definition resolved the issue.

Expected Behavior

Metadata driven freshness checks should either support filter if theres a use case for that behavior, or filter should be ignored and there should be a warning message indicating that filter was ignored.

Either way the current behavior where filter causes metadata driven freshness checks to just hang does seem to be a bug.

Steps To Reproduce

This should cause the freshness check to "run" but no query against the warehouse will be issued:

version: 2
sources:
  - name: src
    database: db
    schema: test

    tables:
      - name: test_table
        freshness:
          warn_after:
            count: 4
            period: hour
          error_after:
            count: 8
            period: hour
          filter: 1=1

This should resolve the error (filter config is removed--freshness check query is actually issued and runs to completion)

version: 2
sources:
  - name: src
    database: db
    schema: test

    tables:
      - name: test_table
        freshness:
          warn_after:
            count: 4
            period: hour
          error_after:
            count: 8
            period: hour

Relevant log output

No response

Environment

- OS: macOS-13.5.2-x86_64-i386-64bit
- Python: 3.11.6
- dbt-core: 1.7.1
- dbt-snowflake: 1.7.0

Additional Context

No response

@sp-tkerlavage sp-tkerlavage added bug Something isn't working triage labels Nov 13, 2023
@github-actions github-actions bot changed the title [Bug] Metadata freshness checks that include a filter cause dbt to hang [ADAP-1032] [Bug] Metadata freshness checks that include a filter cause dbt to hang Nov 13, 2023
@dbeatty10
Copy link
Contributor

Thanks for reporting this @sp-tkerlavage 🙏

I was able to reproduce this in dbt-snowflake and something similar in dbt-postgres as well, so I'm going to transfer this issue to dbt-core.

Reprex (for Snowflake)

seeds/my_seed.csv

id
1

Build the seed (to use it as a source):

dbt seed

models/_sources.yml

version: 2

sources:
  - name: src
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: my_seed
        freshness:
          warn_after:
            count: 4
            period: hour
          error_after:
            count: 8
            period: hour
          filter: 1=1

Let this run for 30 seconds (or whatever) before using ctrl-c to cancel it:

dbt source freshness

Stack trace:

05:07:05  1 of 1 START freshness of src.my_seed .......................................... [RUN]
^C05:10:58  CANCEL query source.my_project.src.my_seed ..................................... [CANCEL]
05:10:59  
05:10:59  Exited because of keyboard interrupt
05:10:59  
05:10:59  Done. PASS=0 WARN=0 ERROR=0 SKIP=0 TOTAL=0
05:10:59  Encountered an error:

05:10:59  Traceback (most recent call last):
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 90, in wrapper
    result, success = func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 197, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 244, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/requires.py", line 284, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/cli/main.py", line 846, in freshness
    results = task.run()
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/task/runnable.py", line 474, in run
    result = self.execute_with_hooks(selected_uids)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/task/runnable.py", line 435, in execute_with_hooks
    res = self.execute_nodes()
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/task/runnable.py", line 359, in execute_nodes
    self.run_queue(pool)
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/task/runnable.py", line 291, in run_queue
    self.job_queue.join()
  File "/Users/dbeatty/projects/environments/snowflake_1.7/lib/python3.10/site-packages/dbt/graph/queue.py", line 198, in join
    self.inner.join()
  File "/Users/dbeatty/.pyenv/versions/3.10.10/lib/python3.10/queue.py", line 90, in join
    self.all_tasks_done.wait()
  File "/Users/dbeatty/.pyenv/versions/3.10.10/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt

Reprex (for Postgres)

Everything the same as the reprex for Snowflake above, but doesn't even need the filter: 1=1 part:

version: 2

sources:
  - name: src
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: my_seed
        freshness:
          warn_after:
            count: 4
            period: hour
          error_after:
            count: 8
            period: hour

It should give similar (or same) stacktrace as in Snowflake.

@dbeatty10 dbeatty10 removed the triage label Nov 15, 2023
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-snowflake Nov 15, 2023
@github-actions github-actions bot changed the title [ADAP-1032] [Bug] Metadata freshness checks that include a filter cause dbt to hang [CT-3383] [ADAP-1032] [Bug] Metadata freshness checks that include a filter cause dbt to hang Nov 15, 2023
@graciegoheen graciegoheen added backport 1.7.latest High Severity bug with significant impact that should be resolved in a reasonable timeframe labels Nov 15, 2023
@graciegoheen
Copy link
Contributor

Notes from estimation:

  • if there's an unhandled exception in one of the threads, it could cause this to hang

@emmyoop
Copy link
Member

emmyoop commented Nov 20, 2023

There are a few things happening here that cause the hang.

First, we're logging Note events that have positional arguments that we explicitly do not allow.. This assert should really raise a DbtInternalError instead of being an assert. Is there a reason we're doing this check other than to discourage putting vars into the message instead of passing them as data? If there's not, we should exclude the Note event from this check to allow the check to pass through.

Once that's fixed, another error will surface with the loaded_at_field. Right now for a warehouse that does not support metadata checks, if the loaded_at field is blank, we have another uncaught exception here because the freshness kwarg is empty.

This behavior needs to be tweaked where if you try to configure freshness for a node and you don’t supply the loaded_at_field but also the warehouse doesn’t support this feature, we should show the freshness as skipped and log the warning message.

This involves adding NodeStatus.Skipped to the FreshnessStatus and plumbing it through. We will also need to handle returning the SourceFreshnessResult more appropriately here.

@martynydbt martynydbt changed the title [CT-3383] [ADAP-1032] [Bug] Metadata freshness checks that include a filter cause dbt to hang [CT-3383][Bug] Metadata freshness checks that include a filter cause dbt to hang Nov 30, 2023
@martynydbt martynydbt added this to the v1.8 milestone Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.7.latest bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants