Storing test results in the database #2593
Heya, just giving my 2 cents here. If you want to write stuff to the database, why not make it a model? A few weeks ago people asked me this exact same thing about storing test failure results in the db, in order for them to be able to fully deprecate a much more expensive tooling they had in place. The solution I came up with was to have tests as models that each output exactly 1 row, with at least the following 3 columns: test_name, test_result, and test_details. I must say it has worked pretty well so far, but of course I don't think this would solve every use case. Here's one of the examples:

with
{{import('base__account_management__transaction')}},
{{import('base__account_management__lender_account')}},
issues as (
select
account_id,
sum(amount) as current_balance
from base__account_management__transaction t
join base__account_management__lender_account la
on t.account_id = la.id
where t.status in ('CONFIRMED', 'CONFIRMED_AWAITING_APPROVAL', 'CORRECTED')
and la.account_status = 'CLOSED'
group by account_id
having current_balance >= 1
)
select
'AllClosedAccountsHaveAZeroBalance' as test_name, -- could be {{ model.name }}
case
when count(*) != 0 then 'FAILED'
else 'PASSED'
end as test_result,
'Number of accounts = ' || count(*) || '\n\n' ||
listagg('Account: ' || account_id || '\nBalance: ' || current_balance, '\n\n') as test_details
from issues

Here's the macro I used:
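(A minimal sketch of what such an import macro might look like, assuming it simply wraps a ref as a CTE; this is a guess at the pattern, not Bruno's actual code:)

{% macro import(model_name) %}
    {# Hypothetical reconstruction: expose a ref'd model as a CTE named after it #}
    {{ model_name }} as (
        select * from {{ ref(model_name) }}
    )
{% endmacro %}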
And here's the sample Slack alert I got from it. We're also able to surface that test_history table in our viz tool to try and detect patterns in test failures. Maybe this could be translated into a custom materialization, no? Anyway, just thought it was worth sharing.
While I like what is written in the FR, I agree even more with @brunomurino's argument: 'If you want to write stuff to the database, why not make it a model?' I do get that DBT is stateless now, but IMO this should not apply to tests. After all, you also don't delete the default logging metadata after each DBT run; you simply append to the existing log. If the hassle of managing a database for storing test results is too much work, this could also be done by logging structured JSON records. Taking @brunomurino's input as a starter, I would then write the results of a test as 5 JSON keys per test.
The important detail for me would then be what DBT will write to the test_count and test_details fields.
If you want more input, @jtcohen6, I can mock up some of my proposed test logging JSON records for you.
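For illustration, one such record might look roughly like this (the exact key names and values are assumptions for the example; only test_count and test_details are taken from the comment above, and test_name / test_result follow Bruno's columns):

{
  "test_name": "all_closed_accounts_have_a_zero_balance",
  "test_result": "FAILED",
  "test_count": 3,
  "test_details": "Account: <id>, Balance: <amount>; ...",
  "executed_at": "2020-06-29T08:15:00Z"
}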
Hi @jtcohen6, I can see from the status of this ticket that implementing this is not planned yet. Can you give me any indication of if / when this might happen?
@joshtemple, you might also want to contribute to this FR as it greatly overlaps with #517.
@brunomurino @bashyroger Thank you both for sharing your thoughts, and for engaging seriously with this proposal! This feature is not currently prioritized, insofar as it depends on (and is meant to inform) the work we want to do around v2 test nodes (#1173, #2274).

You're absolutely right that there are other approaches possible today, which leverage the Results object or dbt logs/artifacts. Bruno laid out a compelling one above. With that multiplicity in mind, I believe the built-in approach needs to be:

Easy to use

This approach needs to make sense for the most basic, straightforward dbt projects. It should be accessible to users who can write and run SQL against their data warehouse. It shouldn't depend on an ability to write Jinja or parse JSON (trickier in some databases than in others). It should feel like a natural outgrowth of what dbt test already does.

My goal is not one of consolidation. It's not to create the be-all and end-all mechanism for storing test results in the database or preserving test metadata. Instead, I want a starting point that works well for most projects, while also opening the door for custom behavior as dbt users become more comfortable writing Jinja.

Idempotent

I believe that we guarantee sensible results and an intuitive process by keeping this approach idempotent. I take Rogier's point above about statefulness and logging.
It's true that dbt wants to be stateful when it comes to its project files (via git / version control) and debug logs. Every database operation it performs, however, is ultimately idempotent, with snapshots being the exception that proves the rule. It's important to me that dbt test --store-results is idempotent as well.

To achieve in-database statefulness, you'll be able to create snapshots on top of the tables in the audit schema. You could also put a custom run-operation macro in place to copy or unload that data to external storage.

Extensible

In order to achieve what Rogier proposed above,
I think it would be reasonable to write a test block like this:

{% test not_null_by_key(model, column_name, unique_key) %}
{% set name = 'not_null_by_key__' ~ model.name ~ '__' ~ column_name ~ '_by_' ~ unique_key %}
{% set description %} Assert that {{ column_name }} is never null in {{ model }} {% endset %}
{% set result_calc = 'sum(null_values)' %}
{% set fail_msg %} Found {{ result }} null values of {{ model }}.{{ column_name }}. See the breakdown by {{ unique_key }}: {% endset %}
{{ config(name=name, description=description, result_calc=result_calc, fail_msg=fail_msg) }}
select {{ unique_key }}, count(*) as null_values
from {{ model }}
where {{ column_name }} is null
group by 1
{% endtest %}
Hi @jtcohen6, I stumbled on an article that shares some good ideas on how testing could be improved in dbt. What I specifically liked about it is the suggestion of defining tests as a Gherkin document.
I want to store the results of dbt test as well.
Let me start by saying that I think this post, as well as #2959, encapsulates the majority of what I'm about to write, but the context is split across many comments and linked posts, so I felt it may be beneficial to sum up here before adding on top. I posit the following, based on a few first principles.
For example, if model B is defined as a select from model A, dbt knows to build A before B. This may seem like a core principle of dbt; however, certain features are very much outside the scope of this axiom. Specifically, if your invocation of dbt is a sequence like dbt seed, then dbt snapshot, then dbt run, you are explicitly defining snapshots to happen before runs, but after seeds. What if you want a snapshot of a seed? Or a snapshot of a model that's based on a seed? You also then get a single, fixed execution order across resource types, when perhaps instead what you want is an order that follows the DAG.
Example: The ideal future state of tests is such that error handling on run failures and test failures is equally mature: The result of a test can control the behavior of the remaining dbt invocation (fail fast causing downstream models not to run, roll back all or part of a transaction such that the data that passes assertions is committed, while failing data does not, complete rollback, etc.)
It doesn't particularly matter whether it was a "schema test" or a "data test" that issued the alert. Though the instrumentation differs (return 0 vs. return 0 rows), the advent of custom schema tests all but unifies these two features, leaving the distinction only in instrumentation complexity (custom schema tests require building macros for the purpose of re-using the test, while data tests are faster to build but may cause duplication in testing logic over time).

Example: I have a mapping model with company_id and customer_id fields. I expect there to be more than one entry per company_id, since company.customer_id could carry the reference OR customer.company_id could carry the reference; however, each company should have at most 1 customer_id. To test for this, I can write the following custom data test:
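A sketch of what that data test could look like (the model and file names are placeholders; the original SQL wasn't preserved):

-- hypothetical file: tests/company_has_at_most_one_customer.sql
-- classic data test: it passes when the query returns zero rows
select
    company_id,
    count(distinct customer_id) as distinct_customer_ids
from {{ ref('company_customer_mapping') }}  -- placeholder model name
group by company_id
having count(distinct customer_id) > 1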
Or I could include this in the model itself:
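For instance (an illustrative sketch, not the original snippet), the model could carry the count as a column:

select
    company_id,
    customer_id,
    -- how many distinct customers each company maps to;
    -- note that count(distinct ...) as a window function isn't supported on every warehouse
    count(distinct customer_id) over (partition by company_id) as customer_ids_per_company
from {{ ref('company_customer_mapping') }}  -- placeholder model name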
And then write a schema test against that new column (e.g. asserting its value is always 1).
(Sidenote: this tells me how many distinct unexpected values there are, not how many rows are failing the test, which isn't ideal, but that's a tangential discussion.) Or I can even write a custom schema test that takes 2 fields as params and builds out the count(distinct) window function for me. As @joellabes points out in #2959, when a schema test (any test, really) fails, the first thing to do is figure out why it failed (data issue vs. logic issue), and that requires auditing/profiling queries with a bunch of context.
If someone outside the Analytics Engineering team is the owner of the system of record (sales ops, marketing, billing, etc.), then we need to convey to them that this test is failing AND for which records it's failing, likely via a BI dashboard, so that they can resolve it without having to touch code / dbt. We should not replicate the test logic in the BI dashboard to do so.
@ltchikindas Thank you for the thought, time, and care that went into your comment! As we begin work on changes to tests for the next version of dbt (v0.20.0), including many of the ideas surfaced in this issue, it is validating to know just how much we're thinking about the same things.
I agree. In my view, the right implementation here doesn't involve a change to tests (treating them more like models) as much as it requires the advent of a new, generalized task that can execute a DAG of all resource types—to accomplish exactly that thing you're describing. My latest thoughts here are in #2743. Could you give that a read, and comment if anything there seems off the mark?
I agree here as well. Instrumentation complexity feels like the most significant difference between "custom schema" tests and "data" tests, in a world where we'd love project and package maintainers to be writing both. To that end, we're thinking of potentially renaming these. Why did we use these names in the first place? A schema test reflects a property of the model on which it's defined (primary key, foreign key, enum, etc.); of course, for those properties to be of any value in analytics databases, which lack (or worse, ignore) column constraints, they need to be asserted and validated with queries. We didn't think about data tests as properties of their referenced models, but rather as complex, emergent truths that may require information from several models; again, though, conditional truths that must be validated at query time. Pieces of this distinction may still be conceptually useful, for instance in a world where dbt-docs generates an ERD based on primary and foreign key tests, but the names themselves are up for rethinking.
I was talking with @kosti-hokkanen-supermetrics last week: he mentioned that this feature didn't work 100% as he anticipated, so he turned it off. Kosti wants to consume a single table of failures and use that as a notification trigger. Curious if anyone has thoughts on this idea?
I haven't yet refactored my approach since dbt released this new feature; however, it seems my approach is still required, which is to use post-run hooks to insert the single-line test tables into a "historical" test results table. The single-line test tables all have the same structure, of course, with a JSON column to carry the details of the test, and the rest is metadata like a "consecutive failures threshold for alerting". It's been working great, though I do use a Python script to query that historical table, take the latest results, and send the notification alert.
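A rough sketch of that pattern (the history table name and columns are assumptions, not the actual implementation):

-- in each single-row test model: after the model is rebuilt,
-- append its one result row to a persistent history table
{{ config(
    materialized = 'table',
    post_hook = "insert into analytics.test_history select current_timestamp as run_at, * from {{ this }}"
) }}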
@boxysean @brunomurino Right on! I just responded to a very similar comment from @trevor-petach: #903 (comment). I totally get the desire for a single table of failures; I think the answer here is actually metadata artifacts, such as run_results.json. In the meantime, I also want to give users the ability to take matters into their own hands, by writing their own macros for consuming results at the end of a run.
Hi, I have implemented this kind of test in dbt. I have also created a macro with the code you gave, in the macros folder, but I don't understand how to run that macro. When I try dbt run-operation it's not working. What do I need to do?
Describe the feature
dbt testing is powerful; being notified when a test fails can be life-saving; debugging those failed tests could be easier!
There are two long-lived issues related to this subject (#517 and #903), and dozens of valuable comments on both. After discussing with several colleagues and a few community members, we've arrived at an approach I feel good about.
The goal here isn't to build a dimensional model of historical data surrounding dbt invocations. I think we can and should build out capacity for those longitudinal analyses, but it should be powered by dbt artifacts (manifest and run results) rather than adding records to database tables (a la logging package).
Rather, the goal is to add tooling that can be of immediate benefit to dbt developers, for whom debugging tests is a slog that's harder than it needs to be. We want to provide the post-test investigator with a high-level overview of dbt test statuses, and the errant records from any failing tests, within a query or two.

A lot of this work is going to build on top of v2 schema tests (#1173) and data tests (#2274), both of which are on the 1.0.0 docket. So while this feature is not coming in the next month or two, it is firmly on the roadmap.
Spec: v2 schema tests
To get where we want to go, we need to talk a little about our planned changes to schema tests.
Let's imagine a modified version of the current not_null schema test. Several small things are different, the most significant of which is that the test returns a set of records instead of just count(*), similar to how data tests work today:
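For concreteness, a record-returning not_null test block might look something like this (a sketch, not necessarily the final implementation):

{% test not_null(model, column_name) %}

    -- return the offending records themselves, rather than a count
    select *
    from {{ model }}
    where {{ column_name }} is null

{% endtest %}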
Let's say we have two not_null tests defined in our project, in a file called resources.yml. The standard way of executing these tests looks like compiling and running some queries, and storing the compiled version of those queries in the target directory. After running, dbt returns any failures as log output to the CLI.
There are many features we imagine unlocking with v2 testing: warn severity, and more configurable warn_after and error_after thresholds based on a scalar count or %. For now, let's focus on storing the results of these tests in the database for easier debugging.
Spec: storing results
Now let's imagine a new dbt test flag:

$ dbt test --store-results
It's unlikely that dbt developers would want to include this flag when iterating on net-new models, but it's quite likely that production or CI test runs would benefit from including it.
When the --store-results flag is included, dbt will instead execute tests in a way that materializes their failing records as tables in the database. Based on the row count in the query status returned by the database, or by running a quick count(*) on each table after creation, dbt will determine whether the test passed or failed and create a summary table of all tests in the invocation. (We'll use a mechanism similar to dbt seed to batch inserts and avoid any limits on inserts or query length.) Finally, dbt will log the results to the CLI.
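For illustration, the stored-results version of a single test might be executed roughly like this (the schema and table names here are assumptions, following the audit-schema naming described below):

create table analytics__dbt_test_results.not_null_customers_customer_id as (

    -- the compiled test query, returning the failing records
    select *
    from analytics.customers
    where customer_id is null

);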
Questions
Test definition
dbt will depend entirely on the test block to circumscribe its failures. Each test block query will be responsible for defining which columns to return: *, an explicit column list, dbt_utils.star, or * (except) (on BigQuery); and whether to cap the number of records returned, e.g. with a limit 500.

Audit schema

dbt will store all results in one audit schema per dbt test --store-results invocation. It will name the schema {{ target.schema }}__dbt_test_results.

Can I name the schema differently? What about tests on models with custom schemas?
We're going to keep this simple for the first version by storing all tests in one schema with a set name. We envision adding configuration for this later, especially if there is user demand.

Why does dbt drop and recreate the dbt test audit schema every time?
dbt test is a stateless, idempotent operation; dbt test --store-results will be as well. If you run it several times, you will end up with the same artifacts.

How can I preserve history of past test failures?
The same way you preserve historical data from sources, models, or anything else: snapshots on top of tables in the audit schema, or your own custom run-operation macros to copy or unload that data to external storage.

Granular control

The --store-results flag flips dbt test from running a set of queries that performs only introspection to a set of queries that create assets in the database.

Will it be possible to store results for some tests, and not for others?
In the first version of this, the --store-results flag will change the behavior of all tests run. Eventually, we envision adding a test config flag (store_results: false) that takes precedence over the CLI flag, similar to changes to --full-refresh in 0.18.0. Until then, you can manage this with multiple commands and node selection.
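For example, one possible pattern (the tag name here is just an illustration):

$ dbt test --store-results --models tag:store_failures
$ dbt test --exclude tag:store_failures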