Provide option to allow tests to be run before a model's data is updated #5687
-
+1 for this ability. I realize it may not be the easiest implementation. My org is starting to experiment with table/dataset cloning as a means to work with dbt's current functionality. We're even contemplating building a clone materialization that would only run if all tests pass (using …)
-
#1054 has an interesting discussion on this topic.
-
@adamcunnington-mlg This is a tremendous start to a discussion! You've compiled some great research here. I am going to convert this to a discussion, as a matter of fact. I also agree with @dbeatty10 that #1054 offers some great historical context on the ideas we've had in the past, as well as the tripwires we've come across. I appreciate the problem you're raising—when bad data's in the production pipeline, it's in the production pipeline—and I agree it's a very common one. There is a diversity of tactics folks use to solve it today. The two most common I see:

- dbt build, which runs tests on sources and upstream models before building anything downstream, so a failing upstream test stops bad data from propagating further
- a write-audit-publish / "blue-green" deployment, in which the DAG is built and tested in a staging environment and only promoted to production once everything passes
While not perfect solutions, those two constructs have served many folks quite well. They care more about the holistic deployment of an entire DAG, rather than treating each model as its own microcosm. I do think that dbt could do more to make the write-audit-publish / "blue-green" pattern more readily available. It is certainly easiest on Snowflake, due to its support of zero-copy cloning, though it still requires some custom macros + operations to string together. BigQuery also supports zero-cost copying, for tables and datasets, and (I understand) is rolling out capabilities around table cloning (for discussion in dbt-labs/dbt-bigquery#270). Once there is a critical mass of support among a few of the most popular adapters, I'd feel more comfortable with the idea of building out an adapter-agnostic abstraction within dbt.

Each model to its own

There are three patterns I could see to support this on a model-by-model basis, all of which are much more case-by-case:

- transactions: build the model and run its tests within a single transaction, rolling back if any test fails (only on adapters that support transactions)
- constraints: encode certain tests (e.g. not_null, unique) as database constraints that the warehouse enforces at write time (see the sketch just after this list)
- staging relations: build the model into a temporary/staging relation, test it there, and only swap or clone it into place once the tests pass
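To illustrate the constraints pattern above: certain schema tests could, in principle, be delegated to the warehouse itself, on warehouses that actually enforce constraints. A minimal sketch in standard SQL, with purely illustrative table and column names:

```sql
-- The warehouse rejects the write up front, instead of a test failing later.
create table analytics.orders (
    order_id    integer not null,
    customer_id integer not null,
    ordered_at  timestamp,
    constraint orders_pk primary key (order_id)
);

-- This insert fails immediately if any order_id is null or duplicated.
insert into analytics.orders
select order_id, customer_id, ordered_at
from staging.stg_orders;
```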
I agree that, in the ideal case, there would be a common abstraction in dbt to identify which tests are worth rolling back for—with adapter-specific implementations behind the scenes. Let's keep the discussion going :)
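As a concrete (and entirely illustrative) example of the custom macros the blue-green pattern currently requires on Snowflake, a run-operation like the following could swap a fully built-and-tested staging schema into production. The macro and schema names here are assumptions, not an existing dbt feature:

```sql
{# Illustrative only: run `dbt build` against the staging schema first,
   then `dbt run-operation swap_schemas` to promote it atomically. #}
{% macro swap_schemas(staging_schema='analytics_staging', prod_schema='analytics') %}
    {% do run_query('alter schema ' ~ staging_schema ~ ' swap with ' ~ prod_schema) %}
    {{ log('Swapped ' ~ staging_schema ~ ' with ' ~ prod_schema, info=True) }}
{% endmacro %}
```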
-
@jtcohen6 thanks for the response - exactly the sort of discussion I was hoping to generate! I hadn't fully appreciated (despite the fact that I am actually benefiting from the exact behaviour you described) that the models are dependent on the sources, and so source tests do provide a significant safety barrier. It doesn't make the discussion moot - we can still miss data issues by not catching edge cases in our source tests, and then model tests may fail (and it's too late) - but I think it would be good to increase the emphasis on this in the docs. Somewhere in the tutorial / getting-started pages, it could be made really clear that tests run after models, but that the way dbt build works means models will not run at all if a source test fails, and thus source tests are a critical opportunity to verify upstream data integrity.

This reduces the job of model tests to: i) validating that the model logic is good - which is part of a typical software dev/test workflow and falls into the "solved problem 1" category I outlined before; and ii) catching edge cases that are pertinent to the logic of the model - which should drive a feedback loop of identifying gaps in source tests and improving them.

In fact, the above leads me to think it would be a valid conclusion for the original hope of this discussion to be met with "it's not feasible given the amount of effort, and source tests run first" - this is not something I had appreciated at all at the top of this conversation.

Onto the options though... thanks for listing them out. I guess, as much as possible, for the sake of code maintainability (and accommodating future adapters), you want as few strategies as possible for achieving this - some of what you have listed are broad-brush strategies and others (e.g. constraints) are specific to particular types of tests, never mind particular adapters. Leaning on the former as much as possible (even with limitations) may be preferable. I.e. if any dev work were to be done, would a good starting point be to explore the transaction rollback approach behind a common model property that only has an effect on adapters that support transactions, and just call this out as a limitation? It feels like it would be the smallest code change too. Logically: begin a transaction; build or update the model; run the model's tests; commit if they all pass, otherwise roll back.
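To make that logic concrete, here is a rough sketch of the flow on an adapter with transactional DDL (Postgres syntax; the table, column, and inline test are invented, and this is not how dbt implements anything today):

```sql
begin;

-- build the new version of the model alongside the old one
create table analytics.orders__new as
select * from staging.stg_orders;

-- run the model's tests against the new relation; any failure aborts
-- the whole transaction, leaving the old table untouched
do $$
begin
    if exists (select 1 from analytics.orders__new where order_id is null) then
        raise exception 'not_null test failed on orders.order_id';
    end if;
end $$;

-- tests passed: replace the old relation
drop table if exists analytics.orders;
alter table analytics.orders__new rename to orders;

commit;
```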
-
Not sure if this is exactly related, but: being able to run newly developed tests and macros from a user's local machine against an environment where they don't have the rights to build models (create tables) would be extremely helpful. Right now, if a user doesn't have CREATE TABLE rights, they can't run their project against that environment at all. Running the local dbt project in a temp environment against real data would be amazingly useful.
-
@adamcunnington-mlg very useful discussion! With the "run, test, and only then deploy models to production" approach, models downstream of a failed test will run needlessly. With dbt build, by contrast, models downstream of a failing test are skipped rather than run.
-
wondering if dbt clone changes things here... thinking (similar to others here): build and test everything in a staging environment, then clone only the tested models into production (a rough sketch follows this comment).
my current team doesn't have a pager and failing jobs can sometimes go days without being noticed. seems like this is a totally doable implementation, especially given the benefit of only having tested data in production. I see this as a logical next step in line with the shift from dbt run to dbt build.
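If it's useful to the discussion, the clone step can already be scripted as a run-operation macro today. A hedged sketch using Snowflake's zero-copy clone - the macro name, schema names, and model list are all assumptions, not anything dbt ships:

```sql
{# Illustrative only: after `dbt build` succeeds in the staging schema,
   promote the tested tables into production via zero-copy clone. #}
{% macro promote_tested_models(models=['orders', 'customers']) %}
    {% for model_name in models %}
        {% do run_query(
            'create or replace table analytics.' ~ model_name ~
            ' clone analytics_staging.' ~ model_name
        ) %}
    {% endfor %}
{% endmacro %}
```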
-
Describe the feature
Context
Currently, dbt tests run after a model is built.
If a test fails, some characteristic of the data is unexpected - but it's too late. That "bad" data is already in the model and is impacting downstream models - the integrity of the data is broken, and this could flow through to other models where crazy things might happen (row explosion, uh oh) and to use cases (e.g. real-time dashboards) that suddenly show the wrong data.
There is no straightforward solution to this problem. On Snowflake, there is a neat way of achieving blue/green (although I think it's a slight misuse of the term blue/green, because we're talking about new data, not [necessarily] new software) because Snowflake supports table swaps. Others have implemented more complex workarounds, such as managing separate staging and production environments - orchestrated either via Airflow, or perhaps achieved in dbt via logical layers with different run commands. Calogica have a nice write-up about how they use dbt to achieve this here.
My conjecture, though, is that this is a really core need that should be possible, natively, within dbt.
Acknowledgements
I am aware that there are use cases where you don't want to prevent "bad" data flowing through in the event of a test failure - but I do believe it should be an option - and should probably be the default behaviour (ignoring for one moment compromises that might be made when rolling out such a feature so as not to cause a breaking change).
Also, it's worth calling out that tests actually serve two purposes:
1. validating that the model's logic is correct
2. asserting, at run time, that the data itself matches the profile we expect
These are arguably quite different things. The former is more like a conventional software test, and we'd expect to run it before deploying our updated artifacts (in this case, updated model code) - which we do via a typical dbt dev workflow. The second is a runtime consideration relating to the assertion of our data's profile. This is a very different thing, and it is why it's so important to be able to pass the tests before updating the model's data (or at least to have the option to do this). The first is a solved problem. The second is not.
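As a concrete illustration of that second purpose: a dbt test is just a query that selects offending rows, and today it can only be evaluated after the data is already in place. A minimal singular test, with invented model and column names:

```sql
-- tests/assert_orders_have_ids.sql
-- A dbt singular test: select the rows that violate the expectation.
-- The test fails if this query returns any rows - but it can only run
-- after the orders model has already been rebuilt.
select *
from {{ ref('orders') }}
where order_id is null
```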
Discussion
dbt makes decisions about what should be abstracted behind a common layer and what should be adapter-specific. It feels to me like the ability, and the user-facing interface, for having tests run (logically) before a model is updated should be adapter-agnostic - but I appreciate this is a totally non-trivial problem to solve. Perhaps adapter-specific could be a starting point, and this could be homogenised later.
I'm also aware, at the other end of the spectrum, that if you are going to consider running tests "transactionally" before a model is updated, what about a model that is dependent on another model? You may wish to update both or none at all. Sure! But one step at a time.
I imagine a world where there is a per-adapter implementation, perhaps hidden behind a common model property that controls whether tests run before or after the model's data is updated.
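For instance - and tests_before_update is a purely hypothetical config name used for illustration, not something dbt supports - such a property could look like this in a model file:

```sql
{# models/orders.sql #}
{# `tests_before_update` is a hypothetical flag, not a real dbt config. #}
{{ config(
    materialized='table',
    tests_before_update=true
) }}

select * from {{ ref('stg_orders') }}
```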
Specifically, and selfishly, I want this capability in BigQuery. BigQuery is quite limited in terms of the tools it gives us to help, but multi-statement transactions have been around for a short while and are due to move from preview to GA in the not-too-distant future. I know transactions are not a silver-bullet solution to everything, but maybe there's at least some mileage in exploring the feasibility of updating the model and then running all model- and column-level tests within the same transaction, rolling back if any fail (maybe the failure behaviour is a config that belongs to the `tests:` array items rather than at the model level)? Keen to start a discussion on this!
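To gauge feasibility, here is a rough sketch of that idea in BigQuery's multi-statement SQL. The table names and the inline test are invented, and note that BigQuery transactions allow DML but not permanent-table DDL, so the model refresh is expressed as delete-and-insert:

```sql
begin
  begin transaction;

  -- refresh the model's data (DML only; permanent-table DDL is not
  -- allowed inside a BigQuery transaction)
  delete from analytics.orders where true;
  insert into analytics.orders
  select * from staging.stg_orders;

  -- inline "not_null" test: abort before commit if it fails
  if exists (select 1 from analytics.orders where order_id is null) then
    raise using message = 'not_null test failed on orders.order_id';
  end if;

  commit transaction;

exception when error then
  rollback transaction;
  raise;
end;
```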
Describe alternatives you've considered
Managing a custom blue/green approach.
Who will this benefit?
A lot of people. I don't have any hard numbers to back this up, but I suspect that the majority of dbt users would not expect a model to be updated if its tests fail. I imagine some of those users implement workarounds, which cost development time, processing resources, and opportunity cost in complex support routines. I imagine that others don't do anything in particular but just set expectations with users (directly or otherwise) that when data breaks, it'll be fixed quickly - but it will be broken for a period of time. Most data analytics still isn't mission critical, after all (despite what we all say).
Are you interested in contributing this feature?
Perhaps! Certainly dev resources within my team.
Anything else?
https://cloud.google.com/bigquery/docs/reference/standard-sql/transactions