[CT-3121] [Implementation] get schema of inputs prior to executing unit test command #8649
Comments
Idea 1: If the datatype is explicitly provided, let's use that:
If there's no indication, we "guess" (similar to what we do for seeds via the agate library, or whatever the data warehouse would infer for the inputted data). But!! If you don't provide us anything (columns that you don't care about), we can't guess. What if we said:
Future aspirations:
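A minimal sketch of Idea 1's type handling, purely illustrative and not the elided proposal above: `row` and `column_types` are hypothetical names for one mocked fixture row and a user-supplied column-to-type mapping.

```sql
{#
  Illustrative only: cast to an explicitly provided type when we have one;
  otherwise let the warehouse infer the type of the literal, the same way
  agate-style inference works for seeds.
#}
select
{%- for column_name, value in row.items() %}
  {%- if column_types and column_name in column_types %}
  cast({{ value }} as {{ column_types[column_name] }}) as {{ column_name }}
  {%- else %}
  {{ value }} as {{ column_name }}
  {%- endif %}
  {%- if not loop.last %},{%- endif %}
{%- endfor %}
```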
Idea 2: Can we use SQL parsing to get the types? With the schemas for sources from Idea 3
Idea 4: What if we could somehow "clean" the SQL to only select inputted columns?
Idea 5: Can we run the catalog queries for the direct parents of the model selected for the unit test command and make those available in the unit testing Jinja context for rendering inputs and expected outputs SQL? We could make use of the applied state work to filter catalog queries so we only run them for the relevant models (#8648). But catalog queries will fail if you haven't previously built the relevant models...
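For context on Idea 5: the per-relation metadata a catalog query needs to surface is roughly what a standard information-schema lookup returns. The real catalog queries are adapter-specific; this is only an illustration, and `my_schema` / `my_parent_model` are placeholders.

```sql
-- Illustration only: adapters implement catalog queries differently, but
-- per direct parent relation they need approximately this information.
select
    column_name,
    data_type
from information_schema.columns
where table_schema = 'my_schema'        -- schema of the direct parent
  and table_name = 'my_parent_model'    -- the direct parent relation
order by ordinal_position
```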
Idea 6 (cc: @jtcohen6)
Provided we:
This is just
However, one thing that you wouldn't be able to do:
Idea 7: Can we
Our requirement -> "We need to know the data types of our current logical state (for the direct parents of our unit tested models)"
Possible solutions:
What's the simplest thing we could do here:
We will need to confirm when we should add quotes ('') around the mocked data. Let's see how far this gets us, and whether this is "good enough" or we need to take one of the more complicated approaches listed above.
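As a rough illustration of that quoting decision, here is a minimal sketch assuming we already know each column's data type from one of the ideas above; `row` and `column_type_map` are hypothetical names, not an existing dbt API.

```sql
{#
  Illustrative only: quote values bound for string-typed columns and leave
  numeric/boolean/date literals to the warehouse. `row` is one mocked
  fixture row; `column_type_map` maps column name -> data type.
#}
select
{%- for column_name, value in row.items() %}
  {%- set data_type = (column_type_map.get(column_name, '') | lower) %}
  {%- if 'char' in data_type or 'text' in data_type or 'string' in data_type %}
  '{{ value }}' as {{ column_name }}
  {%- else %}
  {{ value }} as {{ column_name }}
  {%- endif %}
  {%- if not loop.last %},{%- endif %}
{%- endfor %}
```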
A few follow-up questions:
From a discussion with @dbeatty10 and @jtcohen6: What if we supply folks 3 main options:
Option 1: supply mock data for all of your referenced columns
Closing in favor of:
which covers option 2 and option 3. We can revisit option 1 if desired.
Housekeeping
Short description
Instead of requiring folks to run
dbt run --select +my_model --exclude my_model
before executing unit tests on my_model
or requiring the use of --defer
(which would cause issues in a CI context when you're deferring to a production environment and may have changes to your inputs as part of the PR)... we should be able to run
dbt unit-test --select my_model
in a new environment without having previously run dbt run --select +my_model --exclude my_model
or dbt docs generate
Acceptance criteria
dbt unit-test --select my_model
in a new environment without having previously run dbt run --select +my_model --exclude my_model
or dbt docs generate
Impact to Other Teams
None
Will backports be required?
No
Context
This is how the fixture SQL is currently being generated, using adapter.get_columns_in_relation to dynamically query the existing input table (or view) during input rendering: https://github.com/dbt-labs/dbt-core/blob/unit_testing_feature_branch/core/dbt/include/global_project/macros/unit_test_sql/get_fixture_sql.sql#L6-L9
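A simplified sketch of that approach follows; the real implementation is the get_fixture_sql macro linked above. `input_relation` and `mocked_row` are hypothetical names for the parent relation being mocked and a dict of column name to mocked value.

```sql
{#
  Simplified sketch, not the real macro: look up the parent relation's
  columns at render time and cast each mocked value (or null, if the
  column wasn't mocked) to the column's actual data type.
#}
{%- set columns_in_relation = adapter.get_columns_in_relation(input_relation) -%}

select
{%- for column in columns_in_relation %}
    cast({{ mocked_row.get(column.name, 'null') }} as {{ column.data_type }}) as {{ column.name }}
    {%- if not loop.last %},{%- endif %}
{%- endfor %}
```

This is also why the feature described in this issue is needed: adapter.get_columns_in_relation can only return columns for a relation that already exists in the warehouse.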