Allow use of sources as unit testing inputs #9059

gshank · 2023-11-12T00:15:10Z

resolves #8507

Problem

We want to support the use of sources as inputs in unit test cases.

Solution

Created a UnitTestSourceDefinition object, which acts as a source when resolving "source" calls but as a model when executing the test case.
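
A minimal sketch of the idea described above, not the actual dbt-core classes. `NodeType` and `ModelNode` here are simplified stand-ins, and the `search_name` property is a hypothetical helper added for illustration. The point is that the object keeps a model resource type (so it executes like a model) while carrying the `source_name` needed to resolve `source` calls.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    # Simplified stand-in for dbt's NodeType enum (assumption).
    Model = "model"
    Source = "source"


@dataclass
class ModelNode:
    # Simplified stand-in: the real ModelNode has many more fields.
    name: str
    package_name: str
    resource_type: NodeType = NodeType.Model


@dataclass
class UnitTestSourceDefinition(ModelNode):
    # resource_type stays Model so the node executes like a model;
    # source_name is kept so that source lookups can still find it.
    source_name: str = ""

    @property
    def search_name(self) -> str:
        # Hypothetical helper: the "<source>.<table>" key a source lookup might use.
        return f"{self.source_name}.{self.name}"


node = UnitTestSourceDefinition(name="orders", package_name="shop", source_name="raw")
```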

Checklist

I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

codecov · 2023-11-12T00:17:50Z

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (436dae6) 86.80% compared to head (c5b4428) 86.81%.

Files	Patch %	Lines
core/dbt/context/providers.py	87.50%	1 Missing ⚠️

Additional details and impacted files

@@                     Coverage Diff                      @@
##           unit_testing_feature_branch    #9059   +/-   ##
============================================================
  Coverage                        86.80%   86.81%           
============================================================
  Files                              181      181           
  Lines                            27057    27075   +18     
============================================================
+ Hits                             23488    23505   +17     
- Misses                            3569     3570    +1

Flag	Coverage Δ
integration	`83.82% <97.22%> (+0.01%)`	⬆️
unit	`64.57% <36.11%> (-0.02%)`	⬇️


aranke · 2023-11-13T17:58:34Z

core/dbt/adapters/base/relation.py

@@ -263,8 +271,10 @@ def create_from(
        node: ResultNode,
        **kwargs: Any,
    ) -> Self:
-        if node.resource_type == NodeType.Source:
-            if not isinstance(node, SourceDefinition):
+        if node.resource_type == NodeType.Source or isinstance(node, UnitTestSourceDefinition):


Can we simplify the logic here?

How? We can't set the resource_type to Source because that breaks execution.


gshank · 2023-11-13T21:20:39Z

I looked at taking out the special casing of UnitTestSourceDefinition, but unfortunately there are subtle differences in how quoting is specified for sources and models, so I think it's best to use relation.create_from_source to get the quoting right.
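
The special casing discussed here can be sketched roughly as follows. This is an illustration under simplified assumptions, not the dbt-core implementation: `Node` is a stand-in class, and `create_from` only returns the name of the path it would take. The name `create_from_node` for the non-source branch is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Simplified stand-in for a manifest node (assumption).
    resource_type: str
    name: str


@dataclass
class UnitTestSourceDefinition(Node):
    source_name: str = ""


def create_from(node: Node) -> str:
    # A UnitTestSourceDefinition keeps resource_type "model", so the
    # isinstance check is what routes it to the source path, where
    # source-style quoting would be applied.
    if node.resource_type == "source" or isinstance(node, UnitTestSourceDefinition):
        return "create_from_source"
    return "create_from_node"


real_source = Node(resource_type="source", name="orders")
plain_model = Node(resource_type="model", name="stg_orders")
unit_test_source = UnitTestSourceDefinition(
    resource_type="model", name="orders", source_name="raw"
)
```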

MichelleArk · 2023-11-14T15:50:27Z

core/dbt/parser/unit_tests.py

+                    source_name=original_input_node.source_name,  # needed for source lookup
+                )
+                # Sources need to go in the sources dictionary in order to create the right lookup
+                self.unit_test_manifest.sources[input_node.unique_id] = input_node  # type: ignore


Do we anticipate any issues from having the sources dictionary contain a unique_id key that is prefixed with model instead of source here?

It doesn't seem to care. We don't actually check the unique_id prefix that I can recall. If somebody starts parsing the unit_test_manifest, I suppose it might be confusing. But right now we're putting it in two places, so one of them will be wrong.

This probably isn't worth spending tons of time on, but I think it could be possible to avoid adding the node to manifest.sources and instead do the lookup from the .nodes collection in UnitTestRuntimeSourceResolver, since the unique_id will include source_name. Kind of like what's done here: https://github.com/dbt-labs/dbt-core/blob/unit_testing_feature_branch/core/dbt/context/providers.py#L578

Not entirely sure what's more readable or less complex in this case. I can imagine having to maintain UnitTestSourceDefinitions across both dictionaries could be error-prone though.

"But right now we're putting it in two places, so one of them will be wrong."

Given that UnitTestSourceDefinition is a ModelNode, I think having it in nodes is 'more' correct.

I think the lookup behavior of sources and nodes is subtly different with regard to the meaning of package=None, so I don't think looking up sources as though they were nodes is worth it.

MichelleArk · 2023-11-14T17:46:29Z

core/dbt/parser/unit_tests.py

+                # Sources need to go in the sources dictionary in order to create the right lookup
+                self.unit_test_manifest.sources[input_node.unique_id] = input_node  # type: ignore
+
+            # Both ModelNode and UnitTestSourceDefinition need to go in nodes dictionary


For my own understanding: is this to enable CTE injection?

Yeah. There's code in compilation.py that checks whether the CTE exists in the nodes dictionary: `if cte.id not in manifest.nodes:`.

In theory we could also check for a UnitTestSourceDefinition and look in sources, but that didn't feel like an improvement.
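
A rough sketch of why the node is registered in both dictionaries, under simplified assumptions. `Manifest` and `Node` are stand-ins, and `register_source_input` and `can_inject_cte` are hypothetical helper names; only the `cte.id not in manifest.nodes` style of check comes from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    # Simplified stand-in for a manifest node (assumption).
    unique_id: str


@dataclass
class Manifest:
    # Simplified stand-in for the unit test manifest (assumption).
    nodes: Dict[str, Node] = field(default_factory=dict)
    sources: Dict[str, Node] = field(default_factory=dict)


def register_source_input(manifest: Manifest, node: Node) -> None:
    # sources: needed so that source lookups resolve to this node.
    manifest.sources[node.unique_id] = node
    # nodes: needed so that the CTE existence check finds it.
    manifest.nodes[node.unique_id] = node


def can_inject_cte(manifest: Manifest, cte_id: str) -> bool:
    # Mirrors the shape of the check quoted above.
    return cte_id in manifest.nodes


manifest = Manifest()
register_source_input(manifest, Node(unique_id="model.shop.raw__orders"))
```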


MichelleArk · 2023-11-14T18:00:21Z

core/dbt/parser/unit_tests.py

+                "resource_type": NodeType.Model,
+                "package_name": package_name,
+                "original_file_path": original_input_node.original_file_path,
+                "unique_id": f"model.{package_name}.{input_name}",


I think we may need to include source_name in input_name to avoid clobbering sources with the same table_name but different source_names when they are inserted into manifest.nodes and manifest.sources below.

Good point.

I've changed this to include the source_name. This does make for pretty long unique_ids. Do we have any concerns about that? It's not like we're using that name to construct tables or anything...

So far I've noticed this issue creep up in #9015. I think we could shorten the node name for CTE generation (since it doesn't need to be unique) but keep the unique_id longer
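
The clobbering concern raised in this thread can be illustrated with a small sketch. The exact unique_id format used in the PR is not shown here, so the `__` separator and both helper functions are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def unique_id_table_only(package_name: str, table_name: str) -> str:
    # Key built from the table name alone.
    return f"model.{package_name}.{table_name}"


def unique_id_with_source(package_name: str, source_name: str, table_name: str) -> str:
    # Hypothetical format: the "__" separator is an assumption.
    return f"model.{package_name}.{source_name}__{table_name}"


def register(
    inputs: List[Tuple[str, str]],
    make_id: Callable[[str, str], str],
) -> Dict[str, Tuple[str, str]]:
    # Insert each (source_name, table_name) pair under its computed key;
    # a later pair with the same key silently replaces the earlier one.
    nodes: Dict[str, Tuple[str, str]] = {}
    for source_name, table_name in inputs:
        nodes[make_id(source_name, table_name)] = (source_name, table_name)
    return nodes


# Two sources that share a table name.
inputs = [("stripe", "customers"), ("salesforce", "customers")]

clobbered = register(inputs, lambda s, t: unique_id_table_only("my_project", t))
distinct = register(inputs, lambda s, t: unique_id_with_source("my_project", s, t))
```

With the table-only key the second input overwrites the first, while including the source name keeps both entries, at the cost of longer identifiers.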


* Initial implementation of unit testing (from pr #2911)
  Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* 8295 unit testing artifacts (#8477)
* unit test config: tags & meta (#8565)
* Add additional functional test for unit testing selection, artifacts, etc (#8639)
* Enable inline csv format in unit testing (#8743)
* Support unit testing incremental models (#8891)
* update unit test key: unit -> unit-tests (#8988)
* convert to use unit test name at top level key (#8966)
* csv file fixtures (#9044)
* Unit test support for `state:modified` and `--defer` (#9032)
  Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Allow use of sources as unit testing inputs (#9059)
* Use daff for diff formatting in unit testing (#8984)
* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)
  Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
  Fix #8652: Use seed value if rows not specified
* Move unit testing to test and build commands (#9108)
* Enable unit testing in non-root packages (#9184)
* convert test to data_test (#9201)
* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)
* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>

Allow use of sources as unit testing inputs

e92f753

gshank requested review from a team as code owners November 12, 2023 00:15

gshank requested review from mikealfare and emmyoop November 12, 2023 00:15

cla-bot bot added the cla:yes label Nov 12, 2023

gshank requested review from MichelleArk and removed request for mikealfare November 13, 2023 14:54

aranke reviewed Nov 13, 2023

MichelleArk reviewed Nov 13, 2023

core/dbt/parser/unit_tests.py (outdated, resolved)

MichelleArk reviewed Nov 13, 2023

core/dbt/parser/unit_tests.py (resolved)

MichelleArk reviewed Nov 13, 2023

core/dbt/parser/unit_tests.py (outdated, resolved)

gshank added 2 commits November 13, 2023 15:40

Convert statically_parsed sets to a list

17571dd

Some minor refactoring

27b99bd

MichelleArk reviewed Nov 14, 2023

core/dbt/context/providers.py (outdated, resolved)

MichelleArk reviewed Nov 14, 2023

core/dbt/parser/unit_tests.py (outdated, resolved)

gshank added 2 commits November 14, 2023 17:29

Merge branch 'unit_testing_feature_branch' into 8507-unit_testing_input_sources

98a1b8b

Include source_name in UnitTestSourceDefinition unique_id. Don't pass package when looking up source

c5b4428

gshank requested review from MichelleArk and graciegoheen November 14, 2023 23:16

MichelleArk approved these changes Nov 15, 2023


gshank merged commit c6be2d2 into unit_testing_feature_branch Nov 15, 2023
49 checks passed

gshank deleted the 8507-unit_testing_input_sources branch November 15, 2023 15:31
