
Split out test-iasworld-data workflow from test-dbt-models so that data integrity tests will fail properly #671

Merged

Conversation

Contributor

@jeancochrane commented Dec 5, 2024

There is currently a bug in our weekly scheduled data integrity tests: Failing tests do not cause the workflow to fail, so we never get notified of them. (See here for an example workflow run that should have failed.) This behavior makes sense in context: The test-dbt-models workflow was originally intended to run iasWorld tests exclusively, and since we expect some iasWorld tests to fail, the workflow deliberately does not fail when tests do. That doesn't work for data integrity tests, however, since in that case the workflow should fail if any test fails.

I thought about modifying the test-dbt-models workflow and its underlying run_iasworld_data_tests.py script to add a new flag that exits with a nonzero status if a test fails, but I realized this would bloat run_iasworld_data_tests.py with a feature it doesn't need. What's more, it doesn't make sense to use the run_iasworld_data_tests.py script to run data integrity tests in the first place, since iasWorld tests and data integrity tests have been categorically different since #638. In my opinion, this mismatch is the root cause of the bug, and it will continue causing problems until we disentangle the two testing workflows.

To fix the problem at the root, this PR splits test-dbt-models into two workflows with distinct purposes:

  • test-iasworld-data, which runs iasWorld tests via workflow dispatch using the run_iasworld_data_tests.py script
  • test-dbt-models, which runs any dbt tests via workflow dispatch or runs data integrity tests on a schedule using direct dbt test calls

As a result, we can configure test-dbt-models to fail if any tests fail, while test-iasworld-data can maintain its current behavior by not failing when tests fail.
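For a rough sense of what the scheduled data integrity configuration can look like, here is a hedged sketch (not the exact workflow added in this PR; the cron schedule, selector tag, working directory, and setup steps are placeholders). The key point is that calling `dbt test` directly lets a failing test's nonzero exit code fail the step and therefore the workflow:

```yaml
# Sketch only: schedule, selector, and paths are illustrative placeholders
name: test-dbt-models

on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * 1"  # weekly data integrity run (illustrative)

jobs:
  test-dbt-models:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # (dbt and adapter setup steps omitted for brevity)

      - name: Run data integrity tests
        # `dbt test` exits nonzero when any test fails, which fails this
        # step and the workflow run, so scheduled failures get surfaced
        run: dbt test --select "tag:data_integrity"
        working-directory: dbt
```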

Examples of the two workflows running as expected:

Note

Since this PR changes the purpose of the test-dbt-models workflow, we need to update service-spark-iasworld so that it dispatches the new test-iasworld-data workflow instead. See this companion PR: ccao-data/service-spark-iasworld#25

Comment on lines +49 to +51
# Default to no select option, which will fall back to the default
# behavior of the underlying Python script
SELECT_OPTION=""
@jeancochrane (author):

This block is different from the old implementation of test-dbt-models: Now that this workflow is dedicated to running iasWorld tests, we can fall back to the default behavior of the underlying run_iasworld_data_tests.py script, which means we can avoid passing in the --select or --selector options unless the caller specifies one of them.
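For illustration, the kind of conditional this enables inside the workflow's run step might look like the following sketch (the input name and script path here are hypothetical, not copied from the workflow):

```bash
# Build the select option only when the caller provides one; otherwise
# leave it empty so the Python script's own defaults apply.
# (`inputs.select_target` and the script path are illustrative names.)
SELECT_OPTION=""
if [[ -n "${{ inputs.select_target }}" ]]; then
  SELECT_OPTION="--select ${{ inputs.select_target }}"
fi

python scripts/run_iasworld_data_tests.py $SELECT_OPTION
```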

Comment on lines +3 to +8
# This workflow is not configured to run on PRs, because PRs have their own CI
# process defined by the `build-and-test-dbt` workflow. Instead, the Data
# Department's Spark data extraction process dispatches it via API once per
# day after finishing the daily ingest of iasWorld data. For more detail, see:
#
# https://github.com/ccao-data/service-spark-iasworld
@jeancochrane (author):

This comment is slightly different from the old test-dbt-models version, to reflect the new conditions under which this workflow runs.
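For reference, dispatching a workflow like this via the API from an external process (e.g. after the daily iasWorld ingest finishes) can look something like the sketch below; the workflow file name, repository placeholder, and ref are assumptions on my part:

```bash
# Trigger the workflow_dispatch event for the new workflow.
# <owner>/<repo> and the workflow file name are placeholders.
gh workflow run test_iasworld_data.yaml \
  --repo <owner>/<repo> \
  --ref master
```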

Comment on lines +136 to +138
subject: "iasWorld tests errored for workflow run: ${{ github.run_id }}"
body: |
iasWorld tests raised an error for workflow ${{ github.run_id }}, run on ${{ env.TIMESTAMP }} UTC
@jeancochrane (author):

This subject and body are now slightly different from the old implementation ("iasWorld tests" instead of "dbt tests") to make it clear that the message specifically relates to iasWorld tests.

@@ -0,0 +1,141 @@
name: test-iasworld-data
@jeancochrane (author):

This workflow should be nearly identical to the old implementation of test-dbt-models, but GitHub can't pick up on the diff because we're keeping test-dbt-models and adapting it for a slightly more specific purpose. I'll call out every change I made in comments below.

@jeancochrane changed the title from "Split out test-iasworld-data workflow from test-dbt-models so that data integrity tests will fail" to "Split out test-iasworld-data workflow from test-dbt-models so that data integrity tests will fail properly" on Dec 5, 2024
@jeancochrane marked this pull request as ready for review December 5, 2024 23:44
@jeancochrane requested a review from a team as a code owner December 5, 2024 23:44
@jeancochrane requested a review from dfsnow December 5, 2024 23:44
@dfsnow (Member) left a comment:


Alright @jeancochrane, this looks good to me. I think the separation here makes sense. Let's get this and the Spark PR merged today.

One tiny blocking issue: we need to update the README to reflect the new workflows and when they're run.

.github/workflows/test_dbt_models.yaml (review comment outdated and resolved)
$DEFER_OPTION
working-directory: ${{ env.PROJECT_DIR }}
shell: bash
env:
@dfsnow (Member):

question (non-blocking): Why pass these in as env vars rather than just interpolating them directly into the bash?

@jeancochrane (author) commented on Dec 6, 2024:

Good question! There's no functional reason; I just felt it was clearer in this particular case. For USER, GIT_SHA, GIT_REF, and GIT_AUTHOR, the script expects these environment variables to be set, so we need to convert the GitHub variables to env vars anyway. For SELECT_OPTION, CACHE_HIT, and SKIP_ARTIFACTS_OPTION, the GitHub variable references felt a bit long and esoteric, and they read more easily as env vars. This is definitely a weak preference on my part; I'm open to changing any of these if you think a GitHub variable would be clearer.
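To illustrate the trade-off being discussed, here is a hypothetical fragment; the step name, variable names, and expressions are placeholders rather than the workflow's actual contents:

```yaml
- name: Run iasWorld data tests
  run: python scripts/run_iasworld_data_tests.py $SELECT_OPTION
  shell: bash
  env:
    # The script reads these directly from the environment, so they
    # have to be exported as env vars regardless
    GIT_SHA: ${{ github.sha }}
    GIT_REF: ${{ github.ref_name }}
    # This one could instead be interpolated inline, e.g.
    #   run: python scripts/run_iasworld_data_tests.py ${{ steps.select.outputs.option }}
    # but the short env var name keeps the command easier to read
    SELECT_OPTION: ${{ steps.select.outputs.option }}
```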

@jeancochrane merged commit 1e65774 into master on Dec 6, 2024
7 checks passed
@jeancochrane deleted the jeancochrane/split-out-iasworld-data-test-workflow branch December 6, 2024 21:40