Split out `test-iasworld-data` workflow from `test-dbt-models` so that data integrity tests will fail properly #671
Conversation
…test it" This reverts commit 59440f9.
```bash
# Default to no select option, which will fall back to the default
# behavior of the underlying Python script
SELECT_OPTION=""
```
This block is different from the old implementation of `test-dbt-models`: Now that this workflow is dedicated to running iasWorld tests, we can fall back to the default behavior of the underlying `run_iasworld_data_tests.py` script, which means we can avoid passing in the `--select` or `--selector` options unless the caller specifies one of them.
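As an illustration of that fallback, here is a minimal sketch of the option-building step, assuming hypothetical `select`/`selector` workflow inputs (names not taken from this PR):

```yaml
# Sketch: only build a --select/--selector flag when the caller supplied one;
# otherwise leave SELECT_OPTION empty so run_iasworld_data_tests.py falls back
# to its own default selection behavior.
- name: Set select option
  shell: bash
  run: |
    SELECT_OPTION=""
    if [[ -n "${{ inputs.select }}" ]]; then
      SELECT_OPTION="--select ${{ inputs.select }}"
    elif [[ -n "${{ inputs.selector }}" ]]; then
      SELECT_OPTION="--selector ${{ inputs.selector }}"
    fi
    echo "SELECT_OPTION=$SELECT_OPTION" >> "$GITHUB_ENV"
```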
```yaml
# This workflow is not configured to run on PRs, because PRs have their own CI
# process defined by the `build-and-test-dbt` workflow. Instead, the Data
# Department's Spark data extraction process dispatches it via API once per
# day after finishing the daily ingest of iasWorld data. For more detail, see:
#
# https://github.com/ccao-data/service-spark-iasworld
```
This comment is slightly different, to reflect the new conditions in which this workflow runs compared to the old `test-dbt-models` workflow.
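For readers unfamiliar with the setup, a hedged sketch of what a dispatch-only trigger block can look like (the input names are assumptions); the Spark service would then call GitHub's standard workflow-dispatch REST endpoint once per day rather than relying on a `pull_request` or `schedule` trigger:

```yaml
# Sketch: no pull_request/push/schedule triggers, so the workflow only runs
# when the Spark ingest service (or a human) dispatches it explicitly.
on:
  workflow_dispatch:
    inputs:
      select:
        description: Optional dbt --select expression to pass through
        required: false
        default: ""
      selector:
        description: Optional dbt --selector name to pass through
        required: false
        default: ""
```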
```yaml
subject: "iasWorld tests errored for workflow run: ${{ github.run_id }}"
body: |
  iasWorld tests raised an error for workflow ${{ github.run_id }}, run on ${{ env.TIMESTAMP }} UTC
```
This subject and body are now slightly different from the old implementation ("iasWorld tests" instead of "dbt tests") to make it clear that the message specifically relates to iasWorld tests.
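As an aside, the `${{ env.TIMESTAMP }}` interpolated into the body has to be populated earlier in the job; a hedged sketch of one common way to do that (step name and date format are assumptions):

```yaml
# Illustrative only: write a UTC timestamp to the job environment so later
# steps, like the notification above, can reference it as ${{ env.TIMESTAMP }}.
- name: Record run timestamp
  shell: bash
  run: echo "TIMESTAMP=$(date -u '+%Y-%m-%d %H:%M:%S')" >> "$GITHUB_ENV"
```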
```yaml
name: test-iasworld-data
```
This workflow should be nearly identical to the old implementation of `test-dbt-models`, but GitHub can't pick up on the diff because we're keeping `test-dbt-models` and adapting it for a slightly more specific purpose. I'll call out every change I made in comments below.
Alright @jeancochrane, this looks good to me. I think the separation here makes sense. Let's get this and the Spark PR merged today.
One tiny blocking issue: we need to update the README to reflect the new workflows and when they're run.
```yaml
    $DEFER_OPTION
working-directory: ${{ env.PROJECT_DIR }}
shell: bash
env:
```
question (non-blocking): Why pass these in as env vars rather than just interpolating them directly into the bash?
Good question! There's no functional reason, I just felt like it was clearer in this particular case. For `USER`, `GIT_SHA`, `GIT_REF`, and `GIT_AUTHOR`, the script expects these environment variables to be set, so we need to convert the GitHub variables to env vars anyway. For `SELECT_OPTION`, `CACHE_HIT`, and `SKIP_ARTIFACTS_OPTION`, I felt like the GitHub variable references were a bit long/esoteric, and it feels easier to me to read them as env vars. Definitely a weak preference on my part, I'm open to changing any of these if you think a GitHub variable would be clearer.
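To make the trade-off concrete, a hedged sketch of the pattern being discussed; the context expressions, input name, and script path are assumptions rather than excerpts from the workflow:

```yaml
# Sketch: map GitHub context into env vars once, then read plain env vars in
# bash. USER/GIT_SHA/GIT_REF/GIT_AUTHOR are required by the script anyway;
# SELECT_OPTION is mapped purely for readability.
- name: Run iasWorld data tests
  working-directory: ${{ env.PROJECT_DIR }}
  shell: bash
  env:
    USER: ${{ github.triggering_actor }}
    GIT_SHA: ${{ github.sha }}
    GIT_REF: ${{ github.ref_name }}
    GIT_AUTHOR: ${{ github.actor }}
    SELECT_OPTION: ${{ inputs.select }}
  run: |
    # The alternative would be to interpolate the GitHub expressions directly
    # into this bash block instead of going through env vars.
    python3 scripts/run_iasworld_data_tests.py $SELECT_OPTION
```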
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
There is currently a bug in our weekly scheduled data integrity tests: Failing tests do not cause the workflow to fail, so we never get notified of them. (See here for an example workflow run that should have failed.) This makes sense in that the `test-dbt-models` workflow was originally intended to be used exclusively to run iasWorld tests, so it doesn't fail the workflow if tests fail, because we expect some iasWorld tests to fail. This doesn't work for data integrity tests, however, since in that case the workflow should fail if any test fails.

I thought about modifying the `test-dbt-models` workflow and its underlying `run_iasworld_data_tests.py` script to add a new flag to exit with a nonzero status if a test fails, but I realized that this would bloat `run_iasworld_data_tests.py` with a feature that it doesn't need. What's more, it's unreasonable to use the `run_iasworld_data_tests.py` script to run data integrity tests in the first place, since iasWorld tests and data integrity tests have been categorically different since #638. In my opinion, this is the root cause of the bug, and it will continue causing problems until we disentangle the two testing workflows.

In order to fix the problem at the root, this PR splits out `test-dbt-models` into two workflows with different uses:

- `test-iasworld-data`, which runs iasWorld tests via workflow dispatch using the `run_iasworld_data_tests.py` script
- `test-dbt-models`, which runs any dbt tests via workflow dispatch or runs data integrity tests on a schedule using direct `dbt test` calls

As a result, we can configure `test-dbt-models` to fail if any tests fail, while `test-iasworld-data` can maintain its current behavior by not failing when tests fail.

Examples of the two workflows running as expected:

- `test-iasworld-data`
- `test-dbt-models` (Note that this workflow failed, but that's what we expect)

Note: Since this PR changes the purpose of the `test-dbt-models` workflow, we need to change `service-spark-iasworld` to dispatch the new `test-iasworld-data` workflow instead. See this companion PR: ccao-data/service-spark-iasworld#25
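To ground the split, here is a minimal sketch of what the scheduled side of the new `test-dbt-models` workflow could look like under these requirements; the cron expression, selector name, and working directory are assumptions, and Python/dbt setup steps are omitted:

```yaml
# Sketch only: calling dbt test directly means any failing test returns a
# nonzero exit code, which fails the workflow run and triggers notifications.
name: test-dbt-models
on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * 1"  # hypothetical weekly schedule
jobs:
  run-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (Python and dbt dependency setup omitted for brevity)
      - name: Run data integrity tests
        working-directory: dbt  # assumed dbt project directory
        shell: bash
        run: |
          # Selector name is hypothetical; the key point is that dbt test
          # exits nonzero on failure, unlike run_iasworld_data_tests.py
          dbt test --selector data_integrity
```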