mantle/kola: Auto enable rerun success in some failure scenarios #3494

Merged
dustymabe merged 8 commits into coreos:main from dusty-more-rerun-success
May 31, 2023

dustymabe

This automatically enables rerun success for some failure scenarios, including timeouts and certain console checks.

This tag will be used when the test framework detects tests failing
in specific ways that we've decided we want to allow rerun success
for.

This will allow tests that get aborted due to a timeout to not fail
the overall run if a rerun is attempted and it passes. One example of
where this is useful is a test that times out on initialization and
never reaches the machine via SSH (which we see regularly). In that
case, is the failure an issue with the platform bringing up the
machine or a fundamental issue with the software inside FCOS/RHCOS?
If it's an issue with the platform and the rerun succeeds then we
don't want to see a failure. If it's a fundamental issue with the
software inside (e.g. Ignition fails) then the rerun will fail anyway
and this will do no harm.
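
The decision described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `errTimeout`, `errIgnition`, `result`, `classify` and `overallPass` are hypothetical names, and the real harness may record timeouts and reruns differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real harness may signal these differently.
var (
	errTimeout  = errors.New("test timed out")
	errIgnition = errors.New("ignition failed")
)

// result is a hypothetical record of one test's first-run outcome.
type result struct {
	name string
	err  error
	// allowRerunSuccess marks a failure that a passing rerun may forgive.
	allowRerunSuccess bool
}

// classify tags timeouts (and only timeouts, in this sketch) as
// eligible for rerun success.
func classify(name string, err error) result {
	return result{
		name:              name,
		err:               err,
		allowRerunSuccess: errors.Is(err, errTimeout),
	}
}

// overallPass reports whether the whole run passes. A first-run failure
// is forgiven only if it was tagged as eligible, a rerun was attempted,
// and that rerun passed.
func overallPass(first []result, rerunErr map[string]error) bool {
	for _, r := range first {
		if r.err == nil {
			continue
		}
		rerr, reran := rerunErr[r.name]
		if !r.allowRerunSuccess || !reran || rerr != nil {
			return false
		}
	}
	return true
}

func main() {
	first := []result{
		classify("basic", nil),
		classify("ssh-init", fmt.Errorf("waiting for SSH: %w", errTimeout)),
	}
	// Platform hiccup: the rerun passes, so the run passes.
	fmt.Println(overallPass(first, map[string]error{"ssh-init": nil}))
	// Real problem inside the OS: the rerun fails too, so the run fails.
	fmt.Println(overallPass(first, map[string]error{"ssh-init": errIgnition}))
}
```

In this sketch a failure that is not tagged (for example a plain Ignition error on the first run) still fails the run even if a rerun happens to pass.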

This patch enables us to have console checks that are non-fatal and
will print a log message to the screen and nothing more, which was
stated as desirable in [1].

This commit also re-works the implementation of the console/journal
checks in runTest() to deduplicate code. It has the side effect of
making SkipConsoleWarnings apply to journal checks too, but I think
that's actually a benefit and not a negative.

[1] coreos#3450 (comment)

This will make it so we can define console checks that, if configured
to do so, mark a failed test as eligible for rerun success: the
failure is forgiven if a rerun of the test succeeds.

Now that we have warnOnly and allowRerunSuccess capabilities in
our consoleChecks, let's add back in the kernel soft lockup check
that was removed in 7283c89.
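
As a rough illustration of how checks carrying these two capabilities might behave, here is a minimal sketch. The `consoleCheck` struct, its field names, the `scan` helper and the regular expressions are assumptions made for the example, not the definitions in mantle. The rule that a test is rerun-eligible only when every fatal match allows it is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// consoleCheck is a hypothetical shape for a console/journal check.
type consoleCheck struct {
	desc              string
	match             *regexp.Regexp
	warnOnly          bool // log a message, never fail the test
	allowRerunSuccess bool // failure may be forgiven by a passing rerun
}

// Illustrative patterns only; not copied from kola.
var checks = []consoleCheck{
	{
		desc:              "kernel soft lockup",
		match:             regexp.MustCompile(`watchdog: BUG: soft lockup - CPU#\d+ stuck for`),
		allowRerunSuccess: true,
	},
	{
		desc:     "example non-fatal warning",
		match:    regexp.MustCompile(`example: harmless warning`),
		warnOnly: true,
	},
	{
		desc:  "kernel panic",
		match: regexp.MustCompile(`Kernel panic - not syncing`),
	},
}

// outcome summarizes what a scan of console or journal output found.
type outcome struct {
	warnings          []string
	failures          []string
	allowRerunSuccess bool
}

// scan runs every check against the output. The same function can be
// used for console and journal text, which keeps the logic in one place.
func scan(output string) outcome {
	var o outcome
	eligible := true
	for _, c := range checks {
		if !c.match.MatchString(output) {
			continue
		}
		if c.warnOnly {
			o.warnings = append(o.warnings, c.desc)
			continue
		}
		o.failures = append(o.failures, c.desc)
		if !c.allowRerunSuccess {
			eligible = false
		}
	}
	// Assumption: eligible only if there is a failure and all failures allow it.
	o.allowRerunSuccess = eligible && len(o.failures) > 0
	return o
}

func main() {
	out := scan("[   28.1] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]")
	fmt.Println(out.failures, out.allowRerunSuccess)
}
```

A warnOnly match here only produces a warning entry, while a soft lockup match fails the test but leaves it eligible for rerun success.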

This function conveniently exists so let's use it.

Since we call runProvidedTests() for both the first run and the rerun,
let's not call the variable firstRunErr, since in the nested call that
name won't be accurate. Let's just call it runErr instead.

In this case it's usually the platform having some internal failure.
If it succeeds in the rerun then just be happy with that.

@gursewak1997 gursewak1997 left a comment

Looks pretty good overall.

@dustymabe dustymabe merged commit 8f7c06a into coreos:main May 31, 2023
2 checks passed
@dustymabe dustymabe deleted the dusty-more-rerun-success branch May 31, 2023 17:42