Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FAILURE status on openjdkXX-pipelines when platform build jobs are UNSTABLE #750

Closed
sxa opened this issue Jul 4, 2023 · 6 comments
Closed
Assignees

Comments

@sxa
Copy link
Member

sxa commented Jul 4, 2023

What are you trying to do?
Get good status from the openjdkXX-pipeline jobs

Expected behaviour:
Builds are not marked as failure (red) if only test jobs are failing (possibly with the exceptions of smoke tests)

Observed behaviour:
Failure status showing up when platform build jobs are UNSTABLE

Any other comments:
Check possibly introduced by d8615ae
Sample pipeline showing the problem: https://ci.adoptium.net/job/build-scripts/job/openjdk11-pipeline/2417/consoleFull
00:32:08 ERROR: Propagating downstream job result: build-scripts/jobs/jdk11u/jdk11u-windows-x86-32-temurin, Result: UNSTABLE CopyArtifactsSuccess: true
Slack thread: https://adoptium.slack.com/archives/C09NW3L2J/p1688484427574299?thread_ts=1688132895.637959&cid=C09NW3L2J

@adamfarley
Copy link
Contributor

adamfarley commented Jul 5, 2023

Ok, my theory is that context.error is causing this problem.

When this change went in, I think Andrew took a look at lines 884 and 885 build_base_file.groovy and assumed that context.error simply generates a log message, with currentBuild.result doing the actual changes to the end state of the pipeline job.

So when he locked up the .result call inside an if statement, he left the .error call unprotected, assuming (as I would) that it was simply informational.

However, I think the big ERROR in the output indicates that the .error call can set the failure state of a job as well as posting the message.

To prove/disprove this, I have a jdk17 pipeline running here, with a single build and a known "unsafe" test target:

https://ci.adoptium.net/job/build-scripts/job/openjdk17-pipeline/730/

In that pipeline, I've replaced the .error call with a .println call.

@adamfarley
Copy link
Contributor

Ok, the tests in that pipeline did not hit "unstable" state, so the changes were not tested. Still, this proves the change doesn't break anything, so I've pushed the changes in here.

@sxa - Please launch your big pipeline and we'll see if the pipeline job correctly achieves Unstable state when (assuming none of the test jobs fail, but at least one achieves Unstable state).

@adamfarley adamfarley self-assigned this Jul 6, 2023
@adamfarley
Copy link
Contributor

Ok, lots of pipeline build failures recently. We're working on fixing those as a team, but in the meantime here's a reduced pipeline on seemingly-reliably-unsafe platforms: https://ci.adoptium.net/job/build-scripts/job/openjdk11-pipeline/2429/

Hopefully we'll see passed builds and at least one unsafe build. If the pipeline status is also unsafe, we can close this issue.

@adamfarley
Copy link
Contributor

The pipeline finished with only passed and unstable builds, but though my change is verified as having been used, the pipeline still fails. It's a more useful error now, but the main problem behind this issue remains.

@adamfarley
Copy link
Contributor

adamfarley commented Jul 12, 2023

Ok, got a new theory. The currentBuild.result may be initially set to null. If it is, then we're failing every time we check if the current result is success/unsafe.

Either way, we need more detailed error messages here. Working on a patch to solve the problem (in theory), and also to provide more information when we change pipeline status.

Update: Change complete.

PR here: #759

Testing PR here: https://ci.adoptium.net/job/build-scripts/job/openjdk11-pipeline/2438/

Update: Test run failed because it wasn't using my pipeline. Will kick it off again from upstream after the release is over.

@adamfarley
Copy link
Contributor

The above PR was tested again after the release and proved to work. Merged. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

No branches or pull requests

2 participants