-
-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Produce better errors for failing jobsets #1025
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you. Let me know about that question and I'll happily merge.
Maybe it’s time to consider merging this now and worrying about that small cleanup later? |
757b979
to
8478697
Compare
Whoops, meant to get back to this. Extracted the logic out into its own function, but it's a bit messy because it's a "get or perform spooky side-effects" lambda. Not sure that's a huge improvement. |
At the moment, the jobset object is unlikely to actually retrieve the evaluation error output, because it isn't refreshed after hydra-eval-jobsets is run. Explicitly calling DBIx::Class::Row->discard_changes causes any updated data to be refreshed, at the cost of losing any not-yet committed changes to the row.
At the moment, aggregate jobs can easily break and cause the entire evaluation to fail, which is not ideal. For Nixpkgs, we do have some important aggregate jobs (like `tested`), but for debugging and building purposes it's still useful to get a partial result even if the channel won't actually advance. This commit changes the behaviour of hydra-eval-jobs such that it aggregates any errors found during the construction of an aggregate, and will instead annotate the job with the evaluation failure such that it shows up in a "cleaner" way. There are really two types of failure that we care about: one is where the attribute just ends up missing altogether in the final output, and also where the attribute is in the output but fails to evaluate. Both are handled here. Note that this does mean that the same error message may be output multiple times, but this aids debuggability because it'll be much clearer what's blocking the job from being created.
8478697
to
67ebce8
Compare
Recently nixpkgs had a problem where evaluations were failing with the dreaded
[json.exception.type_error.302] type must be string, but is null
nlohmann/json error.It turns out this is because some of the NixOS tests included in the
tested
job could no longer be evaluated.No more! With this change, not only will we produce an evaluation, but we'll also generate more useful evaluation errors for people to use when debugging, rather than trying to manually bisect nixpkgs to work out at what point evaluations broke.
More context in the commits included in this PR. It even includes a test (which appears to be the first test of the aggregate job functionality, and I'm... asserting that it breaks correctly. Oh well.)