Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce better errors for failing jobsets #1025

Merged
merged 2 commits into from
Oct 26, 2021

Conversation

lukegb
Copy link
Contributor

@lukegb lukegb commented Sep 22, 2021

Recently nixpkgs had a problem where evaluations were failing with the dreaded [json.exception.type_error.302] type must be string, but is null nlohmann/json error.

It turns out this is because some of the NixOS tests included in the tested job could no longer be evaluated.

No more! With this change, not only will we produce an evaluation, but we'll also generate more useful evaluation errors for people to use when debugging, rather than trying to manually bisect nixpkgs to work out at what point evaluations broke.

More context in the commits included in this PR. It even includes a test (which appears to be the first test of the aggregate job functionality, and I'm... asserting that it breaks correctly. Oh well.)

Copy link
Member

@grahamc grahamc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you. Let me know about that question and I'll happily merge.

@andersk
Copy link

andersk commented Oct 26, 2021

nixos-unstable evaluation has been broken again on Hydra for several days (NixOS/nixpkgs#142918) with the same useless error: [json.exception.type_error.302] type must be string, but is null. I just confirmed that if this had been merged, it would have produced a helpful error pointing straight at the real problem.

nixos.tests.gnome-xorg.x86_64-linux: error: undefined variable 'source-sans-pro'

       at /home/anders/nixpkgs/nixos/modules/services/x11/desktop-managers/gnome.nix:456:9:

          455|         source-code-pro # Default monospace font in 3.32
          456|         source-sans-pro
             |         ^
          457|       ];
nixos.tests.gnome.x86_64-linux: error: undefined variable 'source-sans-pro'

       at /home/anders/nixpkgs/nixos/modules/services/x11/desktop-managers/gnome.nix:456:9:

          455|         source-code-pro # Default monospace font in 3.32
          456|         source-sans-pro
             |         ^
          457|       ];

Maybe it’s time to consider merging this now and worrying about that small cleanup later?

@lukegb lukegb force-pushed the hydra-better-errors branch from 757b979 to 8478697 Compare October 26, 2021 09:04
@lukegb
Copy link
Contributor Author

lukegb commented Oct 26, 2021

Whoops, meant to get back to this. Extracted the logic out into its own function, but it's a bit messy because it's a "get or perform spooky side-effects" lambda. Not sure that's a huge improvement.

At the moment, the jobset object is unlikely to actually retrieve the
evaluation error output, because it isn't refreshed after
hydra-eval-jobsets is run.

Explicitly calling DBIx::Class::Row->discard_changes causes any updated
data to be refreshed, at the cost of losing any not-yet committed
changes to the row.
At the moment, aggregate jobs can easily break and cause the entire
evaluation to fail, which is not ideal. For Nixpkgs, we do have some
important aggregate jobs (like `tested`), but for debugging and building
purposes it's still useful to get a partial result even if the channel
won't actually advance.

This commit changes the behaviour of hydra-eval-jobs such that it
aggregates any errors found during the construction of an aggregate, and
will instead annotate the job with the evaluation failure such that it
shows up in a "cleaner" way.

There are really two types of failure that we care about: one is where
the attribute just ends up missing altogether in the final output, and
also where the attribute is in the output but fails to evaluate. Both
are handled here.

Note that this does mean that the same error message may be output
multiple times, but this aids debuggability because it'll be much
clearer what's blocking the job from being created.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants