Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a mechanism to catch kernel interfaces breaking. #12102

Closed
wants to merge 1 commit into from

Conversation

rincebrain
Copy link
Contributor

@rincebrain rincebrain commented May 21, 2021

Motivation and Context

Between #12060 and #12076, we've had two bugs where Linux
breaking an interface in a way we didn't have an explicit case
for silently broke functionality.

(If this lands first, #12073's version checks could be refactored to use this macro instead...)

Description

Import ax_compare_version to do the comparisons, then Introduce a
config test that lets you specify "I expect this feature to exist
starting at version A (and, optionally, up to version B)"; if it
doesn't, error out.

This PR adds a check that will pass, because it bounds the version comparison to 5.12, where the interface broke.

How Has This Been Tested?

Tried the test with "[]" version max on 5.10 (it passed), and 5.12.1 (it failed).
Tried the test with "[5.12]" max on 5.10 (it passed), and 5.12.1 (it also passed).

If people like this idea, I'll follow it up with more checks.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Between openzfs#12060 and openzfs#12076, we've had two bugs where Linux
breaking an interface in a way we didn't have an explicit case
for silently broke functionality.

Import ax_compare_version to do the comparisons, then Introduce a
config test that lets you specify "I expect this feature to exist
starting at version A (and, optionally, up to version B)"; if it
doesn't, error out.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@jwk404 jwk404 added Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels May 24, 2021
@rincebrain
Copy link
Contributor Author

One quick note that might not be obvious to people, and I'll put in a comment whether it gets changed or not - if I understand how the version comparison macro I pulled in works correctly, it would turn, say, 5.10.0 into "0005001000000000" for comparison - and "5.10.0-rc4" into 0005001000000004, which means that "5.10.0 < 5.10.0-rc4" would return true, among other possibly-surprising outcomes.

The three options I see for handling this are:

  1. specify in the comment for the kernel version comparison macro that it should not be used for anything but non-rc releases, and modify the usage to only compare X.Y.Z - it'll still report 5.12.0-rc1 as >= 5.12.0, but it won't report 5.12.0 < 5.12.0-rc1; the downside is that it will hard error out if, say, the interface was introduced in 5.12.0-rc3, the comparison is added for 5.12.0, and you try building against 5.12.0-rc1.
  2. the above, and add special handling to notice the existence of "rcX" in the version and do something like if !(VERSION_MIN == VERSION_WITHOUT_RC_SUFFIX && RCVERSION > 0) { [rest of handling] } - this won't try handling any other random strings that distros might stick on, but doing that was never going to be generally feasible
  3. modify the version comparison code itself to special-case rcX

I think my preference is 2, then either 1 or 3, so I'll probably push something doing 2 soon if nobody proposes a better plan before I do.

@ryao
Copy link
Contributor

ryao commented Sep 16, 2022

There is an alternative solution that I advocated in the past, but it never caught on. The reason for these issues is that the autotool checks are written like this:

  • test new api, else
  • Assume old api

They should be written like this:

  • test new api, else
  • test old api, else
  • Error

In checks that I wrote in the past, I would do the latter. To give a few examples:

50a0749
c3d9c0d
0f37d0c

I wanted to eventually rewrite all of the existing checks to match this style, but I never did and others did not follow my example. :/

@behlendorf
Copy link
Contributor

behlendorf commented Sep 16, 2022

Introduce a config test that lets you specify "I expect this feature to exist starting at version A (and, optionally, up to version B)";

One of the difficulties with a version number based approach is it's common for Enterprise Distributions, like RHEL or SLES, to backport significant kernel changes to much older kernels. We also see this kind of thing to a lesser degree with other distribution kernels which choose to apply patches for their own reasons.

There is an alternative solution that I advocated in the past, but it never caught on.
They should be written like this:

test new api, else
test old api, else
Error

We've been gradually been moving the code base to this style of check and making the checks themselves stricter. However, we haven't made a push to systematically convert all the remains checks. I think it'd be great if someone wanted to work on this. There's a good example of an updated check in kernel-put-link.m4.

@rincebrain
Copy link
Contributor Author

Sure, it's not going to solve RHEL kernels breaking, but nothing on earth is going to solve that mess.

The goal was to try and catch at least some common cases of "we are past the minimum version for X feature, and the check failed" versus failing open.

Since it sounds like everyone dislikes this solution, I'll close it.

@rincebrain rincebrain closed this Sep 16, 2022
@ryao
Copy link
Contributor

ryao commented Sep 17, 2022

I envision writing a script that will batch fetch every major Linux release and compile the ZFS kernel modules against it. That would be needed to verify that a rework of all of the autotools checks in the old style into the new style works properly against all supported kernel versions. I imagine the runtime would be excessive (even if make modules_prepare is used to reduce the compilation time per kernel version). :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants