Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thermal: fix segfault due to mismatching genl attribute #341

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ailiop-suse
Copy link

Recent kernel commit 1773572863c4 ("thermal: netlink: Add the commands and the events for the thresholds") merged in v6.13 added the "THERMAL_GENL_ATTR_TZ_PREV_TEMP" attribute in the middle of enum thermal_genl_attr and so all the following attrs are now offset by one position.

This causes irqbalance to segfault during thermal event handling as it encounters a null nlattr in the parsed attrs array in the position where it expects to find THERMAL_GENL_ATTR_CAPACITY (which is now returned in the next array position). The null is unchecked and passed to nla_len() which dereferences it to obtain the nla_len, and triggers the segfault.

Fix this by removing the hardcoded attrs and instead including the uapi header file. Also, check the parsed attr for null to prevent crashing in case of similar kernel updates.

Recent kernel commit 1773572863c4 ("thermal: netlink: Add the commands
and the events for the thresholds") merged in v6.13 added the
"THERMAL_GENL_ATTR_TZ_PREV_TEMP" attribute in the middle of enum
thermal_genl_attr and so all the following attrs are now offset by one
position.

This causes irqbalance to segfault during thermal event handling as it
encounters a null nlattr in the parsed attrs array in the position where
it expects to find THERMAL_GENL_ATTR_CAPACITY (which is now returned in
the next array position). The null is unchecked and passed to nla_len()
which dereferences it to obtain the nla_len, and triggers the segfault.

Fix this by removing the hardcoded attrs and instead including the uapi
header file. Also, check the parsed attr for null to prevent crashing
in case of similar kernel updates.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
@nhorman
Copy link
Member

nhorman commented Feb 16, 2025

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

@ailiop-suse
Copy link
Author

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

Indeed, and it appears that it has already been addressed in commit c195b9c6ab9c ("thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header") which is already merged and part of v6.14-rc3.

Nevertheless, it seemed like a good idea to remove the hardcoded attrs and sync-up with the kernel uapi (for example all the CAPACITY attrs are misnamed, this has always been CAPABILITY in the kernel, and this was a bit confusing by itself while trying to determine the issue).

Considering though that 1773572863c4 made it out on a stable release (v6.13), perhaps it is desirable to keep the definitions local, since it would avoid other bugs in case e.g. irqbalance was compiled with v6.13 headers and then run on top of a v6.14 or later kernel. On the other hand, similar mishaps may happen and it will lead again to segfaults.

Would it make sense to keep the definitions local as they were, and just add the null check for the parsed attribute?

@ppwaskie
Copy link
Contributor

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

Indeed, and it appears that it has already been addressed in commit c195b9c6ab9c ("thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header") which is already merged and part of v6.14-rc3.

Nevertheless, it seemed like a good idea to remove the hardcoded attrs and sync-up with the kernel uapi (for example all the CAPACITY attrs are misnamed, this has always been CAPABILITY in the kernel, and this was a bit confusing by itself while trying to determine the issue).

Considering though that 1773572863c4 made it out on a stable release (v6.13), perhaps it is desirable to keep the definitions local, since it would avoid other bugs in case e.g. irqbalance was compiled with v6.13 headers and then run on top of a v6.14 or later kernel. On the other hand, similar mishaps may happen and it will lead again to segfaults.

Would it make sense to keep the definitions local as they were, and just add the null check for the parsed attribute?

I'm a fan of assuming the ABI isn't broken, and check in the unlikely event you run into a brain-damaged kernel.

I think your suggestion of leaving the local definitions as-is, and adding the check for the unlikely scenario the null check can happen in, and log loudly that the kernel they're running on is broken. Then compatibility is maintained without adding additional complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants