Fix rasdaemon crash during bootup on AMD CPU #19023
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why I did it
Booting SONiC on a AMD EPYC 16-Core CPU is causing rasdaemon to crash. This is not a major blocker because rasdaemon eventually restarts and is stable after a point.
Coredump stack trace:
Known issue for rasdaemon: mchehab/rasdaemon#77
Fixed here:
mchehab/rasdaemon@f1ea763
Unfortunately this fix is not present in the default bookworm version. So, backported the fix and compiled rasdaemon from source
Here is the patch: https://sources.debian.org/patches/rasdaemon/0.8.0-2/0001-Check-CPUs-online-not-configured.patch/
Work item tracking
How I did it
How to verify it
Booted the image built with these changes and no issue in observed
Before this change:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)