-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(apt): Don't warn on 822 source format #5028
Conversation
fyi @cjp256 |
@@ -708,15 +708,15 @@ def generate_sources_list(cfg, release, mirrors, cloud): | |||
) | |||
if expected_content: | |||
if expected_content != util.load_text_file(apt_sources_list): | |||
LOG.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a couple integration tests that expect exit 2 due to deb822 warnings we know are in logs. You may need to fix those up as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any clue which? I checked all of the tests that reference NOBLE
and didn't see anything that looked like it would exit 2 due to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/integration_tests/modules/test_cli.py::test_invalid_userdata. Nevermind, I see we already fixed with this commit. 4e87db2. I don't see anything else that is checking for result.return_code == 2 from cloud-init status that isn't also intentionally setting up a failure path.
Related note. I'm wondering if the status code really just pushed warnings into the error category. We kind of lost that middle ground. I know we make use of warnings in the Azure data source to indicate soft (retriable) failures that may be delaying boot. Similar to this, I now have to debate whether or not downgrade warnings to info, but then users will no longer get the heads up on the console that DHCP/IMDS/etc. failed and we're retrying so sorry about the delay.... I also understand that a lot of warnings really should be errors. Thoughts? Should we have an API to push a log out to console out that's not at warning+ level? |
Thanks for the thoughts @cjp256.
Yeah, that's not ideal. And I agree that if cloud-init was capable of meeting users expectations, even something less ideal happened, that we probably shouldn't consider that an error / nonzero return code - return code 2 on logged warning may be too agressive. What if we:
This would align better with Python's the documented log levels, and would allow console messages without considering something recoverable error.
That could work too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with us backing this behavior out to info
level. In most cases in image production, we have addressed build artifacts leaving stale /etc/apt/sources.list files on the system. And this condition isn't really a warning more than an informational message that cloud-init corrected an invalid APT config file on the system that it used to silently blow away on first boot anyway. I double checked potential integration test impacts, but I think we are good now as 4e87db2 already expects happy state anyway, and no warnings from cloud-init status.
I really would prefer that we keep a higher visibility of warnings in our CLI exit codes if we can, though I recognize the cost dealing with these warnings is potentially painful if we are trying to remedy behavior changes quickly to avoid impacting some CI, scripts or automation which doesn't differentiate between exit 1 and exit 2+. If we diminish the visibility of warnings again to just things dropped into logs I think we continue to avoid finding or tracking bugs or issues related to unexpected behaviors that otherwise go completely unnoticed (for years) as the warnings are buried in cloud-init's verbose logs. I think we should consider each warning case and determine if it really needs to be a warning anymore as different issues arise on different platforms. We could perform a preliminary review of our warnings(~330) throughout code to evaluate whether we need to demote some of those to info level. We also have only another 100 info level log messages which we also try to use sparingly. As to whether Azure wants to surface info level logs to a console, I think we are looking at 11 info level logs per boot. Whether this is a reasonable volume or not depends on the dashboard/view into the console I suppose.
I think this is a good practice too. We should re-review/groom our warnings and decide what should ultimately be promoted to ERROR level as well. |
Proposed Commit Message
spurious warnings, probably shouldn't do that to users