Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(apt): Don't warn on 822 source format #5028

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Mar 7, 2024

Proposed Commit Message

fix(apt): Don't warn on 822 source format

spurious warnings, probably shouldn't do that to users

@holmanb
Copy link
Member Author

holmanb commented Mar 7, 2024

fyi @cjp256

@@ -708,15 +708,15 @@ def generate_sources_list(cfg, release, mirrors, cloud):
)
if expected_content:
if expected_content != util.load_text_file(apt_sources_list):
LOG.warning(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a couple integration tests that expect exit 2 due to deb822 warnings we know are in logs. You may need to fix those up as part of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any clue which? I checked all of the tests that reference NOBLE and didn't see anything that looked like it would exit 2 due to this.

Copy link
Collaborator

@blackboxsw blackboxsw Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/integration_tests/modules/test_cli.py::test_invalid_userdata. Nevermind, I see we already fixed with this commit. 4e87db2. I don't see anything else that is checking for result.return_code == 2 from cloud-init status that isn't also intentionally setting up a failure path.

@cjp256
Copy link
Contributor

cjp256 commented Mar 8, 2024

Related note. I'm wondering if the status code really just pushed warnings into the error category. We kind of lost that middle ground.

I know we make use of warnings in the Azure data source to indicate soft (retriable) failures that may be delaying boot. Similar to this, I now have to debate whether or not downgrade warnings to info, but then users will no longer get the heads up on the console that DHCP/IMDS/etc. failed and we're retrying so sorry about the delay....

I also understand that a lot of warnings really should be errors.

Thoughts? Should we have an API to push a log out to console out that's not at warning+ level?

@holmanb
Copy link
Member Author

holmanb commented Mar 8, 2024

Thanks for the thoughts @cjp256.

Related note. I'm wondering if the status code really just pushed warnings into the error category. We kind of lost that middle ground.

I know we make use of warnings in the Azure data source to indicate soft (retriable) failures that may be delaying boot. Similar to this, I now have to debate whether or not downgrade warnings to info, but then users will no longer get the heads up on the console that DHCP/IMDS/etc. failed and we're retrying so sorry about the delay....
I also understand that a lot of warnings really should be errors.

Yeah, that's not ideal. And I agree that if cloud-init was capable of meeting users expectations, even something less ideal happened, that we probably shouldn't consider that an error / nonzero return code - return code 2 on logged warning may be too agressive.

What if we:

  1. switch error code 2 to only occur when >= logging.ERROR messages are logged
  2. allow logging.WARN messages to include messages about situations were cloud-init is not behaving ideally, but is able to complete the expected configuration
  3. require logs that indicate that cloud-init wasn't able to complete a requested action to be logging.ERROR messages - (possibly bump the level on some of our current warnings)

This would align better with Python's the documented log levels, and would allow console messages without considering something recoverable error.

Should we have an API to push a log out to console out that's not at warning+ level?

That could work too.

@holmanb holmanb added the 24.1 label Mar 9, 2024
@holmanb holmanb mentioned this pull request Mar 9, 2024
1 task
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with us backing this behavior out to info level. In most cases in image production, we have addressed build artifacts leaving stale /etc/apt/sources.list files on the system. And this condition isn't really a warning more than an informational message that cloud-init corrected an invalid APT config file on the system that it used to silently blow away on first boot anyway. I double checked potential integration test impacts, but I think we are good now as 4e87db2 already expects happy state anyway, and no warnings from cloud-init status.

@blackboxsw
Copy link
Collaborator

Thanks for the thoughts @cjp256.

Related note. I'm wondering if the status code really just pushed warnings into the error category. We kind of lost that middle ground.
I know we make use of warnings in the Azure data source to indicate soft (retriable) failures that may be delaying boot. Similar to this, I now have to debate whether or not downgrade warnings to info, but then users will no longer get the heads up on the console that DHCP/IMDS/etc. failed and we're retrying so sorry about the delay....
I also understand that a lot of warnings really should be errors.

Yeah, that's not ideal. And I agree that if cloud-init was capable of meeting users expectations, even something less ideal happened, that we probably shouldn't consider that an error / nonzero return code - return code 2 on logged warning may be too agressive.

What if we:

  1. switch error code 2 to only occur when >= logging.ERROR messages are logged

I really would prefer that we keep a higher visibility of warnings in our CLI exit codes if we can, though I recognize the cost dealing with these warnings is potentially painful if we are trying to remedy behavior changes quickly to avoid impacting some CI, scripts or automation which doesn't differentiate between exit 1 and exit 2+.

If we diminish the visibility of warnings again to just things dropped into logs I think we continue to avoid finding or tracking bugs or issues related to unexpected behaviors that otherwise go completely unnoticed (for years) as the warnings are buried in cloud-init's verbose logs.

I think we should consider each warning case and determine if it really needs to be a warning anymore as different issues arise on different platforms. We could perform a preliminary review of our warnings(~330) throughout code to evaluate whether we need to demote some of those to info level. We also have only another 100 info level log messages which we also try to use sparingly. As to whether Azure wants to surface info level logs to a console, I think we are looking at 11 info level logs per boot. Whether this is a reasonable volume or not depends on the dashboard/view into the console I suppose.

  1. allow logging.WARN messages to include messages about situations were cloud-init is not behaving ideally, but is able to complete the expected configuration
  1. require logs that indicate that cloud-init wasn't able to complete a requested action to be logging.ERROR messages - (possibly bump the level on some of our current warnings)

I think this is a good practice too. We should re-review/groom our warnings and decide what should ultimately be promoted to ERROR level as well.

@holmanb holmanb merged commit 52c6abd into canonical:main Mar 11, 2024
29 checks passed
holmanb added a commit to holmanb/cloud-init that referenced this pull request Mar 11, 2024
holmanb added a commit to holmanb/cloud-init that referenced this pull request Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants