-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cloud-init auto-detect to prevent multiple provisioning mechanisms from relying on configuration for coordination #1633
Add cloud-init auto-detect to prevent multiple provisioning mechanisms from relying on configuration for coordination #1633
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1633 +/- ##
===========================================
+ Coverage 67.34% 67.54% +0.19%
===========================================
Files 80 80
Lines 11433 11456 +23
Branches 1605 1604 -1
===========================================
+ Hits 7700 7738 +38
+ Misses 3393 3374 -19
- Partials 340 344 +4
Continue to review full report at Codecov.
|
1d5cbff
to
f97d136
Compare
Ready for review. // @jasonzio |
systemctl_output = subprocess.check_output([ | ||
'systemctl', | ||
'is-enabled', | ||
'cloud-init-local.service' | ||
], stderr=subprocess.STDOUT).decode('utf-8').replace('\n', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will produce an exception if the service is missing.
I'm also not sure using systemctl
is a particular good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exception is ok. This is wrapped in a try/except so that any exception case (such as a missing service) will just result in a falsy return. As for using systemctl
, it is the best option forward from what I've seen. I originally tried using dbus but there is no guarantee that the system dbus is up and running at this point, whereas systemd has it's own internal dbus for communication (which systemctl uses) that we can rely on at this boot stage. Let me know if you have any other hesitations for using systemctl otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain that calling systemctl from within a code path that runs as part of a systemd unit will always produce reliable results. I should but I don't know for certain nor have I found any documentation thereof.
This also makes the assumption that the generator (which is safe) has run, and properly detected Azure (there have been generator detection issues). Last but not least it assumes that all images run the generator that is part of cloud-init. There are other way to run cloud-init-local.service
. While we can argue that that's not the "standard way" when/if someone has a "brilliant idea" and this code breaks I did want to point out at the implicit dependency that is being created by this check.
I am not advocating for a change, just making a comment.
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 12 files at r2, 2 of 2 files at r3.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @larohra, @Moustafa-Moustafa, @narrieta, @pgombar, @trstringer, @vrdmr, and @waldiTM)
Validated with @narrieta, let me know if there are any other things needed for this PR to get merged. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments, but looks good to me.
I'll do an automation run and approve/merge afterwards.
specifying "auto"). Possible options are "auto" (default), "waagent", "cloud-init", | ||
or "disabled". | ||
|
||
#### __Provisioning.Enabled__ (*removed in VERSION*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't removing this be a breaking change? who are the likely users of those settings? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These settings need to be made in the image from which a VM is created, and they're generally set by a user only when they create a new image. Simply updating waagent in an already-provisioned VM should have no impact, as the settings are ignored (since provisioning already happened).
I can imagine someone being impacted if they take an image they'd already created, update the waagent, and use that to create a new image. It might be worthwhile for the waagent package (deb or rpm) to have a post-install script check the config file for the presence of the Provisioning.Enabled
config setting and emit a warning or error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also be a good idea for the provisioning code in waagent to detect the presence of Provisioning.Enabled
and emit a warning in the log (and via telemetry). This should make diagnosis easier in the few cases where the change gets missed.
@narrieta hold off on merging just yet, we're going to try to get another review of this before we move forward. Changing this to "work in progress" for the time being. |
@trstringer OK |
systemctl_output = subprocess.check_output([ | ||
'systemctl', | ||
'is-enabled', | ||
'cloud-init-local.service' | ||
], stderr=subprocess.STDOUT).decode('utf-8').replace('\n', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain that calling systemctl from within a code path that runs as part of a systemd unit will always produce reliable results. I should but I don't know for certain nor have I found any documentation thereof.
This also makes the assumption that the generator (which is safe) has run, and properly detected Azure (there have been generator detection issues). Last but not least it assumes that all images run the generator that is part of cloud-init. There are other way to run cloud-init-local.service
. While we can argue that that's not the "standard way" when/if someone has a "brilliant idea" and this code breaks I did want to point out at the implicit dependency that is being created by this check.
I am not advocating for a change, just making a comment.
@narrieta if this PR looks good it should be ready for merge. |
running automation |
…into thstring/cloudinit-autodetect2
@narrieta ready for re-running and consideration. Thanks! |
@trstringer - Running automation |
Automation OK |
Description
Currently waagent relies on explicit configuration (
Provisioning.UseCloudInit
to be set, andProvisioning.Enabled
to be falsy). This can pose an interesting problem with the introduction of cloud-init. When waagent and cloud-init live side-by-side, they will contend with each other to be the provisioning agent. Image maintainers need to explicitly set configuration (above) so that waagent allows cloud-init to be the provisioning agent. With maintainers possibly not knowing about this behavior, they could unknowing cause provisioning issues.This PR changes that behavior by deprecating
Provisioning.UseCloudInit
andProvisioning.Enabled
configuration options and introduces a new configuration setting:Provisioning.Agent
. The following values change the behavior of which provisioning agent to use:auto
(default) - Allow waagent to autodetect to see if cloud-init is enabled. If so, use cloud-init as the provisioning agent. If not, use waagent as the provisioning agent.waagent
- Force waagent as the provisioning agent.cloud-init
- Force cloud-init as the provisioning agent.disabled
- Disable all provisioning.PR information
Quality of Code and Contribution Guidelines
This change is