Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Always use single datasource if specified #5098

Merged
merged 2 commits into from
Mar 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion tests/unittests/test_ds_identify.py
Original file line number Diff line number Diff line change
Expand Up @@ -680,7 +680,7 @@ def test_single_entry_defines_datasource(self):
mydata = copy.deepcopy(VALID_CFG["Ec2-hvm"])
cfgpath = "etc/cloud/cloud.cfg.d/myds.cfg"
mydata["files"][cfgpath] = 'datasource_list: ["NoCloud"]\n'
self._check_via_dict(mydata, rc=RC_FOUND, dslist=["NoCloud", DS_NONE])
self._check_via_dict(mydata, rc=RC_FOUND, dslist=["NoCloud"])

def test_configured_list_with_none(self):
"""When datasource_list already contains None, None is not added.
Expand Down
6 changes: 5 additions & 1 deletion tools/ds-identify
Original file line number Diff line number Diff line change
Expand Up @@ -2018,7 +2018,11 @@ _main() {
# if there is only a single entry in $DI_DSLIST
if [ $# -eq 1 ] || [ $# -eq 2 -a "$2" = "None" ] ; then
debug 1 "single entry in datasource_list ($DI_DSLIST) use that."
found "$@"
if [ $# -eq 1 ]; then
write_result "datasource_list: [ $1 ]"
Copy link
Collaborator

@blackboxsw blackboxsw Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walking through this, the only way we can get $# -eq 1 is if one of the following was configs was present:

  1. kernel cmdline with (ds|ci.ds|ci.datasource)=<some_ds_name>
  2. kernel cmdline with cc:datasource_list: [ <some_ds_name> ] end_cc # MAAS does this
  3. /etc/cloud/ds-identify.cfg with datasource: <some_ds_name> # we should probably deprecate this as I am unsure where that's used anymore
  4. /etc/cloud/cloud.cfg.d/* containing datasource_list: [ <some_ds_name> ]

In those cases, the environment configuration is intentionally limiting the datasource list selection to a single datasource.

If ds-identify avoids appending fallback None to datasource_list in /run/cloud-init/cloud.cfgds-identify, the system will no longer fallback to DataSourceNone behavior if the primary single datsource is undiscoverable. This will leave us with a slew of WARNINGs which show up in cloud-init status --format=yaml and /var/log/cloud-init-output.log`

    - No instance datasource found! Likely bad things to come!
    - Can not apply stage config, no datasource found! Likely bad things to come!
    - Can not apply stage final, no datasource found! Likely bad things to come!

This ultimately leaves us with a system that hasn't run any config modules at all which could make the system error conditions even less discoverable because cloud-init will not have generated SSH host keys or imported SSH pub keys to the instance. I'm not sure we want to go down this specific path, though it could be argued that a filesystem config or kernel command line that wishes to have a sensible fallback option should either provide cc:datasource_list: [ DS_NAME, None ] or provide that fallback in /etc/cloud/cloud.cfg.d/*.cfg as datasource_list: [ DS_NAME, None ].

[update]
In either case, the fallback to DataSourceNone is undesired anyway and won't contain cloud-specific
configuration to make the instance accessible w/ user-data/network-config/vendor-data etc. The only benefit I could see from DataSourceNone over just dying with "no datasource found", is that enough config could be put into /etc/cloud/cloud.cfg.d in the DataSourceNone case to trigger correct network config, user creation and SSH key setup to allow that modified image to be more easily accessible for triage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasn't run any config modules at all which could make the system error conditions even less discoverable

I'm actually not sure I really believe this statement I made as the DataSourceNone on something like an Azure/Oracle/GCe/Ec2 cloud provides "more" visibility for triage. DataSourceNone won't be reading any cloud platform specific instance metadata configuration anyway which would have provided sensible config for which project/user SSH keys to import anyway. Also, most likely it wouldn't have been provided enough network configuration information to bring up the network anyway unless there was already functional static config in the image (which is possible). So, getting into that broken instance would still be problematic anyway and probably require virtual serial console to access the instance for triage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've walked back most of your objection, but I just wanna make sure we're on the same page about what's changing.

If ds-identify avoids appending fallback None to datasource_list in /run/cloud-init/cloud.cfgds-identify, the system will no longer fallback to DataSourceNone behavior if the primary single datsource is undiscoverable. This will leave us with a slew of WARNINGs which show up in cloud-init status --format=yaml and /var/log/cloud-init-output.log`

Because of #2060 , I don't think this has been true since 23.2. In that PR we skipped datasource detection on the python side if we saw a single datasource or [datasource, None], so DataSourceNone would never actually get used. Am I understanding this correctly? My goal with this commit is to take us back to what things were post 23.2 if a single datasource is specified.

Copy link
Collaborator

@blackboxsw blackboxsw Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of #2060 , I don't think this has been true since 23.2. In that PR we skipped datasource detection on the python side if we saw a single datasource or [datasource, None], so DataSourceNone would never actually get used. Am I understanding this correctly? My goal with this commit is to take us back to what things were post 23.2 if a single datasource is specified.

You are right, I think it was 23.1 that we introduced the override_ds_detect. Ok, so this behavior observing the single configured datasource has really been active for a while. The None fallback has been ignored as a fallback option in any such case where the single datasource went undiscoverable due to ds_detect returning False. It just hadn't hit any significant cases until this OpenStack on Oracle config issue #5091.

Also, I think this current #5091 issue could also be resolved without any changes to cloud-init or ds-identify given a simple config file change if they have reasons to retain OpenStack datasource on Oracle platform.

From the looks of the DataSourceOpenStack.ds_detect. If their platform needs to use OpenStack datasource over Oracle DataSource. It looks like the following config could be provided in /etc/cloud/cloud.cfg.d/99-delphix-datasource.cfg

datasource_list: [ OpenStack, Oracle ]

This will tell the unique DataSourceOpenStack.ds_detect logic to allow OpenStack datasource detection on any system which exposes chassis asset tag=OracleCloud.com.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I think your suggestion James is the correct long-term solution. If ds-identify only sees one datasource defined in datasource_list, then it really should only persist that single datasource to /run/cloud-init/cloud.cfg. It would honor the config provided.

else
found "$@"
fi
return
fi

Expand Down
Loading