-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataSourceMAAS timeout delays the boot of the machine by ~2 minutes #5064
Comments
Link to MAAS upstream bug https://bugs.launchpad.net/maas/+bug/2057763 |
Incorrectly registering DataSourceMAAS for init-local DEP_FILESYSTEM inadvertently short-circuits the _get_dat checks on network setup from initramfs. This results in a 2 minute timeout on physical system reboots. Correct this registration to do the proper checks and avoid trying to perform GETs against the IMDS in init-local timeframe when we know network is not yet available. Fixes canonicalGH-5064
Thank you @r00ta for reflecting the bug here. I've tried setting a custom image streams to http://images.maas.io/ephemeral-v3/candidate in an attempt to reproduce this issue. I'm unable to reproduce this issue correctly due to my own inability on a maas local VM in multipass to get to a commissioning success state so I haven't seen this yet. But, from logs it looks like this timeout is due to the MAAS trying to run it's full get_data against the IMDS url in init-local timeframe before networking is up on the physical machine. That shouldn't generally happen unless initramfs provided network config to the booted machine, which I was told only occurs in Ephemeral boots not physical system reboots. This leads me to discovering a misconfiguration. The DatasourceMAASLocal class should have been registered to DEP_FILESYSTEM(init-local boot stage). But it looks like DataSourceMAAS (the init-network stage datasource) was registered instead. This is likely causing that 2 minute timeout you are seeing on reboot. |
Incorrectly registering DataSourceMAAS for init-local DEP_FILESYSTEM inadvertently short-circuits the _get_data checks performed by DataSourceMAASLocal for presence of network config from initramfs. This results in a 2 minute timeout on physical system reboots because the network stage DataSourceMAAS expects all system networking is already configured and the IMDS to bebe accessible. Correct this registration to do the proper initramfs checks and avoid trying to perform GETs against the IMDS in init-local timeframe when we know network is not yet available. Fixes canonicalGH-5064
Thank you for filing this bug @r00ta. I have uploaded a test package 23.4.4 to this PPA https://launchpad.net/~chad.smith/+archive/ubuntu/maas-test. I believe correcting the registration of DataSourceMAASLocal in init-local timeframe should solve this timeout problem. But, I'd need a test run of a modified maas ephemeral image to confirm and I don't have a reproducer handy. Would someone be able to test this to confirm 2 things:
|
…nonical#5068) Incorrectly registering DataSourceMAAS for init-local DEP_FILESYSTEM inadvertently short-circuits the _get_data checks performed by DataSourceMAASLocal for presence of network config from initramfs. This results in a 2 minute timeout on physical system reboots because the network stage DataSourceMAAS expects all system networking is already configured and the IMDS to be accessible. Correct this registration to do the proper initramfs checks and avoid trying to perform GETs against the IMDS in init-local timeframe when we know network is not yet available. Fixes canonicalGH-5064 LP: #2057763
I will try this later. For now, what I've done is to try to deploy a custom, non-Ubuntu, image with MAAS and the proposed fixed. It doesn't only time out, it seems to fail to configure networking altogether for some reason. This is the cloud-init.log failing to deploy rhel9.3 with cloud-init 24.1, with the proposed fix applied: https://paste.ubuntu.com/p/GP3nd5zmTw/ For reference, this is the cloud-init.log from a deployment of rhel9.3, with cloud-init 23.1: https://paste.ubuntu.com/p/6CnPcdHk7r/ |
Bug report
On every MAAS version when I deploy ubuntu 22.04 (https://images.maas.io/ephemeral-v3/stable/jammy/amd64/20240301/) the os is properly installed, but when the machine reboots there is a delay in the startup and looking at the journalctl logs it's possible to see a timeout of ~2 minutes for the DataSourceMAAS that failed to make some requests to the rack.
The issue here is the delay of ~2 minutes in the boot process.
Steps to reproduce the problem
Just deploy any ubuntu 22.04 machine with MAAS
Environment details
cloud-init logs
The text was updated successfully, but these errors were encountered: