Jobs stuck for hours because cloud-init doesn't become ready #78

Closed
ximon18 opened this issue Jul 12, 2023 · 2 comments · Fixed by #79
Labels
bug Something isn't working

Comments

@ximon18
Member

ximon18 commented Jul 12, 2023

This issue started affecting Routinator packaging workflow runs yesterday, hitting the Debian Buster, Bullseye and Bookworm LXC images in the pkg-test phase. It has also been seen on CentOS 7.

The underlying cause is unclear; it may be a problem with the LXC images themselves. Today only Debian Buster appears to be failing in the ploutos-testing suite. Automated weekly test runs of ploutos-testing show that this issue did not occur until recently, and manual test runs succeeded as recently as a couple of days ago.

Could this be related? From: https://linuxcontainers.org/lxd/

LXD is now under Canonical
The LXD project is no longer part of the LinuxContainers project but can now be found directly on Canonical's websites.

Website: https://ubuntu.com/lxd
Github: https://github.com/canonical/lxd
Forum: https://discourse.ubuntu.com/c/lxd/
Documentation: https://documentation.ubuntu.com/lxd/

Project announcement
Date: 4th of July 2023

Hello,

Canonical, the creator and main contributor of the LXD project has decided that after over 8 years as part of the Linux Containers community, the project would now be better served directly under Canonical’s own set of projects.

While the team behind Linux Containers regrets that decision and will be missing LXD as one of its projects, it does respect Canonical’s decision and is now in the process of moving the project over.

Concretely, the expected changes are:

https://github.com/lxc/lxd will now become https://github.com/canonical/lxd
https://linuxcontainers.org/lxd will disappear and be replaced with a mention directing users to https://ubuntu.com/lxd
The LXD YouTube channel will be handed over to the Canonical team
The LXD section on the LinuxContainers community forum will slowly be sunset in favor of the Ubuntu Discourse forum run by Canonical
The LXD CI infrastructure will be moved under Canonical’s care
Image building for Linux Containers will no longer be relying on systems provided by Canonical, limiting image building to x86_64 and aarch64.
What will not be changing:

The rest of the Linux Containers projects remain unaffected
The image server, currently used by both LXC and LXD will keep operating as normal, though with less architectures available as mentioned above
Those changes will likely all happen pretty rapidly as everything is relatively tightly integrated together. As a result, you may notice a bit of bumpiness while Canonical sets up the replacement infrastructure.

Sincerely,

The Linux Containers team

Christian Brauner
Serge Hallyn
Stéphane Graber

At any rate Ploutos should give up after a period, not retry forever.
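
A bounded wait of the kind suggested here could look roughly like the sketch below. It is a generic polling loop, not Ploutos's actual code: `wait_until_ready`, its parameters, and the injected `is_ready` callback are hypothetical names chosen for illustration, and `is_ready` stands in for whatever probe reports that cloud-init has finished.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed.

    Returns True if the check succeeded and False if we gave up.
    The clock and sleep parameters exist only so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            # Give up instead of retrying forever.
            return False
        sleep(interval_s)
```

Using a monotonic clock for the deadline keeps the limit robust against wall-clock adjustments inside the runner.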

@ximon18 ximon18 changed the title Jobs stuck for hours because cloud-init doesn''t become ready Jobs stuck for hours because cloud-init doesn't become ready Jul 12, 2023
@ximon18 ximon18 added the bug Something isn't working label Jul 12, 2023
@ximon18
Member Author

ximon18 commented Jul 12, 2023

See also #77.

@ximon18
Member Author

ximon18 commented Jul 12, 2023

We didn't find a way to pin to the last working LXC Debian Buster image, nor is it even clear that the image is still available on the LXC image servers. We will therefore (temporarily?) exclude Buster from the pkg-test phase, as we already do for Stretch (whose image is no longer available).

ximon18 added a commit that referenced this issue Jul 12, 2023
* Limit cloud-init waiting to roughly 60 seconds.
* More diagnostics for cloud-init timeout.
* Don't suppress default shell settings for the prepare container step.
* Try to keep going anyway, even without confirmation from cloud-init completion that the system is likely ready for use.
* Guide the user to find the logs we added.
* Exclude Debian Buster from package testing while the LXC image has cloud-init issues.
* Link to underlying issue.
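
As an illustration of the approach this commit message describes (a time limit, diagnostics on timeout, and continuing anyway), here is a minimal sketch. It is not the code from the commit: `run_with_timeout` is a hypothetical helper, and the probe command shown in the usage comment is an assumption about how readiness might be checked.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float = 60.0) -> bool:
    """Run cmd, giving up after timeout_s seconds.

    Returns True if the command exited with status 0 in time. On timeout
    or failure it prints diagnostics to stderr and returns False instead
    of raising, so the caller can choose to continue anyway.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        print(f"timed out after {timeout_s}s: {' '.join(cmd)}", file=sys.stderr)
        print(f"partial output: {exc.output!r}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"exit status {result.returncode}: {' '.join(cmd)}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
        return False
    return True


# Hypothetical usage; the container name and the probe are assumptions:
# ok = run_with_timeout(
#     ["lxc", "exec", "testcon", "--", "cloud-init", "status", "--wait"]
# )
# if not ok:
#     print("cloud-init not confirmed ready; continuing anyway", file=sys.stderr)
```

Returning a boolean rather than raising leaves the "keep going or abort" decision to the caller, which matches the intent of proceeding without confirmation from cloud-init.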
ximon18 added a commit that referenced this issue Jul 12, 2023

(cherry picked from commit c6fde49)