release: remove timeout overrides for systemd units #1483

Merged 1 commit on Apr 14, 2021

@arnaldo2792 arnaldo2792 commented Apr 13, 2021

Issue number:
#1450

Description of changes:

db76613a release: remove stop timeout overrides for systemd units

This commit removes the DefaultTimeoutStartSec configuration, since services like Kubernetes need more than 10 seconds to start in the VMware variant.

The DefaultTimeoutStopSec override (10s) is kept, since with the default value (1min 30s) Kubernetes pods hold up shutdown for the full timeout, as shown in the logs:

[   ***] A stop job is running for libcontai…df6da360b4e4 (1min 20s / 1min 30s)
[    **] A stop job is running for libcontai…df6da360b4e4 (1min 20s / 1min 30s)
[     *] A stop job is running for libcontai…df6da360b4e4 (1min 21s / 1min 30s)
[    **] A stop job is running for libcontai…df6da360b4e4 (1min 21s / 1min 30s)
[   ***] A stop job is running for libcontai…df6da360b4e4 (1min 22s / 1min 30s)
[  *** ] A stop job is running for libcontai…df6da360b4e4 (1min 22s / 1min 30s)
[ ***  ] A stop job is running for libcontai…df6da360b4e4 (1min 23s / 1min 30s)
[***   ] A stop job is running for libcontai…df6da360b4e4 (1min 23s / 1min 30s)
[**    ] A stop job is running for libcontai…df6da360b4e4 (1min 24s / 1min 30s)
[*     ] A stop job is running for libcontai…df6da360b4e4 (1min 24s / 1min 30s)
[**    ] A stop job is running for libcontai…df6da360b4e4 (1min 25s / 1min 30s)
[***   ] A stop job is running for libcontai…df6da360b4e4 (1min 25s / 1min 30s)
[ ***  ] A stop job is running for libcontai…df6da360b4e4 (1min 26s / 1min 30s)
[  *** ] A stop job is running for libcontai…df6da360b4e4 (1min 26s / 1min 30s)
[   ***] A stop job is running for libcontai…df6da360b4e4 (1min 27s / 1min 30s)
[    **] A stop job is running for libcontai…df6da360b4e4 (1min 27s / 1min 30s)
[     *] A stop job is running for libcontai…df6da360b4e4 (1min 28s / 1min 30s)
[    **] A stop job is running for libcontai…df6da360b4e4 (1min 28s / 1min 30s)
[   ***] A stop job is running for libcontai…df6da360b4e4 (1min 29s / 1min 30s)
[  *** ] A stop job is running for libcontai…df6da360b4e4 (1min 29s / 1min 30s)
[ ***  ] A stop job is running for libcontai…df6da360b4e4 (1min 30s / 1min 30s)

Testing done:

On k8s 1.19, ECS, and dev aarch64:

  • Ran systemctl status to check that no units were failing
  • Checked the manager defaults for the start and stop timeouts with systemctl show:
DefaultTimeoutStartUSec=1min 30s # Was 10s
DefaultTimeoutStopUSec=10s
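
For anyone who wants to script the check above, here is a minimal sketch that parses `KEY=VALUE` text in the shape printed by systemctl show and converts the time spans to seconds. The helper names `parse_span` and `parse_show` are hypothetical, the sample string is copied from the values reported in this description, and only a small subset of systemd time units is handled.

```python
import re

# Multipliers for the subset of systemd time-span units handled in this sketch.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}


def parse_span(span: str) -> float:
    """Convert a span such as '1min 30s' or '10s' into seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*([a-z]+)", span):
        total += float(value) * _UNIT_SECONDS[unit]
    return total


def parse_show(output: str) -> dict:
    """Parse KEY=VALUE lines into a dictionary, skipping anything else."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Sample text taken from the values listed in this PR description.
sample = "DefaultTimeoutStartUSec=1min 30s\nDefaultTimeoutStopUSec=10s\n"
props = parse_show(sample)
start_s = parse_span(props["DefaultTimeoutStartUSec"])
stop_s = parse_span(props["DefaultTimeoutStopUSec"])
print(f"start timeout: {start_s}s, stop timeout: {stop_s}s")
```

On a real host the sample string would be replaced by captured command output; that step is left out here so the sketch runs anywhere.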

k8s 1.19

First boot:

# In latest 1.0.8
Startup finished in 1.873s (kernel) + 42.179s (userspace) = 44.053s
multi-user.target reached after 42.170s in userspace
# With new configuration
Startup finished in 1.785s (kernel) + 43.784s (userspace) = 45.570s
multi-user.target reached after 43.746s in userspace

Reboot:

# In latest 1.0.8
Startup finished in 1.795s (kernel) + 40.333s (userspace) = 42.129s
multi-user.target reached after 40.316s in userspace
# With new configuration, got lucky that wicked didn't take longer than expected
Startup finished in 722ms (kernel) + 12.897s (userspace) = 13.620s
multi-user.target reached after 12.847s in userspace

systemd-analyze blame
6.889s activate-multi-user.service
6.661s kubelet.service
3.713s systemd-random-seed.service
2.878s wicked.service

ECS

First boot:

# In latest 1.0.8
Startup finished in 2.154s (kernel) + 12.205s (userspace) = 14.360s
multi-user.target reached after 12.167s in userspace
# With new configuration
Startup finished in 1.496s (kernel) + 8.473s (userspace) = 9.970s
multi-user.target reached after 8.455s in userspace

Reboot:

# In latest 1.0.8
Startup finished in 739ms (kernel) + 8.700s (userspace) = 9.440s
multi-user.target reached after 8.652s in userspace
# With new configuration
Startup finished in 709ms (kernel) + 8.580s (userspace) = 9.290s
multi-user.target reached after 8.542s in userspace
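
To compare the boot timings without doing the arithmetic by hand, a small sketch like the following could be used. It reads the total after the `=` sign in a "Startup finished" line; the function name `total_seconds` is hypothetical, and the two input lines are the k8s 1.19 first-boot figures quoted in this description.

```python
import re


def total_seconds(line: str) -> float:
    """Return the total after '=' in a 'Startup finished in ...' line, in seconds."""
    match = re.search(r"=\s*(?:(\d+)min\s+)?([\d.]+)(ms|s)\s*$", line)
    if match is None:
        raise ValueError(f"not a startup summary line: {line!r}")
    minutes, value, unit = match.groups()
    seconds = float(value) / 1000.0 if unit == "ms" else float(value)
    return seconds + 60.0 * int(minutes or 0)


# k8s 1.19 first boot, copied from the measurements in this description.
before = total_seconds(
    "Startup finished in 1.873s (kernel) + 42.179s (userspace) = 44.053s"
)
after = total_seconds(
    "Startup finished in 1.785s (kernel) + 43.784s (userspace) = 45.570s"
)
delta = round(after - before, 3)
print(f"first boot changed by {delta:+.3f}s")
```

These figures cover boot time only; they say nothing about how long shutdown takes.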

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@arnaldo2792 arnaldo2792 requested review from tjkirch and bcressey April 13, 2021 00:31
@arnaldo2792 arnaldo2792 linked an issue Apr 13, 2021 that may be closed by this pull request
@bcressey

The key metric here is elapsed time from when reboot is initiated until systemd finishes shutting down everything and triggers the actual reboot.

@zmrow zmrow left a comment

🕹️

From the code perspective this looks right.

@zmrow

zmrow commented Apr 13, 2021

> The key metric here is elapsed time from when reboot is initiated until systemd finishes shutting down everything and triggers the actual reboot.

How can we best gather that?

@bcressey

> The key metric here is elapsed time from when reboot is initiated until systemd finishes shutting down everything and triggers the actual reboot.
>
> How can we best gather that?

The best way is by watching the console when the system is shutting down.
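
If the console output is captured to a file, the waiting time could also be pulled out of it programmatically. The sketch below is one possible approach, not an established tool: the names `to_seconds` and `longest_stop_wait` are hypothetical, and it assumes console lines shaped like the "A stop job is running for ..." lines in the PR description.

```python
import re

# Matches lines such as:
#   [ ***  ] A stop job is running for <unit> (1min 20s / 1min 30s)
STOP_JOB = re.compile(r"A stop job is running for (.+) \((.+) / (.+)\)")
UNITS = {"s": 1, "min": 60, "h": 3600}


def to_seconds(span: str) -> int:
    """Convert a span such as '1min 20s' into whole seconds."""
    return sum(int(n) * UNITS[u] for n, u in re.findall(r"(\d+)\s*(min|h|s)", span))


def longest_stop_wait(console: str):
    """Return (unit, waited_seconds, limit_seconds) for the longest wait seen, or None."""
    worst = None
    for line in console.splitlines():
        match = STOP_JOB.search(line)
        if match is None:
            continue
        unit = match.group(1)
        waited = to_seconds(match.group(2))
        limit = to_seconds(match.group(3))
        if worst is None or waited > worst[1]:
            worst = (unit, waited, limit)
    return worst


# Two sample lines in the shape of the console log from the PR description.
console = (
    "[   ***] A stop job is running for libcontai…df6da360b4e4 (1min 20s / 1min 30s)\n"
    "[ ***  ] A stop job is running for libcontai…df6da360b4e4 (1min 30s / 1min 30s)\n"
)
result = longest_stop_wait(console)
print(result)
```

This only reports how long a stop job was seen waiting; the full reboot-to-shutdown time still has to be read from the console timestamps or measured externally.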

@arnaldo2792

  • Re-added the configurations that had been removed; the change now removes only DefaultTimeoutStartSec=10s

@zmrow zmrow left a comment

👍

This commit removes the `DefaultTimeoutStartSec` configuration, since
services like kubernetes require more than 10 seconds to start in the
vmware variant.
@arnaldo2792 arnaldo2792 changed the title packages: remove timeout overrides for systemd units release: remove timeout overrides for systemd units Apr 14, 2021
@arnaldo2792

  • Updated the commit message in the last force push

@arnaldo2792 arnaldo2792 removed the request for review from tjkirch April 14, 2021 22:37
@arnaldo2792 arnaldo2792 merged commit 328b50a into bottlerocket-os:develop Apr 14, 2021
@arnaldo2792 arnaldo2792 deleted the remove-timeouts branch April 14, 2021 23:58

Successfully merging this pull request may close these issues.

Revert global systemd unit timeout defaults
3 participants