imdsclient: add retry to fetch_token #1801

Merged
jpculp merged 1 commit into bottlerocket-os:develop from add-imds-token-retry on Nov 4, 2021

jpculp commented Nov 2, 2021

Description of changes:

This adds retry logic if imdsclient fails to fetch a session token.

It also makes small changes to the fetch_imds retry logic: the sleep now
comes after the ensure!(imds_attempts <= max_imds_attempts) check, and the
wait time goes from 100ms to 1s.
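
To illustrate the ordering described above, here is a minimal synchronous sketch of a retry loop that checks the attempt cap before sleeping. It is not the imdsclient code: the real functions are async and use snafu's ensure!, while `retry_with_delay` and `RetriesExhausted` are hypothetical names introduced here, and std::thread::sleep stands in for the async sleep.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type: returned once the attempt cap is exceeded.
#[derive(Debug, PartialEq)]
pub struct RetriesExhausted {
    pub attempts: u32,
}

/// Calls `op` up to `max_attempts` times, sleeping `delay` before each retry.
/// The cap is checked before the sleep, so an exhausted loop returns
/// immediately instead of waiting one more time.
pub fn retry_with_delay<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, RetriesExhausted> {
    let mut attempt: u32 = 0;
    loop {
        attempt += 1;
        // Check the cap first (the role ensure! plays in the PR) ...
        if attempt > max_attempts {
            return Err(RetriesExhausted { attempts: max_attempts });
        }
        // ... then sleep, and only ahead of a retry, never the first try.
        if attempt > 1 {
            thread::sleep(delay);
        }
        if let Ok(value) = op(attempt) {
            return Ok(value);
        }
    }
}
```

If the sleep came before the cap check, a loop that had already used its last attempt would wait one extra `delay` before reporting the failure.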

Testing done:

  • Built an aws-k8s-1.20 AMI and launched an instance.
  • Instance connected to the EKS cluster.
  • Connected to the control container via an SSM session.
  • Verified that host-containers.admin.user-data contained a base64-encoded block.
  • Connected to the admin container via SSH.
  • Verified that /.bottlerocket/host-containers/admin/user-data contained JSON.
  • Ran sudo sheltie to verify a root shell was still available.
  • Checked for failed systemd units.
  • Ran pluto with its subcommands to verify functionality.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

jpculp requested a review from webern November 2, 2021 23:47
Comment on lines 357 to 361
    let mut token_attempts: u8 = 0;
    let max_token_attempts: u8 = 2;
    loop {
        token_attempts += 1;
        ensure!(
            token_attempts <= max_token_attempts,
            error::FailedFetchToken { token_attempts }
        );
        if token_attempts > 1 {
            time::sleep(Duration::from_secs(5)).await;
        }
Contributor

I'd like to align the number of retries and the length of the delay with fetch_imds - the requests are going to the same service and reaching the failure state in either function will likely result in a boot failure.

Member Author

The retries in fetch_imds are mostly there to trigger a token refresh in the event the token has expired, as opposed to handling IMDS unavailability (hence the shorter sleep). Each client initialization and each fetch_imds attempt is going to fetch a new token. If IMDS is unreachable for whatever reason, the failure would occur during fetch_token. Since we are now adding retries to the token fetch, it might make sense to drop max_imds_attempts from 3 to 2, but I wouldn't recommend aligning the sleep duration.

Member Author

Instead of dropping fetch_imds from 3 to 2, I bumped fetch_token from 2 to 3.
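
The distinction drawn in this thread (fetch_imds retries mainly exist to refresh an expired token, while fetch_token retries cover IMDS being unreachable) can be sketched roughly as follows. This is a sketch under stated assumptions, not the imdsclient implementation: `FakeImds`, `Status`, and `fetch_with_refresh` are hypothetical stand-ins, the rejection of a stale token is modeled as a 401-style `Unauthorized` reply, and the delay between attempts is left out for brevity.

```rust
/// Outcome of one metadata request (stand-in for an HTTP status code).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Unauthorized,
}

/// Hypothetical stand-in for IMDS: only the most recently issued token is valid.
pub struct FakeImds {
    issued: u32,
}

impl FakeImds {
    pub fn new() -> Self {
        Self { issued: 0 }
    }

    /// Issues a fresh session token, invalidating older ones.
    pub fn issue_token(&mut self) -> u32 {
        self.issued += 1;
        self.issued
    }

    /// Answers a metadata request made with `token`.
    pub fn get(&self, token: u32) -> Status {
        if token == self.issued {
            Status::Ok
        } else {
            Status::Unauthorized
        }
    }
}

/// Tries the request up to `max_attempts` times, refreshing the token after
/// each rejection. Returns the number of attempts used on success.
pub fn fetch_with_refresh(
    imds: &mut FakeImds,
    token: &mut u32,
    max_attempts: u32,
) -> Result<u32, String> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        if attempt > max_attempts {
            return Err(format!("gave up after {} attempts", max_attempts));
        }
        match imds.get(*token) {
            Status::Ok => return Ok(attempt),
            // A stale token is cheap to recover from: refresh and retry.
            Status::Unauthorized => *token = imds.issue_token(),
        }
    }
}
```

In this model a stale token costs exactly one extra attempt, which is consistent with the argument for keeping the fetch_imds sleep short.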

Contributor

zmrow commented Nov 3, 2021

Once @bcressey's concern is handled, lgtm

@jpculp jpculp force-pushed the add-imds-token-retry branch from 47f4f82 to 05f01a2 Compare November 3, 2021 18:18
Member Author

jpculp commented Nov 3, 2021

  • Replaced variables like imds_attempts with attempt.
  • Increased the fetch_token max attempts from 2 to 3. Now both fetch_token and fetch_imds have 3 attempts.
  • Changed wording for fetch_token error message to be more descriptive.

@jpculp jpculp merged commit fa6ef3e into bottlerocket-os:develop Nov 4, 2021
@jpculp jpculp deleted the add-imds-token-retry branch November 4, 2021 20:37