Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Retry Policy #521

Open
wants to merge 6 commits into
base: andyohart/managed-identity
Choose a base branch
from

Conversation

4gust
Copy link
Collaborator

@4gust 4gust commented Oct 21, 2024

Pull Request: Implement Retry Policy for Client

Overview

This pull request introduces a retry policy for network calls in the client. The retry policy aims to enhance the resilience of our application by allowing it to automatically retry failed requests up to a maximum of three times.

Changes Made

  • Retry Policy Implementation:

    • A retry mechanism has been added to handle transient errors in network calls.
    • The default maximum retry count is set to 3.
  • Configuration Option to Disable Retry Policy:

    • Introduced an optional method WithRetryPolicyDisabled() for the client.
    • When this method is used, the maximum retry count is reduced to 1 (effectively disabling the retry mechanism).

Default Behavior (Retries Enabled)

By default, the client will automatically retry failed requests up to 3 times with delay of 1 second. Here’s how to create the client with the retry policy enabled:

client, err := mi.New(SystemAssigned())
if err != nil {
    // Handle error
}

Default Behavior (Retries Disabled)

By default, the client will not retry the failed request. Here’s how to create the client with the retry policy disabled:

client, err := mi.New(SystemAssigned(), WithRetryPolicyDisabled())
if err != nil {
    // Handle error
}

Error List when retry is done.

	http.StatusNotFound,                      // 404
	http.StatusGone,                          // 410
	http.StatusTooManyRequests,               // 429 // retry after.
	http.StatusInternalServerError,           // 500
	http.StatusNotImplemented,                // 501
	http.StatusBadGateway,                    // 502
	http.StatusServiceUnavailable,            // 503
	http.StatusGatewayTimeout,                // 504
	http.StatusHTTPVersionNotSupported,       // 505
	http.StatusVariantAlsoNegotiates,         // 506
	http.StatusInsufficientStorage,           // 507
	http.StatusLoopDetected,                  // 508
	http.StatusNotExtended,                   // 510
	http.StatusNetworkAuthenticationRequired, // 511

@4gust 4gust requested review from AndyOHart and chlowell and removed request for rayluo October 21, 2024 20:23
Copy link
Collaborator

@chlowell chlowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a spec for this feature? It seems very basic e.g. no backoff, hardcoded retry count and delay. Blockers are

  • implicit upgrade to Go 1.21
  • cancellation
  • rewinding request bodies

apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity_test.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
if attempts-1 < 1 {
return resp, nil
}
time.Sleep(delay)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSAL must observe Context cancellation while waiting to retry. See for example the Azure SDK retry policy

return resp, nil
}
time.Sleep(delay)
return retry(attempts-1, delay, c, req)
Copy link
Collaborator

@chlowell chlowell Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to rewind the request body before retrying (this will matter for Cloud Shell). Make sure you have a test verifying retries have the same content as the initial attempt

Added test to check request body is same for each request.
@4gust
Copy link
Collaborator Author

4gust commented Oct 23, 2024

It is very basic retry, and has a fix 1 second between retry and max count of 3 retries
This is what we set for Private preview.

@4gust 4gust requested a review from chlowell October 23, 2024 16:51
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
apps/managedidentity/managedidentity.go Outdated Show resolved Hide resolved
defer tryCancel()
cloneReq := req.Clone(tryCtx)
resp, err = c.Do(cloneReq)
if err == nil && !contains(retryStatusCodes, resp.StatusCode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should consider the Source. retryStatusCodes follows IMDS docs, and that guidance doesn't make sense for other APIs. For example, in general you don't want to retry 404. Azure SDK retries these codes by default and special-cases IMDS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @neha-bhargava as this is probably a small bug in all other MSALs (which should not affect Azure SDK)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussions with @bgavrilMS, we have decided to maintain consistency with other MSAL implementations. We will create a bug report to address this issue and ensure that the necessary updates are applied across all relevant areas.

Updating the error code for retry
@4gust 4gust requested a review from chlowell October 29, 2024 11:53
Copy link

sonarcloud bot commented Oct 29, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants