
Implementation for retrying requests #57

Merged: 4 commits merged into main on Jun 8, 2021
Conversation

@manicminer (Owner) commented Jun 2, 2021

  • Attempt to work around eventual consistency issues at the cost of some performance and potentially unnecessary requests.
  • Request input structs can now contain a ConsistencyFailureFunc which determines whether a request should be retried on failure.
  • ConsistencyFailureFuncs receive a copy of the response and the parsed OData with which to make such decisions.
  • A canned RetryOn404ConsistencyFailureFunc can simply retry requests that receive a 404 response.
  • This mitigates some eventual consistency issues, such as manipulating or referencing a newly created object, but doesn't work when listing objects.
  • There's an example of a custom func in the ServicePrincipalsClient{}.Create() method which triggers a retry when the referenced application isn't found (see the sketch after this list).
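
For illustration, here's a minimal sketch of what a custom func could look like, assuming (as described above) that it receives the *http.Response and the parsed OData and returns whether to retry; the import path and exact type names are assumptions rather than the merged code.

```go
package example

import (
	"net/http"

	"github.com/manicminer/hamilton/odata" // assumed location of the parsed OData type
)

// retryOnNotFound is a hypothetical consistency failure func: it requests a
// retry whenever the API responds with 404, which usually means a newly
// created object hasn't replicated yet. A request input struct would point
// to it via its ConsistencyFailureFunc field.
func retryOnNotFound(resp *http.Response, _ *odata.OData) bool {
	return resp != nil && resp.StatusCode == http.StatusNotFound
}
```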

@manicminer manicminer force-pushed the feature/request-retrying branch from 8dc9455 to c59f24d Compare June 2, 2021 18:31
@manicminer manicminer marked this pull request as ready for review June 2, 2021 18:31
@manicminer manicminer added this to the v0.16.0 milestone Jun 2, 2021
@manicminer manicminer force-pushed the feature/request-retrying branch 2 times, most recently from 7351c3b to db2dd24 Compare June 2, 2021 19:04
@manicminer manicminer changed the title from "WIP Initial implementation for retrying requests" to "Implementation for retrying requests" Jun 2, 2021
@manicminer manicminer force-pushed the feature/request-retrying branch from 1ae0753 to cebaaca Compare June 2, 2021 19:45
@manicminer manicminer force-pushed the feature/request-retrying branch from cebaaca to 5c69989 Compare June 2, 2021 19:46
@manicminer (Owner, Author)

Test results with these mitigations:

[Screenshot: test results, 2021-06-02 20:48]

@manicminer (Owner, Author) commented Jun 2, 2021

@romainDavaze @tsologub Do you have any thoughts on this approach? Is there a risk to replaying requests that I haven't realized?

In the end I figured it was probably better to enable this retry mechanism by default, but it can be disabled by setting client.BaseClient.DisableRetries = true.

It's unlikely the SDK can shield apps from every consistency issue (e.g. the List methods can still miss newly created objects), but hopefully most of the common delay-related errors can be handled.
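
As a usage note, disabling the retries looks something like the sketch below; the client constructor and tenant ID are illustrative assumptions, while the BaseClient.DisableRetries field is the one named above.

```go
package main

import "github.com/manicminer/hamilton/msgraph"

func main() {
	// Hypothetical setup: the constructor and tenant ID here are placeholders.
	// Retries are enabled by default; flipping DisableRetries to true restores
	// the previous fail-fast behaviour.
	client := msgraph.NewServicePrincipalsClient("00000000-0000-0000-0000-000000000000")
	client.BaseClient.DisableRetries = true

	// authentication and actual API calls omitted
}
```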

@romainDavaze

This seems to be a good way to implement the retry mechanism.

To me the only risk is that the end-user might be a bit thrown off by the delay of function calls in some cases. I think it might be wise to add a note somewhere in the documentation explaining this behavior, why we do this and how it can be turned off.

This will introduce more delay in SDK calls which is not ideal, but I'd rather have this than something quicker that has a chance to fail.

@tsologub (Contributor) commented Jun 3, 2021

I will gladly take a look on Monday (June 7); I'm currently on a short vacation.

@tsologub (Contributor) commented Jun 8, 2021

Great job! I like it, especially turning this feature on by default. 👍 Yeah, it can be slow, but a user can turn it off explicitly.

Btw, I found other places where retries could be useful as well, e.g. application.RemovePassword(), application.GetDeleted(), directoryRoleTemplates.Get(), and many more. Do you want to roll out the feature iteratively, and is that why you left them untouched on purpose?

return nil, status, nil, fmt.Errorf("reading request body: %v", err)
}
}

var attempts, backoff, multiplier int64
for attempts = 0; attempts < requestAttempts; attempts++ {
@tsologub (Contributor) commented Jun 8, 2021

It's slightly hard to tell how long the retries will be performed for.
Correct me if I am wrong:

  • we have 10 attempts.
  • every time we have a consistency error, the backoff is set to 2.
  • in between attempts, we sleep for 2 seconds.

i.e., at most, we are retrying for ~20 seconds, right?

Some time ago, I stumbled upon this video, where the author talks about 1-2 minutes for replication to propagate fully.

We are provisioning applications and service principals daily, and statistically I can say it is about 2 minutes, i.e. we stopped receiving errors when we increased retries to the 2-minute mark.

I am afraid that ~20 seconds won't be enough. Or, we can start with something (20 seconds) and see how it goes.

@manicminer (Owner, Author) replied:

I've been trying to strike a balance between performance and consistency - ultimately the API doesn't guarantee consistency, so you should always check defensively - and 20 seconds seems to smooth over replication most of the time, i.e. in the best conditions. But if you think the retries should be a bit more persistent, that's totally fine, we can extend it. I agree around 2 minutes is a reasonable milestone; anecdotally it's fairly rare for it to take longer than that IME. We should probably add some backoff in that case (but still be fairly aggressive).
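
For a sense of scale, a doubling backoff reaches the two-minute mark after only a handful of sleeps; the schedule below is purely illustrative arithmetic, not the implementation that was merged.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative doubling schedule: sleeps of 2s, 4s, 8s, 16s, 32s and 64s
	// between attempts add up to 126 seconds, i.e. just over two minutes.
	var total time.Duration
	backoff := 2 * time.Second
	for attempt := 1; attempt <= 6; attempt++ {
		total += backoff
		fmt.Printf("attempt %d: sleep %v (cumulative %v)\n", attempt, backoff, total)
		backoff *= 2
	}
}
```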

@manicminer (Owner, Author)

> Great job! I like it, especially turning this feature on by default. 👍 Yeah, it can be slow, but a user can turn it off explicitly.
>
> Btw, I found other places where retries could be useful as well, e.g. application.RemovePassword(), application.GetDeleted(), directoryRoleTemplates.Get(), and many more. Do you want to roll out the feature iteratively, and is that why you left them untouched on purpose?

Thanks! I noticed a few methods I missed, I'll add those 👍

- Retry in more methods
- Use exponential backoff for retrying due to consistency failure
- Capped at just over 2 minutes
@manicminer (Owner, Author)

Thanks all for taking the time to review 🙌

@manicminer manicminer merged commit b52ca40 into main Jun 8, 2021
@manicminer manicminer deleted the feature/request-retrying branch June 8, 2021 20:50
manicminer added a commit that referenced this pull request Jun 8, 2021
manicminer added a commit that referenced this pull request Jun 15, 2021
Labels: enhancement