Implementation for retrying requests #57
Conversation
manicminer commented Jun 2, 2021 (edited)
- Attempt to work around eventual consistency issues, at the cost of some performance and potentially unnecessary requests.
- Request input structs can now contain a ConsistencyFailureFunc, which determines whether a request should be retried on failure.
- ConsistencyFailureFuncs receive a copy of the response and the parsed OData with which to make such decisions.
- A canned RetryOn404ConsistencyFailureFunc simply retries requests that receive a 404 response.
- This mitigates some eventual consistency issues, such as manipulating or referencing a newly created object, but doesn't help when listing objects.
- There's an example of a custom func in the ServicePrincipalsClient{}.Create() method, which indicates a retry when the referenced application isn't found.
@romainDavaze @tsologub Do you have any thoughts on this approach? Is there a risk to replaying requests that I haven't realized? In the end I figured it was probably better to enable this retry mechanism by default, but it can be disabled explicitly. It's unlikely the SDK can shield apps from all consistency issues (e.g. the List methods can often be missing newly created objects), but hopefully most of the common delay-related errors can be handled.
This seems to be a good way to implement the retry mechanism. To me, the only risk is that the end user might be a bit thrown off by the delay of function calls in some cases. I think it would be wise to add a note somewhere in the documentation explaining this behavior: why we do this and how it can be turned off. This will introduce more delay in SDK calls, which is not ideal, but I'd rather have this than something quicker that has a chance to fail.

I will gladly take a look on Monday (June 7). Currently on a short vacation.

Great job! I like it, especially turning this feature on by default. 👍 Yeah, it can be slow, but a user can turn it off explicitly. Btw, I found other places where the retries can be useful as well, e.g.
msgraph/client.go (outdated)

```go
		return nil, status, nil, fmt.Errorf("reading request body: %v", err)
	}
}

var attempts, backoff, multiplier int64
for attempts = 0; attempts < requestAttempts; attempts++ {
```
It is slightly hard to understand for how long the retries will be performed. Correct me if I am wrong:

- we have 10 attempts.
- every time we have a consistency error, the backoff is set to 2.
- in between attempts, we sleep for 2 seconds.

i.e., at most, we are retrying for ~20 seconds, right?

Some time ago, I stumbled upon this video, where the author talks about 1-2 minutes for replication to propagate fully. We are provisioning applications and service principals daily, and I can say it is statistically about 2 minutes, i.e. we stopped receiving errors when we increased retries to the 2-minute mark.

I am afraid that ~20 seconds won't be enough. Or, we can start with something (20 seconds) and see how it goes.
I've been trying to strike a balance between performance and consistency. Ultimately the API doesn't guarantee consistency, so you should always check defensively, and 20 seconds seems to smooth over replication most of the time, i.e. in the best conditions. But if you think the retries should be a bit more persistent, that's totally fine; we can extend it. I agree around 2 minutes is a reasonable milestone; anecdotally it's fairly rare for it to be longer than that, IME. We should probably add some backoff in that case (but still be fairly aggressive).
Thanks! I noticed a few methods I missed, I'll add those 👍
- Retry in more methods
- Use exponential backoff for retrying due to consistency failure
- Capped at just over 2 minutes
Thanks all for the time to review 🙌