
Feature request: rate-limiting (requests/time) #260

Closed
georgiosd opened this issue Jun 16, 2017 · 66 comments · Fixed by #903

@georgiosd

Hi :)

I'm enjoying Polly very much, good work!

Would love something like this: http://www.jackleitch.net/2010/10/better-rate-limiting-with-dot-net/ - but something that works more reliably :)

I've used that RateGate to connect to APIs that ban you if you send more than x requests in y seconds, and it doesn't work too well - it leaks here and there.

@joelhulen
Member

@georgiosd We're glad you're enjoying Polly :)

Have you looked into using Polly to handle TResult?

There are typical HTTP results that are returned once you hit a rate limit or a server is busy (such as 429). Some API providers even return header values that tell you how long to wait before retrying. In fact, @reisenberger wrote a blog post last month on the subject. There's also much discussion around the topic here.

@georgiosd
Author

Actually, for the use cases I'm thinking about (cryptocurrency exchange APIs), hitting the limit would be bad - the goal is to poll data as quickly as possible, and if you get rate limited there is a penalty that could make me miss an important event.

Makes sense? :)

@joelhulen
Member

Ok, I understand now. I thought you were dealing with HTTP requests where you just have to back off for a bit. In your case, you can't afford to hit the limit. So do you know the rate limits up front? Can you say, with absolute certainty, "I can send x requests within a given n-minute window"? Also, is your application multi-threaded, with multiple instances potentially calling this API at once, requiring you to share your API hit count across instances?

@georgiosd
Author

georgiosd commented Jun 16, 2017

Correct. The limits are published (though often inaccurately).

It is multi-threaded because there are several kinds of updates that need to happen at once (trade updates, account updates). I usually have an event loop running in a Task fetching each one. So the rate gate would effectively throttle the event loops.

@reisenberger
Member

Hi @georgiosd , thanks for joining the Polly conversation!

I've looked at Jack Leitch's article in the past, as we've had discussion around a rate-limiting Policy in our slack channel. You mentioned "it leaks here and there": if you have any specifics on that (what you thought was causing the leak; or just the circumstances), it would be good to hear more, so that we can consider that in any Polly implementation.

One evident issue: any rate-limiter whose approach is to hold back 'hot' tasks - tasks which are already executing, in memory - is intrinsically vulnerable to memory bulges if fresh requests consistently outstrip the permitted rate, simply because those pending requests are all in memory. At least, in the absence of any co-operative demand control (back-pressure) or load-shedding. I've noted this in more detail in the Polly Roadmap. Is this the kind of thing you were thinking of with the 'leaks', or did you detect leaks even in relatively steady state? (i.e. without large numbers of pending requests backing up).

Thanks!

@georgiosd
Author

Hey @reisenberger, thanks for checking in.

I must say that this was a few months ago so my recollection is somewhat hazy - I remember that I was hitting the rate limits all the time, so I decided to print out the timestamps of the rate-limited requests, and some of them were out of whack.

i.e. let's say the fastest I can call the API is every 2s; the intervals between requests would be like:

2s
1.99s
2.01s
0.5s
2s

You get the idea.

I tried to figure out what could be causing it, but I couldn't understand what the code does well enough.

I also don't think your second point applies to my particular use case; it's much simpler than that. Basically there are 1-5 tasks/event loops that are running concurrently, making HTTP calls to the same API. I must be sure that a) none of them is allowed to execute faster than the permitted rate and b) there is some fairness in entering the critical section. Obvious, I guess, as you wouldn't want any of the event loops to "starve" (wait forever, or for a long time relative to the other event loops).

Makes sense? Happy to provide any more details.

@reisenberger
Member

Hi @georgiosd . Re:

I also don't think your second point applies to my particular use case
Basically there are 1-5 tasks/event loops that are running concurrently

With you there 👍 (suspected that might be the setup from your initial description, but good to have it confirmed in more detail).

Re:

tried to figure out what could be causing it

The only observation I can make is that, if those up-to-five separate event loops are hitting the same API and need (combined) to not exceed the rate limit, then they'd need (from my code reading) to share the same RateGate instance. (The same would also apply for the way I envisage we could implement this as a Polly RateLimit policy.)
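
For illustration, a minimal sketch of that sharing (assuming the RateGate constructor and WaitToProceed() from Jack Leitch's article; the one-call-per-2s figure and the fetch method are hypothetical):

// One gate shared by all event loops (sketch; values hypothetical).
// RateGate(occurrences, timeUnit) and WaitToProceed() are as per Jack Leitch's article.
private static readonly RateGate SharedGate = new RateGate(1, TimeSpan.FromSeconds(2));

// In each event loop, before each API call:
SharedGate.WaitToProceed();            // blocks until the shared gate grants a slot
var update = await FetchUpdateAsync(); // hypothetical API call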

Testing the multi-threaded case robustly is certainly something we should do if we implement a Polly RateLimit policy.

@georgiosd
Author

No problem! I forgot to mention that the "leak" is that 0.5s delay amongst the pool of 2s ones.

And yes, the RateGate was shared between event loops.

@reisenberger
Member

👍 @georgiosd Thanks for the clarification and feedback!

@georgiosd
Author

Hey @reisenberger - I resurrected the project that needs this so I was just wondering whether this is on the roadmap somewhere? :)

@reisenberger
Member

reisenberger commented Jul 31, 2017

Hi @georgiosd . This is on the roadmap but there isn't any allocated resource or timescale. For the core maintainers, some other large-ish items are ahead at mo (CachePolicy; unify sync/async policies; perf enhancements).

We'd love to see it implemented, however, if you (or anyone else) is interested in working on a PR. Shout if so, and we can provide guidance on how to structure an implementation as a Polly policy!

@georgiosd
Author

It's not my core strength (which is why I wasn't able to find what was wrong with the RateGate and fix it, in the first place) but if you can help out with the actual logic, I'd love to contribute.

@reisenberger
Member

reisenberger commented Aug 1, 2017

Understood @georgiosd . When I/somebody can squeeze an implementation out, it would be awesome to have you put it through its paces! (or any other contributions!). As prev. noted, other development priorities are (unfortunately) ahead of this feature at mo.

@georgiosd
Author

If you can give me some steps/implementation guidance, I can give it a try!

@reisenberger
Member

Hi @georgiosd . The best way to see the architecture of how a new Policy is implemented is to look at the shape of the files making up the NoOpPolicy (and its tests). NoOpPolicy is just an (intentionally) empty policy which does nothing, so it shows you the bare-bones structure you would start with ... for adding a new policy.

https://github.com/App-vNext/Polly/tree/master/src/Polly.Shared/NoOp

https://github.com/App-vNext/Polly/pull/214/files

@georgiosd
Author

georgiosd commented Aug 2, 2017 via email

@reisenberger
Member

Cross-ref #330

@Matthewsre

Just came over to request this feature and see it is already on the roadmap. Would love to see this implemented.

@phatcher

phatcher commented Aug 2, 2018

Just chipping in with a couple of thoughts...

Basically this looks to me like a classic producer/consumer pattern, i.e. we have requests arriving at some rate, and they can either be executed immediately or they need to be queued up because we are in danger of breaching the rate limit.

One open question is whether we regularise the flow (i.e. constant time between each request) or let requests flow as fast as possible until we get close to the limit.

These are the token bucket/leaky bucket concepts.

There's a C# implementation of this at TokenBucket which I think could be adapted to this purpose - it might be more than one policy depending on what behaviour you want, e.g. discard silently, discard but throw, etc.
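
To make the token-bucket idea concrete, here is a minimal hand-rolled sketch (illustrative only; this is not the TokenBucket library's API): tokens refill at a fixed rate up to a burst capacity, and each request spends one.

// Minimal token-bucket sketch (illustrative; not the TokenBucket library's API).
public class SimpleTokenBucket
{
    private readonly double _refillPerSecond;
    private readonly double _capacity;
    private double _tokens;
    private DateTime _lastRefill = DateTime.UtcNow;
    private readonly object _sync = new object();

    public SimpleTokenBucket(double refillPerSecond, double capacity)
    {
        _refillPerSecond = refillPerSecond;
        _capacity = capacity;
        _tokens = capacity; // start full, so an initial burst is permitted
    }

    public bool TryTake()
    {
        lock (_sync)
        {
            var now = DateTime.UtcNow;
            // Refill proportionally to elapsed time, capped at the burst capacity.
            _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill).TotalSeconds * _refillPerSecond);
            _lastRefill = now;

            if (_tokens < 1)
                return false; // caller decides: discard silently, throw, or queue

            _tokens -= 1;
            return true;
        }
    }
}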

@reisenberger
Member

x-ref #528

@rahulrai-in

Hi,
I would like to take this up and contribute; however, I am a little confused by the discussion up to this point. Is this issue still open, and what is the conclusion with respect to implementing this feature?

@reisenberger
Member

Hi @rahulrai-in . Thank you for your offer to take this up! This is still up-for-grabs.

I took a couple of hours to sketch some code ideas that were kicking around my head. I have pushed these to a branch: https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch

Please feel free to build on this. Is it useful? Is it missing any important angle?

I'll try to post some further notes shortly about the code in that ^ branch.

@reisenberger
Member

Some notes about https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch, as promised:

  • It's a token-bucket implementation.
  • There is only the async version of the *Engine.cs file, and only the async TResult version of the *Syntax.cs, but it should give an idea of how sync variants and void-returning syntax could be similarly done. Any of the other policies provide examples of how to complete the pattern. NoOpPolicy or BulkheadPolicy are easy examples to follow.
  • There is a lock-based and a lock-free implementation of IRateLimiter. Any readers have any views around this? I think we should include only one implementation in main Polly. We could possibly push the other one out to Polly.Contrib as an option.
  • We need unit tests. The implementations use Polly's SystemClock.DateTimeOffsetUtcNow(). So the tests can manipulate this abstracted clock, to avoid real-time delays. Like some of the cache tests do.
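
For example, a test could pin and advance the abstracted clock along these lines (a sketch; SystemClock.DateTimeOffsetUtcNow and SystemClock.Reset() are Polly's existing test hooks, the rest is illustrative):

var now = DateTimeOffset.UtcNow;
SystemClock.DateTimeOffsetUtcNow = () => now; // pin the abstracted clock

// ...execute through the rate-limit policy until the current token is spent...

now = now.AddSeconds(1); // advance virtual time instantly; no real delay

// ...assert that a further execution is now permitted...

SystemClock.Reset(); // restore the real clock after the test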

This feature is not a priority for my own projects, but I am happy to collaborate/guide/help as much as I can, for those who need it in their own projects, to implement. Thank you @rahulrai-in for your interest! Do you think this is a useful starting point to take forward? Or do you have other ideas / thoughts / questions, how it could be done? And: same questions to anyone else interested! Thanks.

@reisenberger
Member

A final piece of explanation, about the Func<TimeSpan, Context, TResult> retryAfterFactory in https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch:

The proposed syntax allows the user to (optionally) configure the policy with a Func<TimeSpan, Context, TResult> retryAfterFactory. This allows the user to specify - for the case when a call is rate-limited - how they would like the retry-after expressed.

  • If the retryAfterFactory isn't supplied, the policy will throw RateLimitRejectedException.
  • Obviously, the retryAfterFactory is irrelevant for non-generic policies. If it is a non-generic Async/RateLimitPolicy (not Async/RateLimitPolicy<TResult>), the user can't specify a return value expressing how the 'retry-after' should be communicated; the policy just has to throw RateLimitRejectedException.

To give an example, for a policy guarding calls returning HttpResponseMessage, a user could use the factory in an obvious way, like:

using System;
using System.Globalization;
using System.Net;
using System.Net.Http;
using Polly;

// Build a synthetic 429 response carrying a Retry-After header.
Func<TimeSpan, Context, HttpResponseMessage> retryAfterFactory = (span, context) =>
{
    var response = new HttpResponseMessage
    {
        StatusCode = (HttpStatusCode) 429,
        ReasonPhrase = "Too Many Requests",
        Content = new StringContent(string.Format(CultureInfo.InvariantCulture, "Rate limit reached. Retry in {0} seconds.", span.TotalSeconds))
    };

    response.Headers.Add("Retry-After", span.TotalSeconds.ToString(CultureInfo.InvariantCulture));

    return response;
};

The idea of this Func<TimeSpan, Context, TResult> retryAfterFactory is that it could be useful for both those imposing a rate-limit (as a service-provider wanting to rate-limit callers); and for those wanting to abide by a rate-limit they know a downstream system will impose on them (and perhaps they prefer/need to not breach the limit so they don't get blacklisted for a period). If you are wanting to abide by a rate-limit which you expect the called system to impose, it could be very convenient to build a rate-limit policy that limits you by providing responses in exactly the same style as the downstream system would. That way, you only have to build one WaitAndRetry(...) policy to respond to those "you've been rate-limited" signals.
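
As a sketch of that caller side (assuming the factory above and Polly's standard WaitAndRetry syntax; the 3-attempt count and the 1-second fallback are illustrative assumptions):

var waitAndRetry = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode == 429)
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: (attempt, outcome, context) =>
            // Honour the Retry-After produced by the retryAfterFactory (or a real server).
            outcome.Result?.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(1),
        onRetryAsync: (outcome, sleep, attempt, context) => Task.CompletedTask);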

@reisenberger
Member

Triage as previously promised: As mentioned elsewhere, I see rate-limiting as two distinct use-cases:

(a) being the rate-imposing party, for example a called system imposing a rate-limit on those who call (you want simply to reject executions exceeding the rate-limit)

(b) being the rate-conforming party, for example being a caller of a system imposing a rate-limit (rather than reject, you want to queue executions to abide by the rate-limit)

The implementation in the WIP spike is largely production-ready for (a) the rate-imposing use case; the (b) rate-conforming use case needs a new implementation.

For the (a) rate-imposing use case, the implementation currently normalises a configured rate of N tokens per time T, treating it as 1 token per time T/N. For example, 60 tokens per minute is treated as 1 token per second, which rejects short bursts that the configured rate would otherwise allow. I believe this normalisation should be removed.

(My current priorities are in other areas of Polly v8.)

@alastairtree

(b) being the rate-conforming party, for example being a caller of a system imposing a rate-limit (rather than reject, you want to queue executions to abide by the rate-limit)

These two libraries implement the leaky/token bucket and are useful for implementing the (b) use case, or as alternative packages to help folks who come here looking for one:

@madelson

madelson commented Mar 17, 2021

What is the status of this effort and is there consensus on the behavior? It's a feature we would use and I'd be interested in contributing this (with guidance) if there is interest/need.

Some general thoughts:

  • In the cases I'm working with, my goal is to avoid overloading the system that is behind the rate limit gate when there is a spike in load. Most of the time, I'd be happy to just start dropping requests that fail to pass the gate (rate-imposing), but in some cases I could see wanting to queue requests instead (rate-conforming). I don't think I'd ever want to queue requests without some kind of bound on the queue length, to avoid the load spike turning into a local spike in memory.
  • There are lots of rate limiting algorithms out there; I'm a fan of the "sliding window" approach described on https://konghq.com/blog/how-to-design-a-scalable-rate-limiting-algorithm/ because this requires constant memory regardless of what parameters are selected while at the same time providing a smooth requests/time curve.
  • I've also seen cases where you want to combine 2 rate limiting policies, for example "no more than 10 requests/second, but also no more than 100 requests/minute".
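
For that compound case, a sketch using the Policy.RateLimitAsync syntax eventually merged in #903, composed with PolicyWrap (the figures are the example limits above):

// An execution must pass both gates; the outer (per-minute) limit is checked first.
var perSecond = Policy.RateLimitAsync(10, TimeSpan.FromSeconds(1));
var perMinute = Policy.RateLimitAsync(100, TimeSpan.FromMinutes(1));
var combined = Policy.WrapAsync(perMinute, perSecond);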

@martincostello
Member

There's a draft PR open here: #666

I ported it to various production code bases from source (#666 (comment)) and it's been happily running for about 8 months for the use case we needed it for.

I'd suggest taking a look at the draft and seeing if it would work for your use case(s), and if not add feedback to the draft.

@madelson

madelson commented Mar 17, 2021

Thanks @martincostello I've commented on the PR.

@SeanFarrow
Contributor

SeanFarrow commented Mar 17, 2021 via email

@martincostello
Member

martincostello commented Mar 17, 2021

@SeanFarrow We use it to throttle per-user requests to particular code paths, such as "expensive" queries, or to prevent spam from "abusive" client application requests.

@madelson

madelson commented Mar 18, 2021

@SeanFarrow if it is helpful we are using this technique in ad-hoc ways (but not currently through Polly) in a couple of scenarios:

  • We have an error-reporting service, but when something goes really wrong (e.g. network problems, certain code bugs), it causes a huge spike in the number of errors being reported, which itself can cause more problems. We use a compound rate-limiting policy (X errors/sec of each type, Y errors/sec across all types) to prevent this. The idea is that it is generally fine to skip some error reports in such scenarios.
  • We have another system that executes user-provided scripts and saves off their console logs for later review. A poorly-written script can exhaust our storage if it spams logs. We rate-limit the logged output to prevent this problem while still preserving a sampling of logs from across the script run (this was found to be more useful than just saving off the first N log entries). This also uses a compound policy (X logs/sec, Y logs/min) to allow for bursts of logs (often at the start/end of a script) while maintaining a more conservative limit overall.

@IanKemp

IanKemp commented Aug 6, 2021

For anyone wondering when this is likely to be released, just saw on the PR: #666 (comment) (4 days ago at time of typing this):

This feature is currently on-hold for an indefinite period due to the author's personal commitments making them unable to further pursue it.

So it hasn't been forgotten, life has just got in the way. Hope everything is alright with @reisenberger.

@IanKemp

IanKemp commented Aug 12, 2021

I require this functionality, so I pulled the branch down, built it into a NuGet package and started testing it. And I've encountered some weirdness. Here's my code:

Startup.cs ConfigureServices:

services
    .AddHttpClient<IMyClient, MyClient>(c => c.BaseAddress = new Uri("https://my-service.my.domain/"))
    .AddPolicyHandler(Polly.RateLimit.Policy.RateLimitAsync<HttpResponseMessage>(
        60, TimeSpan.FromMinutes(1)
    ));

Client:

public interface IMyClient
{
    // 'async' is not valid on an interface method declaration; the implementation decides.
    Task<SomeResponseModel> CallHttpEndpointAsync();
}

public class MyClient : IMyClient
{
    private readonly HttpClient _httpClient;

    public MyClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<SomeResponseModel> CallHttpEndpointAsync()
    {
        var response = await _httpClient.GetAsync("some/endpoint");
        var result = response.IsSuccessStatusCode ? await response.Content.ReadAsAsync<SomeResponseModel>() : default;

        return result;
    }
}

Test:

var duration = TimeSpan.FromMinutes(1); // same as in Startup
var requests = new Func<Task<bool>>[600];

for (var i = 0; i < requests.Length; i ++)
{
    requests[i] = async () =>
    {
        try
        {
            await _iMyClientInstance.CallHttpEndpointAsync();

            // if the call was made, we weren't rate-limited
            return true;
        }
        catch (RateLimitRejectedException)
        {
            // if the call threw this specific exception, we were rate-limited
            return false;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, ex.Message);

            // non-rate-limited exceptions aren't something we can deal with
            throw;
        }
    };
}

var durationSeconds = (int)duration.TotalSeconds;
var perSecond = requests.Length / durationSeconds;
var requestsBySeconds = requests.Chunk(perSecond).ToArray();
var iterationTimer = new Stopwatch();
bool[] perSecondSuccesses;
var perSecondSuccessCounts = new int[durationSeconds];
var oneSecond = TimeSpan.FromSeconds(1);

var totalTimer = Stopwatch.StartNew();

for (var i = 0; i < durationSeconds; i ++)
{
    iterationTimer.Restart();

    perSecondSuccesses = await Task.WhenAll(requestsBySeconds[i].Select(async req => await req()));
    perSecondSuccessCounts[i] = perSecondSuccesses.Count(success => success);

    iterationTimer.Stop();

    var waitDuration = oneSecond - iterationTimer.Elapsed;
    if (waitDuration.TotalMilliseconds > 0)
    {
        await Task.Delay(waitDuration);
    }
}

totalTimer.Stop();

var totalRequestsPerDuration = (double)perSecondSuccessCounts.Sum();
// the total loop iteration time will slightly exceed the requested duration due to overhead, so normalise back
var actualRequestsPerDuration = totalRequestsPerDuration
    / totalTimer.Elapsed.Ticks
    * duration.Ticks;

Essentially I've defined a typed HttpClient with a rate limit of 60 requests per minute, which I'm testing by creating a list of 600 Task delegates encapsulating a call to said HttpClient. I then split them into 60 buckets of 10 requests each and execute all tasks in each of these buckets every second (as opposed to trying to fire off 600 requests instantly). For every bucket, I keep track of how many of those requests were issued versus how many were not (the latter being blocked by the rate-limiting code in the linked PR) and sum up the actual executions after the loop has completed. My expectation is thus that of the 600 requests, at maximum 60 will actually be issued.

The problem is that this appears to be rate-limiting far more aggressively than it should. I'd expect totalRequestsPerDuration to be in the 50s, but in actual operation it never exceeds 30. I'm almost certain that the issue is with my testing code not the Polly code - can anyone give me a suggestion as to what's going wrong here?

Note that the Task.WhenAll wrapping the HTTP calls is (to my mind) not the issue - the loop iteration timer never exceeds 1 second.

@YarekTyshchenko YarekTyshchenko mentioned this issue Oct 15, 2021
@martincostello martincostello mentioned this issue Dec 2, 2021
@martincostello martincostello linked a pull request Dec 3, 2021 that will close this issue
@martincostello martincostello added this to the v7.3.0 milestone Dec 3, 2021
@martincostello
Member

The PR to implement this feature has moved to #903 - if you have any feedback on the proposed implementation, please direct any comments there.

@Misiu

Misiu commented Jan 17, 2022

@martincostello this can be closed. #903 is merged.

@martincostello
Member

I know, I merged it 😄

I was intentionally leaving it open until the next release is available from NuGet.org.

@martincostello
Member

Looks like it went up about 5 minutes ago as part of v7.2.3 🚀 (thanks @joelhulen).

@joelhulen
Member

No, thank YOU, @martincostello!

@madhugilla

Hey, can this work in a scaled-out scenario? Right now we have a single server which takes care of calling the external services using a singleton rate limiter, and we are running into perf issues.

@martincostello
Member

@madhugilla Please create a new issue with more details of what you're experiencing and we can look into it.

@StevenWang-Craft

[Quoting @IanKemp's comment of Aug 12, 2021 above in full.]

When debugging and inspecting the internal RateLimitPolicy, I noticed something called Interval. I guess it means TimeSpan/numberOfExecutions. Say you set 3 executions per second: the interval is 333.33ms, and the policy uses that evenly distributed time slot to determine whether the rate is exceeded.

In your sample code, await Task.WhenAll(requestsBySeconds[i].Select(async req => await req())) triggers all the async methods in a short period of time, and some of them may exhaust the single token available per slot.
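
If that per-slot behaviour is too strict for bursty callers, the API as merged also accepts a burst size, e.g. (a sketch; the figures are illustrative):

var rateLimit = Policy.RateLimitAsync<HttpResponseMessage>(
    60,                       // numberOfExecutions
    TimeSpan.FromMinutes(1),  // perTimeSpan
    10);                      // maxBurst: tokens allowed to accumulate for bursts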

@App-vNext App-vNext locked and limited conversation to collaborators Mar 1, 2022