Rate Limit spec #31

EnFinlay · 2022-06-03T16:08:50Z

EnFinlay
Jun 3, 2022
Maintainer

Many current Web APIs have rate limits, but the methods used to communicate about these rate limits differ between vendors. It would be helpful for Web API consumers for us to standardize how rate limits are communicated.

At a minimum I think server should tell clients:

total number of allowed requests
number of remaining requests
UTC time that remaining requests resets

The easiest way to share this information is with HTTP response headers, the names of which we should decide on.

Also, rate limited requests must illicit a HTTP 429 response from the server instead of 400, 401, 403, or 5XX.

darnjo · 2022-06-06T20:36:47Z

darnjo
Jun 6, 2022
Maintainer

Could we potentially use Retry-After?

0 replies

grispin · 2022-06-07T04:00:08Z

grispin
Jun 7, 2022

I agree we should clearly communicate when the rate limiting threshold has been reached where possible but the requirement to communicate the rate limiting rules needs more consideration. Most rate limiting solutions often use multiple metrics beyond just request counts for their thresholds and communicate the various combinations that are in play for each of the limiting vectors would be complex to express and hard to extract from some of the commercial solutions on the market today.

The rate limiting information should not be provided inline with each request as that adds extra overhead to every request. It can be significant for some use cases like if result set size is one of the rate metrics. If the count is needed for a header fields it must be known by completing the query transaction before the first record can be returned as streamed responses ( chunked encoding) can only start once all headers are written and the payload begins.

Some limits may not be available for payload response as they are implemented outside the RESO implementation by commercial WAF, DoS or similar providers which RESO implementers may choose to protect their solutions with. These solutions offer little in terms of message payload modification when they have been triggered. Even when not triggered the metrics they are measuring or not easily accessible on a per transaction basis to provide to end users.

There are many additional rate limit dimensions that should also be considered in this implementation beyond just request counting. Here is a quick list off the top of my head but is by no means exhaustive.

Malformed requests (excessive failed requests, attempts to run vulnerability scanners against the endpoint, etc.)
Number of Resource records returned (per request, per time period)
Sliding window limits vs hard windows. (x events in the last time window)
Per Resource Type limits
Multiple time period windows (shorter windows to minimize burst load upon counter reset, longer windows to enforce limits)
Concurrency request limits
Request Source location
Adjusted rates for Time of Day/Day of week/Day of Month (allow for larger thresholds during non-peak hours)
Type of transaction (Search, Update, etc.)
Number of attempts made after being informed of rate limiting.

Coming up with an implementation to communicate the above is much more involved, especially if it needs to be machine parse-able as vendors should be free to implement the rate limiting they feel is required to protect their data.

0 replies

bryanburgers · 2022-06-07T19:18:53Z

bryanburgers
Jun 7, 2022

From a client perspective, this spec is only valuable if the information provided is actionable.

In a good-faith situation

the server employs rate limiting to protect itself and keep itself running
the client has a vested interest in getting information from a running server, so if it can cooperate with the server, it will

So for a client, the very lowest bar is the 429 Too Many Requests status code. That's actionable information for the client: "OK, I need to take it easy. How long? The server doesn't tell me." And the client and server can cooperate by the client not sending requests for a small period of time.

Retry-After is a next level up. If a client receives a 429 and a Retry-After, the client can help keep the server running by not sending another request until the specified time. And it's advantageous for the client to comply because it doesn't waste resources by making another request too early.

What doesn't help is post-facto 429s. Consider a hypothetical situation where a client has four different computers responsible for 4 different tasks (fetch Property data, fetch OpenHouse data, fetch Agent data, fetch Office data). Each computer makes one request per hour. On one particular hour, they all happen to make their request in the same second at 08:17:09 and all four of those requests succeed. However, the server has some sliding window average limit, and it recognizes that between 08:00:00 and 09:00:00, the client's maximum concurrent requests per second was 4, which exceeds the limit of 3, so the server rate limits the client from 09:00:00 to 10:00:00.

In that situation, the server didn't provide actionable information to the client when it was important (08:17:09) because the server didn't communicate any sort of "hey slow down, I'm getting overwhelmed" information at the time when it needed to protect itself (a 429 on one of the requests seems reasonable). And no amount of spec work or informational headers that the server sends after the fact will help either the client or the server.

In a bad-faith situation, the client doesn't care about the spec and will do whatever it wants anyway, so it doesn't seem particularly valuable to consider that situation too much.

I don't think we need to worry too much about specifying what the server should return when protecting itself from excessive failed requests, attempts to run vulnerability scanners against the endpoint, etc.

0 replies

grispin · 2022-06-08T03:02:25Z

grispin
Jun 8, 2022

I really like the idea of giving clear error messaging to they can response with better understanding of the connection state and the retry-after behaviour should be implemented where possible. We also need to take into account that some of this functionality can be implemented by common commercial protection solutions and their responses will be limited to industry standard behaviours.

The goal of my response was to avoid the large implementation of trying to provide rate limiting threshold data in each request as there can be a significant overhead to providing the rate limit state inline with the regular transactions. These can also be complex rules and involve multiple layers of security and different vendors. The rate limits can also be protected as their provide insight in to system abuse prevention methods.

There should definitely be an agreement between clients and vendors about what defines "acceptable use" they have with each other but I don't think that we should be done by the transport level as there are external aspects to that agreement outside of just technical transport specification. Both parties need to operate in good faith for the relationship to be successful. If either side operates in bad faith or the behaviour exceeds the agreed upon terms then rate limiting occurs.

For example, If a client decides they want to build a new replica from a source system, they should expect they are going to need to throttle their requests to a reasonable level. The definition of "reasonable level" is going to subject to who they are and their relationship with each other.

0 replies

darnjo · 2022-07-12T11:59:57Z

darnjo
Jul 12, 2022
Maintainer

There's some information on what Okta does here: https://developer.okta.com/docs/reference/rl-best-practices/#check-your-rate-limits-with-okta-s-rate-limit-headers

They use a 429 with custom headers to represent the limit, remaining, and reset, as well as define error responses for the limit.

AWS also uses the 429 response, but allows for "usage plans" to be defined with more granular throttling. I'm not sure whether it shows up in the response or not.

2 replies

olmstd Jul 12, 2022

Here are some existing WebApi implementations that provide rate limiting information:

darnjo Jul 12, 2022
Maintainer

Thanks much for providing these links!

ASZimmermann · 2022-08-01T20:39:22Z

ASZimmermann
Aug 1, 2022

From a consumer perspective, the 429 and a retry-after implementation is the most likely to be useful in a programmatic way, but the custom headers do help in terms of troubleshooting or understanding how we are using our quotas.

0 replies

ThoughtsFromSLC · 2022-08-02T23:00:41Z

ThoughtsFromSLC
Aug 2, 2022

MLS Grid has the following base limits in place:

No more than 2 requests submitted per second.
No more than 7,200 requests submitted per hour.
No more than 4GB of data consumed per hour.
No more than 40,000 requests submitted per 24 hour period.

In cases where it may be necessary to exceed these limits we ask that the data consumer contact MLS Grid support so that a Grace Period can be put into place to allow them to exceed normal rate and data caps.

If activity becomes concerning an email is sent to the email address of the Primary Contact on the data consumer account. If the activity exceeds the limits in under an hour period the access token will be temporarily suspended and a shut-off message sent to the Primary email address. Messages regarding behavior will also appear on their MLS Grid timeline. If their API access has been suspended their client will receive an HTTP 429 error in response to any requests. This error will include details of the concerning behavior. When an access token has been suspended for concerning behavior the permissions for the token will be automatically reinstated once sufficient time has passed to decrease the number of requests submitted or the amount of data consumed to acceptable levels. We use a "leaky bucket" method for this. The more egregious the activity the longer it takes for access to be reinstated.

Information regarding our rate limits are available in our Best Practices Guide and on technical documentation page: https://docs.mlsgrid.com/#rate-limits

We do allow these limits to scale based upon the number of MLS a data consumer has access to. We ask that they email MLS Grid support and request a review of their activity so that we can adjust our limits according to prior usage logs.

2 replies

bryanburgers Aug 2, 2022

If somebody submits a third request in a 1s timeframe, do you respond to that third request with a 429?

ThoughtsFromSLC Aug 2, 2022

We allow the activity to continue for one hour, at the top of each our the system review activity and then issues a response. We notify if the activity in the prior hour was concerning - OR - issue a suspension if the activity from the prior hour warrants.

darnjo · 2022-09-13T00:37:47Z

darnjo
Sep 13, 2022
Maintainer

Quick follow up - see this comment: #22 (comment)

I think we can go ahead and include the 429 portion alone in the Web API Core 2.1.0 and Payloads 2.0 specs and require it as a MUST when the provider needs to rate limit the consumer? We don't have to test it at the moment but it'd probably be valuable to have a standard in place.

0 replies

amcelmon-clgx · 2024-04-23T19:52:21Z

amcelmon-clgx
Apr 23, 2024

I'd agree that 429 on rate limit fails should be a must. The broadcast mechanisms, should be optional, but standardized.

I personally, prefer showing it in the headers. On my platform, rate limiting / quota varies per user and changes regularly. Putting this in a resource isn't ideal for me.

<Header name="QuotaType">{QuotaType}</Header>
<Header name="Minute-Quota-Limit">{Minute-Quota-Limit}</Header>
<Header name="Hour-Quota-Limit">{Hour-Quota-Limit}</Header>
<Header name="Hour-Quota-Available">{Hour-Quota-Available}</Header>
<Header name="Minute-Quota-Available">{Minute-Quota-Available}</Header>
<Header name="Hour-Quota-ResetTime">{Hour-Quota-ResetTime}</Header>

1 reply

grispin Apr 24, 2024

We should make a the Retry-After a SHOULD in the spec so that it is best effort as well. Not every industry vendor supports it but we can try to be good publishers as well.

I prefer the rate limiting data to be outside of regular transaction path (separate resource or another endpoint) so that the clients that are actually use it can ask for it when they need it. We don't want to generate the rate limiting data when it's not going to be used or has minimal incremental value as it does take resources to generate on every single request. It's also hard to give an accurate response about the current state of rate limiting as some metrics like row count, bytes, etc. may not be known until the end of the response. For this reason and concurrent requests, the rate limiting data will always be eventually consistent as well.

The second reason to leave it out of the headers is that payload can be large. It would need to support multiple rate limiting metrics to accommodate all the solutions in the market today. HTTP headers are not a good place to put large, structured data. While there is no limit for response headers in the HTTP specification but 8K has been the generally accepted threshold for servers. The example above is ~400 bytes for a single rate limit metric.

There are lots of data points that a data provider may be using to contribute to their rate limiting algorithm. Here are the ones that a data provider may share with a client. There may be others that could be shared or kept hidden. Most commercial solutions keep their metrics hidden as the algorithm's secret sauce is part of their perceived value. Here are some of the big ones:

Data store query complexity (primary key, indexed queries vs table scans)
- Allow string searches in public remarks but it's costly.
Number of records
number of queries
number of logins
type of resource being queried
Bytes
Concurrent requests
Number of unique locations
Number of badly formatted or malformed formed queries
- How many HTTP 400 have you received
number of already open sessions
- Login, do one query, then abandon the session

The intervals for a particular metric should also be part of the communicated as part of the payload and not be defined by the standard as there are some systems with 15 minute interval and another that as a one day interval. They may also be different from one metrics to another.

Here is an example of a simple JSON structure that has only two metrics with 3 time intervals; a one minutes, 15 minute and one day. This illustration shows how this can quickly grow into a fairly large dataset.

[
  {
    "quotaType": "numberOfRecords"
    "thresholds": [
      {
	   "interval": 60
	   "limit": 100000000
	   "available": 4
           "resetTime": "2024-04-01 00:30:00.000"
	  },
      {
	   "interval": 900
	   "limit": 100000000
	   "available": 400
           "resetTime": "2024-04-01 00:45:00.000"
	  },
      {
	   "interval": 86400
	   "limit": 1000000000
	   "available": 4000
           "resetTime": "2024-04-01 23:30:00.000"
	  } 
    ]
  } 
  {
    "quotaType": "numberOfQueries"
    "thresholds": [
      {
	   "interval": 60
	   "limit": 1000
	   "available": 40
           "resetTime": "2024-04-01 00:30:00.000"
	  },
      {
	   "interval": 900
	   "limit": 100000
	   "available": 400
           "resetTime": "2024-04-01 00:45:00.000"
	  },
      {
	   "interval": 86400
	   "limit": 1000000
	   "available": 4000
           "resetTime": "2024-04-01 23:30:00.000"
	  } 
    ]
  } 
]

bryanburgers · 2024-04-24T12:37:33Z

bryanburgers
Apr 24, 2024

So it sounds like we have vendors who want to...

Send rate limit information with every request, presumably to help the client not get to the point where they are rate limited (link, link, link)
Send nothing in advance, but send a 429 and a Retry-After when a client has reached the rate limit (link)
Send nothing in advance, but send a 429 when a client has reached the rate limit (no link; the existence of link suggests there are some that don't use Retry-After)
Send nothing in advance, send nothing when the client has reached the rate limit, send 429s retroactively (link)

From what I've seen on this thread, heard in meetings, and talked through privately, all of the vendors have valid reasons for their positions. If we write a spec that encompasses all of these positions, our spec is basically

The server MAY return a 429 response when rate limits have been exceeded.

Is there even value in a spec that doesn't really specify anything?

0 replies

darnjo · 2024-04-24T15:48:09Z

darnjo
Apr 24, 2024
Maintainer

The current RESO requirements are that providers MUST return an HTTP 429 response code when the rate limit has been exceeded (which providers seem to be following consistently), and MAY return Retry-After (which I've seen at least one provider use so far).

These are both "reactive" in the sense that users can only respond after the fact and don't have sufficient information to avoid being rate limited ahead of time. Retry-After is a large improvement over 429 alone. That said, consumers should always be prepared to handle both (with some kind of "reasonable" default retry interval if it's not given when the 429 is encountered).

There are at least three providers currently advertising quota information in the headers in similar ways, which also cover a large number of markets. We've seen this in certification and it's documented in the comment above.

Standardization would provide some consistency among implementations and give new providers a template for how to advertise these values.

Given that it would be optional, those who can't or don't want to advertise this information could rely on 429 and Retry-After.

0 replies

ThoughtsFromSLC · 2024-04-24T16:06:29Z

ThoughtsFromSLC
Apr 24, 2024

If I had to choose one of Bryan's 4 options, it would be option number 3.

We currently return a 429 error and a link to the logs we keep for the data consumer for their review. The link is rarely used.

In addition we provide a dashboard in each data consumer account where they can monitor their activity for themselves, and highlight problematic activity in the logs provided. Our support team has to routinely remind data consumers that this tool exists for them.

We also email the primary contact on the account a warning as they approach the rate limit, and an email when they exceed it. Response tend to come several hours, or days after the email is initially sent.

I do not think requiring Retry-After is a solution. Many of the data providers are already trying to communicate rate limits through multiple means. Most of the data consumers are simply not hearing us.

1 reply

darnjo Apr 24, 2024
Maintainer

RESO requires 429s at the moment, and Retry-After is optional, but not widely used.

There are providers advertising quota and reset information now, so some standardization would be helpful here.

amcelmon-clgx · 2024-04-24T23:42:55Z

amcelmon-clgx
Apr 24, 2024

My struggle is that email sucks, websites suck, dashboards suck, getting called sucks unless it's Joe calling, but code rules all. :D I'll gladly pay 400bytes/129 compressed for a touchless solution.

2 replies

grispin Apr 25, 2024

When you get to cloud scale, many millions of transactions and multiple rate metrics, the effort to generate and transmit the reporting adds up. I would prefer the rate limiting data was upon-request to avoid spending resources needlessly. The data could be provided many different ways: a rate limit resource, a request header that turns the response on, another endpoint for the rate limiting, etc.

Having the client request the rate limiting data will also help vendors know which clients are using or at least asking for the data .

darnjo Apr 25, 2024
Maintainer

I understand your perspective, but we already have large, cloud-based providers using headers at the moment and this is a fairly common way of advertising quotas.

We're also trying to standardize what we have now, based on current, real-world use, rather than creating a new and different ways of doing things.

Having to make separate requests to a different resource also makes things more complex when most users are already getting this information from the headers now.

For those not planning on advertising this information, there is no impact since it's optional.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rate Limit spec #31

{{title}}

Replies: 13 comments 8 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Rate Limit spec #31

EnFinlay Jun 3, 2022 Maintainer

Replies: 13 comments · 8 replies

darnjo Jun 6, 2022 Maintainer

darnjo Jul 12, 2022 Maintainer

darnjo Jul 12, 2022 Maintainer

darnjo Sep 13, 2022 Maintainer

darnjo Apr 24, 2024 Maintainer

darnjo Apr 24, 2024 Maintainer

darnjo Apr 25, 2024 Maintainer

EnFinlay
Jun 3, 2022
Maintainer

Replies: 13 comments 8 replies

darnjo
Jun 6, 2022
Maintainer

darnjo
Jul 12, 2022
Maintainer

darnjo Jul 12, 2022
Maintainer

darnjo
Sep 13, 2022
Maintainer

darnjo
Apr 24, 2024
Maintainer

darnjo Apr 24, 2024
Maintainer

darnjo Apr 25, 2024
Maintainer