Replies: 13 comments 8 replies
-
Could we potentially use Retry-After?
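For illustration, a minimal stdlib-only sketch of what that could look like on the server side (the limit check is a stub; a real implementation would consult whatever limiter the provider runs):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.over_limit():
            self.send_response(429)
            # Retry-After in seconds; an HTTP-date is also valid.
            self.send_header("Retry-After", "60")
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"{}")

    def over_limit(self) -> bool:
        return True  # stub for illustration only

HTTPServer(("", 8080), Handler).serve_forever()
```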
-
I agree we should clearly communicate when the rate limiting threshold has been reached where possible, but the requirement to communicate the rate limiting rules themselves needs more consideration. Most rate limiting solutions use multiple metrics beyond just request counts for their thresholds, and communicating the various combinations in play for each of the limiting vectors would be complex to express and hard to extract from some of the commercial solutions on the market today.

Rate limiting information should not be provided inline with each request, as that adds extra overhead to every request. The overhead can be significant for some use cases, for example when result set size is one of the rate metrics: if the count is needed for a header field, it must be known before the first record can be returned, which means completing the query transaction up front, since streamed responses (chunked encoding) can only start once all headers are written and the payload begins.

Some limits may not be available for the response payload at all, because they are implemented outside the RESO implementation by commercial WAF, DoS, or similar providers that RESO implementers may choose to protect their solutions with. These solutions offer little in terms of message payload modification when they have been triggered, and even when not triggered, the metrics they measure are not easily accessible on a per-transaction basis to provide to end users.

There are many additional rate limit dimensions beyond just request counting that should also be considered in this implementation. Here is a quick list off the top of my head, but it is by no means exhaustive.
Coming up with an implementation to communicate the above is much more involved, especially if it needs to be machine-parseable, as vendors should be free to implement the rate limiting they feel is required to protect their data.
-
From a client perspective, this spec is only valuable if the information provided is actionable. In a good-faith situation, the client wants to stay within the server's limits and will act on whatever signal the server gives it. So for a client, the very lowest bar is a 429 returned at the moment the server actually enforces its limit.
What doesn't help is post-facto 429s. Consider a hypothetical situation where a client has four different computers responsible for four different tasks (fetch Property data, fetch OpenHouse data, fetch Agent data, fetch Office data). Each computer makes one request per hour. On one particular hour, they all happen to make their request in the same second, at 08:17:09, and all four of those requests succeed. However, the server has some sliding window average limit, and it recognizes that between 08:00:00 and 09:00:00 the client's maximum concurrent requests per second was 4, which exceeds the limit of 3, so the server rate limits the client from 09:00:00 to 10:00:00. In that situation, the server didn't provide actionable information to the client when it was important (08:17:09), because it didn't communicate any sort of "hey, slow down, I'm getting overwhelmed" signal at the time it needed to protect itself (a 429 on one of the requests seems reasonable). And no amount of spec work or informational headers that the server sends after the fact will help either the client or the server.

In a bad-faith situation, the client doesn't care about the spec and will do whatever it wants anyway, so it doesn't seem particularly valuable to consider that situation too much. I don't think we need to worry too much about specifying what the server should return when protecting itself from excessive failed requests, attempts to run vulnerability scanners against the endpoint, etc.
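For illustration, a minimal sliding-window sketch (all names hypothetical) of the kind of in-the-moment check described above, which would reject the fourth request at 08:17:09 instead of punishing the client an hour later:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Rejects a request the moment the per-second limit is
    exceeded, rather than penalizing the client after the fact."""

    def __init__(self, limit_per_second=3):
        self.limit = limit_per_second
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that fall outside the one-second window.
        while self.timestamps and now - self.timestamps[0] >= 1.0:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False  # caller should send the 429 right now
        self.timestamps.append(now)
        return True
```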
-
I really like the idea of giving clear error messaging so clients can respond with a better understanding of the connection state. The goal of my response was to avoid the large implementation effort of trying to provide rate limiting threshold data in each request, as there can be significant overhead to providing the rate limit state inline with regular transactions. These can also be complex rules involving multiple layers of security and different vendors, and the rate limits themselves can be protected information, as they provide insight into system abuse prevention methods.

There should definitely be an agreement between clients and vendors about what defines "acceptable use" with each other, but I don't think that should be done at the transport level, as there are aspects of that agreement outside the technical transport specification. Both parties need to operate in good faith for the relationship to be successful. If either side operates in bad faith, or the behaviour exceeds the agreed-upon terms, then rate limiting occurs. For example, if a client decides they want to build a new replica from a source system, they should expect to need to throttle their requests to a reasonable level. The definition of "reasonable level" is going to be subject to who they are and their relationship with each other.
-
There's some information on what Okta does here: https://developer.okta.com/docs/reference/rl-best-practices/#check-your-rate-limits-with-okta-s-rate-limit-headers. They use a 429 with custom headers to represent the limit, remaining, and reset, as well as define error responses for the limit. AWS also uses the 429 response, but allows for "usage plans" to be defined with more granular throttling. I'm not sure whether it shows up in the response or not.
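As a sketch, this is roughly how a client could read Okta-style headers (names as documented at the Okta link above; other vendors use different names, which is exactly the inconsistency this thread is about):

```python
import time

def quota_status(headers):
    # X-Rate-Limit-Reset is a Unix timestamp per Okta's docs.
    limit = int(headers.get("X-Rate-Limit-Limit", 0))
    remaining = int(headers.get("X-Rate-Limit-Remaining", 0))
    reset_at = int(headers.get("X-Rate-Limit-Reset", 0))
    wait = max(0, reset_at - int(time.time()))
    return f"{remaining}/{limit} requests left; resets in {wait}s"
```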
-
From a consumer perspective, a 429 plus Retry-After implementation is the most likely to be useful in a programmatic way, but the custom headers do help in terms of troubleshooting or understanding how we are using our quotas.
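As a sketch of that programmatic use (stdlib only; the exponential fallback is an assumption, since Retry-After can also arrive as an HTTP-date):

```python
import time
import urllib.request
from urllib.error import HTTPError

def fetch_with_backoff(url, max_attempts=5):
    """Honor Retry-After on 429 instead of hammering the server."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except HTTPError as err:
            if err.code != 429:
                raise
            header = err.headers.get("Retry-After", "")
            # Fall back to an exponential delay if the header is
            # missing or in HTTP-date form.
            delay = int(header) if header.isdigit() else 2 ** attempt
            time.sleep(delay)
    raise RuntimeError("rate limited on every attempt")
```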
-
MLS Grid has the following base limits in place:
In cases where it may be necessary to exceed these limits, we ask that the data consumer contact MLS Grid support so that a Grace Period can be put into place to allow them to exceed normal rate and data caps.

If activity becomes concerning, an email is sent to the email address of the Primary Contact on the data consumer account. If the activity exceeds the limits within an hour period, the access token will be temporarily suspended and a shut-off message sent to the Primary email address. Messages regarding behavior will also appear on their MLS Grid timeline. If their API access has been suspended, their client will receive an HTTP 429 error in response to any requests. This error will include details of the concerning behavior.

When an access token has been suspended for concerning behavior, the permissions for the token will be automatically reinstated once sufficient time has passed to decrease the number of requests submitted, or the amount of data consumed, to acceptable levels. We use a "leaky bucket" method for this: the more egregious the activity, the longer it takes for access to be reinstated.

Information regarding our rate limits is available in our Best Practices Guide and on our technical documentation page: https://docs.mlsgrid.com/#rate-limits

We do allow these limits to scale based upon the number of MLSs a data consumer has access to. We ask that they email MLS Grid support and request a review of their activity so that we can adjust our limits according to prior usage logs.
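For readers unfamiliar with the term, a generic leaky-bucket sketch (parameters hypothetical, not MLS Grid's actual values):

```python
import time

class LeakyBucket:
    """The bucket drains at a fixed rate and each request adds to it.
    Once full, requests are refused until enough has drained away, so
    heavier abuse takes longer to clear."""

    def __init__(self, capacity, drain_per_second):
        self.capacity = capacity
        self.drain = drain_per_second
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        self.level = max(0.0, self.level - (now - self.last) * self.drain)
        self.last = now
        if self.level + cost > self.capacity:
            return False  # still over the limit; respond with 429
        self.level += cost
        return True
```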
-
Quick follow-up: see this comment: #22 (comment). I think we can go ahead and include the 429 portion alone in the Web API Core 2.1.0 and Payloads 2.0 specs and require it as a MUST when the provider needs to rate limit the consumer? We don't have to test it at the moment, but it'd probably be valuable to have a standard in place.
-
I'd agree that 429 on rate limit failures should be a MUST. The broadcast mechanisms should be optional, but standardized. I personally prefer showing it in the headers. On my platform, rate limiting / quota varies per user and changes regularly. Putting this in a resource isn't ideal for me.
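A minimal sketch of per-user quota headers (the header names are borrowed from the IETF RateLimit-headers draft and the quota store is hypothetical; which names RESO would standardize is the open question here):

```python
# Hypothetical per-user quota store; quotas vary per user and can
# change at any time, which is why headers beat a static resource.
QUOTAS = {"user-a": 7200, "user-b": 500}

def rate_limit_headers(user, used, reset_epoch):
    limit = QUOTAS.get(user, 100)
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, limit - used)),
        "RateLimit-Reset": str(reset_epoch),
    }
```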
-
So it sounds like we have vendors who want to...
From what I've seen on this thread, heard in meetings, and talked through privately, all of the vendors have valid reasons for their positions. If we write a spec that encompasses all of these positions, our spec is basically
Is there even value in a spec that doesn't really specify anything?
-
The current RESO requirements are that providers MUST return an HTTP 429 response code when the rate limit has been exceeded (which providers seem to be following consistently), and MAY return Retry-After. These are both "reactive" in the sense that users can only respond after the fact and don't have sufficient information to avoid being rate limited ahead of time.

There are at least three providers currently advertising quota information in the headers in similar ways, which also cover a large number of markets. We've seen this in certification and it's documented in the comment above. Standardization would provide some consistency among implementations and give new providers a template for how to advertise these values. Given that it would be optional, those who can't or don't want to advertise this information could rely on 429 and Retry-After.
-
If I had to choose one of Bryan's 4 options, it would be option number 3.

We currently return a 429 error and a link to the logs we keep for the data consumer for their review. The link is rarely used. In addition, we provide a dashboard in each data consumer account where they can monitor their activity for themselves, and we highlight problematic activity in the logs provided. Our support team has to routinely remind data consumers that this tool exists for them. We also email the primary contact on the account a warning as they approach the rate limit, and an email when they exceed it. Responses tend to come several hours, or days, after the email is initially sent.

I do not think requiring Retry-After is a solution. Many of the data providers are already trying to communicate rate limits through multiple means. Most of the data consumers are simply not hearing us.
-
My struggle is that email sucks, websites suck, dashboards suck, and getting called sucks unless it's Joe calling, but code rules all. :D I'll gladly pay 400 bytes (129 compressed) for a touchless solution.
-
Many current Web APIs have rate limits, but the methods used to communicate about these rate limits differ between vendors. It would be helpful for Web API consumers for us to standardize how rate limits are communicated.
At a minimum, I think servers should tell clients:
The easiest way to share this information is with HTTP response headers, the names of which we should decide on.
Also, rate limited requests must elicit an HTTP 429 response from the server instead of 400, 401, 403, or 5XX.
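A short sketch of why the distinct status code matters to client code: a consumer can only branch correctly if rate limiting is unambiguously a 429 (the handling strings are illustrative only):

```python
from urllib.error import HTTPError

def classify(err: HTTPError) -> str:
    # Only 429 means "slow down"; the other statuses signal
    # different problems and shouldn't trigger backoff logic.
    if err.code == 429:
        return "rate-limited: honor Retry-After and back off"
    if err.code in (400, 401, 403):
        return "client or auth error: retrying won't help"
    if 500 <= err.code < 600:
        return "server error: retry cautiously"
    return "unexpected status"
```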