Recently, a client asked me how to properly rate limit their REST APIs on AWS API Gateway, as they could not quite wrap their head around it. If you are currently using REST APIs on AWS and find yourself struggling with the limited documentation on throttling, look no further! In this post I will explain the ins and outs of API Gateway throttling.

How does AWS API Gateway throttling work?

Before we look at the different ways we can rate limit requests, we need to understand how throttling works in theory. AWS throttles requests using the so-called token bucket algorithm. In this algorithm, we have a bucket that is filled with tokens. Every token represents 1 API Gateway request. The number of tokens that our bucket can contain is what we call the burst limit. The rate at which our bucket gets refilled with tokens is called the rate limit.

A visual explanation of the token bucket algorithm
Let’s say that we set the rate limit to 3 requests per second and the burst limit to 9 requests. That means that every second, 3 new tokens are added to the bucket, up to the maximum burst limit of 9. As long as our request rate does not exceed our token refill rate, everything is fine. However, what if we send 4 requests per second? At this point, our bucket starts to deplete. Counting in whole seconds, the first second costs 4 tokens (a full bucket cannot take the refill), and every second after that costs a net 1 token. After 6 seconds, all the tokens in the bucket have been depleted, and in the next second we are hit with a 429 Too Many Requests error. The token bucket algorithm is used to manage throttling in many other AWS services, such as ECS and Kinesis.
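To make the example above concrete, here is a minimal simulation of it. This is a toy model that works in whole seconds and refills before serving requests; it is an assumption for illustration, not AWS's actual implementation. The names TokenBucket and first_throttled_second are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    """Toy token bucket that advances in one-second steps."""

    rate: int         # tokens added per second (the rate limit)
    burst: int        # maximum tokens the bucket can hold (the burst limit)
    tokens: int = -1  # -1 means "start with a full bucket"

    def __post_init__(self) -> None:
        if self.tokens < 0:
            self.tokens = self.burst

    def tick(self, requests: int) -> int:
        """Simulate one second: refill first, then serve requests.

        Returns the number of requests that would be throttled (429).
        """
        self.tokens = min(self.burst, self.tokens + self.rate)
        served = min(requests, self.tokens)
        self.tokens -= served
        return requests - served


def first_throttled_second(
    rate: int, burst: int, requests_per_second: int, max_seconds: int = 1000
) -> Optional[int]:
    """Return the first second in which a request is throttled, or None."""
    bucket = TokenBucket(rate=rate, burst=burst)
    for second in range(1, max_seconds + 1):
        if bucket.tick(requests_per_second) > 0:
            return second
    return None


if __name__ == "__main__":
    # Rate limit 3, burst limit 9, 4 requests per second.
    print(first_throttled_second(rate=3, burst=9, requests_per_second=4))
```

In this model the bucket is empty at the end of the sixth second, and the first throttled request shows up in the seventh. If the request rate stays at or below the refill rate, the function returns None.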

Gateway Throttling

Now that we have looked at the underlying throttling system, we are ready to look at the different levels at which we can throttle. AWS distinguishes between roughly 4 different rate limits:

Regional.
Account-level.
Stage-level (also known as the overall rate per-method).
Usage plan-level (also known as the overall rate per-client).

AWS will evaluate every rate limit and block a request if any one of them is exceeded. This effectively means that AWS applies throttling based on the narrowest throttling definition. In other words, usage plan-level throttling takes precedence over stage-level throttling, stage-level throttling takes precedence over account-level throttling, and so forth.
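The "block if any limit is exceeded" rule can be sketched as follows. The level names, the order of the checks, and the idea of spending one token per level are assumptions made for this illustration; AWS does not publish how it implements this internally.

```python
from typing import Dict, Optional

# Checked from the narrowest definition to the broadest (illustrative order).
LEVELS = ["usage_plan", "stage", "account"]


def first_exceeded_limit(tokens_left: Dict[str, int]) -> Optional[str]:
    """Return the first level that has no tokens left, or None."""
    for level in LEVELS:
        if level in tokens_left and tokens_left[level] <= 0:
            return level
    return None


def try_request(tokens_left: Dict[str, int]) -> bool:
    """Allow the request only if every level still has a token.

    On success, one token is spent at every level. On failure nothing
    is spent and the caller would see a 429.
    """
    if first_exceeded_limit(tokens_left) is not None:
        return False
    for level in tokens_left:
        tokens_left[level] -= 1
    return True


if __name__ == "__main__":
    tokens = {"usage_plan": 2, "stage": 5, "account": 100}
    print([try_request(tokens) for _ in range(3)])
    print(first_exceeded_limit(tokens))
```

With these numbers, the third request is rejected by the usage plan even though the stage and account buckets still have plenty of tokens left.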

Regional throttling

The highest rate limit is the regional rate limit. AWS does not provide much information on this topic, even in their security whitepaper on API Gateway. The only thing we know is that regional throttling limits are set across all accounts and clients in a region. As we cannot modify this limit, let’s move down one level to the account limit.

Account-level throttling

At the account level, 1 bucket of tokens is shared across all APIs that you enable per region. The default rate limit is 10,000 requests per second, and the default burst limit is 5,000 requests. It is possible to increase these limits, provided that they do not exceed AWS’s theoretical regional limits. Account-level throttling is enabled by default. Note that if we do not configure any other throttling settings, a single user can exhaust our API Gateway quota! An attacker could, for example, effectively shut down the entire API Gateway environment in your account by DDoSing a health check endpoint.

Stage-level throttling

At the stage level, 1 bucket is created for every method in an API stage. Furthermore, method throttling is enabled by default when you deploy an API in API Gateway. While this sounds good in theory, the default limit is equal to your account-level limits. Consequently, if you don’t change these default settings, a single method can still exhaust your entire account limit as described above. In addition, because these per-method buckets are shared between different users, one user can prevent other users from accessing specific endpoints if their sustained use is higher than the token refresh rate of the bucket. As a result, if you do decide to use stage-level throttling, be aware of these two caveats!
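If you want to lower the stage-level defaults from code, a sketch with boto3 could look like the one below. It builds the patch operations for the API Gateway update_stage call. The helper names are made up for this post, and the "~1" escaping of slashes in a resource path reflects my reading of the patch path format, so treat it as an assumption and verify it against your own API. The client argument is expected to be boto3.client("apigateway").

```python
from typing import Any, Dict, List


def stage_throttle_patch(
    rate_limit: float,
    burst_limit: int,
    resource_path: str = "*",
    http_method: str = "*",
) -> List[Dict[str, str]]:
    """Build patchOperations that set throttling on a stage.

    The default "/*/*" prefix targets every method in the stage.
    """
    if resource_path == "*":
        encoded = "*"
    else:
        # Assumption: "/" inside the resource path is escaped as "~1".
        encoded = resource_path.replace("/", "~1")
    prefix = f"/{encoded}/{http_method}/throttling"
    return [
        {"op": "replace", "path": f"{prefix}/rateLimit", "value": str(rate_limit)},
        {"op": "replace", "path": f"{prefix}/burstLimit", "value": str(burst_limit)},
    ]


def apply_stage_throttle(
    client: Any,
    rest_api_id: str,
    stage_name: str,
    rate_limit: float,
    burst_limit: int,
) -> Any:
    """Apply stage-wide throttling using an API Gateway client."""
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=stage_throttle_patch(rate_limit, burst_limit),
    )
```

Keeping the patch construction separate from the AWS call makes it easy to inspect what will be sent before touching a real stage.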

Usage plan-level throttling

One solution that prevents specific users from overwhelming your APIs is the usage plan. Usage plans allow for more granular control of requests by defining an overall rate and burst limit per client. In a usage plan, a client is identified by their API key. Every client has their own bucket of tokens that they can use. If they exhaust their personal bucket, AWS will block further requests. With usage plans configured properly, a single client can therefore no longer throttle an API for other users. For even more control, you can throttle individual methods on a per-client basis, with a maximum of 100 methods per usage plan. This limit is actually undocumented! According to AWS Support, there is currently no way to increase it.
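As a rough sketch, the parameters for a usage plan with both an overall limit and per-method limits could be assembled like this. The function name and the example values are hypothetical; the resulting dictionary is shaped for boto3's create_usage_plan call on the API Gateway client. The 100-method check mirrors the limit mentioned above, which comes from AWS Support rather than from the documentation.

```python
from typing import Any, Dict, Optional, Tuple

MAX_METHODS_PER_PLAN = 100  # limit reported in this post, not in the AWS docs


def usage_plan_params(
    name: str,
    api_id: str,
    stage: str,
    rate_limit: float,
    burst_limit: int,
    method_throttles: Optional[Dict[str, Tuple[float, int]]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for create_usage_plan.

    method_throttles maps a key such as "/pets/GET" to a
    (rate_limit, burst_limit) pair for that method.
    """
    method_throttles = method_throttles or {}
    if len(method_throttles) > MAX_METHODS_PER_PLAN:
        raise ValueError(
            f"at most {MAX_METHODS_PER_PLAN} throttled methods per usage plan"
        )

    api_stage: Dict[str, Any] = {"apiId": api_id, "stage": stage}
    if method_throttles:
        api_stage["throttle"] = {
            key: {"rateLimit": rate, "burstLimit": burst}
            for key, (rate, burst) in method_throttles.items()
        }

    return {
        "name": name,
        "apiStages": [api_stage],
        # Overall per-client limit for every key attached to this plan.
        "throttle": {"rateLimit": rate_limit, "burstLimit": burst_limit},
    }


# Usage, assuming client = boto3.client("apigateway"):
#   plan = client.create_usage_plan(**usage_plan_params(...))
#   client.create_usage_plan_key(
#       usagePlanId=plan["id"], keyId="<api key id>", keyType="API_KEY"
#   )
```

Remember that the plan only takes effect for methods that require an API key, and only once a key has been attached to it.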

Conclusion

In conclusion, API Gateway throttling can be quite confusing! The official documentation is not always clear about how throttling limits are applied. In addition, the default settings are quite dangerous if you leave them as they are, and could jeopardize the API Gateway environment in your account. When deploying your REST API through AWS API Gateway, make sure to take these quirks into account.

FAQs

1. What is the difference between rate limit and burst limit?

The rate limit is like a speed limit for your API. It sets the maximum number of requests your API can handle per second. So, if your rate limit is 3 requests per second, your API can process up to 3 requests every second.

The burst limit acts as a buffer for sudden spikes in traffic. Think of it as a burst of speed when you need it. If your burst limit is set to 9, it allows up to 9 requests in a short burst before applying the rate limit. This helps manage temporary surges in traffic smoothly.

2. How can I monitor throttling in AWS API Gateway?

There are a few tools AWS provides to keep an eye on throttling and your API’s performance:

Amazon CloudWatch: This tool lets you track metrics like Count, 4XXError, and 5XXError. Throttled requests are answered with a 429, so they are counted in 4XXError; a sudden rise in that metric helps you see how often requests are being throttled and spot any issues.
API Gateway Logs: By enabling logging in the API Gateway console under the “Stages” section, you can get detailed logs of requests and responses.
AWS X-Ray: This service gives you an end-to-end view of your requests. It’s great for spotting bottlenecks and understanding how throttling affects your system.
Usage Plans and API Keys: These let you monitor and control the usage of individual clients, ensuring that no single user can overload your API.
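For the CloudWatch option, a query for client errors on a stage could be put together roughly like this. The function name is made up for this sketch, and the API name and stage you pass in are placeholders. Keep in mind that 4XXError counts every 4xx response, not only 429s, so treat it as an indicator rather than an exact throttle count.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def client_error_query(
    api_name: str,
    stage: str,
    minutes: int = 60,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApiGateway",
        "MetricName": "4XXError",
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,           # one data point per minute
        "Statistics": ["Sum"],  # number of 4xx responses per period
    }


# Usage, assuming cloudwatch = boto3.client("cloudwatch"):
#   response = cloudwatch.get_metric_statistics(**client_error_query("my-api", "prod"))
#   for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
#       print(point["Timestamp"], point["Sum"])
```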

3. What are the default throttling limits in AWS API Gateway?

Here are the default limits:

Account-Level Throttling: Each account can handle up to 10,000 requests per second, with a burst limit of 5,000 requests.
Stage-Level Throttling: These limits are the same as the account-level limits unless you change them.
Usage Plan-Level Throttling: These are set by you when you create a usage plan, so there isn’t a default – it’s whatever you decide.

You can adjust these limits based on your needs, but they’re subject to regional limits that AWS doesn’t publicly share.

4. Can I customize throttling limits for specific users or endpoints?

Yes, absolutely! You can tailor throttling limits for different users or endpoints using usage plans and API keys:

Usage Plans: These let you set specific rate and burst limits for each client identified by their unique API key. This way, you can prevent any single user from hogging your API.
Method-Level Throttling: You can also set different limits for specific endpoints within your API. This is handy if some endpoints need stricter limits due to higher demand or security reasons.

This flexibility allows you to create a throttling strategy that fits your application and user needs perfectly.

5. What happens when throttling limits are exceeded?

When the throttling limits are hit, AWS API Gateway sends back a 429 Too Many Requests error. Here’s what you can do about it:

Retry Mechanism: Build a retry mechanism into your client application to handle these errors gracefully. Use a strategy like exponential backoff to avoid overwhelming the server with retries.
Adjust Limits: Check and tweak your throttling limits if necessary to match your traffic patterns better.
Optimize API Usage: Look at how your API is being used and find ways to reduce unnecessary requests. This might involve caching responses, improving client behavior, or optimizing your backend.
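A retry mechanism with exponential backoff can be sketched as follows. The send callable is a stand-in for whatever HTTP call your client makes and is assumed to return a (status_code, body) pair. The delays and the "full jitter" randomisation are illustrative choices, not values recommended by AWS.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429.

    Returns the last (status_code, body) pair, which is still a 429
    if every attempt was throttled.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        # The delay ceiling doubles on every attempt, capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # Full jitter: pick a random delay so clients do not retry in lockstep.
        sleep(random.uniform(0, ceiling))
    return status, body
```

Passing sleep as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.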

Understanding and managing throttling is key to keeping your API reliable and efficient. Regularly review your settings to make sure they’re aligned with your needs and traffic patterns.

AWS API Gateway throttling explained