AWS API Gateway throttling explained

11 Nov, 2022

Recently, a client asked me how to properly rate limit their REST APIs on AWS API Gateway, as they could not quite wrap their head around it. If you are currently using REST APIs on AWS and find yourself struggling with the limited documentation on throttling, look no further! In this post I will explain the ins and outs of API Gateway throttling.

How does throttling work?

Before we look at the different ways we can rate limit requests, we need to understand how throttling works in theory.
AWS throttles requests using the so-called token bucket algorithm. In this algorithm, we have a bucket that is filled with tokens, where every token represents one API Gateway request. The number of tokens that our bucket can hold is what we call the burst limit. The rate at which our bucket is refilled with tokens is called the rate limit.

A visual explanation of the token bucket algorithm
Let’s say that we set the rate limit to 3 requests per second and the burst limit to 9 requests. That means that every second, 3 new tokens are added to the bucket, up to the maximum burst limit of 9.
As long as our request rate does not exceed our token refill rate, everything is fine. However, what if we send 4 requests per second? At that point, our bucket drains by a net 1 token per second. After 6 seconds, the bucket no longer holds enough tokens to serve all incoming requests, and we are hit with a 429: Too Many Requests error. The token bucket algorithm is used to manage throttling in many other AWS services, such as ECS and Kinesis.
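The scenario above can be sketched in a few lines of Python. This is an illustrative simulation of a token bucket, not AWS's actual implementation; `simulate` is a hypothetical helper:

```python
def simulate(rate_limit, burst_limit, requests_per_second, seconds):
    """Return, per second, how many requests were served and how many got a 429."""
    tokens = burst_limit  # the bucket starts full
    results = []
    for _ in range(seconds):
        served = min(requests_per_second, tokens)   # serve while tokens last
        throttled = requests_per_second - served    # the rest are rejected
        tokens = min(tokens - served + rate_limit, burst_limit)  # refill, capped
        results.append((served, throttled))
    return results

# Rate limit of 3 requests/second, burst limit of 9, sustained load of 4 requests/second:
for second, (served, throttled) in enumerate(simulate(3, 9, 4, 8), start=1):
    print(f"second {second}: served={served} throttled={throttled}")
```

Running this shows the bucket absorbing the excess load for the first 6 seconds; from second 7 onward, one request per second is throttled.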


Now that we have looked at the underlying throttling system, we are ready to look at the different levels at which we can throttle.
AWS distinguishes between roughly 4 different rate limits:

  1. Regional.
  2. Account-level.
  3. Stage-level (also known as the overall rate per-method).
  4. Usage plan-level (also known as the overall rate per-client).

AWS will evaluate every rate limit and block a request if any of them are exceeded. This effectively means that AWS applies throttling based on the narrowest throttling definition. In other words, usage-plan level throttling takes precedence over stage-level throttling, stage-level throttling takes precedence over account-level throttling and so forth.
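Conceptually, every request must find a token in each applicable bucket, and the most depleted bucket decides the outcome. A minimal sketch of that evaluation (the `Bucket` class and `try_request` function are hypothetical helpers, not AWS APIs):

```python
class Bucket:
    """A simple token bucket with a fixed capacity (burst limit)."""
    def __init__(self, rate_limit, burst_limit):
        self.rate_limit = rate_limit
        self.burst_limit = burst_limit
        self.tokens = burst_limit

    def refill(self):
        self.tokens = min(self.tokens + self.rate_limit, self.burst_limit)

def try_request(buckets):
    """Allow the request only if *every* bucket has a token; consume from all."""
    if all(b.tokens >= 1 for b in buckets):
        for b in buckets:
            b.tokens -= 1
        return 200
    return 429  # the narrowest (most depleted) limit wins

account = Bucket(rate_limit=10_000, burst_limit=5_000)
usage_plan = Bucket(rate_limit=1, burst_limit=2)  # a tight per-client limit

# The third request in the same second is blocked by the usage plan,
# even though the account bucket still has thousands of tokens left:
print([try_request([account, usage_plan]) for _ in range(3)])  # [200, 200, 429]
```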

Regional throttling

The highest rate limit is the regional rate limit. AWS does not provide much information on this topic, even in their security whitepaper on API Gateway. The only thing we know is that regional throttling limits are set across all accounts and clients in a region. As we cannot modify this limit, let’s move down one level to the account limit.

Account-level throttling

At the account level, one bucket of tokens is shared across all the APIs you enable in a region. The default rate limit is 10,000 requests per second, and the default burst limit is 5,000 requests. It is possible to request an increase, provided that it does not exceed AWS's theoretical regional limits.
Account-level throttling is enabled by default. Note that if we do not configure any other throttling settings, a single user can exhaust our API Gateway quota! An attacker could, for example, effectively shut down the entire API Gateway environment in your account by DDoSing a health check endpoint.
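You can read your account's current throttle settings with the API Gateway `get_account` call. A small sketch using boto3, written with an injectable client so it can be exercised without AWS credentials:

```python
def account_throttle_settings(apigw_client):
    """Return (rate_limit, burst_limit) for the account in this region."""
    settings = apigw_client.get_account()["throttleSettings"]
    return settings["rateLimit"], settings["burstLimit"]

# Usage (requires AWS credentials):
#   import boto3
#   rate, burst = account_throttle_settings(boto3.client("apigateway"))
```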

Stage-level throttling

At the stage level, one bucket is created for every method across an API stage. Furthermore, method throttling is enabled by default when you deploy an API in API Gateway.
While this sounds good in theory, the default limits are equal to your account-level limits. Consequently, if you don't change these defaults, a single method can still exhaust your entire account limit as described above.
In addition, because these per-method buckets are shared between different users, one user can prevent other users from accessing specific endpoints if their sustained use is higher than the token refresh rate of the bucket.
As a result, if you do decide to use stage-level throttling, be aware of these two caveats!

Usage plan-level throttling

A solution that prevents individual users from overwhelming your APIs is the usage plan. Usage plans allow for more granular control of requests by defining an overall rate and burst limit per client.
In a usage plan, a client is identified by their API key. Every client has their own bucket of tokens that they can use. If they exhaust their personal bucket, AWS will block further requests. When configured properly, a single client can therefore no longer throttle an API for other users when usage plans are enabled.
For even more control, you can throttle individual methods on a per-client basis, with a maximum of 100 methods per usage plan. This limit is actually undocumented! According to AWS Support, there is currently no way to increase it.
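Setting this up comes down to creating a usage plan with its own throttle settings and attaching each client's API key to it. A sketch with boto3 (the plan name, api/stage/key ids are placeholders; per-method limits would go in the `apiStages[].throttle` map, keyed by `"resourcePath/httpMethod"`):

```python
def create_throttled_plan(apigw_client, api_id, stage, key_id, rate, burst):
    """Create a per-client usage plan and attach one API key to it."""
    plan = apigw_client.create_usage_plan(
        name="per-client-plan",
        apiStages=[{"apiId": api_id, "stage": stage}],
        throttle={"rateLimit": rate, "burstLimit": burst},
    )
    apigw_client.create_usage_plan_key(
        usagePlanId=plan["id"], keyId=key_id, keyType="API_KEY"
    )
    return plan["id"]

# Usage (requires AWS credentials):
#   import boto3
#   create_throttled_plan(boto3.client("apigateway"), "abc123", "prod", "key-1", 10.0, 20)
```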


In conclusion, API Gateway throttling can be quite confusing! The official documentation is not always clear about how throttling limits are applied. In addition, the default settings are quite dangerous if left unchanged, and could jeopardize the entire API Gateway environment in your account. When deploying your REST API through AWS API Gateway, make sure to take these quirks into account.

