TIL that AWS Lambda terminates instances preemptively
TL;DR: There are plenty of articles and blog posts on preventing or shortening cold-starts for AWS Lambda instances. I learned that AWS Lambda forces cold-starts to happen nevertheless, by terminating active, running instances about every two hours.
AWS Lambda is an event-driven, serverless computing platform delivered by Amazon. It runs code in response to events and manages all the computing resources required by that code. As part of managing those resources, AWS is known to terminate idle Lambda instances. I discovered that AWS also terminates active, running instances, and quite predictably so.
Managing resources efficiently: terminating passive instances
As mentioned, AWS Lambda terminates instances that are idling for some time, typically a bit longer than 10 minutes. AWS is efficiently managing their infrastructure resources, and not allocating computing resources to instances that aren’t running, or that haven’t been running for some time.
The result of a terminated Lambda instance is that, once a new event does come through, Lambda needs to provision a new application instance, fetch and deploy the developer’s code in that instance, and start the instance to act on the invocation.
Understandably, all this takes more time than responding to an event from an already running instance. Depending on the instance size and the language, these so-called cold-starts take about 300 milliseconds. (Slightly less time for larger instances. A bit more for non-interpreted languages. Significantly more for functions with a larger package size.)
Preventing the drawbacks of cold-starts
That extra response latency isn’t acceptable for all use cases. It’s especially inconvenient if the Lambda function needs to respond to a user-initiated request. For that reason, AWS has offered Provisioned Concurrency for Lambda since December 2019. But before that, and continuing still, many developers have been trying to prevent cold-starts in the on-demand setup, for example by using periodic time-triggered events to keep the functions ‘warm’. Others are trying to reduce the total time a cold-start takes. There’s no shortage of articles about preventing or speeding up cold-starts.
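To make the keep-warm trick concrete, here is a minimal sketch of a handler that recognises a scheduled ping and returns early. It assumes the ping arrives as a scheduled event whose `source` is `aws.events` and whose `detail-type` is `Scheduled Event`; `do_real_work` is a hypothetical placeholder for the function's actual logic.

```python
def is_warmup_ping(event):
    """True when the event looks like a scheduled (time-triggered) event."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def do_real_work(event):
    """Hypothetical placeholder for the function's real logic."""
    return {"echo": event}


def handler(event, context):
    if is_warmup_ping(event):
        # Nothing to do: being invoked is what keeps this instance warm.
        return {"warmup": True}
    return {"warmup": False, "result": do_real_work(event)}
```

Returning early keeps the warm-up invocations short, so they add little to the bill.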
AWS Lambda terminates instances preemptively anyway
It may not be surprising if you think it through, but recently I was taken aback to discover that AWS Lambda also actively and preemptively terminates ‘warm’ instances while they are still receiving traffic. And there’s little a developer can do about it, except perhaps enabling Provisioned Concurrency. So, regardless of other efforts to keep instances ‘warm’, Lambda terminates instances after some time, thus actively inducing a cold-start on the next invocation.
By the way, it’s not that it really should matter though, except for the drawback of the occasional cold-start delay. In the single-event-processing nature of Lambda, a developer shouldn’t care about the lifetime of a Lambda instance, nor rely on the fact that a specific Lambda instance may process multiple events during its existence. (You could build some caching in the Lambda function, based on the high probability that an instance will stay around for a while, but then you’re rubbing against the grain of the execution model of AWS Lambda.)
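As an illustration of the kind of in-instance caching mentioned above, here is a minimal sketch. The cache lives in module-level state, so it disappears whenever the instance is terminated and is rebuilt on the next cold-start. The names `load_config`, `get_config`, and the TTL value are hypothetical.

```python
import time

# Module-level cache: it lives as long as the instance does and is simply
# empty again after the next cold-start.
_cache = {}
CACHE_TTL_SECONDS = 300  # hypothetical value


def load_config():
    """Hypothetical stand-in for an expensive lookup."""
    return {"loaded_at": time.time()}


def get_config(now=None):
    """Return the cached value, reloading it when missing or expired."""
    now = time.time() if now is None else now
    entry = _cache.get("config")
    if entry is None or now - entry["stored_at"] > CACHE_TTL_SECONDS:
        entry = {"value": load_config(), "stored_at": now}
        _cache["config"] = entry
    return entry["value"]


def handler(event, context):
    config = get_config()
    return {"config_loaded_at": config["loaded_at"]}
```

Because the code always falls back to `load_config`, it stays correct when an instance is terminated. The cache is only an optimisation, never a requirement.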
Nevertheless, I was curious about the maximum lifetime of an AWS Lambda instance, and whether Lambda terminates running instances after some time predictably or just randomly. I couldn’t find documentation or other blog posts about the average lifespan of a Lambda instance, so I figured it out myself.
Creating an experiment to investigate instance terminations
When posed such a question, I love to set up an experiment to investigate the matter. For this experiment, the setup was quite simple. I created a Lambda function with some Python code that would know whether it was running in a freshly minted instance or in one that had served requests prior. If so, the code would log some metrics. I deployed this code to AWS Lambda in three different regions and for multiple instance sizes and set up steady trickles of requests towards all Lambdas deployed for a couple of days.
The specifics of the experiment:
- I used one small, single code file in Python 3.7 with a total package size of less than 5 KB.
- I deployed the code to AWS Lambda using the Serverless Framework.
- I used three AWS regions:
- The AWS Lambda instance sizes set up were 256 MB, 512 MB, 1024 MB, and 2048 MB.
- Note: the actual memory used by the function was less than 100 MB.
- The frequency of requests to the Lambda functions was set to invoke a Lambda function every 1, 2, 3, 4, 5, 10, and 15 minutes via scheduled events. For final validation, I also sent requests to one Lambda function every 10 seconds.
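The experiment code itself isn't shown here, but the core idea can be sketched as follows. Module-level code runs once per instance, so a counter and a start timestamp kept there reveal whether an invocation hit a fresh instance and how old the instance is. All names in this sketch are assumptions, not the code that was actually deployed.

```python
import time
import uuid

# Module-level state is initialised once per instance (at cold-start)
# and survives across invocations served by the same instance.
INSTANCE_ID = str(uuid.uuid4())
INSTANCE_STARTED_AT = time.time()
_invocation_count = 0


def handler(event, context):
    """Report whether this invocation hit a fresh or a reused instance."""
    global _invocation_count
    _invocation_count += 1
    record = {
        "instance_id": INSTANCE_ID,
        "cold_start": _invocation_count == 1,
        "invocations": _invocation_count,
        "instance_age_seconds": round(time.time() - INSTANCE_STARTED_AT, 1),
    }
    print(record)  # standard output ends up in the function's logs
    return record
```

With records like these, the lifetime of an instance can be approximated afterwards as the largest `instance_age_seconds` logged for its `instance_id`.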
Running the experiment
After deploying and starting the experiment, the waiting for results began. I ran two experiments, one for two days and one for seven days. The experiments did not span a month boundary, and during the experiments no drastic, notable events happened in the world. In hindsight, that is quite remarkable considering it’s 2020, but no unusual traffic patterns (neither high nor low) affected the results.
I used a bit of the waiting time to do some forecasting on the costs: running this experiment in three regions costs less than $0.04 per day. In the setup of this experiment, the costs for the (fewer than 5000 daily) requests are negligible. It’s mainly the daily 600 GB-seconds required for the execution that contributes to the bill.
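The cost estimate can be checked with a few lines of arithmetic. The sketch below assumes on-demand prices of $0.20 per million requests and $0.0000166667 per GB-second, and ignores the free tier. These prices are assumptions that are not taken from this post and may differ per region or over time.

```python
# Assumed on-demand prices in USD (not from this post; check current pricing).
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def daily_cost(requests_per_day, gb_seconds_per_day):
    """Return (request cost, compute cost) in USD for one day."""
    request_cost = requests_per_day / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = gb_seconds_per_day * PRICE_PER_GB_SECOND
    return request_cost, compute_cost


# Upper bounds mentioned above: 5000 requests and 600 GB-seconds per day.
request_cost, compute_cost = daily_cost(5000, 600)
print(f"requests: ${request_cost:.4f}, compute: ${compute_cost:.4f}, "
      f"total: ${request_cost + compute_cost:.4f}")
# prints: requests: $0.0010, compute: $0.0100, total: $0.0110
```

Under these assumed prices the compute part is roughly ten times the request part, and the total stays below the $0.04 per day mentioned above.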
The following table lists the results, summarized. In experiment 1a (running for two days), the delay (in minutes) between requests towards the AWS Lambda varied. In experiment 1b (also running for two days), the delay between subsequent requests was fixed to 4 minutes, but the Lambda instance memory size varied.
In experiment 2 (running for a week), the delay (in minutes) between requests towards the Lambda function varied again. In essence, it was similar to the experiment listed under 1a, just running for a longer time. Experiment 2 also included a run with a delay of around 10 seconds between Lambda invocations.
| Region | Delay between requests (in min.) | Instance memory size (in MB) | Average instance lifetime (in min.) | Instance lifetime standard deviation (in min.) |
|---|---|---|---|---|
| Experiment 1a (for two days) | | | | |
| Experiment 1b (for two days) | | | | |
| Experiment 2 (for one week) | | | | |
| eu-west-1 *) | 1 | 1024 | 127 *) | 22 |
| Experiment 2b (for five days) | | | | |
| eu-west-1 | Every 10 seconds | 1024 | 113 | 37 |
*) Incidentally, there was a period during the experiment week in which instances in eu-west-1 were not terminated every two hours. This resulted in three instances that ran for 48 hours. These exceptionally long lifetimes were omitted in calculating the averages and standard deviations, as such a thing didn’t happen during the other experiments, and also not in the other two regions.
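A minimal sketch of how such a summary could be computed, with exceptionally long lifetimes left out, is shown below. The sample values are made up for illustration and are not the measured data. The cutoff of one day and the use of the sample standard deviation are assumptions as well.

```python
from statistics import mean, stdev


def summarize(lifetimes_min, max_plausible_min):
    """Mean and sample standard deviation, ignoring lifetimes above a cutoff.

    Returns (average, standard deviation, number of omitted values).
    """
    kept = [t for t in lifetimes_min if t <= max_plausible_min]
    return mean(kept), stdev(kept), len(lifetimes_min) - len(kept)


# Made-up lifetimes in minutes, for illustration only (not the measured data).
# The last value represents an instance that ran for 48 hours.
sample = [120, 125, 130, 135, 140, 48 * 60]
avg, sd, omitted = summarize(sample, max_plausible_min=24 * 60)
print(f"average: {avg} min, standard deviation: {sd:.1f} min, omitted: {omitted}")
# prints: average: 130 min, standard deviation: 7.9 min, omitted: 1
```

Leaving the 48-hour value in would pull the average of this made-up sample far above 130 minutes, which is why such outliers distort the picture.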
As an alternative display of the experiment data, the graph below displays the actual Lambda lifetimes for eu-west-1 during experiment 2a.
Looking at the experiment results, some observations are clear:
- On average, AWS Lambda instances live about 130 minutes, with a standard deviation of 30 minutes. The standard deviation’s value is heavily affected by some outliers, rather than being a symptom of a generic pattern with a wide range in lifetimes.
- No instance is ever running idle for 15 minutes or longer. Passive instances (receiving hardly any traffic) are preemptively terminated if a function isn’t invoked for 15 minutes. I didn’t find the ‘guaranteed to be cut-off’ time, but it’s somewhere between 10 and 15 minutes of inactivity.
- During experiment 1, instances in eu-west-1 had a slightly longer lifetime than instances in ap-southeast-1, and a lower lifetime variability. It initially seemed that instances in eu-west-1 were operating in a more stable environment. But during experiment 2, running for a longer period, this effect petered out completely, and no firm correlation between region and average instance lifetime could be found.
- There might be a minor correlation between frequency of function invocations and instance lifetimes. Instances for functions running more frequently seem to have a slightly longer lifetime than functions being invoked less often… except for functions running every 10 seconds, which showed a shorter lifetime again.
- There’s no strong correlation between instance memory size and lifetime. Instances of 256 MB, 512 MB, 1024 MB, and 2048 MB all had comparable lifetimes.
AWS Lambda terminates active instances preemptively, and predictably after roughly 130 minutes. The intensity of the traffic towards the Lambda function slightly affects the lifetime: more active instances live a bit longer, but the average lifetime varies by less than 10 minutes. Neither the instance’s provisioned memory size nor the region (of the three regions tested) affects the average lifetime significantly.
Bringing this discovery and the experiments to an end: notwithstanding all the efforts to prevent cold-starts, AWS Lambda purposefully terminates running Lambda instances, ultimately forcing cold-starts to happen. By the way, this is well within AWS’s right, but it is an insight that I couldn’t find documented or described.
So, don’t expect and plan for your Lambda instances to run for a long time, serving multiple requests from the same running instance. Even a continuous stream of events or traffic doesn’t safeguard your AWS Lambda instances from being terminated.
[Edit: after initial publication, a scatter plot diagram showing the actual instance lifetimes for eu-west-1 during experiment 2 was added, plus the observation that the lifetime standard deviation is likely to have been influenced significantly by incidental outliers instead of the presence of a generic pattern.]