How to reduce AWS Lambda latency using custom runtimes

Gero Vermaas

Updated April 30, 2025

When using AWS Lambda functions, you typically want to return a response to the client as soon as possible. Now imagine that, after calculating the response, you also want to perform some follow-up actions that the client does not need to wait for (e.g., writing some metrics). Since standard AWS Lambda functions do not allow you to execute any actions after returning the response, these actions must be completed first and the client experiences the extra latency. This blog explains how to use AWS Lambda custom runtimes to avoid that added latency and still do the additional processing.

The pseudocode below illustrates a simple standard Lambda function. Once the handler() function has returned the response, no additional processing is possible.

handler(event, context):
  response = process(event)
  // You have to do your additional processing here...
  return response
  // ... since it is not possible here
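
As a rough Python sketch of the pseudocode above: `process()` and `write_metrics()` are hypothetical stand-ins (the sleep only simulates a slow metrics call), and only the `handler(event, context)` signature reflects the standard Python Lambda convention. It illustrates that the client waits for the extra work because it has to run before the `return`.

```python
import time


def process(event):
    # Hypothetical business logic: echo the request body back.
    return {"statusCode": 200, "body": event.get("body", "")}


def write_metrics(response):
    # Hypothetical slow side effect, e.g. a call to a metrics service.
    time.sleep(0.05)


def handler(event, context):
    response = process(event)
    # The additional processing has to happen here, before the return,
    # so its duration is added to the latency the client experiences.
    write_metrics(response)
    return response
    # Nothing placed after the return statement would ever run.
```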

By using AWS Lambda custom runtimes you get more control over the execution context of the Lambdas. Custom runtimes allow you to control the so-called Lambda bootstrap script. This is basically an infinite loop which:

Retrieves a next incoming request using the AWS Lambda Runtime API
Invokes the actual Lambda function code to get the response
Posts the response back to the client using the AWS Lambda Runtime API

With standard Lambdas, you do not have access to this bootstrap script or to the AWS Lambda Runtime API; AWS provides the bootstrap script and runs it for you.

In pseudocode, the bootstrap script and the Lambda function code look like this (refer to this example for a version using bash scripts):

bootstrap script:
while true:
  get eventData and requestID from Lambda Runtime API next invocation endpoint
  response = invoke lambda handler with eventData
  post response to Lambda Runtime API response endpoint using requestID
Lambda code:
handler(eventData):
  response = process(eventData)
  // Again, you have to do your additional processing here...
  return response
  // ... since it is not possible here
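
The same loop can be sketched in Python using only the standard library. This is a minimal sketch and not the bootstrap script AWS ships: the function names (`next_invocation`, `post_response`, `run_once`, `main`) and the echo `handler` are hypothetical, and error reporting is left out. The endpoint paths, the `Lambda-Runtime-Aws-Request-Id` header and the `AWS_LAMBDA_RUNTIME_API` environment variable are those of the Lambda Runtime API, version 2018-06-01.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def handler(event_data):
    # Hypothetical handler: any additional processing still has to
    # happen before this return.
    return {"echo": event_data}


def next_invocation(runtime_api):
    # Blocks until the Runtime API hands out the next invocation.
    url = f"http://{runtime_api}/{API_VERSION}/runtime/invocation/next"
    with urllib.request.urlopen(url) as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event_data = json.loads(resp.read())
    return request_id, event_data


def post_response(runtime_api, request_id, response):
    url = (f"http://{runtime_api}/{API_VERSION}"
           f"/runtime/invocation/{request_id}/response")
    body = json.dumps(response).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


def run_once(runtime_api):
    # One iteration of the bootstrap loop: fetch, invoke, post.
    request_id, event_data = next_invocation(runtime_api)
    response = handler(event_data)
    post_response(runtime_api, request_id, response)


def main():
    # Lambda sets AWS_LAMBDA_RUNTIME_API to the host:port of the Runtime API.
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        run_once(runtime_api)
```

A real bootstrap file would end with a call to `main()`; it is left out here so the functions can be imported and exercised without a Lambda environment.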

As you can see, in this setup no additional processing can be done once the Lambda handler has returned its response. However, by using AWS Lambda custom runtimes we have control over the bootstrap script, which opens up more possibilities. The trick to making additional processing possible is to move the posting of the response to the Lambda Runtime API response endpoint from the bootstrap script into the Lambda function handler itself.

The Lambda function handler can now:

First, post the response to the Lambda Runtime API response endpoint
Then, do the additional processing
And finally, return control to the bootstrap script, which will then retrieve the details of the next Lambda invocation

In pseudocode this would be:

bootstrap script:
while true:
  get eventData and requestID from Lambda Runtime API next invocation endpoint
  responseEndpointUrl = create url with requestID and Lambda Runtime API response endpoint
  invoke lambda handler with eventData and responseEndpointUrl
Lambda code:
handler(eventData, responseEndpointUrl):
  response = process(eventData)
  post response to responseEndpointUrl
  // Now we can do the additional processing...
  do the additional processing
  // ... before returning control to the bootstrap script
  return
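
A minimal Python sketch of this variant, under the same assumptions as the pseudocode above, could look as follows. The helpers `process`, `write_metrics`, `post_json` and `run_once`, and the `metrics_log` list that stands in for a real metrics backend, are hypothetical; the endpoint paths and the request ID header are those of the Lambda Runtime API, version 2018-06-01. It is not the code from the Serverless Scientist project.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"
metrics_log = []  # Hypothetical stand-in for a real metrics backend.


def process(event_data):
    # Hypothetical business logic.
    return {"echo": event_data}


def write_metrics(response):
    # Hypothetical additional processing the client should not wait for.
    metrics_log.append(response)


def post_json(url, payload):
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


def handler(event_data, response_endpoint_url):
    response = process(event_data)
    # Post the response first, so the client gets it right away...
    post_json(response_endpoint_url, response)
    # ...then do the additional processing...
    write_metrics(response)
    # ...and only then return control to the bootstrap loop.


def run_once(runtime_api):
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"
    with urllib.request.urlopen(f"{base}/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event_data = json.loads(resp.read())
    # The bootstrap no longer posts the response; it only builds the URL.
    handler(event_data, f"{base}/{request_id}/response")


def main():
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        run_once(runtime_api)
```

The only structural change compared to a conventional bootstrap loop is where the POST to the response endpoint happens: inside the handler, before the additional processing, instead of in the loop after the handler returns.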

This way the client receives the response ASAP while it is still possible to do additional processing after that response has been sent.

We have successfully applied this solution in our Serverless Scientist project. For more details, refer to the bootstrap script, the lambda_handler() function and the setup of the scientist Lambda in the serverless.yml file. We used the Serverless Framework and implemented the Lambdas in Python, but this solution obviously can also be implemented using other languages and frameworks.

Curious to hear what you think of this solution. Can you also apply it? Can it be improved? Let us know in the comments.
