You can search across log groups by setting up a correlation id on your log lines. You can then use CloudWatch Logs Insights to query all related log messages.
If you are dealing with different lambda functions in a project, you might not have a logging system yet. Finding and troubleshooting using Cloudwatch Logs could become cumbersome. Especially if you try to track a flow through the different log groups.
In CloudWatch Logs Insights you can execute queries across log groups. In our case we want to collect related log lines across log groups. To achieve this we need to annotate the log lines with a correlation_id
that ties them together.
Sample application
We will use the following sample application to show how to setup a correlation id on your log lines. The application looks as followed:
- API Gateway that is invoking an AWS Lambda function
- A
api
Lambda function places thebody
in a SQS Queue - A
callback
Lambda function is receiving messages from the SQS QueuEvery time that an API call executes, the
api
function is invoked. We can annotate the log lines by using the Lambda Powertools for Python.
from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging import correlation_paths
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.data_classes import (
event_source,
APIGatewayProxyEvent,
)
logger = Logger(service="api")
@event_source(data_class=APIGatewayProxyEvent)
@logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> dict:
logger.info("This message now has a correlation_id set")
We take the requestContext.requestId
from the incoming event. We can use that as a correlation_id
. (This is also returned to the caller as a x-amzn-requestid
header by default by API Gateway.)
To be able to search across log groups using the same correlation id. We need to pass it along with the message as an attribute when we send it to SQS.
response = client.send_message(
QueueUrl=os.environ.get("QUEUE_URL"),
MessageBody=json.dumps(event.body),
MessageAttributes={
"correlation_id": {
"DataType": "String",
"StringValue": event.request_context.request_id,
}
},
)
I used a correlation_id
attribute to pass the correlation id with the SQS message.
The queue will trigger the callback function with a batch of messages. You need to loop through the messages in the batch. Process them and remove them from the queue.
Or you use the sqs_batch_processor from the powertools.
from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.batch import sqs_batch_processor
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
logger = Logger(service="callback")
def record_handler(record: dict) -> None:
record = SQSRecord(record)
# Get the correlation id from the message attributes
correlation_id = record.message_attributes["correlation_id"].string_value
logger.set_correlation_id(correlation_id)
logger.info(f"Processing message with {correlation_id} as correlation_id")
@sqs_batch_processor(record_handler=record_handler)
def lambda_handler(event: dict, context: LambdaContext) -> None:
logger.set_correlation_id(None)
logger.info(f"Received a SQSEvent with {len(list(event.records))} records")
The sqs_batch_processor
decorator will call the record_handler
method for each record received. No need for you to handle the deletion of messages.
In the end we do set the correlation id to None
. We do this because the rest of the log lines are not related to the message received.
Show me!
If you call the API using a simple curl command, you can see the correlation_id
(Note: this is also in the headers)
So imagine that something went wrong. We know the correlation_id
so we will use CloudWatch Logs Insights:
fields level, service, message
| filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a"
| sort @timestamp desc
As you can see there are 3 results 2 lines originated from the api
service and 1 from the callback
service. If have an interest in how to create queries you can read the CloudWatch Logs Insights query syntax.
Conclusion
The Lambda Powertools for Python make it easy
to add a correlation_id
. By using the structured logs from the powertools you can use each property in the log line to
query on. It has a lot more utilities, so I encourage go check it out!
Not a python developer? There is a Java
version and there are some initiatives on the roadmap
to add other languages as well.