How to Analyze VPC Flow Logs and Find Suspicious Destinations from the Command Line

Mark van Holsteijn

Updated January 28, 2026

8 minutes

In this blog post I will show you how to analyze VPC flow logs and find suspicious internet destinations from the command line. The process consists of the following five steps:

Retrieve flow log to a text file
Limit the flow log to NAT gateway traffic
Remove traffic to our own public IP addresses
Remove traffic to Datadog
Analyze the remaining flow logs

But first, what can we find in a VPC flow log?

VPC flow logs

VPC flow logs contain records of the network flows inside your VPC. A log entry is pretty basic, and contains at least the source and destination IP address, protocol, ports, as well as the amount of data transferred. A VPC flow log entry looks something like this:

2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK

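There is not a lot of information there. It is just data. In the default format, the columns are, in order: version, account ID, interface ID, source address, destination address, source port, destination port, protocol, packets, bytes, start time, end time, action and log status.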
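To make the column positions concrete, here is a minimal Python sketch that names the fields of the sample record. It assumes the default (version 2) flow log format. The helper parse_record is hypothetical and only serves as an illustration. The awk commands further on refer to the same columns by position: $4 is the source address, $5 the destination address and $7 the destination port.

```python
# Field names of the default (version 2) VPC flow log format, in column order.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_record(line: str) -> dict:
    """Split one space-separated flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_record(
    "2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 "
    "38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK"
)
# Protocol 6 is TCP, so this is a TCP flow to port 443.
print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
```

Flow logs configured with a custom format have a different column order, in which case the FIELDS tuple would need to be adjusted.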

You can configure flow logs for your VPC to keep track of all of the traffic on your cloud network. I assume that you have already done that.

1. Retrieving Flow Log Records

To retrieve the VPC flow logs, I used the utility flowlogs_reader. It supports reading from both S3 buckets and CloudWatch Logs. In my case, I have the flow logs in CloudWatch, so I type:

$ flowlogs_reader --location-type cwl /vpc/flow-log > flow-logs.txt

To find out how many records were retrieved, type:

$ wc -l flow-logs.txt
1512436 flow-logs.txt

In this case, there are a whopping 1,512,436 records!

2. Limit the Flow Log to NAT Gateway Traffic

To limit the flow logs to outbound, NAT gateway related traffic, perform the following steps:

Retrieve the NAT gateway private IP addresses
Filter to remove flow records with other source addresses
Remove flow records from the NAT gateway into the VPC

2.1. Retrieving NAT Gateway Private IP Addresses

I am only interested in traffic going out from the NAT gateway. To get the private IP addresses of the NAT gateways in the VPC, type:

$ nat_gateways=$(aws ec2 describe-nat-gateways \
                  --query 'join(`\n`,NatGateways[].NatGatewayAddresses[].PrivateIp)' \
                  --output text)

2.2. An AWK Filter for NAT Gateway Traffic

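With the private IP addresses, we can filter the flow logs for traffic originating from the NAT gateways. The following pipeline turns the address list into an awk program. The literal && is written as \&\& because an unescaped & in a sed replacement stands for the matched text. Type:

$ awk_filter=$(sed -e 's/.*/$4 == "&"/g' <<< "$nat_gateways" | \
                paste -s -d '|' - | \
                sed -e 's/|/ || /g' \
                -e 's/^/$4 != "-" \&\& (/' \
                -e 's/$/) {print $0}/')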

$ awk "$awk_filter" flow-logs.txt > from-nat-gateways.txt

So, now we have all logs which originated from the NAT gateways.
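The sed and paste pipeline is compact but hard to read. As a rough illustration, the following Python sketch builds the awk program that the pipeline is meant to generate. The function name build_awk_filter and the example addresses are assumptions made for this sketch; your addresses come from describe-nat-gateways.

```python
def build_awk_filter(nat_gateway_ips):
    """Return the awk program text: match records whose source ($4) is a NAT gateway."""
    clauses = " || ".join(f'$4 == "{ip}"' for ip in nat_gateway_ips)
    # Records without a source address carry "-" in column 4 and are skipped.
    return f'$4 != "-" && ({clauses}) {{print $0}}'


# Example addresses only, not taken from a real VPC.
print(build_awk_filter(["10.10.1.5", "10.10.2.5"]))
```

Printing "$awk_filter" in the shell before running awk is a quick way to check that the generated program has this shape.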

2.3. Filtering Return Traffic to the Requester

We focus on outgoing traffic from the NAT gateway to the public internet. To filter out any traffic that is destined for the VPC (CIDR 10.0.0.0/8), type:

$ awk '$5 !~ /^10\..*/{print $0}' from-nat-gateways.txt > from-nat-gateways-out.txt
$ wc -l from-nat-gateways-out.txt
28894 from-nat-gateways-out.txt

This is nice: the volume of flow records to analyze has already dropped from 1.5 million to about 29,000!
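The pattern /^10\./ works because this VPC lives inside 10.0.0.0/8. If your VPC uses a different private range, a textual prefix match is less convenient. The sketch below shows one possible generalization with Python's ipaddress module. The function name is_internet_destination is made up for this example. Note that is_private covers the RFC 1918 blocks and a number of other reserved ranges, which may be broader than what you want.

```python
import ipaddress


def is_internet_destination(dstaddr: str) -> bool:
    """True when dstaddr parses as an IP address outside the private/reserved ranges."""
    try:
        return not ipaddress.ip_address(dstaddr).is_private
    except ValueError:
        # Flow logs contain "-" when no address was recorded; treat that as not routable.
        return False


for candidate in ("10.10.1.10", "172.16.0.1", "13.233.147.224"):
    print(candidate, is_internet_destination(candidate))
```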

3. Removing Traffic to Our Own Public IP Addresses

To remove flow logs of traffic from our VPC to its own public IP addresses, do the following:

Retrieve the public IP addresses in the VPC
Use an Awk filter to remove flow records to our own IPs

3.1. Retrieve Our VPC's Public IP Addresses

To find the public IP addresses offered by the VPC, type:

public_ips=$(aws ec2 describe-network-interfaces \
            --query 'join(`\n`,NetworkInterfaces[].PrivateIpAddresses[].Association.PublicIp)' \
            --output text)

3.2. Use AWK to Filter Traffic Back to Own IPs

To remove all flow logs of traffic back to the VPC's public IP addresses, type:

# Escape & as \& in the sed replacement: an unescaped & would re-insert the matched '|'
awk_filter=$(sed -e 's/.*/$5 != "&"/g' <<< "$public_ips" | \
                paste -s -d '|' - | \
                sed -e 's/|/ \&\& /g' \
                -e 's/$/ {print $0}/')
awk "$awk_filter" from-nat-gateways-out.txt > outgoing-public-internet.txt
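The generated awk program is a chain of $5 != "..." comparisons joined with &&, one per public IP address. With many addresses that expression gets long. As an alternative sketch, and not the method used in this post, the same exclusion can be written as a set lookup in Python. The function name drop_own_destinations is hypothetical.

```python
def drop_own_destinations(lines, own_public_ips):
    """Yield flow log lines whose destination (column 5) is not one of our own IPs."""
    own = set(own_public_ips)
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[4] not in own:
            yield line
```

A record is kept only when its destination differs from every address in the set, which is the same condition the && chain expresses.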

4. Removing Outgoing Datadog Destinations

In our case, there is known outgoing traffic to Datadog. If you do not use Datadog, you can skip this step.

Filtering log records against a set of IP address ranges was not easy. Therefore, I created the Python script filter-datadog-ips. The following command removes all traffic flows to Datadog:

filter-datadog-ips < outgoing-public-internet.txt > outgoing-public-internet-without-datadog.txt
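The source of filter-datadog-ips is not reproduced here. Purely as an illustration of the idea, and not the actual script, a range filter could look like the sketch below. The function name drop_destinations_in_ranges is hypothetical, and the CIDR ranges in the example are placeholders from the documentation address space, not Datadog's real ranges. You would substitute the ranges that Datadog publishes.

```python
import ipaddress


def drop_destinations_in_ranges(lines, cidrs):
    """Yield flow log lines whose destination (column 5) is outside all given CIDR ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    for line in lines:
        fields = line.split()
        try:
            dst = ipaddress.ip_address(fields[4])
        except (IndexError, ValueError):
            # Keep records without a parsable destination for manual inspection.
            yield line
            continue
        if not any(dst in network for network in networks):
            yield line


# Placeholder ranges for the example only.
example_ranges = ["203.0.113.0/24", "2001:db8::/32"]
sample = ["2 123456789012 eni-adf878fe 10.10.1.10 203.0.113.10 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK"]
print(len(list(drop_destinations_in_ranges(sample, example_ranges))))
```

A linear scan over the ranges is fine for a few dozen networks. For thousands of ranges a prefix tree would be a better fit.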

5. Analyzing Remaining Traffic

After filtering, you can analyze what remains:

The number of unique destination IP addresses
The number of unique destination ports
The DNS name of the IP addresses
The certificate name of the IP address
Identifying destinations

5.1. Identifying Destination IP Addresses

To find the unique combinations of destination IP address and port, type:

$ awk '{print $5, $7}' outgoing-public-internet-without-datadog.txt | sort -u > unique-ip-and-port.txt
$ wc -l unique-ip-and-port.txt
26 unique-ip-and-port.txt

How cool is that! I have only 26 combinations of destination IP address and port left to analyze. A great improvement from 1.5 million records!

5.2. Identifying Which Ports Are Used

To see which ports are being accessed, type:

$ awk '{print $2}' unique-ip-and-port.txt | sort | uniq -c | sort -r -n
  19 443
   3 80
   2 465
   1 30120
   1 2593

Clearly some HTTPS (443), HTTP (80) and SMTP over TLS (465) traffic, plus two unfamiliar destination ports.
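The two awk and sort steps above can also be expressed in a few lines of Python. This is a small sketch under the assumption that the input uses the default column order. The function name summarize_destinations is invented for the example.

```python
from collections import Counter


def summarize_destinations(lines):
    """Return the unique (dstaddr, dstport) pairs and a port histogram."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 7:
            pairs.add((fields[4], fields[6]))
    # Count each port once per unique destination, like `uniq -c` on the pair file.
    port_counts = Counter(port for _ip, port in pairs)
    return sorted(pairs), port_counts.most_common()
```

Because the counter runs over unique pairs, a port count tells you how many distinct destinations use that port, not how many flow records were seen.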

5.3. DNS Reverse Lookup of the IP Addresses

To get an idea of the destination, perform a DNS reverse lookup of each IP address by typing:

while read -r ip port; do
    # Join multiple PTR answers with spaces so that each IP stays on one CSV line
    domain_name=$(dig +short -x "$ip" | paste -s -d ' ' -)
    echo "$ip,$port,${domain_name:--}"
done < unique-ip-and-port.txt > dns-lookups.csv

This script performs a reverse lookup for each IP address. If unsuccessful, a dash is used instead.
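A reverse lookup is a query for the PTR record of a special name derived from the IP address. The Python sketch below shows that name and a lookup with the same dash fallback. The function names ptr_name and reverse_lookup are assumptions for this example. Unlike dig, socket.gethostbyaddr uses the system resolver, so it may also consult the local hosts file.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the DNS name that a reverse lookup queries for this address."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str, resolver=socket.gethostbyaddr) -> str:
    """Return the primary host name for ip, or "-" when the lookup fails."""
    try:
        return resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return "-"


print(ptr_name("13.233.147.224"))
```

The resolver parameter exists only so that the lookup can be replaced with a stub when trying the function without network access.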

5.4. Retrieving Certificate Names of the IP Addresses

For IP traffic to ports 80 or 443, it may be possible to read the names on the SSL certificates. To find out, type:

while read -r ip port; do
    if [[ $port -eq 80 ]] || [[ $port -eq 443 ]]; then
        # Always connect to port 443; reading stdin from /dev/null ends the session after the handshake
        alt_subject_name=$(timeout 3 openssl s_client -connect "$ip:443" < /dev/null 2>/dev/null | \
            openssl x509 -noout -text 2>/dev/null | \
            grep DNS: | \
            tr '\n' ' ' | \
            sed -e 's/[[:space:]]*DNS://g' -e 's/,/ /g' | tr -d '\n'
        )
        [[ -z $alt_subject_name ]] && alt_subject_name="-"
    else
        alt_subject_name="-"
    fi
    echo "$ip,$port,$alt_subject_name"
done < unique-ip-and-port.txt > certificate-lookups.csv

This script attempts to retrieve the SSL certificate from the IP address at port 443. If found, it extracts the DNS names on the certificate. The result may not be entirely correct: as the SSL connection is set up without a hostname, the server may present the default certificate on that host. For instance, connecting to an IP address of xebia.com will result in a certificate from kinsta.cloud.
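The grep, tr and sed steps in the script pull the DNS entries out of the Subject Alternative Name line that openssl x509 -noout -text prints. To illustrate what that extraction does, here is a small Python sketch that works on the text output. The function name san_dns_names and the sample text are made up for this example.

```python
import re


def san_dns_names(x509_text: str) -> list:
    """Extract the DNS entries from the text output of `openssl x509 -noout -text`."""
    return re.findall(r"DNS:([^,\s]+)", x509_text)


# Hand-written sample in the layout openssl uses for the SAN extension.
sample = """
            X509v3 Subject Alternative Name:
                DNS:example.com, DNS:www.example.com
"""
print(" ".join(san_dns_names(sample)) or "-")
```

As in the shell version, the names are joined with spaces so that they fit in a single CSV column, and a dash is used when no name is found.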

5.5. Identifying Destinations

After collecting the DNS and certificate names, merge the two datasets by typing:

paste -d, dns-lookups.csv certificate-lookups.csv > domain-and-certificate-names.csv

In the resulting file you will find all public IP traffic destinations annotated with DNS and certificate names.
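Note that paste places the two files side by side, so every row contains the IP address and port twice: ip,port,dns-name,ip,port,certificate-names. If you prefer one row per destination without the repeated columns, a keyed merge is one option. The following Python sketch assumes both files have exactly three comma-separated columns. The function name merge_lookups is hypothetical.

```python
import csv
import io


def merge_lookups(dns_csv: str, cert_csv: str) -> str:
    """Merge two 'ip,port,name' CSV texts into 'ip,port,dns_name,cert_names' rows."""
    certs = {}
    for row in csv.reader(io.StringIO(cert_csv)):
        if len(row) == 3:
            certs[(row[0], row[1])] = row[2]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(dns_csv)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        ip, port, dns_name = row
        writer.writerow([ip, port, dns_name, certs.get((ip, port), "-")])
    return out.getvalue()
```

Merging on the (ip, port) key also keeps the result correct when the two files are not in the same order.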

Conclusion

This series of Unix commands provides a quick way to extract which public IP addresses our applications connect to. The reverse DNS lookup and certificate retrieval for HTTP/HTTPS hosts help to identify the targets and to create an egress authorization allow list. Note that this is not a generic solution. Each situation is different, and you may need to add additional steps.

Image by Alex S. from Pixabay

Written by

Mark van Holsteijn

Mark van Holsteijn is a senior software systems architect at Xebia Cloud-native solutions. He is passionate about removing waste in the software delivery process and keeping things clear and simple.
