
How to analyze VPC flow logs and find suspicious destinations from the command line

30 Nov, 2024

In this blog I will show you how you can easily analyze VPC flow logs and find suspicious internet destinations from the command line. The process consists of the following five steps:

  1. Retrieve the flow log to a text file
  2. Limit the flow log to NAT gateway traffic
  3. Remove traffic to our own public IP addresses
  4. Remove traffic to Datadog
  5. Analyze the remaining flow logs

But first, what can we find in a VPC flow log?

VPC flow logs

VPC flow logs contain records of the network flows inside your VPC. A log entry is pretty basic, and contains at least the source and destination IP address, protocol, ports, as well as the amount of data transferred. A VPC flow log entry looks something like this:

2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK 

There is not a lot of information there. It is just data. The following table describes each field:

| Field | Value | Description |
|-------|-------|-------------|
| Version | 2 | The version of the flow log format. |
| Account ID | 123456789012 | The AWS account ID associated with the VPC. |
| ENI ID | eni-adf878fe | The Elastic Network Interface (ENI) ID. It identifies the specific network interface that the traffic is associated with. |
| Source IP | 10.10.1.10 | The private IP address of the source of the traffic (the instance or resource within the VPC). |
| Destination IP | 13.233.147.224 | The public IP address of the destination (an external IP address, possibly an internet service). |
| Source Port | 38511 | The port number on the source instance that initiated the traffic. |
| Destination Port | 443 | The port number on the destination that the traffic is targeting. Port 443 is commonly used for HTTPS traffic. |
| Protocol | 6 | The protocol used for the traffic. In this case, 6 corresponds to TCP (Transmission Control Protocol). |
| Packets | 56 | The number of packets transmitted during this flow. |
| Bytes | 60297 | The total number of bytes transferred during this flow. |
| Start Time | 1732951951 | The start time of the flow in Unix epoch time (seconds since January 1, 1970). |
| End Time | 1732951981 | The end time of the flow in Unix epoch time. |
| Action | ACCEPT | The traffic was allowed through the network interface (as per the security groups and network ACLs). |
| Log Status | OK | The status of the log entry; in this case, it shows that the logging was successful. |

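To see how these fields line up when processing records with awk (the field numbers $4, $5 and $7 are used throughout this post), here is the sample record pulled apart; the timestamp conversion assumes GNU date:

```shell
# The sample record from the table above.
record='2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK'

# awk splits on whitespace: source IP is $4, destination IP is $5,
# destination port is $7, bytes is $10.
echo "$record" | awk '{print "src:", $4, "dst:", $5, "port:", $7, "bytes:", $10}'
# → src: 10.10.1.10 dst: 13.233.147.224 port: 443 bytes: 60297

# The epoch timestamps become readable with GNU date:
date -u -d @1732951951 +'%Y-%m-%d %H:%M:%S'
# → 2024-11-30 07:32:31
```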
You can configure flow logs for your VPC to keep track of all of the traffic on your cloud network; I assume that you have already done that.

1. Retrieving Flow Log Records

To retrieve the VPC flow logs, I used the utility flowlogs_reader. It supports reading both from S3 buckets and from CloudWatch Logs. In my case, the flow logs are in CloudWatch, so I type:

$ flowlogs_reader --location-type cwl /vpc/flow-log > flow-logs.txt

To find out how many records were retrieved, type:

$ wc -l flow-logs.txt
1512436 flow-logs.txt

In this case, there are a whopping 1,512,436 records!

2. Limit the Flow Log to NAT Gateway Traffic

To limit the flow logs to outbound, NAT gateway-related traffic, perform the following steps:

  1. Retrieve the NAT gateway private IP addresses
  2. Filter to remove flow records with other source addresses
  3. Remove flow records from the NAT gateway into the VPC

2.1. Retrieving NAT Gateway Private IP Addresses

I am only interested in traffic going out from the NAT gateway. To get the private IP addresses of the NAT gateways in the VPC, type:

$ nat_gateways=$(aws ec2 describe-nat-gateways \
                  --query 'join(`\n`,NatGateways[].NatGatewayAddresses[].PrivateIp)' \
                  --output text)

2.2. An AWK Filter for NAT Gateway Traffic

With these private IP addresses, we can filter the flow logs for traffic originating from the NAT gateways, by typing:

$ awk_filter=$(sed -e 's/.*/$4 == "&"/g' <<< "$nat_gateways" | \
                paste -s -d '|' - | \
                sed -e 's/|/ || /g' \
                    -e 's/^/$4 != "-" \&\& (/' \
                    -e 's/$/) {print $0}/')

$ awk "$awk_filter" flow-logs.txt > from-nat-gateways.txt

Now we have all flow records that originated from the NAT gateways.
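To make the generated program concrete, here is the same construction with two hypothetical NAT gateway IPs. Note the escaped `\&` in the sed replacement text; a bare `&` in a sed replacement expands to the matched text, so `&&` must be escaped to survive:

```shell
# Two hypothetical NAT gateway private IPs instead of the real ones.
nat_gateways='10.10.0.5
10.10.1.5'

# Build an awk program: one equality test per NAT gateway IP.
awk_filter=$(sed -e 's/.*/$4 == "&"/g' <<< "$nat_gateways" | \
             paste -s -d '|' - | \
             sed -e 's/|/ || /g' \
                 -e 's/^/$4 != "-" \&\& (/' \
                 -e 's/$/) {print $0}/')
echo "$awk_filter"
# → $4 != "-" && ($4 == "10.10.0.5" || $4 == "10.10.1.5") {print $0}
```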

2.3. Filtering Return Traffic to the Requester

We focus on outgoing traffic from the NAT gateway to the public internet. To filter out any traffic that is destined for the VPC (CIDR 10.0.0.0/8), type:

$ awk '$5 !~ /^10\..*/{print $0}' from-nat-gateways.txt > from-nat-gateways-out.txt
$ wc -l from-nat-gateways-out.txt
28894 from-nat-gateways-out.txt

This is nice: the volume of flow records to analyze has already been reduced from 1.5 million down to about 29,000!
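The filter above assumes the VPC only uses 10.0.0.0/8. If your VPC also uses addresses from the other RFC 1918 ranges (172.16.0.0/12 and 192.168.0.0/16), the same awk approach extends to all three; a small self-contained illustration with made-up sample records:

```shell
# Three sample flow records (destination IP is field $5).
cat > /tmp/sample-flows.txt <<'EOF'
2 123456789012 eni-adf878fe 10.10.0.5 10.10.1.10 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK
2 123456789012 eni-adf878fe 10.10.0.5 172.20.1.9 38512 443 6 56 60297 1732951951 1732951981 ACCEPT OK
2 123456789012 eni-adf878fe 10.10.0.5 13.233.147.224 38513 443 6 56 60297 1732951951 1732951981 ACCEPT OK
EOF

# Keep only destinations outside all RFC 1918 private ranges.
awk '$5 !~ /^10\./ && $5 !~ /^192\.168\./ && $5 !~ /^172\.(1[6-9]|2[0-9]|3[01])\./' /tmp/sample-flows.txt
# → only the record with destination 13.233.147.224 remains
```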

3. Removing traffic to our own public IP addresses

To remove flow logs of traffic from our VPC to its own public IP addresses, do the following:

  1. Retrieve the public IP addresses in the VPC
  2. Use an Awk filter to remove flow records to our own IPs

3.1. Retrieve Our VPC’s Public IP Addresses

To find the public IP addresses in use in the VPC, type:

public_ips=$(aws ec2 describe-network-interfaces \
            --query 'join(`\n`,NetworkInterfaces[].PrivateIpAddresses[].Association.PublicIp)' \
            --output text)

3.2. Use AWK to Filter Traffic Back to Own IPs

To remove all flow logs of traffic back to the VPC’s public IP addresses, type:

awk_filter=$(sed -e 's/.*/$5 != "&"/g' <<< "$public_ips" | \
                paste -s -d '|' - | \
                sed -e 's/|/ \&\& /g' \
                    -e 's/$/ {print $0}/')
awk "$awk_filter" from-nat-gateways-out.txt > outgoing-public-internet.txt

4. Removing Outgoing Datadog Destinations

In our case, there is known outgoing traffic to Datadog. If you do not use Datadog, you can skip this step.

Filtering log records against a set of IP address ranges is not easy with standard Unix tools. Therefore, I created the Python script filter-datadog-ips. The following command removes all traffic flows to Datadog:

filter-datadog-ips < outgoing-public-internet.txt > outgoing-public-internet-without-datadog.txt
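The filter-datadog-ips script itself is not shown here, but as a sketch of what such a CIDR filter can look like in plain awk: a destination IP is inside a CIDR range when both, divided by 2^(32-prefix), yield the same network number. The range and records below are made up for illustration; in practice you would load the ranges Datadog publishes at ip-ranges.datadoghq.com:

```shell
# Hypothetical CIDR range (use the published Datadog ranges in practice).
cat > /tmp/ranges.txt <<'EOF'
3.233.144.0/20
EOF

# Two made-up flow records; the first one falls inside the range above.
cat > /tmp/flows.txt <<'EOF'
2 123456789012 eni-adf878fe 10.10.0.5 3.233.150.9 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK
2 123456789012 eni-adf878fe 10.10.0.5 13.233.147.224 38512 443 6 56 60297 1732951951 1732951981 ACCEPT OK
EOF

# Drop records whose destination ($5) falls inside any listed CIDR range.
awk '
function ip2int(ip,  a) { split(ip, a, "."); return a[1]*16777216 + a[2]*65536 + a[3]*256 + a[4] }
NR == FNR { split($1, c, "/"); d[++n] = 2 ^ (32 - c[2]); net[n] = int(ip2int(c[1]) / d[n]); next }
{ for (i = 1; i <= n; i++) if (int(ip2int($5) / d[i]) == net[i]) next; print }
' /tmp/ranges.txt /tmp/flows.txt > /tmp/filtered.txt

cat /tmp/filtered.txt
# → only the record with destination 13.233.147.224 remains
```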

5. Analyzing Remaining Traffic

After filtering, you can analyze what remains:

  • The number of unique destination IP addresses
  • The destination ports that are used
  • The DNS names of the IP addresses
  • The certificate names presented by the IP addresses
  • The identity of the destinations

5.1. Identifying Destination IP Addresses

To find the number of unique destination IP addresses, type:

$ awk '{print $5, $7}' outgoing-public-internet-without-datadog.txt | sort -u > unique-ip-and-port.txt
$ wc -l unique-ip-and-port.txt
26 unique-ip-and-port.txt

How cool is that! There are only 26 IP address and port combinations left to analyze. A great improvement over the original 1.5 million records!

5.2. Identifying Which Ports Are Used

To see which ports are being accessed, type:

$ awk '{print $2}' unique-ip-and-port.txt | sort | uniq -c | sort -r -n
  19 443
   3 80
   2 465
   1 30120
   1 2593

Clearly some HTTPS, HTTP and SMTP traffic, plus two vague destination ports.

5.3. DNS Reverse lookup of the IP Addresses

To get an idea of the destinations, perform a reverse DNS lookup of each IP address by typing:

cat unique-ip-and-port.txt | while read ip port ; do
    domain_name=$(dig +short -x "$ip" | head -1)
    echo "$ip,$port,${domain_name:--}"
done > dns-lookups.csv

This script performs a reverse lookup for each IP address. If unsuccessful, a dash is used instead.

5.4. Retrieving Certificate Names of the IP Addresses

For IP traffic to ports 80 or 443, it may be possible to read the names on the SSL certificates. To find out, type:

cat unique-ip-and-port.txt | while read ip port; do
    if [[ $port -eq 80 ]] || [[ $port -eq 443 ]]; then
        alt_subject_name=$(timeout 3 openssl s_client -connect "$ip":443 < /dev/null 2>/dev/null | \
            openssl x509 -noout -text 2>/dev/null | \
            grep DNS: | \
            tr '\n' ' ' | \
            sed -e 's/[[:space:]]*DNS://g' -e 's/,/ /g' | tr -d '\n')
        [[ -z $alt_subject_name ]] && alt_subject_name="-"
    else
        alt_subject_name="-"
    fi
    echo "$ip,$port,$alt_subject_name"
done > certificate-lookups.csv

This script attempts to retrieve the SSL certificate from the IP address at port 443. If found, it extracts the DNS names on the certificate. The result may not be entirely correct: as the SSL connection is set up without a hostname, the server may present the default certificate on that host. For instance, connecting to an IP address of xebia.com will result in a certificate from kinsta.cloud.

5.5. Identifying Destinations

After collecting the DNS and certificate name, merge the two datasets by typing:

paste -d, dns-lookups.csv certificate-lookups.csv > domain-and-certificate-names.csv

In the resulting file you will find all public IP traffic destinations annotated with DNS and certificate names.
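Because both CSV files start with the ip and port columns, the merged file repeats them as fields 4 and 5. If you want a single copy, cut can drop the duplicates; a sample line with made-up host names:

```shell
# One sample merged line: ip,port,dns-name,ip,port,certificate-name.
cat > /tmp/merged.csv <<'EOF'
13.233.147.224,443,host.example.com.,13.233.147.224,443,service.example.com
EOF

# Keep ip, port and DNS name (fields 1-3) plus the certificate name (field 6).
cut -d, -f1,2,3,6 /tmp/merged.csv
# → 13.233.147.224,443,host.example.com.,service.example.com
```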

Conclusion

This series of Unix commands provides a quick way to find out which public IP addresses our applications connect to. The reverse DNS lookups and certificate retrieval for HTTP/HTTPS hosts help identify the targets and help you create an egress authorization allow list. Note that this is not a generic solution: each situation is different, and you may need to add additional steps.


Image by Alex S. from Pixabay

Mark van Holsteijn
Mark van Holsteijn is a senior software systems architect at Xebia Cloud-native solutions. He is passionate about removing waste in the software delivery process and keeping things clear and simple.