In this blog I will show you how to analyze VPC flow logs from the command line and find suspicious internet destinations. The process consists of the following 5 steps.
- Retrieve flow log to a text file
- Limit the flow log to NAT gateway traffic
- Remove traffic to our own public IP addresses
- Remove traffic to Datadog
- Analyze the remaining flow logs
But first, what can we find in a VPC flow log?
VPC flow logs
VPC flow logs contain records of the network flows inside your VPC. A log entry is pretty basic, and contains at least the source and destination IP address, protocol, ports, as well as the amount of data transferred. A VPC flow log entry looks something like this:
2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK
There is not a lot of information there. It is just data. In the following table you will find a description of each column:
Field | Value | Description |
---|---|---|
Version | 2 | This indicates the version of the flow log format. |
Account ID | 123456789012 | This is the AWS account ID associated with the VPC. |
ENI ID | eni-adf878fe | This is the Elastic Network Interface (ENI) ID. It identifies the specific network interface that the traffic is associated with. |
Source IP | 10.10.1.10 | This is the private IP address of the source of the traffic (the instance or resource within the VPC). |
Destination IP | 13.233.147.224 | This is the public IP address of the destination (an external IP address, possibly an internet service). |
Source Port | 38511 | This is the port number on the source instance that initiated the traffic. |
Destination Port | 443 | This is the port number on the destination that the traffic is targeting. Port 443 is commonly used for HTTPS traffic. |
Protocol | 6 | This represents the protocol used for the traffic. In this case, 6 corresponds to TCP (Transmission Control Protocol). |
Packets | 56 | This indicates the number of packets that were transmitted during this flow. |
Bytes | 60297 | This indicates the total number of bytes transferred during this flow. |
Start Time | 1732951951 | This is the start time of the flow in Unix epoch time (seconds since January 1, 1970). |
End Time | 1732951981 | This is the end time of the flow in Unix epoch time. |
Action | ACCEPT | This indicates that the traffic was allowed through the network interface (as per the security groups and network ACLs). |
Log Status | OK | This indicates the status of the log entry, in this case, it shows that the logging was successful. |
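
If you prefer to work with these fields by name rather than by column number, a record can simply be split on whitespace. Here is a minimal Python sketch, assuming the default version 2 format shown above. The `FlowRecord` type and the `parse_record` helper are names I made up for illustration; they are not part of any AWS tooling.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """One record in the default (version 2) flow log format."""
    version: str
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: str
    dstport: str
    protocol: str
    packets: str
    num_bytes: str
    start: str
    end: str
    action: str
    log_status: str


def parse_record(line: str) -> FlowRecord:
    """Split a space-separated flow log line into its 14 named fields."""
    return FlowRecord(*line.split())


record = parse_record(
    "2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 "
    "38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK"
)
# Convert the epoch start time into a readable UTC timestamp.
started = datetime.fromtimestamp(int(record.start), tz=timezone.utc)
print(record.dstaddr, record.dstport, started.isoformat())
```

All values stay strings on purpose, because records without data use a dash in place of numbers and addresses.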
You can configure flow logs for your VPC to keep track of all traffic on your cloud network. I assume that you have already done that.
1. Retrieving Flow Log Records
To retrieve the VPC flow logs, I used the utility flowlogs_reader. It supports both reading from S3 buckets as well as from CloudWatch logs. In my case, I have the flow logs in CloudWatch, so I type:
$ flowlogs_reader --location-type cwl /vpc/flow-log > flow-logs.txt
To find out how many records were retrieved, type:
$ wc -l flow-logs.txt
1512436 flow-logs.txt
In this case, there are a whopping 1,512,436 records!
2. Limit the Flow Log to NAT Gateway Traffic
To limit the flow logs to outbound, NAT gateway related traffic, perform the following steps:
- Retrieve the NAT gateway private IP addresses
- Filter to remove flow records with other source addresses
- Remove flow records from the NAT gateway into the VPC
2.1. Retrieving NAT Gateway Private IP Addresses
I am only interested in traffic going out from the NAT gateway. To get the private IP addresses of the NAT gateways in the VPC, type:
$ nat_gateways=$(aws ec2 describe-nat-gateways \
--query 'join(`\n`,NatGateways[].NatGatewayAddresses[].PrivateIp)' \
--output text)
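
If JMESPath is not your thing: the query walks `NatGateways[].NatGatewayAddresses[].PrivateIp` and joins the results with newlines. The Python sketch below does the same traversal on a trimmed, made-up response document. The gateway IDs and addresses are placeholders, and `private_ips` is a hypothetical helper of my own.

```python
import json

# Trimmed, made-up example of `aws ec2 describe-nat-gateways` JSON output.
SAMPLE_RESPONSE = """
{
  "NatGateways": [
    {"NatGatewayId": "nat-0example1",
     "NatGatewayAddresses": [{"PrivateIp": "10.10.0.5"}]},
    {"NatGatewayId": "nat-0example2",
     "NatGatewayAddresses": [{"PrivateIp": "10.10.1.5"}]}
  ]
}
"""


def private_ips(response: dict) -> list[str]:
    """Collect PrivateIp from every address of every NAT gateway."""
    return [
        address["PrivateIp"]
        for gateway in response.get("NatGateways", [])
        for address in gateway.get("NatGatewayAddresses", [])
        if "PrivateIp" in address
    ]


# One address per line, like the join() in the query.
print("\n".join(private_ips(json.loads(SAMPLE_RESPONSE))))
```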
2.2. An AWK Filter for NAT Gateway Traffic
With the private IP addresses, we can filter the flow logs for traffic originating from the NAT gateways, by typing:
$ awk_filter=$(sed -e 's/.*/$4 == "&"/g' <<< "$nat_gateways" | \
paste -s -d '|' - | \
sed -e 's/|/ || /g' \
-e 's/^/$4 != "-" \&\& (/' \
-e 's/$/) {print $0}/')
$ awk "$awk_filter" flow-logs.txt > from-nat-gateways.txt
So, now we have all flow records that originated from the NAT gateways.
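
The sed and paste pipeline is a bit cryptic, so it helps to see the Awk program this step is meant to generate. The Python sketch below builds the same expression for two made-up NAT gateway addresses. `build_source_filter` is a hypothetical helper for illustration, not part of the pipeline.

```python
def build_source_filter(nat_gateway_ips: list[str]) -> str:
    """Return the Awk program that selects records by source address (field 4)."""
    conditions = " || ".join(f'$4 == "{ip}"' for ip in nat_gateway_ips)
    return f'$4 != "-" && ({conditions}) {{print $0}}'


# Hypothetical private addresses, for illustration only.
print(build_source_filter(["10.10.0.5", "10.10.1.5"]))
```

If the filter does not seem to select anything sensible, running `echo "$awk_filter"` and comparing it with this shape is a quick sanity check.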
2.3. Filtering Return Traffic to the Requester
We focus on outgoing traffic from the NAT gateway to the public internet. To filter out any traffic that is destined for the VPC (CIDR 10.0.0.0/8), type:
$ awk '$5 !~ /^10\..*/{print $0}' from-nat-gateways.txt > from-nat-gateways-out.txt
$ wc -l from-nat-gateways-out.txt
28894 from-nat-gateways-out.txt
This is nice: the volume of flow records to analyze has already dropped from 1.5 million to about 29,000!
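
The regular expression works because the VPC CIDR happens to align with the first octet. For a CIDR that does not, such as a /20, a string prefix will not do. A minimal sketch using Python's `ipaddress` module, assuming the same 10.0.0.0/8 range, could look like this. The `is_outbound` helper and the second sample record are my own inventions.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/8")  # CIDR taken from the text above


def is_outbound(line: str) -> bool:
    """True when the destination (field 5) lies outside the VPC CIDR."""
    fields = line.split()
    try:
        destination = ipaddress.ip_address(fields[4])
    except (IndexError, ValueError):
        return False  # skip malformed records and "-" placeholders
    return destination not in VPC_CIDR


sample = [
    # The record from the introduction: destination on the public internet.
    "2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK",
    # Made-up record: destination inside the VPC, so it is filtered out.
    "2 123456789012 eni-adf878fe 10.10.1.10 10.10.2.20 443 38511 6 40 5120 1732951951 1732951981 ACCEPT OK",
]
outbound = [line for line in sample if is_outbound(line)]
print(len(outbound))
```

Note one difference with the Awk one-liner: this sketch drops records whose destination is a dash, where the regular expression would keep them.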
3. Removing traffic to our own public IP addresses
To remove flow logs of traffic from our VPC to its own public IP addresses, do the following:
- Retrieve the public IP addresses in the VPC
- Use an Awk filter to remove flow records to our own IPs
3.1. Retrieve Our VPC’s Public IP Addresses
To find the public IP addresses offered by the VPC, type:
public_ips=$(aws ec2 describe-network-interfaces \
--query 'join(`\n`,NetworkInterfaces[].PrivateIpAddresses[].Association.PublicIp)' \
--output text)
3.2. Use AWK to Filter Traffic Back to Own IPs
To remove all flow logs of traffic back to the VPC’s public IP addresses, type:
awk_filter=$(sed -e 's/.*/$5 != "&"/g' <<< "$public_ips" | \
paste -s -d '|' - | \
sed -e 's/|/ \&\& /g' \
-e 's/$/ {print $0}/')
awk "$awk_filter" from-nat-gateways-out.txt > outgoing-public-internet.txt
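
With many public IP addresses, the generated chain of comparisons gets long. As an alternative, here is a small Python sketch that does the same exclusion with a set lookup. The `drop_own_destinations` helper and the address 203.0.113.25 (from a documentation range) are assumptions for illustration.

```python
def drop_own_destinations(lines, own_public_ips):
    """Yield flow records whose destination (field 5) is not one of our own IPs."""
    own = set(own_public_ips)
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[4] not in own:
            yield line


sample = [
    "2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK",
    # Made-up record towards one of "our own" public addresses.
    "2 123456789012 eni-adf878fe 10.10.1.10 203.0.113.25 40122 443 6 12 2048 1732951951 1732951981 ACCEPT OK",
]
remaining = list(drop_own_destinations(sample, ["203.0.113.25"]))
print(len(remaining))
```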
4. Removing Outgoing Datadog Destinations
In our case, there is known outgoing traffic to Datadog. If you do not use Datadog, you can skip this step.
It was not easy to filter log records against a set of IP address ranges. Therefore, I created the Python script filter-datadog-ips. The following command removes all traffic flows to Datadog:
filter-datadog-ips < outgoing-public-internet.txt > outgoing-public-internet-without-datadog.txt
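
To give an idea of what such a filter involves, here is a minimal sketch of range-based filtering with Python's `ipaddress` module. This is not the actual filter-datadog-ips script. The ranges below are placeholders from the documentation address blocks, and you would have to substitute the ranges that Datadog publishes for your site.

```python
import ipaddress

# Placeholder ranges (documentation address blocks), NOT real Datadog ranges.
EXCLUDED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_excluded_range(address: str) -> bool:
    """True when the address falls inside one of the excluded ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP address, e.g. "-"
    return any(ip in network for network in EXCLUDED_RANGES)


def filter_lines(lines):
    """Yield flow records whose destination (field 5) is not excluded."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and not in_excluded_range(fields[4]):
            yield line


sample = [
    "2 123456789012 eni-adf878fe 10.10.1.10 13.233.147.224 38511 443 6 56 60297 1732951951 1732951981 ACCEPT OK",
    # Made-up record towards an address inside a placeholder range.
    "2 123456789012 eni-adf878fe 10.10.1.10 192.0.2.44 40122 443 6 12 2048 1732951951 1732951981 ACCEPT OK",
]
kept = list(filter_lines(sample))
print(len(kept))
```

To use it as a command line filter, you could feed `sys.stdin` to `filter_lines` and write the result to `sys.stdout`.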
5. Analyzing Remaining Traffic
After filtering, you can analyze what remains:
- The number of unique destination IP addresses
- The number of unique destination ports
- The DNS name of the IP addresses
- The certificate name of the IP address
- Identifying destinations
5.1. Identifying Destination IP Addresses
To find the number of unique destination IP address and port combinations, type:
$ awk '{print $5, $7}' outgoing-public-internet-without-datadog.txt | sort -u > unique-ip-and-port.txt
$ wc -l unique-ip-and-port.txt
26 unique-ip-and-port.txt
How cool is that! I have only 26 destination address and port combinations left to analyze. A great improvement from 1.5 million!
5.2. Identifying Which Ports Are Used
To see which ports are being accessed, type:
$ awk '{print $2}' unique-ip-and-port.txt | sort | uniq -c | sort -r -n
19 443
3 80
2 465
1 30120
1 2593
Clearly some HTTPS, HTTP and SMTP traffic, plus two unfamiliar destination ports.
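
The same count can be produced in Python, with a label for the ports we recognize. This is a rough sketch: the `summarize_ports` helper, the small `KNOWN_PORTS` table and the sample addresses are all my own additions for illustration.

```python
from collections import Counter

# Common meanings of a few well-known ports; anything else is "unknown".
KNOWN_PORTS = {"80": "HTTP", "443": "HTTPS", "465": "SMTP over TLS"}


def summarize_ports(lines):
    """Count destination ports in 'ip port' lines, most frequent first."""
    ports = [line.split()[1] for line in lines if len(line.split()) >= 2]
    counts = Counter(ports)
    return [
        (port, n, KNOWN_PORTS.get(port, "unknown"))
        for port, n in counts.most_common()
    ]


# Made-up addresses from the documentation ranges, for illustration only.
sample = ["192.0.2.10 443", "192.0.2.11 443", "198.51.100.7 80", "203.0.113.9 30120"]
for port, n, label in summarize_ports(sample):
    print(n, port, label)
```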
5.3. DNS Reverse lookup of the IP Addresses
To get an idea of the destination, perform a reverse DNS lookup of each IP address by typing:
cat unique-ip-and-port.txt | while read ip port ; do
domain_name=$(dig +short -x $ip)
echo "$ip,$port,${domain_name:--}"
done > dns-lookups.csv
This script performs a reverse lookup for each IP address. If unsuccessful, a dash is used instead.
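
If you would rather do this from Python, the standard library offers `socket.gethostbyaddr`. The sketch below wraps it with the same dash fallback. The `reverse_lookup` and `csv_row` helpers are hypothetical names, and the `resolver` parameter exists only so the lookup can be swapped out when trying the code offline.

```python
import socket


def reverse_lookup(ip: str, resolver=socket.gethostbyaddr) -> str:
    """Return the host name for ip, or "-" when the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return "-"
    return hostname


def csv_row(ip: str, port: str, resolver=socket.gethostbyaddr) -> str:
    """Format one 'ip,port,name' line like the shell loop above."""
    return f"{ip},{port},{reverse_lookup(ip, resolver)}"
```

Keep in mind that `gethostbyaddr` goes through the system resolver, so unlike dig it may also consult the local hosts file.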
5.4. Retrieving Certificate Names of the IP Addresses
For IP traffic to ports 80 or 443, it may be possible to read the names on the SSL certificates. To find out, type:
cat unique-ip-and-port.txt |while read ip port; do
if [[ $port -eq 80 ]] || [[ $port -eq 443 ]]; then
alt_subject_name=$(timeout 3 openssl s_client -connect $ip:443 < /dev/null 2>/dev/null | \
openssl x509 -noout -text 2>/dev/null | \
grep DNS: | \
tr '\n' ' ' | \
sed -e 's/[ ]*DNS://g' -e 's/,/ /g' | tr -d '\n'
)
[[ -z $alt_subject_name ]] && alt_subject_name="-"
else
alt_subject_name="-"
fi
echo "$ip,$port,$alt_subject_name"
done > certificate-lookups.csv
This script attempts to retrieve the SSL certificate from the IP address at port 443. If found, it extracts the DNS names on the certificate. The result may not be entirely correct: as the SSL connection is set up without a hostname, the server may present the default certificate for that host. For instance, connecting to an IP address of xebia.com will result in a certificate from kinsta.cloud.
5.5. Identifying Destinations
After collecting the DNS and certificate names, merge the two datasets by typing:
paste -d, dns-lookups.csv certificate-lookups.csv > domain-and-certificate-names.csv
In the resulting file you will find all public IP traffic destinations annotated with DNS and certificate names.
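
Note that paste glues the lines together as they are, so the IP address and port appear twice on each line. If that bothers you, a join on IP address and port keeps a single copy. Here is a minimal Python sketch of such a join, using made-up lookup results. `merge_lookups` is a hypothetical helper.

```python
import csv
import io


def merge_lookups(dns_csv: str, cert_csv: str) -> list[tuple[str, str, str, str]]:
    """Join both lookups on (ip, port) into (ip, port, dns_name, cert_names)."""
    cert_names = {
        (row[0], row[1]): row[2]
        for row in csv.reader(io.StringIO(cert_csv))
        if len(row) >= 3
    }
    merged = []
    for row in csv.reader(io.StringIO(dns_csv)):
        if len(row) >= 3:
            ip, port, dns_name = row[0], row[1], row[2]
            # Fall back to a dash when no certificate lookup exists.
            merged.append((ip, port, dns_name, cert_names.get((ip, port), "-")))
    return merged


# Made-up lookup results, for illustration only.
dns = "192.0.2.10,443,host.example.\n198.51.100.7,80,-\n"
cert = "192.0.2.10,443,example.com www.example.com\n198.51.100.7,80,-\n"
for row in merge_lookups(dns, cert):
    print(",".join(row))
```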
Conclusion
This series of Unix commands provides a quick way to extract which public IP addresses our applications connect to. The reverse DNS lookup and certificate retrieval for HTTP/HTTPS hosts will help to identify the targets and help you create an egress authorization allow list. Note that this is not a generic solution. Each situation is different, and you may need to add additional steps.