VPC Lattice offers a new mechanism to connect microservices across AWS accounts and across VPCs in a developer-friendly way. If VPC Lattice lives up to its promise, it could change the way we design landing zones. The question is: do you think VPC Lattice is ready for prime time? If you're designing a brand new landing zone today, would you introduce VPC Lattice right now? Or if you have an existing landing zone with AWS Transit Gateway, do you already plan to replace it with VPC Lattice? In this blog post, we'll look at what VPC Lattice is and how it compares to the alternatives, and we'll share some opinions on whether we would adopt it right now.
Context: One of our core values at Xebia is sharing knowledge. Twice a month, we gather with co-workers and organize an internal conference with presentations, discussions, brainstorms and workshops. Last month, we went through Amazon's VPC Lattice workshop with 6 co-workers and discussed our findings. Here's a summary!
What problem does VPC Lattice solve?
Across customers, we see a trend of breaking down large monoliths into smaller services. Increasingly, services are owned by teams where each team has their own AWS account, each with its own VPC. This is a great way to simplify IAM permissions (each team can access only their own AWS account) and keep track of costs (each team can track their AWS spending at the account level). However, it also creates the challenge of inter-connecting VPCs across accounts.
How did we solve this before VPC Lattice?
There are numerous technologies that help you interconnect VPCs across accounts:
- VPC peering allows connections between 2 VPCs. It is simple and straightforward but does not scale well when the number of VPCs grows.
- Transit VPCs are a specific hub-and-spoke network topology that attempts to make VPC peering more scalable. However, this requires a self-managed network router in the Transit VPC and you need to take great care to avoid overlapping CIDR ranges across VPCs.
- AWS Transit Gateway was first introduced at re:Invent 2018 and offers a managed service to inter-connect VPCs. It removes the need for a self-managed network router and gives fine-grained control over layer 3 network traffic routing. However, you need to ensure there are no overlapping CIDR ranges across VPCs. The developers creating the microservices typically don't like to spend time on network configurations and look for network specialists to set up connectivity.
- AWS PrivateLink was first introduced in November 2017 and offers a mechanism to access services across AWS accounts and VPCs without worrying about overlapping CIDR ranges. AWS PrivateLink is particularly well-suited for connectivity with SaaS-based services where you want a strict separation between partner and customer. You can also use AWS PrivateLink to inter-connect your VPCs across accounts. However, in a larger setup you would end up with a separate point-to-point connection for each pair of service and consumer, and that number keeps growing. This becomes costly and hard to maintain.
- AWS Resource Access Manager allows you to share a single large VPC across multiple accounts. This is a simple and often overlooked strategy that gives the best of both worlds: strict separation of IAM policies and cost attribution with simple inter-connection at the network level. On the flip side of this argument, you have limited isolation between services. One service could exhaust the available IP addresses and impact the scaling of other services. For small scale setups or for early adopters of IPv6 (which is worth a separate blog post) this could be an acceptable risk.
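To put a number on "does not scale well", here is a small back-of-the-envelope sketch in Python. It assumes the worst case where every VPC needs to reach every other VPC (a full mesh); the function names are ours and purely illustrative.

```python
def peering_connections(vpc_count: int) -> int:
    """Point-to-point connections needed for a full mesh: n * (n - 1) / 2."""
    if vpc_count < 0:
        raise ValueError("vpc_count must not be negative")
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count: int) -> int:
    """Attachments needed when every VPC connects to one central hub."""
    if vpc_count < 0:
        raise ValueError("vpc_count must not be negative")
    return vpc_count


for n in (3, 10, 50):
    print(f"{n} VPCs: {peering_connections(n)} peerings vs {hub_attachments(n)} hub attachments")
```

With 10 VPCs, a full mesh already needs 45 peering connections, whereas a hub-and-spoke design needs only 10 attachments.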
So far, the most common solution has been AWS Transit Gateway because it scales best to increasing levels of complexity. It is for this reason that the title of this blog post draws a comparison between VPC Lattice and AWS Transit Gateway.
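The catch with Transit Gateway is the requirement for non-overlapping CIDR ranges. A minimal sketch of how you could check a set of VPCs for overlaps, using only Python's standard `ipaddress` module, is shown here. The VPC names and ranges are made up for illustration.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_cidrs(vpc_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VPC names whose CIDR ranges overlap."""
    networks = {name: ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical VPCs: "shipping" sits inside the range used by "orders".
vpcs = {
    "orders": "10.0.0.0/16",
    "payments": "10.1.0.0/16",
    "shipping": "10.0.128.0/17",
}
print(overlapping_cidrs(vpcs))
```

In this example the check reports the pair `orders` and `shipping`, which you would have to resolve before routing between them at layer 3.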
What are the core concepts in VPC Lattice?
At the highest level, VPC Lattice distinguishes Services and Service Networks. A Service is the unit of access that the development teams care about, either when consuming a service dependency or when offering their own service to other consumers. A Service Network is what joins multiple services logically to allow connectivity. A service can be associated with more than one Service Network.
To connect a service to target workloads, you use the concept of Target Groups. This resembles a familiar concept from Elastic Load Balancing. A target group can refer to instances, IP addresses, a Lambda function or an Application Load Balancer. It is also possible to refer to an Auto Scaling Group and automatically add or remove instances as it scales. In this case, a bit counter-intuitively, you need to configure the Auto Scaling Group to refer to the VPC Lattice target group.
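The direction of that last link is easy to get wrong, so here is a hedged sketch of what it could look like with boto3. The helper name and its parameters are ours; `lattice` is assumed to be `boto3.client("vpc-lattice")` and `autoscaling` to be `boto3.client("autoscaling")`. Check the call parameters against the current boto3 documentation before relying on them.

```python
def register_asg_with_lattice(lattice, autoscaling, *, asg_name, vpc_id,
                              target_group_name, port=80):
    """Create an INSTANCE target group and point an Auto Scaling Group at it.

    The clients are passed in as arguments (assumption: boto3 "vpc-lattice"
    and "autoscaling" clients) so the sketch can be tried with stand-ins.
    """
    target_group = lattice.create_target_group(
        name=target_group_name,
        type="INSTANCE",
        config={"port": port, "protocol": "HTTP", "vpcIdentifier": vpc_id},
    )
    # The link is configured on the Auto Scaling Group, not on the target group.
    autoscaling.attach_traffic_sources(
        AutoScalingGroupName=asg_name,
        TrafficSources=[{"Identifier": target_group["arn"], "Type": "vpc-lattice"}],
    )
    return target_group["arn"]
```

Once attached, instances launched or terminated by the Auto Scaling Group should be registered with or removed from the target group without further calls.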
What are the strengths of VPC Lattice?
As mentioned in the introduction, we went through the VPC Lattice workshop. During and after the workshop, we asked our panel what they liked about VPC Lattice and what concerns they would have about adopting it today. Let's start with the positives that came up:
- Participants liked the fact that VPC Lattice works with readable domain names and that it is possible to customize these through CNAME records in Route 53.
- Participants were particularly happy that you can secure services and service networks using IAM policies. IAM policies allow for fine-grained authorization at the API level.
- Participants were happy with the service abstraction across EC2 / EKS / Lambda, which allows inter-connectivity between different stacks without worrying about the underlying details.
- Participants running into issues with IAM permissions praised the readability of the error messages that make it easy to pinpoint the problem and resolve it. If you have ever found yourself weeding through CloudTrail logs and needing aws sts decode-authorization-message, you'll value the simplicity.
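To give an idea of what "fine-grained authorization" can look like, here is a minimal sketch that builds an auth policy allowing one role to make only GET requests to a service. The ARNs are placeholders, and the action `vpc-lattice-svcs:Invoke` and condition key `vpc-lattice-svcs:RequestMethod` reflect our understanding of the VPC Lattice auth policy documentation; please verify them before use.

```python
import json


def read_only_auth_policy(service_arn: str, caller_role_arn: str) -> dict:
    """Build an auth policy that lets one IAM role send GET requests only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": caller_role_arn},
                "Action": "vpc-lattice-svcs:Invoke",  # assumed action name
                "Resource": f"{service_arn}/*",
                "Condition": {
                    # Assumed condition key for the HTTP method of the request.
                    "StringEquals": {"vpc-lattice-svcs:RequestMethod": "GET"}
                },
            }
        ],
    }


# Placeholder ARNs, not real resources.
policy = read_only_auth_policy(
    "arn:aws:vpc-lattice:eu-west-1:111122223333:service/svc-0123456789abcdef0",
    "arn:aws:iam::111122223333:role/orders-client",
)
print(json.dumps(policy, indent=2))
```

The resulting JSON document is what you would attach to a service or service network as its auth policy.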
What are today's caveats you need to be aware of?
With all these strengths and positives, you might think: what are we waiting for? Why not adopt VPC Lattice today? The answer depends on your context and requirements. There are a few things to take into consideration:
- VPC Lattice is great if all your cross-VPC network traffic consists of HTTP/HTTPS/gRPC. In a well-architected microservice architecture, there is a good chance this is true. However, it does have consequences. For example, there would be no cross-VPC SSH or RDP traffic. You can work around this using Session Manager instead of SSH or Fleet Manager instead of RDP. And network traffic between the microservice and its database typically remains within the same VPC, so that is not a problem.
- VPC Lattice comes with a service quota of 10 Gbps per Availability Zone (AZ) and 10k requests per second (RPS) per AZ. This is sufficient for most use cases. However, if you have a very large scale setup, you need to take this into account.
- VPC Lattice charges an hourly fee for each service, on top of usage-based charges. If you're coming from a setup using Application Load Balancers in front of EC2 instances, VPC Lattice pricing looks quite similar. However, if you are coming from an AWS Lambda based serverless setup that scales to zero, VPC Lattice does add a significant fixed minimum cost to your bill.
- Unlike AWS Transit Gateway, VPC Lattice currently has no way of facilitating cross-region connectivity. You would use it for a single-region solution only.
- Finally, Region Availability is somewhat limited. The good news is: AWS has already added 4 regions to the list just 2 weeks ago.
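For the quota caveat, a quick sanity check could look like the sketch below. The quota values are the ones mentioned above; the helper and the assumption that traffic spreads evenly across AZs are ours, so treat the result as a rough indication only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzQuota:
    """Per-AZ limits as quoted in this post."""
    gbps: float = 10.0
    requests_per_second: int = 10_000


def fits_within_quota(peak_gbps: float, peak_rps: float, az_count: int,
                      quota: AzQuota = AzQuota()) -> bool:
    """Check peak load against the per-AZ quota, assuming an even spread over AZs."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return (peak_gbps / az_count <= quota.gbps
            and peak_rps / az_count <= quota.requests_per_second)


print(fits_within_quota(peak_gbps=12, peak_rps=24_000, az_count=3))  # 8,000 RPS per AZ
print(fits_within_quota(peak_gbps=12, peak_rps=36_000, az_count=3))  # 12,000 RPS per AZ
```

The first workload fits, while the second exceeds the request quota per AZ even though its bandwidth is well within limits.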
How does VPC Lattice compare to AWS Transit Gateway?
Leaving aside the maturity, feature support and region availability, the fundamental difference between AWS Transit Gateway and VPC Lattice is that Transit Gateway is a network layer 3 construct whereas VPC Lattice is a layer 7 construct. Because of this, Transit Gateway serves as a coarse-grained mechanism for routing IP traffic regardless of protocol, port or packet content. It is completely unaware of whether you are sending SSH traffic or HTTPS traffic and most certainly does not care about your HTTP headers, methods or paths.
VPC Lattice, being a layer 7 construct, understands the HTTP protocol and gives you fine-grained control over where to send your traffic based on exactly these parameters: HTTP headers, methods and paths. In addition to this fine-grained targeting, it also allows fine-grained IAM permissions that take these parameters into account.
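The difference is easiest to see side by side. The toy model below is not how either service is implemented; it only contrasts the inputs each layer can base its decision on. All names, routes and rules are hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Optional


def route_layer3(routes: dict[str, str], destination_ip: str) -> Optional[str]:
    """Layer 3: pick a target by longest-prefix match on the destination IP only."""
    address = ip_address(destination_ip)
    matches = [
        (ip_network(cidr), target)
        for cidr, target in routes.items()
        if address in ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


@dataclass
class HttpRule:
    """Layer 7 rule: match on method, path prefix and headers."""
    target: str
    method: Optional[str] = None
    path_prefix: str = "/"
    headers: dict[str, str] = field(default_factory=dict)

    def matches(self, method: str, path: str, headers: dict[str, str]) -> bool:
        return ((self.method is None or self.method == method)
                and path.startswith(self.path_prefix)
                and all(headers.get(k) == v for k, v in self.headers.items()))


def route_layer7(rules: list[HttpRule], method: str, path: str,
                 headers: dict[str, str]) -> Optional[str]:
    """Layer 7: the first rule that matches the HTTP request wins."""
    for rule in rules:
        if rule.matches(method, path, headers):
            return rule.target
    return None


routes = {"10.0.0.0/8": "shared-attachment", "10.1.0.0/16": "payments-attachment"}
rules = [
    HttpRule(target="payments-v2", path_prefix="/payments", headers={"x-canary": "true"}),
    HttpRule(target="payments-v1", path_prefix="/payments"),
]

print(route_layer3(routes, "10.1.2.3"))
print(route_layer7(rules, "GET", "/payments/42", {"x-canary": "true"}))
print(route_layer7(rules, "GET", "/payments/42", {}))
```

The layer 3 function never sees anything beyond the IP address, while the layer 7 function can send the same destination to different targets depending on a header.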
Conclusion: Can VPC Lattice replace AWS Transit Gateway today?
The answer is: maybe. If you have a clean HTTPS-based microservice architecture that runs in a single region at reasonable scale, you should definitely consider VPC Lattice. It'll give you fine-grained control over security through IAM permissions and remove a lot of the burden of managing large, multi-VPC networks. That said, it is early days and changing your cross-VPC/cross-account network connectivity is an invasive procedure. So if you are a typical early adopter and this looks interesting, we recommend that you at least do a PoC!
If you're still deliberating and want to have a chat about this, feel free to drop me an e-mail at email@example.com.
With thanks to Jacco Kulman, Joris Conijn, Kevin Kessels, Konstantinos Bessas, Laurens Knoll and Tibor Hercz for participating in the VPC Lattice workshops and sharing opinions that served as inputs for this blog post.
Image license for banner image: Creative Commons CC0