Also consider providing task-specific roles to your users. For example, a user who is allowed to make changes to the platform does not need to assume a role with write access when they are only examining the platform.
Minimize blast radius
It’s naive to think that your environment will never be compromised: at some point it will happen. We also know that when it does, the next question will be: “what is the impact of this breach?”, and how painful the answer could be: “it affects all our production apps”. In AWS it is very common to apply a multi-account strategy. An AWS account provides the highest level of isolation, and a customer can have as many AWS accounts as needed. In the early days, running multiple AWS accounts added considerable overhead to operational tasks, but with recent developments around AWS Organizations, AWS Control Tower, AWS Transit Gateway, AWS Resource Access Manager, et cetera, most of those challenges have been solved. At least to the point where they can no longer be an excuse not to apply a multi-account strategy.
Commonly applied strategies are:
Business unit oriented strategy (account per business unit, multiple applications and owners per account).
Application lifecycle oriented strategy (account per DTAP stage: development, test, acceptance and production; multiple applications and owners per account).
Cost-center oriented strategy (account per cost center, multiple applications and potentially multiple owners per account).
Application oriented strategy (account per application, single owner).
Owner oriented strategy (single owner per account, multiple applications per account).
User oriented strategy (one account per user, mainly used for sandboxing purposes).
Based on our experience, I prefer an application oriented strategy over the other strategies. It gives the smallest possible blast radius, fine-grained insight and flexibility in ownership and access.
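To make the blast-radius argument concrete, here is a small Python sketch. The workload inventory and account names are hypothetical; it only counts how many applications end up sharing one account under three of the strategies above, which is what a single compromised account would affect in the worst case.

```python
from collections import defaultdict

# Hypothetical inventory: (application, business unit, DTAP stage) per workload.
WORKLOADS = [
    ("webshop", "retail", "prod"),
    ("webshop", "retail", "test"),
    ("billing", "finance", "prod"),
    ("billing", "finance", "test"),
    ("reporting", "finance", "prod"),
]


def account_for(workload, strategy):
    """Return the (hypothetical) account a workload lands in under a given strategy."""
    application, business_unit, stage = workload
    if strategy == "business_unit":
        return business_unit
    if strategy == "lifecycle":
        return stage
    if strategy == "application":
        return application
    raise ValueError(f"unknown strategy: {strategy}")


def worst_case_blast_radius(workloads, strategy):
    """Largest number of distinct applications that share a single account.

    If that one account is compromised, all of those applications are affected.
    """
    apps_per_account = defaultdict(set)
    for workload in workloads:
        apps_per_account[account_for(workload, strategy)].add(workload[0])
    return max(len(apps) for apps in apps_per_account.values())


for strategy in ("business_unit", "lifecycle", "application"):
    print(strategy, worst_case_blast_radius(WORKLOADS, strategy))
```

Running it prints business_unit 2, lifecycle 3 and application 1: with one application per account, a compromised account only affects that application.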
Reduce the effective time that you are vulnerable
Losing or leaking passwords is a very common mistake. It happens by accident or through a careless way of working. Think of passwords that are exposed over unencrypted channels, committed to source code repositories, left on a sticky note under the keyboard or on the monitor, or that are easily guessable. Once credentials are in the hands of an attacker, they can be abused without any limitation. Adding OTP (2FA) and/or using temporary credentials provided by an identity provider reduces the time during which stolen credentials are valid to a minimum.
For this threat I recommend implementing OTP on the corporate identity provider (with a reasonable expiration period for the STS credentials) or enforcing MFA for IAM users (e.g. don’t grant read or write access to the platform without a valid MFA token).
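A minimal sketch of enforcing MFA on IAM users, written as a Python dict that serializes to an IAM policy document. It follows the common deny-unless-MFA pattern; the list of exempted actions (so users can still set up their MFA device) is an assumption to adapt, and `mfa_condition_matches` is a hypothetical toy that mimics only the condition, not the IAM evaluation engine.

```python
import json

# Deny everything except MFA self-service when the request was not authenticated with MFA.
# The exempted actions are an assumption; extend them to match your own onboarding flow.
REQUIRE_MFA_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllExceptMfaSetupWithoutMfa",
            "Effect": "Deny",
            "NotAction": [
                "iam:CreateVirtualMFADevice",
                "iam:EnableMFADevice",
                "iam:ListMFADevices",
                "iam:ListVirtualMFADevices",
                "iam:ResyncMFADevice",
                "sts:GetSessionToken",
            ],
            "Resource": "*",
            # BoolIfExists also matches when the key is missing (e.g. calls signed with
            # long-term access keys), so those requests are denied as well.
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        }
    ],
}


def mfa_condition_matches(request_context):
    """Toy re-implementation of the BoolIfExists condition above (True means the Deny applies)."""
    mfa = request_context.get("aws:MultiFactorAuthPresent")
    return mfa is None or str(mfa).lower() == "false"


print(json.dumps(REQUIRE_MFA_POLICY, indent=2))
```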
Enhance your IAM policies with additional attributes
IAM policies on AWS can be written in a very fine-grained way. In addition to specifying which actions are allowed, policies can be enhanced with conditions. These conditions must be met before an action is allowed. This can be used to control who has access to what based on context-aware information, such as the source IP address or details of the authentication process. Multiple conditions can be combined to form a solid policy statement.
Interesting conditions are:
aws:MultiFactorAuthPresent: Use this to require MFA before allowing access to AWS services or resources.
aws:SourceIp: Use this condition to specify the IP addresses or ranges from which actions are allowed, for example only your company’s known locations.
aws:RequestedRegion: Use this condition to limit actions to specific AWS regions. Attackers tend to use the most popular AWS region, us-east-1, so you might only want to allow an authenticated and authorized user access to the regions you actually use. We are based in Europe and therefore mainly use the regions eu-{west,central}-1.
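An exhaustive list of global condition keys can be found in the AWS documentation (“AWS global condition context keys”). Some services have their own condition keys; those are listed per service in the Service Authorization Reference.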
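As an illustration of combining condition keys, the sketch below allows EC2 actions only with MFA, from an office IP range and in the two EU regions mentioned above. The CIDR is a documentation placeholder and `conditions_met` is a hypothetical toy evaluator. Keep in mind that aws:SourceIp does not hold your public IP for requests through VPC endpoints or for calls that AWS services such as CloudFormation make on your behalf, so an IP condition can break those.

```python
import ipaddress

ALLOWED_REGIONS = ["eu-west-1", "eu-central-1"]
OFFICE_CIDRS = ["203.0.113.0/24"]  # placeholder (documentation range); use your own egress ranges

# All condition operators in one statement must match (logical AND);
# multiple values for a single key are OR-ed.
EC2_FROM_OFFICE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Ec2FromOfficeInEuRegionsWithMfa",
            "Effect": "Allow",
            "Action": "ec2:*",
            "Resource": "*",
            "Condition": {
                "Bool": {"aws:MultiFactorAuthPresent": "true"},
                "IpAddress": {"aws:SourceIp": OFFICE_CIDRS},
                "StringEquals": {"aws:RequestedRegion": ALLOWED_REGIONS},
            },
        }
    ],
}


def conditions_met(ctx):
    """Toy evaluation of the three conditions above; not the real IAM engine."""
    mfa_ok = ctx.get("aws:MultiFactorAuthPresent") is True
    source = ipaddress.ip_address(ctx["aws:SourceIp"])
    ip_ok = any(source in ipaddress.ip_network(cidr) for cidr in OFFICE_CIDRS)
    region_ok = ctx["aws:RequestedRegion"] in ALLOWED_REGIONS
    return mfa_ok and ip_ok and region_ok
```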
Reduce IAM privileges and implement Service Control Policies
Administrator users often have too many privileges on the platform. It is very common for those users to have the AdministratorAccess managed IAM policy attached, while most of their operations are read-only and write operations are limited to a very small subset of calls. For most tasks it is sufficient to have write access via Infrastructure as Code. You can require CloudFormation calls to use a dedicated service role that holds the write access. This way, making changes to the platform requires deliberate effort and specific knowledge of the platform. I know, it’s not foolproof. But everything that slows down an attacker helps.
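One way to do this, sketched under assumptions: give administrators the AWS managed ReadOnlyAccess policy for daily work, and allow stack changes only when they run under a dedicated CloudFormation service role, using the cloudformation:RoleArn condition key together with iam:PassRole. The account ID and role name are hypothetical, and you should check in the Service Authorization Reference which CloudFormation actions support that condition key before extending the action list.

```python
import json

# Hypothetical account ID and role name for the CloudFormation service role that holds write access.
DEPLOY_ROLE_ARN = "arn:aws:iam::111122223333:role/cfn-deploy"

# AWS managed policy for day-to-day inspection; attach it next to ADMIN_POLICY.
READ_ONLY_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/ReadOnlyAccess"

ADMIN_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Stack changes are only allowed when they run under the deploy role.
            "Sid": "StackChangesOnlyViaDeployRole",
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:UpdateStack",
                "cloudformation:CreateChangeSet",
            ],
            "Resource": "*",
            "Condition": {"StringEquals": {"cloudformation:RoleArn": DEPLOY_ROLE_ARN}},
        },
        {
            # The user may hand the deploy role to CloudFormation, and to nothing else.
            "Sid": "PassDeployRoleToCloudFormationOnly",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": DEPLOY_ROLE_ARN,
            "Condition": {"StringEquals": {"iam:PassedToService": "cloudformation.amazonaws.com"}},
        },
    ],
}

print(json.dumps(ADMIN_POLICY, indent=2))
```

With this in place, a deployment has to name the role explicitly, for example `aws cloudformation deploy --template-file template.yaml --stack-name my-stack --role-arn arn:aws:iam::111122223333:role/cfn-deploy` (template and stack names are placeholders).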
Implementing Service Control Policies (SCPs) is also recommended. I prefer to apply SCPs in a coarse-grained way. This way you can restrict, Organization-wide, actions that should ‘never happen’.
Protect landing zone assets
Prevent attackers (and users) from disabling foundation services such as your VPC, GuardDuty, CloudTrail, etc. I assume you’re using a pipeline to deploy your landing zone assets. In your SCPs you can deny write access to foundation services for every identity except your pipeline roles. Note that SCPs never grant permissions themselves; they only limit what IAM policies can allow.
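A sketch of such an SCP as a Python dict: it denies a few tampering actions on CloudTrail, GuardDuty and VPC resources for every principal except a pipeline role. The role name `landing-zone-pipeline` and the action list are assumptions, and `scp_denies`/`arn_like` are hypothetical helpers that roughly mimic the Deny and ArnNotLike semantics. Remember that SCPs do not apply to the Organization’s management account.

```python
from fnmatch import fnmatchcase

# Hypothetical name of the role your landing-zone pipeline assumes in every account.
PIPELINE_ROLE_PATTERN = "arn:aws:iam::*:role/landing-zone-pipeline"

PROTECT_FOUNDATION_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectFoundationServices",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
                "guardduty:DeleteDetector",
                "guardduty:UpdateDetector",
                "ec2:DeleteVpc",
                "ec2:DeleteFlowLogs",
            ],
            "Resource": "*",
            # Everyone is denied, except principals whose ARN matches the pipeline role.
            "Condition": {"ArnNotLike": {"aws:PrincipalArn": PIPELINE_ROLE_PATTERN}},
        }
    ],
}


def arn_like(arn, pattern):
    """Approximate ArnLike: compare the six colon-separated ARN parts with wildcards."""
    arn_parts, pattern_parts = arn.split(":", 5), pattern.split(":", 5)
    return len(arn_parts) == len(pattern_parts) == 6 and all(
        fnmatchcase(a, p) for a, p in zip(arn_parts, pattern_parts)
    )


def scp_denies(action, principal_arn):
    """Toy evaluation of the statement above for a single action."""
    statement = PROTECT_FOUNDATION_SCP["Statement"][0]
    return action in statement["Action"] and not arn_like(principal_arn, PIPELINE_ROLE_PATTERN)
```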
Restricting access to subscription calls
Some API calls are expensive and have a long-term impact. Think of API calls for buying (Convertible) Reserved Instances, enabling AWS Shield Advanced, and other long-running subscriptions. Even in normal situations you want to prevent these calls and grant them only to specific roles. This works best when the restrictions are applied at the Organization root level.
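A sketch of a root-level SCP for long-term commitments, plus a helper that creates it and attaches it to the Organization root with boto3’s Organizations client. The exempted role name is hypothetical and the action list is a starting point, not exhaustive. SCPs have a documented size limit of 5,120 characters, which is why the policy is serialized without whitespace.

```python
import json

# Hypothetical role that is still allowed to buy long-term commitments.
SUBSCRIPTION_ADMIN_ROLE = "arn:aws:iam::*:role/subscription-admin"

DENY_SUBSCRIPTIONS_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLongTermCommitments",
            "Effect": "Deny",
            "Action": [
                "ec2:PurchaseReservedInstancesOffering",
                "ec2:AcceptReservedInstancesExchangeQuote",
                "rds:PurchaseReservedDBInstancesOffering",
                "savingsplans:CreateSavingsPlan",
                "shield:CreateSubscription",
            ],
            "Resource": "*",
            "Condition": {"ArnNotLike": {"aws:PrincipalArn": SUBSCRIPTION_ADMIN_ROLE}},
        }
    ],
}

SCP_MAX_CHARS = 5120  # documented maximum size of a single SCP document


def compact(policy):
    """Serialize without whitespace to stay well within the SCP size limit."""
    return json.dumps(policy, separators=(",", ":"))


def attach_to_root(policy, name="deny-long-term-commitments"):
    """Create the SCP and attach it to the Organization root (management account only)."""
    import boto3  # lazy import: inspecting the policy above does not need boto3

    org = boto3.client("organizations")
    created = org.create_policy(
        Content=compact(policy),
        Description="Deny long-term commitments except for the subscription-admin role",
        Name=name,
        Type="SERVICE_CONTROL_POLICY",
    )
    root_id = org.list_roots()["Roots"][0]["Id"]
    org.attach_policy(PolicyId=created["Policy"]["PolicySummary"]["Id"], TargetId=root_id)
```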
Denying access to unused services and regions
Attackers will try to use whatever service they can. The fewer services are allowed, the lower their success rate will be. The same goes for regions. If you know that your workloads only use a subset of the available AWS regions, it might be a good idea to deny all other regions (be careful: some global services and resources require access to the us-east-1 region).
Tip: use the NotAction element in a Deny statement to whitelist services.
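A sketch that combines both ideas in one SCP: a Deny with NotAction that blocks every service not on an allow-list, and a Deny with NotAction plus aws:RequestedRegion that blocks other regions while exempting global services. Both lists are hypothetical and the global-service list is partial; `is_denied` is a toy evaluator that matches action names case-sensitively, whereas IAM does not.

```python
from fnmatch import fnmatchcase

ALLOWED_REGIONS = ["eu-west-1", "eu-central-1"]

# Global services that are served from us-east-1; exempt them from the region rule.
# Partial list (an assumption): review it against the services you actually use.
GLOBAL_SERVICES = [
    "iam:*", "organizations:*", "sts:*", "route53:*", "cloudfront:*",
    "support:*", "budgets:*", "ce:*", "health:*",
]

# Hypothetical allow-list of services your workloads use.
ALLOWED_SERVICES = [
    "ec2:*", "s3:*", "lambda:*", "dynamodb:*", "cloudformation:*",
    "cloudwatch:*", "logs:*", "cloudtrail:*", "guardduty:*",
] + GLOBAL_SERVICES

SERVICES_AND_REGIONS_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # NotAction in a Deny: every action NOT on the list is denied.
            "Sid": "DenyUnlistedServices",
            "Effect": "Deny",
            "NotAction": ALLOWED_SERVICES,
            "Resource": "*",
        },
        {
            # Deny non-global actions outside the allowed regions.
            "Sid": "DenyOtherRegions",
            "Effect": "Deny",
            "NotAction": GLOBAL_SERVICES,
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": ALLOWED_REGIONS}},
        },
    ],
}


def matches_any(action, patterns):
    """Wildcard match of an action against patterns (case-sensitive, unlike IAM)."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)


def is_denied(action, region):
    """Toy evaluation of both Deny statements for one action in one region."""
    if not matches_any(action, ALLOWED_SERVICES):
        return True
    return not matches_any(action, GLOBAL_SERVICES) and region not in ALLOWED_REGIONS
```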
Proactively monitor behavior
AWS provides a number of services that can be used to analyze behavior on the AWS platform. As a starting point, I recommend configuring CloudTrail to record activity in all accounts and regions, preferably at the Organization level, and enabling Amazon GuardDuty. And don’t wait until you have been compromised to start gathering insight into activity.
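A sketch of that starting point with boto3, assuming it runs with management-account credentials and that the S3 bucket already has a bucket policy allowing CloudTrail to write to it. The trail and bucket names are placeholders. For GuardDuty, designating a delegated administrator and auto-enabling member accounts is the more scalable route; the per-region loop below only covers the account it runs in, and `create_detector` fails if a detector already exists.

```python
def organization_trail_params(trail_name, bucket_name):
    """Keyword arguments for boto3's CloudTrail create_trail: one trail, all regions, all accounts."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,         # bucket policy must allow CloudTrail to write
        "IsMultiRegionTrail": True,          # record activity in every region
        "IsOrganizationTrail": True,         # record activity in every member account
        "IncludeGlobalServiceEvents": True,  # include events from global services such as IAM
        "EnableLogFileValidation": True,     # detect tampering with delivered log files
    }


def enable_baseline(trail_name, bucket_name, regions, home_region="eu-west-1"):
    """Create the organization trail and a GuardDuty detector per region (management account)."""
    import boto3  # lazy import so the parameter builder works without boto3 installed

    cloudtrail = boto3.client("cloudtrail", region_name=home_region)
    cloudtrail.create_trail(**organization_trail_params(trail_name, bucket_name))
    cloudtrail.start_logging(Name=trail_name)  # a trail created via the API does not log until started

    # GuardDuty is a regional service: it needs a detector in every region you want covered.
    for region in regions:
        boto3.client("guardduty", region_name=region).create_detector(Enable=True)
```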
Periodically review IAM policies
Use IAM Access Analyzer to discover resources that are (unintentionally) shared with principals outside your zone of trust. It gives you an overview of resources that are accessible from outside that zone.
To improve your IAM policies, use IAM Access Advisor to see which services your users actually use and when they last used them. This is valuable input for tuning IAM policies. Consider removing unused services from the list of allowed actions.
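A sketch of using the Access Advisor data programmatically: `fetch_last_accessed` wraps IAM’s GenerateServiceLastAccessedDetails/GetServiceLastAccessedDetails job (first page of results only), and `unused_services` lists services not used in the last 90 days. The 90-day threshold and the sample data are assumptions.

```python
import time
from datetime import datetime, timedelta, timezone


def unused_services(services_last_accessed, now, max_idle_days=90):
    """Namespaces not used within max_idle_days, i.e. candidates to remove from a policy.

    Expects items shaped like IAM's ServicesLastAccessed entries; a missing
    LastAuthenticated means the service was not used in the tracking period.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = [
        svc["ServiceNamespace"]
        for svc in services_last_accessed
        if svc.get("LastAuthenticated") is None or svc["LastAuthenticated"] < cutoff
    ]
    return sorted(stale)


def fetch_last_accessed(principal_arn):
    """Fetch last-accessed data for one IAM user, role or policy (first page only)."""
    import boto3  # lazy import; the analysis function above needs only the standard library

    iam = boto3.client("iam")
    job_id = iam.generate_service_last_accessed_details(Arn=principal_arn)["JobId"]
    while True:
        response = iam.get_service_last_accessed_details(JobId=job_id)
        if response["JobStatus"] == "COMPLETED":
            return response["ServicesLastAccessed"]
        if response["JobStatus"] == "FAILED":
            raise RuntimeError(f"last-accessed job {job_id} failed")
        time.sleep(2)  # job is still IN_PROGRESS


# Hypothetical sample data in the same shape.
sample = [
    {"ServiceNamespace": "s3", "LastAuthenticated": datetime(2023, 12, 20, tzinfo=timezone.utc)},
    {"ServiceNamespace": "ec2", "LastAuthenticated": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"ServiceNamespace": "sagemaker"},
]
print(unused_services(sample, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

On the sample data it prints ['ec2', 'sagemaker']: candidates to drop from the policy after checking with the owner.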
Monitor activity
One of the most difficult questions to answer is: “are you currently under attack?”. Observability is mandatory to answer this question. Logging all activity is easy, but dividing events into two categories, expected and unexpected, is a lot more difficult. How do you define expected events? It requires a lot of tuning and produces a large number of false positives. And what do you do with new types of events, for example those introduced by new features and service launches?
Amazon GuardDuty can help here. It is a fully managed AWS service with a very straightforward configuration: you simply enable the service and it starts monitoring your environment. It analyzes CloudTrail events, VPC Flow Logs and Route 53 DNS logs to detect anomalies, using threat intelligence, anomaly detection and machine learning models. It supports a large number of finding types and the list is still growing. A good example is the use of your AWS credentials. GuardDuty builds up a baseline of your behaviour. If, for example, you use your credentials from your (home) office and suddenly they are being used from a different geographical location, GuardDuty will generate a finding. This finding can be forwarded to your favorite monitoring system. By consuming events from GuardDuty you gain insight into what is happening on the platform. For some findings you can even put auto-remediation in place by triggering a Lambda function when the finding occurs.
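A sketch of forwarding and auto-remediation: an EventBridge rule pattern that matches GuardDuty findings with severity 7 or higher, and a Lambda handler that logs a compact version of the finding. The remediation decision is a hypothetical policy and the handler only returns it; the actual remediation (e.g. deactivating an access key) is left out.

```python
import json

# EventBridge rule pattern: GuardDuty findings with severity 7 or higher.
GUARDDUTY_HIGH_SEVERITY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}


def severity_label(severity):
    """Map GuardDuty's numeric severity to its label (Critical 9+, High 7+, Medium 4+)."""
    if severity >= 9.0:
        return "CRITICAL"
    if severity >= 7.0:
        return "HIGH"
    if severity >= 4.0:
        return "MEDIUM"
    return "LOW"


def handler(event, context):
    """Lambda target for the rule above: log the finding and decide on a follow-up."""
    detail = event["detail"]
    finding = {
        "id": detail["id"],
        "type": detail["type"],
        "severity": severity_label(float(detail["severity"])),
        "account": detail["accountId"],
        "region": detail["region"],
    }
    print(json.dumps(finding))  # lands in CloudWatch Logs; forward to your monitoring system
    # Hypothetical policy: auto-remediate serious IAM credential findings, notify on the rest.
    serious = finding["severity"] in ("HIGH", "CRITICAL")
    if serious and finding["type"].startswith("UnauthorizedAccess:IAMUser/"):
        return {"action": "remediate", **finding}
    return {"action": "notify", **finding}
```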
Configure budget alerts
Cloud is mainly based on pay-as-you-go consumption models, so significant changes in service usage lead to financial changes. The most obvious is a sudden rise in costs, e.g. when an attacker uses your environment for crypto mining or when data is being exfiltrated (data transfer out is expensive in the cloud). But an unexpected drop in costs can also happen: an attacker might stop or terminate resources to wipe their traces, or to hurt you by deleting data. Monitor changes in spending closely to gain insight into behaviour. Setting a fixed budget alert might help, but it is probably not the smartest approach. AWS and cloud management platforms provide forecasting and cost anomaly detection. Leverage those features to keep a close eye on your costs and to reduce the number of false positives generated by budget alerts.
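To illustrate why a trailing baseline beats a fixed threshold, here is a toy z-score detector that flags both spikes and drops in daily spend. It is a stand-in for the idea, not how AWS Cost Anomaly Detection works internally; the window, threshold and cost figures are hypothetical.

```python
from statistics import mean, stdev


def cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost is more than `threshold` standard deviations from recent normal days.

    Returns (day_index, cost, "spike" | "drop") tuples: both directions are suspicious.
    """
    baseline = []  # only non-anomalous days, so one spike does not mask the next anomaly
    anomalies = []
    for day, cost in enumerate(daily_costs):
        if len(baseline) >= window:
            recent = baseline[-window:]
            mu = mean(recent)
            # Floor sigma so a perfectly flat history does not flag every tiny change.
            sigma = max(stdev(recent), 0.01 * abs(mu), 1e-9)
            z = (cost - mu) / sigma
            if abs(z) > threshold:
                anomalies.append((day, cost, "spike" if z > 0 else "drop"))
                continue
        baseline.append(cost)
    return anomalies


# Hypothetical daily spend: a crypto-mining style spike on day 7, resources wiped on day 11.
costs = [100, 102, 98, 101, 99, 100, 103, 480, 101, 100, 99, 10]
print(cost_anomalies(costs))
```

It prints [(7, 480, 'spike'), (11, 10, 'drop')]. Anomalous days are kept out of the baseline, so the spike on day 7 does not hide the drop on day 11.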
Final thoughts
With this blog I didn’t intend to come up with an exhaustive list of security controls. I did intend to give some insight into how to reduce the impact of a breach. We all know that software (and hardware) has its weak spots when it comes to security, and we also know that we might be faced with an attacker. There is a lot we can do to reduce the impact in case something happens. Hopefully this blog contributes to making your cloud environment a safe place.
Note: The issue had been solved by the time this blog was released #noworries.