Securing Your Sensitive Data on AWS: A Beginner’s Guide to Amazon Macie

27 Mar, 2024
The rise of artificial intelligence (AI) is transforming the data management and security landscape. The exponential growth of data has enabled businesses to innovate in their products and services and to make them more personalized. As organizations adopt cloud platforms such as Amazon Web Services (AWS) to modernize their data capabilities, they face the complexity of managing vast amounts of information spread across multiple AWS accounts. Each account typically corresponds to a distinct business unit or use case, which adds complexity not only in volume but also in the sensitivity and diversity of the data involved.

This data ranges from protected health information (PHI) and payment card industry (PCI) data to personally identifiable information (PII) and proprietary intellectual property. A data breach involving this information can cause significant financial and reputational losses. Identifying and safeguarding sensitive data scattered across accounts is therefore a critical challenge.

On AWS, Amazon Macie addresses this challenge. Macie helps answer the following fundamental questions, which are essential for robust data security and compliance:

  • Understanding Data Storage: What data do I have in my S3 buckets, and where is it located?
  • Assessing Data Exposure: How is my data being shared and stored, and is it publicly or privately accessible?
  • Near Real-Time Data Classification: How can I classify my data in near real-time to ensure constant vigilance?
  • Identifying Sensitive Information: What PII or PHI might be exposed publicly, and how can I mitigate this risk?
  • Automating Compliance and Security: How do I build and implement workflow remediation to meet my specific security and compliance needs?

What is Amazon Macie?

Amazon Macie is a data security service that uses machine learning and pattern matching to identify sensitive data in Amazon S3, offering insights into security risks and automated protection. It continuously evaluates S3 buckets against built-in and custom criteria for potential data security or privacy concerns, and it provides findings and detailed statistics for informed decision-making. Macie integrates with Amazon EventBridge and AWS Security Hub, which extends its capabilities for monitoring and remediating data security issues.

Use Cases of Amazon Macie

  • Compliance Monitoring and Reporting: Organisations subject to regulations like GDPR, HIPAA, or PCI-DSS can use Macie to automatically discover and classify sensitive data, ensuring compliance by identifying where this data resides and how it’s being used or accessed.
  • Intellectual Property Protection: Companies can leverage Macie to detect and protect intellectual property stored in S3 buckets, ensuring that proprietary information is not inadvertently exposed or accessed by unauthorised users.
  • Mergers and Acquisitions (M&A) Data Security Assessments: During M&A activities, Macie can be used to quickly assess the data security posture of acquired or merging entities, identifying sensitive data and checking that it complies with corporate policies and regulations.
  • Educational Institutions Protecting Student Information: Schools and universities can use Macie to safeguard student records and sensitive information, ensuring compliance with education-related privacy regulations.
  • Healthcare Data Management: Healthcare organisations can employ Macie to secure patient data, classifying and protecting health information in accordance with HIPAA and other health data protection standards.

Getting Started with Amazon Macie

Enabling Amazon Macie is straightforward. Macie is a regional service, so first select the Region you need from the Region selector in the top right corner of the AWS Console. Macie has to be enabled separately in each Region where you want to use it.
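If you prefer to script this step, the sketch below shows roughly how it could look with the boto3 `macie2` client. The helper name `enable_macie_in_region` and the Region in the usage comment are my own assumptions for illustration. The client is passed in as an argument, so the Region is whatever that client was created with.

```python
import uuid

# Publishing frequencies accepted by the Macie EnableMacie API.
ALLOWED_FREQUENCIES = {"FIFTEEN_MINUTES", "ONE_HOUR", "SIX_HOURS"}


def enable_macie_in_region(macie_client, publish_every="FIFTEEN_MINUTES"):
    """Enable Macie in the Region that macie_client is bound to.

    macie_client is expected to be a boto3 "macie2" client (assumption:
    the caller has already chosen the Region when creating it).
    """
    if publish_every not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported publishing frequency: {publish_every}")
    return macie_client.enable_macie(
        clientToken=str(uuid.uuid4()),  # idempotency token
        findingPublishingFrequency=publish_every,
        status="ENABLED",
    )


# Usage (requires boto3 and AWS credentials; the Region is only an example):
#   import boto3
#   macie = boto3.client("macie2", region_name="eu-west-1")
#   enable_macie_in_region(macie)
```

Because Macie is regional, you would repeat the call with a client for each Region you want to cover.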

In multi-account environments managed with AWS Organizations, you can monitor Macie usage across your organisation from the Usage page of the delegated administrator account. For this blog, I’m using a single-account setup.

As a prerequisite, we need to create an S3 bucket to store results. For every object it analyses, Macie records details in a ‘sensitive data discovery result’, including for objects it couldn’t analyse due to errors. Macie keeps these results for 90 days; for longer retention, you can configure Macie to save them in an S3 bucket.

Macie-01
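The same results-bucket setting can also be applied through the API. The following is a minimal sketch, assuming a boto3 `macie2` client is passed in; the helper names, the bucket name, and the key prefix are placeholders I made up. Macie expects a KMS key for encrypting the stored results, and the bucket policy and key policy must allow Macie to use them, which this sketch does not set up.

```python
def build_export_configuration(bucket_name, kms_key_arn, key_prefix="macie-results/"):
    """Build the configuration for storing sensitive data discovery results in S3."""
    return {
        "s3Destination": {
            "bucketName": bucket_name,
            "keyPrefix": key_prefix,     # optional path prefix inside the bucket
            "kmsKeyArn": kms_key_arn,    # key used to encrypt the results
        }
    }


def store_discovery_results(macie_client, bucket_name, kms_key_arn,
                            key_prefix="macie-results/"):
    """Point Macie at the results bucket (macie_client: boto3 "macie2" client)."""
    configuration = build_export_configuration(bucket_name, kms_key_arn, key_prefix)
    return macie_client.put_classification_export_configuration(
        configuration=configuration
    )


# Usage (placeholder names, requires boto3 and AWS credentials):
#   import boto3
#   macie = boto3.client("macie2")
#   store_discovery_results(macie, "my-macie-results-bucket",
#                           "arn:aws:kms:...")  # your own key ARN goes here
```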

  • To understand how Macie presents findings, you can generate sample findings. Go to Macie Console → Settings → Generate sample findings.

Macie-02
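Sample findings can be generated from code as well. This is a small sketch, assuming a boto3 `macie2` client; the helper name `generate_sample_findings` is my own. If no finding types are given, Macie generates one sample for each finding type it supports.

```python
def generate_sample_findings(macie_client, finding_types=None):
    """Ask Macie to create sample findings.

    finding_types: optional iterable such as
    ["SensitiveData:S3Object/Personal", "SensitiveData:S3Object/Financial"].
    When omitted, Macie creates one sample of every supported type.
    """
    kwargs = {}
    if finding_types:
        kwargs["findingTypes"] = list(finding_types)
    return macie_client.create_sample_findings(**kwargs)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   generate_sample_findings(boto3.client("macie2"))
```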

To view the findings, go to the Findings page in the same console.

Macie-02

As you can see in the screenshots above, Macie generates sample findings related to financial and personal data. You can drill down by selecting a specific finding, which shows the affected bucket, the total count of sensitive data occurrences, and the type of information detected.
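The same details (bucket, counts, type of information) are available programmatically. The sketch below is one way to pull and condense them, assuming a boto3 `macie2` client; the helper names and the shape of the summary dictionary are my own choices, and only a handful of fields from the finding are read.

```python
def summarise_finding(finding):
    """Reduce one Macie finding to the fields shown in the console drill-down."""
    bucket = finding.get("resourcesAffected", {}).get("s3Bucket", {}).get("name")
    result = finding.get("classificationDetails", {}).get("result", {})
    # Each sensitiveData entry groups detections by category,
    # for example PERSONAL_INFORMATION or FINANCIAL_INFORMATION.
    counts = {}
    for group in result.get("sensitiveData", []):
        counts[group.get("category")] = group.get("totalCount", 0)
    return {
        "type": finding.get("type"),
        "severity": finding.get("severity", {}).get("description"),
        "bucket": bucket,
        "sensitive_data": counts,
    }


def fetch_finding_summaries(macie_client, max_results=10):
    """List finding IDs, fetch the full findings, and summarise them."""
    finding_ids = macie_client.list_findings(maxResults=max_results)["findingIds"]
    if not finding_ids:
        return []
    findings = macie_client.get_findings(findingIds=finding_ids)["findings"]
    return [summarise_finding(finding) for finding in findings]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for summary in fetch_finding_summaries(boto3.client("macie2")):
#       print(summary)
```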

Let’s create a job now.

  • You can create a custom job in which you define specific buckets and criteria. Go to Jobs → Create job.

Macie-03

  • Select the bucket to scan. Under Refine scope, you can select the frequency of the job and the types of files to scan. For demo purposes, I chose a one-time job, but depending on your requirements you can schedule a job to run daily, weekly, or monthly.

Macie-04

  • In this job I want to detect dates of birth, so I’m choosing the managed data identifier “DATE_OF_BIRTH”. The job should detect dates that appear close to one of these keywords in a file: bday, b-day, birth date, birthday, date of birth, dob.

Macie-05

  • I created a sample file with the content below and uploaded it to the bucket.
Date of Birth: 1961-04-21
Future Date: 2024-03-04
--------------------
date of birth: 1912-01-20
Future Date: 2024-03-06
--------------------
dob: 1956-05-25
Future Date: 2024-03-16
  • This one-time job runs as soon as you finish creating it. To run the same scan again later, you can copy the job and create a new one from it.
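The console steps above can be sketched with the API as follows. This assumes a boto3 `macie2` client; the helper names, job name, account ID, and bucket name are placeholders of mine, not values from the demo. The request restricts the job to the DATE_OF_BIRTH managed data identifier, mirroring the selection made in the console.

```python
import uuid


def build_job_request(account_id, bucket_names, job_name,
                      identifier_ids=("DATE_OF_BIRTH",)):
    """Build a one-time classification job limited to the given managed identifiers."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": job_name,
        "jobType": "ONE_TIME",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
        # INCLUDE means: use only the managed data identifiers listed below.
        "managedDataIdentifierSelector": "INCLUDE",
        "managedDataIdentifierIds": list(identifier_ids),
    }


def create_one_time_job(macie_client, account_id, bucket_names, job_name):
    """Create the job and return its ID (macie_client: boto3 "macie2" client)."""
    request = build_job_request(account_id, bucket_names, job_name)
    return macie_client.create_classification_job(**request)["jobId"]


# Usage (placeholder values, requires boto3 and AWS credentials):
#   import boto3
#   job_id = create_one_time_job(boto3.client("macie2"), "111122223333",
#                                ["my-demo-bucket"], "dob-demo-job")
```

For a recurring job you would, as far as I understand the API, set `jobType` to `SCHEDULED` and add a `scheduleFrequency` entry; I have left that out to keep the sketch aligned with the one-time job used here.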

Within a few minutes, Macie shows the findings below.

Macie-06

What we just saw is an example of a managed data identifier: date of birth. These identifiers are defined by AWS. You can find a list here.
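To build some intuition for why the job flags the "Date of Birth" lines in the sample file but not the "Future Date" lines, here is a rough local imitation of keyword-proximity matching. This is not Macie's implementation, which AWS does not publish in this form; the date regex, the 30-character window, and the choice to look only at text earlier on the same line are all assumptions I made for the illustration.

```python
import re

# Keywords listed for the DATE_OF_BIRTH identifier in this post.
DOB_KEYWORDS = ("bday", "b-day", "birth date", "birthday", "date of birth", "dob")
# Assumption: only ISO-style dates (YYYY-MM-DD), as used in the sample file.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
MAX_DISTANCE = 30  # assumption: how far before the date a keyword may appear

SAMPLE = """Date of Birth: 1961-04-21
Future Date: 2024-03-04
--------------------
date of birth: 1912-01-20
Future Date: 2024-03-06
--------------------
dob: 1956-05-25
Future Date: 2024-03-16"""


def find_birth_dates(text, max_distance=MAX_DISTANCE):
    """Return dates that have a date-of-birth keyword shortly before them."""
    hits = []
    for line in text.splitlines():
        lowered = line.lower()
        for match in DATE_PATTERN.finditer(line):
            # Text on the same line, up to max_distance characters before the date.
            window = lowered[max(0, match.start() - max_distance):match.start()]
            if any(keyword in window for keyword in DOB_KEYWORDS):
                hits.append(match.group())
    return hits


print(find_birth_dates(SAMPLE))
# ['1961-04-21', '1912-01-20', '1956-05-25']
```

The three "Future Date" values are skipped because none of the keywords appears near them, which matches what the job reported.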

But let’s say you want to define custom criteria that are not in the AWS list. You can do that with a custom data identifier. With custom data identifiers, you can define detection criteria that reflect your organisation’s particular scenarios, intellectual property, or proprietary data, for example employee IDs, customer account numbers, or internal data classifications.
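As an illustration, a custom data identifier for employee IDs could be defined roughly like this. The ID format `EMP-` followed by six digits, the keywords, and the helper names are hypothetical. The sketch checks the pattern locally with Python's `re` module before sending anything; Macie supports its own subset of regex syntax, so a pattern that compiles in Python is not guaranteed to be accepted by Macie.

```python
import re


def build_custom_identifier_request(name, regex, keywords=(),
                                    max_match_distance=50, description=""):
    """Build a request for a Macie custom data identifier."""
    re.compile(regex)  # local sanity check only; raises re.error if malformed
    request = {"name": name, "regex": regex}
    if keywords:
        request["keywords"] = list(keywords)
        # Maximum number of characters between a keyword and the matched text.
        request["maximumMatchDistance"] = max_match_distance
    if description:
        request["description"] = description
    return request


def create_custom_identifier(macie_client, request):
    """Create the identifier and return its ID (macie_client: boto3 "macie2" client)."""
    response = macie_client.create_custom_data_identifier(**request)
    return response["customDataIdentifierId"]


# Hypothetical employee ID format: "EMP-" followed by six digits.
EMPLOYEE_ID_REQUEST = build_custom_identifier_request(
    name="employee-id",
    regex=r"EMP-\d{6}",
    keywords=("employee id", "emp no"),
    description="Illustrative internal employee ID format",
)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_custom_identifier(boto3.client("macie2"), EMPLOYEE_ID_REQUEST)
```

The returned ID can then be referenced when creating a job, so the job looks for your pattern in addition to (or instead of) the managed data identifiers.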

Macie Benefits

  • Compliance Assurance: With its capability to discover and classify sensitive data according to various regulatory standards, Macie assists organisations in meeting compliance requirements for regulations such as GDPR, HIPAA, and PCI-DSS, thereby mitigating the risk of compliance-related penalties.
  • Automated monitoring and actions: Amazon Macie seamlessly integrates with other AWS services like Amazon EventBridge, AWS Security Hub, and AWS Lambda, facilitating the creation of automated workflows which enables quick response and remediation actions.
  • Global Data Visibility: It provides organisations with a unified view of their sensitive data across multiple AWS regions and accounts, enhancing the ability to manage data security on a global scale. This visibility is crucial for multinational companies dealing with data residency and sovereignty issues.
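To make the automation point a little more concrete, here is a minimal sketch of an EventBridge rule that matches Macie findings of a given severity. It assumes a boto3 `events` client is passed in; the rule name and helper names are placeholders of mine. The rule on its own does nothing until you attach a target, such as a Lambda function that performs the remediation.

```python
import json


def build_macie_event_pattern(severities=("High",)):
    """Event pattern matching Macie findings with the given severity labels."""
    return {
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
        "detail": {"severity": {"description": list(severities)}},
    }


def create_findings_rule(events_client, rule_name, severities=("High",)):
    """Create or update the rule and return its ARN (events_client: boto3 "events" client)."""
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_macie_event_pattern(severities)),
        State="ENABLED",
    )
    return response["RuleArn"]


# Usage (placeholder rule name, requires boto3 and AWS credentials):
#   import boto3
#   arn = create_findings_rule(boto3.client("events"), "macie-high-severity")
#   # Then attach a target (for example a Lambda function) with put_targets.
```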

Macie pricing

Amazon Macie uses a usage-based pricing model. The main cost driver is the amount of data inspected for sensitive data discovery, and there is also a charge for the S3 buckets that Macie evaluates and monitors. Macie offers a 30-day free trial when you enable it for the first time, and during the trial it shows an estimate of what your monthly cost would be afterwards. Because Macie charges based on the data scanned, you can control cost by limiting what a job scans. For example, there may be no need to scan CloudTrail logs, and you can restrict a job to objects with specific file extensions or tags.
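One way to limit the data a job scans is to add a scoping block to the `s3JobDefinition` of a `create_classification_job` request. The sketch below builds such a block; the helper name, the extensions, and the `AWSLogs/` prefix are examples I picked, and you should check the supported keys and comparators against the current Macie API reference before relying on it.

```python
def build_scoping(include_extensions=(), exclude_key_prefixes=()):
    """Build a job scoping block that narrows which S3 objects get scanned."""
    scoping = {}
    if include_extensions:
        # Only scan objects whose file extension is in the list (no leading dot).
        scoping["includes"] = {
            "and": [{
                "simpleScopeTerm": {
                    "comparator": "EQ",
                    "key": "OBJECT_EXTENSION",
                    "values": list(include_extensions),
                }
            }]
        }
    if exclude_key_prefixes:
        # Skip objects whose key starts with one of these prefixes.
        scoping["excludes"] = {
            "and": [{
                "simpleScopeTerm": {
                    "comparator": "STARTS_WITH",
                    "key": "OBJECT_KEY",
                    "values": list(exclude_key_prefixes),
                }
            }]
        }
    return scoping


# Example: scan only text and CSV files, and skip a log prefix (assumed layout).
EXAMPLE_SCOPING = build_scoping(
    include_extensions=("txt", "csv"),
    exclude_key_prefixes=("AWSLogs/",),
)
# The result goes under request["s3JobDefinition"]["scoping"] of the job request.
```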

Conclusion

In a world where data breaches can lead to significant financial and reputational losses, the role of Amazon Macie in safeguarding sensitive data is invaluable. Macie brings benefits that enhance an organisation’s security posture. I hope this blog gives you a good understanding of AWS’s data security service and its importance.

Vikas Bange
Passionate about cloud technology, security and an enthusiastic learner. I believe in learning by sharing. Music fuels my journey, adding rhythm to my growth.