Introduction
With an ever-expanding digital universe, data storage has become a crucial aspect of every organization’s IT strategy. The cloud, particularly Amazon Web Services (AWS), has made storing vast amounts of data simpler than ever before. However, without a well-thought-out strategy, costs can quickly pile up.
Whether you’re an IT professional managing vast amounts of enterprise data, a small business owner seeking to control costs, or simply an AWS user looking to make the most of your cloud storage, this blog will provide valuable inputs to help you optimize your AWS storage bill effectively. Below, we will explore various strategies to optimize your storage costs on AWS, from choosing the appropriate storage classes and leveraging lifecycle policies to utilizing intelligent tiering and implementing data archival and deletion strategies.
S3 Storage
Anyone who uses AWS will sooner or later encounter S3, one of the platform’s most popular storage services. Customers often use S3 to store vast amounts of data but forget to configure lifecycle management policies.
Lifecycle Management for Amazon S3 buckets is a feature that lets you automate moving your data between storage classes and deleting it once it’s no longer needed. This helps you optimize costs and manage your data effectively throughout its lifecycle. Proper lifecycle management of S3 objects can reduce both costs and carbon emissions.
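The lifecycle rules described above can be sketched in Python. This is a minimal example, assuming the boto3 SDK is used to apply it; the prefix, the day thresholds, and the function names are hypothetical and chosen only for illustration. The builder produces a configuration dict that transitions objects to cheaper storage classes and eventually expires them.

```python
def build_lifecycle_configuration(prefix, ia_after_days=30,
                                  glacier_after_days=90,
                                  expire_after_days=365):
    """Return a lifecycle configuration dict for one prefix.

    Objects move to Standard-IA, then to Glacier Flexible Retrieval
    (storage class name "GLACIER"), and are deleted at the end.
    """
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    rule_name = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": "tier-and-expire-" + rule_name,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle_configuration(bucket, configuration):
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

For example, `build_lifecycle_configuration("logs/")` yields a single rule for everything under `logs/`. Note that applying a lifecycle configuration replaces whatever configuration the bucket already has, so existing rules would need to be merged in first.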
Selecting the appropriate storage class allows you to store your data most cost-effectively, based on how frequently it needs to be accessed and how quickly it needs to be retrievable.
The following table gives you an overview of the S3 storage classes and the characteristics that drive their cost.
Storage Class | Designed For | Retrieval Charge | Min. Billable Object Size | First Byte Latency | Min. Storage Duration |
S3 Standard | Frequently accessed data with millisecond access. Accessing it more than once a month. | None | None | Milliseconds | None |
S3 Standard IA | Long-lived, infrequently accessed data (once a month) with millisecond access. | per GB retrieved | 128 KB | Milliseconds | 30 Days |
S3 Intelligent Tiering | Data with unknown, changing, or unpredictable access patterns. | None | None | Milliseconds | None |
S3 One Zone IA | Recreatable, infrequently accessed data (once a month) with millisecond access. | per GB retrieved | 128 KB | Milliseconds | 30 Days |
S3 Glacier Instant Retrieval | Long-lived archive data accessed once a quarter with millisecond access. | per GB retrieved | 128 KB | Milliseconds | 90 Days |
S3 Glacier Deep Archive | Long-lived archive data accessed less than once a year with retrieval times of hours. | per GB retrieved | None | Hours | 180 Days |
S3 Glacier Flexible Retrieval | Long-lived archive data accessed once a year with retrieval times of minutes to hours. | per GB retrieved | None | Minutes or Hours | 90 Days |
Objects stored in Glacier Deep Archive and Glacier Flexible Retrieval are archived, so you must restore them before you can access them.
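The "Designed For" column above can be read as a simple decision procedure. The following sketch is one possible interpretation, not an official AWS rule: the numeric thresholds (12 and 4 accesses per year) and the function and parameter names are assumptions derived from the "once a month", "once a quarter", and "once a year" wording in the table.

```python
def suggest_storage_class(accesses_per_year, needs_millisecond_access=True,
                          recreatable=False, pattern_known=True):
    """Suggest an S3 storage class name from the table above."""
    if not pattern_known:
        # Unknown or changing access patterns
        return "S3 Intelligent Tiering"
    if accesses_per_year > 12:
        # Accessed more than once a month
        return "S3 Standard"
    if needs_millisecond_access:
        if accesses_per_year > 4:
            # Roughly monthly access; One Zone only if data can be recreated
            return "S3 One Zone IA" if recreatable else "S3 Standard IA"
        # Roughly quarterly access or less, still with millisecond latency
        return "S3 Glacier Instant Retrieval"
    if accesses_per_year >= 1:
        # About once a year, retrieval in minutes to hours is acceptable
        return "S3 Glacier Flexible Retrieval"
    # Less than once a year, retrieval in hours is acceptable
    return "S3 Glacier Deep Archive"
```

A real decision would also weigh the minimum storage duration and minimum billable object size from the table, since many small or short-lived objects can make an infrequent-access class more expensive than S3 Standard.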
Storage Lens: the key to understanding S3 buckets
Amazon S3 Storage Lens provides a comprehensive view of your object-storage usage and activity across your entire organization for detailed analysis of storage patterns. With S3 Storage Lens, you can identify critical trends like fastest-growing buckets and prefixes, enabling efficient storage management and planning. It also helps you identify potential areas for savings, such as buckets without S3 Lifecycle rules. In addition, you can analyze S3 costs stemming from noncurrent object versions, delete markers, or incomplete multipart uploads.
You can find Storage Lens in the AWS Console by going to S3 and looking at the bottom of the menu on the left, which will show the default dashboard along with a summary of storage and cost optimization. You can also enable advanced metrics and recommendations features for extra assistance and information, all of which can help you learn how to configure Lifecycle rules for S3 buckets.
Storage Lens allows you to:
- Reduce the number of noncurrent versions: If S3 Versioning is enabled without corresponding lifecycle rules to transition or expire noncurrent versions, these previous versions accumulate and keep incurring storage costs even though nobody actively manages or removes them.
- Locate incomplete multipart uploads: Multipart uploads in Amazon S3 allow you to upload large objects (up to 5 TB) in smaller parts. This method enhances throughput and provides quicker recovery from network issues. However, if the multipart upload process is interrupted, the already-uploaded parts remain in the bucket in an unusable state. These incomplete parts will continue to incur storage costs until the upload is either completed or the incomplete parts are manually removed.
- Uncover cold Amazon S3 buckets: Utilise S3 Storage Lens advanced metrics to assess the “coldness” of your S3 buckets, indicating rarely or never accessed data, content, or objects. Key metrics like GET requests and Download Bytes help determine your buckets’ daily access frequency. Track these metrics over months to identify consistent access patterns, detect unused buckets, and move data accordingly.
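Once Storage Lens has surfaced noncurrent versions and incomplete multipart uploads, a lifecycle rule can clean them up automatically. The sketch below builds such a rule as a Python dict in the shape the boto3 S3 client accepts; the rule ID, the default day counts, and the function name are hypothetical.

```python
def build_cleanup_rule(noncurrent_days=30, abort_multipart_days=7):
    """Return one lifecycle rule that removes storage nobody is using."""
    if noncurrent_days < 1 or abort_multipart_days < 1:
        raise ValueError("day counts must be at least 1")
    return {
        "ID": "cleanup-noncurrent-and-multipart",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
        # Permanently delete previous versions some days after they
        # stop being the current version
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Remove the parts of multipart uploads that were never completed
        "AbortIncompleteMultipartUpload": {
            "DaysAfterInitiation": abort_multipart_days
        },
        # Remove delete markers that no longer have any versions behind them
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }
```

The returned rule would go into the `Rules` list of the configuration passed to `put_bucket_lifecycle_configuration`, alongside any transition rules the bucket already uses.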
If you still need more help with access patterns, there’s Amazon S3 Intelligent-Tiering. This storage class automatically optimizes storage costs by moving data to the most cost-effective access tier as access patterns change. For a nominal monthly charge, it monitors objects’ access patterns and shifts objects that haven’t been accessed to lower-cost tiers.
EBS Storage and EBS Snapshots
AWS provides two types of General Purpose SSD volumes: gp2 and gp3. gp3 volumes provide better cost efficiency than gp2, offering a baseline performance of 3,000 IOPS and 125 MiB/s at no additional cost. In terms of performance, gp3 volumes allow you to provision up to 16,000 IOPS and throughput up to 1,000 MiB/s independently of storage capacity, giving you more flexibility to fine-tune performance based on your application’s needs. This flexibility is an advantage over gp2 volumes, where IOPS scales with the size of the volume, potentially leading to over-provisioning of storage to meet performance requirements. What’s more, gp3 offers a 20% lower price per GB than existing gp2 volumes.
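The over-provisioning effect can be made concrete with a little arithmetic. The sketch below assumes the documented gp2 scaling of 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS; these gp2 figures are not stated in this post, and the function names are hypothetical.

```python
import math

GP2_IOPS_PER_GIB = 3      # assumption: gp2 baseline scaling per GiB
GP2_MIN_IOPS = 100        # assumption: gp2 baseline floor
MAX_IOPS = 16_000         # ceiling for both gp2 and gp3
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of the given size receives."""
    return min(MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_needed_for_iops(target_iops):
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > MAX_IOPS:
        raise ValueError("gp2 cannot exceed 16,000 IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # even the smallest volume gets the 100 IOPS floor
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def gp3_extra_iops_to_provision(target_iops):
    """IOPS to provision on gp3 beyond the free baseline, at any size."""
    if target_iops > MAX_IOPS:
        raise ValueError("gp3 cannot exceed 16,000 IOPS")
    return max(0, target_iops - GP3_BASELINE_IOPS)
```

Under these assumptions, a workload that needs 6,000 IOPS but only 200 GiB of data would require a 2,000 GiB gp2 volume, whereas a 200 GiB gp3 volume could reach the same figure by provisioning 3,000 IOPS on top of the baseline.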
EBS Snapshot
An EBS Snapshot, or Amazon Elastic Block Store Snapshot, is a point-in-time copy of your data, created from Amazon EBS volumes and used for backups, copying data across regions, and improving backup compliance. Snapshots can be taken daily or weekly, depending on the customer’s requirements. But have you configured a lifecycle for these snapshots? Old snapshots that pile up lead to unwanted costs. Fortunately, the lifecycle of EBS snapshots can be managed automatically.
Use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs. With Data Lifecycle Manager, you can also reduce AWS storage costs by deleting outdated backups.
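As an illustration, a Data Lifecycle Manager policy can be described as a Python dict and created through boto3. This is a sketch under assumptions: the tag key and value, the schedule name, the start time, and the function names are hypothetical, and the execution role ARN must be supplied by the caller.

```python
VALID_INTERVAL_HOURS = (1, 2, 3, 4, 6, 8, 12, 24)


def build_snapshot_policy_details(tag_key, tag_value, interval_hours=24,
                                  retain_count=7, start_time="03:00"):
    """Describe a policy: snapshot tagged volumes, keep the newest N."""
    if interval_hours not in VALID_INTERVAL_HOURS:
        raise ValueError("unsupported interval: %r" % interval_hours)
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    return {
        "ResourceTypes": ["VOLUME"],
        # Only volumes carrying this tag are covered by the policy
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "every-%dh-keep-%d" % (interval_hours, retain_count),
                "CopyTags": True,
                "CreateRule": {
                    "Interval": interval_hours,
                    "IntervalUnit": "HOURS",
                    "Times": [start_time],  # UTC, hh:mm
                },
                # Older snapshots beyond this count are deleted automatically
                "RetainRule": {"Count": retain_count},
            }
        ],
    }


def create_snapshot_policy(execution_role_arn, policy_details):
    """Create the policy (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    dlm = boto3.client("dlm")
    return dlm.create_lifecycle_policy(
        ExecutionRoleArn=execution_role_arn,
        Description="Automated EBS snapshots with retention",
        State="ENABLED",
        PolicyDetails=policy_details,
    )
```

With the defaults, `build_snapshot_policy_details("Backup", "daily")` describes one snapshot per day with the seven most recent kept, so outdated backups stop accumulating.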
Refer to the Automate snapshot lifecycles procedure to configure a lifecycle management policy. In addition, you can make use of the Amazon EBS Snapshots Archive, which offers a low-cost solution for long-term storage of rarely accessed snapshots that do not require frequent or fast retrieval. Typically, snapshots are saved incrementally in the standard tier, only capturing the changes made since the last snapshot. When a snapshot is archived, it is converted into a full snapshot, capturing all the data on the volume at the moment the snapshot was taken, and then moved to the archive tier. Archived snapshots can be restored to the standard tier when needed. This archive option provides up to 75% savings on snapshot storage costs for snapshots stored for 90 days or longer.
Amazon EFS Storage
Amazon Elastic File System (EFS) is a scalable, cloud-native file storage service for applications running on AWS. It provides a simple, serverless elastic file system that can be used with AWS cloud services and on-premises resources.
As storage capacity grows, an application’s requirement to access all files constantly diminishes. Studies and usage patterns reveal that only around 20% of data is actively used, while approximately 80% is accessed infrequently.
EFS supports two storage classes: EFS Standard and EFS Infrequent Access (IA). With lifecycle management enabled, data in EFS Standard is automatically moved to EFS IA after a period of non-use, reducing storage costs for files that aren’t accessed regularly while keeping them readily accessible when you need them.
EFS Lifecycle Management is easy to enable and runs automatically behind the scenes. When enabled on a file system, files not accessed according to your chosen lifecycle policy will be moved automatically to the cost-optimized EFS IA storage class.
The transition-into-IA policy offers options ranging from no transition to transitioning files that haven’t been accessed in 90 days. Conversely, the transition-out-of-IA policy lets files either remain in IA or move back to EFS Standard upon first access.
You can easily configure a lifecycle policy for EFS by referring to the Amazon EFS Intelligent-Tiering procedure.
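For those who prefer to script the setting, the two policies can be expressed as the list that the boto3 EFS client accepts. This is a minimal sketch: the function names are hypothetical, and the set of allowed day values below is a deliberately conservative subset that stays within the 90-day range mentioned above.

```python
ALLOWED_IA_DAYS = (7, 14, 30, 60, 90)  # conservative subset of valid values


def build_efs_lifecycle_policies(ia_after_days=30, move_back_on_access=True):
    """Return lifecycle policies for an EFS file system."""
    if ia_after_days not in ALLOWED_IA_DAYS:
        raise ValueError("ia_after_days must be one of %r" % (ALLOWED_IA_DAYS,))
    # Move files to IA once they have not been accessed for this many days
    policies = [{"TransitionToIA": "AFTER_%d_DAYS" % ia_after_days}]
    if move_back_on_access:
        # Move a file back to EFS Standard the first time it is read again
        policies.append({"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"})
    return policies


def apply_efs_lifecycle_policies(file_system_id, policies):
    """Apply the policies (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    efs = boto3.client("efs")
    efs.put_lifecycle_configuration(
        FileSystemId=file_system_id, LifecyclePolicies=policies
    )
```

Passing `move_back_on_access=False` corresponds to the option of leaving files in IA even after they are accessed.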
RDS Snapshots
As with EBS snapshots, we also take RDS snapshots. AWS offers automated backups with a configurable retention period, but engineers also take snapshots manually or via third-party tooling, and those are not cleaned up automatically. Too many old snapshots lead to extra RDS charged-backup costs, which can be very high if snapshots aren’t managed efficiently. To manage them, you can define AWS Lambda functions that monitor old snapshots and take the necessary action, such as informing teams about upcoming deletions or deleting the snapshots automatically, so that old and unnecessary snapshots stop driving up storage costs.
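A Lambda function of this kind might look like the sketch below. It is an assumption-laden illustration rather than a production implementation: the 30-day threshold, the function names, and the report-only behavior are hypothetical choices. The selection logic is kept separate from the AWS calls so it can be checked on its own.

```python
from datetime import datetime, timedelta, timezone


def find_stale_snapshots(snapshots, max_age_days=30, now=None):
    """Return identifiers of manual snapshots older than max_age_days.

    `snapshots` is a list of dicts shaped like the entries returned by the
    RDS describe_db_snapshots call.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        snap["DBSnapshotIdentifier"]
        for snap in snapshots
        # Automated snapshots follow the instance's retention period already
        if snap.get("SnapshotType") == "manual"
        and snap["SnapshotCreateTime"] < cutoff
    ]


def lambda_handler(event, context):
    """Report stale manual snapshots (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the selection logic runs without AWS

    rds = boto3.client("rds")
    paginator = rds.get_paginator("describe_db_snapshots")
    snapshots = [
        snap
        for page in paginator.paginate(SnapshotType="manual")
        for snap in page["DBSnapshots"]
    ]
    stale = find_stale_snapshots(snapshots)
    # Only reports; deletion could be added here with
    # rds.delete_db_snapshot(DBSnapshotIdentifier=...) after teams are informed
    return {"stale_snapshots": stale}
```

Keeping the handler report-only by default fits the "inform teams first" step; automatic deletion can be switched on once the list has been reviewed a few times.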
CloudWatch
Logging is an essential part of application/infra monitoring and troubleshooting. Services like EKS generate an enormous amount of logs. AWS CloudWatch Logs is a valuable service for monitoring and storing logs from various AWS resources. However, the costs can quickly accumulate if not appropriately managed.
By default, CloudWatch Logs retains your log data indefinitely. You should adjust the retention settings for each log group based on your organization’s requirements. If you only need logs for a month, set the retention period to 30 days. This saves not only on CloudWatch storage costs but also on Amazon Athena queries that scan data in the CloudWatch log group.
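Finding the log groups that still use the default can be scripted. The sketch below assumes boto3 and hypothetical function names; it treats a log group without a `retentionInDays` field as one that never expires and sets the 30-day retention from the example above.

```python
def groups_without_retention(log_groups):
    """Return names of log groups that keep their data indefinitely.

    `log_groups` is a list of dicts shaped like the entries returned by the
    CloudWatch Logs describe_log_groups call; groups that never expire have
    no "retentionInDays" key.
    """
    return [
        group["logGroupName"]
        for group in log_groups
        if group.get("retentionInDays") is None
    ]


def apply_default_retention(retention_days=30):
    """Set a retention period on every log group that has none."""
    import boto3  # imported lazily so the filter above runs without AWS access

    logs = boto3.client("logs")
    paginator = logs.get_paginator("describe_log_groups")
    groups = [
        group for page in paginator.paginate() for group in page["logGroups"]
    ]
    updated = groups_without_retention(groups)
    for name in updated:
        logs.put_retention_policy(
            logGroupName=name, retentionInDays=retention_days
        )
    return updated
```

Log groups that already have a retention period are left alone, so teams that deliberately keep logs longer are not affected.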
Improving operational efficiency and lowering total ownership costs are key motivators for migrating to the cloud, and these goals are also relevant when it comes to storage expenses. Consequently, relocating infrequently accessed data to a more cost-effective storage tier can result in significant cost savings. Nevertheless, manually pinpointing this type of data can be a challenging task. In an ideal scenario, the system would automatically track data access patterns over time and seamlessly migrate data between storage tiers, all while ensuring that applications continue functioning smoothly.
At Xebia, we take pride in implementing these cost optimization best practices to ensure we operate at our highest potential while delivering maximum value to our clients and stakeholders.