
In this blog we saw how to use DynamoDB to store data for a short interval. Because of DynamoDB’s flexibility, we can store just about any piece of text with a key and retrieve it efficiently. Dynamo isn’t the only solution, of course. Amazon also offers ElastiCache. I’ll summarize my experience with both solutions in this blog.

I’ve used DynamoDB to store small JSON-formatted text that can be retrieved by a key. This blog shows how that works. This solution is really easy to implement. In the code that accompanies the blog, you’ll find CDK scripts to deploy the infrastructure for the cache.

So, setting up the infrastructure is easy to do. The maintenance effort is zero, because Dynamo is a service managed by AWS.

I’ve never tried to find the performance limits, but for my use cases all default settings were fine.

There is a caveat though: DynamoDB has a maximum item size of 400KB. This limit made the database less convenient in a recent use case where we stored a large JSON document, retrieved from a content management system. This document was sent to an app on start-up. Loading the data happened in the background, so the user would be unaware of it, but the document itself could grow beyond what fits in a single Dynamo item.
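
Whether a document fits can be checked before writing it. The sketch below is a rough estimate only: the function name and the `overhead_bytes` safety margin are hypothetical, and the limit is treated as 400,000 bytes to stay on the conservative side.

```python
import json

# DynamoDB's maximum item size is 400KB, counting attribute names and values.
# Assumption: we use 400,000 bytes here to err on the safe side.
DYNAMODB_MAX_ITEM_BYTES = 400_000


def fits_in_dynamodb_item(key: str, document: dict, overhead_bytes: int = 1024) -> bool:
    """Estimate whether the serialized document fits in a single item.

    overhead_bytes is a hypothetical margin for attribute names and
    extra attributes such as a TTL value.
    """
    # Compact separators avoid wasting bytes on whitespace.
    payload = json.dumps(document, separators=(",", ":")).encode("utf-8")
    estimated = len(key.encode("utf-8")) + len(payload) + overhead_bytes
    return estimated <= DYNAMODB_MAX_ITEM_BYTES
```

If the check fails, the document either has to be split over several items or stored somewhere else.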

The solution we selected for the use case above was Redis, abstracted through ElastiCache. Getting the data is simplicity itself:

const data = await this.redisClient.get('data');
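
Reading is only half of a cache, of course. A minimal cache-aside sketch in Python could look like the code below. It assumes a client with redis-py style `get(name)` and `set(name, value, ex=seconds)` methods; the `get_or_load` helper, the `loader` callback and the default TTL are hypothetical and not part of the original project.

```python
import json
from typing import Any, Callable


def get_or_load(client: Any, key: str, loader: Callable[[], dict], ttl_seconds: int = 300) -> dict:
    """Return the cached value for key, loading and caching it on a miss."""
    cached = client.get(key)
    if cached is not None:
        # redis-py returns bytes (or str); json.loads accepts both.
        return json.loads(cached)

    # Cache miss: fetch from the slow source, e.g. the content management system.
    value = loader()
    # ex sets the expiry in seconds, so stale entries disappear on their own.
    client.set(key, json.dumps(value), ex=ttl_seconds)
    return value
```

With a real client this would be called as `get_or_load(redis.Redis(host=...), 'data', load_from_cms)`, where `load_from_cms` stands for whatever function fetches the document.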

Creating the cache with CDK:

redis_cluster = elasticache.CfnCacheCluster(
    scope=self,
    id="redis_cache_cluster",
    engine="redis",
    cache_node_type="cache.t3.small",
    num_cache_nodes=1,
    cache_subnet_group_name=...,
    vpc_security_group_ids=[...],
)  

With ElastiCache, we have to decide how to distribute data over the cluster, how many replicas are needed, and what the node type should be.

With Dynamo, a document bigger than 400KB would have to be split over several items. Reading it back requires a loop to get the data in multiple sets, which then have to be concatenated or sent to the client in batches. Performance would definitely suffer:

import boto3

if __name__ == '__main__':

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Cache')

    cache_data = []

    # A single scan call returns at most 1 MB of data.
    batch = table.scan()
    # extend (not append) keeps cache_data a flat list of items.
    cache_data.extend(batch['Items'])

    # Keep scanning for as long as DynamoDB reports another page.
    while 'LastEvaluatedKey' in batch:
        batch = table.scan(ExclusiveStartKey=batch['LastEvaluatedKey'])
        cache_data.extend(batch['Items'])
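
The splitting and concatenating itself could be sketched as follows. This is an illustration, not the code we used: the `part`, `total` and `data` attribute names and the chunk size are assumptions, and storing several parts under one `id` would require a sort key (for example on `part`) that the Cache table in this blog doesn’t have. Writing and reading the items is left out.

```python
from typing import Dict, List

# Hypothetical chunk size in characters. A UTF-8 character takes at most
# 4 bytes, so 80,000 characters stay well below the 400KB item limit.
CHUNK_CHARS = 80_000


def split_into_chunks(key: str, payload: str, chunk_chars: int = CHUNK_CHARS) -> List[Dict]:
    """Split payload into numbered items that each fit in one DynamoDB item."""
    pieces = [payload[i:i + chunk_chars] for i in range(0, len(payload), chunk_chars)]
    if not pieces:
        pieces = [""]  # keep one (empty) item so an empty payload round-trips
    return [
        {"id": key, "part": index, "total": len(pieces), "data": piece}
        for index, piece in enumerate(pieces)
    ]


def join_chunks(items: List[Dict]) -> str:
    """Reassemble the payload; raise if a chunk is missing."""
    ordered = sorted(items, key=lambda item: item["part"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("incomplete set of chunks")
    return "".join(item["data"] for item in ordered)
```

Every read now needs all parts to arrive before the document is usable, which is exactly the extra work that made Dynamo less attractive here.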

And creating a Dynamo table:

const cacheTable = new dynamoDB.Table(this, 'CacheTable', {
  tableName: 'Cache',
  partitionKey: {
    name: 'id',
    type: dynamoDB.AttributeType.STRING,
  },
  billingMode: dynamoDB.BillingMode.PROVISIONED,
  timeToLiveAttribute: 'ttl',
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});
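
The timeToLiveAttribute setting tells Dynamo which attribute holds the expiry time, as a Unix epoch timestamp in seconds. A small sketch of how an item for this table could be built is shown below; the `data` attribute name and both helper functions are hypothetical. Because Dynamo removes expired items in the background, some time after they expire, the reader may want to check the timestamp as well.

```python
import time
from typing import Optional


def build_cache_item(key: str, data: str, ttl_seconds: int, now: Optional[float] = None) -> dict:
    """Build an item whose 'ttl' attribute is an epoch timestamp in seconds."""
    current = time.time() if now is None else now
    return {"id": key, "data": data, "ttl": int(current) + ttl_seconds}


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """Return True if the item is past its TTL, even if Dynamo still returns it."""
    current = time.time() if now is None else now
    return int(item["ttl"]) <= int(current)
```

An item built this way can be written with `table.put_item(Item=build_cache_item('some-key', json_text, 3600))`, where the key and the one-hour lifetime are just examples.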

One more issue to consider is the time it takes to provision infrastructure. While the CDK code for Dynamo takes only minutes to complete, creating a three-node ElastiCache cluster may take 15 minutes. This won’t matter if the infrastructure is stable, following a classic develop/test/production setup. It might take a bit to get the infra running, but once finished, there’s no cost to rolling out a new application version.

One of the great features of a cloud infrastructure like AWS, however, is how easy it is to deploy a stack for a branch or to create a copy for a specific use case, like a testing environment to investigate the cause of a bug. With short-lived infrastructures, the start-up time for an ElastiCache cluster becomes noticeable. Also, note the engine_version property of the cache cluster (not set in the snippet above): you’ll have to upgrade someday and keep doing so. Dynamo would be upgraded automatically.

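Cost could also be an issue. With on-demand billing, Dynamo costs are mostly calculated per use: you pay for data manipulation and retrieval actions, plus a per-GB charge for storage, so storing a modest cache is close to free. Note that the table above uses provisioned billing, where you pay for the reserved read and write capacity whether you use it or not. Redis is different again, since an ElastiCache cluster runs on nodes that are billed by the hour, busy or idle. I don’t think it’s easy to decide which solution is cheaper. This Dynamo pricing calculator might help.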
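
A back-of-the-envelope comparison could be sketched like this. All prices are inputs that you would look up for your own region; no real prices are included, and the function names are hypothetical. The request estimate is a simplification that ignores item size (larger items consume more than one request unit) and storage.

```python
def monthly_on_demand_cost(
    reads: int,
    writes: int,
    price_per_million_reads: float,
    price_per_million_writes: float,
) -> float:
    """Rough per-request cost for a month, ignoring item size and storage."""
    return (
        reads / 1_000_000 * price_per_million_reads
        + writes / 1_000_000 * price_per_million_writes
    )


def monthly_node_cost(node_count: int, price_per_node_hour: float, hours_per_month: int = 730) -> float:
    """Cost of cache nodes that are billed per hour, busy or idle."""
    return node_count * price_per_node_hour * hours_per_month
```

The point of the comparison is the shape of the two functions: one scales with traffic, the other with the number of nodes and the hours they run.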

And finally, if Dynamo is your primary database, you might as well use it as a cache, if only to minimize the number of tools you need to manage in your team.

Notes:

  • DAX may be cheaper and easier to use than plain Dynamo tables, but I haven’t tried it yet.

Conclusions

So, the tradeoffs for choosing between Dynamo and Redis would be:

  • ease of use
  • startup time
  • maintenance effort
  • do you want yet another database?

As always, it depends.

Jan Vermeir
Developing software and infrastructure in teams, doing whatever it takes to get stable, safe and efficient systems in production.