I’ve been using DynamoDB for a few months now after re-architecting a system which was becoming painful to scale on a traditional RDBMS. The problem wasn’t necessarily read/write performance but rather the total storage space needed, as a lot of “unstructured” blobs were stored in the DB.
DynamoDB gives me a carefree setup with individually scalable reads, writes, and storage, fully managed by AWS, making my life easier. I’d rather spend my time focusing on the business use case than on managing infrastructure. Prior to picking DynamoDB I considered Apache Cassandra. Both share the same origins in the form of the Dynamo paper, and the hash + range key data model fits my use case perfectly.
Here’s why I didn’t pick Cassandra: for a decent minimum setup I’d argue you need at least three servers. Paying on-demand or reserved prices for three servers is one thing, but they also need someone to manage them (= time/money), including but not limited to the OS, security, updates, backups, monitoring, and other basic management tasks. You might find it enjoyable to tinker with these things, but I’d rather not. Sure, I’m fully capable of either knowing what to do or figuring it out if needed, but I’d rather not spend the time on that right now (unless you pay me for it).
But it’s all about scale. At some point it’s more economical to manage software yourself, or manage your own hardware, or even build your own data centers. I’m far from most of these scale points, and will probably never need to bother with most of them. But when does it become more economical to run your own Cassandra cluster versus using DynamoDB?
It’s not a straightforward calculation, but Stackdriver recently published an interesting performance test comparing the new AWS C3 instances against Google and Rackspace cloud servers. Using the most cost-efficient setup from their study, can we calculate when you should switch from a managed NoSQL solution to a self-managed Cassandra installation?
The most cost-effective setup (from a maximum-performance standpoint) would use AWS C3 xlarge instances. With a price of $0.30 per hour per instance (on-demand, US East pricing) and three instances, that adds up to $0.90 per hour. Presumably for performance reasons the Stackdriver test placed all VMs in the same availability zone, which would make the Cassandra setup less resilient than the cross-availability-zone replication of DynamoDB. That’s something to consider if your uptime requirements are stricter than what one AZ can guarantee. We’d rely on local instance storage, so AWS EBS costs can be left out. We’ll also leave out any backup costs, as we can assume a similar cost structure for both Cassandra and DynamoDB in this case.
So for $0.90 per hour you get roughly 15,000 inserts per second, with latencies in the same ballpark as what I’ve experienced on DynamoDB. Sadly Stackdriver hasn’t published any read performance figures (yet), but given the 3x replication factor used in the test, let’s assume roughly 2x the write throughput for eventually consistent reads (DynamoDB prices read capacity in terms of consistent reads). As we’ll see below, write costs dominate DynamoDB pricing anyway.
With those figures in mind, using US East DynamoDB pricing, 15,000 writes/s and 15,000 consistent reads/s, against a single table would be a whopping $11.70 per hour. Storage cost has not been included here, but assuming the C3 xlarge setup with 3x replication gives us 60 GB of usable storage capacity, this would just add a few cents per hour.
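The arithmetic behind those two hourly figures can be sketched as follows. The throughput rates are the US East provisioned-capacity prices quoted at the time of writing ($0.0065 per hour for every 10 writes/s or every 50 consistent reads/s); check current AWS pricing before relying on them.

```python
# Back-of-the-envelope comparison: 3-node Cassandra cluster on C3 xlarge
# vs. equivalent DynamoDB provisioned throughput (US East, on-demand).
# Rates are as quoted in this post and will change over time.

C3_XLARGE_HOURLY = 0.30        # $/hour per instance, on-demand
CASSANDRA_NODES = 3

WRITE_UNIT_RATE = 0.0065 / 10  # $/hour per 1 write/s of capacity
READ_UNIT_RATE = 0.0065 / 50   # $/hour per 1 consistent read/s of capacity

writes_per_s = 15_000
reads_per_s = 15_000

cassandra_hourly = C3_XLARGE_HOURLY * CASSANDRA_NODES
dynamodb_hourly = writes_per_s * WRITE_UNIT_RATE + reads_per_s * READ_UNIT_RATE

print(f"Cassandra: ${cassandra_hourly:.2f}/hour")  # $0.90/hour
print(f"DynamoDB:  ${dynamodb_hourly:.2f}/hour")   # $11.70/hour
```

Note how lopsided the DynamoDB bill is: writes account for $9.75 of the $11.70, reads for only $1.95, which is why write costs dominate the comparison.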
$11.70 vs $0.90 is a big difference, at least in isolation. The question then becomes: how much would it cost you to hire someone (or do it yourself) to maintain and manage your own Cassandra cluster 24/7? That depends a lot on other factors. You might already have the technical know-how in-house, and assuming the Cassandra cluster is fairly self-managing, there might not be many hours needed per day/week/month on average.
Scalability doesn’t just go up, by the way. There’s also scalability downwards. For example, with my current needs it’s cost efficient for me to use DynamoDB. You might not need 15,000 writes/reads per second 100% of the time. Scaling the capacity of DynamoDB up and down is almost a no-brainer. Adding and removing servers in a Cassandra cluster is also not extremely difficult, but it’s less granular and something a lot of people would probably be less comfortable doing on an hourly basis. As always, flexibility and ease of use come at a premium.
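To illustrate how little friction that scaling involves, here’s a rough sketch using the AWS SDK for Python (boto3). The table name and capacity figures are hypothetical; the actual API call is left commented out since it needs AWS credentials.

```python
# Sketch: dialing DynamoDB provisioned capacity down for a quiet period.
# Table name and unit counts below are made-up examples.

def throughput(read_units, write_units):
    """Build the ProvisionedThroughput parameters for an UpdateTable call."""
    return {
        "ReadCapacityUnits": read_units,
        "WriteCapacityUnits": write_units,
    }

# Overnight, drop to a fraction of peak capacity:
night_params = throughput(read_units=500, write_units=500)

# With boto3 installed and credentials configured, applying it is one call:
# import boto3
# dynamodb = boto3.client("dynamodb", region_name="us-east-1")
# dynamodb.update_table(TableName="my-table", ProvisionedThroughput=night_params)
```

Compare that one API call to draining, decommissioning, and rebalancing Cassandra nodes every time load drops.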
If you’re happy paying a few dollars per hour for a managed solution and don’t need more than 2,000–3,000 read/write operations per second on average, stick with DynamoDB. If you need “extreme performance” and/or more advanced setups like cross-region replication, it’s likely more beneficial to manage your own Cassandra cluster.