I’ve written previously about both Amazon Aurora and DynamoDB, and the massive price difference between them. To recap: based on some back-of-the-envelope calculations using the limited Aurora information available at the time, DynamoDB came out roughly 28 times more costly.
At that point it becomes worthwhile to look at alternatives. So here’s one alternative: Instead of using DynamoDB, use Amazon Aurora, and build a sharding framework on top of it if needed.
But before drawing that conclusion we need some hard facts, and luckily Aurora became generally available a few days ago.
The use case here is to use the database in much the same way we’d use a NoSQL store. That entails having a hash key, a range key, and a blob of otherwise undefined data. We would auto commit on every insert, as we don’t require transactions to span more than a single row update. Each select is limited to a single hash key, with some subsection of the range key selected and sorted in ascending or descending order.
The table definition for this looks like the following:
CREATE TABLE `test_tbl` (
  `hash_key` char(36) NOT NULL,
  `range_key` bigint(20) NOT NULL,
  `value` mediumblob NOT NULL,
  PRIMARY KEY (`hash_key`,`range_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
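The access pattern described above can be sketched with queries like these (the literal values are placeholders, not part of the actual test harness):

```sql
-- Put: a single-row insert, auto-committed, so no explicit transaction
INSERT INTO test_tbl (hash_key, range_key, value)
VALUES (?, ?, ?);

-- Get: all rows for one hash key within a subsection of the range key,
-- sorted ascending or descending
SELECT range_key, value
FROM test_tbl
WHERE hash_key = ?
  AND range_key BETWEEN ? AND ?
ORDER BY range_key DESC;
```

Since InnoDB clusters rows by primary key, rows sharing a hash key sit adjacent on disk in range key order, which is what makes this select pattern cheap.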
So how many rows can we put into and get out of Aurora given the above? I don’t know exactly what test Amazon ran internally to arrive at their numbers (update: see comment below), but their use case was probably a lot more involved than using their database like this.
Besides testing the various instance types available for Aurora, on the client side I used a single m4.10xlarge running stock Ubuntu 15.04, picked for its network bandwidth and core count to ensure any performance limitations wouldn’t be on the client side. To verify this, I also started a second m4.10xlarge and spread the client load evenly across the two; the performance obtained from the database remained the same, ruling out the client as a bottleneck.
Some notes on the data size: hash_key is a stringified random UUID, range_key is just a long, and value is a random 64 kB blob.
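Generating one row of that test data is straightforward; a minimal sketch in Python (the actual test client isn’t shown here, so this is illustrative only):

```python
import os
import uuid

# One row of test data, matching the schema above
hash_key = str(uuid.uuid4())   # stringified random UUID, fits char(36)
range_key = 7                  # just a long
value = os.urandom(64 * 1024)  # random 64 kB blob

assert len(hash_key) == 36
assert len(value) == 64 * 1024
```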
These are pretty decent numbers, I’d say. They’re far from the numbers presented by Amazon, but again, it’s a different use case. The sheet with all the numbers is available here if you want to dig in.
On the db.r3.2xlarge instance and larger, the test didn’t manage to max out the CPU on the database for either inserts or selects, so the database is likely hitting some kind of IO limit at that point. On the largest db.r3.8xlarge instance we saw a sustained network transfer rate of ~110 MB/s with 40 insert threads, and ~480 MB/s with 40 select threads. All select testing used the 12 million rows inserted by the final 40-thread insert test, and the DB showed a constant 60% cache hit ratio for selects.
So back to using this as a NoSQL store. A db.r3.8xlarge was able to do 1639 puts per second and 6900 gets per second. If we ran a db.r3.8xlarge it would cost us $4.64/hour in US East, or about $3390 per month. The same performance on DynamoDB, given 64 kB blobs, would come to almost a whopping $60k per month! It’s not 28x, but still a massive difference.
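The Aurora side of that comparison is simple arithmetic: the hourly on-demand rate times roughly 730 hours in an average month (the $60k DynamoDB figure is the rough estimate from above, not a computed value):

```python
aurora_hourly = 4.64           # db.r3.8xlarge, US East, on-demand
hours_per_month = 730          # average hours in a month
aurora_monthly = aurora_hourly * hours_per_month
print(round(aurora_monthly))   # 3387, i.e. about $3390/month

dynamodb_monthly = 60_000      # rough figure from the text
print(round(dynamodb_monthly / aurora_monthly, 1))  # ~17.7x difference
```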