Amazon Aurora Performance as a NoSQL Store

I’ve written previously about both Amazon Aurora and DynamoDB, and the massive price difference between them. To recap: based on some back-of-the-envelope calculations using the limited Aurora information available at the time, DynamoDB came out 28 times more costly.

At that price difference it becomes worthwhile to look at alternatives. So here’s one: instead of using DynamoDB, use Amazon Aurora, and build a sharding framework on top of it if needed.

But before we can draw that conclusion we need some hard facts, and luckily Aurora became generally available a few days ago.

The use case here is to use the database in much the same way we’d use a NoSQL store. That entails having a hash key, a range key, and a blob of otherwise undefined data. We would auto-commit on every insert, as we don’t require transactions to span more than a single row update. Each select is limited to a single hash key, with some subrange of the range key selected and sorted in ascending or descending order.

The table definition for this looks like the following:

CREATE TABLE `test_tbl` (
  `hash_key` char(36) NOT NULL,
  `range_key` bigint(20) NOT NULL,
  `value` mediumblob NOT NULL,
  PRIMARY KEY (`hash_key`,`range_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
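
To make that access pattern concrete, here’s a minimal sketch of what a put and a get against this table could look like. This is not the benchmark client itself (that’s linked below); it assumes PyMySQL with autocommit enabled, and the endpoint and credentials are placeholders.

import pymysql

# Placeholder connection details; autocommit matches the single-row-update use case.
conn = pymysql.connect(host="aurora-cluster-endpoint", user="test",
                       password="secret", database="test", autocommit=True)

def put(hash_key, range_key, value):
    # One insert per call, committed immediately.
    with conn.cursor() as cur:
        cur.execute("INSERT INTO test_tbl (hash_key, range_key, value) "
                    "VALUES (%s, %s, %s)", (hash_key, range_key, value))

def get(hash_key, range_start, range_end, descending=False):
    # Select a subrange of a single hash key, sorted on the range key.
    order = "DESC" if descending else "ASC"
    with conn.cursor() as cur:
        cur.execute("SELECT range_key, value FROM test_tbl "
                    "WHERE hash_key = %s AND range_key BETWEEN %s AND %s "
                    "ORDER BY range_key " + order,
                    (hash_key, range_start, range_end))
        return cur.fetchall()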

So how many rows can we put and get from Aurora given the above? I don’t know exactly what tests Amazon ran internally to arrive at their numbers (update: see comment below), but their use case was probably a lot more involved than using the database like this.

Besides testing the various instance types available for Aurora, on the client side I used a single m4.10xlarge running stock Ubuntu 15.04. I picked the m4.10xlarge for its network bandwidth and core count, to make sure any performance limitations wouldn’t be on the client side. To verify this, I also started a second m4.10xlarge and spread the client load evenly between the two; the throughput obtained from the database stayed the same, ruling out the client as a bottleneck.

For the client side, please see the code here.

Some notes on the data size: the hash_key is a stringified random UUID, the range_key is just a long, and the value is a random 64 kB blob.
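
In other words, a test row can be generated roughly like this (again just a sketch matching those sizes, not the actual client):

import os
import uuid

def make_rows(rows_per_key=100):
    # One hash key, many range keys, each carrying a 64 kB random blob.
    hash_key = str(uuid.uuid4())            # 36-character stringified UUID
    for range_key in range(rows_per_key):   # the range key is just a long
        yield hash_key, range_key, os.urandom(64 * 1024)

# for row in make_rows(): put(*row)         # feeding the put() sketch above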

Aggregated inserts per second

Aggregated selects per second

These are pretty decent numbers, I’d say. They’re far from the numbers presented by Amazon, but again, that was a different use case. The sheet with all the numbers is available here if you want to dig in.

On the db.r3.2xlarge instance and larger, the test didn’t manage to max out the CPU on the database, for either inserts or selects, so the database is likely hitting some kind of IO limit at that point. On the largest db.r3.8xlarge instance we had a sustained network transfer rate of ~110 MB/s with 40 insert threads, and ~480 MB/s with 40 select threads. All select testing used the 12 million rows inserted by the final 40-thread insert test, and the DB showed a constant ~60% cache hit ratio for selects.

So back to using this as a NoSQL store. A db.r3.8xlarge was able to do 1639 puts per second and 6900 gets per second. Running a db.r3.8xlarge would cost us $4.64/hour in US East, or about $3,390 per month. The same performance on DynamoDB, given 64 kB blobs, would come to almost a whopping $60k per month! It’s not 28x, but still a massive difference.
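
(For reference, the Aurora figure is simply the on-demand instance rate over a full month: $4.64/hour × ~730 hours ≈ $3,390, before Aurora’s storage and IO charges.)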

6 comments

  1. Hi Christian, really interesting benchmark. You might take a look at our performance benchmarking doc, which talks through how we got the numbers we did. Link here: http://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf.

    A key point here is that, for really good benchmark perf, you need to increase the packet rate on the client machines (the pdf talks about how). You’ll also see that we are using a lot more threads than you are (on the 8xlarge) – on the order of (in aggregate) 1000 to 2000 client threads for our SysBench numbers. I can’t say whether either will make the difference to your numbers, but it might be worth checking whether they help drive more throughput.

  2. Really interesting analysis. I think your pricing though leaves out the IO cost for Aurora. It’s $0.200 per 1 million requests.

  3. 1600 Insert/s? That’s horrible! WTF?!
    Did you try to run the same benchmark on a good old home computer?

    1. I might at some point do a comparison against MySQL, potentially Postgres, running on the same AWS instance types.

      Note that each of those inserts is about 64 kB in size, so about 110 MB/s in aggregate. It could be interesting to also compare performance against smaller blobs.
