From earlier posts you might have realized I grew tired of DynamoDB. As soon as Amazon Aurora was made available, I ran a quick, non-comprehensive test to see how it would perform as a NoSQL store. From a pricing perspective, Aurora seemed to stack up nicely.
Since then I’ve switched from DynamoDB to Aurora, and I haven’t been let down. I get backups and snapshots (as you’d expect), consistent performance and a smaller bill. I’m in control of my own sharding, so resources aren’t wasted. That was a problem with DynamoDB, which would severely penalize you if you had anything that resembled a hot key.
For my use case I can push a single r3.large to 35-40 MB/s of sustained transfer with sub-10 ms select and commit latency, at which point the CPU sits at 95-100%. It’s happy to keep running at that level for weeks if and when needed, and it still feels snappy under that load. Which brings me to my point.
Amazon Aurora separates storage from compute. On the storage side, data is split into chunks distributed across a storage array, with multiple copies kept across all availability zones. To keep latency low it uses a quorum model, which still guarantees consistency. On the compute side sits the SQL engine, reading from and writing to this storage array as needed.
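The quorum idea can be checked with simple arithmetic. For V copies of a chunk, a write quorum Vw and a read quorum Vr, every read overlaps the latest write when Vr + Vw > V, and two conflicting writes can never both succeed without overlapping when 2*Vw > V. The sketch below uses six copies, a write quorum of four and a read quorum of three, the figures AWS has described publicly for Aurora (two copies in each of three availability zones); treat them as an assumption for illustration. `QuorumConfig` is a hypothetical helper, not an AWS API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuorumConfig:
    copies: int        # total copies of each chunk (V)
    write_quorum: int  # acknowledgements needed before a write counts (Vw)
    read_quorum: int   # copies consulted on a quorum read (Vr)

    def is_consistent(self) -> bool:
        # Every read set must overlap the most recent write set: Vr + Vw > V.
        reads_see_writes = self.read_quorum + self.write_quorum > self.copies
        # Any two write sets must overlap, so conflicting writes are caught: 2*Vw > V.
        writes_overlap = 2 * self.write_quorum > self.copies
        return reads_see_writes and writes_overlap

    def tolerated_failures(self) -> Tuple[int, int]:
        # (copies that can be lost while writes still succeed,
        #  copies that can be lost while reads still succeed)
        return self.copies - self.write_quorum, self.copies - self.read_quorum


# Assumed Aurora-like figures: 6 copies, 2 per AZ across 3 AZs.
aurora_like = QuorumConfig(copies=6, write_quorum=4, read_quorum=3)
print(aurora_like.is_consistent())       # True
print(aurora_like.tolerated_failures())  # (2, 3)
```

With two copies per availability zone, losing a whole zone (two copies) still leaves writes available, and losing a zone plus one more copy still leaves the data readable.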
You can only use one of these compute nodes as the master, but you can add multiple other compute nodes as reader slaves. The single-master limitation is very common in clustered RDBMSs, as it’s very complicated to overcome while staying within the constraints of a fully ACID-compliant system, at least if you want the solution to scale. Sharding a NoSQL use case is fairly straightforward, so that’s your path if you need to scale writes or the overall size of your data.
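As a rough illustration of key-based sharding, the sketch below routes each key to one of several Aurora clusters, each with its own single master. The endpoint names, `shard_for_key` and `writer_for_key` are hypothetical, and hashing with SHA-256 modulo the shard count is simply one easy scheme chosen for illustration.

```python
import hashlib
from typing import List

# Hypothetical writer (master) endpoints, one per sharded Aurora cluster.
SHARD_WRITERS: List[str] = [
    "shard-0-writer.example.internal",
    "shard-1-writer.example.internal",
    "shard-2-writer.example.internal",
    "shard-3-writer.example.internal",
]


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to get the same answer on every host.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def writer_for_key(key: str) -> str:
    """Return the master endpoint that owns the given key."""
    return SHARD_WRITERS[shard_for_key(key, len(SHARD_WRITERS))]
```

Plain modulo hashing remaps most keys when the shard count changes; consistent hashing or an explicit key-to-shard lookup table avoids that if you expect to add shards later.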
But what if your use case is mainly read-driven? Then you’ve got a really interesting solution at your disposal! Because storage is separated from compute, adding a new reader slave requires no additional replication: the data is already, and always, replicated. So as soon as a new reader slave is up and running, you can immediately start firing select statements at it. Combine this with the pay-as-you-go model of the cloud, and you’ve got yourself an RDBMS you can scale up and down with your current level of load.
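Since a new reader needs no seeding, the application side can be as simple as a pool of reader endpoints that grows and shrinks at runtime, with writes always sent to the master. The `ReaderPool` class below is a hypothetical sketch using round-robin selection; Aurora also provides a cluster reader endpoint that spreads connections across the replicas for you, so this only illustrates the idea.

```python
import threading
from typing import List, Optional


class ReaderPool:
    """Round-robin over reader endpoints that may be added or removed at runtime."""

    def __init__(self, writer: str, readers: Optional[List[str]] = None) -> None:
        self.writer = writer
        self._readers: List[str] = list(readers or [])
        self._next = 0
        self._lock = threading.Lock()

    def add_reader(self, endpoint: str) -> None:
        # The new reader shares the cluster's storage, so there is nothing
        # to replicate first: it can take traffic as soon as it is reachable.
        with self._lock:
            if endpoint not in self._readers:
                self._readers.append(endpoint)

    def remove_reader(self, endpoint: str) -> None:
        with self._lock:
            if endpoint in self._readers:
                self._readers.remove(endpoint)

    def endpoint_for(self, is_write: bool) -> str:
        with self._lock:
            # Writes always go to the single master; reads fall back to it
            # when no readers are currently running.
            if is_write or not self._readers:
                return self.writer
            endpoint = self._readers[self._next % len(self._readers)]
            self._next += 1
            return endpoint
```

For example, after `pool.add_reader(...)` the next read requests start rotating onto the new node without any other change in the application.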
That’s pretty powerful, and maybe not something many people think of doing with an RDBMS, mainly because of how these systems have traditionally operated. Running a database has long been a special case, where care is needed to keep it tuned and maintained as a special snowflake in your overall stack. In many companies you find teams dedicated solely to the database, simply because it has needed that special level of attention.
We’ve been autoscaling application servers, batch jobs and calculation engines for a long time. Now it’s time to start autoscaling your database, and that could and should be automated. Adding and removing compute nodes, even within a short time period, should no longer be a problem. You’ve maxed out your current cluster, you say? Just add some more. Oh, you don’t need that much power anymore? Just remove a few nodes. It’s all there and ready for you; only some scripting is required.
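The scripting mostly comes down to deciding how many readers you want. Below is a minimal target-tracking sketch, assuming you already collect the average CPU utilization across the readers (for example from CloudWatch): it estimates total load as readers × average CPU and divides by a target CPU per node. The function name, the 60% target and the lower bound are hypothetical; 15 is used as the upper bound because that is the maximum number of Aurora Replicas per cluster. Acting on the result would mean calling the RDS CreateDBInstance API (with the cluster identifier) or DeleteDBInstance, which is left out here.

```python
import math


def desired_reader_count(
    current_readers: int,
    avg_cpu_percent: float,
    target_cpu_percent: float = 60.0,  # hypothetical target CPU per reader
    min_readers: int = 1,              # hypothetical floor
    max_readers: int = 15,             # Aurora's per-cluster replica limit
) -> int:
    """Return how many readers to run so average CPU lands near the target."""
    # Total read load in "percent of one node" units, e.g. 2 readers at 90% = 180.
    total_load = current_readers * avg_cpu_percent
    # Round up so per-node CPU stays at or below the target (unless clamped).
    needed = math.ceil(total_load / target_cpu_percent)
    # Keep the answer within the configured bounds.
    return max(min_readers, min(max_readers, needed))


print(desired_reader_count(2, 90.0))   # 3  (180 / 60)
print(desired_reader_count(4, 15.0))   # 1  (60 / 60)
print(desired_reader_count(15, 99.0))  # 15 (clamped at the maximum)
```

In practice you would also want a cool-down period between changes, so a freshly added node has time to take load before the next decision is made.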
I believe AWS should embrace this and help us take it to the next level. Give us automated autoscaling, and let us use spot instances if we want to. Who would even think of using spot instances for a database!? Well, I think we can now, especially if your use case can cope with sudden price hikes.