Performance: Rackspace Cloud vs Amazon Web Services

Performance-related posts and articles seem to be rather popular, with my Java vs C++ post being one of the big traffic drivers on this blog. Now it’s time for another one, with a very specific use case in mind.

I’m into the third and final term of my master’s in quantitative finance. As part of this term I’m doing a dissertation, and the topic is basically high-frequency trading strategies. So what I’m doing is looking at a relatively large amount of data and applying different trading strategies to it. The data consists of one-minute OHLC bars plus volume for all stocks in the S&P 100, going back about 10 years.

Uncompressed and stored as comma-separated text files, this comes to about 6 GB of data, and when imported into a database it is just under 118 million rows, each row being a one-minute bar for one ticker. I don’t feel like torturing my laptop with all the analysis I’ll be running over the coming months, and it also only has two cores, so it would be beneficial to “outsource” this to external servers.

Luckily, in these “cloud computing” times, gaining short-term access to X number of servers is easy, and also reasonably priced. But who should I choose? There are two dominant players I’m going to evaluate: Rackspace Cloud and Amazon Web Services.

When running the tests I’ve been using Rackspace’s UK data center and Amazon’s Irish data center, simply because I’m based in London and those data centers are probably a bit less crowded than some of their US-based ones. Both providers offer different configurations, Amazon being the most flexible in this area. I’m going to need persistent disk storage, so for Amazon I’m using EBS for the disks. For Rackspace I’m using the 8GB server, and for Amazon the Large Instance (7.5GB). This is more than I need; 2GB will probably be enough, 4GB for sure. But Amazon doesn’t have a 4GB server. More importantly, they are similar price-wise, with the Amazon server being a bit cheaper, but you’ll have to pay extra for the EBS disk and IO operations.

And this is where the first big difference shows. From my understanding, Rackspace uses RAID 10 on local hard drives. This removes some flexibility, but not flexibility I need anyway. EBS, on the other hand, is network-attached storage. So the first test is raw write speed: I ran the dd command below three times in a row on each server.

dd if=/dev/zero of=./dd.test bs=10M count=500

The difference is massive: Rackspace averaged a write speed of 290 MB/s and Amazon only 81 MB/s, so Rackspace wrote about 3.6 times as fast. Both servers are running Ubuntu 10.04 LTS with all patches applied. No other tuning has been done, so it’s a stock OS. Still, given a gap this large, I don’t think any tuning on the Amazon server would let it keep up with the Rackspace server.
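For anyone who wants to repeat a similar measurement without dd, a rough equivalent can be sketched in Python. This is only an illustration and not what produced the numbers above: the function name timed_sequential_write and the small block size and count are assumptions picked so it runs quickly. It also calls fsync before stopping the clock, which the plain dd command above does not, so the figures are not directly comparable.

```python
import os
import tempfile
import time


def timed_sequential_write(path, block_size, count):
    """Write `count` zero-filled blocks of `block_size` bytes; return MB/s.

    The file is flushed and fsync'ed before the clock stops, so the figure
    includes the time needed to push the data out to the device.
    """
    block = bytes(block_size)  # zero-filled buffer, similar to reading /dev/zero
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (block_size * count) / elapsed / 1e6  # decimal megabytes per second


if __name__ == "__main__":
    # Assumed small sizes for a quick run; the dd test used 10M blocks x 500.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "write.test")
        speed = timed_sequential_write(target, block_size=1024 * 1024, count=16)
        print(f"{speed:.1f} MB/s")
```

As with dd, a single run says little; repeating it a few times and averaging gives a steadier figure.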

The same gap showed up clearly when I imported a subset (about 14.5 million rows) of the data from files into a MySQL database. Both servers were running MySQL 5.1.41-3ubuntu12.10 with no configuration changes. The database engine used for the table in both cases was InnoDB, and a simple Java program was used to read the files and insert the rows into the DB. The Java version info on both servers:

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)


In this case Rackspace performed the import in 6146 seconds, while the Amazon server took 15026 seconds for the same job, or in other words more than twice as long (about 2.4 times). Not too surprising given the difference in write speed.
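To put those import times in perspective, here is a small back-of-the-envelope calculation in Python. The row counts and timings are the ones quoted in this post; the projection to the full data set assumes the import rate stays constant as the table grows, which is a strong assumption and probably optimistic. The function names are just for illustration.

```python
ROWS_IMPORTED = 14_500_000       # "about 14.5 million rows" in the subset
FULL_DATASET_ROWS = 118_000_000  # "just under 118 million rows" in total

# Import times in seconds, as measured above.
IMPORT_SECONDS = {"Rackspace": 6146, "Amazon": 15026}


def rows_per_second(rows, seconds):
    """Average insert rate over the whole import."""
    return rows / seconds


def projected_hours(total_rows, rate):
    """Hours needed for total_rows at a constant rate (an assumption)."""
    return total_rows / rate / 3600


if __name__ == "__main__":
    for provider, seconds in IMPORT_SECONDS.items():
        rate = rows_per_second(ROWS_IMPORTED, seconds)
        hours = projected_hours(FULL_DATASET_ROWS, rate)
        print(f"{provider}: {rate:,.0f} rows/s, ~{hours:.1f} h for the full data set")
    ratio = IMPORT_SECONDS["Amazon"] / IMPORT_SECONDS["Rackspace"]
    print(f"Amazon took {ratio:.2f}x as long as Rackspace")
```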

While the imports were running I monitored the servers and could see that a lot of performance was lost on the Amazon server due to I/O wait. This shouldn’t be as big an issue for my next test, which is CPU bound instead. It’s a single-threaded test, so what I’m looking at here is single-core performance. The test loads each day of intraday data for each ticker, one at a time, then with this data in memory runs a specific trading strategy with a set of parameters on that day. Also, as there’s bootstrapping involved, each set of data is regenerated 1000 times (all in memory, so no disk or DB activity for this).
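To give a feel for the shape of this workload, here is a minimal Python sketch. It is not the actual test code: the real strategy and resampling scheme aren't shown in this post, so placeholder_strategy is a made-up stand-in, and resampling minute-to-minute returns with replacement is just one common way to bootstrap a price series. All names and the sample prices are assumptions.

```python
import random


def bootstrap_closes(closes, rng):
    """Build one synthetic price path by resampling the day's bar-to-bar
    returns with replacement, starting from the first close."""
    returns = [b / a for a, b in zip(closes, closes[1:])]
    path = [closes[0]]
    for _ in returns:
        path.append(path[-1] * rng.choice(returns))
    return path


def placeholder_strategy(closes, lookback):
    """Hypothetical stand-in strategy: hold one unit over the next bar
    whenever the price is above its level `lookback` bars ago.
    Returns the summed P&L."""
    pnl = 0.0
    for i in range(lookback, len(closes) - 1):
        if closes[i] > closes[i - lookback]:
            pnl += closes[i + 1] - closes[i]
    return pnl


def run_day(closes, lookback, n_resamples=1000, seed=0):
    """Run the strategy on n_resamples bootstrapped versions of one day."""
    rng = random.Random(seed)
    return [
        placeholder_strategy(bootstrap_closes(closes, rng), lookback)
        for _ in range(n_resamples)
    ]


if __name__ == "__main__":
    # Made-up closes for illustration; a real day has far more one-minute bars.
    day = [100.0, 100.2, 100.1, 100.4, 100.3, 100.6, 100.5, 100.7]
    results = run_day(day, lookback=2)
    print(len(results), sum(results) / len(results))
```

Everything happens in memory and in a single thread, which is why this kind of test mostly measures single-core CPU speed rather than disk performance.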

Here the results are more even, with the Rackspace server completing the test in 6639 seconds and the Amazon server in 7025 seconds, about 6% slower. I was considering running the same test on a “High-CPU Extra Large Instance” on Amazon as well. The write speed should be about the same according to Amazon’s own details, but it’s got a 0.5 increase in EC2 Compute Units per core. But I didn’t bother, as it probably would only have put it on par with the Rackspace server while still suffering in write speed.

So my choice is rather simple: Rackspace Cloud.


8 replies on “Performance: Rackspace Cloud vs Amazon Web Services”

Did you consider using the ephemeral disk you get included with your instance? It survives reboots, not termination – but you can leave the data backed up on S3. Did you consider using RDS for your MySQL? Did you consider an HPC or GPU node from AWS if it is all about IO and CPU performance? Does your data analysis lend itself to Map Reduce and hence the AWS EMR service?

Ephemeral disk: No, I need persistence and the ability to clone the server.
RDS: No, that would drive the cost of AWS up and would not have made it a fair comparison.
HPC/GPU: My code doesn’t use the GPU, and again, this is meant to be a comparison at comparable cost.
Map Reduce: To some extent on some of the analysis, but not on all of it. The task scheduler is nice, but not a killer feature.

Certainly, but your comparison needs to be normalised to CPU cycles per dollar or some such measure. Then we can perhaps see the different compute classes available. Then, if you eliminate the IO bottleneck by re-engineering, how does the $/CPU cycle look? You could then even compare it to buying a powerful workstation with its costs amortised over the duration of your studies, and then we’d see if cloud was even a good choice.

But if your initial ambition was to run your analysis very fast, then using fewer compute hours of HPC might be very cost effective.

This test therefore doesn’t carry much information but it does confirm again that EBS isn’t fast.

Absolutely, and it would of course be interesting to see how big a difference “Very High” vs “High” IO performance makes to the EBS performance.

Then again, this is a three-month project. Spending time re-engineering (moving stuff to and from disk and S3), plus the time spent on the import/export for each server boot, also adds a significant amount of time and cost.

I would like to see far more maths in technology to help us all get past the marketing 🙂
