Busy wait and queue performance
For the last month or so I've been rather busy developing an algorithmic trading platform connected to LMAX (you should also check out their very interesting Disruptor framework). It's a rather comprehensive solution with risk management, position management, back testing, tick-level data management, etc., built in. Everything that connects to and communicates with the exchange is event based, essentially a topic-based setup, so one event producer may have zero or more event consumers.
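The topic-based setup described above, where one producer may have zero or more consumers, could look roughly like the sketch below. TopicBus, subscribe and publish are hypothetical names, and giving every consumer its own ConcurrentLinkedQueue is an assumption for illustration, not necessarily how the platform does it.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical topic-based event bus: each topic has zero or more subscribers,
// and every subscriber drains its own queue.
public class TopicBus<E> {
    private final Map<String, List<Queue<E>>> subscribers = new ConcurrentHashMap<>();

    // Registers a new consumer on a topic and returns the queue that consumer should drain.
    public Queue<E> subscribe(String topic) {
        Queue<E> queue = new ConcurrentLinkedQueue<>();
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(queue);
        return queue;
    }

    // Delivers the event to every subscriber of the topic and returns how many received it.
    public int publish(String topic, E event) {
        List<Queue<E>> queues = subscribers.getOrDefault(topic, List.of());
        for (Queue<E> queue : queues) {
            queue.offer(event);
        }
        return queues.size();
    }

    public static void main(String[] args) {
        TopicBus<String> bus = new TopicBus<>();
        Queue<String> risk = bus.subscribe("EUR/USD");
        Queue<String> positions = bus.subscribe("EUR/USD");
        System.out.println(bus.publish("EUR/USD", "tick-1")); // 2
        System.out.println(bus.publish("GBP/USD", "tick-2")); // 0 (no subscribers)
        System.out.println(risk.poll() + " " + positions.poll()); // tick-1 tick-1
    }
}
```

CopyOnWriteArrayList is chosen here because subscriptions are rare while publishing happens on every tick, so iteration should stay cheap and lock-free.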
I've really enjoyed finally having a proper concurrent project to work on, poking around the Java concurrency source code to find a few tips and tricks for squeezing out a bit more performance. It's not that the trading I'll be doing is that high frequency per se; it's more about eliminating latency. Of course, most of the computational power will be needed in the actual business logic, as that's where the heavy lifting happens. But since the event system that glues everything together is so omnipresent, having it perform well is also very important. Analyzing the spacing between ticks for what is presumably one of LMAX's busiest instruments, EUR/USD, shows that the most frequent gap is between 1 and 10 milliseconds, which gives me a rough estimate of how quick the consumers need to be. If they can't keep up on average, events will begin queuing up, which is very bad.
For the event system itself, delivering one event from a producer to a consumer takes 170 microseconds on average, which leaves the consumer with almost all of the time between ticks (given that the exchange-connected producers range from very to fairly lightweight). I'm not really sure where to do further optimizations along my critical paths. I might need to go crazy and do manual memory management or something (with the Unsafe class), but I doubt I'll need it. I think most of the quick wins currently are at the network/IO level.
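One way to get a number like the 170 microsecond average is to stamp each event with System.nanoTime() when it is published and subtract that on the consumer side. The sketch below assumes a LinkedBlockingQueue and a hypothetical Event class; it is not the platform's actual measurement code, and the figures it prints will depend entirely on the machine.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DeliveryLatency {
    // Hypothetical event that records when it was published.
    static final class Event {
        final long publishedNanos;

        Event(long publishedNanos) {
            this.publishedNanos = publishedNanos;
        }
    }

    // Publishes `count` events through a LinkedBlockingQueue and returns the average
    // producer-to-consumer delivery latency in microseconds.
    static double averageLatencyMicros(int count) throws InterruptedException {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        long[] totalNanos = new long[1]; // written by the consumer, read after join()
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    Event event = queue.take(); // blocks until the producer publishes
                    totalNanos[0] += System.nanoTime() - event.publishedNanos;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 0; i < count; i++) {
            queue.put(new Event(System.nanoTime()));
            Thread.sleep(1); // space events out, roughly like ticks arriving a millisecond or more apart
        }
        consumer.join(); // join() makes the consumer's writes to totalNanos visible here
        return totalNanos[0] / (double) count / 1_000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("average delivery latency: %.1f microseconds%n", averageLatencyMicros(1_000));
    }
}
```

Because the consumer is parked in take() between events, this measures the wake-up cost as well as the queue handoff itself.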
Anyway, I digress. The point is that while performance testing various parts, I got some conflicting evidence as to which concurrent queue implementation to use. So I decided to do a somewhat more comprehensive test while still running it in an isolated environment. There are three factors at play here: Is the producer slow or quick? Is the consumer slow or quick? And are we using busy-wait loops when waiting for events? A producer or consumer would typically be slow if it needs to wait for IO, so that's simulated here with a Thread.sleep(1) call.
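The three factors could be combined in a small harness along these lines. This is a sketch under assumptions, not the harness behind the results: the busy-wait consumer spins on poll(), the blocking consumer parks in take(), slowness is Thread.sleep(1) as described above, and the 200 events per run and the ArrayBlockingQueue capacity of 1024 are arbitrary choices. Thread.onSpinWait() requires JDK 9 or later.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

public class QueueBench {
    // A slow producer or consumer is simulated with Thread.sleep(1).
    static void simulateWork(boolean slow) throws InterruptedException {
        if (slow) {
            Thread.sleep(1);
        }
    }

    // Busy-wait consumer: spins on poll() instead of parking the thread when the queue is empty.
    static void consumeBusyWait(Queue<Integer> queue, int count, boolean slow) throws InterruptedException {
        int received = 0;
        while (received < count) {
            if (queue.poll() == null) {
                Thread.onSpinWait(); // spin hint only; the thread stays runnable
                continue;
            }
            received++;
            simulateWork(slow);
        }
    }

    // Blocking consumer: take() parks the thread until an element is available.
    static void consumeBlocking(BlockingQueue<Integer> queue, int count, boolean slow) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            queue.take();
            simulateWork(slow);
        }
    }

    // Runs one producer/consumer pair over `queue` and returns the elapsed wall-clock time in ms.
    static long run(Queue<Integer> queue, boolean slowProducer, boolean slowConsumer,
                    boolean busyWait, int count) throws InterruptedException {
        if (!busyWait && !(queue instanceof BlockingQueue)) {
            throw new IllegalArgumentException("blocking consumers need a BlockingQueue");
        }
        Thread consumer = new Thread(() -> {
            try {
                if (busyWait) {
                    consumeBusyWait(queue, count, slowConsumer);
                } else {
                    consumeBlocking((BlockingQueue<Integer>) queue, count, slowConsumer);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long start = System.nanoTime();
        consumer.start();
        for (int i = 0; i < count; i++) {
            while (!queue.offer(i)) { // a bounded queue may be full: retry until there is room
                Thread.onSpinWait();
            }
            simulateWork(slowProducer);
        }
        consumer.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Supplier<Queue<Integer>>> factories = List.of(
                ConcurrentLinkedQueue::new, () -> new ArrayBlockingQueue<>(1024), LinkedBlockingQueue::new);
        boolean[] flags = {false, true};
        for (Supplier<Queue<Integer>> factory : factories) {
            for (boolean slowP : flags) {
                for (boolean slowC : flags) {
                    for (boolean busy : flags) {
                        Queue<Integer> queue = factory.get();
                        if (!busy && !(queue instanceof BlockingQueue)) {
                            continue; // ConcurrentLinkedQueue has no blocking take()
                        }
                        long ms = run(queue, slowP, slowC, busy, 200);
                        System.out.printf("%s slowProducer=%b slowConsumer=%b busyWait=%b: %d ms%n",
                                queue.getClass().getSimpleName(), slowP, slowC, busy, ms);
                    }
                }
            }
        }
    }
}
```

Busy waiting burns a full core while the queue is empty, so it only makes sense when a core can be dedicated to the consumer; the blocking variant trades that CPU for the cost of parking and unparking threads.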
From JavaDoc we know:
- A ConcurrentLinkedQueue is an appropriate choice when many threads will share access to a common collection. This queue does not permit null elements.
- ArrayBlockingQueue is a classic "bounded buffer", in which a fixed-sized array holds elements inserted by producers and extracted by consumers. This class supports an optional fairness policy for ordering waiting producer and consumer threads.
- Linked queues such as LinkedBlockingQueue typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.
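The behavioural differences quoted from the JavaDoc can be checked directly with the standard java.util.concurrent classes. The class name QueueSemantics and the helper acceptsNull are just names for this illustration.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSemantics {
    // Returns true if the queue accepted a null element, false if it threw NullPointerException.
    static boolean acceptsNull(Queue<String> queue) {
        try {
            queue.offer(null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ConcurrentLinkedQueue: unbounded and non-blocking; null elements are rejected.
        System.out.println("ConcurrentLinkedQueue accepts null: "
                + acceptsNull(new ConcurrentLinkedQueue<>())); // false

        // ArrayBlockingQueue: fixed capacity set at construction; `true` enables the fairness policy.
        ArrayBlockingQueue<String> bounded = new ArrayBlockingQueue<>(2, true);
        bounded.offer("a");
        bounded.offer("b");
        System.out.println("offer on a full ArrayBlockingQueue: " + bounded.offer("c")); // false

        // LinkedBlockingQueue: optionally bounded; without a capacity it uses Integer.MAX_VALUE.
        LinkedBlockingQueue<String> linked = new LinkedBlockingQueue<>();
        System.out.println("LinkedBlockingQueue remaining capacity: "
                + linked.remainingCapacity()); // 2147483647
    }
}
```

Note that the fairness flag on ArrayBlockingQueue is documented as generally decreasing throughput in exchange for reduced variability, so it is worth benchmarking both settings rather than assuming one.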
A look at Fudge Proto
OpenGamma is an interesting startup headquartered here in London. They provide open source software for analytics and risk analysis in the financial services industry. As that's right up my alley, I decided to start looking at their code this Friday. Among all that code, one project looked like a good starting point, namely Fudge Proto. I took my usual approach and read the documentation I felt was relevant while browsing the source code at the same time. Experimenting with the code and learning by example has always worked for me, so in no time I had some code up and running.
In brief, the purpose of Fudge Proto is comparable to that of Google Protocol Buffers and Apache Thrift. You have one or more objects and can easily serialize them, store them, send them over a network, and deserialize them again when needed. The target output format would typically be binary. Binary has an advantage over, for instance, XML and JSON in that it's more compact. It should generally also be quicker to parse, as the format is more fixed and involves less fuss. Of course it's not easily human readable, but that's not the point either.
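To make the compactness and parsing argument concrete, the sketch below encodes the same hypothetical tick (timestamp, price, quantity) as fixed-width binary with java.io.DataOutputStream and as JSON text. This is neither the Fudge nor the Protocol Buffers wire format (Protocol Buffers, for example, uses tagged, variable-length fields); it only illustrates the general point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BinaryVsText {
    // Fixed-width binary layout for a hypothetical tick: 8-byte timestamp, 8-byte price, 4-byte quantity.
    static byte[] toBinary(long timestamp, double price, int quantity) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(timestamp);
            out.writeDouble(price);
            out.writeInt(quantity);
        }
        return bytes.toByteArray();
    }

    // Parsing is just reading fields back in the same order: no tokenizing, no text-to-number conversion.
    static double priceFromBinary(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readLong(); // skip the timestamp
            return in.readDouble();
        }
    }

    // The same tick as a JSON object: field names and digits are spelled out as UTF-8 text.
    static byte[] toJson(long timestamp, double price, int quantity) {
        String json = "{\"timestamp\":" + timestamp + ",\"price\":" + price
                + ",\"quantity\":" + quantity + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        long timestamp = 1_280_000_000_000L;
        byte[] binary = toBinary(timestamp, 1.2815, 1_000_000);
        byte[] json = toJson(timestamp, 1.2815, 1_000_000);
        System.out.println("binary: " + binary.length + " bytes"); // binary: 20 bytes
        System.out.println("json:   " + json.length + " bytes");
        System.out.println("price:  " + priceFromBinary(binary)); // price:  1.2815
    }
}
```

Real serialization frameworks add field identifiers or a schema on top of this so that messages can evolve, which costs a few bytes but keeps the format self-describing.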
Fudge Proto and the underlying Fudge Messaging system are fairly young, with version 0.3 released early this year. I wanted to see how Fudge stacks up against Google's Protocol Buffers. I understand Protocol Buffers is heavily used internally at Google, and given Google's scale I'm sure it's heavily optimized. Although I've dealt with serialization before, and have also designed some low-level messaging systems for very limited devices, I'm not claiming to be an expert on any of these systems. Therefore, should you spot any obvious issues in my experiments later on, feel free to contact me by email or leave a comment. I'll be happy to rerun these tests with improvements to both Protocol Buffers and Fudge Proto.
Performance: Rackspace Cloud vs Amazon Web Services
Performance-related posts and articles seem to be rather popular, with my Java vs C++ post being one of the big traffic drivers on this blog. Now it's time for another one, with a very specific use case in mind.
I'm in the third and final term of my master's in quantitative finance. As part of this term I'm writing a dissertation, and the topic is basically high frequency trading strategies. So what I'm doing is looking at a relatively large amount of data and applying different trading strategies to it. The data consists of 1-minute OHLC bars plus volume for all stocks in the S&P 100, going back about 10 years.
Uncompressed and stored as comma-separated text files, this amounts to about 6 GB of data, and when imported into a database it is just under 118 million rows, each row being a one-minute bar for one ticker. I don't feel like torturing my laptop with all the analysis I'll be running over the coming months, and it's also only got two cores, so it would be beneficial to "outsource" this to external servers.
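A row in this data set can be modelled as a small value class, and the quoted totals give a quick sanity check on row size. The column order ticker,minute,open,high,low,close,volume is an assumption (the actual file layout isn't described), the sample line is made up, and 6 GB is taken as 6 × 10^9 bytes; with 6 × 2^30 bytes the figure would be closer to 55.

```java
public class MinuteBars {
    // Hypothetical in-memory form of one row: a 1-minute OHLC bar plus volume for one ticker.
    static final class Bar {
        final String ticker;
        final String minute; // kept as text here; a real loader would parse it into a timestamp
        final double open, high, low, close;
        final long volume;

        Bar(String ticker, String minute, double open, double high, double low, double close, long volume) {
            this.ticker = ticker;
            this.minute = minute;
            this.open = open;
            this.high = high;
            this.low = low;
            this.close = close;
            this.volume = volume;
        }
    }

    // Parses one CSV line, assuming the column order ticker,minute,open,high,low,close,volume.
    static Bar parse(String line) {
        String[] f = line.split(",");
        return new Bar(f[0], f[1], Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                Double.parseDouble(f[4]), Double.parseDouble(f[5]), Long.parseLong(f[6]));
    }

    // Average number of bytes of text per row, given the totals quoted above.
    static double bytesPerRow(double totalBytes, long rows) {
        return totalBytes / rows;
    }

    public static void main(String[] args) {
        Bar bar = parse("XYZ,2010-07-01 09:31,25.10,25.20,25.05,25.15,12000"); // made-up example row
        System.out.println(bar.ticker + " close=" + bar.close); // XYZ close=25.15
        System.out.printf("~%.0f bytes per row%n", bytesPerRow(6e9, 118_000_000L)); // ~51 bytes per row
    }
}
```

Roughly 51 bytes of text per bar is consistent with a ticker, a timestamp, four prices and a volume per line, which suggests the size and row count quoted above hang together.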
Luckily, in these "cloud computing" times, gaining short-term access to X servers is easy and also reasonably priced. But which one should I choose? There are two dominant players I'm going to evaluate: Rackspace Cloud and Amazon Web Services.