OpenGamma is an interesting startup headquartered here in London. They provide open source software for analytics and risk analysis for the financial services industry. As that’s right up my alley, I decided to start looking at their code this Friday. Among all that code, one project looked like a good starting point: Fudge Proto. I took my usual approach and read the documentation I felt was relevant while browsing the source code at the same time. Experimenting with the code and learning by example has always worked for me, so in no time I had some code up and running.
In brief, the purpose of Fudge Proto is comparable to that of Google Protocol Buffers and Apache Thrift. You have one or more objects and can easily serialize them, store them, send them over a network, and deserialize them again when needed. The target output format would typically be binary. Compared with, for instance, XML and JSON, binary has the advantage of being more compact. It should generally also be quicker to parse, as the format is more rigid with less fuss. Of course it’s not easily human readable, but that’s not the point either.
Fudge Proto and the underlying Fudge Messaging system are fairly young, with version 0.3 released early this year. I wanted to see how they stack up against Google’s Protocol Buffers. I understand Protocol Buffers is heavily used internally at Google, and given Google’s scale I’m sure it’s heavily optimized. Although I’ve dealt with serialization before, and also designed some low-level messaging systems for use on very limited devices, I’m not claiming to be an expert on any of these systems. Therefore, should you spot any obvious issues in my experiments later on, feel free to contact me by email or leave a comment. I’ll be happy to rerun these tests with improvements to both the Protocol Buffers and the Fudge Proto code.
There are some differences in the available data types between the two systems, so for these tests I kept the structures as similar as those limitations allow. I ran my code on a fresh 2GB Rackspace Cloud server running Ubuntu 11.04 with Java 6 installed, and of course nothing else of importance running. I wanted to see how the systems performed with regard to different message sizes, so for the tests I used a structure with children and grandchildren, using most of the data types and with data of different sizes. How much data to put in was decided randomly, but with every run sharing the same seed so that the generated data was the same in each case.
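The shared-seed idea can be sketched roughly as follows. This is not the code used for the tests: the `Node` type, its fields, and the depth and size limits are hypothetical, chosen only to show how two `java.util.Random` instances created with the same seed produce identical nested test data for two different serializers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SeededTestData {
    /** Hypothetical message node: a few mixed-type fields plus nested children. */
    public static final class Node {
        int id;
        double value;
        String label;
        byte[] payload;
        final List<Node> children = new ArrayList<Node>();
    }

    /** Builds a random tree; depth 2 gives a root with children and grandchildren. */
    public static Node build(Random rnd, int depth, int maxChildren, int maxPayload) {
        Node n = new Node();
        n.id = rnd.nextInt();
        n.value = rnd.nextDouble();
        n.label = Long.toHexString(rnd.nextLong());
        n.payload = new byte[rnd.nextInt(maxPayload + 1)]; // random amount of data
        rnd.nextBytes(n.payload);
        if (depth > 0) {
            int count = rnd.nextInt(maxChildren + 1);
            for (int i = 0; i < count; i++) {
                n.children.add(build(rnd, depth - 1, maxChildren, maxPayload));
            }
        }
        return n;
    }

    /** Counts the nodes in a tree, including the root. */
    public static int countNodes(Node n) {
        int total = 1;
        for (Node child : n.children) {
            total += countNodes(child);
        }
        return total;
    }

    /** Deep comparison of two trees, field by field. */
    public static boolean sameContent(Node a, Node b) {
        if (a.id != b.id || a.value != b.value || !a.label.equals(b.label)
                || !Arrays.equals(a.payload, b.payload)
                || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameContent(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary; what matters is that both runs share it
        Node first = build(new Random(seed), 2, 5, 1024);
        Node second = build(new Random(seed), 2, 5, 1024);
        System.out.println("nodes: " + countNodes(first));
        System.out.println("identical data: " + sameContent(first, second));
    }
}
```

In a real comparison each tree would then be copied into the message classes generated for the respective system, so that both serializers see the same content.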
Fudge Proto allows you to define your structure to use either named fields or ordinal numbers. Let’s first look at read and write performance and message size for these two cases.
Here we see the raw results and fitted lines for named fields (first plot) and ordinal numbers (second plot). There are some outliers that we should be able to ignore, likely due to GC activity. The results are very much as expected, with the ordinal case being both faster and more space efficient. Writes are also slower than reads, as is typical. Note that nothing was stored on disk while running these tests, so the results are not affected by disk IO speed. The mean results are:
- Mean read time: 0.13 sec (named fields) and 0.09 sec (ordinal)
- Mean write time: 0.19 sec (named fields) and 0.15 sec (ordinal)
As for message size:
- Mean message size of just over 4 MB for named fields
- Mean message size of just over 2.6 MB for the ordinal case
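To get a feel for why ordinals give smaller messages than names, here is a toy sketch. It does not implement the Fudge wire format. The layout is an assumption made purely for illustration: every field is an 8-byte value preceded by either a two-byte ordinal or a length-prefixed UTF-8 name. The class and method names are hypothetical as well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FieldHeaderSize {
    /** Toy layout: [1-byte name length][UTF-8 name][8-byte value] per field. */
    public static byte[] encodeNamed(String[] names, long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(names.length * (1 + 255 + 8));
        for (int i = 0; i < names.length; i++) {
            byte[] name = names[i].getBytes(StandardCharsets.UTF_8);
            buf.put((byte) name.length); // assumes names shorter than 256 bytes
            buf.put(name);
            buf.putLong(values[i]);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Toy layout: [2-byte ordinal][8-byte value] per field. */
    public static byte[] encodeOrdinal(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * (2 + 8));
        for (int i = 0; i < values.length; i++) {
            buf.putShort((short) (i + 1)); // ordinals numbered from 1
            buf.putLong(values[i]);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        String[] names = {"price", "quantity", "timestamp"};
        long[] values = {1L, 2L, 3L};
        System.out.println("named:   " + encodeNamed(names, values).length + " bytes");   // 49
        System.out.println("ordinal: " + encodeOrdinal(values).length + " bytes");        // 30
    }
}
```

The per-field overhead of a name grows with the length of the name, while an ordinal costs a fixed couple of bytes, so the gap widens as messages contain more and smaller fields.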
It is also nice to see a linear (as opposed to exponential) increase in time with size, indicating good scalability.
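For reference, a straight line can be fitted to (message size, time) pairs with ordinary least squares. The sketch below is a generic implementation and not necessarily how the lines in the plots were produced; the sample points in `main` are made up for illustration and are not measurements from these tests.

```java
public class LinearFit {
    /** Returns {intercept, slope} of the least squares line y = intercept + slope * x. */
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        // Made-up sample points: message size in MB, time in seconds.
        double[] sizeMb = {1.0, 2.0, 3.0, 4.0};
        double[] seconds = {0.04, 0.07, 0.11, 0.14};
        double[] line = fit(sizeMb, seconds);
        System.out.println("intercept = " + line[0] + ", slope = " + line[1]);
    }
}
```

If the time really grows linearly with size, the points scatter evenly around the fitted line; a systematic curve away from it would suggest worse than linear behaviour.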
Now let’s move over to Protocol Buffers. Again, I used a similar message structure and the same method of filling it with data.
Again these are the raw results, with some outliers making it somewhat difficult to see the details. It is nevertheless clear that Protocol Buffers is more optimized than Fudge Proto. The summary results this time are:
- Mean read time: 0.017 sec
- Mean write time: 0.05 sec
- Mean message size: 2.26 MB
All of this combined:
Fudge Proto with ordinal numbers and Protocol Buffers produce messages of similar size. This is also what you would expect from reading the relevant encoding specifications. Looking at performance, Protocol Buffers is just over 5 times faster on reads and 3 times faster on writes than Fudge Proto with ordinals. I’m not yet fully sure whether this is an architectural issue or just a matter of low-hanging optimization fruit here and there, but I ran a quick profiling session on the ordinal use case and got the snapshot below.
I haven’t had time to look into this further, but it does give some initial indication of low-hanging fruit. [Edit: Having looked closer at the source, I think I’ll change my mind on that for now.]
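As a quick sanity check on the ratios quoted above, the arithmetic on the reported means can be spelled out like this. The class and method names are just for illustration, and since the Fudge Proto message size was given as "just over 2.6 MB", the size ratio is approximate.

```java
import java.util.Locale;

public class SpeedRatios {
    /** How many times larger the first figure is than the second. */
    public static double ratio(double slower, double faster) {
        return slower / faster;
    }

    public static void main(String[] args) {
        // Fudge Proto (ordinal) means divided by Protocol Buffers means.
        System.out.printf(Locale.ROOT, "reads:  %.2fx%n", ratio(0.09, 0.017)); // 5.29x
        System.out.printf(Locale.ROOT, "writes: %.2fx%n", ratio(0.15, 0.05));  // 3.00x
        System.out.printf(Locale.ROOT, "size:   %.2fx%n", ratio(2.6, 2.26));   // 1.15x
    }
}
```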
If you’re interested in looking at my source code, it’s available here with the required libraries as well.