A look at Fudge Proto

By | July 26, 2011

OpenGamma is an interesting startup headquartered here in London. They provide open source software for analytics and risk analysis for the financial services industry. As that’s right up my alley, I decided to start looking at their code this Friday. Among all that code, one project looked like a good starting point: Fudge Proto. I took my usual approach and read the documentation I felt was relevant while browsing the source code at the same time. Experimenting with code and learning by example has always worked for me, so in no time I had some code up and running.

In brief, you can compare the purpose of Fudge Proto to that of Google Protocol Buffers and Apache Thrift. You have one or more objects that you can easily serialize, store, send over a network, and deserialize again when needed. The target output format would typically be binary. Binary has the advantage over, for instance, XML and JSON of being more compact. It should generally also be quicker to parse, as the format is more fixed with less fuss. Of course it’s not easily human readable, but that’s not the point either.
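To make the compactness point concrete, here is a minimal sketch that encodes the same three values once as hand-rolled binary and once as JSON text, then compares the byte counts. The class, the field names and the sample values are all made up for illustration; this is neither the Fudge nor the Protocol Buffers encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example: a hand-rolled binary layout versus a JSON string.
class BinaryVsText {

    // Encode three fields as fixed-width binary values plus a length-prefixed string.
    static byte[] toBinary(int id, double price, String symbol) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(id);        // 4 bytes
            out.writeDouble(price);  // 8 bytes
            out.writeUTF(symbol);    // 2-byte length + modified UTF-8 bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    // Encode the same fields as a JSON text document.
    static byte[] toJson(int id, double price, String symbol) {
        String json = "{\"id\":" + id + ",\"price\":" + price
                + ",\"symbol\":\"" + symbol + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("binary bytes: " + toBinary(123456, 101.25, "VOD.L").length);
        System.out.println("json bytes:   " + toJson(123456, 101.25, "VOD.L").length);
    }
}
```

The JSON version repeats every field name as text and spells numbers out digit by digit, which is where most of the extra bytes come from.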

Fudge Proto and the underlying Fudge Messaging system are fairly young, with version 0.3 released early this year. I wanted to see how they stack up against Google’s Protocol Buffers. I understand Protocol Buffers is heavily used internally at Google, and given Google’s scale I’m sure it’s heavily optimized. Although I’ve dealt with serialization before, and also designed some low level messaging systems for use on very limited devices, I’m not claiming to be an expert on any of these systems. Therefore, should you spot some obvious issues in my experiments later on, feel free to contact me by email or leave a comment. I’ll be happy to rerun these tests with improvements to both Protocol Buffers and Fudge Proto.

There are some differences in the available data types between the two systems, so for these tests I kept the structures as equal as those differences allow. I ran my code on a fresh 2GB Rackspace Cloud server running Ubuntu 11.04 with Java 6 installed and nothing else of importance running. I wanted to see how the systems performed with regard to different message sizes, so for the tests I used a structure with children and grandchildren, using most of the data types and with data of different sizes. How much data to put in was decided randomly, but with every test sharing the same seed so that all of them get the same data.
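The shared seed is what makes the runs comparable. A minimal sketch of the idea, using a made-up Node class rather than the actual test structure: two generators created with the same seed produce identical trees, so each serializer can be fed the same data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical test-data generator; the structure and names are illustrative only.
class SeededData {

    // A node with a text payload and a list of child nodes.
    static class Node {
        final String payload;
        final List<Node> children = new ArrayList<Node>();

        Node(String payload) {
            this.payload = payload;
        }
    }

    // Build a tree of the given depth; every size is drawn from the supplied Random.
    static Node build(Random rnd, int depth) {
        StringBuilder sb = new StringBuilder();
        int length = rnd.nextInt(20);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        Node node = new Node(sb.toString());
        if (depth > 0) {
            int childCount = rnd.nextInt(4);
            for (int i = 0; i < childCount; i++) {
                node.children.add(build(rnd, depth - 1));
            }
        }
        return node;
    }

    // Flatten the tree to a string so two trees can be compared.
    static String describe(Node node) {
        StringBuilder sb = new StringBuilder(node.payload).append('[');
        for (Node child : node.children) {
            sb.append(describe(child));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        long seed = 42L; // any fixed value works; 42 is arbitrary
        String first = describe(build(new Random(seed), 2));
        String second = describe(build(new Random(seed), 2));
        System.out.println("same data for both runs: " + first.equals(second));
    }
}
```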

Fudge Proto allows you to define your structure to either use named fields or ordinal numbers. Let’s first look at read and write performance and message size for these two cases.
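To see why the choice matters for size, here is a rough sketch of the two styles. It is deliberately not the Fudge wire format, just a hand-rolled illustration with made-up field names: in one case every value is prefixed with its name, in the other with a 2-byte number.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: this is NOT the Fudge wire format.
class NamedVsOrdinal {

    // Each value is prefixed with its field name as a length-prefixed string.
    static byte[] encodeNamed(String[] names, int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < names.length; i++) {
                out.writeUTF(names[i]);   // 2-byte length + name bytes
                out.writeInt(values[i]);  // 4-byte value
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    // Each value is prefixed with a 2-byte ordinal instead of its name.
    static byte[] encodeOrdinal(int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < values.length; i++) {
                out.writeShort(i + 1);    // 2-byte ordinal, starting at 1
                out.writeInt(values[i]);  // 4-byte value
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"quantity", "price", "timestamp"};
        int[] values = {100, 250, 1311638400};
        System.out.println("named bytes:   " + encodeNamed(names, values).length);
        System.out.println("ordinal bytes: " + encodeOrdinal(values).length);
    }
}
```

The named variant pays for the length of every field name in every message, while the ordinal variant has a fixed per-field cost. The reader then needs to know what each ordinal means.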

Here we see the raw results plus fitted lines for named fields (first one) and ordinal numbers (second one). There are some outliers we should be able to ignore, likely due to GC operations. The results are very much as expected, with the ordinal case being both faster and more space efficient. Writes are also slower than reads, as is typical. Note that nothing was stored on disk when running these tests, so we’re not affected by disk IO speed. Average summary results are:

  • Mean read time: 0.13 sec (named fields) and 0.09 sec (ordinal)
  • Mean write time: 0.19 sec (named fields) and 0.15 sec (ordinal)

As for message size:

  • Mean message size of just over 4 MB for named fields
  • Mean message size of just over 2.6 MB for the ordinal case
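For reference, in-memory timings of this kind can be collected along the following lines. This is a sketch under my own assumptions (class name, warm-up count and the dummy workload are all made up), not the harness used for the numbers above. It also reports the median next to the mean, since the median is less sensitive to the occasional GC outlier.

```java
import java.util.Arrays;

// Hypothetical timing helper; not the harness used for the results in this post.
class TimingSketch {

    // Run the task a few times untimed, then return the duration of each timed run in seconds.
    static double[] time(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return seconds;
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Middle value of the sorted samples (average of the two middle values for even counts).
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Dummy in-memory workload standing in for a serialize call.
        Runnable work = new Runnable() {
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100000; i++) {
                    sb.append(i);
                }
            }
        };
        double[] samples = time(work, 5, 20);
        System.out.println("mean sec:   " + mean(samples));
        System.out.println("median sec: " + median(samples));
    }
}
```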

It’s also nice to see a linear (as opposed to exponential) increase in time with size, indicating good scalability.
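A fitted line of this kind can be obtained with an ordinary least-squares fit of time against message size. The sketch below shows the calculation; the data points in main are invented for illustration and are not my measurements.

```java
// Ordinary least-squares fit of a straight line; the sample data is made up.
class LinearFit {

    // Returns {slope, intercept} of the line y = slope * x + intercept.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        double[] sizeMb = {1.0, 2.0, 3.0, 4.0};       // hypothetical message sizes
        double[] seconds = {0.05, 0.09, 0.16, 0.20};  // hypothetical timings
        double[] line = fit(sizeMb, seconds);
        System.out.println("seconds per MB: " + line[0]);
        System.out.println("fixed overhead: " + line[1]);
    }
}
```

If the time really grows linearly with size, the points stay close to this line over the whole range; systematic curvature away from it would suggest something worse than linear.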

Now let’s move over to Protocol Buffers. Again, I used a similar message structure and the same method of filling it with data.

Again the raw results, with some outliers making it somewhat difficult to see the details. It is clear, however, that Protocol Buffers is more optimized than Fudge Proto. The summary results this time are:

  • Mean read time: 0.017 sec
  • Mean write time: 0.05 sec
  • Mean message size: 2.26 MB

All of this combined:

It’s easy to misread that graph, so to make it a bit easier to compare, the next one focuses on time only (with scaled message size).

Fudge Proto with ordinal numbers and Protocol Buffers produce messages of similar size (just over 2.6 MB versus 2.26 MB on average). This is expected when you read the relevant encoding specifications. Looking at performance, Protocol Buffers is just over 5 times faster on reads (0.017 sec versus 0.09 sec) and 3 times faster on writes (0.05 sec versus 0.15 sec). I’m not yet sure whether this is an architectural issue or just low-hanging optimization fruit here and there, but I ran a quick profiling session on the ordinal use case and got the snapshot below. I haven’t had time to look into this further, but it does give some initial indication of low-hanging fruit. [Edit: Having looked closer at the source I think I’ll change my mind on that for now..]
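The ratios quoted above follow directly from the mean values listed earlier. A small sketch of the arithmetic, with the class and method names made up for the purpose:

```java
// Recomputes the comparison ratios from the mean values reported in this post.
class Speedup {

    // How many times larger the first value is than the second.
    static double ratio(double fudgeOrdinal, double protobuf) {
        return fudgeOrdinal / protobuf;
    }

    public static void main(String[] args) {
        System.out.println(String.format("read:  %.2fx", ratio(0.09, 0.017)));
        System.out.println(String.format("write: %.2fx", ratio(0.15, 0.05)));
        System.out.println(String.format("size:  %.2fx", ratio(2.6, 2.26)));
    }
}
```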

If you’re interested in looking at my source code, it’s available here with the required libraries as well.

4 thoughts on “A look at Fudge Proto”

  1. Kirk Wylie

    Hi, Christian,

    Thanks for writing this up. While I’m not surprised that Fudge is slower than Protobuf, I’m a little surprised at the differential, and we’re looking into it.

    One thing to point out though is that Fudge is a bit of a different beast from Protobuf. It has a few features that we viewed as critical, and so we were willing to sacrifice a bit of size and performance to get them:
    – While binary, Fudge messages are fully self-describing; you don’t need the schema to extract the data out of them, while a Protobuf encoded message without knowledge of the schema is completely opaque.
    – Because they’re self describing, you can do very complex semi-structured messages in Fudge in a very natural way. This happens all the time in financial services applications (particularly market data, where you may never know the full set of field/value pairs provided by a market data provider).
    – Fudge as a meta-encoding system allows easy translation (even in an intermediary node in a network graph which doesn’t have any knowledge of the schema) to other encoding formats (like XML or JSON). We’ve used this capability in the OpenGamma Platform to allow us to code against Fudge, but provide XML and JSON endpoints to every part of the system.

    Have no fear, we’ll be looking at the benchmark code and seeing where we can try to improve the performance differential!

    Kirk Wylie

    1. Christian Felde

      Hi Kirk,

      Yes, I realize the origins of the two are a bit different. However, the data format compactness seems quite similar, so I’m sure there’s plenty of room to improve the performance should there be a need for that.

  2. Stephen Colebourne (OpenGamma)

    The primary difference here appears to be what the generated code is. Protobuf generates code that writes directly to bytes, whereas the generated Fudge proto code converts the object to the intermediate FudgeField/FudgeMsg object structure, which is then passed to Fudge itself to send as bytes. This extra step obviously has overhead, but allows techniques of transforming code to XML/JSON and so on without going to bytes and back. It is of course possible to use Fudge itself (not Fudge proto) to just write data, such as with FudgeMessageWriter and even lower level elements. There is of course the potential in the future to refactor Fudge proto to generate byte-writing classes if this proves to be the performance limiting factor in the whole system. Thanks for taking a look at Fudge!

    1. Christian Felde

      Ran a test on the ordinal case with and without the String data (empty string = without). Writing with string content is almost 1.5x slower than without string content, reading about 1.1x. So a quick win might be to look into that field type specifically (to avoid too much refactoring short term).
