To quote Wikipedia: A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
Systems often use Bloom filters to avoid accessing slow media, like a disk. Take HBase or Cassandra for instance: instead of reading a data file, which might be huge, to figure out whether a particular key is present in it, they can first consult their Bloom filters and get a true/false answer on whether the key is likely to be in there. I say likely because a Bloom filter isn’t guaranteed to be correct. More formally: false positives are possible, but false negatives are not.
Given that false negatives can’t happen, the worst we’ll do is read a file we didn’t have to read. But we’re also guaranteed to read the file if the key is present, which is the type of guarantee we need.
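To make that guarantee concrete, here is a minimal sketch in Python. It is not how HBase or Cassandra implement their filters; the class `BloomFilter`, the helper `lookup`, the bit and hash counts, and the dict standing in for a slow data file are all assumptions made up for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: num_bits bits, num_hashes bit positions per key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive the positions from two halves of one SHA-256 digest
        # (double hashing). An illustrative choice, nothing more.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def lookup(key: str, bloom: BloomFilter, slow_store: dict, stats: dict):
    """Consult the filter first and touch the slow store only if needed."""
    if not bloom.might_contain(key):
        return None  # no false negatives, so skipping the read is safe
    stats["reads"] += 1  # either a real hit or a false positive
    return slow_store.get(key)


# Hypothetical data standing in for a large file on disk.
slow_store = {"row-1": "value-1", "row-2": "value-2", "row-3": "value-3"}
bloom = BloomFilter()
for k in slow_store:
    bloom.add(k)

stats = {"reads": 0}
print(lookup("row-1", bloom, slow_store, stats))  # always finds the stored value
print(lookup("row-999", bloom, slow_store, stats))  # None, usually without a read
```

A key that was added always passes `might_contain`, so `lookup` never misses data that exists. An absent key is usually rejected by the filter alone; in the rare false positive case we pay for one unnecessary read and still return the right answer.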
Using a similar technique, I managed to get almost a 2x performance gain on Oracle Coherence with a custom Bloom filter based index implementation (versus no index), using just a fraction of the memory a standard index instance would use (a few kB at most). But before we draw conclusions from that, let’s present a use case and see why indexing in an in-memory key-value store makes sense to begin with.
I've been a relatively heavy user of Evernote for a while now. I don't mean heavy as in using all of its features or storing massive notes or anything, simply that my everyday personal habits depend heavily on it. Having a note-taking client that's available from my laptop and my phone, with transparent multi-way syncing and frictionless integration and sharing between apps (at least on my phone, where it matters the most), is immensely helpful.
In addition to all the small TODOs I store there (like movies to watch and books to read), it was also very helpful in organizing all the little things I needed to fix while writing my dissertation. Another huge use case for me was browsing the web on my phone. Sadly, very few webpages give me a good mobile experience. So typically several times per day, say during lunch or dinner, I would find a webpage I wanted to read, often from Twitter. But given the useless mobile experience, I would save it to Evernote for reading later when I got back to my laptop.
Herein lies an opportunity, and that's exactly what Spool solves for me. I started using Spool a few days ago, and so far the experience has been excellent. Sure, I had some issues in the beginning, but hey, this is beta software and I'm happy to help out with testing. I don't think I'm using Spool exactly as they intended. When I discovered their app during TCDisrupt, I got the impression the focus was mainly on finding webpages (with or without videos) on your computer and then storing them for offline reading on your mobile device. Absolutely a good use case, and something I will for sure use it for. But at the moment I mainly use it for two reasons:
- I find a webpage on my phone that I want to read, but the mobile experience is horrible. Solution: sync it to Spool, and within moments I get the same webpage, optimized for my phone. This is where I think my use differs from how they presented it.
- Given that I need to take the tube to and from work every day, their offline storage is brilliant. London tubes don't have phone coverage, so now I can make good use of my offline time. This of course is very much part of how they presented it.
The only thing I need to find time to explore a bit more is their browser/computer experience. I know it's there and it looks good, I just haven't had time to look into it. Sometimes I store a webpage on Spool rather than Evernote since I might find time to read it on my phone. But then I get home and might as well read it on my laptop. In this case I miss Evernote, so I would love some form of integration between these two apps. But that's a minor thing compared to the benefits it has already given me.