Around 6 years ago I was experimenting and pushing the practical boundaries of what the JVM could handle with regard to heap size.
We had Java 8 at the time, and allocating hundreds of GB of heap space was risky. Lots of time and effort was spent by many IT teams around the world just trying to keep the GC happy with such a setup.
Some, often myself included, would recommend aiming for an architecture with several smaller JVMs, each dealing with a more manageable heap size. But such a setup can bring a lot of complexity of its own. While a web application might easily scale horizontally like that, not all applications have that luxury.
At the time we didn’t have much choice. One popular approach was going off-heap: managing memory yourself. You could do that with native code, but then you’d miss out on all the other Java goodness. So, more often, you’d do it through serialization and off-heap storage, like what I did in the Binary off-heap hashmap, which I also blogged about in my “Only the Good Die Young (or Move Off-heap?)” post.
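To make the idea concrete, here is a minimal sketch of what off-heap storage via serialization looks like. The class and method names are illustrative, not the API of any particular library: values are serialized into a direct `ByteBuffer`, which lives in native memory outside the Java heap, so the GC never has to scan it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapStore {
    // A direct buffer is allocated in native memory, not on the heap.
    private final ByteBuffer buffer;

    public OffHeapStore(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Serialize a String into off-heap memory; return its offset.
    public int put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = buffer.position();
        buffer.putInt(bytes.length); // length prefix
        buffer.put(bytes);
        return offset;
    }

    // Deserialize on every read: this is the CPU cost you pay each
    // time you cross the on-heap / off-heap boundary.
    public String get(int offset) {
        int len = buffer.getInt(offset);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buffer.get(offset + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1024);
        int a = store.put("hello");
        int b = store.put("off-heap world");
        System.out.println(store.get(a)); // prints "hello"
        System.out.println(store.get(b)); // prints "off-heap world"
    }
}
```

Note the trade-off the sketch makes visible: the data itself is invisible to the GC, but every read and write pays for a copy plus (de)serialization.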
So what’s the story these days? Do we still run into these kinds of issues? It’s no longer uncommon to have 10, 20, 50, or maybe even 100 GB or more available as heap to the JVM. And with big data analysis we’d often like to easily be able to use all this space.
Enter the modern low-pause collectors, ZGC and Shenandoah. While there are some specific differences between the two, both allow us to allocate massive heaps, many hundreds of GBs, with hard guarantees about maximum pause times. These hard guarantees not only make it possible to run big data analysis with large heaps, but they also lend themselves well to real-time applications.
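Enabling these collectors is just a matter of launcher flags. A sketch, where `MyBigDataApp` is a placeholder main class and the 100 GB sizing is purely illustrative:

```shell
# Run with ZGC and a large, fixed-size heap
java -XX:+UseZGC -Xms100g -Xmx100g MyBigDataApp

# Or with Shenandoah (availability depends on your JDK build)
java -XX:+UseShenandoahGC -Xms100g -Xmx100g MyBigDataApp
```

Setting `-Xms` equal to `-Xmx` avoids heap resizing at runtime, which tends to suit long-running analysis workloads.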
Would I, with these, still recommend going off-heap? Probably not. Going off-heap adds complexity, plus CPU cost for serialization and deserialization each time you cross that boundary. And that’s a cost you’d like to avoid whenever possible.