Brussels / 30 & 31 January 2016


Hunting the bug from Hell

Imagine you receive a report of a bug that very occasionally causes a segfault (i.e. a memory exception) in the HotSpot Java VM. It occurs only about one time in ten, and it might take many hours before it happens. It has only been observed when running one particular piece of proprietary software, and it only seems to occur on one large machine running many concurrent threads with a huge heap. There is a clue: it only seems to happen when running the parallel scavenge garbage collector. But you have no real idea how the garbage collector works.

This happened to me in 2015. I'm used to being able to find and fix bugs, but this one resisted all of my attempts for a long time. As far as I can recall, it's the most difficult bug I've ever fixed. In hunting this bug I used every tool available to me as a HotSpot developer. I'll describe those tools and how I used them.

We don't talk much about debugging. Many of us spend much of our time doing it, but we rarely talk or write about this important activity. We should talk about it more. If there's time, it would be good to swap war stories.


Andrew Haley