Out-of-memory Handling — D-Bus Experience

by havoc

Jeff started a blog thread about handling out-of-memory.

For anyone who’s interested in this, check out D-Bus (or rather, the
libdbus C implementation of D-Bus) for an example of nontrivial code that
attempts to handle out-of-memory.

I would wildly guess that the OOM handling adds 30–40% or so to the
number of lines of code, and thus the size of the library on disk.

There’s also a historical note: I wrote a lot of the code thinking OOM
was handled, then later added tests for most OOM codepaths (with a
hack to fail each malloc in turn, running the code over and over). I would
guess that when I first added the tests, at least 5% of mallocs were
handled in a buggy way: the handling code crashed, locked up, or
otherwise misbehaved.

(At one point I applied the same testing strategy to libxml2, and it
was also a crash-fest. Conclusion: if you haven’t tested all OOM
codepaths, they do NOT work; they’re just sitting there causing bloat.)
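The "fail each malloc, run the code over and over" hack can be sketched as follows. This is a minimal illustration under stated assumptions, not libdbus’s actual test harness: test_malloc(), test_free(), make_pair() and run_with_each_malloc_failing() are hypothetical names. The wrapper fails exactly the Nth allocation, and the driver reruns the code with N = 0, 1, 2, ... until a pass completes without reaching an injected failure, checking each time that the failure was reported and nothing leaked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault-injection state. fail_countdown == N makes the Nth
 * allocation from now (0-based) return NULL; -1 disables injection. */
static int  fail_countdown = -1;
static long live_allocations = 0;     /* outstanding blocks, to catch leaks */

static void *test_malloc(size_t size)
{
    if (fail_countdown == 0) {
        fail_countdown = -1;          /* fail exactly one allocation */
        return NULL;
    }
    if (fail_countdown > 0)
        fail_countdown--;
    void *p = malloc(size);
    if (p != NULL)
        live_allocations++;
    return p;
}

static void test_free(void *p)
{
    if (p != NULL)
        live_allocations--;
    free(p);
}

/* Code under test: copies two strings, or frees everything and returns
 * false if either allocation fails. */
static bool make_pair(const char *a, const char *b, char **out_a, char **out_b)
{
    char *ca = test_malloc(strlen(a) + 1);
    if (ca == NULL)
        return false;
    char *cb = test_malloc(strlen(b) + 1);
    if (cb == NULL) {
        test_free(ca);                /* undo the first allocation */
        return false;
    }
    strcpy(ca, a);
    strcpy(cb, b);
    *out_a = ca;
    *out_b = cb;
    return true;
}

/* Run make_pair() once with allocation 0 failing, once with allocation 1
 * failing, and so on, until a pass completes with no injected failure.
 * Returns how many passes had a failure injected. */
static int run_with_each_malloc_failing(void)
{
    int injected_passes = 0;
    for (int n = 0; ; n++) {
        char *a = NULL, *b = NULL;
        fail_countdown = n;
        bool ok = make_pair("hello", "world", &a, &b);
        bool injected = (fail_countdown == -1);   /* the Nth malloc was reached */
        fail_countdown = -1;
        if (!injected) {
            /* Every allocation has now been failed once; clean up and stop. */
            assert(ok);
            test_free(a);
            test_free(b);
            break;
        }
        assert(!ok);                   /* the failure was reported ... */
        assert(live_allocations == 0); /* ... and nothing was leaked */
        injected_passes++;
    }
    return injected_passes;
}
```

For make_pair(), which allocates twice, the driver makes three passes, two of which hit an injected failure. In a real library the wrapper would sit underneath the library’s own allocation functions, so every codepath gets exercised without changing the code under test.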

When adding the tests, I had to change the API in several
cases in order to fix the bugs, for example by adding
dbus_connection_send_preallocated() and DBUS_DISPATCH_NEED_MEMORY.
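The idea behind an API like dbus_connection_send_preallocated() is to split an operation into a step that may fail and a step that cannot. The sketch below shows that general pattern on a hypothetical message queue; Queue, Node, queue_preallocate(), queue_send_preallocated(), queue_free_preallocated() and queue_clear() are illustrative names, not libdbus functions. Reserving memory first lets the caller detect OOM before any state has changed, and the final send needs no error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical outgoing-message queue, kept as a singly linked list. */
typedef struct Node {
    struct Node *next;
    int          message;
} Node;

typedef struct {
    Node *head;
    Node *tail;
    int   length;
} Queue;

/* Fallible step: reserve the memory the send will need. Nothing in the
 * queue changes, so on NULL the caller has nothing to roll back. */
static Node *queue_preallocate(void)
{
    return malloc(sizeof(Node));
}

/* Infallible step: consumes the reservation and performs no allocation. */
static void queue_send_preallocated(Queue *q, Node *reserved, int message)
{
    assert(q != NULL && reserved != NULL);
    reserved->message = message;
    reserved->next = NULL;
    if (q->tail != NULL)
        q->tail->next = reserved;
    else
        q->head = reserved;
    q->tail = reserved;
    q->length++;
}

/* Give back a reservation that turned out not to be needed. */
static void queue_free_preallocated(Node *reserved)
{
    free(reserved);
}

/* Free every queued node and reset the queue to empty. */
static void queue_clear(Queue *q)
{
    Node *n = q->head;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    q->head = q->tail = NULL;
    q->length = 0;
}
```

The point of the split is that code which has already done irreversible work (or has just rolled back and wants to send an error reply) can hold a reservation obtained earlier, and then sending can no longer fail halfway.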

To make OOM handling work, you have to make pretty much every part of
the code transactional — you have to be able to atomically roll back what you
were doing. In the dbus-daemon case, generally this means we roll back
the handling of the current message, then return an error to the
sender of the message.
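A minimal sketch of that transactional style, assuming a hypothetical daemon state with two tables that a single request updates. State, Reply, handle_request() and state_clear() are invented for illustration and are not dbus-daemon code. If the second allocation fails, the first change is undone before the error reply is returned, so the sender gets an error and the state is unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical daemon state that one request modifies in two places. */
typedef struct {
    char **names;   /* owned copies of registered names */
    int    n_names;
    int   *rules;   /* ids of match rules */
    int    n_rules;
} State;

typedef enum { REPLY_OK, REPLY_ERROR_NO_MEMORY } Reply;

static char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Apply both changes or neither. On OOM the state is rolled back and an
 * error reply is returned for the sender. */
static Reply handle_request(State *st, const char *name, int rule)
{
    assert(st != NULL && name != NULL);

    /* Step 1: record the name. */
    char *copy = dup_string(name);
    if (copy == NULL)
        return REPLY_ERROR_NO_MEMORY;          /* nothing applied yet */
    char **names = realloc(st->names, (size_t)(st->n_names + 1) * sizeof *names);
    if (names == NULL) {
        free(copy);                            /* undo the copy */
        return REPLY_ERROR_NO_MEMORY;
    }
    st->names = names;
    st->names[st->n_names++] = copy;           /* step 1 is now visible */

    /* Step 2: record the rule. */
    int *rules = realloc(st->rules, (size_t)(st->n_rules + 1) * sizeof *rules);
    if (rules == NULL) {
        /* Roll back step 1. The names array may keep its larger
         * allocation, but its contents match the pre-request state. */
        free(st->names[--st->n_names]);
        return REPLY_ERROR_NO_MEMORY;
    }
    st->rules = rules;
    st->rules[st->n_rules++] = rule;
    return REPLY_OK;
}

/* Release everything the state owns and reset it to empty. */
static void state_clear(State *st)
{
    for (int i = 0; i < st->n_names; i++)
        free(st->names[i]);
    free(st->names);
    free(st->rules);
    *st = (State){ NULL, 0, NULL, 0 };
}
```

Every fallible step needs its own undo path, and each one is another codepath that has to be tested, which is where much of the extra code size comes from.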

In a GUI program, I couldn’t even guess what you’d do after the
rollback; you have no memory, so you can’t open a dialog (not that an
out-of-memory dialog is helpful in the first place). The best I can think
of would be to block for a while and then retry the malloc, but that can
be done inside g_malloc(), so it does not require OOM handling in the app.
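The "block for a while, then retry" idea can live entirely inside the allocator, as in the sketch below. malloc_retrying() and its parameters are hypothetical and not part of GLib; the real g_malloc() simply aborts when an allocation fails. Because the wrapper either returns memory or aborts, callers never see NULL and need no OOM-handling code. It uses POSIX nanosleep(), so it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical "wait and retry" allocator. It either returns memory or
 * aborts, so callers never need an out-of-memory code path. */
static void *malloc_retrying(size_t size, int max_attempts, long delay_ms)
{
    assert(max_attempts > 0 && delay_ms >= 0);
    if (size == 0)
        size = 1;                 /* so that NULL can only mean failure */
    for (int attempt = 1; ; attempt++) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (attempt == max_attempts)
            break;
        /* Wait, hoping another process releases memory meanwhile. */
        struct timespec pause = {
            .tv_sec  = delay_ms / 1000,
            .tv_nsec = (delay_ms % 1000) * 1000000L,
        };
        nanosleep(&pause, NULL);
    }
    fprintf(stderr, "out of memory: %zu bytes after %d attempts\n",
            size, max_attempts);
    abort();
}
```

A caller just writes `char *buf = malloc_retrying(4096, 5, 200);` and uses the result directly; all of the waiting and the final give-up decision stay in one place instead of being spread across every call site.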

dbus-daemon was the motivation for OOM handling, since dbus-daemon
must not crash. As a side effect, though, libdbus allows apps to check
for and handle OOM. This gives us empirical evidence for how many apps
will check for OOM if a library allows it. In the libdbus case, I’m pretty
sure the answer is zero or close to it.

(This post was originally found at http://log.ometer.com/2008-02.html#4.2)