Synchronous IO Never OK

by havoc

“If synchronous IO becomes a problem, it can be made asynchronous later.” It’s tempting to imagine that some operations on local files are “fast enough” to implement with synchronous IO. “Premature optimization is the root of all evil,” right?

Wishful thinking. Async vs. sync IO is not a performance issue (in fact synchronous IO can be faster). It is a structural or qualitative issue, a design issue. Sync IO is guilty until proven innocent.

My new rule is: any code that blocks a user-interactive main loop must run in a known-short time. The task must be known short, not indefinite-but-often-short.

Local file access could be short, but is not known short. Another common culprit: synchronous D-Bus calls.

I’ve seen so many rewrites from synchronous to asynchronous IO over the years that I’d call it almost 100% likely that anything blocking the main loop eventually comes to be seen as a bug. There’s near-zero chance it’s really OK to use those nice, easy-to-program blocking D-Bus calls, or that nice, simple g_file_get_contents(). Even the harmless-looking g_file_test() can bite.

These APIs exist only to tempt you. They are the dark path.

(Note: by “asynchronous” I mean “the main loop is not blocking, for example because IO is in its own thread” — I don’t mean special AIO system calls.)
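To make that definition concrete, here is a minimal sketch of "IO in its own thread" using plain POSIX threads and a pipe rather than GLib, so it stands alone. The names ReadRequest, read_worker and run_loop_until_read are made up for illustration, and the 16 ms poll timeout is just an assumed stand-in for a 60 Hz frame tick. A real GLib app would hook the wakeup into its own main loop instead.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical request shared between the main loop and the worker. */
typedef struct {
    const char *path;  /* in: file to read */
    char *data;        /* out: malloc'd contents, or NULL on error */
    long length;       /* out: bytes read, or -1 on error */
    int notify_fd;     /* write end of the pipe that wakes the main loop */
} ReadRequest;

/* Worker thread: does the blocking read, then pokes the main loop. */
static void *read_worker(void *arg)
{
    ReadRequest *req = arg;
    req->data = NULL;
    req->length = -1;

    FILE *f = fopen(req->path, "rb");
    if (f) {
        size_t cap = 4096, len = 0, n;
        char *buf = malloc(cap);
        while (buf && (n = fread(buf + len, 1, cap - len, f)) > 0) {
            len += n;
            if (len == cap) {              /* buffer full: grow it */
                char *bigger = realloc(buf, cap *= 2);
                if (!bigger) { free(buf); buf = NULL; }
                else buf = bigger;
            }
        }
        fclose(f);
        if (buf) { req->data = buf; req->length = (long)len; }
    }

    char byte = 1;
    if (write(req->notify_fd, &byte, 1) != 1) {
        /* Nothing sensible to do in a sketch; the loop would keep ticking. */
    }
    return NULL;
}

/* Toy main loop: ticks about every 16 ms until the worker reports back.
 * Returns the number of ticks that passed, or -1 on setup failure. */
int run_loop_until_read(ReadRequest *req)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;
    req->notify_fd = fds[1];

    pthread_t thread;
    if (pthread_create(&thread, NULL, read_worker, req) != 0) {
        close(fds[0]); close(fds[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int frames = 0;
    for (;;) {
        int ready = poll(&pfd, 1, 16);          /* wait at most one frame */
        if (ready > 0) break;                   /* worker finished */
        if (ready < 0 && errno != EINTR) break; /* unexpected error */
        if (ready == 0) frames++;               /* repaint would go here */
    }

    pthread_join(thread, NULL);  /* worker is done or about to be */
    close(fds[0]); close(fds[1]);
    return frames;
}
```

Fill in a ReadRequest with a path, call run_loop_until_read(), and free the data field afterwards. The point is only that the loop never waits on the disk: however slow the read turns out to be, the loop still gets a chance to repaint every tick.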

Here’s the core issue: the UI’s main loop needs to wake up at 30–60 frames per second to do animations and repaints, and it should respond to user input with similar speed. It needs to do this consistently (think real-time-like), not “on average” — if there are big outliers, like an occasional quarter-second delay in frame rate, the app will feel sluggish.
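The arithmetic behind that is worth spelling out. A rough sketch, with the helper names frame_budget_ms and frames_missed invented for illustration:

```c
/* Time available per frame, in milliseconds, at a given refresh rate. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Whole frames skipped when the main loop blocks for stall_ms. */
int frames_missed(double stall_ms, double fps)
{
    return (int)(stall_ms * fps / 1000.0);
}
```

At 60 frames per second the budget is about 16.7 ms per frame, and at 30 it is about 33.3 ms. A quarter-second stall works out to frames_missed(250.0, 60.0) == 15 and frames_missed(250.0, 30.0) == 7, which is plenty for a user to notice.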

“Local” IO can be slower than you think. The Firefox fsync problem shows one extreme case (the whole kernel gets bogged down, not just the IO operation), but think also about network file shares, large files, or systems under load. Even mild manifestations of these issues show up as reduced responsiveness and choppier animation.

It’s very common for supposedly fast IO operations to end up batched together, adding up to one long delay; for example, calling stat() on everything in a directory, or queuing a bunch of idle handlers that all happen to kick in at once.

Another Firefox example: try uploading a bunch of photos to Flickr in one go. Ouch. The files are local, but … there are a lot of them and they are big. Pretty sure Firefox uses synchronous IO here.

Enough “this local IO should be fast enough” laziness scattered around an app creates distributed bloat that’s hard to pin down and doesn’t show up very well in profiles because it depends on external circumstances.

Not only is it hard to find later, synchronous code is hard to fix — it means rewriting.

Bottom line: don’t write this code in the first place. Using async APIs is not premature optimization, it’s correct design.

(This post was originally found at http://log.ometer.com/2008-09.html#7)