Desktop Task Switching Could Be Improved

by havoc

In honor of GUADEC 2012, a post about desktop UI. (On Linux, though I think some of these points could apply to Windows and OS X.)

When I’m working, I have to stop and think every time I flip between two tabs or windows. If I don’t, I land on the wrong destination a high percentage of the time. I hit this clunkiness every minute or two.

My most common action is flipping between documents, terminals, and websites. To do it, I may need my workspace switch hotkey (Alt+number), app switch (Alt+`), window switch (Alt+Tab), tab switch (Alt+PgUp, Alt+PgDn, or C-x b in Emacs), or a sequence of these (change workspace then change window, or change window then change tab).

I believe it could be reduced to ONE key which always works.

The key means “back to what I was doing last” and it works whether you were last on a tab, a window, or another workspace. There’s a big drop-off in goodness between:

  • one key that always works
  • two keys to choose from

Once you have two, you have the potential to get it wrong and you have to slow down to think.

Adding more than two (such as the current half-dozen, including sequences) makes it worse. But the big cliff is from one to two.
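
To make the “one key” idea concrete, here is a minimal sketch of a single most-recently-used history that spans workspaces, windows, and tabs. It is an illustration under stated assumptions, not a description of how GNOME or any window manager works: the names Screen and FocusHistory are hypothetical, and a real desktop would need apps to report tab focus changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Screen:
    """Hypothetical unit of focus: a tab or window, on some workspace."""
    workspace: int
    window: str
    tab: Optional[str] = None  # None for windows without tabs


class FocusHistory:
    """One most-recently-used list shared by every kind of switch."""

    def __init__(self) -> None:
        self._mru: List[Screen] = []

    def focus(self, screen: Screen) -> None:
        # Any focus change (workspace, window, or tab) lands here.
        if screen in self._mru:
            self._mru.remove(screen)
        self._mru.insert(0, screen)

    def current(self) -> Optional[Screen]:
        return self._mru[0] if self._mru else None

    def back(self) -> Optional[Screen]:
        # The single key: return to whatever was focused last,
        # no matter which layer of hierarchy it lives in.
        if len(self._mru) < 2:
            return self.current()
        target = self._mru[1]
        self.focus(target)
        return target


history = FocusHistory()
terminal = Screen(workspace=1, window="terminal")
docs = Screen(workspace=2, window="browser", tab="docs")
code = Screen(workspace=1, window="emacs", tab="main.c")
for screen in (terminal, docs, code):
    history.focus(screen)

print(history.back())  # the browser tab on workspace 2
print(history.back())  # back to the Emacs buffer on workspace 1
```

Pressing the key repeatedly toggles between the two most recent screens, which is the “copy from here, paste to there” case, regardless of whether the two are tabs, windows, or on different workspaces.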

User model vs. implementation model

I can’t speak for others, but I seem to have two layers of hierarchy in my head:

  • A project: some real-world task like “file expense report” or “write blog post” or “develop feature xyz”
  • A screen: a window/tab/buffer within the project, representing some document I need to refer to or document I’m creating

The most common action for me is to switch windows/tabs/buffers within a project, for example between the document I’m copying from and the one I’m pasting to, or the docs I’m referring to and the code I’m writing, or whatever it is.

The second most common action for me is to move among projects or start a new project.

Desktop environments give me all sorts of hierarchy unrelated to the model in my head:

  • Workspace
  • Application
  • Window
  • Tab (including idiosyncratic “tabs” like Emacs buffers)
  • Monitor (multihead)

None of these correspond to “projects” or “screens.” You can kind of build a “projects” concept from these building blocks, but I’m not sure the desktop is helping me do so. There’s no way to get a unified view of “screens.”
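
As a rough sketch of what a unified view of “screens” could mean, the snippet below flattens a nested workspace/window/tab hierarchy into one flat list. The nested dictionary is a made-up stand-in for whatever the desktop actually knows; no real desktop API is being described, and flatten_screens is a hypothetical name.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical snapshot: workspace -> window title -> tab titles.
# An empty tab list means the window has no tabs.
Desktop = Dict[str, Dict[str, List[str]]]


def flatten_screens(desktop: Desktop) -> Iterator[Tuple[str, str, str]]:
    """Yield one (workspace, window, tab) entry per thing the user can look at."""
    for workspace, windows in desktop.items():
        for window, tabs in windows.items():
            if not tabs:
                # A tabless window is itself a single screen.
                yield (workspace, window, "")
            for tab in tabs:
                yield (workspace, window, tab)


desktop: Desktop = {
    "1": {"Firefox": ["docs", "mail"], "Terminal": []},
    "2": {"Emacs": ["post.md"]},
}

for screen in flatten_screens(desktop):
    print(screen)
```

The point of the flat list is that a tab in one window and a tabless window on another workspace become peers, which is closer to the two-layer model described above than the five-layer one the desktop exposes.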

I don’t know what model other people have in their head, but I doubt it’s as complex as the one the desktop implements.

Not a new problem

I’m using GNOME 3 on Fedora 17 today, but this is a long-standing issue. Back when I was working on Metacity for GNOME 2, we tried to get somewhere on this, but we accepted the existing setup (apps, windows, workspaces, etc.) as a constraint and therefore failed. At litl we spent a long time wrestling with the answer and found something pretty good, though perhaps not directly applicable to a regular desktop. I wish I had a good video or link to show for litl’s solution (essentially a zoomable grid of maximized windows, but lots of details matter).

iPhone has simplified things here as well. They combine windows and applications into one. But part of the simplification on iPhone is that it’s difficult to do things that involve more than one “screen” at a time. On a desktop, it wouldn’t be OK to make that difficult.

In GNOME 3, I also use the Windows key to open the overview and pick a window by thumbnail. Some issues with this:

  • It does not include tabs, only windows.
  • In practice, I have to scan all the thumbnails every time to find the one I want.

These were addressed in the litl design:

  • Tabs and windows were the same thing.
  • Windows remained in a stable, predictable location in the overview.
  • The overview was spatially related to the window; that is, you were actually zooming in and out, which meant that during the animation you got an indication of where you were.
  • I believe you could even click on a window before the zoom in/out animation was complete, though I could be wrong. In any case you could be moving toward it while it was coming onto the screen.

As a result, the litl design was much faster for task switching via overview key plus mouse. If you were repeatedly flipping between two tasks, you could memorize their location in space and find them quickly based on that. If other windows were opened and closed, the remaining ones might slide over, but they’d never reshuffle entirely.
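
The “slide over but never reshuffle” property can be sketched as a grid whose order depends only on when windows were opened, never on which one was used last. This is an assumption-laden illustration and not litl’s actual implementation; StableGrid and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple


class StableGrid:
    """Overview layout where positions depend only on open/close order."""

    def __init__(self, columns: int) -> None:
        if columns < 1:
            raise ValueError("columns must be at least 1")
        self.columns = columns
        self._order: List[str] = []

    def open(self, name: str) -> None:
        # New windows always go at the end; nothing else moves.
        if name not in self._order:
            self._order.append(name)

    def close(self, name: str) -> None:
        # Later windows slide over by one slot; relative order is kept.
        self._order.remove(name)

    def positions(self) -> Dict[str, Tuple[int, int]]:
        # Map each window to its (row, column) cell in the grid.
        return {
            name: divmod(index, self.columns)
            for index, name in enumerate(self._order)
        }


grid = StableGrid(columns=2)
for name in ["mail", "docs", "code", "term"]:
    grid.open(name)
print(grid.positions())

grid.close("docs")
print(grid.positions())  # "code" and "term" slide over; "mail" stays put
```

Note that there is deliberately no “focus” method: switching to a window does not change where it lives, which is what lets you memorize a location.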

I think GNOME tries to “shrink the windows in their current location” rather than “zoom out”, so it’s trying to have a spatial relationship. A problem is that I have everything maximized (or halfscreen-maximized). “Shrink to current location” ends up as “appears random” when windows don’t have any meaningful relationships on the x/y axes (they’re just in a z-axis stack). (Direction for thought: is there some way maximized windows could be presented as adjacent rather than stacked?)

Overall I vastly prefer Fedora 17 to my previous GNOME 2 setup, and I think it’s a step on the path to cleaning this up for good. In the short term, a couple of things seem to make the problem worse:

  • The “application” layer of hierarchy (Alt+Tab vs. Alt+`) adds one more way to switch “screens,” though for me this just made an existing problem slightly worse (the bulk of the problem is longstanding and we were already far from one key).
  • The GNOME 2 window list on the panel had a fixed order and was always onscreen, so it was faster than the thumbnail overview. I believe the thumbnail overview approach could be fixed; on the litl, zoom-out-to-thumbnails was as fast for me as the window list. The old window list was an ugly kluge (it creates an abstraction/indirection where you have to match up two objects, a button and a window; direct manipulation would be so much better). But its fixed spatial layout made it fast.

GNOME 3 opens the door to improving matters; GNOME 2’s technology (e.g. without animation and compositing) made it hard to implement ideas that might help. GNOME 3 directions like encouraging maximized apps, automatic workspace management, the overview button, etc. may be on the path to the solution.

Can it be improved?

I’ll limit this post to framing the problem and hinting at a couple of directions. I don’t know the right design answer. I’m definitely going to omit speculation on how to implement it (for example, getting tabs into the rotation would be possible, but would require some implementation heroics).

I know everything is the way it is now for good historical reasons, valid technical and practical constraints, and so on. But I bet there’s a way to get past those with enough effort.

My Twitter account is @havocp.