A Sequential, Actor-like API for Server-side JavaScript
Idea: how a JavaScript request handler could look
When I saw node.js, we’d been writing a huge pile of JavaScript code for litl based on gjs (gjs is a custom-built JavaScript runtime, created for native desktop/mobile/consumer-gadget apps rather than server-side code).
At litl, we use a syntax for asynchronicity invented by C. Scott Ananian – who had to talk me into it at some length, but now I’m glad he did. In this syntax, a function that needs to do asynchronous work is a “promise generator”; it passes promises out to a driver loop using the “yield” keyword. The driver loop resumes the async function when the promise is completed. There’s a bug to add this to gjs upstream.
Here’s how a web request handler which relies on a backend service and a cache service might look:
    var someBackendService = require('someBackendService');
    var someCacheThing = require('someCacheThing');

    var id = request.queryParams['id'];
    var promiseOfFoo = someCacheThing.lookupFoo(id);
    // yield saves continuation and suspends
    var foo = yield promiseOfFoo;
    if (!foo) {
        promiseOfFoo = someBackendService.fetchFoo(id);
        // again suspend while waiting on IO
        foo = yield promiseOfFoo;

        var promiseOfSaveFoo = someCacheThing.storeFoo(id, foo);
        // wait for cache to complete, in case there's an exception
        yield promiseOfSaveFoo;
    }
    // this write would be async via event loop also of course
    request.response.write(foo);
You see the idea. There are potentially three asynchronous requests here to serve the incoming request. While waiting on them, we don’t use up a thread – we return to the event loop. But, the code is still sequential and readable.
In Java, you can do this with Kilim for example. You can get the same performance effect, with less syntactic help, using Jetty Continuations. I’m not claiming this is a new idea or anything. But more frameworks, including node.js, could work this way. The abstraction might be pretty nice in desktop apps as well.
If you aren’t familiar with it, “yield” is how JavaScript supports a limited form of continuations, through generators. See this page on Mozilla Developer Network. The HTTP handler in the above example would be implicitly enclosed in a generator function that generates promises. The framework would take each promise the handler yields, wait in the event loop until the promise was completed, then resume the generator. When the generator resumes, the yield expression either evaluates to the value of the promise or throws an exception.
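To make the driver loop concrete, here is a rough sketch of one, not the litl or gjs implementation. It assumes the standardized `function*` generator syntax and native `Promise` objects rather than the older SpiderMonkey-style generators. The names `drive`, `handleRequest`, and `failingHandler` are made up for illustration, and the cache and backend services are in-memory stand-ins.

```javascript
// Hypothetical driver: runs a generator that yields promises.
function drive(generatorFunction, ...args) {
  const generator = generatorFunction(...args);
  return new Promise((resolve, reject) => {
    // Resume the generator with a value ('next') or an exception ('throw').
    function step(method, input) {
      let result;
      try {
        result = generator[method](input);
      } catch (err) {
        reject(err); // the handler let an exception escape
        return;
      }
      if (result.done) {
        resolve(result.value); // the handler returned
        return;
      }
      // Wait for the yielded promise; the event loop stays free meanwhile.
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err)
      );
    }
    step('next', undefined);
  });
}

// In-memory stand-ins for the cache and backend services (assumptions).
const cache = new Map();
const someCacheThing = {
  lookupFoo: (id) => Promise.resolve(cache.get(id)),
  storeFoo: (id, foo) => Promise.resolve(void cache.set(id, foo)),
};
const someBackendService = {
  fetchFoo: (id) => Promise.resolve('foo-' + id),
};

// The request handler from above, written as an explicit generator.
function* handleRequest(id) {
  let foo = yield someCacheThing.lookupFoo(id);
  if (!foo) {
    foo = yield someBackendService.fetchFoo(id);
    yield someCacheThing.storeFoo(id, foo);
  }
  return foo;
}

// A failed promise surfaces as an exception at the yield.
function* failingHandler() {
  try {
    yield Promise.reject(new Error('backend down'));
  } catch (err) {
    return 'recovered: ' + err.message;
  }
}
```

Calling `drive(handleRequest, 42)` returns a promise for the handler's result, so the driver itself can be composed like any other asynchronous operation. The handler body stays sequential, and ordinary try/catch works across the suspension points.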
I believe Scala/Akka actors could use a style like this now that Scala 2.8 has continuation support. I’m not sure whether anyone has tried that yet. With continuations, a pipeline of actors could be replaced by a single actor that suspends itself whenever it’s waiting on a future.
Why
Each request is a suspendable actor that can save a continuation whenever it’s waiting on the event loop.
- No callback spaghetti, just clear concise syntax and sequential code.
- Threads aren’t visible in JavaScript; thread bugs are only possible in native code modules.
- A bunch of actors like this will automatically max out all cores. Whether you’re doing IO or computation, the right thing happens.
- Rather than “worker threads” you can just have more actors, using the same thread pool and event loop.
You just write code. As long as the code doesn’t block on IO, the framework uses the CPU as efficiently as possible. No need to jump through hoops. Even if the code does block on IO or does a long computation, you can get usable results if you allow the thread pool to add threads.
According to the node.js home page, they aren’t open to adding threads to the framework; node.js also removed promises (aka futures or deferreds). It may still be possible to implement this syntax in node.js as a third-party module, however – I’m not sure.
I agree shared state across threads in JavaScript would be bad, but I’d love to see threads on the framework level.
- It’s easier for people using the framework if they only have to mess with one process.
- The JavaScript level would have less API because worker threads would be replaced by a more general idea of an actor.
- The opportunities for fast message-passing and smart load balancing between request handlers (or more generally, actors) would be increased.
- In short the framework could do more stuff for you, so application code Just Works.
Partial implementation
I started on some code to try out this idea. I decided not to keep going on it for now, so I’m posting the incomplete work just in case someone’s interested or finds it useful. The license is MIT/BSD-style. You’re welcome to fork on GitHub or just cut-and-paste, whatever you find useful.
When plotting how I’d implement the above request-handling code, I wasn’t familiar with actors (an idea from Erlang, taken up by Kilim, Jetlang, Scala, etc.). I ended up re-inventing the idea of a code module, which I called a Task, which would always run in a single thread, but would not be bound to a particular thread or share state with other threads. I didn’t come up with the mailboxes-and-messages idea found in existing actor frameworks, though, so that isn’t implemented.
The most potentially-useful part of the code I have so far is a C GObject called Task; this object is pretty much an actor. You could also think of it as a collection of related event watchers where the event handlers never run concurrently.
My code is based on GLib, libev, SpiderMonkey, and http-parser.
The code falls short of the HTTP request handler described above. I have a good implementation, with test coverage, of a Task object in C, with a GLib-style API. This may well be useful already for C developers. There’s a lot left to be done, though, as described in the README.
node.js couldn’t use the code I have here, unfortunately. I didn’t patch node.js because I thought V8 lacked generators – apparently that was wrong – and because node.js upstream has stated opposition to both threads and promises. And I was already familiar with GLib/SpiderMonkey but not V8. If I really wanted to use an API like this in production, building it as a module on top of node.js would probably be logical. An issue to be overcome would be any unlocked global state in the node.js core. I’m not sure what else would be involved.
What’s not implemented
To run the hypothetical HTTP request handler above, you’d have to add some large missing pieces to my code:
- HTTP. The lovely http-parser from node.js is in the code tree, but after parsing there has to be code to handle things like chunking; that is, it needs to implement HTTP, not just HTTP parsing.
- A JavaScript platform: a module system and a way to write native-code modules.
- Some simple HTTP container stuff, such as a convention to map URL paths to a tree of JS files, auto-reloading changed JS files, and executing the JS handlers assuming they contain a generator that will yield promises back to the main loop.
- Unlike most actor implementations, I haven’t done anything with message passing among actors. I don’t think it’s even necessary for the web request case, but it would make the framework more useful and general.
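As one illustration of the container piece, a convention for mapping URL paths to a tree of JS files could look roughly like the sketch below. None of this exists in my code: the function name `resolveHandlerFile`, the `index.js` default, and the `.js` suffix rule are all assumptions. It only uses Node's standard `path` module.

```javascript
const path = require('path');

// Hypothetical convention:
//   /users/list  ->  <root>/users/list.js
//   /            ->  <root>/index.js
function resolveHandlerFile(rootDir, urlPath) {
  let pathname;
  try {
    // Drop the query string and decode percent-escapes.
    pathname = decodeURIComponent(urlPath.split('?')[0]);
  } catch (err) {
    return null; // malformed escape sequence
  }
  // Strip leading and trailing slashes; an empty path means the index handler.
  const relative = pathname.replace(/^\/+|\/+$/g, '') || 'index';
  const root = path.resolve(rootDir);
  const file = path.resolve(root, relative + '.js');
  // Refuse anything that escapes the handler tree (e.g. "/../secret").
  if (!file.startsWith(root + path.sep)) {
    return null;
  }
  return file;
}
```

A container would then load the resolved file, wrap its contents in a generator, and hand that generator to the driver loop. Auto-reloading and caching of loaded handlers are left out here.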
See the README for more details.
How it works, for desktop developers who know GLib
If you already understand the GLib main loop or similar, here’s what my code adds:
- Event callbacks are invoked by a thread pool, rather than the GMainContext thread
- All callbacks belonging to the same actor are serialized, so we don’t run the same actor on two threads at once
- Actors automatically disappear when they don’t have any event sources remaining
As long as the actors (which you can think of as groups of main loop sources) don’t share any state, their code doesn’t have to be thread-safe in any way.
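The serialization rule can be sketched in JavaScript, with the caveat that this is not the C Task API and there is no thread pool here: a single JavaScript event loop stands in for it. What the sketch does show is that handlers of one actor never overlap, even when they suspend, while handlers of different actors interleave freely. `Actor`, `dispatch`, and `makeHandler` are hypothetical names.

```javascript
// Hypothetical Actor: its handlers run strictly one after another.
class Actor {
  constructor(name) {
    this.name = name;
    this.state = [];               // private state, never shared with other actors
    this.tail = Promise.resolve(); // end of this actor's chain of handlers
  }

  // Queue a handler; it starts only after this actor's previous handler finished.
  dispatch(handler) {
    const run = this.tail.then(() => handler(this));
    // Keep the chain usable even if a handler fails.
    this.tail = run.catch(() => {});
    return run;
  }
}

const log = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A handler that suspends in the middle, like an actor waiting on IO.
function makeHandler(label, ms) {
  return async (actor) => {
    log.push(`${actor.name}:${label}:start`);
    await sleep(ms);
    actor.state.push(label);
    log.push(`${actor.name}:${label}:end`);
  };
}

const a = new Actor('a');
const b = new Actor('b');
const done = Promise.all([
  a.dispatch(makeHandler('first', 20)),
  a.dispatch(makeHandler('second', 1)),
  b.dispatch(makeHandler('only', 5)),
]);
```

Actor `a`'s second handler does not start until its first one has ended, even though the first spends most of its time suspended. Actor `b` starts its handler in the meantime, which is harmless because the two actors share no state.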
In GTK+ programming, it’s discouraged to write sequential code by recursively blocking on the main loop whenever an event is pending. There are two key differences between suspending an actor until the next event, and a recursive main loop:
- the stack doesn’t recurse, so you don’t have unbounded stack growth (or weird side effects such as inability to quit an outer loop until the inner one also quits).
- because actors don’t have shared state, you don’t care about the fact that random event handlers can run; those only affect other actors. In a typical GTK+ app on the other hand, recursing the main loop causes reentrancy bugs due to shared state between your code and stuff that might run in another event handler.
My implementation uses GMainContext “outside” of the actor pool (so things look like a regular GLib API) but there’s a choice of GMainContext or libev “inside” the actor pool. GMainContext can’t be used directly from actors. Unfortunately, GMainContext doesn’t perform well enough for server-side applications (see this bug, for example), and actors need a custom API to add event sources in any case, because the sources have to be associated with the actor.
Have fun!
This is just a code doodle; I figured I should put it out there since I spent time on it. I hope the code or the ideas are useful to someone.
The Task (aka actor) implementation is pretty solid though, I believe, and I’d encourage trying it out if you have an appropriate application.