A Sequential, Actor-like API for Server-side JavaScript
by havoc
Idea: how a JavaScript request handler could look
When I saw node.js, we’d been writing a huge pile of JavaScript code for litl based on gjs (gjs is a custom-built JavaScript runtime, created for native desktop/mobile/consumer-gadget apps rather than server-side code).
At litl, we use a syntax for asynchronicity invented by C. Scott Ananian – who had to talk me into it at some length, but now I’m glad he did. In this syntax, a function that needs to do asynchronous work is a “promise generator”; it passes promises out to a driver loop using the “yield” keyword. The driver loop resumes the async function when the promise is completed. There’s a bug to add this to gjs upstream.
Here’s how a web request handler which relies on a backend service and a cache service might look:
```javascript
var someBackendService = require('someBackendService');
var someCacheThing = require('someCacheThing');

var id = request.queryParams['id'];

var promiseOfFoo = someCacheThing.lookupFoo(id);
// yield saves the continuation and suspends
var foo = yield promiseOfFoo;
if (!foo) {
    promiseOfFoo = someBackendService.fetchFoo(id);
    // again suspend while waiting on IO
    foo = yield promiseOfFoo;

    var promiseOfSaveFoo = someCacheThing.storeFoo(id, foo);
    // wait for the cache to complete, in case there's an exception
    yield promiseOfSaveFoo;
}
// this write would also be async via the event loop, of course
request.response.write(foo);
```
You see the idea. There are potentially three asynchronous requests here to serve the incoming request. While waiting on them, we don’t use up a thread – we return to the event loop. But, the code is still sequential and readable.
In Java, you can do this with Kilim for example. You can get the same performance effect, with less syntactic help, using Jetty Continuations. I’m not claiming this is a new idea or anything. But more frameworks, including node.js, could work this way. The abstraction might be pretty nice in desktop apps as well.
If you aren’t familiar with it, “yield” is how JavaScript supports continuations. See this page on Mozilla Developer Network. The HTTP handler in the above example would be implicitly enclosed in a generator function that generates promises. The framework would generate a promise, wait in the event loop until the promise was completed, then resume the generator. When the generator resumes, its yield statements either evaluate to the value of the promise, or they throw an exception.
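A minimal sketch of such a driver loop, written with modern `function*` generator syntax rather than the JS 1.7 generators SpiderMonkey had at the time, and with standard promises standing in for the framework's promise type (the `runHandler` name is hypothetical, not the gjs API):

```javascript
// Drive a generator that yields promises. Each yielded promise suspends
// the handler; when the promise settles, we resume the generator with the
// value, or throw the error into it at the yield point.
function runHandler(generatorFn) {
  return new Promise(function (resolve, reject) {
    var gen = generatorFn();
    function step(resume) {
      var result;
      try {
        result = resume(); // gen.next(value) or gen.throw(err)
      } catch (e) {
        reject(e);         // the generator let an exception escape
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      // result.value is the promise the handler yielded; suspend until it settles
      result.value.then(
        function (value) { step(function () { return gen.next(value); }); },
        function (err)   { step(function () { return gen.throw(err); }); }
      );
    }
    step(function () { return gen.next(); });
  });
}

// Usage: the body reads sequentially, but suspends at each yield.
runHandler(function* () {
  var a = yield Promise.resolve(1);
  var b = yield Promise.resolve(2);
  return a + b;
}).then(function (sum) { console.log(sum); }); // logs 3
```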
I believe Scala/Akka actors could use a style like this now that Scala 2.8 has continuation support. I’m not sure whether anyone has tried that yet. With continuations, a pipeline of actors could be replaced by a single actor that suspends itself whenever it’s waiting on a future.
Why
Each request is a suspendable actor that can save a continuation whenever it’s waiting on the event loop.
- No callback spaghetti, just clear concise syntax and sequential code.
- Threads aren’t visible in JavaScript; thread bugs are only possible in native code modules.
- A bunch of actors like this will automatically max out all cores. Whether you’re doing IO or computation, the right thing happens.
- Rather than “worker threads” you can just have more actors, using the same thread pool and event loop.
You just write code. As long as the code doesn’t block on IO, the framework uses the CPU as efficiently as possible. No need to jump through hoops. Even if the code does block on IO or does a long computation, you can get usable results if you allow the thread pool to add threads.
According to the node.js home page, they aren’t open to adding threads to the framework; node.js also removed promises (aka futures or deferreds). It may still be possible to implement this syntax in node.js as a third-party module, however – I’m not sure.
I agree shared state across threads in JavaScript would be bad, but I’d love to see threads on the framework level.
- It’s easier for people using the framework if they only have to mess with one process.
- The JavaScript level would have less API because worker threads would be replaced by a more general idea of an actor.
- The opportunities for fast message-passing and smart load balancing between request handlers (or more generally, actors) would be increased.
- In short the framework could do more stuff for you, so application code Just Works.
Partial implementation
I started on some code to try out this idea. I decided not to keep going on it for now, so I’m posting the incomplete work just in case someone’s interested or finds it useful. The license is MIT/BSD-style. You’re welcome to fork on github or just cut-and-paste, whatever you find useful.
When plotting how I’d implement the above request-handling code, I wasn’t familiar with actors (an idea from Erlang, taken up by Kilim, Jetlang, Scala, etc.). I ended up re-inventing the idea of a code module, which I called a Task, which would always run in a single thread, but would not be bound to a particular thread or share state with other threads. I didn’t come up with the mailboxes-and-messages idea found in existing actor frameworks, though, so that isn’t implemented.
The most potentially-useful part of the code I have so far is a C GObject called Task; this object is pretty much an actor. You could also think of it as a collection of related event watchers where the event handlers never run concurrently.
My code is based on GLib, libev, SpiderMonkey, and http-parser.
The code falls short of the HTTP request handler described above. I have a good implementation, with test coverage, of a Task object in C, with a GLib-style API. This may well be useful already for C developers. There’s a lot left to be done, though, as described in the README.
node.js couldn’t use the code I have here, unfortunately. I didn’t patch node.js because I thought V8 lacked generators – apparently that was wrong – and because node.js upstream has stated opposition to both threads and promises. And I was already familiar with GLib/SpiderMonkey but not V8. If I really wanted to use an API like this in production, building it as a module on top of node.js would probably be logical. An issue to be overcome would be any unlocked global state in the node.js core. I’m not sure what else would be involved.
What’s not implemented
To run the hypothetical HTTP request handler above, you’d have to add some large missing pieces to my code:
- HTTP. The lovely http-parser from node.js is in the code tree, but after parsing there has to be code to handle things like chunking; that is, it needs to implement HTTP, not just HTTP parsing.
- A JavaScript platform: a module system and a way to write native-code modules.
- Some simple HTTP container stuff, such as a convention to map URL paths to a tree of JS files, auto-reloading changed JS files, and executing the JS handlers on the assumption that each contains a generator that yields promises back to the main loop.
- Message passing among actors. Unlike most actor implementations, I haven’t done anything with it. I don’t think it’s even necessary for the web request case, but it would make the framework more useful and general.
See the README for more details.
How it works, for desktop developers who know GLib
If you already understand the GLib main loop or similar, here’s what my code adds:
- Event callbacks are invoked by a thread pool, rather than the GMainContext thread
- All callbacks belonging to the same actor are serialized, so we don’t run the same actor on two threads at once
- Actors automatically disappear when they don’t have any event sources remaining
As long as the actors (which you can think of as groups of main loop sources) don’t share any state, their code doesn’t have to be thread-safe in any way.
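In JavaScript terms, that serialization invariant could be sketched like this (the `Actor` name and its promise-chain queue are illustrative only; the real implementation is the C Task object dispatching on a thread pool):

```javascript
// Sketch of the per-actor serialization invariant: each actor owns a
// queue, and its callbacks are chained so they run strictly one after
// another, never interleaved -- even when a callback is asynchronous.
function Actor() {
  this._tail = Promise.resolve(); // end of this actor's callback chain
}
Actor.prototype.dispatch = function (callback) {
  // Append to the chain: this callback runs only after every previously
  // dispatched callback for this actor has finished.
  this._tail = this._tail.then(callback);
  return this._tail;
};

// Usage: even though the middle callback finishes asynchronously,
// the third one waits for it, so the log ends up ['a', 'b', 'c'].
var actor = new Actor();
var log = [];
actor.dispatch(function () { log.push('a'); });
actor.dispatch(function () {
  return new Promise(function (resolve) {
    setTimeout(function () { log.push('b'); resolve(); }, 5);
  });
});
actor.dispatch(function () { log.push('c'); });
```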
In GTK+ programming, writing sequential code by recursively blocking on the main loop whenever an event is pending is discouraged. There are two key differences between suspending an actor until the next event and a recursive main loop:
- the stack doesn’t recurse, so you don’t have unbounded stack growth (or weird side effects such as inability to quit an outer loop until the inner one also quits).
- because actors don’t have shared state, you don’t care about the fact that random event handlers can run; those only affect other actors. In a typical GTK+ app on the other hand, recursing the main loop causes reentrancy bugs due to shared state between your code and stuff that might run in another event handler.
My implementation uses GMainContext “outside” of the actor pool (so things look like a regular GLib API), but there’s a choice of GMainContext or libev “inside” the actor pool. GMainContext can’t be used directly from actors: unfortunately, it doesn’t perform well enough for server-side applications (see for example this bug), and actors need a custom API to add event sources in any case, because the sources have to be associated with the actor.
Have fun!
This is just a code doodle, figured I should put it out there since I spent time on it. I hope the code or the ideas are useful to someone.
The Task (aka actor) implementation is pretty solid though, I believe, and I’d encourage trying it out if you have an appropriate application.
I was also under the impression that v8 didn’t support generators. Have a link for that? This thread from Feb 2009 implies that they weren’t going to add them, at least until JavaScriptCore did:
http://markmail.org/message/ju7h5gfr7rrj3t6m
This approach is indeed very convenient. I’ve been doing it in Python for some time.
See http://doc.freevo.org/api/kaa/base/async/coroutines.html
The downside is that you have to store a relatively large amount of state for each yield – an entire set of closures that might otherwise be freed if things were divided into whole functions.
It’s a neat and elegant way to do concurrent programming though!
@Joe that link is why I thought V8 didn’t have generators, but then I recently saw something saying offhand that it does have them. But I can’t remember where that was and google turns up nothing. So to know for sure I guess I’d have to be non-lazy and get the V8 code and look 😛
@Aria yep I’m not sure how the memory and CPU benchmarks would look. One reason to finish prototyping out this code would be to quantify some of the pros and cons. I don’t think the per-request stuff in my initial code is huge – the SpiderMonkey context is per-thread, not per-request – but there is a global object and scope chain per request for sure.
Coroutine-based concurrency using ‘yield’-type continuations have been commonly used in the Python world for a while, see e.g. Diesel, Eventlet, Cogen or even Twisted when using the twisted.internet.defer.inlineCallbacks decorator.
It provides a nice interface to concurrency indeed.
btw, obviously this is not some original discovery never before known … but it’s missing by default in a lot of frameworks including three I find interesting (node.js, Scala, and the GObject stack). Scala actors are a nice model, but I’d love them to go just this one step further and let me write a sequential message handler without tying up a thread. Similarly for node.js, the event loop is nice, but why not go this one step further and just deal with using all the CPU cores for me, avoiding the callback tangle.
There’s one “issue” with this approach: the ‘yield’ keyword needs to be used whenever a continuation-based function is called. Calling these functions without ‘yield’ can have rather strange results.
An approach like Stackless Python, which is also based on coroutines/continuations, does not require explicit ‘yield’s; they can be hidden inside more normal, synchronous-looking APIs, which might be easier for most programmers out there.
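A tiny illustration of that gotcha, with a hypothetical `fetchFoo` (standard promises standing in for the framework's promise type): forgetting the yield is not a syntax error, you just end up holding the promise object instead of the value it will eventually carry.

```javascript
// Hypothetical async function in the yield-based style: it returns a
// promise that will eventually carry the value.
function fetchFoo(id) {
  return Promise.resolve('foo-' + id);
}

// Inside a driven generator you would write:
//   var foo = yield fetchFoo(42);   // foo === 'foo-42'
//
// Forgetting the yield silently gives you the promise itself, and any
// use of it as if it were the value is nonsense:
var notFoo = fetchFoo(42);
console.log(typeof notFoo.then); // 'function': it's a promise, not a string
```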
From the GObject side, any idea how this approach compares to the other proposed GObject co-routine/continuation implementations; Fibres [1], GTask [2] and Iris (IrisTask) [3] ?
[1] https://bugzilla.gnome.org/show_bug.cgi?id=565501
[2] http://chergert.github.com/gtask/
[3] https://github.com/chergert/iris
had no idea any of those existed so I can’t compare without digging in. I backed into the GObject library after starting out to make the JS thing. cool that others are thinking along these lines.
In a quick look those three are all somewhat different in approach from both each other and the code I’ve posted here, though in the same kind of problem space. All interesting stuff.
This looks like the proposed ‘async’ keyword for C# v.Next (5).
While technically different, in both languages the approach makes some very readable and easy to follow code.
In C# the compiler will rewrite the code for you and just add the “rest” as the callback.
Ah I’ve seen yield in Vala but didn’t really understand it coming from C++
Note that there can still be concurrency-related gotchas with this style of programming.
The root of the problem is that you end up with a chunk of code that looks like it’s sequential and atomic (at least if you’re used to single-threaded event-based programming), but in fact it’s not – while we’re suspended in the yield, somebody else can fiddle with the inevitable shared state.
If you are in the middle of updating a shared data structure when the yield happens, then you’re in trouble. That’s why concurrent programming always ends up looking different from sequential programming: you always have to watch your step.
In your example, you may have two actors writing the same cache entry simultaneously. Probably obvious here. But maybe not always. 🙂
In that light, having the explicit yield keyword might be a good thing.
This reminds me of a few years ago when everyone was raving about “deferreds” in Twisted (a Python module). When I looked at it I thought “what? this is just callbacks, what’s so cool about it?”. Then “tasklets” came along, based on an old pygtk trick in the FAQ (invented by James Henstridge, I think, but not 100% sure).
http://people.gnome.org/~gjc/gtasklet/gtasklets.html
The most difficult problem with coroutines is debugging. At least with threads and gdb you can interrupt all the threads and look at a stack trace of each thread. With co-routines, I guess you need to write a new debugger…
I don’t see the need for a special debugger here, any more than with any main loop callback setup.
some coroutines stuff is doing crazy hacks to add continuations to C or Java that don’t have them. but in the JS case you’re just using normal language features.
While Twisted’s deferred objects are related to the pattern of passing callbacks to asynchronous functions, it turns things inside out. Being able to write code similar to Havoc’s example code in this blog is just one of the benefits.
Since the deferred object acts as a handle for the result of the asynchronous work, you can attach callbacks to it after the work has already begun, which is not generally possible if the API requires passing in the callback at the start. If you’ve got multiple asynchronous tasks to wait on, it is pretty easy to combine them into a single deferred object and wait on that (Twisted provides a generic DeferredList() type to handle this).
There is definitely more to it than just using generator syntax to write asynchronous code.
right, the proposed gjs Promise object is just like a deferred in that you can add multiple on-got-value/on-error callbacks at any time. The yield stuff is just a cute syntax hack on top of that.
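That deferred property can be sketched in a few lines (this is illustrative only, not the proposed gjs Promise code; the `Deferred`/`onGotValue`/`gotValue` names are made up): callbacks may be attached before or after the value arrives, and fire either way.

```javascript
// Minimal deferred-style object: multiple on-got-value callbacks may be
// added at any time, including after the value has already arrived.
function Deferred() {
  this._callbacks = [];
  this._resolved = false;
}
Deferred.prototype.onGotValue = function (cb) {
  if (this._resolved) {
    cb(this._value);          // already have the value: fire immediately
  } else {
    this._callbacks.push(cb); // otherwise remember the callback for later
  }
};
Deferred.prototype.gotValue = function (value) {
  this._resolved = true;
  this._value = value;
  this._callbacks.forEach(function (cb) { cb(value); });
  this._callbacks = [];
};

// Both callbacks see the value, though one was attached after resolution:
var d = new Deferred();
d.onGotValue(function (v) { console.log('early: ' + v); });
d.gotValue(42);
d.onGotValue(function (v) { console.log('late: ' + v); });
```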
The C code I posted here isn’t implementing coroutines or promises; it’s implementing a dispatcher for actor-like objects – i.e. objects with no shared state, whose callbacks are kept on only one thread at a time. The idea is to treat threads and the main loop as an integrated whole, keeping app code single-threaded while still using all cores.
You could use this dispatcher thing with any kind of async API you wanted, including just callbacks, or whatever.
If you’re already sold on the idea of promises, you might consider using this library I ported from Tyler Close’s Waterken project in conjunction with the Google Caja team. It’s a very well-designed promise API that’s getting some traction in the CommonJS community. https://github.com/kriskowal/q
I also understand there might already be efforts within Mozilla to build the Node API on SpiderMonkey instead of V8. There’s also GPSEE, a CommonJS implementation, which Wes Garland mentioned he was getting to work with the Node APIs as well.
Thanks for this post! We already had everything required in RingoJS (async HTTP support via Jetty Continuations, JS 1.7, promises), so it took me very little time to add this feature to our upcoming Stick framework. Here’s the middleware and an example app:
https://github.com/hns/stick/blob/master/lib/stick/middleware/interruptible.js
https://github.com/hns/stick/blob/master/examples/interruptible/app.js
Actually I’m a bit embarrassed for not having thought about this myself as I have spent lots of time thinking about the problem. Somehow the JS generator send() and throw() methods had escaped me.
Awesome! Credit to C Scott for realizing generators could be combined with promises in this way.
Github links above are broken because I renamed the module to “continuation”.
https://github.com/hns/stick/blob/fa6e8df28a3784f9e8e423e14c893286e62afb16/examples/continuation/app.js
That’s cool, but yes, it has also been done before:
http://hyperstruct.net/2008/05/27/synchronous-invocation-in-javascript-part-1-problem-and-basic-solution/
the missing piece isn’t the generator syntax but a dispatcher that automatically uses both threads and mainloop.
also having this as the default mode in some major, usable framework.
i.e. can I use it today?
not filing a patent or a research paper here, novelty isn’t really the objective 😉
I’m amused that all the commenters who want to be the first to shout “not new” are all pointing at *JavaScript* implementations of promises (or sometimes Python or C#).
Folks, the basic idea is as old as I am ( http://en.wikipedia.org/wiki/Futures_and_promises ) and the implementation technique is even older ( http://en.wikipedia.org/wiki/Continuation-passing_style ).
If I’m to take any credit at all, I’ll claim it’s for (a) coming up with an API which fit litl’s existing asynchronous code well, so that promises could be integrated incrementally into the large existing codebase, and (b) convincing Havoc to give it all a chance, even at a time when our codebase was supposed to be stabilizing. Give me all the kudos you like for those two things. 😉
Havoc’s contribution is integrating the idea into existing frameworks and mainloops. Like he says, “novelty isn’t really the objective”.
v8 has generators now: http://wingolog.org/archives/2013/05/08/generators-in-v8