Praise for Greenlet

Does this situation feel familiar to you? You're writing some kind of system that has to support several clients, or players, or just plain actors, at the same time. A networked multiplayer game would be a good example. Now, the practicalities mean that incoming data (from a network socket, for instance) needs to be handled concurrently; in other words, you can't decide to block for one client while performing some action and ignore everyone else. But you still need to preserve state.

In this situation, two choices quickly spring to mind: Either 1) use client callbacks, or 2) use threads. The first solution tends to lead to clunky situations where a client must repeatedly return and await another callback, and then check state through a convoluted switch or if clause, and decide what to do. The latter lets you block while awaiting more data, but full-fledged threads don't scale very well and you have to care about things like locking (which can be quite a nightmare) to avoid catastrophic results.

There is a better solution: coroutines.

What are coroutines? The easiest way to think of them might be to compare them to subroutines. A subroutine, as the name implies, is subordinate to another routine; the parent routine calls the subroutine (which may in turn call further subroutines), and eventually gets some value back. The subroutine lives inside another subroutine, sits above it on the same stack, and the only way to go back to it is to terminate and return.

Coroutines, on the other hand, work differently. Coroutines don't call each other, and don't live on the same stack. They are not subordinate to each other, but cooperative; hence the "co-" in coroutine. Instead of calling each other, they yield to each other, suspending their own execution and letting the other coroutine continue. Confused yet? Time for an example, I think.

The below is pseudocode, but we'll write actual Python later on. Take a look:

coroutine CreepyKing():
    while True:
        print "Delicious burger served."
        burgerQueue.push(makeBurger())
        yield Wimpy

coroutine Wimpy():
    while True:
        print "I'll gladly pay you Tuesday for a hamburger today..."
        yield CreepyKing
        eat(burgerQueue.pop())
        print "That was a good burger."

Two keywords unfamiliar to those not initiated in the ways of coroutines here: The coroutine keyword itself, and its companion, yield. If we start this off by running the coroutine Wimpy(), we expect to see something like this:

I'll gladly pay you Tuesday for a hamburger today...
Delicious burger served.
That was a good burger.
I'll gladly pay you Tuesday for a hamburger today...
Delicious burger served.
That was a good burger.
...

The magic is in the concept of coroutines, and what makes them special: The ability to suspend execution at any point and return at the same entry point (with local state preserved) at a later point. This is what the yield keyword does.

The coroutine starts running like any other piece of code, right up until we hit a yield, which takes as a parameter another coroutine. At that point, we suspend execution of the coroutine, and resume the other coroutine. If the other coroutine has not yet been started, we start running it from the beginning; otherwise, we continue wherever that coroutine last yielded.

Fans of Python and other languages may soon realize that coroutines are, in a sense, a generalization of generators, including using the yield terminology, and this is exactly right. However, with a coroutine library proper, we can do so much more than produce values.

This is where greenlet comes in. Being a spinoff of Stackless Python, it essentially provides coroutine support for regular, non-stackless Python (though they call them greenlets for some reason). More detailed documentation is available at the greenlet site, but let's have a look at the above example written using greenlet:

class BurgerMaker(greenlet):
    def run():
        while True:
            print "Delicious burger served."
            burgers.append(makeBurger())
            wimpy.switch()

class BurgerEater(greenlet):
    def run():
        while True:
            print "I'll gladly pay you Tuesday for a hamburger today..."
            creepy_king.switch()
            eat(burgers.pop(0))
            print "That was a good burger."

creepy_king = BurgerMaker()
wimpy = BurgerEater()
wimpy.switch()

(I have chosen here to use the class-and-object way of doing things, because I find it encapsulates things a lot better, but you can also create "generic" coroutines on the fly and give them callbacks to run instead.)

As you can see, coroutines are instances of the greenlet class and the yield keyword is a method called switch, but otherwise, it's the same. We create the two greenlet objects and start the first one off by switching into it. The rest proceeds as described above.

You can pass data between greenlets when switching between them, like so:

def onDataArrived(data):
    data_reader.switch(data)

def handleData():
    while True:
        # Wait for some data
        data = greenlet.getcurrent().parent.switch()
        process(data)

data_reader = greenlet(handleData)
data_reader.switch()

(Here, I used the other way of creating greenlets, to show how that's done.)

Here, you can see that any arguments given when switching to a greenlet is provided as return values from that greenlet's own switch it used to pause itself. There's one other peculiarity in above code; instead of explicitly switching to a named greenlet, we use greenlet.getcurrent().parent. Why is that?

Well, as you can imagine, that expression retrieves the greenlet that sits "above" the current one; that is, the greenlet that invoked this one using its switch method. In this case, that greenlet is the implicit "root" greenlet, which is automatically provided and hidden. In practice, what it does here is return control to the main thread of execution, letting it continue with whatever setup it likes.

From there on, whenever we get some data (possibly asynchronously), we pass it to onDataArrived, which switches control to the handling greenlet, passing the data along, and letting it run as far as it likes before it switches back to the parent (the main program), ready to get more data.

I put this to use myself with a private project of mine, which is text-based and networked. Previously, to do things like setup and handshaking, I'd have to do something like this:

def handshakeStepOne(command):
    if command == "login":
        self.send("Login name: ")
        self.state = 2

def handshakeStepTwo(command):
    if not userExists(command):
        self.send("Failname. Try again: ")
        self.state = 1
    else:
        self.end("Password: ")
        self.state = 3

def onDataArrived(command):
    if self.state == 1:
        self.handshakeStepOne(command)
    elif self.state == 2:
        self.handshakeStepTwo(command)

... and so forth. Ugly, stupid, wasteful, and confusing. Thankfully, greenlet came to the rescue. Now, all I have to do is this:

def handshakeProcedure():
    cmd = self.readline()
    if cmd == "login":
        self.send("Login name: ")
        loginName = self.readline()

        if not userExists(loginName):
            self.send("Failname.")
            return
        else:
            self.send("Password: ")
            passwd = self.readline()

def readline():
    return greenlet.getcurrent().parent.switch()

def onDataArrived(data):
    self.switch(data)

... and that's it. You start the handshaking procedure, which continues until it wants to have some data, at which point it calls readline. That method then suspends execution (this is all one and the same greenlet, by the way; the three methods are not individual greenlets) until data is passed to the object via giveData. The result is that once you have the infrastructure in place, things like handshakeProcedure can be written as if you were interacting with a simple single-user terminal, which is both convenient and intuitive.

Even better, you don't have to care about locking anywhere; as you may already have realized, coroutines are synchronous. They are not actually concurrent in the sense that they run independently of each other, given timeslices by a scheduler—if anything, they resemble the concept of cooperative multitasking (see, there's that co- prefix again), meaning they have to pass priority back when they are done with whatever they want to do at the moment. The result is no possibility of race conditions or clobbered variables, and no need to muck about with semaphores or mutexes.

I hope I've piqued your interest in greenlet, and that you may found it useful in your own projects.

Comments

1
Elver On January 2 2008 (January 2 2008 04:10)

Note. Your RSS stream has a Y2K8 error. It links to http://www.monkeyblah.com/blog/2007/12/29/praise-for-greenlet/

2
Peter C O Johansson On January 2 2008 (January 2 2008 04:20)

That's the day I started writing the article, but... the date is supposed to be updated before RSS is generated. I'll check.

3
Elver On January 2 2008 (January 2 2008 07:12)

Topical comment. That readline/login example needs to be wrapped into some kinda protocol superclass. So I could just "def handshake" and not have to worry about any greenlets.

4
Peter C O Johansson On January 2 2008 (January 2 2008 07:30)

You still have to actually *do* it. "Wrapping it in a class" won't get you out of implementing the ridiculous multi-state multi-callback implementation, whereas coroutines will.

If you mean the final solution that uses coroutines, then you'll note how the actual handshakeProcedure() method doesn't actually mess with the greenlet code at all. It just calls readline() when it needs to and the rest is handled by the other methods.

5
Elver On January 2 2008 (January 2 2008 15:30)

Yeah, so it would be good to wrap the readline into a greenlet-enabled superclass. So you could write the handshake thing without actually thinking about greenlets and coroutines.

6
Peter C O Johansson On January 2 2008 (January 2 2008 16:13)

In my project, that's already kind of done; it's all part of a Client class, and the login code and such doesn't need to concern itself with the greenlet details. However, it should be noted that the exact path of execution is going to be dependent on what exactly you are trying to model, and a "readline" superclass would only make sense for certain kinds of input, such as text commands given over a network.

Add comment

Markdown may be used.
You may enter a name for yourself. If left blank, the comment will be signed “Anonymous.”
Optionally, you may enter an URL to appear alongside your name.
Enter the word shown in the picture above.