Another possibility is to use multiple processes/threads and communicate via zeromq. zmq is hyped a lot, but IPython uses it for parallel computing, and pyzmq is fairly easy to get on *nix platforms. I haven't checked its availability on MS Windows. It's not clear to me where this would actually make a difference: most of the code either isn't inherently parallel or is so easily split up that we can use the multiprocessing module (or even a queuing system). The direct summation is the only embarrassingly parallel routine I'm aware of that's actually a huge bottleneck.
-Nat

At a high level, the major benefit of pyzmq over multiprocessing would be in cases where parts of the work could be distributed across multiple machines, or where the process(es) are event-driven. The benefit is therefore application-dependent, and pyzmq would mean installing and testing another third-party library.
--Jeff
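For comparison, here is a rough sketch of how an embarrassingly parallel direct summation could be split with the standard-library multiprocessing module, with no extra dependency. It assumes "direct summation" means an O(N^2) pairwise sum, using a gravitational-style potential energy with G = 1 as a stand-in kernel. The kernel and the names `partial_energy`, `chunks`, `direct_energy` and `parallel_energy` are hypothetical and not taken from the project's code.

```python
import math
import random
from multiprocessing import Pool


def partial_energy(args):
    """Sum the pairwise term for rows i in [start, stop).

    Each i is paired only with j > i, so every pair is counted exactly
    once across all chunks.
    """
    start, stop, masses, positions = args
    n = len(masses)
    total = 0.0
    for i in range(start, stop):
        xi = positions[i]
        for j in range(i + 1, n):
            r = math.dist(xi, positions[j])
            # Hypothetical kernel: -m_i * m_j / r_ij (gravitational, G = 1)
            total -= masses[i] * masses[j] / r
    return total


def chunks(n, pieces):
    """Split range(n) into at most `pieces` contiguous [start, stop) intervals."""
    step = max(1, math.ceil(n / pieces))
    return [(s, min(s + step, n)) for s in range(0, n, step)]


def direct_energy(masses, positions):
    """Serial reference: a single chunk covering every row."""
    return partial_energy((0, len(masses), masses, positions))


def parallel_energy(masses, positions, processes=4):
    """Farm the row chunks out to a process pool and add the partial sums."""
    tasks = [(s, e, masses, positions) for s, e in chunks(len(masses), processes)]
    with Pool(processes) as pool:
        return sum(pool.map(partial_energy, tasks))


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing
    # the main module (e.g. the "spawn" start method used on Windows).
    random.seed(0)
    n = 400
    masses = [random.uniform(0.5, 1.5) for _ in range(n)]
    positions = [tuple(random.random() for _ in range(3)) for _ in range(n)]
    serial = direct_energy(masses, positions)
    parallel = parallel_energy(masses, positions)
    print(serial, parallel, math.isclose(serial, parallel))
```

Because row i only pairs with j > i, contiguous chunks are unevenly loaded (earlier chunks do more work), so a strided split would likely balance better. Every task also pickles the full input lists, which a shared-memory array would avoid. The pyzmq route would replace the `Pool` with sockets, which is what would let the same chunks be sent to workers on other machines.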