There has been a huge response to the MagLev demo I gave on Friday, most of it enthusiastic, though not without the inevitable skepticism that comes with any announcement.
For those who weren't at RailsConf, here's a quick summary of how the demo went.
I started off by describing MagLev as a "full stack Ruby implementation", in the same way that Rails is a full stack web framework. To understand what I mean by that, see my earlier post on the Gemstone architecture: not only does MagLev provide a new (and fast) VM for Ruby, but it also provides an integrated shared-memory object cache and integrated, transparent persistence. Together, these replace the whole of the typical Rails stack of many Mongrel instances, several memcached instances, and MySQL.
As a first demo, I showed a "magic trick" with two MagLev instances, each running an irb-like shell, in side-by-side terminal windows. Each had a global, $hat, holding a Hat object that simply wraps an array and lets you put things in it. In the left window, I put a Rabbit into the $hat. I then looked at the $hat on the right and showed that the Rabbit had magically been transported there.
>> $hat
=> #<Hat:0x0c184bfd01 @contents=[
() ()
( '.' )
(")_(")
]>
How is this possible? Because they're the same hat. The integrated VMs, cache, and storage conspire to create an illusion that global state is shared across all instances: no matter how many VMs you add, over however many machines, they all see and work with the same set of Ruby objects.
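To make the trick concrete, here is a minimal plain-Ruby sketch. The Hat and Rabbit classes are hypothetical stand-ins, since the demo's real code wasn't shown. In an ordinary Ruby process, two threads already share one $hat; what MagLev adds is that this sharing extends to separate VMs, even on different machines, which a single-process sketch cannot show.

```ruby
# Hypothetical Hat class: the demo's real implementation was not shown.
class Hat
  attr_reader :contents

  def initialize
    @contents = []
  end

  # Put something in the hat; returns self so calls can be chained.
  def put(thing)
    @contents << thing
    self
  end
end

class Rabbit
  def to_s
    "rabbit"
  end
end

$hat = Hat.new

# In an ordinary Ruby process, threads already share one $hat.
left = Thread.new { $hat.put(Rabbit.new) }
left.join

# A second thread sees what the first one put in.
right = Thread.new { $hat.contents.map(&:to_s) }
p right.value # prints ["rabbit"]
```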
There's no limit to what kinds of objects can be shared this way: procs and classes work just as well as arrays and strings. This isn't RPC. Objects are copied into a shared cache when they're created or modified, and if (but only if) another VM needs an object, it pulls it out of the cache and works on its local copy. All of these copies are kept in sync, and any changes are also written to disk by the storage engine, so the entire model is persistent.
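A rough single-process model of this copy-on-demand behaviour might look like the following. SharedCache and VM are hypothetical names; Marshal stands in for whatever MagLev actually uses to copy objects (and, unlike MagLev, Marshal cannot copy procs); and a per-object version number is an assumed mechanism for keeping local copies in sync.

```ruby
# Hypothetical shared cache plus "disk", both keyed by an object id.
class SharedCache
  attr_reader :disk

  def initialize
    @entries = {} # id => [version, serialized bytes]
    @disk = {}
  end

  # Called when an object is created or modified: store a copy, bump its
  # version, and write the same copy through to "disk".
  def put(id, object)
    version = (@entries.dig(id, 0) || 0) + 1
    @entries[id] = [version, Marshal.dump(object)]
    @disk[id] = @entries[id]
  end

  def version(id)
    @entries.fetch(id)[0]
  end

  # Hand out a fresh copy of the object along with its version.
  def get(id)
    version, bytes = @entries.fetch(id)
    [version, Marshal.load(bytes)]
  end
end

# Hypothetical VM: keeps local copies and only pulls from the cache
# when it needs an object it lacks or its copy is out of date.
class VM
  attr_reader :pulls

  def initialize(cache)
    @cache = cache
    @local = {} # id => [version, object]
    @pulls = 0
  end

  def store(id, object)
    @cache.put(id, object)
    @local[id] = [@cache.version(id), object]
  end

  def fetch(id)
    version, object = @local[id]
    return object if version == @cache.version(id)

    @pulls += 1
    @local[id] = @cache.get(id)
    @local[id][1]
  end
end
```

In this model a second VM never touches the first VM's actual array; it pulls its own copy once, reuses it while the version is unchanged, and pulls again only after the object is modified elsewhere.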
Sharing applies only to globally reachable objects; local variables, method arguments and so on aren't generally shared.
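One way to picture "globally reachable" is as a graph walk from a root object. The sketch below is a hypothetical helper, reachable_from, that follows array elements, hash keys and values, and instance variables. How MagLev actually defines its persistent roots isn't covered here, so the root object is an assumption.

```ruby
require "set"

# Hypothetical helper: collect every object reachable from root by following
# array elements, hash keys and values, and instance variables.
def reachable_from(root)
  seen = Set.new
  stack = [root]
  found = []

  until stack.empty?
    obj = stack.pop
    next unless seen.add?(obj.object_id) # skip objects already visited

    found << obj
    case obj
    when Array then stack.concat(obj)
    when Hash then obj.each { |key, value| stack.push(key, value) }
    end
    obj.instance_variables.each { |ivar| stack << obj.instance_variable_get(ivar) }
  end

  found
end
```

For example, a Rabbit put into a hat that hangs off the root is found by the walk, while a Rabbit held only in a local variable is not.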
Obviously, with this kind of synchronization, concurrency is a concern. MagLev handles it with transactions. Each VM has its own transaction state. While a VM is in a transaction, its changes are visible only locally until it commits. At that point, all of its changes are recorded to the cache and to disk and become available to every other VM.
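The visibility rule can be modelled with a per-VM write buffer: reads see the VM's own uncommitted writes first, other VMs see only committed state, and commit publishes the buffer. CommittedStore and VMView are hypothetical names, and this sketch deliberately ignores conflicts.

```ruby
# Committed, shared state (a stand-in for the shared cache plus disk).
class CommittedStore
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def apply(changes)
    @data.merge!(changes)
  end
end

# Hypothetical per-VM view: writes are buffered locally and only become
# visible to other VMs when commit is called.
class VMView
  def initialize(store)
    @store = store
    @local = {}
  end

  # Own uncommitted writes win; otherwise fall back to committed state.
  def [](key)
    @local.key?(key) ? @local[key] : @store[key]
  end

  def []=(key, value)
    @local[key] = value
  end

  def commit
    @store.apply(@local)
    @local = {}
  end
end
```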
A transaction can be aborted, in which case *everything* that has happened in that VM since the last commit (object modifications, object creation, method or class definitions, etc.) is rolled back. A transaction commit can also fail if it conflicts with concurrent changes elsewhere (for example, two VMs modifying the same instance variable of the same object at once).
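Abort and conflict detection can be sketched with optimistic concurrency: every (object, instance variable) slot carries a version number, a transaction remembers the version it saw when it first wrote a slot, and commit fails if another commit has bumped that version in the meantime. This is an assumed mechanism for illustration, not a description of MagLev's internals, and SlotStore, Transaction and CommitConflict are hypothetical names. Unlike MagLev, this sketch only rolls back slot writes, not object creation or method and class definitions.

```ruby
# Hypothetical store: maps [object, ivar] slots to [value, version].
class SlotStore
  def initialize
    @slots = Hash.new { |h, k| h[k] = [nil, 0] }
  end

  def read(slot)
    @slots[slot]
  end

  def write!(slot, value)
    _, version = @slots[slot]
    @slots[slot] = [value, version + 1]
  end
end

class CommitConflict < StandardError; end

class Transaction
  def initialize(store)
    @store = store
    @pending = {} # slot => [new value, version seen at first write]
  end

  def set(slot, value)
    base = @pending.key?(slot) ? @pending[slot][1] : @store.read(slot)[1]
    @pending[slot] = [value, base]
  end

  # Discard every uncommitted change.
  def abort
    @pending.clear
  end

  # Fail if any slot we wrote was changed by someone else since we saw it.
  def commit
    conflicts = @pending.select { |slot, (_, base)| @store.read(slot)[1] != base }
    raise CommitConflict, "conflict on #{conflicts.keys.inspect}" unless conflicts.empty?

    @pending.each { |slot, (value, _)| @store.write!(slot, value) }
    @pending.clear
  end
end
```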
Because these shared objects are stored on disk and loaded lazily into a VM only when it needs them, you can work with datasets that have many, many more objects than would fit into available RAM. I showed a dataset I had loaded containing 100 million movie reviews, which took up somewhere around 10GB. I could instantly pull in a single movie, modify it, and commit that change, without needing to load the other couple hundred million objects into RAM.
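The lazy-loading idea can be sketched with a store that keeps objects serialized until they are first touched. LazyStore is a hypothetical name, and an in-memory hash of Marshal'd records stands in for the on-disk repository. The point is only that reading, modifying, and committing one record materializes just that record.

```ruby
# Hypothetical store: @disk holds serialized records (standing in for the
# on-disk repository); @memory holds only the records actually touched.
class LazyStore
  attr_reader :loads

  def initialize(records)
    @disk = records.transform_values { |record| Marshal.dump(record) }
    @memory = {}
    @loads = 0
  end

  # Materialize a record on first access; later accesses reuse it.
  def [](id)
    @memory.fetch(id) do
      @loads += 1
      @memory[id] = Marshal.load(@disk.fetch(id))
    end
  end

  # Write one modified record back without touching any others.
  def commit(id)
    @disk[id] = Marshal.dump(@memory.fetch(id))
  end

  def resident_count
    @memory.size
  end
end
```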
As a final demo, I showed how far MagLev's Ruby compatibility has come by running a simple WEBrick servlet.
At this point, Bob Walker took over. He gave some company background on Gemstone (they've been working on multi-user persistent dynamic language VMs since 1982), and some technical details on MagLev (the VM is a modified version of their Gemstone/S Smalltalk VM, with some Ruby-specific bytecodes; the bytecode is JITted to native code before execution). Then he showed some micro-benchmarks: for what it's worth, MagLev is anywhere from 6 times to (in the extreme case) 111 times faster than the standard 1.8.6 Ruby interpreter on things like fibonacci, block execution, method dispatch, and so on.
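The exact benchmark suite isn't reproduced here, but micro-benchmarks of the kinds Bob mentioned look roughly like this. The workloads and iteration counts are arbitrary choices, and running the same file under different Ruby implementations is one way to compare them. No timings are shown, since they depend entirely on the machine and interpreter.

```ruby
require "benchmark"

# Naive recursive Fibonacci: dominated by method calls and integer arithmetic.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Trivial class used to measure plain method dispatch.
class Dispatcher
  def ping
    :pong
  end
end

ITERATIONS = 200_000
dispatcher = Dispatcher.new

Benchmark.bm(16) do |x|
  x.report("fibonacci(24)") { fib(24) }
  x.report("block calls") { ITERATIONS.times { |i| i + 1 } }
  x.report("method dispatch") { ITERATIONS.times { dispatcher.ping } }
end
```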
Bob then talked about scale. Gemstone has many customers running things like commodities exchanges, derivatives trading, container shipping, and so on that operate at very large scale on top of the same underlying technology as MagLev. Here are a couple of recent unsolicited quotes from a thread on the Joel on Software forums:
"I work for a major shipping company. We have a massive OODB and
Smalltalk Application (500 gig range) with 3 million lines of code.
We have 2000 plus daily users. We can do 700 transactions a second
before slowing down. We also have a Java + SQL +EMS system. On a
good day they can do 70 transactions a second, with three times the
hardware." --Timo (Saturday, February 16, 2008)
"Along side with the major shipping company, we are a major
commodities exchange using GS and ST and while our operational DB is
small (about 5 GB at the start of the trading day to less than 75 GB
and the end) we are probably one of the fastest. We easily handle
transaction rates approaching 6000/sec with about 8000+ daily
users. Our average data center round trip times are in the 2-3 ms
range." --GemStone Weenie (Monday, February 18, 2008)
It's worth noting that that's 6000 writes per second, sustained, and that this application peaks at about 3x that. By comparison, Twitter was once reported as having 600 requests/s (read and write).
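Working through the quoted figures: the exchange's approximate peak rate, its ratio to the Twitter figure, and the shipping company's comparison normalized per unit of hardware. This takes the quoted numbers at face value; since the Twitter figure counts reads as well as writes, the ratio is only a loose comparison.

```ruby
exchange_sustained = 6_000         # writes/s, from the exchange quote
exchange_peak = 3 * exchange_sustained # "peaks at about 3x that"
twitter = 600                      # requests/s (reads and writes), as reported

puts exchange_peak                 # 18000
puts exchange_sustained / twitter  # 10
puts exchange_peak / twitter       # 30

# Shipping company: 700 tx/s versus 70 tx/s on three times the hardware.
gemstone_tps = 700
java_sql_tps = 70
hardware_ratio = 3
puts gemstone_tps * hardware_ratio / java_sql_tps # 30 (per unit of hardware)
```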
Bob then moved on to the vision for MagLev going forward. A few important points:
- It doesn't run Rails yet, but it will.
- It will be RubySpec compliant.
- The Ruby source will be released. The C source code for the VM most likely will remain closed (but anything is possible).
- There will be a free version which will work for most uses, and a paid version for large-scale deployment.
- Look for another announcement/demo at RailsConf Europe in September.
After that we retired to the DoubleTree for a keg of Ruby ale.