I've had a number of conversations recently about Gemstone Smalltalk, largely in the wake of their announcement of support for my web framework, Seaside. It's complicated to explain Gemstone to people. It's not just an object database (though it is that), and it's not just a Smalltalk implementation (though it's that, too). The best thing I can compare it to is a Ruby on Rails deployment: not the framework, but the entire cluster of servers and software that goes into a large-scale Rails app. Which is to say, perhaps, that Gemstone is best understood not as a piece of software but as an architecture.
At a high level, a typical Rails deployment looks like this: a cluster of servers supports one storage engine, several memory caches, and many worker processes. In Rails, the storage engine is always a relational database (usually MySQL), and sits on an especially hefty server by itself. Any number of other smaller, identical servers are each configured to run one memory cache (memcached) and 8-12 or so worker processes (Ruby interpreters running Rails and the Mongrel web server, generally just referred to as "mongrels").
The mongrels accept the web requests and run the actual application code. The objects inside these worker processes are live objects: they're sending and receiving messages, executing methods, changing state, and so on. They exist only inside the memory of a particular mongrel, for the duration of a single request that the mongrel is processing.
Many objects need to be persisted for longer than that, and these get written to and read from the storage engine - in Rails, using ActiveRecord. The storage engine is centralized (though it may be replicated to protect against failure), so that all of the worker processes see a consistent view of the data: if one of the mongrels modifies an object and commits that change to MySQL, the others will see that change the next time they need to load that object. The objects inside the storage engine are dead - they don't do anything until they're loaded into a worker process - but they're well preserved: they're kept on disk, not in memory, so they'll survive a server reboot or other catastrophe.
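To make the "live in the worker, dead in storage" distinction concrete, here is a minimal sketch in Python rather than Ruby. It is not ActiveRecord: the standard library's `sqlite3` stands in for MySQL, and the `User` class, the `users` table, and the `open_db`, `save` and `load` helpers are all hypothetical names invented for the illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    """A live object: it has state and behaviour while it sits in a worker."""
    id: int
    name: str

    def greet(self) -> str:
        return f"Hello, {self.name}"


def open_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for the central MySQL server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def save(conn: sqlite3.Connection, user: User) -> None:
    # Map the object's fields onto a row; the row itself cannot do anything.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
        (user.id, user.name),
    )
    conn.commit()


def load(conn: sqlite3.Connection, user_id: int):
    # Map a row back into a fresh live object, or None if there is no such row.
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row) if row else None
```

Every trip through `save` and `load` pays for a translation between object and row, which is the mapping cost the rest of this comparison keeps coming back to.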
Loading from and saving to the storage engine is relatively slow, and keeping objects there eats disk space, so the memory cache is an important third player in this game. A mongrel that's gone to the work of retrieving an object from MySQL might stash a copy in memcached for the other mongrels to retrieve, more quickly, if and when they need the same one. An object that's expensive to build - like a piece of complex HTML - but not important enough to save to disk might also be placed there for the convenience of the other workers on the same server. In Rails, the cache has to be managed carefully, so that you don't get out of sync with the consistent view of data maintained by the storage engine, but the work pays off with lower loads and faster response times. Objects in the cache are dead - usually marshalled into a meaningless string - and also transient, since the cache is purely in memory.
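The careful cache management described above is usually some variant of the cache-aside pattern. A rough sketch in Python, assuming in-process dictionaries as stand-ins for MySQL and memcached, and `pickle` as the marshalling step; the `Database` and `Cache` classes and the `fetch` and `update` functions are hypothetical, not any real client library:

```python
import pickle


class Database:
    """Stand-in for the storage engine; counts reads so the saving is visible."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self.rows[key]

    def save(self, key, value):
        self.rows[key] = value


class Cache:
    """Stand-in for memcached: it only ever holds marshalled bytes."""

    def __init__(self):
        self.blobs = {}

    def get(self, key):
        return self.blobs.get(key)

    def set(self, key, blob):
        self.blobs[key] = blob

    def delete(self, key):
        self.blobs.pop(key, None)


def fetch(key, db, cache):
    # Try the cache first; on a miss, go to the database and stash a copy.
    blob = cache.get(key)
    if blob is not None:
        return pickle.loads(blob)  # unmarshal the dead copy into a live object
    value = db.load(key)
    cache.set(key, pickle.dumps(value))
    return value


def update(key, value, db, cache):
    # Write to the database, then drop the cached copy so nobody reads stale data.
    db.save(key, value)
    cache.delete(key)
```

Note that the invalidation in `update` is the application's job here: forget the `cache.delete` call and the other workers keep serving the old copy.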
What about Gemstone? As it happens, the architecture is exactly the same: there's a single storage engine (called a "stone"), a memory cache on each server (the "shared page cache"), and any number of Smalltalk VM worker processes ("gems"). The gems handle the requests and run the code, and they stash objects in the page cache for speed and in the stone for persistence. The difference is, in Gemstone, these have all been designed from the ground up to work together as quickly and seamlessly as possible. In particular, this means two things:
1. Each part of the architecture uses exactly the same format to store the objects: whether it's a live object running in a gem, a cached object in the page cache, or a stored object on disk, the sequence of bytes is exactly the same. Unlike in Rails, where you have to map and marshal at every step, in Gemstone copying objects from storage to cache to worker process is pretty much just that - a simple byte copy. This makes it fast.
2. Objects are automatically kept in sync between each part of the system. The worker processes always load objects from the memory cache, because they can trust it to grab a recent copy from storage if needed. They also always save to the cache, because it will write the same change through to the storage without being asked. The gems also keep track of which objects have changed so that you don't have to, and will update the cache - and get updates from other gems back - automatically and transparently. The effect is as if all of your worker processes were running their objects inside a single, consistent and impossibly large chunk of persistent memory. This makes it easy.
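The two points above can be sketched as a toy model. To be clear, this is not Gemstone's API or its internals: `Stone`, `SharedPageCache` and `Gem` are hypothetical Python classes, objects are reduced to raw byte strings keyed by an id, and the explicit `commit` and `refresh` calls are a simplification standing in for whatever the real system does automatically. What it does show is the shape of the idea: one byte format at every layer, a cache that reads through and writes through to storage, and a worker that works out for itself which objects changed.

```python
class Stone:
    """Toy persistent store: object id -> bytes."""

    def __init__(self):
        self.objects = {}

    def read(self, oid):
        return self.objects[oid]

    def write(self, oid, data):
        self.objects[oid] = data


class SharedPageCache:
    """Toy cache that fetches from the stone on a miss and writes changes through."""

    def __init__(self, stone):
        self.stone = stone
        self.objects = {}

    def read(self, oid):
        if oid not in self.objects:
            self.objects[oid] = self.stone.read(oid)  # read-through
        return self.objects[oid]

    def write(self, oid, data):
        self.objects[oid] = data
        self.stone.write(oid, data)  # write-through, without being asked


class Gem:
    """Toy worker: live objects are mutable copies of the very same bytes."""

    def __init__(self, cache):
        self.cache = cache
        self.live = {}    # oid -> bytearray being worked on
        self.loaded = {}  # oid -> bytes as they were when loaded

    def get(self, oid):
        if oid not in self.live:
            data = self.cache.read(oid)
            self.loaded[oid] = data
            self.live[oid] = bytearray(data)  # plain byte copy, no unmarshalling
        return self.live[oid]

    def commit(self):
        # Detect changed objects by comparing against what was loaded.
        for oid, obj in self.live.items():
            data = bytes(obj)
            if data != self.loaded[oid]:
                self.cache.write(oid, data)
                self.loaded[oid] = data

    def refresh(self):
        # Forget local copies so the next get() sees other gems' commits.
        self.live.clear()
        self.loaded.clear()
```

In this sketch, two `Gem` instances sharing one `SharedPageCache` see each other's committed changes after a `refresh`, and neither one ever calls a save, load or invalidate routine by hand.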
To be extra clear, here's the mapping I'm trying to describe:
| Role | Rails: provided by | Rails: stores | Gemstone: provided by | Gemstone: stores |
| --- | --- | --- | --- | --- |
| Storage engine | MySQL | Objects mapped to relational tables | "Stone" object store | Smalltalk objects |
| Memory cache | memcached | Objects marshalled to strings | Shared page cache | Smalltalk objects |
| Worker process | MRI/Mongrel | Ruby objects | "Gem" Smalltalk VM | Smalltalk objects |
So there you have it: Gemstone, it's like Rails, but faster and easier. If only it ran Ruby...