May 12, 2008

Those who misremember history...

In Dynamic Languages Strike Back, Steve Yegge says


StrongTalk was really interesting. They added a static type system, an optional static type system on top of Smalltalk that sped it up like 20x, or maybe it was 12x.

Why do people make this stuff up? The following two statements are true:

  1. Strongtalk has an optional static type system.
  2. Strongtalk is 15-20x faster than most other Smalltalk systems.

What's false is the causal link Steve is claiming between them. They are entirely independent. Strongtalk was that much faster whether you used the optional static type system or not. Strongtalk's optimizing compiler completely ignored the types, and it made your program run not one iota faster to add them.

Update: see also Dave Griswold on Strongtalk's history:


... we had a type system and a compilation technology, which together were perfectly suited for a great production Smalltalk system, since they were independent of each other. This independence was critical, since the system would need to accept untyped as well as typed code, so that people could use the type system as much or as little as they wanted to, without impacting performance.

March 08, 2008

Ruby and other gems

I've had a number of conversations recently about Gemstone Smalltalk, largely in the wake of their announcement of support for my web framework, Seaside. It's complicated to explain Gemstone to people. It's not just an object database (though it is that), and it's not just a Smalltalk implementation (though it's that, too). The best thing I can compare it to is a Ruby on Rails deployment: not the framework, but the entire cluster of servers and software that goes into a large scale Rails app. Which is to say, perhaps, that Gemstone is best understood not as a piece of software but as an architecture.

At a high level, a typical Rails deployment looks like this: a cluster of servers supports one storage engine, several memory caches, and many worker processes. In Rails, the storage engine is always a relational database (usually MySQL), and sits on an especially hefty server by itself. Any number of other smaller, identical servers are each configured to run one memory cache (memcached) and 8-12 or so worker processes (Ruby interpreters running Rails and the Mongrel web server, generally just referred to as "mongrels").

The mongrels accept the web requests and run the actual application code. The objects inside these worker processes are live objects: they're sending and receiving messages, executing methods, changing state, and so on. They exist only inside the memory of a particular mongrel, for the duration of a single request that the mongrel is processing.

Many objects need to be persisted for longer than that, and these get written to and read from the storage engine - in Rails, using ActiveRecord. The storage engine is centralized (though it may be replicated to protect against failure), so that all of the worker processes see a consistent view of the data: if one of the mongrels modifies an object and commits that change to MySQL, the others will see that change the next time they need to load that object. The objects inside the storage engine are dead - they don't do anything until they're loaded into a worker process - but they're well preserved: they're kept on disk, not memory, so they'll survive a server reboot or other catastrophe.

Loading from and saving to the storage engine is relatively slow, and keeping objects there eats disk space, so the memory cache is an important third player in this game. A mongrel that's gone to the work of retrieving an object from MySQL might stash a copy in memcached for the other mongrels to retrieve, more quickly, if and when they need the same one. An object that's expensive to build - like a piece of complex HTML - but not important enough to save to disk might also be placed there for the convenience of the other workers on the same server. In Rails, the cache has to be managed carefully, so that you don't get out of sync with the consistent view of data maintained by the storage engine, but the work pays off with lower loads and faster response times. Objects in the cache are dead - usually marshalled into a meaningless string - and also transient, since the cache is purely in memory.
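To make that cache dance concrete, here is a minimal sketch of the pattern, under loud assumptions: plain Hashes stand in for MySQL and memcached, and `Article`, `load_article` and `save_article` are made-up names rather than Rails or memcached APIs. It only illustrates the marshalling and the manual invalidation described above.

```ruby
# Stand-ins only: plain Hashes play the roles of MySQL and memcached.
Article = Struct.new(:id, :title)

DATABASE = { 1 => { id: 1, title: "Ruby and other gems" } }  # durable "rows"
CACHE    = {}   # key => marshalled String, purely in memory
DB_READS = []   # records every slow-path read, for illustration

def load_article(id)
  key = "article:#{id}"
  if (bytes = CACHE[key])
    return Marshal.load(bytes)              # dead string -> live object
  end
  DB_READS << id
  row = DATABASE.fetch(id)                  # the slow path
  article = Article.new(row[:id], row[:title])
  CACHE[key] = Marshal.dump(article)        # live object -> dead string
  article
end

def save_article(article)
  DATABASE[article.id] = { id: article.id, title: article.title }
  CACHE.delete("article:#{article.id}")     # invalidate by hand
end

first  = load_article(1)   # cache miss: reads the database
second = load_article(1)   # cache hit: an unmarshalled copy
```

Forget the `CACHE.delete` line and every worker keeps serving the stale copy, which is the "managed carefully" part.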

What about Gemstone? As it happens, the architecture is exactly the same: there's a single storage engine (called a "stone"), a memory cache on each server (the "shared page cache"), and any number of Smalltalk VM worker processes ("gems"). The gems handle the requests and run the code, and they stash objects in the page cache for speed and in the stone for persistence. The difference is, in Gemstone, these have all been designed from the ground up to work together as quickly and seamlessly as possible. In particular, this means two things:

1. Each part of the architecture uses exactly the same format to store the objects: whether it's a live object running in a gem, a cached object in the page cache, or a stored object on disk, the sequence of bytes is exactly the same. Unlike in Rails, where you have to be mapping and marshalling at every step, in Gemstone copying objects from storage to cache to worker process is pretty much just that - a simple byte copy. This makes it fast.

2. Objects are automatically kept in sync between each part of the system. The worker processes always load objects from the memory cache, because they can trust it to grab a recent copy from storage if needed. They also always save to the cache, because it will write the same change through to the storage without being asked. The gems also keep track of which objects have changed so that you don't have to, and will update the cache - and get updates from other gems back - automatically and transparently. The effect is as if all of your worker processes were running their objects inside a single, consistent and impossibly large chunk of persistent memory. This makes it easy.
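The second point can be sketched in a few lines of Ruby. Everything here is hypothetical: `Stone`, `SharedPageCache` and `Worker` are toy classes named after the roles above, not Gemstone APIs, and the sketch models only the read-through, write-through and change-tracking behaviour, not the shared byte format from the first point.

```ruby
# Hypothetical stand-ins; none of these classes are Gemstone APIs.
class Stone                          # the persistent store
  attr_reader :reads

  def initialize
    @objects = {}
    @reads = 0
  end

  def read(key)
    @reads += 1
    @objects[key]
  end

  def write(key, value)
    @objects[key] = value
  end
end

class SharedPageCache                # read-through, write-through cache
  def initialize(stone)
    @stone = stone
    @pages = {}
  end

  def fetch(key)
    @pages.fetch(key) { @pages[key] = @stone.read(key) }  # fill on a miss
  end

  def store(key, value)
    @pages[key] = value
    @stone.write(key, value)         # written through without being asked
  end
end

class Worker                         # plays the role of a "gem"
  def initialize(cache)
    @cache = cache
    @dirty = {}
  end

  def get(key)
    @dirty.fetch(key) { @cache.fetch(key) }
  end

  def set(key, value)
    @dirty[key] = value              # tracked for us, not yet shared
  end

  def commit
    @dirty.each { |key, value| @cache.store(key, value) }
    @dirty.clear
  end
end

stone = Stone.new
cache = SharedPageCache.new(stone)
gem_a = Worker.new(cache)
gem_b = Worker.new(cache)

gem_a.set(:counter, 1)
gem_a.commit
seen_by_b = gem_b.get(:counter)      # gem_b never talks to the stone directly
```

Note that the application code never calls anything like "invalidate": the workers only ever talk to the cache, and the cache talks to the store.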

To be extra clear, here's the mapping I'm trying to describe:

                Rails                                             Gemstone
                Provided By  Stores                               Provided By           Stores
Storage Engine  MySQL        objects mapped to relational tables  "Stone" object store  Smalltalk objects
Memory Cache    memcached    objects marshalled to strings        Shared page cache     Smalltalk objects
Worker Process  MRI/Mongrel  Ruby objects                         "Gem" Smalltalk VM    Smalltalk objects

So there you have it: Gemstone, it's like Rails, but faster and easier. If only it ran Ruby...

January 21, 2008

Don't Panic

So what happened was, I was at my house on Galiano Island with my shiny new iPhone and without, at the time, either high speed internet or EDGE coverage, and I thought "gee, wouldn't it be nice if...". And I did some hacking, and then I mentioned it to Patrick Collison who was sharing office space with us and he ignored my hacking and did a ton of his own, and even though I now have DSL and EDGE out there it *is* nice: all of Wikipedia, stored and searchable in 2GB of your iPhone's flash drive. Get it here.

It's not perfect yet - there's no images, just text, and the parser is pretty basic and doesn't know about tables and stuff, and clicking on links can be flaky and slow, and if you do happen to have a network around it's probably a better experience to just go to wikipedia.org, but: there's really nothing quite like holding the sum of human knowledge in the palm of your hand. Patrick, I owe you many drams of whiskey whenever you're back in town.

January 02, 2008

DNA as Code

Over the holidays I was chatting with my brother the biophysicist about his research. Roughly speaking, he is trying to create DNA sequences that encode molecular motors. I was trying to understand what it meant to hack DNA from a programmer's perspective. Today I read this, which is in a very similar spirit. Two interesting data points from our conversation: one, the code my brother is "writing" is a few kilobases long, and could be represented in well under one kB of binary data. Two, his edit/compile/run cycle is about three weeks long, although he can do a dozen or so in parallel.

I thought these numbers were impressively small, especially that you could produce a working motor from a few hundred bytes of information (try that in Autocad...). He thought of them as huge, because they made it infeasible to brute-force the design by generating all the random variations and seeing which ones worked.
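Both views fall out of the same back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that "a few kilobases" means 3,000 bases; the real sequences may be longer or shorter.

```ruby
# Assumption: "a few kilobases" is taken to mean 3,000 bases, for illustration only.
BASES         = 3_000
BITS_PER_BASE = 2        # four possible bases (A, C, G, T) need 2 bits each

bytes = BASES * BITS_PER_BASE / 8.0   # => 750.0, well under one kB

# Brute force would mean trying every sequence of that length: 4**BASES of them.
variations = 4**BASES
digits     = variations.to_s.length   # decimal digits in that count
```

A few hundred bytes of "source", but a search space whose size is a number with well over a thousand digits, at three weeks per batch of a dozen builds.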

I'm certainly glad it doesn't take me three weeks to do a new build...

October 18, 2007

Code as Screenplay

Giles Bowkett writes


Debugger support is like nail-biting support, or farting-in-public support. Its absence is a feature. You want to avoid supporting bad habits. If programmers have to break their bad habits, that's a good thing.

I have a confession to make: I bite my nails. That's a bad habit, and I readily admit it. I also use a debugger. That's not.

Let me explain. Giles' argument seems to rest on this point:


Debuggers are based on the idea that the code base has enough places bugs could happen that the work of locating the bug is involved enough to justify machine assistance. This is not true of well-tested code. It is not true of code you understand, either.

What Giles glosses over is how you come to understand the code in the first place. Nothing helps you understand code - whether you wrote it or someone else did - better than stepping through it in a debugger. Since Giles is a sometime screenwriter, maybe this analogy is appropriate: reading the code is like reading a screenplay. Writing tests is maybe like drawing storyboards (they help you visualize the final product). Using a debugger is like actually watching the damn movie. With a jog wheel so you can slow it down. And no matter how good a screenwriter you are, no matter how good your director's storyboards are, when it comes time to cut the film you're going to find out that you didn't understand the movie as well as you thought you did, and you're going to need to watch the footage, sometimes frame by frame, and modify the movie accordingly.

Programs are the same way. Writing tests and reading code show you your program the way you want it to be, but only a debugger shows you the way your program is. Maybe screenwriters sit around in bars in LA and talk about how real filmmakers just read scripts, and the movies themselves are a crutch - me, I guess I like crutches.

See also: Patrick Collison, Ben Matasar, and Slava Pestov.

September 06, 2007

Code generation in Smalltalk and Ruby

Neal Ford had a recent post about the difference between code-generation (he calls it "meta-programming", but that's an overloaded and ambiguous term) in Ruby and Smalltalk. The core of his point is this: in Ruby, code generation is done at runtime, which means that what gets checked into your source code repository is a high level statement like "has_many :foo", which then generates the code when it is executed. In Smalltalk, code generation is done at development time (triggered by some custom wizard-like extension to the IDE), and so the generated code itself is checked in and the intent, according to Neal, is lost (as a trade-off for other benefits, like the ability to take the generated code into consideration when doing refactorings and so on, whereas in Ruby that code is invisible to any static analysis).
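For anyone who hasn't seen the Ruby side of this, here is a toy version of runtime generation. It is loosely modelled on the idea of `has_many` but is not ActiveRecord's implementation: `ToyAssociations`, `BlogPost` and the `add_to_...` method are invented for the sketch.

```ruby
# A toy `has_many`; NOT ActiveRecord's implementation.
module ToyAssociations
  def has_many(name)
    ivar = "@#{name}"

    # Reader: lazily creates the collection.
    define_method(name) do
      instance_variable_get(ivar) || instance_variable_set(ivar, [])
    end

    # Adder, e.g. add_to_comments(comment).
    define_method("add_to_#{name}") do |item|
      public_send(name) << item
      self
    end
  end
end

class BlogPost
  extend ToyAssociations
  has_many :comments      # the only line that lives in source control
end

post = BlogPost.new
post.add_to_comments("first!")
generated = BlogPost.instance_methods(false).sort
```

The two generated methods exist only once the class body has run, which is why they are invisible to anything that reads the source without executing it.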

This is a straw man: Smalltalkers understand the need to capture (and later modify) the intent as well as anyone else does. The solution is to make the generated code round-trippable. If you look at any real Smalltalk tools that generate code based on a custom tool (the SmaCC parser generator is a good example), it will preserve the settings from that tool, for example in a class comment, and the tool will let you inspect the intent, modify the intent, and regenerate the code.

To be concrete: any self-respecting Smalltalk tool that let you generate all the code associated with a "has_many" expression would annotate those methods with the "has_many" intent, in a way that the tools could understand, present to the user, and modify.
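A rough sketch of what round-trippable generation means, written in Ruby for the sake of comparison. The marker comment, `generate_has_many`, `recover_intent` and `rename_association` are all hypothetical; this is not how SmaCC or any Smalltalk tool stores its settings, only an illustration of generated code that carries its own intent.

```ruby
# Hypothetical development-time generator: emits source text that would be
# checked in, with the intent recorded in a machine-readable comment.
INTENT_MARKER = "# generated-by: has_many"

def generate_has_many(name)
  <<~RUBY
    #{INTENT_MARKER} :#{name}
    def #{name}
      @#{name} ||= []
    end

    def add_to_#{name}(item)
      #{name} << item
      self
    end
  RUBY
end

# Recover the intent from generated source, so a tool can show or edit it.
def recover_intent(source)
  match = source.match(/^#{Regexp.escape(INTENT_MARKER)} :(\w+)$/)
  match && match[1].to_sym
end

# Changing the intent means regenerating from the recovered declaration.
def rename_association(source, new_name)
  raise ArgumentError, "no recorded intent" unless recover_intent(source)
  generate_has_many(new_name)
end

source  = generate_has_many(:comments)
renamed = rename_association(source, :replies)
```

The generated methods are ordinary checked-in code that refactoring tools can see, and the first line is enough for a tool to present the original declaration, change it, and regenerate.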

(James Robertson points out that ORM tools in Smalltalk tend not to use code generation anyway, but I don't think that really answers Neal's point.)

July 04, 2007

Moving

Just a quick note that I've moved this blog to a new platform (typepad) and a new URL (www.avibryant.com). If you were subscribed to the old one, you shouldn't have to do anything, because the feeds are redirected. However, although all of the old posts are imported, the old permalinks are currently broken. When I find the time over the next week I'll set up the mapping for them but for now, if you came here from a link to a specific post, I apologize.

June 12, 2007

Technorati needs to catch up to Facebook

I was recently discussing social networks with Jon Udell, who was taking Gary McGraw’s position that “People keep asking me to join the LinkedIn network, but I’m already part of a network, it’s called the Internet” (source).

My social network of choice is Facebook rather than LinkedIn, and although in practice I would love to see the distributed infrastructure of the web, blogs and RSS reach the level of adoption and usefulness that Facebook has, I’m quite certain we’re not there yet. So let me leave this as a challenge to Technorati, Feedburner, TechMeme and anyone else trying to tie together the loose threads of the blogosphere - how do we Facebookize the open web?

Here’s what I wrote to Jon:

In blog terms: having a profile on Facebook is like having a blog, and adding someone as a friend is like subscribing to their blog. Updating your status, posting photos etc is like making blog posts, and writing on someone’s profile “wall” etc is like posting a comment on their blog.

In this context, Facebook has two killer apps. One is a smart feed, which aggregates and filters all of my subscriptions in a holistic way. For example, if 5 of my friends all post similar items on the same day, it will simply say “5 of your friends did X” rather than showing them to me individually. If it’s a slow news day, it will show me mundane items that it might otherwise suppress in favor of higher-content posts. Most interestingly, it will promote items based on combinations of my subscriptions: so if A posts a comment on B’s blog, that comment will appear in this feed only if I’m subscribed to both A and B.
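As a sketch of the two behaviours I find most interesting here (collapsing similar items, and showing a comment only when I follow both parties), something like the following would do. The `Item` struct and `smart_feed` method are a hypothetical model of my own and say nothing about how Facebook actually implements its feed.

```ruby
# Hypothetical data model; nothing here reflects Facebook's actual API.
Item = Struct.new(:author, :action, :target)

# Build a feed for a reader who subscribes to `subscriptions`.
def smart_feed(items, subscriptions)
  visible = items.select do |item|
    next false unless subscriptions.include?(item.author)
    # A comment on someone's blog shows only if we also follow that blog.
    item.target.nil? || subscriptions.include?(item.target)
  end

  # Collapse similar items: group by action and target.
  visible.group_by { |item| [item.action, item.target] }.map do |(action, target), group|
    who = group.size == 1 ? group.first.author : "#{group.size} of your friends"
    target ? "#{who} #{action} on #{target}'s blog" : "#{who} #{action}"
  end
end

items = [
  Item.new("ann", "posted photos", nil),
  Item.new("bob", "posted photos", nil),
  Item.new("ann", "commented", "cat"),
  Item.new("ann", "commented", "dan")   # reader does not follow dan
]
feed = smart_feed(items, %w[ann bob cat])
```

Here `feed` ends up with two entries: one collapsed line for the photo posts and one for ann's comment on cat's blog, while the comment on dan's blog is dropped.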

The second is an API which allows access both to your blogroll data and to your smart feed. There can thus be a photo application where, instead of tagging photos with text the way I do in Flickr, I tag them with (semantic) references to my friends - equivalent to tagging with their blog URL. This then hooks into everyone’s smart feeds so that if I post and tag a photo of A on my blog, anyone subscribed to A’s blog - even if they have no idea who I am and aren’t subscribed to me - will get an item in their feed about it.
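The fan-out that semantic tagging makes possible can be sketched the same way. Again this is a hypothetical model: `deliver_tagged_photo` and the `subscribers` table are assumptions made for illustration, not part of any real API.

```ruby
# Hypothetical model: `subscribers` maps a person to the readers who follow them.
def deliver_tagged_photo(poster, tagged, subscribers)
  followers_of_poster = subscribers.fetch(poster, [])
  followers_of_tagged = tagged.flat_map { |person| subscribers.fetch(person, []) }
  audience = (followers_of_poster + followers_of_tagged).uniq

  audience.each_with_object({}) do |reader, feeds|
    feeds[reader] = "#{poster} posted a photo of #{tagged.join(', ')}"
  end
end

subscribers = { "avi" => ["jon"], "ann" => ["bob", "jon"] }
feeds = deliver_tagged_photo("avi", ["ann"], subscribers)
```

In this example bob follows only ann, yet he still gets an item about the photo avi posted, because the tag is a reference to ann rather than a piece of text.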

It’s possible to imagine a Technorati, say, that would do all of this for the real blog world rather than in the Facebook walled garden, but it’ll take some time and thought to implement technically, and even then it is unlikely, for a very long time, to reach the scale and network effect that Facebook already has.

April 24, 2007

Phaux

In the spirit of Applied Web Heresies: William Harford’s Phaux, a Seaside-like framework for PHP. Check out the counter example or the form test. Neat.

April 14, 2007

Turtles need Speed

In a comment on my last post, Steven Swerling (a fellow Smalltalker) questions my repeated nagging about the speed of the current Ruby implementations (both C and Java):

Zed Shaw said that “scalability” is most constructively defined not by raw speed but by how predictably overall throughput can be increased by plugging in additional hardware (see here). From that perspective, JRuby doesn’t have to be fast to meet its goals, it just needs to be fast enough.

I completely agree. The question is, fast enough for what? To me, one thing that’s wonderful about both Smalltalk and Java is that all of their libraries, including basic data structures like lists and hashtables, are implemented in, respectively, Smalltalk and Java. That’s possible because the underlying execution machinery is fast enough that, although it would certainly still be faster to have highly tuned Collection implementations in C, the pure Smalltalk and pure Java libraries are “fast enough”. That’s a crucial threshold, and it’s one that many languages, including Ruby, have not passed. I would argue, in fact, that it’s one of the key things people (perhaps subconsciously) use to distinguish between “real” languages and “scripting” languages. I generally hate that distinction, but in this case I think it’s warranted. You extend Java by writing Java, you extend C++ by writing C++, you extend C# by writing C#, but you extend TCL and Perl and Ruby, as often as not, by writing C. As I replied to Steve:

As long as people feel a need to implement classes like Array or Hash in C (or Java) rather than in Ruby, then the quantitative performance difference is having a qualitative effect, and in my opinion a serious one. Once that changes, I’ll be happy to back off.

If you want to know some of the ill-effects of implementing basic classes in C rather than in Ruby, (apart from how painful and error-prone it is to write the C in the first place) here’s a simple contrived illustration. Try creating a subclass of Array that, when you ask for an element, returns twice the value of what you stored there:

    class MyArray < Array
      def [](i)
        super * 2
      end
    end

    x = MyArray.new
    x << 10
    puts x[0]
    x.each{|el| puts el}

The result of the first print statement (using either ruby 1.8.2 or JRuby 0.9.8) is “20”, as it should be. The result of the second print statement is “10”. Why? Because the C/Java implementations of Array#each totally ignore our Ruby-level definition of [], and simply access the internal elements directly. This kind of inconsistency is pervasive and puzzling. But even if the C implementation behaved identically to a hypothetical Ruby implementation, it would still bother me, because having your standard library implemented in Ruby makes it so much easier to understand, explore, learn from, debug, and modify. It would also, incidentally, make projects like JRuby so much easier, because you would only have to implement the basic language rather than redo all of the work that had been done at the C level. Dan Ingalls did a lovely binary-compatible re-implementation of the Squeak VM in Java as an exercise to learn the language, in a tiny fraction of the time that JRuby has taken, because everything important, down to the parser, compiler, process scheduler, windowing system, and IDE, was implemented in Smalltalk anyway and so could be reused. That’s the kind of trick I’d like to see Ruby able to pull off.
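For contrast, here is a minimal sketch of what a Ruby-level `each` could look like if it were written on top of the public `[]` method. `DoublingArray` is a made-up name and this is a hypothetical definition, not how any Ruby implementation defines Array#each.

```ruby
# A hypothetical Ruby-level each built on the public [] method.
class DoublingArray < Array
  def [](i)
    super(i) * 2
  end

  def each
    return to_enum(:each) unless block_given?
    i = 0
    while i < size
      yield self[i]     # goes through the overridden []
      i += 1
    end
    self
  end
end

x = DoublingArray.new
x << 10
seen = []
x.each { |el| seen << el }
```

With `each` defined this way, indexing and iteration agree: both go through the same `[]`, so the subclass only has to override one method to change what callers see.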
