Happy Birthday, Debian!

Debian GNU/Linux turns 19 today.

I estimate that I did my first install some time in late 1995, perhaps early 1996. I haven’t really used anything else as my day-in-day-out OS since. I’ve never had a Mac of any stripe, and haven’t used Windows with any frequency other than for World of Warcraft since ’99.

I can pin my first contribution to Debian with far more accuracy: September 3, 1996. That’s the date on the first Debian changelog entry in the libwww-perl package, which was, I believe, the first package I ever made. It still exists in Debian and Ubuntu (and other derivatives) and if you have it installed, you can look at /usr/share/doc/libwww-perl/changelog.Debian.gz, and right down there at the very end, you’ll find my grubby little fingerprints.

Sadly, things quickly went downhill—I am, to some extent, to blame for the fucked-up naming convention (with its poorly-sorting use of a -perl suffix) of every Perl library package in Debian, and probably by extension, the similar poor choices in the -java and -cil groups. Even the PHP guys were smarter.

As I remember it, having packaged libwww-perl (which is the actual name of the package as it exists on CPAN, so I just used that as the package name), I discovered that for some things—FTP support, I believe—it required the libnet package, which provides a lot of Net::* modules. But when I announced that I was going to package it, someone (I want to finger Rob Browning, but cannot be certain, and I don’t know if the Debian archives go back that far, and can’t be bothered to check, really) said they were about to package the libnet C library, which would conflict, so maybe I could call it libnet-perl, like libwww-perl, which I did. And then the next thing I packaged I ended up calling lib<whatever>-perl for no good reason, and it all went to hell.


Almost 16 years later, we have ~ 3000 Perl packages in the repository, all with that damned -perl suffix. So, um, sorry.
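For anyone who hasn't run into it, the convention boils down to a one-line transformation. Here is a minimal sketch in Python, assuming the usual rule of lower-casing the module name, turning @::@ into @-@ and wrapping the result in @lib@ and @-perl@; the helper name @debian_perl_package@ is made up for illustration and is not part of any Debian tool.

```python
def debian_perl_package(module):
    """Map a Perl module name to a Debian-style package name.

    Illustrative helper only: lower-case the name, turn '::' into '-',
    then wrap the result in 'lib' ... '-perl'.
    """
    return "lib" + module.lower().replace("::", "-") + "-perl"


modules = ["URI", "HTML::Parser", "Date::Manip"]
packages = sorted(debian_perl_package(m) for m in modules)
print(packages)
# Every name starts with "lib", so an alphabetical listing files the
# Perl libraries in with every other lib* package.
```

That shared @lib@ prefix is the "poorly-sorting" part: nothing groups the Perl packages together until you reach the very end of the name.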

My other big accomplishment of any note, I think, was to finally get the 64-bit Alpha port to a self-sustaining state. This started with a bootstrap that someone had done earlier from a set of RedHat binaries, and continued by gradually pulling along various bits of the system until we had a self-hosting system.

Along the way, I was responsible for another unfortunate bit of “engineering”—@libc6.1@. Again, I don’t think I was totally alone in putting this forth (David, David…um, I forget his last name, and the libc6 changelog.Debian doesn’t go back that far), but it probably would have been better to bite the bullet and avoid all the gymnastics it required.

As I remember it (this was, err, ’97? So I may have some details wrong), RedHat had pushed their first Alpha release using a libc with a SONAME of 6, based on a pre-release glibc-2…and then the 64-bit ABI was changed when glibc-2 was released. Our versioning tools for shared libraries were much more primitive at the time, and we wanted to keep compatibility with RedHat, since that’s where a lot of the heavy duty engineering was going, so we had to follow them in changing the library SONAME to 6.1. This entailed a lot of churn at the time, most of which I’ve blocked out. I did a lot of mechanical patches on a lot of packages.

I did spend a lot of time doing fixes for 64-bit-isms in various packages, and I can remember the flush of pride I had when Alan Cox mentioned that he’d gotten a bunch of 64-bit fixes as well as the conversion of the @mh@ mail client to use an ELF shared library from the Debian package, because that was my work.

My time as an active Debian contributor was not always a smooth one—I wasn’t always as attentive about keeping things up to date or doing triage on bugs as I could have been. I think I finally formally recognized that I wasn’t able to keep up around 2002, which was probably a couple of years later than everyone else had realized it.

Ironically, I probably maintain more packages now, for our internal company purposes at Ironic Design, than I ever did as a developer; the tools have made the maintenance, at least of the sorts of packages I do (libraries and simple applications), incredibly easy.

I maintain ~ 20 production Debian servers, with a handful of dev servers and a couple of home systems, including the laptop I’m writing this on, with basically no problems on a day to day basis. It has its warts, but I have boxes that have been continuously upgraded over a span of half a dozen years with no appreciable problems. I am able to be productive and use the environment happily. I remember the rough spots at the beginning (the move from a.out to ELF, for instance), but what Debian provides now always surprises and delights me.

So kudos to those Debian maintainers, former and current, who have contributed to such a great software system.

Thank God they at least got rid of the commas

Someone created a bestiary of “Lisp Code Typography”:http://kazimirmajorinc.blogspot.com/2012/03/few-examples-of-lisp-code-typography.html and my immediate gut reaction upon seeing the earliest possible examples was that the only thing that could ever have been more confusing than all the parentheses in the world was if you had to put commas in-between every goddamned thing.

The language I’m currently learning, Haskell, has its roots in the Lambda Calculus as well, but goes entirely in the other direction–no punctuation at all.

Choosing a new language

I have been programming primarily–for long stretches, almost exclusively–in Perl for the last 17 years or so. I seem to remember starting to use it around mid-1995, with 5.001–during that long, awkward time between when Perl 5 came out and when the 2nd edition of Programming Perl finally arrived in late 1996.

I’ve kept with it because I’m fluent in it, I am productive in it, and at this point, I can make it do some fairly absurd things (ask me about writing event-driven servers in Perl, I dare you). In fact, I like the language. I understand the complaints people have about it, but the subset in which I write these days is pretty clear while remaining concise and expressive, and the ecosystem that exists around it is simply unparalleled.

Nonetheless, I think the time has come to move on. The downsides of the language–speed, largely, and lack of good language support for expressing things like parallelism–have started to wear at me. I’m tired of the hoops I have to jump through to do the things I want to do.

So for the last 18 months or so, I’ve been reading a lot about a number of languages. I don’t think I’ve rejected any out of hand except PHP, though I certainly have some biases. For instance, I am looking for a mainstream language–something like “IO”:http://iolanguage.com/, though interesting, does not qualify.

But mainstream isn’t everything–I want something that is going to open up new options, that’s going to be fun to get immersed in; so I’m not considering things like Ruby or Python because for the most part I think they recapitulate most of the problems I have with Perl (speed, concurrency support) just with different syntax.

In the end, I came down to three options: Node.js, Scala and Haskell. I find that as I’ve been sitting with the question for the last couple of weeks, though, I’ve stopped thinking about Node.js as a real option. Though it’s fast, and it’s got a great ecosystem of software surrounding it, raw event-driven programming doesn’t really engage me any more. It was fun for the first year or two I did it, but the idea of moving to an environment where Everything Is A Callback leaves me cold.

So it’s down to Scala and Haskell, I think.

As a consequence, I’ve spent the last week reading _Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition_ by Odersky, Spoon and Venners, and before that I got most of the way through _Learn You a Haskell for Great Good_ by Miran Lipovaca (though I’m going to go back through it now and finish it).

I intend, over the next couple of weeks, to post about my experiences using each of them to write a couple of short (but non-trivial) programs–ones that, incidentally, I have implemented in Perl already, so I can do a real comparison of code.

Getting a good copy of the org-mode refcard on two-sided Letter paper

Dear lazyweb,

Perhaps this was just an oddity of my printer, but here’s what I had to do to get a good print of the org-mode refcard onto Letter paper. From within the org-mode sources, I did:

bc. make doc/orgcard_letter.tex
cd doc
tex orgcard_letter.tex
dvips -O "-.5in,.25in" -t letter -t landscape orgcard_letter.dvi

This got me a .ps file that seemed well-centered on the page. To print it, I did:

bc. ps2pdf14 orgcard_letter.ps
evince orgcard_letter.pdf   # print from here, duplex flipped on the short side

I probably could have used lp directly (but since I was also using evince to eyeball the layout first, it was easiest to do it from there):

bc. lp -o sides=two-sided-short-edge orgcard_letter.ps

What a difference two years make

A little over two years ago, I wrote “a post”:/2009/01/26/web-server-software-on-linux/ about my view of the web server software landscape under Linux, concluding with how I’d ended up sticking with Apache despite having tried most of the other reasonable candidates because they all seemed lacking.

It’s interesting in part because I never recorded when I moved that server from Apache to Cherokee (which I had tried to poor results, as noted in the post), which would have been not too very long after I wrote that post. Oh, well.

Anyway, everything ran alright on Cherokee for 18 months or so, but Cherokee wouldn’t let me do per-client bandwidth throttling, which I really needed as Chet’s blog was getting hammered mercilessly by spammers, and I couldn’t figure out any other way to slow them down.

So I switched to nginx, which by now I have a fair amount of experience with, using it for http and imap proxying as well as serving fastcgi apps for other projects. If it weren’t for the decrepit software that Chet and I have been using for blogging for the last couple or three years, everything would have been great (Movable Type–and even its follow-on project Open Melody–has never modernized its low-level infrastructure to allow good support of FastCGI; they’ll tell you they have, but just ask them if you can do XML-RPC–the foundation for remote posting–and watch their reaction).

Still, I’ve made do with a FastCGI shim for Movable Type, and then decided to start looking at WordPress, which I’ve already converted to, and which I think we will get Chet converted to shortly.

Incidentally, I became even happier about having moved to nginx about a week later, when I ran across a blog post from Cherokee’s author, posting a link to “a performance comparison”:http://www.alobbs.com/1390/Linux_Format_benchmarks_Cherokee.html#comments that showed Cherokee beating everything else.

Normally, I would just say “great” and move on, but knowing a little bit about nginx, I was surprised at the version being used, as it seemed a little old. And so I did some more research, and found that the article was comparing an up-to-date version of Cherokee to much older versions of other servers, in some cases versions from branches that had long been declared obsolete. Still, it’s not the fault of Cherokee’s author that someone at a magazine did a crappy test.

However, when I presented my findings, and asked him to acknowledge the issue, and call for the author of the article to do better:

bq. Alvaro, I understand that it is nice to see your software perform well against its competition, but I would encourage you to dissociate yourself from this comparison, or at least take it upon yourself to point out that there were some things that may have left your competitors at a disadvantage.

he suggested that he had made any caveats he needed when he said “you shouldn’t expect an extensive, in-depth benchmark” from the “very well written article”, though he did note that “the benchmark results are still fairly representative IMHO.”, and when pressed, said that he didn’t think the results would have changed with more recent versions of the other software.

It took a while to get the taste out of my mouth. I guess I still haven’t, given that I’m posting this.

Anyway, up with nginx.

A thing of beauty it is…

I’ve spent about the last three weeks converting much of the infrastructure code for AnteSpam to use AnyEvent.

One of the small bits of fallout from using AnyEvent is that we now have a large number of anonymous code references as callbacks, and in our logging code, these all have the same name: @__ANON__@.

This makes debugging output a little less useful.

In browsing some code in AnyEvent::SMTP, I happened across the trick of locally setting the @__ANON__@ typeglob to the name you want used in stack traces and the like:

bc. my $var = sub { local *__ANON__ = 'What::ever::you::want'; … };

So, this is kinda ugly, and I couldn’t find any official documentation of it, so I went looking around, and found @Sub::Name@, which is a module to make this a little more palatable. Now we can do:

bc. use Sub::Name;
my $var = subname 'What::ever::you::want' => sub { … };

Still perhaps not beautiful, but not totally covered in warts, either.

Now to go retrofit this onto all of our code…


As so often happens, we resist things we don’t understand, in favor of those we do, but if we only take the time to learn…

Geek-dom ahead, you have been warned.

I do almost all of my programming in Perl these days–in fact, for the last decade and a half or so. I’m not interested in getting into a language war here–I know Perl’s weaknesses as well as its strengths.

Anyway, for the last three years or so, we at “AnteSpam”:http://antespam.com/ have used a Perl script to manage refusing connections from malign hosts and rejecting requests to send mail to non-existent addresses–generally at a sustained rate of several per second, occasionally peaking into dozens per second or more, per server (we have 16 production servers).

Perl has no useful multi-threading, and if we tried to service these requests using one script per connection, we would be screwed–in fact, when we first tried to manage this stuff ourselves, three years ago, our first implementation did just that, and the servers melted; they couldn’t take the load of all of those memory-piggy scripts running at once.

Back in 2007, looking for a solution to this, I found “POE”:http://poe.perl.org/, a mature Perl framework that allows you to do event-driven cooperative-multitasking with asynchronous I/O and various other bells and whistles.

It is very good at what it does, and over the last few years I’ve become very conversant with it. We have several very important pieces of our infrastructure written with it, and they work very, very well.

Still, it has some issues, the biggest of which is that you have to write your code in a very particular style–short routines that queue events that are handled by other routines and things like that–things that mean that if you write code to integrate smoothly with POE, it’s going to look very weird if you try to use it outside the POE framework, and if you write your code outside the POE framework, it’s not going to play well with POE.

As a consequence, there are lots of libraries that don’t play well with POE–you can use them, but you lose the smooth cooperative-multitasking and asynchrony; basically, you lose the ability to handle many things at once, at least with low latency. And people who aren’t used to POE end up looking at your code in bafflement.

Still, this has been a fine solution for us for years, and I’ve resisted changing it, because every time I’ve tried to work with something else (and here I’m thinking specifically of “AnyEvent”:http://search.cpan.org/dist/AnyEvent/), I just couldn’t see the big benefit. The one time I tried reimplementing something with it, the code got a few lines shorter, but otherwise, it was 6 of one, half-dozen of the other.

And then I had my epiphany.

What I realized is that I could rewrite some code that was duplicated between the “regular” programs and the high-performance daemons to use AnyEvent in such a way that I could use the same code for both–when I needed high-performance cooperative multitasking, I would have it, and when I didn’t need it, the code would look exactly the same.

In effect, I was going to be able to get rid of a huge chunk of duplicative code and use the same code everywhere, transparently. And once I got the basic libraries re-done, there was even more code I was going to be able to merge.

In the space of 24 hours, I rebuilt our low-level LDAP and Memcache access layers to transparently use AnyEvent. I didn’t change any code outside of those libraries, and all tests passed once I was done. That performance-critical daemon I talked about at the beginning–I’ve almost finished rewriting it in the space of a couple of hours.
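To make the shape of that idea concrete, here is a rough sketch using Python's asyncio rather than Perl and AnyEvent, so treat it as an analogy and not the AnteSpam code. The @lookup@ coroutine, its fake delay, and the @lookup_blocking@ and @lookup_many@ wrappers are all invented for illustration. The point is that one implementation serves both an ordinary script that just wants an answer and a daemon that wants many lookups in flight at once.

```python
import asyncio


async def lookup(key):
    """Pretend directory lookup; the sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    return "value-for-" + key


def lookup_blocking(key):
    """Plain scripts call this and never see the event loop."""
    return asyncio.run(lookup(key))


async def lookup_many(keys):
    """Daemons run the very same coroutine for many keys concurrently."""
    return await asyncio.gather(*(lookup(k) for k in keys))


single = lookup_blocking("alice")
batch = asyncio.run(lookup_many(["alice", "bob", "carol"]))
print(single, batch)
```

The blocking wrapper is the only piece a "regular" program needs to know about; everything underneath it is shared with the concurrent path.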

By making this change, everything is looking cleaner and more straightforward than ever before.

When you have that moment of realization, everything can change.

Highlights of Free Software Documentation #1

When someone undertakes something for fun, or out of passion or deep commitment, the end result is often, I think, more reflective of them personally.

This is generally true of Free Software, and in the Free Software universe, I think this is sometimes even more true of documentation–you’re not obligated to write it, no one’s paying you, few people enjoy writing docs, so if you’re doing it at all, it’s because you _believe_.

So the writers’ personalities and convictions show through just a little bit more, like this bit from “Dave Rolsky”:http://autarch.org/, with whom I am slightly acquainted. Contained within “Moose::Cookbook::Basics::Recipe10”:http://search.cpan.org/~drolsky/Moose/lib/Moose/Cookbook/Basics/Recipe9.pod I ran across this gem:

bq. Our Human class uses operator overloading to allow us to “add” two humans together and produce a child. Our implementation does require that the two objects be of opposite genders. Remember, we’re talking about biological reproduction, not marriage.


Web server software on Linux

So there has always been a multiplicity of web server software for Unix/Linux.

It certainly feels like I have, at some point or another, played with all of them. And I keep coming back to apache, which I’ve been using since 1995, when I first became responsible for running a web server (“this site”:http://www.med.miami.edu/, if you care).

Incidentally: Holy crap, 14 years.

Anyway, as I stare around the unix landscape, I see four general-purpose web servers with some mind-share: apache, lighttpd, cherokee and nginx. Yes, there are others, but they are niche players, or they are not general purpose. So here are my issues:

h2. “lighttpd”:http://www.lighttpd.net/

For the last couple of years I’ve run wiki.mallet-assembly.org on a box that was running lighttpd. And, honestly, I’ve not really had anything to complain about; it was stable, it was fast enough, etc. But if I wanted to run fastcgi programs as some user other than www-data (better for security), I had to run them as their own daemons. This isn’t the end of the world. What ultimately made me decide against it was that lighttpd has spent much of the last two years in perpetual rewrite mode.

h2. “cherokee”:http://cherokee-project.com/

I’ve been paying attention to Cherokee for the last year or so. It was looking like an interesting alternative to lighttpd. And then I tried it. Just as was always the case with the Netscape Enterprise Server/iPlanet software that I hated when I was at Dorado, the only documented interface for configuration was web-based. This isn’t the end of the world. But when it mysteriously “broke comments”:http://mischeathen.com/2009/01/good-news-bad-news.html for no discernable reason–and we’re using straight CGI, the simplest possible option for it–that got it the boot. And its error log? Useless.

h2. “nginx”:http://wiki.codemongers.com/Main

Nginx is great at what it does. Seriously. Couldn’t live without it. But really, it’s a proxy that happens to have also been taught how to speak FastCGI (and IMAP and SMTP and various other things–it really is great), and as a consequence it doesn’t do some important things like, well, CGI. I am using it for wiki.mallet-assembly.org right now, because it’s ultra-light and I’m running mediawiki under FastCGI there, so it all works out, but I’m probably going to move it to apache before too long because…well, who wants to have to keep track of two different packages?

h2. “apache”:http://httpd.apache.org/

Big. Complicated, with too many options to keep track of. One of the more annoying configuration syntaxes around (Fake html tags to denote sections? Really?). But dammit, it works, even when you ask it to take care of spawning fastcgi processes as another user. And it’s not *that* baroque. And even when it is (mod_rewrite, I’m looking at _you_), it’s still better documented than any of the other options. And most of the unix-oriented web software just pretty much assumes you’re going to be using it.

There’s really just not any competition.

Now don’t get me wrong–I would be disappointed if suddenly everyone abandoned all of their other systems. An Apache monoculture would benefit no one. To those working on the other systems, well, I’m gonna keep looking at them and seeing how they evolve. If nginx sprouted *simple* CGI support (none of this “write your own FastCGI server process that would proxy the CGI scripts” stuff), I would almost certainly move to that.

But for the moment, Apache it is.


OK, so the thing about today’s hackers is that they’re often strikingly funny.

So even if you don’t care about javascript or emacs, much less javascript *and* emacs, you should go read “Steve Yegge’s discussion of implementing javascript in emacs lisp”:http://steve-yegge.blogspot.com/2008/11/ejacs-javascript-interpreter-for-emacs.html because all of the digressions and other silliness are sure to make you laugh.

I mean, here’s Steve discussing the name:

bq. In that blog I mentioned I was working nights part-time (among other things) on a JavaScript interpreter for Emacs, written entirely in Emacs Lisp. I also said I didn’t have a name for it. A commenter named Andrew Barry suggested that I should not call it Ejacs, and the name stuck.

“Via”:http://jwz.livejournal.com/965656.html (who has his own history with Emacs)

The natural progression of kernel hacking

I don’t think this is the first time I’ve quoted Rusty Russell:

bq. I think Willy did it because this is for printk. It makes more sense than everyone opencoding an -ENOMEM handler, which will have to be replaced by some mildly amusing string like “I want to printk but I have no memory!”. Next think[_sic_] you know 70% of the kernel will be bad limericks as everyone tries to one-up each other.

I note this just because Chet has had interchanges with him in the past…

“Steve Gibson, of SpinRite fame”:http://grc.com/ has come up with sort of a “super-simple variation on the little RSA keyfobs”:https://www.grc.com/ppp.htm where you instead carry around a little business card that you can print up yourself that has a bunch of possible second-factor entries you can use for auth.

The part that makes me laugh a bit is that it is–as you might expect if you remember the ads for SpinRite back in the day–a Windows .DLL coded in assembly.

John Graham-Cumming has “a C implementation of the scheme”:http://www.jgc.org/blog/2007/10/open-source-implementation-of-steve.html. He also has a three-letter domain name. Coincidence?

Incidentally, I wrote my first ever bit of python code today…

Funny enough, it was a fix for a bug in the software (“gnome-blog”:http://www.gnome.org/~seth/gnome-blog/) I am using to write this post.

It was mostly a matter of figuring out what was failing–a GConf interface was failing when the app tried to store an integer–and then searching around “Mark Pilgrim”:http://diveintomark.org/’s excellent (and freely-available) “Dive Into Python”:http://diveintopython.org.

If I was really together, I’d post a patch, but I didn’t back up the original.

New blog software

Still “Catalyst”:http://catalyst.perl.org/-based, but this time with a database back-end that also does asset (aka binary files) management.

For about 30 seconds during development the software actually had 100% test coverage.

How to set up Horde applications (IMP, Turba, et al.) under apache and mod_fcgid

This is one of those times when I hope whatever pathetic amount of google-juice I have can aid others.

Googling around, I have found many oblique references to running horde/imp/turba/etc. using fastcgi, but very few specifics, and what specifics I found are mostly about using “lighttpd”:http://lighttpd.net/ (which is a fine server, but we’re not using it yet), and those for apache seemed wrong, or at least way over-complicated.

For maximum applicability in today’s world, I’m going to do this using “mod_fcgid”:http://fastcgi.coremail.cn/ under apache2, since mod_fastcgi is “basically dead”:http://fastcgi.com/archives/fastcgi-developers/2007-June/004717.html.

I’m also not including the security bits (denying access to config and lib directories and so forth) because they’re mostly boilerplate and you can get them from the Horde documentation anyway.

So, without further ado, here’s what you do:

bc.. ServerName mail.example.com
DirectoryIndex index.php
DocumentRoot /usr/share/horde3

SetHandler fcgid-script
FCGIWrapper /usr/lib/cgi-bin/php5 .php

p. Easy, huh? I’ve seen all sorts of baroque suggestions involving setting @Action@ directives and @AddHandler@ stuff and all sorts of things, but this simple invocation works just fine.

Share and Enjoy!

Truly, it must be a lot of work to suck as bad as Internet Explorer

You know, IE7 looks like a reasonable browser, but it’s not. To prove this it’s not even necessary to resort to something like CSS compliance, where no one else gets it entirely right either. It doesn’t even get HTTP right. That is, when confronted with a perfectly legitimate 204 status code it fucks up. Spectacularly.

Now why would someone be using a 204 status code? Well let’s look at “the language in the standard”:http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html:

bq.. 10.2.5 204 No Content

The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.

If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent’s active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent’s active view.

The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.

p. Needless to say, if you’re doing ajax-style processing, and, say, letting people delete stuff out of a list, then having a code that lets you say, “Yeah, we succeeded, there’s nothing more for us to say, nor any need for you to redisplay or anything, though.” is pretty damned useful.

Would you like to know what IE does to the 204 response code? It *changes it to 1223*.

For those not inculcated in the minutiae of HTTP, all response codes are three digits, with the first digit indicating the general category of response (1 is informational, 2 is success, 3 is redirection, 4 is for client errors, 5 is for server errors), and the additional digits giving more specific information. 1223 isn’t even on the map here.
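Here is that scheme as a small Python sketch, along with the obvious workaround of treating 1223 as the 204 it should have been. The mapping comes straight from the categories above; the function names are made up, and the remapping is an assumption about how client code might cope, not anything the standard blesses. In a real page this logic would live in the javascript that inspects the XMLHttpRequest status.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def normalize_status(status):
    """Map IE's bogus 1223 back to the 204 the server actually sent."""
    return 204 if status == 1223 else status


def status_class(status):
    """Name the category given by the first of the three digits."""
    if not 100 <= status <= 599:
        return "not a valid HTTP status"
    return STATUS_CLASSES[status // 100]


for raw in (204, 1223, 404):
    fixed = normalize_status(raw)
    print(raw, status_class(raw), "->", fixed, status_class(fixed))
```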

This has been a problem for at least five months, since that’s when “a bug was filed in the dojo toolkit’s trac about it”:http://trac.dojotoolkit.org/ticket/2418.

How can Microsoft be taken seriously? All that money, all those programmers, and they can’t do better than this?

I hate to say it, but I’m not surprised…

“Twitter is apparently finding that Rails doesn’t do massive scaling well”:http://www.radicalbehavior.com/5-question-interview-with-twitter-developer-alex-payne/, at least not the way that all the books will tell you to write stuff for it.

That doesn’t really surprise me. Making applications that scale well is _hard_. I’ve done it twice (though only one of those is a web app), and in both instances, what I found was a need to be able to muck around with the lowest-level code to be able to create app-specific speedups–whether that was writing my own hand-tuned demented-but-fast SQL or being able to back stuff up against memcache that most people wouldn’t think to put in there, like mutexes (and yes, I know it’s not a reliable storage medium, but given the rate at which it fails, we were willing to face potential issues).
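For the curious, a mutex in memcache usually comes down to the @add@ command, which only stores a value if the key does not already exist. The sketch below is Python written under that assumption, not the code described above; @FakeCache@ is a hypothetical in-memory stand-in for a real client so the example runs on its own, and @try_lock@ and @unlock@ are made-up names.

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, ttl):
        """Store key only if absent or expired; return True on success."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def try_lock(cache, name, owner, ttl=30):
    """Take the lock if nobody holds it; the TTL frees it if we crash."""
    return cache.add("lock:" + name, owner, ttl)


def unlock(cache, name):
    """Release the lock (this sketch does not check who owns it)."""
    cache.delete("lock:" + name)


cache = FakeCache()
first = try_lock(cache, "rebuild-index", "worker-1")
second = try_lock(cache, "rebuild-index", "worker-2")
unlock(cache, "rebuild-index")
third = try_lock(cache, "rebuild-index", "worker-2")
print(first, second, third)
```

The expiry time is what makes the unreliability tolerable: if the cache forgets a lock or its holder dies, the worst case is a short window where two workers overlap or one waits for the timeout.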

And, honestly, I think “Catalyst”:http://catalyst.perl.org/ brings most of the great stuff about Rails while letting you get to the bare-metal if/when you need to.



There is no way it can be a good thing to have “an emulation of the Amiga written in javascript”:http://www.chiptune.com/.


MySQL continues to play catch-up

So, slashdot had “a story about the new Falcon storage engine for MySQL”:http://rss.slashdot.org/~r/Slashdot/slashdot/~3/69904889/article.pl. I don’t care for MySQL for a number of reasons, but some–though not all–could be alleviated with a better storage back-end. So I cruised over to check out “the Falcon feature-set”:http://www.mysql.org/doc/refman/5.1/en/se-falcon-features.html.

Funny enough, with the exception of the next to the last point–which is a potentially non-trivial point, I admit–this is all stuff that PostgreSQL has had for years.

One day, people are going to realize that MySQL has been playing catch-up for the last few years. The amount of effort people have to expend to work around MySQL’s long-standing issues with concurrency and lack of ACID-compliance in its default configuration, and its lack of good performance in the configurations that _do_ have those characteristics, always amazes me, especially when PostgreSQL has Simply Worked for a long, long time.

Busy as hell

!/2006/12/29/1.png! I truly have been busy as hell the last few days. It’s a pretty good sort of busy, I suppose–I’ve rewritten huge chunks of code (we now no longer have a stand-alone spam checking daemon, it’s instead managed through postfix, which makes a certain sort of sense), implemented a number of new independent processes, etc.

And that’s just what’s happening in my little *intense development* branch; Chris and Dad have been working away on web stuff, with me providing the occasional prod to keep them on course.

Which brings me to the real purpose of this post: cool distributed version control tricks, AKA pretty pictures!

The image attached here is a graph of the revision history of the system. All those lines done together like spaghetti towards the bottom? That’s the sort of mildly disturbing pattern you see when people really start to get used to working with a distributed system–merging from one another as stuff goes along, perhaps from other branches they work on, back from the master, etc.

I don’t know why I think it’s so cool, but I do.


OK, so “Chris Toshok”:http://squeedlyspooch.com/blog/ has apparently “been dinking away”:http://squeedlyspooch.com/blog/archives/002069.html with making Turtle, which I gather is a GPS-monitoring package of some sort, hook up with “F-Spot”:http://f-spot.org/Main_Page, so that, based on timestamps in your photos, you can pinpoint where they were taken.

And then you can export the locations of the photos to google maps and the like.

It’s apparently all very much under development, doesn’t yet work for anyone else, etc., and, for all I know, it may already be a feature of every commercial photo management package in the world. But damn, it sure seems like a neat idea.

Having a Fred Brooks moment

So, one of my consistent gigs is working on “AnteSpam”:http://antespam.com/ for “Ironic Design”:http://ironicdesign.com/. We use “SpamAssassin”:http://spamassassin.apache.org/ as our engine, but we (well, mostly I) have built a bunch of infrastructure around it that allows us to do high-volume, redundant, high-availability deployment for domain customers, present held mail through a web interface, so on and so forth.

For the last 18 months or so, I’ve been embarked on a big rewrite, taking everything we’ve learned from having this system in production for the last three-and-a-half years and synthesizing it into a system that will run more accurately, more smoothly and with less maintenance and upkeep.

The rewrite is vastly superior in any number of ways. It increases performance by finding clever ways to avoid doing unnecessary work. It doesn’t move data around unnecessarily. The interface is largely ajax-based, making for better responsiveness. It changes its message handling to make it possible to do statistical learning on pristine copies of messages for better accuracy. It changes the way it represents various entities in terms of the data they store for better accuracy. Really, tons of things are different, and it is truly kick-ass.

And about two weeks ago, I finally admitted that there was no way that the rewrite was going to see the light of day.

You see, the problem with starting from a clean slate is simply that of, “So, how do you make a transition?” And it was becoming increasingly clear that making a transition was going to be very, very hard. Nigh on impossible to manage in any reasonable time-frame. And some of the changes way down at the core were ones that we had no way to test under real loads, so it would only be as we started making transitions that we would know if they were going to work. So, really, I started having some real concerns months ago. And they just kept building and building.

And so I had my Fred Brooks moment, where I finally was able to admit that continuing down this path was going to be a mistake. Instead, the rewrite will, effectively, be declared a research project, and I’ll spend the next however long incorporating ideas–and probably even code–from it into the production system. So the grand new features will be introduced incrementally (and even doing that is scary enough in some ways), and we’re much less likely to end up getting ourselves up the creek without a paddle.

I’ve been sleeping a bit easier, even though there’s a fair bit of pressure to get this stuff rolled into the existing system. It’s just much more doable.

Prototype really is that cool

I’ve been stuck in javascript hell for the last couple of weeks, and “Prototype”:http://prototype.conio.net/ has been a large part of keeping me sane. That people then write all sorts of extensions for it, including one for “using data structures to generate HTML”:http://www.arantius.com/article/dollar-e, is just great.

Ruby On Rails 1.1 is out

“Ruby On Rails 1.1 has been released”:http://weblog.rubyonrails.com/articles/2006/03/28/rails-1-1-rjs-active-record-respond_to-integration-tests-and-500-other-things.

Although I’m not using it now–my current project has too much code that’s always going to be Perl for me to consider switching languages–it’s something I’d seriously consider using for the future. It does seem silly that they just came out with a book about using it and then introduced a major upgrade, though.

Simon Willison teaches about JavaScript

Or, more accurately, _taught_ about javascript at the ETech conference. And he has very graciously made both “his slides”:http://www.flickr.com/photos/simon/sets/72057594077197868/ and “his notes”:http://simon.incutio.com/slides/2006/etech/javascript/js-reintroduction-notes.html available from his blog.

These are mostly oriented towards people who already know how to program, but haven’t taken JavaScript seriously. I’m definitely in that camp, and I found his notes to be a very clear, concise introduction to some of JS’s more advanced programming features–some of which I’d been exposed to already because of my spelunking around AJAX code, but I’d just been inferring their use rather than knowing exactly what was going on.

I guess I’m going to be learning how to use this eventually.

“Ingy”:http://blog.ingy.net/ has produced a javascript-based templating engine that can actually use templates intended for the Perl-based Template Toolkit. “He talks a little bit about it on his blog”:http://blog.ingy.net/2006/02/jemplate_a_template_toolkit_fo.html. The scary part is that this may have just made it much more reasonable for me to support both an Ajax-based and a “conventional” implementation of the AnteSpam front-end; no more having to maintain two ways of presenting data, etc.

Turns out I was wrong

The default theme that RockBox uses is much less pretty than that of the default iRiver firmware, but as you might have guessed from the way I said that, RockBox is themeable, and the non-default themes are at least as pretty as the iRiver firmware.

In other words, RockBox, err, rocks, in every conceivable way.

Mmmmm, yummy rockbox goodness

So, today I installed “RockBox”:http://www.rockbox.org/ on my “iRiver IHP-140”:http://www.rockbox.org/twiki/bin/view/Main/IriverPort.

It’s not as pretty as the original firmware (which, incidentally, I can still get to because, well, the RockBox guys are pretty smart), but it has two features that I always wished for that the original firmware never had–1) the ability to use .m3u playlists that also work under “mpd”:http://musicpd.org/ (that is, ones that use forward slashes, as $DEITY intended), and 2) the ability to create playlists on the fly by queueing up tracks interactively.
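For anyone stuck with backslash-style playlists, the conversion is easy enough to sketch. This assumes the only incompatibility is the path separator, which may not be the whole story; @normalize_m3u@ and the sample playlist are hypothetical names made up for the example.

```python
def normalize_m3u(text):
    """Rewrite backslash path separators as forward slashes.

    Lines starting with '#' (comments and #EXT directives) are left
    untouched; every other line is treated as a file path.
    """
    out = []
    for line in text.splitlines():
        if line.startswith("#"):
            out.append(line)
        else:
            out.append(line.replace("\\", "/"))
    return "\n".join(out) + "\n"


# Made-up playlist with backslash-separated paths.
playlist = "#EXTM3U\nMusic\\Artist\\01 Track.ogg\nMusic\\Artist\\02 Track.flac\n"
converted = normalize_m3u(playlist)
```

Here @converted@ holds the same playlist with @Music/Artist/01 Track.ogg@ style paths, and the @#EXTM3U@ header is passed through unchanged.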

I would seriously recommend it to anyone who has one of these players, and once the iPod port is to a reasonable point, I’d push people to use it on those too–you get access to actual *free* formats, like OGG and FLAC, instead of being tied to MP3 and AAC.

IE team calls for the end of IE hacks…

The IE 7 team is calling for people to stop using hacks to “work around issues with IE”:http://blogs.msdn.com/ie/archive/2005/10/12/480242.aspx.

It seems to me that the problem is that people with actual websites they want to behave have to use the hacks until IE 7 actually, you know, _ships_. Even on this site, the overwhelming majority of browser-based hits are still for a version of IE that has all these defects.