Well, that was a productive day

Generally, Saturday is my day Away From The Machine. Some Saturdays I don’t even log in–no email, no web surfing, nothing.

Unusually, though, I did some work today, and mighty productive it was, too.

One of the things we need to get a handle on for AnteSpam is building (and maintaining) a corpus of messages. Having a good corpus gives us what we need to build a good Bayes database, which will hopefully keep us nice and accurate, and it will also allow us to contribute to SpamAssassin development by running mass-checks and generally giving input on how well things are working.

At the moment we just grab random messages that come through the system and someone has to go in and classify those–which is tough, because what might be spam to me is someone else’s precious newsletter. I end up deleting a lot of messages that I think are probably spam, but might not be–and it’s better to be conservative.

Better, though, is the new capability I implemented. In addition to the random messages, it’s now possible to send messages to some special addresses, and those will be picked up and put in the corpus appropriately marked (but not as verified–we don’t want anyone to be able to screw us up by hitting us with a bunch of spam marked as ham or anything).
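A rough sketch of the idea, in Python rather than whatever the real daemon uses. The addresses, the function name, and the record layout are all made up for illustration; the only parts taken from the description above are that the receiving address decides the label and that submissions always start out unverified.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical submission addresses; the real ones are not given in this post.
SUBMISSION_LABELS = {
    "spam@corpus.example": "spam",
    "ham@corpus.example": "ham",
}


def classify_submission(raw_message):
    """Return a corpus record for a submitted message, or None if it was
    not sent to one of the special addresses."""
    msg = message_from_string(raw_message)
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    for _name, addr in recipients:
        label = SUBMISSION_LABELS.get(addr.lower())
        if label is not None:
            # The label comes from the submitter, so it is never trusted
            # as verified until someone reviews it.
            return {
                "label": label,
                "verified": False,
                "subject": msg.get("Subject", ""),
            }
    return None
```

Keeping the verified flag separate from the label means a flood of mislabelled submissions can be ignored or reviewed later without touching the trusted part of the corpus.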

Combined with some options to let people submit mail to the corpus when they see good or bad messages in their sideline folder, this could work out really well. I, personally, get thousands of messages each day, good and bad, that I can bounce into these addresses–instant training ground.

I’m pretty stoked.

A relatively unproductive day

I hate to say it–especially where prospective employers might see it–but my version of productivity is kind of…nonstandard.

That is, I have a tendency to sit around, play guitar, read email, etc. for a long time, then sit down, crank out in a few minutes something that might take someone else hours to do, and then move on to other activities.

The sad fact is that this frustrates me, perhaps more than anyone else–I mean, from the perspective of my employers, who aren’t actually sitting and watching what I do, I am still terribly productive. I get more stuff done than a lot of other people, in a shorter billable time. That’s great for customers, but tough on the bottom line sometimes.

But it’s also tough on me, in part because I’m not in an office environment–I don’t really have anyone to chat with or go to lunch with, no meetings to go to, or any of those other general wastes of time you have when you’re in an office.

So even though I’m actually being just as productive as I likely would be in an office setting, I always feel less productive. It’s tough.

OK, I was wrong.

So I ended up getting back into the swing of things, and ended up making some fairly significant revisions to the code for the main daemon at the heart of AnteSpam. No really groundbreaking changes to the core functionality, but some optimizations, and some cleanup of the code. It may be a little more accessible now.

How does one do it?

So some super-gigantic catastrophe severed some huge wad of fiber somewhere in my general area, and I find myself without high-speed connectivity.

Even ignoring the fact that I was all stoked yesterday to start on a project that was going to involve doing a lot of programming on another system, it’s frustrating as hell to have to slow down to dialup speeds.

More frustrating, though, is the fact that what I really would like to do is have multiple connections–DSL, cable, backup dial-up–that all converge into one box, which intelligently fails over (or multipaths). You can do this with a Linux box–and it’s not like I’ve not run a Linux box as my gateway before–but finding a Linux box that is as quiet as my Linksys box would be virtually impossible. And now that I’ve got things this quiet in here, I’m loath to go back to the perpetually screaming fans situation.
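The failover half of that wish is simple enough to sketch. This is only the decision logic, in Python, with invented names: `pick_uplink` and the `is_up` health check are assumptions, and a real gateway would also have to rewrite routes, which is not shown here.

```python
from typing import Callable, Optional, Sequence


def pick_uplink(uplinks: Sequence[str],
                is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy uplink in priority order, or None if
    every link is down. `is_up` is a caller-supplied health check."""
    for name in uplinks:
        if is_up(name):
            return name
    return None


# Example: DSL is down, so traffic falls over to cable before dial-up.
status = {"dsl": False, "cable": True, "dialup": True}
active = pick_uplink(["dsl", "cable", "dialup"], lambda name: status[name])
```

Multipathing would need per-connection balancing on top of this, which is a bigger job than a priority list.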

No good database modellers in Linux

I’m not usually much of a “visual” guy. I’d rather read a text description of most stuff than look at pictures.

The one great exception to this is when it comes to modelling large databases–say, more than 10 tables. At that point, staring at line after line of SQL simply doesn’t cut it–I can’t discern the problems at a glance, there’s lots of paging around, etc.

For the moment I use the UML mode of Dia, which is an OK tool, plus a custom XSL stylesheet for converting Dia’s output into SQL. It works OK–you do get a diagram out of it, and you can coerce it to produce decent SQL–but it’s just not…fluid. There’s a lot of fiddly stuff that requires very careful work, and supporting things like foreign key references always requires more work than it should. This makes creating the diagram that much more work and frustration.
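To show what the model-to-SQL step amounts to, here is a minimal sketch in Python. It is not the XSL stylesheet and it does not read Dia files; the `Column` and `Table` classes and the example tables are hypothetical stand-ins for whatever the diagram describes. It just shows foreign key references being carried from the model into the generated DDL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    sql_type: str
    primary_key: bool = False
    references: Optional[str] = None  # e.g. "domain(id)"; hypothetical format


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


def create_table_sql(table: Table) -> str:
    """Render one CREATE TABLE statement, including foreign key constraints."""
    lines = []
    for col in table.columns:
        definition = f"{col.name} {col.sql_type}"
        if col.primary_key:
            definition += " PRIMARY KEY"
        lines.append(definition)
    # Foreign keys go after the column definitions as table constraints.
    for col in table.columns:
        if col.references:
            lines.append(f"FOREIGN KEY ({col.name}) REFERENCES {col.references}")
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {table.name} (\n    {body}\n);"


# Hypothetical two-table model: each message belongs to a domain.
message = Table("message", [
    Column("id", "INTEGER", primary_key=True),
    Column("domain_id", "INTEGER", references="domain(id)"),
])
print(create_table_sql(message))
```

The fiddly part in practice is getting the reference out of the diagram in the first place; once it is a field on the column, emitting the constraint is one line.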

We’re growing

When you talk actual numbers it sounds pitiful in a way, but AnteSpam is growing consistently, if not super-fast. We’re up to 18 paying domains, and there are reportedly several “ready to land” any moment now.

If I ever feel for a moment that this won’t sell because it doesn’t provide value to the customers, I need only look at the stats to see that we’ve got domains that get three spam mails for every good mail, which works out to 75% of their mail being spam. Amazing.