Well, that was a productive day

Generally, Saturday is my day Away From The Machine. Some Saturdays I don’t even log in–no email, no web surfing, nothing.

Unusually, though, I did some work today, and mighty productive it was, too.

One of the things we need to get a handle on for “AnteSpam”:http://antespam.com/ is building (and maintaining) a corpus of messages. A good corpus gives us what we need to build a good Bayes database, which should keep us nice and accurate. It will also let us contribute to SpamAssassin development by running mass-checks and generally giving input on how well things are working.

At the moment we just grab random messages that come through the system and someone has to go in and classify those–which is tough, because what might be spam to me is someone else’s precious newsletter. I end up deleting a lot of messages that I think are probably spam, but might not be–and it’s better to be conservative.

Better, though, is the new capability I implemented. In addition to the random messages, it’s now possible to send messages to some special addresses, and those will be picked up and put in the corpus appropriately marked (but not as verified–we don’t want anyone to be able to screw us up by hitting us with a bunch of spam marked as ham or anything).
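To make the idea concrete, here is a minimal sketch of how that routing could look. It is not the actual AnteSpam code: the addresses, the `CorpusEntry` type and the `classify_submission` function are all made up for illustration. The one point it is meant to show is that the label comes from the address the message was sent to, while the verified flag always starts out false.

```python
from dataclasses import dataclass
from email import message_from_string
from email.utils import getaddresses

# Hypothetical submission addresses; the real ones are not given in this post.
SUBMISSION_ADDRESSES = {
    "spam@corpus.example.com": "spam",
    "ham@corpus.example.com": "ham",
}


@dataclass
class CorpusEntry:
    label: str      # "spam" or "ham", as claimed by the submitter
    verified: bool  # only set to True after someone reviews the message
    raw: str        # the original message text


def classify_submission(raw_message):
    """Return a CorpusEntry if the message was sent to a submission address."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(headers):
        label = SUBMISSION_ADDRESSES.get(addr.lower())
        if label is not None:
            # Trust the submitter's label, but never mark it verified here.
            return CorpusEntry(label=label, verified=False, raw=raw_message)
    return None  # not a submission; handle as ordinary mail
```

Keeping the verified flag separate from the label means a flood of mislabeled submissions can sit in the corpus without being trusted until someone has looked at them.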

Combined with some options to let people submit mail to the corpus when they see good or bad messages in their sideline folder, this could work out really well. I, personally, get thousands of messages each day, good and bad, that I can bounce into these addresses–instant training ground.

I’m pretty stoked.

Published by

Michael Alan Dorman

Yogi, brigand, programmer, thief, musician, Republican, cook. I leave it to you to figure out which ones are accurate.