I have two recommendations, stemming from the thrills and chills I experience daily running “a commercial anti-spam service”:http://antespam.com/:
* Use some supplementary rules
Go to the “SpamAssassin Custom Rule Emporium”:http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm, and pick up, at the very least, copies of backhair.cf, chickenpox.cf, weedsonly.cf and bigevil.cf. These have shown an enormous benefit for us.
The great thing is that all you have to do is drop these rules in your SpamAssassin rules directory (our Debian boxes use /etc/spamassassin for local stuff) and SpamAssassin will start using them the next time it loads its configuration. If you run spamd, that means restarting it.
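As a rough sketch of that step, assuming the four rule files have already been downloaded somewhere: the function name install_rules and both directory arguments are made up for illustration, and only the file names and /etc/spamassassin come from the text above.

```shell
# Sketch only: install_rules is a hypothetical helper, not part of SpamAssassin.
# $1 = directory holding the downloaded .cf files (assumed location)
# $2 = SpamAssassin's local rules directory, e.g. /etc/spamassassin on Debian
install_rules() {
    src_dir="$1"
    dest_dir="$2"
    for name in backhair chickenpox weedsonly bigevil; do
        # Stop at the first file that is missing or cannot be copied.
        cp "$src_dir/$name.cf" "$dest_dir/" || return 1
    done
}
```

You would call it as something like install_rules ~/sa-rules /etc/spamassassin (the first path is invented). Running spamassassin --lint afterwards is a cheap way to catch a rule file that does not parse before it goes live.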
* Don’t let SpamAssassin automatically train the bayes database
Now, our situation differs somewhat from most people’s: we do filtering for a couple of hundred domains, so we see a very wide range of email, and our users often don’t have the facilities (since LookOut! sucks so much) or the time to get all incorrectly-classified messages fed back into the system. If you are diligent in doing this for every mis-classified message you see, your results will probably be good.
Still, for us, auto-learning was a _disaster_.
We have found it much more effective to pull and classify random mails going through the system, and build a bayes database exclusively from that corpus. The fact is, the accessibility of SpamAssassin’s rule set means that clever people can find holes. Those holes get closed quickly, but perhaps not before a number of messages go through your system with scores low enough that SpamAssassin learns them as ham, and _Whammo!_ you’ve got a bayes database that is going to start working counter to your desires unless you make sure each and every one of those messages gets re-learned as spam.
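Training from a hand-sorted corpus can be sketched roughly like this. The sa-learn command and its --spam and --ham options are real; the function name retrain_bayes and the spam/ and ham/ subdirectory layout are assumptions made up for the example.

```shell
# Sketch only: retrain_bayes and the spam/ + ham/ layout are hypothetical.
# $1 = directory containing hand-classified mail in spam/ and ham/
retrain_bayes() {
    corpus_dir="$1"
    # Feed every message under spam/ to the bayes database as spam...
    sa-learn --spam "$corpus_dir/spam" || return 1
    # ...and every message under ham/ as ham.
    sa-learn --ham "$corpus_dir/ham" || return 1
}
```

Run it as the user whose bayes database the filter actually reads, for example retrain_bayes /var/tmp/corpus (path invented). Afterwards, sa-learn --dump magic shows how many spam and ham messages the database has learned; by default SpamAssassin does not apply bayes scoring until it has seen at least 200 of each.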
So my recommendation is that you add a ‘bayes_auto_learn 0’ parameter to your config file.
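In the config file that would look something like the fragment below. The file name is an assumption based on the Debian layout mentioned earlier, and use_bayes is shown only to make clear that bayes scoring itself stays on.

```
# /etc/spamassassin/local.cf  (assumed location; adjust for your install)

# Keep using the bayes database for scoring...
use_bayes 1

# ...but never let SpamAssassin train it from its own verdicts.
bayes_auto_learn 0
```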
Of course, the real solution is to sign up with “AnteSpam”:http://antespam.com/, and let us take care of the maintenance headaches.