[LINK] Legit messages adopting anti-spam techniques

Irene Graham rene.lk at libertus.net
Tue Feb 17 16:34:17 EST 2004


On Tue, 17 Feb 2004 11:05:25 +1100 jeff.evans at iird.vic.gov.au wrote:
[...]
>My email to Link (25/11/2002 ) mentioned similar techniques used by an
>email newsletter author:

If that was in 2002, some spam filters have become much smarter since then.

[...]
>   author Tara Calishain has been forced to censor her words because of her
>   list's recipient's spam filter software, eg "now the N*tional Library of
>   Scotland (name slightly altered as the first word apparently trips sp*m
>   filters) " it seems to me that this trend will either see us:
>   a) invent a constantly evolving series of synonyms to keep ahead of the
>   filters or,
>   b) like Tara, resort to  Pig Latin  "Words that might get ResearchBuzz
>   filtered by an overzealous filter will be written in Pig Latin from now
>   on. eefray! ooway! owzayay!) ..."     <
>   http://www.researchbuzz.com/news/2002/sep12sep1802.shtml>
>   c) be forced to develop even more obscure jargon-ridden communication
>   than ever.
>
>   Do any Linkers have reason to believe that one day we'll be able to
>   avoid self-censorship of this type? 

Depending on the particular spam filter being used, i.e. how intelligent it
is, self-censorship of that type is already more likely to get a message
designated as spam than writing in proper English would be.
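
To illustrate why disguised words can backfire, here is a minimal sketch of
the kind of rule a filter might apply. The pattern and the function name are
assumptions made up for this example, not a rule taken from any real filter:
it simply flags words that have a symbol wedged between letters.

```python
import re

# Hypothetical rule: letters, then one or more of * $ ! #, then more letters.
# "@" is left out on purpose so ordinary email addresses are not flagged.
OBFUSCATED_WORD = re.compile(r"\b[A-Za-z]+[*$!#]+[A-Za-z]+\b")


def obfuscation_hits(text):
    """Return every word in text that looks deliberately disguised."""
    return OBFUSCATED_WORD.findall(text)


if __name__ == "__main__":
    sample = "now the N*tional Library of Scotland trips sp*m filters"
    print(obfuscation_hits(sample))
```

A message written in plain English produces no hits, while the quoted
newsletter sentence produces two, which is the opposite of what its author
intended.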

> Can Spam filters one day become
>   redundant? Will "white" and "black" listing of addresses work? Will
>   software Agents ever be able to cope with sophistication needed to parse
>   our confusing language?

Currently, I'm of the view that spam filters that use rule sets *and* a
Bayesian filter component are likely to continue to be much more practical and
effective than white and black listing of addresses.
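
A rough sketch of how a rule set and a Bayesian component can be combined
into one decision follows. The rule names, the per-rule scores, the
bayes_weight parameter and the way the probability is scaled are all
assumptions chosen for illustration; the threshold of 5.0 is only a
placeholder, and none of this is taken from any particular product's
configuration.

```python
def total_score(rule_hits, bayes_probability, rule_scores, bayes_weight=3.0):
    """Add up the scores of the rules that fired, then adjust by the Bayes result.

    bayes_probability is the filter's estimate (0.0 to 1.0) that the message
    is spam. It is centred on 0.5 so that mail the Bayesian component
    considers legitimate pulls the total down instead of merely adding nothing.
    """
    score = sum(rule_scores[name] for name in rule_hits)
    score += bayes_weight * (bayes_probability - 0.5) * 2
    return score


def is_spam(score, threshold=5.0):
    """Classify as spam once the combined score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical rules and weights.
    scores = {"OBFUSCATED_WORD": 2.5, "ALL_CAPS_SUBJECT": 1.5}
    # A list post that quotes disguised words but otherwise looks legitimate.
    print(is_spam(total_score(["OBFUSCATED_WORD"], 0.1, scores)))
    # A message that trips both rules and also looks spammy to the Bayes part.
    print(is_spam(total_score(["OBFUSCATED_WORD", "ALL_CAPS_SUBJECT"], 0.99, scores)))
```

The point of the design is that neither part decides alone: a rule can fire
on a message that merely discusses spam, and the Bayesian estimate can still
keep the total under the threshold.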

A couple of months ago I installed SpamAssassin as implemented in NoSpamToday
for Windows, www.no-spam-today.com (which has only been on the market since
about last November). I then spent a bit of time figuring out how to train the
Bayesian filter in the SpamAssassin part. Since then I've been absolutely
amazed at how clever it is. It catches about 99.5% of the spam I receive and
I've had no 'false positives' that I would actually call a false positive. The
only 'FP's have been when someone posts a message containing spammy stuff to a
list like Link. (The message Jeff sent to the list almost got caught because of
the use of spam filter 'avoidance' language, which is a sure sign of spam to
sophisticated spam filters.) My filter could be tweaked to stop that
happening, but I haven't got around to doing it, because it's usually smart
enough to work out that it's a message discussing spam, not actual spam.

I think Bayesian filters are the key, and I think they probably work better
when trained on the type of email a particular individual (or small group of
individuals) receives rather than when trained on the wide range of types of
email that travel through an ISP's system. They can be pretty good even on an
ISP's system (probably very good, depending on how careful the ISP is about
training the Bayesian filter properly).
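
For anyone wondering what "training" means here, the following is a minimal
naive Bayes style sketch. It is not SpamAssassin's actual algorithm; the
class name, the tokenizer and the add-one smoothing are assumptions picked to
keep the example short. It shows why the result depends so heavily on whose
mail the filter was trained on: the word counts come entirely from the
messages it has been shown.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9*$!']+")


def tokens(text):
    """Lower-case the text and return its distinct word-like tokens."""
    return set(TOKEN.findall(text.lower()))


class BayesFilter:
    """Toy per-user filter: counts how many spam/ham messages contain each token."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message under the label 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(tokens(text))

    def spam_probability(self, text):
        """Combine per-token evidence as log-odds and convert to a probability."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in tokens(text):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    f = BayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("buy cheap watches now", "spam")
    f.train("link list discussion of spam filters", "ham")
    f.train("library catalogue discussion", "ham")
    print(f.spam_probability("buy cheap pills"))
    print(f.spam_probability("library discussion"))
```

Trained on one person's mail, the "ham" counts reflect that person's usual
vocabulary (list names, work topics and so on), which is the advantage over a
filter trained on everything passing through an ISP.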

Irene
 



