[TAG] Spam missing Subject header
John Karns
jkarns at csd.net
Mon Jul 5 22:15:05 MSD 2004
On Mon, 5 Jul 2004, Ben Okopnik so eloquently said:
> In that case, you can always tell procmail to file these things to your
> spam folder - or just /dev/null them, if you're that brave. :)
Well, after all these months of feeding the SpamAssassin Bayesian filter
to train it, it does a darn good job of trapping the spam, and I get
virtually no false positives. But I still prefer to take a quick glance
through the caughtspam folder to make sure I don't trash something
important or of interest.
However, the spammers are dynamic enough that I still have an estimated
50% spam ratio in the mail that makes it through (the volume of legit mail
that I receive is small - I might receive 20 to 30 legit msgs a day, and
see about the same number of spam msgs in my INBOX that made it past SA).
To SA's credit though, it traps 90% - 95% of the spam - saves a lot of
time and pain.
What would seem to me to be a nice addition to SA would be to have a
secondary, post-SA filter with a different kind of algorithm which would
catch subjects like V1AGrA or V 1 a g r a, and be able to filter out msgs
that had unintelligible words in the msg body. Those fall into two
subcategories: 1) non-word gibberish and 2) words with no contextual
meaning. Intuitively it seems to me that the first category would be the
easier one to target - do a dictionary lookup on the words and reject or
accept the result based on a statistical rating something like SA does for
its evaluation. The 2nd might be considerably more difficult, and might
require some use of AI. To my knowledge that kind of utility doesn't
exist, or at least I haven't seen any reference to such an animal in my
readings.
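A rough sketch of the first, easier half of that idea, in Python. Everything here is an assumption for illustration, not an existing SA feature: the LOOKALIKES table, the function names, and the idea of passing in a word set (which could be loaded from something like /usr/share/dict/words) are all hypothetical.

```python
import re

# Hypothetical table of look-alike characters spammers swap in for letters.
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "0": "o", "3": "e",
    "4": "a", "@": "a", "5": "s", "$": "s",
})

def normalize(subject):
    """Lowercase, undo look-alike characters, then drop everything but a-z."""
    return re.sub(r"[^a-z]", "", subject.lower().translate(LOOKALIKES))

def matches_blocked(subject, blocked):
    """True if any blocked word appears in the squashed subject."""
    squashed = normalize(subject)
    return any(word in squashed for word in blocked)

def gibberish_ratio(body, dictionary):
    """Fraction of alphabetic words in the body not found in the dictionary."""
    words = re.findall(r"[a-z]+", body.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)
```

Both "V1AGrA" and "V 1 a g r a" normalize to "viagra" here. The ratio from gibberish_ratio would still need a cutoff chosen by experiment, much like an SA score threshold. One known weakness of squashing out the spaces: an innocent subject such as "via gradient" also contains "viagra" once squashed, so this would be better as one more score than as a hard reject.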
> ----------------------------------------------------------------------
> # Toss the stuff w/o a "Subject: " header
> :0
> * ? [ -z "`formail -X 'Subject'`" ]
> /dev/null
> ----------------------------------------------------------------------
>
> Note that this will not match an empty-but-existing subject (you can use
> "-x" instead of "-X" if you want that) - it will only trash the ones
> without a valid "Subject: " field.
That's what I was looking for! Thanks, Ben. For some reason I had the
formail utility pegged more as a tool for constructing email than parsing
it. I have to confess though, that it's been a long time since I've
looked at it, and then it was only briefly.
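For my own notes, the missing-versus-empty distinction Ben describes can also be sketched with Python's standard email module. This is only an illustration of the same check, not a replacement for the recipe, and the function name subject_status is made up for the example.

```python
from email import message_from_string

def subject_status(raw_message):
    """Classify the Subject header as 'missing', 'empty', or 'present'."""
    msg = message_from_string(raw_message)
    subject = msg["Subject"]  # None when the header is absent altogether
    if subject is None:
        return "missing"
    if subject.strip() == "":
        return "empty"
    return "present"
```

A message with no Subject line at all comes back as "missing", which is the case the recipe above sends to /dev/null; a bare "Subject: " line comes back as "empty".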
--
John Karns