Will Filters Kill Spam?
of the Computer Security Journal.)

I get about 45 spams a day, but only about one a week makes it into
my inbox.
If everyone had this much of their spam filtered
out, spammers would give up sending it.

Will that happen?

The first generation of spam filters used rules to recognize
specific spam features. Now a new generation of statistical
spam filters seems to offer significantly better performance.
Statistical filters look at the
entire contents of each incoming email and decide whether it’s spam
based on its overall similarity to previous spams. This new
kind of filter routinely catches over 99% of current spam with near
zero false positives.

The simplest statistical filter can be described in a paragraph.
Users discard all their spam in a separate trash can. At intervals,
a program looks through all the user’s email and, for each token,
calculates the ratio of spam occurrences to total occurrences.
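
A minimal sketch of this per-token calculation might look like the following. It assumes each email has already been split into a list of tokens, and that an "occurrence" means one email containing the token at least once. The function name `token_spam_probabilities` and its parameter names are illustrative only, not taken from any particular filter.

```python
from collections import Counter


def token_spam_probabilities(spam_mails, ham_mails):
    """Map each token to its estimated spam probability.

    Each mail is a list of tokens (assumption: tokenizing happens elsewhere).
    A token's frequency in a corpus is the fraction of mails in that corpus
    that contain it at least once.
    """
    # Count mails containing each token; set() ignores repeats within a mail.
    spam_counts = Counter(tok for mail in spam_mails for tok in set(mail))
    ham_counts = Counter(tok for mail in ham_mails for tok in set(mail))

    probabilities = {}
    for tok in spam_counts.keys() | ham_counts.keys():
        spam_freq = spam_counts[tok] / len(spam_mails) if spam_mails else 0.0
        ham_freq = ham_counts[tok] / len(ham_mails) if ham_mails else 0.0
        # Ratio of the spam frequency to the sum of both frequencies.
        probabilities[tok] = spam_freq / (spam_freq + ham_freq)
    return probabilities
```

A token seen only in spam comes out at 1.0 and one seen only in nonspam at 0.0; a practical filter would probably clamp those extremes, which this sketch does not attempt.
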
For example, if “cash” occurs in 200 of 1000 spams
and 3 of 500 nonspam emails, its spam probability is
(200/1000) / (3/500 + 200/1000)
or .971. When a new email arrives, extract
all the tokens and find the fifteen with probabilities p1…p15
furthest (in either direction) from .5. The probability that the
mail is a spam is
p1p2…p15