False Positive Finder presented at Virus Bulletin 2010

by John Graham-Cumming (Causata)

Every spam filter is forced to weigh the relative cost of a false positive (good mail quarantined as spam) against that
of a false negative (spam let through to the inbox). Whilst a spam message in the inbox has an immediate, small cost, a
false positive has a larger, longer-term cost. If the false positive represents potential business, then even a single
false positive can vastly outweigh the cost of the spam surrounding it.
As spam volumes have increased massively, manually searching a spam folder for legitimate messages has become
impossible. But the cost of a false positive has remained the same.
This paper reports on a project called 'False Positive Finder', which operates after a spam filter has acted. It
examines the spam held in quarantine for potential false positives and presents the end-user with a small number of
messages to examine from the sea of spam.
False Positive Finder uses a continuous scoring technique to look for false positives not just in recently arrived
messages, but also in historic messages that may be hours, days or weeks old.
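
The idea of continuous scoring can be illustrated with a minimal sketch. It is not the scoring method used by False
Positive Finder, which this abstract does not describe. The names QuarantinedMessage, ham_score and find_candidates,
the trusted-sender signal and the weights are all hypothetical, chosen only to show that every quarantined message is
rescored on each pass, whatever its age, and that only a short list is shown to the user.

```python
from dataclasses import dataclass
import heapq


@dataclass
class QuarantinedMessage:
    # Hypothetical record for a message held in quarantine.
    msg_id: str
    sender: str
    subject: str
    age_days: float


def ham_score(msg, trusted_senders):
    """Hypothetical score: higher means more likely to be a false positive."""
    score = 0.0
    if msg.sender in trusted_senders:
        score += 1.0  # assumed weight for a sender the user has dealt with
    if msg.subject.lower().startswith("re:"):
        score += 0.5  # assumed weight for what looks like a reply
    return score


def find_candidates(quarantine, trusted_senders, limit=3):
    """Rescore the whole quarantine and return the few best candidates."""
    # Every message is rescored, regardless of how long ago it arrived.
    scored = [(ham_score(m, trusted_senders), m) for m in quarantine]
    top = heapq.nlargest(limit, scored, key=lambda pair: pair[0])
    # Messages with no supporting evidence are not shown to the user.
    return [m for score, m in top if score > 0]


quarantine = [
    QuarantinedMessage("a", "promo@example.net", "Cheap watches", 0.1),
    QuarantinedMessage("b", "alice@example.com", "Re: contract draft", 12.0),
    QuarantinedMessage("c", "bob@example.org", "Invoice", 30.0),
]

# First pass: no trusted senders are known yet.
first_pass = find_candidates(quarantine, set())

# Later pass: new evidence (a newly trusted sender) changes the score of a
# message that has been in quarantine for weeks.
later_pass = find_candidates(quarantine, {"bob@example.org"})
```

In this sketch the 30-day-old message only surfaces on the later pass, once new evidence about its sender is
available. A filter that scores each message once, on arrival, would not revisit it.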