Spam - recognition by methods independent from text content presented at Virus Bulletin 2006

by Ralf Iffert (Internet Security Systems)

Tags: Security


Summary: "Today's spam detection methods are based upon common content analysis methods like Bayesian
filters and keyword analysis. Spammers can easily circumvent these methods with simple
text variations. This presentation describes two other approaches that work completely independently of
any textual analysis:
Structure analysis: this method is based upon an analysis of the HTML structure of the
email. This structure is recorded using a type of meta language. The resulting meta structure is
added to a spam database, and all incoming emails can be compared with the stored structures rather
than with the exact email text itself.
Flow analysis: this method is based upon an analysis of the flow of incoming emails.
If there are emails with identical content but different senders and different recipients
within a short time-frame, then a system could conclude that the emails are spam,
because there is no other kind of (mass) email fulfilling this criterion. This method is
highly effective when large quantities of potential spam messages are analysed, such as
in a Mail Service Provider environment. Furthermore, the comparison of the content can be
made resistant to common spammer tricks by incorporating other techniques, such as
the first method described in this talk, structure analysis."
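
The two approaches in the summary can be illustrated with a minimal Python sketch. It is not the presenter's implementation: the abstract does not specify the meta language, so the tag-sequence signature used here is an assumption, as are the names structure_signature and flag_bulk_flows and the thresholds window_seconds, min_senders and min_recipients. The sketch reduces each HTML body to its sequence of tag names, ignoring all text, and then flags any signature that arrives from several distinct senders to several distinct recipients within a short time window.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _TagRecorder(HTMLParser):
    """Collects tag names in document order and ignores all text content."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def structure_signature(html_body):
    """Reduce an HTML body to a string of tag names (an assumed meta language)."""
    recorder = _TagRecorder()
    recorder.feed(html_body)
    recorder.close()
    return ">".join(recorder.tokens)


def flag_bulk_flows(messages, window_seconds=600, min_senders=3, min_recipients=3):
    """Return signatures sent by many senders to many recipients in a short window.

    messages: iterable of (timestamp_seconds, sender, recipient, html_body).
    All thresholds are illustrative defaults, not values from the talk.
    """
    by_signature = defaultdict(list)
    for timestamp, sender, recipient, body in messages:
        by_signature[structure_signature(body)].append((timestamp, sender, recipient))

    flagged = set()
    for signature, events in by_signature.items():
        if not signature:
            # Bodies without any HTML tags carry no structure to compare.
            continue
        events.sort()
        start = 0
        for end in range(len(events)):
            # Advance the window start until the span fits in window_seconds.
            while events[end][0] - events[start][0] > window_seconds:
                start += 1
            window = events[start:end + 1]
            senders = {s for _, s, _ in window}
            recipients = {r for _, _, r in window}
            if len(senders) >= min_senders and len(recipients) >= min_recipients:
                flagged.add(signature)
                break
    return flagged
```

Because the signature ignores text, two messages that differ only in wording or link targets map to the same signature, which is what lets the flow check group them even when the spammer varies the text. A production system would need a richer signature and thresholds tuned to real traffic; mass mail that reaches many recipients from a single sender does not meet the distinct-sender condition in this sketch.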