A spam novelty detection system presented at Virus Bulletin 2009

by Claudiu Cristian Musat (Bitdefender),

URL : http://www.virusbtn.com/conference/vb2009/abstracts/MusatMiloiuMitrica.xml

Summary : Spam keeps changing, but so far there have been no quantitative studies of the rate at which novelty
appears, nor of what proportion of received messages are genuinely new rather than variations of
older spam. In this work we present a means of determining whether a newly received message is similar to previously
seen ones or substantially different. Furthermore, we provide an apparatus that focuses on the spam samples that differ
most from the ones already known. We use a wave-oriented k-means engine to cluster messages with similar descriptions in
the chosen feature space. We then use another instance of the engine to cluster the previously obtained spam clusters
and single out the most different ones. Finally, by expanding the timeframe, we detect long-term cluster similarities.
The result of this process is a stream of clusters composed of the messages that least resemble older ones.
Identifying genuinely novel spam is important because it lets analysts focus on those messages and thus further reduce
the false-negative rate.
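
The two-stage clustering described in the summary can be illustrated with a small sketch. This is not the authors'
implementation: the abstract does not specify the wave-oriented engine, so a plain k-means with farthest-first
initialization is swapped in. The function names (kmeans, novel_clusters), the 2-D toy feature vectors, and the rule
"a new cluster is novel if its second-level group contains no known centroid" are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=50):
    """Plain k-means (a stand-in for the wave-oriented engine, whose details
    are not given in the abstract). Returns (centroids, labels)."""
    # Deterministic farthest-first initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def novel_clusters(known_centroids, new_centroids, k):
    """Second stage (assumed rule): cluster known and new centroids together
    and keep the new centroids whose group contains no known centroid."""
    combined = list(known_centroids) + list(new_centroids)
    _, labels = kmeans(combined, k)
    n_known = len(known_centroids)
    known_groups = set(labels[:n_known])
    return [c for c, lab in zip(new_centroids, labels[n_known:]) if lab not in known_groups]


# Toy data: hypothetical 2-D feature vectors for previously seen spam ...
old_messages = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
                (10.0, 0.0), (10.2, 0.1), (9.9, 0.2)]
# ... and a new batch holding a variation of an old campaign plus a new one.
new_messages = [(0.1, 0.1), (0.3, 0.0),
                (5.0, 9.0), (5.2, 9.1), (4.9, 8.8)]

known, _ = kmeans(old_messages, 2)   # stage 1 on older spam
fresh, _ = kmeans(new_messages, 2)   # stage 1 on the new batch
novel = novel_clusters(known, fresh, 3)  # stage 2 on the cluster centroids
print(novel)
```

In this toy run only the cluster near (5, 9) is reported, since the other new cluster falls into the same second-level
group as a known one. A real system would also need a feature extractor for messages and a way to choose k, neither of
which the abstract describes.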