Automated structural classification of malware, presented at DeepSec 2007

by Halvar Flake

Summary: Malware authors are changing. In the past their motivation was fame; nowadays it is mostly money. With this change of focus, the development practices of malware authors are changing, too: hand-crafted polymorphic assembly code is out, and cheap-to-maintain-and-develop C/C++ code is in. Simple 'offline polymorphism' (i.e. clever recompilation with small changes) and targeted attacks allow malware to evade traditional AV signatures without giving up massive code reuse. To deal automatically with the (almost boringly) growing flood of malware, several classification methods have been proposed, ranging from turning instruction n-grams, n-perms and other "features" into high-dimensional vectors to behavioral techniques. These techniques suffer from high 'brittleness', i.e. they can easily be circumvented without requiring significant skill or time on the malware author's side. This talk will discuss using structural (e.g. callgraph- and flowgraph-based) metrics for the automated classification of malware into families. The advantage of the discussed approach is its relative 'suppleness': it remains resistant even to drastic measures such as recompiling a virus for a different architecture. Breaking the analysis requires a significant investment of work on the malware author's side. An example implementation of a fully automated malware classification system (VxClass), which automatically unpacks, disassembles, and compares new malware against an existing database, will be discussed, and a number of horribly incorrect predictions about the future will be given.
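To make the idea of a structural metric more concrete, here is a minimal sketch, assuming each unpacked and disassembled sample has already been reduced to a list of per-function shapes. The features used here (basic-block count, flowgraph edge count, outgoing calls in the callgraph), the multiset-Jaccard score, the names `FunctionShape`, `structural_similarity` and `classify`, the 0.5 threshold and the example families are all hypothetical illustrations, not the metric or interface actually used by VxClass. The point is only that a score built from graph shape ignores instruction bytes, so a small recompile-and-tweak changes only a few function shapes instead of every signature.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical structural fingerprint of one function: the size of its
# flowgraph (basic blocks, edges) and how many calls it makes in the
# callgraph. Instruction bytes and register choices are ignored entirely.
@dataclass(frozen=True)
class FunctionShape:
    blocks: int
    edges: int
    calls: int

Sample = List[FunctionShape]

def structural_similarity(a: Sample, b: Sample) -> float:
    """Multiset Jaccard similarity of two samples' function shapes."""
    ca, cb = Counter(a), Counter(b)
    shared = sum((ca & cb).values())  # shapes present in both (min counts)
    total = sum((ca | cb).values())   # shapes present in either (max counts)
    # Two empty samples carry no structural evidence, so score them 0.
    return shared / total if total else 0.0

def classify(sample: Sample,
             database: List[Tuple[str, Sample]],
             threshold: float = 0.5) -> Optional[str]:
    """Return the family of the most similar known sample, or None
    if no known sample reaches the (assumed) similarity threshold."""
    best_family, best_score = None, 0.0
    for family, known in database:
        score = structural_similarity(sample, known)
        if score > best_score:
            best_family, best_score = family, score
    return best_family if best_score >= threshold else None

# Hypothetical example data.
variant_a = [FunctionShape(5, 6, 2), FunctionShape(12, 17, 4), FunctionShape(1, 0, 1)]
# A "recompiled with small changes" variant: one function's shape differs.
variant_b = [FunctionShape(5, 6, 2), FunctionShape(12, 17, 4), FunctionShape(2, 1, 1)]
unrelated = [FunctionShape(40, 55, 9), FunctionShape(3, 2, 0)]

database = [("FamilyX", variant_a), ("FamilyY", unrelated)]

if __name__ == "__main__":
    print(structural_similarity(variant_a, variant_b))  # → 0.5
    print(classify(variant_b, database))                 # → FamilyX
```

A real system would need far more robust per-function matching (and a callgraph-level comparison on top of it) than identical feature tuples; the multiset score was chosen here only because it is short and makes the "shared structure over total structure" idea visible.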