Detecting Randomly Generated Strings; A Language Based Approach presented at Defcon 2015

by Mahdi Namazifar,

Summary : Numerous botnets employ domain generation algorithms (DGA) to dynamically generate a large number of random domain names from which a small subset is selected for their command and control. A vast majority of DGA algorithms create random sequences of characters. In this work we present a novel language-based technique for detecting strings that are generate by chaining random characters. To evaluate randomness of a given string (domain name in this context) we lookup substrings of the string in the dictionary that we’ve built for this technique, and then we calculate a randomness score for the string based on several different factors including length of the string, number of languages that cover the substrings, etc. This score is used for determining whether the given string is a random sequence of characters. In order to evaluate the performance of this technique, on the one hand we use 9 known DGA algorithms to create random domain names as DGA domains, and on the other hand we use domain names from the Alexa 10,000 as likely non-DGA domains. The results show that our technique is more than 99% accurate in detecting random and non-random domain names.