Talk : Information Retrieval and Machine Learning Tools for Interactive Bug Hunting presented at Hackitoergosum 2013

by Fabian Yamaguchi

Tags: Security

Summary : When hunting bugs in large code bases you have never seen before, tools allowing quick navigation and recognition of patterns can be of great help. Surprisingly, most navigational features of popular IDEs and dedicated code understanding tools offer only very basic search capabilities and seldom exploit language information.
In this talk, we present a new language-aware, open-source code indexing tool that lets you mine code for bugs by quickly executing complex queries on large C/C++ code bases. The tool offers a fuzzy parser, so code with missing headers can be processed in most cases. Suppose you have ever wanted to search code fragments that “fell off some truck” for all local variables that are passed as the third argument to memcpy and are l-values of assignments whose r-values involve multiplications, where none of the variables involved occur in a condition. If so, this tool is for you.
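To make the example query concrete, here is a minimal sketch in Python. It is not the tool from the talk and does not use its query language: it assumes a regex-based approximation instead of a fuzzy parser, and the C fragment, the function name unchecked_memcpy_lengths, and all identifiers in it are hypothetical. It only handles simple, non-nested conditions and call arguments.

```python
import re

# Hypothetical C fragment; the header declaring struct pkt is missing on purpose.
SOURCE = """
void handle(struct pkt *p, char *dst) {
    int len = p->count * 4;
    int n = p->size * 2;
    if (n > 64) return;
    memcpy(dst, p->data, len);
    memcpy(dst, p->data, n);
}
"""

IDENT = re.compile(r"[A-Za-z_]\w*")


def unchecked_memcpy_lengths(code):
    """Return third arguments of memcpy that are assigned a product and never checked."""
    # Identifiers that occur inside if/while conditions (no nested parentheses).
    checked = set()
    for cond in re.findall(r"\b(?:if|while)\s*\(([^)]*)\)", code):
        checked.update(IDENT.findall(cond))

    # Map each assigned variable to all identifiers involved in an
    # assignment whose right-hand side contains a multiplication.
    involved = {}
    for lhs, rhs in re.findall(r"\b([A-Za-z_]\w*)\s*=(?!=)([^;]*\*[^;]*);", code):
        involved[lhs] = {lhs, *IDENT.findall(rhs)}

    hits = []
    for args in re.findall(r"\bmemcpy\s*\(([^;]*)\)\s*;", code):
        parts = [a.strip() for a in args.split(",")]  # naive argument split
        if len(parts) == 3:
            size_arg = parts[2]
            if size_arg in involved and not (involved[size_arg] & checked):
                hits.append(size_arg)
    return hits


print(unchecked_memcpy_lengths(SOURCE))
```

On this fragment the sketch reports len but not n, because n is tested in a condition before it reaches memcpy. A real index built from parsed code would not be limited by the regex shortcuts taken here.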
We then show that the fine-grained representation of code stored in the generated index is a valuable source of information for code analysis tools. In particular, we present a second tool, which uses this information alone to automatically derive simple programming rules such as “the return value of malloc should be checked when processing packets from the network” using machine learning techniques. By employing anomaly detection, this allows us to highlight violations of these rules and present them to auditors as they browse code. We close by providing a number of examples demonstrating that these tools are “nice to have” in practice.
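The idea of deriving a rule from the code itself and then flagging deviations can be illustrated with a small Python sketch. The abstract does not say which learning method the second tool uses, so this assumes a plain majority vote as a stand-in: a check performed by most functions that call a given API is treated as the rule, and functions lacking it are anomalies. The function names, the FUNCTIONS table, and the threshold value are all hypothetical.

```python
# Hypothetical per-function facts, of the kind a code index could provide:
# which functions are called and which values are tested in conditions.
FUNCTIONS = {
    "parse_header":  {"calls": {"malloc"}, "checks": {"malloc", "len"}},
    "parse_body":    {"calls": {"malloc"}, "checks": {"malloc"}},
    "parse_trailer": {"calls": {"malloc"}, "checks": {"malloc", "crc"}},
    "parse_option":  {"calls": {"malloc"}, "checks": {"len"}},
}


def missing_checks(functions, api, threshold=0.7):
    """Flag functions that call `api` but lack a check most of their peers perform."""
    # Peers are all functions that use the same API.
    peers = {name: f for name, f in functions.items() if api in f["calls"]}
    if not peers:
        return {}

    # Count how many peers perform each check.
    counts = {}
    for f in peers.values():
        for check in f["checks"]:
            counts[check] = counts.get(check, 0) + 1

    # A check performed by at least `threshold` of the peers is treated as a rule.
    rules = {c for c, k in counts.items() if k / len(peers) >= threshold}

    # Anomalies are peers that miss one or more of the derived rules.
    return {
        name: sorted(rules - f["checks"])
        for name, f in peers.items()
        if rules - f["checks"]
    }


print(missing_checks(FUNCTIONS, "malloc"))
```

With these toy facts, three of four callers check the return value of malloc, so that check becomes the derived rule and parse_option is reported as the violation an auditor would be shown.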