Manually Searching Advisories and Blogs for Threat Data--"Who's Got Time for That?" presented at Shmoocon 2015

by Elvis Hovor, Shimon Modi,

Summary: Threat intelligence is generating a lot of buzz, and many vendor and industry-driven initiatives focus on how enterprises can leverage it. Although cyber threat intelligence may appear structured and well formatted, most enterprises receive it from external sources as unstructured text: advisories, email bulletins, chat forums, and so on. Threat intelligence is most relevant when it is timely and actionable. The status quo of relying on human analysts to process threat data and determine its relevance is inefficient and does not scale.
We have developed a solution that increases automation in extracting threat data from unstructured sources and mapping it to the various STIX data constructs, in effect converting it into a structured form. This has several benefits:
Allows human analysts to focus on analysis rather than spending time parsing text in documents
Increases machine readability by converting incoming data into a structured format
Applies customized contextualization and prioritization filters to the extraction process
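As a rough illustration of the extract-and-map idea, the sketch below pulls a few indicator types out of free-form advisory text and tags each with a construct name. It is not the presented solution: plain regular expressions stand in for the NLP-based extraction, and the `PATTERNS` table, the `CONSTRUCT` mapping, the `extract` function, and the sample advisory are all hypothetical. The construct labels are only loosely named after STIX constructs and do not follow the STIX schema.

```python
import re

# Hypothetical patterns: a regex stand-in for NLP-based extraction.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
}

# Assumed mapping from indicator kind to a STIX-like construct label.
CONSTRUCT = {"ipv4": "Observable", "md5": "Observable", "cve": "ExploitTarget"}


def extract(text):
    """Return one structured record per distinct indicator found in text."""
    records = []
    for kind, pattern in PATTERNS.items():
        # Deduplicate and sort so the output order is stable.
        for value in sorted(set(pattern.findall(text))):
            records.append(
                {"construct": CONSTRUCT[kind], "kind": kind, "value": value}
            )
    return records


# Made-up advisory text using a documentation-range IP address.
advisory = (
    "Attackers exploited CVE-2014-6271 from 203.0.113.7 and dropped a "
    "payload with MD5 d41d8cd98f00b204e9800998ecf8427e."
)

for record in extract(advisory):
    print(record)
```

Each record is a plain dictionary, so it could be serialized or handed to whatever structured format a downstream tool expects.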
We have developed this solution on OpenNLP, a natural language processing toolkit. We will demonstrate how to process a batch of threat advisories and prioritize them for analysts to review based on predefined analyst preferences.
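One simple way to picture preference-based prioritization is a weighted keyword score per advisory. This is a sketch under stated assumptions, not the method demonstrated in the talk: the `PREFERENCES` weights, the `score` and `prioritize` functions, and the sample batch are all hypothetical.

```python
# Hypothetical analyst preferences: term -> weight (higher means more relevant).
PREFERENCES = {"openssl": 3, "remote code execution": 5, "windows": 1}


def score(advisory_text, preferences):
    """Sum the weights of preferred terms, counted once per occurrence."""
    text = advisory_text.lower()
    return sum(weight * text.count(term) for term, weight in preferences.items())


def prioritize(advisories, preferences):
    """Return advisories ordered from highest to lowest preference score."""
    return sorted(
        advisories, key=lambda a: score(a["text"], preferences), reverse=True
    )


# Made-up batch of advisories for illustration.
batch = [
    {"id": "A1", "text": "Phishing campaign targets Windows users."},
    {"id": "A2", "text": "OpenSSL flaw allows remote code execution on servers."},
    {"id": "A3", "text": "Routine patch notes for a browser plugin."},
]

ranked = prioritize(batch, PREFERENCES)
print([a["id"] for a in ranked])
```

A substring count is deliberately crude; it only shows how analyst preferences might be turned into an ordering for review.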