Lunchtime Table Talk: Data Science Behind the Scenes, Part 1 - The Data Science Process for Network Security, presented at FloCon 2019

by Andrew Fast

Summary: Data science is rapidly becoming an integral part of the network security industry. Although widespread applications of data science in network security are relatively recent, data science has roots going back decades. Unfortunately, this maturity presents an obstacle for those who are new to the field and seeking to learn. Furthermore, most presentations (whether spoken or written) tend to focus only on the final model and performance results, pushing to the background many of the critical intermediate steps required for success.

The goal of these “Behind the Scenes” lunchtime talks is to help bridge the gap between network analysts and data scientists by providing an overview of some of the foundational, but often unseen, steps that lead to a successful data science result. These talks are meant to be accessible to those desiring to learn more about data science and are intended to benefit network analysts and data scientists alike.

Intended Audience: Anyone who does, leads, or manages data science projects and wants to go behind the models to learn strategies for increasing data science success.

Behind the Scenes, Part 1: The Data Science Process for Network Security

Thomas Edison is credited with saying that “genius is 1% inspiration and 99% perspiration.” As Edison experienced, the path to success can be a lengthy and circuitous one. To help shorten the journey, it can be helpful to rely on industry frameworks. Most network analysts are familiar with one or more of the security frameworks such as MITRE’s ATT&CK Framework or Lockheed Martin’s Cyber Kill Chain. Similarly, there are several well-known industry processes for taking a data science project from inspiration, through perspiration, to completion, including CRISP-DM, SEMMA, and the Team Data Science Process. We go behind the scenes to explore the similarities between these processes and show how to use them to effectively guide data science projects on network data.