Four Machine Learning Techniques that Tackle Scale (And Not Just By Increasing Accuracy), presented at FloCon 2019

by Lindsey Lack

Summary: Because many of the most prominent successes of machine learning have been in prediction via supervised learning, the security realm has placed a disproportionately large emphasis on using machine learning to identify maliciousness. In the lab, analysis of a new model often looks promising, with any metric greater than 99% being deemed a success. Attempts at implementation in a real environment and at scale often run into irritating and humdrum issues: you can’t get the content you need in the right place, collecting features takes too long, you get some of the data but there are gaps, you didn’t realize that the real data would be so different from your training samples, or your model seems oddly confident that things are bad but you can’t figure out why. And the most classic: with a billion samples, 99% isn’t so great. Striving for better accuracy in your model may help with the 99% problem, but does little for the other issues.

This emphasis on classification accuracy overlooks the other ways that machine learning techniques can help. Several contemporary approaches lend themselves to helping with these issues of scale. In some cases, these techniques provide additional context that reduces the load on human analysts. For example, techniques that deal with the problem of adversarial examples can also be used to flag results that come from a previously unseen distribution. Bayesian approaches can provide insight into the level of confidence in a conclusion. Techniques aimed at model explainability can speed up the troubleshooting of results. In other cases, architectures can enable scalable structures. Multi-stage machine learning models allow models to be distributed and effectively merge the goal of reducing scaling costs with the goal of achieving good model performance. Towards a similar goal, techniques have been developed to reduce the footprint of models, thereby allowing for wider distribution.

This work presents an overview of the ways in which recent machine learning techniques can provide ancillary value (value beyond accurate predictions) that helps with the problems of scaling real-world implementations. In addition to an overview of the research, this work will provide specific examples of some of these techniques applied to security data.

Attendees will learn:

Attendees will learn about ways in which recently developed machine learning techniques can help with some of the messier aspects of applying a classification model to large-scale data. Learning about these issues and some of the potential remedies ahead of time will make the implementation of machine learning models in real-world security operations environments more likely to succeed.
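
The remark in the summary that 99% "isn't so great" at a billion samples can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the talk. The function names are hypothetical, and the prevalence (1 malicious sample in 10,000) and the detection and false-alarm rates are assumed values chosen purely for the example.

```python
def expected_errors(n_samples: int, accuracy: float) -> int:
    """Expected number of misclassified samples at a given accuracy."""
    return round(n_samples * (1.0 - accuracy))


def alert_precision(n_samples: int, prevalence: float,
                    true_positive_rate: float,
                    false_positive_rate: float) -> float:
    """Fraction of raised alerts that are truly malicious."""
    malicious = n_samples * prevalence
    benign = n_samples - malicious
    true_alerts = malicious * true_positive_rate
    false_alerts = benign * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    n = 1_000_000_000  # the "billion samples" from the summary

    # 99% accuracy still leaves about ten million wrong answers.
    print(f"errors at 99% accuracy: {expected_errors(n, 0.99):,}")

    # Assumed values: 1 in 10,000 samples is malicious, the model catches
    # 99% of them, and it wrongly flags 1% of benign samples.
    precision = alert_precision(n, prevalence=1e-4,
                                true_positive_rate=0.99,
                                false_positive_rate=0.01)
    print(f"share of alerts that are real: {precision:.2%}")
```

Under these assumed rates, fewer than one alert in a hundred points at something truly malicious. That is the kind of analyst load that the context-providing techniques described in the summary are meant to reduce.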