The Cost of Learning from the Best: How Prior Knowledge Weakens the Security of Deep Neural Networks, presented at Black Hat Asia 2019

by Yulong Zhang, Tao Wei, Qian Feng, Yunhan Jack Jia, Zhenyu Zhong, Yantao Lu

Summary: Deep Neural Networks (DNNs) have been found vulnerable to adversarial examples – inputs that an attacker has intentionally designed to cause the model to make mistakes. Fortunately, generating adversarial examples usually requires white-box access to the victim model, and adversarial attacks against black-box models are widely considered impractical without an essentially unlimited brute-force search. Keeping models in the cloud can therefore give a (false) sense of security. The goal of this talk is to shed light on a new, hidden attack vector against DNNs that allows adversarial examples to be efficiently generated against black-box models used in mission-critical tasks such as facial recognition, image classification, and autonomous driving.

We report an intriguing vulnerability that allows an attacker to effectively attack black-box object detection DNNs using adversarial examples generated from white-box open-source models. The vulnerability stems from transfer learning, a prevailing strategy in deep learning for alleviating the thirst for data, in which highly tuned, complex models that have been well trained on huge datasets are reused as pre-trained layers for other, similar applications. It is also a practice recommended by major deep learning service providers, including Google Cloud ML and Microsoft Cognitive Toolkit. However, despite its appeal as a solution to the data scarcity problem, we show that the model similarity introduced by transfer learning also creates a more attractive and vulnerable target for attackers.

In the talk, we will first present the alarming results of our measurement study showing that most mainstream object detection models adopt the winning image classification models from the ImageNet contest as their first few layers to extract low-level image features. We will then discuss the attack algorithms, as well as techniques for identifying which pre-trained feature extractor a target object detection model uses with only a limited number of queries. We will demo how adversarial examples generated with our algorithms from YOLOv3 are able to attack other object detection DNNs that are usually considered to use totally different techniques. Finally, we wrap up the presentation with a demo of attacking models from a commercial machine-learning-as-a-service provider, to make the audience aware that keeping models proprietary is not a guarantee of security against adversarial examples.
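
To make the backbone-sharing pattern described above concrete, the following is a minimal sketch (not taken from the talk) of how transfer learning is typically done in practice: an ImageNet-pretrained classifier is reused as a frozen feature extractor, and only a small task-specific head is trained on top. The choice of ResNet-50 and the 10-class head are illustrative assumptions, written here in PyTorch.

import torch
import torch.nn as nn
from torchvision import models

# Load a backbone that has been well trained on a huge dataset (ImageNet).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the pre-trained layers; downstream training never changes them, so
# every model built this way shares the same low-level feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classification head with a small task-specific one
# (10 classes is an arbitrary example).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters are optimized.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

Because the frozen layers are identical across every application that reuses the same backbone, an adversarial perturbation crafted against those layers is a candidate for transfer to all of them.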
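
The talk's own attack algorithms are not reproduced here; as a hedged illustration of the general idea, the sketch below computes a PGD-style perturbation against a white-box copy of the shared pre-trained feature extractor, pushing an image's internal features away from their clean values. If a black-box detector reuses the same backbone, such a perturbation may transfer. The L2 feature objective, epsilon, step size, and iteration count are all assumptions, and ImageNet input normalization is omitted for brevity.

import torch
import torch.nn as nn
from torchvision import models

# White-box surrogate: the ImageNet-pretrained backbone without its classifier.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
feature_extractor = nn.Sequential(*list(resnet.children())[:-1])

def feature_space_attack(x, epsilon=8 / 255, alpha=2 / 255, steps=20):
    """Perturb x (N, 3, H, W, values in [0, 1]) to distort its backbone features."""
    with torch.no_grad():
        clean_features = feature_extractor(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Maximize the L2 distance between adversarial and clean features.
        loss = torch.norm(feature_extractor(x_adv) - clean_features)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the L-infinity ball of radius epsilon around x.
            x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)
            x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv

The resulting x_adv can then be submitted to a black-box object detection service; whether it succeeds depends on whether that service's model actually reuses the same feature extractor, which is exactly what the query-based identification step in the talk is meant to establish.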