The Case of Adversarial Inputs for Secure Similarity Approximation Protocols, presented at IEEE EuroS&P 2019

by Petros Efstathopoulos and Evgenios M. Kornaropoulos

Summary: Computing similarity between high-dimensional data is a fundamental problem in data mining and information retrieval, with numerous applications, such as e-discovery and patient similarity. To address the associated performance and scalability challenges, approximation methods are employed. A common characteristic of all privacy-preserving approximation protocols based on sketching is that the sketching is performed locally and relies on common randomness. Inspired by the power of attacks on machine learning models, we introduce the study of adversarial inputs for secure similarity approximations.

To formally capture this family of attacks, we present a new threat model in which a party is assumed to use the common randomness to perturb her input 1) offline, and 2) before the execution of any secure protocol, so as to steer the approximation result to a maliciously chosen output. We define perturbation attacks under this adversarial model and propose attacks for the techniques of minhash and cosine sketching. We demonstrate the simplicity and effectiveness of the attacks by measuring their success on synthetic and real data from the areas of e-discovery and patient similarity.

To mitigate such perturbation attacks, we propose a server-aided architecture in which an additional party, the server, assists in the secure similarity approximation by handling the common randomness as private data. We revise and introduce the necessary secure protocols so that minhash and cosine sketching can be applied in the server-aided architecture. Our implementation demonstrates that this new design mitigates offline perturbation attacks without sacrificing the efficiency and scalability of the reconstruction protocol.
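To illustrate the core idea, the toy sketch below (not the paper's exact construction; all names are illustrative) shows minhash sketching driven by shared randomness, and a simple offline perturbation in the spirit of the threat model: knowing the hash functions, an adversary adds, for each signature slot, the target element attaining the target set's minimum, which can only move the similarity estimate toward the chosen target.

```python
import random

PRIME = 2_147_483_647  # a Mersenne prime, used as the hash modulus

def make_hash_functions(k, seed):
    """Both parties derive the same k universal hash functions
    h(x) = (a*x + b) % PRIME from a shared seed -- the protocol's
    'common randomness'."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def minhash_signature(items, hash_fns):
    """Slot i of the signature is the minimum of hash i over the set."""
    return [min((a * x + b) % PRIME for x in items) for (a, b) in hash_fns]

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching slots is an unbiased estimate of the
    Jaccard similarity of the underlying sets."""
    return sum(s == t for s, t in zip(sig_a, sig_b)) / len(sig_a)

def perturb_toward(my_items, target_items, hash_fns):
    """Offline perturbation sketch: for each hash, add the target element
    that attains the target's minimum. A slot that already matched stays
    matched; a slot where our minimum was larger now matches too, so the
    estimated similarity can only increase toward the target."""
    perturbed = set(my_items)
    for (a, b) in hash_fns:
        perturbed.add(min(target_items, key=lambda x: (a * x + b) % PRIME))
    return perturbed
```

Because the perturbation happens entirely offline, any secure two-party protocol executed afterward faithfully computes the (now skewed) approximation; this is exactly why the server-aided design keeps the common randomness private.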