Discovering Correlations: A Formal Definition of Causal Dependency Among Heterogeneous Events, presented at IEEE EuroS&P 2019

by Charles Xosanavongsa, Eric Totel, Olivier Bettan

Summary: In order to supervise the security of a large infrastructure, the administrator deploys multiple sensors and intrusion detection systems at several critical places in the system. It is easier to explain and detect attacks if more events are logged. Starting from a suspicious event (appearing as a log entry), the administrator can begin an investigation by manually building the set of previous events that are linked to this event of interest. Accordingly, the administrator attempts to identify links among the logged events in order to retrieve those that correspond to the traces of the attacker's actions in the supervised system; previous work aims at building these connections. In practice, however, this type of link is not trivial to define and discover. Hence, there is a real need to formally describe and define the semantics of these links in the literature. In this paper, a clear definition of this relationship, called contextual event causal dependency, is introduced. The work presented in this paper aims at defining a formal model that would ideally unify previous work on causal dependencies among heterogeneous events. We define a relationship among events that enables the discovery of all events that can be considered the cause (in the past) or the effect (in the future) of an event of interest (e.g., an indicator of compromise produced by an attacker action). This model is introduced gradually by merging two previously defined causality models from the distributed system and operating system research areas (i.e., Lamport's and d'Ausbourg's). Our model takes into consideration heterogeneous events that emanate from different abstraction layers (e.g., network, system, and application), with the main objective of formally defining a causal relationship among logged events. Thereafter, we show how existing implementations separately allow the computation of parts of the model. Finally, we describe the implementation of the model and its assessment against real attacks on distributed environments, including its accuracy in extracting all causally linked events related to a given attack event trace.
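To make the idea of collecting the causes (past) and effects (future) of an event of interest more concrete, here is a minimal sketch in Python. It is not the model defined in the paper: it assumes that direct "cause -> effect" links between events are already known, and it only computes their transitive closure in both directions. The event names, the edge list, and the helper functions `build_graph` and `reachable` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index direct 'cause -> effect' links in both directions."""
    effects = defaultdict(set)  # event -> events it directly causes
    causes = defaultdict(set)   # event -> events that directly cause it
    for cause, effect in edges:
        effects[cause].add(effect)
        causes[effect].add(cause)
    return causes, effects


def reachable(start, neighbours):
    """Breadth-first transitive closure from `start`, excluding `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical direct dependencies across network, system, and
# application layers (assumed for illustration, not taken from the paper).
edges = [
    ("net:conn_in", "sys:sshd_accept"),
    ("sys:sshd_accept", "sys:shell_spawn"),
    ("sys:shell_spawn", "sys:file_write"),
    ("sys:file_write", "app:config_reload"),
    ("net:dns_query", "app:update_check"),  # unrelated activity
]

causes, effects = build_graph(edges)
event_of_interest = "sys:shell_spawn"

# Events in the past that may have caused the event of interest.
past = reachable(event_of_interest, causes)
# Events in the future that may be effects of the event of interest.
future = reachable(event_of_interest, effects)

print("causes:", sorted(past))
print("effects:", sorted(future))
```

In this toy graph, the unrelated DNS query and update check appear in neither set, which is the kind of filtering an investigator would want when starting from a single suspicious log entry. How the direct links themselves are derived from heterogeneous logs is the subject of the paper and is not addressed by this sketch.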