What’s in a Name: Anomaly Detection

Looking at real-life cases, it is easy to see that the problem of detecting anomalies arises in many fields of application, including but not limited to intrusion detection, fraud detection, failure detection, system health monitoring, event detection in sensor networks, and ecosystem disturbance indicators.


Fig.1. Anomaly Detection

In today’s data-driven world, information security attracts special attention when it comes to detecting changes in patterns. As information processing becomes more automated and more complex, security plays an increasingly decisive role in information technology. The year 2015 was especially rich in cyberattacks: companies such as T-Mobile, Kaspersky, and Anthem had their security compromised and all sorts of personal information about their users exposed.

A close look at the information environment of any organization shows that traditional safety measures are a rather restrictive approach, and often an ineffective one. Since a user’s workflow within the security system can follow a whole range of scenarios, basic rules face multiple exceptions, weakening preventive protection and complicating the regular analysis of internal threats. Detecting external attacks is also becoming increasingly difficult, since attackers are aware of typical intrusion detection tools and use covert agents for the attack. Here are the main types of network security breach:


Fig. 2.  Network Security

Such processes may be spotted, for instance, through increased activity on certain ports, new unusual services, changes in a user’s work with network resources, and so on.

One possible solution to this problem is to build systems that identify unusual user network behavior based on analysis of network activity logs. Using data mining techniques, these systems reveal indicative behavior patterns and flag behavior that differs from what is considered conventional. Such systems can also be self-adaptive, minimizing the human involvement needed to configure them. Because they do not depend on an organization’s specifics, such systems are of particular interest to specialists in machine learning and data mining.

Anomaly Detection – Unsupervised Approach

Following its etymology, an anomaly is any deviation from a rule. Consequently, any violation of the standard behavior traced in historical data can be interpreted as an anomaly. However, a pattern violation may be either known in advance or established as a result of analysis. In this context, any problem of detecting non-standard behavior comes down to finding the underlying baseline and classifying each event recorded in the system as either consistent or inconsistent with the found prototype.

In information security, this task may be accomplished by analyzing the activity of users and network equipment to detect irregular behavior in external and internal network traffic (analysis of NetFlow logs), which may signal either internal activity or an attempted external attack. An anomaly, then, is any event that analysis of the network activity logs estimates to be statistically improbable. Three types of anomalies can be detected:

  1. Significant deviation of the observed values from the expected value
  2. Failure in the process reflecting the change of the measured parameter within the surveillance area
  3. An atypical set of observed values in the measured parameters
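The first type of anomaly is the easiest to illustrate. As a minimal sketch (the function name, the choice of a 3-sigma rule, and the sample numbers are our own illustration, not part of the system described here), a significant deviation from the expected value can be flagged by comparing each new observation against the mean and standard deviation of the historical data:

```python
import statistics

def flag_deviations(history, new_values, k=3.0):
    """Flag observations that deviate from the historical mean
    by more than k standard deviations (anomaly type 1)."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return [x for x in new_values if abs(x - mean) > k * std]

# Example: requests-per-minute counts for one host (hypothetical numbers)
history = [100, 110, 95, 105, 98, 102, 99, 101]
print(flag_deviations(history, [103, 97, 540]))  # -> [540]
```

The other two types need a notion of process structure and of joint behavior across parameters, which the models described below address.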

Standard NetFlow logs contain the following set of parameters:

  1. Ingress interface (SNMP ifIndex)
  2. Source IP address
  3. Destination IP address
  4. IP protocol
  5. Source port for UDP or TCP, 0 for other protocols
  6. Destination port for UDP or TCP, type and code for ICMP, or 0 for other protocols
  7. IP Type of Service
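In code, one such record can be represented as a simple value type. This is a hedged sketch: the class and field names are our own, and real NetFlow exporters use binary, vendor-specific layouts rather than this simplified form, but the seven key fields match the list above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetflowRecord:
    """One record with the standard NetFlow key fields listed above."""
    ingress_if: int   # SNMP ifIndex of the ingress interface
    src_ip: str       # source IP address
    dst_ip: str       # destination IP address
    protocol: int     # IP protocol number (6 = TCP, 17 = UDP, 1 = ICMP)
    src_port: int     # source port for TCP/UDP, 0 for other protocols
    dst_port: int     # destination port for TCP/UDP; ICMP type/code; else 0
    tos: int          # IP Type of Service byte

rec = NetflowRecord(2, "10.0.0.5", "93.184.216.34", 6, 51324, 443, 0)
print(rec.protocol, rec.dst_port)  # 6 443
```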

Interpretation of the network protocol is not available here, so we have no information about which recorded events count as the norm and which count as deviations. Therefore, to detect anomalies, the network activity analysis system should provide the following capabilities:

  1. Collection and storage of network activity results data
  2. Representation of network activity as series of numerical characteristics, supplemented by non-numeric attributes (qualifying factors), including time markers
  3. Identification of hidden patterns in the data that provide the basis for forming the behavior pattern
  4. Evaluation of new observations for pattern matching
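Step 2 of this pipeline, turning raw log events into numerical series keyed by time markers and qualifying factors, can be sketched as follows. This is a minimal illustration under our own assumptions: events are modeled as (timestamp, destination port, bytes) tuples, and the only aggregation shown is a per-hour, per-port byte count:

```python
from collections import Counter
from datetime import datetime

def to_series(events):
    """Aggregate raw (timestamp, dst_port, bytes) events into a numeric
    series keyed by (hourly time bucket, port) - the time bucket is the
    time marker, the port is the qualifying factor."""
    counts = Counter()
    for ts, port, nbytes in events:
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, port)] += nbytes
    return dict(counts)

events = [
    (datetime(2015, 6, 1, 9, 15), 443, 1200),
    (datetime(2015, 6, 1, 9, 40), 443, 800),
    (datetime(2015, 6, 1, 10, 5), 22, 300),
]
series = to_series(events)
print(series[(datetime(2015, 6, 1, 9), 443)])  # 2000
```

The resulting series is what the pattern-identification and pattern-matching steps (3 and 4) operate on.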

The general system structure can be presented as follows: 


Fig. 3.  Anomaly Detection - Infrastructure

The core of the proposed system is an ensemble of models, each of which estimates the average statistical activity of users (or groups of users) and classifies the observations recorded in the system as normal or abnormal. The following metrics and measurement categories were used as the data source for constructing the models:

In line with the anomaly types described above, we tried to combine three different models: on one hand, evaluating how the metrics change within different categories, treating the accumulated data as a process extended in time; on the other hand, analyzing the values of metrics across different categories together for an isolated period of time.

  1. The Dynamic Threshold Model sets an accepted level for the measured value for specific time intervals (time of day, week, month, etc.). Each observed value is then evaluated for compliance with the threshold, making it possible to identify anomalies of the first type.
  2. The Association Rules Based Approach describes network activity as a collection of unrelated events and, as a consequence, presents it as a stochastic process. Any event characterized by a low probability can be interpreted as an anomaly.
  3. Time Series Clustering identifies common regularities in the time series structure while capturing any deviation from the established pattern (matching the second type of anomalies).
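The dynamic threshold idea can be sketched in a few lines. Everything here is our own simplification: the history format, the hour-of-day bucketing, and the mean-plus-k-sigma threshold rule are illustrative assumptions, not the system's actual model:

```python
import statistics
from collections import defaultdict

def fit_thresholds(history, k=3.0):
    """Learn an upper threshold per hour of day from (hour, value)
    history: mean + k * std, so 'normal' differs by time of day."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: statistics.mean(v) + k * statistics.pstdev(v)
            for h, v in by_hour.items()}

def is_anomalous(thresholds, hour, value):
    """Flag a value that exceeds the learned threshold for its hour."""
    return value > thresholds.get(hour, float("inf"))

# Hypothetical per-hour activity counts: busy at 9 a.m., quiet at 2 a.m.
history = [(9, 100), (9, 120), (9, 110), (2, 5), (2, 8), (2, 6)]
th = fit_thresholds(history)
print(is_anomalous(th, 2, 50))   # True: a night-time spike
print(is_anomalous(th, 9, 115))  # False: normal daytime load
```

The key property is that the same absolute value can be normal during business hours yet anomalous at night, which a single static threshold cannot express.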

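The second model can be approximated in the same spirit. As a hedged sketch (a plain empirical-frequency estimate standing in for full association rule mining, with event tuples and the cutoff value chosen by us for illustration), events whose estimated probability falls below a small threshold are treated as candidate anomalies:

```python
from collections import Counter

def event_probabilities(events):
    """Estimate the empirical probability of each event tuple
    (e.g. (user, dst_port, protocol)) from historical logs."""
    counts = Counter(events)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def rare_events(probs, observed, eps=0.01):
    """Treat any observed event whose estimated probability is below
    eps (or that was never seen at all) as a candidate anomaly."""
    return [e for e in observed if probs.get(e, 0.0) < eps]

# Hypothetical history: alice mostly uses HTTPS, occasionally SSH.
history = [("alice", 443, "tcp")] * 98 + [("alice", 22, "tcp")] * 2
probs = event_probabilities(history)
print(rare_events(probs, [("alice", 443, "tcp"), ("alice", 3389, "tcp")]))
# -> [('alice', 3389, 'tcp')]
```

A never-before-seen RDP connection is flagged while the routine HTTPS traffic passes, which is the low-probability-event intuition behind the association rules approach.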

Today, more than ever, the urgency of the anomaly detection problem lies in the fact that any deviation from the general picture of the system’s current state may carry important information about an existing issue. Ignoring deviations may lead to undesirable outcomes: an unusual dark spot on an X-ray image, for example, may be evidence of cancer. Prepare and prevent, they say.