New Analytics Approaches to Detecting Fraud

A typical organization loses about 5% of its revenues to fraud each year. The total cost of non-health insurance fraud in the U.S. is estimated to be more than $40 billion per year. These numbers underscore the need for sophisticated tools to both detect and prevent fraud. Big data and analytics offer a valuable new toolkit in the fight against fraud.

The classic approach to fraud detection is an “expert-based approach,” meaning that it builds upon the experience and business knowledge of the fraud analyst. This approach typically involves a manual investigation of a suspicious case, which may have been signaled, for instance, by a customer complaining of being charged for transactions they did not make. A disputed transaction may indicate that fraudsters have developed a new fraud mechanism, and it therefore requires a detailed investigation to understand and subsequently address that mechanism. Expert-based approaches to fraud detection typically make use of business rules, such as the following:


IF:

• Amount of claim is above threshold OR

• Severe injury, but no doctor report OR

• Claimant has multiple versions of the accident

THEN:

• Flag claim as suspicious
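
The example rule above can be sketched in code. This is only an illustration of how such a business rule might be encoded, not a description of any particular rule engine: the `Claim` fields, the function name, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real rule base would set this from business knowledge.
CLAIM_AMOUNT_THRESHOLD = 10_000.0


@dataclass
class Claim:
    amount: float
    severe_injury: bool
    has_doctor_report: bool
    accident_versions: int  # number of distinct accounts given by the claimant


def is_suspicious(claim: Claim, threshold: float = CLAIM_AMOUNT_THRESHOLD) -> bool:
    """Return True if any of the three conditions of the example rule fires."""
    return (
        claim.amount > threshold
        or (claim.severe_injury and not claim.has_doctor_report)
        or claim.accident_versions > 1
    )
```

Because the conditions are joined by OR, a single condition is enough to flag a claim, which is one reason every added rule increases the number of cases that need human follow-up.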

However, an expert-based approach suffers from a number of disadvantages. Rule bases are typically expensive to build, since they require advanced manual input, and often turn out to be difficult to manage. Rules have to be kept up to date and should trigger only on genuinely fraudulent cases, since every flagged case requires human follow-up. The main challenge is therefore keeping the rule base lean and effective; in other words, deciding when and which rules to add, remove, update, or merge.

Given these problems, a shift is taking place toward analytics-based fraud detection methodologies for three reasons:

• Precision: Analytics-based fraud detection methodologies offer an increased detection power compared to expert approaches.

• Operational efficiency: An increasing number of cases need to be analyzed, requiring an automated process as offered by analytical fraud detection methodologies.

• Cost efficiency: Developing a fraud detection system with analytical methodologies is more automated, and therefore more cost-efficient, than maintaining an expensive rule base.

Various analytical approaches have been suggested for fraud detection. Detection mechanisms based on descriptive analytics aim to find behavior that deviates from normal behavior, or, in other words, to detect anomalies. These techniques learn from historical observations and are called unsupervised since they do not require these observations to be labeled as either fraudulent or nonfraudulent. Descriptive analytics, however, has been shown to be prone to deception, for example when fraudsters adopt camouflage-like strategies that make their behavior resemble normal behavior.
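
As a minimal sketch of the unsupervised idea, the snippet below scores claim amounts by how far they deviate from the typical value, using the median and the median absolute deviation (MAD). The choice of this particular score, the cutoff of 3.5, and the sample amounts are assumptions made for illustration; the article does not prescribe a specific anomaly detection technique.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median, so there is no
        # spread to scale by; this simple sketch then reports no deviation.
        return [0.0 for _ in values]
    # The factor 0.6745 makes the score comparable to a standard z-score
    # when the data are roughly normally distributed.
    return [0.6745 * (v - center) / mad for v in values]


def flag_anomalies(values, cutoff=3.5):
    """Return indices of values whose absolute robust z-score exceeds the cutoff."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > cutoff]


# Hypothetical claim amounts; no fraud labels are used anywhere.
claim_amounts = [1200, 950, 1100, 1300, 1050, 980, 1150, 25000, 1250, 1020]
anomalies = flag_anomalies(claim_amounts)
```

Here only the claim of 25000 stands out. A fraudster who keeps amounts within the usual range would not be flagged, which is the camouflage weakness described above.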

Predictive analytics aims to learn from historical observations in order to retrieve patterns that allow differentiation between normal and fraudulent behavior. These techniques are intended to find silent alarms, the parts of their tracks that fraudsters cannot cover up. Predictive analytics can be applied both to predict fraud and to estimate the amount of fraud.
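
To make the supervised setting concrete, here is a small sketch that fits a logistic regression model to labeled historical claims using plain gradient descent. Logistic regression is just one possible predictive technique, chosen here for brevity; the features (claim amount in thousands, number of prior claims), the toy data, and the hyperparameters are all assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.05, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def fraud_probability(x, weights, bias):
    """Estimated probability that a claim with features x is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical labeled history: [amount in thousands, prior claims], 1 = fraud.
rows = [[1.0, 0], [1.5, 1], [2.0, 0], [1.2, 1],
        [8.0, 3], [9.5, 4], [7.5, 2], [10.0, 5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = train_logistic(rows, labels)
```

A new claim can then be scored with `fraud_probability([9.0, 4], weights, bias)`. Note that the model only learns from the labels it is given, so it can only recognize fraud that resembles the cases in its training history.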

Unfortunately, predictive analytics has its limitations as well. More specifically, it requires historical examples to learn from, such as a labeled dataset of historically observed fraud behavior. This reduces its power to detect drastically different fraud types based on new mechanisms that have not been detected thus far and are hence not included in the historical database. Descriptive analytics may perform better with respect to detecting such new fraud mechanisms, at least if a new fraud mechanism leads to detectable deviations from normal patterns. This illustrates the complementary nature of predictive and descriptive analytics and motivates the use of both methods in developing a powerful fraud detection system.

A third type of complementary tool is social network analysis, which further extends the abilities of the fraud detection system by learning and detecting characteristics of fraudulent behavior in a network of linked entities. Social network analysis is the newest tool in our toolbox to fight fraud and has proven to be a very powerful approach. It introduces an extra source of information in the analysis (the relationships between entities) and as such may contribute to uncovering particular patterns indicating fraud. As an example, consider an insurance fraud detection setting. Here, the network may consist of nodes such as claimant, insured, car, car repair shop, and mobile phone. Studying the relationships between these nodes may provide valuable insights into complex fraud patterns.
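
One simple way to use such relationships is sketched below: for each claim, compute the share of other claims linked to it through a shared entity (a repair shop, a phone, a car) that are already known to be fraudulent. This particular feature, along with the claim and entity names and the fraud labels, is a hypothetical illustration rather than a method taken from the article.

```python
from collections import defaultdict

# Hypothetical links between claims and the entities they involve.
claim_entities = {
    "claim_1": {"repair_shop_A", "phone_555"},
    "claim_2": {"repair_shop_A", "car_X"},
    "claim_3": {"repair_shop_B", "car_Y"},
    "claim_4": {"phone_555", "car_Z"},
}
known_fraud = {"claim_1"}  # assumed labels from past investigations


def neighbor_fraud_ratio(claim_entities, known_fraud):
    """For each claim, the share of claims sharing an entity with it that are known fraud."""
    # Invert the mapping: which claims involve each entity?
    entity_claims = defaultdict(set)
    for claim, entities in claim_entities.items():
        for entity in entities:
            entity_claims[entity].add(claim)

    ratios = {}
    for claim, entities in claim_entities.items():
        neighbors = set()
        for entity in entities:
            neighbors |= entity_claims[entity]
        neighbors.discard(claim)  # a claim is not its own neighbor
        if neighbors:
            ratios[claim] = len(neighbors & known_fraud) / len(neighbors)
        else:
            ratios[claim] = 0.0
    return ratios


ratios = neighbor_fraud_ratio(claim_entities, known_fraud)
```

In this toy network, `claim_2` and `claim_4` each share an entity with the known fraudulent `claim_1` and so receive a ratio of 1.0, while the isolated `claim_3` receives 0.0. Such a ratio could serve as an additional input to a descriptive or predictive model.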

These three analytical techniques are complementary since they focus on different aspects of fraud, and they reinforce each other when applied in a combined setup. When developing a fraud detection system, an organization will likely follow the order in which the different tools have been introduced. As a first step, an expert-based rule engine may be developed, which, in a second step, may be complemented by descriptive analytics and subsequently by predictive and social network analytics. Developing a fraud detection system in this order allows the organization to gain expertise and insight in a stepwise manner, with each step facilitating the next.

This article is based on “Fraud Analytics Using Descriptive, Predictive & Social Network Techniques, The essential guide to Data Science for Fraud Detection,” Wiley, 2015, authored by Bart Baesens, Veronique Van Vlasselaer, Wouter Verbeke.


