Complex Event Processing (CEP) is only a few years old, but it is rapidly entering the mainstream in the many fields that require continuous analysis of large volumes of real-time data.
Enterprises are already using CEP technology to build highly sophisticated real-time applications. One example is a travel portal that uses CEP to monitor user behavior patterns on its website in order to personalize and optimize the user experience. In this case, each page viewed, link clicked, or search performed corresponds to an individual event. Another is a hospital that uses CEP to detect theft of controlled substances from the pharmacy: RFID readers located throughout the hospital continuously monitor the location of RFID tags attached to people and objects of interest.
Diverse as these applications may be, they have one thing in common: they analyze massive volumes of real-time data. This analysis cannot rely on traditional relational databases, which were never designed to handle hundreds of thousands of events per second with processing latency measured in milliseconds or less. Until recently, therefore, the only way to build these applications was to code them by hand in C, C++, C#, or Java.
This works for simple applications, but building, debugging, deploying, scaling, and maintaining a sophisticated CEP application written from scratch in a general-purpose language is very difficult. How do you know whether your CEP application is simple, complex, or somewhere in between?
CEP Complexity Dimensions
To give you a flavor of the CEP complexity dimensions, consider the rough guide below. Answer the eight questions and add up your points.
What is the combined data rate you need to support?
- Less than 1 event/sec: 0
- 1-100 events/sec: 1
- 100-1,000 events/sec: 2
- 1,000-10,000 events/sec: 3
- 10,000-100,000 events/sec: 4
- 100,000+ events/sec: 5
A good CEP engine has advanced optimizations to process and analyze more than 10,000 events/sec on a single CPU core. These optimizations include on-the-fly data indexing, static and dynamic query rewriting, and advanced memory management. Higher rates are supported through the use of clustering.
What is the event processing latency you need to guarantee?
- Minutes/Hours/Days: 0
- Seconds: 1
- 100-1,000 milliseconds: 2
- 10-100 milliseconds: 3
- 1-10 milliseconds: 4
- Less than 1 millisecond: 5
Maintaining very low latency, especially submillisecond latency, requires unique optimizations for the processing model, threading model, concurrency model, and memory management. Good CEP engines provide these optimizations, so that developers do not have to re-implement them from scratch.
How many data streams do you have?
- One stream: 1
- More than one, but I don't need to synchronize events across streams: 2
- More than one, and I need to synchronize events across streams, and handle delayed and out-of-order events: 5
Synchronizing events across multiple streams is hard, especially if events can come from different sources, be delayed, and arrive out of order. A good CEP engine has facilities for synchronizing multiple data streams and sorting out-of-order messages, so that developers do not have to worry about this.
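As a rough illustration of what such a facility does, here is a minimal Python sketch of a reorder buffer that merges events from several streams back into timestamp order. The Event type, the merge_in_order function, and the max_delay bound are hypothetical names chosen for this example. The sketch assumes that every event carries a timestamp and that no event you care about arrives more than max_delay time units late.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator


@dataclass(order=True)
class Event:
    timestamp: float                      # the only field used for ordering
    stream: str = field(compare=False)    # name of the source stream
    payload: Any = field(compare=False, default=None)


def merge_in_order(events: Iterable[Event], max_delay: float) -> Iterator[Event]:
    """Yield events in timestamp order, tolerating up to max_delay of lateness."""
    buffer = []                  # min-heap keyed on timestamp
    watermark = float("-inf")    # everything at or below this has been released
    for event in events:
        if event.timestamp < watermark:
            continue             # assumption: events later than the bound are dropped
        heapq.heappush(buffer, event)
        # The watermark trails the newest timestamp seen so far by max_delay.
        watermark = max(watermark, event.timestamp - max_delay)
        while buffer and buffer[0].timestamp <= watermark:
            yield heapq.heappop(buffer)
    while buffer:                # input exhausted: flush what is left, in order
        yield heapq.heappop(buffer)


arrivals = [
    Event(1, "a"), Event(3, "b"), Event(2, "a"),
    Event(6, "b"), Event(4, "a"), Event(1.5, "a"),
]
ordered = list(merge_in_order(arrivals, max_delay=2))
```

The trade-off is visible in max_delay: a larger bound tolerates later events but holds every event back longer, which adds directly to processing latency. What to do with events that miss the bound (drop them, as here, or route them elsewhere) is a policy decision.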
How large/complex are your input events?
- Small flat events (1-10 fields): 1
- Large flat events (10+ fields): 2
- Large non-flat / hierarchical / XML events: 5
Large events require special memory management optimizations, and XML events need efficient serialization and deserialization. A good CEP engine has these out of the box.
What do most of your queries look like?
- Filtering and single-event transformations: 1
- Aggregation over different kinds of windows: 3
- Joins and state management: 4
- Event Pattern Matching: 5
While filtering and transformation queries operate on one event at a time and do not require state management, other kinds of queries, especially joins and pattern matches over multiple streams, are significantly more difficult to implement correctly. A good CEP engine has all the built-in primitives to greatly reduce the amount of code that needs to be written for these queries.
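The difference between stateless and stateful queries can be sketched in a few lines of Python. The names is_search and SlidingWindowAverage are hypothetical, and the window logic assumes that events arrive in timestamp order; it is only meant to show where the state comes from, not how a CEP engine implements windows.

```python
from collections import deque


def is_search(event):
    """Stateless filter: the decision depends only on the current event."""
    return event.get("type") == "search"


class SlidingWindowAverage:
    """Stateful aggregate: average of the values seen in the last window_seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, value) pairs still inside the window
        self.total = 0.0         # running sum of the values in self.events

    def add(self, timestamp, value):
        """Record one event and return the current window average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


window = SlidingWindowAverage(window_seconds=10)
averages = [window.add(0, 10), window.add(5, 20), window.add(12, 30)]
```

Even this small aggregate has to keep a buffer, maintain a running total, and decide when to evict. Joins and pattern matches add more state per stream, plus the rules for correlating it.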
How many queries do you have?
- 1-10: 1
- 10-100: 3
- 100+: 5
CEP engines use the notion of continuous queries, which means that the queries are pre-registered with the engine. Output of one query may feed the input of another query, and the larger the network of queries you have, the more opportunities for optimizations there are.
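To make the idea of pre-registered, chained queries concrete, here is a minimal sketch in Python. TinyEngine, its methods, and the stream names are all hypothetical and do not correspond to any real product's API. Each query is modeled as a function that takes one event and returns a list of output events, and the sketch assumes the query network contains no cycles.

```python
from collections import defaultdict


class TinyEngine:
    """Toy model of continuous queries wired together by named streams."""

    def __init__(self):
        self.queries = defaultdict(list)   # source stream -> [(query, target stream)]
        self.outputs = defaultdict(list)   # stream -> every event published to it

    def register(self, source, target, query):
        """Pre-register a query that reads from source and writes to target."""
        self.queries[source].append((query, target))

    def publish(self, stream, event):
        """Push one event into a stream and run every query subscribed to it."""
        self.outputs[stream].append(event)
        for query, target in self.queries[stream]:
            for result in query(event):
                self.publish(target, result)   # output of one query feeds the next


engine = TinyEngine()
# Query 1: keep only search events.
engine.register("page_events", "searches",
                lambda e: [e] if e["type"] == "search" else [])
# Query 2: consumes the output of query 1 and extracts the normalized search term.
engine.register("searches", "destinations",
                lambda e: [e["query"].lower()])

engine.publish("page_events", {"type": "click", "query": ""})
engine.publish("page_events", {"type": "search", "query": "Paris"})
```

Because the whole network is known before any event arrives, an engine can in principle share work between queries, for example by evaluating a filter used by several downstream queries only once.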
Do you need to interface with databases?
- No: 0
- Infrequent reads or writes: 2
- Frequent reads/writes, but no read/write caching: 3
- Frequent reads/writes, need basic caching: 4
- Frequent reads/writes, need granular on-demand row caching: 5
While CEP applications are, by definition, applications that analyze real-time events, most of them also require rich interaction with databases. Sometimes this interaction is very complex and involves data caching, batching, and asynchronous I/O. A good CEP engine provides a rich framework for accomplishing this.
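As an illustration of the last option in the list, on-demand row caching, here is a minimal Python sketch. RowCache and load_row are hypothetical names; load_row stands in for whatever database lookup your application performs, and the least-recently-used eviction policy is an assumption made for this example.

```python
from collections import OrderedDict


class RowCache:
    """Loads a row only when an event first needs it and keeps recent rows in memory."""

    def __init__(self, load_row, capacity):
        self.load_row = load_row     # hypothetical function: key -> row (one database read)
        self.capacity = capacity     # maximum number of rows held in memory
        self.rows = OrderedDict()    # key -> row, least recently used first
        self.misses = 0              # number of database reads actually performed

    def get(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)       # mark as most recently used
            return self.rows[key]
        self.misses += 1
        row = self.load_row(key)             # cache miss: go to the database
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)    # evict the least recently used row
        return row


# A dictionary stands in for the database table in this example.
customers = {"a": "Alice", "b": "Bob", "c": "Carol"}
cache = RowCache(load_row=customers.__getitem__, capacity=2)
names = [cache.get(key) for key in ["a", "b", "a", "c", "b"]]
```

A real deployment would also need to decide how cached rows are invalidated when the database changes, which this sketch does not address.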
What are your high-availability/data persistence requirements?
- None: 0
- I need to recover from failures, but I don't care if I lose data: 2
- I don't want to lose data, but I'm OK with losing a few events here and there: 4
- I never, ever, want to lose a single event!: 5
Advanced CEP applications are increasingly mission-critical. Without the right foundation, it is very difficult to build a system that can provably recover from failures without suffering much downtime or data loss. A good CEP engine can certainly help.
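One common building block for the stricter requirements is to write each event to durable storage before processing it, so that the events can be replayed after a crash. The Python sketch below shows that idea only; append_event, replay, and the one-JSON-object-per-line log format are assumptions made for this example, and it says nothing about how any particular CEP engine implements persistence.

```python
import json
import os


def append_event(log_path, event):
    """Durably record an event before it is processed."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
        log.flush()
        os.fsync(log.fileno())   # ask the OS to push the data to the storage device


def replay(log_path):
    """After a restart, return every logged event in its original order."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Forcing every event to disk costs latency and throughput, which is why the persistence requirement interacts with the data rate and latency questions earlier in the scorecard. A single local log also does not protect against losing the machine itself.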
To interpret your results, add up all your points:
Less than 10
Your application is not particularly sophisticated. You do not absolutely need a CEP engine right now. Even so, a CEP engine would still reduce the amount of code you need to write, and would save you time and money, especially over the long term.
Between 10 and 25
Your application is of medium sophistication. Using a CEP engine is highly advised, but the good news is that most CEP engines, including some open source ones, should be able to handle it, and will save you significant amounts of time, money, and headache.
More than 25
Your application is highly sophisticated. Using a CEP engine is a must, but you are advised to choose your CEP engine very carefully. Many available CEP engines will not be able to handle it out of the box, and you will still need to write very significant amounts of code to implement your application.
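The scoring procedure above can be sketched as a short Python function. The question keys (data_rate, latency, and so on) and the function name score are hypothetical labels for this example; the allowed point values and the thresholds are taken from the lists above.

```python
# Point values offered by each of the eight questions.
ALLOWED_POINTS = {
    "data_rate": {0, 1, 2, 3, 4, 5},
    "latency": {0, 1, 2, 3, 4, 5},
    "streams": {1, 2, 5},
    "event_size": {1, 2, 5},
    "query_type": {1, 3, 4, 5},
    "query_count": {1, 3, 5},
    "database": {0, 2, 3, 4, 5},
    "availability": {0, 2, 4, 5},
}


def score(answers):
    """Validate the eight answers, then return (total points, sophistication level)."""
    missing = ALLOWED_POINTS.keys() - answers.keys()
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")
    for question, points in answers.items():
        if points not in ALLOWED_POINTS.get(question, ()):
            raise ValueError(f"invalid answer for {question!r}: {points}")
    total = sum(answers.values())
    if total < 10:
        level = "not particularly sophisticated"
    elif total <= 25:
        level = "medium sophistication"
    else:
        level = "highly sophisticated"
    return total, level


# Example: 1,000-10,000 events/sec, 100-1,000 ms latency, synchronized streams,
# small flat events, window aggregation, 1-10 queries, infrequent database
# access, recovery without data guarantees.
example = score({
    "data_rate": 3, "latency": 2, "streams": 5, "event_size": 1,
    "query_type": 3, "query_count": 1, "database": 2, "availability": 2,
})
```

Note that the lowest possible total is 4 and the highest is 40, so a score just under 10 still means several dimensions were answered above their minimum.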
The purpose of this sample scorecard is to demonstrate the complexity evaluation process. Dimensions such as adapters, external systems, SDK, deployment, management, security, and determinism are not included in this version. The sample scorecard is intended to help an organization better understand what a CEP engine can bring to the table, and how to decide whether one is needed in the first place.