The days of looking at your data in the rearview mirror are coming to an end. Most organizations now realize that if they want to make better decisions faster, they need to understand and respond to what is happening in real time. Such an ability to analyze data at the “speed of thought” requires figuring out how to build a high-performance and effective streaming pipeline that is both affordable and scalable, and is also easy to implement and manage.
Until now, most organizations have addressed this challenge by assembling various open source and commercial data analytics tools, and some have achieved decent results for certain applications. But any change, such as adding new data streams or Internet of Things (IoT) data sources, can be disruptive.
The time has come for a data analytics solution that is simple to use, delivers real-time performance, and scales cost-effectively. Simplicity can be achieved using database technology that maintains compatibility with SQL and other familiar tools, along with an API layer that eliminates the need to stand up middleware. For performance and scale, an in-memory database accelerated with a graphics processing unit (GPU) provides the power needed to simultaneously ingest, analyze and visualize streaming data.
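To make the simplicity claim concrete, the sketch below runs a familiar SQL aggregation against Python’s built-in sqlite3 engine, used here purely as a stand-in for any SQL-compatible in-memory database (sqlite3 is not GPU-accelerated, and the table name and columns are invented for illustration). The point is that streaming analytics need not require a new query language: the SQL skills an organization already has carry over directly.

```python
import sqlite3

# Stand-in for an SQL-compatible in-memory analytic database.
# (sqlite3 is for illustration only; it is not GPU-accelerated.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temp REAL, ts INTEGER)")

# Simulated stream of sensor readings being ingested as they arrive.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s1", 21.5, 1), ("s1", 22.0, 2), ("s2", 30.5, 1), ("s2", 29.5, 2)],
)

# A familiar SQL aggregation answers a real-time question, no middleware needed.
rows = conn.execute(
    "SELECT sensor_id, AVG(temp) FROM readings GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
print(rows)  # prints [('s1', 21.75), ('s2', 30.0)]
```

The same query text would run unchanged against any SQL-compatible engine; only the connection changes.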
Let’s begin with a brief introduction to the GPU-powered database itself, and then consider how GPU-powered databases are able to deliver real-time performance in three separate data analytics use cases.
The GPU-Powered Database
There are, of course, numerous database solutions currently available, ranging from traditional RDBMSs to NoSQL and NewSQL. Some are forks of others with new features designed to solve a specific problem, and many have become critical to the success of the organizations that run them. For example, the traditional RDBMS forms the foundation for anything transactional, while NoSQL remains the best tool for key/value lookups. With so many options, choosing the wrong database for the job can result in frustrating complexity and unsatisfactory performance.
That choice becomes even more difficult with the advent of IoT and the onslaught of streaming data. But as usual, new challenges inevitably bring new solutions, including those purpose-built for peak performance. For real-time data analytics, that solution involves marrying something “old” (the in-memory database) with something “new” (the GPU with its massively parallel processing power). The result is nothing short of a paradigm shift in both performance and price/performance.
The GPU is not actually new, as it has been used in graphics applications for many years. What is new are the many advances that now make the GPU ideal for accelerating the processing-intensive workloads common in data analytics applications. Those advances include making GPUs substantially easier for database vendors to program, adding more cores (up to 5,000) and memory, and increasing I/O bandwidth between host server and GPU memory. And analytic databases designed to take full advantage of these advances have demonstrated some impressive improvements in performance.
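The performance claim rests on data parallelism: an aggregation over millions of rows decomposes into independent partial aggregations that thousands of GPU cores can compute at once. The sketch below illustrates that same pattern using only CPU threads from the Python standard library (no GPU involved): split a column into chunks, reduce each chunk independently, then combine the partial results. This is the shape of work a GPU-accelerated database fans out across its cores.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(column, n_workers=4):
    """Sum a column by reducing independent chunks in parallel, then
    combining the partial results (the GPU-style data-parallel pattern)."""
    chunk = (len(column) + n_workers - 1) // n_workers
    parts = [column[i:i + chunk] for i in range(0, len(column), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, parts))  # each chunk reduced independently
    return sum(partials)                       # combine the partial aggregates

readings = list(range(1, 1001))                # simulated numeric column
print(parallel_sum(readings))                  # prints 500500
```

With four CPU threads the speedup is modest; the same decomposition spread across thousands of GPU cores is what produces the order-of-magnitude gains described above.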
Simultaneously Ingesting, Processing, Analyzing, and Visualizing Streaming Data
Live data holds enormous value, but that value has a finite lifespan. That potential has led to the creation of both open source and proprietary software purpose-built for analyzing streaming data. But these solutions are currently unable to simultaneously ingest, process, analyze, and visualize such streams in real time, often in conjunction with “data at rest.” As a result, organizations miss out on this opportunity, either because they are limited to a relatively low volume and velocity of data, or because the results come too late to realize their full value.
This need to analyze and act on live data in real time is becoming increasingly common. Some organizations have rather obvious sources of streaming data involving mobile assets, financial transactions, health monitoring devices, and point-of-sale systems. And almost every organization has its own data sources—from a data network, a website, inbound and outbound phone calls, heating and lighting controls, machine logs, a building security system, or other infrastructure—all of which continuously generate data that holds potential value, especially in conjunction with other data sources.
The ability to make streaming data truly actionable by fully ingesting, processing, analyzing, and visualizing it as it arrives requires a high-performance configuration, especially when the visualization is graphics-intensive. Achieving peak performance requires dense compute power with the data residing in memory. Adding GPUs to commodity hardware supplies the much-needed computational power, and modern GPU-enabled databases provide tiered storage across VRAM (GPU memory), system RAM, SSD, NVMe, and/or Flash SAN. Streaming data lands in memory for real-time analytics and then moves down the tiers for deeper analytics as required. Additionally, GPUs enable server-side rendering of many visualization “widgets” such as maps and charts, which reduces network overhead and enables real-time visualization.
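A minimal sketch of that tiering behavior, with invented tier names and toy capacities: new records land in the fastest tier (“VRAM”), and when a tier fills, its oldest records cascade down to the next tier. Real products use far more sophisticated placement policies; this only illustrates the hot-data-stays-fast idea.

```python
from collections import deque

# Tiers ordered fastest to slowest, with toy capacities (record counts).
TIERS = [("VRAM", 2), ("RAM", 4), ("SSD", 8)]

class TieredStore:
    """Hot data lands in the fastest tier; overflow cascades downward."""
    def __init__(self, tiers=TIERS):
        self.tiers = [(name, cap, deque()) for name, cap in tiers]

    def ingest(self, record):
        carry = record
        for name, cap, data in self.tiers:
            data.appendleft(carry)           # newest record goes in front
            if len(data) <= cap:
                return
            carry = data.pop()               # evict oldest record to next tier
        # Slowest tier overflowed: drop the oldest record
        # (a real system would spill it to a Flash SAN or similar).

    def locate(self, record):
        """Report which tier currently holds a record, or None."""
        for name, _cap, data in self.tiers:
            if record in data:
                return name
        return None

store = TieredStore()
for t in range(7):                           # ingest 7 records; 0 is oldest
    store.ingest(t)
print(store.locate(6), store.locate(4), store.locate(0))  # prints VRAM RAM SSD
```

The newest records stay where real-time queries can reach them fastest, while older records remain queryable from the slower, larger tiers.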
The ability to ingest, process, analyze, and visualize a high volume and velocity of streaming data in real time might be cost-prohibitive were it not for the massively parallel processing provided by GPUs.
Advanced In-Database Analytics
Most organizations suspect that data analytics is currently not adding as much value as it could. Data scientists often do not understand the business as well as the business analysts do, and the business analysts are often ill-equipped to conduct their own analyses—at least to the extent needed to uncover the value hidden deeply within the data.