Normally, for this column, I would write a technical article or something on leadership. However, big data, the application of analytics to large quantities of data, is a trend that is here to stay. The general idea is that large amounts of data, from multiple sources and of multiple types, can be analyzed to produce heretofore unknown insights about your business.
When you mention big data and analytics, the first things most people think of are Hadoop, Spark, and NoSQL. But are these newer technologies required for big data projects? What about the mainframe?
Mainframes are not often mentioned in big data articles and advertising. But they should be! The mainframe is the most secure and reliable processor of business transactions for the Fortune 500. Millions of CICS and IMS/TM transactions are being processed every second by big businesses and their customers. Every time you book a flight, visit the ATM, or make a purchase just about anywhere, chances are that there is a mainframe behind the scenes making sure that the transaction happens accurately and efficiently. According to a recent study by TechTarget, the No. 1 type of data being collected for big data programs is “structured transaction data.” And most of that lives on a mainframe!
The same study also revealed that most organizations are planning to use mainstream relational databases to support their big data environment—more so than Hadoop, NoSQL, or any other type of database or data platform. So traditional RDBMSs, such as Db2 for z/OS, can be—and are being—used to drive big data projects. O’Reilly’s Data Science Salary Survey found that the top tool used by data scientists is not R or Python, but SQL on relational databases. Yes, the same SQL we all know (and love) has not been displaced by other tools for big data analytics.
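The SQL that data scientists rely on need not be exotic; ordinary aggregation does much of the analytical heavy lifting. As a quick sketch, the snippet below uses Python's built-in sqlite3 module as a stand-in relational engine (the table, column names, and data are invented for illustration), but the GROUP BY query itself is the same kind of SQL you would run against Db2 for z/OS:

```python
import sqlite3

# Hypothetical transaction table -- names and data are illustrative,
# not drawn from any real system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (acct TEXT, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [("A1", 100.0, "EAST"), ("A2", 250.0, "WEST"),
     ("A1", 50.0, "EAST"), ("A3", 75.0, "WEST")],
)

# A plain aggregate query: count and total transactions per region.
rows = conn.execute(
    "SELECT region, COUNT(*) AS txns, SUM(amount) AS total "
    "FROM txn GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EAST', 2, 150.0), ('WEST', 2, 325.0)]
```

The point is portability: the same declarative SQL skills transfer directly from a laptop database to an enterprise RDBMS.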
Relational and Db2 for Big Data?
It makes sense that relational databases can be used for big data projects, though probably not for all of them. A relational database may not operate or perform as needed when a project requires a very large number of columns, a flexible schema, relaxed consistency, or complex graphs. But that does not mean the mainframe cannot be used. IBM's Linux for System z can run Hadoop as well as many types of NoSQL database systems. Consider, for example, Veristorm's zDoop distribution of Hadoop, which runs on Linux for System z.
IBM also offers Machine Learning for z/OS, a complete machine learning solution to extract value from enterprise data that runs on the mainframe. This solution can be used to ingest and transform data to create, deploy, and manage high-quality self-learning behavioral models using mainframe data.
The IBM Db2 Analytics Accelerator can be integrated with Db2 for z/OS to provide powerful analytics processing. The accelerator is a workload-optimized appliance that boosts performance for complex analytic needs. Benchmarks have shown it can run complex queries up to 2000x faster while retaining single-record lookup speed. This means that the accelerator delivers high performance for complex business analysis over large amounts of data. And it is completely transparent to your applications; it looks and works just like Db2 for z/OS, only faster.
Furthermore, Db2 has capabilities for ingesting, storing, and processing big data, including JSON documents and integration with Hadoop. IBM’s BigInsights, which is a collection of services for Hadoop, has integration points with Db2 for z/OS. IBM’s InfoSphere BigInsights connector for Db2 for z/OS provides user-defined functions that enable developers to write scripts in JAQL to query JSON objects in the BigInsights environment for analysis.
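To make the JSON-processing idea concrete, here is a minimal Python sketch of the kind of per-document filtering and aggregation that JSON query functions perform. The document fields (cust, channel, amount) are invented for illustration; this is not the JAQL or connector syntax itself, just the underlying pattern:

```python
import json

# Hypothetical JSON documents, as might be ingested alongside
# relational data; the field names are invented for illustration.
docs = [
    '{"cust": "C1", "channel": "ATM", "amount": 200}',
    '{"cust": "C2", "channel": "web", "amount": 35}',
    '{"cust": "C1", "channel": "ATM", "amount": 60}',
]

# Parse each document, apply a predicate, and aggregate -- the same
# shape of work a JSON query function does over a document store.
atm_total = sum(
    d["amount"] for d in map(json.loads, docs) if d["channel"] == "ATM"
)
print(atm_total)  # 260
```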
From a more traditional Db2 perspective, compression can be used to reduce the amount of storage used by very large databases. And, you can compress both table data and indexes, thereby using less storage to maintain large amounts of data.
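Why does compression pay off so well for transactional tables? Because rows tend to repeat values (dates, codes, statuses), and repetition is exactly what compression exploits. Db2 uses its own hardware-assisted dictionary compression rather than zlib, so the sketch below is only an illustration of the principle, using invented row data:

```python
import zlib

# Simulated repetitive row data -- real table rows often repeat
# dates, branch codes, currencies, and status values.
row = b"2024-01-15|BRANCH-0042|USD|COMPLETED|"
page = row * 100  # one "page" of similar rows

# Highly repetitive data compresses dramatically.
compressed = zlib.compress(page)
print(f"{len(page)} -> {len(compressed)} bytes")
assert len(compressed) < len(page) // 10
```

The same effect, applied to both table data and indexes, is what lets very large databases fit in far less storage.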
Partitioning can also be used to spread data across multiple data sets, which might be needed for very large table spaces. With relative page numbering in Db2 partition-by-range universal table spaces, it is possible for tables to reach 4 PB in size! Who wouldn’t call that big data?
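That 4 PB figure is simple arithmetic, assuming the Db2 12 limits for partition-by-range table spaces with relative page numbering (up to 4096 partitions of up to 1 TB each; check the current Db2 documentation for your release):

```python
# Back-of-envelope check of the 4 PB maximum table size,
# under the assumed limits noted above.
MAX_PARTITIONS = 4096
MAX_PARTITION_BYTES = 2**40  # 1 TB per partition

max_table_bytes = MAX_PARTITIONS * MAX_PARTITION_BYTES
print(max_table_bytes // 2**50, "PB")  # 4 PB
```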
A Vital Component
The mainframe should be a vital component of your big data projects, helping to deliver value to your organization. This does not mean that you shouldn’t learn the newer technologies of big data, such as Hadoop, Spark, and NoSQL. It is just that Db2 and the mainframe should absolutely be a major part of your big data planning and infrastructure.