Big Data: The Battle Over Persistence and the Race for Access Hill


Starting from the data end, you could place the single point of access within a database; this database could have connections to other data stores and use virtualization as the representation for users. Next, you could centralize access and information context above the database layer, but between the BI application and consumer layers, with a data virtualization technology. Third, you could move further along the path toward the user into the BI application layer, where BI tools can create meta catalogs and data objects in a managed order for reporting, dashboards, and other consumers. Finally, some argue that the user (or desktop) application is the place where users can freely navigate and work with data locally, within the context they need and in a much more agile fashion.

Not All Data Is Created Equal

Despite database, data virtualization, and BI tool vendors racing to be the single point of access for all the data assets in the modern data platform for their own gains, there isn’t one answer for where singular access and context should live. It’s not necessarily an architectural question but perhaps a more philosophical one: a classic “it depends.” With so many options available from vendors today, the key is understanding how to blend and inherit context, and under which circumstances or workloads.

First, understand which data needs to be governed vigorously; not all data is created equal. When the semantic context of data needs to be governed absolutely, moving the context closer to the data itself ensures that every access inherits that context. For relational databases, this is the physical tables, columns, and data types that define entities and attributes within a schema. For Hadoop, this would instead be the definition of the tables and columns, with the Hive or HCatalog abstraction layer bound to the data within the Hadoop Distributed File System (HDFS). A data virtualization tool or BI server could then integrate multiple data stores’ schemas as a single virtual access point. Counter to this approach is data that does not yet have a set definition (discovery), or cases where local interpretation is more valuable than enterprise consistency; here it makes more sense for the context to be managed by users or business analysts in a self-service or collaborative manner. The semantic life cycle of data can be thought of as discovery, verification, governance, and, finally, adoption by different users in different ways.
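
As a rough illustration of a single virtual access point over multiple stores, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for real data stores and a real data virtualization tool. The store name (clickstore), the table and column names, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_virtual_access_point() -> sqlite3.Connection:
    """Return a connection exposing one governed view over two separate stores."""
    # The main in-memory database stands in for an enterprise data warehouse.
    conn = sqlite3.connect(":memory:")
    # A second, attached in-memory database stands in for another data store
    # (hypothetical name: clickstore).
    conn.execute("ATTACH DATABASE ':memory:' AS clickstore")

    # Each store carries its own physical schema (hypothetical tables).
    conn.execute(
        "CREATE TABLE main.customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE clickstore.page_views (customer_id INTEGER, url TEXT)"
    )
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO clickstore.page_views VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )

    # The view is the single access point: consumers query it and inherit
    # the same join logic and column meanings every time.
    conn.execute(
        """
        CREATE TEMP VIEW customer_activity AS
        SELECT c.customer_id, c.name, COUNT(v.url) AS page_views
        FROM main.customers AS c
        LEFT JOIN clickstore.page_views AS v
               ON v.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        """
    )
    return conn


def customer_activity(conn: sqlite3.Connection) -> list:
    """Query the virtual access point without knowing which store holds what."""
    return conn.execute(
        "SELECT name, page_views FROM customer_activity ORDER BY customer_id"
    ).fetchall()
```

Calling customer_activity(build_virtual_access_point()) returns one row per customer with a page-view count, even though the two tables live in different stores. In this sketch the semantic context (what counts as "activity") lives in the view definition rather than in each consuming application, which is the governed end of the spectrum described above.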

As for the “it depends” comment regarding different analytic workloads, let’s examine another new hot topic of 2013: analytic discovery, or specifically, the analytic discovery process. Analytic databases have been positioned as higher-performing, analytic-optimized databases that sit between the vast amounts of big data in Hadoop and the enterprise reference data, such as data warehouses and master data management hubs. The analytic database is highly optimized for performing dataset operations and statistics by combining the ease of use of SQL with the performance of MPP database technology, columnar data storage, or in-memory processing. Discovery is a highly iterative mental process, a mix of trial and error and verification. Analytic databases may not be as flexible or scalable as Hadoop, but they are faster out of the box. So, when an analytic database is used for a discovery workload, some degree of semantics and remote database connections should live within it. Whether the analytic sandbox remains a place for discovery or becomes a place for running production analytics, accumulating more analytic jobs over time, is still unknown.

What’s Ahead in the Big Data Marketplace

In 2013, two major shifts in the data landscape occurred. The acceptance of leveraging the strengths of various database technologies in an optimized Modern Data Platform has more or less been resolved, but the recognition of a single point of access and context is next. The race for access will continue well into 2014, and while one solution may win out over the others with enough push and marketing from vendors, the overall debate will continue for years, with blended approaches being the reality at companies.

And, get ready: The next wave in data is now emerging, once again pushing beyond web and mobile data. The Internet of Things (IoT), or Machine-to-Machine (M2M) data, comes from a ratio of thousands of devices per person that create, share, and perform analytics on data, in some cases every second. Whether it’s every device in your home, car, office, or anywhere in between that has a plug or battery generating and sharing data in a cloud somewhere, or the 10,000 data points being generated every second by each airline jet engine on the flight I’m on right now, there will be new forms of value created by business intelligence, energy efficiency intelligence, operational intelligence, and many other forms and families of artificial intelligence.

About the author

John O’Brien is principal and CEO of Radiant Advisors. With more than 25 years of experience delivering value through data warehousing and business intelligence programs, O’Brien’s unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor CTO in the BI industry. As a globally recognized business intelligence thought leader, O’Brien has been publishing articles and presenting at conferences in North America and Europe for the past 10 years. His knowledge in designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program. Today, through Radiant Advisors, O’Brien provides research, strategic advisory services, and mentoring that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.
