Trends and Applications

Tackling Data Analytics for the New Enterprise

By Joseph Rozenfeld

Enterprises of all sizes are generating new types of data (Web logs, HTML pages, email, network flow), often in very high volumes. This new data is taxing the traditional model of a fixed data warehouse from which meaningful analytics can be produced for users across the enterprise. This issue is only going to get worse.

In many ways, the data warehouse is “under siege.” It is being forced to process extreme volumes of data - often in formats that databases do not directly support. What’s more, this must be done without expensive, time-consuming transformations, which push processing beyond an acceptable time window and render query results meaningless in today’s fast-moving business environment. Unfortunately, as a result, these volumes of data are seen as a burden rather than a valuable source of business insights.

Current Technology Solutions: Advantages and Disadvantages

A host of technologies and processes are being used to solve this data analytics dilemma. These include advanced search capabilities, data sampling options, in-memory processing, new data storage, traditional business intelligence solutions, and Web analytics. The following is a brief overview of the benefits and drawbacks these options offer:

Advanced Search: Technology advances now offer faster, easier ways to search across enterprise databases and networks to locate data entities. However, enterprise search today is far from offering analysis and insights about these data streams.

Data Sampling: When a data set is too large for timely query processing, smaller subsets of the data are used to represent the whole. This process of data sampling raises serious credibility issues with business users, who question whether the subset accurately represents the entire data set.

In-Memory Processing: In-memory processing advances, driven by the ever-increasing clock speed of CPUs, have increased the speed of analytical processing, but only on a relatively small data set, typically limited to a short time window. They cannot address the challenges of historical analysis of the large volumes of data required for reaching comprehensive business insights.

New Data Storage: Advances in high-quality, low-cost data storage make it possible to keep high volumes of data, but they do not address the more important question of how to intelligently process the data into information. Saving data is only one aspect of the solution.

Traditional Business Intelligence: Business intelligence or business analytics is an approach that typically relies on database technologies. While this is the most common approach to the problem, it is fundamentally flawed because it breaks down when dealing with very large data sets. These high volume data sets can cause extreme latency issues, with analysis taking days, weeks or even months, rendering the intelligence meaningless to the business user. It is also inflexible in handling dynamic, changing data sources or complex data.

Web Analytics: There are a host of Web analytics solutions that can offer some insights into Web traffic logs. These typically involve special coding placed on corporate Web pages, and often require a predictive path for further drill-down. What these solutions don’t do is offer an unbiased view across all Web traffic, and they cannot easily relate the Web log data to other data sources, such as marketing campaign data or lead/sales tracking databases.
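To make the data sampling concern above concrete, the following is a minimal Python sketch, not any vendor's method. It estimates an average from a simple random sample and reports a standard error alongside it, so a business user can see how far the subset might be from the whole. The data set (Web response times with rare slow outliers), the function name sample_estimate, and the sample size are all hypothetical.

```python
import random
import statistics

def sample_estimate(values, sample_size, seed=0):
    """Estimate the mean of `values` from a simple random sample.

    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.mean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, std_error

# Hypothetical data: response times in ms, mostly fast with rare slow outliers
rng = random.Random(42)
response_times = (
    [rng.gauss(120, 15) for _ in range(99_000)]
    + [rng.gauss(5000, 500) for _ in range(1_000)]
)

true_mean = statistics.mean(response_times)
estimate, std_error = sample_estimate(response_times, sample_size=500)
print(f"full data mean:  {true_mean:.1f} ms")
print(f"sample estimate: {estimate:.1f} ms +/- {1.96 * std_error:.1f} ms")
```

Because only about one record in a hundred is an outlier, a 500-record sample may contain a handful of them or almost none, and the estimate shifts accordingly. That sensitivity is the credibility problem described above.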

Guidelines to New Data Analytics Models

Given the scope, intensity and mission-critical nature of the data analysis problem, a new approach is required. This new approach will be the glue that holds together new technological advances in a way that enables analytical processing of large volumes of new business data. When looking to implement a new analytics solution, consider these three critical technology and process requirements:

 

Perform Analytics on New Data Formats: Businesses produce and rely on data from a wide range of new and frequently changing data sources. Data enters the enterprise from corporate websites, online support sessions, IM, email, VoIP, cell phones, blackberries, and a multitude of other channels. Data formats are equally diverse - log files, Web traffic records, transaction and interactive data files. Meaningful analytics solutions must be able to understand and process these new formats, and to create associations between these new data sources.

Process Data Outside the Warehouse: The new data often comes in at extreme volumes and is difficult and time-consuming to put into an existing data warehouse. Effectively dealing with this new data demands a highly flexible technology solution that can ingest data that is not stored in the data warehouse, but stored in unstructured formats across the enterprise network.

High-speed Processing of All Data: The speed of in-memory processing must be combined with the ability to involve new and historic data. The new analytical technology solution must be able to manage the complete set of historic data as well as the new streaming data simultaneously.
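One way to picture the third guideline is a running summary that is seeded from historic data and then updated as new events arrive, so a query sees both at once. The sketch below is a simplified illustration under stated assumptions: the class name CampaignTotals, the campaign names, and the event layout are hypothetical, and a real system would hold far richer summaries than visit counts.

```python
from collections import Counter

class CampaignTotals:
    """Visits per campaign, seeded from a historic summary and updated by new events."""

    def __init__(self, historic_totals):
        # Copy the historic summary so the caller's data is left untouched
        self.totals = Counter(historic_totals)

    def ingest(self, event):
        """Fold one new event into the running totals."""
        self.totals[event["campaign"]] += 1

    def top(self, n=3):
        """Return the n campaigns with the most visits, historic and new combined."""
        return self.totals.most_common(n)

# Hypothetical historic summary and newly arriving events
historic = {"spring_promo": 1200, "webinar": 450}
totals = CampaignTotals(historic)
for event in [{"campaign": "webinar"}, {"campaign": "partner_email"}, {"campaign": "webinar"}]:
    totals.ingest(event)

print(totals.top())
```

The design choice here is to keep a compact aggregate in memory rather than re-reading the full history for every query, which is one plausible way to combine in-memory speed with historic depth.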

XML Technology and the New Data Analytics Model

One of the most powerful new technologies for bridging the variety of data formats is XML. XML provides a common, hierarchical markup language into which data in almost any format can be translated. Since its inception in 1996, XML has become a favorite of developers as a flexible and adaptable means to identify information. It is now a widely adopted standard for representing text and data in a format that allows content to be processed with relatively little human intervention. It also enables data to be exchanged across diverse hardware, operating systems, and applications, and to be used with a wide range of development tools and utilities. At the most basic level, XML is a standardized “meta-format” that can represent any kind of data, and for which precise schema definitions are optional.
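As a small illustration of the “meta-format” idea, the Python sketch below uses the standard library's xml.etree.ElementTree to wrap records from two differently formatted sources in the same XML structure, with no schema defined. The log line layout, the CSV columns, and the helper name record_to_xml are hypothetical, and the sketch assumes every field name is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

def record_to_xml(source, fields):
    """Wrap one record's fields in a <record> element tagged with its source."""
    record = ET.Element("record", {"source": source})
    for name, value in fields.items():
        ET.SubElement(record, name).text = str(value)
    return record

# Hypothetical inputs in two different formats
log_line = "203.0.113.7 GET /pricing 200"
csv_text = "lead_id,campaign,amount\n42,spring_promo,1999.00\n"

root = ET.Element("records")

# Space-delimited Web log line
ip, method, path, status = log_line.split()
root.append(record_to_xml("weblog", {"ip": ip, "method": method, "path": path, "status": status}))

# CSV export from a sales-tracking system
for row in csv.DictReader(io.StringIO(csv_text)):
    root.append(record_to_xml("sales", row))

print(ET.tostring(root, encoding="unicode"))
```

Once both sources share one hierarchical representation, the same query and reporting logic can be applied to either.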

Business intelligence vendors are now able to take advantage of XML not only to bridge multi-formatted data but also to drive the entire business intelligence stack. These new solutions offer elegant, fast ways to process extreme volumes of data across data sources, offering flexible analytical and reporting capabilities to uncover insights that today are hidden from view. Data sources can include structured, semi-structured and unstructured formats, making it possible to gain insight from all data within the enterprise, including CRM, ERP, Web traffic logs, switch data logs, contracts, financial reports and more.

While XML provides a fast and comprehensive new format to analyze multiple data types, many of the new XML technology vendors continue to overlay their solutions onto traditional business intelligence processes. This can be prohibitive because these solutions are often tied to a data warehouse, which can burden the environment with labor-intensive steps that dilute the timeliness of the analyzed data. Additionally, they can add costly data warehousing infrastructure and management overhead.

There are, however, some XML technology vendors who have perfected analytics without the need for data warehousing. While this may sound like magic, it is available today. These solutions offer ways to work with the XML output by building flexible, multi-dimensional analytical models that point to each high-volume data source where it resides outside the data warehouse. They then translate it into an XML hierarchical format. From here the technology is used to build standardized queries and reporting dashboards for each business user, with data refreshes on a regular basis so users have up-to-date results at all times. Because the high-volume data is analyzed where it is stored, outside the data warehouse, analysis time and overhead can be dramatically reduced.
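The general idea of analyzing data where it resides can be sketched in a few lines of Python. This is an illustrative simplification, not a description of any vendor's product: it streams over raw log lines once, counts hits along two dimensions (site section and status code), and emits the result as a small XML hierarchy that a report could read. The log layout and the function names summarize_in_place and to_hierarchy are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def summarize_in_place(lines):
    """Stream over log lines once and count hits by (section, status)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines rather than fail the whole run
        _ip, _method, path, status = parts
        section = path.strip("/").split("/")[0] or "home"
        counts[(section, status)] += 1
    return counts

def to_hierarchy(counts):
    """Turn flat (section, status) counts into a nested XML summary."""
    root = ET.Element("summary")
    sections = {}
    for (section, status), hits in sorted(counts.items()):
        node = sections.get(section)
        if node is None:
            node = ET.SubElement(root, "section", {"name": section})
            sections[section] = node
        ET.SubElement(node, "status", {"code": status, "hits": str(hits)})
    return root

# Hypothetical log lines; an open file object could be passed instead of a list
lines = [
    "203.0.113.7 GET /pricing 200",
    "203.0.113.8 GET /pricing/enterprise 200",
    "203.0.113.9 GET /support/ticket 404",
    "garbage",
    "203.0.113.7 GET / 200",
]
counts = summarize_in_place(lines)
summary = to_hierarchy(counts)
print(ET.tostring(summary, encoding="unicode"))
```

Only the compact summary is kept; the raw lines are read from their original location and never loaded into a warehouse table.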

Business Analytics Free From Boundaries

It is clear that today’s enterprises are faced with significant challenges in harvesting insights from the volumes of diverse types of data they are generating, receiving, and storing. This data is not built for the relational data warehouse model, where traditional business intelligence tools play well. Up to 80 percent of the data that is part of an organization’s business intelligence and records is not being accessed or used to drive a thorough understanding of its business operations and practices. New technologies have not answered this challenge, providing instead more point solutions to parts of the problem. Watch closely and you will see that a new breed of business intelligence vendors is basing its products on XML. Ultimately, these vendors can provide a way to perform analytics across all data types, quickly and with a flexibility and lower overhead not available from traditional business intelligence vendors.

About the Author

Joseph Rozenfeld, co-founder and vice president of strategy and solutions, Skytide, has more than 20 years of software development and management experience and has founded or co-founded four companies, including Skytide and ChainCast Networks. www.skytide.com
