Big Data, Big Issues - The Year Ahead in Information Management

The year 2010 brought many new challenges and opportunities to data managers' jobs everywhere. Companies, still recovering from a savage recession, increasingly turned to the power of analytics to turn data stores into actionable insights, and hopefully gain an edge over less data-savvy competitors. At the same time, data managers and administrators alike found themselves tasked with managing and maintaining the integrity of rapidly multiplying volumes of data, often presented in a dizzying array of formats and structures. New tools and approaches were sought, and the market churned with promising new offerings embracing virtualization, consolidation and information lifecycle management. Where will this lead in the year ahead? Can we expect an acceleration of these initiatives and more? DBTA looked at new industry research, and spoke with leading experts in the data management space, to identify the top trends for 2011.

Back to exponential database growth: Exponential data growth will continue unabated, and create new performance and budget headaches for companies. A new Independent Oracle User Group (IOUG) study, conducted by Unisphere Research, finds data growing rapidly at nine out of 10 of the 581 respondents' companies, with one in seven seeing data growth at a clip exceeding 50% a year. Business growth is to blame for much of it, but there is another data avalanche pouring in from new sources: the web and social media. "When you begin to factor in the data that social media is generating on a daily - even hourly - basis, you get a better picture of the data avalanche that is happening online," Scott Walz, senior director of product management for Embarcadero Technologies, tells DBTA. "In 2011, we'll see a continued growth of consumer-generated data, as well as corporate data. Corporations are also starting to adopt more internal collaboration software beyond the SharePoints of the world, and that is making them a hotbed for data growth." While the common approach has been to throw more hardware at the problem, expect to see smarter approaches such as information lifecycle management (ILM) gain traction as well. Ironically, this all bodes well for the jobs of data managers and administrators. "The simple fact is that we're not getting rid of data - we're creating more of it," says Walz. "The good news is that the uptick in data will keep those who manage it in business for a while, since there's a perpetual need for new processes, tools and technologies to manage and make sense of it."

Big Data calls for big realignment: The relentless growth of data is leading to a phenomenon now referred to as "Big Data." What else to call it when some companies are adding billions of records to their databases annually? While help is on the way, in the form of faster hardware, such as inexpensive multi-core x86 processors to crunch data rapidly, cheaper solid-state storage solutions, Gigabit networks, and data appliances, a fresh perspective is needed. "Big Data requires a different way to think about data management," Perry Rotella, vice president of Verisk Analytics, tells DBTA. "Our traditional approaches are too expensive and cumbersome and take too long to produce results. The challenges go beyond database technology." The coming year may be a time when managers go in and re-examine the processes that go into building and maintaining data environments. "We must consider the operational aspects of managing large data sets from load to backup," Rotella says. "We will see more organizations establish enterprise data management functions to coordinate data across business units, as well as data sharing frameworks, versus large-scale data warehousing efforts." The rise of Big Data creates additional management conundrums for data managers as they seek places to put it. Charlie Silver, CEO of Algebraix Data Corporation, has seen estimates that more than 15 petabytes of new information, or eight times the information housed in all US libraries, is being generated each day. "If we continue to generate new information at this pace, it will create a situation utterly crippling for IT," he tells DBTA. Why? Because "data is still stored and accessed in a hodgepodge of traditional row-and-column databases, which are tantamount to very large spreadsheets," he explains. This is hugely inefficient and will grow even more inefficient as Big Data gets bigger, he says.
Silver says there will be more calls for a new approach to handling data that goes beyond the row-and-column databases, "and instead uses mathematics to define the relationships between data." This type of approach, he says, "would eliminate time-consuming table maintenance and the performance problems related to indexing, importing, and cataloging data that's required to be in a specific format."

Advanced and predictive analytics: How can companies better understand and be able to act on their Big Data? Advanced analytics will take decision-making to the next level, beyond today's more rudimentary forms such as online analytical processing (OLAP). "Expect to see more advanced forms of analytics emerge, based on data mining, predictive analytics, complex SQL, MapReduce, natural language processing, statistics, artificial intelligence, and so on," Philip Russom, senior manager of research at The Data Warehousing Institute (TDWI), tells DBTA. "These enable the discovery of unknown facts - far more broadly than OLAP can - by supporting ad hoc analytic methods against unknown or changing collections of lightly prepared data." Such capabilities will also pave the way to predictive analytics, delivered at lightning speed to end users. "2010 represented a significant step forward in this area, where predictive analytics capabilities were introduced into the mainstream, helping everyday business users gain powerful insights into data so they can forecast future scenarios based on behind-the-scenes data-driven methods," John Callan, director of product marketing for TIBCO Spotfire, tells DBTA. "Users don't have to understand the method. They just have to ask the questions, and get answers instantly, rather than waiting days only to find out that their question has already changed."

Shift from IT to the business user: While 'Big Data' has the lion's share of attention these days, there's another issue that will increasingly perplex managers in the year to come - what to do about all the mashups, apps, and feeds their users are building and generating to manipulate data. "Many in the software industry are fixated on the Big Data problem, but I believe one of the top data management challenges in the coming years will be the 'small data' problem," John Crupi, CTO of JackBe, tells DBTA. "The future of data actually lies in the hands of millions of next-gen tech-savvy users who understand enough about feeds and mashups and apps that they can chunk the big data into usable small parts so they can answer their own problems." However, this gives rise to new challenges, including data security and governance, he continues. "How do you know what data they're using or publishing and with whom they're sharing?" TIBCO's Callan agrees that the center of gravity in data management will continue to shift to end-users over the coming year. And there will be an abundance of tools and resources from which they can choose. "There are easy delivery vehicles such as new cloud-based offerings that let users download BI tools to their mobile device or desktop, and tools that provide the capabilities that support data exploration - such as more intuitive search capabilities and video game-like interfaces," he says. "There will be even deeper integration with statistical programming tools so that users can conduct 'free-dimensional' ad hoc querying within the context of a business process, without having to turn to third-party statisticians or IT."

The result is an explosion in the diversity of user types and devices, which will increasingly be seen in 2011. "Traditional reports and dashboards, which answer the questions the company knew to ask, will have to be complemented with discovery environments where people can ask the questions they just realized matter in the moment," Paul Sonderegger, chief strategist for Endeca, tells DBTA.  This means changes in thinking for data management professionals, he adds. "They'll have to give up the idea of organizing the data ahead of time. Users will have the tools at their disposal to arrange and re-arrange the data the way they want to see it at that moment, like consumers do at e-commerce sites."

More cloud-based services: Another new IOUG survey conducted by Unisphere Research finds numerous companies are packaging and virtualizing their own IT assets into "cloud"-like services to offer across various departments and divisions, and even to outside partners. The survey of 267 IT and data managers and professionals finds that private cloud formations are growing in many companies, often outpacing adoption of public cloud services. Forty-four percent of organizations in the survey either already run a private cloud or are piloting, planning or considering one. Adoption of private clouds will expand significantly over the next 12 months. However, private clouds will bring new sets of issues with them. "There is a dark side to private clouds," Russell Rothstein, vice president of product marketing for OpTier, tells DBTA. "They significantly distort visibility into the flow of services in the cloud," he warns. "Whether they offer database as a cloud service, use a cloud database, or utilize hybrid models, database teams risk losing critical information to match end users and business services with information stored in the database; they will lose critical information needed for capacity planning and they will be relegated to a utility-based computing service that doesn't add value to the business." DBAs will need tools and methodologies that ensure greater visibility into cloud-based transactions, Rothstein says. In addition, data managers and administrators will need to perform more analysis into what data is more suited for cloud deployments, and what should remain in on-premises applications. "More and more enterprise services get automated, email traffic is ever increasing, CRM data due to changed business dynamics have further increased the need of more data capture," Paresh Shah, director of technology solutions for Allied Digital Services, tells DBTA.
"One of the missing challenges is going to be what data would remain within the enterprise and what can safely and best be managed in the cloud. Further, statutory data geographic protection issues continue to haunt global enterprises in their consolidation process across the globe."

Emergence of Data as a Service: Along with the growth of private cloud-based services, some experts predict the rise of "Data as a Service" (DaaS) approaches to enterprise data management. "This rather simple solution - build once, reuse repeatedly - is now available due to advances in web services standards," Daniel Teachey, senior director of marketing, DataFlux, tells DBTA. Until recently, DaaS - which ensures a common set of enterprise standards for data - was nearly impossible, he observes. "The technology wasn't fast or powerful enough to enable it, and the approach is still met with resistance from vendors who promote piecemeal solutions to data management. Also, issues like MDM [Master Data Management] and regulatory compliance have raised the profile of data management enough that it might finally get the attention it deserves in 2011."

Convergence between Master Data Management and Business Process Management: Typically, MDM (Master Data Management) and BPM (Business Process Management) have existed as two disconnected yet related areas of focus for the enterprise. But this is changing, and we're likely to see convergence between the two disciplines over the coming year, Jim Walker, product marketing for Talend, tells DBTA. "When successful, an MDM project can be a game changer for an organization. The business value of creating a master set of business data is immense, enabling better cross-sell and up-sell, improving customer satisfaction, reducing risk and increasing the significance and reliability of business analytics and data warehouses. Successful orchestration of these key aspects requires enforcement and guidelines for the critical processes that are defined to govern the data. Hence, BPM is a key enabler of data governance and, in turn, MDM. Ultimately, MDM provides the master data and BPM provides the master process."

More data center consolidation and resulting data migration challenges: As organizations attempt to get more control over Big Data, expect greater consolidation in the data center in the year to come, Don MacVittie, technical marketing manager for F5 Networks, tells DBTA. "The trend to virtualize servers and the impact of the global economy have many organizations reducing their number of data centers." Consolidation is not without its challenges, however. "There will be a need to move massive amounts of data - databases, VMs, flat files - out of existing data centers," says MacVittie. "The data has to move from data center A to data center B in a very short amount of time, and the natural solution is to transfer over the internet. But WAN connections, even big ones, can quickly bog down with the type of high-volume transfers associated with moving large applications and their data sets." Establishing more robust connectivity to new providers - such as cloud-based services - also exposes some of these issues, he adds. "Since your WAN connection is often the bottleneck for delivering data, it is a good place to look for performance issues post-move, storage being another."

"Data monetization": There may be big value in Big Data, but the question many companies will be grappling with over the coming year is how to find and extract that value. Bill Schmarzo, global EIM competency lead at EMC, even has a term for it: "data monetization." Data monetization will increasingly gain traction as companies work to make their data more actionable, he says. "Data is growing, but companies are only now learning how to use this data to make it actionable and valuable to their business," he tells DBTA. "This includes improving business decision making - especially around previously 'difficult-to-answer' business challenges - and new ways of monetization by combining their data with analytics. To do this will require collaboration between the technology and business side like never before."

Rise of unstructured and Internet data in enterprises: Organizations have been struggling with unstructured data - such as graphical or video files - for years. Still, Dave Kellogg, CEO of MarkLogic, tells DBTA, "unstructured data still catches many people by surprise." The problem is that most data management efforts focus on "efficiency, security, and analytics on the data that you've already mastered," he says. "But as you hit diminishing marginal returns on those efforts, you need to recognize the elephant in the room - the 80% to 90% of your data that you are ignoring because it's unstructured, and thus doesn't fit well into your database and data warehouse infrastructure." There is also an additional class of unstructured data being added to the mix - internet data. "Additional growth will be driven by this emerging class of data, that is less structured, and brings with it its own set of data management challenges," Timo Kissel, chief technology officer and senior vice president with Fetch Technologies, tells DBTA. "There is a large and ever-growing amount of business-critical information available on the internet - from online pricing, to product information, to news - yet it is difficult to get access to this data with the reliability that we are used to from traditional database systems. As technologies emerge to address these challenges, it will be transformative for companies to be able to treat the web as just another data source, not just to enhance decision-making in traditional businesses, but to enable whole new business models." Data professionals "will have to abandon the notion of modeling and processing data in tables," Endeca's Sonderegger says. "Working with heterogeneous data, including unstructured content like documents or media files, means dealing with the records themselves, with all of their idiosyncratic attributes."

Data "app stores": The app stores now offered through Apple and other vendors are providing a working model for enterprises to offer and distribute data, some experts predict. "The future of data will feel much like the Apps in Apple's iStore do today - data packaged in bite-size units that are easy to read, rate, and share," says Crupi of JackBe. "The challenge will be to manage the users in your credit department, your operations group, or your finance branch, as they set about using whatever tools they like to make this happen. You can't exactly lock every database up in a vault."

Multi-platform and multi-skills: As data management moves to the center of the organization, data managers will need to be able to handle a variety of challenges, from overseeing technical implementations to providing a consulting role to the business. Organizations are looking for more than "Oracle database specialists" or "DB2 database specialists." Cross-platform database administration "is going to present a challenge, and a mounting one at that," Embarcadero's Walz relates. "Whether they are introduced via a new application, an M&A or just somehow pop up, most DBAs have to now support two or more types of database platforms. Single-platform shops are becoming as rare as an ice cube in the Sahara."

Editor's Note: After the interviews for this article took place, John Callan joined QlikTech as senior director, global product marketing.