Enterprise Data Growth: to Petabytes and Beyond

As data grows, the reflex reaction within many organizations is to buy and install more disk storage. Smart approaches are on the horizon but still only prevalent among a minority of companies.

Remember when, not too long ago, people were awed to hear about massive databases or data warehouses topping the 1 terabyte (TB) mark? One could just imagine the huge server rooms, sophisticated clustering algorithms, and large disk arrays needed to support such an operation. An operation supported by such an immense data environment had to have a strong, multi-billion-dollar business case, such as storing telecommunications customer records and transactions, or data from particle acceleration experiments.

Of course, nowadays, some laptop computers have more than 1TB of storage capacity on their hard drives. And, for organizations pursuing big data environments, data is not only topping hundreds of terabytes but also expanding into the near-petabyte (1,000 terabytes) and even multi-petabyte realm.

How has data grown so far, so fast? Technology growth along the lines of Moore's Law (doubling every 18 months) has made petabyte-capable hardware and software a reality. And data growth itself appears to be keeping pace with the hardware and systems. In fact, a petabyte's worth of data is almost commonplace, as shown in a new survey conducted by Unisphere Research among members of the Independent Oracle Users Group (IOUG). In "The Petabyte Challenge: 2011 IOUG Database Growth Survey," close to 1 out of 10 respondents report that the total amount of online (disk-resident) data they manage today, taking into account all clones, snapshots, replicas and backups, now tops a petabyte.

For most organizations, in fact, data is streaming into, out of, and through the enterprise from a dizzying array of sources: transactions, remote devices, partner sites, websites, and nonstop user-generated content. Not only are enterprise data stores, both in core, mission-critical databases and in other environments, scaling into the terabyte and petabyte range, but they also encompass an enormous range of formats, from traditional structured, relational data to message documents, graphics, videos and audio files.

The IOUG survey, conducted in partnership with Oracle Corp., included input from 611 data managers and professionals. Respondents to the survey have a variety of job roles and represent a wide range of company types, sizes, and industry verticals.

Almost all respondents report data growth over the past year, and for one-third of them that growth is significant: the amount of data within their enterprises grew by 25% or more. A number of companies are compelled to hang onto data for extended periods of time to meet compliance requirements. As a result, more data is being kept online for longer to keep it available, which increases storage costs.
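To put the 25% annual growth figure in perspective, the compounding arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name, the constant 25% rate held steady year after year, and the 100 TB starting point are assumptions made for the example, not figures from the survey.

```python
import math

TB_PER_PB = 1000  # 1 petabyte = 1,000 terabytes


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years of steady compound growth needed for start_tb to reach target_tb.

    annual_growth is a fraction, e.g. 0.25 for 25% per year.
    """
    if start_tb <= 0 or target_tb <= 0 or annual_growth <= 0:
        raise ValueError("sizes and growth rate must be positive")
    if target_tb <= start_tb:
        return 0.0
    # Solve start * (1 + g)^n = target for n.
    return math.log(target_tb / start_tb) / math.log(1 + annual_growth)


# How long does any data store take to double at 25% per year?
doubling_years = years_to_reach(1, 2, 0.25)

# Hypothetical example: a 100 TB environment growing 25% per year.
years_to_petabyte = years_to_reach(100, TB_PER_PB, 0.25)

print(f"Doubling time at 25%/year: {doubling_years:.1f} years")
print(f"100 TB to 1 PB at 25%/year: {years_to_petabyte:.1f} years")
```

Under these assumptions, data doubles roughly every three years, and a 100 TB environment would cross the petabyte mark in a little over a decade. Faster growth rates shorten both figures considerably.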

Along with compliance requirements, a majority of respondents report that their data stores are growing in conjunction with growing business demands. Close to half also report that data warehousing and business intelligence are fueling data growth. 

Many respondents report increasing issues in the performance of their applications as a result of data growth. However, many also still look to hardware (additional server and storage systems) as the way to handle near-petabyte or multi-petabyte data.

As data grows, the reflex reaction within most organizations is to buy and install more disk storage. Smart approaches are on the horizon, but still only prevalent among a minority of companies. Close to one-third now embrace tiered storage strategies, and only 1 out of 5 is putting information lifecycle strategies into place to better and more cost-effectively manage its data. A sizable segment of respondents report that a majority of their data is managed within core enterprise databases.

Data managers in the survey are struggling with rapid data growth, but few have control over the storage technologies used to manage this growth. In many cases, the respondents who work most closely with the data, such as DBAs, have little awareness of accumulated or projected storage costs.

For many survey respondents, the surge of near-petabyte data environments is changing the information management landscape. As one respondent, a DBA with a large financial services firm, put it: "I would not say that 'big data' has made it more difficult, but we have to think and plan carefully before implementing any new strategy because the impact of any decision related to this volume of data will certainly be huge."

The Executive Summary of the report is available from the IOUG.