The Emerging Agile Data Architecture: NoSQL, Hadoop & Beyond

NoSQL and Hadoop, two foundations of the emerging agile data architecture, have been on the scene for several years now. Industry observers say adoption continues to accelerate, especially among mainstream enterprises that weren’t necessarily at the cutting edge of technology in the past.

For enterprises today, “data management is a critical differentiator that can determine market winners and losers,” Kelly Stirman, vice president of strategy and product marketing for MongoDB, pointed out.

The move to NoSQL and Hadoop is driven by the necessity to remain competitive rather than by initiatives to advance technically. Under market pressure and increasing customer demand, “we’ve seen a growing number of mainstream enterprises working to become digital transformers,” said Ravi Mayuram, senior vice president of products and engineering for Couchbase. “They want to leverage NoSQL and Hadoop in order to transform their business: generating new sources of revenue, improving customer experience, and generating new data-driven insights that improve how the business interacts with its customers.”

NoSQL and Hadoop, separate technology initiatives that have grown in tandem, have turned the database industry, which had been static for many years, on its head. “In the early and middle aughts, both NoSQL and Hadoop came roaring out of consumer internet companies,” said Mike Olson, co-founder and CSO of Cloudera. “Both took the industry by surprise.”

Adoption of the agile architectures being created by NoSQL and Hadoop is now widespread. “Companies that our mothers and grandmothers know are now adopting NoSQL technology for applications that require a distributed database foundation,” said Robin Schumacher, vice president of products for DataStax. “This movement isn’t just in one or two specific industries; we’re seeing this across all vertical markets.”

While NoSQL and Hadoop address different types of data problems, adoption is accelerating to the point where virtually all organizations are likely to have both within their data centers, said Tony Kavanagh, chief marketing officer at Actian Corp. In many cases, he cautioned, enterprises are still only learning the business applications for these approaches. “We must keep in mind that both these markets are, by and large, still in the nascent phase, even though they have been around for years,” he said.

This acceleration is being driven by the proliferation of data across the enterprise. “The year-over-year increase in data from sensors, social, mobile, even log files is forcing enterprises to look for ways to put this data under management, and this drives technology transformation as well as economic pressure. Every business is a data business,” said Scott Gnau, chief technology officer of Hortonworks.

Tradition Versus Greenfield

The question is: Are these technologies being embraced at the core of enterprises, or do they remain relegated to development shops and ancillary activities? Most NoSQL deployments may still be occurring at the periphery of enterprises, some experts argue. “It’s one thing to say that your product is used by an enterprise. It’s quite another to say that it is used by an enterprise for a mission-critical application,” Joe Pasqua, executive vice president of products for MarkLogic, pointed out, observing that “most of the NoSQL market isn’t focused on mission-critical apps which require really strong security, high availability, disaster recovery, replication modes, deployment modes, and so on. Those features are very hard to implement and take many years to harden. It’s just not the design point for most NoSQL products.”


Experts emphasize that agile architectures accommodate existing database environments. “In the majority of use cases that we’ve seen to date, NoSQL is being introduced as a complement to existing RDBMS environments,” said Mayuram. “This is especially true when we look at use cases like caching and online, massively scalable customer-facing applications.” He added that NoSQL systems are “broadening the queries and use cases that they support, and are rapidly becoming much more general-purpose data-management platforms. We’ve seen NoSQL and Hadoop used both for new and greenfield applications and for enhancing or replacing existing applications.”

Most agile architecture workloads are likely to be greenfield applications. “It’s pretty rare that someone will take an existing workload that runs well on current systems and move it over,” said Olson. Kavanagh agreed, adding that “there are use cases that will always be better addressed by traditional RDBMS technology, such as financial applications. In much the same way as mainframes are still in use today, RDBMS environments will play a role in the foreseeable future.”

Applications and Capacity

Another question on the minds of enterprise data executives is whether NoSQL and Hadoop-based architectures are capable of handling the big data loads being thrown at them. Industry experts say these technologies are more than capable. “You never want to say ‘never’ in technology, but we haven’t, so far, encountered any instance of data that was too complicated, or scale that was too breathtaking, to break the system,” said Olson. “Large-scale web serving apps run very well on NoSQL. Analytic and large-scale processing apps run very well on Hadoop. And, of course, there’s some overlap.”


A wide array of applications are now being supported with NoSQL databases, from back-ending consumer apps to supporting financial services transactions. “The common theme across all of these varied use cases is data integration,” said Pasqua. “Most people think about bringing data together into data warehouses, but customers don’t just want to look at the data, they want to operate on it. They want it to be live.”

For Hadoop, the main application area has been data analytics, but it may be expanding to new application areas as well. “Initially, technologies like Hadoop were seen as being able to handle bigger volumes and a wide variety of data but the uses were still in the reporting and analysis area,” said Jack Norris, SVP of data and applications for MapR Technologies. “We’re now seeing organizations increasingly breaking down the divisions between production and analytic systems by integrating analytics to make adjustments while the business is happening.”

Challenges Ahead

With every emerging technology, of course, there are challenges. Often, the hurdles are not related to the viability of the technology, but to the ability of the organization to leverage the solution. With NoSQL, for instance, issues arise “when a decision is made in the enterprise to deploy open source NoSQL technologies into production when they are not enterprise ready,” said Kavanagh. “Typically, they don’t provide the security, management or implementation services, training, or the 24x7 support necessary to deliver the confidence enterprises demand when deploying such technologies.”

It’s a question of whether a NoSQL and Hadoop-enhanced architecture can finally achieve the dream that has eluded data teams for generations now: bringing together disparate data sources and silos. “We’ve found that organizations hit roadblocks even with the variety of their structured data,” Pasqua explained. “They’ve got 20 systems that all have the concept of a customer and have 20 wildly different schemas. Being able to bring together all of the structured data across all of these schemas and also accommodating unstructured data is giving them a huge advantage.”


Effective data modeling also makes a difference in the transition to a new architecture. “That’s the number-one success factor we constantly see,” said Schumacher. “Once an enterprise transitions from modeling data the RDBMS way to the NoSQL way, success is almost always a given.” At the same time, however, Hadoop “continues to suffer from extreme complexity and ease-of-use issues.” Schumacher suggests addressing these challenges through both “education and experience.”

Norris recommends focusing on the architecture to make these implementations possible. “Selecting an option that has the underlying support for mission-critical features and real-time capabilities will provide you the power and flexibility that you will invariably need,” he said. The next step is to “focus on the data,” he continued. “Understanding the significance of events (web interactions, machine sensors, health status, vehicle stats) and being able to react appropriately are key to increased revenues, decreased costs, and lower risk.”

The role of management and people is central to the success of these efforts, said Pasqua. “The first thing to do is develop a solid use case with a clear understanding of what you are trying to accomplish. It’s important to have buy-in on this from the top level and equally important to have the people in place to carry out the project.” Citing the importance of collaboration, Kavanagh advises assembling “a cross-functional team of business and technical experts to build a project plan, with clearly defined goals, objectives, milestones, and owners.” Plus, he adds, “acquire or train talent in your chosen technologies. It’s also important to utilize the skills you already have.”

Success with agile database architectures “requires raw material and energy,” said Gnau. The raw material, he continued, is “data—all data—so get started by capturing as much as possible.” The energy comes from “being able to apply multiple types of analytics to find relevant patterns or the signals in the noise.” Speed is critical—particularly the ability to “fail or succeed quickly, then move on to the next data source,” Gnau added. Open source can be a big help here. “The speed at which the community innovates is more rapid than any one vendor going it alone,” said Gnau. “The community will ensure a constant flow of new innovation and help future-proof your data architecture.”


