Technologies and Skills that Build the Foundation for Data Management

What are the enabling technologies that make enterprise architecture what it is today? A range of new-generation technologies and approaches is shaping today’s data environments. The key is putting them all together to help enterprise architecture fit into the enterprise’s vision of itself as a data-driven organization. Tools and technologies emerging within today’s data-driven enterprise include cloud, data lakes, real-time analytics, microservices, containers, Spark, Hadoop, and open source trends.

Cloud


Cloud computing, in its current form, has been on the scene for close to a decade. It has only been within the past 2–3 years, however, that it has hit its stride as the solution of choice for data environments. “The acceleration to the cloud has passed the point of no return,” said Matthew Glickman, vice president of product management for Snowflake. “More and more companies, regardless of scale, are all in the cloud.”

Organizations are embracing cloud “to reach new levels of agility, increase the speed of innovation, and improve time-to-market rates,” said Mat Keep, director of product and market analysis for MongoDB. “We estimate that the majority of our deployments today are in the cloud, and we’re seeing those numbers increase.”

Enterprises are already seeing a range of benefits from cloud, including the ability to “scale applications to new geographies, decrease investments in local data center resources, and improve the ability to deliver apps quickly—all while reducing application and infrastructure provisioning,” said Keep. For the most part, startups “will never have their own data centers, opting instead to be cloud natives,” according to Joe Pasqua, executive vice president of products for MarkLogic.

Benefits also include “agility, in which, for example, public clouds allow for quick spin-up or spin-down of infrastructure,” as well as “scale, in which public clouds allow for nearly-unlimited storage and compute, enabling customers to burst data and/or analytics into a cloud on an as-needed basis,” said Jack Norris, senior vice president of data and applications for MapR. Finally, there are cost savings, in which “public clouds allow for a pay-as-you-go model, where customers are charged based on resources used.”

For smaller operations, cloud is the de facto platform, as pointed out by Eric Mizell, vice president of global solution engineering for Kinetica. “Most startups are 100% cloud, as it’s easier to spin up and down instances versus standing up servers in an office.”

At the same time, Mizell sees movement even among the largest data centers “away from traditional datacenters for most workloads.” That is the case, he says, because “it is now essential to have global collection and processing zones in the cloud for easier and faster data handling around the world. They say that data has gravity, and what is collected in the cloud stays in the cloud.” Moreover, the infrastructure behind the cloud keeps getting faster and more powerful.

Areas where cloud is gaining the most traction include “newer digital business projects that provide responsive and personalized customer and employee-centric experiences using mobile, web, and IoT applications,” said Ravi Mayuram, senior vice president of products and engineering for Couchbase. “We see these new systems being built across many industries, including ecommerce, travel and hospitality, digital health, digital media, financial services, and gaming.” While tech and media were the early cloud adopters, other industries are now joining the cloud movement, Glickman agreed.

Be careful not to associate cloud exclusively with “public” cloud services, Norris cautioned. There’s a key role for on-premises data centers, as well. “Cloud is less about which sites to deploy to, and more about taking advantage of all physical sites available,” Norris said. “Hybrid models, where services or resources are managed in some combination of on-premises and public cloud, are quite prevalent.”

Interestingly, the cloud “is already starting to be seen as the more secure place to operate your business,” Glickman said. “Regulators might soon begin rewarding their constituents who operate in the cloud since they can provide greater transparency to their respective businesses.” Ultimately, he added, “cloud adoption will reach its point of full-on adoption once everyone stops talking about cloud adoption.”

Data Lakes

Industry experts are bullish on the concept of the data lake. As Syed Mahmood, director of product marketing at Hortonworks, pointed out, “The data lake is a natural extension of a company’s decision to embark on its big data journey.”

However, they disagree on whether it is Spark or Hadoop that supports these environments. Either way, the need for the data lake is urgent. “The need to bring data from different systems together into a centralized repository for analytics and reporting is nothing new but with data volumes exploding, and much of that data now being semi-structured and unstructured, traditional enterprise data warehouses are buckling under the load,” said Keep. “Data lakes augment, rather than replace, the enterprise data warehouse.” He noted that in building data lakes, Hadoop isn’t the only solution available, and likely introduces complexity. “If organizations go the Hadoop route, they need to consider how they will integrate the analytics created in the data lake with the operational systems that need to consume those analytics in real time. This demands the integration of a highly scalable, highly flexible operational database layer.”

While Hadoop has made the data lake possible, it also introduces challenges, such as “the potential to become a data dump, security issues, lack of skill sets, and slow performance, causing smaller or less agile companies to either not try or give up on Hadoop,” said Keep. He noted that he has seen many companies add “a fast data layer on top of Hadoop to help increase its value.” According to Keep, “Spark offers new life to the data lake concept. It brings performance and machine-learning algorithms that enable the desired data munging businesses want. It also plays well in the cloud by enabling data in cloud storage to be processed faster than ever before.”

Still, some experts caution against diving too deep into a data lake. “Unfortunately, many companies have seen their data lakes turn into data swamps,” said Norris. With respect to Hadoop and Spark, “we see two types of customer adoption patterns,” he said. “The first group started with Hadoop and then adopted Spark and are using both technologies. The second group adopted Spark initially and use Spark independent of Hadoop.” Spark’s streaming analytics, he added, benefits significantly from running on a data platform that is not limited by Hadoop’s batch constraints.

Real Time

What are the best technologies for enabling real-time analytics? For Dinesh Nirmal, vice president of analytics development at IBM, the answer is Spark. “Spark radically simplifies the analysis of large datasets, enabling even those without advanced data science degrees to access information faster and more reliably than ever before,” he explained.

Apache Spark is appealing for real-time environments “because users can compute analytics very quickly, which is especially important in today’s highly responsive customer-facing applications,” said Mayuram. He pointed to another real-time enabler, Apache Kafka, which provides a “standard way to move data from an application context into a broker, so your web application team doesn’t need to worry about how to make it available to downstream consumers—their responsibility ends at Kafka. Likewise, different application teams can build analytics on the website data by consuming it from Kafka—no prearrangement required.”

A notable benefit of both Kafka and Spark, Mayuram continued, “is the ability to support real-time data streaming, which significantly reduces the traditional time lag between when data enters the system and when the results of ETL and analytical processes are available.”
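
The decoupling Mayuram describes, in which the publishing team's responsibility ends at the broker and downstream teams read on their own schedule, can be illustrated with a small in-memory sketch. This is not Kafka's API: the `InMemoryBroker` and `Consumer` classes, the `page_views` topic, and the record fields are all hypothetical names chosen for illustration, and a real deployment would add persistence, partitioning, and networking.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a broker: one append-only list of records per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, record):
        """Append a record to the topic's log and return its offset."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def read(self, topic, offset):
        """Return every record stored at or after the given offset."""
        return self._topics[topic][offset:]


class Consumer:
    """Tracks its own read position, so each team consumes independently."""

    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic
        self._offset = 0

    def poll(self):
        """Fetch records not yet seen by this consumer and advance its offset."""
        records = self._broker.read(self._topic, self._offset)
        self._offset += len(records)
        return records


# The web application only publishes; it knows nothing about who reads.
broker = InMemoryBroker()
broker.publish("page_views", {"user": "a", "path": "/home"})
broker.publish("page_views", {"user": "b", "path": "/cart"})

# One downstream team starts consuming.
analytics = Consumer(broker, "page_views")
first_batch = analytics.poll()

broker.publish("page_views", {"user": "a", "path": "/checkout"})
second_batch = analytics.poll()

# A second team attaches later, with no change to the publisher,
# and still reads the topic from the beginning.
recommendations = Consumer(broker, "page_views")
replayed = recommendations.poll()
```

Because each consumer keeps its own offset, adding the `recommendations` reader requires no coordination with the publisher or with `analytics`, which is the "no prearrangement required" property in miniature.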
