
Building a Competitive Data Architecture, One Technology at a Time

It’s time to rethink data architecture. The architectures built over the years were suitable for on-premises, legacy environments but tended to be static, inflexible, and accessible to only a few. Today’s data-driven organizations demand capabilities that adapt to the enterprise and open new paths of innovation to business users. Achieving leadership in today’s economy requires identifying and preparing for the emerging technologies and methodologies that deliver transformation.

At stake is the ability to compete on analytics, deliver superior customer experience, enable personalization, and do so in real time. Data industry leaders see an impressive constellation of technologies that are making data architecture development more aligned with pressing business requirements, as well as helping to make their jobs easier. Here are leading technologies that are reshaping approaches to data architecture.


Interest is growing in data fabric, an architecture intended to provide standardized and consistent data services across enterprises. Gartner defines a data fabric as “a design concept that serves as an integrated layer (fabric) of data and connecting processes,” employing “continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms.”

Data fabric “integrates, manages, and governs all data across the hybrid cloud, bringing together pieces of business that would otherwise be approached in isolated instances,” said Beate Porst, program director, product management, data, and AI, at IBM. “Advanced machine learning and AI technology allow for data fabric to bring all of these parts together while also reaching a higher degree of automation, optimization, and augmentation.”

A data fabric architecture has the potential to expose data “to AI and other projects without having to move it,” Porst continued. “A well-implemented data fabric can analyze and extract information regardless of where it is stored. Crucially, it can build a connected network of data assets, giving businesses a holistic view of where their data lives.”
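Porst’s point about a connected network of data assets can be illustrated with a toy sketch. Everything below is hypothetical—the `DataAsset` and `FabricCatalog` names and the sample sources are invented for illustration, and a real data fabric relies on ML-driven metadata rather than a hand-built registry—but the core idea of registering assets where they live and querying them holistically looks roughly like this:

```python
from dataclasses import dataclass, field

# Toy metadata registry: the fabric catalogs assets in place
# instead of moving the underlying data. All names are hypothetical.

@dataclass
class DataAsset:
    name: str
    location: str                       # e.g., "on-prem-dw", "aws-s3"
    schema: list[str] = field(default_factory=list)

class FabricCatalog:
    def __init__(self):
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        # Register metadata only; the data itself stays where it is.
        self._assets[asset.name] = asset

    def locate(self, column: str) -> list[str]:
        # Holistic view: find every asset, in any environment,
        # that exposes a given column.
        return [a.name for a in self._assets.values() if column in a.schema]

catalog = FabricCatalog()
catalog.register(DataAsset("orders", "on-prem-dw", ["order_id", "customer_id"]))
catalog.register(DataAsset("clickstream", "aws-s3", ["session_id", "customer_id"]))

print(catalog.locate("customer_id"))  # ['orders', 'clickstream']
```

The query spans both the on-premises warehouse and the cloud bucket without touching the data itself, which is the property Porst highlights.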

Looking ahead, Porst sees the data fabric approach continuing to help businesses “unlock their data and open it up to the power of AI.” In addition, she added, “there will most likely be a significant increase in the level of automation, self-healing, self-learning, and knowledge awareness that takes place within data fabric technology to achieve hyper-automation.”


Without automation, a company has no chance to compete on data in any way, shape, or form, said Ayush Parashar, vice president of engineering with Boomi. “With 850 applications on average, an enterprise has too many data sources and sprawl for manual data integration, analysis, or preparation to make sense,” he said. “At the core of any data strategy, you need to know your data really well—automation can help a company do this quickly and reliably.” 

Still, even with all this data, automation isn’t yet pervasive enough to make a difference. “There’s only about 5% to 10% penetration of automation at companies, maybe even less,” Parashar said. “The emergence of smart discovery and integrators will enable such automation. For example, if we can auto-discover and identify all parts of a company’s data sources, such as [through the use of] a smart discovery catalog that can feed into an integrator that can start syncing all of the data automatically, data engineers, data integrators, and application engineers can immediately understand where data is coming from, how it’s classified, and how to leverage it.”
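The discovery-to-integration flow Parashar describes can be sketched in miniature. All of the names and classification rules below are illustrative inventions, not a real product API: a discovery step classifies each source’s columns, and an integrator consumes the resulting catalog to decide what to sync:

```python
# Hypothetical sketch: smart discovery builds a catalog of sources,
# and an integrator reads that catalog to start syncing automatically.

PII_FIELDS = {"email", "name", "ssn"}

# Toy stand-ins for a few of an enterprise's hundreds of applications.
SOURCES = {
    "crm.contacts": ["contact_id", "email", "name"],
    "billing.invoices": ["invoice_id", "amount"],
}

def discover(sources: dict) -> list[dict]:
    """Auto-discover sources and classify their columns."""
    return [
        {
            "source": name,
            "columns": cols,
            "pii": sorted(PII_FIELDS.intersection(cols)),
        }
        for name, cols in sources.items()
    ]

def plan_sync(catalog: list[dict]) -> list[str]:
    """The integrator reads the catalog and syncs PII-free sources first."""
    ordered = sorted(catalog, key=lambda entry: len(entry["pii"]))
    return [entry["source"] for entry in ordered]

catalog = discover(SOURCES)
print(plan_sync(catalog))  # ['billing.invoices', 'crm.contacts']
```

The point of the sketch is the handoff: once discovery has produced a machine-readable catalog, the integrator needs no manual mapping to know what a source contains and how it is classified.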

Currently, Parashar pointed out, “automation can connect 10 or so business applications together without interference and do data discovery. In 5 years, we should have automation that can do this for hundreds of applications at a time. We’ll also see new advances in automation intelligence. Right now, we have a lot of rule-based automation, but we’re starting to add more artificial intelligence, machine learning, natural language processing, and more on top of automation.”


Essential to any data-driven enterprise is an architecture built around information catalogs that help data producers and data consumers understand the data available to them. These catalogs can “inventory, search, discover, profile, interpret, and apply semantics to not just data but also reports, analytic models, decisions, and other analytic assets,” said Tapan Patel, senior manager for data management at SAS. While information catalogs are in the early mainstream stage of maturity, the technology is understood and vendors are responding, he added.

There is a key role for AI and machine learning in the efficacy and performance of information catalogs. Looking to the future, Patel sees machine learning algorithms being used “to further simplify, augment, and automate the information catalog process for broader adoption.” For example, he explained, “the features of information catalogs are likely to include automated flags and alerts of data outliers; detection of personally identifiable information and recommended next steps; automated data profiling and role-based user access; suggestions to source, prepare, and serve data; and identification of issues with data pipelines and lineage analysis.”


No discussion of evolving architecture is complete without weighing the impact of ubiquitous cloud computing. Through public clouds, advanced features such as AI acceleration and high-performance computing are possible, said David Rhoades, manager of cloud and software-defined infrastructure at Intel. “The public cloud is a perfect environment for enterprises to run their database applications, and it’s evolving at a hyperscale pace.”

In the years ahead, Rhoades said, “more enterprises will tap into the public cloud to run their infrastructure and software environments. We’ve already seen key workloads like AI and high-performance computing become mainstream in the public cloud, and this will only continue.” For its part, Intel is working with leading cloud service providers to support hyperscale data application capabilities, Rhoades noted.


A number of products—particularly analytical tools and platforms supported by the cloud—are becoming increasingly easy to use and more widely accessible to end users across enterprises. The versatility and accessibility of these technologies are driving the trend to data democratization.


