Building a Modern Data Architecture for the 2020s


At the same time, “it’s no fun working in data analytics when you are the bottleneck in your company’s business processes,” said Chris Bergh, CEO of DataKitchen. “A barrage of errors, missed deadlines, and slow response time can overshadow the contributions of even the most brilliant data scientist or engineer. The problem lies somewhere in between the enterprise data platform architecture and the enterprise’s business processes.”

A digitally driven data architecture is a constantly evolving one. “A data platform will grow over time, driven by use cases creating business value,” said Burggraaff. He outlined the key components of a data architecture, which can be delivered by commercial off-the-shelf software, cloud provider platforms, or in-house engineering teams. These are data exchange and integration, including data pipelines and streaming ingestion; data repositories; data infrastructure; data reporting and analytics; data management tools, including quality, monitoring, and lineage; and data access, including visualization, sharing, API management, and security.

It’s also notable that cloud technologies have brought data warehouses back into vogue. There has been “a huge swing back” to data warehouses for analytics, with the twist that these data repositories are now predominantly cloud-based, said Wilkes. Extending these capabilities are data lakehouses, which provide an architecture “for data-driven businesses since they combine the best qualities of warehouses and lakes,” said Joel Minnick, vice president of marketing at Databricks. This approach enables “a single solution for all major data workloads, ranging from streaming analytics to business intelligence, data science, and AI.”

Along with tools, forward-looking methodologies are important for successful data architecture implementations. Aligning the organization effectively calls for approaches such as DataOps, which ensure an automated flow of data managed by collaborative teams. “DataOps automation is adept at handling the delicate balance between freedom, necessary for innovation, and centralization, needed for efficiency, scalability, and governance,” Bergh said. This method “greatly simplifies migrating new and updated analytics to production. A successful data mesh initiative requires a judicious mix of self-service, decentralized activity, and centralized shared infrastructure.”

In addition, next-generation approaches to cloud-based development and deployment, particularly containers and microservices, mean new ways of working for data managers. “The direct effects of this on the DBAs and data managers is that there is an evolution where they need to understand and advise to accommodate this shift to the cloud-native design,” said Schabell. For example, it is important to understand that cloud usage comes with bandwidth pricing, which means data usage designs need to be reconsidered. “Think of the data usage by your organization’s application landscape, where data is constantly being pushed and pulled to or from the cloud environment. The resulting costs mean you’re going to have uncomfortable conversations very quickly unless DBAs and data managers understand, advise, and participate in the proper design phases of their organizations’ projects,” Schabell added.
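To make the bandwidth point concrete, here is a minimal back-of-the-envelope sketch of how a team might estimate monthly data transfer charges during the design phase. Everything in it is an assumption for illustration only: the `DataFlow` class, the `monthly_egress_cost` function, the flow names, the volumes, and the per-gigabyte price are hypothetical and do not reflect any particular provider's rates.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataFlow:
    """One application data flow that moves data out of the cloud (hypothetical)."""
    name: str
    gb_per_day: float  # average volume transferred per day, in gigabytes


def monthly_egress_cost(flows: Iterable[DataFlow],
                        price_per_gb: float,
                        days: int = 30) -> float:
    """Estimate a month's transfer charge: total daily GB x days x price per GB."""
    total_gb_per_day = sum(flow.gb_per_day for flow in flows)
    return total_gb_per_day * days * price_per_gb


if __name__ == "__main__":
    # Illustrative numbers only; substitute your provider's published rate.
    flows = [
        DataFlow("nightly-report-export", gb_per_day=100.0),
        DataFlow("on-prem-sync", gb_per_day=50.0),
    ]
    cost = monthly_egress_cost(flows, price_per_gb=0.10)
    print(f"Estimated monthly transfer cost: ${cost:,.2f}")
```

Even a rough estimate like this, run per application during design, can surface the flows that dominate the bill before they reach production.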

In the end, simplicity and openness will win the day. “Simplicity should be the goal to reduce data movement between systems, multiple copies of data, and errors in security and governance,” said Minnick. According to Carr, “There is unanimous agreement, though begrudgingly for some of the legacy players, that open platforms are more responsive to the business.” He also advocates greater ease of use and extension to non-IT users as a core architectural element and design principle driving all new feature and functionality implementation. “If it requires IT to operate the product or handhold users through data ingestion, preparation, analysis, and visualization, it’s a non-starter.”

Open standards and open formats “reduce lock-in on every level to maximize an organization’s ability to future-proof today’s decisions,” Minnick said. “If given the redo button, 50% of data and AI leaders would embrace more open standards and open formats.”

Data is an “appreciating asset,” Natarajan pointed out. “Data asset management and lifecycle tools should be used. The enterprise data architecture itself is an information asset and, as such, it should be easy to create, collaborate on, extend, and plan.”
