With heightened pressure to compete on analytics, the purpose and role of data architecture are coming into clearer focus to meet business operating necessities. Recently, John O’Brien, CEO and principal advisor of Radiant Advisors, talked with BDQ about the principles of a resilient enterprise data architecture and the enabling methodologies and technologies.
Is there now a common understanding of what a data architecture is?
There continues to be confusion between good solution architectures and enterprise data architectures. A resilient enterprise data architecture will enable agile teams to develop data and analytics faster.
Do I think the concept of enterprise architecture is widely understood? I don’t think it ever has been, because people confuse IT data management requirements with business customer requirements. Understanding enterprise architecture disciplines such as TOGAF and the Zachman Framework helps enterprise architecture teams lay a foundation at a lot of companies. But you usually only find those in large, mature companies, which are not the bulk of companies that are out there today. Very rarely have I seen a well-organized team within companies. However, keep in mind that there is a likelihood that I don’t see clients with a full-blown discipline because they don’t need our help.
What are some principles for a successful enterprise data architecture?
First, we need a definition. Many companies are looking for consistency in delivering data for analytics in the business. When we talk about enterprise data architecture, we are talking about building road maps and building an architecture that enables the delivery of business analytics. What does the business side require from a data architecture? Resiliency is a top need we see today, and another might be digital transformation. Depending on the requirement, you can do your data architecture a little differently to support agile delivery of both reactive and ideation work in analytics.
Can you provide an example?
We have some clients that are doing a lot of mergers and acquisitions as a result of the volatility that has been caused by the pandemic because there is an opportunity to buy companies. Those organizations need an architecture that’s going to be really good at quickly integrating all of these companies and their data and systems. We can gear the architecture for that with standardized data ingestion, data analysis, and consolidation patterns. But when you gear an architecture for resiliency, it is to deliver data and analytics capabilities to the business faster.
What does a resilient data architecture require?
If you want to be resilient, number one, you need to be agile and adaptable. Number two is that you need capabilities for data enablement to empower people to figure things out quicker. A resilient data architecture includes having tools (data prep tools, data visualization tools, agile-type tools) and having a community and a culture that encourages people to connect, to learn SQL, and to embrace data literacy. It is important to build an architecture that doesn’t have this big IT wall on the side that requires you to know the secret passcode. The third thing is openness up and down the stack. We are doing a lot of work around data lakehouses and data mesh architectures, and all of this is really about openness. Data is ingested and put into a format that makes it accessible, no matter what tools people are using throughout the organization. They can all access the same data and share and collaborate with it.
Why is the combination of these three attributes—agility and adaptability, capabilities for data enablement, and openness—essential?
All three of these in a resilient architecture enable the business to be more self-sufficient and quicker to respond to changes in the business environment. For some companies, it’s looking for alternative solutions to the current supply chain challenges and, for others, it’s finding new ways to connect with customers during a public lockdown. Overall, the ability to look for insights and act on them makes the company more adaptable to change.
What are the biggest hurdles you see in this process?
I see companies challenged architecturally with understanding the role of their data lakehouse, their data foundation, the curated datasets, analytic datasets, and data hubs. A modern data architecture is so different from a traditional one, where it’s a hub-and-spoke model or architected data marts, and very waterfall. We distill everything down and it’s really like a mesh. It’s hard for people to wrap their mind around this kind of open distributed scalability. There’s usually an urge to consolidate data to manage it rather than monitor distributed data.
What other challenges do you see?
I am also finding a challenge in that teams are so lean and small and there is quite a bit of upskilling that is necessary to migrate to a modern cloud data architecture. There is a steep learning curve for a lot of these people to transition from what they’ve learned in the last 10 or 20 years to what they need to learn now in order to work with a cloud-native architecture mindset and services. They also have limited budgets for tools and technology these days with fear of cloud-usage billing surprises. Saving money usually leads them to the question of whether they should use an open source version or buy a vendor version. One common mistake I see people making is thinking that they are going to do all of this coding in Python and deploy it, but then they run into a lot of skill and deployment problems, whereas, if they had just made the investment in a well-made tool, it would have solved much of the inconsistency and orchestration. They think they’re saving money, but they’re actually creating another problem.
What is at the root of this problem?
I think it’s tactical “pandemic uncertainty thinking”—being conservative with funding and projects. Companies can know something is good, like cloud migration, but they really can’t fund it properly. They think that is a cost-saving measure rather than thinking of funding the new initiative as a strategic investment. A big challenge we see clients expressing is: How do we do what we want on such a limited budget?
What is the answer?
Our answer is: prioritize and align to business outcomes, deliver fast, and refactor/evolve into a more resilient architecture incrementally. Architectural building increments have to be justified just like they are in a business project. There has to be a quantifiable return on that investment in enterprise architecture. Documenting and communicating architecture benefits with measurable ROI has always been a challenge.
Are there any hot technologies in use now or on the horizon that people should pay attention to?
I have a few favorites. One is robotic process automation, coupled with analytics automation. There is so much low-hanging fruit in organizations, especially in the public sector, to empower people to automate their manual processes and bring analytics into their workflow. In terms of database technology, what I find most interesting is the cloud-native multimodal databases: the databases that are capable of doing columnar analytics, graph analytics, and geospatial time series, in addition to relational/SQL. The value lies in not having to manage five different database technologies with locked-up data in them but instead having the data under one hood and open for different analytics engines.
Are there other supporting capabilities?
People need to be able to trust the data. What we have to do on the architecture side is deliver data, but we also have to share its quality assessment and lineage so people can use it at their own discretion or show that the data has been certified by these user groups that really like it in a certain context and verify that it can be trusted. There needs to be governance and security in place to say, “Don’t worry, we’re not going to let you misuse it.” If we’re going to deliver enablement, we have to deliver healthy, trusted data, so people can work confidently. If you give people a tool and a database connection, that doesn’t mean they’re going to be empowered to work with the data and, therefore, they can’t make decisions. They will be calling around, asking for help. IT needs to enable collaboration in the architecture for that side of the house.
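To make the idea of sharing a quality assessment and lineage alongside the data more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `DatasetProfile` class, the `completeness` measure, the dataset and group names, and the 0.95 threshold are hypothetical and are not drawn from the interview or from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """Hypothetical trust metadata published alongside a dataset."""
    name: str
    completeness: float                                   # share of non-null cells, 0.0 to 1.0
    lineage: list[str] = field(default_factory=list)      # upstream sources, in order
    certified_by: set[str] = field(default_factory=set)   # user groups that vouch for it

    def is_trusted_for(self, group: str, min_completeness: float = 0.95) -> bool:
        """True when the group has certified the data and quality meets the bar."""
        return group in self.certified_by and self.completeness >= min_completeness


def completeness(rows: list[dict], columns: list[str]) -> float:
    """Fraction of expected cells that are present and not None."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(1 for row in rows for col in columns if row.get(col) is not None)
    return filled / total


# Illustrative data only: one of the four expected cells is missing.
rows = [{"id": 1, "region": "EU"}, {"id": 2, "region": None}]
profile = DatasetProfile(
    name="orders_curated",
    completeness=completeness(rows, ["id", "region"]),
    lineage=["crm_raw", "orders_staged"],
    certified_by={"finance"},
)
print(profile.is_trusted_for("finance"))
```

Publishing the score and the lineage with the dataset, rather than keeping them inside IT, is what would let each group decide for itself whether the data is fit for its own context.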
Where does DataOps fit in?
It is one of the top requests we have from clients, and we have a DataOps course that I teach at conferences in the U.S. and in Europe. It’s been a demand area for us because while you can give people an ETL or ELT tool for integration, the question is then, how do you use it and what is the methodology? The process side of data work or pipelines is DataOps. And so, understanding the principles behind The DataOps Manifesto, understanding how to adopt it, and how available technologies should align to it is critical to building, deploying, and monitoring data pipelines at an enterprise scale. A lot of people are trying to learn the DataOps methodology, get experience with it, and develop proficiency at it. People understand that it is the way forward, but getting there takes time and practice.
As you look ahead, is there anything else on the horizon?
I am happy to see that MLOps is finally getting more traction. A lot of clients have been held back because they can do data science projects, they can deploy some things, but they can’t take it to the next level because they hit a wall without MLOps in terms of scalability. It’s the difference between delivering 10 machine learning models to the business in a year versus hundreds of models. Companies also need to master model testing, multi-racehorse analysis, and monitoring models, and figure out which one continues to work best and when reinforcement training may be needed. DataOps and MLOps are the two leading methodologies that are enabling companies, more so than any new technology, and the data analytics architecture does have to be designed to enable them. The technology is already there, but learning and changing the way you work takes time and persistence.
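As a rough illustration of comparing models and monitoring them over time, the following Python sketch scores several candidate models on the same recent outcomes, picks the best one, and flags a drop in accuracy. It assumes that "multi-racehorse analysis" means running candidates side by side, which the interview does not spell out. The function names, the accuracy metric, the model names, and the 0.05 tolerance are all hypothetical choices for the example.

```python
from statistics import mean


def accuracy(predictions, actuals):
    """Share of predictions that match the observed outcomes."""
    if not actuals:
        raise ValueError("need at least one observation")
    return mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))


def pick_best(model_predictions, actuals):
    """Return (name, accuracy) of the best-scoring model on recent data."""
    scores = {name: accuracy(preds, actuals) for name, preds in model_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag a model whose recent accuracy fell more than `tolerance` below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance


# Illustrative data only: two candidate models judged on four recent outcomes.
actuals = [1, 0, 1, 1]
candidates = {
    "champion": [1, 0, 0, 0],
    "challenger": [1, 0, 1, 0],
}
best_name, best_score = pick_best(candidates, actuals)
print(best_name, best_score, needs_retraining(0.9, best_score))
```

In practice a check like this would run on a schedule against fresh outcomes, which is the kind of repeatable process the MLOps discipline is meant to provide.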
Looking ahead, what is the biggest challenge that companies are facing in implementing a data architecture?
One of the things we try to do is socialize a common vision for data and analytics architecture so that everybody in the organization can relate to it with their own agile teams or their data science teams or BI teams. The major problem, as I mentioned, is that the teams are leaner and have to wear many hats from different disciplines, unfortunately, and this presents a big challenge. It is also why it is important to have a clearly defined road map for data architecture and standards so that they don’t need to be reinvented on every project.