Data management is going through a paradigm shift—again, said Joe Caserta, president of Caserta Concepts, who delivered a keynote titled, “Architecting Data for the Modern Enterprise,” at Data Summit 2017 in NYC.
The problem is that every time there is a major shift, there are people who can make the change from what they have been doing previously to what needs to be done now—and some people who can’t.
As we move into today’s data-driven culture, it is important to understand that you need to check everything you thought you knew about data at the door when entering your new life working with data, said Caserta. “You have to evolve or you are going to die, just like any other process in evolution.”
Complicating the current situation at most enterprises is that data is being created on a massive scale through tweets, instant messages, emails, and more, and copies of data are also proliferating through shadow IT. The answer to the complexity, said Caserta, is the often-maligned data lake.
New companies like Airbnb and Uber don’t have the baggage that older companies have, and as a result, older companies must change the way they do things to stay alive and compete.
The paradigm shift is in the way we onboard and process data, he noted. Formerly, we defined a structure for data before ingesting and analyzing it. Now, we ingest and analyze data first, and then structure it. This allows immediate access for both analysts and data scientists. We have also moved from fixed-capacity to on-demand infrastructure: large and new datasets are being added at a rapid rate, capacity grows or shrinks on demand, and many of the providers are startups. We are also moving from “monolith” to “ecosystem,” and there is no one set of tools that will solve everything. Instead, we will use a diverse set of tools that will evolve over time, combining three overarching concepts: cloud computing, the data lake, and the polyglot warehouse, said Caserta.
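The "ingest first, structure later" ordering can be illustrated with a minimal Python sketch. It is not taken from Caserta's talk: the sample records, the field names, and the helper functions `ingest` and `apply_schema` are all hypothetical, chosen only to show records being landed as-is and given a shape at the moment someone analyzes them.

```python
import json

# Hypothetical raw events, landed exactly as they arrived (no schema enforced).
RAW_EVENTS = [
    '{"user": "a1", "type": "tweet", "text": "hello"}',
    '{"user": "b2", "type": "email", "subject": "Q2 numbers"}',
    '{"user": "a1", "type": "im", "text": "lunch?"}',
]


def ingest(lines):
    """Ingest first: keep every record, whatever fields it happens to carry."""
    return [json.loads(line) for line in lines]


def apply_schema(records, fields):
    """Structure later: project each record onto the fields an analysis needs,
    filling missing fields with None instead of rejecting the record."""
    return [{field: record.get(field) for field in fields} for record in records]


landed = ingest(RAW_EVENTS)
messages = apply_schema(landed, ["user", "type", "text"])

for row in messages:
    print(row)
```

In the older ordering, the email record would have been rejected or reshaped at load time because it has no `text` field; here it is kept in full and only projected when a particular analysis asks for that shape.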
Caserta suggests that the way to structure the data management system is the "Corporate Data Pyramid." Its first layer is the Landing Area, which ingests raw data and is architected very much like the source systems. The next level is the Data Lake, an integrated sandbox, where data is lightly governed to organize, define, and complete it. Next is the Data Science Workspace for munging, blending, and machine learning, and finally at the top is the Big Data Warehouse for arbitrary/ad hoc queries and reporting, where data is fully governed and trusted.
This process of implementing a Corporate Data Pyramid is not technically complex, said Caserta. What is vexing, however, are the political, emotional, and human aspects. “We are walking in and turning everything upside down. That is the most challenging part.”
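One way to keep the four layers straight is to write them down as a small data structure. The Python sketch below is illustrative only: the `Layer` class, its fields, and the `governed_layers` helper are hypothetical names, and governance is recorded only for the two layers where the description states it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    purpose: str
    governance: Optional[str]  # None where the description gives no level


# Ordered from the base of the pyramid to the top.
PYRAMID = [
    Layer("Landing Area", "ingestion of raw data, shaped like the source systems", None),
    Layer("Data Lake", "integrated sandbox to organize, define, and complete data", "light"),
    Layer("Data Science Workspace", "munging, blending, and machine learning", None),
    Layer("Big Data Warehouse", "arbitrary/ad hoc queries and reporting", "full"),
]


def governed_layers(pyramid):
    """Return the names of layers whose governance level is stated."""
    return [layer.name for layer in pyramid if layer.governance is not None]


print(governed_layers(PYRAMID))
```

Listing the layers base-first mirrors the direction data moves in the description: raw at the bottom, fully governed and trusted at the top.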
Figuring out how to deal with people who are worried that their jobs are in jeopardy or that they will not be able to learn what they need to survive is not easy, and a process is required today to help people through it, he noted.
This is why, Caserta said, the newest position at the organizational leadership level is the emerging role of chief data officer (CDO), who is responsible for changing the company from being conventional-wisdom-driven to being analytics-driven.
According to Caserta, the CDO is responsible for:
- Providing a single point of accountability for data initiatives and issues
- Finding ways to use existing data
- Enriching and augmenting data by combining internal and external sources
- Supporting efficient and agile analytics through training and templates
- Evangelizing a data vision for the organization
- Supporting & enforcing data governance policies via outreach, training & tools
- Monitoring and enforcing data quality in collaboration with data owners
- Monitoring and enforcing data security along with Legal/Security/Compliance
- Working with IT to develop/maintain an enterprise repository of strategic data
- Setting standards for analytical reporting and generating data insights
The person in this role is responsible for changing the entire culture of the organization, while also understanding additional issues such as the revenue and governance aspects, and managing the people. “They say the data scientist is really hard to find, but the CDO is the hardest,” said Caserta.
Many conference presentations have been made available by speakers at www.dbta.com/datasummit/2017/presentations.aspx