Wednesday’s Data Summit 2022 keynotes opened with Laura Sebastian-Coleman, data quality director, Prudential Financial, who discussed “Data Quality Deniers & What We Learn From Them.”
One of the biggest organizational obstacles to data quality management is basic pessimism that the quality of data can be managed at all. This stems from a lack of clarity (the goals and processes for data quality management have not been defined or understood) and from disbelief that the quality of data can be brought under control.
“It’s about common sense and applying common sense,” Sebastian-Coleman said.
She described an example in which analysts were learning data profiling techniques by assessing zip codes. However, some of the data was missing, and it's not OK to have nulls or defaults in zip code data, she explained.
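The profiling exercise she described can be sketched in a few lines. This is a minimal illustration, not anything presented in the talk: the sample records and the "00000" placeholder default are assumptions made for the example.

```python
# Minimal data-profiling sketch: count nulls, defaults, and malformed
# values in a zip code field. The sample records and the "00000"
# placeholder default are illustrative assumptions.
import re
from collections import Counter

def profile_zip_codes(records):
    """Summarize quality issues in a list of zip code values."""
    counts = Counter()
    for zip_code in records:
        if zip_code is None or zip_code == "":
            counts["null_or_empty"] += 1
        elif zip_code == "00000":          # assumed placeholder default
            counts["default"] += 1
        elif not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
            counts["malformed"] += 1
        else:
            counts["valid"] += 1
    return dict(counts)

sample = ["02114", None, "00000", "ABCDE", "19406-1234", ""]
print(profile_zip_codes(sample))
# {'valid': 2, 'null_or_empty': 2, 'default': 1, 'malformed': 1}
```

A profile like this makes the conversation concrete: instead of asserting that the data is or is not OK, the team can point to counts of nulls, defaults, and malformed values.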
There are problems with the data, but the real problem is that people are afraid to ask questions, or they say the data is OK when it's not, she said.
“People are afraid of being embarrassed or held accountable,” Sebastian-Coleman said. “Then there’s a pessimism about data and there’s a lack of understanding about what data quality management is.”
Data quality management is the application of quality management techniques to data, she explained.
“You can improve efficiency, reduce waste, measure quality,” Sebastian-Coleman said.
The role of data quality management is about answering three questions within your organization:
- What do we mean by high-quality data?
- How do we detect low-quality data?
- What actions do we take when we have low-quality data?
What gets in the way of answering these questions is data denialism. There are several kinds of data deniers.
- The straightforward denier, who is convinced there are no problems.
- The technician, who has no opinion on the data but believes technology will solve the problem.
- The “it’s not my jobber,” who knows there is a problem but thinks others will solve it.
- The perfectionist, who assumes everything should be perfect and sees no point in trying if the goal can’t be met.
- The scientific skeptic, who believes data should be “fit for purpose” and adheres only to that definition.
- The accountant, who looks at data quality through the lens of cost.
“If we think about the lifecycle of data, there’s only one point in that lifecycle where you create value,” she said. “The customers, the people that use it are the most important people.”
All processes that touch data can affect the quality of data, and data quality management must raise awareness of those connections. Not all data is equal; some needs to be prioritized over other data.
Poor data quality can lead to inefficiencies, money lost, distrust among employees, customer dissatisfaction, and more, she said.
“The deniers are not completely wrong and we can learn from them,” she said. “You need to work with your people who have different perspectives and try to share with them what they can do to make things better.”
She recommended a set of rules to apply common sense: Start small but keep the big picture in mind. Always prioritize. It’s a circle for a reason. And learn from everything you do and everyone you meet.
“Improvement depends on the willingness to be open to change your own ways and influence other people to change their ways,” Sebastian-Coleman said.
Innovating with Graph Databases
David Mohr, regional VP, Neo4j, zeroed in on “Innovating With Graph Database Technology” in his talk.
From powering NASA’s mission to Mars to driving business innovation for Fortune 500 companies, graph database technology is delivering value to organizations across the globe.
There’s an explosion of data and there’s a rise in the connectedness of data, Mohr explained.
“Graph really plays in driving business value,” Mohr said. “Finding those connections and exposing those is going to expand that value.”
Connections in the data are as valuable as the data itself, he noted. Networks of people, transaction networks, and knowledge networks are filled with relationships that can be optimized. Harnessing connections drives business value.
Graph is a unique advantage, he said. Citing Gartner, he said, “Finding relationships in combinations of diverse data, using graph techniques at scale, will form the foundation of modern analytics.”
To unlock the value of data, you need to store data relationships natively: model data as a network (graph) of data and relationships, which is very intuitive; use relationship information in real time to transform your business; and add new data and relationships on the fly to adapt to your changing business, he said.
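The modeling idea above can be sketched with a toy in-memory graph. This is an assumption-laden illustration, not Neo4j's API: the accounts, the `PAID` and `WORKS_AT` relationship types, and the `Graph` class are invented for the example. A native graph database stores and traverses such relationships directly and at scale.

```python
# Toy sketch of modeling data as a graph of nodes and relationships.
# The nodes and relationship types here are made-up illustrations;
# this is not how a graph database like Neo4j is actually used.
from collections import defaultdict

class Graph:
    def __init__(self):
        # node -> list of (neighbor, relationship type)
        self.edges = defaultdict(list)

    def add_relationship(self, source, target, rel):
        self.edges[source].append((target, rel))

    def neighbors(self, node, rel=None):
        """Nodes connected to `node`, optionally filtered by relationship type."""
        return [t for t, r in self.edges[node] if rel is None or r == rel]

g = Graph()
g.add_relationship("alice", "bob", "PAID")
g.add_relationship("bob", "carol", "PAID")
# A new relationship type can be added on the fly, with no schema change:
g.add_relationship("alice", "acme", "WORKS_AT")

print(g.neighbors("alice"))          # ['bob', 'acme']
print(g.neighbors("alice", "PAID"))  # ['bob']
```

The point of the sketch is the shape of the data: relationships are first-class values alongside the nodes, so queries follow connections rather than joining tables.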
John Lynch, senior sales engineer, AtScale, closed out the session with his discussion on “Using A Semantic Layer To Drive AI & BI Impact At Scale.”
Using a semantic layer makes data accessible and accelerates the business impact of AI and BI at your organization.
A semantic layer provides the agility, consistency, and control needed to scale enterprise BI and AI, Lynch said. A semantic layer supports data literacy and self-service, controls the complexity and cost of analytics, and provides consistency and governance for AI and BI.
The annual Data Summit conference returned in-person to Boston, May 17-18, 2022, with pre-conference workshops on May 16.
Many Data Summit 2022 presentations are available for review at https://www.dbta.com/DataSummit/2022/Presentations.aspx.