The Significance of Data Lineage: Industry Leader Q&A with MANTA’s Ernie Ostic

MANTA brings intelligence to metadata management by providing a data lineage platform that automatically scans your data environment to build a powerful map of all data flows and deliver it through a native UI and other channels to both technical and non-technical users. With MANTA, everyone gets full visibility and control of their data pipeline.

MANTA recently released version 38 of its data lineage platform, giving users access to new and improved lineage capabilities for further visibility into data environments and access to the highest quality data to inform business decisions that fuel company growth.

MANTA’s SVP of Products Ernie Ostic explains what data lineage is and why organizations need it.

You've had a storied career in the data industry, from small software companies to IBM. What brought you to MANTA?

I loved IBM but realized I wanted to get back to a small environment to focus again on something that has become my passion: data lineage. I’ve been blogging about it since 2008, and it stayed with me throughout my time at IBM. When I was there, I became the de facto product and field person around data lineage.

That led me to MANTA and to working with other data lineage companies. When I met with MANTA, I clicked with the CEO, and I joined about three and a half years ago.

Awareness of data lineage is steadily increasing. Why?

There are so many use cases where people need to understand where data came from in order to feel comfortable and competent with it. Early on, it was driven by certain industries that were under pressure to demonstrate regulatory compliance.

From 2006 to 2010, lineage was not well understood. But in 2009, during the recession, data lineage became important within the financial industry, as banks were forced to keep credible records in order to understand commercial loans, etc. Regulators said, “You need to prove that you know where the data came from that you’re using to make decisions about the level of risk and exposure.”

They had to comply and know where data was coming from and what helped drive it. This led to a discipline called data governance, or information governance, where the goal was to make decision makers comfortable enough to make faster decisions without having to ask so many other people.

Data lineage is now expanding to companies that are trying to make decisions with information whose origins aren’t simple to know. People inside the organization must track it down, which makes tooling a significant driver for information governance. It’s part of a “three-legged stool”: people care about what the data means and how it’s defined; its quality and condition, meaning whether they can trust it; and where it came from.

Some key reasons driving it right now are GDPR and other regulations regarding privacy. People want to know where their information is being sent, where it came from, and what elements of data reflect personal information.

What challenges does MANTA solve, particularly as they relate to the large swaths of data collected by enterprise organizations?

We help people get more insights into their data and where it flows through an organization. And we help people prioritize things within their enterprise so they can examine data and understand patterns.

People are using different tools, and providing a path for that data across them is critical.

MANTA can present more than pretty pictures. We can identify things, examine the way data is flowing, and see objects that were never touched.

Because we know about flow, MANTA can apply intelligence to applications and do things like examine code changes to determine how lineage is affected. MANTA can see which things are changing within the environment.

MANTA is in the process of building some tools to make it more automated so people can get reports on what’s been changing.
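To make the idea of examining flows and spotting objects that were never touched more concrete, here is a minimal sketch in Python. It is not MANTA’s implementation; it only assumes that lineage can be represented as (source, target) pairs, and every object name in it is hypothetical.

```python
from collections import defaultdict, deque

def build_graph(flows):
    """Build downstream and upstream adjacency maps from (source, target) pairs."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in flows:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream

def trace_upstream(upstream, obj):
    """Return every object that feeds `obj`, directly or indirectly."""
    seen = set()
    queue = deque([obj])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def untouched_objects(catalog, flows):
    """Return catalog objects that do not appear in any flow."""
    touched = {name for flow in flows for name in flow}
    return set(catalog) - touched

# Hypothetical catalog and flows, for illustration only.
catalog = [
    "crm.customers",
    "staging.customers",
    "dw.dim_customer",
    "reports.churn",
    "legacy.tmp_export",
]
flows = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    ("dw.dim_customer", "reports.churn"),
]

downstream, upstream = build_graph(flows)
report_sources = trace_upstream(upstream, "reports.churn")
orphans = untouched_objects(catalog, flows)
```

Here `report_sources` holds everything the churn report depends on, and `orphans` holds the one catalog object that no flow reads from or writes to.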

What is the number one challenge your engineering team faces? How is MANTA addressing it?

The number one challenge has been gearing up to provide MANTA in the cloud as a service. We are taking our time with it because our biggest customers (banks, retail, insurance, etc.) are reluctant to go to the cloud, so MANTA is keeping pace with them. Everyone knows that the future is in the cloud, and those large banks are moving in that direction.

Why is the ability to automate data lineage collection a benefit for MANTA's end users?

Lineage is hard. It takes discipline, an understanding of where data comes from and which things to analyze, and it takes technology. Overall, it takes a hybrid of automated and manual activity.

Organizations want a large percentage to be done in an automated way so it’s faster and more accurate. They want data lineage as quickly as possible, with minimal effort and without running anything specialized.

There are still companies that use File Transfer Protocol (FTP) or have processes that are not fully scripted. Those steps are manual, but they still want to capture that lineage.

What is active metadata? When combined with MANTA, what impact does it have on an enterprise organization?

It’s one of those new buzzwords. The definition I think fits and resonates the most is that active metadata is about taking advantage of knowledge that a solution or company can harvest from all its different solutions and share it where it is most useful, in the right context for a particular user.

At MANTA, we can see active metadata in analytics, where we discover that something has changed and then notify the organization. MANTA has a framework in place so that when an event occurs, the system can raise a flag to let people know something happened.
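The pattern described here, detecting a change and then raising a flag, can be sketched as follows. This is an illustrative assumption about how such a mechanism might look, not a description of MANTA’s framework; `LineageEvent`, `diff_snapshots`, `Notifier`, and all object names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageEvent:
    kind: str    # "added" or "removed"
    source: str
    target: str

def diff_snapshots(old_flows, new_flows):
    """Compare two lineage snapshots and return one event per changed flow."""
    old, new = set(old_flows), set(new_flows)
    events = [LineageEvent("added", s, t) for s, t in sorted(new - old)]
    events += [LineageEvent("removed", s, t) for s, t in sorted(old - new)]
    return events

class Notifier:
    """Minimal publish/subscribe hub: every handler sees every event."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, events):
        for event in events:
            for handler in self._handlers:
                handler(event)

# Hypothetical snapshots taken before and after a pipeline change.
old = [("crm.orders", "dw.orders"), ("ftp.orders_file", "dw.orders")]
new = [("crm.orders", "dw.orders"), ("api.orders", "dw.orders")]

flags = []
notifier = Notifier()
notifier.subscribe(lambda e: flags.append(f"{e.kind}: {e.source} -> {e.target}"))
notifier.publish(diff_snapshots(old, new))
```

In a real deployment the handler would more likely send a message or open a ticket than append to a list; the list simply keeps the sketch self-contained.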

How does MANTA compare to other data lineage providers?

Data lineage providers fall into different categories. MANTA believes strongly that looking at the real source of truth, the source code, is the true way to understand lineage. Our approach gives us this foundation, which not every vendor shares.

Our ability to scale up and handle not just tens of millions but hundreds of millions of assets separates us from others.

For example, we ran some benchmarks and included Neo4j to support these massive customers and their needs. That also separates us. We believe that the research and work we’ve done on parsing puts us ahead of others in this space, because we pick up the nuanced syntax of various languages that carries lineage and use it to give people clarity.
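As a rough illustration of what deriving lineage from source code means, the sketch below pulls table-level flows out of simple SQL. It is deliberately naive and is not how MANTA parses code: a production parser works from a full grammar for each language, while this one uses two regular expressions and only understands plain `INSERT INTO ... SELECT` statements. The function name and the sample script are hypothetical.

```python
import re

# Matches "INSERT INTO <target> ... SELECT <body>" up to the next semicolon.
INSERT_SELECT = re.compile(
    r"insert\s+into\s+(?P<target>[\w.]+).*?\bselect\b(?P<body>.*?)(?:;|\Z)",
    re.IGNORECASE | re.DOTALL,
)
# Matches table names that follow FROM or JOIN inside the SELECT body.
SOURCE_TABLE = re.compile(r"\b(?:from|join)\s+(?P<table>[\w.]+)", re.IGNORECASE)

def extract_flows(sql):
    """Return (source_table, target_table) pairs found in the SQL text."""
    flows = []
    for statement in INSERT_SELECT.finditer(sql):
        target = statement.group("target")
        for source in SOURCE_TABLE.finditer(statement.group("body")):
            flows.append((source.group("table"), target))
    return flows

# Hypothetical script, for illustration only.
script = """
INSERT INTO dw.dim_customer (id, name)
SELECT c.id, c.name
FROM staging.customers c
JOIN staging.regions r ON r.id = c.region_id;

INSERT INTO reports.churn
SELECT id FROM dw.dim_customer;
"""

flows = extract_flows(script)
```

Subqueries, common table expressions, dynamic SQL, and vendor-specific syntax would all defeat this approach, which is exactly the kind of nuance that a dedicated parser is meant to handle.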

How do you see the need for data lineage expanding over the next few years? How will MANTA respond to that need?

The need continues to expand; there’s no question about that. Even though I’ve been talking about code-based lineage, there are other use cases that come into play along the lines of active metadata and new approaches to lineage. Everyone is talking about lineage now; I used to have to convince people that it was worth it. Everyone appreciates that they need it to maximize their ability to use data. MANTA is well positioned to take advantage of that.

As lineage expands to more companies and enterprises, they will need MANTA to be available in the cloud. We are finishing our hosted SaaS offering next year, and that will open us up to more companies that don’t have the infrastructure today. It’s going to be a big trend as we see more companies across more verticals getting into lineage, and they’ll need a flexible company to help them either on-premises or in the cloud.