How Knowledge Graphs Automate Data Preparation

Video produced by Steve Nathans-Kelly

At Data Summit Connect 2020, Thomas Cook, director of sales, Cambridge Semantics, explained the basics of knowledge graphs and how they leverage natural-language processing to automate the often-cumbersome process of data preparation.

Full videos of Data Summit Connect 2020 presentations are available at

"So how do we create a repeatable way to get the answers that we need?" Cook asked.

Gartner, in its top 10 trends for 2019, offered this assessment: "Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries."

Knowledge graphs can be the answer and AnzoGraph DB is uniquely positioned to create knowledge graph applications, Cook explained. 

The ability to combine both structured and unstructured data in a single place and create relationships between the entities they contain is critical. Natural-language processing is another key element: it processes unstructured data and extracts the entities and relationships found in it. Many of the popular NLP tools automatically create RDF data, which can easily be loaded into the database, according to Cook.
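As a rough illustration of that idea (not AnzoGraph DB's API or any specific NLP tool), the sketch below merges triples from a structured source and an unstructured one into a single graph. The sample data, the function names, and the regular expression standing in for an NLP extractor are all assumptions made for this example; a real pipeline would emit RDF and load it into the graph database.

```python
import re

# Hypothetical structured source: rows as they might come from a relational table.
customer_rows = [
    {"id": "c1", "name": "Acme Corp", "region": "EMEA"},
    {"id": "c2", "name": "Globex", "region": "APAC"},
]

# Hypothetical unstructured source: free text such as a news snippet.
note = "Acme Corp acquired Globex. Globex supplies Initech."


def rows_to_triples(rows):
    """Turn every non-name column of each row into a (subject, predicate, object) triple."""
    triples = set()
    for row in rows:
        subject = row["name"]
        for column, value in row.items():
            if column != "name":
                triples.add((subject, column, value))
    return triples


def text_to_triples(text):
    """Toy stand-in for an NLP extractor: matches '<Entity> <verb> <Entity>.'"""
    pattern = re.compile(r"([A-Z][\w ]*?) (acquired|supplies) ([A-Z][\w ]*?)\.")
    return {(s, verb, o) for s, verb, o in pattern.findall(text)}


def neighbors(graph, entity):
    """Return every (predicate, object) pair attached to an entity, sorted."""
    return sorted((p, o) for s, p, o in graph if s == entity)


# Both sources land in one graph, so facts about the same entity sit side by side.
graph = rows_to_triples(customer_rows) | text_to_triples(note)
print(neighbors(graph, "Globex"))
```

Because both sources describe "Globex" with the same identifier, its region from the table and its supplier relationship from the text are returned by a single lookup.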

The graph data model is also very flexible. It doesn't resist change the same way the SQL data model does. Refactoring is not necessary when there's a change to the application or the data sources. All this drives insights on the relationships, as they are first-class citizens in a graph database, he observed.
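A minimal sketch of what that flexibility can look like, assuming a graph held as plain Python triples rather than in an actual graph database: a new data source introduces a relationship type nobody planned for, and it is simply added. The entity names and the `hasCreditRating` predicate are hypothetical.

```python
# A small graph of (subject, predicate, object) triples.
graph = {
    ("Acme Corp", "region", "EMEA"),
    ("Acme Corp", "acquired", "Globex"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects linked to a subject by a given predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# A query written against the original data.
before = objects(graph, "Acme Corp", "acquired")

# A new source brings a predicate that did not exist before.
# No table is altered and no existing triple is rewritten.
graph |= {
    ("Acme Corp", "hasCreditRating", "A"),
    ("Globex", "hasCreditRating", "BBB"),
}

# The old query is unaffected, and the new relationship is immediately queryable.
after = objects(graph, "Acme Corp", "acquired")
ratings = objects(graph, "Globex", "hasCreditRating")
print(before == after, ratings)
```

In a relational design the same change would typically mean adding a column or a new table and updating the queries that depend on it.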

AnzoGraph DB also supports traditional analytics as well as AI and ML, and feature engineering is a key element too. There are also many industry-standard data models that support RDF and the graph model, for example FIBO for financial services and HL7 FHIR for healthcare. There are also many public knowledge graphs, such as Wikidata and DBpedia, and these continue to grow and expand, he said.

A knowledge graph is a connected graph of data and associated metadata, applied to integrate and access an organization's information assets. It represents real-world entities, concepts, and events, as well as all the relationships between them, yielding a more accurate representation of a business's data.

"You can see the different views that exist of a knowledge graph, from an executive view to the data architect's view and the ontologist's view down to the very granular level," Cook said. 

AnzoGraph DB provides the ability to create this canonical data model and gain a better understanding of the overall data. Because all of the data is in a single place, it is easier to find and access the right data, automate complex data preparation tasks, and perform deep link analysis of complex relationships for improved insights.
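To make "deep link analysis" concrete, here is a small sketch of a multi-hop question asked over a graph: how is one entity connected to another, through any chain of relationships? It uses a plain breadth-first search over hypothetical triples and is not how AnzoGraph DB itself executes such queries.

```python
from collections import deque

# Hypothetical triples; in practice these would live in the graph database.
edges = [
    ("Acme Corp", "acquired", "Globex"),
    ("Globex", "supplies", "Initech"),
    ("Initech", "locatedIn", "Austin"),
    ("Acme Corp", "locatedIn", "Boston"),
]


def find_path(edges, start, goal):
    """Breadth-first search returning the shortest chain of triples from start to goal."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in outgoing.get(node, []):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None  # no chain of relationships connects the two entities


path = find_path(edges, "Acme Corp", "Austin")
for s, p, o in path:
    print(s, p, o)
```

The search follows edges in their stated direction only, so asking for a path from "Austin" back to "Acme Corp" returns `None`.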