Leveraging Ontologies for Extracted Structured Data (VIDEO)

Oct 1, 2019

By Joyce Wells

In this clip from his Data Summit 2019 presentation, Access Innovations' Bob Kasenchak explains the role of ontologies in defining relationships within extracted structured data.

Kasenchak explained how ontologies serve two purposes. "We need to define the classes of the objects that we have extracted, and give them properties and explicate what kind of relations they can have with each other. Second, we need to define the relationships between the objects, the predicates in our graph database, as carefully and specifically as possible. The first type of ontology, sometimes called an upper ontology, is used to say that this object is a date, and therefore, it has certain formatting rules. This object is a name, it can have alternate names, and has other properties and so on. So, the upper ontology defines the basic domain-independent concepts as well as some basic relationships between them, an author is a person and things like that. They can be extraordinarily useful for resolving your data set with other datasets. If we all agree, for example, that we're going to use the same upper ontology classes for names, and they're generic, they are very broad, high-level data object modeling."
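
As a rough illustration of the first purpose Kasenchak describes, the following Python sketch models a few upper-ontology-style classes as plain data. The class names, the property lists, and the choice of ISO 8601 as the date formatting rule are assumptions made for illustration; they are not taken from the presentation or from any particular upper ontology.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class OntologyClass:
    """A class in a toy ontology: a name, an optional parent, allowed properties."""
    name: str
    parent: Optional["OntologyClass"] = None
    properties: set = field(default_factory=set)

    def is_a(self, other: "OntologyClass") -> bool:
        """True if this class is `other` or a descendant of it."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False

    def all_properties(self) -> set:
        """Own properties plus everything inherited from ancestors."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


# Hypothetical classes: "an author is a person", and a name can have alternates.
PERSON = OntologyClass("Person", properties={"name", "alternateName"})
AUTHOR = OntologyClass("Author", parent=PERSON, properties={"wrote"})


def is_valid_date(value: str) -> bool:
    """Formatting rule for Date objects (assumed here: ISO 8601, YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False
```

In this sketch, AUTHOR.is_a(PERSON) holds, so an extracted author inherits the name and alternateName properties, and a string such as "Oct 1, 2019" would be rejected by the assumed date rule until it is normalized.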

DBTA’s next Data Summit conference will be held May 19-20, 2020, in Boston, with pre-conference workshops on Monday, May 18.

Core or intermediate ontologies are essentially the upper ontologies for broad application domains, said Kasenchak. "Specific to the field or the industry that your data is describing. This might help you make real world decisions for which upper ontologies may fall short for certain domains or what problems you might find in them. Domain-specific ontologies are the lowest level and are used to model topics and objects that are particular to the vertical or industry or topic that you are dealing with. This will include some kind of taxonomy or thesaurus vocabulary for classification and other things."
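
The three levels can be pictured as one class hierarchy assembled from separate layers. The Python sketch below is only an illustration of that layering: every class name in it is hypothetical, and real upper, core, and domain ontologies are far larger and are usually expressed in a dedicated ontology language rather than in dictionaries.

```python
# Each entry maps a class to its parent class; None marks a root.
# All names below are hypothetical and only illustrate the layering.
UPPER = {"Entity": None, "Document": "Entity"}   # domain-independent concepts
CORE = {"Publication": "Document"}               # intermediate: a broad field
DOMAIN = {"JournalArticle": "Publication"}       # specific to one vertical


def merge_layers(*layers):
    """Combine layers into one hierarchy, refusing duplicate class names."""
    merged = {}
    for layer in layers:
        for cls, parent in layer.items():
            if cls in merged:
                raise ValueError(f"class defined twice: {cls}")
            merged[cls] = parent
    return merged


def ancestors(cls, hierarchy):
    """Walk from a class up to the root, returning each parent in order."""
    chain = []
    parent = hierarchy[cls]
    while parent is not None:
        chain.append(parent)
        parent = hierarchy[parent]
    return chain


HIERARCHY = merge_layers(UPPER, CORE, DOMAIN)
print(ancestors("JournalArticle", HIERARCHY))
# prints ['Publication', 'Document', 'Entity']
```

Walking upward from the domain class shows how a very specific object still resolves to the generic upper-level classes that other datasets may share.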

There is some ambiguity, not every ontological structure is cleanly categorizable, and this can be overwhelming, noted Kasenchak. "The good news is that lots of upper ontologies already exist, you don't have to build one. You can go find one that already exists in the world. Some common upper ontologies, Cyc, SUMO, and BFO, are all widely used, they're more or less freely available, you can go out in the world and find lots of information about them, you don't have to go build the upper ontology. That work has been done for you. These will all have basic objects and concepts already modeled in them. One or the other might be a particular fit for your data; Cyc is built for machine learning and complex objects. You might not ever need to model something like intangible objects in your data, but they have something existing. I think that's kind of weird, but it works, and it’s widely used in machine learning-type applications. SUMO, on the other hand, is a little less abstract. It has measures and objects and processes but it doesn't have things like intangibles."

In the end, you're going to have to do a little exploring to figure out which upper ontology is suitable for your project, said Kasenchak. "Schema.org and the W3C are great places to start for that stuff. There are lots of existing intermediate and domain ontologies--far too many to try to go through. But there are a couple of registries that you can look them up on. Bartoc.org has, I think, surpassed everyone and is now the main place that people go look, it's maintained by the University of Basel and it's a registry of taxonomies and ontologies that then point you to the owners of those things. So, if you need to build an ontology to describe something in your model and domain go search there and see what you can find."

Any existing domain ontology will likely require some customization to meet your needs, Kasenchak said.

"Be sure to check the licenses. Although many ontologies are freely available to use and adapt, you may be encouraged to upload your customizations to the library to build on; a lot of these projects are done as open source projects. I think that is the most likely scenario: that you'll find something that works, that you're going to need to do a little addition and customization of. What I do want to emphasize is that it's very common to take objects from various ontological sources and combine them to suit your needs. Most ontologies are structured in the same kinds of common languages that allow for and facilitate this kind of combination."

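One way to picture combining objects from various ontological sources is to mix terms from several published vocabularies in a single set of RDF-style triples. The Python sketch below assumes an RDF representation, which fits the presentation's topic but is not described in this clip. The schema.org, FOAF, and Dublin Core Terms namespaces are real; the "ex" namespace, the sample records, and the helper functions are hypothetical. A real project would more likely use an RDF library than hand-built strings.

```python
# Namespace prefixes for terms borrowed from different vocabularies.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "https://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "http://example.org/",  # hypothetical namespace for local data
}


def expand(curie):
    """Turn a prefixed name such as 'foaf:name' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def to_ntriple(subject, predicate, obj, literal=False):
    """Render one triple as a line in N-Triples syntax."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{expand(obj)}>"
    return f"<{expand(subject)}> <{expand(predicate)}> {obj_term} ."


# Hypothetical records mixing schema.org, Dublin Core, and FOAF terms.
TRIPLES = [
    ("ex:article1", "rdf:type", "schema:ScholarlyArticle", False),
    ("ex:article1", "dcterms:creator", "ex:author1", False),
    ("ex:author1", "rdf:type", "foaf:Person", False),
    ("ex:author1", "foaf:name", "Jane Doe", True),
]

for s, p, o, is_literal in TRIPLES:
    print(to_ntriple(s, p, o, literal=is_literal))
```

Because every term expands to a globally unique IRI, classes and predicates from different sources can sit side by side in the same graph without colliding.
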
Many presenters have made their slide decks available on the Data Summit 2019 website at www.dbta.com/DataSummit/2019/Presentations.aspx.

To access the full presentation, "Structured Text to Knowledge Graphs: Creating RDF Triples From Published Scholarly Data," go to https://datasummit.brightcovegallery.com/detail/video/6040884584001/a204.-from-structured-text-to-knowledge-graphs:-creating-rdf-triples-from-published-scholarly-data?autoStart=true&q=kasenchak#links