Semantic Graphs and the New Data Integration Landscape

Representational Power Unlocks Reusability

Relational models also create missed opportunities. Because the relational model is a leaky abstraction, in which storage and implementation details bleed into the data model, relational schemas and data models are rarely reused across different applications. In practice, such reuse seems to happen only when an actual physical relational database, or an equivalent such as a materialized view, is itself reused.

Yet managing a modern enterprise requires multiple schemas, or data models. An enterprise contains many different perspectives, and no single data model can satisfy all of their use cases and requirements. This is because a data model establishes the meaning of the data as well as the relationships among entities. At enterprise scale, different business units inevitably define terms differently; regulation or law may even require them to. And with the growing relevance of third-party data, it is simply impossible to impose one definition on all data producers and consumers.

Given the limits of conventional data integration, the typical workaround is to copy data for each new use case, creating a new and distinct data model in the process. Despite all the advances in IT over the past 20 years, the most common data integration technique is still batch or bulk copying. This practice, however, leads to a proliferation of data within an organization, degrading its quality and creating uncertainty over which copy is the source of truth. Then, when a new project requires existing applications to speak to one another, effort is wasted on a patchwork of otherwise unnecessary attempts to "reintegrate": to undo the accumulated copying and work back upstream toward something like a source of ground truth, in lieu of an actual solution.

In other words, the purported cure actually causes the disease; or, to put it less starkly, the so-called solution at least makes the disease worse. All of this storage-level reintegration makes the enterprise very slow to respond to emerging threats, crises, and opportunities. When unanticipated questions or needs arise, work grinds to a halt while data preparation starts anew, and this reactive data strategy leaves teams flat-footed whenever the market shifts. Enterprises therefore need a more responsive data strategy, one that keeps pace with the business itself.

Support for Data Reuse

Data integration based on a semantic graph can end the cycle of copies of copies within the enterprise because it represents business meaning at a level of abstraction above the storage layer. That representational power leads directly to reusing data rather than copying it. Reusing both data and data models means that each new application, or each response to a new crisis, requires less time and energy, because every project builds incrementally on previous work. New project timelines plummet, and enterprise responsiveness increases.
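To make the reuse argument concrete, here is a minimal, hypothetical sketch in Python. It stores facts once as subject-predicate-object triples and defines two business units' differing notions of "customer" as queries over that shared graph rather than as separate copies. Production semantic graphs are commonly built on RDF stores queried with SPARQL; plain Python sets are used here only to keep the example self-contained. The entity names, predicates, and both "customer" definitions are invented for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in the graph: subject --predicate--> object (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


# Illustrative facts, stored once and shared by every consumer.
graph = {
    Triple("acme", "type", "Organization"),
    Triple("acme", "hasContract", "c1"),
    Triple("c1", "status", "active"),
    Triple("globex", "type", "Organization"),
    Triple("globex", "hasContract", "c2"),
    Triple("c2", "status", "expired"),
    Triple("initech", "type", "Organization"),
}


def objects(g, subject, predicate):
    """All objects linked to `subject` via `predicate`."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


def subjects_of_type(g, type_name):
    """All subjects declared to be of the given type."""
    return {t.subject for t in g if t.predicate == "type" and t.obj == type_name}


def sales_customers(g):
    # Hypothetical Sales definition: any organization with at least one contract.
    return {o for o in subjects_of_type(g, "Organization") if objects(g, o, "hasContract")}


def finance_customers(g):
    # Hypothetical Finance definition: an organization with an active contract.
    return {
        o
        for o in subjects_of_type(g, "Organization")
        if any("active" in objects(g, c, "status") for c in objects(g, o, "hasContract"))
    }


print(sorted(sales_customers(graph)))    # ['acme', 'globex']
print(sorted(finance_customers(graph)))  # ['acme']
```

Because each definition is evaluated against the same graph, a fact added once (say, a new active contract for `initech`) is immediately visible to both views; no copy has to be refreshed or reconciled.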

In fact, the largest information integration projects on the planet already use this semantic graph model. Consider the web itself: it contains a world of information, created by many different contributors and accessible through a single browser. Google Search likewise relies on a knowledge graph, a network of 500 billion facts about 5 billion entities. Both the web and Google are prime examples of this large-scale, complex, and decentralized style of information integration, which delivers information based on its meaning and relationships.

Traditional data integration based on the relational model originally arose in response to storage and database systems built on that same model. Today, integration must instead follow the intersection of the enterprise's needs and the nature of the data itself. Both have changed dramatically, and it's good to see modern integration approaches such as semantic graph technology stepping up to capture the real-world context of data, regardless of where and how it resides.

Subscribe to Big Data Quarterly E-Edition