Semantic Graph Databases: A worthy successor to relational databases

By Barry Zane

Nov 2, 2016

The development of database technology is one of the defining achievements of the information technology era. It not only has been the key to dramatically improved record-keeping and business process automation, but also has enabled enterprises to get tactical and even strategic value out of the data that was stored.

Yet the relational model that underpins most of that technology has reached its limits. It was not built in anticipation of the big data movement, which deals with a rapidly increasing volume and variety of data sources of all shapes and sizes, accompanied by soaring expectations for new insights and data-derived value.

A new semantic-based graph data model has emerged within the enterprise.

This data model has all of the advantages of the relational data model, but goes even further in providing for more intelligence built into the database itself, enabling greater elasticity to absorb the inevitable changes to data requirements, at cloud scales.

The Evolution

Relational technologies originated in the 1960s and 1970s, and were characterized by a tabular format in which data were stored in rows and columns. Empowered by Structured Query Language (SQL), users were able to issue queries against data models built on predefined schemas. Significantly, the basics of this paradigm remained in place for the better part of 45 years and are still regularly deployed today.

Nevertheless, a number of gains in various facets of IT during the past two decades helped to propel operational database technology forward. Storage advancements made amassing greater amounts of data less expensive. In-memory capabilities overcame traditional memory restrictions, and massively parallel distributed computing methods, in conjunction with greater CPU capabilities, greatly accelerated traditional computing speeds. Consequently, the relational model began to expand.

That expansion has resulted in the emergence of semantic graph databases, which can do everything possible in relational systems and much more. Semantic graph databases have finally achieved performance parity with other databases, and they now offer unprecedented flexibility and the ability to reasonably accommodate much richer varieties of data at volume.

Better Relational Methods

The natural evolution of relational technology to semantic graph databases is underpinned by the numerous points of similarity between these two approaches. In each instance, the capabilities of graph exceed those of relational simply because core database tasks are easier to carry out in a semantic graph environment. That ease of use, coupled with the commonalities between these two approaches, is directly responsible for the transition of operational database technology from relational to semantic graphs.

The smart graph database approach to data modeling typifies this fact. These databases utilize an expanding semantic model that readily incorporates new varieties of data sources and more easily adjusts to changed requirements as they arise. Conventional concerns about schema and structure no longer apply in this environment. Organizations merely take the data they already have and evolve a unified model based on standards to which additional sources and requirements must adhere. Subsequently, linking disparate data sets is far easier in a semantic graph setting.
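To make the modeling point concrete, here is a minimal sketch in Python. It is a toy in-memory triple store, not a real semantic graph database, and every name in it (the TripleStore class, the customer and invoice identifiers, the predicates) is hypothetical and chosen only for illustration. It shows facts stored as subject-predicate-object statements, a second source adding a new kind of fact without any schema change, and the two sources linking through a shared identifier.

```python
# Illustrative sketch only: a toy in-memory triple store, not a real graph database.
# All identifiers and predicate names below are hypothetical.

class TripleStore:
    """Stores facts as (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        hits = [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]
        # Sort on a string form of the object so mixed value types stay comparable.
        return sorted(hits, key=lambda t: (t[0], t[1], str(t[2])))


store = TripleStore()

# Facts from a hypothetical CRM source.
store.add("customer:42", "name", "Acme Corp")
store.add("customer:42", "locatedIn", "city:Boston")

# A second hypothetical source (billing) introduces predicates nobody planned for.
# There is no table to alter: each new fact is just another triple.
store.add("invoice:7", "billedTo", "customer:42")
store.add("invoice:7", "amountUSD", 1200)

# Because both sources use the same identifier, the data sets are already linked.
about_customer = store.match(subject="customer:42")
pointing_at_customer = store.match(obj="customer:42")

print(about_customer)
print(pointing_at_customer)
```

In a production system the identifiers would typically be URIs drawn from a shared, standards-based vocabulary, which is what lets independently built sources line up in this way.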

The same principle applies to transformation and analytics. The semantic model includes reusable mapping from source systems to target ones for the purpose of transformation, including all relevant business, industry, and system information. That mapping also provides the basis for the generation of ETL jobs, without code, on the ETL tool of choice—including ones used with relational technologies. The unified model’s efficient linking of data provides a rich contextualization of relationships to inform analytics.

Beyond Relational With Better Relationships

The intrinsic understanding of the relationships of the underlying data is the premier advantage of semantic graph technologies. When leveraged at scale, this fuels countless capabilities that are impossible with most other technologies. The granular nature of semantics enables the data to determine the relationships among its various elements, as opposed to guessing what those relationships are and then asking the data to confirm them. The richness of this contextualized understanding of how data is linked and related across sources, structures, and systems creates remarkable analytic insight, especially when exploited at scale.

By deploying a highly distributed, massively parallel query engine based on in-memory processing of semantic data in smart graph databases, organizations can analyze whole data sets incorporating an enormous variety of interlinked entity types at once. Even better, these environments make it possible to link all enterprise data and encompass them in a single query. This approach eliminates the myriad linear steps that other technologies require to traverse data at this scale, assuming they can account for the integration of such disparate data in a timely fashion. The practicality of these realities is demonstrated in examples pertaining to intelligence, fraud detection, and pharmaceutical testing. In each of these use cases, semantic graph databases allow users to query a host of different factors related to some pressing application. Those factors frequently include multiple types of data and their relationships to one another, which are easily discerned in a standards-based environment.

As convincing as the contextualization of data in semantic graph databases is when conducting analytics at scale, it is perhaps more compelling to consider the ease with which such queries are issued. In a standards-based setting, one can determine an exhaustive list of relationships between data at scale in an average of 50 keystrokes or fewer. Accounting for those relationships in other environments requires determining all of the combinations of results beforehand, and attempting to do so at scale.
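The idea of letting the data report its own relationships can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: the triples, the account and person identifiers, and the relationships() helper are all hypothetical. In SPARQL, the same question would be a short pattern along the lines of SELECT ?p WHERE { :kim ?p :lee } (prefix declarations omitted), where the predicate ?p is left unbound.

```python
# Toy data for a fraud-style question; all identifiers are hypothetical.
TRIPLES = [
    ("acct:A", "ownedBy", "person:Kim"),
    ("acct:B", "ownedBy", "person:Lee"),
    ("acct:A", "transferredTo", "acct:B"),
    ("person:Kim", "sharesAddressWith", "person:Lee"),
    ("person:Lee", "employs", "person:Kim"),
]


def relationships(triples, a, b):
    """List every direct relationship between a and b, in either direction.

    The predicate is left unbound, so the data reports which relationships
    exist instead of the caller guessing them one at a time.
    """
    return [(s, p, o) for (s, p, o) in triples if {s, o} == {a, b}]


links = relationships(TRIPLES, "person:Kim", "person:Lee")
for s, p, o in links:
    print(s, p, o)
```

The caller never names sharesAddressWith or employs. In a relational design, each of those relationships would usually live in its own table or column, and the query author would have to know about all of them in advance.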

Faster

A fundamental understanding of data relationships, reduced query complexity, and scale all contribute directly to the speed of semantic graph databases, which is illustrated in two principal ways. The first pertains to query speed and the sheer amount of data the previously mentioned semantic graph database engine can rifle through. The combination of parallel processing and in-memory techniques maximizes the discovery of relationships across a unified semantic model, enabling the parsing of billions of semantic statements each second. The significance of this fact becomes clear when considering that, traditionally, issues of scale and speed (particularly in operational settings) have hampered graph databases. The technological advances that power contemporary query engines in semantic graph environments have addressed such concerns.

The second demonstration of the speed of smart graph databases relates to the data preparation preceding analytics. The ease with which semantic technologies facilitate data modeling, transformation, and ETL jobs, along with the reuse of that work, drastically accelerates preparation that can take lengthy periods with other technologies. This advantage is magnified in production, where business, industry, and organizational requirements fluctuate. By spending less time on data preparation and queries, users are able to maximize insight while simultaneously reducing costs.

Cheaper Total Cost of Ownership

A key point of semantic graph databases that is likely to resonate most with upper-level management is their decreased total cost of ownership (TCO). Measuring those savings requires scrutinizing both tangible and intangible results. Intangibles are characterized by increased efficiency at the individual, departmental and enterprise levels. The swiftness, ease of use, and analytic insight of the graph-aware approach of these databases translate into higher rates of productivity, allowing organizations to optimize their manpower. Again, the relationship between scale and speed translates into employees accomplishing more in less time with more data, which is ideal for most businesses. The tangible cost advantages are associated with utilizing a single mature semantics vendor’s platform for all aspects of data management, from ingestion to insightful action. When compared with the typical piecemeal approach of many organizations, this method simplifies operations and shortens time to action.

A crucial aspect of the lower TCO of this method involves customer enablement: the evolution of skills required for data-driven companies. The true progression of operational database technology transcends IT to manifest in the skills required of users. The similarities between relational databases and semantic graph databases allow for a natural evolution of user skills, which is substantially more cost efficient than having to finance a slew of data scientists or those with Hadoop experience. One immediately recognizable example is how much SPARQL, the standard language used to query semantic graph databases, has inherited from the standard SQL query language. The result is minimal outsourcing or consultancy because organizations can build on the skills they already have.
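The family resemblance between the two languages is easy to see side by side. The Python sketch below is illustrative only: the employee table, the triples, and the predicate names are hypothetical. The SQL runs against SQLite from the Python standard library, while the SPARQL text is shown for comparison and is not executed; its two-pattern query is evaluated by hand so the results can be compared.

```python
import sqlite3

# Relational side: a table with a fixed set of columns (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("e1", "Ada", "R&D"), ("e2", "Grace", "Sales"), ("e3", "Linus", "R&D")],
)
sql_rows = conn.execute(
    "SELECT name FROM employee WHERE dept = 'R&D' ORDER BY name"
).fetchall()
conn.close()

# Graph side: the same facts expressed as subject-predicate-object triples.
triples = [
    ("e1", "name", "Ada"), ("e1", "dept", "R&D"),
    ("e2", "name", "Grace"), ("e2", "dept", "Sales"),
    ("e3", "name", "Linus"), ("e3", "dept", "R&D"),
]

# The SPARQL counterpart, shown for comparison only (not executed here;
# prefix declarations omitted). Note the familiar SELECT / WHERE / ORDER BY.
SPARQL_TEXT = """
SELECT ?name
WHERE { ?e :dept "R&D" . ?e :name ?name }
ORDER BY ?name
"""

# Hand evaluation of that query: bind ?e from the first pattern,
# then join on ?e in the second pattern.
in_rnd = {s for (s, p, o) in triples if p == "dept" and o == "R&D"}
graph_rows = sorted(o for (s, p, o) in triples if p == "name" and s in in_rnd)

print([row[0] for row in sql_rows])
print(graph_rows)
```

Someone comfortable with the SQL statement can read the SPARQL version almost immediately. The main new idea is that the WHERE clause matches patterns of statements instead of filtering rows of a single table.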

Overcoming Technological Constraints

Companies that are adopting the graph approach are fundamentally transforming their organizations in the “data age.” Semantic graph databases are the successors of relational databases. They represent the organic evolution of the relational paradigm and its intersection with IT developments in memory, storage and computational processing. The progression from relational to semantic graph databases enhances technology, database fundamentals, and the skills required to use them in a unique way that has made smart databases undoubtedly better, faster and cheaper than their forebears. Nearly every aspect of working with a relational database management system (RDBMS) is improved with semantic graph databases.

Image from Cambridge Semantics.