When asked, our consultants tend to use property graphs for content and metadata management and data analytics-heavy use cases, and they choose RDF tools for most other purposes. This is an important area to keep an eye on though, as the landscape is changing
quickly. Property graph vendors are now backing a proposed standard called Graph Query Language (GQL). GQL would be maintained by an international standards committee and would improve interoperability between property graph solutions. Meanwhile, RDF graphs are beginning to add many of the visualization features and JSON integrations that were historically more common in property graph databases. This convergence of two formerly divergent approaches is good for organizations adopting graph databases, but it also makes it harder to make hard-and-fast statements about when to use each type. Organizations looking to purchase a graph solution should evaluate both types closely to see which best meets their needs.
What’s Ahead in Graph Database Technology
The graph database market continues to change rapidly. Graph databases are adding
a number of new features that change how they can be used. RDF vendors are investing in better analytics, text processing, and relational database support. Neo4j used to be the clear leader in graph analytics, but products such as Stardog and Ontotext GraphDB are now rolling out advanced analytic features comparable to those offered by Neo4j. These include PageRank, path optimization,
and graph embeddings. PageRank, developed at Google, is an algorithm that measures the importance of a node based on the number of links pointing to it and the importance of the nodes those links come from. It is a powerful tool for recommendation engines and analytics solutions that need to identify important entities that might go unnoticed in a traditional database. Path optimization identifies the shortest or lowest-cost path between two nodes in the graph, which speeds up traversal queries. Finally, graph embeddings represent the nodes of a graph as numeric vectors so that machine learning can be used to measure the proximity or similarity of nodes. These features are becoming common in graph databases and give organizations powerful tools for running complex analyses from within their graph databases.
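To make the PageRank idea concrete, the following is a minimal sketch of the classic power-iteration form of the algorithm in plain Python. It is not how any particular graph product implements it; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with rank spread evenly across all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share (the "random jump" term).
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three nodes link to "hub", which links back to "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
ranks = pagerank(edges)
top = max(ranks, key=ranks.get)  # the node with the highest score
```

In this toy graph the node with the most inbound links scores highest, and the node it links to inherits much of that importance, which is the behavior the description above refers to.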
Natural Language Processing
In addition to advanced analytics, many graph vendors are investing in natural language processing capabilities that allow graph databases to ingest and work with unstructured information such as social media posts and company reports. These tools can identify the people, places, and things within documents that align with the entities in the graph. Graph databases with these features can associate information in documents with the structured information in company databases, making it possible to report on and identify trends across all types of information. This is another feature that sets graph databases apart
from relational databases. Their natural structure works well with a combination
of structured and unstructured data.
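The idea of aligning mentions in a document with entities in the graph can be sketched very roughly as follows. Commercial tools rely on trained language models for this; the example below swaps in simple alias matching purely for illustration, and the node identifiers, alias lists, and `link_entities` function are hypothetical.

```python
import re

# Hypothetical graph entities: node id -> surface forms that may appear in text.
GRAPH_ENTITIES = {
    "org:acme": ["Acme Corp", "Acme"],
    "person:jane_doe": ["Jane Doe"],
    "place:berlin": ["Berlin"],
}


def link_entities(text, entities=GRAPH_ENTITIES):
    """Return the sorted ids of graph nodes whose aliases appear in the text."""
    found = set()
    for node_id, aliases in entities.items():
        for alias in aliases:
            # Whole-word, case-insensitive match on the alias.
            pattern = r"\b" + re.escape(alias) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(node_id)
                break  # one matching alias is enough for this node
    return sorted(found)


report = "Jane Doe announced that Acme will open a new office in Berlin."
mentions = link_entities(report)
```

Once a document is tagged with node ids in this way, it can be attached to the matching nodes in the graph and queried alongside the structured data.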
Integration With Relational DBs
Quite possibly the most important new feature that graph databases are rolling out is integration with relational databases through virtual graphs. In a virtual graph, each entity or node in the graph can be mapped to data in a relational database through an SQL query. It is this virtual graph capability that supports the data mesh concept that so many enterprises are adopting. This feature allows the graph to map common entities in the organization
to data elements found in corporate data wherever it resides. Setting up a data mesh solution begins with defining an ontology that mirrors the way the business thinks about its organization.
The ontology includes entities, or nodes, and the relationships that exist between them. For example, a manufacturing organization may have an ontology that maps factories to parts, parts to products, and products to customers. This ontology is then instantiated in the graph database as the model for how information is stored within the graph. Each of the different entities becomes a node in the graph. Virtual graphs then
allow each node to be mapped, through SQL, to the data element in whichever database it resides. When a user queries the graph for information, the graph automatically retrieves the data from the relational database according to the SQL query attached to each node. These data mesh solutions can reduce or eliminate the need for costly ETL processes and democratize access to data by organizing it in a way that makes sense to the business user.
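The mapping described above can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for an existing relational system, and, for brevity, attaches the SQL to relationships in the ontology rather than to individual nodes. The table names, the `VIRTUAL_EDGES` mapping, and the helper functions are assumptions for illustration and do not reflect any vendor's actual mapping syntax.

```python
import sqlite3

# Stand-in for an existing relational system (in-memory SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part_id TEXT PRIMARY KEY, factory TEXT);
    CREATE TABLE products (product_id TEXT PRIMARY KEY, part_id TEXT);
    INSERT INTO parts VALUES ('P1', 'Factory A'), ('P2', 'Factory B');
    INSERT INTO products VALUES ('Widget', 'P1'), ('Gadget', 'P2');
""")

# Hypothetical mapping: each ontology relationship carries the SQL that resolves it.
VIRTUAL_EDGES = {
    "factory_makes_part": "SELECT factory, part_id FROM parts",
    "part_used_in_product": "SELECT part_id, product_id FROM products",
}


def edges(relationship):
    """Fetch (source, target) pairs at query time; no data is copied into the graph."""
    return conn.execute(VIRTUAL_EDGES[relationship]).fetchall()


def products_from_factory(factory):
    """Traverse factory -> part -> product across the two mapped tables."""
    parts = {part for fac, part in edges("factory_makes_part") if fac == factory}
    return sorted(
        product
        for part, product in edges("part_used_in_product")
        if part in parts
    )


result = products_from_factory("Factory A")
```

The point of the design is that the relational tables remain the system of record: the graph layer only holds the mapping and runs the SQL when a question is asked, which is why no separate load step appears in the sketch.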
Graph databases continue to struggle with performance at scale. Querying graph
databases that hold terabytes of information remains too slow for many organizations. Vendors have taken a number of approaches to address this issue. GraphQL, an API query language that some graph databases now support, can offer better performance, but it is not as flexible as SPARQL, so it is not a complete replacement. Products such as Blazegraph and TigerGraph are specifically designed to operate at scale. Vendors including Amazon, Microsoft, and data.world are offering cloud-native graph databases to simplify scalability. Finally, we are seeing more and more in-memory graph databases such as RDFox and Memgraph. Currently, the best way to address scalability is through careful architecture design or the use of specialized solutions. Given the effort that vendors are making to fix this problem, performance and scalability should continue to improve rapidly in the coming years.
Graph databases are powerful new tools for managing and analyzing heterogeneous
data across the enterprise. They are popular and maturing quickly. Most importantly, organizations are beginning to understand the specific use cases that graph databases solve well. Graph databases are now enterprise-ready and can be used to give organizations better access to data and information.